Mercurial convert filename encoding
Solution 1:
You are right that the convert extension doesn't currently support this in a nice way: you cannot ask it to recode file names from encoding X to encoding Y. However, you can ask it to rename the files one by one for you! First create a file called rename.py with the following contents:
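import sys

# Python 3: work on raw bytes so the locale never touches the file names.
for line in sys.stdin.buffer:
    old = line.rstrip(b"\r\n")  # strip newline
    new = old.decode("cp1251").encode("utf-8")
    sys.stdout.buffer.write(b'rename "%s" "%s"\n' % (old, new))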
Then run
$ hg manifest --all | python rename.py > rename.txt
This creates your file map. You can now use
$ hg convert --filemap rename.txt cp1251-repo utf-8-repo
to convert the repository into a new repository. In the new repository, it will look like the files have always been saved using UTF-8 file names.
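Before running the conversion it can be worth checking the file map for names that do not decode cleanly. The following is only a sketch of that idea, not part of the convert extension: `build_filemap` is a hypothetical helper name, it assumes Python 3 and paths given as raw bytes, and it skips names that the recoding leaves unchanged instead of emitting a rename line for them.

```python
def build_filemap(paths, src="cp1251", dst="utf-8"):
    """Return (rename_lines, skipped) for an iterable of raw byte paths.

    Hypothetical helper: names that are not valid in the source
    encoding are collected in `skipped` instead of aborting the run.
    """
    lines, skipped = [], []
    for old in paths:
        try:
            new = old.decode(src).encode(dst)
        except UnicodeDecodeError:
            skipped.append(old)  # byte sequence undefined in `src`
            continue
        if new != old:  # pure-ASCII names are identical in both encodings
            lines.append(b'rename "%s" "%s"' % (old, new))
    return lines, skipped


lines, skipped = build_filemap([b"readme.txt", b"\xf4\xe0\xe9\xeb.txt"])
```

Anything that ends up in `skipped` probably was not stored as cp1251 in the first place and needs a manual look before converting.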
Note: The file names are now stored as UTF-8 in the repository, so checkouts will look fine on modern Linux machines. Windows, however, does not use UTF-8 file names. The FixUtf-8 extension must be used to make Mercurial convert the UTF-8 file names into UTF-16 on the fly, which gives readable file names on Windows too.
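To make the encoding difference concrete, here is a small illustration of how one name is represented in each encoding. The file name is a made-up example, and `is_valid_utf8` is a hypothetical helper; UTF-16 is shown in its little-endian form on the assumption that this is what Windows uses internally.

```python
name = "файл.txt"  # hypothetical example file name

as_cp1251 = name.encode("cp1251")    # one byte per character
as_utf8 = name.encode("utf-8")       # two bytes per Cyrillic letter
as_utf16 = name.encode("utf-16-le")  # two bytes per character here


def is_valid_utf8(raw):
    """Return True if the byte string decodes as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(as_cp1251), len(as_utf8), len(as_utf16))
print(is_valid_utf8(as_cp1251), is_valid_utf8(as_utf8))
```

The cp1251 bytes are not valid UTF-8, which is why an unconverted repository shows garbled names on a system that expects UTF-8.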
Note: Everybody will have to re-clone the new repository! Changing any part of the history inevitably changes all the changeset hashes too. So to pull this off, you need to either
- make everybody push to the server,
- convert the repositories on the server,
- have people re-clone,
or
- make everybody run the above commands on their local repositories,
- convert the repositories on the server.
Either way works since the conversion is deterministic and so your users can run it themselves if they have Python available. If they only have a TortoiseHg installation, then it's probably easiest if you convert for them on your server.
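If users are going to run the conversion themselves, the two commands above could be wrapped in one script so that everybody uses exactly the same file map. This is a rough sketch under a few assumptions: Python 3, `hg` on the PATH with the convert extension enabled, and hypothetical function names (`filemap_from_manifest`, `convert_repo`) that are not part of Mercurial.

```python
import subprocess
import tempfile


def filemap_from_manifest(manifest, src="cp1251", dst="utf-8"):
    """Turn the raw output of `hg manifest --all` into file map bytes."""
    lines = []
    for old in manifest.splitlines():
        new = old.decode(src).encode(dst)
        if new != old:  # names without non-ASCII characters stay as they are
            lines.append(b'rename "%s" "%s"\n' % (old, new))
    return b"".join(lines)


def convert_repo(src_repo, dst_repo, src="cp1251"):
    """Run the same two hg commands as above for one repository."""
    manifest = subprocess.run(
        ["hg", "manifest", "--all"],
        cwd=src_repo, check=True, stdout=subprocess.PIPE,
    ).stdout
    # Keep the file map on disk so hg convert can read it by name.
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as fm:
        fm.write(filemap_from_manifest(manifest, src))
    subprocess.run(
        ["hg", "convert", "--filemap", fm.name, src_repo, dst_repo],
        check=True,
    )

# Example call (repository names taken from the commands above):
# convert_repo("cp1251-repo", "utf-8-repo")
```

The script leaves the temporary file map behind, which makes it possible to inspect what was renamed afterwards.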
I looked at making the convert extension support this more directly and have sent a patch to the Mercurial mailing list.
Solution 2:
I had the same problem. I needed to convert a bunch of repositories, so I wrote a script that converts all the repositories given as a list.
usage:
hg_convert_filenames_encoding.py [-h] [-i INPUT_ENCODING] [-o OUTPUT_ENCODING] [-b] [-u] [repositories [repositories ...]]
You can get it from my repository at BitBucket.