Mercurial convert filename encoding

Solution 1:

You are right that the convert extension doesn't support this in a nice way currently. That is, you cannot ask it to recode from encoding X to encoding Y. However, you can ask it to rename the files one by one for you! First create a file called rename.py with

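import sys

# Work on raw bytes: binary streams on Python 3, the plain ones on Python 2.
stdin = getattr(sys.stdin, "buffer", sys.stdin)
stdout = getattr(sys.stdout, "buffer", sys.stdout)
for line in stdin:
    old = line.rstrip(b"\r\n")  # strip the trailing newline
    new = old.decode("cp1251").encode("utf-8")  # recode the file name
    stdout.write(b'rename "%s" "%s"\n' % (old, new))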

Then run

$ hg manifest --all | python rename.py > rename.txt

This creates your file map. You can now use

$ hg convert --filemap rename.txt cp1251-repo utf-8-repo

to convert the repository into a new repository. In the new repository, it will look like the files have always been saved using UTF-8 file names.
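
A quick sanity check after the conversion is to confirm that every name in the new repository decodes as UTF-8. The following is only a minimal sketch and not part of the steps above: the helper names manifest_bytes and non_utf8_names are made up for illustration, and it assumes hg is on your PATH.

```python
import subprocess


def manifest_bytes(repo):
    """Return the raw output of `hg manifest --all` for `repo` (assumes hg is on PATH)."""
    return subprocess.check_output(["hg", "manifest", "--all", "-R", repo])


def non_utf8_names(manifest):
    """Return the file names (as bytes) in the manifest output that are not valid UTF-8."""
    bad = []
    for name in manifest.splitlines():
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad
```

If the conversion went as expected, non_utf8_names(manifest_bytes("utf-8-repo")) should return an empty list, while the same call on the original repository should list the cp1251 names.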

Note: The file names are now stored as UTF-8 in the repository. This means that checkouts will look fine on modern Linux machines. Windows, however, does not use UTF-8 file names. The FixUtf-8 extension must be used to make Mercurial convert the UTF-8 file names into UTF-16 on the fly. This will create readable file names on Windows too.
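
To illustrate why the UTF-8 names come out unreadable when their bytes are interpreted in a legacy code page, here is a small sketch. The file name is made up, and it is an assumption that the Windows machine in question uses cp1251 as its ANSI code page.

```python
name = "файл.txt"  # hypothetical file name

stored = name.encode("utf-8")    # bytes as stored in the converted repository
shown = stored.decode("cp1251")  # the same bytes read as cp1251 text

print(shown)                     # garbled, and longer than the original name
print(stored.decode("utf-8"))    # decoding as UTF-8 recovers the real name
```

Each Cyrillic letter takes two bytes in UTF-8, so a single-byte code page shows two unrelated characters for every letter.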

Note: Everybody will have to re-clone the new repository! Changing any part of the history inevitably changes all the changeset hashes too. So to pull this off, you need to either

  1. make everybody push to the server,
  2. convert the repositories on the server,
  3. have people re-clone

or

  1. make everybody run the above commands on their local repositories
  2. convert the repositories on the server

Either way works since the conversion is deterministic and so your users can run it themselves if they have Python available. If they only have a TortoiseHg installation, then it's probably easiest if you convert for them on your server.
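
If you want to confirm that a locally converted repository matches the one converted on the server, you can compare their heads: a changeset hash covers its ancestors, so identical heads mean identical history. This is a rough sketch under stated assumptions: the function names heads and same_history are hypothetical, hg is assumed to be on your PATH, and both repositories must be reachable as local paths.

```python
import subprocess


def heads(repo, run=subprocess.check_output):
    """Return the sorted head changeset hashes of `repo`.

    `run` is injectable so the function can be exercised without Mercurial.
    """
    out = run(["hg", "heads", "--closed", "-R", repo, "--template", "{node}\n"])
    return sorted(out.decode("ascii").split())


def same_history(repo_a, repo_b, run=subprocess.check_output):
    """Return True if both repositories have exactly the same heads."""
    return heads(repo_a, run) == heads(repo_b, run)
```

For example, same_history("utf-8-repo", "server-copy") would compare your conversion against a clone of the server's, where server-copy is a placeholder path.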

I looked at making the convert extension support this more directly and have sent a patch for it to the Mercurial mailing list.

Solution 2:

I had the same problem. I needed to convert a bunch of repositories, so I wrote a script that converts all the repositories given as a list.

usage:

hg_convert_filenames_encoding.py [-h] [-i INPUT_ENCODING] [-o OUTPUT_ENCODING] [-b] [-u] [repositories [repositories ...]]

You can get it from my repository at BitBucket.