Change File Encoding to utf-8 via vim in a script

I just got knocked down after our server was updated from Debian 4 to 5. We switched to a UTF-8 environment, and now text is not printed correctly in the browser, because all our files are in non-UTF-8 encodings like ISO-8859-1, ASCII, etc.

I tried many different scripts.

The first one I tried was iconv. It doesn't work: it changes the content, but the file's encoding is still non-UTF-8.

I had the same problem with enca, encamv, convmv and some other tools I installed via apt-get.

Then I found some Python code that uses the chardet UniversalDetector module to detect a file's encoding (which works fine), but saving the file as UTF-8 with the unicode class or the codecs module doesn't work, even though no errors are raised.

The only way I have found to get the file and its content converted to UTF-8 is vi.

These are the steps I do for one file:

vi filename.php
:set bomb
:set fileencoding=utf-8
:wq

That's it. That works perfectly. But how can I get this running from a script? I would like to write a Linux shell script that traverses a directory, takes all PHP files, and converts them using vi with the commands above. Since I need to start vi itself, I don't know how to do something like this:

"vi --run-command=':set bomb, :set fileencoding=utf-8' filename.php"

Hope someone can help me.


Solution 1:

This is the simplest way I know of to do this from the command line:

vim +"argdo se bomb | se fileencoding=utf-8 | w" $(find . -type f -name '*.php')

Or, better yet, if the number of files is expected to be large (this also copes with spaces in file names):

find . -type f -name '*.php' -print0 | xargs -0 vim +"argdo se bomb | se fileencoding=utf-8 | w"

Solution 2:

You could put your commands in a file; let's call it script.vim:

set bomb
set fileencoding=utf-8
wq

Then invoke Vim with the -S option, which sources script.vim after the file has been loaded, so the commands run on the file you want to fix (vim -S script.vim filename.php). To do this for a bunch of files, you could run:

find . -type f -name "*.php" -exec vim -S script.vim {} \;

You could also put the Vim commands on the command line using the + option, but I think a separate script file is more readable.

Note: I have not tested this.
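Combining the ideas above, here is a minimal sketch of the script the question asks for: walk a directory and run the same :set bomb / :set fileencoding=utf-8 / :wq sequence on every PHP file, one Vim per file. It assumes a full Vim (not a minimal vi) is installed, and uses vim -es (silent Ex mode) so no editor screen opens; note that -es also skips your vimrc. The function name convert_tree is hypothetical.

```shell
# Hypothetical helper: convert every *.php file under DIR to UTF-8 with a BOM,
# replaying the interactive vi steps non-interactively.
# Assumes Vim is installed; "vim -es" runs it in silent Ex (batch) mode.
convert_tree() {
  dir=${1:-.}
  # find passes batches of file names to sh; the loop then starts one Vim
  # per file so each file gets its own set bomb / set fenc / wq sequence.
  find "$dir" -type f -name '*.php' -exec sh -c '
    for f do
      vim -es -c "set bomb" -c "set fileencoding=utf-8" -c "wq" "$f" || exit 1
    done
  ' sh {} +
}
```

Usage would be something like convert_tree /var/www. Running one Vim per file is slower than the argdo approach, but each invocation exits on its own, so nothing is left open.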

Solution 3:

You may actually want :set nobomb (BOM = byte order mark), especially outside the Windows world.

For example, I had a script that didn't work because there was a byte order mark at the start. The BOM usually isn't displayed in editors (even with :set list in vi) or on the console, so it's difficult to spot.

The file looked like this:

#!/usr/bin/perl
...

But when I tried to run it, I got:

./filename
./filename: line 1: #!/usr/bin/perl: No such file or directory

Although it isn't displayed, the 3-byte BOM (EF BB BF) sits at the start of the file. So, as far as Linux is concerned, the file doesn't start with #!.
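Because the BOM is invisible in most editors, one way to check for it is to dump the first three bytes from the shell. A minimal sketch; has_utf8_bom is a hypothetical helper name, and head -c is assumed to be available (it is in GNU coreutils and BusyBox).

```shell
# Hypothetical helper: succeed (exit status 0) if FILE starts with the
# 3-byte UTF-8 byte order mark EF BB BF, fail otherwise.
has_utf8_bom() {
  # od prints the bytes as hex (e.g. " ef bb bf"); tr strips spaces/newlines.
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}
```

For example, has_utf8_bom filename && echo "filename starts with a BOM".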

The solution is:

vi filename
:set nobomb
:set fileencoding=utf-8
:wq

This removes the BOM from the start of the file, leaving plain UTF-8 without a BOM, so the file once again starts with #!.

NB: Windows programs use the BOM to identify a text file as UTF-8 rather than ANSI. Linux tools generally don't, and the Unicode standard neither requires nor recommends a BOM for UTF-8.
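If Vim isn't available, the same BOM removal as the :set nobomb steps can be sketched with standard tools. This assumes the file is already UTF-8 apart from the BOM (no re-encoding is done), and strip_utf8_bom is a hypothetical name. The result is written back with cat rather than mv, so the original file keeps its permissions, which matters for an executable script like the one above.

```shell
# Hypothetical helper: remove a leading UTF-8 BOM (EF BB BF) from FILE in place.
# Does nothing if the file has no BOM; performs no other re-encoding.
strip_utf8_bom() {
  f=$1
  [ "$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ] || return 0
  tmp=$(mktemp) || return 1
  # tail -c +4 outputs everything from the 4th byte on, i.e. skips the BOM.
  tail -c +4 "$f" > "$tmp" &&
    cat "$tmp" > "$f"    # overwrite in place so permissions (e.g. +x) survive
  status=$?
  rm -f "$tmp"
  return "$status"
}
```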

Solution 4:

The accepted answer leaves the last file open in Vim. This is easily fixed with Vim's -c option:

vim +"argdo set bomb | set fileencoding=utf-8 | w" -c ":q" file1.txt file2.txt

If you only need to process one file, the following also works:

vim -c ':set bomb' -c ':set fileencoding=utf-8' -c ':wq' file1.txt
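For comparison, the conversion from the question (:set bomb plus :set fileencoding=utf-8) can also be sketched without an editor: iconv re-encodes the bytes, and printf prepends the BOM that :set bomb adds. Unlike Vim, iconv does not guess the source encoding, so it has to be given explicitly; ISO-8859-1 in the example is only an assumption, and to_utf8_with_bom is a hypothetical name.

```shell
# Hypothetical helper: re-encode FILE from encoding FROM to UTF-8 and prepend
# the UTF-8 BOM, roughly what "set bomb | set fileencoding=utf-8 | w" does in Vim.
# Usage: to_utf8_with_bom FROM FILE
to_utf8_with_bom() {
  from=$1 f=$2
  # Skip files that already start with a BOM, so re-running is harmless.
  [ "$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ] && return 0
  tmp=$(mktemp) || return 1
  # \357\273\277 is EF BB BF in octal (POSIX printf has no \x escapes).
  { printf '\357\273\277' && iconv -f "$from" -t UTF-8 "$f"; } > "$tmp" &&
    cat "$tmp" > "$f"   # overwrite in place, keeping the file's permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, to_utf8_with_bom ISO-8859-1 filename.php. If a file is in fact already UTF-8 without a BOM, passing ISO-8859-1 would double-encode its non-ASCII characters, so check the source encoding first (for instance with the chardet detection mentioned in the question).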