Problem executing a bash script with UTF-8 encoding
I have a bash script encoded in UTF-8.
Within the script I use a sed command with § as the separator.
When I execute the script, sed complains about the separator.
If I use an ordinary character as the separator, for example @, then everything works.
I have viewed the script in PuTTY (with PuTTY set to UTF-8) and the character appears fine.
Also, the Linux default character set reported by the locale command is:
LC_CSET=en_US.UTF-8
What could have gone wrong?
Earlier I used windows-1252 encoding for the shell scripts, and this used to work.
Solution 1:
Probably your version of sed does not support multibyte separator characters. If you look at the way § is encoded in the two character sets, you'll see the difference:
% locale
LANG="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_CTYPE="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_ALL=
% printf § > section.utf8
% hexdump -C section.utf8
00000000 c2 a7 |..|
00000002
% iconv -f UTF-8 -t WINDOWS-1252 < section.utf8 > section.win1252
% hexdump -C section.win1252
00000000 a7 |.|
00000001
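The same comparison can be made without iconv or hexdump. This is only a rough sketch: it writes the bytes shown in the dumps above with octal escapes (\302\247 for c2 a7, \247 for a7) and counts them with wc -c. The variable names are made up for illustration.

```shell
# Count the bytes § occupies in each encoding.
# \302\247 is octal for the UTF-8 bytes c2 a7; \247 is the single Windows-1252 byte a7.
utf8_len=$(printf '\302\247' | wc -c | tr -d ' ')
cp1252_len=$(printf '\247' | wc -c | tr -d ' ')

# tr -d ' ' strips the padding some wc implementations put before the number.
echo "UTF-8: $utf8_len byte(s), Windows-1252: $cp1252_len byte(s)"
# prints: UTF-8: 2 byte(s), Windows-1252: 1 byte(s)
```

A delimiter that takes two bytes is what a sed without multibyte delimiter support trips over, which would explain why the script worked while it was saved as windows-1252.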
Various versions of sed will give you more or less helpful messages. On my OS X 10.6 system, I get the somewhat cryptic:
% sed 's§foo§bar§'
sed: 1: "s§foo§bar§": RE error: illegal byte sequence
The version of sed that Ubuntu 10.04 LTS uses is more helpful:
% sed 's§foo§bar§'
sed: -e expression #1, char 2: delimiter character is not a single-byte character
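If the script has to stay in UTF-8, the simplest way around this is probably the one the question already found: pick a delimiter that is a single byte in UTF-8, meaning any ASCII character that does not occur in the pattern or replacement. A minimal sketch, where the input text and the choice of | are assumptions for illustration only:

```shell
# '|' is one byte in both UTF-8 and Windows-1252, so sed accepts it as a delimiter
# regardless of how the script file is encoded.
# It also lets the pattern contain '/' without escaping.
result=$(printf 'foo/baz\n' | sed 's|foo/|bar/|')

echo "$result"
# prints: bar/baz
```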