Why is the character "£" in a string interpreted strangely by the cut command?

I'm developing a bash script and came across the following strange behaviour:

$ echo £ |cut -c 1
�

The £ sign is piped to cut, whose filter selects only one character.

When I change the filter in the cut command to select 2 characters, the £ is passed through:

$ echo £ |cut -c 1-2
£

It's not a severe problem, and I have a workaround in the script, but why does the filter in the cut command need 2 positions instead of 1 to pick a £ sign?

Solution 1:

The cut command in Ubuntu (GNU coreutils) is not multi-byte character aware. In this version of cut, the -c option treats every byte as a character, so it behaves the same as -b.

The pound sign (£) is encoded in UTF-8 as two bytes (c2 and a3):

$ echo £ | od -t x1
0000000 c2 a3 0a
0000003

Note: The trailing 0a byte is the newline (ASCII "Line Feed" character) that echo appends.

When you cut the first character from the line, you select only the c2 byte of £, which is not a valid UTF-8 sequence on its own. As a result, the terminal shows the question mark symbol � (the Unicode replacement character):

$ echo £ | cut -c 1 | od -t x1
0000000 c2 0a
0000002

Note: The above was tested with the latest version of cut in Ubuntu 20.10 (GNU coreutils version 8.32).
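To confirm that the lone c2 byte is what makes the output undecodable, one option is to ask iconv to decode the bytes as UTF-8 and check its exit status. This is a minimal sketch, assuming iconv is installed; is_valid_utf8 is a hypothetical helper name, not an existing command:

```shell
# Hypothetical helper: succeeds only if stdin decodes cleanly as UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# c2 a3 is the complete two-byte encoding of the pound sign.
printf '\302\243\n' | is_valid_utf8 && echo "c2 a3: valid UTF-8"

# c2 followed directly by a newline is a truncated sequence.
printf '\302\n' | is_valid_utf8 || echo "c2 alone: not valid UTF-8"
```

The octal escapes \302 and \243 are the same bytes as c2 and a3 in the od output above.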

If you want to select multi-byte characters, you can use grep instead (tested with GNU grep version 3.4). In a UTF-8 locale, its "." matches one whole character rather than one byte:

$ echo x£β | grep -o '^.'
x
$ echo x£β | grep -o '^..'
x£
$ echo x£β | grep -o '^...'
x£β

This answer was improved with the help of the comments.

Solution 2:

In UTF-8, £ is encoded as the two bytes 0xC2 0xA3 (c2a3), which is 11000010 10100011 in binary.

So it is two bytes, which cut sees as two characters. cut -c treats each byte as a character, so selecting only the first one produces �.

$ echo -n £ | xxd
00000000: c2a3                                     ..

$ echo -n £ | wc --bytes
2
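
The binary form above also shows how many positions a character needs: in UTF-8, the high bits of the first byte give the length of the sequence (0xxxxxxx is one byte, 110xxxxx two, 1110xxxx three, 11110xxx four). The following is a minimal sketch that uses this to pick the right byte range for cut -b. It assumes single-line, well-formed UTF-8 input, and first_char is a hypothetical helper name, not part of cut or coreutils:

```shell
# Hypothetical helper: print the first UTF-8 character of "$1" using only
# byte operations, so the result does not depend on the current locale.
# Assumes a single-line, well-formed UTF-8 string.
first_char() (
  s=$1
  [ -n "$s" ] || exit 0   # nothing to print for an empty string

  # Decimal value of the first byte, e.g. 194 (0xC2) for the pound sign.
  lead=$(printf '%s' "$s" | od -An -t u1 -N 1 | tr -d ' \n')

  if   [ "$lead" -ge 240 ]; then n=4   # 11110xxx: four-byte sequence
  elif [ "$lead" -ge 224 ]; then n=3   # 1110xxxx: three-byte sequence
  elif [ "$lead" -ge 192 ]; then n=2   # 110xxxxx: two-byte sequence
  else n=1                             # ASCII, or a stray continuation byte
  fi

  # Select exactly that many bytes from the start of the line.
  printf '%s\n' "$s" | cut -b "1-$n"
)

first_char '£100'   # prints £ (needs bytes 1-2)
first_char 'x£β'    # prints x (needs byte 1 only)
```

For £ the first byte is 0xC2 (11000010), which starts with 110, so the helper asks cut for bytes 1-2. That is the same range that made the original command work.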
