Why does bash interpret standard and smart double quotes differently in a shell script created in TextEdit?

Solution 1:

There are two things going on here. First, bash recognizes the plain ASCII double-quote, " (character code 0x22), as a double-quote; it does not recognize the fancy Unicode left double-quote, “ (Unicode U+201C, UTF-8 encoding 0xe2809c), or the corresponding right double-quote, ” (Unicode U+201D, UTF-8 encoding 0xe2809d), as anything other than random sequences of bytes (or maybe as random characters, if it's using a UTF-8 locale). This is the fundamental thing to realize: as far as bash is concerned, “ and ” are not actually quotes; they're just things that happen to look like quotes when they're printed out.

The second complication is that the Unicode double-quotes are multibyte characters, so if bash isn't in a UTF-8 locale it may treat some of their bytes differently from others (!).
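If you want to check those byte values yourself, here is a small sketch. The hex_bytes helper is just a name I made up for illustration; it relies only on printf and od, and it assumes the script file itself is saved as UTF-8:

```shell
# hex_bytes is a hypothetical helper: print the bytes of its argument in hex.
hex_bytes() {
    # od -An -tx1 dumps each byte as two hex digits; leaving the command
    # substitution unquoted lets echo collapse od's padding to single spaces.
    echo $(printf '%s' "$1" | od -An -tx1)
}

hex_bytes '"'    # plain ASCII double-quote: 22
hex_bytes '“'    # left fancy quote (U+201C): e2 80 9c
hex_bytes '”'    # right fancy quote (U+201D): e2 80 9d
```

The single ASCII byte versus the three-byte sequences is exactly the difference bash cares about.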

To see the effect of the first thing, try replacing each occurrence of a double-quote with the string WIBBLE -- another arbitrary sequence that has no special meaning to the shell:

$ echo "The path to my home directory is: $HOME bar"
The path to my home directory is: /Users/gordon bar
$ echo “The path to my home directory is: $HOME bar”
“The path to my home directory is: /Users/gordon bar”
$ echo WIBBLEThe path to my home directory is: $HOME barWIBBLE
WIBBLEThe path to my home directory is: /Users/gordon barWIBBLE

In the first command (with ASCII double-quotes), the quotes are parsed and removed by bash before the argument(s) are passed to the echo command, and hence not printed. In the second and third (with fancy double-quotes and WIBBLE in place of plain quotes), they're just treated as part of the strings to be passed to echo, so echo prints them as part of its output.
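Another way to see that fancy quotes don't group anything is to count the arguments a command receives. This is only a sketch; count_args is a hypothetical helper, not a standard command:

```shell
# count_args is a hypothetical helper: report how many arguments it received.
count_args() {
    echo "$#"
}

count_args "one two three"    # ASCII quotes: the whole string is 1 argument
count_args “one two three”    # fancy quotes: split on spaces into 3 arguments
```

With the fancy quotes, the first argument is “one and the last is three”, because the quote characters are just ordinary characters attached to those words.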

$ echo "The path to my home directory is: $HOME (foo) bar"
The path to my home directory is: /Users/gordon (foo) bar
$ echo “The path to my home directory is: $HOME (foo) bar”
-bash: syntax error near unexpected token `('
$ echo WIBBLEThe path to my home directory is: $HOME (foo) barWIBBLE
-bash: syntax error near unexpected token `('

In the second and third commands (with fancy double-quotes and WIBBLE), bash sees parentheses in a non-quoted portion of the command (remember: as far as bash is concerned, fancy quotes are not actually quotes), in a place where they aren't allowed by shell syntax, and therefore complains.

$ echo “The path to my home directory is: $HOME”
“The path to my home directory is: ??
$ echo WIBBLEThe path to my home directory is: $HOMEWIBBLE
WIBBLEThe path to my home directory is:

Here, something weirder is happening. In the second command, bash looks for a variable named HOMEWIBBLE, doesn't find it, and so replaces it with an empty string. In the first one, with the fancy double-quotes, it looks to me like it's treating each byte of the UTF-8 encoding of ” as a separate character: it treats the first byte as part of the variable name (again causing the variable not to be found), and then just passes the second and third bytes through, giving an invalid UTF-8 sequence, which gets printed as ??. A hex dump gives a better idea of what's going on:

$ echo “$HOME”
“??
$ echo “$HOME” | xxd -g1
00000000: e2 80 9c 80 9d 0a                                ......

Note that the first “ goes through fine, and shows up in the hex dump as e2 80 9c (the expected UTF-8 encoded fancy double-quote), but after that is just 80 9d -- the leading e2 of the second fancy quote got eaten somehow! (BTW, the 0a at the end is a linefeed, marking the end of the output.) To see what's happening, let me define a shell variable named HOME plus the first byte of the encoding of ”, and watch what happens:

$ eval $'HOME\xe2=foo'
$ echo “$HOME”
“foo??
$ echo “$HOME” | xxd -g1
00000000: e2 80 9c 66 6f 6f 80 9d 0a                       ...foo...

...So that's what's going on: it's treating the first byte of the double-quote's encoding as part of the variable name, substituting it (if defined), and then just passing through the orphaned second and third bytes, leaving invalid UTF-8. I'm not sure whether this is a bash bug, an oddity of its parsing, or what.
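One way to double-check this diagnosis (a sketch, not part of the session above) is to write the variable with braces, so that the braces mark where the name ends and the stray e2 byte can't be absorbed into it. demo_var is a made-up variable for illustration:

```shell
demo_var=foo    # hypothetical variable, just for the demonstration

# The braces delimit the variable name explicitly, so the first byte of the
# closing fancy quote cannot be read as part of the name.
printf '%s\n' “${demo_var}”
```

I'd expect this to print “foo” with both fancy quotes intact. Note that the fancy quotes are still not quotes: they are passed through literally, so this only confirms where the missing byte went and doesn't make them safe to use.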

Anyway, the details are rather messy, but the takeaway should be clear: don't use fancy quotes in your shell scripts; they won't work right. And the same applies to fancy single-quotes and other unicode punctuation marks.
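If you suspect a script has picked up fancy quotes, something like the following can find and replace them. This is a rough sketch under my own assumptions: the function names and the file name myscript.sh are placeholders, and the script is assumed to be saved as UTF-8.

```shell
# find_smart_quotes (hypothetical name): list the lines of a file that contain
# fancy double or single quotes. Exit status is 0 if any were found.
find_smart_quotes() {
    grep -n -F -e '“' -e '”' -e '‘' -e '’' -- "$1"
}

# straighten_quotes (hypothetical name): read text on stdin and write it to
# stdout with fancy quotes replaced by their plain ASCII equivalents.
straighten_quotes() {
    sed -e 's/“/"/g' -e 's/”/"/g' -e "s/‘/'/g" -e "s/’/'/g"
}

# Example usage (myscript.sh is a placeholder file name):
#   find_smart_quotes myscript.sh
#   straighten_quotes < myscript.sh > myscript.fixed.sh
```

The replacement is blind, so it will also change fancy quotes that were meant to be printed as text; it's worth reviewing the result before running it.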