Which characters need to be escaped when using Bash?

Solution 1:

There are two easy and safe rules which work not only in sh but also bash.

1. Put the whole string in single quotes

This works for all chars except single quote itself. To escape the single quote, close the quoting before it, insert the single quote, and re-open the quoting.

'I'\''m a s@fe $tring which ends in newline
'

sed command: sed -e "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"

2. Escape every char with a backslash

This works for all characters except newline. For newline characters use single or double quotes. Empty strings must still be handled - replace with ""

\I\'\m\ \a\ \s\@\f\e\ \$\t\r\i\n\g\ \w\h\i\c\h\ \e\n\d\s\ \i\n\ \n\e\w\l\i\n\e"
"

sed command: sed -e 's/./\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'.

2b. More readable version of 2

There's an easy safe set of characters, like [a-zA-Z0-9,._+:@%/-], which can be left unescaped to keep it more readable

I\'m\ a\ s@fe\ \$tring\ which\ ends\ in\ newline"
"

sed command: LC_ALL=C sed -e 's/[^a-zA-Z0-9,._+@%/-]/\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'.


Note that in a sed program, one can't know whether the last line of input ends with a newline byte (except when it's empty). That's why both above sed commands assume it does not. You can add a quoted newline manually.

Note that shell variables are only defined for text in the POSIX sense. Processing binary data is not defined. For the implementations that matter, binary works with the exception of NUL bytes (because variables are implemented with C strings, and meant to be used as C strings, namely program arguments), but you should switch to a "binary" locale such as latin1.


(You can easily validate the rules by reading the POSIX spec for sh. For bash, check the reference manual linked by @AustinPhillips)

Solution 2:

format that can be reused as shell input

Edit february 2021: bash ${var@Q}

Under bash, you could store your variable content with Parameter Expansion's @ command for Parameter transformation:

${parameter@operator}
       Parameter transformation.  The expansion is either a transforma‐
       tion of the value of parameter or  information  about  parameter
       itself,  depending on the value of operator.  Each operator is a
       single letter:

       Q      The expansion is a string that is the value of  parameter
              quoted in a format that can be reused as input.
...
       A      The  expansion  is  a string in the form of an assignment
              statement or declare command  that,  if  evaluated,  will
              recreate parameter with its attributes and value.

Sample:

$ var=$'Hello\nGood world.\n'
$ echo "$var"
Hello
Good world.

$ echo "${var@Q}"
$'Hello\nGood world.\n'

$ echo "${var@A}"
var=$'Hello\nGood world.\n'

Old answer

There is a special printf format directive (%q) built for this kind of request:

printf [-v var] format [arguments]

 %q     causes printf to output the corresponding argument
        in a format that can be reused as shell input.

Some samples:

read foo
Hello world
printf "%q\n" "$foo"
Hello\ world

printf "%q\n" $'Hello world!\n'
$'Hello world!\n'

This could be used through variables too:

printf -v var "%q" "$foo
"
echo "$var"
$'Hello world\n'

Quick check with all (128) ascii bytes:

Note that all bytes from 128 to 255 have to be escaped.

for i in {0..127} ;do
    printf -v var \\%o $i
    printf -v var $var
    printf -v res "%q" "$var"
    esc=E
    [ "$var" = "$res" ] && esc=-
    printf "%02X %s %-7s\n" $i $esc "$res"
done |
    column

This must render something like:

00 E ''         1A E $'\032'    34 - 4          4E - N          68 - h      
01 E $'\001'    1B E $'\E'      35 - 5          4F - O          69 - i      
02 E $'\002'    1C E $'\034'    36 - 6          50 - P          6A - j      
03 E $'\003'    1D E $'\035'    37 - 7          51 - Q          6B - k      
04 E $'\004'    1E E $'\036'    38 - 8          52 - R          6C - l      
05 E $'\005'    1F E $'\037'    39 - 9          53 - S          6D - m      
06 E $'\006'    20 E \          3A - :          54 - T          6E - n      
07 E $'\a'      21 E \!         3B E \;         55 - U          6F - o      
08 E $'\b'      22 E \"         3C E \<         56 - V          70 - p      
09 E $'\t'      23 E \#         3D - =          57 - W          71 - q      
0A E $'\n'      24 E \$         3E E \>         58 - X          72 - r      
0B E $'\v'      25 - %          3F E \?         59 - Y          73 - s      
0C E $'\f'      26 E \&         40 - @          5A - Z          74 - t      
0D E $'\r'      27 E \'         41 - A          5B E \[         75 - u      
0E E $'\016'    28 E \(         42 - B          5C E \\         76 - v      
0F E $'\017'    29 E \)         43 - C          5D E \]         77 - w      
10 E $'\020'    2A E \*         44 - D          5E E \^         78 - x      
11 E $'\021'    2B - +          45 - E          5F - _          79 - y      
12 E $'\022'    2C E \,         46 - F          60 E \`         7A - z      
13 E $'\023'    2D - -          47 - G          61 - a          7B E \{     
14 E $'\024'    2E - .          48 - H          62 - b          7C E \|     
15 E $'\025'    2F - /          49 - I          63 - c          7D E \}     
16 E $'\026'    30 - 0          4A - J          64 - d          7E E \~     
17 E $'\027'    31 - 1          4B - K          65 - e          7F E $'\177'
18 E $'\030'    32 - 2          4C - L          66 - f      
19 E $'\031'    33 - 3          4D - M          67 - g      

Where first field is hexa value of byte, second contain E if character need to be escaped and third field show escaped presentation of character.

Why ,?

You could see some characters that don't always need to be escaped, like ,, } and {.

So not always but sometime:

echo test 1, 2, 3 and 4,5.
test 1, 2, 3 and 4,5.

or

echo test { 1, 2, 3 }
test { 1, 2, 3 }

but care:

echo test{1,2,3}
test1 test2 test3

echo test\ {1,2,3}
test 1 test 2 test 3

echo test\ {\ 1,\ 2,\ 3\ }
test  1 test  2 test  3

echo test\ {\ 1\,\ 2,\ 3\ }
test  1, 2 test  3 

Solution 3:

To save someone else from having to RTFM... in bash:

Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !.

...so if you escape those (and the quote itself, of course) you're probably okay.

If you take a more conservative 'when in doubt, escape it' approach, it should be possible to avoid getting instead characters with special meaning by not escaping identifier characters (i.e. ASCII letters, numbers, or '_'). It's very unlikely these would ever (i.e. in some weird POSIX-ish shell) have special meaning and thus need to be escaped.