Which characters need to be escaped when using Bash?
Solution 1:
There are two easy and safe rules which work not only in sh
but also bash
.
1. Put the whole string in single quotes
This works for all chars except single quote itself. To escape the single quote, close the quoting before it, insert the single quote, and re-open the quoting.
'I'\''m a s@fe $tring which ends in newline
'
sed command: sed -e "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
2. Escape every char with a backslash
This works for all characters except newline. For newline characters use single or double quotes. Empty strings must still be handled - replace with ""
\I\'\m\ \a\ \s\@\f\e\ \$\t\r\i\n\g\ \w\h\i\c\h\ \e\n\d\s\ \i\n\ \n\e\w\l\i\n\e"
"
sed command: sed -e 's/./\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'
.
2b. More readable version of 2
There's an easy safe set of characters, like [a-zA-Z0-9,._+:@%/-]
, which can be left unescaped to keep it more readable
I\'m\ a\ s@fe\ \$tring\ which\ ends\ in\ newline"
"
sed command: LC_ALL=C sed -e 's/[^a-zA-Z0-9,._+@%/-]/\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'
.
Note that in a sed program, one can't know whether the last line of input ends with a newline byte (except when it's empty). That's why both above sed commands assume it does not. You can add a quoted newline manually.
Note that shell variables are only defined for text in the POSIX sense. Processing binary data is not defined. For the implementations that matter, binary works with the exception of NUL bytes (because variables are implemented with C strings, and meant to be used as C strings, namely program arguments), but you should switch to a "binary" locale such as latin1.
(You can easily validate the rules by reading the POSIX spec for sh
. For bash, check the reference manual linked by @AustinPhillips)
Solution 2:
format that can be reused as shell input
Edit february 2021: bash ${var@Q}
Under bash, you could store your variable content with Parameter Expansion's @
command for Parameter transformation:
${parameter@operator} Parameter transformation. The expansion is either a transforma‐ tion of the value of parameter or information about parameter itself, depending on the value of operator. Each operator is a single letter: Q The expansion is a string that is the value of parameter quoted in a format that can be reused as input. ... A The expansion is a string in the form of an assignment statement or declare command that, if evaluated, will recreate parameter with its attributes and value.
Sample:
$ var=$'Hello\nGood world.\n'
$ echo "$var"
Hello
Good world.
$ echo "${var@Q}"
$'Hello\nGood world.\n'
$ echo "${var@A}"
var=$'Hello\nGood world.\n'
Old answer
There is a special printf
format directive (%q
) built for this kind of request:
printf [-v var] format [arguments]
%q causes printf to output the corresponding argument in a format that can be reused as shell input.
Some samples:
read foo
Hello world
printf "%q\n" "$foo"
Hello\ world
printf "%q\n" $'Hello world!\n'
$'Hello world!\n'
This could be used through variables too:
printf -v var "%q" "$foo
"
echo "$var"
$'Hello world\n'
Quick check with all (128) ascii bytes:
Note that all bytes from 128 to 255 have to be escaped.
for i in {0..127} ;do
printf -v var \\%o $i
printf -v var $var
printf -v res "%q" "$var"
esc=E
[ "$var" = "$res" ] && esc=-
printf "%02X %s %-7s\n" $i $esc "$res"
done |
column
This must render something like:
00 E '' 1A E $'\032' 34 - 4 4E - N 68 - h
01 E $'\001' 1B E $'\E' 35 - 5 4F - O 69 - i
02 E $'\002' 1C E $'\034' 36 - 6 50 - P 6A - j
03 E $'\003' 1D E $'\035' 37 - 7 51 - Q 6B - k
04 E $'\004' 1E E $'\036' 38 - 8 52 - R 6C - l
05 E $'\005' 1F E $'\037' 39 - 9 53 - S 6D - m
06 E $'\006' 20 E \ 3A - : 54 - T 6E - n
07 E $'\a' 21 E \! 3B E \; 55 - U 6F - o
08 E $'\b' 22 E \" 3C E \< 56 - V 70 - p
09 E $'\t' 23 E \# 3D - = 57 - W 71 - q
0A E $'\n' 24 E \$ 3E E \> 58 - X 72 - r
0B E $'\v' 25 - % 3F E \? 59 - Y 73 - s
0C E $'\f' 26 E \& 40 - @ 5A - Z 74 - t
0D E $'\r' 27 E \' 41 - A 5B E \[ 75 - u
0E E $'\016' 28 E \( 42 - B 5C E \\ 76 - v
0F E $'\017' 29 E \) 43 - C 5D E \] 77 - w
10 E $'\020' 2A E \* 44 - D 5E E \^ 78 - x
11 E $'\021' 2B - + 45 - E 5F - _ 79 - y
12 E $'\022' 2C E \, 46 - F 60 E \` 7A - z
13 E $'\023' 2D - - 47 - G 61 - a 7B E \{
14 E $'\024' 2E - . 48 - H 62 - b 7C E \|
15 E $'\025' 2F - / 49 - I 63 - c 7D E \}
16 E $'\026' 30 - 0 4A - J 64 - d 7E E \~
17 E $'\027' 31 - 1 4B - K 65 - e 7F E $'\177'
18 E $'\030' 32 - 2 4C - L 66 - f
19 E $'\031' 33 - 3 4D - M 67 - g
Where first field is hexa value of byte, second contain E
if character need to be escaped and third field show escaped presentation of character.
Why ,
?
You could see some characters that don't always need to be escaped, like ,
, }
and {
.
So not always but sometime:
echo test 1, 2, 3 and 4,5.
test 1, 2, 3 and 4,5.
or
echo test { 1, 2, 3 }
test { 1, 2, 3 }
but care:
echo test{1,2,3}
test1 test2 test3
echo test\ {1,2,3}
test 1 test 2 test 3
echo test\ {\ 1,\ 2,\ 3\ }
test 1 test 2 test 3
echo test\ {\ 1\,\ 2,\ 3\ }
test 1, 2 test 3
Solution 3:
To save someone else from having to RTFM... in bash:
Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of
$
,`
,\
, and, when history expansion is enabled,!
.
...so if you escape those (and the quote itself, of course) you're probably okay.
If you take a more conservative 'when in doubt, escape it' approach, it should be possible to avoid getting instead characters with special meaning by not escaping identifier characters (i.e. ASCII letters, numbers, or '_'). It's very unlikely these would ever (i.e. in some weird POSIX-ish shell) have special meaning and thus need to be escaped.