Escaping characters in bash (for JSON)
I'm using git, then posting the commit message and other bits as a JSON payload to a server.
Currently I have:
MSG=`git log -n 1 --format=oneline | grep -o ' .\+'`
which sets MSG to something like:
Calendar can't go back past today
then
curl -i -X POST \
-H 'Accept: application/text' \
-H 'Content-type: application/json' \
-d "{'payload': {'message': '$MSG'}}" \
'https://example.com'
My real JSON has another couple of fields.
This works fine, but of course when I have a commit message such as the one above with an apostrophe in it, the JSON is invalid.
How can I escape the characters required in bash? I'm not familiar with the language, so am not sure where to start. Replacing '
with \'
would do the job at minimum I suspect.
Using Python:
This solution is not pure bash, but it's non-invasive and handles unicode.
json_escape () {
printf '%s' "$1" | python -c 'import json,sys; print(json.dumps(sys.stdin.read()))'
}
Note that JSON is part of the standard python libraries and has been for a long time, so this is a pretty minimal python dependency.
Or using PHP:
json_escape () {
printf '%s' "$1" | php -r 'echo json_encode(file_get_contents("php://stdin"));'
}
Use like so:
$ json_escape "ヤホー"
"\u30e4\u30db\u30fc"
jq
can do this.
Lightweight, free, and written in C, jq
enjoys widespread community support with over 15k stars on GitHub. I personally find it very speedy and useful in my daily workflow.
Convert string to JSON
$ echo '猫に小判' | jq -aRs .
"\u732b\u306b\u5c0f\u5224\n"
$ printf 'ô\nè\nà\n' | jq -Rs .
"ô\nè\nà\n"
To explain,
-
-a
means "ascii output" (omitted in the second example) -
-R
means "raw input" -
-s
means "include linebreaks" (mnemonic: "slurp") -
.
means "output the root of the JSON document"
Git + Grep Use Case
To fix the code example given by the OP, simply pipe through jq.
MSG=`git log -n 1 --format=oneline | grep -o ' .\+' | jq -aRs .`
Instead of worrying about how to properly quote the data, just save it to a file and use the @
construct that curl
allows with the --data
option. To ensure that the output of git
is correctly escaped for use as a JSON value, use a tool like jq
to generate the JSON, instead of creating it manually.
jq -n --arg msg "$(git log -n 1 --format=oneline | grep -o ' .\+')" \
'{payload: { message: $msg }}' > git-tmp.txt
curl -i -X POST \
-H 'Accept: application/text' \
-H 'Content-type: application/json' \
-d @git-tmp.txt \
'https://example.com'
You can also read directly from standard input using -d @-
; I leave that as an exercise for the reader to construct the pipeline that reads from git
and produces the correct payload message to upload with curl
.
(Hint: it's jq ... | curl ... -d@- 'https://example.com'
)
I was also trying to escape characters in Bash, for transfer using JSON, when I came across this. I found that there is actually a larger list of characters that must be escaped – particularly if you are trying to handle free form text.
There are two tips I found useful:
- Use the Bash
${string//substring/replacement}
syntax described in this thread. - Use the actual control characters for tab, newline, carriage return, etc. In vim you can enter these by typing Ctrl+V followed by the actual control code (Ctrl+I for tab for example).
The resultant Bash replacements I came up with are as follows:
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\\/\\\\} # \
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\//\\\/} # /
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\'/\\\'} # ' (not strictly needed ?)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//\"/\\\"} # "
JSON_TOPIC_RAW=${JSON_TOPIC_RAW// /\\t} # \t (tab)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//
/\\\n} # \n (newline)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//^M/\\\r} # \r (carriage return)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//^L/\\\f} # \f (form feed)
JSON_TOPIC_RAW=${JSON_TOPIC_RAW//^H/\\\b} # \b (backspace)
I have not at this stage worked out how to escape Unicode characters correctly which is also (apparently) required. I will update my answer if I work this out.
OK, found out what to do. Bash supports this natively as expected, though as always, the syntax isn't really very guessable!
Essentially ${string//substring/replacement}
returns what you'd image, so you can use
MSG=${MSG//\'/\\\'}
To do this. The next problem is that the first regex doesn't work anymore, but that can be replaced with
git log -n 1 --pretty=format:'%s'
In the end, I didn't even need to escape them. Instead, I just swapped all the ' in the JSON to \". Well, you learn something every day.