Reading quoted/escaped arguments correctly from a string

I'm encountering an issue passing an argument to a command in a Bash script.

poc.sh:

#!/bin/bash

ARGS='"hi there" test'
./swap ${ARGS}

swap:

#!/bin/sh
echo "${2}" "${1}"

The current output is:

there" "hi

Changing only poc.sh (I believe swap already does what I want), how do I get poc.sh to pass "hi there" and test as two arguments, with no quotes around "hi there"?


Solution 1:

A Few Introductory Words

If at all possible, don't use shell-quoted strings as an input format.

  • It's hard to parse consistently: different shells have different extensions, and non-shell parsers each support a different subset of the syntax (see the differences between shlex and xargs below).
  • It's hard to generate programmatically. ksh and bash have printf '%q', which generates a shell-quoted string from the contents of an arbitrary variable, but the POSIX sh standard has no equivalent.
  • It's easy to parse badly. Many people consuming this format use eval, which raises substantial security concerns.
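As an illustration of the second point, here is a minimal bash sketch of printf '%q' turning arbitrary values into a shell-quoted string. The variable names quoted and restored are made up for the example. The exact quoting style %q chooses can vary between bash versions, so only the round trip matters here; eval is tolerable in this one case because the string was produced by %q itself.

```shell
#!/bin/bash
# Quote each value with bash's printf '%q'; each quoted word is followed by a space.
quoted=$(printf '%q ' "hi there" "it's" test)
printf '%s\n' "$quoted"

# The string was generated by %q, so re-parsing it with eval is safe here.
eval "restored=( $quoted )"
printf '<%s>\n' "${restored[@]}"
```

There is no POSIX sh counterpart to the first command, which is what makes this format awkward to produce portably.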

NUL-delimited streams are a far better practice, as they can accurately represent any possible shell array or argument list with no ambiguity whatsoever.
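A minimal bash sketch of the NUL-delimited approach: the writer emits each argument followed by a NUL byte, and the reader rebuilds the array with read -d ''. The temporary file and the array name args are assumptions for the example. Note that even an argument containing a newline survives intact, because NUL is the one byte that cannot appear inside an argument.

```shell
#!/bin/bash
# Write each argument followed by a NUL byte.
argfile=$(mktemp)
printf '%s\0' "hi there" test $'multi\nline' > "$argfile"

# Read the stream back: -d '' stops at each NUL, -r keeps backslashes,
# and IFS= preserves leading/trailing whitespace.
args=( )
while IFS= read -r -d '' item; do
  args+=( "$item" )
done < "$argfile"
rm -f "$argfile"

printf '<%s>\n' "${args[@]}"
```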


xargs, with bashisms

If you're getting your argument list from a human-generated input source using shell quoting, you might consider using xargs to parse it. Consider:

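array=( )
while IFS= read -r -d ''; do   # each NUL-terminated word lands in $REPLY
  array+=( "$REPLY" )
done < <(xargs printf '%s\0' <<<"$ARGS")   # xargs parses the quoting; printf NUL-terminates each word

./swap "${array[@]}"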

...will put the parsed content of $ARGS into the array array. If you wanted to read from a file instead, substitute <filename for <<<"$ARGS".
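Put together, the whole of poc.sh could look like this bash sketch. To keep it self-contained, swap is a stand-in shell function that mimics the question's ./swap script; in the real poc.sh you would call ./swap instead.

```shell
#!/bin/bash
# Hypothetical stand-in for the question's ./swap script.
swap() { echo "$2" "$1"; }

ARGS='"hi there" test'

# xargs parses the shell-like quoting; printf emits each word NUL-terminated.
array=( )
while IFS= read -r -d ''; do
  array+=( "$REPLY" )
done < <(xargs printf '%s\0' <<<"$ARGS")

swap "${array[@]}"   # prints: test hi there
```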


xargs, POSIX-compliant

If you're trying to write code that complies with POSIX sh, this gets trickier. (For simplicity, I'll assume the arguments come from a file here.)

# This does not work with entries containing literal newlines; you need bash for that.
run_with_args() {
  while IFS= read -r entry; do
    set -- "$@" "$entry"
  done
  "$@"
}
xargs printf '%s\n' <argfile | run_with_args ./swap

These approaches are safer than running xargs ./swap <argfile directly: if there are more or longer arguments than fit on a single command line, the final command fails with an error, instead of xargs silently splitting the list and running the excess arguments as a separate invocation of ./swap.
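To see what "running excess arguments as separate commands" means, this POSIX sh sketch forces the split artificially with xargs -n 2 (in real use the split only happens when the system's command-line length limit is reached). The extra word "extra" and the inline sh -c stand-in for swap are assumptions for the demo.

```shell
#!/bin/sh
# -n 2 makes xargs pass at most two arguments per invocation, so the third
# word ends up in a second, separate run of the swap stand-in.
out=$(printf '%s\n' '"hi there" test extra' | xargs -n 2 sh -c 'echo "$2" "$1"' sh)
printf '%s\n' "$out"
```

The first invocation prints "test hi there"; the second receives only "extra" and prints it after an empty $2, which is exactly the kind of silent misbehavior the wrappers above avoid.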


Python shlex -- rather than xargs -- with bashisms

If you need more accurate POSIX sh parsing than xargs implements, consider using the Python shlex module instead:

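shlex_split() {
  # Reads shell-quoted text on stdin; writes each parsed word followed by a NUL.
  python3 -c '
import shlex, sys
for item in shlex.split(sys.stdin.read()):
    sys.stdout.write(item + "\0")
'
}
array=( )
while IFS= read -r -d ''; do
  array+=( "$REPLY" )
done < <(shlex_split <<<"$ARGS")
./swap "${array[@]}"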
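A bash usage sketch with input that leans on POSIX quoting rules: adjacent quoted segments joined into one word, and a backslash-escaped space. It repeats the helper so it runs on its own, and it assumes python3 is on PATH. The sample string is made up for the example.

```shell
#!/bin/bash
# Same helper as above (assumes python3 is installed).
shlex_split() {
  python3 -c '
import shlex, sys
for item in shlex.split(sys.stdin.read()):
    sys.stdout.write(item + "\0")
'
}

# A quoted heredoc keeps the sample text exactly as written.
IFS= read -r ARGS <<'EOF'
'it'\''s' "hi there" back\ slash
EOF

array=( )
while IFS= read -r -d ''; do
  array+=( "$REPLY" )
done < <(shlex_split <<<"$ARGS")

printf '<%s>\n' "${array[@]}"
```

shlex yields three words here: it's, hi there, and back slash.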

Solution 2:

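Quotes embedded in a variable's value do not protect whitespace when the variable is expanded; they are treated as literal characters, and word splitting still breaks the value at every space. Use an array in bash: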

args=( "hi there" test )
./swap "${args[@]}"
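The difference is easy to see by printing each argument in angle brackets. This bash sketch compares the question's unquoted ${ARGS} expansion with the quoted array expansion.

```shell
#!/bin/bash
ARGS='"hi there" test'
# Unquoted expansion: split on spaces, quote characters kept literally.
printf '<%s>\n' ${ARGS}        # <"hi>  <there">  <test>

# Quoted array expansion: each element stays exactly one argument.
args=( "hi there" test )
printf '<%s>\n' "${args[@]}"   # <hi there>  <test>
```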

In POSIX sh, which has no arrays, you are stuck using eval (one reason most other shells do support arrays).

args='"hi there" test'
eval "./swap $args"

As usual, be very sure you know the contents of $args and understand how the resulting string will be parsed before using eval.
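This POSIX sh sketch shows why that warning matters. swap is a stand-in function for the question's ./swap, and the second value of args is a made-up example of untrusted input: eval happily runs the extra command it contains.

```shell
#!/bin/sh
# Hypothetical stand-in for ./swap so the sketch is self-contained.
swap() { echo "$2" "$1"; }

args='"hi there" test'
eval "swap $args"   # prints: test hi there

# Anything in the string is executed, not just the arguments you expected.
args='"hi there" test; echo "this line was injected"'
eval "swap $args"   # runs swap, then the injected echo
```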