How can I sanitize or escape absolute paths returned by realpath or readlink?

Solution 1:

How to do this correctly

First of all, always quote your variables. What you are trying to do works fine if you quote it properly:

$ pwd
/home/terdon/foo/\e[92mM@r|< +'|'|_e|\|\|0rth [`-_-"]
$ ls
pullingATerdon

I have kept the strange file name you have chosen (although I have no idea why you chose it) for the sake of consistency.

Now, let's assign the path of pullingATerdon to a variable and then try to open the file:

$ bacon="$(realpath pullingATerdon)"
$ echo "$bacon"
/home/terdon/foo/\e[92mM@r|< +'|'|_e|\|\|0rth [`-_-"]/pullingATerdon
$ ls $bacon
ls: cannot access '+'\''|'\''|_e|\|\|0rth': No such file or directory
ls: cannot access '[`-_-"]/pullingATerdon': No such file or directory
'/home/terdon/foo/\e[92mM@r|<':

That fails, as expected. But, if we now quote it correctly:

$ ls -l "$bacon"
-rw-r--r-- 1 terdon terdon 0 Mar 14 23:15 '/home/terdon/foo/\e[92mM@r|< +'\''|'\''|_e|\|\|0rth [`-_-"]/pullingATerdon'

It works as expected. And yes, you can also open the path in a (proper) editor: emacs "$bacon" will work just fine. OK, so will vim and anything else. Your choice of editor, though unfortunate, is not relevant.
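
If you want to check this yourself without touching your real files, here is a minimal sketch, assuming bash and a realpath that accepts -- (GNU coreutils does). The scratch directory and the awkward name inside it are made up for illustration; they are not the ones from your question.

```shell
#!/usr/bin/env bash
# Hypothetical scratch area; the names below are illustrative only.
demo_dir="$(mktemp -d)"
mkdir -- "$demo_dir/odd name [x] 'q'"
touch -- "$demo_dir/odd name [x] 'q'/pullingATerdon"

# Quote the command substitution and every later use of the variable.
bacon="$(realpath -- "$demo_dir/odd name [x] 'q'/pullingATerdon")"

if [ -f "$bacon" ]; then
    quoted_result="found"
else
    quoted_result="missing"
fi
printf 'quoted lookup: %s\n' "$quoted_result"

# Clean up the scratch area.
rm -rf -- "$demo_dir"
```

Nothing in the path is escaped or sanitized here. The double quotes alone are what keep it in one piece.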


Why yours failed

A quick way to trace what actually happened in your case is to turn on the shell's debugging messages with set -x (turn them off again with set +x), which makes the shell print each command, after expansion, before running it:

$ set -x
$ /bin/ls $bacon 
+ ls '/home/terdon/foo/\e[92mM@r|<' '+'\''|'\''|_e|\|\|0rth' '[`-_-"]/pullingATerdon'
ls: cannot access '+'\''|'\''|_e|\|\|0rth': No such file or directory
ls: cannot access '[`-_-"]/pullingATerdon': No such file or directory
'/home/terdon/foo/\e[92mM@r|<':

That shows us that ls was run with three separate arguments: '/home/terdon/foo/\e[92mM@r|<', '+'\''|'\''|_e|\|\|0rth' and '[`-_-"]/pullingATerdon'. This happens because the shell performs word splitting and glob expansion on unquoted strings. In this case, the problem is word splitting: the shell saw the spaces in the path and read each space-separated string as a separate argument.
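
The effect of word splitting can be isolated with a small sketch like the one below. It assumes bash or another POSIX-style shell with the default IFS (zsh does not split unquoted variables by default). count_args is a hypothetical helper and the path is made up; it does not need to exist.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    printf '%s\n' "$#"
}

path='/tmp/some dir/with spaces/file'   # made-up path; it need not exist

unquoted_count="$(count_args $path)"    # split on the two spaces: 3 arguments
quoted_count="$(count_args "$path")"    # kept intact: 1 argument

printf 'unquoted: %s, quoted: %s\n' "$unquoted_count" "$quoted_count"
```

The unquoted call sees three arguments for the same reason ls did above, while the quoted call sees exactly one.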

The mkdir example is slightly different but that's because you're showing us the error message from the second invocation of the command. I guess you tried it once, and then ran it a second time to get the output for your question. The first time you ran it, it would have looked like this:

$ mkdir $(realpath pullingATerdon)
++ realpath pullingATerdon
+ mkdir '/home/terdon/foo/\e[92mM@r|<' '+'\''|'\''|_e|\|\|0rth' '[`-_-"]/pullingATerdon'
mkdir: cannot create directory ‘[`-_-"]/pullingATerdon’: No such file or directory

Again, that will try to create three directories, not one, because of word splitting. First, it created (successfully) the directory /home/terdon/foo/\e[92mM@r|<:

$ ls -l /home/terdon/foo/
total 8
drwxr-xr-x 2 terdon terdon 4096 Mar 15 00:20 '\e[92mM@r|<'
drwxr-xr-x 3 terdon terdon 4096 Mar 15 00:20 '\e[92mM@r|< +'\''|'\''|_e|\|\|0rth [`-_-"]'

It then, also successfully, created a directory called +'|'|_e|\|\|0rth in your current directory:

$ ls -l
total 4
drwxr-xr-x 2 terdon terdon 4096 Mar 15 00:37 '+'\''|'\''|_e|\|\|0rth'
-rw-r--r-- 1 terdon terdon    0 Mar 15 00:36  pullingATerdon

And then, it attempted to create the directory [`-_-"]/pullingATerdon. This failed because mkdir, by default, doesn't create missing parent directories (it can, if you run it with -p):

$ mkdir baz/bar
mkdir: cannot create directory ‘baz/bar’: No such file or directory

Since your unquoted string contained a /, mkdir considered that a path of two directories, tried to find the top one, and failed.
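
To see the difference -p makes, here is a minimal sketch run in a throwaway directory. The baz/bar names mirror the example above; demo_dir is just an illustrative variable name.

```shell
demo_dir="$(mktemp -d)"

# Without -p, the missing parent "baz" makes mkdir fail.
if mkdir -- "$demo_dir/baz/bar" 2>/dev/null; then
    plain_status="created"
else
    plain_status="failed"
fi

# With -p, missing parents are created along the way.
if mkdir -p -- "$demo_dir/baz/bar"; then
    parents_status="created"
else
    parents_status="failed"
fi

printf 'plain mkdir: %s, mkdir -p: %s\n' "$plain_status" "$parents_status"
rm -rf -- "$demo_dir"
```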

That's why it failed, but what happened is more complicated. The string you used is actually a shell glob, specifically a glob range, which matches anything in the current directory whose name is one of the four characters `, -, _ or ". Since you have nothing with such a name in your current directory, the glob doesn't match anything and, as is the default behavior in bash, returns itself:

$ echo "[\`-_-\"]/pullingATerdon"  ## some escaping is needed here
+ echo '[`-_-"]/pullingATerdon'    ## but it echoes the right thing
[`-_-"]/pullingATerdon             ## and matches nothing, so returns itself.

To clarify, here's what happens if you give a glob that does match something:

$ echo [p]*   ## any filename starting with a p
pullingATerdon
$ echo "[p]*" ## the string "[p]*"
[p]*

The unquoted [p]* is expanded to the list of matching file names (just one, in this case) and that is what is passed to echo. Yet another reason why you should quote all the things.
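
The three cases (a glob that matches, a quoted glob, and a glob that matches nothing) can be reproduced side by side with a sketch like this one. It assumes bash with its default options, so neither nullglob nor failglob is set. The scratch directory and the [z]* pattern are illustrative.

```shell
demo_dir="$(mktemp -d)"
touch -- "$demo_dir/pullingATerdon"

# Each command substitution runs in a subshell, so the cd does not leak out.
matched="$(cd -- "$demo_dir" && echo [p]*)"      # unquoted glob, one match
literal="$(cd -- "$demo_dir" && echo "[p]*")"    # quoted, so no expansion
unmatched="$(cd -- "$demo_dir" && echo [z]*)"    # no match, left as it is

printf '%s\n' "$matched" "$literal" "$unmatched"
rm -rf -- "$demo_dir"
```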

Finally, the actual error you show is from the second time you ran the command and it fails at the first step, when trying to create /home/terdon/foo/\e[92mM@r|<, because the previous invocation had already created that directory.


More generally, whenever you find yourself working with arbitrary file names, always use shell globs. Things like this:

for file in *; do command "$file"; done

That will work for any file name, no matter what it happens to contain. In our example above, you could have done:

emacs /home/terdon/*92mM*/pullingATerdon

Any glob that identifies the target file uniquely will do. That way, you don't need to worry about the special characters and can just let the shell handle them.
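
As a rough illustration of the glob-plus-quotes pattern, the sketch below loops over a few deliberately awkward, made-up file names in a scratch directory and counts how many the loop handles as regular files.

```shell
demo_dir="$(mktemp -d)"
touch -- "$demo_dir/plain" "$demo_dir/with space" "$demo_dir/star * [x]"

seen=0
for file in "$demo_dir"/*; do
    # "$file" stays one argument regardless of what the name contains.
    [ -f "$file" ] && seen=$((seen + 1))
done

printf 'files handled: %s\n' "$seen"
rm -rf -- "$demo_dir"
```

Note that the variable part of the pattern ("$demo_dir") is quoted while the * is left bare: the quotes protect the path and the glob still expands. Keep in mind that a plain * skips names starting with a dot.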


Some useful references:

  1. How can I find and safely handle file names containing newlines, spaces or both? : One of the FAQs on the excellent Grey Cat's Wiki.

  2. Security implications of forgetting to quote a variable in bash/POSIX shells : the same post I referenced at the beginning of this answer. A great and very detailed explanation of all the things that could go wrong if you fail to quote your shell variables correctly.

  3. Why does my shell script choke on whitespace or other special characters? : everything you ever wanted to know about handling arbitrary file names in the shell.

  4. When is double-quoting necessary? : More about quotes and variables and, specifically, the few cases where you don't need to quote them.