How to decode a URL-encoded string in shell?

Solution 1:

Here is a simple one-line solution.

$ function urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }

It may look like Perl :) but it is pure bash. No awk, no sed ... no overhead. It uses the : builtin, special parameters ($* and $_), pattern substitution, and the echo builtin's -e option to translate hex codes into characters. See bash's manpage for further details. You can use this function as a separate command:

$ urldecode https%3A%2F%2Fgoogle.com%2Fsearch%3Fq%3Durldecode%2Bbash
https://google.com/search?q=urldecode+bash

or in variable assignments, like so:

$ x="http%3A%2F%2Fstackoverflow.com%2Fsearch%3Fq%3Durldecode%2Bbash"
$ y=$(urldecode "$x")
$ echo "$y"
http://stackoverflow.com/search?q=urldecode+bash
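
One caveat with echo -e is that a decoded string such as -n can be taken as an option to echo. Below is a minimal sketch of the same idea using the printf builtin's %b format instead. The name urldecode_printf is only illustrative and is not part of the answer above, and bash is assumed.

```bash
# Hypothetical variant of urldecode; assumes bash (pattern substitution, local).
urldecode_printf() {
  local s="${*//+/ }"           # '+' stands for a space in form-encoded data
  printf '%b\n' "${s//%/\\x}"   # %b expands the \xHH escapes, as echo -e does
}

urldecode_printf 'a%21b+c'      # prints: a!b c
```

Like the original, this also expands any other backslash escapes that happen to be in the input.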

Solution 2:

If you are a Python developer, this may be preferable:

For Python 3.x (default):

echo -n "%21%20" | python3 -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));"

For Python 2.x (deprecated):

echo -n "%21%20" | python -c "import sys, urllib as ul; print ul.unquote(sys.stdin.read());"

urllib is really good at handling URL parsing.
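
Note that unquote leaves '+' alone. If '+' should become a space as well, urllib.parse also provides unquote_plus. Here is a small sketch that wraps it in a shell function, assuming python3 is on the PATH. The function name urldecode_py is made up for illustration.

```bash
# Hypothetical wrapper; assumes python3 is installed and on the PATH.
urldecode_py() {
  # The string is passed as an argument, so no newline from echo gets decoded along with it.
  python3 -c 'import sys; from urllib.parse import unquote_plus; print(unquote_plus(sys.argv[1]))' "$1"
}

urldecode_py 'q%3Durldecode+bash'   # prints: q=urldecode bash
```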

Solution 3:

With BASH, to read a percent-encoded URL from standard input and decode it:

while read -r; do echo -e "${REPLY//%/\\x}"; done

Press CTRL-D to signal the end of file (EOF) and quit gracefully.

You can decode the contents of a file by redirecting it to standard input:

while read -r; do echo -e "${REPLY//%/\\x}"; done < file

You can also decode input from a pipe, for example:

echo 'a%21b' | while read -r; do echo -e "${REPLY//%/\\x}"; done

  • The read builtin reads standard input until it sees a line feed character (-r keeps any backslashes literal). It sets a variable called REPLY to the line of text it just read.
  • ${REPLY//%/\\x} replaces all instances of '%' with '\x'.
  • echo -e interprets \xNN as the byte with hexadecimal value NN.
  • while repeats this loop until the read command fails, e.g. when EOF has been reached.
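
To see the two steps from the list above separately, here is a small sketch (the variable names are arbitrary, and bash is assumed):

```bash
line='a%21b'
escaped="${line//%/\\x}"     # every '%' becomes '\x'
printf '%s\n' "$escaped"     # prints: a\x21b
echo -e "$escaped"           # prints: a!b
```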

The above does not change '+' to ' '. To change '+' to ' ' as well, as in guest's answer:

while read -r; do : "${REPLY//%/\\x}"; echo -e "${_//+/ }"; done

  • : is a BASH builtin command. Here it just takes a single argument and does nothing with it.
  • The double quotes make everything inside them one single parameter.
  • _ is a special parameter that is equal to the last argument of the previous command, after argument expansion. This is the value of REPLY with all instances of '%' replaced with '\x'.
  • ${_//+/ } replaces all instances of '+' with ' '.

This uses only BASH and doesn't start any other process, similar to guest's answer.
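
For reuse, the loop could be wrapped in a function. The sketch below is one way to do that. The name urldecode_stream is hypothetical, printf '%b' is swapped in for echo -e, and the extra test on REPLY is there so that a last line without a trailing newline is still decoded.

```bash
# Hypothetical wrapper around the loop above; assumes bash.
urldecode_stream() {
  local REPLY
  # read returns non-zero at EOF, but REPLY may still hold a final unterminated line.
  while read -r || [[ -n $REPLY ]]; do
    : "${REPLY//%/\\x}"          # $_ now holds the line with '%' turned into '\x'
    printf '%b\n' "${_//+/ }"    # '+' -> ' ', then expand the \xHH escapes
  done
}

printf '%s\n' 'x%2By' 'p+q' | urldecode_stream   # prints x+y, then p q
```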

Solution 4:

This is what seems to be working for me.

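#!/bin/bash
# Decode URL-encoded text read from standard input.
urldecode(){
  echo -e "$(sed 's/+/ /g;s/%\(..\)/\\x\1/g;')"
}

# Write a decoded copy of every log file to the processed/ directory.
for f in /opt/logs/*.log; do
    name=${f##*/}    # file name without the directory part
    cat "$f" | urldecode > "/opt/logs/processed/$HOSTNAME.$name"
done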

Simply replacing '+' with spaces and % signs with '\x' escapes, then letting echo interpret the \x escapes with the '-e' option, was not working. For some reason, the cat command was printing the % sign in its own encoded form, %25, so sed was simply replacing %25 with \x25. With the -e option, echo then evaluated \x25 as %, and the output was the same as the original.

Trace:

Original: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en

sed: Mozilla\x252F5.0\x2520\x2528Macintosh\x253B\x2520U\x253B\x2520Intel\x2520Mac\x2520OS\x2520X\x252010.6\x253B\x2520en

echo -e: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en

Fix: Have sed capture the 2 characters after the % and emit them right after \x.

sed: Mozilla\x2F5.0\x20\x28Macintosh\x3B\x20U\x3B\x20Intel\x20Mac\x20OS\x20X\x2010.6\x3B\x20en

echo -e: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en

I'm not sure what complications this might cause under more extensive testing, but it works for now.
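
One possible complication is a literal % that is not followed by two hex digits, since %\(..\) matches any two characters. A hedged sketch that only converts valid %XX pairs follows. The name urldecode_sed is illustrative, and printf '%b' is used here in place of echo -e.

```bash
# Hypothetical stricter variant; assumes bash for printf's \xHH support.
urldecode_sed() {
  # Only % followed by two hex digits is turned into \xHH; a stray % is left alone.
  printf '%b\n' "$(sed 's/+/ /g; s/%\([0-9A-Fa-f][0-9A-Fa-f]\)/\\x\1/g')"
}

printf '%s\n' '100%+sure%21' | urldecode_sed   # prints: 100% sure!
```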