How can I encode and decode percent-encoded strings on the command line?
How can I encode and decode percent-encoded (URL encoded) strings on the command line?
I'm looking for a solution that can do this:
$ percent-encode "ændrük"
%C3%A6ndr%C3%BCk
$ percent-decode "%C3%A6ndr%C3%BCk"
ændrük
These commands do what you want (using Python 2):
python -c "import urllib, sys; print urllib.quote(sys.argv[1])" æ
python -c "import urllib, sys; print urllib.unquote(sys.argv[1])" %C3%A6
If you want to encode spaces as +, replace urllib.quote with urllib.quote_plus.
I'm guessing you will want to alias them ;-)
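For readers on Python 3, where these functions live in urllib.parse rather than urllib, the same idea can be sketched as follows. The sample string is only an illustration; the snippet shows how quote and quote_plus differ in their handling of spaces.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "ændrük and more"  # illustrative input, not from any real URL

# quote() percent-encodes the UTF-8 bytes of each unsafe character;
# a space becomes %20.
encoded = quote(text)
print(encoded)        # %C3%A6ndr%C3%BCk%20and%20more

# quote_plus() is the form-encoding variant; a space becomes +.
encoded_plus = quote_plus(text)
print(encoded_plus)   # %C3%A6ndr%C3%BCk+and+more

# unquote() leaves + alone, unquote_plus() turns it back into a space.
print(unquote("%C3%A6ndr%C3%BCk"))    # ændrük
print(unquote_plus("oil+and+gas"))    # oil and gas
```

On the command line this would look like python3 -c "import sys, urllib.parse as ul; print(ul.quote(sys.argv[1]))" æ, assuming the terminal and locale use UTF-8.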
shell
Try the following command line:
$ echo "%C3%A6ndr%C3%BCk" | sed 's@+@ @g;s@%@\\x@g' | xargs -0 printf "%b"
ændrük
You can define it as an alias and add it to your shell rc file:
$ alias urldecode='sed "s@+@ @g;s@%@\\\\x@g" | xargs -0 printf "%b"'
Then, whenever you need it, simply run:
$ echo "http%3A%2F%2Fwww" | urldecode
http://www
bash
When scripting, you can use the following syntax:
input="http%3A%2F%2Fwww"
decoded=$(printf '%b' "${input//%/\\x}")
However, the syntax above won't handle pluses (+) correctly, so you have to replace them with spaces first (for example via sed).
You can also use the following urlencode() and urldecode() functions:
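urlencode() {
  # urlencode <string>
  # Work byte by byte so that multibyte characters become one %XX per byte.
  local LC_ALL=C
  local length="${#1}" i c
  for (( i = 0; i < length; i++ )); do
    c="${1:i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      *) printf '%%%02X' "'$c" ;;
    esac
  done
}
urldecode() {
  # urldecode <string>
  # Turn + into a space, then let printf expand each %XX as \xXX.
  local url_encoded="${1//+/ }"
  printf '%b' "${url_encoded//%/\\x}"
}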
Note that this urldecode() assumes the data contains no backslashes, because printf '%b' would interpret them as escape sequences.
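To make the backslash caveat concrete, here is a minimal sketch of a decoder that works on bytes instead of handing the string to an escape interpreter. The name percent_decode and its plus_as_space parameter are hypothetical, chosen for illustration; in practice urllib.parse.unquote_plus already does this job.

```python
import re

def percent_decode(encoded: str, plus_as_space: bool = True) -> str:
    """Decode %XX escapes on the byte level (hypothetical helper)."""
    if plus_as_space:
        encoded = encoded.replace("+", " ")
    raw = encoded.encode("utf-8")
    # Replace each %XX with the single byte it names. Everything else,
    # including backslashes and stray % signs, passes through untouched.
    raw = re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        raw,
    )
    # Invalid UTF-8 is replaced rather than raising, which is one of
    # several reasonable policies.
    return raw.decode("utf-8", errors="replace")

print(percent_decode("%C3%A6ndr%C3%BCk"))   # ændrük
print(percent_decode("oil+and+gas"))        # oil and gas
print(percent_decode(r"C%3A\x41"))          # C:\x41
```

The last call shows the difference from the printf '%b' approach: the literal \x41 in the input stays as it is instead of being expanded to A.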
bash + xxd
A Bash function using the xxd tool:
urlencode() {
  local length="${#1}" i c
  for (( i = 0; i < length; i++ )); do
    c="${1:i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      # xxd prints one hex byte per line; prefix each with %.
      *) printf '%s' "$c" | xxd -p -c1 | while read -r x; do printf '%%%s' "$x"; done ;;
    esac
  done
}
Found in cdown's gist file, also at stackoverflow.
Python
Try defining the following aliases (Python 2):
alias urldecode='python -c "import sys, urllib as ul; print ul.unquote_plus(sys.argv[1])"'
alias urlencode='python -c "import sys, urllib as ul; print ul.quote_plus(sys.argv[1])"'
Usage:
$ urlencode "ændrük"
%C3%A6ndr%C3%BCk
$ urldecode "%C3%A6ndr%C3%BCk"
ændrük
Source: ruslanspivak
PHP
Using PHP you can try the following command:
$ echo oil+and+gas | php -r 'echo urldecode(fgets(STDIN));'  # Or read from php://stdin
oil and gas
or just:
php -r 'echo urldecode("oil+and+gas");'
Use -R for multi-line input.
Perl
In Perl you can use URI::Escape.
decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")
Or to process a file:
perl -i -MURI::Escape -pe '$_ = uri_unescape($_)' file
sed
With sed, decoding can be achieved by:
cat file | sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e
awk
Try anon's solution (requires GNU awk):
awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..
See: Using awk printf to urldecode text.
decoding file names
If you need to remove URL encoding from file names, use the deurlname tool from renameutils (e.g. deurlname *.*).
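If renameutils is not available, the same idea can be sketched in Python. This is not how deurlname is implemented; plan_renames and apply_renames are hypothetical helpers, and the safety rules (skip names that would contain a slash or collide with an existing file) are assumptions about what is sensible.

```python
import os
from urllib.parse import unquote

def plan_renames(names):
    """Return (old, new) pairs for names whose decoded form differs and is safe."""
    existing = set(names)
    planned = []
    for name in names:
        decoded = unquote(name)
        # A decoded name must still be a single, valid path component.
        unsafe = decoded in ("", ".", "..") or "/" in decoded or "\0" in decoded
        if decoded == name or unsafe or decoded in existing:
            continue
        existing.add(decoded)
        planned.append((name, decoded))
    return planned

def apply_renames(directory):
    """Rename the files in `directory` according to plan_renames()."""
    for old, new in plan_renames(sorted(os.listdir(directory))):
        os.rename(os.path.join(directory, old), os.path.join(directory, new))

print(plan_renames(["my%20file.txt", "a%2Fb", "plain.txt"]))
# [('my%20file.txt', 'my file.txt')]
```

Separating the planning step from the actual os.rename call makes it easy to print the plan as a dry run before touching any files.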
See also:
- Can wget decode uri file names when downloading in batch?
- How to remove URI encoding from file names?
Related:
- How to decode URL-encoded string in shell? at SO
- Decoding URL encoding (percent encoding) at unix SE
Percent-encode reserved URI characters and non-ASCII characters
jq -s -R -r @uri
-s (--slurp) reads input lines into an array, and -s -R (--slurp --raw-input) reads the input into a single string. -r (--raw-output) outputs the contents of strings instead of JSON string literals.
Percent-encode all characters
xxd -p|tr -d \\n|sed 's/../%&/g'
tr -d \\n removes the linefeeds that are added by xxd -p after every 60 characters.
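The pipeline above hex-dumps every input byte and puts a % in front of each pair of hex digits. A rough Python equivalent, with percent_encode_all as a hypothetical name, could look like this:

```python
def percent_encode_all(data: bytes) -> str:
    """Percent-encode every byte, including letters, digits and newlines."""
    # Lowercase hex, as xxd -p prints it.
    return "".join(f"%{byte:02x}" for byte in data)

sample = "aä\n".encode("utf-8")   # illustrative input
print(percent_encode_all(sample))  # %61%c3%a4%0a
```

As with the shell version, a trailing newline in the input is encoded too (here as %0a), so strip it first if that is not wanted.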
Percent-encode all characters except ASCII alphanumeric characters in Bash
eu () {
  local LC_ALL=C c
  while IFS= read -r -n1 -d '' c
  do
    if [[ $c = [[:alnum:]] ]]
    then
      printf %s "$c"
    else
      printf %%%02x "'$c"
    fi
  done
}
Without -d '' this would skip linefeeds and null bytes. Without IFS= this would replace characters in IFS with %00. Without LC_ALL=C this would, for example, replace あ with %3042 in a UTF-8 locale.
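The %3042 case is the difference between encoding a character's code point and encoding its UTF-8 bytes. The following sketch shows both; encode_non_alnum is a hypothetical helper that mirrors the rule used above (keep ASCII letters and digits, encode every other byte).

```python
import string

# Byte values of the ASCII letters and digits that are left unencoded.
ASCII_ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))

def encode_non_alnum(data: bytes) -> str:
    """Percent-encode every byte that is not an ASCII letter or digit."""
    return "".join(chr(b) if b in ASCII_ALNUM else f"%{b:02x}" for b in data)

ch = "あ"
# Formatting the code point (U+3042) is what happens per character.
by_code_point = f"%{ord(ch):02x}"
# Percent-encoding is defined on bytes, so encode to UTF-8 first.
by_utf8_bytes = encode_non_alnum(ch.encode("utf-8"))

print(by_code_point)   # %3042
print(by_utf8_bytes)   # %e3%81%82
```

Only the second form is valid percent-encoding, since each %XX must stand for exactly one byte.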
Pure bash solution for decoding only:
$ a='%C3%A6ndr%C3%BCk'
$ echo -e "${a//%/\\x}"
ændrük