Characters allowed in GET parameter
Which characters are allowed in GET parameters without encoding or escaping them? I mean something like this:
http://www.example.org/page.php?name=XYZ
What can you have there instead of XYZ? I think only the following characters:
- a-z (A-Z)
- 0-9
- -
- _
Is this the full list or are there additional characters allowed?
I hope you can help me. Thanks in advance!
Solution 1:
Some characters are reserved because they carry special meaning: the general delimiters :/?#[]@ and the sub-delimiters !$&'()*+,;=
There is also a set of unreserved characters, namely alphanumerics and -._~, which should not be encoded.
This means that any character outside the unreserved set should be %-encoded whenever it is not being used for its special meaning (e.g. when it is passed as part of a GET parameter value).
See also RFC3986: Uniform Resource Identifier (URI): Generic Syntax
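As a rough illustration of that rule, here is a minimal Python sketch. It assumes Python 3.7 or later (where `urllib.parse.quote` treats `~` as unreserved), and the sample value is made up for the example:

```python
from urllib.parse import quote, unquote

# safe="" tells quote() to leave nothing extra unencoded, so only the
# unreserved set (letters, digits, "-", ".", "_", "~") passes through.
raw_value = "a b&c=d/e~f-g_h.i"  # hypothetical parameter value
encoded = quote(raw_value, safe="")
print(encoded)  # a%20b%26c%3Dd%2Fe~f-g_h.i

url = "http://www.example.org/page.php?name=" + encoded
print(url)

# Decoding restores the original value.
assert unquote(encoded) == raw_value
```

This is the conservative approach: it encodes more than is strictly required, but the result is safe to drop into any parameter value.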
Solution 2:
The question asks which characters are allowed in GET parameters without encoding or escaping them.
According to RFC3986 (general URL syntax) and RFC7230, section 2.7.1 (HTTP/S URL syntax), the only characters you need to percent-encode are those outside the query set; see the definition below.
However, additional specifications such as HTML5, Web forms, and the obsolete W3C recommendation on indexed search give special meaning to some characters, notably = & + ;.
Other answers here suggest that most of the reserved characters should be encoded, including "/" and "?". That's not correct. In fact, RFC3986, section 3.4 advises against percent-encoding the "/" and "?" characters:
"it is sometimes better for usability to avoid percent-encoding those characters."
RFC3986 defines the query component as:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.
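The grammar above can be turned into a small checker. The following is only a sketch: the function name `is_valid_query` is made up for this example, and the regular expression is a hand transcription of the ABNF rather than anything taken from a library.

```python
import re

# Character classes transcribed from the RFC 3986 ABNF quoted above.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# query = *( pchar / "/" / "?" ), with pchar expanded inline.
QUERY_RE = re.compile(rf"(?:[{UNRESERVED}{SUB_DELIMS}:@/?]|{PCT_ENCODED})*")


def is_valid_query(query: str) -> bool:
    """Return True if the whole string matches the query production."""
    return QUERY_RE.fullmatch(query) is not None


print(is_valid_query("name=a/b?c"))  # "/" and "?" are allowed as-is
print(is_valid_query("name=a b"))    # a raw space is outside the set
print(is_valid_query("name=a[1]"))   # "[" and "]" are outside the set
```

Note that this only checks the syntax. It says nothing about whether a server will interpret = & + ; as separators or as data.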
The conclusion is that the XYZ part should encode:
special: # % = & ;
Space
sub-delims
out of query set: [ ]
characters that cannot be represented in ASCII
The exception is = & ; when they are actually being used as key=value separators.
Encoding other characters is allowed but not necessary.
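One possible reading of that conclusion, sketched in Python. The helper name `encode_query_value` and the exact `safe` list are assumptions made for this example: it encodes the characters with special meaning (including +, because of the form-data convention mentioned earlier) and leaves the remaining query-set characters alone. A stricter reading would encode every sub-delim.

```python
from urllib.parse import quote

# Query-set characters that carry no key=value or form meaning.
# quote() already leaves letters, digits and "-._~" untouched.
QUERY_VALUE_SAFE = "/?:@!$'()*,"


def encode_query_value(value: str) -> str:
    """Percent-encode a single parameter value, keeping query-safe characters."""
    return quote(value, safe=QUERY_VALUE_SAFE)


# A URL used as a value stays readable:
print(encode_query_value("http://example.org/a?b"))  # http://example.org/a?b

# Separators, space, "#", "%" and brackets are encoded:
print(encode_query_value("a=b&c;d e#f%g[h]"))  # a%3Db%26c%3Bd%20e%23f%25g%5Bh%5D
```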
Solution 3:
I did a test using the Chrome address bar and $QUERY_STRING in bash, and observed the following:
~!@$%^&*()-_=+[{]}\|;:',./? and grave (backtick) are passed through as plaintext.
Space, ", < and > are converted to %20, %22, %3C and %3E respectively.
# is ignored, since it is used by ye olde anchor.
Personally, I'd say bite the bullet and encode with base64 :)
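If you do go the base64 route, keep in mind that standard base64 output can contain + / and =, which are exactly the characters discussed in this thread. A minimal sketch using the URL-safe alphabet from Python's standard library (the helper names `encode_param` and `decode_param` are made up here, and stripping the padding is a design choice, not a requirement):

```python
import base64


def encode_param(value: str) -> str:
    """Encode text as URL-safe base64 without '=' padding."""
    token = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
    return token.rstrip("=")


def decode_param(token: str) -> str:
    """Restore the padding and decode back to text."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


token = encode_param("name with spaces & symbols?")
print(token)
print(decode_param(token))
```

The resulting token uses only letters, digits, - and _, so it needs no percent-encoding at all, at the cost of being unreadable in the address bar.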
Solution 4:
All of the rules concerning the encoding of URIs (which include URNs and URLs) are specified in RFC1738 and RFC3986. Here's a TL;DR of these long and boring documents:
Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a URI under certain circumstances. The characters allowed in a URI are either reserved or unreserved. Reserved characters are those that sometimes have special meaning, but they are not the only characters that need encoding.
There are 66 unreserved characters that don't need any encoding:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~
There are 18 reserved characters which need to be encoded when they are used as data rather than as delimiters: !*'();:@&=+$,/?#[]
All other characters must be encoded.
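To percent-encode an ASCII character, simply concatenate "%" and its ASCII value as two hexadecimal digits (for example, a space becomes %20). The PHP functions urlencode and rawurlencode do this job for you; note that urlencode encodes a space as + while rawurlencode produces %20.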
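To make the mechanism concrete, here is a hand-rolled sketch in Python. The function name `percent_encode` is hypothetical, and the handling of non-ASCII text (encode as UTF-8 first, then escape each byte) is an assumption that goes beyond the ASCII-only description above. In practice you would call a library function such as `urllib.parse.quote` instead.

```python
from urllib.parse import quote

# The 66 unreserved characters listed above.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Escape every byte that is not an unreserved ASCII character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # "%" followed by the byte value as two uppercase hex digits.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("a b"))      # a%20b
print(percent_encode("x=1&y=2"))  # x%3D1%26y%3D2

# On Python 3.7+, the library call with safe="" gives the same result.
assert percent_encode("x=1&y=2") == quote("x=1&y=2", safe="")
```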
Solution 5:
From RFC 1738 on which characters are allowed in URLs:
Only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
The reserved characters are ";", "/", "?", ":", "@", "=" and "&", which means you need to URL-encode them if you wish to use them as literal data rather than for their reserved purpose.