How to properly create HTML links in PHP?

This question is about the proper use of rawurlencode, http_build_query & htmlspecialchars.

Until now my standard way of creating an HTML link in vanilla PHP was this:

$qs = [
    'foo' => 'foo~bar',
    'bar' => 'bar foo',
];
echo '<a href="?' . http_build_query($qs) . '">Link</a>';

Recently I have learned that this is not 100% correct. Here are a few issues:

  • http_build_query uses PHP_QUERY_RFC1738 by default instead of PHP_QUERY_RFC3986. RFC3986 is the current standard and superseded RFC1738, which PHP keeps only for legacy use.
  • While the "special" HTML characters in the key and value parts are encoded to their percent-encoded representation, the argument separator is still a bare ampersand. In most sane situations this is not a problem, but sometimes your key name might be quot; and then your link becomes invalid:

    $qs = [
        'a' => 'a',
        'quot;' => 'bar',
    ];
    echo '<a href="?' . http_build_query($qs) . '">Link</a>';
    

    The code above outputs a=a&quot%3B=bar, which the browser reads as this link: ?a=a"%3B=bar!
    IMO this implies that http_build_query needs to be called in a context-aware way: with '&amp;' as the third argument when the result goes into HTML, and with a plain & when it goes into header('Location: ...');. Another option would be to pass its output through htmlspecialchars before displaying it in HTML.

  • The PHP manual for urlencode (which IMO should have been deprecated a long time ago) suggests encoding only the value part of the query string and then passing the whole query string through htmlentities before displaying it in HTML. This looks very incorrect to me; the key part could still contain forbidden URL characters.

    $query_string = 'foo=' . urlencode($foo) . '&bar=' . urlencode($bar);
    echo '<a href="mycgi?' . htmlentities($query_string) . '">';
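
To make the context-aware call from the second point concrete, here is a minimal sketch. It reuses the quot; key from above; the variable names $forHtml and $forHeader are made up for illustration, and the header() call is left commented out so the snippet runs on the command line.

```php
<?php
// Query data whose second key looks like the start of an HTML entity.
$qs = ['a' => 'a', 'quot;' => 'bar'];

// HTML attribute context: write the separator as &amp; so the parser
// cannot mistake "&quot" for a character reference.
$forHtml = http_build_query($qs, '', '&amp;', PHP_QUERY_RFC3986);

// HTTP header context: no HTML parsing happens, so a plain & is correct.
$forHeader = http_build_query($qs, '', '&', PHP_QUERY_RFC3986);

echo '<a href="?' . $forHtml . '">Link</a>', PHP_EOL;
// header('Location: ?' . $forHeader);
```

The same array yields two different strings, and each one is only correct in its own context.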
    

My conclusion is to do something along these lines:

$qs = [
    'a' => 'a',
    'quot;' => 'bar foo',
];
echo '<a href="?' . http_build_query($qs, '', '&amp;', PHP_QUERY_RFC3986) . '">Link</a>';

What is the recommended way to create HTML links in PHP? Is there an easier way than what I came up with? Have I missed any crucial points?


Solution 1:

How to dynamically build HTML links with query string?

If you need to create a query string to be used in an HTML link (e.g. <a href="index.php?param1='.$value.'">Link</a>) then you should use http_build_query. This function accepts 4 parameters, the first being an array/object of query data. For the most part the other 3 parameters are irrelevant.

$qs = [
    'a' => 'a',
    'quot;' => 'bar foo',
];
echo '<a href="?' . http_build_query($qs) . '">Link</a>';

However, you should still pass the output of the function through htmlspecialchars to encode the & correctly. (A good framework will do this automatically, like Laravel's {{ }}.)

echo '<a href="?' . htmlspecialchars(http_build_query($qs)) . '">Link</a>';

Alternatively you can pass '&amp;' as the third argument to http_build_query, leaving the second one as an empty string (passing null there is deprecated since PHP 8.1). This will use &amp; instead of & as the separator, which is what htmlspecialchars would produce.
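
As a quick check that the two approaches agree, here is a small sketch using the same $qs array; the names $escaped and $separated are only for illustration. Because http_build_query percent-encodes keys and values, the separator is the only place where a character that htmlspecialchars touches can appear.

```php
<?php
$qs = [
    'a' => 'a',
    'quot;' => 'bar foo',
];

// Option 1: build with a plain & and escape the whole string afterwards.
$escaped = htmlspecialchars(http_build_query($qs));

// Option 2: ask http_build_query to emit &amp; as the separator directly.
$separated = http_build_query($qs, '', '&amp;');

var_dump($escaped === $separated);
echo '<a href="?' . $escaped . '">Link</a>', PHP_EOL;
```

Option 1 has the advantage that it follows the same habit as every other value echoed into HTML.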

About spaces.
For use in form data (i.e. query strings) the space should be encoded as + and in any other place it should be encoded as %20, e.g. new%20page.php?my+field=my+val. This ensures backward compatibility with all browsers. Alternatively, you can use the newer RFC3986 encoding, which encodes spaces as %20; it works in all common browsers and is up to date with modern standards.

echo '<a href="?' . http_build_query($qs, '', '&amp;', PHP_QUERY_RFC3986) . '">Link</a>';
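
To see what the encoding type actually changes, here is a short sketch that builds the same data both ways, using the array from the question. The names $legacy and $modern are only for illustration.

```php
<?php
$qs = [
    'foo' => 'foo~bar',
    'bar' => 'bar foo',
];

// Default PHP_QUERY_RFC1738: space becomes +, tilde is percent-encoded.
$legacy = http_build_query($qs);

// PHP_QUERY_RFC3986: space becomes %20, tilde is left alone.
$modern = http_build_query($qs, '', '&', PHP_QUERY_RFC3986);

echo $legacy, PHP_EOL;
echo $modern, PHP_EOL;

// Both forms decode back to the same data on the receiving side.
parse_str($legacy, $fromLegacy);
parse_str($modern, $fromModern);
var_dump($fromLegacy === $fromModern);
```

Since PHP decodes both forms to the same values, the choice mostly affects how the URL looks and how strictly it follows the current standard.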

rawurlencode vs urlencode

For any part of URL before ? you should use rawurlencode. For example:

$subdir = rawurlencode('blue+light blue');
echo '<a href="'.$subdir.'/index.php">rawurlencode</a>';

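If you used urlencode in the above example, the link would be broken, because urlencode turns the space into +, which in a path is a literal plus sign rather than a space. urlencode has very limited use and should be avoided.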

Do not pass the whole URL through rawurlencode. Separators such as / and other special characters in the URL should not be encoded if they are to fulfil their function.
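
One way to follow this advice is to encode each path segment on its own and join the segments again with /. The sketch below assumes a made-up path ($path) and query parameter; it is an illustration of the idea, not the only way to do it.

```php
<?php
// Hypothetical path: the middle segment contains a plus sign and a space.
$path = 'shop/blue+light blue/index.php';

// Encode every segment separately so the / separators survive.
$encodedPath = implode('/', array_map('rawurlencode', explode('/', $path)));

// Encoding the whole string at once also encodes the slashes.
$broken = rawurlencode($path);

// Query string built separately, then the full URL is escaped for HTML.
$query = http_build_query(['q' => 'a b'], '', '&', PHP_QUERY_RFC3986);
echo '<a href="' . htmlspecialchars($encodedPath . '?' . $query) . '">Link</a>', PHP_EOL;
echo $broken, PHP_EOL;
```

Here the path and the query string are each encoded with the function suited to them, and htmlspecialchars is applied once at the point of output.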


Footnote

There is no general agreement on best practices for using http_build_query, other than that its output should be passed through htmlspecialchars, just like any other output in an HTML context.

Laravel uses http_build_query($array, null, '&', PHP_QUERY_RFC3986)

CodeIgniter uses http_build_query($query)

Symfony uses http_build_query($extra, '', '&', PHP_QUERY_RFC3986)

Slim uses http_build_query($queryParams)

CakePHP uses http_build_query($query)

Twig uses http_build_query($url, '', '&', PHP_QUERY_RFC3986)