PHP $_SERVER['HTTP_HOST'] vs. $_SERVER['SERVER_NAME'], am I understanding the man pages correctly?

Solution 1:

That’s probably everyone’s first thought. But it’s a little bit more difficult. See Chris Shiflett’s article SERVER_NAME Versus HTTP_HOST.

It seems that there is no silver bullet. Only when you force Apache to use the canonical name you will always get the right server name with SERVER_NAME.

So you either go with that or you check the host name against a white list:

$allowed_hosts = array('foo.example.com', 'bar.example.com');
if (!isset($_SERVER['HTTP_HOST']) || !in_array($_SERVER['HTTP_HOST'], $allowed_hosts)) {
    header($_SERVER['SERVER_PROTOCOL'].' 400 Bad Request');
    exit;
}

Solution 2:

Just an additional note - if the server runs on a port other than 80 (as might be common on a development/intranet machine) then HTTP_HOST contains the port, while SERVER_NAME does not.

$_SERVER['HTTP_HOST'] == 'localhost:8080'
$_SERVER['SERVER_NAME'] == 'localhost'

(At least that's what I've noticed in Apache port-based virtualhosts)

As Mike has noted below, HTTP_HOST does not contain :443 when running on HTTPS (unless you're running on a non-standard port, which I haven't tested).

Solution 3:

Use either. They are both equally (in)secure, as in many cases SERVER_NAME is just populated from HTTP_HOST anyway. I normally go for HTTP_HOST, so that the user stays on the exact host name they started on. For example if I have the same site on a .com and .org domain, I don't want to send someone from .org to .com, particularly if they might have login tokens on .org that they'd lose if sent to the other domain.

Either way, you just need to be sure that your webapp will only ever respond for known-good domains. This can be done either (a) with an application-side check like Gumbo's, or (b) by using a virtual host on the domain name(s) you want that does not respond to requests that give an unknown Host header.

The reason for this is that if you allow your site to be accessed under any old name, you lay yourself open to DNS rebinding attacks (where another site's hostname points to your IP, a user accesses your site with the attacker's hostname, then the hostname is moved to the attacker's IP, taking your cookies/auth with it) and search engine hijacking (where an attacker points their own hostname at your site and tries to make search engines see it as the ‘best’ primary hostname).

Apparently the discussion is mainly about $_SERVER['PHP_SELF'] and why you shouldn't use it in the form action attribute without proper escaping to prevent XSS attacks.

Pfft. Well you shouldn't use anything in any attribute without escaping with htmlspecialchars($string, ENT_QUOTES), so there's nothing special about server variables there.

Solution 4:

This is a verbose translation of what Symfony uses to get the host name (see the second example for a more literal translation):

function getHost() {
    $possibleHostSources = array('HTTP_X_FORWARDED_HOST', 'HTTP_HOST', 'SERVER_NAME', 'SERVER_ADDR');
    $sourceTransformations = array(
        "HTTP_X_FORWARDED_HOST" => function($value) {
            $elements = explode(',', $value);
            return trim(end($elements));
        }
    );
    $host = '';
    foreach ($possibleHostSources as $source)
    {
        if (!empty($host)) break;
        if (empty($_SERVER[$source])) continue;
        $host = $_SERVER[$source];
        if (array_key_exists($source, $sourceTransformations))
        {
            $host = $sourceTransformations[$source]($host);
        } 
    }

    // Remove port number from host
    $host = preg_replace('/:\d+$/', '', $host);

    return trim($host);
}

Outdated:

This is my translation to bare PHP of a method used in Symfony framework that tries to get the hostname from every way possible in order of best practice:

function get_host() {
    if ($host = $_SERVER['HTTP_X_FORWARDED_HOST'])
    {
        $elements = explode(',', $host);

        $host = trim(end($elements));
    }
    else
    {
        if (!$host = $_SERVER['HTTP_HOST'])
        {
            if (!$host = $_SERVER['SERVER_NAME'])
            {
                $host = !empty($_SERVER['SERVER_ADDR']) ? $_SERVER['SERVER_ADDR'] : '';
            }
        }
    }

    // Remove port number from host
    $host = preg_replace('/:\d+$/', '', $host);

    return trim($host);
}

Solution 5:

Is it "safe" to use $_SERVER['HTTP_HOST'] for all links on a site without having to worry about XSS attacks, even when used in forms?

Yes, it's safe to use $_SERVER['HTTP_HOST'], (and even $_GET and $_POST) as long as you verify them before accepting them. This is what I do for secure production servers:

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
$reject_request = true;
if(array_key_exists('HTTP_HOST', $_SERVER)){
    $host_name = $_SERVER['HTTP_HOST'];
    // [ need to cater for `host:port` since some "buggy" SAPI(s) have been known to return the port too, see http://goo.gl/bFrbCO
    $strpos = strpos($host_name, ':');
    if($strpos !== false){
        $host_name = substr($host_name, $strpos);
    }
    // ]
    // [ for dynamic verification, replace this chunk with db/file/curl queries
    $reject_request = !array_key_exists($host_name, array(
        'a.com' => null,
        'a.a.com' => null,
        'b.com' => null,
        'b.b.com' => null
    ));
    // ]
}
if($reject_request){
    // log errors
    // display errors (optional)
    exit;
}
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
echo 'Hello World!';
// ...

The advantage of $_SERVER['HTTP_HOST'] is that its behavior is more well-defined than $_SERVER['SERVER_NAME']. Contrast ➫➫:

Contents of the Host: header from the current request, if there is one.

with:

The name of the server host under which the current script is executing.

Using a better defined interface like $_SERVER['HTTP_HOST'] means that more SAPIs will implement it using reliable well-defined behavior. (Unlike the other.) However, it is still totally SAPI dependent ➫➫:

There is no guarantee that every web server will provide any of these [$_SERVER entries]; servers may omit some, or provide others not listed here.

To understand how to properly retrieve the host name, first and foremost you need to understand that a server which contains only code has no means of knowing (pre-requisite for verifying) its own name on the network. It needs to interface with a component that supplies it its own name. This can be done via:

  • local config file

  • local database

  • hardcoded source code

  • external request (curl)

  • client/attacker's Host: request

  • etc

Usually its done via the local (SAPI) config file. Note that you have configured it correctly, e.g. in Apache ➫➫:

A couple of things need to be 'faked' to make the dynamic virtual host look like a normal one.

The most important is the server name which is used by Apache to generate self-referential URLs, etc. It is configured with the ServerName directive, and it is available to CGIs via the SERVER_NAME environment variable.

The actual value used at run time is controlled by the UseCanonicalName setting.

With UseCanonicalName Off the server name comes from the contents of the Host: header in the request. With UseCanonicalName DNS it comes from a reverse DNS lookup of the virtual host's IP address. The former setting is used for name-based dynamic virtual hosting, and the latter is used for** IP-based hosting.

If Apache cannot work out the server name because there is no Host: header or the DNS lookup fails then the value configured with ServerName is used instead.