Why does HTML require that multiple spaces show up as a single space in the browser?

I have long known that any run of whitespace in an HTML file is displayed as a single space. For instance, this:

<p>Hello.        Hello. Hello. Hello.                       Hello.</p>

displays as:

Hello. Hello. Hello. Hello. Hello.
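As a rough illustration, the rule amounts to replacing every run of whitespace with one space. This is a simplified sketch, not the browser's actual implementation. In current browsers the behavior comes from the default CSS `white-space: normal` handling, and the sketch ignores CSS and element boundaries. The function name `collapse_whitespace` is hypothetical:

```python
import re

# ASCII whitespace as the HTML standard defines it:
# space, tab, line feed, form feed, carriage return.
WHITESPACE_RUN = re.compile(r"[ \t\n\f\r]+")

def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with a single space
    (a simplified model of `white-space: normal`)."""
    return WHITESPACE_RUN.sub(" ", text)

print(collapse_whitespace("Hello.        Hello. Hello. Hello.                       Hello."))
# prints: Hello. Hello. Hello. Hello. Hello.
```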

This is perfectly fine, since if you need pre-formatted text with multiple spaces you can just use the <pre> tag. But what is the reason for it? More precisely, why is this in the HTML specification?


Solution 1:

Spaces are compacted in HTML because there's a distinction between how HTML is formatted and how it should be rendered. Consider a page like this:

<html>
    <body>
        <a href="mylink">A link</a>
    </body>
</html>

Because this HTML is indented with spaces, the link would be preceded by several spaces (and line breaks) in the rendered page if whitespace were displayed as written.

Solution 2:

To try to address the "why": it may be because HTML was based on SGML, which specified it that way. SGML was in turn based on GML, developed at IBM in the late 1960s. The whitespace handling may well date from a time when data was entered one punched "card" at a time, which could break sentences and paragraphs apart in undesired places. One difference in the old GML is that it specified two spaces between sentences (like the old typewriter rules), which may have established a precedent that spacing is independent of the markup.

Solution 3:

As others have said, it's in the HTML specification.

If you want to preserve whitespace in output, you can use the <pre> tag:

<pre>This     text has              extra spaces

and

    newlines</pre>

Note, however, that browsers generally render <pre> content in a monospace font.

Solution 4:

Not only is it in the specification, but there is some sense to it. If spaces weren't compacted, you would have to put all your HTML on a single line. So something like this:

<div>
    <h1>Title</h1>
    <p>
       This is some text
       <a href="#">Read More</a>
    </p>
</div>

Would render with strange alignment and spaces all over the place. The only way to get it right would be to strip the indentation and line breaks out of the markup, which would make it difficult to maintain.
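To make the maintainability argument concrete, here is a small Python sketch. The helper `raw_and_rendered` and the class `TextCollector` are hypothetical. It pulls the text out of the <p> element in this answer using the standard library's `html.parser`. It then compares the raw text, indentation included, with a simplified model of what a browser displays: whitespace runs collapsed to one space, with the space at each end of the line dropped. It only models a single line of inline text, not block layout.

```python
import re
from html.parser import HTMLParser

# Runs of ASCII whitespace (space, tab, LF, FF, CR).
WHITESPACE_RUN = re.compile(r"[ \t\n\f\r]+")

class TextCollector(HTMLParser):
    """Collects every text node, whitespace-only ones included."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def raw_and_rendered(markup: str) -> tuple[str, str]:
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    raw = "".join(parser.parts)
    # Collapse whitespace runs, then drop the leftover space at each end.
    rendered = WHITESPACE_RUN.sub(" ", raw).strip(" ")
    return raw, rendered

markup = """<p>
       This is some text
       <a href="#">Read More</a>
    </p>"""

raw, rendered = raw_and_rendered(markup)
print(repr(raw))
print(rendered)  # This is some text Read More
```

Without collapsing, the line break and seven spaces of indentation before "This is some text" would all be displayed, which is exactly the stray spacing described above.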