How to Pretty Print HTML to a file, with indentation

Solution 1:

I ended up using BeautifulSoup directly. It is the library that lxml.html.soupparser uses for parsing HTML.

BeautifulSoup has a prettify method that does exactly what its name says: it re-indents the HTML, placing each tag and each text node on its own line.

BeautifulSoup will NOT fix the HTML, so broken code remains broken. But in this case, since the code is being generated by lxml, the HTML should at least be semantically correct.

For the example given in my question, that means doing this:

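import lxml.html as lh               # assumed to match the "lh" alias used in the question
from bs4 import BeautifulSoup as bs  # BeautifulSoup 4; the old "BeautifulSoup" module is Python 2 only

root = lh.tostring(sliderRoot)       # convert the generated HTML to a string
soup = bs(root, 'html.parser')       # parse it with BeautifulSoup
prettyHTML = soup.prettify()         # prettify the HTML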

Solution 2:

This answer may come too late to help the original poster, but I am leaving it here as a reference for anyone else in the future.

Indeed, lxml.html.tostring() doesn't pretty print the provided HTML, even with pretty_print=True.

However, lxml.etree, the "sibling" of lxml.html, handles it well.

So one can use it as follows:

from lxml import etree, html

document_root = html.fromstring("<html><body><h1>hello world</h1></body></html>")
print(etree.tostring(document_root, encoding='unicode', pretty_print=True))

The output is like this:

<html>
  <body>
    <h1>hello world</h1>
  </body>
</html>
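
One caveat with this approach: etree.tostring() serializes as XML by default, so void elements such as <br> come out in self-closed form. A minimal sketch to illustrate; the input fragment below is made up for the example and is not from the question:

```python
from lxml import etree, html

# Illustrative input containing a void element (<br>).
fragment = html.fromstring("<div><p>first line<br>second line</p></div>")

# etree.tostring() defaults to method="xml", so <br> is written as <br/>.
pretty = etree.tostring(fragment, encoding="unicode", pretty_print=True)
print(pretty)
```

Browsers generally cope with <br/>, but it is worth checking the output if the document contains empty elements that must keep an explicit closing tag.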

Solution 3:

If the HTML is stored as an unformatted string in a variable html_string, it can be prettified using beautifulsoup4 as follows:

from bs4 import BeautifulSoup
print(BeautifulSoup(html_string, 'html.parser').prettify())
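
Since the question asks for the result in a file rather than on the console, here is a rough sketch of writing the prettified string to disk. The helper name save_pretty_html and the UTF-8 encoding are assumptions for illustration, not part of beautifulsoup4:

```python
from pathlib import Path

from bs4 import BeautifulSoup


def save_pretty_html(html_string, path):
    """Prettify html_string and write it to path as UTF-8 (hypothetical helper)."""
    pretty = BeautifulSoup(html_string, "html.parser").prettify()
    Path(path).write_text(pretty, encoding="utf-8")
    return pretty

# Example call (the file name is arbitrary):
# save_pretty_html(html_string, "pretty.html")
```

The function returns the prettified string as well, so the caller can inspect it without reading the file back.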

Solution 4:

If adding one more dependency is not a problem, you can use the html5print package. Its advantage over the other solutions is that it also beautifies CSS and JavaScript code embedded in the HTML document.

To install it, execute:

pip install html5print

Then, you can either use it as a command:

html5-print ugly.html -o pretty.html

or as Python code:

from html5print import HTMLBeautifier
html = '<title>Page Title</title><p>Some text here</p>'
print(HTMLBeautifier.beautify(html, 4))

Solution 5:

I tried both BeautifulSoup's prettify and html5print's HTMLBeautifier. However, since I'm using yattag to generate the HTML, it seems more appropriate to use its own indent function, which produces nicely indented output.

from yattag import indent

rawhtml = "String with some HTML code..."

result = indent(
    rawhtml,
    indentation='    ',  # four spaces per nesting level
    newline='\r\n',      # Windows-style line endings
    indent_text=True     # also indent text nodes, not only tags
)

print(result)