Convert a filename to a file:// URL

In WeasyPrint’s public API I accept filenames (among other types) for the HTML inputs. Any filename that works with the built-in open() should work, but I need to convert it to an URL in the file:// scheme that will later be passed to urllib.urlopen().

(Everything is in URL form internally. I need to have a "base URL" for documents in order to resolve relative URL references with urlparse.urljoin().)

urllib.pathname2url is a start:

Convert the pathname path from the local syntax for a path to the form used in the path component of a URL. This does not produce a complete URL. The return value will already be quoted using the quote() function.

The emphasis is mine, but I do need a complete URL. So far this seems to work:

def path2url(path):
    """Return file:// URL from a filename."""
    path = os.path.abspath(path)
    if isinstance(path, unicode):
        path = path.encode('utf8')
    return 'file:' + urlparse.pathname2url(path)

UTF-8 seems to be recommended by RFC 3987 (IRI). But in this case (the URL is meant for urllib, eventually) maybe I should use sys.getfilesystemencoding()?

However, based on the literature I should prepend not just file: but file:// ... except when I should not: On Windows the results from nturl2path.pathname2url() already start with three slashes.

So the question is: is there a better way to do this and make it cross-platform?

For completeness, in Python 3.4+, you should do:

import pathlib

pathlib.Path(absolute_path_string).as_uri()

I'm not sure the docs are rigorous enough to guarantee it, but I think this works in practice:

import urlparse, urllib

def path2url(path):
    return urlparse.urljoin(
      'file:', urllib.pathname2url(path))

Credit to comment from @danodonovan above.

For Python3, the following code will work:

from urllib.parse import urljoin
from urllib.request import pathname2url

def path2url(path):
    return urljoin('file:', pathname2url(path))

Convert a filename to a file:// URL

Related

Recent Posts