I have a Python project in which I am using many non-code files. Currently these are all images, but I might use other kinds of files in the future. What would be a good scheme for storing and referencing these files?

I considered just making a "resources" folder in the main directory, but there is a problem: some images are used from within sub-packages of my project. Storing those images in a top-level folder would couple the sub-packages to it, which is a disadvantage.

Also, I need a way to access these files that does not depend on what my current directory is.


Solution 1:

You may want to use the pkg_resources library that comes with setuptools.

For example, I've made up a quick little package "proj" to illustrate the resource organization scheme I'd use:

proj/setup.py
proj/proj/__init__.py
proj/proj/code.py
proj/proj/resources/__init__.py
proj/proj/resources/images/__init__.py
proj/proj/resources/images/pic1.png
proj/proj/resources/images/pic2.png

Notice how I keep all resources in a separate subpackage.

"code.py" shows how pkg_resources is used to refer to the resource objects:

import base64
from pkg_resources import resource_string, resource_listdir

# Itemize data files under proj/resources/images:
print(resource_listdir('proj.resources.images', ''))
# Get the data file bytes and show them as base64 text:
print(base64.b64encode(resource_string('proj.resources.images', 'pic2.png')).decode('ascii'))

If you run it, you get:

['__init__.py', '__init__.pyc', 'pic1.png', 'pic2.png']
iVBORw0KGgoAAAANSUhE ...

If you need to treat a resource as a file object, use resource_stream().

The code accessing the resources may be anywhere within the subpackage structure of your project. It just needs to refer to the subpackage containing the images by its full name: proj.resources.images, in this case.

Here's "setup.py":

#!/usr/bin/env python

from setuptools import setup, find_packages

setup(name='proj',
      packages=find_packages(),
      package_data={'': ['*.png']})

Caveat: To test things "locally", that is, without installing the package first, you'll have to invoke your test scripts from the directory that contains setup.py. If you're in the same directory as code.py, Python won't know about the proj package, so things like proj.resources won't resolve.

Solution 2:

You can always have a separate "resources" folder in each subpackage which needs it, and use os.path functions to get to these from the __file__ values of your subpackages. To illustrate what I mean, I created the following __init__.py file in three locations:

c:\temp\topp        (top-level package)
c:\temp\topp\sub1   (subpackage 1)
c:\temp\topp\sub2   (subpackage 2)

Here's the __init__.py file:

import os.path
resource_path = os.path.join(os.path.split(__file__)[0], "resources")
print(resource_path)

In c:\temp\work, I create an app, topapp.py, as follows:

import topp
import topp.sub1
import topp.sub2

This represents the application using the topp package and subpackages. Then I run it:

C:\temp\work>topapp
Traceback (most recent call last):
  File "C:\temp\work\topapp.py", line 1, in <module>
    import topp
ImportError: No module named topp

That's as expected. We set the PYTHONPATH to simulate having our package on the path:

C:\temp\work>set PYTHONPATH=c:\temp

C:\temp\work>topapp
c:\temp\topp\resources
c:\temp\topp\sub1\resources
c:\temp\topp\sub2\resources

As you can see, the resource paths resolved correctly to the location of the actual (sub)packages on the path.
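The same idea can be wrapped in a small helper so that each (sub)package does not have to repeat the path arithmetic. This is only a sketch: the function name resource_path_for and the file name pic1.png are made up for illustration, and it assumes every package keeps its files in a folder called "resources" next to its __init__.py, as in the layout above.

```python
import os.path


def resource_path_for(module_file, *parts):
    """Build a path under the 'resources' folder located next to the
    module whose __file__ value is passed in (hypothetical helper)."""
    # Resolve to an absolute path so the result does not depend on the
    # current working directory.
    package_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(package_dir, "resources", *parts)


# Typical use inside topp/sub1/__init__.py (file name is illustrative):
#     icon_path = resource_path_for(__file__, "pic1.png")
```

Because the helper only depends on the __file__ value it is given, it can live in one place and still resolve each subpackage's own resources folder.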

Update: Here's the relevant py2exe documentation.

Solution 3:

The new way of doing this is with importlib.resources. Its files() API is in the standard library from Python 3.9 onward; for older Python versions you can add a dependency on the importlib_resources backport and do something like:

from importlib_resources import files


def get_resource(module: str, name: str) -> str:
    """Load a textual resource file."""
    return files(module).joinpath(name).read_text(encoding="utf-8")

If your resources live inside the foo/resources subpackage, you would then use get_resource like so:

resource_text = get_resource('foo.resources', 'myresource')
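Since the question is about images, a binary variant may be more useful than the text one. The following is a minimal sketch, not part of the original answer: get_binary_resource is a hypothetical name, and the try/except simply prefers the standard-library files() where it exists and falls back to the importlib_resources backport otherwise.

```python
try:
    from importlib.resources import files  # standard library, Python 3.9+
except ImportError:
    from importlib_resources import files  # backport, must be installed


def get_binary_resource(package, name):
    """Return the raw bytes of a resource file stored inside `package`
    (hypothetical helper, mirrors get_resource but without decoding)."""
    return files(package).joinpath(name).read_bytes()


# Example call, assuming a foo/resources subpackage containing pic1.png:
#     image_bytes = get_binary_resource('foo.resources', 'pic1.png')
```

The returned bytes can be handed to whatever image library you use, or wrapped in io.BytesIO if a file object is required.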

Solution 4:

At PyCon 2009, there was a presentation on distutils and setuptools. You can find all of the videos here:

Eggs and Buildout Deployment in Python - Part 1

Eggs and Buildout Deployment in Python - Part 2

Eggs and Buildout Deployment in Python - Part 3

In these videos, they describe how to include static resources in your package. I believe it's in part 2.

With setuptools, you can define dependencies, which would allow two packages to use resources from a third package.

Setuptools also gives you a standard way of accessing these resources and allows you to use relative paths inside of your packages, which eliminates the need to worry about where your packages are installed.