How do I manage third-party Python libraries with Google App Engine? (virtualenv? pip?)

What's the best strategy for managing third-party Python libraries with Google App Engine?

Say I want to use Flask, a webapp framework. A blog entry says to do this, which doesn't seem right:

$ cd /tmp/
$ wget http://pypi.python.org/packages/source/F/Flask/Flask-0.6.1.tar.gz
$ tar zxf Flask-0.6.1.tar.gz
$ cp -r Flask-0.6.1/flask ~/path/to/project/
(... repeat for other packages ...)

There must be a better way to manage third-party code, especially if I want to track versions, test upgrades or if two libraries share a subdirectory. I know that Python can import modules from zipfiles and that pip can work with a wonderful REQUIREMENTS file, and I've seen that pip has a zip command for use with GAE.

(Note: There's a handful of similar questions — 1, 2, 3, 4, 5 — but they're case-specific and don't really answer my question.)


Solution 1:

Here's how I do it:

  • project
    • .Python
    • bin
    • lib
      • python2.5
        • site-packages
          • < pip install packages here >
    • include
    • src
      • app.yaml
      • index.yaml
      • main.yaml
      • < symlink the pip installed packages in ../lib/python2.5/site-packages

The project directory is the top level directory where the virtualenv sits. I get the virtualenv using the following commands:

cd project
virtualenv -p /usr/bin/python2.5 --no-site-packages --distribute .

The src directory is where all your code goes. When you deploy your code to GAE, *only* deploy those in the src directory and nothing else. The appcfg.py will resolve the symlinks and copy the library files to GAE for you.

I don't install my libraries as zip files mainly for convenience in case I need to read the source code, which I happen to do a lot just out of curiosity. However, if you really want to zip the libraries, put the following code snippet into your main.py

import sys
for p in ['librarie.zip', 'package.egg'...]:
    sys.path.insert(0, p)

After this you can import your zipped up packages as usual.

One thing to watch out for is setuptools' pkg_resources.py. I copied that directly into my src directory so my other symlinked packages can use it. Watch out for anything that uses entry_points. In my case I'm using Toscawidgets2 and I had to dig into the source code to manually wire up the pieces. It can become annoying if you had a lot of libraries that rely on entry_point.

Solution 2:

What about simply:

$ pip install -r requirements.txt -t <your_app_directory/lib>

Create/edit <your_app_directory>/appengine_config.py:

"""This file is loaded when starting a new application instance."""
import sys
import os.path

# add `lib` subdirectory to `sys.path`, so our `main` module can load
# third-party libraries.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'lib'))

UPDATE:

Google updated their sample to appengine_config.py, like:

    from google.appengine.ext import vendor
    vendor.add('lib')

Note: Even though their example has .gitignore ignoring lib/ directory you still need to keep that directory under source control if you use git-push deployment method.