Buildout/setuptools speed improvement

Tags: python, buildout, django

Sometimes buildout can seem to hang almost forever, even though you’ve set a timeout. The hang happens while buildout installs the project itself.

  • Buildout effectively calls python setup.py develop on your project.

  • Setuptools then first calls os.walk() and builds a complete list of files.

  • Only then does it read your MANIFEST.in and some setup.py settings to determine which files to exclude and include.

  • If you have lots of files inside your directory, this might take a long time.

The key thing here is that it first builds a complete list, before any exclusion is applied. That list includes the BUILDOUT_DIR/var/something directory with 245,934 files. And BUILDOUT_DIR/node_modules/ with your full npm-downloaded JavaScript stack of 10,000 files. And the BUILDOUT_DIR/var/project/ symlink to some slow Windows share.

Ouch. A buildout that normally takes half a minute can take two hours this way…
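
To make the cost concrete, here is a minimal sketch of the walk-first, filter-later order described above. It is not setuptools' actual code: the helper names (collect_then_filter, build_demo_tree), the excluded directory names and the tiny temporary tree are all made up for illustration.

```python
import os
import tempfile


def collect_then_filter(root, excluded=("var", "node_modules")):
    """Walk the whole tree first and only filter afterwards.

    Hypothetical helper: it mimics the order of operations described
    above, not the real setuptools implementation.
    """
    visited = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            visited.append(os.path.relpath(full_path, root))
    # The exclusion only happens here, after every file has been listed.
    kept = [p for p in visited if p.split(os.sep)[0] not in excluded]
    return visited, kept


def build_demo_tree(root):
    """Create a tiny fake buildout directory (illustrative file names)."""
    for rel in ("setup.py",
                "src/pkg/__init__.py",
                "var/cache/a.bin",
                "var/cache/b.bin",
                "node_modules/lib/index.js"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    visited, kept = collect_then_filter(tmp)
    print(len(visited), "files visited,", len(kept), "kept")
```

On this toy tree it prints "5 files visited, 2 kept": three of the five files were listed only to be thrown away. Scale var/ up to a few hundred thousand files and that wasted walk is where the time goes.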

The setuptools bug reports:

Intermediate solution: place this monkeypatch_setuptools.py next to your setup.py:

import os

TO_OMIT = ['var', '.git', 'parts',
           'bower_components',
           'node_modules', 'eggs',
           'bin', 'develop-eggs']

orig_os_walk = os.walk

def patched_os_walk(path, *args, **kwargs):
    for (dirpath, dirnames, filenames) in orig_os_walk(path, *args, **kwargs):
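        if '.git' in dirnames:
            # We're probably in our own root directory.
            print("MONKEY PATCH: omitting a few directories like var/...")
            # Prune in place: with the default topdown=True, os.walk only
            # skips directories that are removed from this same list object.
            dirnames[:] = [d for d in dirnames if d not in TO_OMIT]
        yield (dirpath, dirnames, filenames)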

os.walk = patched_os_walk
# ^^^ This only modifies os.walk for the duration of calling setup.py

And then import the monkeypatch near the top of your setup.py, before setup() is called:

from setuptools import setup
import monkeypatch_setuptools

version = '0.3.dev0'

setup(
    name='your-package',
    version=version,
    ...

It works quite well! Some notes:

  • Adjust the TO_OMIT to your needs and local conventions.

  • The monkey patch checks if '.git' in dirnames. I use that to detect whether os.walk is currently in a directory that contains .git, which normally means our own base directory. So it’ll only strip out directories there and not somewhere else. Just a small safety valve. You’ll have to adjust it if you use Mercurial or something else.
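
If you are not on git, the root-directory check could be generalized along these lines. This is only a sketch: VCS_MARKERS and prune_root_dirs are hypothetical names that do not appear in the patch above, and the marker directories listed are assumptions about what your checkout contains.

```python
# Directories to skip, as in the patch above (adjust to your conventions).
TO_OMIT = {'var', 'parts', 'eggs', 'bin', 'develop-eggs',
           'node_modules', 'bower_components'}

# Assumption: any one of these marks the root of a checkout.
VCS_MARKERS = {'.git', '.hg', '.svn'}


def prune_root_dirs(dirnames):
    """Prune `dirnames` in place, but only in a checkout's root directory."""
    if VCS_MARKERS.intersection(dirnames):
        # Slice assignment keeps the same list object, which is what
        # os.walk looks at when deciding where to descend next.
        dirnames[:] = [d for d in dirnames
                       if d not in TO_OMIT and d not in VCS_MARKERS]
    return dirnames
```

Inside patched_os_walk you would then call prune_root_dirs(dirnames) in place of the if '.git' in dirnames block. For example, prune_root_dirs(['src', '.hg', 'var']) leaves ['src'], while a list without any marker directory is returned untouched.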

 

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
