http://12factor.net/ is often quoted as a standard shopping list if you want to get your deployment right. At least in the Python web world it is, it seems to me.
I’m currently looking at the way we deploy stuff at our company (Nelen & Schuurmans). Partially by hand, lots via fabric, with increasing use of ansible. And many infrastructure parts, like the main web proxy config, are handled essentially by hand, aided by scripts.
Not everything in the 12 factor app list is needed for us, but it helps me think about what we need to keep and what we need to improve.
One codebase tracked in revision control, many deploys. We use git/github well. We also have multiple deploys, this works OK.
Explicitly declare and isolate dependencies. Python packages and buildout. Pinning. Works fine. A few projects are less tidy, though, with git branch checkouts instead of tidy packages.
On the whole, the way we compose the actual project works fine.
Store config in the environment. That means getting DATABASES out of settings.py. We do this wrong now. Environment settings or, perhaps better, configuration in a /etc/xyz/sitename/ directory, is the way to go.
Important: it suggests that “deploying the site” is a separate step from “get the environment on the server”, right? You haven’t won anything if the database setting is outside of your settings.py but inside an ansible config file within the very same git project!
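A minimal sketch of what that could look like, assuming an ini file that lives in the /etc/xyz/sitename/ directory and is put there by whatever sets up the server, not by the site's own repository. The file name settings.ini, the [database] section and the DB_* environment variable names are all made up for this example, and the engine default is just an assumption.

```python
import configparser
import os


def load_database_settings(config_path, environ=None):
    """Return a Django-style DATABASES dict built from outside the codebase.

    Values come from an ini file (hypothetical layout: a [database] section);
    a matching DB_* environment variable overrides the file.
    """
    environ = os.environ if environ is None else environ
    parser = configparser.ConfigParser()
    parser.read(config_path)  # A missing file is silently skipped.
    section = parser["database"] if parser.has_section("database") else {}

    def setting(key, default=""):
        # Environment wins over the file, the file wins over the default.
        return environ.get("DB_" + key.upper(), section.get(key, default))

    return {
        "default": {
            # Assumed default engine; adjust to the Django version in use.
            "ENGINE": setting("engine", "django.db.backends.postgresql"),
            "NAME": setting("name"),
            "USER": setting("user"),
            "PASSWORD": setting("password"),
            "HOST": setting("host", "localhost"),
            "PORT": setting("port", "5432"),
        }
    }
```

In settings.py this would boil down to something like DATABASES = load_database_settings("/etc/xyz/sitename/settings.ini"). The point is only that the project's git repository contains no database credentials at all.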
Treat backing services as attached resources. A database or a directory/share is something that is provided to you. So unless a site’s goal is to manage shares, it ideally should not do any managing or setting-up of shares.
Such a backing service must then of course be easy to get at and it must be something you can depend on. Once creating a database is out of your hands, it is easy to get frustrated if something is wrong or if the process is slow. This is where “reliability” and “ease of use” and “transparency” start to get really important.
What about tools like memcached? Normally it is run on the webserver itself and often it is installed by fabric/ansible which installs the site. Should this be an external backing service or is it OK to have it run locally? Same with a local celery install, for instance. TODO: thinkwork :-)
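One way to postpone that decision: if the memcached address is configuration, the site itself doesn't care whether the service runs locally or somewhere else. A rough sketch, assuming a Django-style CACHES setting. The MEMCACHED_LOCATION variable name is invented for this example and the backend path assumes a recent Django with pymemcache installed.

```python
import os


def cache_settings(environ=None):
    """Build a Django-style CACHES dict; the memcached address is config."""
    environ = os.environ if environ is None else environ
    # MEMCACHED_LOCATION is a made-up variable name for this sketch.
    # The default points at a memcached running on the webserver itself.
    location = environ.get("MEMCACHED_LOCATION", "127.0.0.1:11211")
    return {
        "default": {
            # Assumption: recent Django with the pymemcache client library.
            "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
            "LOCATION": location,
        }
    }
```

With this, moving memcached to a separate machine is a change in the server's environment and not in the codebase.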
Strictly separate build and run stages. This we don’t normally do. bin/buildout on the server will happily compile python packages and bin/grunt will collect its packages. Ideally, the download and compile step happens somewhere else.
Bundling the eggs and/or wheels could help. You can generate a debian package out of a buildout, too. Or simply zip up the whole shebang. Depending on globally installed packages could remove the need for custom compilation.
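To make the wheel idea a bit more concrete, here is a small sketch. Note that it uses pip's wheel support as a stand-in and not buildout, so it illustrates the build/run split and not our actual setup. The function names are made up; the file and directory arguments are whatever a project would use.

```python
import subprocess
import sys


def build_command(requirements, wheelhouse):
    """Command for the build machine: download and compile into wheels."""
    return [sys.executable, "-m", "pip", "wheel",
            "--requirement", requirements, "--wheel-dir", wheelhouse]


def install_command(requirements, wheelhouse):
    """Command for the server: install from the wheelhouse only.

    --no-index keeps pip away from the network, so nothing gets downloaded
    during the run stage.
    """
    return [sys.executable, "-m", "pip", "install", "--no-index",
            "--find-links", wheelhouse, "--requirement", requirements]


def run(command):
    # check=True raises CalledProcessError when pip exits non-zero.
    subprocess.run(command, check=True)
```

The build machine (Jenkins, for instance) would call run(build_command("requirements.txt", "wheelhouse")) and ship the resulting directory; the production server then only needs run(install_command("requirements.txt", "wheelhouse")). The wheels must of course be built on a machine that matches the server's platform.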
Two of our projects are using automatic integration deployments when the tests all run OK in Jenkins… Perhaps we can use the output of this to skip the build stage on the production server?
Execute the app as one or more stateless processes. The way we run things, we like to have both a running gunicorn and an nginx configuration and perhaps a celery daemon… That’s more than one stateless process.
The strict separation advocated by 12factor is probably really useful for their (=heroku’s) server setup with many independent customers. It is less useful/necessary for us.
Export services via port binding. In the end, what goes out is mostly the nginx port 80. So this is OK.
On the other hand… we do some hardcoded IP address backchannel idiocy somewhere. And in another project much stuff is harder-coupled than advocated by 12factor. Some of it is by necessity, some can be avoided.
Scale out via the process model. In a way we do, with multiple gunicorn processes. In a way we don’t, as the only real service that runs on more than one machine is our “lizard5” website. And lizard5 is awfully mis-configured for that use case (non-shared folders, faulty caching). I still need to fix that.
Maximize robustness with fast startup and graceful shutdown. Nginx, gunicorn: we’re ok. Restarting a server is normally not a big deal.
Restarting the gunicorns works through buildout-installed cronjobs with an @reboot time. Turning that into a proper system-level service would be
Keep development, staging, and production as similar as possible. Everyone uses ubuntu; that’s quite a good start. Custom installed versions and PPAs and manual sudo easy_install -U .... tend to mess it up.
Ansible is sometimes used to manage both development and staging and production. Take care to not do too much here: setting up a database locally for development can be fine, but it won’t match the production environment where the database is something outside of your direct control that you should treat as a “backing service”. And personal development environments are prone to personal preferences. Time will need to tell what the sweet spot is.
Docker of course is something that is hard to match regarding development/production parity. Configuration is a problem here, as you don’t want to have so much parity that you’re developing against the production database! Time will tell.
Treat logs as event streams. This is something we don’t do yet. Everything is logged to /srv/sitename/var/log. Do we need an infrastructure-wide logging server that everyone can send their logs to? EuroPython and djangocon.eu talks seem to suggest that it is a very handy addition to sentry!
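On the application side, “logs as event streams” mostly means the site stops caring about log files and just writes to a stream. A minimal sketch with Python's standard logging module; the function name and the log format are my own choices for this example, not something 12factor prescribes.

```python
import logging
import sys


def configure_logging(stream=None, level=logging.INFO):
    """Send all log records to one stream instead of per-site log files."""
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.handlers[:] = [handler]  # Replace any handlers set up earlier.
    root.setLevel(level)
    return root
```

Whatever runs the process can then collect stdout and forward it. If a central logging server does turn out to be the way to go, swapping the StreamHandler for something like logging.handlers.SysLogHandler would probably be the only change the site needs.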
Run admin/management tasks as one-off processes. We’re mostly OK here with django’s management commands and other scripts we have lying around in our buildouts.
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.