Large django sites at mozilla - Andy McKay (djangocon.eu)

Tags: djangocon, django

Andy McKay is someone I remember mostly as a Plone/Zope guy. One of the first authors of a book about those. So I was interested in his perspective on django!

He’s a Canadian and he wants us to remember the three most important facts about Canada’s history:

  • The Canadian women's ice hockey team beat the USA at the olympics.

  • The Canadian men's ice hockey team beat the USA at the olympics.

  • The Canucks are going to hammer some other club tonight.

Now about mozilla. All the new sites are built with django. For instance the support website and the main add-ons website. The last one is what the talk is about. All the code is open source at https://github.com/jbalogh/zamboni and all bugs are open at https://bugzilla.org .

The old site was cakephp, the new django implementation is called zamboni (a zamboni fixes up the ice in an ice hockey stadium after use, he’s a Canadian after all). (The Dutchman in me wants to tell you that zambonis are also used for speed skating rinks, btw.)

Some stats about the add-ons site:

  • 250k add-ons

  • 150 million views per month

  • 500+ million api hits per day (firefox checking for updates!)

What he uses for caching in django is mostly cache machine, which works great. From a developer point of view it is awesome: you hardly have to think about it.

The hard part about caching is invalidation. What they do for instance is to take an md5 hash of an sql query. They couple this with the objects that are returned. Changes to one of those objects means nuking the related md5 hashed sql query from memcached.
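To make that invalidation idea concrete for myself, here is a minimal sketch in plain python. It is not cache machine's actual code: the `QueryCache` class, its method names and the dict-based rows are all made up for illustration, and an in-memory dict stands in for memcached.

```python
import hashlib
from collections import defaultdict


class QueryCache:
    """Toy in-memory stand-in for memcached plus an invalidation index.

    Hypothetical class for illustration, not cache machine's API.
    """

    def __init__(self):
        self.store = {}                          # query hash -> cached rows
        self.keys_by_object = defaultdict(set)   # object id -> query hashes

    @staticmethod
    def key_for(sql):
        # The cache key is the md5 hash of the sql text.
        return hashlib.md5(sql.encode("utf-8")).hexdigest()

    def get(self, sql):
        # Returns None on a cache miss.
        return self.store.get(self.key_for(sql))

    def set(self, sql, rows):
        key = self.key_for(sql)
        self.store[key] = rows
        # Remember which objects this cached query returned.
        for row in rows:
            self.keys_by_object[row["id"]].add(key)

    def invalidate(self, object_id):
        # An object changed: nuke every cached query that returned it.
        for key in self.keys_by_object.pop(object_id, set()):
            self.store.pop(key, None)
```

Usage would be something like `cache.set(sql, rows)` after running a query and `cache.invalidate(obj_id)` whenever an object is saved. After that, `cache.get(sql)` misses for every query that included the object.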

For query optimizations they use something more fancy than the default stuff: queryset-transform. It is a really comfortable way to optimize your sql queries. The drawback is that the resulting queries can get big. They have a 14000 line monster, but you don’t have to read it, right? :-)
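As far as I understand it, the idea is that a function gets applied to the whole list of results, so related data can be fetched in one extra query instead of one query per object. A rough sketch of that pattern in plain python, without django: the `ADDONS` and `TAGS` lists and the function names are hypothetical stand-ins for database tables and queries, not queryset-transform's API.

```python
from collections import defaultdict

# Hypothetical in-memory "tables", standing in for the database.
ADDONS = [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]
TAGS = [
    {"addon_id": 1, "tag": "privacy"},
    {"addon_id": 1, "tag": "tabs"},
    {"addon_id": 2, "tag": "themes"},
]


def fetch_tags_for(addon_ids):
    """Stand-in for a single 'WHERE addon_id IN (...)' query."""
    wanted = set(addon_ids)
    return [row for row in TAGS if row["addon_id"] in wanted]


def attach_tags(addons):
    """Transform a list of results: one bulk lookup instead of one per add-on."""
    tags_by_addon = defaultdict(list)
    for row in fetch_tags_for([addon["id"] for addon in addons]):
        tags_by_addon[row["addon_id"]].append(row["tag"])
    for addon in addons:
        addon["tags"] = tags_by_addon.get(addon["id"], [])
    return addons
```

Calling `attach_tags(ADDONS)` gives every add-on a `tags` list after just one tag lookup, however many add-ons are in the result.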

Incoming requests hit django with some 1600 uncached requests per second. The php script managed 600. (He once worked with plone, btw, and was lucky to get 2 requests per second at the time).

They iterated through a few versions. They started out with basic django models, followed by some optimization to limit the number of queries. Then they put in the plain sql script from php. Then they hit a limit of 200 on the maximum number of wsgi requests, which the ops people had put in. So they split this one request out from the regular django site and used sqlalchemy for it. Sqlalchemy's database pooling and optimized queries got them to 700 requests per second per instance. So this one bit was done outside of django, but with python.

Tip: push things to async! Sending email, image processing, add-on validation and so on. Celery helps a lot here and it isn’t hard to do. He got it running with their code in about an hour.
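The pattern itself is simple: the request handler only puts the job on a queue and returns, and a worker picks it up later. A minimal sketch using just the standard library, so a background thread takes the place of Celery and its message broker. The `send_email` and `handle_request` functions are hypothetical, and the "email" is only recorded in a list.

```python
import queue
import threading

tasks = queue.Queue()
sent = []  # records "sent" emails so the effect is visible


def send_email(recipient):
    # Placeholder for slow work (smtp, image resizing, validation...).
    sent.append(recipient)


def worker():
    # Runs forever in the background, executing queued jobs one by one.
    while True:
        func, args = tasks.get()
        try:
            func(*args)
        except Exception as exc:
            print("task failed:", exc)
        finally:
            tasks.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_request(user_email):
    # The request only enqueues the job and returns right away.
    tasks.put((send_email, (user_email,)))
    return "202 Accepted"
```

A real setup would want the queue to live outside the web process, so jobs survive a restart. That is the part Celery takes care of.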

They’re using some custom wsgi middleware, for instance for timing request duration. Helps a lot in debugging and uncovering errors. Other things they use for statistics and realtime graphing are graphite and statsd.
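A timing middleware like that can be tiny. Here is a sketch of what one could look like. It is not mozilla's code: the `TimingMiddleware` name and the `record` callback are my own assumptions, and in practice the callback would probably send the number on to something like statsd.

```python
import time


class TimingMiddleware:
    """Wraps a wsgi app and reports how long each call took."""

    def __init__(self, app, record):
        self.app = app
        self.record = record  # callable(path, seconds), supplied by the caller

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        try:
            # Note: this times the call into the app, not the streaming
            # of a lazily generated response body.
            return self.app(environ, start_response)
        finally:
            elapsed = time.perf_counter() - start
            self.record(environ.get("PATH_INFO", ""), elapsed)
```

You would wrap the existing wsgi application object once, for instance `application = TimingMiddleware(application, record)`, with `record` being whatever function stores or ships the measurement.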

They have a lot of translations and for that, jinja’s template language is better and more helpful than django’s default template language (which looks the same). You can now integrate it quite well with django. Two tools that help:

  • jingo, an adapter to use jinja2 in django.

  • jingo-minify: “Concat and minify JS and CSS for Jinja2+Jingo+Django”.

Additional tips:

html and stripping out

And he had a couple of other tips, but I can’t write down everything :-) Good, content-packed talk with lots of pointers I want to check out later. Especially the caching part.


About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.
