Djangocon: challenges when building high profile news sites - Yann Malet

Tags: djangocon, django

(One of the summaries of a talk at the 2014 djangocon.eu).

Yann Malet talks about editorial sites (=newspapers), especially high-profile ones that see a lot of visitors.

Multi layer cache to protect your database

Django is web scale. No problem. Django won’t be in your way regarding performance. Which means: if you use something else, you’ll run into the very same problems.

A news item on one of their sites got mentioned a lot on twitter and other sites. Their servers didn’t even break a sweat. Caching for the win!

You should add caching. You don’t need to re-calculate the very same page in the very same way for every new visitor. That takes way too much time. Luckily, adding caching for a news site is relatively easy.

Varnish is the first layer of defence. A “web application accelerator”, also known as “a caching HTTP reverse proxy”. A page coming out of the varnish cache will be 10-1000 times faster than calculating it fresh inside django.

Note: building something that caches reliably is very hard. Don’t build it yourself, don’t re-invent it. Just use varnish and configure it.

Some varnish tips:

  • Strip the cookies. Increasing the hit rate (so: fewer cache misses) is all about reducing the number of variants of a page that need to be cached. The first thing to reduce is the “vary on” parameters. Accept-language is probably needed if your site is in multiple languages. Otherwise remove it.

    What can probably be removed: “vary on: cookie”. You don’t want a separate page for every separate google analytics cookie. So strip off every cookie except the csrf or session cookie for logged-in users. (Look in his slides, when available, for the varnish config example).

  • Use “varnish saint mode” (see their blog entry for config examples). It helps when your actual django site has died. In that case, varnish will just serve the old, stale content from its cache.

    Your site is down, but several pages, in any case your homepage, will still work. Someone visiting a two year old article will get a 500, but the 2000 visitors per second to your homepage won’t complain.

    There are plenty of reasons why your site could go down: this simple trick will protect your back while you frantically fix it.

  • Add a custom error page. Don’t show the default blank page with the weird “guru meditation” message on it.

  • A well-known quote is “there are two hard problems in computer science: naming things and cache invalidation”. Yeah. Cache invalidation. You can make big mistakes.

    Use “russian doll caching”. You can also cache template snippets in Django. Look at the last modification date of an article. Cache the individual article page items for a long time. Cache the entire article snippet for a shorter time. In the end, cache the whole page (varnish). The good thing is that your site gets faster the more it is used.

    Vary your cache expiration times a bit. If they’re all set to the same time, everything expires at once and that single moment might give you a lot of grief.
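
A minimal sketch of the last two ideas in the list above (my own illustration, not taken from the talk's slides). It assumes a cache key built from the article's last modification date, so an edited article automatically gets a fresh key, plus an expiration time with some random jitter. The function names, the key layout and the 10% spread are all made up for the example.

```python
import random
from datetime import datetime


def article_cache_key(article_id: int, last_modified: datetime) -> str:
    """Build a key that changes whenever the article is modified.

    Old entries are never invalidated explicitly; they simply stop
    being requested and expire on their own.
    """
    return "article:%d:%s" % (article_id, last_modified.strftime("%Y%m%d%H%M%S"))


def jittered_timeout(base_seconds: int, spread: float = 0.1, rng=random) -> int:
    """Return base_seconds varied by at most +/- spread (a fraction).

    This keeps items that were cached at the same moment from all
    expiring at the same moment.
    """
    delta = base_seconds * spread
    return int(base_seconds + rng.uniform(-delta, delta))
```

With a base of 300 seconds and the default spread, timeouts end up somewhere between 270 and 330 seconds. You would pass the result as the timeout when storing the snippet in the cache.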

Oops: if your cache becomes central to your site’s speed, a crashing cache server will ruin your day. Wrap the django cache backend in a try/except, for instance.
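
What such a try/except wrapper might look like, as a rough sketch under my own assumptions: `FailSafeCache` is a hypothetical name, and the wrapped backend is assumed to be any object with `get(key, default)` and `set(key, value, timeout)` methods. It is not the actual code from the talk.

```python
import logging

logger = logging.getLogger(__name__)


class FailSafeCache:
    """Wrap a cache backend so that a dead cache server means a cache
    miss instead of an error page."""

    def __init__(self, backend):
        self._backend = backend

    def get(self, key, default=None):
        try:
            return self._backend.get(key, default)
        except Exception:
            # Log it, otherwise nobody notices that the cache is gone.
            logger.exception("cache get failed for key %r", key)
            return default

    def set(self, key, value, timeout):
        try:
            self._backend.set(key, value, timeout)
            return True
        except Exception:
            logger.exception("cache set failed for key %r", key)
            return False
```

The site gets slower when the cache is down, as every request is calculated fresh, but it keeps working. Logging the failure is the important part: silently swallowing the exception would hide the outage.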

Always provide a parameter that you can add to a view’s URL to bust the cache. This way you can make sure your editor sees the absolute freshest page possible when needed.
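
A sketch of how such a cache-busting parameter could work. The parameter name `nocache`, the `cached_view` decorator and the `key_func` argument are all invented for this example. The request only needs a dict-like `GET` attribute, as Django's request objects have, and the cache needs `get(key)` and `set(key, value, timeout)`.

```python
import functools

BUST_PARAM = "nocache"  # hypothetical query parameter, e.g. /article/1/?nocache=1


def cached_view(cache, timeout, key_func):
    """Cache a view's response unless the bust parameter is in the URL."""

    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            key = key_func(request, *args, **kwargs)
            if BUST_PARAM not in request.GET:
                cached = cache.get(key)
                if cached is not None:
                    return cached
            # Cache miss or explicit bust: render fresh and store the result,
            # so regular visitors get the fresh version afterwards, too.
            response = view(request, *args, **kwargs)
            cache.set(key, response, timeout)
            return response

        return wrapper

    return decorator
```

On a real site you would probably only honour the parameter for logged-in editors, otherwise anyone can bypass your cache by adding it to the URL.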

Use “johnny cache” to cache database queries. And use the cached session backend.

Image management on responsive sites

If you have a responsive design, you suddenly have 3x more image sizes. Desktop, tablet, mobile.

There are plenty of tools in django that help with image management. Django-filer, easy-thumbnails, cloudfiles.

If you store image files in the cloud (which is often the case), the assumption of fast and reliable disk access should be forgotten. So… make sure you log the duration of every image operation. Otherwise you miss optimization opportunities.
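
Logging those durations doesn't need much. A minimal sketch using only the standard library; the `timed` helper and the `image_ops` logger name are my own invention, not something from the talk:

```python
import contextlib
import logging
import time

logger = logging.getLogger("image_ops")


@contextlib.contextmanager
def timed(operation, **details):
    """Log how long the wrapped block took, even if it raised an exception."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s took %.1f ms %s", operation, elapsed_ms, details)
```

You would wrap every upload, download and resize in `with timed("upload", name=filename): ...` and then look in the logs for the slow ones.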

See https://github.com/django-cumulus/django-cumulus for what they’re using.

Thumbnails especially need to be generated in a lot of different sizes. So as soon as a new image is added, immediately fire off a task that generates all the thumbnail sizes.
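
A rough sketch of the "generate everything right away" idea. The three widths, the job dictionaries and the `enqueue` callable are assumptions for the sake of the example. In practice `enqueue` would hand the job to whatever task queue you use, and the worker would do the actual resizing.

```python
# Hypothetical target widths for the three kinds of device.
THUMBNAIL_WIDTHS = {"mobile": 320, "tablet": 768, "desktop": 1200}


def thumbnail_jobs(image_path, original_size, widths=THUMBNAIL_WIDTHS):
    """Return one job description per thumbnail size, keeping the aspect ratio."""
    orig_width, orig_height = original_size
    jobs = []
    for label, width in widths.items():
        width = min(width, orig_width)  # never scale an image up
        height = max(1, round(orig_height * width / orig_width))
        jobs.append({"source": image_path, "label": label, "size": (width, height)})
    return jobs


def on_image_uploaded(image_path, original_size, enqueue):
    """Call this right after a new image is stored: queue all sizes at once."""
    for job in thumbnail_jobs(image_path, original_size):
        enqueue(job)
```

This way the thumbnails already exist by the time the first visitor requests the page, instead of being generated during that request.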

Devops

Configuration management: pick one that fits your brain and skillset. Puppet, chef, ansible, salt. At Lincoln Loop (his company) they use salt.

Note: they have a kickstarter project for a book they want to write about this subject.

http://reinout.vanrees.org/images/2014/django6.jpg

My daughter next to a French “mallet” loco in 2007.

 