Djangocon: really, really fast django - Christophe Pettus

(One of the summaries of a talk at the 2014 djangocon.eu.)

Christophe Pettus works for http://pgexperts.com . So… postgresql expert! I’ve also summarized two of his other talks: “postgresql when it is not your job” and “advanced postgresql in django”.

How fast is django anyway? You hear things like “the orm is slow” or “the template engine is old”. So… when in doubt, measure. How high-overhead are django’s components? You’ll need to have a good test. Just returning an empty response is nonsense. He did a bunch of tests and django holds up pretty well. The overhead is very low.

Most ORM operations are O(N) in the number of fields. So, roughly speaking, a 1-field model will be 10 times faster than one with 10 fields.
Doing an update on many objects? Use the ORM’s update method instead of looping over items. Much faster.
Using manual SQL can be a little bit faster than the ORM, but the difference is negligible.
Django’s basic request loop is plenty fast.
Request/response cycles to the database generally swamp everything else.
Always do bulk and batch operations instead of iterating.
Don’t use components you don’t need. Often you load a lot of libraries. Do you really need a full REST library or do you really only need a json serializer, for instance?
Middleware should be your last resort.
Caching can help a lot, but it is complex. Whole-page frontend caching (nginx, varnish), template caching (whole page or fragments), intermediate results using django’s cache.
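
The "update instead of looping" advice can be sketched without django at all. This is a minimal illustration using the standard library's sqlite3; the `article` table and the helper functions are made up for the example. In django terms, the loop corresponds to calling `save()` on every object and the bulk version to a single `queryset.update(...)` call.

```python
import sqlite3


def make_db(n_rows):
    """Create an in-memory database with n_rows unpublished articles."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, published INTEGER)"
    )
    conn.executemany(
        "INSERT INTO article (id, published) VALUES (?, 0)",
        [(i,) for i in range(n_rows)],
    )
    conn.commit()
    return conn


def publish_by_looping(conn):
    """Fetch the ids, then send one UPDATE per row. Returns statements sent."""
    ids = [row[0] for row in conn.execute("SELECT id FROM article")]
    statements = 1  # the SELECT above
    for article_id in ids:
        conn.execute(
            "UPDATE article SET published = 1 WHERE id = ?", (article_id,)
        )
        statements += 1
    conn.commit()
    return statements


def publish_in_bulk(conn):
    """Let the database do the work in a single UPDATE statement."""
    conn.execute("UPDATE article SET published = 1")
    conn.commit()
    return 1
```

With a real database server every statement is a network round trip, so the difference between N+1 statements and one statement is what dominates.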

First: measure. Don’t just throw everything at the wall and see what sticks.
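
What "measure" could look like in its simplest form: a small timing helper, not something shown in the talk. The `measure` function is a hypothetical name. It takes any callable, for instance one that performs a request against a realistic view, and reports the median wall-clock time.

```python
import statistics
import time


def measure(func, repeats=50):
    """Return the median wall-clock time in seconds of calling func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to the odd slow run than the mean.
    return statistics.median(timings)
```

Measure before and after each change, and measure something representative rather than an empty response.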

Caches will be inconsistent and invalid, just deal with that. Don’t overdo the cache invalidation, it can actually slow your system down.
Start low, work up. Start with data-level caching and work your way up from there.
In case you’ve got a content-heavy site like a news site or a CMS: in those cases, template level caching and full-page-caching helps the most.
Watch out for the thundering herd problem. It happens when lots of requests all try to grab a fresh copy after the cache has been invalidated. Just accept a stale resource. Separate the cache rebuilding from returning results.
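
A rough sketch of "accept a stale resource and rebuild separately", using only the standard library. The class name and its parameters are invented for this example, and it only protects a single process. With several processes the "who rebuilds" lock would have to live in the shared cache itself.

```python
import threading
import time


class StaleWhileRevalidateCache:
    """Serve the cached value, even when stale; let one caller rebuild it."""

    def __init__(self, rebuild, max_age_seconds):
        self._rebuild = rebuild  # expensive function producing a fresh value
        self._max_age = max_age_seconds
        self._value = None
        self._built_at = None  # None means: never built yet
        self._rebuild_lock = threading.Lock()

    def get(self):
        if self._built_at is None:
            # Cold cache: there is nothing stale to serve, so wait here.
            with self._rebuild_lock:
                if self._built_at is None:
                    self._store(self._rebuild())
            return self._value

        is_stale = time.monotonic() - self._built_at > self._max_age
        # Only the caller that wins the non-blocking acquire starts a rebuild.
        # Everybody else gets the stale value right away.
        if is_stale and self._rebuild_lock.acquire(blocking=False):
            threading.Thread(
                target=self._rebuild_in_background, daemon=True
            ).start()
        return self._value

    def _rebuild_in_background(self):
        try:
            self._store(self._rebuild())
        finally:
            self._rebuild_lock.release()

    def _store(self, value):
        self._value = value
        self._built_at = time.monotonic()
```

However many requests arrive while the value is stale, at most one rebuild runs at a time and nobody waits for it.
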
Template rendering time is proportional to the number of variables.
Don’t worry about front end servers like uwsgi, gunicorn and so on. Other problems are more worth your time, the performance is more or less the same.
Processes versus threads. There are no guidelines, just rules of thumb: one process per CPU execution unit and 2-4 threads per processor.
Remember: the public internet is far slower than your application.
Your old optimizations still apply. Most of the time for a request is actually spent inside the browser after it gets the first byte of your page. So trim your html page. Don’t include the full bootstrap library, but trim it to just what you need.
Avoid a large flurry of javascript requests back to the server for the initial page load. Each one has the full round-trip latency of the first request! Reduce the amount of calls.
Browsers are quite friendly, so give them proper expiry headers, especially for static content.
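
As an illustration of "proper expiry headers": a small helper that builds the two standard HTTP headers involved. The function name and the idea of returning a plain dict are assumptions for this sketch. In practice the webserver or CDN usually sets these for static files.

```python
import time
from email.utils import formatdate


def static_cache_headers(max_age_seconds, now=None):
    """Build headers telling browsers how long they may reuse a response."""
    now = time.time() if now is None else now
    return {
        # Understood by all current browsers and by shared caches.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Older equivalent: an absolute HTTP date in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

For example, `static_cache_headers(365 * 24 * 3600)` suits files whose name changes whenever their content changes.
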
Use a CDN. Serving common static content is a horrible use of your bandwidth.
Some things look great but aren’t:
- ETAG. OK for precomputed content, but not for dynamic content.
- Template fragment caching. It is good for specific big expensive parts of templates and silly for small snippets.
DNS servers are underappreciated. They’re a surprisingly large contributor to page load time. Use a specialist DNS service (like easydns).
A good tip: let the template drive your data acquisition. Don’t load data you don’t need. Passing in a queryset instead of evaluated data is often better.
Redis is good for basic cache storage.
Consider full prerendering. Build the entire page and cache it to disk. Or perhaps even let the webserver serve it directly!
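
A sketch of what prerendering to disk might look like. The `prerender` function and the directory layout are assumptions for this example. The `render` argument is any callable returning the finished html, which in django could be a call to `render_to_string`.

```python
import os
import tempfile
from pathlib import Path


def prerender(render, url_path, output_root):
    """Render a page once and store it as <output_root>/<url_path>/index.html."""
    target = Path(output_root) / url_path.strip("/") / "index.html"
    target.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary file first and rename it into place, so the
    # webserver never serves a half-written page.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(render())
    os.replace(tmp_name, target)
    return target
```

If the webserver's document root points at `output_root`, it can serve these pages without django being involved at all.
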
If a big page contains a little bit of customized content, you could pre-render the big page and use javascript to pull in the small dynamic bit.
Never hand large files to the client from within django. Use xsendfile or the nginx equivalent and let the webserver handle it.
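
Roughly how the hand-off works: the view returns an empty response carrying a special header and the webserver sends the actual file. A minimal sketch follows. The function name and its parameters are made up. `X-Accel-Redirect` is the nginx header and `X-Sendfile` the one used by Apache's mod_xsendfile.

```python
def offload_to_webserver(target, download_name, server="nginx"):
    """Return response headers that ask the webserver to send the file.

    For nginx, `target` is a URI inside a location marked `internal`.
    For Apache with mod_xsendfile, `target` is a path on the filesystem.
    """
    header_names = {"nginx": "X-Accel-Redirect", "apache": "X-Sendfile"}
    return {
        header_names[server]: target,
        "Content-Disposition": f'attachment; filename="{download_name}"',
    }
```

In a django view you would set these headers on an empty `HttpResponse` and return it, so the worker is free again immediately.
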
Don’t do work inside your view functions that belongs in an asynchronous task: sending files, fetching from other sites, etc.
Keep models simple and focused. Remember the ORM is O(N) on number of fields.
If you have objects that contain both often-changing data and largely-static data: split them. Don’t be afraid of foreign keys.
Keep transactions short and to the point. And never wait for an asynchronous event within an open transaction.
Don’t iterate over large querysets.
Do joins in the database, not in python.
The database is often not the problem. It is your friend. But do try to limit the amount of round trips: try grabbing everything in one go.
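
To make "join in the database" and "limit the round trips" concrete, here is a small sqlite3 sketch. The `author` and `book` tables are invented for the example. The first function is the classic N+1 pattern, the second grabs everything in one go. In django, `select_related()` produces this kind of join.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory library with two authors and three books."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            title TEXT,
            author_id INTEGER REFERENCES author(id)
        );
        INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
        """
    )
    return conn


def titles_with_authors_in_python(conn):
    """Join in python: one query for the books plus one query per book."""
    queries = 1
    result = []
    books = conn.execute(
        "SELECT title, author_id FROM book ORDER BY id"
    ).fetchall()
    for title, author_id in books:
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def titles_with_authors_joined(conn):
    """Join in the database: a single query returns the same rows."""
    rows = conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()
    return rows, 1
```

Both functions return the same rows. The first one needs a round trip per book, the second needs exactly one.
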
Don’t store session data in the database! Same with celery: don’t use the database as its task queue.
If you use postgresql, use streaming replication when doing load balancing.
Really nice: use a django database router to write to a master and read from a bunch of slaves.
If you have more than one secondary database, use pgpool.
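
A database router is just a class with a few methods, so the write-to-one, read-from-many idea can be sketched briefly. The aliases `default`, `replica1` and `replica2` are hypothetical and would have to match the keys in your `DATABASES` setting. The class name is made up too.

```python
import random


class PrimaryReplicaRouter:
    """Send writes to the primary and spread reads over the replicas."""

    # Hypothetical aliases; they must exist in settings.DATABASES.
    primary = "default"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Pick a replica at random for every read.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases hold the same data, so relations between them are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only migrate the primary; replication copies the schema along.
        return db == self.primary
```

You activate it by adding its dotted path to the `DATABASE_ROUTERS` setting. Keep in mind that a replica can lag slightly behind, so a read right after a write may not see the new data yet.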

Django can handle massive server-melting loads. No problem.

http://reinout.vanrees.org/images/2014/Gare_du_nord.jpg

Paris station “gare du Nord”

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
