Django under the hood: debugging performance - Aymeric Augustin

Tags: django, djangocon

(One of my summaries of a talk at the 2016 Django Under the Hood conference.)

Performance? Partly it is a question of perception. Up to 0.1s feels like “reacting instantaneously”. Up to 1s: “not interrupting the user’s flow of thought”. Up to 10 seconds is slow, but the user might keep waiting. More than 10 seconds and they’re off to check Facebook.

To optimize something, we have to measure it, for instance the page load time.

  • You could use your browser’s developer tools to see how quickly a page loads. But then you’re measuring it on your fast development laptop.

  • In Chrome you can throttle your network connection, for instance to simulate 3G speed.

  • Google Analytics’ site speed reports can be used. By default they only measure a 1% sample of your page views.

Performance timeline

Let’s look at what happens for one single request.

  • DNS lookup. This can take a surprising amount of time.

  • Establish a TCP connection.

  • Finally, send the request to the web server.

  • You receive the first byte of the response. After a while, the last byte comes in.

  • Page processing in the browser itself: rendering the page and so on.

  • Running onLoad JavaScript and so on.

On average, the actual webserver processing takes only 15% of the time. The good news is that “it is not our backend’s problem”, the bad news is “we have to fix the speed anyway”.
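The timeline above maps fairly directly onto timestamps that browsers expose through the W3C Navigation Timing API (Level 1 attribute names such as domainLookupStart or responseStart). Below is a minimal sketch that turns such timestamps into per-phase durations and a rough backend share. The numbers are made up for illustration, not measurements, and since time to first byte also includes network latency, the “server share” is only an upper bound on actual backend processing.

```python
# Timestamps in milliseconds, named after W3C Navigation Timing (Level 1)
# attributes. The values are invented for illustration, not real measurements.
SAMPLE_TIMING = {
    "navigationStart": 0,
    "domainLookupStart": 5,
    "domainLookupEnd": 65,
    "connectStart": 65,
    "connectEnd": 145,
    "requestStart": 146,
    "responseStart": 380,
    "responseEnd": 520,
    "loadEventEnd": 2600,
}

# Each phase of the timeline is the difference between two timestamps.
PHASES = [
    ("DNS lookup", "domainLookupStart", "domainLookupEnd"),
    ("TCP connect", "connectStart", "connectEnd"),
    ("Request to first byte", "requestStart", "responseStart"),
    ("Download response", "responseStart", "responseEnd"),
    ("Browser processing", "responseEnd", "loadEventEnd"),
]


def phase_durations(timing):
    """Return a {phase name: duration in ms} dict."""
    return {name: timing[end] - timing[start] for name, start, end in PHASES}


def backend_share(timing):
    """Upper bound on the server's share of the total page load time.

    Time to first byte includes network latency, so the real backend
    processing time is lower than this."""
    total = timing["loadEventEnd"] - timing["navigationStart"]
    server = timing["responseStart"] - timing["requestStart"]
    return server / total


if __name__ == "__main__":
    for name, ms in phase_durations(SAMPLE_TIMING).items():
        print(f"{name:24} {ms:5} ms")
    print(f"Server share (upper bound): {backend_share(SAMPLE_TIMING):.0%}")
```

With these sample numbers most of the time is spent in the browser, which is the point made above: the backend is often only a small slice of the total.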

The core problem: HTTP/1.1 is bad at fetching many resources. There are tricks like server-side concatenation, image sprites, inlining and caching. Client side you can use DNS prefetching, TCP pre-connect, keep-alive and pipelining, parallel connections and caching.
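Why concatenation and parallel connections help can be seen with a deliberately crude model. This sketch assumes HTTP/1.1 without pipelining (one outstanding request per connection), a limit of six connections per host (a common browser default, used here as an assumption), and it ignores bandwidth, TCP slow start, connection setup and server time. Under those assumptions, N small resources cost about ceil(N / connections) round trips.

```python
import math


def load_time_ms(n_resources, rtt_ms, parallel_connections=6):
    """Toy model of fetching many small resources over HTTP/1.1.

    Assumes each connection handles one request at a time, so the
    resources are fetched in ceil(n / connections) sequential rounds of
    one round trip each. Bandwidth, slow start and server time are ignored.
    """
    rounds = math.ceil(n_resources / parallel_connections)
    return rounds * rtt_ms


if __name__ == "__main__":
    # 30 separate files vs. the same content concatenated into 3 bundles,
    # on a link with a (hypothetical) 100 ms round-trip time.
    print(load_time_ms(30, 100))
    print(load_time_ms(3, 100))
```

In this model, 30 files need five rounds (500 ms) while three concatenated bundles fit in a single round (100 ms). Real browsers are more complicated, but the direction of the effect is the same.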

The front-end

There are three main stages.

  • Loading. No events, no JS yet. Parse HTML and build the DOM. Download and run sync JS.

  • Interactive. The DOMContentLoaded event fires. Download CSS, images and fonts. Parse CSS and build the CSSOM.

  • Complete. The standard onLoad JavaScript runs.

Now to the page rendering. HTML is converted into a DOM, CSS into a CSSOM. DOM and CSSOM are combined into a render tree, which is basically the DOM annotated with the CSSOM information. Only then are the fonts loaded, the layout determined and the page painted.

If the page has JS, the browser also starts a JavaScript VM.

There are some surprising dependencies. For instance, the CSSOM has to be ready before any regular synchronous JavaScript is executed!

  • Rendering a page requires a DOM and CSSOM.

  • Building the DOM blocks on sync JS.

  • Executing JS blocks on the CSSOM.
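The three dependency rules above form a small graph, and a topological sort shows why CSS ends up on the critical path. This is a simplified model of a page with a stylesheet followed by one synchronous script: in reality browsers build part of the DOM before the script, so “build DOM” here means “finish the DOM”. The step names are made up for the example; graphlib is in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter

# Maps each step to the steps it has to wait for (its predecessors).
DEPENDENCIES = {
    "run sync JS": {"build CSSOM"},                # executing JS blocks on the CSSOM
    "build DOM": {"run sync JS"},                  # building the DOM blocks on sync JS
    "render tree": {"build DOM", "build CSSOM"},   # rendering requires DOM and CSSOM
    "layout": {"render tree"},
    "paint": {"layout"},
}


def critical_path_order(deps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(critical_path_order(DEPENDENCIES)))
```

Because the script waits for the CSSOM and the DOM waits for the script, the only valid order starts with building the CSSOM: slow CSS delays everything after it.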

Luckily, browsers optimize heavily to keep page load time down. They parse HTML incrementally. They already paint while waiting for sync JS (once the CSS is available). And they paint while waiting for web fonts (and re-render once they have them).

Basic strategy:

  • Optimize HTML load time.

  • Optimize CSS load time. This:

    • Unblocks first paint.

    • Allows JS execution.

A trick you can use: async JavaScript, for instance “script-injected scripts” (a script that creates another script element). These don’t block the HTML parser.

Another way to load JavaScript asynchronously is <script async src="...">. Such a script runs as soon as it has downloaded, without even waiting for the CSSOM to be ready.

The new best practice? Put critical JavaScript as async in your <head>. Put non-critical, decorative JS as async at the bottom of your page.

Server side

The main stages:

  • The request comes in and a Request object is built.

  • We go through the middleware layers (the “before response” part of middleware).

  • URL dispatching and view calling. Perhaps also template rendering (which might take a surprising amount of time).

  • The middleware layers again (the “after-response” parts).

  • Django passes the Response back to the browser.

Watch out for middleware: if one of your middlewares does a database query, that query will be done on each and every one of the requests. So be careful of doing expensive things in middleware!
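To see why middleware cost adds up, look at the shape of a new-style middleware (the pattern introduced in Django 1.10): a factory that receives get_response, whose instance is then called once per request. Below is a minimal sketch of a timing middleware. The class name and the X-Elapsed-Ms header are made up for this example, and you would enable it by adding its (hypothetical) dotted path, e.g. "myproject.middleware.TimingMiddleware", to the MIDDLEWARE setting.

```python
import time


class TimingMiddleware:
    """Sketch of a Django new-style middleware that times the rest of the stack."""

    def __init__(self, get_response):
        # Runs once, when the middleware is loaded: fine for one-off setup.
        self.get_response = get_response

    def __call__(self, request):
        # Everything in here runs on *every* request. A database query here
        # means one extra query for each and every request.
        start = time.perf_counter()
        response = self.get_response(request)  # inner middleware + view (+ template)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Hypothetical header name, only for illustration.
        response["X-Elapsed-Ms"] = f"{elapsed_ms:.1f}"
        return response
```

The code before the get_response() call is the “before response” part from the list above; the code after it is the “after-response” part.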

In the view code, you can also do optimizations:

  • select_related(). This means you’ll automatically use a big join instead of many small individual queries. Useful for foreign keys (and one-to-one fields), followed in the forward direction. If you loop over 100 objects and access a foreign key on each of them, that costs 1 query for the list plus 100 for the related objects: 101 queries. With select_related you’ll have only one big query, which is usually much quicker.

  • prefetch_related(). Similar to the above, only you get two queries instead of the one big join of select_related (one extra query per prefetched relation). The first query grabs the base objects; then Python determines which related objects need to be fetched in the second query.

    prefetch_related works for every kind of relation: foreign keys (forwards and backwards) and many-to-many fields.

    If you need to customize what gets prefetched, you can use a Prefetch object as argument to .prefetch_related().

    New in Django 1.10 as a public API: prefetch_related_objects(). This does the same as .prefetch_related(), only it works on a list of model instances instead of on a queryset. So if you already have the objects, you can still get the prefetch_related speed-ups.
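You don’t need Django to see the difference between these strategies. The sketch below uses the standard library’s sqlite3 with hypothetical author and book tables and counts the queries for three approaches: lazy foreign-key access in a loop (N+1), a single JOIN (roughly what select_related does) and a second IN query matched up in Python (roughly what prefetch_related does). The SQL that Django really generates differs in detail; this only illustrates the query patterns.

```python
import sqlite3


def make_db(n_books=100, n_authors=10):
    """In-memory database with hypothetical author and book tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                           author_id INTEGER REFERENCES author(id));
    """)
    conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                     [(i, f"Author {i}") for i in range(n_authors)])
    conn.executemany("INSERT INTO book (title, author_id) VALUES (?, ?)",
                     [(f"Book {i}", i % n_authors) for i in range(n_books)])
    return conn


class CountingDB:
    """Wraps a connection and counts the queries sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def fetch(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def n_plus_one(db):
    # Like accessing book.author in a loop without prefetching:
    # 1 query for the books + 1 query per book.
    books = db.fetch("SELECT title, author_id FROM book ORDER BY id")
    return [(title, db.fetch("SELECT name FROM author WHERE id = ?", (author_id,))[0][0])
            for title, author_id in books]


def joined(db):
    # Roughly what select_related("author") does: one query with a JOIN.
    return db.fetch("SELECT book.title, author.name FROM book "
                    "JOIN author ON author.id = book.author_id ORDER BY book.id")


def prefetched(db):
    # Roughly what prefetch_related("author") does: a second query for the
    # collected ids, then the two result sets are matched up in Python.
    books = db.fetch("SELECT title, author_id FROM book ORDER BY id")
    ids = sorted({author_id for _, author_id in books})
    placeholders = ",".join("?" * len(ids))
    authors = dict(db.fetch(f"SELECT id, name FROM author WHERE id IN ({placeholders})", ids))
    return [(title, authors[author_id]) for title, author_id in books]


if __name__ == "__main__":
    for strategy in (n_plus_one, joined, prefetched):
        db = CountingDB(make_db())
        strategy(db)
        print(strategy.__name__, db.queries)  # 101, 1 and 2 queries respectively
```

All three return the same (title, author name) pairs; only the number of round trips to the database differs.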

If you want to check whether your database queries are fine, enable SQL logging in your logging setup: send the output of the django.db.backends logger to the console. Note that Django only logs these queries when DEBUG is True.
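A minimal sketch of such a LOGGING setting for settings.py. Django’s LOGGING uses Python’s standard dictConfig format; the handler name “console” is just a conventional choice, and remember that the queries only show up when DEBUG = True.

```python
# settings.py (sketch): print every SQL query Django runs to the console.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Django logs each SQL query to this logger at DEBUG level.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}
```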

Some small ORM optimization tips and tricks:

  • Use .only() or .defer() to limit the amount of data you pull out of the database per object. But if you rarely need some of that data, perhaps you should move it to a separate model that you link to.

  • Use .values_list() and .values() if you only need some specific data out of your database and don’t need full-blown model instances. Instantiating model instances is expensive; “just” grabbing the actual data out of the database is much faster.

  • Use .aggregate() and .annotate() to do certain calculations (sum, average, count, and so on) in the database instead of in your python code. Especially when you need to manipulate large amounts of data.

  • Rarely used: .iterator(). It iterates over the results but doesn’t cache them on the queryset, so you don’t keep all the instances in memory at once. You only need this when you must conserve memory.
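Two of these tips, doing calculations in the database and streaming results instead of holding them all in memory, can again be sketched with the standard library’s sqlite3. The sale table is hypothetical. The database version returns a single row, much like what .aggregate() asks the database for, and the generator loosely mirrors what .iterator() is for. Note that the rows come back as plain tuples, which is also what .values_list() gives you instead of model instances.

```python
import sqlite3


def make_sales_db(n=1000):
    """In-memory database with a hypothetical sale table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO sale (amount) VALUES (?)",
                     [(i % 50,) for i in range(n)])
    return conn


def totals_in_python(conn):
    # Pulls every row into Python first: the pattern .aggregate() avoids.
    amounts = [amount for (amount,) in conn.execute("SELECT amount FROM sale")]
    return len(amounts), sum(amounts)


def totals_in_database(conn):
    # The database does the counting and summing and returns one row.
    return conn.execute("SELECT COUNT(id), SUM(amount) FROM sale").fetchone()


def stream_amounts(conn):
    # Yield rows one at a time instead of building a full list in memory.
    for (amount,) in conn.execute("SELECT amount FROM sale"):
        yield amount


if __name__ == "__main__":
    conn = make_sales_db()
    print(totals_in_python(conn), totals_in_database(conn))
    print(max(stream_amounts(conn)))
```

Both totals functions give the same answer, but only the database version avoids shipping every row to Python, which matters more as the table grows.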


Photo explanation: we got an explanation of how a signal box works. With ten people it sure was crowded. The signal man didn’t mind us bugging him.


Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
