Scalability panel (djangocon.eu)

Participants: Andrew Godwin, Andy McKay, Jesper Noehr, Eric Florenzano.

What are the common mistakes you see? Things that take a long time (“sending an email”) and lock up a process. Using the filesystem for caching. Testing locally, pushing it live, and seeing it fall over.

Calling an external API can also take a long time. Use timeouts to guard against that.
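A minimal sketch of the timeout advice, using only the standard library. `fetch_json` and its `fallback` parameter are hypothetical names for illustration; the point is that a slow or dead third-party API costs a worker at most a few seconds instead of hanging it.

```python
import json
import urllib.request
from http.client import HTTPException


def fetch_json(url, timeout=2.0, fallback=None):
    """Fetch JSON from an external API, giving up after `timeout` seconds.

    On a network error, timeout or malformed response, return `fallback`
    instead of letting a slow third party tie up this worker.
    """
    try:
        # `timeout` applies to connecting and to each blocking socket read.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, HTTPException, ValueError):
        # URLError and socket timeouts are OSError subclasses; ValueError
        # covers bodies that are not valid UTF-8 or not valid JSON.
        return fallback
```

Note that `urllib`'s `timeout` limits each blocking socket operation, not the total duration of the request, so a server that trickles bytes slowly can still take longer. For a hard upper bound, move the call off the request path entirely.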

Scaling-wise, what should you think about before scaling becomes an actual issue? And what should you definitely leave until the moment you actually need to scale? First things first: use a queue like Celery, even on small sites. Get used to such a queue and have it in place; that will help a lot with performance later.
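To show the shape of the idea without depending on Celery, here is a sketch of an in-process job queue built on `queue.Queue` and a worker thread. It is a stand-in under simplifying assumptions, not Celery's API: `send_email` is a hypothetical slow task, and unlike Celery (which hands jobs to a broker and separate worker processes) the jobs here are lost if the web process dies.

```python
import queue
import threading


def send_email(recipient, subject):
    """Hypothetical slow task standing in for real SMTP delivery."""
    return f"sent {subject!r} to {recipient}"


class BackgroundQueue:
    """Minimal in-process job queue: a stand-in for Celery, NOT its API."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.results = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, func, *args):
        # Returns immediately; the request handler never waits on `func`.
        self._jobs.put((func, args))

    def _run(self):
        while True:
            func, args = self._jobs.get()
            try:
                self.results.append(func(*args))
            except Exception as exc:  # keep the worker alive if a job fails
                self.results.append(exc)
            finally:
                self._jobs.task_done()

    def join(self):
        # Block until every enqueued job has been processed (handy in tests).
        self._jobs.join()


jobs = BackgroundQueue()
jobs.enqueue(send_email, "user@example.com", "Welcome")
jobs.join()
```

With Celery itself the request handler does the same thing in spirit: it calls `.delay(...)` on a task and returns without waiting for the email to go out.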

Make sure your database schema is mostly OK. It doesn’t have to be perfect right away, but it should at least be mostly OK.

Expect something to change and assume you’ll have to swap something out later to improve the performance.

One important aspect of scaling is measurement and profiling. What are the best practices and good tools for doing that in production? Bitbucket has a middleware that is switched on by a special query string; it starts Python’s cProfile profiler and gives them profiling data for that request.
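Bitbucket's middleware was Django-specific and its code wasn't shown. The sketch below is an analogous piece of plain WSGI middleware using the standard library's `cProfile` and `pstats`; the `profile=1` query-string flag and the `limit` parameter are hypothetical choices. When the flag is present it runs the request under the profiler and returns the top functions by cumulative time instead of the page.

```python
import cProfile
import io
import pstats
from urllib.parse import parse_qs


class ProfilingMiddleware:
    """WSGI middleware that profiles a request when `?profile=1` is present.

    The query-string flag name is a hypothetical choice for this sketch.
    """

    def __init__(self, app, flag="profile", limit=20):
        self.app = app
        self.flag = flag
        self.limit = limit

    def __call__(self, environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        if self.flag not in query:
            return self.app(environ, start_response)

        def discard_start_response(status, headers, exc_info=None):
            return lambda data: None  # the real response is thrown away

        # Run the wrapped app, and consume its body, under the profiler.
        profiler = cProfile.Profile()
        profiler.enable()
        try:
            result = self.app(environ, discard_start_response)
            try:
                b"".join(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        finally:
            profiler.disable()

        # Respond with the profiler's statistics instead of the page.
        out = io.StringIO()
        stats = pstats.Stats(profiler, stream=out)
        stats.sort_stats("cumulative").print_stats(self.limit)
        body = out.getvalue().encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                                  ("Content-Length", str(len(body)))])
        return [body]
```

Wrap your WSGI application with it (for Django, the object returned by `get_wsgi_application()`), and only enable it where you control who can trigger it, because profiling output reveals internals.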

The Django Debug Toolbar is a great help in development. For real-time stats, Graphite and StatsD are an option. Or Munin or Cacti for real-time graphs of generic server information.
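StatsD's wire format is simple enough to show directly: one plain-text line per metric (`name:value|type`), sent over UDP, by default to port 8125. The sketch below assumes a StatsD daemon on localhost; `statsd_send` and the metric names are hypothetical.

```python
import socket


def statsd_send(metric, value, kind="c", host="127.0.0.1", port=8125):
    """Send one metric using the plain-text StatsD line protocol over UDP.

    kind: "c" = counter, "ms" = timer, "g" = gauge. UDP is fire-and-forget,
    so a missing StatsD daemon does not block or break the request.
    """
    payload = f"{metric}:{value}|{kind}".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

In practice you would use a client library, but the fire-and-forget UDP design is what makes instrumenting hot code paths this way cheap.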

Logging. Always set up logging. Look at the logfiles and figure out what happened.
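Django's `LOGGING` setting takes a dictionary in the standard library's `logging.config.dictConfig` format. A minimal sketch that sends one app's log records to a size-rotated file; the `myapp` logger name and the `app.log` path are assumptions to adapt.

```python
import logging.config  # only needed when applying the dict outside Django

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "file": {
            # Rotate at 10 MB and keep five old files so logs never fill the disk.
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "app.log",  # hypothetical path
            "maxBytes": 10 * 1024 * 1024,
            "backupCount": 5,
            "formatter": "verbose",
        },
    },
    "loggers": {
        "myapp": {"handlers": ["file"], "level": "INFO"},  # hypothetical name
    },
}
```

Inside Django the setting is applied at startup; elsewhere, call `logging.config.dictConfig(LOGGING)` yourself.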

OpenNMS, Pingdom, Munin, Nagios and django-kong were mentioned as monitoring tools.

Puppet vs Chef vs Whatever for provisioning servers in a Django stack. Fight! Puppet is good. Chef is good. Puppet is alright. (So: not much of a fight :-) )

Django ORM: how much of an issue is this going to be when I want to scale? It is much less of an issue than it used to be.

You’ll only get to know the hot spots for YOUR application when you run into them. When you optimize beforehand, the points will be different from those you’ll really hit. And then there are ways to solve them: caching, asynchronous processing, fewer joins, splitting things up, etc. You can denormalize, too.

Simple: check your indexes. Do you have the right ones? Are you missing ones?
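A quick way to answer "do I have the right index?" is to ask the database for its query plan. PostgreSQL and MySQL both have `EXPLAIN`; to keep the sketch self-contained it uses SQLite's `EXPLAIN QUERY PLAN` through `sqlite3`, with a hypothetical `users` table. The plan for the same lookup changes from a full table scan to an index search once the index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail lines for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the readable detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
lookup = "SELECT * FROM users WHERE email = ?"

before = query_plan(conn, lookup, ("a@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, ("a@example.com",))

print(before)  # a full table SCAN of users
print(after)   # a SEARCH using idx_users_email
```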

Also, changing your database server’s default configuration values can make a lot of difference. Spend two days figuring out all the options. And check PostgreSQL for ridiculously low default memory values.

Incremental roll-outs help with detecting problems. When all your 15 new instances suddenly die, you know you need to change something.

Considering that a caching proxy such as Varnish is commonly used to improve performance and scalability, are there any options for Django that you know of which handle cache invalidation well? Use ETags.
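A sketch of how ETags help: the ETag is a fingerprint of the response, and a client or cache that already holds that version sends it back in `If-None-Match` and gets an empty `304 Not Modified`. This is a plain WSGI app wrapping a hypothetical `get_body()` callable; for brevity it ignores weak validators (`W/...`) and `*`.

```python
import hashlib


def etag_app(get_body):
    """Wrap a hypothetical `get_body()` callable in a WSGI app that honours ETags.

    The ETag is a hash of the rendered body, so it changes exactly when the
    content changes.
    """

    def app(environ, start_response):
        body = get_body()
        etag = '"%s"' % hashlib.sha1(body).hexdigest()
        # If-None-Match may carry a comma-separated list of ETags.
        if_none_match = environ.get("HTTP_IF_NONE_MATCH", "")
        if etag in [tag.strip() for tag in if_none_match.split(",")]:
            start_response("304 Not Modified", [("ETag", etag)])
            return []  # a 304 response has no body
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                                  ("ETag", etag),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app
```

Hashing the rendered body saves bandwidth but not rendering time. Django's `django.views.decorators.http.etag` (and `condition`) decorators let you compute the ETag from something cheaper, such as a last-modified timestamp, before the view renders anything.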

Most caching depends entirely on your individual app, so something generic is virtually impossible.

Varnish gives you lots of control. You can invalidate pages from your Python code, so set up a couple of proper database triggers.
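A sketch of invalidating a page in Varnish from Python by sending an HTTP `PURGE` request. This only works if your VCL is set up to accept `PURGE` (typically a `vcl_recv` rule that checks the client against an ACL and does `return (purge)`); `purge` is a hypothetical helper and the host, port and path are placeholders.

```python
import http.client


def purge(host, path, port=80, timeout=2.0):
    """Ask a Varnish instance to drop its cached copy of `path`.

    Assumes the VCL accepts the PURGE method from this client.
    Returns the HTTP status code Varnish replied with.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PURGE", path)  # http.client allows custom methods
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()
```

Call it whenever your code learns that the underlying data changed, once for each URL that renders that data.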

Is Django fast enough? Should more attention be on speed and benchmark tests? Yes and yes. It is fast enough, but we should watch it.

Django is fast enough. If you want to scale, scale over multiple boxes instead of building out one single box.

But: watch out that Django doesn’t get any slower!

Code deployment to web workers: there are lots of different ways. Can we get the group’s thoughts on the best practices?

By hand.
Pip. But it is a bit slow. Now they use GitHub (with a local git mirror for their sites).
Fabric.
A simple bash script that SSHes into the server and updates everything.
HAProxy helps in getting a server offline and getting it transparently back up after the update.
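Two of the approaches above (a script that SSHes into each server, and HAProxy taking a server out of rotation and back in) combine into a rolling deploy. The sketch below only builds the command list and optionally runs it. Every name in it is hypothetical (the `web` backend, the socket path, `/srv/app`, the `myapp` service), and it assumes that it runs on the HAProxy machine, that HAProxy server names match the SSH host names, and that the stats socket is configured with `level admin` so `disable server` and `enable server` are allowed.

```python
import subprocess

# Hypothetical names: adjust the backend, socket path and update steps.
BACKEND = "web"
HAPROXY_SOCKET = "/var/run/haproxy.sock"
UPDATE = ("cd /srv/app && git pull && pip install -r requirements.txt"
          " && sudo systemctl restart myapp")


def haproxy(command):
    # HAProxy's runtime API reads one command per line on its stats socket.
    return ["sh", "-c", f"echo '{command}' | socat stdio {HAPROXY_SOCKET}"]


def rolling_deploy_plan(servers):
    """Return the commands for updating the servers one at a time."""
    plan = []
    for server in servers:
        plan.append(haproxy(f"disable server {BACKEND}/{server}"))  # out of rotation
        plan.append(["ssh", server, UPDATE])                         # update in place
        plan.append(haproxy(f"enable server {BACKEND}/{server}"))   # back in rotation
    return plan


def run(plan, dry_run=True):
    for command in plan:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)  # stop at the first failure
```

Because `subprocess.run(..., check=True)` raises on the first failing command, a broken update stops the roll-out with only that one server out of rotation.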

If you were starting a new project today, which Python VMs would you consider? Probably CPython, as an ops person would probably not allow us to run PyPy. But I’m watching PyPy and it looks good.

What’s the worst scalability failure story you’ve ever heard?

Running PostgreSQL with 32MB of memory (a default setting…).
A sysadmin that, to prove his valid point, pulled the electricity plug out of the live machine. He won.
Returning a string instead of a list in a WSGI script that was getting lots of hits: the server sent it one character at a time…
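A sketch of why that last failure is so slow. A WSGI server iterates over whatever the app returns and writes each item separately, so a bare string becomes one write per character. Python 3 WSGI requires bytes, and a bare `bytes` object iterates as integers, so most servers will reject it outright; the `str` version below only reproduces the Python 2 behaviour so the write count can be seen. `count_writes` is a hypothetical harness, not a real server.

```python
def good_app(environ, start_response):
    body = b"x" * 10_000
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]  # one-item list: the server sends it in a single write


def python2_style_bad_app(environ, start_response):
    # Mimics the Python 2 bug: a bare str is itself iterable, character by character.
    body = "x" * 10_000
    start_response("200 OK", [("Content-Type", "text/plain")])
    return body  # WRONG: should be [body]


def count_writes(app):
    """Iterate a WSGI response the way a server does and count the chunks."""
    result = app({}, lambda status, headers, exc_info=None: None)
    return sum(1 for _chunk in result)
```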

What do you use to find slow SQL queries? The Django Debug Toolbar. Another trick is to evaluate querysets early to better see what’s going on (as querysets are lazily evaluated).

Use MySQL’s or PostgreSQL’s configuration option to log slow queries.
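On the database side these options are MySQL's `slow_query_log` (with `long_query_time`) and PostgreSQL's `log_min_duration_statement`. When you can't change the server configuration, an application-side sketch does roughly the same job; the `execute_timed` helper, the `slow_sql` logger name and the half-second threshold are assumptions, and SQLite is used only to keep it runnable. Fetching the rows inside the timer matters for the same reason as evaluating querysets early: with lazy results, the cost otherwise shows up somewhere else.

```python
import logging
import sqlite3
import time

logger = logging.getLogger("slow_sql")  # hypothetical logger name


def execute_timed(conn, sql, params=(), threshold=0.5):
    """Run a query and log it if it takes at least `threshold` seconds.

    An application-side stand-in for the database's own slow-query log.
    """
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()  # force the rows to be produced
    elapsed = time.perf_counter() - start
    if elapsed >= threshold:
        logger.warning("slow query (%.3fs): %s %r", elapsed, sql, params)
    return rows
```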

How do you mimic a large load, and can you simulate it? Use ApacheBench, but keep in mind that it won’t produce a “perfect” worst-case load.
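ApacheBench is typically run as `ab -n 1000 -c 10 <url>` (total requests, concurrency). The sketch below is a crude Python stand-in that does the same with a thread pool, so it shares the same limitation: uniform synthetic requests, not real traffic. `load_test` and its return keys are hypothetical, and `fetch` is any zero-argument function that performs one request.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(fetch, total_requests=100, concurrency=10):
    """Crude ApacheBench-style run: call `fetch()` concurrently, time each call."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            fetch()
            ok = True
        except Exception:  # any failure counts as a failed request
            ok = False
        return ok, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    wall = time.perf_counter() - wall_start

    latencies = sorted(elapsed for ok, elapsed in results if ok)
    return {
        "requests": total_requests,
        "failed": sum(1 for ok, _ in results if not ok),
        "requests_per_second": total_requests / wall if wall else float("inf"),
        "median_s": statistics.median(latencies) if latencies else None,
        # Approximate 95th percentile (nearest rank, rounded down).
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))] if latencies else None,
    }
```

For a real site, `fetch` could be something like `lambda: urllib.request.urlopen("http://localhost:8000/", timeout=5).read()`.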

The other answers were mostly “we can’t do that”. Incremental roll-outs help. Key question: can you respond quickly? Can you deploy quickly?

How do you handle database rollbacks when you roll back a release? Most either don’t use South or they don’t do rollbacks. A migration can only ADD columns or tables. They’re never removed. Never. Addition-only. This way the old code can talk just fine to the new database structure.
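A sketch of the addition-only rule, using SQLite with a hypothetical `articles` table: after the new release adds a column, the old release's queries still run unchanged. It relies on two assumptions about the old code: it names its columns explicitly (no `SELECT *` with positional unpacking), and the new column is nullable or has a database-level default, so the old inserts stay valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")


def old_code_insert(conn, title):
    # Release N: names its columns explicitly and knows nothing about later ones.
    conn.execute("INSERT INTO articles (title) VALUES (?)", (title,))


def old_code_titles(conn):
    return [row[0] for row in conn.execute("SELECT title FROM articles ORDER BY id")]


# Release N+1 migration: addition-only. The new column has a default,
# so inserts from the old code remain valid.
conn.execute("ALTER TABLE articles ADD COLUMN summary TEXT DEFAULT ''")

# Roll the code back to release N without touching the schema: it still works.
old_code_insert(conn, "Scaling Django")
```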