Update: this article was mentioned on ycombinator. There’s a bunch of extra info and more tips there. I want to re-iterate that I could not just type over his slides; there are a couple of details (“more than 32GB memory in your server”) that I left out as I only’ve got 10 fingers :-). Anyway, on to the actual summary! End of update.
Just open a postgresql configuration file. Your first impression will be “this looks like the dashboard of an ancient boeing 747” and “we’re all going to die”. 500 options and knobs you can twist. But... it can also be very easy if you know what to do.
It actually is pretty hard to seriously misconfigure postgres. Almost all performance problems are application problems. Here are the main things you need to look at:
After tweaking those settings, you never have to touch your configuration again. There are other tips for performance:
Stupid DB tricks you should not do:
.save()all the time for creating many objects. Just use postgres’s COPY.
Some quick tricks you could look at on your application’s side:
pg_stat_user_tablesto see whether indexes are needed. Get a database expert to tweak your indexes instead of slapping on
index=Truein your Django model fields.
On the database server, you can switch off the auto-vacuum process and schedule VACUUM for specific low-load times. VACUUM also does an ANALYZE. You ought to do an ANALYZE also after loading lots of data. This helps postgres optimize your queries.
Note that once in a very long while (once a year or so), postgres does a VACUUM FREEZE. Real housekeeping. To prevent it from bringing your app to its knees at a random moment, do the VACUUM FREEZE every few months yourself.
Always monitor postgres, at least display and system load. Memory and IO is very handy. Use 1-minute intervals.
Backups? pg_dump works fine. But... there’s also streaming replication. Easy to set up. And... it maintains an exact logical copy of the database on a different host. (But really really make sure it is on a different machine!) Such a streaming replication doesn’t safeguard at all against accidental errors, as it’ll also replicate your accidental ‘drop table’ just fine.
And... preplicas can be used for read-only copies! And you could do a pg_dump on such a read-only copy.
WAL archiving is great for security. It does take some more time to set up, but it helps you sleep at night.
Watch out with encoding: the encoding is fixed at database create time. So just create it with utf-8.
Schema migration can take a lot of time, especially if a column with a default value is added: every row needs updating. Try to allow NULL in your columns if at all posible. So don’t set NOT NULL unless absolutely needed. (Update: I wrote it down backwards, I originally said you should stick to NOT NULL. See one of the comments below.) Enforce mandatory fields in your django code instead.
For hardware, use your own physical hardware. Especially good IO. Raid 10 for the main DB, raids for the logs. Don’t use a SAN unless it is really good.
And... watch out with Django’s transactions. You might want to manage your own transactions instead of using Django’s automatic one. See his weblog at http://thebuild.com, he writes a lot about Django transactions there.
(In reaction to a question: postgis for geographical data is great.)
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.
Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):