Advanced PostgreSQL in Django - Christophe Pettus

Tags: django, djangocon

(See also last year’s talk)

Database agnosticism: write once, run on any database. A critical selling point for Django: it runs on many databases. But for others, it is bad. You pay a performance hit for not using database-specific features. So once you have made your choice, really use that database.

Here are some examples of good special things available in postgres.

Custom types

Custom types. If you like types, you’ll love postgress. Many built-in types. And many are usable in Django by installing some small app.

  • Do you do .lower() in python code or in your SQL? For an email address for instance? Why not use citext, a case insensitive text field provided by postgres.

  • Often you want to add various key/value data to an object. Attributes. Extra table with a join? Add fields to the main table? Solution: hstore.

  • Postgres has a built-in json type! No need for mongodb :-) It is validated going in. Postgres 9.3 will make it much faster.

  • The UUID type is much more efficient than storing a long character string.

  • IPv4 and IPv6 addresses.

You can define your own! And it is easy to integrate into Python and Django:

  • You adapt it into psycopg2. This’ll mean quite some regex’ing, but there are many examples.

  • You write a field class for Django.

  • You write a formfield and widget for use in forms and the admin.

Indexes

Django’s models are great, but the index creation functionality is limited.

  • Very cool: partial indexes. You can create an index that only indexes a part of the table. Filter out inactive items, for instance. It might make your index much smaller and quicker.

  • Multicolumn indexes. Speeds up selection on multiple columns.

  • Expression indexes.

For these things you need to get custom SQL into the database. Using South is the only sane way.

Custom constraints

Django does foreign key constraints in the ORM, not in the database. The only other constraint is uniqueness.

Constraints should be pused into the database whenever possible. The database is much more efficient at it. And you remove one major path that could lead to data inconsistency.

Actually getting the constraints into the database means custom, hairy SQL. Sadly. He’s working on something better.

You can use exclusion constraints, like not allowing a room booking if it overlaps with another.

Raw SQL

Christophe’s rule: if you are joining more than three tables, use raw SQL. Below three, just use the ORM.

Django has raw query sets that even give you back actual Django model instances. See the django documentation.

Sometimes you just have to dig in and write some 40-line monster SQL to get some operation down from 30 seconds to 10 miliseconds.

Where to put the SQL? In the manager of the model, not directly in the view. You can also wrap it in SQL stored procedures. Again: use south to add stuff to the database if you need to.

Closing comments

  • Don’t limit yourself because of some hypothetical need to later switch databases.

  • Postgresql has lots of advanced features: use them!

 
vanrees.org logo

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.

Weblog feeds

Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):