Django under the hood: Django ORM - Anssi Kääriäinen

Tags: django, djangocon

(One of the summaries of a talk at the 2014 Django under the hood conference.)

Anssi Kääriäinen is the “guardian of the ORM”: he knows where all the bits and pieces are. In particular, he’ll explain how QuerySet.filter() works.

What is the Django ORM (object relational mapping)? It is a:

  • Query builder.

  • Query to object mapper.

  • Object persistence.

The operations the django ORM offers are higher level than SQL by design. It doesn’t expose .join() or .group_by() as ORM operations: it’ll do joins and grouping behind the scenes, but you can’t call them directly.

Now about filters. For instance:

Book.objects.filter(author__birth_date__year__lte=1981)

This grabs the books whose author has a birth date with a year of 1981 or earlier. Filtering on year__lte like that isn’t in django by default. New in django 1.7 are custom lookups and transforms, which allow you to add such a year lookup yourself:

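from django.db import models

class YearTransform(models.Transform):
    lookup_name = 'year'
    output_field = models.IntegerField()

    def as_sql(self, compiler, connection):
        # Compile the left-hand side (the birth_date column) and wrap it in
        # SQL that extracts the year. EXTRACT works on e.g. PostgreSQL;
        # other databases may need different SQL.
        lhs, params = compiler.compile(self.lhs)
        return "EXTRACT(YEAR FROM %s)" % lhs, params

# Register the transform so that date fields know about __year.
models.DateField.register_lookup(YearTransform)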

So what does Book.objects.filter(author__birth_date__year__lte=1981) mean, piece by piece?

  • Book is the model class.

  • objects is the manager (models.Manager)

  • filter is a method on the manager. It returns a models.QuerySet, which builds up a models.sql.Query that is then sent through a models.sql.compiler.SQLCompiler.

  • author is a related model of the book model.

  • birth_date is an attribute of the author model.

  • year is the custom transformation we just made ourselves.

  • lte is the ‘less than or equal’ lookup.
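The breakdown above starts with splitting the keyword on double underscores. A minimal sketch of just that first step (split_filter_kwarg is a hypothetical helper name; only the separator itself matches what Django uses):

```python
# Django uses the same separator (django.db.models.constants.LOOKUP_SEP).
LOOKUP_SEP = "__"


def split_filter_kwarg(name):
    """Split a filter keyword into its double-underscore-separated parts."""
    return name.split(LOOKUP_SEP)


parts = split_filter_kwarg("author__birth_date__year__lte")
print(parts)  # ['author', 'birth_date', 'year', 'lte']
```

Which part is a relation, a field, a transform or a lookup cannot be seen from the string alone. That is decided later by looking at the models.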

An essential part is Query.build_filter(). It does value preparation (for example for F-objects, or for corner cases like ‘None’ in Oracle). It fetches the source field, generating joins if needed. And it fetches transforms or custom lookups (like the __year above) from that source field. It also calls setup_joins(), which handles relations and field references. It may generate a subquery, and it handles join trimming and join reuse. build_lookup() is the part that handles lookups like __lte. The last part is a bit of ‘isnull’ special-case handling (SQL has three-valued logic, True/False/unknown, which is always a bit messy).

To build a filter, the ORM needs a mapping from field names (birth_date) to SQL fields. PathInfo provides the mapping. It uses the model’s _meta attribute heavily. PathInfo knows about traversing relations and grabbing attributes from the related models.

setup_joins() uses PathInfo to return the final attribute (birth_date in our case), plus the joins needed to get to the model that actually has that final attribute.
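To make the idea of walking field names to a final attribute plus a list of joins a bit more concrete, here is a toy sketch in plain Python. It is not Django's code: Hop, META and toy_setup_joins are made-up names, and the dict is only a stand-in for what the real _meta knows.

```python
from collections import namedtuple

# Hypothetical, heavily simplified stand-in for Django's PathInfo.
Hop = namedtuple("Hop", "from_model to_model join_column")

# Toy "_meta": model -> field name -> related model (None for a plain column).
META = {
    "book": {"title": None, "author": "author"},
    "author": {"name": None, "birth_date": None},
}


def toy_setup_joins(names, start_model):
    """Walk names from start_model; return (final_field, hops, leftover_names)."""
    model, hops = start_model, []
    for pos, name in enumerate(names):
        if name not in META[model]:
            raise LookupError(f"{model} has no field {name!r}")
        target = META[model][name]
        if target is None:
            # A plain column ends the walk; whatever follows are
            # transforms and lookups for a later step.
            return name, hops, names[pos + 1:]
        # A relation: record the join and continue on the related model.
        # Django names a foreign key column <field>_id by default.
        hops.append(Hop(model, target, f"{name}_id"))
        model = target
    raise LookupError("this toy only handles paths that end on a column")


final_field, hops, leftover = toy_setup_joins(
    ["author", "birth_date", "year", "lte"], "book"
)
print(final_field, hops, leftover)
```

The leftover names (year and lte here) are what the lookup-building step still has to deal with.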

How do ManyToMany fields work? In the same way, really. To the ORM, a ManyToManyField simply means two foreign keys, so two joins. For the rest of the ORM there’s nothing special about it. Nice.
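As an illustration of “two foreign keys, so two joins”, here is a small sqlite3 sketch of the kind of SQL that idea leads to. The tables and data are made up for this example (a book with several authors via a book_authors table); this is hand-written SQL, not the ORM's actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    -- The intermediate table is nothing more than two foreign keys.
    CREATE TABLE book_authors (
        id INTEGER PRIMARY KEY,
        book_id INTEGER REFERENCES book(id),
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'First'), (2, 'Second');
    INSERT INTO book_authors (book_id, author_id) VALUES (1, 1), (1, 2), (2, 2);
""")

# Roughly what filtering books on an author's name boils down to:
# one join per foreign key of the intermediate table.
rows = conn.execute("""
    SELECT book.title
    FROM book
    INNER JOIN book_authors ON book.id = book_authors.book_id
    INNER JOIN author ON book_authors.author_id = author.id
    WHERE author.name = ?
    ORDER BY book.title
""", ("Bob",)).fetchall()

titles = [title for (title,) in rows]
print(titles)  # ['First', 'Second']
```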

build_lookup loops through the parts (double-underscore-separated) of the query and looks up what to do with each of them. Perhaps a simple lookup (“grab this field”), perhaps a transform. A simple loop. The code looks simple. Anssi tells us, however, that the actual code in Django is much harder to read because of the many special cases and corner cases and exceptions and weird database issues it needs to handle. “The implementation is logical, but the logic takes some getting used to before you understand it”.
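A sketch of what such a “simple loop” could look like, stripped of every special case. The TRANSFORMS and LOOKUPS registries and the toy_build_lookup function are assumptions made up for this illustration, not Django's real data structures:

```python
# Hypothetical registries: a transform wraps SQL, a lookup ends in an operator.
TRANSFORMS = {"year": lambda sql: f"EXTRACT(YEAR FROM {sql})"}
LOOKUPS = {"exact": "=", "lte": "<=", "gte": ">="}


def toy_build_lookup(column_sql, names):
    """Apply transforms left to right, then finish with a lookup operator."""
    names = list(names) or ["exact"]  # no lookup given: plain equality
    sql = column_sql
    for name in names[:-1]:
        # Everything before the last part has to be a transform.
        if name not in TRANSFORMS:
            raise LookupError(f"unknown transform {name!r}")
        sql = TRANSFORMS[name](sql)
    last = names[-1]
    if last in LOOKUPS:
        return f"{sql} {LOOKUPS[last]} %s"
    if last in TRANSFORMS:
        # A trailing transform gets an implicit 'exact' lookup.
        return f"{TRANSFORMS[last](sql)} = %s"
    raise LookupError(f"unknown lookup {last!r}")


where = toy_build_lookup('"author"."birth_date"', ["year", "lte"])
print(where)  # EXTRACT(YEAR FROM "author"."birth_date") <= %s
```

The %s is left as a placeholder so that the value (1981 in the example) is passed to the database as a parameter instead of being pasted into the SQL.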

sql.Query contains alias_map, tables and alias_refcount. Together these contain all the info the Query needs to turn itself into SQL.
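To get a feeling for how three such attributes could work together, here is a toy sketch. Only the three attribute names come from the talk. The ToyQuery class, its methods and the exact shape of the stored data are assumptions for illustration; the real sql.Query is far more involved.

```python
class ToyQuery:
    """Toy join bookkeeping; not Django's actual implementation."""

    def __init__(self, base_table):
        # alias -> (table name, parent alias, (parent column, own column))
        self.alias_map = {base_table: (base_table, None, None)}
        # Aliases in the order they were added.
        self.tables = [base_table]
        # How many parts of the query use each alias.
        self.alias_refcount = {base_table: 1}

    def join(self, table, parent_alias, columns):
        wanted = (table, parent_alias, columns)
        for alias, info in self.alias_map.items():
            if info == wanted:
                # Join reuse: same join requested again, so share the alias.
                self.alias_refcount[alias] += 1
                return alias
        alias = table if table not in self.alias_map else f"T{len(self.alias_map) + 1}"
        self.alias_map[alias] = wanted
        self.tables.append(alias)
        self.alias_refcount[alias] = 1
        return alias

    def from_clause(self):
        parts = []
        for alias in self.tables:
            table, parent, columns = self.alias_map[alias]
            if parent is None:
                parts.append(table)
                continue
            parent_col, own_col = columns
            name = table if alias == table else f"{table} {alias}"
            parts.append(
                f"INNER JOIN {name} ON {parent}.{parent_col} = {alias}.{own_col}"
            )
        return " ".join(parts)


query = ToyQuery("book")
first = query.join("author", "book", ("author_id", "id"))
second = query.join("author", "book", ("author_id", "id"))  # reused, not duplicated
print(query.from_clause())
```

Two filters on the same related model end up sharing one join here, with the refcount recording that the alias is used twice.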

SQLCompiler gets a Query as input and outputs rows from the database (finally :-) ). Those rows then still have to be turned into actual python objects.

Another subject: expressions. He’s working on https://github.com/akaariai/django-refsql, which he hopes will end up in django core. It is a pretty simple mapping between a django-style query expression (birth_date__year) and the related raw SQL. “You can do funny tricks with it” was what Anssi said… The main goal is to get rid of django’s .extra(): his expression work is a nicer way to do an extra “select” in SQL and annotate the resulting objects with the values. I heard quite a lot of very happy noises come out of several core committers, so this might indeed be something nice!

It is intended to end up as something you can use as an annotation in future Django versions:

Something.objects.annotate(lower_name=Lower('name')).order_by('lower_name')

Nice talk! Thanks, Anssi.


Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
