(One of the summaries of a talk at the 2014 Django Under the Hood conference.)
Anssi Kääriäinen is the “guardian of the ORM”: he knows where all the bits and
pieces are. In particular, he’ll explain how QuerySet.filter() works.
What is the Django ORM (object-relational mapper)? It is a:
Query builder.
Query-to-object mapper.
Object persistence layer.
The operations the Django ORM does are higher level than SQL by design. It
doesn’t expose .join() and .group_by() as ORM operations. It’ll do them behind
the scenes, but you can’t call them directly.
Now about filters. For instance:
Book.objects.filter(author__birth_date__year__lte=1981)
This grabs the books whose author has a birth year of 1981 or earlier. That
year in the query isn’t in Django by default. Something new in Django 1.7 is
model transforms. These allow you to add such a specific year lookup:
from django.db import models

# The decorator name is a placeholder, not a real Django decorator.
@some_registration_decorator
class YearTransform(models.Transform):
    lookup_name = 'year'
    output_field = models.IntegerField()

    def as_sql(self, compiler, connection):
        # Some nice code that returns a bit of SQL
        ...
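To get a feel for what "returns a bit of SQL" could mean, here is a minimal sketch in plain Python. It is not Django code: ColumnSketch, YearTransformSketch and LessThanOrEqualSketch are hypothetical stand-ins, and the EXTRACT(YEAR FROM ...) syntax is just one dialect's way to get the year. The point is only that each piece wraps the SQL of whatever sits on its left-hand side.

```python
class ColumnSketch:
    """Hypothetical stand-in for a column reference; not a Django class."""

    def __init__(self, table_alias, column):
        self.table_alias = table_alias
        self.column = column

    def as_sql(self):
        # A plain column compiles to quoted SQL and carries no parameters.
        return '"%s"."%s"' % (self.table_alias, self.column), []


class YearTransformSketch:
    """Hypothetical transform: wraps the SQL of its left-hand side."""

    lookup_name = "year"

    def __init__(self, lhs):
        self.lhs = lhs

    def as_sql(self):
        lhs_sql, params = self.lhs.as_sql()
        return "EXTRACT(YEAR FROM %s)" % lhs_sql, params


class LessThanOrEqualSketch:
    """Hypothetical final lookup: compares the left-hand side to a value."""

    lookup_name = "lte"

    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs

    def as_sql(self):
        lhs_sql, params = self.lhs.as_sql()
        # The value is passed as a query parameter, not pasted into the SQL.
        return "%s <= %%s" % lhs_sql, params + [self.rhs]


column = ColumnSketch("author", "birth_date")
lookup = LessThanOrEqualSketch(YearTransformSketch(column), 1981)
sql, params = lookup.as_sql()
```

The nesting mirrors the order of the parts in birth_date__year__lte: column first, then the transform, then the comparison.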
So what does Book.objects.filter(author__birth_date__year__lte=1981) mean?
Book is the model class.
objects is the manager (models.Manager).
filter is a method on the manager. It returns a models.QuerySet. It results in a models.sql.Query, which is sent through a models.sql.SQLCompiler.
author is a related model of the book model.
birth_date is an attribute of the author model.
year is the custom transformation we just made ourselves.
lte is the ‘less than or equal’ lookup.
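The breakdown above can be sketched in a few lines of plain Python. This is an illustration only, not Django's implementation: TOY_SCHEMA and split_filter_key are hypothetical names, and the schema is a hand-written dictionary instead of real model metadata. The double underscore separator is the one Django uses.

```python
LOOKUP_SEP = "__"

# Toy schema: model name -> {field name: related model name, or None}.
TOY_SCHEMA = {
    "Book": {"title": None, "author": "Author"},
    "Author": {"name": None, "birth_date": None},
}


def split_filter_key(model, key, schema=TOY_SCHEMA):
    """Split a filter keyword into (field path, transforms and lookups)."""
    parts = key.split(LOOKUP_SEP)
    field_path = []
    current = model
    for index, part in enumerate(parts):
        # Once we have left the last related model there are no fields left.
        fields = schema.get(current, {}) if current is not None else {}
        if part not in fields:
            # Everything from here on is a transform or a lookup.
            return field_path, parts[index:]
        field_path.append(part)
        current = fields[part]
    return field_path, []


field_path, rest = split_filter_key("Book", "author__birth_date__year__lte")
```

Here field_path holds the relation and field names, and rest holds what is left over for transforms and the final lookup.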
An essential part is Query.build_filter(). It does value preparation (for
example for F-objects or for corner cases like ‘None’ in Oracle). It fetches
the source field, including join generation if needed. And it fetches
transforms or custom lookups (like __year) from the source field. It also
calls setup_joins(), which handles relations and field references. It perhaps
fires off a subquery and does join trimming and join reuse handling.
build_lookup() is the part that handles lookups like __lte. The last part is a
bit of ‘isnull’ special-case handling (SQL knows True/False/unknown, this is
always a bit messy).
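Why the True/False/unknown business is messy can be shown with the sqlite3 module from the standard library. This is a small sketch with a made-up author table, not the SQL Django actually emits: a negated comparison silently drops rows where the column is NULL, unless the query handles NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, birth_year INTEGER)")
conn.executemany(
    "INSERT INTO author (id, birth_year) VALUES (?, ?)",
    [(1, 1981), (2, 1990), (3, None)],
)


def ids(where_clause):
    """Return the author ids matching a WHERE clause (toy helper)."""
    sql = "SELECT id FROM author WHERE %s ORDER BY id" % where_clause
    return [row[0] for row in conn.execute(sql)]


# NULL = 1981 is "unknown", and NOT unknown is still unknown: row 3 vanishes.
naive_exclude = ids("NOT (birth_year = 1981)")

# Spelling out the NULL case makes the inner condition False for row 3,
# so the negation keeps it.
null_aware_exclude = ids("NOT (birth_year = 1981 AND birth_year IS NOT NULL)")
```

The second form is the kind of extra condition an ORM has to add when a user asks for "everything except 1981" and expects rows without a value to be included.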
To build a filter, the ORM needs a mapping from field names (birth_date) to
SQL fields. PathInfo provides the mapping. It uses the model’s _meta attribute
heavily. PathInfo knows about traversing relations and grabbing attributes
from the related models.
setup_joins() uses PathInfo to return the final attribute (birth_date in our
case) and the joins needed to get to the model that actually has that final
attribute.
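A rough sketch of that idea in plain Python, under loud assumptions: Hop is a hypothetical, much simplified stand-in for the information PathInfo carries, TOY_META replaces the model's _meta, and setup_joins_sketch is not Django's setup_joins(). It walks a list of names, collects a join for every relation it crosses and stops at the first concrete column.

```python
from collections import namedtuple

# Hypothetical, simplified description of one relation hop.
Hop = namedtuple("Hop", ["from_table", "from_column", "to_table", "to_column"])

# Toy replacement for model metadata.
TOY_META = {
    "book": {
        "columns": {"id", "title", "author_id"},
        "relations": {"author": Hop("book", "author_id", "author", "id")},
    },
    "author": {
        "columns": {"id", "name", "birth_date"},
        "relations": {},
    },
}


def setup_joins_sketch(field_path, start_table, meta=TOY_META):
    """Return ((table, column), joins) for a list of field names."""
    joins = []
    table = start_table
    for name in field_path:
        info = meta[table]
        if name in info["relations"]:
            # Crossing a relation: record the join and move to the new table.
            hop = info["relations"][name]
            joins.append(
                'INNER JOIN "%s" ON ("%s"."%s" = "%s"."%s")'
                % (hop.to_table, hop.from_table, hop.from_column,
                   hop.to_table, hop.to_column)
            )
            table = hop.to_table
        elif name in info["columns"]:
            # A concrete column ends the walk.
            return (table, name), joins
        else:
            raise LookupError("Cannot resolve %r on table %r" % (name, table))
    raise LookupError("Path %r ends on a relation, not a column" % (field_path,))


final_column, joins = setup_joins_sketch(["author", "birth_date"], "book")
```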
How do ManyToMany fields work? In the same way, really. To the ORM, a
ManyToManyField simply means two foreign keys, so two joins. For the rest of
the ORM there’s nothing special about it. Nice.
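The "two foreign keys, so two joins" picture looks roughly like this in SQL, here run through the standard library's sqlite3 module. The tag table and the book_tags link table are made up for the example; the exact table and column names Django generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    -- The link table holds the two foreign keys.
    CREATE TABLE book_tags (
        id INTEGER PRIMARY KEY,
        book_id INTEGER REFERENCES book (id),
        tag_id INTEGER REFERENCES tag (id)
    );
    INSERT INTO book VALUES (1, 'ORM internals'), (2, 'Model railways');
    INSERT INTO tag VALUES (1, 'django'), (2, 'trains');
    INSERT INTO book_tags VALUES (1, 1, 1), (2, 2, 2);
    """
)

# One join from book to the link table, one from the link table to tag.
rows = conn.execute(
    """
    SELECT book.title
    FROM book
    INNER JOIN book_tags ON (book.id = book_tags.book_id)
    INNER JOIN tag ON (book_tags.tag_id = tag.id)
    WHERE tag.name = ?
    ORDER BY book.id
    """,
    ("django",),
).fetchall()
titles = [row[0] for row in rows]
```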
build_lookup loops through the parts (double-underscore-separated) of the
query and looks up what to do with each. Perhaps a simple lookup (“grab this
field”), perhaps a transform. A simple loop. The code looks simple. Anssi
tells us, however, that the actual code in Django is much harder to read
because of the many special cases and corner cases and exceptions and weird
database issues it needs to handle. “The implementation is logical, but the
logic takes some getting used to before you understand it”.
sql.Query contains alias_map, tables and alias_refcount. Together these
contain all the info the Query needs to turn itself into SQL.
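What such bookkeeping could look like is sketched below in plain Python. Only the three attribute names come from the talk; QuerySketch, its join() method and the stored values are assumptions for illustration and do not mirror Django's real data structures. The sketch shows join reuse: asking for the same join twice hands back the existing alias and bumps its reference count.

```python
class QuerySketch:
    """Toy bookkeeping for table aliases; not Django's sql.Query."""

    def __init__(self, base_table):
        self.alias_map = {}       # alias -> (table name, join condition)
        self.tables = []          # aliases in the order they were added
        self.alias_refcount = {}  # alias -> how many times it is used
        self._add_alias(base_table, base_table, None)

    def _add_alias(self, alias, table, condition):
        self.alias_map[alias] = (table, condition)
        self.tables.append(alias)
        self.alias_refcount[alias] = 1

    def join(self, table, condition):
        """Return an alias for the join, reusing an identical one if present."""
        for alias in self.tables:
            if self.alias_map[alias] == (table, condition):
                self.alias_refcount[alias] += 1
                return alias
        # A second, different join to the same table needs a fresh alias.
        if table in self.alias_map:
            alias = "T%d" % (len(self.tables) + 1)
        else:
            alias = table
        self._add_alias(alias, table, condition)
        return alias


query = QuerySketch("book")
first = query.join("author", "book.author_id = author.id")
second = query.join("author", "book.author_id = author.id")
```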
SQLCompiler gets a Query as input and outputs rows from the database
(finally :-) ). Those rows then still have to be turned into actual Python
objects.
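That last step, from rows to objects, can be pictured with the standard library alone. This is a deliberately naive sketch: BookSketch is a hypothetical dataclass, not a Django model, and real model instantiation does a lot more (field conversion, related objects and so on).

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class BookSketch:
    """Hypothetical plain object standing in for a model instance."""

    id: int
    title: str


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO book (id, title) VALUES (?, ?)",
    [(1, "ORM internals"), (2, "Model railways")],
)

# The database hands back plain tuples...
rows = conn.execute("SELECT id, title FROM book ORDER BY id").fetchall()

# ...which are then mapped onto objects, column by column.
books = [BookSketch(*row) for row in rows]
```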
Another subject: expressions. He’s working on
https://github.com/akaariai/django-refsql, which he hopes will end up in
Django core. It is a pretty simple mapping between a Django-style query
expression (birth_date__year) and the related raw SQL. “You can do funny
tricks with it” was what Anssi said… The main goal is to get rid of Django’s
.extra(): his expression work is a nicer way to do an extra “select” in SQL
and annotate the resulting objects with the values. I heard quite a lot of
very happy noises come out of several core committers, so this might indeed
be something nice!
It is intended to end up as something you can use as an annotation in future Django versions:
Something.objects.annotate(lower_name=Lower('name')).order_by('lower_name')
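In SQL terms such an annotation boils down to an extra computed column in the SELECT that you can also order by. The sqlite3 sketch below shows that idea with a made-up something table. It is an approximation of the concept, not the query Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE something (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO something (name) VALUES (?)",
    [("gamma",), ("Beta",), ("alpha",)],
)

# Plain ordering is case-sensitive here: uppercase sorts before lowercase.
plain = [row[0] for row in conn.execute("SELECT name FROM something ORDER BY name")]

# The extra LOWER(name) column plays the role of the annotation.
annotated = conn.execute(
    "SELECT name, LOWER(name) AS lower_name FROM something ORDER BY lower_name"
).fetchall()
```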
Nice talk! Thanks, Anssi.
My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.