The original idea was to have a schema backend and hooks in the Django ORM, with the actual migration code living outside of Django in South 2. In the end, everything now lives in Django itself. The original distinction between “schema backend stuff” and “the actual migrations” is still there in the code, however.
The schema backend is relatively simple and straightforward; the migration part is hard and hairy. The migration part contains: operations, loader/graph, executor, autodetector, optimiser, state. He’ll talk about some of them here.
What about the old syncdb? It is a one-shot thing: you add all the tables and then you add the foreign keys. When migrating, you have dependencies: you cannot add foreign keys to tables you haven’t added yet. There is automatic dependency-detecting code now, but that was only added at the last moment, in 1.7 beta 2.
Basic dependencies means the obvious stuff. Some examples:
Now on to the more creative dependencies.
Take the user model, which you can replace with a different, custom model. Suddenly a migration that you have already applied might need to point at a different model. Rolling back the migrations is not an option, as that leads to data loss. Swapping the model works fine if you do it at the start of the project.
He used a different mindset when developing Django’s migrations than when he developed South. South depended on people reading the documentation, which they often don’t do, so they could shoot themselves in the foot quite well. Django’s migrations are much more bulletproof, so there is much less need to read the documentation in detail.
There’s a main loop in the migrations code that tries to find dependencies, shifts operations to satisfy the dependency, checks if everything is fine, and loops again and again until it is right.
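To make that loop a bit more concrete, here is a minimal sketch in plain Python. It is not Django's actual code: the `(name, depends_on)` tuples, the `sort_operations` function and the `max_passes` guard are all made up for illustration. It only shows the "find an unsatisfied dependency, shift the operation, check again" idea.

```python
def sort_operations(operations, max_passes=100):
    """Reorder (name, depends_on) pairs until every dependency comes first.

    Hypothetical format: `depends_on` is a set of operation names, and every
    name in it is assumed to be present in `operations`.
    """
    ops = list(operations)
    for _ in range(max_passes):
        changed = False
        for index, (name, depends_on) in enumerate(ops):
            positions = {n: i for i, (n, _deps) in enumerate(ops)}
            # Dependencies that currently sit *after* this operation.
            late = [positions[d] for d in depends_on if positions[d] > index]
            if late:
                # Shift the operation to just behind its last dependency.
                op = ops.pop(index)
                ops.insert(max(late), op)
                changed = True
                break  # positions changed, so check everything again
        if not changed:
            return ops  # everything is fine
    raise ValueError("could not satisfy all dependencies (cycle?)")


operations = [
    ("add_fk_book_author", {"create_book", "create_author"}),
    ("create_book", set()),
    ("create_author", set()),
]
ordered_names = [name for name, _deps in sort_operations(operations)]
```

In this example the foreign key operation ends up after both table creations. A circular dependency never settles, which is why the sketch gives up after `max_passes` rounds.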
The way it works is by chopping all operations into tiny dependencies. Every individual field that has to be created is turned into a tiny dependency step. After the list of steps is sorted (via the dependency-resolving loop) into the correct list of steps, an optimiser goes through the list and optimises it. If a model gets created and deleted, nothing needs to be done, for instance.
This kind of reducing could be dangerous, so there’s another loop that checks which reductions/simplifications are possible and whether there are conflicts. It is better to have no optimisation than to have a wrong optimisation.
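The "created and then deleted means nothing needs to be done" reduction, including the conflict check, could look roughly like the sketch below. The `Op` tuple, its `refs` field and the `optimise` function are assumptions for the sake of the example and not the real optimiser's data structures.

```python
from collections import namedtuple

# Hypothetical operation format: `refs` lists other models the operation
# points at (for instance through a foreign key).
Op = namedtuple("Op", ["kind", "model", "refs"], defaults=[()])


def optimise(operations):
    """Drop all operations on a model that is created and later deleted."""
    ops = list(operations)
    changed = True
    while changed:  # loop again and again until nothing reduces anymore
        changed = False
        for start, op in enumerate(ops):
            if op.kind != "create_model":
                continue
            for end in range(start + 1, len(ops)):
                candidate = ops[end]
                if candidate.kind == "delete_model" and candidate.model == op.model:
                    between = ops[start + 1:end]
                    # Conflict: another model's operation refers to this model.
                    conflict = any(
                        o.model != op.model and op.model in o.refs for o in between
                    )
                    if not conflict:
                        # Keep unrelated operations, drop everything on the model.
                        ops[start:end + 1] = [o for o in between if o.model != op.model]
                        changed = True
                    break
            if changed:
                break
    return ops


safe = optimise([
    Op("create_model", "Tmp"),
    Op("add_field", "Tmp"),
    Op("create_model", "Book"),
    Op("delete_model", "Tmp"),
])
risky_input = [
    Op("create_model", "Tmp"),
    Op("add_field", "Book", ("Tmp",)),
    Op("delete_model", "Tmp"),
]
risky = optimise(risky_input)
```

In the first list only the `Book` creation survives. In the second one another model points at `Tmp`, so the sketch leaves the list alone: no optimisation rather than a wrong one.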
Reduction is applied after various stages: after the automatically detected dependency code, after applying the manual dependencies and after squashing.
Squashing: it makes your history a bit shorter. It squashes migrations into a new starting point. This is especially handy when you’re a third party app developer.
The final part of the puzzle is the graph. It builds a directed graph of all basic migrations in memory. It needs to read all the migration files on disk for that. It also looks in the database: there’s a table in there that records which migrations (or rather: nodes in the migration graph) have been applied.
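A minimal sketch of such a graph, under some assumptions: nodes are `(app, migration name)` tuples, `graph` maps every node to the nodes it depends on, and the `applied` set stands in for the database table. The `plan` function and the example app names are invented here and the graph is assumed to have no cycles.

```python
def plan(graph, applied, target):
    """Return the unapplied migrations needed for `target`, dependencies first."""
    ordered, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in graph[node]:
            visit(parent)  # dependencies go into the list first
        ordered.append(node)

    visit(target)
    # Nodes the database already marks as applied can be skipped.
    return [node for node in ordered if node not in applied]


graph = {
    ("books", "0001_initial"): [],
    ("authors", "0001_initial"): [],
    ("books", "0002_author_fk"): [
        ("books", "0001_initial"),
        ("authors", "0001_initial"),
    ],
}
applied = {("books", "0001_initial")}
todo = plan(graph, applied, ("books", "0002_author_fk"))
```

Here `todo` contains the `authors` initial migration followed by the second `books` migration, as the first `books` migration is already marked as applied.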
A squashed migration lists the graph nodes that it replaces. A squash can only be used if all the replaced nodes are in the same state. Either they are all unapplied, in which case the squash is applied instead of them, or they have all been applied, in which case the squash itself is considered applied.
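That rule fits in a few lines of Python. Again a sketch with made-up names (`squash_status`, the string results), not the loader's real code:

```python
def squash_status(replaces, applied):
    """Decide what to do with a squash, given the nodes it replaces."""
    done = [node in applied for node in replaces]
    if all(done):
        return "applied"    # consider the squash applied, nothing to run
    if not any(done):
        return "unapplied"  # run the squash instead of the originals
    return "mixed"          # replaced nodes disagree: the squash cannot be used


replaces = [("books", "0001_initial"), ("books", "0002_author_fk")]
fresh = squash_status(replaces, applied=set())
halfway = squash_status(replaces, applied={("books", "0001_initial")})
finished = squash_status(replaces, applied=set(replaces))
```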
There’s room for improvement!
It is mostly a case of un-optimized code. Big, pretty dumb, loops. So: everyone’s invited to help out, for instance at the sprint.
If you want to look at the code, here are some pointers:
django/db/migrations/autodetector.py, start at
django/db/migrations/optimizer.py, start at
His slides are at https://speakerdeck.com/andrewgodwin/migrations-under-the-hood
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.