Fractal architectures - Laurens van Houtven

Tags: django, djangocon

He worked on Twisted. And Twisted people tend to talk about subjects that are almost antithetical to how Django does things. The thing he does differently from Django is that he’s not using a single data source…

Once a database gets really too big, putting multiple database servers next to each other doesn’t really work. You slowly start to get into expensive Oracle territory.

How he has set it up now is what he calls a fractal architecture. The whole accepts requests. The parts of the whole accept requests. The parts of the parts accept requests. That’s why he calls it fractal. You could also call it sharded, but that has a bad reputation: it is something you do when nothing else works.
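The "whole accepts requests, parts accept requests" idea can be sketched in a few lines of Python. This is only an illustration, not the speaker's actual code: the `Node` class, the node names and the checksum-based way of picking a part are all assumptions made for the example.

```python
import zlib


class Node:
    """Hypothetical unit that accepts requests, whether it is the whole or a part."""

    def __init__(self, name, parts=None):
        self.name = name
        self.parts = list(parts or [])

    def handle(self, key):
        # A node without parts answers the request itself.
        if not self.parts:
            return f"{self.name} handled {key!r}"
        # Otherwise it forwards the same request to one of its parts.
        # Picking by checksum is an assumption; the talk did not say how.
        index = zlib.crc32(f"{self.name}:{key}".encode()) % len(self.parts)
        return self.parts[index].handle(key)


# The whole, its parts, and the parts of the parts share one interface.
leaves_a = [Node("a1"), Node("a2")]
leaves_b = [Node("b1"), Node("b2")]
whole = Node("whole", [Node("a", leaves_a), Node("b", leaves_b)])

answer = whole.handle("user-42")
print(answer)
```

Because every level exposes the same `handle` method, a caller does not need to know whether it is talking to the whole system or to one small part of it.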

The way he looks at the architecture: it is like SMTP. Email. Simple.

He prefers SQLite. It is simple and included in the Python standard library. Sure, you can use Postgres, but you’ll need a VM to re-create the same environment locally as on your production machine. SQLite is the same everywhere.

In fact, he uses Axiom: an object store on top of SQLite. (Note: he is trying to write documentation for it at https://github.com/lvh/axiombook).

Another advantage of SQLite: it is easy to scale down. There’s not much lower you can go than import sqlite3! If you want to use Postgres, remember you must install it on each and every part :-)
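To show how little is needed, here is a minimal sketch using only the standard library `sqlite3` module. It uses plain SQL rather than Axiom, and the `notes` table is made up for the example.

```python
import sqlite3

# ":memory:" keeps the database in RAM; pass a file path for a store on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
conn.commit()

rows = conn.execute("SELECT id, body FROM notes").fetchall()
print(rows)
```

No server, no configuration and no extra install: the same code runs on a laptop and on a production machine.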

Important: almost nothing is as fast as a local SQLite store, especially when it is reasonably small and fits mostly in RAM. Just look at the regular comparisons of access times for L1 cache, L2 cache, RAM, SSD, LAN, spinning rust, the internet and so on. So if you have a local database on an SSD with a fair amount of RAM, it’ll blow a network connection to some remote database out of the water.
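A rough way to get a feeling for this yourself is to time primary-key lookups against a small local store. The sketch below is an assumption-laden micro-benchmark, not a measurement from the talk: it uses an in-memory database (not an SSD), a made-up `users` table, and the number it prints depends entirely on your machine.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    ((i, f"user{i}") for i in range(10_000)),
)
conn.commit()

# Time a batch of single-row lookups by primary key.
n_queries = 10_000
start = time.perf_counter()
for i in range(n_queries):
    row = conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()
elapsed = time.perf_counter() - start

per_query_us = elapsed / n_queries * 1e6
print(f"{per_query_us:.1f} microseconds per lookup")
```

A remote database has to do the same lookup and additionally pay for a network round trip on every query, which is the cost the local store avoids.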

But… some things don’t fit locally. You have to search everywhere, for instance. There are three basic solutions:

Duplication

You could duplicate the data over all the parts, but that doesn’t work if the data is big.

Sharding

Sharding will only work reasonably well if the data itself, by nature, is sharded. Sales data per region, for instance.

Separation

Separate data for separate calculations in separate (local) stores. This is what he uses.
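The difference between sharding and separation can be sketched with plain `sqlite3`. Everything named here (the `open_store` helper, the regions, the `billing` and `statistics` stores and their tables) is hypothetical and only serves to illustrate the two options; the in-memory databases stand in for local files.

```python
import sqlite3


def open_store(schema):
    """Open an independent local store and create its single table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    return conn


# Sharding: the same schema everywhere; the region decides which store is used.
SALES_SCHEMA = "CREATE TABLE sales (item TEXT, amount INTEGER)"
shards = {region: open_store(SALES_SCHEMA) for region in ("eu", "us")}


def record_sale(region, item, amount):
    shards[region].execute("INSERT INTO sales VALUES (?, ?)", (item, amount))


def total_sales(region):
    query = "SELECT COALESCE(SUM(amount), 0) FROM sales"
    return shards[region].execute(query).fetchone()[0]


# Separation: different data for different calculations, each in its own store.
stores = {
    "billing": open_store("CREATE TABLE invoices (customer TEXT, cents INTEGER)"),
    "statistics": open_store("CREATE TABLE page_views (path TEXT, hits INTEGER)"),
}

record_sale("eu", "bike", 3)
record_sale("eu", "train", 2)
record_sale("us", "bike", 7)
stores["billing"].execute("INSERT INTO invoices VALUES (?, ?)", ("alice", 1200))

print(total_sales("eu"), total_sales("us"))
```

With sharding, a question about one region touches exactly one store. With separation, the billing calculation never needs to open the statistics store at all.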

He mentioned Paxos and Raft (pdf), but I don’t remember what for.

 

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
