I hate your database - Andrew Godwin
Andrew is most well-known because of south. So
he’s familiar with lots of databases. And also with database warts. And also
familiar with the ways in which people abuse databases.This talk is a
collection of separate points to make us aware of the things that can be wrong
with a database.
Kinds of databases
There are many different kinds of databases. Key/value, document, relational,
graph, object/hierarchical, spatial, timeseries, search. Examples of the three
most common ones:
- Relational: postgresql, mysql, sqlite.
- Document: mongodb, couchdb.
- Key/value: redis, cassandra, riak.
Some theory first before we look at the problems.
- ACID. Atomicity, consistency,
isolation, durability. This is the main theory/rule behind most/many databases.
- CAP theory. Consistency,
availability, partition tolerance. You can only have two of those, according
to the theory. So if a database claims to have all three, be suspicious.
Problems (and some good parts)
- Columns, rows. relations, normalization. Watch out with the way you write
your queries! It is easy to be sub-optimal. Looping and doing a separate
.save() for every individual item, for instance. 53 queries instead of
- Mysql has many warts. There are many things that don’t work at the same
time. You either have one feature or another. MyISAM allows full-table
locking, but no transactions. The query optimizer is poor. It is owned by
oracle (which might be a bad thing). But for some tasks it is fast.
- SQLite has little integrity checking. No concurrent access. It is impossible
to do some table alterations. No concurrent access because it tries very
hard to be a small, portable database. That’s also the good thing because it
is very portable and easy to install :-)
- Postgres has a slow default configuration. And it can be a little harder to
learn. And it almost has too many features. But… it is incredibly
reliable! It has all the SQL features. Transactions for everything. Great
geospatial support. Great database. (Andrew really seemed enthousiastic
Document databases have no fixed schema. Low barrier to entry. Closer to
- A generic problem is that they’re immature (but improving). You don’t need
migrations in your database anymore, true. But now you need to handle the
migration in your code. Schemas have their use: they are there for your
benefit and your protection. So with a document database you have more risk.
- There are no transactions.
- No integrity checking.
- Lots of small things like mongo not freeing disk space unless you drop and
re-create the entire database. And losing data upon a crash (that’s better
Key/value databases. The good thing is that they’re fast, but the bad thing is
that you can only query by key (which is why they’re fast).
- They allow horizontal scaling. Just thrown in another box. The drawback is
that the consistency isn’t perfect. You might not get the exact correct
- You can only query by key. This means that they’re mostly only useful for
things like caching and timestamped stuff.
- There is often support for map-reduce queries which helps a bit with
Spatial databases are strange beasts.
- Knowledge of projections is useful. Every single country has their own
projection, giving you conversion headaches.
- Use spatial indexes. They really speed op some problems.
- Generally, they’re an add-on to an existing database. For instance postgis
Filesystem. Yes, that’s a database. It is a hierarchical key/value store!
Supports very large files. Stable. Well-known.
Graph databases. Very efficient for neighbour queries. For anything else? Not
RRD, round robin databases. They deliberately lose old data! Useful, actually,
for logging or statistics. You won’t need the old data anyway. It prevents you
from running out of disk space.
- It is unlikely all your data all fits in one paradigm. You probably
already store big files on the filesystem instead of in your database.
- Just buying a bigger server goes a long way. Just a big big box instead of
horizontal scaling. It might keep you from lots of unnecessary work and
- If it sounds too good to be true, it probably is. Just be careful.
Feel free to try out databases. Different types of databases, too.