He works for a company (holvi.com) that offers business banking services to microentrepreneurs in a couple of countries: payment accounts (online payments, a prepaid business Mastercard, etc.) and business tools (invoices, online shop, bookkeeping, etc.).
Technically it is nothing really special: Django, Django REST framework, PostgreSQL, Celery with Redis, Angular and native mobile apps. It runs on Amazon.
Django was a good choice. The ecosystem is big: for almost anything you want to do, there seems to be a library. Important for them: Django is very reliable.
Now: how not to lose your customers’ money.
Option 1: reliable payments.
Option 2: take the losses (and thus reimburse your customer).
If you’re just starting up, you might have a reliability of 99.9%. With, say, 100 payments per day and 2 messages per payment, that’s 200 messages and about 0.2 expected error cases per day, so roughly one every five days. You can handle that by hand just fine.
If you grow to 10,000 messages/day, then even at 99.99% reliability you have about one error case every day, and the number rises with every bit of extra volume. Keep growing and you need one or two persons just for handling the error cases. That’s not effective.
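The arithmetic behind these estimates is simply messages per day times the failure rate. A minimal sketch for checking such figures, using the volumes and reliabilities mentioned in the talk; the function name is made up for illustration:

```python
def expected_failures_per_day(messages_per_day: int, reliability: float) -> float:
    """Expected number of failed messages per day at a given success rate."""
    return messages_per_day * (1.0 - reliability)


# Starting up: 100 payments/day, 2 messages per payment, 99.9% reliable.
startup = expected_failures_per_day(100 * 2, 0.999)

# After growing: 10,000 messages/day, 99.99% reliable.
grown = expected_failures_per_day(10_000, 0.9999)

print(f"starting up: {startup:.2f} expected failures per day")
print(f"after growing: {grown:.2f} expected failures per day")
```

Better reliability does not automatically mean fewer error cases: volume can grow faster than the failure rate shrinks.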
Their system is mostly built around messaging. How do you make messages reliable?
The original system records the message in the local database inside a single transaction.
In messaging, it is terribly hard to debug problems if you’re not sure whether a message was sent or what its contents were. Storing a copy locally helps a lot.
On commit, the message is sent.
If the initial send fails: retry.
The receiving side has to deduplicate, as a retry can make it receive the same message more than once.
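Why retrying forces the receiver to deduplicate can be shown in a small, framework-free sketch. Everything in it is hypothetical: the class names, the in-memory list standing in for the local database table, and the simulated transport that delivers a message but loses the acknowledgement. It is not Holvi's implementation. In a Django project, the "send on commit" step would normally hang off `transaction.on_commit`, which is left out here to keep the example runnable on its own.

```python
import uuid


class Receiver:
    """Receiving side: processes each message id at most once."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def receive(self, message):
        if message["id"] in self.seen_ids:
            return  # duplicate caused by a retry: ignore it
        self.seen_ids.add(message["id"])
        self.processed.append(message)


class LostAckTransport:
    """Hypothetical transport: delivers, but loses the first few acknowledgements."""

    def __init__(self, receiver, lost_acks):
        self.receiver = receiver
        self.lost_acks = lost_acks

    def deliver(self, message):
        self.receiver.receive(message)
        if self.lost_acks > 0:
            self.lost_acks -= 1
            raise ConnectionError("acknowledgement lost")


def send_with_retry(transport, message, sent_log, max_attempts=5):
    """Record the message locally, then send until acknowledged.

    Returns the number of attempts that were needed.
    """
    sent_log.append(message)  # local copy, kept for debugging
    for attempt in range(1, max_attempts + 1):
        try:
            transport.deliver(message)
            return attempt
        except ConnectionError:
            continue  # a real system would wait (back off) before retrying
    raise RuntimeError(f"message {message['id']} was never acknowledged")


receiver = Receiver()
transport = LostAckTransport(receiver, lost_acks=2)
sent_log = []
message = {"id": str(uuid.uuid4()), "body": "payment of 10 EUR"}

attempts = send_with_retry(transport, message, sent_log)
print(f"attempts: {attempts}, processed: {len(receiver.processed)}")
```

The sender cannot tell a lost message from a lost acknowledgement, so it has to resend in both cases. The unique message id is what lets the receiver make the resend harmless.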
You can also use an inbox/outbox model.
Abstract the messages into Inbox and Outbox Django models.
The outbox on the origin system stores the messages.
Send on_commit and with retry.
Receiving side stores the messages in the Inbox.
There’s a unique constraint that makes sure only unique messages are in the Inbox.
There’s a “reconciliation” task that regularly compares the Outbox and the Inbox, to see if they have the same contents.
For transport between outbox and inbox, they use Kafka, which can deliver to multiple inboxes.
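The inbox/outbox idea can be sketched with two SQLite databases from the standard library standing in for the two systems. The table layout, function names and message ids are assumptions made for illustration. The talk describes Django models with Kafka in between, neither of which is reproduced here, and this reconciliation only compares message ids rather than full contents.

```python
import sqlite3

# Origin system: the outbox keeps a copy of every message to be sent.
origin = sqlite3.connect(":memory:")
origin.execute(
    "CREATE TABLE outbox (message_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
)

# Receiving system: the unique constraint rejects duplicate deliveries.
destination = sqlite3.connect(":memory:")
destination.execute(
    "CREATE TABLE inbox (message_id TEXT NOT NULL UNIQUE, body TEXT NOT NULL)"
)


def store_in_outbox(message_id, body):
    """Record the outgoing message in one transaction."""
    with origin:  # commits on success, rolls back on an exception
        origin.execute("INSERT INTO outbox VALUES (?, ?)", (message_id, body))


def receive(message_id, body):
    """Store an incoming message; return False if it was already in the inbox."""
    try:
        with destination:
            destination.execute(
                "INSERT INTO inbox VALUES (?, ?)", (message_id, body)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery, safely ignored


def reconcile():
    """Return the ids that are in the outbox but have not reached the inbox."""
    sent = {row[0] for row in origin.execute("SELECT message_id FROM outbox")}
    received = {
        row[0] for row in destination.execute("SELECT message_id FROM inbox")
    }
    return sorted(sent - received)


store_in_outbox("m1", "payment of 10 EUR")
store_in_outbox("m2", "payment of 25 EUR")

first = receive("m1", "payment of 10 EUR")      # normal delivery
duplicate = receive("m1", "payment of 10 EUR")  # retry delivers it again
missing = reconcile()                           # "m2" never arrived

print(first, duplicate, missing)
```

Letting the database enforce uniqueness keeps deduplication correct even when two deliveries of the same message arrive at nearly the same time. The reconciliation step catches the opposite problem: messages that never arrived at all.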
There are other reliability considerations:
Use testing and reviews.
If there’s a failure: react quickly. This is very important from the customer’s point of view.
Fix the original reason, the root cause. Ask why, and keep asking. If you clicked on the wrong button, ask why you clicked on the wrong button. Is the UI illogical, for instance?
Constantly monitor and constantly run the reconciliation. This way, you get instant feedback if something is broken.
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.