Henk Doornbos gave a talk in 2011 at PyGrunn about large codebases. Nice talk, that one, so I was looking forward to his talk this year.
Programming is nice, especially with Python, but in the end the most important thing is to make sure you’ve handled all the processes and all the data in whatever you’ve got to build.
Designing information systems. How do you do it? Talking to people, reading, brainstorming. It is hard to determine which data you have to work with. A big problem is that data is often stored in databases. Often legacy databases. And databases are much harder to refactor than code! So figuring out the data is very important.
Getting the data structure out of written use cases is quite some work. You can look for nouns, for names: those often give important clues. But you’ll still miss things. Trying to describe the process also gives clues. Often the process description language/diagram and the data language/diagram don’t match.
What he’s looking for is a reliable, repeatable way to:
… all with the goal of allowing normal mortals like him to reliably end up with a good data design.
He calls this “process driven data design”. Model a business process from the perspective of an end-user of the system: that gives you a data use scenario (DUS). Each stakeholder has its own DUS. Afterwards, you need to figure out the minimal data model that will support all the data use scenarios.
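Here is a minimal Python sketch that makes the “minimal data model” step a bit more concrete. It assumes each data use scenario can be boiled down to a list of elementary facts such as “BankCard has CardNumber”. The names FactType, DataUseScenario and minimal_model are hypothetical and don't come from the talk. Taking the union only removes exact duplicates. The real modeling work, such as spotting synonyms and adding constraints, still has to be done by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """One elementary fact (hypothetical), e.g. 'BankCard has CardNumber'."""
    subject: str
    verb: str
    obj: str


@dataclass
class DataUseScenario:
    """Hypothetical: the facts one stakeholder's process needs."""
    stakeholder: str
    facts: list[FactType]


def minimal_model(scenarios: list[DataUseScenario]) -> set[FactType]:
    # The union of the fact types every scenario needs; facts that several
    # stakeholders share (equal and hashable, thanks to frozen=True) collapse
    # into a single entry.
    model: set[FactType] = set()
    for scenario in scenarios:
        model.update(scenario.facts)
    return model


customer = DataUseScenario("customer", [
    FactType("BankCard", "has", "CardNumber"),
    FactType("Withdrawal", "is of", "Amount"),
])
bank = DataUseScenario("bank", [
    FactType("BankCard", "has", "CardNumber"),
    FactType("BankCard", "belongs to", "Account"),
])

model = minimal_model([customer, bank])
print(len(model))  # 3: the shared card-number fact is stored once
```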
He proposes two languages to write all this down: business process model and notation (BPMN) for the processes and object role modeling (ORM) for the data (note: ORM here isn’t “object relational mapper”). BPMN’s core is easy to read and understand. The tools are mostly for Windows, which probably means that the people who should use BPMN aren’t using it… ORM has the problem that most normal people cannot read it, but there are ways to work around that.
He showed an example BPMN diagram (of entering your card into an ATM and withdrawing money). Yes, it was readily readable and understandable. Basically it is a flow chart. The advantage is that regular people can validate it. Textual use cases are much harder to fully understand and validate.
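A flow chart like that is basically a directed graph. The sketch below encodes a simplified, made-up version of the ATM withdrawal as a Python dict. It is not the actual diagram from the talk. The code lists every route from start to end, which is roughly what someone validating the diagram walks through step by step.

```python
# Hypothetical, simplified ATM flow: each node maps to the nodes that can follow it.
FLOW = {
    "start": ["enter card"],
    "enter card": ["card permitted?"],
    "card permitted?": ["enter PIN", "reject card"],  # gateway: yes / no
    "enter PIN": ["choose amount"],
    "choose amount": ["dispense money"],
    "dispense money": ["end"],
    "reject card": ["end"],
    "end": [],
}


def all_paths(flow, node="start", path=None):
    """Yield every route from `node` to a node without successors (assumes no cycles)."""
    path = (path or []) + [node]
    if not flow[node]:
        yield path
        return
    for nxt in flow[node]:
        yield from all_paths(flow, nxt, path)


for p in all_paths(FLOW):
    print(" -> ".join(p))
```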
Every step in the flow chart (so: the BPMN diagram) results in an ORM event. So “I enter my card into the ATM” in the flow chart results in the event “Bank card entry” (an object) being logged and something subsequently being done with it. There’s a basic set of generic “fact-type identification” questions you can ask once you’ve identified an event, questions like “which objects are involved”.
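Here is a rough sketch of the step-to-event bookkeeping. It assumes an event is just a name plus answers to the generic fact-type questions. Only “which objects are involved” and “what is the output value” come from the talk. The third question and the Event class are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Assumed question list: the first two come from the talk summary,
# the third is an illustrative placeholder.
GENERIC_QUESTIONS = [
    "Which objects are involved?",
    "What is the output value?",
    "Which values identify each object?",
]


@dataclass
class Event:
    step: str  # the flow-chart step the event was derived from
    name: str  # hypothetical event name, e.g. "Bank card entry"
    answers: dict[str, str] = field(default_factory=dict)

    def open_questions(self) -> list[str]:
        """Return the generic questions that have not been answered yet."""
        return [q for q in GENERIC_QUESTIONS if q not in self.answers]


event = Event(step="I enter my card into the ATM", name="Bank card entry")
event.answers["Which objects are involved?"] = "bank card, ATM"
print(event.open_questions())
```

Keeping the unanswered questions visible per event is one way to make sure no event is skipped when you work through the whole flow chart.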
He showed an ORM diagram. Yeah right. If you don’t know the format, it is indeed pretty much unreadable. So in practice he uses a textual format, modeled after the list of generic questions. For instance the “what is the output value” would be “is the bank card permitted”.
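Below is a guess at what such a textual format might look like. The layout (an event header followed by question/answer lines) and the as_text helper are assumptions. Only the example answer “is the bank card permitted” comes from the talk.

```python
def as_text(event_name: str, answers: dict[str, str]) -> str:
    """Render one event as plain question/answer lines (hypothetical layout)."""
    lines = [f"Event: {event_name}"]
    for question, answer in answers.items():
        lines.append(f"  {question} {answer}")
    return "\n".join(lines)


print(as_text("Bank card entry", {
    "Which objects are involved?": "bank card, ATM",
    "What is the output value?": "is the bank card permitted",
}))
```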
For this to work you must have a bit of experience in data modeling. So: practice. If you do it correctly and for all the different events, you should end up with the correct data model that can support everything you need to. It is a reliable method.
A closing word about REST. He didn’t have time to say much. RESTful APIs are good, but they are hard to design. When you’ve done the homework above, you automatically know which messages with which content need to be passed from one process to the other. Each message corresponds to a PUT/POST/GET/DELETE on a resource. This gives you a good basis for your REST design.
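Here is a minimal sketch of that last step. It assumes each message between processes can be labeled with what it does to a resource: create, read, replace or remove. The Message class, the intent names and the URL layout are hypothetical. The talk only said that each message corresponds to a PUT, POST, GET or DELETE.

```python
from dataclasses import dataclass

# Assumed mapping from what a message does to the HTTP method that carries it.
INTENT_TO_METHOD = {
    "create": "POST",
    "read": "GET",
    "replace": "PUT",
    "remove": "DELETE",
}


@dataclass(frozen=True)
class Message:
    sender: str    # process sending the message, e.g. "ATM"
    receiver: str  # process receiving it, e.g. "bank"
    intent: str    # one of INTENT_TO_METHOD's keys
    resource: str  # hypothetical resource collection, e.g. "withdrawals"


def to_endpoint(msg: Message) -> str:
    method = INTENT_TO_METHOD[msg.intent]
    # POST adds a new member to the collection; the other methods address one item.
    if msg.intent == "create":
        path = f"/{msg.resource}/"
    else:
        path = f"/{msg.resource}/{{id}}"
    return f"{method} {path}"


for m in [Message("ATM", "bank", "create", "withdrawals"),
          Message("ATM", "bank", "read", "cards")]:
    print(to_endpoint(m))
```

PATCH is deliberately left out, since only the four methods above were mentioned.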
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.