Pygrunn: from code to config and back again - Jasper Spaans

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

Jasper works at Fox IT, one of the programs he works on is DetACT, a fraud detection tool for online banking. The technical summary would be something like “spamassassin and wireshark for internet traffic”.

  • Wireshark-like: DetACT intercepts online bank traffic and feeds it to a rule engine that ought to detect fraud. The rule engine is the one that needs to be configured.

  • Spamassassin-like: rules with weights. If a transaction gets too many “points”, it is marked as suspect. Just like spam detection in emails.

In the beginning of the tool, the rules were in the code itself. But as more and more rules and exceptions got added, maintaining it became a lot of work. And deploying takes a while as you need code review, automatic acceptance systems, customer approval, etc.

From code to config: they rewrote the rule engine from start to work based on a configuration. (Even though Joel Spolsky says totally rewriting your code is the single worst mistake you can make). They went 2x over budget. That’s what you get when rewriting completely….

The initial test with hand-written json config files went OK, so they went to step two: make the configuration editable in a web interface. Including config syntax validation. Including mandatory runtime performance evaluation. The advantage: they could deploy new rules much faster than when the rules were inside the source code.

Then… they did a performance test at a customer…. It was 10x slower than the old code. They didn’t have enough hardware to run it. (It needs to run on real hardware instead of in the cloud as it is very very sensitive data).

They fired up the profiler and discovered that only 30% of the time is spend on the actual rules, the other 70% is bookkeeping and overhead.

In the end they had the idea to generate python code from the configuration. They tried it. The generated code is ugly, but it works and it is fast. A 3x improvement. Fine, but not a factor of 10, yet.

They tried converting the config to AST (python’s Abstract Syntax Tree) instead of to actual python code. Every block was turned into an AST and then combined based on the config. This is then optimized (which you can do with an AST) before generating python code again.

This was fast enough!

Some lesons learned:

  • Joel Spolsky is right. You should not rewrite your software completely. If you do it, do it in very small chunks.

  • Write readable and correct code first. Then benchmark and profile

  • Have someone on your team who knows about compiler construction if you want to solve these kinds of problems.

 
vanrees.org logo

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.

Weblog feeds

Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):