2026-01-22
(One of my summaries of the Python Leiden meetup in Leiden, NL).
Precision-recall (PR) versus Receiver Operating Characteristics (ROC) curves: which one to use if data is imbalanced?
Imbalanced data: for instance when you’re investigating rare diseases. “Rare” means few people have them. So if you have data, most of the data will be of healthy people, there’s a huge imbalance in the data.
Sensitivity versus specificity: sensitivity means you find most of the sick people, specificity means you raise few false alarms among the healthy people. Sensitivity/specificity looks a bit like precision/recall.
Sensitivity: the true positive rate.
Specificity: the true negative rate (so: 1 minus the false positive rate).
If you classify, you can classify immediately into healthy/sick, but you can also use a probabilistic classifier which returns a probability that someone should be classified as sick. You can then tweak which threshold you want to use: how sensitive and/or specific do you want to be?
PR and ROC curves (a curve here is a graph showing the trade-off on two axes, one point per threshold) are two ways of measuring/visualising classifier performance. He showed some data: if the data is imbalanced, PR is much better at evaluating your model. He compared balanced and imbalanced data with ROC and there was hardly any change in the curve.
He used scikit-learn for his data evaluations and demos.
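As an illustration, here is a minimal sketch of such a comparison with scikit-learn. The imbalanced synthetic dataset and the logistic regression model are my own stand-ins, not the data from the talk:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, precision_recall_curve, roc_curve
from sklearn.model_selection import train_test_split

# Heavily imbalanced synthetic data: roughly 1% positives ("sick").
X, y = make_classification(n_samples=10_000, weights=[0.99], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Probabilities instead of hard labels, so we can sweep the threshold.
probabilities = model.predict_proba(X_test)[:, 1]

precision, recall, _ = precision_recall_curve(y_test, probabilities)
fpr, tpr, _ = roc_curve(y_test, probabilities)
print("PR AUC:", auc(recall, precision))
print("ROC AUC:", auc(fpr, tpr))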
2026-01-22
(One of my summaries of the Python Leiden meetup in Leiden, NL).
He’s going to revisit common gotchas of Python ORM usage. Plus some Postgresql-specific tricks.
ORMs (object relational mappers) define tables, columns etc. using Python concepts: classes, attributes and methods. In your software, you work with objects instead of rows. They can also help with database schema management (migrations and so on). It looks like this:
class Question(models.Model):
    question = models.CharField(...)
    answer = models.CharField(...)
You often have Python “context managers” for database sessions.
ORMs are handy, but you must be aware of what you’re fetching:
# Bad, grabs all objects and then takes the length using python:
questions_count = len(Question.objects.all())
# Good: let the database do it,
# the code does the equivalent of "SELECT COUNT(*)":
questions_count = Question.objects.all().count()
Relational databases allow 1:M and N:M relations. You use them with JOIN in SQL. If
you use an ORM, make sure you use the database to follow the relations. If you first
grab the first set of objects and then grab the second kind of objects with python, your
code will be much slower.
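A Django-style sketch of the difference (the “category” foreign key and “tags” relation are made-up illustrations):

# N+1 problem: one query for the questions, plus one extra query
# per question to fetch its related category.
for question in Question.objects.all():
    print(question.category.name)

# Better: let the database do a JOIN, one query in total.
for question in Question.objects.select_related("category"):
    print(question.category.name)

# For N:M (and reverse 1:M) relations, prefetch_related needs
# just two queries instead of N+1.
for question in Question.objects.prefetch_related("tags"):
    print([tag.name for tag in question.tags.all()])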
“Migrations” generated by your ORM to move from one version of your schema to the next are really handy. But not all SQL concepts can be expressed in an ORM: custom types and stored procedures, for instance. You have to handle those yourself. You can also get undesired behaviour, as some schema changes can take a long time rebuilding tables on a big database.
Migrations are nice, but they can lead to other problems from a database maintainer’s point of view, like performance suddenly dropping. And optimising is hard, as often you don’t know who is connecting how much, and you also don’t know what is being queried. Some solutions for postgresql:
log_line_prefix = '%a %u %d' to show who is connecting to which database.
log_min_duration_statement = 1000 logs every query taking more than 1000ms.
log_lock_waits = on for feedback on blocking operations (like migrations).
Handy: feedback on the number of queries being done, as simple programming errors can translate into lots of small queries instead of one faster bigger one.
If you’ve found a slow query, run that query with EXPLAIN (ANALYZE, BUFFERS) the-query. BUFFERS tells you how many 8kB pages the server uses for your query (and whether those were memory or disk pages). This is so useful that they made it the default in postgresql 18.
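If you want to run that from Python, here is a small sketch with psycopg (version 3; the connection string and table name are placeholders):

import psycopg

with psycopg.connect("dbname=mydb") as conn:
    with conn.cursor() as cur:
        cur.execute(
            "EXPLAIN (ANALYZE, BUFFERS) "
            "SELECT count(*) FROM questions WHERE answer IS NULL"
        )
        # Every row of EXPLAIN output is a one-column tuple.
        for (line,) in cur.fetchall():
            print(line)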
Some tools:
RegreSQL: performance regression testing. You feed it a list of queries that you worry about. It will store how those queries are executed and compare it with the new version of your code and warn you when one of those queries suddenly takes a lot more time.
Squawk: tells you (in CI, like github actions) which migrations are backward-incompatible or that might take a long time.
You can look at one of the branching tools, aimed at getting safe access to production-like databases for testing: like running your migration against a “branch”/copy of production. There are several tricks in use, like copy-on-write filesystem layers. “pg_branch” and “pgcow” are examples. Several DB-as-a-service products also provide it (Databricks Lakebase, Neon, Heroku, Postgres.ai).
2025-11-19
I’m used to running pre-commit autoupdate regularly to update the versions of the
linters/formatters that I use. Especially when there’s some error.
For example, a couple of months ago, there was some problem with ansible-lint.
You have an ansible-lint, ansible and ansible-core package and one of them
needed an upgrade. I’d get an error like this:
ModuleNotFoundError: No module named 'ansible.parsing.yaml.constructor'
The solution: pre-commit autoupdate, which grabbed a new ansible-lint version
that solved the problem. Upgrading is good.
But… a little over a month ago, ansible-lint pinned python to 3.13 in the pre-commit hook. So when you update, you suddenly need to have 3.13 on your machine. I have that locally, but on the often-used “ubuntu-latest” (24.04) github action runner, only 3.12 is installed by default. Then you’d get this:
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/ansible-community/ansible-lint.git.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: CalledProcessError: command:
('/opt/hostedtoolcache/Python/3.12.12/x64/bin/python', '-mvirtualenv',
'/home/runner/.cache/pre-commit/repomm4m0yuo/py_env-python3.13', '-p', 'python3.13')
return code: 1
stdout:
RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.13'
stderr: (none)
Check the log at /home/runner/.cache/pre-commit/pre-commit.log
Error: Process completed with exit code 3.
Ansible-lint’s pre-commit hook needs 3.10+ or so, but won’t accept anything except 3.13. Here’s the change: https://github.com/ansible/ansible-lint/pull/4796 (including some comments that it is not ideal, including the github action problem).
The change apparently gives a good error message to people running too-old python versions, but it punishes those that do regular updates (and have perfectly fine non-3.13 python versions). A similar pin was done in “black” and later reverted (see the comments on this issue) as it caused too many problems.
Note: this comment gives some of the reasons for hardcoding 3.13. Pre-commit itself doesn’t have a way to specify a minimum Python version. Apparently old Python versions can lead to weird install errors, though I haven’t found a good ticket about that in the issue tracker. The number of issues in the tracker is impressively high, so I can imagine such a hardcoded version helping a bit.
Now on to the “fix”. Override the language_version like this:
- repo: https://github.com/ansible-community/ansible-lint.git
  hooks:
    - id: ansible-lint
      language_version: python3  # or python3.12 or so
If you use ansible-lint a lot (like I do), you’ll have to add that line to all your (django) project repositories when you update your pre-commit config…
I personally think this pinning is a bad idea. After some discussion in issue 4821 I created a sub-optimal proposal to at least set the default to 3.12, but that issue was closed and locked because I apparently “didn’t search the issue tracker”.
Anyway, this blog post hopefully helps people adjust their many pre-commit configs.
2025-11-13
My summaries from the sixth Python meetup in Leiden (NL).
His first experience with Mongodb was when he had to build a patient data warehouse based on literature. He started with postgres, but the fixed table structure was very limiting. Mongodb was much more flexible.
Postgres is a relational database, Mongodb is a document database. Relational: tables, clearly defined relationships and a pre-defined structure. Document/nosql: documents, flexible relationships and a flexible structure.
Nosql/document databases can scale horizontally. Multiple servers, connected. Relational databases have different scaling mechanisms.
Why is mongo such a nice combination with python?
The PyMongo package is great and has a simple syntax.
It is easily scalable
Documents are in BSON format (“binary json”) which is simple to use and pretty efficient.
He showed example python code, comparing a mysql example with a Mongodb version. The Mongodb version did indeed look simpler.
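For an impression, here is a minimal PyMongo sketch of my own (the connection string, database and field names are made up, this is not the presenter’s example):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["hospital"]

# No schema needed up front: just insert a document.
db.patients.insert_one(
    {"name": "Jane", "age": 54, "diagnoses": ["rare-disease-x"]}
)

# Queries are dicts, too.
for patient in db.patients.find({"age": {"$gt": 50}}):
    print(patient["name"])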
The advantage of Mongodb (the freedom) also is its drawback: you need to do your own validation and your own housekeeping, otherwise your data slowly becomes unusable.
Mathijs is now only using Mongodb, mostly because of the speed of development he enjoys with it.
He showed a couple of videos of drummers, some with and some without “blast beats”. In metal (if I understood correctly) it means a lot of bass drum, but essentially also a “machine gun” on the snare drum. He likes this kind of music a lot, so he wanted to analyse it programmatically.
He used the demucs library for his blast beat counter project. Demucs separates different instruments out of a piece of music.
With fourier transforms, he could analyse the frequencies. Individual drum sounds (snare drum hit, bass drum hit) were analysed this way.
With the analysed frequency fingerprints, he could recognise them in a piece of music, count occurrences and pick out the blast beats. He had some nice visualisations, too.
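To give an idea of the fourier part, a sketch of my own (not his code) that picks the dominant frequencies out of a short snippet with numpy:

import numpy as np

def dominant_frequencies(samples, rate, top=3):
    """Return the strongest frequencies (in Hz) in an audio snippet."""
    spectrum = np.abs(np.fft.rfft(samples))
    frequencies = np.fft.rfftfreq(len(samples), d=1 / rate)
    strongest = np.argsort(spectrum)[-top:]
    return sorted(frequencies[strongest])

# A fake 200 Hz "drum hit" with some noise, sampled at 44.1 kHz.
rate = 44_100
t = np.linspace(0, 0.1, int(rate * 0.1), endpoint=False)
hit = np.sin(2 * np.pi * 200 * t) + 0.1 * np.random.randn(t.size)
print(dominant_frequencies(hit, rate))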
He was asked to analyse “never gonna give you up” by Rick Astley :-) Downloading it from youtube, separating out the drums, analysing it, visualising it: it worked! Nice: live demo. (Of course there were no blast beats in the song.)
Live demo time again! He built a quick jekyll site (static site generator) and he’s got a small hetzner server. Just a bit of apache config and he’s got an empty directory that’s being hosted on a domain name. He quickly did this by hand.
Next he added his simple code to a git repo and uploaded it to github.
A nice trick for Github actions is self-hosted runners. They’re easy to install, just follow the instructions on Github.
The runner can then run what’s in your github’s action, like “generate files with jekyll and store them in the right local folder on the server”.
The runner runs on your server, running your code: a much nicer solution than giving your ssh key to Github and having it log into your server. You can also use it on some local computer without an external address: the runner polls Github instead of Github having to reach your machine.
The auto-deploy worked. And while he was busy with his demo, two PRs with changes to the static website had already been created by other participants. He merged them and the site was indeed updated right away.
2025-11-04
(One of my summaries of the PyUtrecht meetup in Utrecht, NL).
Note: Victorien is currently the number one person maintaining Pydantic. Pydantic is basically “dataclasses with validation”.
There was a show of hands: about 70% use type hints. Type hints have been around since python 3.5. There have been improvements over the years, like str|None instead of Union[str, None] in 3.10, for instance.
Something I didn’t know: you can always introspect type hints when running your python
code: typing.get_type_hints(my_func).
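For example, a quick sketch of my own:

import typing

def greet(name: str, excited: bool = False) -> str:
    return f"Hello {name}{'!' if excited else '.'}"

print(typing.get_type_hints(greet))
# {'name': <class 'str'>, 'excited': <class 'bool'>, 'return': <class 'str'>}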
Getting typing-related changes into Python takes a lot of work. You need to implement the changes in CPython. You have to update the spec. And get it supported by the major type checkers. That’s a real difference with typescript, where typing has been built in from the start.
Something that helps typing in the future is 3.15’s lazy from xxx import yyy import.
There’s an upcoming PEP 764, “inline typed dictionaries”:
def get_movie() -> {"name": str, "year": int}:
    # At least something like this ^^^, I can't type that quickly :-)
    ...
He has some suggestions for a new syntax, using something like <{ .... }>, but
getting a syntax change into Python takes a lot of talking and a really solid proposal.
2025-11-04
(One of my summaries of the PyUtrecht meetup in Utrecht, NL).
“From SNMP to gRPC”. Maurice is working on network automation. (The link goes to his github account, the presentation’s demo code is there).
SNMP, the Simple Network Management Protocol, has been the standard for network monitoring since the late 1980s. But its age is showing. It is polling-based, which is wasteful: the mechanism continually polls the endpoints. It is like checking for new messages on your phone every minute instead of relying on push messaging.
The better way is streaming telemetry, the push model. He uses gRPC, “A high performance, open source universal RPC framework” and gNMI, “gRPC Network Management Interface”.
You can ask for capabilities: used in the discovery phase. Get is a simple one-time request for a specific value. With set you can do a bit of configuring. The magic is in subscribe: it creates a persistent connection, allowing the device to continuously stream data back to the client (according to the settings in the subscription request).
(For the demo, he used pygnmi, a handy python library for gNMI.)
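A rough idea of what that looks like (my own sketch with a placeholder address and credentials, not his demo code):

from pygnmi.client import gNMIclient

host = ("192.0.2.1", 57400)  # placeholder device address

with gNMIclient(target=host, username="admin", password="admin",
                insecure=True) as client:
    # Discovery phase: what does the device support?
    print(client.capabilities())
    # One-time request for a specific value.
    print(client.get(path=["/interfaces/interface[name=eth0]/state"]))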
When to use streaming?
With high-frequency monitoring: if you need data more frequently than once every 10 seconds.
When you need real-time alerting.
Large-scale deployments. With lots of devices, the efficiency of streaming starts to pay off.
SNMP is still fine when you have a small setup and high frequency isn’t really needed.
2025-10-16
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
He showed a drawing of Cornelis “wooden leg” Jol, a 17th-century pirate from Sebastiaan’s hometown. Why is he a pirate? He dresses like one, has a wooden leg, murders people like a pirate and even has a parrot: so he’s probably a pirate. For python programmers used to duck typing, this is familiar.
In the 17th century, the Netherlands was economically wealthy and had a big sea-faring empire. But they wanted a way to expand their might without paying for it. So… privatization to the rescue. You gave pirates a vrijbrief, a government letter saying they’ve got some kind of “permission” from the Dutch government to rob and pillage and kill everybody, as long as it wasn’t Dutch people and ships: a privateer. So it looks like a pirate and behaves like a pirate, but technically it isn’t a real pirate.
Now on to today. There are a lot of cyber threats, often state-sponsored. You might have a false sense of security because you work for a relatively small company instead of for a juicy government target. But… privateers are back! Lots of hacking companies have cover from governments - as long as they hack other countries. And hacking small companies can also be profitable.
“I care about security”. Do you really? What do real security people think? They think developers don’t really pay much attention to it. Eye-roll at best, disinterest at worst. Basically, “it is somebody else’s problem”.
What you need is a security culture: buy-in at every level. You can draw an analogy with the safety culture at physically dangerous companies like petrochemical plants. So you, as a developer, should argue for security with your boss. You are a developer, so you have a duty to speak up, just like a generic employee at a chemical plant has the duty to speak up when seeing something risky.
You don’t have to become a security expert (on top of everything else), but you do have to pay attention. Here are some pointers:
“Shift left”. A term meaning you have to do security earlier rather than later. Don’t try to secure your app just before shipping, but take it into account from the beginning. Defense in depth.
“Swiss cheese model”. You have multiple layers in your setup. Every layer has holes, but an attack only gets through when the holes in all the layers line up.
Learn secure design principles. “Deny by default”, “fail securely”, “avoid security by obscurity”, “minimize your attack surface”, etc. Deny by default is a problem in the python world. We’re beginner-friendly, so often everything is open…
Adopt mature security practices. Ignore ISO 27001, that’s too hard to understand. Look at OWASP instead. OWASP DevSecOps maturity model (“pin your artifacts”, for instance).
Know common vulnerabilities. Look at the popular “top 10” lists. Today, SQL injection still claims victims…
Unrelated photo from our 2025 holiday in Austria: center of Neufelden, nicely restored and beautifully painted.
2025-10-16
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
Full title: from flask to fastapi: why and how we made the switch.
He works at “polarsteps”, a travel app. Especially a travel app that will be used in areas with really bad internet connectivity. So performance is top of mind.
They used flask for a long time. Flask 2 added async, but it was still WSGI-bound. They really needed the async scaling possibilities for their 4 million monthly users. Type hinting was also a big wish-list item, for improved reliability.
They switched to fastapi:
True async support. It is ASGI-native.
Typing and validation with pydantic. Pydantic validates requests and responses. Type hints help a lot (see the sketch after this list).
Native auto-generated docs (openapi). Built-in swagger helps for the frontend team.
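A minimal sketch of what that combination looks like (my own example, not Polarsteps code):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Trip(BaseModel):
    name: str
    country: str
    days: int

@app.get("/trips/{trip_id}")
async def get_trip(trip_id: int) -> Trip:
    # The path parameter and response body are validated via the type
    # hints, and the endpoint shows up in the generated openapi docs.
    return Trip(name="Alps", country="AT", days=14)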
This meant they gave up some things that Flask provided:
Flask has a mature ecosystem. So they left a big community + handy heap of stackoverflow answers + lots of ready-made plugins behind.
Integrated command-line dev tools. Flask is handy there.
Simplicity, especially for new devs.
They did a gradual migration. So they needed to build a custom fastapi middleware that could support both worlds. And some api versioning to keep the two code bases apart. It took a lot of time to port everything over.
The middleware was key: completely async, in fastapi. Every request came through it. If needed, a request was routed to Flask via wsgi; if possible it went to the new fastapi part of the code.
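The mounting presumably looked something like this (a sketch using Starlette’s WSGIMiddleware, not their actual custom middleware; in newer Starlette versions you’d use the a2wsgi package instead, and “legacy_app” is a made-up module name):

from fastapi import FastAPI
from starlette.middleware.wsgi import WSGIMiddleware

from legacy_app import flask_app  # the existing Flask application

app = FastAPI()

@app.get("/api/v2/ping")
async def ping():
    # New endpoints live in the fastapi part.
    return {"status": "ok"}

# Everything not handled above falls through to the old Flask code.
app.mount("/", WSGIMiddleware(flask_app))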
For the migration, they made a dashboard of all the endpoints and the traffic volume. They migrated high-traffic APIs first: early infra validation. They paid attention to improvements by checking whether queries got faster. Lots of monitoring of both performance and errors.
Some lessons learned:
Async adds complexity, but pays off at scale. They started the process with 4 million users, now they’re at 20 million.
Pydantic typing catches errors early.
Versioned middleware made incremental delivery safe.
Data-driven prioritization (=the dashboard) beats a big-bang rewrite.
AI helps, but hallucinates too much on complex APIs.
Unrelated photo from our 2025 holiday in Austria: the beautiful ‘große Mühl’ river valley.
2025-10-16
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
By now, the basics of python type hints are well known:
def something(x: int) -> float:
    ...

def get_person(name: str, age: int|None) -> Person:
    ...
Note: I’ve tried to type along fast enough, but my examples will probably have errors in them, so check the typing documentation! His slides are here, so do check those :-)
Sometimes you can have multiple types for some input. Often the output also changes then. You can accept both input types and suggest both output types, but with @overload you can be more specific:

from typing import overload

@overload
def something(x: str) -> str: ...
@overload
def something(x: int) -> int: ...

def something(x):
    # The actual implementation, without the decorator.
    ...
You can do the same with a generic:

from typing import TypeVar

T = TypeVar("T")

def something(x: T) -> T:
    ...

# New syntax (3.12+)
def something[T](x: T) -> T:
    ...

# Same, but restricted to two types
def something[T: str | int](x: T) -> T:
    ...
Generic classes can be handy, for instance with django:

class ModelManager[T: Model]:
    def __init__(self, model_class: type[T]) -> None:
        ...

    def get(self, pk: int) -> T:
        ...
Type narrowing. Sometimes you accept a broad range of items, but if your check returns True, it means the input is of a specific type:

from typing import Any, TypeGuard

def is_user(obj: Any) -> TypeGuard[User]:
    ...

def something(obj: Any):
    if is_user(obj):
        # From here on, the type checker knows obj is a User.
        ...
Generic **kwargs are a challenge, but there’s support for it:

from typing import TypedDict, Required, Unpack

class SomethingArgs(TypedDict, total=False):
    username: Required[str]
    age: int

def something(**kwargs: Unpack[SomethingArgs]):
    ...
If you return “self” from some class method, you run into problems with subclasses, as normally the method says it returns the parent class. You can use from typing import Self and return the type Self instead.
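A quick sketch of that trick (my own example; Self is available from Python 3.11):

from typing import Self

class QuerySet:
    def filter(self, **kwargs) -> Self:
        # Returning Self means SubQuerySet().filter() is typed as
        # SubQuerySet instead of as the parent QuerySet.
        return self

class SubQuerySet(QuerySet):
    pass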
Nice talk, I learned quite a few new tricks!
Unrelated photo from our 2025 holiday in Austria: church of Neufelden seen on the top of the hill.
2025-10-16
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
(Note: I’ve heard a keynote by Steven at pygrunn 2016.)
Steven is in the python documentary. He co-designed the abc programming language, the predecessor of python. ABC was a research project that was designed for the programmer’s needs. He was also the first user of the open internet in Europe in November 1988, as the CWI research institute had the first 64kbps connection in Europe. And he co-designed html, css, xhtml, rdf, etc.
1988, that’s 37 years ago. But only about 30 years earlier, the first municipality (Norwich, UK) got a computer. 21 huge crates. It ran continuously for 10 years. A modern Raspberry pi would take 5 minutes to do the same work!
Those early computers were expensive: an hour of programming time was a year’s salary for a programmer. So, early programming languages were designed to optimize for the computer. Nowadays, it is the other way around: computers are almost free and programmers are expensive. This hasn’t really had an effect on the way we program.
He’s been working on declarative programming languages. One of the declarative systems is xforms, an xml-based declarative system for defining applications. It is a w3c standard, but you rarely see it mentioned. But quite some companies and government organisations use it, like the Dutch weather service (KNMI).
The NHS (UK nationwide health service) had a “Lorenzo” system for UK patient records that cost billions of pounds, took 10 years to build and basically failed. Several hospitals (and now hospitals in Ukraine!) use an xforms-system written in three years by a single programmer. Runs, if needed, on a Raspberry pi.
He thinks declarative programming allows programmers to be at least ten times more productive. He thinks eventually everyone will program declaratively: fewer errors, more time, more productivity. (And there’s a small conference in Amsterdam in November.)
Unrelated photo from our 2025 holiday in Austria: in Vienna/Wien I visited the military museum. This is the car in which archduke Franz Ferdinand was shot in Sarajevo in 1914.