Reinout van Rees’ weblog

Pycon NL: keynote: how not to get fooled by your data while AI engineering - Sofie van Landeghem

2025-10-16

Tags: python, pycon

(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).

(Sofie helps maintain FastAPI, Typer and spaCy; this talk is all about AI).

Sofie started with an example of a chatbot getting confused about the actual winner of an F1 race after disqualification of the winner. So you need to have a domain expert on board who can double-check the data and the results.

Let’s say you want your chatbot output to link to Wikipedia for important terms. That’s actually a hard task, as it has to do normalization of terms, differentiating between Hamilton-the-driver, Hamilton-the-town, Hamilton-the-founding-father and more.

There’s a measure for quality of output that’s called an “F-score”. She used some AI model to find the correct page and got a 79.2% F-score. How good or bad is it?

For this, you can try to determine a reasonable baseline. You might think “guessing already gets you 50%”. No: there are 7 million Wikipedia pages, so random guessing gives a practically 0% F-score. Let’s instead pick all the pages that actually mention the word “Hamilton”. If we then also look at longer terms like “Alexander Hamilton” or “Lewis Hamilton”, we can reason that a basic, regular non-AI approach should get at least 78%, so the AI model’s 79.2% isn’t impressive.
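For reference, a minimal sketch of how such an F-score can be computed, assuming predictions and gold annotations are sets of (mention, Wikipedia page) pairs. The talk didn’t show code, and the page names below are made up. The F-score (F1) is the harmonic mean of precision and recall:

```python
def f_score(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> float:
    """F1: harmonic mean of precision and recall over (mention, page) pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(gold)  # share of gold links that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for "Hamilton won at Spa"
gold = {("Hamilton", "Lewis_Hamilton"), ("Spa", "Circuit_de_Spa-Francorchamps")}
predicted = {("Hamilton", "Lewis_Hamilton"), ("Spa", "Spa,_Belgium")}

print(f_score(predicted, gold))  # 0.5: one of two predictions correct, one of two gold links found
```

With 7 million candidate pages, a random guess almost never lands in the gold set, which is why the random baseline is practically zero.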

The highest reachable quality depends on the data itself and what people expect. “Hamilton won at Spa”, do you expect Spa to point at the town or at the circuit? The room voted 60/40, so even the best answer itself can’t be 100% correct :-)

A tip: if you get a bad result, investigate the training data to see if you can spot some structural problem (which you can then fix). Especially if you have your own annotated data. In her example, some of the annotators annotated circuit names including the “GP” or “grand prix” name (“Monaco GP”) and others just the town name (“Spa”).

Some more tips:

  • Ensure your label scheme is consistent.

  • Draft clear annotation guidelines.

  • Measure inter-annotator agreement (IAA). So measure how much your annotators agree on terms. An article on F1 and politics: how many annotate it as politics and how many as F1?

  • Consider reframing your task/guidelines if the IAA is low.

  • Model uncertainty in your annotation workflow.

  • Identify structural data errors.

  • Apply to truly unseen data to measure your model’s performance.

  • Make sure you climb the right hill.
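The talk didn’t prescribe a specific IAA metric. A common choice for two annotators is Cohen’s kappa, which corrects the raw agreement for the agreement you’d expect by chance. A minimal sketch, with made-up labels for the F1-versus-politics example:

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if both annotators labelled at random with their own label frequencies
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one single, identical label everywhere
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four articles
annotator_1 = ["F1", "F1", "politics", "F1"]
annotator_2 = ["F1", "politics", "politics", "F1"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Raw agreement here is 75%, but half of that is expected by chance alone, so kappa is only 0.5. A low value like that is the signal to reconsider the task or the guidelines.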

https://reinout.vanrees.org/images/2025/austria-vacation-8.jpeg

Unrelated photo from our 2025 holiday in Austria: just over the border in Germany, we stayed two days in Passau. View from the ‘Oberhaus’ castle on three rivers combining, with visibly different colors. From the left, the small, dark ‘Ilz’. The big, drab-colored one in the middle is the ‘Donau’ (so ‘schöne blaue Donau’ should be taken with a grain of salt). From the right, also big, the much lighter ‘Inn’ (lots of granite sediment from the Alps, here).

Pycon NL: workshop: measuring and elevating quality in engineering practice - Daniele Procida

2025-10-16

Tags: python, pycon, django

(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).

Daniele works as director of engineering at Canonical (the company behind Ubuntu). What he wants to talk about today is how to define, measure and elevate engineering quality at scale. That’s his job. He needs to influence/change that in an organization with a thousand technical people in dozens of teams with 100+ projects. Ideally, they all converge on the standards of quality he has defined, and there’s only one of him. Engineering people are opinionated people :-)

Your personal charm and charisma wear thin after a while: there needs to be a different way. So: how can you get 1000+ people to do what you want, the way you want, ideally somewhat willingly? You cannot make people do it. You’ll have to be really enthusiastic about it.

He suggests three things:

  • Principle. Description of quality as objective conditions, allowing it to be defined and measured.

  • Tool. A simple dashboard, that reinforces your vision of quality and reflects it back to your teams. Daniele focuses on documentation, and showed a dashboard/spreadsheet that showed the documentation status/progress of various projects. You can do the same for “security” for instance.

  • Method. A way of drawing your teams into your vision, so that they actively want to participate.

It being a workshop, we worked through a few examples. Someone mentioned “improved test coverage in our software”.

  • Describe your aim(s). What do you want. What is the background documentation. What is your reason.

  • You need objectives on various levels. “Started”, “first results”, “mature”. And you can have those levels for each of your aims/categories. Start small and start specific.

    • Started. “The team understands the significance of automated testing”. “We have coverage information about tests”.

    • First results. “There is a significant increase in test coverage”. Note: “significant” means you have something to talk about. You can be reasonable on the one hand, but you can also call out low numbers. Human-sized words with value, like “significant”, help internalize it, much more than a number like “25%” would ever do. You don’t want to check off the box “25%”, you want to be able to claim that your team now has significant test coverage!

    • Mature. Let’s keep it simple with “100% test coverage”.

  • Measure which level each project is at right now. Show it in a dashboard. He used a Google spreadsheet previously; now it is a Django website, which he’ll soon make public so that it is visible to everybody. This helps draw teams into it.
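The dashboard itself wasn’t shown as code. As a rough sketch of the idea, assuming each level is defined by a list of objective conditions and a project only reaches a level when all conditions of that level and of the levels below it are met (the names Level, Aim and dashboard and the condition texts are hypothetical):

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    NOT_STARTED = 0
    STARTED = 1
    FIRST_RESULTS = 2
    MATURE = 3


@dataclass
class Aim:
    name: str
    # Objective conditions that must all hold to reach each level
    conditions: dict[Level, list[str]]

    def level_reached(self, evidence: set[str]) -> Level:
        """Highest level whose conditions (and those of all lower levels) are met."""
        reached = Level.NOT_STARTED
        for level in sorted(self.conditions):
            if not all(condition in evidence for condition in self.conditions[level]):
                break
            reached = level
        return reached


def dashboard(aim: Aim, projects: dict[str, set[str]]) -> list[tuple[str, str]]:
    """One (project, level) row per project, ready for a spreadsheet or web page."""
    return [(name, aim.level_reached(evidence).name) for name, evidence in sorted(projects.items())]


testing = Aim(
    "Automated testing",
    {
        Level.STARTED: ["team understands automated testing", "coverage is measured"],
        Level.FIRST_RESULTS: ["significant increase in coverage"],
        Level.MATURE: ["100% coverage"],
    },
)
projects = {
    "project-a": {"team understands automated testing", "coverage is measured"},
    "project-b": {"team understands automated testing"},
}
for row in dashboard(testing, projects):
    print(row)
```

Because the conditions are written down as objective statements, a project’s level follows from the evidence instead of from anyone’s opinion, which matches the “objectification” point below.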

Why does this work with human beings?

  • Peer pressure. People see their peers doing the right thing. People want to be seen doing the right thing.

  • Objectification. The contract and the results are described objectively. The conditions and evidence stand outside you: it is not personal anymore, so it is not a threat.

Humans are funny creatures. As soon as they believe in something, it will carry them over many bumps in the road.

People love to see their work recognized. So if you maintain a spreadsheet with all the projects’ results and progress, you won’t have to ask them for an update: they will bug you if the spreadsheet hasn’t been updated in a while. They really want to see the work they’ve put in!

You can get a positive feedback loop. If the work you need to do is clear, if the value is clear and if there is recognition, you’ll want to do it almost automatically. And if you do it, you mention it in presentations and discussions with others. Then the others are automatically more motivated to work on it, too.

Giving kids a sticker when they do something successfully really helps. It also works for hard-core programmers and team managers!

https://reinout.vanrees.org/images/2025/austria-vacation-7.jpeg

Unrelated photo from our 2025 holiday in Austria: just over the border in Germany, Passau has a nice cathedral.

Pycon NL: kedro, lessons from maintaining an open source framework - Merel Theisen

2025-10-16

Tags: python, pycon

(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).

Full title: leading kedro: lessons from maintaining an open source python framework.

Merel is the tech lead of the python open source framework kedro.

What is open source? Ok, the source code is publicly available for anyone to use, modify and share. But it is also a concept of sharing. Developing together. “Peer production”. It also means sharing of technical information and documentation. The actual term “open source” was coined in the 1990s. Another important milestone: GitHub was launched in 2008, greatly easing open source development.

Kedro is a python toolbox that applies software engineering principles to data science code, making it easier to go from prototype to production. It was started in 2017 and open sourced in 2019, which made it much easier to collaborate with others outside the original company (QuantumBlack). (Note: Kedro has now been donated to the Linux Foundation).

Open source also means maintenance challenges. It is not just code. Code is the simple part. How to attract contributors? How to get good quality contributions? What to accept/reject? How to balance quick wins with the long term vision of the project? How to make contributors come back?

What lessons did they learn?

  • Importance of contributor guidance. They themselves had high standards with good programming practices. How much can you ask from new contributors? They noticed they needed to improve their documentation a lot. And they had to improve their tooling. If you want well-formatted code, you need easy tools to do the formatting, for instance. And you need to actually document your formatting guidelines :-)

  • Response time is important. Response time for issues, pull requests and support. If you don’t get a timely answer, you’ll lose interest as contributor. Also: tickets need to be polished and made clearer so that new contributors can help fix them.

  • Sharing pain points is a contribution, too. More contributors and users automatically mean more feature requests. But you don’t want your project to become a Frankenstein monster… A configuration file, for instance, can quickly become too cluttered because of all the options. Sometimes you need to evolve the architecture to deal with common problems. Users will tell you what they want, but perhaps it can be solved differently.

  • The importance of finding contribution models that fit. Perhaps a plugin mechanism for new functionality? Perhaps a section of the code marked “community” without the regular project’s guarantees about maintenance and longevity?

  • Be patient and kind. “Open source” means “people”. Code is the easy part, people add complexity. Maintainers can be defensive and contributors can be demanding.

https://reinout.vanrees.org/images/2025/austria-vacation-6.jpeg

Unrelated photo from our 2025 holiday in Austria: Neufelden has a dam+reservoir, the water travels downstream by underground pipe to the hydropower plant. At this point the pipe comes to the surface and crosses the river on a concrete construction. Nearby, the highest road bridge in this region also crosses.

Pycon NL: don’t panic, a developer’s guide to security - Sebastiaan Zeeff

2025-10-16

Tags: python, pycon

(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).

He showed a drawing of Cornelis “wooden leg” Jol, a pirate from the 17th century from Sebastiaan’s hometown. Why is he a pirate? He dresses like one, has a wooden leg, murders people like a pirate and even has a parrot, so he’s probably a pirate. For python programmers used to duck typing, this is familiar.

In the 17th century, the Netherlands were economically wealthy and had a big sea-faring empire. But they wanted a way to expand their might without paying for it. So… privatization to the rescue. You give pirates a vrijbrief, a government letter saying they’ve got some kind of “permission” from the Dutch government to rob and pillage and kill everybody, as long as they aren’t Dutch people and ships. A privateer. So it looks like a pirate and behaves like a pirate, but it technically isn’t a real pirate.
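For the python side of the analogy, a small duck typing sketch (the class and function names are made up): code that only calls plunder() happily accepts anything that behaves like a pirate, while an isinstance() check shows that a privateer technically isn’t one.

```python
class Pirate:
    def plunder(self) -> str:
        return "plundering"


class Privateer:
    # Not a subclass of Pirate, but it behaves exactly like one
    def plunder(self) -> str:
        return "plundering, with a vrijbrief"


def raid(ship) -> str:
    # Duck typing: no type check, we just call the method we need
    return ship.plunder()


for ship in (Pirate(), Privateer()):
    print(type(ship).__name__, raid(ship), isinstance(ship, Pirate))
```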

Now on to today. There are a lot of cyber threats. Often state-sponsored. You might have a false sense of security in working for a relatively small company instead of for a juicy government target. But… privateers are back! Lots of hacking companies have cover from their government, as long as they hack other countries. And hacking small companies can also be profitable.

“I care about security”. Do you really? What do real security people think? They think developers don’t really pay much attention to it. Eye-roll at best, disinterest at worst. Basically, “it is somebody else’s problem”.

What you need is a security culture: buy-in at every level. You can draw an analogy with the safety culture at physically dangerous companies, like in the petrochemical industry. So you, as a developer, should argue for security with your boss. You are a developer, so you have a duty to speak up, just like a regular employee at a chemical plant has the duty to speak up when seeing something risky.

You don’t have to become a security expert (on top of everything else), but you do have to pay attention. Here are some pointers:

  • “Shift left”. A term meaning you have to do it earlier rather than later. Don’t try to secure your app just before shipping, but take it into account from the beginning. Defense in depth.

  • “Swiss cheese model”. You have multiple layers in your setup, each with some holes. An attack only gets through when there is a hole in every layer at the same spot, so every extra layer makes that less likely.

  • Learn secure design principles. “Deny by default”, “fail securely”, “avoid security by obscurity”, “minimize your attack surface”, etc. Deny by default is a problem in the python world. We’re beginner-friendly, so often everything is open…

  • Adopt mature security practices. Ignore ISO 27001, that’s too hard to understand. Look at OWASP instead. OWASP DevSecOps maturity model (“pin your artifacts”, for instance).

  • Know common vulnerabilities. Look at the popular “top 10” lists. Even today, SQL injection still claims victims…
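As a minimal illustration of that last point, with a made-up users table in an in-memory sqlite database: building SQL with string formatting lets crafted input rewrite the query, while a parameterized query passes the input along as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_user_unsafe(name: str) -> list[tuple[str]]:
    # VULNERABLE: the input becomes part of the SQL text itself
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str) -> list[tuple[str]]:
    # Parameterized: the driver passes the input as a value, never as SQL
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


malicious = "nobody' OR '1'='1"
print(find_user_unsafe(malicious))  # returns every user
print(find_user_safe(malicious))  # []
```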

https://reinout.vanrees.org/images/2025/austria-vacation-5.jpeg

Unrelated photo from our 2025 holiday in Austria: center of Neufelden, nicely restored and beautifully painted.

Pycon NL: tooling with purpose - Aris Nivortis

2025-10-16

Tags: python, pycon

(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).

Full title: tooling with purpose: making smart choices as you build.

Aris is a geophysicist: he uses python and data to answer research questions about everything under the ground.

As a programmer you have to make lots of choices. Python environment, core project tooling, project-specific tooling, etc.

First: python environment management: pyenv/venv/pip, poetry, uv. And conda/pixi for the scientific python world. A show of hands showed uv to be really popular.

Now core project tooling. Which project structure? Do you use a template/cookiecutter for it? Subdirectories? A testing framework? Pytest is the default, start with that. (He mentioned “doctests” becoming very popular: that surprised me, as they were popular before 2010 and started to be considered old and deprecated after 2010. I’ll need to investigate a bit more).
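For reference, a doctest is an example of an interactive python session inside a docstring; the standard library’s doctest module runs it and checks that the output matches. A minimal sketch (the add_vat function is made up):

```python
def add_vat(price: float, rate: float = 0.21) -> float:
    """Return the price including VAT, rounded to cents.

    >>> add_vat(100)
    121.0
    >>> add_vat(100, rate=0.09)
    109.0
    """
    return round(price * (1 + rate), 2)


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # silent on success, prints a report for failing examples
```

Pytest can also collect these examples with its --doctest-modules option, so they can live alongside regular tests.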

Linting and type checking? Start with ruff for formatting/checking. Mypy is the standard type checker, but pyright/vscode and pyre are options. And the new ty is alpha, but looks promising.

Also, part of the core tooling: do you document your code? At least a README.

For domain specific tooling there are so many choices. It is easy to get lost. What to use for data storage? Web/API? Visualization tools. Scientific libraries.

Choose wisely! With great power comes great responsibility, but with great power also comes the burden of decision-making. Try to standardize. Enforce policies. Try to keep it simple.

Be aware of over-engineering. Over-engineering often comes with good intentions. And… sometimes complexity is the right path. As an example, look at database choices. You might wonder between SQL or a no-sql database and whether you need to shard your database. But often a simple sqlite database file is fast enough!

Configuration management: start with a simple os.getenv() and grab settings from environment variables. Only start using .toml files when that no longer fits your use case.
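A minimal sketch of that approach, with made-up setting names prefixed with MYAPP_. Each os.getenv() call has a default, so a prototype runs without any setup:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_path: str
    debug: bool
    max_workers: int


def load_settings() -> Settings:
    """Read settings from environment variables, with defaults for local development."""
    return Settings(
        database_path=os.getenv("MYAPP_DATABASE_PATH", "data.sqlite"),
        # Environment variables are always strings, so convert explicitly
        debug=os.getenv("MYAPP_DEBUG", "false").lower() in {"1", "true", "yes"},
        max_workers=int(os.getenv("MYAPP_MAX_WORKERS", "4")),
    )


settings = load_settings()
```

If this outgrows environment variables, the standard library’s tomllib (python 3.11+) can read a .toml file into a dict, which could feed the same Settings class.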

Web/api: start simple. You probably don’t need authentication from the start if it is just a quick prototype. Get something useful working, first. Once it works, you can start working on deployment or a nicer frontend.

Async code is often said to be faster. But debugging is time-consuming and hard. Error handling is different. It only really pays off when you have many, many concurrent operations. Profile your code before you start switching to async. It won’t speed up CPU-bound code.
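A small sketch of where async does pay off: many concurrent I/O-bound operations. Here asyncio.sleep() stands in for a network or database call (an assumption for illustration). Because the waits overlap, 100 calls of 0.1 seconds take roughly 0.1 seconds in total instead of 10.

```python
import asyncio
import time


async def fake_io_call(i: int, delay: float) -> int:
    # Stand-in for a network or database call; while one task waits, others run
    await asyncio.sleep(delay)
    return i


async def run_many(n: int, delay: float) -> list[int]:
    # gather() runs all coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fake_io_call(i, delay) for i in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_many(100, 0.1))
    print(len(results), f"calls in {time.perf_counter() - start:.2f}s")
```

A CPU-bound loop inside fake_io_call would block the event loop and the calls would effectively run one after another: that is the “it won’t speed up CPU-bound code” caveat.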

Logging: just start with the built-in logging module. Basic logging is better than no logging. Don’t start the Perfect Fancy Logging Setup until you have the basics running.
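Basic logging with the standard library can be as small as this sketch (the process function is a made-up example): modules ask for a named logger, and the application configures the output once at startup with logging.basicConfig().

```python
import logging

# One logger per module, named after the module
logger = logging.getLogger(__name__)


def process(items: list[str]) -> int:
    # %-style arguments are only formatted if the message is actually emitted
    logger.info("Processing %d items", len(items))
    processed = 0
    for item in items:
        if not item:
            logger.warning("Skipping empty item")
            continue
        processed += 1
    return processed


if __name__ == "__main__":
    # Configure output once, in the application's entry point
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    process(["a", "", "b"])
```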

Testing is good and recommended, but don’t go overboard. Don’t “mock” everything to get 100% coverage. Those kinds of tests break often. And often the tests test the mock instead of your actual code. Aim for roughly as much test code as actual code.

Some closing comments:

  • Sometimes simple choices are better.

  • Don’t let decision-making slow you down. Start making prototypes.

  • One-size-fits-all solutions don’t exist. Evaluate for your use case.

  • If you are an experienced developer, help your colleagues. They have to make lots of choices.

  • Early-career developer? Luckily a lot of choices are already made for you due to company policy or because the project you’re working on already made most choices for you :-)

https://reinout.vanrees.org/images/2025/austria-vacation-4.jpeg

Unrelated photo from our 2025 holiday in Austria: Neufelden station. From a 1991 train trip. I remembered the valley as being beautiful. As we now do our family holidays by train, I knew where to go as soon as Austria was chosen as destination.

Pycon NL: from flask to fastapi - William Lacerda

2025-10-16

Tags: python, pycon, django

(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).

Full title: from flask to fastapi: why and how we made the switch.

He works at Polarsteps, a travel app that is often used in areas with really bad internet connectivity. So performance is top of mind.

They used flask for a long time. Flask 2 added async, but it was still WSGI-bound. They really needed the async scaling possibility for their 4 million monthly users. Type hinting was also a big wish item for improved reliability.

They switched to fastapi:

  • True async support. It is ASGI-native

  • Typing and validation with pydantic. Pydantic validates requests and responses. Type hints help a lot.

  • Native auto-generated docs (openapi). Built-in swagger helps for the frontend team.

This meant they gave up some things that Flask provided:

  • Flask has a mature ecosystem. So they left a big community + handy heap of stackoverflow answers + lots of ready-made plugins behind.

  • Integrated command-line dev tools. Flask is handy there.

  • Simplicity, especially for new devs.

They did a gradual migration. So they needed to build a custom fastapi middleware that could support both worlds. And some api versioning to keep the two code bases apart. It took a lot of time to port everything over.

The middleware was key. Completely async in fastapi. Every request came through here. If needed, a request would be routed to Flask via wsgi, if possible it would go to the new fastapi part of the code.
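The talk didn’t show the middleware code. As a rough, framework-free sketch of the routing idea, assuming migrated endpoints live under a new version prefix: a plain ASGI callable sends each request either to the new app or to the legacy app. Both apps here are hypothetical stand-ins; in a real setup the legacy side would be the Flask app wrapped in a WSGI-to-ASGI adapter and the new side the FastAPI app.

```python
async def send_text(send, body: str) -> None:
    # Minimal ASGI HTTP response: a start message followed by the body
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": body.encode()})


async def new_app(scope, receive, send) -> None:
    # Stand-in for the FastAPI application
    await send_text(send, "served by new app")


async def legacy_app(scope, receive, send) -> None:
    # Stand-in for the Flask app behind a WSGI-to-ASGI adapter
    await send_text(send, "served by legacy app")


# Hypothetical: path prefixes of endpoints that have already been migrated
MIGRATED_PREFIXES = ("/v2/",)


async def router(scope, receive, send) -> None:
    """ASGI entry point: every request passes through here first."""
    path = scope.get("path", "")  # sketch: HTTP only, lifespan events are not handled
    target = new_app if path.startswith(MIGRATED_PREFIXES) else legacy_app
    await target(scope, receive, send)
```

Migrating an endpoint then comes down to adding its prefix to MIGRATED_PREFIXES, which fits the dashboard-driven, high-traffic-first approach described below.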

For the migration, they made a dashboard of all the endpoints and the traffic volume. They migrated high-traffic APIs first: early infra validation. Attention to improvements by checking if the queries were faster. Lots of monitoring of both performance and errors.

Some lessons learned:

  • Async adds complexity, but pays off at scale. They started the process with 4 million users, now they’re at 20 million.

  • Pydantic typing catches errors early.

  • Versioned middleware made incremental delivery safe.

  • Data-driven prioritization (=the dashboard) beats a big-bang rewrite.

  • AI helps, but hallucinates too much on complex APIs.

https://reinout.vanrees.org/images/2025/austria-vacation-3.jpeg

Unrelated photo from our 2025 holiday in Austria: the beautiful ‘große Mühl’ river valley.

Pycon NL: typing your python code like a ninja - Thiago Bellini Ribeiro

2025-10-16

Tags: python, pycon, django

(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).

By now, the basics of python type hints are well known:

def something(x: int) -> float:
    ...

def get_person(name: str, age: int|None) -> Person:
    ...

Note: I’ve tried typing (…) fast enough, but my examples will probably have errors in them, so check the typing documentation! His slides are here so do check those :-)

Sometimes you can have multiple types for some input. Often the output type then changes, too. You can accept both input types and declare both output types, but with @overload you can be more specific about which input gives which output:

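from typing import overload

@overload
def something(x: str) -> str:
    ...

@overload
def something(x: int) -> int:
    ...

def something(x: str | int) -> str | int:
    # The actual implementation, without @overload, comes last
    return x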

You can do the same with a generic:

from typing import TypeVar

T = TypeVar("T")

def something(x: T) -> T:
    ...

# New syntax (python 3.12+)
def something[T](x: T) -> T:
    ...

# Same, but T is bounded: only str or int (or their subclasses) are accepted
def something[T: str|int](x: T) -> T:
    ...

Generic classes can be handy for Django, for instance:

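from django.db.models import Model

class ModelManager[T: Model]:
    def __init__(self, model_class: type[T]) -> None:
        ...

    def get(self, pk: int) -> T:
        # Type checkers know this returns the specific model class, not just "a Model"
        ...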

Type narrowing. Sometimes you accept a broad range of items, but a check function can tell the type checker that, if it returns True, the input is of a specific type:

from typing import Any, TypeGuard

def is_user(obj: Any) -> TypeGuard[User]:
    ...

def something(obj: Any):
    if is_user(obj):
        # From here on, typing knows obj is a User
        ...

Typing **kwargs precisely is a challenge, but there’s support for it:

from typing import TypedDict, Required, Unpack

# total=False makes all keys optional; Required[...] marks username as mandatory
class SomethingArgs(TypedDict, total=False):
    username: Required[str]
    age: int

def something(**kwargs: Unpack[SomethingArgs]):
    ...

If you return “self” from some class method, you run into problems with subclasses, as normally the method says it returns the parent class. You can use from typing import Self and annotate the return type as Self instead.

Nice talk, I learned quite a few new tricks!

https://reinout.vanrees.org/images/2025/austria-vacation-2.jpeg

Unrelated photo from our 2025 holiday in Austria: church of Neufelden seen on the top of the hill.

Pycon NL: programming, past and future - Steven Pemberton

2025-10-16

Tags: python, pycon

(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).

(Note: I’ve heard a keynote by Steven at pygrunn 2016.)

Steven is in the python documentary: he co-designed the ABC programming language, the predecessor of python. ABC was a research project that was designed for the programmer’s needs. He also was the first user of the open internet in Europe in November 1988, as the CWI at the university had the first 64kbps connection in Europe. Co-designer of html, css, xhtml, rdf, etc.

1988, that’s 37 years ago. But only about 30 years earlier, the first municipality (Norwich, UK) got a computer. 21 huge crates. It ran continuously for 10 years. A modern Raspberry pi would take 5 minutes to do the same work!

Those early computers were expensive: an hour of programming time was a year’s salary for a programmer. So, early programming languages were designed to optimize for the computer. Nowadays, it is the other way around: computers are almost free and programmers are expensive. This hasn’t really had an effect on the way we program.

He’s been working on declarative programming languages. One of the declarative systems is xforms, an xml-based declarative system for defining applications. It is a w3c standard, but you rarely see it mentioned. But quite a few companies and government organisations use it, like the Dutch weather service (KNMI).

The NHS (UK nationwide health service) had a “Lorenzo” system for UK patient records that cost billions of pounds, took 10 years to build and basically failed. Several hospitals (and now hospitals in Ukraine!) use an xforms-system written in three years by a single programmer. Runs, if needed, on a Raspberry pi.

He thinks declarative programming allows programmers to be at least ten times more productive. He thinks, eventually everyone will program declaratively: fewer errors, more time, more productivity. (And there’s a small conference in Amsterdam in November).

https://reinout.vanrees.org/images/2025/austria-vacation-1.jpeg

Unrelated photo from our 2025 holiday in Austria: in Vienna/Wien I visited the military museum. This is the car in which archduke Franz Ferdinand was shot in Sarajevo in 1914.

Leiden python meetup: memory graph - Bas Terwijn

2025-09-04

Tags: pun, python

(One of my summaries of the fifth Python meetup in Leiden, NL).

Full title of the talk: memory graph: teaching tool and debugging aid in context of references, mutable data types, and shallow and deep copy.

memory_graph is a python debugging aid and teaching tool. It is a modern version of python tutor. (There is an online demo)

Python has two categories of types:

  • Immutable types: bool, int, float, str, tuple, etcetera. They cannot be mutated, so any “change” produces a new object. If you add an item to a tuple, you get a new tuple with the extra item.

  • Mutable types: dicts, lists. You can change their contents in place: you can add items to a list and you still have the same list object.

When you want an actual copy of a mutable type, you need the copy module: import copy, then copy.copy(your_list) for a shallow copy or copy.deepcopy(your_list) for a deep copy.

  • list2 = list1 is just an assignment: both names point at the same list object.

  • list2 = copy.copy(list1) gives you a second, separate, list object, but it points at the same values inside it as list1.

  • list2 = copy.deepcopy(list1) gives you a second, separate, list object and separate copies of the values inside it.

Watch out with the list2 = list1 assignment. When you add an item to list2, it is also “added” to list1 as it is the same.
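A small demonstration of the three cases, using a list of lists so the difference between a shallow and a deep copy becomes visible:

```python
import copy

list1 = [[1, 2], [3, 4]]
alias = list1                # assignment: a second name for the same list
shallow = copy.copy(list1)   # new outer list, same inner lists
deep = copy.deepcopy(list1)  # new outer list and new inner lists

list1[0].append(99)  # change an inner list
list1.append([5])    # change the outer list

print(alias)    # [[1, 2, 99], [3, 4], [5]]
print(shallow)  # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```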

He had a couple of simple exercises for us, which were amusingly hard :-)

Apart from the web online demo, there are also integrations for jupyter notebooks and lots of IDEs. Here’s an animated gif from the github repo:

https://raw.githubusercontent.com/bterwijn/memory_graph/main/images/vscode_copying.gif

Leiden python meetup: HTMX - Jan Murre

2025-09-04

Tags: pun, python, django

(One of my summaries of the fifth Python meetup in Leiden, NL).

The person who invented htmx (Carson Gross) always begins with hypermedia. Hypermedia is media that includes non-linear branching from one location in the media to another, via hyperlinks. HTML is hypermedia: the world’s most successful hypertext.

Another important term: HATEOAS, hypermedia as the engine of application state. With hypermedia, state is on the server. So not in a javascript frontend. With traditional single page apps that you see nowadays, you only read some json and the frontend needs to know what to do with it. Lots of logic is on the client. And you get “javascript fatigue”.

With hypermedia, you have server-side rendering. Minimal javascript. Progressive enhancement. SEO friendly. And… accessible by default. The content you get includes the behaviour (like a link to delete an item).

HTMX extends HTML with modern interactivity using simple attributes. You can target specific elements on your page for an update, so you don’t need to get a full page refresh. And you can use any http verb (get/post/put/delete/patch). And you’re not limited to forms and links: any element can trigger a request.

Some attribute examples:

  • hx-get issues a GET request to the server when you click on the element.

  • hx-post: the same, but with POST.

  • hx-target: where do you want to place what you get back from your get/post?

  • hx-swap: how the returned HTML is put in place, for instance replacing only the target’s contents or the whole target element.

  • hx-trigger: when to do the request. Based on a click or based on a timer, for instance.
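On the server side, htmx just expects HTML back. A minimal, framework-free sketch of a search view (the view, the data and the /search URL are made up): htmx sends an HX-Request: true header with its requests, so the same view can return either the full page or only the fragment to swap in.

```python
from html import escape

FRUIT = ["apple", "banana", "cherry"]  # hypothetical data


def search_view(query: str, headers: dict[str, str]) -> str:
    matches = [item for item in FRUIT if query.lower() in item]
    # The fragment that htmx swaps into the page
    fragment = "<ul id='results'>" + "".join(f"<li>{escape(m)}</li>" for m in matches) + "</ul>"
    # htmx adds "HX-Request: true" to its requests (plain dict here for brevity;
    # real frameworks look up headers case-insensitively)
    if headers.get("HX-Request") == "true":
        return fragment
    # A normal page load gets the full page, including the search field that
    # re-queries /search while typing and replaces the #results list
    return (
        "<html><body>"
        '<input name="q" hx-get="/search" hx-trigger="keyup changed delay:300ms" '
        'hx-target="#results" hx-swap="outerHTML">'
        + fragment
        + "</body></html>"
    )
```

A real page would also load the htmx script; the point is that all logic stays in regular server-side code that returns HTML.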

An advantage of HTMX is maintainability. Complexity is way lower than a single page app. Familiar patterns and regular server-side logic. Much simpler. Accessible (provided you put in some effort).

He showed a nice example with a generated list, a search field and a form. Nice: extra validation on one of the form fields via a hx-post and a hx-trigger="blur".

Nice trick for hx-target: you can give it the value of "closest .some-css-class", then it finds the closest enclosing element with that class.

Other niceties: hx-indicator shows a spinner while a request (like a POST) is in flight and hides it again once the request completes. <div hx-boost="true"> around your content tells HTMX to load the links and forms inside it via AJAX and replace your page’s content with the new page: the result is the same as normally, only without the temporary flicker when loading the page.

HTMX is great for:

  • CRUD applications.

  • Content-heavy sites.

  • Forms and validation.

  • Server-side rendered apps.

  • Progressive enhancement.

  • Moderate interactivity.

You can read a book about HTMX here: https://hypermedia.systems/

In response to a question: the author considers htmx to be complete and finished. What works now should work in 20 years. So it will rarely change (unlike something like react that changes all the time).


Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
