Reinout van Rees’ weblog

Pygrunn keynote: Homo ludens, python and play - Daniele Procida

2025-05-16

Tags: pygrunn, python

(One of my summaries of the 2025 pygrunn conference in Groningen, NL).

Organizing a conference is a weird mix between it-is-not-my-job and it-is-a-lot-of-real-work.

When Daniele Procida learned python 16 years ago, he was often told that learning python is fun! And it is easy! When people say something is fun and easy, you should be suspicious.

It actually wasn’t fun. There was lots of satisfaction, though. Solving problems is nice. He didn’t want to have fun and he didn’t want to be a great programmer: He just wanted to solve problems!

Fun, playful… Python early on had a bit of playfulness. Named after Monty Python, an old BBC comedic television series. Nowadays most people haven’t seen Monty Python on television and actually discover it through the programming language. How does an Indonesian teenager react to the unknown Monty Python jokes?

Same with python-the-snakes. Lots of books with snakes on the cover. Some playful, some aggressive. Some with dripping venom (which doesn’t fit a constricting type of snake…). But try talking about “python” in African countries where pythons actually are a menace…

His definition of humor: the surprising violation of an expectation of congruity. A momentary re-ordering of the world. It gives us a surprising delight.

Python itself is full of expectations that can be violated… He was originally baffled by the term “pythonic”: he was just wondering whether his code worked or not, he didn’t expect his code to be judged on some fit-in-with-the-locals quality like “pythonic”.

An exception means you need a rule. A holiday is something different than regular days. A “holiday from the rule”. The special depends on the normal. Fun is a “holiday from seriousness”. Then: how can “fun” be made the expectation? A constant low-level fun is not funny.

Johan Huizinga wrote “Homo ludens” in 1938. Homo ludens, the playing man. See wikipedia. He says that play is the primary pre-condition of culture. But how? According to Daniele’s examples, the holiday can’t come before the work. You can only have an exception when there’s a normal. God rested on the seventh day, but that could only be rest because he actually worked for six days.

Culture? Players on a stage. Which has rules. A courtroom can be a sort of playground. A kids’ playground has some boundaries. Could there be something to Huizinga’s argument?

At work, you can receive clear outcomes: recognition, rewards. The same system offers penalties. Work is a relatively clear system. A framework of rewards, value, etc. Daniele likes his work (he works for canonical/ubuntu). He’s allowed to collaborate a lot.

What can compete with that? In the weekend or the evenings? Often what you do there is less clear. When is it finished? When did you do a good job? To do something at home, he has to get tools out of the garage and he has to clean up afterwards: just grabbing a laptop at work is much easier. Grabbing his guitar takes more work. And nobody thanks him for playing on it. It can be too much work even to take a holiday from work. To play.

Easy-to-start-with work with good feedback is always easily available…

There’s an asymmetry of performance, failure and accountability. At work, you can get negative feedback. At home, what are they going to do? Sack him? No. Why are people talking about gamifying work? They should be talking about “workifying”! That’s more attractive!

Open source: is it work or a labour of love? What about people pestering an open source project with unwanted contributions? The dance between maintainer and contributor. What about when a state actor manages to get a malicious patch into a central piece of open source software (like what happened last year)?

Does this have to do with the difference between work and play? Could open source software benefit by some explicit rules, some explicit expectations? Social expectations?

https://reinout.vanrees.org/images/2025/pygrunn-6.jpeg

Photo explanation: picture from our Harz (DE) holiday in 2023

Pygrunn: python on a tractor - Wieneke Keller, Sebastian Lenartowicz

2025-05-16

Tags: pygrunn, python

(One of my summaries of the 2025 pygrunn conference in Groningen, NL).

How to become an apple farmer in 5 minutes:

  • You want mid-size apples, as they fetch the best price.

  • You want blossom on your tree, but not too much. Otherwise the tree has to divide its water and nourishment over more apples, which makes them smaller…

They work at aurea imaging, maker of “treescout”. The treescout is a device/camera on top of a tractor that drives along the apple trees in an orchard. The device detects the tree, blossoms and other attributes of the tree. The first use case: blossom thinning to aim at the right size of apples. Blossom thinning happens with a sprayer. The blossom lasts for just two or three weeks.

The first season they tried their project was educational. Lots of problems :-) GPS tracks that were not straight. Detected trees that were not in the right location. Etcetera.

Some of the challenges:

  • It is on a vehicle, so limited power.

  • It is on a farm, so very limited or no connectivity.

  • Agricultural standards are actually from the maritime industry. GPS that is accurate within a few meters is fine, there. But not detailed enough for locating trees…

  • They were a software company, so they had to use off-the-shelf components.

The first blossom season, their architecture looked like this:

  • Python monolith with multiprocessing.

  • Restarting upon failure was hard due to them using unix pipes for inter-process communication.

  • Poor separation of responsibilities.

The second blossom season they changed several things.

  • They used k3s (a lightweight kubernetes).

  • A single-node k3s cluster sitting on top of a tractor.

  • K3s is responsible for the runtime environment and workload management.

  • Rabbitmq for inter-process communication.

  • A Kubernetes cluster really helps with rolling out quick updates.

A problem they noticed is python’s limited concurrency: some central components became a bottleneck.

What they’re working on:

  • ArgoCD, open cluster management, kairos (“in-place os updates”), embedded linux. They’re close to fully-automatic remote upgrades.

  • More flexible hardware setup.

  • Machine learning and insights on tree level. BIG amount of data for an eight-man dev team…

  • Increasing the number of customers.

https://reinout.vanrees.org/images/2025/pygrunn-5.jpeg

Photo explanation: picture from our Harz (DE) holiday in 2023

Pygrunn: cloud native geospatial formats for field boundaries - Ivor Bosloper

2025-05-16

Tags: pygrunn, python

(One of my summaries of the 2025 pygrunn conference in Groningen, NL).

Cloud native geospatial file formats:

  • Geospatial data: you have raster data (= images) and vector data. And point data.

  • Raster data: geotiff, png, jpg. Vector: (shapefiles), gpkg, geoparquet. Points: gpkg, geoparquet.

Cloud native? Let’s look at geotiff for instance. Just the old .tiff format, so a raster of pixels with some metadata. A geotiff has metadata like extent, projection, etc. There is a cloud native variant, cloud optimized geotiff.

  • You have tiles, so the big image is subdivided into tiles for easier/cheaper/faster loading.

  • There are also multiple versions of the image at various “zoom levels”.

  • The metadata is always at a fixed place in the file, right at the front or at the back.

Such a cloud optimized format means that it is optimized for remote geospatial access patterns. The way it happens is with “http range requests”. After reading the metadata for the file, the algorithm knows which parts of the big file to request from the server with such a http range request.
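The range-request idea can be sketched in plain Python with the stdlib (the URL below is a placeholder; any server that honours `Range` headers will do):

```python
import urllib.request


def build_range_request(url, start, end):
    """Build a request for only bytes start..end (inclusive) of a remote file."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(url, start, end):
    # A server that supports range requests answers "206 Partial Content"
    # and returns only the requested byte slice, not the whole file.
    with urllib.request.urlopen(build_range_request(url, start, end)) as resp:
        return resp.read()
```

A COG reader would first fetch the fixed-position metadata this way, then request only the tiles it actually needs.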

He wanted to do the same for vector data. An approach is GeoParquet. Parquet is a bit like a simplified “csv format”. For speed reasons it is subdivided in blocks. In the geospatial version, the blocks have an extent. (An extent is the min/max bounding box around the data, btw).

Before cloud native geospatial formats, you really needed a special server program to host them, like geoserver. Geoserver is nice, but it is also a huge java program with loads of options. (And most people forget to properly secure it…)

What you can do now is just store your cloud-native geospatial file online, in for instance s3. As long as it supports http range requests, you’re set. The big advantage is that there are good specifications and lots of implementations.

He’s now working on FIBOA: FIeld BOundaries for Agriculture. An open source and open data project. There are many open data portals with agricultural field boundaries. But all of them have different formats. FIBOA wants to unify all that. See https://github.com/fiboa/specification

For converting the current local data to their format, they used lots of python and (geo)pandas. They’re trying to generalize the python+geopandas+extract+export process, as it seems handy for lots of other use cases: https://github.com/vecorel/
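The conversion step boils down to mapping each portal’s column names onto one unified schema. A minimal sketch with pandas (the column names here are hypothetical, not the actual fiboa schema):

```python
import pandas as pd

# Hypothetical mapping from one data portal's column names to a unified
# schema; the real fiboa specification defines the actual target names.
COLUMN_MAP = {"perceel_id": "id", "oppervlakte_ha": "area", "gewas": "crop"}


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Rename source columns to the unified schema and drop everything else."""
    out = df.rename(columns=COLUMN_MAP)
    return out[list(COLUMN_MAP.values())]
```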

https://reinout.vanrees.org/images/2025/pygrunn-4.jpeg

Photo explanation: picture from our Harz (DE) holiday in 2023

Pygrunn: django template LSP, smarter completion for django templates - Kees Hink

2025-05-16

Tags: pygrunn, python, django

(One of my summaries of the 2025 pygrunn conference in Groningen, NL).

Kees works at Four Digits, a long-time django shop.

He held a poll at the start about people’s favourite editor. The top three: pycharm 38%, vscode 37%, neovim 11%. (Bugger, my favourite, emacs, is not in the top three).

Code completion is nice. Modern editors are really handy for this, with good support for django and python. But… for django templates, it is missing. No auto-completion from {% bl to {% block. And also no knowledge of the available variables and attributes.

Pycharm is an exception, it has django language support and completion for the standard django template tags and also auto-complete in case you have class-based views.

He showed us django template LSP: https://github.com/fourdigits/django-template-lsp, which implements similar functionality for most other editors.

  • It also picks up custom template tags.

  • Docker support! It can find your template tags and code inside a docker image.

  • When something is not picked up, you can add a comment with a type hint.

  • You can install it from vscode, the name is djlsp.

You can build such an LSP, Language Server Protocol, yourself. LSP allows a client (your IDE) to interface with the language server. “A document is opened”, “I’m on line x, character y, is there a hint here?”.

They have a script called django-collector.py that is run with your project’s python (with a fallback). It statically analyses your code and returns json for use by the rest of the LSP mechanism.

There’s an alternative, django language server, started a while after they themselves started their LSP. Written in rust. But there’s not a lot of functionality yet.

Django template LSP is open source, it is in their github repo. They’re using it a lot internally. If you have ideas and pull requests: welcome!

https://reinout.vanrees.org/images/2025/pygrunn-3.jpeg

Photo explanation: picture from our Harz (DE) holiday in 2023

Pygrunn: team alignment, enterprise design power - Edzo A. Botjes

2025-05-16

Tags: pygrunn, python

(One of my summaries of the 2025 pygrunn conference in Groningen, NL).

He helps startups to design their business. He’s got more info at https://www.edzob.com/page/enterprise/ , an “enterprise design cheat sheet”. He was a consultant for a long time and started noticing patterns. He’s now in education/research and he’s focused on culture.

According to Osterwalder, success = value proposition + uniqueness + business model + timing + team alignment. In Edzo’s experience, the team alignment is often a core problem.

As a person, you have a skill set (behaviour and capabilities). As a team, you have a collective toolset (structure, processes, data and tech). Those two are the tangible stuff. Intangible is your mindset (attitude and motivation) as a person, and the culture as a team.

UX is a social contract between an app and the user. There’s a social contract behind the interactions within the company. A culture. How do you want to collaborate? How you collaborate defines what you’re going to be building. Conway’s law. (He mentioned his talk of last year about this subject).

His wife did a PhD about the meaning behind fairy tales. For him, as a technical person, the idea of having multiple meanings of the same physical text was initially hard. What do words mean? What is their purpose? Having good conversations about the terms and words used in, for instance, your business model/motivation canvas is a good idea. Communication is key.

There was a lot more in his presentation, as it was intended as a fast-paced talk. Lots of ways to look at your business or product. But in the end, team alignment could well be key. Optimizing your team members. Organizing them well. Collaboration? Facilitating? Key: have conversations. “Invest in beer and bitterballen, not in courses”.

And: you should always be looking at “what can I destroy?”. What in your company can be stopped, removed, destroyed? You need that to change and improve.

His slides are online with lots of extra links and background information.

https://reinout.vanrees.org/images/2025/pygrunn-2.jpeg

Photo explanation: picture from our Harz (DE) holiday in 2023

Pygrunn: how to solve a python mystery - Aivars Kalvāns

2025-05-16

Tags: pygrunn, python

(One of my summaries of the 2025 pygrunn conference in Groningen, NL).

Aivars pointed at https://www.brendangregg.com/linuxperf.html as a good overview of linux tools.

A good start is the /proc filesystem, you can use it to gather information on processes, for instance to grab the environment used by a process:

$ cat /proc/12345/environ | tr '\0' '\n'

The files/sockets used by a specific process:

$ ls -l /proc/12345/fd

You might have an unfindable file that takes up lots of space (like a logfile that has been deleted from a directory, but that is still open in some program). The command above will have (deleted) next to deleted files, so you can search for that string in the output to find the process that still has such a big file open.
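The same hunt can be scripted: on linux, the entries in /proc/<pid>/fd are symlinks, and for a removed file the link target ends in “ (deleted)”. A small sketch:

```python
import os


def deleted_open_files(pid):
    """Return the targets of file descriptors pointing at deleted files."""
    fd_dir = f"/proc/{pid}/fd"
    result = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the fd may have been closed in the meantime
        if target.endswith(" (deleted)"):
            result.append(target)
    return result
```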

Another handy tool: strace, it traces linux system kernel calls. You don’t even need root access if you just want to trace your own processes. An example command:

$ strace -f -ttt -o output.txt -s 1024 -p <PID>
$ strace -f -ttt -o output.txt -s 1024 ./your-new-process.sh

If your code does a system call (“read something from somewhere”), strace prints both the start and the end of the call. So you can find out exactly where something is blocking in case of an error. He mentioned https://filippo.io/linux-syscall-table/ as a good overview of the available system calls you might see in the output.

Disk IO problems? Try iostat -x to see where the IO is happening. When testing disk throughput, don’t just test huge blobs of data, but make sure to use the actual block size (often 4k or 8k).

When debugging network access, you often use ping or traceroute. But both use protocols (ICMP and UDP) that are often blocked by network admins. He suggests tcptraceroute which uses TCP and often gives a better view of reality.

With network problems, TCP_NODELAY is a possible cause. See https://brooker.co.za/blog/2024/05/09/nagle.html for more information. Read this especially when you see the magic number 40ms in your logs, or only get 25 transactions per second.
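Disabling Nagle’s algorithm from Python is a one-liner on the socket:

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY).

    Without this, small writes can be held back (the magic ~40ms) while
    the kernel waits to coalesce them with later data.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```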

Tip: set timeouts for everything. The defaults are often a cause for hanging.
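In Python you can even set a process-wide default, so sockets created by libraries that forget to pass a timeout raise an error instead of hanging forever (this only covers code that uses the stdlib socket module):

```python
import socket

# Any socket created afterwards without an explicit timeout gets this one,
# so a hung server produces a timeout error instead of blocking forever.
socket.setdefaulttimeout(10.0)
```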

https://reinout.vanrees.org/images/2025/pygrunn-1.jpeg

Photo explanation: picture from our Harz (DE) holiday in 2023

Pycon.de: streamlit app optimization in AWS - Darya Petrashka

2025-04-25

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

Full title: you don’t think about your streamlit app optimization until you try to deploy it to AWS

Streamlit is a quick way to show your models/data to stakeholders.

She once made a streamlit app to practise writing Japanese characters. It used some character recognition model. But… the way streamlit works, it normally downloads the model every single time. For some local testing that is probably OK, but in production, network traffic might cost money.

Solution? You can cache it with streamlit, but you can also download it when building the docker image and store it inside the image and load it from there.
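Streamlit’s caching decorators (such as st.cache_resource) are built on plain memoization, which you can sketch with the stdlib (the counter is only there to show the download runs once):

```python
import functools

calls = {"n": 0}


@functools.lru_cache(maxsize=None)
def load_model(name):
    """Stand-in for an expensive model download; runs only once per name."""
    calls["n"] += 1
    return f"model:{name}"
```

Calling `load_model("kanji")` twice hits the real download only once; repeated reruns get the cached object.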

On to authentication. You can handle everything yourself as data scientist: login widget, auth logic, user privs, etc. You can also use an external provider like Amazon Cognito. Then you only have to hook up cognito in your code, but the OPS engineer has to set up cognito for you.

On to security. For all of them you’ll need the OPS engineer, probably.

  • Set up https with Route 53 and TLS certificates.

  • Configure CloudFront to protect against DDoS attacks and improve performance.

  • Use AWS web application firewall to block malicious traffic.

On to credential storage. You can use AWS Secrets Manager instead of putting API_KEY = "1234abcd" right in your code. Using Secrets Manager is much more secure and that will make your OPS engineer happy.
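A hedged sketch of the pattern (the secret name and the environment-variable fallback are my own choices; `get_secret_value` is the standard boto3 Secrets Manager call):

```python
import os


def get_secret(name):
    """Fetch a secret from AWS Secrets Manager, falling back to the environment.

    The fallback keeps local development working without AWS credentials.
    """
    try:
        import boto3
        client = boto3.client("secretsmanager")
        return client.get_secret_value(SecretId=name)["SecretString"]
    except Exception:
        return os.environ[name]
```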

https://reinout.vanrees.org/images/2025/pycon-35.jpeg

Photo explanation: random picture from Darmstadt (DE)

Pycon.de: community building and task automation - Cosima Meyer

2025-04-25

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

Full title: code & community: the synergy of community building and task automation

Cosima organizes PyLadies and R-Ladies events.

Visibility is important. You have to be visible to be noticed. Time is also important: you have to put in work to create good content. But you also want to have time for other things. So she thought about creating an “automated megaphone” to help her and other PyLadies members be more visible.

She created the pyladies bot on bluesky and mastodon. It regularly shares portraits of “amazing women in tech”. And it reposts messages when tagged or mentioned. It also monitors blogs and posts about them. See https://github.com/cosimameyer/awesome-pyladies-blogs

The bot runs as a github action “cron job”.

She started using google’s “gemini” LLM to create short summaries of the blog posts to make it more likely for people to click on it. She picked a cheap, small model as that was good enough. In addition she does an extra automated check on harassment, dangerous content, etc.

Lessons learned:

  • You can use powerful LLMs to enhance your applications.

  • Integrating modern LLMs is straightforward and easy.

  • No need to go all the way to the cloud, you can just use the models via an API.

  • It is cost-effective for small projects and/or developers.

https://reinout.vanrees.org/images/2025/pycon-34.jpeg

Photo explanation: random picture from Darmstadt (DE)

Pycon.de: FastHTML vs. Streamlit - the dashboarding face-off - Tilman Krokotsch

2025-04-25

Tags: pycon, python, django

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

Streamlit is used a lot for dashboards. Fasthtml is a new contender. He demoed both of them at the same time (well done, btw, one of the better comparisons I’ve seen!), I wrote down some notes:

  • Streamlit runs through your python file and puts stuff on your screen (like a “h1”) immediately. So you see the page being built up, which tells the user something is happening. Fasthtml only sends over the page once it is fully ready.

  • Fasthtml is based on htmx! So you can put placeholders on the screen that are filled in later with htmx. This helps put something on the screen right away.

  • There’s a fh_plotly library to create plotly graphs.

  • In fasthtml, it is fast to react to clicks and add something to the page. Only the new element is calculated, the rest stays the same. In streamlit, the whole page is calculated again, which is expensive. You can speed it up a bit with caching.

  • Adding login functionality is easier in fasthtml, especially as the layout of the page is more explicit. You can have “components” in the layout and you can swap them. So a login form component can be shown if you’re not logged in and swapped out for the actual content once you’re logged in. In streamlit, such interaction is harder as you have to manage some state variables and re-run the full script.

  • A slider below a graph that influences the size of the markers on the graph is easy in fasthtml. But in streamlit, you need advanced tricks: the graph is rendered first, before the slider value that has to influence it is even defined.

  • Multiple pages are a pain in streamlit. In fasthtml it is just another url.

Streamlit gets you started fast. Often fewer lines of code. Many third-party integrations. Whole-page refresh only. Confusing control flow. And… if you learn streamlit, you only learn streamlit.

Fasthtml needs some boilerplate. More lines of code. Fewer integrations, but that’s because the project is pretty new. If you learn fasthtml, you learn regular web development. He thinks he can build pretty big projects with it.

See also the streamlit vs reflex talk from yesterday.

The code for his presentation: https://github.com/tilman151/streamlit-vs-fasthtml

https://reinout.vanrees.org/images/2025/pycon-36.jpeg

Photo explanation: random picture from Darmstadt (DE)

Pycon.de keynote: the future of AI: building the most impactful technology together - Leandro von Werra

2025-04-25

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

Can AI be built in the open? Especially LLMs. Key components are data, compute, pretraining, posttraining, scaling. Can these components be built in the open? As open source?

Many models are closed, like claude, gemini, OpenAI o1. There are models with open model weights (but a black box otherwise): LLaMA, deepseek, Mistralai. And you have fully open models: granite, bloom, olmo, starcoder2.

Why would we want to do it? Doing it in the open?

  • Transparency on pretraining. What data was used? How was my data used? How was the data filtered? This addresses biases, attribution and trust.

  • Transparency on alignment. Models are aligned for safety/values. Which values are in the model? Are values global? In closed models, there are only a few people that define how the model behaves, but it has lots of influence, potentially world-wide.

Deepseek-R1 is not open source, but at least the weights are open. This helped shift the discussion a lot, as previously all the big ones were closed api-only models. At hugging face they try to make a fully open version, open-R1.

Open is closing the gap. When GPT-4 came out, it took a year for open models to catch up. At the moment it is more like just two months.

  • In Europe, there are multiple public data centers with 10.000+ GPUs.

  • Collaborative training: BOOM.

  • Building the know-how together? For that you need large, high quality datasets for pre/posttraining. Good training recipes. Learning to scale to 1000s of GPUs.

  • At hugging face, they try to build open datasets, also multilingual. See a “fineweb” blog post. And also small models. And some tools to build LLMs.

Some trends:

  • Scaling will continue. Bigger is simply better. But… it is exponential: it gets harder and harder. More compute, more power consumption.

  • A new frontier: scaling the test time compute. This could really help improve accuracy.

  • Reasoning, as done by deepseek, is interesting.

  • Challenge: agents. Agency requires multi-step reasoning, which is harder.

  • AI in science.

What can you do?

  • Build high-quality datasets.

  • Fine-tune task specific models.

  • Work on open source tooling.

  • Join an open collaboration.

It is still early days for AI, there’s a lot of stuff you can do. Don’t think that everything is already figured out and built.

https://reinout.vanrees.org/images/2025/pycon-31.jpeg

Photo explanation: random picture from Darmstadt (DE)
