The first "pyutrecht" meetup in Amersfoort in the Netherlands. (Amersfoort is not the city of Utrecht, but it is in the similarly named province of Utrecht).
I gave a talk myself about treating your own laptop setup more like a proper programming project. Have a git repo with a README explaining which programs you installed. An install script or makefile for installing certain tools. "Dotfiles" for storing your config in git. Etc. I haven't made a summary of my own talk. Here are the other three:
William works at Deliverect, the host of the meeting. Webscraping means extracting data from a website and parsing it into a more useful format, like translating a list of restaurants on a webpage into structured data.
There's a difference with web crawling: that is following links and trying to download all the pages on a website.
Important: robots.txt. As a crawler or scraper you're supposed to read it as it tells you which user agents are allowed and which areas of the website are off-limits (or not useful).
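Python's standard library can read robots.txt for you. A minimal sketch, with a made-up robots.txt (the user agents and paths are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: which user agents are allowed where.
robots_txt = """\
User-agent: *
Disallow: /admin/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A generic scraper may fetch normal pages, but should skip /admin/.
print(parser.can_fetch("*", "https://example.com/restaurants"))  # True
print(parser.can_fetch("*", "https://example.com/admin/stats"))  # False
# BadBot is banned from the whole site.
print(parser.can_fetch("BadBot", "https://example.com/"))        # False
```

In real use you would point `set_url()` at the live site's /robots.txt and call `read()` instead of parsing an inline string.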
Another useful file that is often available: /sitemap.xml. A list of URLs in the site that the site thinks are useful for scraping or crawling.
A handy trick: looking at the network tab when browsing the website. Are there any internal APIs that the javascript frontend uses to populate the page? Sometimes they are blocked from easy scraping or they're difficult to access due to creative headers or authentication or cookies or session IDs.
A tip: beautifulsoup, a python library for extracting neat, structured content from an otherwise messy html page.
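Beautifulsoup itself wasn't shown in detail, but the parse-then-extract idea can be sketched with just the stdlib's html.parser (with bs4 this collapses to something like `soup.find_all("li")`; the page fragment below is made up):

```python
from html.parser import HTMLParser

class RestaurantParser(HTMLParser):
    """Collect the text of every <li> item from a messy html page."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item and data.strip():
            self.names.append(data.strip())

# A hypothetical scraped page fragment.
html = "<ul><li> Cafe Utrecht </li><li>Pizzeria Roma</li></ul>"
parser = RestaurantParser()
parser.feed(html)
print(parser.names)  # ['Cafe Utrecht', 'Pizzeria Roma']
```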
selenium is an alternative, as it behaves much more like a regular web browser. So you can "click" a "next" button a couple of times in order to get a full list of items. Because selenium behaves like a real web browser, things like cookies and IDs in query parameters and headers just work. That makes it easier to work around many kinds of basic protection.
A microcontroller is a combination of cpu, memory and some interfaces to external ports. https://micropython.org is a version of python for such low-power devices.
He demoed python's prompt running on a Raspberry Pi Pico connected via microUSB. And of course the mandatory let's-blink-the-onboard-LED programs. And then some other demos with more LEDs and servos. Nice.
A big advantage of micropython is that it doesn't care what processor you have. With C/C++ you specifically have to compile for the right kind of processor. With micropython you can just run your code anywhere.
You can use micropython in three ways:
He showed a couple of possible target microcontrollers. A note to myself about the ESP8266: limited support, use .mpy. I think I have a few of those at home for should-test-it-at-some-time :-) Some examples: Pi RP2040, ESP32, Teensy 4.1.
A problem: RAM is scarce in such chips and python is hungry... You can do some tricks like on-demand loading. Watch out when using an LCD graphic display, that takes 150kb easily.
You have to watch out for the timing requirements of what you want to do. Steering a servo is fine, but "neopixel" LEDs for instance need a higher signal frequency than micropython can manage on such a microcontroller. If you use a C library for it, it works (he showed a demo).
Erik works as maintainer on the Graphene and the strawberry-GraphQL projects.
Graphql is a query language for APIs. It is an alternative to the well-known REST method. With REST you often have to do multiple requests to get all the data you need. And the answers will often contain more information than you actually need.
With graphql, you always start with a graphql schema. You can compare it a bit to an openapi document. The graphql schema specifies what you can request ("a Meetup has a name, description, list of talks, etc").
An actual query specifies what you want to get back as response. You can omit fields from the schema that you don't need. If you don't need "description", you leave it out. If you want to dive deeper into certain objects, you specify their fields.
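For example, a hypothetical query against the Meetup schema sketched above might look like this (the field names are made up); the response contains exactly these fields and nothing more:

```
{
  meetup(name: "PyUtrecht") {
    name
    talks {
      title
      speaker
    }
  }
}
```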
Strawberry is a graphql framework. It has integrations for django, sqlalchemy, pydantic and more. The schemas are defined with classes annotated with @strawberry.type and fields with python type hints. (It looked neat!)
He showed a live demo, including the browser-based query interface bundled with graphql.
Note: strawberry is the more modern project (type hints and so on) and will later get all the functionality of graphene. So if strawberry's functionality is enough for you, you should use that one.
(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).
"Everybody" uses stackoverflow. Now lots of people use chatgpt (or chatgpt plus). Stackoverflow traffic has dropped by 50% in the last year and a half. So chatgpt can be your coding buddy.
He really likes it for quickly getting something working (MVP). Like writing something that talks to a magento API (a webshop system). It would take him ages to figure it all out. Or he could ask chatgpt.
He also thinks you don't need docstrings anymore: you can just ask chatgpt to explain a snippet of code for you. (Something I myself don't agree with, btw).
(He demoed some chatgpt code generation of a sample website). What he learned:
Some dangers:
(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).
LLM models can be huge. Mind-boggling huge. But... we can also have fun with small models.
He works at a company that regulates climate installations in buildings (HVAC: heating, ventilation, air conditioning) via the cloud. Buildings use 30% of all energy worldwide. So improving how the HVAC installation is used has a big impact.
A use case: normally you pre-heat rooms so that it is comfy when you arrive. But sometimes the sun quickly warms the room anyway shortly afterwards. Can you not conserve some energy without sacrificing too much comfort?
You could calculate an optimal solution, but instead they "just" measure every individual room and combine that with an AI model.
Technical setup:
They have a server with 1 GPU, which is enough for training all those models!
(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).
Arjan is known for his programming videos.
Alternative title: "the dark side of integrating a LLM (large language model) in your software". You run into several challenges. He illustrated them with https://www.learntail.com/ , something he helped build. It creates quizzes from text to make the reader more active.
He used the python library langchain to connect his app to a LLM. A handy trick: you can have it send extra format instructions to chatgpt based on a pydantic model. If it works, it works. But if you don't get proper json back, it crashes.
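The talk used langchain's pydantic-based format instructions; the failure mode itself can be sketched with just the stdlib (the Quiz model and the replies below are made up):

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Quiz:
    question: str
    answer: str

def parse_quiz(llm_reply: str) -> Optional[Quiz]:
    """Return a Quiz, or None when the model didn't produce proper json."""
    try:
        data = json.loads(llm_reply)
        return Quiz(question=data["question"], answer=data["answer"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

good = '{"question": "What is 2+2?", "answer": "4"}'
bad = "Sure! Here is your quiz: the question is..."
print(parse_quiz(good))  # Quiz(question='What is 2+2?', answer='4')
print(parse_quiz(bad))   # None: the model ignored the format instructions
```

The point is the except branch: an LLM reply is free text, so your code needs an explicit plan for the case where the "json" isn't json.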
Some more challenges:
A LLM is not a proper API.
And hey, you can still write code yourself. You don't have to ask the LLM everything, you can just do the work yourself, too. An open question is whether developers will start to depend too much on LLMs.
(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).
What is cancer? According to wikipedia: abnormal cell growth with the potential to invade or spread to other parts of the body. That is what you can observe. Medically, there are several aspects of cancer:
AI is starting to get used in clinics. For instance for proton therapy: determining where best to apply the proton radiation. And in radiology: letting AI look at images to detect cancer. A good AI can out-perform doctors. Also analysis of blood samples, trying to detect cancer based on the DNA samples in there.
DNA mutations can also be detected, which is what he focuses on. Cancer is basically a "disease of the genome". DNA is made up of T, C, G and A sequences. Technically, it is perfectly feasible to "read" DNA.
How do mutations occur? Exposure can leave "scars" in DNA. Damage can occur due to sunlight or smoking for instance. Specific sources result in specific kinds of damage: smoking has a "preference" for changing specific letters. With analysis, you can thus detect/estimate the cause of cancer.
A method to detect it is non-negative matrix factorisation. Normally you can only summarize the data in "hard" clusters: something is either A or B. With this technique, you can do "soft" clusters: something can be a little bit A and a bit more B.
Matrix factorisation is a way to relate separate data sources. For movies, you can have persons with preferences for comedy or action movies. And movies with a percentage action/comedy. Combined you get a matrix with estimates for the preference for every movie per user.
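With made-up numbers, the movie example boils down to a matrix product: a users × genres matrix times a genres × movies matrix gives a per-user estimate for every movie.

```python
# Made-up taste profiles: how much each person likes [action, comedy].
users = {"ann": [0.9, 0.1], "bob": [0.2, 0.8]}
# Made-up genre mix per movie: fraction [action, comedy].
movies = {"Die Hard": [1.0, 0.0], "Airplane!": [0.1, 0.9]}

def estimate(user: str, movie: str) -> float:
    """Dot product of user taste and movie genre mix."""
    return sum(u * m for u, m in zip(users[user], movies[movie]))

# The full matrix of estimates: one number per user per movie.
for user in users:
    for movie in movies:
        print(user, movie, round(estimate(user, movie), 2))
```

Matrix factorisation runs this in reverse: given only the observed preference matrix, it estimates the two factor matrices.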
In a similar way, he creates a matrix relating cancer causes (like smoking) to specific observed types of DNA damage.
But... how reliable are the results? You can treat the matrix as a neural network. You can then use bayesian analysis to assess the probabilities.
He made a python package for his research: "mubelnet" (though I couldn't find that online, btw).
AI is transforming cancer care. The only part it doesn't affect is the actual nursing process.
(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).
Getting chatgpt to output valid json can be a chore:
> extract xxxx, output as json
> extract xxxx, output as json list
> extract xxxx, output as json with this schema
> extract xxxx, output as json, aargh JSON I BEG YOU
Apparently they solved the json problem last monday. But he had the same problem when trying to get chatgpt to output only English and not Dutch. So the underlying problem is still there: you have to beg it to output in a certain way and hope it listens.
Some other problems are hallucinations: chatgpt telling you something with complete confidence, even when it is wrong. And biases. And it is not really a chatbot, as it doesn't ask questions. Unparseable output. Lack of explainability. Privacy issues, as you're sending data to servers in the USA.
And... what are the data sources chatgpt used? We don't know. They're called "openAI", but they're definitely not open.
When to use LLMs and when not to use them. Some good use cases:
Some bad use cases:
What are some ideas you can look at?
What he thinks is important: keep humans in the loop. Prevent unwanted consequences. Add a preview step before sending stuff out into the world. Make classifications visible and allow corrections. Ask the user to label something if it is unclear. And don't forget to audit the automatic classifications.
When all you have is a LLM, everything might start to look like a generative task. But don't think like that. Who is going to use it? What is the actual problem? Spend some time thinking about it.
(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).
The company they work for is called "explosion", so what can go wrong? :-)
SpaCy (https://spacy.io/) is a library for natural language processing. You give it text documents and you get them back with annotations.
Spacy mostly works with a pipeline. You always start with a tokenizer, then multiple optional steps, and at the end you get the annotated document.
A tokenizer splits up the text. The period at the end of a sentence doesn't belong to the last word, for instance: it is a separate item. "Twitter's" also becomes "twitter" and "'s". What comes out of the tokenization process is a Doc, which behaves as a list of tokens. doc[9] can be 's.
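SpaCy's real tokenizer is rule-based and language-aware; a crude regex sketch of the splitting behaviour described above (the sentence is made up):

```python
import re

def crude_tokenize(text: str) -> list:
    # Split off possessive 's and punctuation as separate tokens,
    # roughly mimicking the spaCy behaviour described above.
    return re.findall(r"'s|\w+|[^\w\s]", text)

print(crude_tokenize("Twitter's CEO resigned."))
# ['Twitter', "'s", 'CEO', 'resigned', '.']
```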
A useful step: lemmatisation. The token accepted is annotated with the lemma accept. This makes later searching easier. directors has the lemma director.
Span classification is entity recognition. A token Musk is recognised as a "person". The tokens 25 and april in combination can be a "date". The recognised entities end up as doc.ents[number].
You can do document classification. Categories like "newswire" or "love letter" with an attached estimation ("80% chance this is a newswire").
Some of the transformers work with AI. Several kinds of pre-trained data are available. What they themselves use is the Groningen meaning bank (GMB), developed by the university of Groningen. More than 10k English texts, mostly newspaper texts from the public domain. You can also look at https://github.com/explosion/curated-transformers .
Spacy has its own plugins to provide annotations, but you can also plug in your own. It is configured through a .ini file. A project can be seen as a sort of "makefile" for running everything. Assets (=remote sources you want to have downloaded), training data, what has to be run, the config, etc.
They showed a demo of how the whole system works. Looked nice and useful. You can play with the demo yourself: https://github.com/explosion/aiGrunn-2023
Compared to a LLM like chatgpt, at the moment targeted NLP often performs much better at classification.
(One of my summaries of the 2023 Dutch pythonconferentie python meeting in Utrecht, NL).
There's deep magic in python. Some of it is really hard to understand. Guido van Rossum: "from day one, there was deep magic hiding in some places, designed to quietly help users". Sebastiaan is going to show us a nice example of python magic: how "self" gets injected into method calls on a class:
```python
class Guitar:
    def __init__(self, name):
        self.name = name

    def play_note(self, note):
        print(f"{self.name} plays {note}")
```
You can instantiate a guitar and call my_guitar.play_note("B") on it. With just the note as a parameter. But where is the self coming from? Somehow your instance of the Guitar class is magically inserted.
Note: self is not a magical keyword. You can give it a different name and it will still work (though your code will be unreadable and people will hate you).
Everything in python is an object. A class is an object (an instantiation of Type). A method inside a class is a regular function object. Plus it is also added with the name of the function as an attribute to the class (the namespace of the class).
If you call my_guitar.play_note("B"), your my_guitar is just an object. It does not have the attribute pointing at the function in its namespace. But it does find the function, as python also looks up attributes in the class of an object: this is what "self" comes from.
You can see it when you look at the method: Guitar.play_note says it is a "function". my_guitar.play_note says it is a "bound method".
The magic that is happening is done with "dunder methods":
Functions all have a __get__() method. When accessed on a class, you'll get the plain function back. When accessed on an instance, it returns a bound method: the function bound to the instance, with the instance passed as the first argument.
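A quick way to see this in action, using the Guitar class from above (a "Fender" instance is made up for the demo):

```python
class Guitar:
    def __init__(self, name):
        self.name = name

    def play_note(self, note):
        return f"{self.name} plays {note}"

my_guitar = Guitar("Fender")

# On the class, attribute lookup just returns the plain function...
print(type(Guitar.play_note).__name__)     # function
# ...on the instance, the descriptor protocol binds it to my_guitar.
print(type(my_guitar.play_note).__name__)  # method

# Calling the function's __get__ by hand does the same binding:
bound = Guitar.__dict__["play_note"].__get__(my_guitar, Guitar)
print(bound("B"))                          # Fender plays B
```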
There are some decorators that are related to this: @staticmethod, @classmethod and @property.
These "descriptors" allow you to customize how attributes work.
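A minimal custom descriptor, to show the same __get__/__set__ mechanism put to your own use (the Uppercase example is made up):

```python
class Uppercase:
    """Descriptor that stores a value but always returns it uppercased."""

    def __set_name__(self, owner, name):
        self.attr = "_" + name

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.attr).upper()

    def __set__(self, obj, value):
        setattr(obj, self.attr, value)

class Band:
    name = Uppercase()

band = Band()
band.name = "the beatles"
print(band.name)  # THE BEATLES
```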
Book recommendation: "fluent python".
(One of my summaries of the 2023 Dutch pythonconferentie python meeting in Utrecht, NL).
Jodie wants to look at large language models (like chatgpt), which went into full-on hype mode this year. Let's look at some historical examples of what people have thought to be artificial intelligence.
Now chatGPT: people think it displays real artificial general intelligence. But what is the reality? Can we look at it more scientifically? A well known article ("sparks of artificial general intelligence") claims to use categories from an older article to rank it: reasoning, planning, problem solving, abstract thinking, comprehending complex ideas, learning quickly and from experience.
Only... Jodie has lots of experience in psychology and those are not categories that are used to gauge intelligence. And the older article also couldn't be found.
A common problem with artificial intelligence is that it is only considered intelligent until you explain it. When we know how a machine does something intelligent, it ceases to be regarded as intelligent.
Another problem is that artificial intelligence is often very focused and goal-oriented. It performs impressively on one specific task and totally not on others. Don't give a math problem to chatgpt...
There are several levels of intelligence:
As a human, you're generally intelligent. You can learn several broad abilities. Which allows you to accomplish tasks. So there are several levels.
General intelligence maps to the extreme generalisation level. Broad maps to broad. No/local generalisation to tasks. This is a good way to think about AI, too.
So if you look at tasks: generalisation is difficult. How many ways are there of solving the task? How many examples are there? How much experience do you need? How high is the value of achieving intelligence? That can be a way of determining the intelligence at the task.
Learning in AI is often done through brute force: lots and lots of examples. If a problem is too far outside of the original training set... Chatgpt hasn't exactly revealed what it was trained on, but it is at least trained on data from 2021 and earlier. When asked to solve programming puzzles that were available on the internet in 2021, chatgpt had a 100% score. "So we don't need programmers anymore".
But when asked to solve puzzles from 2022, it failed miserably... When asked, chatgpt even said it got the answers from the pre-2021 data.
A better question than "is this real general AI" is "where can this realistically be used". And programmer AI tools like copilot are one of the better use cases, actually. The added benefit is that there's quite some extra validation you can do on the output (code syntax checkers, the interpreter, etc.)
(One of my summaries of the 2023 Dutch pythonconferentie python meeting in Utrecht, NL).
She showed "the zen of python":
```python
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
```
She herself wondered whether she fully understood those. Are they suggestions? Aphorisms? The law? She turned to the book "Gödel, Escher, Bach". Mentioned in there are the three layers of any message:
Simple is better than complex: fine. But "complex is better than complicated"? What do those terms mean exactly? Complex is difficult to separate or to solve. Complicated is difficult to understand or explain. Aha.
Readability counts. Adequate variable/module naming. A well-designed structure. Linting. Flat is better than nested, sparse is better than dense. Documentation is important. She thanked Daniele Procida for his work on this.
Freedom. Freedom of will. Freedom of choice. Python is in a sense an exercise in freedom: how much freedom can we give developers? How much do we need to restrict to keep everything readable?
Viktor Frankl: the only thing you cannot take away from me is the way I choose to respond to what you do to me. Freedom.
Bias. We often think of ourselves as the center of the universe. If you clap, you always hear a sound, right? Well, not in a vacuum. We also think we see everything correctly. But there's a blind spot in your eye: the area where there are no light-sensitive receptors, as some connectors have to pass through on the way to the brain. The brain compensates for it by literally filling in the blank with bogus information...
We are biased. We have unrealistic expectations. How can we mitigate? A good start is to recognise it and to try to mitigate. Some strategies:
Funny! She was asked afterwards which line she would like to add to the Zen of Python :-) It was "Good enough is good". (I think it was followed by a next line, "... but refactor afterwards").