2025-01-27
(One of my summaries of the first Python Leiden (NL) meetup in Leiden, NL).
Tobias studied applied mathematics at Delft University.
One of the fields he used python for was graph theory. A graph consists of points (“vertices”) connected by lines (“edges”). It is a large field with many real-world applications like social networks and logistics. He showed a demo he made with networkx, a python library that makes it really easy to do these kinds of graph calculations.
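To make “vertices” and “edges” a bit more concrete, here is a minimal sketch of a typical graph calculation: the shortest path between two people in a tiny social network. It is not the demo from the talk. The names in the graph are made up and I use only the standard library here; a library like networkx gives you this kind of calculation out of the box.

```python
from collections import deque

# Hypothetical toy social network: each vertex maps to the vertices it shares an edge with.
friends = {
    "ann": {"bob", "cem"},
    "bob": {"ann", "dee"},
    "cem": {"ann", "dee"},
    "dee": {"bob", "cem", "eva"},
    "eva": {"dee"},
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path as a list of vertices, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # sorted() only makes the order (and so the chosen path) deterministic.
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path(friends, "ann", "eva")
print(len(path) - 1, "edges:", " -> ".join(path))
```

Breadth-first search visits vertices in order of distance, so the first path that reaches the goal is a shortest one.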
Graphs need to be shown. He used pyviz for that by converting the networkx graph to the format understood by pyviz.
Another field is machine learning. He did an experiment with a simulated self-driving car. He used a library that handles the basics like “reacting to a solid line on the road” and “measuring the distance to the dashed line on the road”. The simulation is shown in a visual form, which makes it fun to look at.
In his study, python was also handy for statistics and numerical analysis.
2025-01-27
(One of my summaries of the first Python Leiden (NL) meetup in Leiden, NL).
FawltyDeps is a python dependency checker. “Finding undeclared and unused dependencies in your notebooks and projects”.
Note by Reinout: since 2009 I’m one of the maintainers of z3c.dependencychecker…. also a python dependency checker :-) So this talk interested me a lot, as I didn’t know yet about fawltydeps.
A big problem in science is the “replication crisis”. Lots of research cannot actually be reproduced when you try it… Data science is part of this problem. Reproducing your jupyter notebook for instance.
Someone looked at 22k+ jupyter notebooks. Only 70% had declared their dependencies, 46% could actually install the dependencies and only 5% actually could be run. ModuleNotFoundError and ImportError were the number 1 and 3 in the list of exceptions!
What is a dependency? For instance “numpy”, if you have an import numpy as np in your file. Numpy isn’t in the python standard library, you have to install it first.
You can specify dependencies in setup.py, pyproject.toml, requirements.txt and so on. If you import something and don’t specify it, it is an “undeclared dependency”. When you later on remove an import and don’t adjust your requirements.txt, you have an “unused dependency”. That’s not immediately fatal, but it might take up unnecessary space.
FawltyDeps was started to help with this problem: find undeclared and unused dependencies. It reports them. You can ask for a more detailed report with line numbers where the dependencies were found.
FawltyDeps supports most dependency declaration locations: requirements.txt, setup.py, pyproject, conda, etc. And it works with plain python files, notebooks, most python versions and most OSs. You can configure it on the commandline and in config files. There’s even a handy command to add an example config to your pyproject.toml.
Handy: you can add it as a pre-commit hook (https://pre-commit.com). And: there’s a ready-made github action for it, including good reporting.
FawltyDeps has to deal with several corner cases:
Package names that don’t match what you import: import sklearn and the dependency scikit-learn. Or setuptools, which provides both setuptools and pkg_resources.
For this it looks at various locations for installed packages to help figure out those mappings. It helps if you’ve installed FawltyDeps in your project’s virtualenv.
You can add your own custom mappings in your configuration to help FawltyDeps.
You can exclude directories.
There’s a default list of “tool” packages that FawltyDeps doesn’t complain about if you include them as dependency without importing them. Ruff, black, isort: those kinds of tools.
Django projects can have dependencies that aren’t actually imported. You can ignore those in the config to prevent them from being reported.
At the moment, extra dependencies (like [test] or [dev] dependencies) are just handled as part of the whole set of dependencies.
2025-01-16
Important things first: 27 January there’s a python meetup in Leiden (NL) of the new python Leiden user group.
Meetup groups come and go, often depending on one or two people or on a company that organises it. And yes, meetup.com has basically cornered the market, at least in my experience.
There used to be a “PUN”, python usergroup Nederland, meeting that would be held in various cities, depending on the company that hosted it in turn. (For those in NL: Den Haag, Zoetermeer, Rotterdam, Utrecht, Veenendaal, Arnhem, Amsterdam, …). Managed via a mailinglist, as meetup.com didn’t exist yet. Later lots of python and/or django meetup.com-based-meetups were organised. To me, it felt a bit weird that all of them seemed to be city-oriented. Amsterdam python meetup (3 different ones), Amsterdam django meetup, Utrecht (2x), Eindhoven, Rotterdam. I went to many of them, mostly it is just an hour of travel by public transport :-)
I like going to those meetups. You get a feel for what people are doing with python. You get ideas. You learn about libraries (sometimes even from the standard library) that you didn’t know about. New python tricks. For me, it is a great method to keep up-to-date on what’s possible and on what people are enthusiastic about.
At the moment, the number of python meetups in the Netherlands seems a bit low. Perhaps I’m missing something? (I see there’s a pydata one in Amsterdam that I missed.) The last two I attended were the nice PyUtrecht ones. So: I’ll be attending the Leiden one :-)
(Note: I’m talking about meetups, we’re blessed with two one-day python conferences in the Netherlands. pygrunn in May and pycon NL in October.)
2024-10-10
(One of my summaries of the one-day Pycon NL conference in Utrecht, NL).
Full title: Events in fintech: from state machines to event-sourced systems.
He works at kiwi.com, a payment processor. They recently refactored their systems to be more event-based.
Previously, they used state machines. A payment/customer/whatever can be in several well-defined states. Between states (but not necessarily between all states) there are transitions.
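A minimal sketch of such a state machine, to make “states” and “transitions” concrete. The state names are hypothetical and not taken from the talk.

```python
# Hypothetical payment states and the transitions allowed between them.
TRANSITIONS = {
    "created": {"authorised", "cancelled"},
    "authorised": {"captured", "cancelled"},
    "captured": {"refunded"},
    "cancelled": set(),
    "refunded": set(),
}


class Payment:
    def __init__(self):
        self.state = "created"

    def transition(self, new_state):
        """Move to `new_state`, refusing transitions that are not defined."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        # Only the current state is kept: how the payment got here is lost.
        self.state = new_state


payment = Payment()
payment.transition("authorised")
payment.transition("captured")
print(payment.state)
```

Note that the object only remembers where it is now, not how it got there.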
Advantages of state machines: they’re simple. Easy to understand. They’re efficient. Easy to test: just states and transitions.
Disadvantages: there’s no inherent history. That’s sometimes bad, as it is hard to answer “why is my payment in this state?!?” There’s also lack of context. And changing a state machine can be hard.
Some problems they encountered: race conditions. Two states are started for the same account. One gets a +15, the other a -20. As the states don’t know about each other, the resulting account balance can be incorrect.
Now on to events. Event sourcing / event driven architecture is what he called it. You start with a command “withdraw money, amount = 15”. This gets placed in a queue and gets processed. The processing results in another event “15 has been removed” that gets sent to the database.
Events are immutable facts about things that have happened in the system. They are always in the past tense and include all relevant data. And: avoid internal implementation details if possible.
“Event sourcing”: you can re-construct the current state of the system by looking at all the events. The current state has a source: the accumulation of all events. Great for having a complete audit trail. You can also reconstruct past states (“you have programmatic access to the past” :-) ). You also have the possibility to actually solve race conditions that occurred.
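A minimal sketch of the event sourcing idea, assuming a simple account balance. The event classes and function names are made up for illustration, they don’t come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: events are immutable facts
class MoneyDeposited:
    amount: int


@dataclass(frozen=True)
class MoneyWithdrawn:
    amount: int


def apply(balance, event):
    """Apply one event to the current balance and return the new balance."""
    if isinstance(event, MoneyDeposited):
        return balance + event.amount
    if isinstance(event, MoneyWithdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event: {event!r}")


def replay(events, upto=None):
    """Rebuild the balance from the event log, optionally from only the first `upto` events."""
    balance = 0
    for event in events[:upto]:
        balance = apply(balance, event)
    return balance


log = [MoneyDeposited(100), MoneyDeposited(15), MoneyWithdrawn(20)]
print(replay(log))          # current state
print(replay(log, upto=2))  # a past state
```

The balance is never stored as such: it is always derived from the log, which doubles as the audit trail.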
If you have the time, you can even go towards the CQRS pattern (command query responsibility segregation).
There are challenges with event sourcing. Eventual consistency: if you encounter a problem, you can fix it. But the fix changes history, so the system is “eventually consistent”, not “really really always consistent”. Also: there are higher storage requirements. Complexity also is higher.
The challenges have possible solutions. The “saga pattern” has start and end events. You could also try some locking mechanism, or “optimistic concurrency control”. Not always possible, but handy: “idempotent events”, events that you can apply a couple of times after each other without doing any harm.
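One possible way to get idempotent events, as a minimal sketch: give every event a unique id and remember which ids you have already handled. The dictionary layout and the “id” field are my own assumptions, not something shown in the talk.

```python
def apply_once(account, event):
    """Apply a withdrawal event at most once, keyed on its (hypothetical) unique id."""
    if event["id"] in account["seen"]:
        return account  # already processed: applying again changes nothing
    account["seen"].add(event["id"])
    account["balance"] -= event["amount"]
    return account


account = {"balance": 100, "seen": set()}
event = {"id": "evt-1", "amount": 15}
apply_once(account, event)
apply_once(account, event)  # a duplicate delivery from the queue
print(account["balance"])
```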
2024-10-10
(One of my summaries of the one-day Pycon NL conference in Utrecht, NL).
In 2024, what is a fullstack python dev? Well, python. Database administrator. Integrator with external services. A bit of terraform or kubernetes. AWS clicky-clicky expert. And a bit of frontend.
So the frontend is only a small part of your job. You have a package.json that makes no sense to you, so you just npm install it and download the entire internet into your local node_modules/ folder. But somehow it seems to work.
The frontend? You have frameworks. More frameworks. Meta-frameworks that framework your frameworks. Frameworks don’t make things simpler, they just move the complexity somewhere else. If you use django’s ORM, you don’t have to deal with the complexity of SQL. But in exchange you have to learn the django ORM.
He wants to look at three layers:
Markup / html.
Styling / css / design systems.
Interactivity / javascript.
Markup and html. On one hand you can have bindings. “Fasthtml”, for instance. A mapping from python to html: return Div(H1("hurray")). You just move the complexity.
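To show what such a python-to-html mapping boils down to, here is a minimal sketch. Note that this is not Fasthtml’s actual API: make_tag and Markup are helpers I made up, only the Div(H1("hurray")) shape is borrowed from the talk.

```python
from html import escape


class Markup(str):
    """Marker type for strings that already contain safe html."""


def make_tag(name):
    """Build a function that renders one kind of html element."""
    def tag(*children, **attrs):
        rendered_attrs = "".join(
            f' {key}="{escape(str(value))}"' for key, value in attrs.items()
        )
        # Plain strings are escaped; already-rendered elements are passed through.
        body = "".join(
            child if isinstance(child, Markup) else escape(child) for child in children
        )
        return Markup(f"<{name}{rendered_attrs}>{body}</{name}>")
    return tag


Div, H1 = make_tag("div"), make_tag("h1")
print(Div(H1("hurray"), id="main"))
```

The html is still there, you have just moved it into function calls.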
Or “native web components”. You have custom <star-rating> elements that get translated into actual html. You need specific javascript code for this, so it isn’t really portable between frameworks.
Another alternative: templating. Jinja2 is used in most programming languages. You can do some templating, but it quickly gets unreadable.
All these solutions are great in their own way, but also suck in their own way.
Styling/css. This is an area that actually saw a lot of improvements in the last years! CSS now supports variables out of the box, so no need for “sass” or so anymore.
You used to use bootstrap, jquery and a few other things and try to style your divs and spans. You don’t need to do that anymore: there is more than just span and div nowadays. Classless: you use html’s new elements such as <article> and get something not-too-bad out of the box. You don’t use custom class statements anymore.
CSS has its own utility frameworks now, like tailwind. He dislikes tailwind (just use a style on your element…). For the rest, css seems in pretty good shape.
Interactivity/javascript. Javascript used to be essential for things like hovers and tooltips. But: that’s built into html/css now! No need for javascript for this.
You could look at web assembly. https://pyscript.net/ for running python in the browser. Nice. But you need to know both the internal browser API and the webassembly bindings and get python to run… He looks at this field about once every half year to see if it is ready for normal use.
HTMX could be nice. https://htmx.org/ . Interactivity for your html page with auto-replacing of certain parts of the page without you needing to do javascript. It is pretty popular, but he found lots of the functionality pretty hard to use. After two years, he found out he used only one part of it most of the time. So he wrote some small javascript thingy to do just one simple kind of replacement.
Interactivity: most of it sucks.
Summary: there is no one silver bullet for your project. In many cases you’re going to benefit from building something yourself. So: if there’s no silver bullet, just produce a lot of regular bullets. Small custom self-made one-offs.
2024-10-10
(One of my summaries of the one-day Pycon NL conference in Utrecht, NL).
You might start with some simple python code. Just a few functions. It grows and grows. Now… how do I structure this? You start looking at complex enterprise code? Object oriented programming? Software patterns? Why not look at functional programming?
Arjan showed some simple code with a class in it. A customer wish resulted in a subclass with some custom behavior. But if you’d change the original class, you’d also change the subclass’s behaviour. Brittle. Often, subclassing is discouraged. The “rust” language doesn’t even include subclassing.
With some live coding, he re-wrote the class to functions. You can pass functions as variables in python.
Another trick is to use “closures”: a function that builds and returns a function:
    from typing import Callable

    def is_eligible(cutoff_age: int = 50) -> Callable[..., bool]:
        def is_eligible_function(customer):
            # cutoff_age comes from the enclosing call: that is the closure
            return customer.age > cutoff_age
        return is_eligible_function
You can do this more elegantly with from functools import partial. You call it on an existing function to get a new function with some of the arguments already filled in.
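A minimal sketch of the partial variant. This is my own reconstruction and not the code from the talk; the Customer class is a hypothetical stand-in.

```python
from dataclasses import dataclass
from functools import partial


@dataclass
class Customer:  # hypothetical stand-in for the customer objects in the talk
    age: int


def is_eligible(customer, cutoff_age):
    return customer.age > cutoff_age


# partial() pre-fills cutoff_age and returns a new one-argument function.
is_eligible_50 = partial(is_eligible, cutoff_age=50)
is_eligible_65 = partial(is_eligible, cutoff_age=65)

customers = [Customer(age=42), Customer(age=57), Customer(age=70)]
print([c.age for c in customers if is_eligible_50(c)])
print([c.age for c in customers if is_eligible_65(c)])
```

The plain function stays easy to test on its own and you create the specialised versions where you need them.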
The example code is at https://github.com/arjancodes/pycon_nl
Functional programming concerns itself more with the flow of information. It can be more elegant if it fits your problem. Object oriented programming concerns itself more with the structure of the data.
2024-10-10
(One of my summaries of the one-day Pycon NL conference in Utrecht, NL).
Personal note beforehand: I’d like to point at a previous talk by Daniele at djangocon EU in 2014 about diversity in our industry.
Daniele once interviewed a programmer for a position. “Why do you enjoy programming?” The answer: “When I’m programming, I’m a god. I create things out of clean air.”
You can create things. It is more than “doing things”. Or “merely commanding”. I cannot make something exist in the real world just by saying “let there be light”. But in programming I can! I can do x = 1 or c = Camera().
Programming as intention. “Fiat voluntas tua”, thy will be done. You want something. In programming, you even write tests to check whether what you intended actually turned out right :-)
Daniele likes to program, but isn’t really good at it. Which is fine with him. He also likes photography, but isn’t really good at it, too: this one hurts him quite a bit more. A famous photographer once said “your first 1000 photos are your worst”. But once you stop for a while, the counter seems to reset to zero.
Programming as intention? There is also attention. Intention is a bit “reaching towards something”. Attention is about being there, with something. Intention is future, attention is present.
He thinks he isn’t as good a photographer as he wants to be because he’s not so good at attention. He showed a couple of photos by his favourite photographer Garry Winogrand. Garry often photographed in a quick way, spotting opportunities. Garry really was “paying attention”. Spotting great photos in the randomness of the real world.
There is power in attention. You express being present. Which is a power all of its own.
Developer exhaustion is a known problem. You ask a programmer too much, they have no more to give. As an analogy, the same can occur when developing analog film rolls: you use a “developer solution” to chemically develop them. When the developer solution has been used too much, the films don’t come out right.
Daniele modeled his favourite mechanical analog camera in python code: https://c-is-for-camera.readthedocs.io/ . It is a program that doesn’t actually do anything. It models the camera and its behaviour and its detailed mechanisms and the relationship between the mechanical parts. It was Daniele paying attention to his favourite camera, really getting to know the mechanism. You could say really getting to know his camera. You could say it was a love program for his camera!
You can have love programs just like you can have love songs.
He closed with part of a poem by Mary Oliver:
Pay attention
Be astonished
Tell about it
2024-10-10
(One of my summaries of the one-day Pycon NL conference in Utrecht, NL).
Full title: efficient python project setup: showing cookiecutter’s potential within Kedro.
Kedro: https://kedro.org/, “a toolbox for production-ready data science”. Open source, python. It helps you apply regular software engineering principles to data science code, making it easier to go from prototype to production.
Things like Jupyter notebooks are great for experimenting, but not nice when you throw it over the wall to some programmer to clean it up and convert it to “real” code.
Kedro consists of:
Project template. This is done with cookiecutter.
Data catalog. Core declarative IO abstraction layer.
Nodes + pipelines.
Experiment tracking.
Extensibility.
Cookiecutter: https://cookiecutter.readthedocs.io/ . You use cookiecutter (the program) to create projects from “cookiecutter templates”. Such a template gives you a repository structure out of the box, filled in with some parameters that you provide like the name of the project.
Cookiecutter reads a settings file and prompts you interactively with some variables it wants you to provide. It then reads a directory structure and generates an output directory based on it. Really handy, as you normally get a README, some pyproject.toml or so, a proper directory structure, perhaps a sample test file.
The alternative is to start with an empty directory. Does the data scientist know or care about a README? Or how to set up a python project? It is much better to provide a handy starting point out-of-the-box.
There was a love/hate relationship with the Kedro cookiecutter templates. The templates were pretty complete, but the completeness meant that there was actually a lot of code in there: steep learning curve and lots of boilerplate. Documentation generation, for instance, which isn’t always needed.
They then made a second version that asked a few more questions and limited the amount of generated code, based on the answers. For this customization they used the “hooks” that cookiecutter provides: pre_prompt, pre_gen_project, post_gen_project. pre_gen_project can adjust the filled-in variables before actually generating the code. post_gen_project can be used to adjust the code after generating.
With some if/else and some post_gen_project cleanup of the generated code, they were able to limit the amount of generated unnecessary code.
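What such a post_gen_project cleanup could look like, as a minimal sketch. This is not Kedro’s actual hook: the directory names and the include_docs/include_tests options are hypothetical. To keep the example runnable on its own, it simulates a generated project in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def remove_unwanted(project_dir, include_docs, include_tests):
    """Delete optional parts of a freshly generated project that the user declined."""
    # Hypothetical directory names; in a real hooks/post_gen_project.py the flags
    # would come from rendered template variables such as "{{ cookiecutter.include_docs }}".
    optional = {"docs": include_docs, "tests": include_tests}
    for name, wanted in optional.items():
        if not wanted:
            shutil.rmtree(Path(project_dir) / name, ignore_errors=True)


# Simulate a generated project in a temporary directory.
project = Path(tempfile.mkdtemp())
for name in ("src", "docs", "tests"):
    (project / name).mkdir()
remove_unwanted(project, include_docs=False, include_tests=True)
print(sorted(path.name for path in project.iterdir()))
```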
So… use cookiecutter! A great way to help colleagues and users get started in an easy and right way.
2024-10-10
(One of my summaries of the one-day Pycon NL conference in Utrecht, NL).
Full title: how bazel streamlines python development: stories from the Uber monorepo.
Monorepo: a single repository that contains multiple distinct projects, with well-defined relationships. There are advantages to monorepos like developer productivity: code sharing, component reuse, reducing duplication. Atomic commits across components: you fix everything everywhere in one commit. And you have consistency and a common way of working.
Bazel: https://bazel.build/ . A build system that allows you to define tools and tasks by writing code. With an ever-growing built-in set of rules to support popular languages and packages. Bazel’s language is called starlark, which is inspired by python. Some semantics differ, but the behaviour is mostly the same.
Do you even need a build tool for python? Well, perhaps you have compiled cython modules. Or a javascript frontend. Code generation? In any case you probably need to generate deployable artifacts like wheels or debian packages.
They used git-filter-repo to help merge an existing repo into a monorepo, preserving the history. (Correction: I originally said they wrote it, but Alexey told me it was an existing project.)
There is support for running tests. And optionally caching test results to prevent re-running unneeded tests (important in a huge code base).
Caching is hard, that’s why Bazel emphasises hermeticity. When given the same input source code and the same configuration, the result should always be the same. It should be hermetically isolated from outside influences.
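Why hermeticity and caching go together can be sketched like this. It is only an illustration of the principle and not how Bazel does it internally: cache_key, run_tests and the in-memory cache are made up. The idea: if the result only depends on the sources and the configuration, a hash of those two is a safe cache key.

```python
import hashlib
import json


def cache_key(sources, config):
    """Hash everything that may influence the result: file contents and configuration."""
    digest = hashlib.sha256()
    for path in sorted(sources):
        digest.update(path.encode())
        digest.update(sources[path].encode())
    digest.update(json.dumps(config, sort_keys=True).encode())
    return digest.hexdigest()


test_results = {}  # hypothetical in-memory stand-in for a build/test cache
runs = []


def run_tests(sources, config):
    key = cache_key(sources, config)
    if key not in test_results:
        runs.append(key)  # record that the (expensive) tests really ran
        test_results[key] = "passed"
    return test_results[key]


sources = {"app.py": "print('hi')\n"}
run_tests(sources, {"python": "3.12"})
run_tests(sources, {"python": "3.12"})  # identical inputs: cache hit
run_tests({"app.py": "print('hello')\n"}, {"python": "3.12"})  # changed source: re-run
print(len(runs))
```

Anything that influences the result without being part of the key (an OS library, the clock, a race condition) breaks this assumption.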
Some challenges for monorepos:
The ecosystem might not be hermetically closed. Some projects don’t publish binary wheels and depend on OS libraries. Those OS libraries might not be under total control.
Flaky tests. Sometimes you have tests that only fail some of the time. They depend on the speed of execution or a race condition or whatever. This wrecks Bazel’s approach a bit.
Lack of tests. If something is important to you, you should test it.
Some things they want to improve:
IDE support.
Static typing. Mypy is not fast. Bazel could perhaps help here.
Shared build cache across multiple environments. Perhaps even across the entire organisation?!?
2024-10-10
(One of my summaries of the one-day Pycon NL conference in Utrecht, NL).
Localisation and translation of programming languages: how to make programming languages more inclusive and why that matters. Felienne is professor at the Vrije Universiteit in Amsterdam. She also works at a school teaching 12 year olds. Those 12 year olds wanted to learn programming and she agreed to teach it.
When she herself learned programming, it was in the 1980’s without any adults in her environment to teach her. So she typed over some Basic code from a book and taught herself. That’s how she learned programming. The compiler was the teacher and she learned to read error messages.
But current 12 year olds don’t learn that way:
> print("Hello world") # <= Note the extra space in front of print
^^^ IndentationError
“Teacher, what is an indentationerror?”. Or rather in Dutch “juf, wat betekent een indentation error”. So she tried to write a much simpler language to teach the kids. Simple code. “Lark” translates it to proper syntax. This is then converted to an “AST”, abstract syntax tree. Which is then converted to python.
A request that came up quickly was if the keywords could also be in Dutch. So not “ask” but “vraag”. She found it weird as Dutch kids are supposed to be good at English. But yeah, it was a natural question and she got it working. Even with mixing languages at the same time.
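A toy sketch of the idea of translated keywords. This is not her implementation (which uses Lark and an AST): the keyword table, the two-keyword mini language and the to_python function are all made up by me. Only “ask” and “vraag” come from the talk.

```python
# Hypothetical keyword table: localised keyword -> canonical English keyword.
KEYWORDS = {"print": "print", "ask": "ask", "vraag": "ask"}


def to_python(line):
    """Translate one `keyword text` line of the toy language into a line of python."""
    # strip() makes leading spaces harmless, so no IndentationError for beginners.
    keyword, _, text = line.strip().partition(" ")
    canonical = KEYWORDS.get(keyword)
    if canonical == "print":
        return f"print({text!r})"
    if canonical == "ask":
        return f"answer = input({text!r})"
    raise SyntaxError(f"unknown keyword: {keyword}")


program = ["print Hello world", "vraag Hoe heet je?", "ask What is your name?"]
for line in program:
    print(to_python(line))
```

Because every localised keyword maps to the same canonical one, mixing Dutch and English in one program works without extra effort.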
Then the next request came from someone from Palestine. Couldn’t she make a version for Arabic? Right-to-left language… And what about variable names? Then she started to look up the definition. A combination of underscores, lowercase and uppercase characters. Oh. It didn’t include the accented characters of many European languages. And most especially Arabic characters as those have no upper/lowercase…
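To see what such an ASCII-only definition excludes, here is a small comparison. The regular expression is my own assumption of what a “letters and underscores” rule looks like; str.isidentifier() is python 3’s own, Unicode-aware check.

```python
import re

# Hypothetical ASCII-only rule: underscores plus lowercase and uppercase letters (and digits).
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

for name in ["leeftijd", "café", "عمر"]:
    ascii_ok = ASCII_IDENTIFIER.fullmatch(name) is not None
    # str.isidentifier() follows python 3's own, Unicode-aware definition.
    print(name, ascii_ok, name.isidentifier())
```

The accented and the Arabic name are both rejected by the ASCII-only rule, while python 3 itself accepts them as identifiers.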
Right-to-left: not everything is hard. In right-to-left, the first character your computer gives you is the rightmost character. Easy. And even if you use (brackets), what looks like a closing bracket on the right is actually an opening bracket: fonts solve this!
It does depend on the program/editor that renders your code or output, though. Especially when mixing languages, you can get weird results. She showed a python traceback where a RTL string was shown as LTR.
Our 0123456789 numbers are Arabic numbers, right? As opposed to the Roman I, II, III, IV, V? Well, actually Arabic uses different numbers! Why don’t we learn this instead of Roman numerals? ١, ٢, ٣. (Note: I hope this renders correctly. My editor (emacs) is doing funny (but probably correct!) thingies moving the cursor as it recognises the right-to-left pasted characters).
It is epistemic injustice. Epistemic is something like “the study of knowledge”. In this case she means that loads of people are done injustice as their numbers are not allowed. She showed an Arabic “1+1” in many programming languages with the syntax errors they result in. Loads of people in the world are basically discriminated against because a small group of western people designed programming languages in a specific way.
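You can try the “1+1” example in python yourself. A small sketch (my own, not one of her slides): python is happy to convert text with these digits to a number, but refuses the same digits as a number in source code.

```python
arabic_sum = "١+١"  # "1+1" written with Arabic-Indic digits

# Converting text to a number works: int() accepts any Unicode decimal digit.
print(int("١") + int("١"))

# The same digits as a literal in source code are rejected by the parser.
try:
    compile(arabic_sum, "<example>", "eval")
except SyntaxError as error:
    print("SyntaxError:", error.msg)
```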
Well, does it work? Does it work to teach programming using such a localised, translated language? Yes, it does. They tested it in Botswana on bilingual kids (most of the kids there speak English in addition to the local language). The kids using the localised, translated language learned more and used it more. It was easier to understand concepts.
It should also be kept in mind that English, in many countries, is the language of either a former coloniser or oppressor or the country that bombed them. What message are you sending if you effectively say that you have to use the English language when you go into IT?
My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.