Reinout van Rees' weblog

PyGrunn: blender & python: the pleasure & the struggle - Sybren Stüvel

2024-05-17T00:00:00+01:00

(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).

He showed a demo video of content created with blender. Blender started in 1994 (python 1991). Like most open source projects, blender is all about the people. Blender HQ is in Amsterdam, btw. They have 51 people on payroll and lots and lots of volunteers. The money comes mostly from donations, sponsors and subscriptions (via a foundation, the office and a studio).

Lots of small studios and game/movie makers can only exist because blender is open source: a good reason for sponsoring/supporting the project!

Now to the python side of things. Python is used as the UI glue. The actual operators that the UI calls are written in python or C++. If you enable python development inside blender and hover over user interface elements, you get hints on which python class it is. Right-click and you can get the full path of an object for copy/pasting into the python console. You can then work with it.

Even fancier: if you right-click on some tool or transformation, you get the python code that would be executed if you had clicked on the tool. Which you can then modify.

Some items in Blender:

DNA: main data structure of blender. Flat lists of scenes, objects, meshes, armatures, etc. List items are identified by name. The actual file structure hasn't changed in 20+ years!
RNA ("apologies to those that know biology"): a wrapper around DNA (sub-)structures. It defines "RNA properties" (I think they're kinda layered on top of the DNA items). It is used by the GUI, the animation system and the python access.
Drivers: small expressions for modifying values, like moving a light through a scene. You can also do it with more elaborate python code.
Add-ons/modules.
Operators: you can register your own python functions as items in the user interface.

Blender mostly just uses the standard library. As extras there are requests and numpy and GPU support.

Now on to the pain.

Getting and isolating dependencies of add-ons. The workaround is to back up sys.modules and sys.path, grab the wheels, do any imports and then restore sys.modules and sys.path. It works, but barely. You can get funny errors when you do some lazy loading. The main problem is that most of the programmers are not experienced python programmers who would understand this hack.
pip install can also be used to install dependencies in some central location. Drawback is that it is very developer-oriented. And some dependencies need C compilers which non-developers don't have.
The new 4.2 blender will have support for wheels, that helps a lot. But it still doesn't solve the isolation problem (for when you need different versions...). Perhaps a custom import hook could help? Perhaps a fixed list of specific package versions to use? Perhaps sub-interpreters? If you have a solution: talk to him.
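The backup-and-restore workaround for sys.modules and sys.path can be sketched roughly like this. It is a minimal illustration, not Blender's actual code: the isolated_imports name and the extra_path argument are made up for the example.

```python
import sys
from contextlib import contextmanager


@contextmanager
def isolated_imports(extra_path):
    """Temporarily put extra_path first on sys.path; undo import side effects afterwards.

    Hypothetical helper for illustration, not part of Blender.
    """
    saved_path = list(sys.path)
    saved_modules = dict(sys.modules)
    sys.path.insert(0, extra_path)
    try:
        yield
    finally:
        # Restore in place: other code may hold references to these objects.
        sys.path[:] = saved_path
        for name in list(sys.modules):
            if name not in saved_modules:
                del sys.modules[name]
        sys.modules.update(saved_modules)
```

An add-on would do its imports inside `with isolated_imports(...)` and keep its own references to the imported modules. Anything that tries to import lazily after the block has ended no longer finds those modules, which is one way to end up with the "funny errors" mentioned above.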

Personal note: I had to do something similar for a "geoserver" add-on. Similar problems. I did some really clever stuff, but in the end it got all ripped out as it was too complex and people kept finding corner cases.

They want blender to work without a network connection for security/privacy reasons (except when you explicitly allow it). This also has to be true for addons/extensions. And they struggle a bit to find a reliable way to restrict extensions in this way as it is "just python" and you can effectively do anything. If you have a solution: talk to him.

PyGrunn: args, kwargs, and all other ways to design your function parameters - Mike Huls

2024-05-17T00:00:00+01:00

(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).

Python is easy to use, but it is also easy to use in a wrong way. The goal of this talk is to get a better design for your functions.

Some definitions:

Arguments versus parameters: parameters are the placeholders/names, arguments are the actual values. So some_function(parameter="argument").
Type hinting: hurray! Do it! It makes your parameters clearer.
Arguments versus keyword arguments. Keyword arguments are less error-prone and are more readable but also are more work. some_function("a1", "a2", "a3") versus some_function(p1="a1", p2="a2", p3="a3"). He did some testing and keyword arguments are a tiny, tiny bit slower, but nothing to worry about.
Defaults. Handy and fine, but don't use "mutable types" like a list or a dict, that's unsafe.
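Why mutable defaults are unsafe is easiest to see in a small example. The function names are just for illustration:

```python
def append_unsafe(item, items=[]):
    # The default list is created once, when the function is defined,
    # and is then shared between all calls that rely on the default.
    items.append(item)
    return items


def append_safe(item, items=None):
    # Use None as the default and create a fresh list on every call.
    if items is None:
        items = []
    items.append(item)
    return items


first = append_unsafe(1)
second = append_unsafe(2)  # the same list object as `first`, now holding both items
```

The usual fix is the None default shown in append_safe.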

Behind the scenes, what happens when you call a function:

Stack frame creation (a package that contains all the info for the function to be able to run).
Arg evaluation and binding.
Function executes.
Transfer control back to the part that called the function.
Garbage collection, the regular python cleanup.

Some handy functionality apart from regular args and kwargs:

def some_function(a, b, *args):, args are the left-over arguments if not all of them matched. args is a tuple.
def some_function(a="a", b="b", **kwargs):, same for keyword arguments. This is a dict.
def some_function(*, a="a", b="b"):, the asterisk at the start makes everything after it keyword-only, ensuring it is impossible to pass positional arguments to the keywords. This makes your code safer.
def some_function(a, b, /):, the slash says that everything before it has to be a positional parameter, it is forbidden from being used as a keyword argument. You rarely need this.

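A variant is def some_function(positional1, /, *, mandatory_kwarg), with one mandatory positional argument and one mandatory keyword argument. (The slash has to come before the asterisk, and a keyword argument is only mandatory if it has no default.)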
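The parameter styles from the list can be tried out with a few throwaway functions (the names are made up for this example). Note that python only accepts the slash before the asterisk in a signature:

```python
def collect(a, b, *args, **kwargs):
    # Left-over positional arguments end up in the args tuple,
    # left-over keyword arguments in the kwargs dict.
    return a, b, args, kwargs


def keyword_only(*, a="a", b="b"):
    # Everything after the bare asterisk must be passed by keyword.
    return a, b


def positional_only(a, b, /):
    # Everything before the slash must be passed by position.
    return a, b


def combined(positional1, /, *, mandatory_kwarg):
    # One positional-only parameter plus one required keyword-only parameter.
    return positional1, mandatory_kwarg


def raises_type_error(func, *args, **kwargs):
    # Small helper: report whether the call is rejected by python.
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False
```

Calling keyword_only("x") or positional_only(a=1, b=2) raises a TypeError, which is exactly the safety net you are after.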

So... with a bit of care you can design nice functions.

PyGrunn: release the KrakenD - Erik-Jan Blanksma

2024-05-17T00:00:00+01:00

(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).

Note: it is "kraken d" not "kraken": https://www.krakend.io/ .

From monolith to services: you have an application plus webclient. Then you add a mobile client. Suddenly you need an extra subscription service. And then a third party frontend needs to call your backend. And then you write a newer version of your first application... Lots of connections between the various parts.

The solution: put an API gateway in between. All the frontends call the gateway, the gateway calls all the backend applications.

If you have one gateway, that's immediately a good place to handle some common functionality: authentication/authorization, logging/monitoring, load balancing, service discovery, response transformation, caching, throttling, circuit breaking, combining backend responses.

What they ended up with was krakenD. Open source. It provides a lot of middleware out of the box. Lots of plugins.

What krakenD suggests is that you have a gateway per frontend, so tailored to the use case.

There's an online GUI that allows you to edit the configuration through a web interface. It is handy for exploring what's on offer and to play with the system and to figure out how it works. (In production, you of course don't want to do manual changes).

Tip: use the flexible configuration that krakenD allows. The regular single krakend.json file can get pretty big, so krakenD has support for configuration templates and environment variable files. KrakenD can then compile the various files into one config file.

Even then, maintaining the configuration can be a challenge. If you have multiple gateways, especially. Well, you can generate part of the configuration with python scripts. Reading some openapi.json and turning it into your preferred kind of config.
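A minimal sketch of such a generator script. The function names and the backend host are assumptions for the example. The keys used (version, endpoints, endpoint, method, backend, url_pattern, host) follow the basic krakend.json layout as I understand it, so check them against the krakenD documentation before relying on them:

```python
import json

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def endpoints_from_openapi(openapi, backend_host):
    """Turn the "paths" section of an OpenAPI document into a list of endpoints."""
    endpoints = []
    for path, operations in sorted(openapi.get("paths", {}).items()):
        for method in sorted(operations):
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            endpoints.append(
                {
                    "endpoint": path,
                    "method": method.upper(),
                    "backend": [{"url_pattern": path, "host": [backend_host]}],
                }
            )
    return endpoints


def build_config(openapi, backend_host):
    return {"version": 3, "endpoints": endpoints_from_openapi(openapi, backend_host)}


# Made-up input; normally you would json.load() a real openapi.json here.
sample_openapi = {
    "paths": {
        "/users/{id}": {"get": {}, "parameters": []},
        "/orders": {"get": {}, "post": {}},
    }
}
config = build_config(sample_openapi, "http://backend:8000")
krakend_json = json.dumps(config, indent=2)
```

The resulting krakend_json string can be written to a file and then checked with the commands krakenD itself provides.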

Useful: there's a check command that will check your configuration for syntax errors and an audit command for improving your configuration regarding security and best practices.

From the krakend config file, you can generate an openapi spec which you can then give to swagger to have nice online documentation for your API gateway.

He thinks it is a great tool to manage your APIs. It is lightweight and fast and simple to configure. Lots of functionality out of the box. Versatile and extensible. And... it makes your software architecture more agile.

Drawbacks: you need to manage the API gateway config, which is an extra step. And changing the config requires a restart.

PyGrunn: descriptors, decoding the magic - Alex Dijkstra

2024-05-17T00:00:00+01:00

(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).

"Descriptors" are re-usable @property-like items. They implement the __get__, __set__ and __delete__ methods and possibly the __set_name__ method.

See https://docs.python.org/3/howto/descriptor.html for an introduction on the python website.

Most of the talk consisted of code examples: I'm not that fast a typer :-) So this summary is mostly a note to myself to check it out deeper.

You can use them for custom checks on class attributes, for instance. Or for some security checks. Or for logging when an attribute is updated.

This is how it looks when you use it:

class Something:
    id = PositiveInteger()  # <= this is your descriptor

So usage of a descriptor is easy. The hard and tricky part is actually writing them. If you don't really need them: don't try to write them. If you write code that is used a lot, the extra effort might pay off.
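I didn't copy the code from the slides, so here is my own minimal sketch of what a PositiveInteger descriptor could look like, assuming it stores the value on the instance under a private name:

```python
class PositiveInteger:
    def __set_name__(self, owner, name):
        # Called once when the class is created; remember where to store the value.
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        # Reject anything that is not an int greater than zero.
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"expected a positive integer, got {value!r}")
        setattr(instance, self.private_name, value)


class Something:
    id = PositiveInteger()  # <= this is your descriptor


something = Something()
something.id = 3
```

Assigning something.id = -1 now raises a ValueError instead of silently storing a wrong value.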

A funny one was an Alias descriptor:

class Something:
    name: str
    full_name = Alias("name")
    nickname = Alias("name")

something = Something()
something.name = "pietje"
something.name  # Returns pietje
something.full_name  # Returns pietje, too
something.nickname  # Returns pietje, too
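The Alias class itself is not in my notes. A minimal sketch that makes the snippet above work, assuming the alias simply forwards reads and writes to the target attribute:

```python
class Alias:
    def __init__(self, target):
        self.target = target  # name of the attribute this alias points to

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.target)

    def __set__(self, instance, value):
        setattr(instance, self.target, value)


class Something:
    name: str
    full_name = Alias("name")
    nickname = Alias("name")


something = Something()
something.name = "pietje"
```

In this sketch writing goes through the alias as well: something.nickname = "jan" changes something.name, too.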

PyGrunn: unifying django APIs through GraphQL - Jeroen Stoker

2024-05-17T00:00:00+01:00

(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).

He works on a big monolithic django application. It started with one regular django app. Then a second app was added. And a third one. The UI was templates with a bit of javascript/jquery. Then a REST api was added with djangorestframework and a single page app in backbone or react. Then another rest api was added. With another rest api library.

In the end, it becomes quite messy. Lots of different APIs. Perhaps one has a certain attribute and another version doesn't have it. And the documentation is probably spread all over the place.

REST APIs have problems of their own. Overfetching and underfetching are the most common. Underfetching is when you have to do multiple requests to get the info you need. Overfetching is when you get a big blob of data and only need a little part of it.

Then.... a new frontend app had to be built. He wanted to spend less time on defining endpoints/serializers. And to re-use existing functionality as much as possible.

At a previous pygrunn, https://graphql.org/ was suggested. What is graphql?

Query language for APIs.
Defined with a strongly typed schema.
You query for only the data you need. This also means you get predictable results: exactly what you want, based on the schema. You can query for items and subitems, reducing the underfetching problem of REST APIs.

There were a few options to start with graphql. Hasura (haskell), postgraphile (typescript), apollo, graphene. Graphene uses python, so he chose that.

Graphene provides Fields to map python types to GraphQL types. graphene_django helps you provide those mappings for django models, in this way it looks a bit like djangorestframework's serializers. He puts the mappings in a schemas.py next to the models.py in the existing apps.

He then demoed graphql. There are apps to browse a graphql API and to interactively construct queries and look at the results.

Some feedback they got after the initial implementation:

You might have accidentally worked around security that was present in the old REST API views. The solution is to put the security into the database layer. Perhaps row level security.
Speed might be a problem. Graphene's out-of-the-box experience can be really slow for certain kinds of queries. A graphene component called "data loaders" was the solution for them.
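The idea behind data loaders, as far as I understand it, is batching: collect all the keys you need first and fetch them in one query instead of one query per item. The sketch below is plain python to show the principle. It is not graphene's actual data loader API, and BatchLoader and fetch_authors are made-up names:

```python
class BatchLoader:
    """Collect keys first, then fetch them all with a single batch call."""

    def __init__(self, batch_fetch):
        self.batch_fetch = batch_fetch  # function: list of keys -> dict of key to value
        self.pending = []
        self.cache = {}

    def request(self, key):
        # Only remember the key; nothing is fetched yet.
        if key not in self.cache and key not in self.pending:
            self.pending.append(key)

    def resolve(self, key):
        # The first resolve fetches everything that was requested in one go.
        if self.pending:
            self.cache.update(self.batch_fetch(self.pending))
            self.pending = []
        return self.cache[key]


AUTHORS = {1: "Ann", 2: "Bob"}  # stand-in for a database table
query_log = []


def fetch_authors(ids):
    query_log.append(list(ids))  # every call represents one database query
    return {author_id: AUTHORS[author_id] for author_id in ids}


loader = BatchLoader(fetch_authors)
book_author_ids = [1, 2, 1, 2]  # four books, two distinct authors
for author_id in book_author_ids:
    loader.request(author_id)
names = [loader.resolve(author_id) for author_id in book_author_ids]
```

Four books result in one "query" for two authors here, instead of four separate lookups.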

TIP: look at https://relay.dev/ , it has a steep learning curve, but it is worth the effort.

I have two other summaries you might want to look at:

lessons from using GraphQL in production from the 2019 pygrunn conference. (I wonder if this was the talk he referred to :-) )
Graphql in python? Meet strawberry! from a 2023 Amersfoort python meetup. Here "strawberry" was advertised as a more modern version of graphene by one of the graphene maintainers.

PyGrunn: platform engineering, a python perspective - Andrii Mishkovskyi

2024-05-17T00:00:00+01:00

(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).

There's no universally accepted definition of platform engineering. But there are some big reasons to do it. "Shift left" is a trend: a shift from pure development into more parts of the full process, like quality assurance and deployment.

But... then you have an abundance of choice. What tools are you going to use for package management? Which CI program? How are you going to deploy? There are so many choices... Jenkins or github actions or gitlab? Setuptools or poetry? Etcetera.

You have lots of choices: hurray! Freedom! But it takes a huge amount of time to make the choice. To figure out how to use it. To get others to use it. There of course is a need to consolidate: you can't just support everything.

How do you do platform engineering?

You observe. Observation: you need to be part of a community where you gather ideas and tips and possible projects you can use.
You execute. Execute: you have to actually set things up or implement ideas.
You collect feedback. Talk with your colleagues, see how they're using it. You really have to be open for feedback.

Some "products" that can come out of platform engineering to give you an idea:

Documentation.
Self service portal.
Boilerplates: templates to generate code. They use cookiecutter (https://cookiecutter.org) for it.
APIs.

In his company, they now support three cookiecutter templates. You shouldn't have too many. And you should try to keep them as standard and simple as possible. He suggests using a happy path: a basic set of tools that are supported and suggested. That's where you aim all your automation and templates at: you're allowed to do something else, but you'll have to support and understand it yourself.

Cookiecutter has some drawbacks. Replaying templates is brittle. There's no versioning built-in. Applying it incrementally is painful.

Dependency management: they now use pyproject.toml for everything. Poetry instead of setuptools+pip-tools. And as platform team they recommend Renovate to the actual software teams, which helps keep your code and dependencies up to date. Renovate turned out to be quite popular.

Regarding API: one of the services the platform team at his company provides is orchestrating the authorization used by the various (micro)services. Previously, it was mostly a manual process via a django admin website. "Chatops": they'd get lots of requests for access. Lots of manual work, no clarity which "scopes" (+/- permissions) there are, no clarity which permissions were actually handed out.

They fixed it with automation based on yaml files:

Apps declare their scopes.
Maintainers approve access.
A CI/CD pipeline checks if the access rights are in place.
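The talk didn't show the actual yaml format, so the following is only a guess at what such a pipeline check could look like. The data layout (declared_scopes, grants, approved_by) is entirely made up, and the yaml files are represented as already-parsed python structures:

```python
def find_access_problems(declared_scopes, grants):
    """Return a list of problems; an empty list means the check passes."""
    problems = []
    for grant in grants:
        known_scopes = declared_scopes.get(grant["service"], [])
        if grant["scope"] not in known_scopes:
            problems.append(
                f"{grant['client']}: unknown scope {grant['scope']!r} on {grant['service']}"
            )
        elif not grant.get("approved_by"):
            problems.append(
                f"{grant['client']}: {grant['scope']!r} on {grant['service']} is not approved"
            )
    return problems


# Hypothetical content of the yaml files after parsing.
declared_scopes = {"orders": ["orders:read", "orders:write"]}
grants = [
    {"client": "webshop", "service": "orders", "scope": "orders:read", "approved_by": "alice"},
    {"client": "reports", "service": "orders", "scope": "orders:delete", "approved_by": "bob"},
    {"client": "mobile", "service": "orders", "scope": "orders:write", "approved_by": None},
]
problems = find_access_problems(declared_scopes, grants)
```

A CI job could fail the build when the list of problems is not empty, which gives you the clarity about scopes and handed-out permissions that the manual process lacked.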

Amersfoort (NL) python meetup

2023-11-16T00:00:00+01:00

The first "pyutrecht" meetup in Amersfoort in the Netherlands. (Amersfoort is not the city of Utrecht, but it is in the similarly named province of Utrecht).

I gave a talk myself about being more of a proper programmer to your own laptop setup. Have a git repo with a README explaining which programs you installed. An install script or makefile for installing certain tools. "Dotfiles" for storing your config in git. Etc. I haven't made a summary of my own talk. Here are the other three:

An introduction to web scraping - William Lacerda

William works at deliverect, the host of the meeting. Webscraping means extracting data from a website and parsing it into a more useful format. Like translating a list of restaurants on a

There's a difference with web crawling: that is following links and trying to download all the pages on a website.

Important: robots.txt. As a crawler or scraper you're supposed to read it as it tells you which user agents are allowed and which areas of the website are off-limits (or not useful).
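Python's standard library can read robots.txt for you via urllib.robotparser. A small sketch with a made-up robots.txt and made-up user agent names. In a real scraper you would point the parser at the site's own /robots.txt with set_url() and read() instead of parsing a string:

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

User-agent: badbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, url):
    # True if robots.txt permits this user agent to fetch the url.
    return parser.can_fetch(user_agent, url)


restaurants_ok = allowed("my-scraper", "https://example.com/restaurants/")
admin_ok = allowed("my-scraper", "https://example.com/admin/users")
badbot_ok = allowed("badbot", "https://example.com/restaurants/")
```

Here my-scraper may fetch the restaurant list but not the admin area, and badbot is not allowed anything at all.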

Another useful file that is often available: /sitemap.xml. A list of URLs in the site that the site thinks are useful for scraping or crawling.

A handy trick: looking at the network tab when browsing the website. Are there any internal APIs that the javascript frontend uses to populate the page? Sometimes they are blocked from easy scraping or they're difficult to access due to creative headers or authentication or cookies or session IDs.

A tip: beautifulsoup, a python library for extracting neat, structured content from an otherwise messy html page.

selenium is an alternative as it behaves much more like a regular webbrowser. So you can "click" a "next" button a couple of times in order to get a full list of items. Because selenium behaves like a real webbrowser, things like cookies and IDs in query parameters and headers just work. That makes it easier to work around many kinds of basic protection.

MicroPython - Wouter van Ooijen

A microcontroller is a combination of cpu, memory and some interfaces to external ports. https://micropython.org is a version of python for such low-power devices.

He demoed python's prompt running on a raspberrypi micro connected via microUSB. And of course the mandatory lets-blink-the-onboard-LED programs. And then some other demos with more leds and servos. Nice.

A big advantage of micropython is that it doesn't care what processor you have. With C/C++ you specifically have to compile for the right kind of processor. With micropython you can just run your code anywhere.

You can use micropython in three ways:

As .py sources, uploaded to the microcontroller.
As pre-compiled .mpy code, also uploaded.
As frozen .mpy included in the images.

He showed a couple of possible target microcontrollers. A note to myself about the ESP8266: limited support, use .mpy. I think I have a few of those at home for should-test-it-at-some-time :-) Some examples: Pi RP2040, ESP32, Teensy 4.1.

A problem: RAM is scarce in such chips and python is hungry... You can do some tricks like on-demand loading. Watch out when using an LCD graphic display, that takes 150kb easily.

You have to watch out for the timing requirements of what you want to do. Steering a servo is fine, but "neopixel" leds for instance need a higher frequency of signals than micropython is capable of on such a microcontroller. If you use a C library for it, it works (he showed a demo).

GraphQL in python? meet strawberry - Erik Wrede

Erik works as maintainer on the Graphene and the strawberry-GraphQL projects.

Graphql is a query language for APIs. It is an alternative to the well-known REST method. With REST you often have to do multiple requests to get all the data you need. And the answers will often give more information than you actually need.

With graphql, you always start with a graphql schema. You can compare it a bit to an openapi document. The graphql schema specifies what you can request ("a Meetup has a name, description, list of talks, etc").

An actual query specifies what you want to get back as response. You can omit fields from the schema that you don't need. If you don't need "description", you leave it out. If you want to dive deeper into certain objects, you specify their fields.

Strawberry is a graphql framework. It has integrations for django, sqlalchemy, pydantic and more. The schemas are defined with classes annotated with @strawberry.type and fields with python type hints. (It looked neat!)

He showed a live demo, including the browser-based query interface bundled with graphql.

Note: strawberry is the more modern project (type hints and so on) and will later have all the functionality of graphene. So if strawberry's functionality is enough, you should use that one.

aiGrunn: be a better developer with AI - Henry Bol

2023-11-10T00:00:00+01:00

(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).

"Everybody" uses stackoverflow. Now lots of people use chatgpt (or chatgpt plus). Stackoverflow traffic has dropped by 50% in the last 1.5 years. So chatgpt can be your coding buddy.

He really likes it for quickly getting something working (MVP). Like writing something that talks to a magento API (a webshop system). It would take him ages to figure it all out. Or he could ask chatgpt.

He also thinks you don't need docstrings anymore: you can just ask chatgpt to explain a snippet of code for you. (Something I myself don't agree with, btw).

(He demoed some chatgpt code generation of a sample website). What he learned:

Good briefing and interaction is key. First tell it what you want before you start to code.
Chatgpt sometimes loses track if the interaction goes on for too long.
Read what it gives you, otherwise you won't know what it built for you.
Watch out for the "cut-off time" of the chatgpt training set: perhaps newer versions of libraries don't work anymore with the generated code.

Some dangers:

You get lazy.
You can get frustrated if you don't understand what has been generated for you.

aiGrunn: small and practical AI models for CO2 reduction in buildings - Bram de Wit

2023-11-10T00:00:00+01:00

(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).

LLM models can be huge. Mind-bogglingly huge. But... we can also have fun with small models.

He works at a company that regulates climate installations in buildings (HVAC, heating, ventilation, air conditioning) via the cloud. Buildings use 30% of all energy worldwide. So improving how the HVAC installation is used has a big impact.

A use case: normally you pre-heat rooms so that it is comfy when you arrive. But sometimes the sun quickly warms the room anyway shortly afterwards. Can you not conserve some energy without sacrificing too much comfort?

You could calculate an optimal solution, but they "just" measure every individual room and combine that with an AI.

Technical setup:

An "edge device" inside the building.
An external API.
The API stores the data in mysql (the room metadata) and influxdb (the timeseries).
A user selects a room and a machine learning model type and a training data set (from historical data).
The software creates a dataset from influxdb, trains the model (pytorch). The trained neural network goes to ONNX (open neural network exchange). The output is stored in minio (S3-compatible object store). Note: all this is internal: no chatgpt or so.
With the business logic these predictions get interpreted and used for steering the heating. Normally you can achieve 3-5% savings.
The actual steering happens locally in the building with a "go" program that reads the ONNX data. It is open source and is called... gonnx :-)

They have a server with 1 GPU, which is enough for training all those models!

aiGrunn: learntail, turn anything into a quiz using AI - Arjan Egges

2023-11-10T00:00:00+01:00

(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).

Arjan is known for his programming videos.

Alternative title: "the dark side of integrating a LLM (large language model) in your software". You run into several challenges. He illustrates it with https://www.learntail.com/ , something he helped build. It creates quizzes from text to make the reader more active.

What he used was the python library langchain to connect his app with a LLM. A handy trick: you can have it send extra format instructions to chatgpt based on a pydantic model. If it works, it works. But if you don't get proper json back, it crashes.

Some more challenges:

There is a limit on prompt length. If it gets too long, the LLM won't fully understand it anymore and ignores some of the instructions.
A LLM is no human being. So "hard" or "easy" don't mean anything. You have to be more machine-explicit, like "quiz without jargon".
The longest answer it provides is often the correct one. Because the data it has been trained on often has the longest one as the correct answer...
Limits are hard to predict. The token limit is input + output, so you basically have to know beforehand how many tokens the AI needs for its output.
Rate limiting is an issue. If you start chunking, for instance.

A LLM is not a proper API.

You need to do syntax checking on the answer.
Are all the fields present? Validation.
Are the answers of the right type (float/string/etc)?
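These three checks can be sketched with just the standard library. The talk used langchain with a pydantic model for this. The plain json version below only shows the principle, and the field names (question, answers, correct_index) are made up:

```python
import json

# Hypothetical fields of one quiz item and the types we expect.
REQUIRED_FIELDS = {"question": str, "answers": list, "correct_index": int}


def parse_quiz_item(raw_text):
    """Check an LLM answer step by step; raise ValueError when something is off."""
    # 1. Syntax: is it valid JSON at all?
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # 2. Are all the fields present? 3. Do they have the right type?
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    # Extra sanity check: the correct answer must point at an existing answer.
    if not 0 <= data["correct_index"] < len(data["answers"]):
        raise ValueError("correct_index does not point at an answer")
    return data


good = parse_quiz_item(
    '{"question": "Capital of NL?", "answers": ["Amsterdam", "Groningen"], "correct_index": 0}'
)
```

When a ValueError comes out, you can retry the request or show a proper error message instead of crashing.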

And hey, you can still write code yourself. You don't have to ask the LLM everything, you can just do the work yourself, too. An open question is whether developers will start to depend too much on LLMs.