2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
He showed a demo video of content created with blender. Blender started in 1994 (python 1991). Like most open source projects, blender is all about the people. Blender HQ is in Amsterdam, btw. They have 51 people on payroll and lots and lots of volunteers. The money comes mostly from donations, sponsors and subscriptions (via a foundation, the office and a studio).
Lots of small studios and game/movie makers can only exist because blender is open source: a good reason for sponsoring/supporting the project!
Now to the python side of things. Python is used as the UI glue. The actual operators that the UI calls are written in python or C++. If you enable python development inside blender and hover over user interface elements, you get hints on which python class each element is. Right-click and you can get the full path of an object for copy/pasting into the python console. You can then work with it.
Even fancier: if you right-click on some tool or transformation, you get the python code that would be executed if you clicked on the tool. You can then modify that code.
Some items in Blender:
DNA: main data structure of blender. Flat lists of scenes, objects, meshes, armatures, etc. List items are identified by name. The actual file structure hasn’t changed in 20+ years!
RNA (“apologies to those that know biology”): a wrapper around DNA (sub-)structures. It defines “RNA properties” (I think they’re kinda layered on top of the DNA items). It is used by the GUI, the animation system and the python access.
Drivers: small expressions for modifying values, like moving a light through a scene. You can also do it with more elaborate python code.
Add-ons/modules.
Operators: you can register your own python functions as items in the user interface.
Blender mostly just uses the standard library. As extras there are requests and numpy and GPU support.
Now on to the pain.
Getting and isolating dependencies of add-ons. The workaround is to back up sys.modules and sys.path, grab the wheels, do any imports and then restore sys.modules and sys.path. It works, but barely. You can get funny errors when you do some lazy loading. The main problem is that most of the programmers are not experienced python programmers who would understand this hack.
pip install can also be used to install dependencies in some central location. Drawback is that it is very developer-oriented. And some dependencies need C compilers which non-developers don’t have.
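To make the back-up-and-restore workaround a bit more concrete, here is a minimal sketch of how I imagine it. This is not Blender's actual code: the isolated_imports name and the idea of passing in one extra directory are my own assumptions.

```python
import contextlib
import sys


@contextlib.contextmanager
def isolated_imports(extra_path):
    """Temporarily put extra_path on sys.path and undo the import side effects."""
    saved_modules = dict(sys.modules)
    saved_path = list(sys.path)
    sys.path.insert(0, extra_path)
    try:
        yield
    finally:
        # Restore sys.path in place, so other references to the list stay valid.
        sys.path[:] = saved_path
        # Drop every module that was imported inside the block...
        for name in set(sys.modules) - set(saved_modules):
            del sys.modules[name]
        # ...and put back anything that was replaced or removed.
        sys.modules.update(saved_modules)
```

An add-on would keep its own references to the modules it imported inside the with block. Anything that is imported lazily after the block has ended no longer finds those modules, which is probably where the funny errors come from.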
The new 4.2 blender will have support for wheels, that helps a lot. But it still doesn’t solve the isolation problem (for when you need different versions…). Perhaps a custom import hook could help? Perhaps a fixed list of specific package versions to use? Perhaps sub-interpreters? If you have a solution: talk to him.
Personal note: I had to do something similar for a “geoserver” add-on. Similar problems. I did some really clever stuff, but in the end it got all ripped out as it was too complex and people kept finding corner cases.
They want blender to work without a network connection for security/privacy reasons (except when you explicitly allow it). This also has to be true for addons/extensions. And they struggle a bit with finding a reliable way to restrict extensions in this way as it is “just python” and you can effectively do anything. If you have a solution: talk to him.
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
Python is easy to use, but it is also easy to use in a wrong way. The goal of this talk is to get a better design for your functions.
Some definitions:
Arguments versus parameters: parameters are the placeholders/names, arguments are the actual values. So some_function(parameter="argument").
Type hinting: hurray! Do it! It makes your parameters clearer.
Positional arguments versus keyword arguments. Keyword arguments are less error-prone and more readable, but they are also more work: some_function("a1", "a2", "a3") versus some_function(p1="a1", p2="a2", p3="a3"). He did some testing and keyword arguments are a tiny, tiny bit slower, but nothing to worry about.
Defaults. Handy and fine, but don’t use “mutable types” like a list or a dict, that’s unsafe.
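Why mutable defaults are unsafe is easiest to see in a small example. This is my own sketch, not code from the talk; the function names are made up.

```python
def append_unsafe(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and is then shared by every call that relies on the default.
    bucket.append(item)
    return bucket


def append_safe(item, bucket=None):
    # None is the usual sentinel: a fresh list is created on every call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling append_unsafe twice without a bucket gives you one list that keeps growing, while append_safe starts with an empty list every time.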
Behind the scenes, what happens when you call a function:
Stack frame creation (a package that contains all the info for the function to be able to run).
Arg evaluation and binding.
Function executes.
Transfer control back to the part that called the function.
Garbage collection, the regular python cleanup.
Some handy functionality apart from regular args and kwargs:
def some_function(a, b, *args): here args are the left-over positional arguments if not all of them matched. args is a tuple.
def some_function(a="a", b="b", **kwargs): the same for keyword arguments. kwargs is a dict.
def some_function(*, a="a", b="b"): the asterisk at the start marks everything after it as keyword-only, ensuring it is impossible to pass positional arguments to the keywords. This makes your code safer.
def some_function(a, b, /): the slash says that everything before it has to be a positional parameter, it is forbidden from being used as a keyword argument. You rarely need this.
A variant is def some_function(positional1, /, *, mandatory_kwarg): with one mandatory positional argument and one mandatory keyword argument. (The slash has to come before the asterisk, and the keyword argument is only mandatory if it has no default.)
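The four signature variants above can be tried out with a small sketch like the following. The function names and the raises_type_error helper are mine, purely for illustration. Note that the slash needs python 3.8 or newer and that it has to come before the asterisk.

```python
def collect(a, b, *args, **kwargs):
    # args is a tuple with the left-over positional arguments,
    # kwargs is a dict with the left-over keyword arguments.
    return a, b, args, kwargs


def keyword_only(*, a="a", b="b"):
    return a, b


def positional_only(a, b, /):
    return a, b


def mixed(positional1, /, *, mandatory_kwarg):
    return positional1, mandatory_kwarg


def raises_type_error(func, *args, **kwargs):
    """Return True if calling func with these arguments raises a TypeError."""
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False
```

raises_type_error(keyword_only, "x") and raises_type_error(positional_only, a=1, b=2) both return True: python refuses the call instead of silently binding the value to the wrong parameter.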
So… with a bit of care you can design nice functions.
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
Note: it is “kraken d” not “kraken”: https://www.krakend.io/ .
From monolith to services: you have an application plus webclient. Then you add a mobile client. Suddenly you need an extra subscription service. And then a third party frontend needs to call your backend. And then you write a newer version of your first application… Lots of connections between the various parts.
The solution: put an API gateway in between. All the frontends call the gateway, the gateway calls all the backend applications.
If you have one gateway, that’s immediately a good place to handle some common functionality: authentication/authorization, logging/monitoring, load balancing, service discovery, response transformation, caching, throttling, circuit breaking, combining backend responses.
What they ended up with was krakenD. Open source. It provides a lot of middleware out of the box. Lots of plugins.
What krakenD suggests is that you have a gateway per frontend, so tailored to the use case.
There’s an online GUI that allows you to edit the configuration through a web interface. It is handy for exploring what’s on offer and to play with the system and to figure out how it works. (In production, you of course don’t want to do manual changes).
Tip: use the flexible configuration that krakenD allows. The regular single krakend.json file can get pretty big, so krakenD has support for configuration templates and environment variable files. KrakenD can then compile the various files into one config file.
Even then, maintaining the configuration can be a challenge. If you have multiple gateways, especially. Well, you can generate part of the configuration with python scripts. Reading some openapi.json and turning it into your preferred kind of config.
Useful: there’s a check command that will check your configuration for syntax errors and an audit command for improving your configuration regarding security and best practices.
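Generating part of the configuration from an openapi.json could look roughly like the sketch below. It is my own guess at such a script, not the speaker's. The keys I use (version, endpoints, endpoint, method, backend, url_pattern, host) are how I understand the krakenD v3 config format, so check the krakenD documentation before relying on them. The example spec and the backend host are made up.

```python
import json

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def endpoints_from_openapi(spec, backend_host):
    """Turn the paths of an OpenAPI document into krakenD endpoint entries."""
    endpoints = []
    for path, operations in sorted(spec.get("paths", {}).items()):
        for method in sorted(operations):
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-method keys such as "parameters"
            endpoints.append(
                {
                    "endpoint": path,
                    "method": method.upper(),
                    "backend": [{"url_pattern": path, "host": [backend_host]}],
                }
            )
    return endpoints


def build_gateway_config(spec, backend_host):
    return {"version": 3, "endpoints": endpoints_from_openapi(spec, backend_host)}


# Hypothetical, tiny OpenAPI document standing in for a real openapi.json.
example_spec = {
    "openapi": "3.0.0",
    "paths": {
        "/users/{id}": {"get": {}, "delete": {}},
        "/orders": {"post": {}, "parameters": []},
    },
}

config = build_gateway_config(example_spec, "http://backend.example:8000")
config_json = json.dumps(config, indent=2)
```

The resulting config_json could then be written to a file and run through the check command mentioned above.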
From the krakend config file, you can generate an openapi spec which you can then give to swagger to have nice online documentation for your API gateway.
He thinks it is a great tool to manage your APIs. It is lightweight and fast and simple to configure. Lots of functionality out of the box. Versatile and extensible. And… it makes your software architecture more agile.
Drawbacks: you need to manage the API gateway config, which is an extra step. And changing the config requires a restart.
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
“Descriptors” are re-usable @property-like items. They implement the __get__, __set__ and __delete__ methods and possibly the __set_name__ method.
See https://docs.python.org/3/howto/descriptor.html for an introduction on the python website.
Most of the talk consisted of code examples: I’m not that fast a typer :-) So this summary is mostly a note to myself to check it out deeper.
You can use them for custom checks on class attributes, for instance. Or for some security checks. Or for logging when an attribute is updated.
This is how it looks when you use it:
class Something:
    id = PositiveInteger()  # <= this is your descriptor
So usage of a descriptor is easy. The hard and tricky part is actually writing them. If you don’t really need them: don’t try to write them. If you write code that is used a lot, the extra effort might pay off.
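I didn't manage to type along with the code from the talk, so the following is my own reconstruction of what a PositiveInteger descriptor could look like. It follows the pattern from the descriptor howto on the python website; the details (like storing the value under a private name) are my assumptions.

```python
class PositiveInteger:
    """Descriptor that only accepts integers greater than zero."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created: remember where
        # to store the actual value on the instance.
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"expected a positive integer, got {value!r}")
        setattr(instance, self.private_name, value)


class Something:
    id = PositiveInteger()  # <= this is your descriptor
```

Assigning something.id = -1 now raises a ValueError and leaves the previous value untouched.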
A funny one was an Alias descriptor:
class Something:
    name: str
    full_name = Alias("name")
    nickname = Alias("name")

something = Something()
something.name = "pietje"
something.name  # Returns pietje
something.full_name  # Returns pietje, too
something.nickname  # Returns pietje, too
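The Alias class itself wasn't in my notes. A minimal version that makes the example above work could look like this. It is a sketch of my own, assuming the alias should work for both reading and writing.

```python
class Alias:
    """Descriptor that forwards reads and writes to another attribute."""

    def __init__(self, target):
        self.target = target

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.target)

    def __set__(self, instance, value):
        setattr(instance, self.target, value)


class Something:
    name: str
    full_name = Alias("name")
    nickname = Alias("name")
```

Because the alias only stores the name of the target attribute, setting something.nickname also changes something.name.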
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
He works on a big monolithic django application. It started with one regular django app. Then a second app was added. And a third one. The UI was templates with a bit of javascript/jquery. Then a REST api was added with djangorestframework and a single page app in backbone or react. Then another rest api was added. With another rest api library.
In the end, it becomes quite messy. Lots of different APIs. Perhaps one has a certain attribute and another version doesn’t have it. And the documentation is probably spread all over the place.
REST APIs have problems of their own. Overfetching and underfetching are the most common. Underfetching is when you have to do multiple requests to get the info you need. Overfetching is when you get a big blob of data and only need a little part of it.
Then… a new frontend app had to be built. He wanted to spend less time on defining endpoints/serializers. And to re-use existing functionality as much as possible.
At a previous pygrunn, https://graphql.org/ was suggested. What is graphql?
Query language for APIs.
Defined with a strongly typed schema.
You query for only the data you need. This also means you get predictable results: exactly what you want, based on the schema. You can query for items and subitems, reducing the underfetching problem of REST APIs.
There were a few options to start with graphql. Hasura (haskell), postgraphile (typescript), apollo, graphene. Graphene uses python, so he chose that.
Graphene provides Fields to map python types to GraphQL types. graphene_django helps you provide those mappings for django models, in this way it looks a bit like djangorestframework’s serializers. He puts the mappings in a schemas.py next to the models.py in the existing apps.
He then demoed graphql. There are apps to browse a graphql API and to interactively construct queries and look at the results.
Some feedback they got after the initial implementation:
You might have accidentally worked around security that was present in the old REST API views. The solution is to put the security into the database layer. Perhaps row level security.
Speed might be a problem. Graphene’s out-of-the-box experience can be really slow for certain kinds of queries. A graphene component called “data loaders” was the solution for them.
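The idea behind data loaders, as I understand it, is batching: instead of one database query per item, you first collect the keys and then fetch everything in one go. The sketch below shows just that idea in plain python. It does not use graphene's actual data loader API, and FakeDatabase and the book/author data are made up.

```python
class FakeDatabase:
    """Stand-in for a real database that counts the queries it receives."""

    def __init__(self, authors):
        self.authors = authors
        self.query_count = 0

    def get_author(self, author_id):
        self.query_count += 1
        return self.authors[author_id]

    def get_authors(self, author_ids):
        self.query_count += 1
        return {author_id: self.authors[author_id] for author_id in author_ids}


def resolve_one_by_one(db, books):
    # Naive resolution: one query per book.
    return [db.get_author(book["author_id"]) for book in books]


def resolve_batched(db, books):
    # Data loader idea: collect the keys first, then fetch them in one query.
    unique_ids = sorted({book["author_id"] for book in books})
    authors = db.get_authors(unique_ids)
    return [authors[book["author_id"]] for book in books]
```

With three books, the first approach does three queries and the batched one does a single query, with the same result.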
TIP: look at https://relay.dev/ , it has a steep learning curve, but it is worth the effort.
I have two other summaries you might want to look at:
lessons from using GraphQL in production from the 2019 pygrunn conference. (I wonder if this was the talk he referred to :-) )
Graphql in python? Meet strawberry! from a 2023 Amersfoort python meetup. Here “strawberry” was advertised as a more modern version of graphene by one of the graphene maintainers.
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
There’s no universally accepted definition of platform engineering. But there are some big reasons to do it. “Shift left” is a trend: a shift from pure development into more parts of the full process, like quality assurance and deployment.
But… then you have an abundance of choice. What tools are you going to use for package management? Which CI program? How are you going to deploy? There are so many choices… Jenkins or github actions or gitlab? Setuptools or poetry? Etcetera.
You have lots of choices: hurray! Freedom! But it takes a huge amount of time to make the choice. To figure out how to use it. To get others to use it. There of course is a need to consolidate: you can’t just support everything.
How do you do platform engineering?
You observe. Observation: you need to be part of a community where you gather ideas and tips and possible projects you can use.
You execute. Execute: you have to actually set things up or implement ideas.
You collect feedback. Talk with your colleagues, see how they’re using it. You really have to be open to feedback.
Some “products” that can come out of platform engineering to give you an idea:
Documentation.
Self service portal.
Boilerplates: templates to generate code. They use cookiecutter (https://cookiecutter.org) for it.
APIs.
In his company, they now support three cookiecutter templates. You shouldn’t have too many. And you should try to keep them as standard and simple as possible. He suggests using a happy path: a basic set of tools that are supported and suggested. That’s where you aim all your automation and templates at: you’re allowed to do something else, but you’ll have to support and understand it yourself.
Cookiecutter has some drawbacks. Replaying templates is brittle. There’s no versioning built-in. Applying it incrementally is painful.
Dependency management: they now use pyproject.toml for everything. Poetry instead of setuptools+pip-tools. And as platform team they recommend Renovate to the actual software teams, which helps keeping your code and dependencies up to date. Renovate turned out to be quite popular.
Regarding API: one of the services the platform team at his company provides is orchestrating the authorization used by the various (micro)services. Previously, it was mostly a manual process via a django admin website. “Chatops”: they’d get lots of requests for access. Lots of manual work, no clarity which “scopes” (+/- permissions) there are, no clarity which permissions were actually handed out.
They fixed it with automation based on yaml files:
Apps declare their scopes.
Maintainers approve access.
A CI/CD pipeline checks if the access rights are in place.
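I don't know what their yaml files or pipeline actually look like, so the following is only a sketch of the kind of check such a CI/CD step could do. Plain dicts stand in for the parsed yaml, and all field names (app, scope, client, approved_by) are hypothetical.

```python
def find_access_problems(declared_scopes, access_requests):
    """Return human-readable problems; an empty list means the check passes."""
    problems = []
    for request in access_requests:
        app, scope, client = request["app"], request["scope"], request["client"]
        if scope not in declared_scopes.get(app, set()):
            problems.append(f"{client}: {app} does not declare scope {scope!r}")
        elif not request.get("approved_by"):
            problems.append(f"{client}: access to {app}:{scope} is not approved")
    return problems
```

The pipeline would fail when the returned list is not empty, which also gives you the missing overview of which scopes exist and which permissions were handed out.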
2024-04-16
The company I work for (Nelen & Schuurmans) is a member of the EU ‘TEMA’ project. TEMA stands for trusted extremely precise mapping and prediction for emergency management. A main focus for us is to improve our ‘3Di’ hydrodynamic computation software.
But… such a flood calculation program doesn’t function in a vacuum, it needs input and something useful has to be done with its output :-) The TEMA project has prepared a couple of technologies to help with that:
A kubernetes cluster. That’s scary, so that’s why I was asked to take a look.
A semantic web server (that’s how I’d call it). NGSI-LD, using JSON-LD. That’s what the summary below is actually about.
But first I want to get a bit enthusiastic about that semantic web part. I’ve attached the tag rdf to this blog entry. Json-ld is basically “rdf in json form”. RDF is a way of linking information with URLs, like json-ld. So not just textual author="reinout", but http://purl.org/dc/terms/creator="https://reinout.vanrees.org" or something like that.
I could point at a piece of my PhD thesis where I describe RDF… anyway, I’m quite enthusiastic about finally doing a little bit of work with semantic web technologies :-)
Json is a data exchange format, but the meaning of the data isn’t understandable for machines. For humans it is not understandable out of the box, either. name=”John”: is that the first name? Or the username? Or the name of a book?
JSON-LD is an extension to json that annotates attributes with extra information. An attribute is turned into a URI with a specific meaning. A json document has an associated schema. Schemas are associated by a @context attribute pointing at the schema.
You can define a Person type with a name attribute that precisely says it is the name of the person. Inheritance is also possible: a Policeman can be a kind of Person.
NGSI-LD: next generation service interface - linked data is an open standard for context information management. “Context” means such a JSON-LD-defined schema. NGSI-LD 1.0 started already in 2012. The start of the current version was in 2017.
@id must be a URN.
An entity must have a @type.
The class must be defined in the @context.
https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context-v1.7.jsonld is implicitly included.
An entity consists of id, type and scope. An entity is stored in (or owned by) a “tenant”: a separate database. Tenants are isolated, so you can have security and access rights. A “context broker” is connected to several tenants and can handle access to them. To work around possible ID collisions between tenants, the “full” id of an entity is the combination between tenant+id.
Within such a context broker, you can search according to a “scope”. Entities can belong to multiple scopes. Scopes can be nested.
An entity is a tangible object in the physical world. It can have attributes (via json-ld). An attribute has a type, a name and a value. Type can be a regular property, a geo property, a temporal property, a list property (a list of values instead of a single value), etc.
If needed, an attribute can itself have a sub-attribute. “speed=20” can have “measured_at=14:00” and “unitCode=km/h” sub-attributes, for instance.
To access the data, you have to talk to the API. The api supports three types of json-ld:
Normalized (the most verbose, but also the most complete and interoperable).
Concise.
Simplified.
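To give an idea of the difference between the representations, here is a sketch of an entity in normalized form and a function that reduces it to the simplified form. The vehicle, its id and its values are made up. The attribute layout (type/value, unitCode, observedAt for the "measured at" timestamp) is how I understand the NGSI-LD specification, so treat it as an assumption and check the spec.

```python
CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context-v1.7.jsonld"

# Normalized: every attribute is an object with its own type and value,
# plus optional sub-attributes.
vehicle_normalized = {
    "id": "urn:ngsi-ld:Vehicle:A4567",  # hypothetical URN
    "type": "Vehicle",
    "speed": {
        "type": "Property",
        "value": 20,
        "unitCode": "KMH",  # UN/CEFACT code for km/h
        "observedAt": "2024-04-16T14:00:00Z",
    },
    "location": {
        "type": "GeoProperty",
        "value": {"type": "Point", "coordinates": [5.12, 52.09]},
    },
    "@context": [CORE_CONTEXT],
}


def to_simplified(entity):
    """Reduce attributes to plain key/value pairs, dropping the sub-attributes."""
    simplified = {}
    for key, value in entity.items():
        if isinstance(value, dict) and "value" in value:
            simplified[key] = value["value"]
        else:
            simplified[key] = value  # id, type and @context stay as they are
    return simplified
```

The simplified version is much easier to read, but you lose the unit and the timestamp, which is why normalized is the most complete and interoperable one.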
The @context is the semantic context of the data in a resource.
The API supports the regular CRUD (create, read, update, delete) actions. Very important: the filtering. You can filter by id, by type, a geoquery and more. Advanced queries can be done with the “ngsi-ld query language”.
Subscriptions are a mechanism for receiving real-time notifications. You subscribe to the results of a filter: when something changes, you get a notification via http or mqtt. The context broker sends the notifications.
Which entities do you want to watch?
What attributes do you want to watch? And what is the condition? (“temperature>20”).
What is the endpoint (=url) you want the notification to go to?
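The three questions above map quite directly onto the body of a subscription. The sketch below builds such a body as a python dict. The field names (entities, watchedAttributes, q, notification.endpoint.uri) are taken from my reading of the NGSI-LD specification, so double-check them there. The entity type, attribute and url in the usage example are made up.

```python
def build_subscription(entity_type, watched_attribute, condition, endpoint_url):
    """Build the body for creating a subscription at the context broker."""
    return {
        "type": "Subscription",
        # Which entities do you want to watch?
        "entities": [{"type": entity_type}],
        # What attributes do you want to watch, and under which condition?
        "watchedAttributes": [watched_attribute],
        "q": condition,
        # Where should the notification go to?
        "notification": {
            "endpoint": {"uri": endpoint_url, "accept": "application/json"},
        },
    }


subscription = build_subscription(
    entity_type="WeatherStation",
    watched_attribute="temperature",
    condition="temperature>20",
    endpoint_url="https://example.org/notifications",
)
```

You would then POST this body (plus the right @context) to the context broker, which sends the notifications to the endpoint.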
2023-11-16
The first “pyutrecht” meetup in Amersfoort in the Netherlands. (Amersfoort is not the city of Utrecht, but it is in the similarly named province of Utrecht).
I gave a talk myself about behaving more like a proper programmer towards your own laptop setup. Have a git repo with a README explaining which programs you installed. An install script or makefile for installing certain tools. “Dotfiles” for storing your config in git. Etc. I haven’t made a summary of my own talk. Here are the other three:
William works at deliverect, the host of the meeting. Webscraping means extracting data from a website and parsing it into a more useful format. Like translating a list of restaurants on a
There’s a difference with web crawling: that is following links and trying to download all the pages on a website.
Important: robots.txt. As a crawler or scraper you’re supposed to read it as it tells you which user agents are allowed and which areas of the website are off-limits (or not useful).
Another useful file that is often available: /sitemap.xml. A list of URLs in the site that the site thinks are useful for scraping or crawling.
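Python's standard library can read a robots.txt for you. The sketch below is my own addition, not something shown in the talk. It parses a made-up robots.txt from a string so that it runs without network access; for a real site you'd use set_url() and read() instead. The "my-scraper" user agent and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Is our (hypothetical) scraper allowed to fetch these pages?
allowed = parser.can_fetch("my-scraper", "https://example.com/restaurants")
blocked = parser.can_fetch("my-scraper", "https://example.com/private/data")

# The sitemap URLs that the robots.txt advertises (python 3.8+).
sitemaps = parser.site_maps()
```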
A handy trick: looking at the network tab when browsing the website. Are there any internal APIs that the javascript frontend uses to populate the page? Sometimes they are blocked from easy scraping or they’re difficult to access due to creative headers or authentication or cookies or session IDs.
A tip: beautifulsoup, a python library for extracting neat, structured content from an otherwise messy html page.
selenium is an alternative as it behaves much more like a regular webbrowser. So you can “click” a “next” button a couple of times in order to get a full list of items. Because selenium behaves like a real webbrowser, things like cookies and IDs in query parameters and headers just work. That makes it easier to work around many kinds of basic protection.
A microcontroller is a combination of cpu, memory and some interfaces to external ports. https://micropython.org is a version of python for such low-power devices.
He demoed python’s prompt running on a raspberrypi micro connected via microUSB. And of course the mandatory lets-blink-the-onboard-LED programs. And then some other demos with more leds and servos. Nice.
A big advantage of micropython is that it doesn’t care what processor you have. With C/C++ you specifically have to compile for the right kind of processor. With micropython you can just run your code anywhere.
You can use micropython in three ways:
As .py sources, uploaded to the microcontroller.
As pre-compiled .mpy code, also uploaded.
As frozen .mpy included in the images.
He showed a couple of possible target microcontrollers. A note to myself about the ESP8266: limited support, use .mpy. I think I have a few of those at home for should-test-it-at-some-time :-) Some examples: Pi RP2040, ESP32, Teensy 4.1.
A problem: RAM is scarce in such chips and python is hungry… You can do some tricks like on-demand loading. Watch out when using an LCD graphic display, that takes 150kb easily.
You have to watch out for the timing requirements of what you want to do. Steering a servo is fine, but “neopixel” leds for instance need a higher frequency of signals than micropython is capable of on such a microcontroller. If you use a C library for it, it works (he showed a demo).
Erik works as maintainer on the Graphene and the strawberry-GraphQL projects.
Graphql is a query language for APIs. It is an alternative to the well-known REST method. With REST you often have to do multiple requests to get all the data you need. And the answers will often give more information than you actually need.
With graphql, you always start with a graphql schema. You can compare it a bit to an openapi document. The graphql schema specifies what you can request (“a Meetup has a name, description, list of talks, etc”).
An actual query specifies what you want to get back as response. You can omit fields from the schema that you don’t need. If you don’t need “description”, you leave it out. If you want to dive deeper into certain objects, you specify their fields.
Strawberry is a graphql framework. It has integrations for django, sqlalchemy, pydantic and more. The schemas are defined with classes annotated with @strawberry.type and fields with python type hints. (It looked neat!)
He showed a live demo, including the browser-based query interface bundled with graphql.
Note: strawberry is the more modern project (type hints and so) and will later have all the functionality of graphene. So if strawberry’s functionality is enough, you should use that one.
2023-11-10
(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).
Arjan is known for his programming videos.
Alternative title: “the dark side of integrating an LLM (large language model) in your software”. You run into several challenges. He illustrates it with https://www.learntail.com/ , something he helped build. It creates quizzes from text to make the reader more active.
What he used was the python library langchain to connect his app with a LLM. A handy trick: you can have it send extra format instructions to chatgpt based on a pydantic model. If it works, it works. But if you don’t get proper json back, it crashes.
Some more challenges:
There is a limit on prompt length. If it gets too long, the LLM won’t fully understand it anymore and will ignore some of the instructions.
A LLM is no human being. So “hard” or “easy” don’t mean anything. You have to be more machine-explicit, like “quiz without jargon”.
The longest answer it provides is often the correct one. Because the data it has been trained on often has the longest one as the correct answer…
Limits are hard to predict. The token limit is input + output, so you basically have to know beforehand how many tokens the AI needs for its output.
Rate limiting is an issue. If you start chunking, for instance.
A LLM is not a proper API.
You need to do syntax checking on the answer.
Are all the fields present? Validation.
Are the answers of the right type (float/string/etc).
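Those three checks can be done by hand with just the standard library. The talk used langchain with a pydantic model for this; the sketch below is my own swapped-in, dependency-free version, and the QuizQuestion fields are made up.

```python
import json
from dataclasses import dataclass

# Hypothetical fields we expect in the LLM's json answer, with their types.
EXPECTED_FIELDS = {"question": str, "options": list, "correct_index": int}


@dataclass
class QuizQuestion:
    question: str
    options: list
    correct_index: int


def parse_quiz_question(raw_answer):
    """Turn an LLM reply into a QuizQuestion, raising ValueError on any problem."""
    try:
        data = json.loads(raw_answer)  # syntax check
    except json.JSONDecodeError as error:
        raise ValueError(f"not valid json: {error}") from error
    if not isinstance(data, dict):
        raise ValueError("expected a json object")
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:  # are all the fields present?
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):  # of the right type?
            raise ValueError(f"field {field} is not a {expected_type.__name__}")
    return QuizQuestion(**{field: data[field] for field in EXPECTED_FIELDS})
```

Catching the ValueError gives you a place to retry the prompt instead of crashing.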
And hey, you can still write code yourself. You don’t have to ask the LLM everything, you can just do the work yourself, too. An open question is whether developers will start to depend too much on LLMs.
2023-11-10
(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).
Alternative title: five reasons your boss doesn’t allow you to work on your LLM app idea.
Show of hands at the beginning. “Who has never used chatgpt”. I think I was the only one raising my hand :-) Lots of people are interested in it. According to google search queries, more people are interested in prompt engineering courses than in programming courses. Working in generative AI is a great work field at the moment.
Wijnand played a lot with it. He made a linkedin autoresponder, a whatsapp chatbot, a rap song generator, etc. To become enthusiastic about it he recommends checking out https://devday.openai.com/ .
There are several common drawbacks you can hear from your boss:
“Generative AI doesn’t comply with privacy laws”. Main reason: data is often hosted by big USA companies. Well, you can use azure in Europe. There are Dutch startups like Orquesta that help you pick the right ones. Complying with the GDPR is possible. You can also use local models.
“AI hallucinates and is unreliable”. He thinks it is mostly solved. Retrieval augmented generation is one of the methods you can look at. Or prompt chain techniques like manual validation prompts or enforcing explicit requirements.
“Too expensive”. Programmers are expensive and models also. So: look at smaller, cheaper models: you often don’t need the full chatgpt4. Use simpler prompts. Perhaps create your vectorisation once: then you can run your prompts practically for free. Oh, and chatgpt4 will drop its price by a factor of 3.
“The context window is too small”. (Chatgpt4 can consume bigger items since last monday, btw). Chunking/summarizing or vector embedding can also help. If you want it to write an entire course, you can give it the initial question and ask it to generate a summary. From the summary a table of contents and from the TOC the individual chapters.
“Merging genAI with regular tools is hard”. You can ask chatgpt to reply with json. With the json output, you can then even feed it to javascript functions.
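The chunking that is mentioned for the context window problem can be as simple as the sketch below. It is my own illustration: it splits on words, while real limits are counted in tokens, so max_words is only a rough stand-in.

```python
def chunk_words(text, max_words):
    """Split text into chunks of at most max_words words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk can then be summarized separately, after which the summaries are combined in a final prompt.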
During the talk, he showed off a project he is working on. A combination of chatgpt4 and web scraping, switching back between the two of them.
The biggest challenge he sees is to create something that won’t be taken over by OpenAI. So don’t compete with it but complement OpenAI. It is very hard to compete with them as they’re moving so quickly…
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.