2024-09-17
(One of my summaries of the Dutch PyUtrecht meetup in Utrecht, NL).
Joris started out as a bond trader, with his MSc in financial management. So definitively no programmer by education. Bond trading meant shouting on the stock market’s floor, something that was in risk of being automated. So he moved on. In his case to Reuters, one of the two big financial information suppliers. Later he started using python and making nice prototypes, which clients loved. Now he’s a python trainer (“PythonSherpa”).
A bit of imposter syndrome was present when he was demoing some jupyter notebooks to expensive managers when he was still a python beginner. Later on as python trainer, imposter syndrome was still a thingy.
The best way to learn: learning by teaching! He had to keep learning things because he needed to explain them. And by explaining he remembered it even better.
But one thing kept worrying him: he couldn’t pass many of the standard programming interview questions like “validate this binary search algorithm”… The solution could be online courses on coursera, edx, udacity, udemy.
What he recommends is https://github.com/ossu/computer-science, an overview of free courses you need to do to theoretically pass a full computer science education. For instance the course “programming languages part a/b/c” by the university of Washington. “Algorithms” by Stanford University. “Nand2Tetris” by the Hebrew University of Jerusalem: building a computer from the basic nand/or/and/nor ports up.
Also nice for learning: https://exercism.org . “Develop fluency in 74 languages”. Free, but they do need donations quickly.
2024-09-17
(One of my summaries of the Dutch PyUtrecht meetup in Utrecht, NL).
Take your average “advent of code” text parsing problem. First try: some basic
string.split()
stuff. Not very readable and a bit brittle. Second try: a feast of
regex’es…. More elegant, but much less readable.
A regex can parse regular text. If text isn’t regular, you have to get to “EBNF”, extended Backus-Naur form. This is what lark can help you parse.
In lark you define a grammar. You can import pre-defined parts of vocabulary like “signed number” or “quoted string”. You can define custom types/variables. And you can group stuff. So first some number of “account statements”, followed by “whitespace”, followed by “transaction statements”, for instance.
After parsing with lark, you can then process the results. A useful thing you can do is
to convert everything to python dataclasses
. When you pair it with pydantic, you get
even easier type coercion and validation.
A nice quote by Miriam Forner: writing code is communicatng to your future self and other developers. So: readability matters. And Lark can help.
2024-09-17
(One of my summaries of the Dutch PyUtrecht meetup in Utrecht, NL).
Parenting and coding: choices have an effect. What is the outcome of your work? How does it turn out? But in the end you want to be proud of the outcome. You want your child/code to turn out right. With code, the effect of your decisions is clear reasonably quickly, with children it takes some more time.
One repo turns into two repos and into three. And more. How do you keep them all the attention they deserve? How do you keep them up to date?
Describe your child (=readme).
Keep a journal (=changelog).
Easy maintenance and easy to work with.
Kids need boundaries. They need to know how to behave! Likewise, your code needs to behave: you need tests to set boundaries.
After a while, your kid is old enough to go to school. Four years old or so. Hurray: you suddenly have more free time! Apparently your code is still alive! Potty training accomplished. Is your child/code ready to be shared with a wider public? But are you ready for others to make changes to your child?
Some things are standard (or should be standard) for most code:
Dependency management. requirements.txt, pipfile, uv.
Standardisation. Black, ruff, pylance, pyright.
Tests: pytest, unittest. (And look at bdd
).
Pipelines. Gitlab/github, pre-commit, jenkins.
For a good setup, use a template: cookiecutter or copier. You basically start with a ready-made baby of one year old and you skip all the diapers :-)
Children change. Their environment changes. They get new needs (“I want a PS5!”).
Renovatebot, dependabot for auto-dating. Basically you can have your kid come to you for an update.
You can have your child use several different environments with “nox” and “tox”.
When your child grows older, they need different kinds of attention. And more education. You want them to get a bit more independent. Tests are important to make sure everything works. Perhaps some automated merges can be set up.
In the end, they grow up so quickly! Be an example. Make sure your code is neat. How do you want to be remembered? Through your children. Especially as a freelancer, you want to remembered by your great code. It should be an easy child with clear documentation and clear examples. No surprises. Clean code, tests, no security leaks. Low maintenance.
2024-09-17
(One of my summaries of the Dutch PyUtrecht meetup in Utrecht, NL).
Full title: this is FFIne: building foreign function interfaces without shooting yourself in the foot.
He works at channable (the host). They have both haskell and python codebases. Haskell is used for the more performance-critical code. They looked into using haskell from within python. With a foreign function interface. In some cases Haskell is faster. And it is handy to circumvent the “global interpreter lock” GIL. And sometimes there’s a library for which there is no good alternative.
Just so you know: many of the libraries you commonly use actually use another more system-oriented language under the hood.
You need a host language: python, of course. And an embedded language that can compile
to a dynamic library such as C, haskell and others. And third you need python’s
ctypes
module. With ctypes you can load external libraries, including some necessary
bookkeeping like input/output types. Behind the scenes, ctypes
treats everything
like C, including the problems iwth memory management, datatype mapping problems and
undefined behaviour…
The alternative to including such a libary via ctypes would be to running the other library in its own web server code and call it: much easier, but much more overhead.
So… he tried his hand at a simpler FFI. One without C. (Note: if you want to bind to Rust, use https://github.com/PyO3/pyo3). His approach uses several layers.
The first layer of the solution is to have only one kind of item that’s passed through
the boundary: a ByteBox
which is a pointer to a list of bytes and the length. When
you call a foreign function from python, you call it with two ByteBoxes: one for your
input and one for the called function’s output. Going one way is called “lowering”,
the other way “lifting”.
How we can go from generic bytes to bytes. The next layer is serialisation/deserialisation, what’s included in most web frameworks or database layers. Like converting something to/from json. Or better a binary format for performance reasons.
Layer 3: esceptions. In many languages it is basically a name, message, a cause and a
traceback. He treats it as a list. In that way he can convert between languages.
The tblib
“traceback library” helps with that.
Layer 4: uncurrying to pass everything properly to the actual function.
He did all this in a day during a hackathon. It is reasonably production ready and reasonably performant. The code (and link to the slides) is at https://github.com/channable/virgil-ffi
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
He showed a demo video of content created with blender. Blender started in 1994 (python 1991). Like most open source projects, blender is all about the people. Blender HQ is in Amsterdam, btw. They have 51 people on payroll and lots and lots of volunteers. The money comes mostly from donations, sponsors and subscriptions (via a foundation, the office and a studio).
Lots of small studios and game/movie makers can only exist because blender is open source: a good reason for sponsoring/supporting the project!
Now to the python side of things. Python is used as the UI glue. The actual operators that the UI calls are written in python or C++. If you enable python development inside blender, if you hover over user interface elements, you get hints on which python class it is. Right-click and you can get the full path of an object for copy/pasting into the python console. You can then work with it.
Even fancier: if you right-click on some tool or transformation, you get the python code that would be executed if you would have clicked on the tool. Which you can then modify.
Some items in Blender:
DNA: main data structure of blender. Flat lists of scenes, objects, meshes, armatures,
etc. List items are identified by name
. The actual file structure hasn’t changed
in 20+ years!
RNA (“apologies to those that know biology”): a wrapper around DNS (sub-)structures. It defines “RNA properties” (I think they’re kinda layered on top of the DNA items ). It is used by the GUI, the animation system and the python access.
Drivers: small expression for modifying values, like moving a light through a scene. You can also do it with more elaborate python code.
Add-ons/modules.
Operators: you can register your own python functions as items in the user interface.
Blender mostly just uses the standard libary. As extras there are requests
and
numpy
and GPU support.
Now on to the pain.
Getting and isolating dependencies of add-ons. The workaround is to backup
sys.modules
and sys.path
, grab the wheels, do any imports and then restore
sys.modules
and sys.path
. It works, but barely. You can get funny errors when
you do some lazy loading. The main problem is that most of the programmers are not
really experienced python programmers that might understand this hack.
pip install
can also be used to install dependencies in some central
location. Drawback is that it is very developer-oriented. And some dependencies need C
compilers which non-developers don’t have.
The new 4.2 blender will have support for wheels, that helps a lot. But it still doesn’t solve the isolation problem (for when you need different versions…). Perhaps a custom import hook could help? Perhaps a fixed list of specific package versions to use? Perhaps sub-interpreters? If you have a solution: talk to him.
Personal note: I had to do something similar for a “geoserver” add-on. Similar problems. I did some really clever stuff, but in the end it got all ripped out as it was too complex and people kept finding corner cases.
They want blender to work without a network connection for security/privacy reasons (except when you explicitly allow it). This also has to be true for addons/extensions. And they struggle a bit with a reliable way to reliably restrict extensions in this way as it is “just python” and you can effectively do anything. If you have a solution: talk to him.
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
Python is easy to use, but it is also easy to use in a wrong way. The goal of this talk is to get a better design for your functions.
Some definition:
Arguments versus parameters: parameters are the placeholders/names, arguments are the
actual values. So some_function(parameter="argument")
.
Type hinting: hurray! Do it! It makes your parameters clearer.
Arguments versus keyword arguments. Keyword arguments are less error-prone and are more
readable but also are more work. some_function("a1", "a2", "a3")
versus
some_function(p1="a1", p2="a2", p3="a3")
He did some testing and keyword arguments
are a tiny, tiny bit slower, but nothing to worry about.
Defaults. Handy and fine, but don’t use “mutable types” like a list or a dict, that’s unsafe.
Behind the scenes, what happens when you call a function:
Stack frame creation (a packages that contains all the info for the function to be able to run).
Arg evaluation and binding.
Function executes.
Transfer control back to the part that called the function.
Garbage control, the regular python cleanup.
Some handy functionality apart from regular args and kwargs:
def some_function(a, b, *args):
, args are the left-over arguments if not all of
them matched. args is a tuple.
def some_function(a="a", b="b", **kargs):
, same for keyword arguments. This is a
dict.
def some_function(*, a="a", b="b"):
, the asteriks at the start “eats up” all the
non-keyword arguments, ensuring it is impossible to pass positional arguments to the
keywords. This makes your code safer.
def some_function(a, b, /):
, the slash says that everything before it has to be
a positional parameter, it is forbidden from being used at a keyword argument. You
rarely need this.
A variant is def some_function(positional1, *, /, mandatory_kwarg=None)
, with one
mandatory positional argument and one mandatory keyword argument.
So… with a bit of care you can design nice functions.
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
Note: it is “kraken d” not “kraken”: https://www.krakend.io/ .
From monolith to services: you have an application plus webclient. Then you add a mobile client. Suddenly you need an extra subscription service. And then a third party frontend needs to call your backend. And then you write a newer version of your first application… Lots of connections between the various parts.
The solution: put an API gateway in between. All the frontends call the gateway, the gateway calls all the backend applications.
If you have one gateway, that’s immediately a good place to handle some common functionality: auhthentication/authorization, logging/monitoring, load balancing, service discovery, response transformation, caching, throttling, circuit breaking, combining backend responses.
What they ended up with was krakenD. Open source. It provides a lot of middleware out of the box. Lots of plugins.
What krakenD suggests is that you have a gateway per frontend, so tailored to the use case.
There’s an online GUI to allows you to edit the configuration through a web interface. It is handy for exploring what’s on offer and to play with the system and to figure out how it works. (In production, you of course don’t want to do manual changes).
Tip: use the flexible configuration that krakenD allows. The regular single
krakend.json
file can get pretty big, so krakenD has support for configuration
templates, environment variable files. KrakenD cna than compile the various files into
one config file.
Even then, maintaining the configuration can be a challenge. If you have multiple
gateways, especially. Well, you can generate part of the configuration with python
scripts. Reading some openapi.json
and turning it into your preferred kind of
config.
Useful: there’s a check
command that will check your configuration for syntax errors
and an audit
command for improving your configuration regarding security and best
practices.
From the krakend config file, you can generate an openapi spec which you can then give to swagger to have nice online documentation for your API gateway.
He thinks it is a great tool to manage your APIs. It is lightweight and fast and simple to configure. Lots of functionality out of the box. Versatile and extensible. And… it makes your software architecture more agile.
Drawbacks: you need to manage the API gateway config, which is an extra step. And changing the config requires a restart.
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
“Descriptors” are re-usable @property
-like items. They implement the __get__
,
__set__
and __delete__
methods and possibly the __set_name__
method.
See https://docs.python.org/3/howto/descriptor.html for an introduction on the python website.
Most of the talk consisted of code examples: I’m not that fast a typer :-) So this summary is mostly a note to myself to check it out deeper.
You can use them for custom checks on class attributes, for instance. Or for some security checks. Or for logging when an attribute is updated.
This is how it looks when you use it:
class Something:
id = PositiveInteger() # <= this is your descriptor
So usage of a descriptor is easy. The hard and tricky part is actually writing them. If you don’t really need them: don’t try to write them. If you write code that is used a lot, the extra effort might pay off.
A funny one was an Alias
descriptor:
class Something:
name: str
full_name = Alias("name")
nickname = Alias("name")
something = Something()
something.name = "pietje"
something.name # Returns pietje
something.full_name # Returns pietje, too
something.nickname # Returns pietje, too
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
He works on a big monolythical django application. It started with one regular django app. Then a second app was added. And a third one. The UI was templates with a bit of javascript/jquery. Then a REST api was added with djangorestframework and a single page app in backbone or react. Then another rest api was added. With another rest api library.
In the end, it becomes quite messy. Lots of different APIs. Perhaps one has a certain attribute and another version doesn’t have it. And the documentation is probably spread all over the place.
REST APIs have problems of their own. Overfetching and underfetching are the most common. Underfetching is when you have to do multiple requests to get the info you need. Overfetching is when you get a big blob of data and only need a little part of it.
Then…. a new frontend app had to be build. He wanted to spend less time on defining endpoionts/serializers. And to re-use existing functionality as much as possible.
At a previous pygrunn, https://graphql.org/ was suggested. What is graphql?
Query language for APIs.
Defined with a strongly typed schema.
You query for only the data you need. This also means you get predictable results: exactly what you want, based on the schema. You can query for items and subitems, reducing the underfetching problem of REST APIs.
There were a few options to start with graphql. Hasura (haskell), postgraphile (typescriot), apollo, graphene. Graphene uses python, so he chose that.
Graphene provides Fields
to map python types to GraphQL types. graphene_django
helps you provide those mappings for django models, in this way it looks a bit like
djangorestframework’s serializers. He puts the mappings in a schemas.py
next to the
models.py
in the existing apps.
He then demoed graphql. There are apps to browse a graphql API and to interactively construct queries and look at the results.
Some feedback they got after the initial implementation:
You might have accidentally worked around security that was present in the old REST API views. The solution is to put the security into the database layer. Perhaps row level security.
Speed might be a problem. Graphene out-of-the-box experience can be really slow for certain kinds of queries. A graphene component called “data loaders” was the solution for them.
TIP: look at https://relay.dev/ , it has a steep learning curve, but it is worth the effort.
I have two other summaries you might want to look at:
lessons from using GraphQL in production from the 2019 pygrunn conference. (I wonder if this was the talk he referred to :-) )
Graphql in python? Meet strawberry! from a 2023 Amersfoort python meetup. Here “strawberry” was advertised as a more modern version of graphene by one of the graphene maintainers.
2024-05-17
(One of my summaries of the 2024 Dutch PyGrunn python&friends conference in Groningen, NL).
There’s no universally accepted definition of platform engineering. But there are some big reasons to do it. “Shift left” is a trend: a shift from pure development into more parts of the full process, like quality assurance and deployment.
But… then you have an abundance of choice. What tools are you going to use for package management? Which CI program? How are you going to deploy? There are so many choices… Jenkins or github actions or gitlab? Setuptools or poetry? Etcetera.
You have lots of choices: hurray! Freedom! But it takes a huge amount of time to make the choice. To figure out how to use it. To get others to use it. There of course is a need to consolidate: you can’t just support everything.
How do you do platform engineering?
You observe. Observation: you need to be part of a community where you gather ideas and tips and possible projects you can use.
You execute. Execute: you have to actually set things up or implement ideas.
You collect feedback. Talk with your colleagues, see how they’re using it. You really have to be open for feedback.
Some “products” that can come out of platform engineering to give you an idea:
Documentation.
Self service portal.
Boilerplates: templates to generate code. They use cookiecutter (https://cookiecutter.org) for it.
APIs.
In his company, they now support three cookiecutter templates. You shouldn’t have too many. And you should try to keep them as standard and simple as possible. He suggests using a happy path: a basic set of tools that are supported and suggested. That’s where you aim all your automation and templates at: you’re allowed to do something else, but you’ll have to support and understand it yourself.
Cookiecutter has some drawbacks. Replaying templates is brittle. There’s no versioning built-in. Applying it incrementally is painful.
Dependency management: they now use pyproject.toml
for everything. Poetry instead of
setuptools+pip-tools. And as platform team they recommend Renovate
to the actual
software teams, which helps keeping your code and dependencies up to date. Renovate
turned out to be quite popular.
Regarding API: one of the services the platform team at his company provides is orchestrating the authorization used by the various (micro)services. Previously, it was mostly a manual process via a django admin website. “Chatops”: they’d get lots of requests for access. Lots of manual work, no clarity which “scopes” (+/- permissions) there are, no clarity which permissions were actually handed out.
They fixed it with automation based on yaml files:
Apps deoclare their scopes.
Maintainers approve access.
A CI/CD pipeline checks if the access rights are in place.
Statistics: charts of posts per year and per month.
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.
Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):