Diving into event-driven architectures - Marc-André Lemburg

Tags: python, pun

(One of my summaries of the 2023 Dutch pythonconferentie python meeting in Utrecht, NL).

Marc-André has been involved with Python since 1994.

If you want to implement something you want to see grow, it should be scalable (vertically and horizontally). Easy to adapt, easy to maintain. Prepared for the enterprise. You have to keep it in mind from the beginning.

Also:

  • Have good failure modes. Everything will fail at some point: handle it elegantly.

  • Integrate observability. If something fails you need to know what happened.

  • Automate governance. You might not think about this from day one, but if you want to grow…

There are two well-known synchronous architectures: REST and the lesser-known GraphQL. REST has good frontend support and every backend can “do” it. GraphQL has the advantage that it simplifies querying new data: it is simply more flexible. But performance is a problem. Regular REST also isn’t necessarily speedy.

EDA, event-driven architectures, could be a solution. Asynchronous. You can combine it with REST/GraphQL. The asynchronous nature ensures it scales well. It also promotes loose coupling.

The main concept is event-driven communication. You pass along messages. You produce and consume without a direct connection. There’s always a broker that handles the messages: “event distribution”. Messages are often categorised into topics/queues/channels; the terminology depends on the technology. And consumers can themselves be producers, too.
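To make the producer/broker/consumer idea concrete, here is a minimal in-memory sketch. It is not code from the talk: `InMemoryBroker`, the topic names and the handlers are all made up for illustration, and a real broker would add persistence, networking and delivery guarantees.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy broker: hands each published message to every subscriber of a topic."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # The producer only knows the topic name, never the consumers.
        for handler in list(self._subscribers[topic]):
            handler(message)


broker = InMemoryBroker()
shipped: list[dict] = []


def handle_order_placed(message: dict) -> None:
    # A consumer that is also a producer: it reacts to one event
    # and emits a follow-up event on another topic.
    broker.publish("order.shipped", {"order_id": message["order_id"]})


broker.subscribe("order.placed", handle_order_placed)
broker.subscribe("order.shipped", shipped.append)

broker.publish("order.placed", {"order_id": 42})
print(shipped)  # [{'order_id': 42}]
```

The code that publishes "order.placed" never references the shipping logic, which is the loose coupling mentioned above.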

Events are handled with messages: “something happened”, “this should happen”, “something changed”. Messages should be really small. If there are larger pieces of data that belong to the message, you’re better off storing them in a database or object store.
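A small sketch of keeping messages small by storing the bulky data elsewhere and only passing a reference. The `object_store` dict stands in for a real database or object store, and the event name and field names are hypothetical.

```python
import hashlib
import json

# Stand-in for a real database or object store (assumption for this sketch).
object_store: dict[str, bytes] = {}


def store_blob(data: bytes) -> str:
    """Store the large payload and return the key under which it can be found."""
    key = hashlib.sha256(data).hexdigest()
    object_store[key] = data
    return key


def make_message(report: bytes) -> str:
    # The message only says "something happened" plus where to find the data.
    return json.dumps({"event": "report.generated", "blob_key": store_blob(report)})


def load_report(message: str) -> bytes:
    # The consumer fetches the large payload only when it actually needs it.
    return object_store[json.loads(message)["blob_key"]]


report = b"x" * 1_000_000
message = make_message(report)
print(len(report), len(message))
```

The message stays tiny regardless of how large the report is.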

Some formats: Avro (used by Kafka), Protobuf, MessagePack, JSON (but make sure to use it with a JSON schema!). You should sign messages for security. And they should be typed.
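A minimal sketch of a typed and signed JSON message, using only the standard library. The `UserRegistered` event, the envelope layout and the shared secret are assumptions for illustration; the typing here comes from a dataclass rather than a JSON schema, and a real setup would manage keys properly.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical shared secret


@dataclass(frozen=True)
class UserRegistered:
    """One typed event: 'something happened'."""

    user_id: int
    email: str


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def encode(event: UserRegistered) -> bytes:
    payload = json.dumps(asdict(event), sort_keys=True)
    envelope = {
        "type": "UserRegistered",
        "payload": payload,
        "signature": _sign(payload.encode("utf-8")),
    }
    return json.dumps(envelope).encode("utf-8")


def decode(raw: bytes) -> UserRegistered:
    envelope = json.loads(raw)
    expected = _sign(envelope["payload"].encode("utf-8"))
    # Constant-time comparison, so the check does not leak timing information.
    if not hmac.compare_digest(expected, envelope["signature"]):
        raise ValueError("signature mismatch")
    fields = json.loads(envelope["payload"])
    return UserRegistered(user_id=int(fields["user_id"]), email=str(fields["email"]))


raw = encode(UserRegistered(user_id=7, email="someone@example.org"))
print(decode(raw))
```

A consumer that receives a tampered payload gets a `ValueError` instead of silently acting on it.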

Topics/channels/queues: use one message type per topic, that’s the best practice. Such a topic often uses PubSub: publish/subscribe. Producers publish messages and consumers subscribe to those messages. Streaming is an alternative to PubSub: consumers have to do a bit more work themselves, but it supports re-reading older messages and late-joining.
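The difference with streaming can be sketched as an append-only log where each consumer keeps track of its own position. This is a toy illustration, not how any particular broker implements it; `Stream`, `StreamConsumer` and the messages are made up.

```python
class Stream:
    """Append-only log; messages stay available after they have been read."""

    def __init__(self) -> None:
        self._log: list[str] = []

    def append(self, message: str) -> int:
        self._log.append(message)
        return len(self._log) - 1  # offset of the new message

    def read(self, offset: int, limit: int = 10) -> list[str]:
        return self._log[offset:offset + limit]


class StreamConsumer:
    """The 'bit more work': the consumer remembers its own offset."""

    def __init__(self, stream: Stream, start_offset: int = 0) -> None:
        self.stream = stream
        self.offset = start_offset

    def poll(self) -> list[str]:
        batch = self.stream.read(self.offset)
        self.offset += len(batch)
        return batch


stream = Stream()
stream.append("temperature=19")
stream.append("temperature=20")

early = StreamConsumer(stream)
first_batch = early.poll()  # the two messages written so far

stream.append("temperature=21")

late = StreamConsumer(stream)  # joins late, still sees everything
late_batch = late.poll()

early.offset = 0  # rewind to re-read older messages
replayed = early.poll()
```

With plain PubSub, a consumer that subscribes late typically never sees the messages that were published before it joined.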

Topics are handled by a message broker. Kafka, Redis, RabbitMQ, ActiveMQ, Mosquitto. Even PostgreSQL supports it. All the cloud vendors have their own version. To connect to the message broker you need a connector library.

Brokers are quite hard to build. Guaranteed delivery of messages even when the network fails. Especially difficult: make sure messages are processed only once. How do you handle retries and replays?
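One common way to cope with retries and replays on the consumer side is to make processing idempotent: remember which message IDs have been handled and skip duplicates. A minimal sketch, assuming every message carries a unique `id` field (my assumption, not something from the talk) and keeping the seen IDs in memory, where a real system would persist them.

```python
from typing import Callable


class IdempotentConsumer:
    """Wraps a handler so that a redelivered message is processed only once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: set[str] = set()

    def receive(self, message: dict) -> bool:
        message_id = message["id"]
        if message_id in self._seen:
            return False  # duplicate delivery: skip it
        self._handler(message)
        # Only mark the message as done after the handler succeeded,
        # so a failed attempt can be retried later.
        self._seen.add(message_id)
        return True


balance = {"amount": 0}


def deposit(message: dict) -> None:
    balance["amount"] += message["amount"]


consumer = IdempotentConsumer(deposit)
deliveries = [
    {"id": "m1", "amount": 10},
    {"id": "m2", "amount": 5},
    {"id": "m1", "amount": 10},  # the broker retried m1
]
results = [consumer.receive(message) for message in deliveries]
print(results, balance)  # [True, True, False] {'amount': 15}
```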

If you want to specify your async API, https://www.asyncapi.com/ is a good way. Basically OpenAPI/Swagger with some changes. Only… Python isn’t very good at it.

So:

  • Split applications into loosely coupled components. Components communicate with events.

  • Use a broker to manage communication.

  • Scale up/down individual parts of the stack as needed.

Event driven: sounds nice, with lots of advantages, but there are challenges, too.

  • Gathering logs from all the various nodes…

  • Associating individual log entries with specific incoming requests. Perhaps some ID injection?

  • Organisationally, you all really need to understand the architecture.

  • Organisationally, you also need really good documentation.
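The “ID injection” idea for tying log entries to a request can be sketched with the standard library: generate an ID where the request enters the system, pass it along in every message and put it on every log line. The names below (`correlation_id`, `handle_request`, `consume`) are hypothetical, and the in-memory `ListHandler` merely stands in for a central log collector.

```python
import contextvars
import logging
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Attach the current ID to every record so the formatter can use it.
        record.correlation_id = correlation_id.get()
        return True


class ListHandler(logging.Handler):
    """Collects formatted lines in memory, standing in for a central log store."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


logger = logging.getLogger("eda_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
handler.setFormatter(logging.Formatter("[%(correlation_id)s] %(message)s"))
handler.addFilter(CorrelationIdFilter())
logger.addHandler(handler)


def handle_request(payload: str) -> dict:
    # Entry point: create the ID once and ship it inside the message.
    request_id = str(uuid.uuid4())
    correlation_id.set(request_id)
    logger.info("request received")
    return {"correlation_id": request_id, "payload": payload}


def consume(message: dict) -> None:
    # Another component: restore the ID from the message before logging.
    correlation_id.set(message["correlation_id"])
    logger.info("processing %s", message["payload"])


message = handle_request("resize image")
correlation_id.set("-")  # simulate a different node with no request context
consume(message)
print("\n".join(handler.lines))
```

Both log lines carry the same ID, so they can be matched up after the logs of all nodes have been gathered.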

Now about Python. The basic async support inside the language is good. AsyncAPI support is a different story: there are two packages on PyPI, but both have stalled, with the last activity in 2020/2021. AsyncAPI is mostly focused on Java and JavaScript/Node.

There are two recent python packages that show some promise:

  • SIO-AsyncAPI, limited to socketio/websocket.

  • asyncapi-schema-pydantic, aimed at generating pydantic schemas.

Rolling your own is definitely possible, though!
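As a starting point for rolling your own, here is a minimal sketch that uses the language’s built-in async support: a producer and a consumer decoupled by an `asyncio.Queue`. It runs in a single process and the event names are made up; a real setup would put a broker and a connector library in place of the queue.

```python
import asyncio


async def producer(queue: asyncio.Queue, events: list[str]) -> None:
    for event in events:
        # put() waits when the queue is full, which gives simple backpressure.
        await queue.put(event)
    await queue.put(None)  # sentinel: no more events


async def consumer(queue: asyncio.Queue) -> list[str]:
    handled = []
    while True:
        event = await queue.get()
        if event is None:
            break
        handled.append(event.upper())  # stand-in for real processing
    return handled


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    # Producer and consumer run concurrently and only share the queue.
    _, handled = await asyncio.gather(
        producer(queue, ["created", "updated", "deleted"]),
        consumer(queue),
    )
    return handled


result = asyncio.run(main())
print(result)  # ['CREATED', 'UPDATED', 'DELETED']
```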

 

About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.
