Reinout van Rees’ weblog

Pycon.de: streamlit app optimization in AWS - Darya Petrashka

2025-04-25

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

Full title: you don’t think about your streamlit app optimization until you try to deploy it to AWS

Streamlit is a quick way to show your models/data to stakeholders.

She once made a streamlit app for practicing writing Japanese characters. It used a character recognition model. But… the way streamlit works, the app normally downloads the model every single time. For local testing that is probably OK, but in production, network traffic might cost money.

Solution? You can cache the model with streamlit, but you can also download it while building the docker image, store it inside the image and load it from there.
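A minimal sketch of the streamlit caching option (the loader and model name are hypothetical, not from the talk):

```python
import streamlit as st

@st.cache_resource  # load once, reuse across reruns and sessions
def load_model():
    from transformers import pipeline  # hypothetical loader; any model works
    return pipeline("image-classification", model="example/character-model")

model = load_model()
```

For the docker approach you run the same download during the docker build and point the loader at the baked-in path instead.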

On to authentication. You can handle everything yourself as a data scientist: login widget, auth logic, user privileges, etc. You can also use an external provider like Amazon Cognito: then you only have to hook up Cognito in your code, but the OPS engineer has to set it up for you.

On to security. For all of these you'll probably need the OPS engineer.

  • Set up https with Route 53 and TLS certificates.

  • Configure CloudFront to protect against DDoS attacks and improve performance.

  • Use the AWS Web Application Firewall (WAF) to block malicious traffic.

On to credential storage. You can use AWS Secrets Manager instead of putting API_KEY = "1234abcd" right in your code. Using the Secrets Manager is much more secure and will make your OPS engineer happy.
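A minimal boto3 sketch of reading such a secret (the secret name is hypothetical):

```python
import boto3

client = boto3.client("secretsmanager")
# "my-app/api-key" is a hypothetical secret name
response = client.get_secret_value(SecretId="my-app/api-key")
api_key = response["SecretString"]
```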

https://reinout.vanrees.org/images/2025/pycon-35.jpeg

Photo explanation: random picture from Darmstadt (DE)

Pycon.de: community building and task automation - Cosima Meyer

2025-04-25

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

Full title: code & community: the synergy of community building and task automation

Cosima organizes PyLadies and R-Ladies events.

Visibility is important. You have to be visible to be noticed. Time is also important: you have to put in work to create good content. But you also want to have time for other things. So she thought about creating an “automated megaphone” to help her and other PyLadies members be more visible.

She created the pyladies bot on bluesky and mastodon. It regularly shares portraits of “amazing women in tech”. And it reposts messages when tagged or mentioned. It also monitors blogs and posts about them. See https://github.com/cosimameyer/awesome-pyladies-blogs

The bot runs as a github action “cron job”.

She started using Google's Gemini LLM to create short summaries of the blog posts, making it more likely that people click through. She picked a cheap, small model as that was good enough. In addition she runs an extra automated check for harassment, dangerous content, etc.
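A hedged sketch of what such a summarization call can look like with the google-generativeai package (model choice, prompt and input are illustrative, not hers):

```python
import google.generativeai as genai

genai.configure(api_key="...")  # in practice: read from an environment variable
model = genai.GenerativeModel("gemini-1.5-flash")  # a cheap, small model

post_text = "…the full text of a blog post…"  # hypothetical input
response = model.generate_content(
    "Summarize this blog post in two enticing sentences:\n\n" + post_text
)
print(response.text)
```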

Lessons learned:

  • You can use powerful LLMs to enhance your applications.

  • Integrating modern LLMs is straightforward.

  • No need to go all the way to the cloud; you can just use the models via an API.

  • It is cost-effective for small projects and/or developers.

https://reinout.vanrees.org/images/2025/pycon-34.jpeg

Photo explanation: random picture from Darmstadt (DE)

Pycon.de: using python to enter the world of microcontrollers - Jens Nie

2025-04-25

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

A microcontroller is a combination of CPU, memory, storage and IO on a single cheap chip. He showed: ESP32, STM32, Raspberry Pi RP2350. The lines can be a bit blurred: there are Raspberry Pi variants that are almost a normal computer, for instance.

Computer: multi-user, multi-tasking, always ready to do something. Microcontroller: single task. Can sleep when it doesn’t have to do the task. So: low power consumption, perhaps even suitable for battery operation.

Years and years ago, microcontroller resources were measured in kilobytes. Now it is megabytes or even gigabytes: from 4 kB to 4 GB.

For microcontrollers you have MicroPython. Very simple to learn and it feels much the same as regular python. Libraries like “requests” and “numpy” are available.
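As a taste of how familiar it feels, the classic MicroPython blink loop (a sketch; the "LED" pin name assumes a Raspberry Pi Pico, other boards use a pin number):

```python
from machine import Pin  # MicroPython's hardware access module
import time

led = Pin("LED", Pin.OUT)  # on-board LED on a Raspberry Pi Pico
while True:
    led.toggle()
    time.sleep(0.5)  # sleep instead of burning CPU: low power consumption
```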

He demoed how he set up monitoring of his power consumption at home with microcontrollers.

https://reinout.vanrees.org/images/2025/pycon-33.jpeg

Photo explanation: random picture from Darmstadt (DE)

Pycon.de: FastHTML vs. Streamlit - the dashboarding face-off - Tilman Krokotsch

2025-04-25

Tags: pycon, python, django

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

Streamlit is used a lot for dashboards. Fasthtml is a new contender. He demoed both of them at the same time (well done, btw, one of the better comparisons I've seen!). I wrote down some notes (minimal sketches of both styles follow the list):

  • Streamlit runs through your python file and puts stuff on your screen (like a "h1") immediately. So you see the page being built up, which tells the user something is happening. Fasthtml only sends over the page once it is fully ready.

  • Fasthtml is based on htmx! So you can put placeholders on the screen that are filled in later with htmx. This helps put something on the screen right away.

  • There’s a fh_plotly library to create plotly graphs.

  • In fasthtml, it is fast to react to clicks and add something to the page. Only the new element is calculated, the rest stays the same. In streamlit, the whole page is calculated again, which is expensive. You can speed it up a bit with caching.

  • Adding login functionality is easier in fasthtml, especially as the layout of the page is more explicit. You can have “components” in the layout and you can swap them. So a login form component can be shown if you’re not logged in and swapped out for the actual content once you’re logged in. In streamlit, such interaction is harder as you have to manage some state variables and re-run the full script.

  • A slider below a graph that influences the size of the markers on the graph is easy in fasthtml. But in streamlit you need advanced tricks, as the graph is rendered first even though it depends on a slider value that isn't defined yet at that point in the script.

  • Multiple pages are a pain in streamlit. In fasthtml it is just another url.
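Two minimal hedged sketches of the contrasting styles (illustrative only, not his demo code):

```python
# Streamlit: the whole script re-runs top-to-bottom on every interaction.
# Run with: streamlit run app.py
import streamlit as st

st.title("Dashboard")
size = st.slider("Marker size", 1, 20, 5)
st.write(f"Markers will be drawn at size {size}")
```

```python
# FastHTML: explicit routes; htmx swaps in only the parts that change.
from fasthtml.common import *

app, rt = fast_app()

@rt("/")
def get():
    return Titled("Dashboard", P("Hello"))

serve()
```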

Streamlit gets you started fast. Often fewer lines of code. Many third-party integrations. Whole-page refresh only. Confusing control flow. And… if you learn streamlit, you only learn streamlit.

Fasthtml needs some boilerplate. More lines of code. Fewer integrations, but that’s because the project is pretty new. If you learn fasthtml, you learn regular web development. He thinks he can build pretty big projects with it.

See also the streamlit vs reflex talk from yesterday.

The code for his presentation: https://github.com/tilman151/streamlit-vs-fasthtml

https://reinout.vanrees.org/images/2025/pycon-36.jpeg

Photo explanation: random picture from Darmstadt (DE)

Pycon.de keynote: the future of AI: building the most impactful technology together - Leandro von Werra

2025-04-25

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

Can AI be built in the open? Especially LLMs. Key components are data, compute, pretraining, posttraining, scaling. Can these components be built in the open? As open source?

Many models are closed, like claude, gemini, OpenAI o1. There are models with open model weights (but a black box otherwise): LLaMA, deepseek, Mistralai. And you have fully open models: granite, bloom, olmo, starcoder2.

Why would we want to do it? Doing it in the open?

  • Transparency on pretraining. What data was used? How was my data used? How was the data filtered? This addresses biases, attribution and trust.

  • Transparency on alignment. Models are aligned for safety/values. Which values are in the model? Are values global? In closed models, only a few people define how the model behaves, yet that has lots of influence, potentially world-wide.

Deepseek-R1 is not open source, but at least the weights are open. This helped shift the discussion a lot, as previously all the big ones were closed api-only models. At hugging face they try to make a fully open version, open-R1.

Open is closing the gap. When GPT-4 came out, it took a year for open models to catch up. At the moment it is more like just two months.

  • In Europe, there are multiple public data centers with 10,000+ GPUs.

  • Collaborative training: BLOOM.

  • Building the know-how together? For that you need large, high quality datasets for pre/posttraining. Good training recipes. Learning to scale to 1000s of GPUs.

  • At hugging face, they try to build open datasets, also multilingual. See a “fineweb” blog post. And also small models. And some tools to build LLMs.

Some trends:

  • Scaling will continue. Bigger is simply better. But… it is exponential: it gets harder and harder. More compute, more power consumption.

  • A new frontier: scaling test-time compute. This could really help improve accuracy.

  • Reasoning, as done by deepseek, is interesting.

  • Challenge: agents. Agency requires multi-step reasoning, which is harder.

  • AI in science.

What can you do?

  • Build high-quality datasets.

  • Fine-tune task specific models.

  • Work on open source tooling.

  • Join an open collaboration.

It is still early days for AI, there's a lot of stuff you can do. Don't think that everything is already figured out and built.

https://reinout.vanrees.org/images/2025/pycon-31.jpeg

Photo explanation: random picture from Darmstadt (DE)

Pycon.de: a11y need is love (but accessible docs help too) - Smera Goel

2025-04-24

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

a11y = accessibility (an "a", then 11 letters, then a "y").

Documentation is an important part of every project. But what does it mean to write documentation for everyone?

You can make a "Maslow pyramid" for documentation. Accurate content and install instructions are the basic necessity. For many writers, accessibility sits somewhere at the top: a nice cherry on top for when you get around to it.

But for some people it is a basic necessity. Accessibility means removing barriers. Making sure people can use what you build.

And: with accessibility you often think of blind people or someone missing an arm. But if you solve a problem for someone missing an arm, you also solve it for someone who broke their arm, or for someone holding a baby.

Common accessibility problems in docs:

  • Low contrast text.

  • Poor heading structure.

  • Unlabeled buttons/links.

  • No visible focus indicators.

Every one of those problems adds some friction for everyone. And… docs are often read when there's pressure to fix something, so any friction is bad.

Now, how do you determine if your docs are accessible? An audit can help. It can be manual or automated or a mix. There are plenty of free tools: microsoft accessibility insights for web is one of the many tools (and the one she will use). The gold standard of testing, though, is to do real user testing. The best insights come from disabled users.

As a test she looked at the pydata sphinx theme. When they started improving it, they converted the output of microsoft's accessibility test into issues. For such issues, be specific: include the page, the element and what's failing. Reference an accessibility standard if possible, and add a screenshot or a short screencast.

Common problem: brand colors. Your project or company has a specific color already and if you use that color directly, it often isn’t accessible. There just is not enough contrast. The solution can be to take the color you want and to create lighter and darker versions of it.

Hover states problems are also common. Often there’s a slight change in color to signify that you’ve clicked something, but that’s often not clear for colorblind people. Use borders, underlines, etc.

They’ve got a “figma” explanation if you use figma.

She then did a quick investigation of the pycon.de website with microsoft's accessibility tool. Most of the clearest problems were due to color contrast: white text on light gray, white text on light green, white text on yellow, etc. White on gray had a contrast ratio of 1.6:1, where 3:1 is recommended.
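For reference, that ratio comes from the WCAG formula (L1 + 0.05) / (L2 + 0.05) over the relative luminances of the two colors. A rough sketch (the gray value is an illustrative guess, not taken from the site):

```python
# WCAG 2.x contrast ratio: (L1 + 0.05) / (L2 + 0.05),
# where L1/L2 are the relative luminances of the lighter/darker color.
def _channel(value: int) -> float:
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# White on a light gray: roughly 1.6:1, well below the recommended 3:1.
print(contrast((255, 255, 255), (205, 205, 205)))
```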

What can you do yourself?

  • Run a quick audit on your docs.

  • Review focus styles and heading structure.

  • Share or reuse the pydata design system.

  • Encourage accessibility discussions in your project.

  • Open a bug in your project to encourage collecting accessibility issues.

https://reinout.vanrees.org/images/2025/pycon-29.jpeg

Photo explanation: picture from our 2024 vacation around Kassel (DE)

Pycon.de: boosted application performance with redis and client-side caching - David Maier

2025-04-24

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

Full title: cache me if you can: boosted application performance with redis and client-side caching.

Redis can be used:

  • As an in-memory database.

  • As a cache.

  • For data streaming.

  • As a message broker.

  • Even some vector database functionality.

Redis develops client libraries, for instance redis-py for python. Though you probably use redis through some web framework integration or so.

Why cache your data? Well, performance, scalability, speed. Instead of scaling your database, for instance, you can also put some caching in front of it (and scale the cache instead).

Caching patterns built into redis:

  • "Look-aside": the app reads from the cache and, on a miss, reads from the actual data source instead (a minimal sketch follows this list).

  • “Change data capture”: the app reads from the cache and writes to the data source. Upon a change, the data source writes to the cache.
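A hedged look-aside sketch with redis-py (fetch_user_from_db is a hypothetical data-source call):

```python
import json
import redis

r = redis.Redis()

def get_user(user_id: int) -> dict:
    # Look-aside: try the cache first, fall back to the real data source.
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    user = fetch_user_from_db(user_id)  # hypothetical data-source call
    r.set(f"user:{user_id}", json.dumps(user), ex=300)  # expire after 5 minutes
    return user
```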

Redis has the regular cache features like expiration (time to live, expiring keys, etc), eviction (explicitly removing known stale items) and LRU/LFU (least recently used, least frequently used). Redis behaves like a key/value store.

Why client-side caching? Again performance and scalability. Some items/keys in redis are accessed much more often than others: "hot keys". Caching those locally on the client improves performance further.

Redis has a feature called client tracking. The client’s cache is connected to the “main” one: the main one can invalidate keys on the client side.

Now to the redis-py client library. Some of its responsibilities are endpoint discovery, topology handling, authentication, etc. And, since recently, handling a local cache that is connected to the main cache.
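A hedged sketch of enabling that local cache (assuming a recent redis-py release that ships CacheConfig; RESP3 is required so the server can push invalidation messages):

```python
import redis
from redis.cache import CacheConfig  # assumes a recent redis-py release

# protocol=3 (RESP3) lets the server push key invalidations to the client.
r = redis.Redis(protocol=3, cache_config=CacheConfig())

r.set("hot:key", "value")
r.get("hot:key")  # first read goes to the server...
r.get("hot:key")  # ...repeat reads can be served from the local cache
```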

https://reinout.vanrees.org/images/2025/pycon-28.jpeg

Photo explanation: picture from our 2024 vacation around Kassel (DE)

Pycon.de: reinventing streamlit - Malte Klemm

2025-04-24

Tags: pycon, python, django

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

He asked everyone who has used streamlit to stand up (75% stood). Then everyone who thought their dashboards were getting much too complex could sit down again. Only a few were left standing.

In 2015 he started out with a streamlit precursor based on bokeh. Around 2020 streamlit came around and it quickly gained a lot of popularity. He still uses streamlit and likes it. It is simple and easy.

But… people ask for more functionality. Or they add multiple pages. Then it slowly starts to break down: it doesn't fit the streamlit paradigm anymore. You can use @st.cache_data to speed things up a bit if you do an expensive calculation. @st.fragment limits the execution scope: changes within a fragment only trigger the fragment to re-run. After a while, though, the cache_data and fragment decorators are only a band-aid on a bigger problem. It breaks down.
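Roughly what those two decorators look like in use (illustrative, not his code):

```python
import streamlit as st

@st.cache_data  # memoize an expensive, pure computation across reruns
def load_data(path: str) -> list[int]:
    return [len(line) for line in open(path)]  # stand-in for expensive work

@st.fragment  # widget changes inside re-run only this fragment, not the script
def marker_controls():
    size = st.slider("Marker size", 1, 20, 5)
    st.write(f"size: {size}")

marker_controls()
```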

He recently discovered https://reflex.dev/ . An open source framework to quickly build and deploy web apps. It is a pure python framework, so you don't have to write typescript. But the big difference with streamlit is that the code is explicitly divided into frontend and backend.

  • The UI is a result of a function.

  • Changes are handled via events.

  • State variables are accessible by the frontend. And they are session-scoped (so every user gets their own state). Note: you can have more than one state, nesting them a bit, as they otherwise could get too big and unwieldy.

You have a State class/object with @rx.event-decorated methods that implement the events.
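A minimal hedged counter sketch in reflex (illustrative; names follow reflex's documented pattern):

```python
import reflex as rx

class State(rx.State):
    count: int = 0  # session-scoped: every user gets their own value

    @rx.event
    def increment(self):
        self.count += 1

def index() -> rx.Component:
    # The UI is the result of a function; widget events trigger State methods.
    return rx.vstack(
        rx.heading(State.count),
        rx.button("Increment", on_click=State.increment),
    )

app = rx.App()
app.add_page(index)
```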

  • Input data is transformed to output data.

  • Input data can be changed by a widget.

  • Output data can end up in a widget.

  • Widget changes can trigger transformations.

You start with python code. A compiler/transpiler turns it into a next.js frontend and a fastapi backend. Events are passed back and forth via an api between the frontend and backend. The backend gives state updates to the frontend via a websocket.

Because it is compiled once before starting the dashboard, it is less dynamic/on-the-fly than streamlit: you have to use rx.foreach(...) to loop over values.

  • If you know django+react or fastapi+react: use that if you know how to do it.

  • If you want simple dashboards: streamlit, dash, gradio.

  • The newcomers that aim at the middle ground because they are more explicit: reflex, rio, fasthtml. There’s a fasthtml-vs-streamlit talk tomorrow!

Some closing comments on why you might not want to use reflex:

  • Runtime errors in javascript are hard to debug.

  • You need to have a mental model of frontend vs backend.

  • The framework is moving fast (which is also good).

  • Performance and wide adoption: that’s still to be seen, it is still early days.

Something really useful that’s not in the documentation: AppHarness, which allows you to test the backend.

He tried to create a wrapper for streamlit in reflex, mostly to make it easier to move your existing dashboards slowly over to reflex. It is called relit, but it doesn’t work completely yet (and he hasn’t released it publicly yet, just ask him). And it was pretty hard to get to work :-) He thinks it might be used for writing tests for streamlit dashboards.

https://reinout.vanrees.org/images/2025/pycon-24.jpeg

Photo explanation: picture from our 2024 vacation around Kassel (DE)

Pycon.de: serverless orchestration: exploring the future of workflow automation - Tim Bossenmaier

2025-04-24

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

What is orchestration? Coordinated execution of multiple computer systems, applications or services. It is more than automation. Some things you can think of:

  • Containers/Docker can be managed.

  • Coordinating multiple workflows/tasks.

  • Synchronizing/managing two or more apps.

  • Coordinating microservices, data services, networks, etc.

You can run code on-prem: a physical server in your cellar or in a data center. You can also rent servers from a cloud provider. Another level up is serverless: you only pay for the specific compute resources you actually use. AWS Lambda, which popularized the serverless paradigm, is an example of serverless functions.
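For flavor, a minimal Python Lambda handler (a sketch; the body is illustrative):

```python
def handler(event, context):
    # `event` carries the trigger payload (an HTTP request, a queue message, ...);
    # whatever you return goes back to the invoker or the next workflow step.
    return {"statusCode": 200, "body": "ok"}
```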

Why would you combine them?

  • Resilience: no orchestration tool to keep running.

  • Cost efficiency: you only pay for what you use.

  • Scalability: automatically handled.

Some options: AWS Step Functions, Azure Logic Apps, Azure Durable Functions, Google's GCP Workflows. A drawback of all of them is that they take a no-code/low-code approach, letting you click/drag/drop your workflows in the browser. The workflow is stored as json, so as a proper software developer you are limited to uploading that json with terraform or so.

There are also open source solutions, Argo Workflows for instance. Drawback of those: you again have to worry about infrastructure. But if your company already has something set up, it might be a handy option.

His conclusions:

  • Operations: you can orchestrate workflows with minimal OPS overhead.

  • Cost: you can build solutions at low cost.

  • Perfect for event-driven workflows.

  • The future? Probably not (vendor lock-in, mostly). But it is a great extension of your developer-toolbox.

His slides are available on github.

https://reinout.vanrees.org/images/2025/pycon-23.jpeg

Photo explanation: picture from our 2024 vacation around Kassel (DE)

Pycon.de: design, generate, deploy: contract-first with fastapi - Evelyne Groen & Kateryna Budzyak

2025-04-24

Tags: pycon, python

(One of my summaries of the 2025 pycon.de conference in Darmstadt, DE).

They work for a huge marketplace for freelancers. So there’s a matching service, matching project descriptions to freelancers. In between there’s a REST api. Works fine. But… of course there are changes to the api. How do you manage that? One approach is an api contract.

But first the code-first approach: a regular fastapi @app.post("/match/{project_id}")-like decorator and a function with some parameters. You can already improve it by using pydantic for more validation of the incoming parameters: define a "model" in pydantic and hook it up to the endpoint.

Fastapi uses the pydantic model when generating the openapi specification. So everything is nicely explicitly specified. See also https://gitlab.com/maltcommunity/public/pycon25 for an example.
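A hedged sketch of that code-first style (endpoint and field names are illustrative, not from their codebase):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class MatchRequest(BaseModel):
    description: str        # project description to match on
    skills: list[str] = []  # required freelancer skills

@app.post("/match/{project_id}")
def match(project_id: int, request: MatchRequest):
    # FastAPI validates the body against the pydantic model and includes
    # the model in the generated OpenAPI specification.
    return {"project_id": project_id, "matches": []}
```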

A drawback: the implementation of changes is not formalised.

They started using a contract-first approach (or: design-first approach): you start with an openapi specification and generate the fastapi code, so the other way around. You can find the openapi generator here: https://openapi-generator.tech/ . It works based on mustache templates and you can specify your own templates. Of course it only generates the scaffolding; you still have to add your own actual implementation.

This is what they use on the backend in fastapi. Nice: the frontend team also uses the same openapi spec to generate the data models for their client side javascript! This way they are always in sync.

  • The openapi spec is the base definition. This created clarity between the teams.

  • The spec is in git and the automatically generated files are also in git, created with CI/CD pipelines.

  • The automation works because you can create your own templates to take care of necessary adjustments.

https://reinout.vanrees.org/images/2025/pycon-22.jpeg

Photo explanation: picture from our 2024 vacation around Kassel (DE)
