Note: Łukasz Langa is the author of the wonderful black code formatter.
Note 2: I made a summary of a different pattern matching talk last month.
Łukasz started making a small game to learn about python 3.10's new pattern matching functionality. Actually programming something that you want to finish helps you really delve into new functionality: you won't cut corners.
One of the things he automated in the past was a system to manage his notes, for instance to export notes marked "public" to his weblog. His notes are all in git. Lots of notes. Some advice unrelated to the rest of the talk:
Own your data.
Automate with python.
He showed the source code for his simple game. One of the methods was 15 lines of an if/elif with some more nested if/else statements, isinstance(...) calls and so on. He then showed the same code with the new pattern matching of python 3.10: matching on types, matching on attribute values.
match/case statements may seem very weird now in the way they are implemented, but he thinks they can become pretty useful. You won't use them a lot, normally. But in some cases they'll make your code more neat and clear.
He is the CTO of a big Ukrainian fashion marketplace: 10-20k orders per day. So the talk is about them surviving load spikes and the like.
In 2016 they had a clojure/clojurescript/react single page app. They saw 30% more requests per second, which caused 3x the processor load. Bad news… One of the things he used was clojure.cache, and he picked the fast memory cache option. After finally reading the documentation, he discovered it was the cause of their problem. A cache call would fail, which would end up in a retry loop, in effect causing an almost infinite loop. Oh, and his son was only two weeks old and he was sleep-deprived. He managed to replace clojure.cache with memcached, which solved the problem.
Halloween 2017. Wife in hospital. They started losing TCP packets… The
main.js was barely loading which is bad in a single page web application
:-) The processor load on the load balancers just kept increasing. One of the
problems was that the marketing department had recently added a fourth level to the menu structure of the website, which resulted in a 3MB json file with the full menu. To compensate a bit, they increased the gzip level to "9", which made the file a little bit smaller. But that also meant a huge increase in the load on the (bad) load balancer that had to do the compressing. Putting it back at "5" solved the issue…
A regular non-busy day in July. His son was in hospital after a vaccine shot, and he was there too. What can go wrong? Well, the site can go down in the night due to a DDOS attack. They solved it by doing a quick if/else on the attacker's user agent string in the UI code…
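A quick user-agent filter like the one they describe could look like this. The fragment to match on is of course invented here; the real one depended on the attacker:

```python
def looks_like_attacker(user_agent, bad_fragments=("EvilBot/1.0",)):
    # Hypothetical emergency filter: reject requests whose user agent
    # string contains a known-bad fragment. Crude, but fast to deploy
    # in the middle of the night.
    return any(fragment in user_agent for fragment in bad_fragments)
```

Not a real defense against a determined attacker, but good enough to survive a night while the attack is unsophisticated.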
2018, they did a pre-shopping-season load test. It turned out their database
was hit quite hard. So they used
pg_stat_statements to check all their
queries. The table with products was the one being hit hard. Which was
strange, because they cached it really well. Only… the cache wasn’t
working. They missed a key in their cache setting…
Black friday 2018 went without a hitch.
16 november 2020. Black friday just around the corner. But after a new release, the app suddenly starts eating CPU like crazy. Deploying the old release helped. But… the changes between the old and new version didn't look suspicious. What to do? They took a profiler and started looking at the performance. It turned out some date parsing function was mostly to blame. Parsing dates? Yes, they had just started a marketing promotion with a credit card provider, with an offer limited to a specific date. So they added the date to the config file. And there was some tooltip showing the date. And there was some library they used that tried some 20 date formats every time… The solution? Parse the config'ed date once upon application startup…
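The fix can be sketched in a few lines: parse the date string a single time when the config is loaded, so the request-handling code only ever sees a ready-made date object. The key name and the date format below are made up for illustration:

```python
from datetime import datetime


def load_config(raw):
    # Parse the promotion end date once, at startup, instead of
    # re-parsing (and format-guessing) on every request.
    # "promo_end" and the ISO format are hypothetical examples.
    config = dict(raw)
    config["promo_end"] = datetime.strptime(raw["promo_end"], "%Y-%m-%d").date()
    return config
```

After this, the tooltip code just formats the stored date object instead of calling the 20-formats-guessing library on every request.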
Later they had a problem talking to the database. JVM problem? Can it be the network? Postgres driver problem? PGbouncer? Postgres itself? No. Everything seemed to be working well, only it didn't work. 20 hours later they stopped everything and started manually executing SQL select statements in desperation… and many of them stayed stuck without an error message??? In the end it was one corrupted file in postgres. So even the super-reliable postgres isn't always perfect.
He wants to show us some insights from the 2021 OfferZen developer survey. (The report is open data, btw, so you can download the data and do your own analysis on it.)
Background and education.
Junior/senior: after 4 years, you're no longer a junior. Management roles start to pick up after 6-10 years. Note: there are differences between countries.
Salary: 32k for a junior, 47k intermediate, 60k senior, 73k tech lead/management. The rise in salary is pretty linear with your career progression.
Degrees: 57% computer science.
A massive amount (44%) is self-taught. 28% at school, 21% at university.
People start coding young! 13-18 years is the most prevalent.
75% do some coding for fun outside of their jobs.
Skills and learning
The most promising industry: AI and cloud computing. (It is perhaps a bit weird that they're grouped together.)
Python is the most desired language devs want to work with, followed by typescript and go.
Frequency of learning a new language: 30% every few months, 32% once a year and 33% every few years.
Devs like challenging projects (67%). New languages/frameworks is 43%.
Non-financial things devs look for when looking for a new job: 58% the language to work with. 50% opportunity to grow. 48% office environment or company culture.
26% want to stay for at least 5 years, 23% at least 2 years. The rest is looking for jobs within 2 years or sooner.
Reasons to leave: bad management (48%), a better salary (42%). Work/life balance is number three.
Reason to stay: work/life balance (62%), people you work with (55%).
You can take the survey for next year’s report here: https://bit.ly/PyGrunn
How do we know if our code works? Perhaps the requirements you were given were unclear. Perhaps there simply is an error in your code. Perhaps your software is used in the wrong way.
There are many sorts of tests. The one she focuses on is unit tests, the one we have the most influence on as programmers.
You can use a traceability matrix, where you put every individual part of the requirements ("division by zero results in an error") in columns. In the rows you mention the tests that verify the specific requirement. Lots of work, perhaps only needed for medical equipment. You're also constrained by the (in)completeness of the requirements. And it is a manual process…
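In its simplest form such a matrix is just a mapping from requirement to verifying tests; requirements without any linked test then stand out immediately. The requirement and test names here are invented:

```python
# Tiny traceability matrix sketch: each requirement maps to the
# names of the tests that verify it.
matrix = {
    "division by zero results in an error": ["test_div_by_zero"],
    "valid input returns a quotient": ["test_happy_path"],
    "empty input is rejected": [],
}

# Requirements without any linked test are immediately visible:
untested = [req for req, tests in matrix.items() if not tests]
```

The manual work is in keeping the mapping up to date, which is exactly why it is usually reserved for regulated domains.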
You can also use coverage.py, which checks how many lines of your code are covered/executed by your tests.
When you run coverage, you should configure it properly, so that it only reports your python files. You don’t want to include the standard library or an external library in your report. Also exclude your test files from the report, as they normally have 100% test coverage and thus inflate your score.
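A minimal coverage.py configuration along those lines, as a .coveragerc sketch (the package name is a placeholder):

```ini
[run]
# Only measure our own package, not the stdlib or third-party code.
source = myproject

[report]
# Keep the (near-100%-covered) test files out of the score.
omit =
    */tests/*
```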
But you should be a bit careful with the coverage report. In python, you can have statements with "and/or" or "if/else" in them, like "a = 20 if b > 4 else d/c". The line might show up as "covered" in your report even though some parts of it weren't executed. Luckily there's a branch coverage option which can mark such complex one-line statements as "only partially tested".
You can use --cov-report=html to create nicely colored html output that shows you visually which lines are covered or not.
Satellites. There are almost 3000 satellites orbiting the earth. Some of them have cameras, which are the interesting ones for him. He started Dacom (now CropX), agricultural software. Satellite imagery is interesting for farmers as you can see how well the crops are growing by analyzing the images.
He did a live demo. They have a website with all 800,000 fields in the Netherlands. For every field they have image data. They can show it both as a regular image and color-coded for the amount of greenery. And of course a nice graph throughout the year.
You can do all sorts of analysis on it. Look at the variation in crop yields within the field, for instance. You might have to use more fertilizer in the low-yield areas. But you also have to use other data sources, like an elevation map.
They started out experimentally with groenmonitor.nl in 2014. In 2015, ESA launched the “Sentinel 2a” satellite (with a twin, “2b”, in 2017). The data is free, part of the EU Copernicus project! They started using the data in 2016.
The images are huge: 800MB for a 100x100km tile. They download the useful images (the ones without too much cloud cover…) and process them, use filters, do statistics on them, etc. Lots of separate tools. They use python as the glue to tie everything together.
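The "glue" role of python can be as simple as filtering tile metadata before downloading. The dict shape and the cloud threshold below are assumptions for illustration, not their actual pipeline:

```python
def useful_images(images, max_cloud_fraction=0.2):
    # Keep only the tiles without too much cloud cover.
    # Each image is assumed to be a metadata dict with a
    # "cloud_fraction" key (a hypothetical shape).
    return [img for img in images
            if img["cloud_fraction"] <= max_cloud_fraction]
```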
Some of the processing is done by open source projects provided by ESA. They also used gdal a lot. They had to battle with performance issues; I/O overhead was one of the bigger problems. They started looking at software-as-a-service providers like sentinelhub: yes, that could work well. But they were not sure about the price they would have to pay for their huge datasets.
The EU provides the satellites and the data for free. But they still had the idea that more people could make use of it. So they recently started the "DIAS" initiative. Multiple datacenters throughout Europe with locally stored raw data and processed data. So you can host your software there without having to worry about huge data traffic bills. Nice!
They built a website with django where they store all the processed field data. So per date and per field you'd store min/max/mean/etc. values. With postgis/geodjango, of course, for easy geographical handling.
One of the core tools they use is rasterstats, which calculates the min/max/mean stats for raster images. Probably it uses gdal and numpy and so behind the scenes. These statistics are then stored in django, ready for quick retrieval in the user interface.
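What rasterstats does for them can be sketched in plain python: given a raster grid and a boolean mask marking one field's pixels, compute the summary values that get stored per field and date. This is purely illustrative, without the gdal/numpy machinery the real library uses:

```python
def field_stats(raster, mask):
    # Collect the raster values that fall inside the field's mask,
    # then summarize them: the min/max/mean that end up in django.
    values = [value
              for row, mask_row in zip(raster, mask)
              for value, inside in zip(row, mask_row) if inside]
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }
```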
Wagtail (https://wagtail.io/) is a nice CMS layer on top of django and python. Wagtail is special in that it doesn't have a built-in user-facing front end. You are invited to build your own that perfectly suits your needs; Wagtail is opinionated in that sense. It does have a nice admin interface for adding content.
Through google summer of code, an earlier idea he had on live blogging was added to wagtail. It is intended for live blogging a sports match, for instance. Lots of small messages.
It can grab input from slack or telegram. Of course, you can also use the admin interface or a REST api.
There is a bit of a workflow mechanism. You perhaps want to format the first line of an incoming message as a title.
There are multiple ways to set up the blog-viewing webpage. Interval polling, long polling, websockets. Websockets are the best option, but it takes more setup effort. Interval polling is the easiest option: just plain http.
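Interval polling really is just "fetch again every few seconds". A sketch with the fetching abstracted away (in the real viewing page this callable would do a plain HTTP GET for new messages):

```python
import time


def poll(fetch, interval=5.0, rounds=3):
    """Interval polling sketch: call fetch() every `interval` seconds.

    `fetch` is any callable returning the latest data; `rounds` is
    bounded here only so the sketch terminates.
    """
    results = []
    for _ in range(rounds):
        results.append(fetch())
        time.sleep(interval)
    return results
```

Websockets avoid this repeated round-tripping, which is why they are the better (if more complex) option.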
Get your house in order first. Before you can start training people, you need to have an onboarding strategy in place. Make sure you’ve got that beforehand. You also need your communication tools to be ready.
Oh, and documentation is important! It is good in general, but there's a specific advantage during onboarding: if you've got it, new developers get an opportunity to be more independent and to look things up for themselves.
Let us get started guiding new devs. Psychological safety is important: you need to feel safe. Safe to talk about mistakes, safe to make suggestions.
If you’re the senior: keep in mind your position of power. If you say something, a junior developer might think it is The Law.
What you don’t want: a clone of yourself! It is good for a company to have diversity.
Bring the new developer with you to meetings. And let them speak! (So try to shut up a bit yourself).
Success is a team effort - and so is failure.
New developers need feedback. Compliment in public, give negative feedback in private. Oh, and make an effort in general to give compliments. In IT, we're not very good at communication. Negative comments ("fix this", "improve this") in pull requests come naturally to us. But compliments??? We're not used to them. But they can do so much good, so try to give more of them.
Limit contact moments: don’t be a “helicopter parent” that hovers over their “kids”. And stick to 1:1 sessions.
Some nuggets of wisdom:
Treat people like they are going to be around for years. You can ask "what if we train someone and they leave?", but you can also ask "what if we don't train someone and they stay?"
Happy employees become your billboard.
His slides: http://tonz.nl/foss4g21/
There are new laws being proposed in Europe on data and digitization that will have influence on geodata. The idea: “digital compass & digital rights and principles”.
Digital markets act: if you’re really dominant, this is a law that applies to you.
Digital services act: more or less the same rules as above, but less strict for “regular” companies.
AI regulation: a set of rules that get stricter the riskier the application is. Risky? In the way it can impact us humans and critical infrastructure, for instance.
Open data directive. Open data gets broader: museums, utility companies, semi-government. And there will be more mandatory open data that the government has to make available.
Data act. A bit more transparency.
Data governance act: this is an important one. Lots of data that could be open cannot be open because it is too sensitive. But it is valuable. Such data might have to be made available anyway, but in a manner that's privacy-friendly.
Exchanging data: always hard. They propose "data spaces": specific sectors within which the exchange has to work well. Between sectors, it doesn't necessarily have to work completely seamlessly.
The green deal already assumes the EU data strategy as a given. So lots of data that impacts the green deal will need to conform earlier. Like INSPIRE open geodata! Digital twins are also a component here.
What can FOSS4G help with? Federation&interoperability, standards, open tools, cross-border cooperation. We’re quite good at that and we can help shape the new standards.
We’re also involved in many of the green deal-related data spaces: agri, geo, mobility, environment.
Look at this (Dutch) wiki: https://geonovum.github.io/eu_regelingen_datastrategie/ . It can for instance help in figuring out which subsidies are available, and in collaborating on data spaces and collecting use cases.
As a hobby, she’s a photographer. And she likes roadtrips through Europe. Last year she made a travel photo book.
She wanted to have maps in her photo book, having a geo background. She did have to convince the book company that maps belonged in an art-like photo book. “Yes, I know how to do it and they will be beautiful.”
But open data also helped with planning the trips. You need to find nice places. And how about the sunlight? When do you have to be in a certain place? Does a road run in the valley or does it run higher up the hill so that you can make a beautiful photo? What’s the road surface like (and what is the influence on the estimated speed/planning)?
OpenTopoMap: a very nice map rendering for looking at the topography. You get a good impression of the landscape. Are there mountains? It is interesting at many zoom levels.
Google maps with the satellite photos (not open data, though, but useful for finding some items).
Openstreetmap + qgis was what she used for creating the maps for the book.
His talk is about their (webmapper) experiences with vector tiles.
They started with map-based storytelling. You highlight individual locations, you add textual explanation and perhaps graphs. When moving and rotating between the locations, it can be very nice to have a "2.5D" view. For instance they made a map that showed the buildings in the Netherlands, with the correct height, color-coded by height category.
The advantage of vector tiles here is the speed and ease of animation. And that you can “stretch” the
Another nice example is https://kaart.75jaarvrij.nl , an interactive map of the liberation of the Netherlands in 1944/1945. A date picker allows you to go through time. Military unit locations, food drops, liberated areas: everything is done vector-based. A big advantage is that you only have to download small amounts of data. Interactiveness
A new project is https://cartiqo.nl, a vector map with various Dutch open datasets. They host it via https://www.maptiler.com/, a Swiss company. Vector tiles give them lots of flexibility with changing the visualisation of the map: color-coding according to different criteria, for instance. If you select a different parameter to highlight on, you just change the visualisation of the already-downloaded geo data instead of downloading the same data as raster images again.
So: speed, flexibility, ease of use!
He works at the RIVM, well-known in the Netherlands for their recent health-related work. But they do a lot more, for instance air quality and environmental noise. Noise can have negative effects on your health: road noise, wind turbines, heavy trains, planes, industrial noise.
Noise: they look at it in three ways: source, transmission and recipient. For recipients there’s the public building data (houses, hospitals). For sources, there are also good datasets: the public road network database, railway maps, etc.
Roads and railways are linear. You also need to look at the kind of traffic, the intensities, the kind of paving material or kind of railway sleepers, etc.
Transmission of the noise: there is a lot to take into account. Are there other buildings in the neighbourhood that can reflect noise? What's the ground use like? A concrete-paved lot transmits sound pretty well; grassland dampens it a bit. Are there woods in-between? How does the height map look? Is there line-of-sight between the source and the recipient?
Lots of calculations. And you need to do them multiple times, as they take into account various frequency bands. Low frequencies behave differently from high frequencies in how far they travel and how they deflect.
A major end result is contour maps with noise levels for the whole of the Netherlands. For a while now, there's even a 3D model.
The Dutch province of Zeeland has a lot of use for 3D data. Everything that’s inside the ground (cables and so). The border between
They have an IoT (internet of things) infrastructure based on TTN (The Things Network). One of the IoT users are “multiflexmeters”: open source arduino-based groundwater level meters.
They try to use as much open source as possible. Not everything is possible yet, sadly. But when they can switch, they will.
Digital twin: you try to have as good a digital representation of reality as possible. So that, for instance, when a piece of equipment tells you it needs maintenance, you don't need to go and check: you can simply trust it.
Digital twin: the way the dunes and the beach change through time. If you present it in 3D, including visualizing differences between the stages, it starts to "live" for you. Interaction is easier and it is easier to work with.
They use “cesium terrain builder + cesium terrain server” to convert 2.5D data to 3D. The cesium stuff is hard to install, but with docker it gets easier (there’ll be a talk about it later). This way they’re able to visualize 3D in a reasonably performant manner.
Amélie works at the UNESCO International Institute for Educational Planning. They explicitly use open source geo. The official title of her presentation: Trust and transparency to address global challenges.
Educational planning? Planning schools, determining how many teachers you need, etc. So education management and policy on various levels. All within the Unesco organisation. There is a lot you can do nowadays with GIS. Nice maps of travel times to school. Distribution of children of school-going age.
A big challenge was how to train people to use all these possibilities. Buying some commercial package and training one person at an education ministry doesn't cut it. The person might leave in a few years. A better way is to use open source software and open data. You can train more people, and everyone can in turn train and introduce their colleagues.
Self-help, mutual help, going open: it means taking risks. Letting go. Losing a bit of control. Taking yourself out of the normal relationships you have. Foss4g in a sense is an act of bravery as it puts them outside the way everything works normally.
As government, you need transparency, as you need trust. Will people trust your reports? Trust your data? If you can show in detail how you got the data or what your conclusions are based on: that helps in gaining trust.
Really going open means more than just installing qgis. It also means requiring open access literature so that people can also read the research you base your data on, for instance.
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.