He searched for python meetups in Rotterdam, but didn’t find any. So he clicked on the “do you want to start one?” button.
Today’s meetup is at Van Oord (a big family-owned marine contractor), but anywhere else in Rotterdam is fine. And you can help out. Everything is fine, from talks to workshops, as long as it is python-related.
The goal is to have a few meetups per year.
Joost is a data engineer. He uses python daily, mostly jupyter notebooks. Jupyter notebooks are ideal for getting to know python, btw.
Jupyter lab is the next iteration of notebooks. It is in beta, but usable.
Normally, you still work in notebooks, but you can also start a regular python prompt or a terminal. It starts to look like an IDE: file browser, generic editor, etc. And very nice for data analysis, for instance with a fast CSV browser.
He demoed the markdown integration: a source file with a (live-updating) output window side by side, and a terminal window at the bottom.
Very fancy: there is an extension that works with google docs. So you can collaboratively work on the same document (via google docs). And everything just works. He demoed it with a colleague: fancy!
Erna gave a nice presentation on geospatial data, using trash containers in Rotterdam as an example. She uses geopandas, a spatial add-on to pandas (the library for tabular data and time series).
With geopandas, it is easy to read geospatial datasets, like all the administrative areas in the Netherlands. You can display them directly in the jupyter notebook (or rather jupyter lab, as she’s now using that on a day-to-day basis).
Nice features: you can easily give all areas a random color so that the areas are better recognizable. But of course you can also do geo calculations like “how big is the area”.
The trash container info was in an excel file. Geopandas could read that one without problems, too. The coordinates were regular numbers in two separate columns. Of course python can convert that to proper coordinates. Also a transformation from one projection to another wasn’t a problem.
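To give an idea of what that conversion looks like, here is a minimal sketch, not her actual code. The small table stands in for the excel sheet, the column names are made up, and I'm assuming the numbers are in the Dutch RD New projection (EPSG:28992), which the talk summary doesn't confirm.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical stand-in for the excel sheet (which pd.read_excel could load).
# Column names and coordinate values are invented for illustration.
containers = pd.DataFrame(
    {
        "container_id": [1, 2],
        "x": [92500.0, 93100.0],
        "y": [437500.0, 436900.0],
    }
)

# Turn the two number columns into point geometries.
# Assumption: the coordinates are in RD New (EPSG:28992).
points = gpd.GeoDataFrame(
    containers,
    geometry=gpd.points_from_xy(containers["x"], containers["y"]),
    crs="EPSG:28992",
)

# Reproject to longitude/latitude (WGS 84).
points_wgs84 = points.to_crs("EPSG:4326")
print(points_wgs84.geometry.iloc[0])
```

After the `to_crs()` call, the points can be combined with other datasets that use longitude/latitude.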
In the end, she could color-code the areas depending on the number of trash containers (relative to the number of people living in the area).
Bas works on Python adaptive, “a tool for adaptive and parallel evaluation of functions”. He’s a PhD student who works on something fancy and quantum-computer related. “I use half a year of CPU time per day for my research”.
With such a huge amount of calculations, it makes sense to optimize them. “Adaptive” is a strategy he uses: figure out where more calculation is useful by sampling a function. Sampling means calculating a couple of points, trying to detect where the biggest changes happen and concentrating on those areas.
That’s what he wrote python-adaptive for: a library that handles this in a generic way.
It works based on a “Learner” object that takes the function to learn and the bounds. With three methods, you can get it to learn and improve and add new points and re-evaluate etc etc.
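The sampling idea itself can be sketched in a few lines of plain python. This is my own illustration and not python-adaptive's code or API: the function names are made up, and the "loss" of an interval is simply assumed to be the straight-line distance between its two end points.

```python
import math


def sample_adaptively(func, bounds, n_points):
    """Evaluate func at n_points x values, adding points where it changes most."""
    lower, upper = bounds
    xs = [lower, upper]
    ys = [func(lower), func(upper)]
    while len(xs) < n_points:
        # Assumed loss per interval: distance between neighbouring points.
        losses = [
            math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i])
            for i in range(len(xs) - 1)
        ]
        # Split the interval with the largest loss in half.
        worst = losses.index(max(losses))
        x_new = (xs[worst] + xs[worst + 1]) / 2
        xs.insert(worst + 1, x_new)
        ys.insert(worst + 1, func(x_new))
    return xs, ys


def steep_step(x):
    """Example function: flat almost everywhere, with a sharp step at x = 0."""
    return math.tanh(50 * x)


xs, ys = sample_adaptively(steep_step, (-1.0, 1.0), 41)
near_step = sum(1 for x in xs if abs(x) < 0.1)
print(f"{near_step} of {len(xs)} points lie within 0.1 of the step")
```

An evenly spaced grid of 41 points would put only 3 of them that close to the step, so most of the effort would go into the flat parts.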
He showed some sample code. A problem with it was that, while running, it blocks the CPU. And it only uses one thread. For that, there’s a separate runner that can run the calculation on multiple computer cores at the same time.
He also demoed it with a 2D figure: great. The difference with a non-adaptive (homogeneous) figure was striking.
How does the multiprocessor stuff work? Simply import ProcessPoolExecutor from the standard library (it lives in concurrent.futures). It even works on the university’s supercomputer. Standard python!
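For reference, a minimal sketch of spreading function evaluations over multiple cores with that executor. The helper name `evaluate_points` and its parameters are my own invention for illustration; this is not how python-adaptive's runner is implemented.

```python
from concurrent.futures import ProcessPoolExecutor


def evaluate_points(func, points, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Evaluate func at every point with a pool of workers.

    Results come back in the same order as the input points.
    """
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, points))
```

With the default process pool, `func` has to be picklable (so a module-level function, not a lambda), and the call should sit under an `if __name__ == "__main__":` guard when run as a script. The `executor_cls` parameter is only there so that another executor with the same interface can be swapped in.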
They do lots of calculations on ship movements. How does a ship move in waves? And how does it move when there’s a crane with a heavy load on the deck?
He showed a short movie about an offshore windmill farm being built. Those ships and cranes and wind turbines are absolutely huge. Not exactly the same movie, but this one is pretty similar. It is the same vessel he used in his talk.
When lifting the big yellow pipes, there are various variables you have to take into account. Such a pipe mustn’t descend too fast. And it mustn’t move too fast in the horizontal plane. And the stresses on the cables keeping the pipe in place shouldn’t get too big.
With jupyter notebook, he showed the effect of some alternative rigging configurations they were researching: with a few extra lines, they could improve the resistance to interference quite effectively. The actual calculation happens in a specific program, but they can interface with it with python.
With python they also made graphs (based on weather forecasts) for the installation vessels with info on whether they can actually do the work, based on the time of day and the heading of the vessel. Such a simple graph can then easily be used on the vessel.
He used run-length encoding as an example. You take a sequence (‘a a a c b b’) and transform it to ‘3xa, 1xc, 2xb’. Often you can get quite a good compression out of this with the right kind of data.
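As an illustration of the idea, a small pure python sketch (my own, not the code from the talk). It returns the runs as (count, value) pairs instead of the ‘3xa’ text notation.

```python
def run_length_encode(sequence):
    """Return a list of (count, value) pairs, one per run of equal values."""
    runs = []
    for value in sequence:
        if runs and runs[-1][1] == value:
            # Same value as the current run: make the run one longer.
            runs[-1][0] += 1
        else:
            # Different value: start a new run.
            runs.append([1, value])
    return [(count, value) for count, value in runs]


def run_length_decode(pairs):
    """Expand (count, value) pairs back into the original list of values."""
    return [value for count, value in pairs for _ in range(count)]


print(run_length_encode("aaacbb"))  # [(3, 'a'), (1, 'c'), (2, 'b')]
```

Decoding the pairs again gives back the original values, so nothing gets lost in the compression.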
He showed a couple of implementations, starting with a pure python one. Next came a numpy implementation. Drawback: the numpy code is unreadable. Advantage: it is (in his example) 34 times faster than the pure python code.
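To show what a vectorized version can look like, here is one common way to do it with numpy. It is a sketch of my own and not necessarily the implementation from the talk, so the speed figures above don't apply to it.

```python
import numpy as np


def run_length_encode_numpy(values):
    """Return (run_values, run_lengths) for a one-dimensional sequence."""
    values = np.asarray(values)
    if values.size == 0:
        return values, np.array([], dtype=int)
    # Indices where an element differs from its predecessor: a new run starts there.
    changes = np.flatnonzero(values[1:] != values[:-1]) + 1
    starts = np.concatenate(([0], changes))
    # A run's length is the distance to the next start (or to the end of the data).
    lengths = np.diff(np.concatenate((starts, [values.size])))
    return values[starts], lengths


run_values, run_lengths = run_length_encode_numpy([7, 7, 7, 2, 5, 5])
print(run_values, run_lengths)
```

No python-level loop is left: the comparison of the array with a shifted copy of itself does all the work, which is also why it takes a moment to see what is going on.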
Next: compile the pure python code with cython. It was a 50% improvement over regular python. But when you specify data types, cython can do a much better job: it even beats numpy by a factor of 2.
Dirty, dirty, dirty: you can tell cython to switch off lots of python safety features (for memory allocation and so on). Another factor of 8. In the end: 18x faster than numpy and 628x faster than the original python code.
Another go: numba, just-in-time compilation via a decorator. Twice as fast as numpy. Almost as fast as most of the cython efforts, but with only a simple decorator.
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.