Reinout van Rees’ weblog

Utrecht (NL) python meetup september 2018


Tags: python

Data processing using parser combinators - Werner de Groot

He collaborated with data scientists from Wageningen University. The scientists did lots of cool programming stuff. But they did not use version control, so they introduced git :-) They also had lots and lots of data, so they introduced Apache Spark.

Their data sets were in ascii files, which are huge. So the ascii files need to be parsed. He showed an example of a file with DNA data. Ouch, it turns out to be pretty complex because there are quite a few exceptions. Fields (in this example) are separated by semicolons. But some of the values also contain semicolons (quoted, though). So the generic python code they used to parse their DNA data was littered with “if” statements. Unmaintainable.

You probably heard that hard problems need to be split up into smaller problems. Yes, that’s true. But the smaller parts also need to be combined again. So: smaller parsers + a way of combining them again.

A parser takes an input and returns the part that matched and the part that remained.
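A minimal sketch of that idea in plain Python (this is not parsy's actual API; the names `literal` and `digits` and the sample input are made up for illustration). Each parser returns the matched part plus the remainder, or `None` when it does not match:

```python
def literal(expected):
    """Build a parser that matches one fixed piece of text."""
    def parse(text):
        if text.startswith(expected):
            # Return the matched part and whatever is left over.
            return expected, text[len(expected):]
        return None  # no match
    return parse


def digits(text):
    """Parser that matches one or more leading digits."""
    end = 0
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == 0:
        return None  # no digits at the start of the input
    return text[:end], text[end:]


print(literal("chr")("chr12;A"))  # ('chr', '12;A')
print(digits("12;A"))             # ('12', ';A')
print(digits(";A"))               # None
```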

He showed parsy as a library for “parser combinators”. Many languages have such libraries. He demonstrated how to combine those input/match/remainder parsers into a kind of pipeline/sequence. Such a sequence of parsers can be treated as a parser in its own right. This makes designing and nesting them easy.

When combining parsers, you of course need to handle variants: an “or” operator handles that.
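The sequence and “or” ideas can be sketched in plain Python as follows. This is an illustration under assumptions, not parsy itself: the helpers `regex`, `seq` and `alt` are written out here by hand, and the field layout (a quoted or unquoted value, a semicolon, another value) is a made-up miniature of the semicolon-separated file from the talk.

```python
import re


def regex(pattern):
    """Parser that matches a regular expression at the start of the input."""
    compiled = re.compile(pattern)

    def parse(text):
        match = compiled.match(text)
        if match is None:
            return None
        return match.group(0), text[match.end():]
    return parse


def seq(*parsers):
    """Run parsers one after another; succeed only if all of them match."""
    def parse(text):
        results = []
        for parser in parsers:
            outcome = parser(text)
            if outcome is None:
                return None
            matched, text = outcome  # continue with the remainder
            results.append(matched)
        return results, text
    return parse


def alt(*parsers):
    """The "or" operator: return the result of the first parser that matches."""
    def parse(text):
        for parser in parsers:
            outcome = parser(text)
            if outcome is not None:
                return outcome
        return None
    return parse


# Small named parsers, combined into bigger ones (hypothetical field layout).
quoted = regex(r'"[^"]*"')    # a quoted value, which may contain semicolons
plain = regex(r'[^;"\n]+')    # an unquoted value
separator = regex(r';')
value = alt(quoted, plain)
two_fields = seq(value, separator, value)

print(two_fields('"a;b";c'))  # (['"a;b"', ';', 'c'], '')
```

Because `two_fields` has the same input/match/remainder shape as the small parsers, it can itself be passed to `seq` or `alt` again.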

Someone asked about “yacc” parsers, which are able to robustly handle each and every corner case: “how does it compare to the simpler ‘parsy’?” The answer: parsy is a simple library, there are more elaborate python libraries. But: parsy is already quite good. A json parser written in parsy takes only 50 lines!

He did a live demo, developing a set of combined parsers step by step. Fun! And very understandable. So “parsy” sounds like a nice library for this kind of work.

There were some questions comparing it to regular expressions. Werner’s answer was that parsy’s kind of parsers are much more readable and debuggable. He was surprised at the number of attendees that like regular expressions :-)

The nice thing: every individual part of your parser (“just some numbers”, “an equals sign”) is a piece of python, so you can give it a name. This way, you can give those pieces of parser names from your domain (like dnaName, type, customer).

In the end, he live-coded the whole DNA ascii file parser. Quite boring. And that was his whole point: what would be hard or impossible to do in plain python becomes “just boring” with parsy. Exactly what we want!

A practical application of Python metaclasses - Jan-Hein Bührman

(See an earlier summary about metaclasses being used in django)

Apart from metaclasses, he showed some utilities that he likes: pipenv, pylint, mypy.

A nice touch to his presentation: he had his example code all in separate branches. Instead of live coding, he just switched branches all the time. Because he gave his branches clear names, it worked quite well!

The example he built up is impossible to summarize here. The example included a register function that he had to call on certain classes. He didn’t like it. That’s where metaclasses come in.

Python objects are instances of classes. Classes themselves are instances of type. You can create classes programmatically by doing something like:

>>> B = type('B', (), {})
>>> b = B()
>>> type(b)
<class '__main__.B'>

Normally, when python imports a module (= reads a python file), class statements are executed and the class is created. You can influence that process by giving the class a metaclass (a subclass of type) with a __new__ method.

  • __init__() influences the creation of objects from the class (= instantiating the object from the class).
  • __new__() influences the creation of the class (= instantiating the class from type).

He used it to automatically register objects created from classes with the metaclass.
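His actual example isn't reproduced here, but the general technique can be sketched as follows. This is a minimal illustration with made-up names (`RegisteringMeta`, `Plugin`, `registry`); it registers the classes themselves at class-creation time, which is an assumption about what the registration looked like.

```python
registry = {}


class RegisteringMeta(type):
    """Metaclass that records every class created with it."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        if bases:  # skip the base class itself, register only subclasses
            registry[name] = cls
        return cls


class Plugin(metaclass=RegisteringMeta):
    pass


class CsvPlugin(Plugin):
    pass


class JsonPlugin(Plugin):
    pass


# No explicit register() calls were needed.
print(sorted(registry))  # ['CsvPlugin', 'JsonPlugin']
```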

Note: in python 3.6, __init_subclass__() was added that really makes this much easier.
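A minimal sketch of the same kind of registration with `__init_subclass__()`, without any metaclass (the class names are again made up for illustration):

```python
class Plugin:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        # Called automatically whenever a subclass of Plugin is defined.
        super().__init_subclass__(**kwargs)
        Plugin.registry[cls.__name__] = cls


class CsvPlugin(Plugin):
    pass


class JsonPlugin(Plugin):
    pass


print(sorted(Plugin.registry))  # ['CsvPlugin', 'JsonPlugin']
```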

Plain text pypi description formatting: possible cause


Tags: python

Sometimes projects have a plaintext description on pypi. You see the restructuredtext formatting, but it isn’t applied. See my own z3c.dependencychecker 2.4.2 for example.

I use an editor with restructuredtext syntax highlighting. I double-checked everything. I used docutils. I used pypi’s own readme_renderer to check it. Not a single error.

But after several tries, the formatting was still off. Then a fellow contributor adjusted one of the setup() keywords in our setup.py: something might have to be a string (as in another package, with a perfectly-rendering description) instead of a list (as we had). Sadly, no luck.

But it made me investigate the other keyword arguments. I noticed, next to long_description, the description. This is normally a one-line description. In our case, it was a multi-line string:

long_description = u'\n\n'.join([ ... some files .... ])

description = """
Checks which imports are done and compares them to what's
in and warns when discovering missing or unneeded


I changed it to a single line and yes, the next release worked fine on pypi!

(Note that I also toyed a bit with encodings in that pull request, but I don’t think that had any influence).

So: take a look at the description field if your project also has rendering problems (assuming that your long_description is fine).
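A quick check along those lines could look like the sketch below. It assumes, as this post suggests but does not prove, that a multi-line description is what breaks the rendering; the function name `check_description` and the sample values are made up.

```python
def check_description(description):
    """Return a list of problems found in a setup() ``description`` value."""
    problems = []
    if not isinstance(description, str):
        problems.append("description should be a string")
    elif "\n" in description:
        # Also catches a leading or trailing newline from a """...""" string.
        problems.append("description should be a single line")
    return problems


print(check_description("One-line summary of the package"))  # []
print(check_description("\nfirst line\nsecond line\n"))
# ['description should be a single line']
```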

Ansible provision/deploy setup


Tags: python, django

I never got around to writing down the ansible setup I figured out (together with others, of course, at my previous job) for deploying/provisioning django websites.

The whole setup (as a cookiecutter template) can be found in . The relevant code is in the ansible/ directory. Note: this is a “cookiecutter template” from which you can generate a project so you’ll see some {{ }} and {% %}: when you create the actual project these items will be filled in.

My goal was to keep the setup simple and safe. With “safe”, I mean that you normally cannot accidentally do something wrong.

And it was intended for getting one website project onto the server(s). It is not a huge ansible setup to set up the whole infra in one fell swoop.

First the inventory files:

  • Yes, multiple: there are two of them. production_inventory and staging_inventory. The safe aspect is that we used to have a single inventory file with [production] and [staging] headings in it. If you wanted to update staging, you had to add --limit staging on the command line. If you forgot that…

    With two separate files, you never have this problem. Accidents are much less likely to occur.

  • The simple aspect is that variables are right there in the inventory. There are only a few variables after all: servername, hostname, checkout (master or tag).

    A “problem” with ansible is that you can place variables in lots of places (playbook, inventory, host variable file, group variable file and another two or three I don’t remember right away). So where to look? I figured that the inventory was the right place for this kind of simple setup:


Second, the ansible playbooks:

  • There are two. A provision.yml and a deploy.yml. Provisioning is for one-time setup (well, you probably want to change stuff sometimes, but you’ll rarely run this one). Deploy is for the regular deploys of the actual code. The stuff you regularly do.

  • Provision should be run as a user that can do sudo su on the target machine. Provisioning installs packages and adds the nginx config file in /etc/. And it creates a user for running/deploying the django site (in my previous job, this user was historically called buildout).

  • The deploy playbook connects as the abovementioned deploy user, does a “git pull” (or whatever deploy mechanism you prefer), runs the database migration and so on.

  • Now, how can the person who deploys connect as that deploy user? Well, the provision playbook creates the deploy user (“buildout”), disables the password and adds the public ssh keys of the deployers to the /home/buildout/.ssh/authorized_keys file:

    - name: Add user "buildout" and set an unusable password.
      user: name=buildout password='*' state=present shell="/bin/bash"
    - name: Add maintainers' ssh keys so they can log in as user buildout.
      authorized_key: user=buildout key={{ item}}.keys
      with_items:
        - reinout
        - another_colleague

    It is simple because you only need a very limited number of people on any server with sudo rights. Or a very limited number of people with the password of a generic admin account. Re-provisioning is basically only needed if something changed in the nginx config file. In practice: hardly ever.

    It is simple because you don’t need to give the deployers each a separate user account (or access to a password): their public ssh key is enough.

    It is safe because it limits the (root-level) mistakes you can make during regular deploys. And the small number of users you need on your system is also an advantage.

  • The ansible playbooks are short. Just a couple of tasks. Even though I’m normally all in favour of generic libraries and so on, for a 65-line playbook I personally don’t need the mental overhead of one or two generic ansible roles.

    The playbooks do basically the same thing I did years earlier with “Fabric” scripts. So: the basic structure has quite proven itself (for me).

There are all sorts of approaches you can take. Automatic deploys from your Jenkins or Gitlab, for instance. That’s something I’d like to try sometime. For not-automatically-deployed projects, a simple setup such as I showed here has much to recommend it.

Djangocon: an ode to OAuth - Akos Hochrein


Tags: djangocon, django

(One of my summaries of a talk at the 2018 European djangocon.)

A show of hands. Who uses a password manager? 90% of the hands went up. Who uses at least two? Some 40%.

Passwords are irritating. There have been initiatives to “outsource” passwords. Openid, oauth, oauth2.0.

On to OAuth2.0. It started at twitter. They started with OpenID, but that only handled login, not access to resources. In the end, oauth2.0 came out.

(Note: he said “openid connect”, but that’s built on oauth2.0, so he must have meant plain “openid” if I’m correct. But it might mean that I’m not totally correct in this summary, or I heard it incorrectly).

There are multiple ways to work with oauth2.0. He showed the “authorisation code grant”. I can’t visualize his diagram here, look at the video for that.

There are some terms:

  • Resource owner.
  • Client: the application you use from behind your browser, which wants access on your behalf.
  • Authentication server: this is where you will log in (“log in with facebook/google/etc”) and where the
  • Resource server: this is where the data is.
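Since the diagram is missing here, the two requests at the heart of the authorisation code grant can be sketched in code. The parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `grant_type`, `code`) come from the OAuth 2.0 specification; the URLs, client id and scope are hypothetical placeholders, and nothing is actually sent over the network.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

AUTHORIZE_URL = "https://auth.example.com/authorize"  # hypothetical
CLIENT_ID = "my-client-id"                            # hypothetical
REDIRECT_URI = "https://app.example.com/callback"     # hypothetical


def build_authorization_url(state):
    """Step 1: the URL the browser is sent to, where the user logs in."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "profile",  # hypothetical scope name
        "state": state,      # random value to recognise our own request
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def build_token_request(callback_url, expected_state):
    """Step 2: turn the code from the callback into a token request body."""
    query = parse_qs(urlsplit(callback_url).query)
    if query.get("state") != [expected_state]:
        raise ValueError("state mismatch, refusing to continue")
    # A confidential client would also authenticate itself (client secret).
    return {
        "grant_type": "authorization_code",
        "code": query["code"][0],
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
    }


state = secrets.token_urlsafe(16)
print(build_authorization_url(state))

# Simulate the redirect back from the login server with a made-up code.
callback = REDIRECT_URI + "?" + urlencode({"code": "abc123", "state": state})
print(build_token_request(callback, state)["code"])  # abc123
```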

Akos works at prezi. The backend is actually a django site. But there were many customizations to auth, sessions and user objects. At one point, they wanted to make it easier for users to log in. So: social login.

They had those customizations, so they forked django-social-auth somewhere in 2011 and had to maintain their fork ever since.

In 2017 they wanted to get rid of the old stuff for a new kind of login. They didn’t want to fork yet another project. And actually, they wanted to get rid of their current forks.

Then they discovered A perfect set of building stones to hang their own customizations in.

There are three possibilities when logging in with social auth:

  • Regular login of a user that logged in before. This should be as smooth and simple as possible.
  • Signup. The user doesn’t exist yet.
  • Associate. The email address of the user that logs in via social auth somehow already exists as an older non-social-auth user. So you need to send an email asking whether it is ok to combine them.

The presentation is online at

Photo explanation: station signs on the way from Utrecht (NL) to Heidelberg (DE).

Djangocon: want more women in tech? Start with Django Girls - Sara Heins


Tags: djangocon, django

(One of my summaries of a talk at the 2018 European djangocon.)

She lives in Kansas, USA. Kansas is great for women in tech, there are many initiatives. The one she is most enthusiastic about is django girls.

“We are raising our girls to be perfect and we’re raising our boys to be brave”. If a woman applies for a job, she fulfills 100% of the demands; if a man applies, he fulfills on average 60%…

Why does programming make girls brave? You are going to make mistakes and you’re going to fix them. How do you get more girls to try it? One of the best ways is to organize a django girls workshop.

There are some qualifications you don’t need.

  • You don’t need to be perfect. You only have to be brave :-)
  • You don’t need to be a savvy event planner. There’s lots of info on github. There are pre-made websites. Lots of resources.
  • You don’t need to have a large budget.

Qualifications you do need:

  • You need to be able to solve problems.
  • You need to be open to learning new things.
  • You need to want to constantly improve. You’ll be busy with it for half a year and you are also going to get some negative feedback: learn from it.

There are some stages of organizing it:

  • Planning. You need a team to run it.

    Attract sponsors. Use your network for this. Polish your sponsorship “ask”: ask it as soon and directly as possible in your email. Include statistics and bold items to make them stand out. And… personalize it!

  • Recruitment. Mentors. They don’t need to be django or python experts, it is enough if they understand basic programming concepts.

    Mentors can be any gender!

    Aim for a mix of junior and senior programmers.

    And…. you need to find attendees! Press releases, blog posts, local news, schools, meetups/events, flyers.

  • Accept/reject attendees.

  • Final stretch. Last minute worry things…

  • The day! The friday evening before, an install-and-pizza party. That way you can get started with the fun stuff on saturday morning. On saturday the actual day. Breakfast, presentation, tutorials, lunch, tutorial session again. Followed by a mentor appreciation party.

  • After the event. Retrospective, see what can be done better, see what worked well.

Do you want to do it?

  • Find a venue (great wifi is a must) and coordinate a date.
  • Fill in the short form at

Do it! If you don’t do it, nobody in your city might.


Djangocon: banking with django, how not to lose your customers’ money - Anssi Kääriänen


Tags: djangocon, django

(One of my summaries of a talk at the 2018 European djangocon.)

He works for a company that offers business banking services to microentrepreneurs in a couple of countries. Payment accounts (online payments, prepaid business mastercard, etc). Business tools (invoices, online shop, bookkeeping, etc).

Technically it is nothing really special. Django, django rest framework, postgres, celery+redis, angular, native mobile apps. It runs on Amazon.

Django was a good choice. The ecosystem is big: for anything that you want to do, there seems to be a library. Important for them: django is very reliable.

Now: how not to lose your customers’ money.

  • Option 1: reliable payments.
  • Option 2: take the losses (and thus reimburse your customer).

If you’re just starting up, you might have a reliability of 99.9%. With, say, 100 payments per day and 2 messages per payment, that’s 1 error case per day. You can handle that by hand just fine.

If you grow to 10,000 messages/day and 99.99% reliability, you have 5 cases per day. You now need one or two people just for handling the error cases. That’s not effective.

Their system is mostly built around messaging. How do you make messages reliable?

  • The original system records the message in the local database inside a single transaction.

    In messaging, it is terribly hard to debug problems if you’re not sure whether a message was sent or what the contents were. Storing a copy locally helps a lot.

  • On commit, the message is sent.

  • If the initial send fails: retry.

  • The receiving side has to deduplicate, as it might get the same message twice.

You can also use an inbox/outbox model.

  • Abstract messages to Inbox and Outbox django models.
  • The outbox on the origin system stores the messages.
  • Send on_commit and with retry.
  • Receiving side stores the messages in the Inbox.
  • There’s a unique constraint that makes sure only unique messages are in the Inbox.
  • There’s a “reconciliation” task that regularly compares the Outbox and the Inbox, to see if they have the same contents.
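The inbox/outbox steps above can be sketched roughly as follows. To keep the example self-contained it uses two in-memory sqlite3 databases instead of Django models and a plain function call instead of kafka, so it is a stand-in for the approach from the talk rather than their implementation; all table, function and message names are made up.

```python
import sqlite3

origin = sqlite3.connect(":memory:")    # database of the sending system
receiver = sqlite3.connect(":memory:")  # database of the receiving system
origin.execute("CREATE TABLE outbox (message_id TEXT PRIMARY KEY, body TEXT)")
# The primary key is the unique constraint that deduplicates messages.
receiver.execute("CREATE TABLE inbox (message_id TEXT PRIMARY KEY, body TEXT)")


def record(message_id, body):
    """Store the outgoing message locally, in a single transaction."""
    with origin:
        origin.execute("INSERT INTO outbox VALUES (?, ?)", (message_id, body))


def receive(message_id, body):
    """Store an incoming message; a duplicate delivery is silently ignored."""
    with receiver:
        receiver.execute(
            "INSERT OR IGNORE INTO inbox VALUES (?, ?)", (message_id, body)
        )


def send_with_retry(message_id, body, transport, attempts=3):
    """Try to deliver a message a few times; report whether it succeeded."""
    for _ in range(attempts):
        try:
            transport(message_id, body)
            return True
        except ConnectionError:
            continue  # retry
    return False


def reconcile():
    """Return the ids that are in the outbox but missing from the inbox."""
    sent = {row[0] for row in origin.execute("SELECT message_id FROM outbox")}
    got = {row[0] for row in receiver.execute("SELECT message_id FROM inbox")}
    return sorted(sent - got)


def broken_transport(message_id, body):
    """Simulated transport failure."""
    raise ConnectionError("network down")


record("m1", "pay 10 EUR")
record("m2", "pay 25 EUR")
send_with_retry("m1", "pay 10 EUR", receive)
send_with_retry("m1", "pay 10 EUR", receive)           # duplicate delivery
send_with_retry("m2", "pay 25 EUR", broken_transport)  # never arrives
print(reconcile())  # ['m2']
```

The reconciliation result is what you would monitor: an empty list means both sides agree.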

For transport between inbox and outbox, they use kafka, which can send to multiple inboxes.

There are other reliability considerations:

  • Use testing and reviews.
  • If there’s a failure: react quickly. This is very important from the customer’s point of view.
  • Fix the original reason, the core reason. Ask and ask and ask. If you clicked on a wrong button, ask why you clicked on the wrong button. Is the UI illogical, for instance?
  • Constantly monitor and constantly run the reconciliation. This way, you get instant feedback if something is broken.


Djangocon: Graphql in python and django - Patrick Arminio


Tags: djangocon, django, python

(One of my summaries of a talk at the 2018 European djangocon.)

For APIs, REST is the normal way. But REST is not perfect.

You can, for instance, have too many requests. If you request a user (/users/1) and the user has a list of friends, you have to grab the user page of all those friends also. You could make a special endpoint where you get the names of the friends, but can end up with many endpoints (/users-with-friends/1, /users-with-friends-and-images/1). Or with very big responses that contain everything you might need.

Graphql was created to solve some of these issues. You have a single /graphql endpoint, which you POST to. You post the data structure that you want to get back. There’s the option of adding types. So you’re not bound to pre-defined REST responses, but you can tell exactly how much or how little you need and in what form.
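As an illustration, this is roughly what such a POST looks like for the user-with-friends case. The endpoint URL and the schema (a `user` with `name` and `friends`) are hypothetical; the sketch only builds the request with the standard library and does not send it.

```python
import json
from urllib.request import Request

GRAPHQL_URL = "https://api.example.com/graphql"  # hypothetical endpoint

# Hypothetical schema: a user with a name and a list of friends.
QUERY = """
query UserWithFriends($id: ID!) {
  user(id: $id) {
    name
    friends {
      name
    }
  }
}
"""


def build_graphql_request(query, variables):
    """Build (but do not send) the POST request for a GraphQL query."""
    payload = json.dumps({"query": query, "variables": variables})
    return Request(
        GRAPHQL_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(QUERY, {"id": "1"})
print(request.get_method(), request.full_url)
print(json.loads(request.data)["variables"])  # {'id': '1'}
```

One request returns the user and the friends' names, in the shape of the query, instead of one REST call per friend.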

Almost every graphql instance has introspection enabled. You can discover the API that way, including which data types to expect.

In python, you can use the graphene library. From the same authors, there’s graphene-django.

There is also integration for django REST framework in graphene-django. Quite useful when you already have all of your serializers.

For trying out a graphql API, is a handy in-browser IDE to “play” with it.

(He demoed it: looked nice and useful.)

What about security/authentication? Standard session based authentication. Or you can use an authentication header.

What about malicious queries? You could get big exploding responses by following a foreignkey relation back and forth (author->posts->authors->posts etc).
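To get a feel for such a back-and-forth query, here is a small sketch that generates one and measures how deeply it is nested. The field names (`author`, `posts`, `id`) are hypothetical, and the summary does not record how the speaker would deal with such queries; a depth check like the one below is just one way a server could spot them.

```python
def nested_query(depth):
    """Build a query that follows author -> posts -> author ... `depth` times."""
    fields = ["author", "posts"]
    query = "id"
    for level in reversed(range(depth)):
        query = "%s { %s }" % (fields[level % 2], query)
    return "{ %s }" % query


def nesting_depth(query):
    """Return the deepest level of curly-brace nesting in a query string."""
    deepest = current = 0
    for char in query:
        if char == "{":
            current += 1
            deepest = max(deepest, current)
        elif char == "}":
            current -= 1
    return deepest


print(nested_query(3))                  # { author { posts { author { id } } } }
print(nesting_depth(nested_query(3)))   # 4
print(nesting_depth(nested_query(50)))  # 51
```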

In the end, graphql is quite handy, especially when you’re working with many developers. With REST, you’d have just finished one response when the UI people were already clamoring for other, different responses. That problem is gone with graphql.


Djangocon: survival tricks and tools for remote developers - Alessio Bragadini


Tags: djangocon, django, python

(One of my summaries of a talk at the 2018 European djangocon.)

He works in a company that has many remote workers. He is one of them. The core question for him: “how do I manage to work remotely in an effective way without much stress”.

There is a difference between a remote-friendly company and a remote-first company. Remote-friendly is a company that still has an office. But you’re allowed to work from home and you don’t have strict work hours. Remote-first changes the entire structure/culture.

Can agile help?

  • Test driven development. First you make the tests. That’s handy for remote teams. It sets strict boundaries where you can take over the work in a way that you do not have when sitting behind the same keyboard.
  • No code ownership. Anybody can work on everything all the time.
  • Shared “visual backlog” (boards and so).

But… “agile” also says that teams that work face-to-face are more efficient in conveying information. But note that the agile manifesto is many years old now.

Face-to-face means proximity, but also truthfulness. So: no documents that can mean anything, but truthful conversation. Eh: we are now used to slack, skype, whatsapp. This is 99% of what face-to-face means. (You still miss body language, though, and the pleasure to be near to each other).

And, what is information? Discussion about the project, about code or design. Information about what moves forward: commits, tasks. Info about what moves backwards: bugs, regressions. All these things can be done online. Some of these can even be done better online.

The more you use these online communication channels, the more you become remote-first. Being in the office is almost accidental. The online channels become stronger if you have your machines post feedback there (“failed test!”). Perhaps even automate tasks that you can start via messages in your channel…

You need a shared repository that is accessible everywhere. A channel to communicate on. Automatic testing. CI. Etc.

Some comments:

  • There are some agile “ceremonies” like a daily standup and a sprint review. Do that, but online.
  • Explain what you’re going to do and what you’ve done. Don’t work in an invisible way.
  • Establish “work hours” even if you are not in a proper office. This is perhaps counter-intuitive to working remotely.
  • Important: keep the chat channel open during work hours.
  • Do meet face-to-face from time to time.
  • Learn from companies that do remote-first: automattic, balsamiq.

Some tools they use:

  • Test driven development (unittests, selenium).
  • Infrastructure as code (VMs, docker).
  • In-house Gitlab as their git repository and project center.
  • Continuous integration (with pipelines on gitlab). Due to the automated pipelines, no one has to do those tasks by hand. Otherwise you often have a single person that has to do those kinds of tasks, so that he feels a bit separated from the rest. Automate it away instead so that everybody can code.
  • Slack channel with integrations with gitlab and sentry. (via a Chatbot)
  • Gitlab boards, some trello boards.
  • Skype for “agile ceremonies” including the daily standup.
  • Google docs.

(He mentioned an article by Lee Bryant about slack, I guess this is it, but I’m not sure).

Update: here is the correct link


Djangocon: an intro to docker for djangonauts - Lacey Williams Henschel


Tags: djangocon, django, python

(One of my summaries of a talk at the 2018 European djangocon.)


  • Nice: it separates dependencies.
  • It shares your OS (so less weight than a VM).
  • It puts all members on the same page. Everything is defined to the last detail.

But: there is a pretty steep learning curve.

Docker is like the polyjuice potion from Harry Potter. You mix the potion, add a hair of the other person, and you suddenly look exactly like that other person.

  • The (docker) image is the person you want to turn into.
  • The (docker) container, that is you.
  • The Dockerfile, that is the hair. The DNA that tells exactly what you want it to look like. (She showed how the format looked).
  • docker build actually brews the potion. It builds the image according to the instructions in the Dockerfile.

Ok. Which images do I have? Image Revelio!: docker images. Same with continens revelio: docker container ls.

From that command, you can grab the ID of your running container. If you want to poke around in that running container, you can do docker exec -it THE_ID bash

Stop it? Stupefy! docker stop THE_ID. But that’s just pause. If it is Avada kedavra! you want: docker kill THE_ID.

Very handy: docker-compose. It comes with docker on the mac, for other systems it is an extra download. You have one config file with which you can start multiple containers. It is like Hermione’s magic bag. One small file and you can have it start lots of things.

It is especially handy when you want to talk to, for instance, a postgres database. With two lines in your docker-compose.yml, you have a running postgres in your project. Or an extra celery task server.

Starting up the whole project is easier than with just plain docker: docker-compose up! Running a command in one of the containers is also handier.

The examples are at


Djangocon keynote: the naïve programmer - Daniele Procida


Tags: djangocon, django, python

(One of my summaries of a talk at the 2018 European djangocon.)

The naïve programmer is not “the bad programmer” or so. He is just not so sophisticated. Naïve programmers are everywhere. Almost all programmers wish they could be better. Whether you’re

Programming is a craft/art/skill. That’s our starting point. Art can be measured against human valuation. In the practical arts/crafts, you can be measured against the world (if your bridge collapses, for instance).

Is your craft something you do all the time, like landing a plane? Or are you, as programmer, more in the creative arts: you face the blank canvas all the time (an empty

In this talk, we won’t rate along the single axis “worse - better”. There are more axes. “Technique - inept”, “creative - dull”, “judgment - uncritical” and “sophistication - naïve”. It is the last one that we deal with.

What does it mean to be a sophisticated programmer? To be a real master of your craft? They are versatile and powerful. They draw connections. They work with concepts and ideas (sometimes coming from other fields) to think about and to explain the problems they have to solve.

The naïve programmer will write, perhaps, badly structured programs.

But… the programs exist. They do get built. What should we make of this?

He showed an example of some small-town USA photographer (Mike Disfarmer). He worked on his own with old tools. He had no contacts with other photographers. Just someone making photo portraits. Years after his death his photos were discovered: beautifully composed, beautifully lit (through a single skylight…).

Software development is a profession. So we pay attention to tools and practices. Rightfully so. But not everyone is a professional developer.

Not everyone has to be a professional programmer. It is OK if someone learns django for the first time and builds something useful. Even if there are no unit tests. Likewise a researcher that writes a horrid little program that automates something for him. Are we allowed to judge that?

He talked a bit about musicians. Most of them sophisticated and very good musicians. But some of them also used naïvety. Swapping instruments, for instance. Then you make more mistakes and you play more simply. Perhaps you discover new things that way. Perhaps you finally manage to get out of a rut you’re in.

Some closing thoughts:

  • Would you rather be a naïve programmer with a vision or a sophisticated programmer without?
  • If you want to be a professional developer, you should try to become more sophisticated. That is part of the craft.
  • If you’re naïve and you produce working code: there’s nothing wrong with being proud of it.
  • As a sophisticated programmer: look at what the naïve programmer produces. Is there anything good in it? (He earlier showed bad work of a naïve French painter; his work was loved by Picasso.)

Suggestion: watch the keynote on youtube, this is the kind of talk you should see instead of read :-)


About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.
