Reinout van Rees’ weblog

Django meetup Amsterdam 18 May 2016

2016-05-23

Tags: django

Summary of the Django meetup organized at crunchr in Amsterdam, the Netherlands.

(I gave a talk on the django admin, which I of course don’t have a summary of, yet, though my brother made a summary of an almost-identical talk I did the friday before)

Reducing boilerplate with class-based views - Priy Werry

A view can be more than just a function. They can also be class based, django has quite a lot of them. For example the TemplateView that is very quick for rendering a template. Boilerplate reduction.

Django REST framework is a good example of class based views usage. It really helps you to reduce the number of boring boilerplate and concentrate on your actual code.

Examples of possible boilerplate code:

  • Parameter validation.
  • Pagination.
  • Ordering.
  • Serialisation.

They wanted to handle this a bit like django’s middleware mechanism, but then view-specific. So they wrote a base class that performed most of the boilerplate steps. So the actual views could be fairly simple.

It also helps with unit testing: normally you’d have to test all the corner cases in all your views, now you only have to test your base class for that.

Custom base classes also often means you have methods that you might re-define in subclasses to get extra functionality. In those cases make sure you call the parent class’s original method (when needed).

Users of your views (especially when it is an API) get the advantage of having a more consistent API. It is automatically closer to the specification. The API should also be easier to read.

Meetups on steroids - Bob Aalsma

“Can I get a consultant for this specific subject?” Some managers find it difficult to allow this financially. A special course with a deep-dive is easier to allow.

He would like to be a kind of a broker between students and teachers to arrange it: “Meetups on steroids: pick the subject - any subject; pick the date - any date; pick the group - any group”

Security in Django - Joery van der Zwart

Joery comes out of the security world. He doesn’t know anything from the inside of django, but a lot of the outside of django. He’s tested a lot of them.

Security is as strong as its weekest link. People are often the weakest link. Django doesn’t protect you if you explicitly circumvent its security mechanisms as a programmer.

Django actually protects you a lot!

A good thing is to look at the OWASP list of top 10 errors. (See also Florian Apolloner’s talk at ‘Django under the Hood’ 2015, for instance).

  • SQL injection. Protection is integrated in django. But watch out when doing raw sql queries, because they are really raw and unprotected. If you work through the model layer, you’re safe.
  • Authentication and sessions. Django’s SessionSecurityMiddleware is quite good. He has some comments on authentication, though, so he advices to do that one yourself. (Note: the local core committer looked quite suspicious as this was the first he heard about it. Apparently there are a number of CVEs that are unfixed in Django. Joery will share the numbers.)
  • XSS injection. User-fillable fields that aren’t escaped. Django by default... yes, protects you against this. Unless you use {% autoescape off %}, so don’t do that.
  • Direct object reference. He doesn’t agree with this point. So ignore it.
  • Security misconfiguration. Basically common sense. Don’t have DEBUG = True on in your production site. Django’s output is very detailed and thus very useful for anyone breaking into your site.
  • Sensitive data. Enable https. Django doesn’t enforce it. But use https. Extra incentive: google lowers the page ranking for non-https sites...
  • Access control. It is very very very hard to get into Django this way. He says django is one of the few systems to fix it this rigidly!
  • CSRF. Django protects you. Unless you explicitly use @csfr_exempt...
  • Known vulnerabilities. Update django! Because there have been fixes in django. Older versions are thus broken.
  • Insecure forwards/redirects. Once you’ve enabled django’s default middleware, you’re secure.

So Django is quite secure, but you are not.

Look at django’s security documentation. And look at https://www.ponycheckup.com. You can check your site with it. The good is that it is simple. It only checks django itself, though.

With some other tools (like nessus) you have to watch out for false positives, though. So if you don’t know to interpret the result, you’ll be scared shitless.

A good one: Qualys SSLlabs https checker to get your ssl certificate completely right. (Note: see my blog post fixing ssl certificate chains for some extra background.)

“Owasp zap”: open source tool that combines checker and reduces the number of false positives.

The summary:

  • Good: django with middleware.
  • Good: django provides a good starting point.
  • Bad: experimenting. Be very sure you’re doing it right. Look at documentation.
  • Bad: do it yourself. Most of the times.

Django girls Amsterdam

On 25 june there’ll be a django girls workshop in Amsterdam. Everything’s set, but they do still need coaches.

Pygrunn keynote: the future of programming - Steven Pemberton

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

Steven Pemberton (https://en.wikipedia.org/wiki/Steven_Pemberton) is one of the developers of ABC, a predecessor of python.

He’s a researcher at CWI in Amsterdam. It was the first non-military internet site in Europe in 1988 when the whole of Europe was still connected to the USA with a 64kb link.

When designing ABC they were considered completely crazy because it was an interpreted language. Computers were slow at that time. But they knew about Moore’s law. Computers would become much faster.

At that time computers were very, very expensive. Programmers were basically free. Now it is the other way. Computers are basically free and programmers are very expensive. So, at that time, in the 1950s, programming languages were designed around the needs of the computer, not the programmer.

Moore’s law is still going strong. Despite many articles claiming its imminent demise. He heard the first one in 1977. Steven showed a graph of his own computers. It fits.

On modern laptops, the CPU is hardly doing anything most of the time. So why use programming languages optimized for giving the CPU a rest?

There’s another cost. The more lines a program has, the more bugs there are in it. But it is not a linear relationship. More like lines ^ 1.5. So a program with 10x more lines probably has 30x more bugs.

Steven thinks the future of programming is in declarative programming instead of in procedural programming. Declarative code describes what you want to achieve and not how you want to achieve it. It is much shorter.

Procedural code would have specified everything in detail. He showed a code example of 1000 lines. And a declarative one of 15 lines. Wow.

He also showed an example with xforms, which is declarative. Projects that use it regularly report a factor of 10 in savings compared to more traditional methods. He mentioned a couple of examples.

Steven doesn’t necessarily want us all to jump on Xforms. It might not fit with our usecases. But he does want us to understand that declarative languages are the way to go. The approach has been proven.

In response to a question he compared it to the difference between roman numerals and arabic numerals and the speed difference in using them.

(The sheets will be up on http://homepages.cwi.nl/~steven/Talks/2016/05-13-pygrunn/ later).

Pygrunn keynote: Morepath under the hood - Martijn Faassen

2016-05-13

Tags: python, pygrunn, zope, django

(One of my summaries of the one-day 2016 PyGrunn conference).

Martijn Faassen is well-known from lxml, zope, grok. Europython, Zope foundation. And he’s written Morepath, a python web framework.

Three subjects in this talk:

  • Morepath implementation details.
  • History of concepts in web frameworks
  • Creativity in software development.

Morepath implementation details. A framework with super powers (“it was the last to escape from the exploding planet Zope”)

Traversal. In the 1990’s you’d have filesystem traversal. example.com/addresses/faassen would map to a file /webroot/addresses/faassen.

In zope2 (1998) you had “traversal through an object tree. So root['addresses']['faassen'] in python. The advantage is that it is all python. The drawback is that every object needs to know how to render itself for the web. It is an example of creativity: how do we map filesystem traversal to objects?.

In zope3 (2001) the goal was the zope2 object traversal, but with objects that don’t need to know how to handle the web. A way of working called “component architecture” was invented to add traversal-capabilities to existing objects. It works, but as a developer you need to quite some configuration and registration. Creativity: “separation of concerns” and “lookups in a registry”

Pyramid sits somewhere in between. And has some creativity on its own.

Another option is routing. You map a url explicitly to a function. A @route('/addresses/{name}') decorator to a function (or a django urls.py). The creativity is that is simple.

Both traversal and routing have their advantages. So Morepath has both of them. Simple routing to get to the content object and then traversal from there to the view.

The creativity here is “dialectic”. You have a “thesis” and an “antithesis” and end up with a “synthesis”. So a creative mix between two ideas that seem to be opposites.

Apart from traversal/routing, there’s also the registry. Zope’s registry (component architecture) is very complicated. He’s now got a replacement called “Reg” (http://reg.readthedocs.io/).

He ended up with this after creatively experimenting with it. Just experimenting, nothing serious. Rewriting everything from scratch.

(It turned out there already was something that worked a bit the same in the python standard library: @functools.singledispatch.)

He later extended it from single dispatch to multiple dispatch. The creativity here was the freedom to completely change the implementation as he was the only user of the library at that moment. Don’t be afraid to break stuff. Everything has been invented before (so research). Also creative: it is just a function.

A recent spin-off: “dectate”. (http://dectate.readthedocs.io/). A decorator-based configuration system for frameworks :-) Including subclassing to override configuration.

Some creativity here: it is all just subclassing. And split something off into a library for focus, testing and documentation. Split something off to gain these advantages.

Pygrunn: from code to config and back again - Jasper Spaans

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

Jasper works at Fox IT, one of the programs he works on is DetACT, a fraud detection tool for online banking. The technical summary would be something like “spamassassin and wireshark for internet traffic”.

  • Wireshark-like: DetACT intercepts online bank traffic and feeds it to a rule engine that ought to detect fraud. The rule engine is the one that needs to be configured.
  • Spamassassin-like: rules with weights. If a transaction gets too many “points”, it is marked as suspect. Just like spam detection in emails.

In the beginning of the tool, the rules were in the code itself. But as more and more rules and exceptions got added, maintaining it became a lot of work. And deploying takes a while as you need code review, automatic acceptance systems, customer approval, etc.

From code to config: they rewrote the rule engine from start to work based on a configuration. (Even though Joel Spolsky says totally rewriting your code is the single worst mistake you can make). They went 2x over budget. That’s what you get when rewriting completely....

The initial test with hand-written json config files went OK, so they went to step two: make the configuration editable in a web interface. Including config syntax validation. Including mandatory runtime performance evaluation. The advantage: they could deploy new rules much faster than when the rules were inside the source code.

Then... they did a performance test at a customer.... It was 10x slower than the old code. They didn’t have enough hardware to run it. (It needs to run on real hardware instead of in the cloud as it is very very sensitive data).

They fired up the profiler and discovered that only 30% of the time is spend on the actual rules, the other 70% is bookkeeping and overhead.

In the end they had the idea to generate python code from the configuration. They tried it. The generated code is ugly, but it works and it is fast. A 3x improvement. Fine, but not a factor of 10, yet.

They tried converting the config to AST (python’s Abstract Syntax Tree) instead of to actual python code. Every block was turned into an AST and then combined based on the config. This is then optimized (which you can do with an AST) before generating python code again.

This was fast enough!

Some lesons learned:

  • Joel Spolsky is right. You should not rewrite your software completely. If you do it, do it in very small chunks.
  • Write readable and correct code first. Then benchmark and profile
  • Have someone on your team who knows about compiler construction if you want to solve these kinds of problems.

Pygrunn: simple cloud with TripleO quickstart - K Rain Leander

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

What is openstack? A “cloud operating system”. Openstack is an umbrella with a huge number of actual open source projects under it. The goal is a public and/or private cloud.

Just like you use “the internet” without concerning yourself with the actual hardware everything runs on, just in the same way you should be able to use a private/public cloud on any regular hardware.

What is RDO? Exactly the same as openstack, but using RPM packages. Really, it is exactly the same. So a way to get openstack running on a Red Hat enterprise basis.

There are lots of ways to get started. For RDO there are three oft-used ones:

  • TryStack for trying out a free instance. Not intended for production.

  • PackStack. Install openstack-packstack with “yum”. Then you run it on your own hardware.

  • TripleO (https://wiki.openstack.org/wiki/TripleO). It is basically “openstack on openstack”. You install an “undercloud” that you use to deploy/update/monitor/manage several “overclouds”. An overcloud is then the production openstack cloud.

    TripleO has a separate user interface that’s different from openstack’s own one. This is mostly done to prevent confusion.

    It is kind of heavy, though. The latest openstack release (mitaka) is resource-hungry and needs ideally 32GB memory. That’s just for the undercloud. If you strip it, you could get the requirement down to 16GB.

To help with setting up there’s now a TripleO quickstart shell script.

Pygrunn: Understanding PyPy and using it in production - Peter Odding/Bart Kroon

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

pypy is “the faster version of python”.

There are actually quite a lot of python implementation. cpython is the main one. There are also JIT compilers. Pypy is one of them. It is by far the most mature. PyPy is a python implementation, compliant with 2.7.10 and 3.2.5. And it is fast!.

Some advantages of pypy:

  • Speed. There are a lot of automatic optimizations. It didn’t use to be fast, but since 5 years it is actually faster than cpython! It has a “tracing JIT compiler”.
  • Memory usage is often lower.
  • Multi core programming. Some stackless features. Some experimental work has been started (“software transactional memory”) to get rid of the GIL, the infamous Global Interpreter Lock.

What does having a “tracing JIT compiler” mean? JIT means “Just In Time”. It runs as an interpreter, but it automatically identifies the “hot path” and optimizes that a lot by compiling it on the fly.

It is written in RPython, which is a statically typed subset of python which translates to C and is compiled to produce an interpreter. It provides a framework for writing interpreters. “PyPy” really means “Python written in Python”.

How to actually use it? Well, that’s easy:

$ pypy your_python_file.py

Unless you’re using C modules. Lots of python extension modules use C code that compile against CPython... There is a compatibility layer, but that catches only 40-60% of the cases. Ideally, all extension modules would use “cffi”, the C Foreign Function Interface, instead of “ctypes”. CFFI provides an interface to C that allows lots of optimizations, especially by pypy.

Peter and Bart work at paylogic. A company that sells tickets for big events. So you have half a million people trying to get a ticket to a big event. Opening multiple browsers to improve their chances. “You are getting DDOSed by your own customers”.

Whatever you do: you still have to handle 500000 pageviews in just a few seconds. The solution: a CDN for the HTML and only small JSON requests to servers. Even then then you still need a lot of servers to handle the JSON requests. State synchronisation was a problem as in the end you still had one single server for that single task.

Their results after using pypy for that task:

  • An 8-fold improvement. Initially 4x, but pypy has been optimized since, so they got an extra 2x for free. So: upgrade regularly.
  • Real savings on hosting costs
  • The queue has been tested to work for at least two million visitors now.

Guido van Rossum supposedly says “if you want your code to run faster, you should probably just use PyPy” :-)

Note: slides are online

Pygrunn: django channels - Bram Noordzij/Bob Voorneveld

2016-05-13

Tags: python, pygrunn, django

(One of my summaries of the one-day 2016 PyGrunn conference).

Django channels is a project to make Django to handle more than “only” plain http requests. So: websockets, http2, etc. Regular http is the normal request/response cycle. Websockets is a connection that stays open, for bi-directional communication. Websockets are technically an ordered first-in first-out queue with message expiry and at-most-once delivery to only one listener at the time.

“Django channels” is an easy-to-understand extension of the Django view mechanism. Easy to integrate and deploy.

Installing django channels is quick. Just add the application to your INSTALLED_APPS list. That’s it. The complexity happens when deploying it as it is not a regular WSGI deployment. It uses a new standard called ASGI (a = asynchronous). Currently there’s a “worker service” called daphne (build in parallel to django channels) that implements ASGI.

You need to configure a “backing service”. Simplified: a queue.

They showed a demo where everybody in the room could move markers over a map. Worked like a charm.

How it works behind the scenes is that you define “channels”. Channels can recieve messages and can send messages to other channels. So you can have channel for reading incoming messages, do something with it and then send a reply back to some output channel. Everything is hooked up with “routes”.

You can add channels to groups so that you can, for instance, add the “output” channel of a new connection to the group you use for sending out status messages.

Pygrunn: Kliko, compute container specification - Gijs Molenaar

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

Gijs Molenaar works on processing big data for large radio telescopes (“Meerkat” in the south of Africa and “Lofar” in the Netherlands).

The data volumes coming from such telescopes are huge. 4 terabits per seconds, for example. So they do a log of processing and filtering to get that number down. Gijs works on the “imaging and calibration” part of the process.

So: scientific software. Which is hard to install and fragile. Especially for scientists. So they use ubuntu’s “lauchpad PPA’s” to package it all up as debian packages.

The new hit nowadays is docker. Containerization. A self-contained light-weight “virtual machine”. Someone called it centralized agony: only one person needs to go through the pain of creating the container and all the rest of the world can use it... :-)

His line of work is often centered around pipelines. Data flows from one step to the other and on to the next. This is often done with bash scripts.

Docker is nice and you can hook up multiple dockers. But... it is all network-centric: a web container plus a database container plus a redis container. It isn’t centered on data flows.

So he build something new: kliko. He’s got a spec for “kliko” containers. Like “read your input from /input”. “Write your output to /output”. There should be a kliko.yml that defines the parameters you can pass. There should be a /kliko script as an entry point.

Apart from the kliko container, you also have the “kliko runner”. It is the actor that runs the container. It runs the containers with the right parameters. You can pass the parameters on the command line or via a web interface. Perfect for scientists! You get a form where you can fill in the various parameters (defined in the kliko.yml file) and “just” run the kliko container.

An idea: you could use it almost as functional programming: functional containers. Containers that don’t change the data they’re operating on. Every time you run it on the same input data, you get the same results. And you can run them in parallel per definition. And you can do fun things with caching.

There are some problems with kliko.

  • There is no streaming yet.
  • It is filesystem based at the moment, which is slow.

These are known problems which are fine with what they’re currently using it for. They’ll work on it, though. One thing they’re also looking at is “kliko-compose”, so something that looks like “docker-compose”.

Some (fundamental) problems with docker:

  • Docker access means root access, basically.
  • GPU acceleration is crap.
  • Cached filesystem layers is just annoying. In first instance it seems fine that all the intermediary steps in your Dockerfile are cached, but it is really irritating once you install, for instance, debian packages. They’re hard to update.
  • You can’t combine containers.

Pygrunn: Micropython, internet of pythonic things - Lars de Ridder

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

micropython is a project that wants to bring python to the world of microprocessors.

Micropython is a lean and fast implementation of python 3 for microprocessors. It was funded in 2013 on kickstarter. Originally it only ran on a special “pyboard”, but it has now been ported to various other microprocessors.

Why use micropython? Easy to learn, with powerful features. Native bitwise operations. Ideal for rapid prototyping. (You cannot use cpython, mainly due to RAM usage.)

It is not a full python, of course, they had to strip things out. “functools” and “this” are out, for instance. Extra included are libraries for the specific boards. There are lots of memory optimizations. Nothing fancy, most of the tricks are directly from compiler textbooks, but it is nice to see it all implemented in a real project.

Some of the supported boards:

  • Pyboard
  • The “BBC micro:bit” which is supplied to 1 million school children!
  • Wipy. More of a professional-grade board.
  • LoPy. a board which supports LoRa, an open network to connect internet-of-things chips.

Development: there is one full time developer (funded by the ESA) and two core contributors. It is stable and feels like it is maturing.

Is it production ready? That depends on your board. It is amazing for prototyping or for embedding in games and tools.

Experiment: cheap small camera for model railroad

2016-04-01

Tags: modelrailway, eifelburgenbahn

As an experiment, I bought a cheap (€9.08!) mini camera off Aliexpress.com (search for “y2000 mini camera). I wanted to see whether I could use it to make on-board model railway video. I didn’t expect much; my main worry was whether the camera could handle close objects. I want so see things 20cm - 2m away, not 2m - 200m!

A bit of ducttape:

http://abload.de/img/screenshot2016-04-01a2wsva.png

... and let’s roll!

http://abload.de/img/screenshot2016-04-01atuqjx.png

The quality is not very good, but it sure is a lot of fun for 9 Euros :-) I especially like the view of my viaduct at 2:15.

 
vanrees.org logo

About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.

Weblog feeds

Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):