Reinout van Rees’ weblog

A high availability Django setup on the cheap - Roland van Laar

2016-06-22

Tags: python, pun, django

(One of the talks at the 22 June 2016 Amsterdam Python meetup)

Roland built an educational website that needed to be highly available on a tight budget. He demoed the website: a site for the teacher on his own laptop and a separate page for the digiboard in the classroom. The teacher steers the digiboard from his laptop (or an iPad or phone).

As it is used in classrooms, it needs to be really, really available. As a teacher, you don’t want to have to change your lesson plan at 8:30. The customer had three goals:

  • Not expensive.
  • Always up.
  • Reliable.

He had some technical goals of his own:

  • Buildable.
  • Functional.
  • Maintainable.

Always up? Django? You have the following challenges, apart from having a bunch of webservers.

  • Media files. Files uploaded on one server need to be visible on others.
  • Websockets.
  • Database.
  • Sessions.

The setup he chose:

  • Front end (html, javascript, images): Cloudflare as CDN (content delivery network). The front end is a single-page jQuery app. It chooses a random API host for its ajax requests.

    It changes API hosts when the API is not responding. But.... when is an API not responding? Some schools have really bad internet, so 10 seconds for a request might be “normal”.

    Don’t make a “ping pong” application that retries all the time. Try every server and then fail.

  • Some Django API servers. The actual django project was easy. Simple models, a bit of djangorestframework. As an extra he used some new postgres features.

  • Two PostgreSQL servers in BDR (bi-directional replication) mode: “Postgres async multi master”. It is awesome! It just works! Even sessions are replicated faultlessly.

    Things to watch out for: create a separate replication user on both ends. Also watch out with sequences (auto-increment fields). For Django it was easy to get this working by creating those sequences “USING BDR”. Creating objects takes a little bit longer that way. Alternatively, you can use UUIDs.

    Backups: oopsie. When postgres goes down, you normally restart it and it rebuilds itself. But in a BDR setup, the sequences don’t work right then. The standard tools don’t work, he had to write a custom script.

    Another drawback: for changing your table structure, you need a lock on all database nodes. This means you have downtime. No problem, he’ll just do it early in the morning on a weekend.

  • He uses csync2 for syncing uploaded files between the hosts. Simply a cronjob on all servers. This is good enough as the updates only really happen in the summer; during the school year nothing changes.

  • Websockets. He uses Tornado plus javascript code for reconnecting websockets. The teacher initially connects his laptop to the digiboard with a short 6-digit number. Internally, a UUID is generated. The UUID is stored in local storage, so reloading the page or restarting a laptop Just Works.
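The “try every server once, then fail” rule for the front end can be sketched as follows. The real front end is a jQuery app; this is only a Python illustration under assumptions: the host names, the 10-second default timeout and the `fetch` callable are hypothetical. The hosts are shuffled once, each gets exactly one chance with a generous timeout, and after the last one the request fails instead of ping-ponging between servers.

```python
import random
from typing import Callable, Sequence


class AllHostsFailed(Exception):
    """Raised when every API host has been tried once without success."""


def request_with_failover(
    hosts: Sequence[str],
    fetch: Callable[[str, float], str],
    timeout: float = 10.0,
) -> str:
    # Random order spreads the load over all API servers.
    order = list(hosts)
    random.shuffle(order)
    errors = {}
    for host in order:
        try:
            # Generous timeout: on a bad school connection, 10 seconds can be "normal".
            return fetch(host, timeout)
        except (TimeoutError, ConnectionError) as exc:
            errors[host] = exc  # remember why, then move on to the next host
    # Every host got exactly one chance: give up instead of retrying forever.
    raise AllHostsFailed(errors)


def fake_fetch(host: str, timeout: float) -> str:
    # Hypothetical stand-in for the real HTTP call: only api2 answers.
    if host != "api2.example.com":
        raise TimeoutError(f"{host} did not answer within {timeout}s")
    return f"response from {host}"


hosts = ["api1.example.com", "api2.example.com", "api3.example.com"]
print(request_with_failover(hosts, fake_fetch))
```

Raising a single clear error after one round keeps the failure visible to the teacher instead of hiding it behind endless retries.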

The One Time We Were Down: they switched email providers one time because their original one got much more expensive. But the new provider wasn’t as good and suddenly calls took more than 10 seconds and clients started to fail. It wasn’t that critical as it happened after school time when only one teacher wanted to reset his password. So it was easy to fix.

What are distributed systems? - Quazi Nafiul Islam

2016-06-22

Tags: python, pun

(One of the talks at the 22 June 2016 Amsterdam Python meetup)

Distributed systems?

  • Distributed computation (like hadoop, mapreduce).
  • Distributed storage (like cassandra, riak).
  • Distributed messaging (like kafka).

You need distributed systems if you want to do more than is normally possible with a single computer.

With many systems you need things like synchronisation (for instance NTP, network time protocol).

A distributed system is a bunch of computers working together. Such systems have challenges and limitations. Take for instance the CAP theorem for distributed storage systems: you can’t have all three of the following at the same time:

  • Consistency.
  • Availability.
  • Partition tolerance.

You can for instance value availability over consistency. You give the answer as soon as possible, even if you’re not completely sure you have the full answer (as other nodes might have extra/newer information).
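A toy illustration of that trade-off, not modelled on any real storage system: two replicas, a network partition that blocks replication, and two read strategies. All class and method names are made up for this sketch.

```python
class Replica:
    """One node of a toy replicated key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    def __init__(self):
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, key, value):
        # Writes land on node a; replication to b only works without a partition.
        self.a.data[key] = value
        if not self.partitioned:
            self.b.data[key] = value

    def read_available(self, key):
        # AP choice: answer from node b right away, even if it might be stale.
        return self.b.data.get(key)

    def read_consistent(self, key):
        # CP choice: refuse to answer when the nodes cannot confirm they agree.
        if self.partitioned:
            raise RuntimeError("partition: cannot guarantee an up-to-date answer")
        return self.b.data.get(key)


cluster = Cluster()
cluster.write("score", 1)
cluster.partitioned = True
cluster.write("score", 2)  # node b never hears about this write
print(cluster.read_available("score"))  # 1: available, but stale
```

During the partition the “available” read still answers (with the old value), while the “consistent” read raises an error instead.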

Python for OPS and platform teams - Pavel Chunyayev

2016-06-22

Tags: python, pun

(One of the talks at the 22 June 2016 Amsterdam Python meetup)

The subtitle of his talk is “supporting development teams in their journey to Continuous Delivery”. Pavel’s job title is continuous delivery architect. He loves biking so he just had to move to Amsterdam :-)

What is continuous delivery? Often you’ll hear something like “safely, rapidly and predictably deliver new features to production”. But that’s a little bit restrictive. He likes to start already at the inception and planning stage and only end at the actual operation stage: the whole process. You don’t want to continuously deliver bad features that won’t be used. Including the inception/planning stage helps make sure the right features are implemented.

A core idea is to keep the product releasable: build quality in. That is the core that we ought to remember. Quality is an integral part of the product, it is not something you can tack on later.

So... quality in the entire build/test/release cycle.

  • Build. All the checks you can do. Linting, static code analysis, unit tests.
  • Test. Contract tests, end-to-end testsuites, browser tests, scaling tests.
  • Release. Verify the deploy.

Especially in the “test” phase, you need specific clean test environments: provision the infrastructure, run the tests, dispose of the infrastructure. Repeatability is key.
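The provision/test/dispose cycle maps neatly onto a Python context manager, which guarantees that the disposal step also runs when the tests fail. A minimal sketch: `provision_environment`, `destroy_environment` and `run_tests` are hypothetical placeholders for whatever actually creates and tests the infrastructure (a cloud API, ansible, docker).

```python
import contextlib
import uuid

ACTIVE = set()  # stand-in for "infrastructure that currently exists"


def provision_environment() -> str:
    # Hypothetical: would create VMs/containers; here it only records an id.
    env_id = f"test-env-{uuid.uuid4().hex[:8]}"
    ACTIVE.add(env_id)
    return env_id


def destroy_environment(env_id: str) -> None:
    # Hypothetical: would tear the infrastructure down again.
    ACTIVE.discard(env_id)


@contextlib.contextmanager
def clean_environment():
    env_id = provision_environment()
    try:
        yield env_id
    finally:
        # Always dispose, also when the test run raised an exception.
        destroy_environment(env_id)


def run_tests(env_id: str) -> bool:
    # Hypothetical test suite against the fresh environment.
    return env_id in ACTIVE


with clean_environment() as env:
    print(run_tests(env))
print(len(ACTIVE))  # nothing is left behind afterwards
```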

“Platform as a service”: the real name of every operations team nowadays.

The number of jobs that include configuring servers manually is quickly declining. You need to be a programmer now! You need to enable self-service for the teams you work with.

So: python for system engineers. Python is the best language for this.

  • It is the most popular programming language for devops (apart from bash).
  • It is easy to learn (but hard to master).
  • It is more powerful than bash.
  • You can create and distribute real helper applications.
  • Devops means you’re a developer now!

What you need to do: create a provisioning service. A self-service. The developer can just click a button and they have a server! Amazon’s AWS was born this way. But those are generic servers. Your company needs custom setups. This is where you come in.

  • Services to manage environment lifecycle.
  • The same way for everyone.
  • Manipulation using API.
  • You can mix and match infrastructure providers.

They built something with python + flask + ansible. Flask receives the API calls, analyzes them, creates an ansible object and fires off ansible. Ready!

Also something to look at: Jenkins pipelines. All the jenkins tasks for one job inside one versionable file. You can apparently even write those in python nowadays (instead of the default “groovy” DSL).

Some further thoughts:

  • Open all the helper tools to the whole organization.
  • Distribute it as docker containers. Both the services and command line tools!
  • As sysadmin, act as a developer. Use the same tools!

Django meetup Amsterdam 18 May 2016

2016-05-23

Tags: django

Summary of the Django meetup organized at crunchr in Amsterdam, the Netherlands.

(I gave a talk on the django admin, which I of course don’t have a summary of, yet, though my brother made a summary of an almost-identical talk I did the friday before)

Reducing boilerplate with class-based views - Priy Werry

A view can be more than just a function. Views can also be class-based; Django has quite a lot of them. For example the TemplateView, which makes rendering a template very quick. Boilerplate reduction.

Django REST framework is a good example of class-based views usage. It really helps you to reduce the amount of boring boilerplate and concentrate on your actual code.

Examples of possible boilerplate code:

  • Parameter validation.
  • Pagination.
  • Ordering.
  • Serialisation.

They wanted to handle this a bit like django’s middleware mechanism, but then view-specific. So they wrote a base class that performed most of the boilerplate steps. So the actual views could be fairly simple.

It also helps with unit testing: normally you’d have to test all the corner cases in all your views, now you only have to test your base class for that.

Custom base classes also often means you have methods that you might re-define in subclasses to get extra functionality. In those cases make sure you call the parent class’s original method (when needed).

Users of your views (especially when it is an API) get the advantage of having a more consistent API. It is automatically closer to the specification. The API should also be easier to read.
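The base-class idea can be shown without Django itself. A minimal plain-Python sketch, assuming hypothetical names (`ListEndpoint`, `get_items`, `serialize`, `CourseList`): the base class does parameter validation, ordering, pagination and serialisation once, and the concrete view only supplies its data. Django’s and Django REST framework’s real class-based views use different method names.

```python
class ValidationError(Exception):
    pass


class ListEndpoint:
    """Hypothetical base class holding the shared boilerplate."""

    page_size = 2
    allowed_ordering = ()

    def get(self, params: dict) -> dict:
        # Parameter validation.
        try:
            page = int(params.get("page", 1))
        except ValueError:
            raise ValidationError("page must be an integer") from None
        if page < 1:
            raise ValidationError("page must be >= 1")
        ordering = params.get("ordering")
        if ordering is not None and ordering not in self.allowed_ordering:
            raise ValidationError(f"cannot order by {ordering!r}")

        items = self.get_items()
        # Ordering.
        if ordering:
            items = sorted(items, key=lambda item: item[ordering])
        # Pagination.
        start = (page - 1) * self.page_size
        page_items = items[start:start + self.page_size]
        # Serialisation.
        return {"page": page, "count": len(items),
                "results": [self.serialize(item) for item in page_items]}

    def get_items(self) -> list:
        raise NotImplementedError

    def serialize(self, item: dict) -> dict:
        return dict(item)


class CourseList(ListEndpoint):
    allowed_ordering = ("name",)

    def get_items(self):
        return [{"name": "math"}, {"name": "art"}, {"name": "dutch"}]

    def serialize(self, item):
        # Extend the parent's behaviour: call it first, then add to it.
        data = super().serialize(item)
        data["label"] = data["name"].title()
        return data


print(CourseList().get({"page": "1", "ordering": "name"}))
```

The corner cases (bad page numbers, unknown ordering fields) now only need tests for `ListEndpoint`, not for every concrete view.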

Meetups on steroids - Bob Aalsma

“Can I get a consultant for this specific subject?” Some managers find it difficult to allow this financially. A special course with a deep-dive is easier to allow.

He would like to be a kind of a broker between students and teachers to arrange it: “Meetups on steroids: pick the subject - any subject; pick the date - any date; pick the group - any group”

Security in Django - Joery van der Zwart

Joery comes out of the security world. He doesn’t know Django from the inside, but he knows a lot about it from the outside: he has tested a lot of Django sites.

Security is as strong as its weakest link. People are often the weakest link. Django doesn’t protect you if you, as a programmer, explicitly circumvent its security mechanisms.

Django actually protects you a lot!

A good thing is to look at the OWASP list of top 10 errors. (See also Florian Apolloner’s talk at ‘Django under the Hood’ 2015, for instance).

  • SQL injection. Protection is integrated in django. But watch out when doing raw sql queries, because they are really raw and unprotected. If you work through the model layer, you’re safe.
  • Authentication and sessions. Django’s SessionSecurityMiddleware is quite good. He has some comments on authentication, though, so he advises doing that one yourself. (Note: the local core committer looked quite suspicious as this was the first he heard about it. Apparently there are a number of CVEs that are unfixed in Django. Joery will share the numbers.)
  • XSS injection. User-fillable fields that aren’t escaped. Django by default... yes, protects you against this. Unless you use {% autoescape off %}, so don’t do that.
  • Direct object reference. He doesn’t agree with this point. So ignore it.
  • Security misconfiguration. Basically common sense. Don’t have DEBUG = True on in your production site. Django’s output is very detailed and thus very useful for anyone breaking into your site.
  • Sensitive data. Enable https. Django doesn’t enforce it. But use https. Extra incentive: google lowers the page ranking for non-https sites...
  • Access control. It is very very very hard to get into Django this way. He says django is one of the few systems to fix it this rigidly!
  • CSRF. Django protects you. Unless you explicitly use @csrf_exempt...
  • Known vulnerabilities. Update django! Because there have been fixes in django. Older versions are thus broken.
  • Insecure forwards/redirects. Once you’ve enabled django’s default middleware, you’re secure.
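The difference between pasting user input into SQL and letting the database driver pass it as a parameter is easy to demonstrate with python’s built-in `sqlite3` module. This is not Django code, and the table and data are made up; in Django the same rule applies to raw queries, where you pass the parameters separately (for example `Person.objects.raw("SELECT * FROM myapp_person WHERE name = %s", [name])`, with `Person` a hypothetical model) instead of formatting them into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("bob", 1)])

malicious = "nobody' OR '1'='1"

# Unsafe: the user input becomes part of the SQL and changes the query itself.
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the driver passes the value separately, so it is only ever treated as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(unsafe_rows)  # every user is returned
print(safe_rows)    # no user is literally called "nobody' OR '1'='1"
```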

So Django is quite secure, but you are not.

Look at django’s security documentation. And look at https://www.ponycheckup.com, with which you can check your site. The good thing is that it is simple. It only checks django itself, though.

With some other tools (like nessus) you have to watch out for false positives, though. If you don’t know how to interpret the results, you’ll be scared shitless.

A good one: Qualys SSLlabs https checker to get your ssl certificate completely right. (Note: see my blog post fixing ssl certificate chains for some extra background.)

“Owasp zap”: an open source tool that combines checkers and reduces the number of false positives.

The summary:

  • Good: django with middleware.
  • Good: django provides a good starting point.
  • Bad: experimenting. Be very sure you’re doing it right. Look at documentation.
  • Bad: do it yourself. Most of the times.

Django girls Amsterdam

On 25 June there’ll be a django girls workshop in Amsterdam. Everything’s set, but they do still need coaches.

Pygrunn keynote: the future of programming - Steven Pemberton

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

Steven Pemberton (https://en.wikipedia.org/wiki/Steven_Pemberton) is one of the developers of ABC, a predecessor of python.

He’s a researcher at CWI in Amsterdam. It was the first non-military internet site in Europe in 1988 when the whole of Europe was still connected to the USA with a 64kb link.

When designing ABC they were considered completely crazy because it was an interpreted language. Computers were slow at that time. But they knew about Moore’s law. Computers would become much faster.

At that time computers were very, very expensive. Programmers were basically free. Now it is the other way. Computers are basically free and programmers are very expensive. So, at that time, in the 1950s, programming languages were designed around the needs of the computer, not the programmer.

Moore’s law is still going strong. Despite many articles claiming its imminent demise. He heard the first one in 1977. Steven showed a graph of his own computers. It fits.

On modern laptops, the CPU is hardly doing anything most of the time. So why use programming languages optimized for giving the CPU a rest?

There’s another cost. The more lines a program has, the more bugs there are in it. But it is not a linear relationship. More like lines ^ 1.5. So a program with 10x more lines probably has 30x more bugs.
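A quick check of that arithmetic, assuming (as stated in the talk) that the number of bugs grows with lines^1.5:

```python
def relative_bugs(line_factor: float, exponent: float = 1.5) -> float:
    """How many times more bugs to expect if bugs grow with lines ** exponent."""
    return line_factor ** exponent


for factor in (2, 10, 100):
    print(f"{factor}x the lines -> about {relative_bugs(factor):.0f}x the bugs")
```

10 ** 1.5 is roughly 31.6, which is where the “30x more bugs” comes from; a program 100 times longer would have about 1000 times as many bugs.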

Steven thinks the future of programming is in declarative programming instead of in procedural programming. Declarative code describes what you want to achieve and not how you want to achieve it. It is much shorter.

Procedural code would have specified everything in detail. He showed a code example of 1000 lines. And a declarative one of 15 lines. Wow.

He also showed an example with XForms, which is declarative. Projects that use it regularly report a factor of 10 in savings compared to more traditional methods. He mentioned a couple of examples.

Steven doesn’t necessarily want us all to jump on Xforms. It might not fit with our usecases. But he does want us to understand that declarative languages are the way to go. The approach has been proven.

In response to a question he compared it to the difference between roman numerals and arabic numerals and the speed difference in using them.

(The sheets will be up on http://homepages.cwi.nl/~steven/Talks/2016/05-13-pygrunn/ later).

Pygrunn keynote: Morepath under the hood - Martijn Faassen

2016-05-13

Tags: python, pygrunn, zope, django

(One of my summaries of the one-day 2016 PyGrunn conference).

Martijn Faassen is well-known from lxml, zope, grok. Europython, Zope foundation. And he’s written Morepath, a python web framework.

Three subjects in this talk:

  • Morepath implementation details.
  • History of concepts in web frameworks
  • Creativity in software development.

Morepath implementation details. A framework with super powers (“it was the last to escape from the exploding planet Zope”)

Traversal. In the 1990s you’d have filesystem traversal: example.com/addresses/faassen would map to a file /webroot/addresses/faassen.

In zope2 (1998) you had traversal through an object tree: example.com/addresses/faassen became root['addresses']['faassen'] in python. The advantage is that it is all python. The drawback is that every object needs to know how to render itself for the web. It is an example of creativity: how do we map filesystem traversal to objects?

In zope3 (2001) the goal was the zope2 object traversal, but with objects that don’t need to know how to handle the web. A way of working called “component architecture” was invented to add traversal capabilities to existing objects. It works, but as a developer you need to do quite a lot of configuration and registration. Creativity: “separation of concerns” and “lookups in a registry”.

Pyramid sits somewhere in between, and has some creativity of its own.

Another option is routing. You map a url explicitly to a function: a @route('/addresses/{name}') decorator on a function (or a django urls.py). The creativity is that it is simple.

Both traversal and routing have their advantages. So Morepath has both of them. Simple routing to get to the content object and then traversal from there to the view.
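A rough plain-Python sketch of that combination, explicitly not Morepath’s actual API: a route pattern turns the url into a model object, and the view is then looked up by the object’s class plus a view name. `Address`, `ROUTES`, `VIEWS`, `view` and the “/+name” syntax for named views are all hypothetical choices for this illustration.

```python
import re


class Address:
    def __init__(self, name):
        self.name = name


# Routing: url pattern -> factory that produces the model object.
ROUTES = [(re.compile(r"^/addresses/(?P<name>[^/]+)$"), Address)]

# View registry: (model class, view name) -> view function.
VIEWS = {}


def view(model_class, name="default"):
    def register(func):
        VIEWS[(model_class, name)] = func
        return func
    return register


@view(Address)
def address_default(obj):
    return f"Address page for {obj.name}"


@view(Address, "edit")
def address_edit(obj):
    return f"Edit form for {obj.name}"


def resolve(path):
    # In this sketch, an optional trailing "/+viewname" selects a named view.
    path, _, view_name = path.partition("/+")
    for pattern, factory in ROUTES:
        match = pattern.match(path)
        if match:
            obj = factory(**match.groupdict())  # routing: url -> model object
            return VIEWS[(type(obj), view_name or "default")](obj)  # model -> view
    raise LookupError(path)


print(resolve("/addresses/faassen"))
print(resolve("/addresses/faassen/+edit"))
```

The model classes stay unaware of the web: the views are registered separately, keyed on the model class.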

The creativity here is “dialectic”. You have a “thesis” and an “antithesis” and end up with a “synthesis”. So a creative mix between two ideas that seem to be opposites.

Apart from traversal/routing, there’s also the registry. Zope’s registry (component architecture) is very complicated. He’s now got a replacement called “Reg” (http://reg.readthedocs.io/).

He ended up with this after creatively experimenting with it. Just experimenting, nothing serious. Rewriting everything from scratch.

(It turned out there already was something that worked a bit the same in the python standard library: @functools.singledispatch.)
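For reference, this is what the standard library’s `functools.singledispatch` looks like: one generic function whose implementation is chosen by the type of the first argument. The `Document` and `Image` classes and the `render` function are just examples.

```python
from functools import singledispatch


class Document:
    def __init__(self, title):
        self.title = title


class Image:
    def __init__(self, filename):
        self.filename = filename


@singledispatch
def render(obj):
    # Fallback when no more specific implementation is registered.
    return f"<unknown {type(obj).__name__}>"


@render.register
def _(obj: Document):  # dispatch type taken from the annotation (Python 3.7+)
    return f"<h1>{obj.title}</h1>"


@render.register
def _(obj: Image):
    return f'<img src="{obj.filename}">'


print(render(Document("Morepath")))
print(render(Image("logo.png")))
print(render(42))
```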

He later extended it from single dispatch to multiple dispatch. The creativity here was the freedom to completely change the implementation, as he was the only user of the library at that moment. Don’t be afraid to break stuff. Everything has been invented before (so do your research). Also creative: it is just a function.
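Multiple dispatch generalises this: the implementation is chosen by the types of several arguments. A minimal sketch of the idea, not Reg’s API: `dispatch2` is a hypothetical decorator that dispatches on the first two arguments and walks both classes’ method resolution orders, so subclasses also find registrations made for their parents.

```python
import itertools


def dispatch2(func):
    """Dispatch on the types of the first two arguments (sketch only)."""
    registry = {}

    def register(type_a, type_b):
        def decorator(impl):
            registry[(type_a, type_b)] = impl
            return impl
        return decorator

    def wrapper(a, b):
        # Try type combinations from most to least specific; the first
        # argument's specificity takes precedence over the second's.
        for key in itertools.product(type(a).__mro__, type(b).__mro__):
            if key in registry:
                return registry[key](a, b)
        return func(a, b)  # fallback implementation

    wrapper.register = register
    return wrapper


class Shape:
    pass


class Circle(Shape):
    pass


class Square(Shape):
    pass


@dispatch2
def intersect(a, b):
    return "generic shape test"


@intersect.register(Circle, Circle)
def _(a, b):
    return "compare radii"


@intersect.register(Circle, Square)
def _(a, b):
    return "circle/square test"


print(intersect(Circle(), Square()))
print(intersect(Square(), Circle()))  # nothing registered: falls back
```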

A recent spin-off: “dectate”. (http://dectate.readthedocs.io/). A decorator-based configuration system for frameworks :-) Including subclassing to override configuration.

Some creativity here: it is all just subclassing. And: split something off into a separate library to gain focus, testability and documentation.
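The “override configuration by subclassing” idea can be sketched with plain classes. This is not dectate’s API, just the underlying pattern with hypothetical names (`App`, `setting`): a decorator records configuration on a class, and a subclass starts from a copy of its parent’s configuration and can override parts of it without touching the parent.

```python
class App:
    config = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Copy the parent's configuration so changes stay local to the subclass.
        cls.config = dict(cls.config)

    @classmethod
    def setting(cls, name):
        def decorator(func):
            cls.config[name] = func
            return func
        return decorator


class BaseApp(App):
    pass


@BaseApp.setting("greeting")
def base_greeting():
    return "hello"


class CustomApp(BaseApp):
    pass


@CustomApp.setting("greeting")
def custom_greeting():
    return "good morning"


print(BaseApp.config["greeting"]())
print(CustomApp.config["greeting"]())
```

A limitation of this sketch: configuration added to `BaseApp` after `CustomApp` has been created does not reach `CustomApp`, something a real configuration framework has to handle.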

Pygrunn: from code to config and back again - Jasper Spaans

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

Jasper works at Fox IT, one of the programs he works on is DetACT, a fraud detection tool for online banking. The technical summary would be something like “spamassassin and wireshark for internet traffic”.

  • Wireshark-like: DetACT intercepts online bank traffic and feeds it to a rule engine that ought to detect fraud. The rule engine is the one that needs to be configured.
  • Spamassassin-like: rules with weights. If a transaction gets too many “points”, it is marked as suspect. Just like spam detection in emails.
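The spamassassin-like scoring can be sketched in a few lines. The rules, weights, threshold and transaction fields below are all made up for illustration; DetACT’s real rules were not shown in the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    weight: float
    matches: Callable[[dict], bool]


# Hypothetical rules and threshold.
RULES = [
    Rule("new_payee", 2.0, lambda tx: tx["payee_is_new"]),
    Rule("large_amount", 3.0, lambda tx: tx["amount"] > 5000),
    Rule("foreign_ip", 1.5, lambda tx: tx["ip_country"] != tx["account_country"]),
]
THRESHOLD = 4.0


def score(tx: dict) -> tuple[float, list[str]]:
    # Sum the weights of all rules that fire and report which ones did.
    hits = [rule for rule in RULES if rule.matches(tx)]
    return sum(rule.weight for rule in hits), [rule.name for rule in hits]


tx = {"payee_is_new": True, "amount": 7500,
      "ip_country": "NL", "account_country": "NL"}
total, reasons = score(tx)
print(total, reasons, "SUSPECT" if total >= THRESHOLD else "ok")
```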

In the beginning of the tool, the rules were in the code itself. But as more and more rules and exceptions got added, maintaining it became a lot of work. And deploying takes a while as you need code review, automatic acceptance systems, customer approval, etc.

From code to config: they rewrote the rule engine from scratch to work based on a configuration (even though Joel Spolsky says totally rewriting your code is the single worst mistake you can make). They went 2x over budget. That’s what you get when rewriting completely...

The initial test with hand-written json config files went OK, so they went to step two: make the configuration editable in a web interface. Including config syntax validation. Including mandatory runtime performance evaluation. The advantage: they could deploy new rules much faster than when the rules were inside the source code.

Then... they did a performance test at a customer.... It was 10x slower than the old code. They didn’t have enough hardware to run it. (It needs to run on real hardware instead of in the cloud as it is very very sensitive data).

They fired up the profiler and discovered that only 30% of the time was spent on the actual rules; the other 70% was bookkeeping and overhead.

In the end they had the idea to generate python code from the configuration. They tried it. The generated code is ugly, but it works and it is fast. A 3x improvement. Fine, but not a factor of 10, yet.
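Generating python code from the configuration could look roughly like this. The config format is hypothetical. The generated function is straight-line code with one `if` per rule, so the generic per-rule bookkeeping of an interpreter loop disappears.

```python
# Hypothetical rule configuration, as it might come out of the web interface.
CONFIG = [
    {"name": "large_amount", "field": "amount", "op": ">", "value": 5000, "weight": 3.0},
    {"name": "new_payee", "field": "payee_is_new", "op": "==", "value": True, "weight": 2.0},
]
ALLOWED_OPS = {">", "<", "==", "!="}


def generate_source(config):
    lines = ["def score(tx):", "    total = 0.0"]
    for rule in config:
        if rule["op"] not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator in rule {rule['name']}")
        # repr() turns the configured values into valid (and quoted) python literals.
        lines.append(f"    if tx[{rule['field']!r}] {rule['op']} {rule['value']!r}:")
        lines.append(f"        total += {rule['weight']!r}  # {rule['name']!r}")
    lines.append("    return total")
    return "\n".join(lines)


source = generate_source(CONFIG)
namespace = {}
exec(compile(source, "<generated rules>", "exec"), namespace)
score = namespace["score"]

print(source)  # ugly, but plain python
print(score({"amount": 7500, "payee_is_new": True}))
```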

They tried converting the config to AST (python’s Abstract Syntax Tree) instead of to actual python code. Every block was turned into an AST and then combined based on the config. This is then optimized (which you can do with an AST) before generating python code again.
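The AST variant builds the same kind of function as a syntax tree instead of as text. A minimal sketch with python’s `ast` module (python 3.9+), using a hypothetical rule config: every rule becomes an `if` node, and as one example of an optimisation that is easy to do on a tree, rules with weight 0 are dropped before compiling. Which optimisations Fox IT actually applied was not described.

```python
import ast

CONFIG = [
    {"field": "amount", "op": "gt", "value": 5000, "weight": 3.0},
    {"field": "payee_is_new", "op": "eq", "value": True, "weight": 2.0},
    {"field": "ip_country", "op": "eq", "value": "XX", "weight": 0.0},  # disabled rule
]
OPS = {"gt": ast.Gt, "lt": ast.Lt, "eq": ast.Eq, "ne": ast.NotEq}


def rule_to_ast(rule):
    # Builds: if tx[<field>] <op> <value>: total += <weight>
    test = ast.Compare(
        left=ast.Subscript(value=ast.Name("tx", ast.Load()),
                           slice=ast.Constant(rule["field"]), ctx=ast.Load()),
        ops=[OPS[rule["op"]]()],
        comparators=[ast.Constant(rule["value"])],
    )
    body = [ast.AugAssign(target=ast.Name("total", ast.Store()),
                          op=ast.Add(), value=ast.Constant(rule["weight"]))]
    return ast.If(test=test, body=body, orelse=[])


def build_scorer(config):
    # Optimisation on the tree: rules that can never change the score are dropped.
    checks = [rule_to_ast(rule) for rule in config if rule["weight"] != 0]
    func = ast.FunctionDef(
        name="score",
        args=ast.arguments(posonlyargs=[], args=[ast.arg("tx")], kwonlyargs=[],
                           kw_defaults=[], defaults=[]),
        body=[ast.Assign(targets=[ast.Name("total", ast.Store())],
                         value=ast.Constant(0.0)),
              *checks,
              ast.Return(ast.Name("total", ast.Load()))],
        decorator_list=[],
    )
    module = ast.fix_missing_locations(ast.Module(body=[func], type_ignores=[]))
    namespace = {}
    exec(compile(module, "<rules ast>", "exec"), namespace)
    return namespace["score"], module


score, tree = build_scorer(CONFIG)
print(ast.unparse(tree))  # the equivalent python source
print(score({"amount": 7500, "payee_is_new": False, "ip_country": "XX"}))
```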

This was fast enough!

Some lessons learned:

  • Joel Spolsky is right. You should not rewrite your software completely. If you do it, do it in very small chunks.
  • Write readable and correct code first. Then benchmark and profile.
  • Have someone on your team who knows about compiler construction if you want to solve these kinds of problems.

Pygrunn: simple cloud with TripleO quickstart - K Rain Leander

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

What is openstack? A “cloud operating system”. Openstack is an umbrella with a huge number of actual open source projects under it. The goal is a public and/or private cloud.

Just like you use “the internet” without concerning yourself with the actual hardware everything runs on, just in the same way you should be able to use a private/public cloud on any regular hardware.

What is RDO? Exactly the same as openstack, but using RPM packages. Really, it is exactly the same. So a way to get openstack running on a Red Hat enterprise basis.

There are lots of ways to get started. For RDO there are three oft-used ones:

  • TryStack for trying out a free instance. Not intended for production.

  • PackStack. Install openstack-packstack with “yum”. Then you run it on your own hardware.

  • TripleO (https://wiki.openstack.org/wiki/TripleO). It is basically “openstack on openstack”. You install an “undercloud” that you use to deploy/update/monitor/manage several “overclouds”. An overcloud is then the production openstack cloud.

    TripleO has a separate user interface that’s different from openstack’s own one. This is mostly done to prevent confusion.

    It is kind of heavy, though. The latest openstack release (mitaka) is resource-hungry and needs ideally 32GB memory. That’s just for the undercloud. If you strip it, you could get the requirement down to 16GB.

To help with setting up there’s now a TripleO quickstart shell script.

Pygrunn: Understanding PyPy and using it in production - Peter Odding/Bart Kroon

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

pypy is “the faster version of python”.

There are actually quite a lot of python implementations. cpython is the main one. There are also JIT compilers. Pypy is one of them, and by far the most mature. PyPy is a python implementation compliant with 2.7.10 and 3.2.5. And it is fast!

Some advantages of pypy:

  • Speed. There are a lot of automatic optimizations. It didn’t use to be fast, but for about five years now it has actually been faster than cpython! It has a “tracing JIT compiler”.
  • Memory usage is often lower.
  • Multi core programming. Some stackless features. Some experimental work has been started (“software transactional memory”) to get rid of the GIL, the infamous Global Interpreter Lock.

What does having a “tracing JIT compiler” mean? JIT means “Just In Time”. It runs as an interpreter, but it automatically identifies the “hot path” and optimizes that a lot by compiling it on the fly.

It is written in RPython, which is a statically typed subset of python which translates to C and is compiled to produce an interpreter. It provides a framework for writing interpreters. “PyPy” really means “Python written in Python”.

How to actually use it? Well, that’s easy:

$ pypy your_python_file.py

Unless you’re using C modules. Lots of python extension modules use C code that is compiled against CPython... There is a compatibility layer, but that covers only 40-60% of the cases. Ideally, all extension modules would use “cffi”, the C Foreign Function Interface, instead of “ctypes”. CFFI provides an interface to C that allows lots of optimizations, especially by pypy.

Peter and Bart work at Paylogic, a company that sells tickets for big events. So you have half a million people trying to get a ticket to a big event, opening multiple browsers to improve their chances. “You are getting DDOSed by your own customers”.

Whatever you do: you still have to handle 500000 pageviews in just a few seconds. The solution: a CDN for the HTML and only small JSON requests to the servers. Even then you still need a lot of servers to handle the JSON requests. State synchronisation was a problem, as in the end you still had one single server for that single task.

Their results after using pypy for that task:

  • An 8-fold improvement. Initially 4x, but pypy has been optimized since, so they got an extra 2x for free. So: upgrade regularly.
  • Real savings on hosting costs
  • The queue has been tested to work for at least two million visitors now.

Guido van Rossum supposedly says “if you want your code to run faster, you should probably just use PyPy” :-)

Note: slides are online

Pygrunn: django channels - Bram Noordzij/Bob Voorneveld

2016-05-13

Tags: python, pygrunn, django

(One of my summaries of the one-day 2016 PyGrunn conference).

Django channels is a project to make Django handle more than “only” plain http requests: websockets, http2, etc. Regular http is the normal request/response cycle. A websocket is a connection that stays open, for bi-directional communication. A channel, technically, is an ordered first-in first-out queue with message expiry and at-most-once delivery to only one listener at a time.

“Django channels” is an easy-to-understand extension of the Django view mechanism. Easy to integrate and deploy.

Installing django channels is quick. Just add the application to your INSTALLED_APPS list. That’s it. The complexity is in deploying it, as it is not a regular WSGI deployment. It uses a new standard called ASGI (a = asynchronous). Currently there’s a “worker service” called daphne (built in parallel with django channels) that implements ASGI.

You need to configure a “backing service”. Simplified: a queue.

They showed a demo where everybody in the room could move markers over a map. Worked like a charm.

How it works behind the scenes is that you define “channels”. Channels can receive messages and can send messages to other channels. So you can have a channel for reading incoming messages, do something with them and then send a reply back to some output channel. Everything is hooked up with “routes”.

You can add channels to groups so that you can, for instance, add the “output” channel of a new connection to the group you use for sending out status messages.
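The channel/route/group model can be illustrated with plain python queues. This is a conceptual sketch, not the django channels API: `Channel`, `Group`, `ROUTES`, `on_receive` and the channel names are hypothetical, and the real thing runs the queues in a separate backing service, with message expiry.

```python
from collections import deque


class Channel:
    """An ordered FIFO queue of messages (no expiry in this sketch)."""

    def __init__(self, name):
        self.name = name
        self.messages = deque()

    def send(self, message):
        self.messages.append(message)

    def receive(self):
        # Each message is handed out once, to one consumer.
        return self.messages.popleft() if self.messages else None


class Group:
    """A set of channels that each get a copy of a message."""

    def __init__(self):
        self.channels = []

    def add(self, channel):
        self.channels.append(channel)

    def send(self, message):
        for channel in self.channels:
            channel.send(message)


markers = Group()


def on_receive(message):
    # Consumer: handle an incoming message and broadcast the result to the group.
    markers.send({"marker": message["marker"], "pos": message["pos"]})


ROUTES = {"websocket.receive": on_receive}  # channel name -> consumer

incoming = Channel("websocket.receive")
alice_out, bob_out = Channel("reply.alice"), Channel("reply.bob")
markers.add(alice_out)  # each new connection's output channel joins the group
markers.add(bob_out)

incoming.send({"marker": "m1", "pos": (52.37, 4.89)})
ROUTES[incoming.name](incoming.receive())
print(bob_out.receive())
```

Moving one marker results in a single incoming message, while every connected output channel in the group receives the update, which is what the map demo needed.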

 

About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.

Weblog feeds

Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):