Reinout van Rees’ weblog

Simple qgis plugin repo

2016-08-29

Tags: python, nelenschuurmans

At my company, we sometimes build QGIS plugins. You can install those by hand by unzipping a zipfile in the correct directory, but there’s a nicer way.

You can add custom plugin “registries” to QGIS and QGIS will then treat your plugins just like regular ones. Here’s an example registry: https://plugins.lizard.net/ . Simple, right? Just a directory on our webserver. The URL you have to configure inside QGIS as registry is that of the plugins.xml file: https://plugins.lizard.net/plugins.xml .

The plugins.xml has a specific format:

<?xml version="1.0"?>
<plugins>

  <pyqgis_plugin name="GGMN lizard integration" version="1.6">
       <description>Download GGMN data from lizard, interpolate and add new points</description>
       <homepage>https://github.com/nens/ggmn-qgis</homepage>
       <qgis_minimum_version>2.8</qgis_minimum_version>
       <file_name>LizardDownloader.1.6.zip</file_name>
       <author_name>Reinout van Rees, Nelen &amp; Schuurmans</author_name>
       <download_url>https://plugins.lizard.net/LizardDownloader.1.6.zip</download_url>
   </pyqgis_plugin>

   .... more plugins ...

 </plugins>

As you see, the format is reasonably simple. There’s one directory on the webserver to which I “scp” the zipfiles with the plugins. I then run this script on that directory. The script extracts the (mandatory) metadata.txt from all the zipfiles and creates a plugins.xml file out of it.
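The real script is the one linked above; a minimal sketch of the idea could look like this (the helper names and the base URL are made up here, and QGIS metadata keys like qgisMinimumVersion come from the metadata.txt format):

```python
import configparser
import glob
import os
import zipfile
from xml.sax.saxutils import escape

BASE_URL = "https://plugins.lizard.net/"  # assumption: wherever the zips live


def read_metadata(zip_path):
    """Return the [general] section of the metadata.txt inside a plugin zip."""
    with zipfile.ZipFile(zip_path) as z:
        # metadata.txt sits inside the plugin's top-level directory.
        name = next(n for n in z.namelist() if n.endswith("metadata.txt"))
        parser = configparser.ConfigParser()
        parser.read_string(z.read(name).decode("utf-8"))
        return parser["general"]


def plugin_entry(zip_path):
    """One <pyqgis_plugin> element, filled from the zip's metadata."""
    meta = read_metadata(zip_path)
    filename = os.path.basename(zip_path)
    return (
        '  <pyqgis_plugin name="%s" version="%s">\n'
        "    <description>%s</description>\n"
        "    <homepage>%s</homepage>\n"
        "    <qgis_minimum_version>%s</qgis_minimum_version>\n"
        "    <file_name>%s</file_name>\n"
        "    <author_name>%s</author_name>\n"
        "    <download_url>%s</download_url>\n"
        "  </pyqgis_plugin>"
    ) % (
        escape(meta["name"]),
        escape(meta["version"]),
        escape(meta["description"]),
        escape(meta.get("homepage", "")),
        escape(meta["qgisMinimumVersion"]),
        filename,
        escape(meta["author"]),
        BASE_URL + filename,
    )


def generate(directory):
    """Combine all zips in the directory into one plugins.xml string."""
    zips = sorted(glob.glob(os.path.join(directory, "*.zip")))
    entries = "\n".join(plugin_entry(p) for p in zips)
    return '<?xml version="1.0"?>\n<plugins>\n%s\n</plugins>\n' % entries
```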

A gotcha regarding the zipfiles: they should contain the version number, but, in contrast to python packages, the version should be separated by a dot instead of a dash. So not myplugin-1.0.zip but myplugin.1.0.zip. It took me a while before I figured that one out!

About the metadata.txt: QGIS has a “plugin builder” plugin that generates a basic plugin structure for you. This structure includes a metadata.txt, so that’s easy.

(In case you want to use zest.releaser to release your plugin, you can extend zest.releaser to understand the metadata.txt format by adding https://github.com/nens/qgispluginreleaser . It also generates a correctly-named zipfile.)

collective.recipe.sphinxbuilder buildout recipe works on python 3 now

2016-07-11

Tags: python, buildout

I wanted to do a few small tweaks on collective.recipe.sphinxbuilder because it failed to install on python 3. I ended up as maintainer :-) It is now at https://github.com/reinout/collective.recipe.sphinxbuilder .

The only change needed was to tweak the way the readme was read in the setup.py and do a new release. Since then Thomas Khyn added windows support.

The documentation is now on readthedocs: http://collectiverecipesphinxbuilder.readthedocs.io/ . And the package is also tested on travis-ci.org.

Djangorecipe: easy test coverage reports

2016-06-30

Tags: django, buildout

Code coverage reports help you see which parts of your code are still untested. Yes, coverage doesn’t say anything about the quality of your tests, but at the least it tells you which parts of your code have absolutely the worst kind of tests: those that are absent :-)

Ned Batchelder’s coverage.py is the number one tool in python.

Note: I use buildout instead of pip to set up my projects. Reason: I can integrate extra automation that way. One of those extra automation steps is “djangorecipe” (a recipe is a buildout plugin). It installs django, amongst other things.

My previous setup

I use nose as a test runner; easier test discovery was the reason at the time. Python’s unittest2 is better at that now, so it might be time to revisit my choice. Especially as I just read that nose isn’t really being maintained anymore.

A nice thing about nose is that you can have plugins. coverage.py is a standard plugin. And with django-nose you can easily use nose as a test runner instead of the standard django one. So once I put a few nose-related settings in my setup.cfg, coverage was run automatically every time I ran my tests. Including a quick coverage report.

The setup.cfg:

[nosetests]
cover-package = my_program
with-coverage = 1
cover-erase = 1
cover-html = 1
cover-html-dir = htmlcov

Running it:

$ bin/test
......................................
Name                       Stmts   Miss  Cover   Missing
--------------------------------------------------------
djangorecipe                   0      0   100%
djangorecipe.binscripts       42     16    62%   25, 37-57
djangorecipe.boilerplate       1      0   100%
djangorecipe.recipe          115      0   100%
--------------------------------------------------------
TOTAL                        158     16    90%
----------------------------------------------------------------------
Ran 38 tests in 1.308s

OK

An important point here is that laziness is key. Just running bin/test (or an equivalent command) should run your tests and print out a quick coverage summary.

Note: bin/test is the normal test script if you set up a project with buildout, but it is effectively the same as python manage.py test.

New situation

“New” is relative here. Starting in django 1.7, you cannot use a custom test runner (like django-nose) anymore to automatically run your tests with coverage enabled. The new app initialization mechanism already loads your models.py, for instance, before the test runner gets called. So your models.py shows up as largely untested.

There’s a bug report for this for django-nose.

There are basically two solutions. Either you run coverage separately:

$ coverage run manage.py test
$ coverage report
$ coverage html

Ok. Right. You can guess what the prototypical programmer will do instead:

$ python manage.py test

That’s easier. But you don’t get coverage data.

The second alternative is to use the coverage API and modify your manage.py as shown in one of the answers. This is what I have now built into buildout’s djangorecipe (version 2.2.1).

bin/test now starts coverage recording before django gets called. It also prints out a report and exports XML results (for recording test results in Jenkins, for instance) and HTML results. The only thing you need to do is add coverage = true to your buildout config.

The details plus setup examples are in the 2.2.1 documentation.

(An advantage: I can now more easily move from nose to another test runner, if needed).

Not using buildout?

Most people use pip. Those using buildout for django will, I think, really appreciate this new functionality.

Using pip? Well, the basic idea still stands. You can use the call-coverage-API-in-manage.py approach just fine in your manage.py manually. Make it easy for yourself and your colleagues to get automatic coverage summaries!
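The approach boils down to a manage.py that starts coverage before django gets imported. A sketch, with the project and settings names as placeholders and error handling omitted:

```python
#!/usr/bin/env python
"""manage.py that records coverage when running tests (a sketch)."""
import os
import sys


def main(argv):
    # Assumption: your settings module; adjust to your project.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
    cov = None
    if "test" in argv:
        # Start recording *before* django is imported: app initialization
        # imports your models.py right away, and everything imported before
        # cov.start() would otherwise show up as untested.
        import coverage
        cov = coverage.Coverage(source=["myproject"])
        cov.start()
    from django.core.management import execute_from_command_line
    execute_from_command_line(argv)
    if cov is not None:
        cov.stop()
        cov.save()
        cov.report()  # the quick summary in the terminal
        cov.html_report(directory="htmlcov")
        cov.xml_report(outfile="coverage.xml")  # for Jenkins, for instance


# In an actual manage.py you would end with:
#     if __name__ == "__main__":
#         main(sys.argv)
```

This is essentially what djangorecipe’s bin/test wires up for you.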

A high availability Django setup on the cheap - Roland van Laar

2016-06-22

Tags: python, pun, django

(One of the talks at the 22 June 2016 Amsterdam Python meetup)

Roland built an educational website that needed to be highly available on a tight budget. He demoed the website: a site for the teacher on his own laptop and a separate page on the digiboard for the class. The teacher steers the digiboard from his laptop (or an iPad or phone).

As it is used in classrooms, it needs to be really, really available. As a teacher, you don’t want to have to change your lesson plan at 8:30. The customer had three goals:

  • Not expensive.
  • Always up.
  • Reliable.

He had some technical goals of his own:

  • Buildable.
  • Functional.
  • Maintainable.

Always up? Django? You have the following challenges, apart from having a bunch of webservers.

  • Media files. Files uploaded on one server need to be visible on others.
  • Websockets.
  • Database.
  • Sessions.

The setup he chose:

  • Front end (html, javascript, images): cloudflare as CDN (content delivery network). The front end is a single page jquery app. It chooses a random API host for ajax requests.

    It changes API hosts when the API is not responding. But.... when is an API not responding? Some schools have really bad internet, so 10 seconds for a request might be “normal”.

    Don’t make a “ping pong” application that retries all the time. Try every server and then fail.

  • Some Django API servers. The actual django project was easy. Simple models, a bit of djangorestframework. As an extra he used some new postgres features.

  • Two SQL servers in BDR (bi-directional replication) mode. “Postgres async multi master”. It is awesome! It just works! Even sessions are replicated faultlessly.

    Things to watch out for: create a separate replication user on both ends. Also watch out with sequences (auto-increment fields). For django it was easy to get working by configuring the database with “USING BDR” for such IDs. Creating such objects takes a little bit longer. Alternatively you can use UUIDs.

    Backups: oopsie. When postgres goes down, you normally restart it and it rebuilds itself. But in a BDR setup, the sequences don’t work right then. The standard tools don’t work, he had to write a custom script.

    Another drawback: for updating your tables, you need a lock on all database nodes. This means you have downtime. No problem, he’ll just do it early in the morning on a weekend.

  • He uses csync2 for syncing uploaded files between the hosts. Simply a cronjob on all servers. This is good enough as the updates only really happen in the summer; during the school year nothing changes.

  • Websockets. He uses Tornado plus javascript code for reconnecting websockets. The initial connection for the teacher to connect his laptop with the digiboard is via a short 6-digit number. Internally, a UUID is generated. The UUID is stored in local storage, so reloading the page or restarting a laptop Just Works.

The One Time We Were Down: they once switched email providers because the original one got much more expensive. But the new provider wasn’t as good: suddenly calls took more than 10 seconds and clients started to fail. It wasn’t that critical, as it happened after school time when only one teacher wanted to reset his password. So it was easy to fix.

What are distributed systems? - Quazi Nafiul Islam

2016-06-22

Tags: python, pun

(One of the talks at the 22 June 2016 Amsterdam Python meetup)

Distributed systems?

  • Distributed computation (like hadoop, mapreduce).
  • Distributed storage (like cassandra, riak).
  • Distributed messaging (like kafka).

You need distributed systems if you want to do more than is normally possible.

With many systems you need things like synchronisation (for instance NTP, network time protocol).

A distributed system is a bunch of computers working together. Such systems have challenges and limitations. Take for instance the CAP theorem for distributed storage systems. You can’t have all three of the following at the same time:

  • Consistency.
  • Availability.
  • Partition tolerance.

You can for instance value availability over consistency. You give the answer as soon as possible, even if you’re not completely sure you have the full answer (as other nodes might have extra/newer information).

Python for OPS and platform teams - Pavel Chunyayev

2016-06-22

Tags: python, pun

(One of the talks at the 22 June 2016 Amsterdam Python meetup)

The sub-title of his talk is supporting development teams in their journey to Continuous Delivery. Pavel’s job description is continuous delivery architect. He loves biking so he just had to move to Amsterdam :-)

What is continuous delivery? Often you’ll hear something like “safely, rapidly and predictably deliver new features to production”. But that’s a little bit restricted. He likes to start as early as the inception and planning stage and only end in the actual operation stage. So: the whole process. You don’t want to continuously deliver bad features that won’t be used. You want to include the inception/planning stage to make sure the right features are implemented.

A core idea is to keep the product releasable: build quality in. That is the core that we ought to remember. Quality is an integral part of the product, it is not something you can tack on later.

So... quality in the entire build/test/release cycle.

  • Build. All the checks you can do. Linting, static code analysis, unit tests.
  • Test. Contract tests, end-to-end testsuites, browser tests, scaling tests.
  • Release. Verify the deploy.

Especially in the “test” phase, you need specific clean test environments. Provision infrastructure, run the tests, dispose of the infrastructure. Repeatability is key.

“Platform as a service”: the real name of every operations team nowadays.

The number of jobs that include configuring servers manually is quickly declining. You need to be a programmer now! You need to enable self-service for the teams you work with.

So: python for system engineers. Python is the best language for this.

  • It is the most popular programming language for devops (apart from bash).
  • It is easy to learn (but hard to master).
  • It is more powerful than bash.
  • You can create and distribute real helper applications.
  • Devops means you’re a developer now!

What you need to do: create a provisioning service. A self-service. The developer can just click a button and they have a server! Amazon’s AWS was born this way. But these are generic servers. Your company needs custom setups. This is where you come in.

  • Services to manage environment lifecycle.
  • The same way for everyone.
  • Manipulation using API.
  • You can mix and match infrastructure providers.

They built something with python + flask + ansible. Flask receives an API call, analyzes it, creates an ansible object and fires off ansible. Ready!
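I didn’t see their code, but the core of such a service is small. A sketch of the receive-analyze-fire-ansible flow, using only the standard library instead of flask, with an invented payload format:

```python
import json
import subprocess


def provision(request_body):
    """Translate an API payload into an ansible-playbook command line."""
    payload = json.loads(request_body)
    # Hypothetical payload: {"playbook": "webserver.yml", "host": "test42"}
    return [
        "ansible-playbook",
        payload["playbook"],
        "--limit", payload["host"],
    ]


def handle(request_body, runner=subprocess.run):
    """Handle one provisioning request; 'runner' is injectable for testing."""
    cmd = provision(request_body)
    # In a real service this would run asynchronously and report status.
    result = runner(cmd, capture_output=True)
    return {"command": cmd, "returncode": result.returncode}
```

In the real setup a flask route would parse the request and call something like handle(); the idea is just that the API call is data, and ansible does the actual work.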

Also something to look at: Jenkins pipelines. All the jenkins tasks for one job inside one versionable file. You can apparently even write those in python nowadays (instead of the default “groovy” DSL).

Some further thoughts:

  • Open all the helper tools to the whole organization.
  • Distribute it as docker containers. Both the services and command line tools!
  • As sysadmin, act as a developer. Use the same tools!

Django meetup Amsterdam 18 May 2016

2016-05-23

Tags: django

Summary of the Django meetup organized at crunchr in Amsterdam, the Netherlands.

(I gave a talk on the django admin, which I of course don’t have a summary of yet, though my brother made a summary of an almost-identical talk I gave the Friday before.)

Reducing boilerplate with class-based views - Priy Werry

A view can be more than just a function. Views can also be class based; django has quite a lot of them, for example the TemplateView that is very quick for rendering a template. Boilerplate reduction.

Django REST framework is a good example of class-based view usage. It really helps you reduce the amount of boring boilerplate and concentrate on your actual code.

Examples of possible boilerplate code:

  • Parameter validation.
  • Pagination.
  • Ordering.
  • Serialisation.

They wanted to handle this a bit like django’s middleware mechanism, but then view-specific. So they wrote a base class that performed most of the boilerplate steps, which meant the actual views could stay fairly simple.

It also helps with unit testing: normally you’d have to test all the corner cases in all your views, now you only have to test your base class for that.

Custom base classes also often means you have methods that you might re-define in subclasses to get extra functionality. In those cases make sure you call the parent class’s original method (when needed).

Users of your views (especially when it is an API) get the advantage of having a more consistent API. It is automatically closer to the specification. The API should also be easier to read.
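As a framework-free sketch of that base-class idea (the parameter names and validation rules are invented; in django these would be View subclasses):

```python
class BaseListView:
    """Base class doing the boilerplate: validation, ordering, pagination."""
    page_size = 10

    def get(self, params):
        page = self.validate_page(params.get("page", "1"))
        # Ordering happens here once, not in every concrete view.
        items = sorted(self.get_items(), key=lambda item: item["id"])
        start = (page - 1) * self.page_size
        return items[start:start + self.page_size]

    def validate_page(self, raw):
        try:
            page = int(raw)
        except ValueError:
            raise ValueError("page must be an integer")
        if page < 1:
            raise ValueError("page must be >= 1")
        return page

    def get_items(self):
        # Concrete views override this and nothing else.
        raise NotImplementedError


class AddressView(BaseListView):
    def get_items(self):
        return [{"id": 3}, {"id": 1}, {"id": 2}]
```

The unit-testing point follows directly: the corner cases (bad page numbers, ordering, pagination bounds) get tested once on BaseListView instead of in every view.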

Meetups on steroids - Bob Aalsma

“Can I get a consultant for this specific subject?” Some managers find it difficult to justify this financially. A special course with a deep-dive is easier to approve.

He would like to be a kind of a broker between students and teachers to arrange it: “Meetups on steroids: pick the subject - any subject; pick the date - any date; pick the group - any group”

Security in Django - Joery van der Zwart

Joery comes out of the security world. He doesn’t know much about the inside of django, but a lot about the outside. He’s tested a lot of django sites.

Security is as strong as its weakest link. People are often the weakest link. Django doesn’t protect you if you explicitly circumvent its security mechanisms as a programmer.

Django actually protects you a lot!

A good thing is to look at the OWASP list of top 10 errors. (See also Florian Apolloner’s talk at ‘Django under the Hood’ 2015, for instance).

  • SQL injection. Protection is integrated in django. But watch out when doing raw sql queries, because they are really raw and unprotected. If you work through the model layer, you’re safe.
  • Authentication and sessions. Django’s SessionSecurityMiddleware is quite good. He has some comments on authentication, though, so he advises doing that one yourself. (Note: the local core committer looked quite suspicious as this was the first he heard about it. Apparently there are a number of CVEs that are unfixed in Django. Joery will share the numbers.)
  • XSS injection. User-fillable fields that aren’t escaped. Django by default... yes, protects you against this. Unless you use {% autoescape off %}, so don’t do that.
  • Direct object reference. He doesn’t agree with this point. So ignore it.
  • Security misconfiguration. Basically common sense. Don’t have DEBUG = True on in your production site. Django’s output is very detailed and thus very useful for anyone breaking into your site.
  • Sensitive data. Enable https. Django doesn’t enforce it. But use https. Extra incentive: google lowers the page ranking for non-https sites...
  • Access control. It is very very very hard to get into Django this way. He says django is one of the few systems to fix this so rigidly!
  • CSRF. Django protects you. Unless you explicitly use @csrf_exempt...
  • Known vulnerabilities. Update django! Because there have been fixes in django. Older versions are thus broken.
  • Insecure forwards/redirects. Once you’ve enabled django’s default middleware, you’re secure.
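The raw-query point from the first bullet, illustrated with the stdlib sqlite3 module (django’s cursor.execute() and Manager.raw() follow the same idea, with %s placeholders): always pass user input as parameters, never interpolate it into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")


def find_user(name):
    # Safe: the database driver escapes the parameter for you.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

# Unsafe would be: conn.execute("... WHERE name = '%s'" % name),
# where a "name" like ' OR '1'='1 suddenly matches every row.
```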

So Django is quite secure, but you are not.

Look at django’s security documentation. And look at https://www.ponycheckup.com: you can check your site with it. The good thing is that it is simple. It only checks django itself, though.

With some other tools (like nessus) you have to watch out for false positives. If you don’t know how to interpret the results, you’ll be scared shitless.

A good one: Qualys SSLlabs https checker to get your ssl certificate completely right. (Note: see my blog post fixing ssl certificate chains for some extra background.)

“Owasp zap”: an open source tool that combines checkers and reduces the number of false positives.

The summary:

  • Good: django with middleware.
  • Good: django provides a good starting point.
  • Bad: experimenting. Be very sure you’re doing it right. Look at documentation.
  • Bad: do it yourself. Most of the times.

Django girls Amsterdam

On 25 June there’ll be a django girls workshop in Amsterdam. Everything’s set, but they do still need coaches.

Pygrunn keynote: the future of programming - Steven Pemberton

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

Steven Pemberton (https://en.wikipedia.org/wiki/Steven_Pemberton) is one of the developers of ABC, a predecessor of python.

He’s a researcher at CWI in Amsterdam. CWI was the first non-military internet site in Europe, back in 1988 when the whole of Europe was still connected to the USA with a 64kb link.

When designing ABC they were considered completely crazy because it was an interpreted language. Computers were slow at that time. But they knew about Moore’s law. Computers would become much faster.

At that time computers were very, very expensive. Programmers were basically free. Now it is the other way. Computers are basically free and programmers are very expensive. So, at that time, in the 1950s, programming languages were designed around the needs of the computer, not the programmer.

Moore’s law is still going strong. Despite many articles claiming its imminent demise. He heard the first one in 1977. Steven showed a graph of his own computers. It fits.

On modern laptops, the CPU is hardly doing anything most of the time. So why use programming languages optimized for giving the CPU a rest?

There’s another cost. The more lines a program has, the more bugs there are in it. But it is not a linear relationship. More like lines ^ 1.5. So a program with 10x more lines probably has 30x more bugs.

Steven thinks the future of programming is in declarative programming instead of in procedural programming. Declarative code describes what you want to achieve and not how you want to achieve it. It is much shorter.

Procedural code would have specified everything in detail. He showed a code example of 1000 lines. And a declarative one of 15 lines. Wow.

He also showed an example with xforms, which is declarative. Projects that use it regularly report a factor of 10 in savings compared to more traditional methods. He mentioned a couple of examples.

Steven doesn’t necessarily want us all to jump on Xforms. It might not fit with our usecases. But he does want us to understand that declarative languages are the way to go. The approach has been proven.

In response to a question he compared it to the difference between roman numerals and arabic numerals and the speed difference in using them.

(The sheets will be up on http://homepages.cwi.nl/~steven/Talks/2016/05-13-pygrunn/ later).

Pygrunn keynote: Morepath under the hood - Martijn Faassen

2016-05-13

Tags: python, pygrunn, zope, django

(One of my summaries of the one-day 2016 PyGrunn conference).

Martijn Faassen is well-known from lxml, zope, grok. Europython, Zope foundation. And he’s written Morepath, a python web framework.

Three subjects in this talk:

  • Morepath implementation details.
  • History of concepts in web frameworks
  • Creativity in software development.

Morepath implementation details. A framework with super powers (“it was the last to escape from the exploding planet Zope”)

Traversal. In the 1990s you’d have filesystem traversal: example.com/addresses/faassen would map to a file /webroot/addresses/faassen.

In zope2 (1998) you had “traversal through an object tree”. So root['addresses']['faassen'] in python. The advantage is that it is all python. The drawback is that every object needs to know how to render itself for the web. It is an example of creativity: how do we map filesystem traversal to objects?

In zope3 (2001) the goal was the zope2 object traversal, but with objects that don’t need to know how to handle the web. A way of working called “component architecture” was invented to add traversal capabilities to existing objects. It works, but as a developer you need to do quite some configuration and registration. Creativity: “separation of concerns” and “lookups in a registry”.

Pyramid sits somewhere in between. And has some creativity on its own.

Another option is routing. You map a url explicitly to a function: a @route('/addresses/{name}') decorator on a function (or a django urls.py). The creativity is that it is simple.

Both traversal and routing have their advantages. So Morepath has both of them. Simple routing to get to the content object and then traversal from there to the view.

The creativity here is “dialectic”. You have a “thesis” and an “antithesis” and end up with a “synthesis”. So a creative mix between two ideas that seem to be opposites.

Apart from traversal/routing, there’s also the registry. Zope’s registry (component architecture) is very complicated. He’s now got a replacement called “Reg” (http://reg.readthedocs.io/).

He ended up with this after creatively experimenting with it. Just experimenting, nothing serious. Rewriting everything from scratch.

(It turned out there already was something that worked a bit the same in the python standard library: @functools.singledispatch.)

He later extended it from single dispatch to multiple dispatch. The creativity here was the freedom to completely change the implementation, as he was the only user of the library at that moment. Don’t be afraid to break stuff. Everything has been invented before (so: research). Also creative: it is just a function.
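For reference, the stdlib’s @functools.singledispatch that he mentioned: the implementation is chosen based on the type of the first argument, and it is indeed just a function.

```python
from functools import singledispatch


@singledispatch
def render(obj):
    # Fallback for types without a registered implementation.
    return "unknown: %r" % (obj,)


@render.register(int)
def _(obj):
    return "int: %d" % obj


@render.register(list)
def _(obj):
    # Dispatch recurses naturally over the items.
    return ", ".join(render(item) for item in obj)
```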

A recent spin-off: “dectate”. (http://dectate.readthedocs.io/). A decorator-based configuration system for frameworks :-) Including subclassing to override configuration.

Some creativity here: it is all just subclassing. And split something off into a library for focus, testing and documentation. Split something off to gain these advantages.

Pygrunn: from code to config and back again - Jasper Spaans

2016-05-13

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

Jasper works at Fox IT, one of the programs he works on is DetACT, a fraud detection tool for online banking. The technical summary would be something like “spamassassin and wireshark for internet traffic”.

  • Wireshark-like: DetACT intercepts online bank traffic and feeds it to a rule engine that ought to detect fraud. The rule engine is the one that needs to be configured.
  • Spamassassin-like: rules with weights. If a transaction gets too many “points”, it is marked as suspect. Just like spam detection in emails.

In the beginning of the tool, the rules were in the code itself. But as more and more rules and exceptions got added, maintaining it became a lot of work. And deploying takes a while as you need code review, automatic acceptance systems, customer approval, etc.

From code to config: they rewrote the rule engine from scratch to work based on configuration. (Even though Joel Spolsky says totally rewriting your code is the single worst mistake you can make.) They went 2x over budget. That’s what you get when rewriting completely....

The initial test with hand-written json config files went OK, so they went to step two: make the configuration editable in a web interface. Including config syntax validation. Including mandatory runtime performance evaluation. The advantage: they could deploy new rules much faster than when the rules were inside the source code.

Then... they did a performance test at a customer.... It was 10x slower than the old code. They didn’t have enough hardware to run it. (It needs to run on real hardware instead of in the cloud as it is very very sensitive data).

They fired up the profiler and discovered that only 30% of the time was spent on the actual rules; the other 70% was bookkeeping and overhead.

In the end they had the idea to generate python code from the configuration. They tried it. The generated code is ugly, but it works and it is fast. A 3x improvement. Fine, but not a factor of 10, yet.

They tried converting the config to AST (python’s Abstract Syntax Tree) instead of to actual python code. Every block was turned into an AST and then combined based on the config. This is then optimized (which you can do with an AST) before generating python code again.

This was fast enough!
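A miniature, made-up version of that pipeline: generate python source from the config, parse it into an AST (the stage where their optimizations would happen), and compile it into one fast scoring function. The (field, weight) rule format here is invented for illustration.

```python
import ast


def compile_rules(rules):
    """Build score(record) from rules like [("amount", 2), ("risk", 5)]."""
    terms = ["%r * record[%r]" % (weight, field) for field, weight in rules]
    source = "def score(record):\n    return " + (" + ".join(terms) or "0")
    # Parsing to an AST gives a hook to transform/optimize before compiling.
    tree = ast.parse(source)
    namespace = {}
    exec(compile(tree, "<rules>", "exec"), namespace)
    return namespace["score"]


score = compile_rules([("amount", 2), ("risk", 5)])
```

The resulting function is plain compiled python, so all the per-rule config bookkeeping is paid once, at compile time, instead of on every transaction.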

Some lessons learned:

  • Joel Spolsky is right. You should not rewrite your software completely. If you do it, do it in very small chunks.
  • Write readable and correct code first. Then benchmark and profile.
  • Have someone on your team who knows about compiler construction if you want to solve these kinds of problems.
 