Reinout van Rees’ weblog

Summary of my “developer laptop automation” talk

2014-11-21

Tags: python, django

Last week I gave a talk at a python meetup in Eindhoven (NL). I summarized Therry van Neerven’s python desktop application development talk, but I didn’t write a summary of my own “developer laptop automation” talk.

Turns out Therry returned the favour and made a summary of my talk. A good one!

Django under the hood: django migrations - Andrew Godwin

2014-11-14

Tags: django, djangocon

(One of the summaries of a talk at the 2014 django under the hood conference).

Andrew Godwin wrote south, the number one migrations framework for django. It is superseded by django’s new built-in migrations, also written mostly by Andrew, hurray!

The original idea was to have a schema backend and hooks in the django ORM. The actual migration code would live outside of django in south2. In the end, everything is now in django. The original distinction between “schema backend stuff” and “the actual migrations” is still there in the code, however.

The schema backend is relatively simple and straightforward; the migration part is hard and hairy. The migration part contains: operations, loader/graph, executor, autodetector, optimiser, state. He’ll talk about some of them here.

What about the old syncdb? It is a one-shot thing: you add tables and then you add the foreign keys. When migrating, you have dependencies. You cannot add foreign keys to tables you haven’t added yet. There is automatic dependency-detecting code, now, but that was added quite at the last moment in the 1.7 beta 2...

Basic dependencies means the obvious stuff. Some examples:

  • Like creating a model before adding a foreign key to it. Most databases get fussy if you try to add the foreign key too early.
  • Create the model before creating the fields. Sounds simple, but you need to have these basics in place first in the dependency graph before you can get on to the hard cases.

Now on to the more creative dependencies.

  • For many-to-many fields you need both target models first before you add the actual M2M model that points at the targets.
  • Multi table inheritance? Create the MTI parent before the child.
  • “Unique together” constraints need to be done after adding the fields. Same for “index together”.
  • “Order with respect to” is a rarely used feature that adds an extra field with an ordering based on a foreign key field. He started despairing when he discovered this feature.
  • Proxy models. Weird things need to happen when you actually turn proxies into real models and want to do that in a migration. It is a one-line change for a developer, but it makes for nightmares in the migration code. “You have to create the model with the same name before you can delete the model with the same name”. Yes, that’s not a typo.
  • Swappable models! Please take them away! Swappable models? For instance the User model that you can replace with a different custom model. Suddenly a migration that you already applied might need to point at a different model. Rolling back the migrations is no option, as that leads to data loss. It works fine if you do it at the start of the project.

He used a different mindset when developing django’s migrations as opposed to how he developed South. South depended on people reading the documentation. Which they often don’t do. So they could shoot themselves in the foot quite well. Instead, django’s migrations are much more bulletproof, so there is much less need for reading the documentation in detail.

There’s a main loop in the migrations code that tries to find dependencies, shifts operations to satisfy the dependency, checks if everything is fine, and loops again and again until it is right.

The way it works is by chopping all operations into tiny dependencies. Every individual field that has to be created is turned into a tiny dependency step. After the list of steps is sorted (via the dependency-resolving loop) into the correct list of steps, an optimiser goes through the list and optimises it. If a model gets created and deleted, nothing needs to be done, for instance.
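
The dependency-resolving loop described above can be pictured as a small topological sort. This is only an illustrative sketch and not Django's actual code: the step names and the order_steps helper are made up for this example.

```python
def order_steps(steps, depends_on):
    """Repeatedly pick steps whose dependencies are already satisfied."""
    ordered = []
    remaining = list(steps)
    while remaining:
        ready = [s for s in remaining
                 if all(dep in ordered for dep in depends_on.get(s, ()))]
        if not ready:
            raise ValueError("circular dependency among: %r" % remaining)
        # Take the first ready step, then loop again and re-check everything.
        ordered.append(ready[0])
        remaining.remove(ready[0])
    return ordered


# Hypothetical tiny steps, deliberately listed in the wrong order.
steps = ["add Book.author FK", "create Book", "create Author"]
depends_on = {
    "add Book.author FK": ["create Book", "create Author"],
}
print(order_steps(steps, depends_on))
```

Both models end up before the foreign key that points from one to the other, which is the "basic dependency" case from the list earlier.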

This kind of reducing could be dangerous. So there’s another loop that checks which reductions/simplifications are possible. Whether there are conflicts. It is better to have no optimisation than to have a wrong optimisation.
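
A conservative reduction could look roughly like the sketch below. It is an assumption-laden toy, not the real optimiser: operations are plain (action, model) tuples and reduce_operations is a made-up helper. It only cancels a create/delete pair when nothing touching another model sits in between.

```python
def reduce_operations(operations):
    """Drop create...delete runs for a model, but only when clearly safe."""
    result = list(operations)
    changed = True
    while changed:
        changed = False
        for i, (action, model) in enumerate(result):
            if action != "create":
                continue
            for j in range(i + 1, len(result)):
                other_action, other_model = result[j]
                if other_model != model:
                    # Another model is interleaved: be conservative, give up.
                    break
                if other_action == "delete":
                    # Everything from the create up to the delete cancels out.
                    del result[i:j + 1]
                    changed = True
                    break
            if changed:
                break  # the list changed, so start scanning from the top
    return result


safe = [("create", "Tmp"), ("add_field", "Tmp"), ("delete", "Tmp"),
        ("create", "Book")]
risky = [("create", "Tmp"), ("create", "Book"), ("delete", "Tmp")]
print(reduce_operations(safe))
print(reduce_operations(risky))
```

In the second list Book might have a foreign key to Tmp, so the sketch leaves it alone: no optimisation rather than a wrong one.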

Reduction is applied after various stages: after the automatically detected dependency code. After applying the manual dependencies. And after squashing.

Squashing: it makes your history a bit shorter. It squashes migrations into a new starting point. This is especially handy when you’re a third party app developer.

The final part of the puzzle is the graph. It builds a directed graph of all basic migrations in memory. It needs to read all the models on disk for that. It also looks in the database. There’s a table in there that marks which migrations (or rather: nodes in the migration graph) have been applied.

A squashed migration lists the graph nodes that it replaces. A squash can only be applied if all the replaced nodes have the same state. They either all are unapplied: then the squash is applied. If they’ve all been applied, the squash can be considered as applied.
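
The rule for squashed migrations fits in a few lines. The sketch below is a simplification with made-up names (squash_status, the migration names), not the loader's real code.

```python
def squash_status(replaced_nodes, applied):
    """Decide how a squashed migration relates to the nodes it replaces."""
    states = {node in applied for node in replaced_nodes}
    if states == {True}:
        return "considered applied"
    if states == {False}:
        return "apply the squash"
    # Mixed state: some replaced nodes are applied, some are not.
    return "cannot use the squash"


replaced = ["0001_initial", "0002_add_author"]
print(squash_status(replaced, applied=set()))
print(squash_status(replaced, applied={"0001_initial", "0002_add_author"}))
print(squash_status(replaced, applied={"0001_initial"}))
```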

There’s room for improvement!

  • The autodetector is slow.
  • The optimizer is not great.
  • Graph state building is inefficient. Very inefficient. It might take 30 seconds. Building the state itself isn’t that slow, but it simply happens thousands of times.

It is mostly a case of un-optimized code. Big, pretty dumb, loops. So: everyone’s invited to help out, for instance at the sprint.

If you want to look at the code, here are some pointers:

  • django/db/migrations/autodetector.py, start at _detect_changes()
  • django/db/migrations/optimizer.py, start at reduce()
  • django/db/migrations/graph.py
  • django/db/migrations/loader.py

His slides are at https://speakerdeck.com/andrewgodwin/migrations-under-the-hood

Come work for us!

Django under the hood: internationalisation - Jannis Leidel

2014-11-14

Tags: django, djangocon

(One of the summaries of a talk at the 2014 django under the hood conference).

Jannis Leidel tells us about django internationalisation (or should that be internationaliZation? Well, Jannis confessed to changing it to an S on purpose :-) ) Jannis used to maintain django’s translations a few years ago. Now he works on the mozilla developer network website, which has translations into more than 30 languages.

He showed a picture of the Rosetta stone. That stone turned into a tool to transmit the previously unreadable Egyptian hieroglyphs to our times. It was the vessel that brought lots of Egyptian culture to the present time.

Isn’t what we do with our Django websites something that similarly can transfer knowledge and culture? But there is a problem...

56% of the web is English, but only 5% of the world population is a native English speaker. Only 20% of the world population has a basic knowledge of English!

Another example? 0.8% of the web is Arabic, while 4% of the world population is a native speaker. 0.1% of the web is Hindi, with 5% native speakers.

Multilingual and multicultural... There are huge differences. The future is a global web, with lots of fresh new internet users coming online in the coming years. Users that don’t speak English and largely aren’t “western”. How do we get to a global web? There are three long words that are relevant:

  • Internationalization, “i18n”. The preparation phase of making your software at least translatable. Django has tools for this, for instance {% blocktrans %} in your templates. And gettext() in your python code.

  • Translation, “t9n”. Actually translating the marked-as-translatable strings from the i18n phase. Additionally you can translate your content. A different problem altogether is translating your project’s documentation.

    The files used for translation are normally the GNU gettext .po files. A technical format.

  • Localization (l10n). Different representations for numbers or dates and so. “5 mei 2014” versus “May 5th, 2014”.

    In django, everything is done automatically if you have USE_L10N = True in your settings file.

In django, look at django.utils.translation. It is an old part of the code. There wasn’t much web tech experience at that time. It contains a “thread local” that accesses the request, for instance. Various parts really should be replaced.

How does django detect the current language?

  • URL path.
  • Session data from a previous visit.
  • A cookie.
  • The Accept-Language request header.
  • Django’s LANGUAGE_CODE setting.
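
As a rough sketch of such a detection chain, assuming the sources are tried from top to bottom: the request here is a plain dictionary with made-up keys, not a Django request object, and detect_language is a hypothetical helper.

```python
def detect_language(request, supported, default="en"):
    """Return the first supported language found, trying sources in order."""
    candidates = [
        request.get("url_prefix"),        # e.g. "/nl/about/" would give "nl"
        request.get("session_language"),
        request.get("cookie_language"),
    ]
    # Accept-Language looks like "nl,en;q=0.8"; keep only the language tags.
    # (A real parser would also sort by the q-values.)
    header = request.get("accept_language", "")
    candidates += [part.split(";")[0].strip() for part in header.split(",")]
    for language in candidates:
        if language in supported:
            return language
    return default  # plays the role of the LANGUAGE_CODE setting


supported = {"nl", "en"}
print(detect_language({"accept_language": "fr,nl;q=0.8"}, supported))
print(detect_language({"cookie_language": "en", "accept_language": "nl"},
                      supported))
```

The second call shows why the outcome can surprise: the cookie wins even though the browser asks for Dutch.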

We have tons of information here, but it also means that it isn’t quite clear which language the user is actually going to see.

Most of the tools the open source world generally uses come from the GNU Gettext project. It isn’t particularly bad as such, but we’re pretty much bound to gettext and cannot really use any improvements made in other toolsets.

The normal gettext string extraction workflow means running makemessages, sending the untranslated .po files over to the translators. Get them back, run compilemessages to generate the resulting .mo compiled translation file. This is very technical. A very technical solution to a non-technical problem.

It gets worse with javascript. We have to use a javascript file that contains a gettext implementation and that actually contains all the translations. It is a wonder it still works. (But, yes, it does work).

Timezones are another area. We use pytz for that, which works well.

The missing pieces in django. These are the bits he wants us all to add to django to support all those upcoming 4 billion/milliard (depends on your locale, I mean 4*10^9) new users.

  • One way to do this is to use more of the unicode CLDR database. We only use part of this. Lots of locale-relevant data.
  • Translation catalogs could be made handier. Merging them for instance. Visualize which strings still require translation. Progress indicator.
  • Another area: translating content. The content in the django database. There are quite a lot of add-ons that allow this. All with different approaches. But... all in all there are three basic approaches to do it. All three must be moved to django core!
  • For translating content, editing support could be handier, for instance a CompositeField to aid translation in multiple languages.
  • We can add helpers, for instance for visual translation with a sync to PO files. Just a small javascript helper that gives you a popup to translate a single string.
  • And... where is the common language selector? We all implement such a thing from scratch in all of our projects, right?
  • And what about a timezone selector? And can’t we suggest a way for browsers to pass along the user’s current timezone as part of the request? Cannot we lead that?
  • Use the hostname, too, when detecting the language. django.de, django.nl, etcetera.

Call to action

  • Rewrite our translation infrastructure on top of Babel. Advantage: it uses much more of the unicode CLDR project, bringing us much closer to many more people’s culture.
  • Push the boundaries. We can have lots of influence as a big open source project.
  • Make i18n a first class citizen. Remove the USE_I18N setting. It should be on by default.

His slides are on http://www.slideshare.net/jezdez/django-internationalisation


Django under the hood: model _meta - Daniel Pyrathon

2014-11-14

Tags: django, djangocon

(One of the summaries of a talk at the 2014 django under the hood conference).

Daniel Pyrathon talks about django’s Model._meta and how to make it non-disgusting. He worked on it via the google summer of code program. His task was to “formalize the Meta object”.

The Meta API is an internal API, hidden under the _meta object within each model. It allows Django to inspect a model’s internals. And.... it makes a lot of Django’s model magic possible (for instance the admin site).

What’s in there? For instance some real metadata like “model name”, “app name”, “abstract?”, “proxy model?”. It also provides metadata and references to fields and relations in a model: field names, field types, etc.

Which apps use it?

  • The admin.
  • Migrations.
  • ModelForms.
  • ... other developers. Developers have always used it, even though it is not an official API! Developers shouldn’t be using it as it is internal. You really need it however for things like django-rest-framework.

So... There’s a big need for a real, public API.

There is an important distinction between fields and related objects. A field is any field defined on the model, with or without a relation. Including foreign keys. Related objects are a special case: they are objects that django creates on objects if there’s for instance a foreign key pointing the other way. This distinction is how django likes to work internally. It does lead to a little bit of duplication regarding the API.

There are about 10 functions (“entry points”) in django that make use of _meta. And 4 properties. And there are 6 separate caching systems for the API... many_to_many, get_field, get_all_related_objects, etc.

The new Meta API’s philosophy:

  • An official API that everyone can use without fear of breakage.
  • A fast API, that also Django’s internals can use.
  • An intuitive API, simple to use. And documented.

The new API has only 7 entry points. Well, really only two: get_field and get_fields. The other five are fast cached helper functions to make the API easier to use.

There are three intuitive return types.

  • A set of field names.
  • A field object.
  • A set of cached properties, for instance a set of fields.

The new Meta API is properly tested. The old _meta was “only” tested by the entire set of django tests. The new one is explicitly properly tested in isolation.

get_fields is the main method that iterates through all the models, handling inheritance and so. In every loop through a model, the result is cached, leading to more performance.

For related objects, a complete graph of all the models with all the fields is needed. This is an expensive one-time operation which is cached afterwards.

Sidenote: what is used often in Meta is the cached_property decorator. It is a property that is only computed once per instance. It prevents lots of unnecessary re-calculations.

cached_property is included in django. You can also install a generic implementation from https://github.com/pydanny/cached-property (Note: at the bottom of the README there, I get thanked for calling cached_property to pydanny’s (Daniel Greenfeld’s) attention. Funny :-) )

Cached_property means the five extra cached properties (for grabbing related objects, for instance) are essentially free. They don’t have any overhead as they’re computed only once.
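
The effect is easy to see with the standard library's functools.cached_property (Python 3.8 and later), used here as a stand-in for Django's own decorator. The Options class is a toy, not Django's real _meta object.

```python
from functools import cached_property


class Options:
    """Toy stand-in for a model's _meta object."""

    def __init__(self, field_names):
        self._field_names = field_names
        self.computations = 0

    @cached_property
    def fields(self):
        self.computations += 1  # count how often the expensive part runs
        return tuple(sorted(self._field_names))


opts = Options(["title", "author"])
opts.fields
opts.fields
print(opts.computations)  # prints 1: the second access hits the cache
```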

An important concept in the Meta API: immutability. This helps prevent lots of bugs. If you return an immutable result, you can be sure it cannot be changed (of course). An advantage is that they’re quick. You can also use itertools.chain() to avoid allocating a new list. You can make a copy of everything as a list, of course.
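
A small illustration of both points, with made-up field names and plain tuples standing in for whatever the Meta API returns:

```python
from itertools import chain

local_fields = ("title", "author")      # immutable tuples
inherited_fields = ("id", "created")

# chain() iterates over both tuples lazily without building a combined list.
all_fields = chain(inherited_fields, local_fields)
print(list(all_fields))  # an explicit copy as a list, when you do need one

try:
    local_fields[0] = "oops"
except TypeError:
    print("tuples cannot be modified")
```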

Fun fact: it seems that the Meta API and its optimizations give django a 10% performance boost.

He showed some additional ideas for future improvements. He’ll discuss them tomorrow at the sprint.


Django under the hood: python templating - Armin Ronacher

2014-11-14

Tags: django, djangocon, python

(One of the summaries of a talk at the 2014 django under the hood conference).

Armin Ronacher is the maker of many non-Django things such as Flask and Jinja2.

His initial thought was “why are we even discussing templates in 2014”? In 2011 everyone started making single-page applications. But a year later it turned out not to be too good an idea: the cloud is much faster than your mobile phone. So server-side rendering is hip again.

Django in 2005 had one of the few available template languages that were any good. It got imitated a lot, for instance in Jinja2. Which is fast.

He doesn’t really like looking at his own Jinja2 code. For him, it is pretty early Python code and it could be so much nicer. Feature-wise Jinja2 hasn’t changed a lot in the recent years. There are a couple of problems, but they’re not big enough to warrant a fix, as any fix will invariably break someone’s template.

The template language is mostly the same, but Jinja2’s and Django’s internal design differs greatly. Django has an AST (“abstract syntax tree”) interpreter with made-up semantics. Jinja2 compiles into python code (“transpiles” he calls it).

Rendering is mostly the same. You have a context object with all data for the template, you run it through the AST or the code and you get output.

The parsing of the template is widely different. Jinja2 supports nested expressions, for instance, Django does not. Django parses in two stages. With a regex it filters out blocks, statements, comments and the rest. One of the problems is that a block statement needs to be on a single line. You often end up with super-long lines this way.

The parsing inside a block in Django happens in the block’s implementation. So it is very inconsistent and often re-implemented.

Extensions? Heavily discouraged in Jinja2. It is tricky to debug due to the compiled nature. It is encouraged in Django (“template tags”). Many people build them. It is easy to implement and debugging is quite doable. It is easier to implement due to the context and the render method.

A Jinja2 template compiles into a generator yielding string chunks. The django render() functions return strings, so any form of recursive call generates new strings. In the end, jinja can generate huge documents, while django can run into problems.
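
The two styles can be imitated in a few lines of plain Python. Everything here is hand-written for illustration: compiled_template is roughly the kind of generator a compile-to-Python engine might emit, and the Node classes are a toy tree interpreter. Neither is real Jinja2 or Django code.

```python
# Template: {% for name in names %}Hello {{ name }}!{% endfor %}

def compiled_template(context):
    """Generator style: yield small chunks, never one big string."""
    for name in context["names"]:
        yield "Hello "
        yield str(name)
        yield "!\n"


class TextNode:
    def __init__(self, text):
        self.text = text

    def render(self, context):
        return self.text


class VariableNode:
    def __init__(self, name):
        self.name = name

    def render(self, context):
        return str(context[self.name])


class ForNode:
    def __init__(self, loop_var, sequence, body):
        self.loop_var, self.sequence, self.body = loop_var, sequence, body

    def render(self, context):
        # Every level builds and returns a complete new string.
        pieces = []
        for item in context[self.sequence]:
            scope = dict(context, **{self.loop_var: item})
            pieces.append("".join(node.render(scope) for node in self.body))
        return "".join(pieces)


context = {"names": ["Ada", "Guido"]}
tree = ForNode("name", "names",
               [TextNode("Hello "), VariableNode("name"), TextNode("!\n")])

# Same output, but the generator can be streamed chunk by chunk.
assert "".join(compiled_template(context)) == tree.render(context)
```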

Error handling is at the python level in Jinja2. jinja2 generates python code, so it simply prints a traceback if something goes wrong. Django has its own error handling that shows less information.

A huge difference is the context. In jinja2 it is a source of data. It only holds top-level variables. In django, it itself stores data. It holds all variables. Basically, it is a stack of dictionaries.
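
The "stack of dictionaries" idea can be tried out with collections.ChainMap from the standard library. Django's Context is its own class, so this is only an analogy, and the variable names are made up.

```python
from collections import ChainMap

context = ChainMap({"title": "Books", "user": "reinout"})

# Entering a nested block pushes a new dictionary on top of the stack...
inner = context.new_child({"title": "Chapter 1"})
print(inner["title"])   # Chapter 1 (the innermost dictionary wins)
print(inner["user"])    # reinout (found further down the stack)

# ...and leaving the block pops it again.
print(inner.parents["title"])  # Books
```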

Autoescaping. In django, it is django-specific. It largely lives only in the template engine. Jinja2 uses markupsafe, which is pretty much standard in the python world. Escaping is handled in python. (Django also uses markupsafe one-directionally).

He showed some example django and jinja2 code. Fun to see the generated python code coming out of a short jinja2 template! He also showed some of the optimizations that jinja2 can do. Impressive.

Fun: he hacked the C python traceback functionality to get good tracebacks out of his templates. Imagine seeing simply {% for i in something %} in a traceback line...

Armin tried to use jinja in django for a google summer of code project. But in the end, it boils down to philosophical differences:

  • People like jinja because of its python expressions and because of its speed.
  • People like django’s templates because of the extensibility.

Why can’t django just use jinja2? Jinja needed to sacrifice certain functionality. Doing the same in Django would break everybody’s code.

The problem is solved to a sufficient degree that he thinks no big changes will happen.


Django under the hood: django REST framework - Tom Christie

2014-11-14

Tags: django, djangocon

(One of the summaries of a talk at the 2014 django under the hood conference).

Tom Christie will give a two-part talk. The first part is about django REST framework; the 3.0 beta will be out tomorrow. The second part is about API design in general.

REST framework 3.0

They made a few fundamental changes in 3.0. Why did they do this?

For instance for serializers. Serialization is from an object to a serialization format like json. It is the deserialization that is the hard part: you have to do validation on incoming data and transform it if needed and convert it to an object. How it happened till now:

serializer = ExampleSerializer(data=request.DATA, ...)
serializer.is_valid()
# serializer.object exists now

This .is_valid() performs some validation on the serializer and it also instantiates an object. In 3.0, the validation still happens, but it only works on the serializer, not on an object. It returns validated data instead of an object. The validated data can afterwards be saved to an object explicitly.

Another change is that there’s a separate .create() and .update() method to split out the behaviour regarding creating new database objects or updating existing ones. The previous version determined itself whether a create or update was needed and it was often mixed in a wrong way.
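
The shape of the new flow can be imitated without any framework. The class below is a sketch written for this summary: SketchSerializer and its single "name" field are made up, and only the method names (is_valid, validated_data, save, create, update) follow the description above.

```python
class SketchSerializer:
    """Framework-free imitation of the 3.0 flow; not REST framework code."""

    def __init__(self, instance=None, data=None):
        self.instance = instance
        self.initial_data = data
        self.validated_data = None
        self.errors = {}

    def is_valid(self):
        # Validation only looks at the incoming data, no object is built.
        name = (self.initial_data or {}).get("name", "").strip()
        if not name:
            self.errors = {"name": "This field is required."}
            return False
        self.validated_data = {"name": name}
        return True

    def save(self):
        # Explicit split between creating and updating.
        if self.instance is None:
            self.instance = self.create(self.validated_data)
        else:
            self.instance = self.update(self.instance, self.validated_data)
        return self.instance

    def create(self, validated_data):
        return dict(validated_data)  # a dict stands in for a model instance

    def update(self, instance, validated_data):
        instance.update(validated_data)
        return instance


serializer = SketchSerializer(data={"name": " Tom "})
if serializer.is_valid():
    print(serializer.validated_data)  # plain data, not an object
    print(serializer.save())          # the object only appears here
```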

This change was done because of relations. It is tricky to save a django User+Profile object when the user doesn’t exist yet. A profile needs a foreign key to a saved user... The code here did a dirty trick. A trick that you needed to do by hand too if you wanted to do something similar yourself.

You can look at this as an encapsulation problem (see Tom’s blog article). You can use the scheme of “fat models, thin views”. The idea is to say “never write to a model field or call .save() directly”. You instead always use model methods and manager methods for state changing operations. The advantage is that all your model-related code is encapsulated in one place.

Regarding the user/profile problem, you could write a custom model manager for users with a custom create() method which always creates User and Profile instances as a pair.

In REST framework 3.0, you can now use a similar method as you can now have a .create() method on the serializer that controls the actual creation process. You can still use the ModelSerializer shortcut, but your validation behaviour is made visible and explicit. “unique_together” and field validation are helped by this.

Note: I missed something here as my late-2010 macbook pro crashed again with the hardware GPU bug that apple refuses to fix.

Field validation in 2.0 can be hard to track, because you don’t know if something failed because of a serializer validation or because of a database validation. A max on an integer field will only fail when saving to the database: in 2.0 the error would come out of .is_valid(). In 3.0, the serializer validations will run OK and only the database save() will fail. Nice and explicit.

There are now more methods that you can override on the serializer, like to_internal_value(), to_representation().

Serializers are used as input for renderers. Simple ones like a JSON renderer, but also an elaborate one like the HTML renderer for browser forms. The serializer output is almost exactly like regular form data, except that there’s an extra data.serializer attribute, pointing back at the serializer so that the renderer can query it for extra information.

There’s a way to give extra hints to a renderer: add a style dictionary to your fields. The contents of the dict is intentionally left unspecified, you can do what you want. It is the renderer that will decide what to do with it. Probably you’ll only use this to customize the HTML renderer.

The big 3.0 picture:

  • Easier to override save behaviour.
  • More explicit validation.
  • Better model encapsulation.
  • Less complex internal implementation.
  • Supports both Form and API output.
  • Template based form rendering.
  • Nested forms and lists of forms.

In the 3.1 version, you’ll be able to use the forms as exposed by Django REST framework as almost-regular forms in your django site.

Generic API design thoughts

There are quite some hypermedia formats coming out. HAL, JSON-LD for instance. He doesn’t think they’re compelling yet.

There are hypermedia formats that are successes.

  • Like plain old RSS. A simple list of links, at the core. It is domain specific. It is machine readable. There are many tools that can read it.
  • HTML is the closest we’ve come to a universal platform. It is document centric, though, not data centric. It is not intended for programmatic interaction.

Keep in your mind one of your websites. In your mind, peel off the javascript. Peel off the css next. Then you still have quite a lot of document-centric structure left!

Can’t you peel back the html, too? And be left with the actual data? So: less document-oriented but data object oriented? Say, an abstract object interface? A Data Object Model? That way a mobile application and a regular html application can interact with the same content.

(Note: I don’t know if he really meant the Data Object Model to abbreviate to DOM, which is obviously already the DOM abbreviation as known from html...)

A document object might have some meta attributes like uri, title, description. And other attributes like lists, objects, strings, nested documents. A “link” in turn can have a uri, some fields, etc.

Could you build generic client application on top of such a thing?

A “document” could have a ‘board’ attribute with the current state of a chess board, for instance. If you work with such a document’s Data Object Model, you could call methods that modify the board (play_move('d2', 'd4')) directly...

Also imagine the level of functional testing you could do on your application. Especially as opposed to what you could do with bare html!

According to Tom, it is time to start addressing the fundamental issues of system communication. You’ve got to think big for this.

So: how do we design abstract object interfaces and describe the actions and transforms they expose?


Django under the hood: Django ORM - Anssi Kääriäinen

2014-11-14

Tags: django, djangocon

(One of the summaries of a talk at the 2014 django under the hood conference).

Anssi Kääriäinen is the “guardian of the ORM”, he knows where all the bits and pieces are. He’ll explain especially how QuerySet.filter() works.

What is the Django ORM (object relational mapping)? It is a:

  • Query builder.
  • Query to object mapper.
  • Object persistence.

The operations the django ORM does are higher level than SQL by design. It doesn’t have .join() and .group_by() operations exposed as an ORM operation. It’ll do them behind the scenes, but you can’t call them directly.

Now about filters. For instance:

Book.objects.filter(author__birth_date__year__lte=1981)

This grabs the books with an author with a birth date’s year that’s 1981 or earlier. That year in the query isn’t in django by default. Something new in django 1.7 are model transforms. This allows you to add such a specific year lookup:

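from django.db import models

# Registering on DateField makes __year available on date fields.
# (The as_sql body is an assumed example; EXTRACT works on e.g. PostgreSQL.)
@models.DateField.register_lookup
class YearTransform(models.Transform):
    lookup_name = 'year'
    output_field = models.IntegerField()

    def as_sql(self, compiler, connection):
        # Compile the left-hand side (the date column) and wrap it in SQL.
        lhs, params = compiler.compile(self.lhs)
        return 'EXTRACT(YEAR FROM %s)' % lhs, params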

Book.objects.filter(author__birth_date__year__lte=1981), what does it mean?

  • Book is the model class.
  • objects is the manager (models.Manager)
  • filter is a method on the manager. It returns a models.QuerySet. It results in a models.sql.Query, which is sent through a models.sql.SQLCompiler.
  • author is a related model of the book model.
  • birth_date is an attribute of the author model.
  • year is the custom transformation we just made ourselves.
  • lte is the ‘less than or equal’ hint.

An essential part is Query.build_filter(). It does value preparation (for example for F-objects or for corner cases like ‘None’ in oracle). It fetches the source field, including join generation if needed. And it fetches transforms or custom lookups (like the __birth_date or __year) from the source field. It also calls setup_joins() which handles relations and field references. It perhaps fires off a subquery and does join trimming and join reuse handling. build_lookup() is the part that handles lookups like __lte. The last part is a bit of ‘isnull’ special case handling (SQL knows True/False/unknown, this is always a bit messy).

To build a filter, the ORM needs a mapping from field names (birth_date) to SQL fields. PathInfo provides the mapping. It uses the model’s _meta attribute heavily. PathInfo knows about traversing relations and grabbing attributes from the related models.

setup_joins() uses PathInfo to return the final attribute (birth_date in our case) and return the joins needed to get to the model that actually has that final attribute.

How do ManyToMany fields work? In the same way, really. To the ORM, a ManyToManyField simply means two foreign keys, so two joins. For the rest of the ORM there’s nothing special about it. Nice.

build_lookup loops through the parts (double-underscore-separated) of the query and looks up what to do with it. Perhaps a simple lookup (“grab this field”), perhaps a transform. A simple loop. The code looks simple. Anssi tells us, however, that the actual code in Django is much harder to read because of the many special cases and corner cases and exceptions and weird database issues it needs to handle. “The implementation is logical, but the logic takes some getting used to before you understand it”.
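
The "simple loop" idea, stripped of all corner cases, might look like the sketch below. The three registries and the split_filter helper are made up for this example; the real ORM looks these things up via the model's _meta and the registered lookups.

```python
RELATIONS = {"author"}             # hypothetical: names that require a join
TRANSFORMS = {"year"}              # hypothetical registered transforms
LOOKUPS = {"exact", "lte", "gte", "isnull"}


def split_filter(expression):
    """Classify each double-underscore-separated part of a filter keyword."""
    parts = expression.split("__")
    # A trailing lookup is optional; without one, 'exact' is implied.
    lookup = parts.pop() if parts[-1] in LOOKUPS else "exact"
    classified = []
    for part in parts:
        if part in RELATIONS:
            classified.append(("join", part))
        elif part in TRANSFORMS:
            classified.append(("transform", part))
        else:
            classified.append(("field", part))
    return classified, lookup


print(split_filter("author__birth_date__year__lte"))
```

For the example filter this gives a join on author, the birth_date field, the year transform and finally the lte lookup.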

sql.Query contains alias_map, tables and alias_refcount. This contains all the info the Query needs to turn itself into SQL.

SQLCompiler gets a Query as input and outputs rows from the database (finally :-) ). Those rows then still have to be turned into actual python objects.

Another subject: expressions. He’s working on https://github.com/akaariai/django-refsql, which he hopes will end up in django core. It is a pretty simple mapping between a django-style query expression (birth_date__year) and the related raw SQL. “You can do funny tricks with it” was what Anssi said... The main goal is to get rid of django’s .extra(): his expression work is a nicer way to do an extra “select” in SQL and annotate the resulting objects with the values. I heard quite a lot of very happy noises come out of several core committers, so this might indeed be something nice!

It is intended to end up as something you can use as an annotation in future Django versions:

Something.objects.annotate(lower_name=Lower('name')).order_by('lower_name')

Nice talk! Thanks, Anssi.


Django under the hood one-day conference this friday

2014-11-13

Tags: django, djangocon

Hurray, tomorrow morning (friday) I’ll go to Amsterdam for the first one-day django under the hood conference.

The tickets sold out quickly, but I convinced the company where I work (Nelen & Schuurmans) to sponsor the event, so we had three tickets reserved for us :-) If, by this evil stratagem, I managed to snatch away your ticket: I’ll provide a little bit of atonement by providing summaries of all the talks. (I’ll tag them with djangocon to keep them together with my other django summaries).

We as Nelen & Schuurmans use a lot of Django, so we’re happy to do a bit in return by sponsoring the conference. We’re certainly not a pure IT company, as our main focus is water management. 40 of us are experts in ecology, water management, hydrology etcetera, 12 of us are programmers. And most of our work is done with Python and Django. That makes us one of the biggest Django shops in the Netherlands, which is fun for a company mainly associated with water :-)

I’m looking forward to the talks tomorrow. The conference aims at giving us a better understanding of Django’s internals so that we’ll hopefully contribute more to Django’s code and documentation.

  • Two talks about the database layer directly: the ORM internals and the summer of code work done on the models’ ._meta attribute. I hope to gather some extra tricks here.

    Every year I seem to discover one or two features of Django’s ORM that make my life easier. Features I saw no use for earlier or features that I didn’t know the real power of. Using the django debug toolbar is essential: it tells you the number of queries your page is generating. Often the number sends me directly to the django documentation to re-visit the ORM pages in search of stuff I missed.

    Anyway, these two talks should help me deepen my understanding.

  • Talking about databases: one of the talks is about the new migration framework. We’re heavy users of south, which has now been superseded by similar migration functionality in django 1.7 itself.

    Honesty forces me to confess to not having used 1.7 in a project yet. The one I’m starting next week will be 1.7, so hurray for some extra background information!

  • Templating languages: I’m happy with Django’s default one. And I use Jinja2 in outside-of-django python projects. Apparently jinja2 is quicker. Is that also true if integrated in Django? Are there other template languages better suited to other tasks? How about django’s template security features? Does that map OK to Jinja2, which allows more python in the templates?

    Lots of questions, so I’m all ears during that talk.

  • Internationalisation. Some extra info here won’t hurt.

    We normally do everything in English and translate it to Dutch. For a foreign project there might be some Vietnamese or Afrikaans translations lying around somewhere. Some of the projects have translations in our local transifex instance. We use that sometimes so that a project partner in FarFarAwayCountry can translate everything without needing github access and without needing knowledge of *.po files.

    In our setup, there’s room for improvement, especially regarding the workflow we use. So... what will the talk bring?

  • Django rest framework. Tom Christie got funding beyond his wildest dreams via kickstarter to work on the new 3.0 version.

    We use django rest framework quite a lot. The browseable API is its most direct selling point. That, and the fact that it is very actively developed.

    It is getting boring, but I’m also looking forward to this talk :-)

Six useful talks. That’ll be a good day. Followed by a day of sprinting on saturday.

As this is my own blog, I’ll be excused if I do a shout-out for the nice company where I work (Nelen & Schuurmans) again :-) If you live in the Netherlands (and if you speak Dutch) and if you’re good with Django, give us a thought. Useful, fun and socially relevant Django work!

http://reinout.vanrees.org/images/2014/nelenschuurmans_sponsor.png

Python desktop application development - Therry van Neerven

2014-11-11

Tags: python

(Summary of a talk at a python meeting in Eindhoven, NL. I also gave a presentation (just the slides)).

He’s the author of the “sendcloud client”, a desktop app of the company he works for. It steers a barcode scanner and prints out packaging slips.

Why a desktop app? They wanted to make it look like a native windows app. But especially, they wanted to do two things that a browser app cannot do: printing without a dialog and scanning barcodes! So they needed a desktop app.

Why in python? He wanted to learn more about python. And python allowed him to work quickly. There are quite a lot of libraries that helped him a lot (examples: requests, pywin32, pyusb).

There are a couple of GUI frameworks for python. Some don’t look too good, others are nicer. Kivy is a nice one. As is PyQt/PySide. PyQt might mean some licensing costs. Kivy has fewer features and it is not native looking. Kivy does have a more innovative UI as it is built on PyGame. It also supports android and ios.

Tip by Therry: if you’re interested, simply get your hands dirty. Try out something!

Important for desktop applications is to “freeze” your application. “Freezing” means compiling your python scripts to bytecode and putting them in a zip archive. The zip gets appended to a python interpreter. Then you place the python interpreter in a folder where it can find all the other dependencies like Qt and your funny kitten images.
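To make the “bytecode in a zip” part a bit more concrete, here is a minimal sketch that only uses the standard library. It is not what cx_Freeze or py2exe do internally, just an illustration of the idea: compile a module to a .pyc, store only the .pyc in a zip, and import it from there. The module name frozen_demo and the file name library.zip are made up for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_bytecode_zip(source_code, module_name, workdir):
    """Write a module to disk, compile it, and store only the .pyc in a zip."""
    src_path = os.path.join(workdir, module_name + ".py")
    with open(src_path, "w") as f:
        f.write(source_code)
    zip_path = os.path.join(workdir, "library.zip")  # hypothetical name
    # PyZipFile.writepy() compiles the .py file and adds the resulting
    # .pyc (not the source) to the archive.
    with zipfile.PyZipFile(zip_path, "w") as archive:
        archive.writepy(src_path)
    return zip_path


def import_from_zip(zip_path, module_name):
    """Import a module straight from the zip, the way a frozen app would."""
    sys.path.insert(0, zip_path)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(zip_path)


workdir = tempfile.mkdtemp()
zip_path = build_bytecode_zip(
    "def greet():\n    return 'hello from the zip'\n",
    "frozen_demo",  # hypothetical module name
    workdir,
)
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
module = import_from_zip(zip_path, "frozen_demo")
```

A real freezing tool also bundles the interpreter and the binary dependencies; this sketch only covers the zip-of-bytecode step.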

There are tools for this, so in the end the process is quite accessible. You can look at cx_Freeze, Nuitka or py2exe.

Freezing needs to happen on the platform where you want to build for. Cross platform building means you need to use virtual machines (vagrant, virtualbox, etc).

Script your build process as you will build your app very often.

Installers are also important. Inno setup, NSIS and pynsist are options for windows. Shipping updates is simple: just create a new installer for every build... You could look at esky if you want to update without needing a completely new package.

Some tips:

  • Pick the GUI framework that fits your needs. Do you need an elaborate one? Is a simple one fine?
  • Automate the things you do often. Makefiles or python scripts: both fine. It can save you hours of typing and clicking!
  • Not every GUI framework or package builder is mature. Feel free to discuss on github or stackoverflow.
  • You can make browser extensions. These might help you get the printing working that would normally not work well in a web environment. Suddenly you might be able to re-use all your web knowledge anyway!
  • From your webbrowser you could even connect to a desktop app with a REST API or a websocket. So you only have to build a desktop app that provides a little webserver and exposes it to your browser... Nice idea!
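The last tip, a desktop app that exposes a little webserver, could look roughly like the sketch below. This is my own illustration with the standard library, not the sendcloud client’s code. The /print endpoint, the label field and the printed_labels list are all hypothetical; a real app would hand the label to an actual printer. A real setup would also need CORS handling (so a web page on another origin is allowed to call it) and some form of authentication, both left out here.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

printed_labels = []  # stand-in for a real printer queue (hypothetical)


class PrintRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/print":  # hypothetical endpoint
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        printed_labels.append(payload.get("label"))
        body = json.dumps({"status": "queued"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server(port=0):
    """Start the server on localhost in a background thread.

    Port 0 lets the OS pick a free port; read it from server.server_address.
    """
    server = HTTPServer(("127.0.0.1", port), PrintRequestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_label(port, label):
    """What the browser side would do, here done from python for testing."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(
        "POST",
        "/print",
        body=json.dumps({"label": label}),
        headers={"Content-Type": "application/json"},
    )
    response = conn.getresponse()
    result = json.loads(response.read())
    conn.close()
    return response.status, result
```

Binding to 127.0.0.1 only keeps the server reachable from the local machine, which is what you want for a helper like this.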

Ubuntu PPA madness

2014-10-30

Tags: python, django, plone

I’m going flipping insane. In ye olde days, when I was programming with the python CMS Plone, my dependencies were limited to python and PIL. Perhaps lxml. LXML was a pain to install sometimes, but there were ways around it.

Working on OSX was no problem. Server setup? Ubuntu. The only thing you really had to watch in those days was your python version. Does this old site still depend on python 2.4 or is it fine to use 2.6? Plone had its own Zope database, so you didn’t even need database bindings.

Now I’m working on Django sites. No problem with Django, btw! But... the sites we build with it are pretty elaborate geographical websites with lots of dependencies. Mapnik, matplotlib, numpy, scipy, gdal, spatialite, postgis. And that’s not the full list. So developing on OSX is no fun anymore, using a virtual machine (virtualbox or vmware) is a necessity. So: Ubuntu.

But... ubuntu 12.04, which we still use on most of the servers, has too-old versions of several of those packages. We need a newer gdal, for instance. And a newer spatialite. The common solution is to use a PPA for that, like ubuntugis-stable.

Now for some random things that can go wrong:

  • We absolutely need a newer gdal, so we add the ubuntugis-stable PPA. This has nice new versions for lots of geo-related packages, for instance the “proj” projection library.

  • It doesn’t include the “python-pyproj” package, though, which means that the ubuntu-installed python-pyproj package is compiled against a different proj library. Which means your django site segfaults. Digging deep with strace was needed to discover the problem.

  • Of course, if you need that latest gdal for your site, you add the PPA to the server. Everything runs fine.

  • A month later, the server has to be rebooted. Now the three other sites on that same server fail to start due to the pyproj-segfault. Nobody bothered to check the other sites on the server, of course. (This happened three times on different servers. This is the sort of stuff that makes you cast a doubtful eye on our quite liberal “sudo” policy...)

  • Pinning pyproj to 1.9.3 helped, as 1.9.3 worked around the issue by bundling the proj library instead of relying on the OS-packaged one.

  • Ubuntugis-stable sounds stable, but they’re of course focused on getting the latest geo packages into ubuntu. So they switched from gdal 1.9 to 1.10 somewhere around june. So /usr/lib/libgdal1.so became /usr/lib/libgdal1h.so and suddenly “apt-get update/upgrade” took down many sites.

    See this travis-ci issue for some background.

  • The solution for this PPA problem was another PPA: the postgres one. That includes gdal 1.9 instead of the too-new 1.10.

  • Possible problem: the postgres PPA also uses 1.9 on the new ubuntu 14.04. 14.04 contains gdal 1.10, so using the postgres PPA downgrades gdal. That cannot but break a lot of things for us.

  • I just discovered a site that couldn’t possibly work. It needs the ubuntugis-stable PPA as it needs a recent spatialite. But it also needs the postgres PPA for the 1.9 gdal! And those two don’t match.

  • It still works, though. I’m not totally sure why. On a compilation machine where we build a custom debian package for one of the components, the postgres PPA was installed manually outside of the automatic build scripts. And a jenkins server where we test it still has the ubuntugis PPA, but somehow it still has the old 1.9 gdal. Probably someone pinned it?

  • Another reason is probably that one of the components was compiled before the 1.9/1.10 gdal change and didn’t need re-compilation yet. Once that must be done we’re probably in deep shit.

  • If I look at some ansible scripts that are used to set up some of our servers, I see the ubuntugis PPA, the mapnik/v2.2.0 PPA and the redis PPA. Oh, how can that ever work? The software on those servers needs the 1.9 gdal, right?

  • I asked a colleague. Apparently the servers were all created before june and they haven’t done an “apt-get upgrade” since. That’s why they still work.
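One small thing that could help with the python side of such a mess is a check that the installed python packages still match the pins you expect, for instance the pyproj 1.9.3 pin mentioned above. The sketch below is only an illustration and assumes a python with importlib.metadata (3.8 or newer). The function name is made up. It also only sees python packages: it cannot detect a system library like gdal being swapped underneath you by an “apt-get upgrade”.

```python
from importlib import metadata


def find_pin_mismatches(pins, lookup=metadata.version):
    """Return (name, wanted, installed) for every pin that is not satisfied.

    `pins` maps package names to the exact version you expect.
    `installed` is None when the package is not installed at all.
    `lookup` can be replaced in tests; by default it asks the environment.
    """
    mismatches = []
    for name, wanted in sorted(pins.items()):
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches.append((name, wanted, installed))
    return mismatches


# Example: the pyproj pin from the list above.
problems = find_pin_mismatches({"pyproj": "1.9.3"})
for name, wanted, installed in problems:
    print("%s: wanted %s, found %s" % (name, wanted, installed))
```

Running something like this right after a deploy or reboot at least tells you early that something moved, instead of finding out via a segfault.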

Personally, I think the best way forward is to use ubuntu 14.04 LTS with its recent versions. And to stick to the base ubuntu as much as possible. And if one or two packages are needed in more recent versions, try to somehow make a custom package for it without breaking the rest. I did something like that for mapnik, where we somehow needed the ancient 0.7 version on some servers.

If a PPA equates to never being able to do “apt-get update”, I don’t really think it is the best way forward for servers that really have to stay up.

Does someone have other thoughts? Other solutions? And no, I don’t think docker containers are the solution as throwing around PPAs doesn’t get more stable once you isolate it in a container. You don’t break anything else, true, but the container itself can be broken by an update just fine.

 

About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.

Weblog feeds

Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):