Django under the hood: internationalisation - Jannis Leidel

Tags: django, djangocon

(One of the summaries of a talk at the 2014 django under the hood conference).

Jannis Leidel tells us about Django internationalisation (or should that be internationaliZation? Well, Jannis confessed to changing it to an S on purpose :-) ). Jannis used to maintain Django’s translations a few years ago. Now he works on the Mozilla Developer Network website, which has translations into more than 30 languages.

He showed a picture of the Rosetta stone. That stone turned into a tool for transmitting the previously undeciphered Egyptian hieroglyphs to our times. It was the vessel that brought a lot of Egyptian culture to the present.

Isn’t what we do with our Django websites something that similarly can transfer knowledge and culture? But there is a problem…

56% of the web is in English, while only 5% of the world population are native English speakers. And only 20% of the world population has a basic knowledge of English!

Another example? 0.8% of the web is in Arabic, while 4% of the world population are native Arabic speakers. 0.1% is in Hindi, with 5% native speakers.

Multilingual and multicultural… There are huge differences. The future is a global web, with lots of fresh new internet users coming online in the coming years. Users that don’t speak English and largely aren’t “western”. How do we get to a global web? There are three long words that are relevant:

  • Internationalization, “i18n”. The preparation phase of making your software at least translatable. Django has tools for this, for instance {% blocktrans %} in your templates and gettext() in your Python code.

  • Translation, “t9n”. Actually translating the marked-as-translatable strings from the i18n phase. Additionally you can translate your content. A different problem altogether is translating your project’s documentation.

    The files used for translation are normally the GNU gettext .po files. A technical format.

  • Localization, “l10n”. Different representations for numbers, dates and so on. “5 mei 2014” versus “May 5th, 2014”.

    In Django, this formatting is done automatically if you have USE_L10N = True in your settings file.

In Django, look at django.utils.translation. It is an old part of the code, written at a time when there wasn’t much web tech experience. It contains a “thread local” that accesses the request, for instance. Various parts really should be replaced.

How does Django detect the current language?

  • URL path.

  • Session data from a previous visit.

  • A cookie.

  • The Accept-Language request header.

  • Django’s LANGUAGE_CODE setting.

We have tons of information here, but it also means that it isn’t quite clear which language the user is actually going to see.
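The lookup order above can be sketched as a plain function. This is an illustration of the idea, not Django's actual code: detect_language, parse_accept_language and the "language" session and cookie keys are all made-up names, and the Accept-Language parsing is simplified.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _sep, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag.strip().lower()))
    # The sort is stable, so tags with equal weight keep their header order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for quality, tag in weighted if quality > 0]


def detect_language(path, session, cookies, accept_language, supported, default):
    """Try each source in turn; the first supported language wins."""
    candidates = [
        path.strip("/").split("/")[0].lower(),  # URL path prefix, e.g. /nl/...
        session.get("language"),                # hypothetical session key
        cookies.get("language"),                # hypothetical cookie name
    ]
    candidates.extend(parse_accept_language(accept_language))
    for candidate in candidates:
        if not candidate:
            continue
        if candidate in supported:
            return candidate
        base = candidate.split("-")[0]  # "nl-be" falls back to "nl"
        if base in supported:
            return base
    return default  # plays the role of the LANGUAGE_CODE setting


supported = {"en", "nl", "de"}
print(detect_language("/about/", {}, {"language": "de"}, "nl", supported, "en"))
```

Here the cookie says "de" and the browser header says "nl", so the visitor gets German. That is exactly the kind of surprise the talk points at.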

Most of the tools the open source world generally uses come from the GNU Gettext project. It isn’t particularly bad as such, but we’re pretty much bound to gettext and cannot really use any improvements made in other toolsets.

The normal gettext string extraction workflow means running makemessages and sending the untranslated .po files over to the translators. When you get them back, you run compilemessages to generate the compiled .mo translation files. It is a very technical solution to a non-technical problem.

It gets worse with javascript. We have to use a javascript file that contains a gettext implementation and that actually contains all the translations. It is a wonder it still works. (But, yes, it does work).

Timezones are another area. We use pytz for that, which works well.

The missing pieces in Django. These are the bits he wants us all to add to Django to support all those upcoming 4 billion/milliard (depends on your locale, I mean 4*10^9) new users.

  • One way to do this is to use more of the unicode CLDR database. We only use part of this. Lots of locale-relevant data.

  • Translation catalogs could be made handier. Merging them for instance. Visualize which strings still require translation. Progress indicator.

  • Another area: translating content. The content in the django database. There are quite a lot of add-ons that allow this. All with different approaches. But… all in all there are three basic approaches to do it. All three must be moved to django core!

  • For translating content, editing support could be handier, for instance a CompositeField to aid translation in multiple languages.

  • We can add helpers, for instance for visual translation with a sync to PO files. Just a small javascript helper that gives you a popup to translate a single string.

  • And… where is the common language selector? We all implement such a thing from scratch in all of our projects, right?

  • And what about a timezone selector? And can’t we suggest a way for browsers to pass along the user’s current timezone as part of the request? Cannot we lead that?

  • Use the hostname, too, when detecting the language. django.de, django.nl, etcetera.
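The “which strings still require translation” idea from the list above is easy to prototype. A rough sketch, assuming a .po file with only single-line msgid/msgstr pairs (plural forms, multi-line strings and fuzzy flags are ignored). translation_progress is a hypothetical helper, not part of Django or gettext.

```python
def translation_progress(po_text):
    """Return (translated, total) for single-line msgid/msgstr entries."""
    translated = total = 0
    msgid = None
    for line in po_text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            msgid = line[len("msgid "):].strip('"')
        elif line.startswith("msgstr ") and msgid is not None:
            msgstr = line[len("msgstr "):].strip('"')
            if msgid:  # the empty msgid is the catalog header, not a string
                total += 1
                if msgstr:
                    translated += 1
            msgid = None
    return translated, total


SAMPLE_PO = '''
msgid ""
msgstr ""

msgid "Welcome"
msgstr "Welkom"

msgid "Goodbye"
msgstr ""

msgid "Search"
msgstr "Zoeken"
'''

translated, total = translation_progress(SAMPLE_PO)
print(f"{translated}/{total} strings translated ({100 * translated // total}%)")
# 2/3 strings translated (66%)
```

A real progress indicator would use a proper .po parser, but even this much is enough to show translators where the gaps are.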

Call to action

  • Rewrite our translation infrastructure on top of Babel. Advantage: it uses much more of the unicode CLDR project, bringing us much closer to many more people’s culture.

  • Push the boundaries. We can have lots of influence as a big open source project.

  • Make i18n a first class citizen. Remove the USE_I18N setting. It should be on by default.

His slides are on http://www.slideshare.net/jezdez/django-internationalisation

Come work for us!
 