Reinout van Rees’ weblog

“Complex” and “complicated”

2015-12-21

Tags: python

I’m not a native English speaker, so sometimes I need to refresh my memory as to the exact meaning of a word. In this case: the exact difference between complex and complicated.

I found an english.stackexchange.com page with some nice answers. According to the first answer:

  • Complex is about the number of parts. The more (different) parts, the more complex.
  • Complicated is about how hard/difficult something is.

Another answer pointed at the relevant part of the Zen of Python:

Simple is better than complex.
Complex is better than complicated.

So when programming, you can divide up a task or problem into separate parts. The more parts, the more complex. Simple is better than complex, so don’t add too many parts if you don’t need them.

On the other hand, if you want to do everything in one part, it might end up being too difficult, too complicated. Making it more complex (more parts) is better in that case.

It is a trade-off. And it takes experience and a bit of Fingerspitzengefühl to make the right trade-off.

How to learn it? Talking with more experienced programmers is one way. Another way is to explicitly review your own programs with those two lines from the Zen of Python in mind. I’ll show them here again to drive them home:

Simple is better than complex.
Complex is better than complicated.

Ask yourself questions like this:

  • Do I still understand my own program? Did I make it too complicated?

  • Did I split it up enough? Or are there 400-line functions that I might better cut up? Into a more complex, but clearer, whole?

    Well-separated parts might make reasoning about your complicated problem easier.

  • Did I split everything up too much? Do I have more parts than my brain can handle?

    Your brain’s CPU has room for some 7 variables. Too many parts might make it impossible for you to work on your own code :-)
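
A small illustration of the trade-off, as a sketch: the function names and the sample data are made up for this example. The first version does everything in one part (few parts, but complicated to read). The second splits the same work into three small parts (more complex, but each part is easy to reason about).

```python
# Everything in one part: few parts, but hard to read.
def report_complicated(lines):
    return ", ".join(
        "%s=%d" % (k, v)
        for k, v in sorted(
            {
                name: sum(int(l.split(":")[1]) for l in lines if l.split(":")[0] == name)
                for name in {l.split(":")[0] for l in lines}
            }.items()
        )
    )


# The same task split into three parts: more parts, but each one is easy.
def parse(line):
    """Turn 'name:number' into a (name, number) tuple."""
    name, number = line.split(":")
    return name, int(number)


def totals(pairs):
    """Add up the numbers per name."""
    result = {}
    for name, number in pairs:
        result[name] = result.get(name, 0) + number
    return result


def report(lines):
    """Format the per-name totals as 'name=total, ...', sorted by name."""
    summed = totals(parse(line) for line in lines)
    return ", ".join("%s=%d" % item for item in sorted(summed.items()))


lines = ["a:1", "b:2", "a:3"]
print(report(lines))  # a=4, b=2
```

Both versions give the same answer. Splitting further (say, one function per character of the output) would tip the balance the other way again.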

Hit by a car for the first time

2015-12-08

Tags: bike

Today was the first time I got hit by a car. On my (recumbent) bicycle. First things first: nothing is wrong :-)

If you look on this map, I crossed the “Winthontlaan” right in the center of that map from right to left. Contrary to that map, the cyclists have the right of way crossing that road (since half a year or so).

I crossed the road. The first half of the road was full of cars, as it often is late in the afternoon. So cars coming from the right don’t have a very good view on the cyclists that want to cross. The driver didn’t notice me. When I was crossing, there was quite some room between me and an approaching car, so I assumed he would stop. Room enough. And cars normally drive slowly there because of the unclear situation. This car did, too.

Only... he kept on driving. Weird to experience it. You see a car approaching and it doesn’t stop. There were no alarms going off in my head, only cool calculated distance calculations :-) I remember concluding that I would get hit.

Also weird: I don’t remember exactly what happened when I got hit. I just ended up on the ground. The whole process probably only takes three seconds, so no wonder you don’t remember the exact process.

The car only hit the back of my bike near the bag with my laptop backpack. From the right side. The total center of gravity was probably a little bit above the point where it hit, so I probably got lifted a little bit and ended up on the right side of my body.

  • No personal damage, I got up right away and everything checked out OK.
  • The owner of the car stopped immediately and first inquired of my health (“that’s the most important thing”). Very kind, correct and helpful!
  • We then checked out my bike. We had to adjust the angle of the front wheel and wriggle the brakes a bit, but that was about it.
  • I got the car owner’s email address in case I did find some additional damage. I’ll mail him this summary if I don’t find anything when I check out my bike tomorrow :-)
  • I totally forgot to inquire after the car, weirdly enough. I assume it has no damage, though, as it hit my bike in a nice soft padded spot.

So: no harm done. And a gallant car owner that helped me afterwards. Not a bad way to get an accident :-)

An hour later I started to get sore on my left leg. I probably hit my steering bar with it or so. I know for sure I’ll feel it tomorrow. Nothing wrong, just sore.

Oh, I have gotten one thing out of the accident: I now know what was bothering me with my bike for the last week. My bike felt a little bit “squishy” when steering. Something somewhere was loose. I just couldn’t find it. After the accident it worsened a bit and I finally spotted which nut I have to tighten. Funny.

As a former Dutch soccer player used to say: every drawback has its advantage.

It also has the advantage of cheaply instilling in me the need for a bit more defensiveness in my driving. Not that I’m driving recklessly, far from it. I even stop for red traffic lights, unlike 90% of the Dutch population :-)

Some legal/Dutch comments for those that live in a country where cycling is not ubiquitous:

  • I simply had the right of way, so I had the right of way. Sounds logical, but the stories I sometimes hear from the UK make the Dutch situation sound like a luxury. “But it was a cyclist” seems to make everything right (for the car owner) in the UK.

  • It was dark, but I had my lights on. And that’s a luxury, as 40% of the cyclists don’t have a light in the Netherlands. So I was very visible (compared to most other cyclists). I have a good (85 Euro) headlight.

  • Yes, I have a recumbent bicycle that’s lower than “normal” bicycles. That puts my head more or less on the same level as that of car drivers. Normal bicyclists are higher than cars. I assume I cannot be blamed for being more or less car-height? Especially as I had my lights on. And my light is at the same level as on most other bikes.

  • If there had been damage (which there wasn’t), the money would have to be paid by the car driver unless the cyclist was very, very clearly and recklessly at fault. This way, the law protects the most vulnerable traffic participant. If you drive a car, you technically drive a 1000 kg killing machine, so you have more legal responsibility than a mere cyclist or pedestrian.

  • I have cycled since I was a young child. At the age of 9 I already cycled 1.5km to school on my own. Nowadays I cycle 9km every morning and afternoon. I’m 42 and this is only the first reasonably serious accident I have had. I had one crash with another cyclist at almost the same point. The other cyclist totally cut the corner and hit me straight on even though I was driving neatly on the right side of the road.

    So: cycling in the Netherlands is very, very safe! And I enjoy it every day.

Fixing SSL certificate chains

2015-11-23

Tags: python, django

This blog post applies when the following two cases are true:

  • Your browser does not complain about your https site. Everything seems fine.
  • Some other tool does complain about not finding your certificate or not finding intermediate certificates. What is the problem?

So: your browser doesn’t complain. Let’s see a screenshot:

Browser address bar with a nice green closed lock, so ssl is fine

Examples of the errors you can see

Some examples of complaining tools. First curl:

$ curl https://api.letsgxxxxxxx
curl: (60) SSL certificate problem: Invalid certificate chain
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

curl has the right error message: Invalid certificate chain.

Let us look at wget:

$ wget https://api.letsgxxxxxx
--2015-11-23 10:54:28--  https://api.letsgxxxxx
Resolving api.letsgxxxxxx... 87.233.157.170
Connecting to api.letsgxxxxxx|87.233.157.170|:443... connected.
ERROR: cannot verify api.letsgxxxxxx's certificate, issued by 'CN=COMODO RSA
  Domain Validation Secure Server CA,O=COMODO CA Limited,L=Salford,ST=Greater Manchester,C=GB':
  Self-signed certificate encountered.
To connect to api.letsgxxxxxx insecurely, use `--no-check-certificate'.

wget is right that it cannot verify .... certificate. But its conclusion Self-signed certificate encountered is less helpful. The certificate is not self-signed, it is just that wget has to treat it that way because the certificate chain is incorrect.

If you talk to such an https URL with java, you can see an error like this:

javax.net.ssl.SSLHandshakeException:
sun.security.validator.ValidatorException:
PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException:
unable to find valid certification path to requested target

This looks quite cryptic, but the cause is the same. SunCertPathBuilderException: CertPath sure sounds like a path to a certificate that it cannot find.

A final example is with the python requests library:

>>> import requests
>>> requests.get('https://api.letsgxxxxxx')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File ".../requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File ".../requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File ".../requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File ".../requests/adapters.py", line 431, in send
    raise SSLError(e, request=request)
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

How to determine what’s wrong

So... you yourself discover the problem. Or a customer calls to say that he’s getting an error like this. Even though everything seems right if you test the https site in the browser.

Solution: go to https://www.digicert.com/help/

If that site says everything is completely right, then you’re done. If it still complains about something, you’ve got work to do.

Most of the checkmarks are probably green:

Green checkmarks in front of many common SSL checks

In cases like this, the problem is in the certificate chain at the bottom of the page. Here’s an example of one of our own sites from a few months ago:

Broken chain icon indicating the exact problem spot

Note the “broken chain” icon halfway. Just follow the chain from top to bottom. Everything has to be perfect. We start with the *.lizard.net which is issued by GeoTrust SSL CA - G2.

The certificate GeoTrust SSL CA - G2 in turn is issued by GeoTrust Global CA.

The problem: the next certificate in the chain is not about GeoTrust Global CA, but about GeoTrust SSL CA, which is different. Here the chain breaks. It does not matter that the fourth certificate is about the GeoTrust Global CA we were looking for. The chain is broken. The order in which the certificates are placed must be perfect.
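
The “follow the chain from top to bottom” check can be sketched in a few lines of python. This is only a toy model that compares names: real verification checks cryptographic signatures, and the function name first_break is made up for this sketch. The subjects come from the example above. The issuers of the last two certificates are not given here, so they are placeholders.

```python
def first_break(chain):
    """Return the index of the first certificate that does not match the
    issuer of the certificate before it, or None if the order is fine.

    ``chain`` is a list of (subject, issuer) tuples, server certificate first.
    """
    for index in range(1, len(chain)):
        expected_subject = chain[index - 1][1]  # issuer of the previous cert
        actual_subject = chain[index][0]
        if actual_subject != expected_subject:
            return index
    return None


# Subjects as in the example above. The issuers of the last two
# certificates are unknown, so they are placeholders.
broken = [
    ("*.lizard.net", "GeoTrust SSL CA - G2"),
    ("GeoTrust SSL CA - G2", "GeoTrust Global CA"),
    ("GeoTrust SSL CA", "(placeholder issuer)"),
    ("GeoTrust Global CA", "(placeholder issuer)"),
]
# One way the order could be repaired in this toy model.
repaired = [broken[0], broken[1], broken[3]]

print(first_break(broken))    # 2: the third certificate is out of place
print(first_break(repaired))  # None
```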

After fixing the order of the certificates in our certificate file, the problem was fixed:

Chain icons indicating that the chain is unbroken

Why is a chain needed?

There are lots of certificates in the wild. All the browsers (and java, and your OS and...) often only store a handful (well, 20+) “root certificates”. All the other certificates have to trace their origin back to one of those root certificates.

That is where the intermediate certificates come in: they’re a cryptographically signed way to trace the validity of your certificate back to one of the known-good root certificates.

How to fix it

  • If you’re handling certificates yourself, you ought to know which files to edit. The main problem will be getting the right intermediary certificates from the issuing party. Often you only get “your” certificate, not the intermediary ones. Ask about it or google for it.

  • Often you won’t maintain those certificates yourself. So you have to get your hosting service to fix it.

    If you let someone else take care of the certificate, point them at https://www.digicert.com/help/ and tell them to make sure that page is completely happy.

    In my experience (=three times in the last two years!) they’ll mail back with “everything works now”. But it still won’t work. Then you’ll have to mail them again and tell them to really check https://www.digicert.com/help/ and probably provide screenshots.

Good luck!

Nginx proxying to nginx: getting gzip compression to work

2015-11-19

Tags: python, django

At work we use gunicorn as our wsgi runner. Like many, gunicorn advises you to run the nginx webserver in front of it. So on every server we have one or more websites with gunicorn. And an nginx in front.

Nginx takes care, of course, of serving the static files like css and javascript. Some gzipping of the results is a very, very good idea:

server {
    listen 80;
    server_name my.great.site;
    ...
    gzip on;
    gzip_proxied any;
    gzip_types
        text/css
        text/javascript
        text/xml
        text/plain
        application/javascript
        application/x-javascript
        application/json;

    ....
}

Two notes:

  • The default is to only gzip html output. We also want javascript and json. So you need to configure gzip_types.

    (I copy-pasted this from one of my config files, apparently I needed three different javascript mimetypes... Perhaps some further research could strip that number down.)

  • gzip_proxied any tells nginx that gzipping is fine even for proxied requests.

Proxied requests? Yes, because we have a lot of servers and all external traffic first hits our main nginx proxy. So: we have one central server with nginx that proxies requests to the actual servers. So: nginx behind nginx:

server {
    listen 443;
    server_name my.great.site;
    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://some-internal-server-name/;
    }
    ssl on;
    ssl_certificate ...
}

Pretty standard “I listen on 443/https and proxy it on port 80 to some internal server” setup.

Works like a charm. Only drawback: gzipping does not work.

The reason? nginx defaults, in this case.

  • The gzip module has a gzip_http_version configuration parameter with a default of 1.1.

    Which means that http 1.0 requests are not gzipped, only 1.1.

  • The proxy module has a proxy_http_version configuration parameter with a default of 1.0.

    Which means that proxied requests are sent from the main proxy to the actual webserver with http 1.0.

These two don’t match. There are two solutions:

  • Set gzip_http_version 1.0 in the nginx configs on your webservers. This switches on gzip for the http 1.0 connections coming from the proxy.
  • Set proxy_http_version 1.1 on the main proxy so that it sends http 1.1 connections to the webservers.

My choice originally was to do the first one. But a bug report came in for another site and now I’ve switched it on on the main proxy so that all the sites get the benefit.

Note: you might want to make different choices. Perhaps you have a caching proxy halfway? Perhaps you want the main nginx on the proxy to do the gzipping for you? Etcetera. Check whether the above tips apply to your situation :-)
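
To check whether gzip really reaches the client after a change like this, a small python script can help. This is a minimal sketch using only the standard library. The function names are made up, and the URL in the usage comment is hypothetical: replace it with a css or javascript file on your own site.

```python
import urllib.request


def is_gzipped(headers):
    """Return True if the response headers announce a gzipped body."""
    normalized = {key.lower(): value for key, value in headers.items()}
    return "gzip" in normalized.get("content-encoding", "").lower()


def fetch_headers(url, timeout=10):
    """Request ``url`` while announcing gzip support; return the headers."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


# Usage (hypothetical URL, needs network access):
#
#   headers = fetch_headers("https://my.great.site/static/site.css")
#   print("gzipped" if is_gzipped(headers) else "NOT gzipped")
```

Run it against the public URL, so that the request passes through the main proxy, and not only against the internal server.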

Buildout 2.5.0 has much nicer version conflict reporting

2015-11-16

Tags: python, django, buildout

We use buildout for all our django projects. Nothing wrong with pip, but buildout has extension possibilities built-in (for creating directories, installing user crontabs, local development checkouts and many more) that are quite helpful. And it works better when you need to use system packages (gdal, mapnik, etc).

One area where buildout could use some improvement was the version conflict reporting. Let’s say you have pinned django to 1.6.6 (old project that I’ll upgrade to 1.8 this week) and you add the django debug toolbar. This is the error you get:

The constraint, 1.6.6, is not consistent with the requirement, 'Django>=1.7'.
While:
  Updating django.
Error: Bad constraint 1.6.6 Django>=1.7

First things first. An easy one is to improve the wording of the message:

While:
  Installing django.
Error: The requirement ('Django>=1.7') is not allowed by
your [versions] constraint (1.6.6)

Now... so there is some package that requires at least django 1.7. But which one? Buildout did not tell you. Which would mean you’d have to grep in all your requirements’ sub-requirements for which package actually requires the offending “django>=1.7”...

I’ve now added some internal logging that stores which package required which dependency. After an error occurs, the list is searched for possible matches.

With this change you’ll get a much more helpful output right before the error:

Installing django.
version and requirements information containing django:
  [versions] constraint on django: 1.6.6
  Base installation request: 'sso', 'djangorecipe'
  Requirement of djangorecipe==1.10: Django
  Requirement of djangorecipe==1.10: zc.recipe.egg
  Requirement of djangorecipe==1.10: zc.buildout
  Requirement of sso: django-nose
  Requirement of sso: django-mama-cas
  Requirement of sso: django-debug-toolbar
  Requirement of sso: django-auth-ldap
  Requirement of sso: Django<1.7,>=1.4.2
  Requirement of lizard-auth-server: django-nose
  Requirement of lizard-auth-server: django-extensions
  Requirement of lizard-auth-server: Django<1.7,>=1.6
  Requirement of django-nose: Django>=1.2
  Requirement of django-nose: nose>=1.2.1
  Requirement of django-mama-cas: requests==1.1.0
  Requirement of django-debug-toolbar: sqlparse
  Requirement of django-debug-toolbar: Django>=1.7
  Requirement of django-auth-ldap: python-ldap>=2.0
  Requirement of django-auth-ldap: django>=1.1
  Requirement of translations: Django>=1.4
  Requirement of django-extensions: six>=1.2
While:
  Installing django.
Error: The requirement ('Django>=1.7') is not allowed by
your [versions] constraint (1.6.6)

This makes it much easier to spot the cause (in this case django-debug-toolbar).

There are some unrelated packages in here because I’m doing a textual comparison. The advantage is that it is very robust. And extracting the right package name from requirements without messing things up is harder to get right and takes more code.
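
The idea of “log who required what, then do a textual search” can be sketched like this. It is not buildout’s actual code: the names requirement_log, record and lines_mentioning are made up for the illustration. Most entries are taken from the example output above, and the last one is hypothetical.

```python
requirement_log = []  # (requirer, requirement) pairs, in the order seen


def record(requirer, requirement):
    """Remember that ``requirer`` asked for ``requirement``."""
    requirement_log.append((requirer, requirement))


def lines_mentioning(name):
    """Return report lines whose text contains ``name`` (case-insensitive)."""
    needle = name.lower()
    lines = ["Requirement of %s: %s" % pair for pair in requirement_log]
    return [line for line in lines if needle in line.lower()]


# A few entries taken from the example output above.
record("sso", "django-debug-toolbar")
record("django-nose", "nose>=1.2.1")
record("django-debug-toolbar", "Django>=1.7")
record("translations", "Django>=1.4")
record("djangorecipe==1.10", "zc.buildout")
record("some-package", "six")  # hypothetical entry, filtered out

for line in lines_mentioning("django"):
    print(line)
```

Because the search looks at the whole line, “django-nose: nose>=1.2.1” shows up too, even though nose has nothing to do with the conflict. That is the price of the simple textual comparison.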

So... if you use buildout, give version 2.5.0 a try!

Django under the hood: documentation workshop - Mikey Ariel

2015-11-07

Tags: django, djangocon

(One of my summaries of a talk at the 2015 django under the hood conference).

Documentation used to be an afterthought of software delivery. Now it is a key component of the success of a software project.

Content strategy

Originally it is a marketing term. Which is fine, as documentation is an important part of your project’s marketing!

The core is asking the right questions (even if the answer is simple).

  • Who are my readers? Sounds like a simple question. But... are your readers advanced users? Or beginners? Do you need “persona-based” documentation, so documentation for specific groups (“admins”, “developers”, etc)?

  • What do my readers want to know? Often your readers need context before they can understand reference documentation. Do you need an end-to-end tutorial? Or just explanations?

    Does the textual content need to be enhanced with video or diagrams, for instance?

  • When do my readers need the content? Installation documentation right at the beginning to get started? A reference guide when you’re already working with it? Tutorials for learning it?

    “When” also relates to “when do I need/want to update the documentation?”

  • Where do my readers consume the content? Do you need a “man” page? Embedded help in your GUI app? Good, helpful error messages? Online documentation that can be found by google?

  • Why do my readers even need this content? Minimize double work. Can you point at another project’s documentation or do you need to describe some feature yourself?

    Similarly, if you need to add documentation to work around bugs or things that don’t work right yet: should you not actually fix the code instead?

DevOps for docs

“Content strategy” leverages marketing concepts to make your documentation better. Likewise, “devops for docs” leverages engineering for your documentation.

  • Look for a unified toolchain. If possible, use the same tools as the developers of the project you’re documenting (especially if you’re both the developer and the documenter). Git, for instance. Don’t use google docs if the project uses git. By using the same kind of system, everybody can help each other.

  • Use out of the box documentation tools like asciidoctor, gitbook, MkDocs, sphinx.

  • Use continuous integration! Automatic checker for broken links, perhaps an automatic spell checker, automatic builds (like read the docs does).

    There are automatic checkers like “Hemingway” that can be used as a kind of text unit test.

    You can add custom checks like making sure your project name is always spelled correctly.

  • Iterative documentation. Dividing the work into sprints for instance if it is documentation for a big project. Use your issue tracker or trello or something like that to manage it.
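
The custom check on the spelling of your project name could be as small as this. A minimal sketch: the function name and the choice of “MkDocs” as the project name are just for illustration.

```python
import re

PROJECT_NAME = "MkDocs"  # example name, pick your own project's spelling


def misspellings(text, name=PROJECT_NAME):
    """Return every occurrence of ``name`` that has the wrong capitalisation."""
    pattern = re.compile(re.escape(name), re.IGNORECASE)
    return [match.group(0) for match in pattern.finditer(text)
            if match.group(0) != name]


print(misspellings("Use mkdocs or MkDocs or MKDocs"))  # ['mkdocs', 'MKDocs']
```

A continuous integration job could run this over all documentation files and fail when the list is not empty.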

Keep in mind: we’re all in this together. Designers, developers, product managers, quality assurance, support engineers, technical writers, users.

Docs or it didn’t happen

Some ideas.

  • Treat docs as a development requirement. Write it down in your contribution guidelines. Write down what your definition of “documented” is.

  • Contribution guidelines are a good idea! They’re an important part of your documentation in itself. Do you want people to help you? Write those guidelines.

    With contrib guidelines you can also steer the direction and the atmosphere of your project. If you suggest that 20% of a participant’s time is spent mentoring new contributors, you send a strong message that you’re a welcoming and helpful community, for instance.

    Also look at non-code areas. Do you want contributions from designers? Do you explicitly want someone to work only on the documentation side of things?

  • Provide templates. “If you add a new feature, use this template as a basis.” “Add a ‘version added’ link to your description.” Those kinds of helpful suggestions.

  • Contributing to documentation is a great (and often reasonably easy) way to get into contributing to a project as a whole.

  • Collaboration and training. Sprints and hackfests are great to get people started. There are communities and conferences. “Open help”, “write the docs”. Also mini-conferences inside bigger ones.

My recumbent bike in front of a station

Image: my recumbent bike in front of Troisvierges station in the north of Luxemburg, our startpoint this summer for cycling over the former ‘Vennbahn’ railway

Water-related Python and Django in the heart of Utrecht!

Django under the hood: twisted and django - Amber Brown

2015-11-06

Tags: django, djangocon

(One of my summaries of a talk at the 2015 django under the hood conference).

Amber Brown is a Twisted core developer. This talk’s summary will be:

>>> Django == good
True
>>> Twisted == good
True

And everything will be better if we work together.

Synchronous and asynchronous. Synchronous code is code that returns inline. Asynchronous code is code that returns something possibly at a different time. The extra complication is that IO is often blocking.

Twisted is asynchronous. Regular python code like socket.write() is blocking. Twisted has its own socket that calls python’s behind the scenes. In user code, you should only use twisted’s version: then your code is async and it isn’t blocking on IO.

At the core there’s always something that tries to read/write data. But we normally work at a higher level. So there are protocols that we actually use that are built upon the lower-level read/write connection.

A lot of the async code works with callbacks. You call a function and pass in a second function that gets called with the result when the first function is ready. Twisted uses it often like this:

>>> from twisted.internet.defer import Deferred
>>> deferred = Deferred()
>>> deferred.addCallback(lambda t: t + 1)
<Deferred at ...>
>>> deferred.addCallback(lambda t: print(t))  # python 3 print function
<Deferred at ...>
>>> deferred.callback(12)
13

They’re trying to use more of the recent python 3 syntax goodness to make working with this easier. Generators, yield from, etcetera.

Now on to django.

Django does blocking IO. Making this asynchronous is hard/impossible. Everything has to cooperate or everything falls apart. It is hard/impossible to “bolt it on” afterwards.

People used to think “sync=easy, async=hard”. That’s not the case, though. Both have their own advantages and drawbacks:

  • Sync code is easy to follow. One thing happens after the other. A drawback is that you can only do one thing at once. Persistent connections are hard.
  • Async code is massively scalable. Handling persistent/evented connections is super easy. Python 3 adds syntactic sugar that makes it easier to write. A drawback is that you can get into a “callback hell”. You have to be a good citizen: blocking in the reactor loop is disastrous for performance.

A way of running django is a threaded WSGI runner. Each thread can be blocking on IO, but you have lots of them. You could look at hendrix, a WSGI runner that can run django and which also includes websocket support.

There’s something new for django: django channels (note: corrected the link 2015-12-14, I pointed at django-channels.readthedocs.org instead of channels.readthedocs.org). Requests and websockets are now events that can be fed via channels to queues. Workers can grab work from the queue. When ready, the channel feeds it back to something on the other side. It supports websockets.

With django-channels you can use the @consumer('django.wsgi.request') decorator to subscribe to some queue.

It doesn’t really make django code asynchronous. It is “only” a way to use synchronous code in an async way. But that might just be enough! It is a big improvement for django and it is better than the current approach. There are talks of integrating django-channels in django core when it is polished a bit more.
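
The “events go onto a queue, workers grab them, the answer is fed back” idea can be modelled with the standard library. This is explicitly not the django-channels API, just a toy with made-up names (request_channel, response_channel, worker) and a made-up message format, to show that the worker code itself stays plain synchronous code.

```python
import queue
import threading

request_channel = queue.Queue()   # incoming events (requests)
response_channel = queue.Queue()  # answers on their way back


def worker():
    """Plain synchronous code: grab a message, handle it, send the reply."""
    while True:
        message = request_channel.get()
        if message is None:  # sentinel: no more work
            break
        reply = {"reply_to": message["id"], "body": message["path"].upper()}
        response_channel.put(reply)


thread = threading.Thread(target=worker)
thread.start()

request_channel.put({"id": 1, "path": "/hello"})
request_channel.put(None)
thread.join()

reply = response_channel.get()
print(reply)  # {'reply_to': 1, 'body': '/HELLO'}
```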

But: adopting an asynchronous framework (=twisted) is a long-term way forward. Otherwise we keep bolting patches on a request/response mechanism that isn’t suited very much to the modern web.

Then we got shown an example of django running with an async ORM and handling requests in an async way. It was a quick hack with lots of bits missing, but it did work. It would probably work very fast on pypy.

Python 3.4 has the “yield from” statement (available since python 3.3), which hands you the value that the function you’re calling returns with a regular return. Python 3.5 has even more goodies like await.
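
To give an impression of the newer syntax: the “add one to the result” callback example from earlier in this summary looks roughly like this with async/await. Note that this sketch uses the standard library’s asyncio instead of twisted, and that asyncio.run() needs python 3.7 or newer. The function names are made up.

```python
import asyncio


async def fetch_number():
    """Stand-in for some IO: gives control back to the event loop once."""
    await asyncio.sleep(0)
    return 12


async def add_one():
    # Reads like synchronous code, no explicit callbacks needed.
    number = await fetch_number()
    return number + 1


result = asyncio.run(add_one())
print(result)  # 13
```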

Twisted is trying to modernize itself and trying to get more people onboard. A django-style deprecation policy. Removing 2.6 support. Using new python 3.4+ features.

Django should run on twisted!

And what about greenlets? Bad... Just read https://glyph.twistedmatrix.com/2014/02/unyielding.html

simulated television interview

Image: television interview with not-quite-completely-painted scale figures on my in-progress ‘Eifelburgenbahn’ 1:87 railway layout.

Django under the hood: expressions - Josh Smeaton

2015-11-06

Tags: django, djangocon

(One of my summaries of a talk at the 2015 django under the hood conference).

Josh Smeaton became a django core developer through his work on expressions.

What already existed for a long time in django are F expressions. They are used to send a computation to the database. A self-contained parcel of SQL. Like “take the price field and add the shipping costs to it”. Later aggregations were added. It is a bit the same, as it is “just a bit of sql” that gets sent to the database.

Expressions in django are now much more refined. Multiple database backend support. Deep integration in the ORM to make writing expressions yourself in django easier. It almost makes .extra() and .raw() obsolete.

  • .raw() is for writing an entire query in SQL. For those corner cases where you need to do weird tricks that the ORM doesn’t support.
  • .extra() is for appending bits of SQL to the rest of your django query. It is evil and should go away.

Both are escape hatches that are hardly ever needed. One problem with them is that they are database backend specific.

Some examples of where you can use expressions:

.create(... username=Lower(username))
.annotate(title=F('price') + shipping)
.aggregate(sum_total=Sum('total'))
.order_by(Coalesce('last_name', 'first_name'))
.filter(name=Lower(Value(user_input)))

Batteries included! There are a couple of built-in functions like Coalesce, Concat, Lower, Upper.

Expressions can hide complexity. F(), Case(), When(). F() can refer to fields added by an aggregate, for instance. That goes much deeper than you could do with some custom .extra() SQL. And Case()/When() can be used to select different values out of the database depending on other values.

There are building blocks: Aggregate(), Func(), Value(). You can use those to make your own expressions.

Expressions in django 1.8 now have a proper public API with documentation. The expressions are composable: Sum(F('talk') + F('hold')) + Sum('wrap'). And the internals of the whole ORM are greatly simplified.
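
What “composable” means can be shown with a toy model in plain python. This is not django’s implementation: the classes below only borrow the names F and Sum, and the as_sql() method here is a simplification that just returns a string. The point is that every expression is a small parcel of SQL that can be nested inside another one.

```python
class Expression:
    """Base class: any two expressions can be added together."""

    def __add__(self, other):
        return Combined(self, "+", other)

    def as_sql(self):
        raise NotImplementedError


class F(Expression):
    """Reference to a database column."""

    def __init__(self, name):
        self.name = name

    def as_sql(self):
        return '"%s"' % self.name


class Combined(Expression):
    """Two expressions joined by an operator."""

    def __init__(self, left, operator, right):
        self.left, self.operator, self.right = left, operator, right

    def as_sql(self):
        return "(%s %s %s)" % (
            self.left.as_sql(), self.operator, self.right.as_sql())


class Sum(Expression):
    """Aggregate around a column name or around another expression."""

    def __init__(self, inner):
        self.inner = F(inner) if isinstance(inner, str) else inner

    def as_sql(self):
        return "SUM(%s)" % self.inner.as_sql()


sql = (Sum(F("talk") + F("hold")) + Sum("wrap")).as_sql()
print(sql)  # (SUM(("talk" + "hold")) + SUM("wrap"))
```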

There’s one thing you can do with .extra() but not with expressions: custom joins. There are also still a few small bugs as the functionality is still pretty new. They’ve all been easy to fix till now.

He then showed some examples of using expressions and writing your own. Looks nice!

Steam loco loading coal

Image: German type-86 loco taking on coal on my in-progress ‘Eifelburgenbahn’ layout.

Django under the hood: files in Django - James Aylett

2015-11-06

Tags: django, djangocon

(One of my summaries of a talk at the 2015 django under the hood conference).

James Aylett talks about files in django. You’ve got Files in python. Django built its own abstraction on top of it. File, ImageFile. Separate ones for use in tests. UploadedFile (“behaves something like a file object”, it mentions in the documentation...). There are temporary file and memory variants. Custom upload handlers. forms.FileField. It might not be perfect, but it works.

Files in the ORM: what gets stored in the database is a path to a file, the file is stored on the filesystem. If you store an ImageField, you can query the width and the height of the image. You’re better off storing the width and height in the database, though, as otherwise the image has to be read from disk on every request.

“Files are stored on the filesystem”? They are stored in settings.MEDIA_ROOT by default. Storing is done by a storage backend. You can replace it to get different behaviour. You can use a different storage backend by configuring it in the settings file. Or you can override it on a field-by-field basis.

Different storage backends? You can store data in amazon S3, for instance.

If you have a reusable app that works with files, please test on both windows and linux. And test with something remote like working with S3.

Static/media files. Originally, you only had “media” files. Since django 1.3, you also have static files: non-user-uploaded files such as your apps’ javascript/css. Splitting “media files” and “static files” is a good thing.

There’s a complication: CachedStaticFilesStorage (or ManifestStaticFilesStorage since django 1.7). It adds a hash of each file’s content to the filename so that the file can be cached forever. Great system. Best practice. But it depends on everything using the {% static %} template tag in a very neat way. Otherwise you have cached-forever files that you want to change anyway...
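A minimal sketch of the idea behind those hashed file names, not django’s actual implementation: the `hashed_name` helper below is hypothetical and only uses the standard library, and the 12-character SHA-256 prefix is an assumption of this sketch. A short hash of the content goes into the name, so a changed file gets a new name and the old one can safely be cached forever.

```python
import hashlib
import posixpath


def hashed_name(name, content):
    """Return `name` with a short content hash inserted before the extension.

    Hypothetical helper for illustration only; the hash function and the
    12-character prefix are assumptions, not what django necessarily uses.
    """
    digest = hashlib.sha256(content).hexdigest()[:12]
    root, ext = posixpath.splitext(name)
    return "{}.{}{}".format(root, digest, ext)


# Same path, different content: the two names differ, so browsers that
# cached the old file forever will still fetch the new one.
old = hashed_name("css/site.css", b"body { color: black; }")
new = hashed_name("css/site.css", b"body { color: navy; }")
print(old)
print(new)
```

A template that hard-codes `css/site.css` instead of asking for the hashed name never picks up the new file, which is exactly the complication mentioned above.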

Asset pipelines. Not many people write their css and javascript in one single file. You split it over multiple files. Or you compile coffeescript to javascript. Or you use a program (webpack or browserify for instance) to combine various files into one big one. This is great for minification and caching. You do probably need “source maps” to help your browser debug tools refer back to the original files.

(Note by Reinout: read https://github.com/faassen/bundle_example for a nice explanation!)

Now... how do you get this into your django template? Either your combiner has to read your html code and write it back again. Or you write custom code to do things like {% asset 'my-js.js' %}.

For an example of an alternative you could look at rails/sprockets. Sprockets manages the entire pipeline and can touch every file and manage everything.

In the node.js world, it is common for the web code not to touch the pipeline. They’re separate. Webpack is an interesting one. Also “gulp” which defines the pipeline in a program. This means it can be customized a lot.

For django, it is good to be compatible with what node.js is doing.

What we’d ideally need:

  • Use a pipeline external to django.
  • Hashes computed by staticfiles.
  • Sourcemap support.

If you want to use webpack, you could look at webpack-bundle-tracker and django-webpack-loader. The pipeline is run by webpack and it emits a mapping file. There is a template tag to resolve the bundle name to a URL relative to STATIC_ROOT.
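Roughly how such a template tag could resolve a bundle name, as a sketch under assumptions: the layout of the mapping file, the `resolve_bundle` function and the `/static/` prefix are all made up for illustration. They are not the real file format or API of webpack-bundle-tracker or django-webpack-loader.

```python
import json

# Hypothetical mapping as an external pipeline might emit it:
# bundle name -> file name including a hash.
STATS_JSON = """
{
  "status": "done",
  "bundles": {
    "main": "bundles/main.3f2a9c.js",
    "vendor": "bundles/vendor.91bd07.js"
  }
}
"""

STATIC_URL = "/static/"  # assumed URL prefix for static files


def resolve_bundle(name, stats_text=STATS_JSON, static_url=STATIC_URL):
    """Look up a bundle name in the mapping and return its URL."""
    stats = json.loads(stats_text)
    if stats.get("status") != "done":
        raise RuntimeError("the pipeline has not finished building yet")
    try:
        filename = stats["bundles"][name]
    except KeyError:
        raise LookupError("unknown bundle: {!r}".format(name))
    return static_url + filename


print(resolve_bundle("main"))  # /static/bundles/main.3f2a9c.js
```

The point of the design is that django only reads the mapping. The pipeline itself stays entirely outside django.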

Tip: many people know http://caniuse.com/. There’s also http://doiuse.com/, which looks at it the other way around. It looks at your website and figures out which of the things you’re using lead to problems in the browsers you care about.

Cat licking itself near a water tower

Image: cat cleaning itself on the valve of a water tower, picture of my in-progress ‘Eifelburgenbahn’ layout.

Django under the hood: keynote - Russell Keith-Magee

2015-11-06

Tags: django, djangocon

(One of my summaries of a talk at the 2015 django under the hood conference).

Russell Keith-Magee started by showing a lot of commit messages to show Django’s history. There are weird and humorous ones among them.

Many have to do with Malcolm Tredinnick. Like Malcolm’s insistence on auto-escaping in templates to make them safe. And the removal of white space at the end of lines.

Stories about bugs that only surfaced on the first day of the month if UTC had not yet rolled over. And only if the previous month had 31 days... Oh, and a set of commits done by a person that was sentenced to community service!

There was a fine collection of weird problems. “Fixed #16809 – Forced MySQL to behave like a database”.

Now we come to the present. There are some technical threats like real-time and async code. Technical challenges can be met. There is a bigger risk, though: the social aspect. The low-hanging fruit in django has all been picked. What is left are the really hard, big problems. Often, only core committers do that kind of work. But those are the ones that already do a lot of work. We need new people stepping up. Those need to be mentored. By the same people that already do a lot of work. There are some initial efforts at paying people to work on django, but that’s a topic for an entirely different talk.

Bringing in new people means new people in the community. What is the community like? Can it cope with new people? How is the atmosphere? Do technical debates between old community colleagues flare up into wars? Or do they not? There has been a big debate on the code of conduct with all the expected arguments. In the end the code of conduct is now just accepted practice, but it took quite some work and flak.

Django is an incredible project. With a great community. And it has already existed for 10 years! Which is an achievement in itself.

Russell has mentored some people; he gave an example earlier. Malcolm was called out as someone who especially mentored and welcomed and helped people. Russell asked us to follow Malcolm’s example and to be welcoming and to help people and to share knowledge. Start at this conference!

German railcar leaving tunnel

Image: German ‘Schienenbus’ railcar leaving the Monreal tunnel on my in-progress ‘Eifelburgenbahn’ layout

About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.

Weblog feeds

Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):