I am doing HTTP wrong - Armin Ronacher

Tags: python, django, pygrunn

According to Armin Ronacher, most (Python) web frameworks use a request/response style of handling HTTP. At his company, they’re treating HTTP a little bit differently. (So the talk is first about some HTTP-usage-in-Python observations and second a look at the alternative way they’re treating HTTP.)

Note: my brother has a clearer summary, btw.

The most low-level way is to write directly to the response. Write the response headers, write the actual response content. In Python, you often have some response object; often some sort of middleware gets the chance to do something to the response on the way out.
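To make that concrete, here is a minimal sketch using plain WSGI (my choice of illustration, the talk summary above does not name a specific interface). The app writes headers first and content second; the middleware gets its chance on the way out. The names `app`, `add_header_middleware`, `call_app` and the `X-Example` header are all made up for this example.

```python
def app(environ, start_response):
    # Write the response headers first...
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    # ...then the actual response content.
    return [b"Hello, HTTP\n"]


def add_header_middleware(wrapped_app):
    """Hypothetical middleware: adds one header to the response on the way out."""
    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Copy the headers and append our own before passing them on.
            headers = list(headers) + [("X-Example", "1")]
            return start_response(status, headers, exc_info)
        return wrapped_app(environ, patched_start_response)
    return middleware


def call_app(wsgi_app):
    """Tiny helper (an assumption, for demonstration): call a WSGI app without a server."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call_app(add_header_middleware(app)))
```

Note how both the app and the middleware are shaped entirely by HTTP: status line, header list, body bytes.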

The nice things we like about HTTP:

  • It is text based. You can easily debug it.

  • REST is handy for APIs.

  • Content negotiation.

  • Caching.

  • Very very well supported :-)

A basic question you should ask yourself is: why does my application look like HTTP? A common Django application gets a request, does something and returns a response. Works well. But why is it set up that way? Why is it so focused on HTTP? (It is logical that it focuses on this use case, but you can still ask the question.)

HTTP can be a stream or buffered. Sending stuff from the server to your browser is a stream. But often an incoming request in a Python web framework is first buffered internally (memory or disk). In the same way a request is a bit of a strange mix:

  • request.headers: buffered

  • request.form

  • request.files: buffered to disk

  • request.body: streamed!

On the client (like your web browser), the only thing you can do to an incoming response, once it has started, is to close the connection. You cannot interact anymore once you have received the first incoming byte.

A consequence of the buffering and the way HTTP is handled is that you can have problems accepting data. How big a file should you accept? How big an incoming form? Buffer it in memory? Or on disk? And how do you handle streaming? You might be streaming in one part of your code, but how do the other layers handle it?
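One possible answer to the "how big?" question is to read the incoming stream in chunks and bail out once a limit is exceeded, instead of buffering everything first. The sketch below is only an illustration with plain file-like objects; `read_limited`, `UploadTooLarge` and the default chunk size are hypothetical names and values, not something from the talk.

```python
import io


class UploadTooLarge(Exception):
    """Raised when the incoming data exceeds the allowed size."""


def read_limited(stream, max_bytes, chunk_size=64 * 1024):
    """Yield chunks from a file-like object, refusing more than max_bytes in total."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > max_bytes:
            # Stop as soon as the limit is crossed; nothing more is read.
            raise UploadTooLarge(f"more than {max_bytes} bytes received")
        yield chunk


if __name__ == "__main__":
    incoming = io.BytesIO(b"x" * 100)  # stand-in for a request body
    for chunk in read_limited(incoming, max_bytes=1000, chunk_size=32):
        print(len(chunk))
```

The caller only ever holds one chunk in memory, but every layer between the socket and this function has to cooperate by not buffering the body itself, which is exactly the problem described above.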

Internally in his company, he’s trying to handle HTTP differently. There’s no direct HTTP contact in most of the code base. Everything that eventually ends up in the HTTP layer is implemented as some sort of “type object”. This allows them to be really flexible in the HTTP layer: support for different input/output formats, easier testing, auto-generated documentation, and lots of common errors that can be caught early.

A basic rule is to be strict in what you send, but generous in what you receive. But web Python code is often generous by “just” accepting a lot without much checking. That might be a security risk. In Armin’s system, you know what type should be coming in, so you can do proper checking.
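I don't know what Armin's type objects actually look like, so the following is just a guess at the general idea using a standard-library dataclass. `CreateUser`, its fields and `ValidationError` are invented for the example. The point is that the expected type is declared once and everything that doesn't match is rejected before the rest of the code sees it.

```python
from dataclasses import dataclass


class ValidationError(ValueError):
    """Raised when incoming data does not match the declared type."""


@dataclass(frozen=True)
class CreateUser:
    """Hypothetical input type: what one API call expects to receive."""
    name: str
    age: int

    @classmethod
    def from_data(cls, data):
        if not isinstance(data, dict):
            raise ValidationError("expected an object")
        # Be strict about unexpected input instead of silently ignoring it.
        unknown = set(data) - {"name", "age"}
        if unknown:
            raise ValidationError(f"unknown fields: {sorted(unknown)}")
        name = data.get("name")
        age = data.get("age")
        if not isinstance(name, str) or not name:
            raise ValidationError("name must be a non-empty string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(age, int) or isinstance(age, bool) or age < 0:
            raise ValidationError("age must be a non-negative integer")
        return cls(name=name, age=age)


if __name__ == "__main__":
    print(CreateUser.from_data({"name": "Armin", "age": 30}))
```

Because the check lives in the type and not in an HTTP view, the same object can be tested without any request or response in sight.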

How does this deal with the big-upload problem? Incoming streaming data? Well, because of the type system, you actually know which types need a streaming API. This makes it easy to set up your API correctly. You can even selectively use a different protocol than HTTP.
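Again a guess at how "knowing which types need a streaming API" could be expressed: mark the relevant field in the type itself and let the transport layer look that up. This uses the `metadata` argument of `dataclasses.field`; the `streamed` key, `AvatarUpload` and `streamed_fields` are assumptions of mine, not part of the system described in the talk.

```python
import io
from dataclasses import dataclass, field, fields


@dataclass
class AvatarUpload:
    """Hypothetical input type with one small field and one large one."""
    user_id: int
    # Marked as streamed: the transport layer should hand over a file-like
    # object instead of buffering the whole upload.
    image: io.IOBase = field(metadata={"streamed": True})


def streamed_fields(type_):
    """Return the names of the fields that were declared as streamed."""
    return [f.name for f in fields(type_) if f.metadata.get("streamed")]


if __name__ == "__main__":
    print(streamed_fields(AvatarUpload))
```

With the information in one place, a layer that speaks HTTP (or some other protocol) can decide per field whether to buffer or to stream.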

 
Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
