Django and the real-time web - Zachary Voase

Tags: django, djangocon

In July 2008, Internet Explorer had 70% of the browser market and Chrome 0%. Now they're both at 35%. That's one of the big reasons the real-time web can happen and is happening now. (Note: the real-time web isn't real-time in the classic sense of guaranteed response deadlines. Jacob actually mentioned that in his keynote.)

Developer time is precious. Do we run after all the new stuff or do we maintain the old?

The real-time web, as he uses the term, is about:

UI before technology. Does it feel better than native applications?
It is proactive instead of reactive. Effectively: push. You don’t have to refresh the page anymore.
It is synchronized with the “real world”.

Here are several ways we can look at it:

Model-view-controller (MVC) is the dominant UI design pattern. It has existed since 1978. (Note by Reinout: I've done a screencast of my take on Django-as-MVC.)

On the web, the model and controller live on the server, while the view is mostly in the browser. This is different from a regular desktop application. The main difference is that a web application is never really opened and closed; you just have individual requests. It feels different.
Next up: REST. It describes (rather than prescribes) how the web and the HTTP protocol are supposed to be used. The constraints of a REST interface: client-server, stateless, cacheable, layered system, code-on-demand, uniform interface.
Next thing you need to know about: web sockets. A web socket is a real, two-way TCP connection. You open it with a 'magic' HTTP upgrade request to port 80. It reduces latency and it enables push.
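The 'magic' request is the opening handshake as specified in RFC 6455: the client sends an ordinary HTTP GET with `Upgrade: websocket` and a random `Sec-WebSocket-Key`, and the server answers `101 Switching Protocols` with a `Sec-WebSocket-Accept` value derived from that key. A minimal sketch of that derivation, using the sample key from the RFC; the helper names `accept_key` and `handshake_response` are just for illustration:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute Sec-WebSocket-Accept for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(client_key: str) -> str:
    """Build the server's 101 response that upgrades the HTTP connection."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(client_key)}\r\n"
        "\r\n"
    )


# Sample key from RFC 6455, section 1.3.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this exchange the same TCP connection carries web socket frames in both directions, which is what makes server push possible.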

The problem: REST and web sockets don't match. For instance, web sockets are long-running connections, which isn't stateless, and a direct TCP connection can't be cached or layered.

You can try to put some state into a REST app by using a hashbang (#!) in the URL. But the only place a hashbang belongs is at the top of a script, as in #!/bin/sh. Twitter learned this the hard way.
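Part of why hashbang URLs fight REST: everything after the `#` is a fragment, and browsers never send it to the server. Python's standard `urllib.parse.urlsplit` shows the split; the example URL is made up:

```python
from urllib.parse import urlsplit

# A made-up hashbang URL in the style the talk criticizes.
url = "https://example.com/#!/someuser/status/42"
parts = urlsplit(url)

# Only the path (and query string) ends up in the HTTP request line.
print(parts.path)      # /
# The fragment never leaves the browser; client-side JavaScript must route on it.
print(parts.fragment)  # !/someuser/status/42
```

So as far as the server and any cache in between are concerned, every hashbang URL on such a site is the same resource, `/`.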

The basic point: once you violate REST somewhere, you automatically lose something, caching for instance.
A comparison: distributed versus centralized version control systems (VCS). Git is distributed, Subversion is centralized. The centralized systems have huge overhead: an svn log takes a lot of time because it hits the server. Git, Mercurial and bzr just read the history locally.

Synthesis of all these points: why not do MVC on both the client and the server, a bit like git's distributed model? Django on the server, Backbone.js or something similar on the client. But then you still need to keep them in sync.

Why not treat the client as an HTTP server? When the server receives a real PUT request from one client, it could send the other clients a JSON message that acts as a sort of PUT request. (Note from Reinout: this looks a bit like SOAP in practice. SOAP is basically HTTP requests mimicked/embedded in XML messages; this proposal embeds them in JSON.)
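One way to picture the proposal: the server handles a real PUT, then wraps the same method, URL, ETag and body in a JSON message and pushes it to every other client. This is a sketch under assumptions: the message shape and the `Hub` class are hypothetical, and plain Python lists stand in for the clients' web socket connections.

```python
import hashlib
import json
from typing import Dict, List


def make_etag(body: dict) -> str:
    """Derive a strong ETag from a canonical JSON serialization."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha1(canonical.encode("utf-8")).hexdigest() + '"'


class Hub:
    """Hypothetical server-side hub: stores resources, mirrors PUTs to clients."""

    def __init__(self) -> None:
        self.resources: Dict[str, dict] = {}
        # client id -> outbox; a real app would write to a web socket instead.
        self.outboxes: Dict[str, List[str]] = {}

    def connect(self, client_id: str) -> None:
        self.outboxes[client_id] = []

    def handle_put(self, sender: str, url: str, body: dict) -> str:
        """Apply a real PUT from one client, then push a 'sort-of PUT' to the others."""
        self.resources[url] = body
        etag = make_etag(body)
        message = json.dumps({
            "method": "PUT",
            "url": url,
            "headers": {"ETag": etag, "Content-Type": "application/json"},
            "body": body,
        })
        for client_id, outbox in self.outboxes.items():
            if client_id != sender:  # the sender already has this state
                outbox.append(message)
        return etag


hub = Hub()
for name in ("alice", "bob", "carol"):
    hub.connect(name)
hub.handle_put("alice", "/todos/1", {"title": "Buy milk", "done": True})
print(len(hub.outboxes["bob"]), len(hub.outboxes["alice"]))  # 1 0
```

Carrying the ETag inside the pushed message lets each receiving client decide whether its local copy of `/todos/1` is now stale.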

Read RFC 2616, the official HTTP/1.1 specification. Seriously. It will help you in your career. It will also help you with caching. Do you know what an ETag is? It will make your life easier. The same goes for Cache-Control headers.
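As a refresher on what an ETag buys you: the server tags each representation with a validator; a client that already holds a copy sends that tag back in `If-None-Match`, and if nothing changed the server answers `304 Not Modified` without a body. A framework-free sketch (Django ships its own `etag` and `condition` view decorators for this); `conditional_get` is a hypothetical helper that ignores weak validators and `*` for brevity:

```python
import hashlib
from typing import Dict, Optional, Tuple


def compute_etag(representation: bytes) -> str:
    """A strong ETag: a quoted hash of the exact bytes sent."""
    return '"' + hashlib.sha1(representation).hexdigest() + '"'


def conditional_get(
    representation: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = compute_etag(representation)
    headers = {
        "ETag": etag,
        # Caches may store the response but must revalidate before reusing it.
        "Cache-Control": "no-cache",
    }
    if if_none_match is not None:
        # The header may list several ETags separated by commas.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            return 304, headers, b""
    return 200, headers, representation


body = b'{"title": "Buy milk", "done": false}'
status, headers, _ = conditional_get(body, None)
print(status)                                           # 200
print(conditional_get(body, headers["ETag"])[0])        # 304
print(conditional_get(b"changed", headers["ETag"])[0])  # 200
```

Revalidation then costs one small round trip instead of a full download, which is what makes `no-cache` plus ETags cheap.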

With this in place, you can do normal HTTP conflict resolution. Does the URL still match? Is the ETag still valid? What instructions do the Cache-Control headers give me? Together they tell me whether my local copy is still valid or dirty (which means I have to update, delete or reload it).
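The same validators handle write conflicts. A client that edits its local copy sends `If-Match` with the ETag it last saw; if someone else changed the resource in the meantime the tags differ, and the server refuses with `412 Precondition Failed`, so the client knows its copy is dirty and must reload before retrying. A minimal in-memory sketch of this optimistic concurrency; the `Store` class is hypothetical:

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def etag_for(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha1(canonical).hexdigest() + '"'


class Store:
    """Hypothetical resource store that honours If-Match on PUT."""

    def __init__(self) -> None:
        self.data: Dict[str, dict] = {}

    def get(self, url: str) -> Tuple[int, Optional[str], Optional[dict]]:
        if url not in self.data:
            return 404, None, None
        body = self.data[url]
        return 200, etag_for(body), body

    def put(
        self, url: str, body: dict, if_match: Optional[str]
    ) -> Tuple[int, Optional[str]]:
        """Only write if the client's ETag matches the current version."""
        current = self.data.get(url)
        if if_match is not None:
            if current is None or etag_for(current) != if_match:
                return 412, None  # Precondition Failed: the client's copy is stale
        self.data[url] = body
        status = 200 if current is not None else 201
        return status, etag_for(body)


store = Store()
store.put("/todos/1", {"title": "Buy milk", "done": False}, None)
_, seen_by_a, _ = store.get("/todos/1")
_, seen_by_b, _ = store.get("/todos/1")

print(store.put("/todos/1", {"title": "Buy milk", "done": True}, seen_by_a)[0])      # 200
print(store.put("/todos/1", {"title": "Buy oat milk", "done": False}, seen_by_b)[0])  # 412
```

Client B now knows to fetch the latest version, merge or discard its edit, and retry with the new ETag.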

You need to think about conflict resolution if you write a real-time web app.

For this to work, there are some implications:

REST assumes orthogonality. If many of your resources are not orthogonal, use something other than HTTP.
Representations must be lossless.
Authentication and authorization will need to be on a case-by-case basis.
Costs. Writing a pub/sub web app takes a lot of time and effort, so it is expensive. How expensive depends on your domain: how can you sync? Who can see or edit what? There are tools to help you: AMQP, ZeroMQ, Django signals.
You need a resource-oriented client. REST works best that way.
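To make the "who can see what" cost concrete: once changes are pushed rather than pulled, every published event has to be filtered per subscriber. An in-process sketch of topic-based pub/sub with a per-subscriber permission check; in production a broker such as an AMQP server or ZeroMQ would do the delivery, and the `Broker` class and `owner_only` rule here are hypothetical:

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# Decides whether a given user may see a given resource URL.
Permission = Callable[[str, str], bool]


class Broker:
    """Topic-based pub/sub that checks permissions on every delivery."""

    def __init__(self, can_view: Permission) -> None:
        self.can_view = can_view
        self.subscribers: DefaultDict[str, List[str]] = defaultdict(list)
        self.inbox: DefaultDict[str, List[Dict[str, str]]] = defaultdict(list)

    def subscribe(self, user: str, topic: str) -> None:
        self.subscribers[topic].append(user)

    def publish(self, topic: str, url: str, change: str) -> int:
        """Deliver a change to permitted subscribers; return how many got it."""
        delivered = 0
        for user in self.subscribers[topic]:
            if self.can_view(user, url):
                self.inbox[user].append({"url": url, "change": change})
                delivered += 1
        return delivered


def owner_only(user: str, url: str) -> bool:
    # Example rule: /users/<name>/... is private to <name>.
    return url.startswith(f"/users/{user}/")


broker = Broker(owner_only)
broker.subscribe("alice", "todos")
broker.subscribe("bob", "todos")
print(broker.publish("todos", "/users/alice/todos/1", "updated"))  # 1
print(len(broker.inbox["bob"]))                                     # 0
```

The permission check runs per event and per subscriber at delivery time, which is where much of the extra cost of a pub/sub app comes from.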

Some barriers:

Django's ORM can get in the way. Pick a database and commit to it completely. Be opinionated. So if you use the Django ORM with PostgreSQL, make use of its triggers, for instance. Go all the way. Or go completely for CouchDB or something like that.
Django needs better content negotiation. We now often use URLs like /yourapp/API/something/, so wanting to talk to the API means using a different URL. This ought to be possible with the regular URL and proper content negotiation ("I want JSON instead of HTML").
Proxies and middleware. Varnish, memcached and so on help a lot to make regular web apps blindingly fast. Nothing comparable exists yet for web sockets.
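Content negotiation in this sense means one URL, with the `Accept` request header choosing between HTML and JSON. A deliberately simplified sketch of server-side negotiation: it handles q-values and `*/*` / `type/*` wildcards, but takes the highest matching q-value rather than applying HTTP's more-specific-range-wins rule. `negotiate` and its helpers are hypothetical, not a Django API:

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_range:
            ranges.append((media_range, q))
    return ranges


def matches(media_range: str, offered: str) -> bool:
    """True if an Accept media range such as text/* covers an offered type."""
    if media_range == "*/*":
        return True
    main, _, sub = media_range.partition("/")
    if sub == "*":
        return offered.startswith(main + "/")
    return media_range == offered


def negotiate(accept: str, offered: List[str]) -> Optional[str]:
    """Pick the offered type with the highest q-value; ties keep server order."""
    # A missing or empty Accept header means the client accepts anything.
    ranges = parse_accept(accept or "*/*")
    best, best_q = None, 0.0
    for candidate in offered:
        q = max((rq for r, rq in ranges if matches(r, candidate)), default=0.0)
        if q > best_q:
            best, best_q = candidate, q
    return best


OFFERED = ["text/html", "application/json"]
print(negotiate("application/json", OFFERED))                          # application/json
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8", OFFERED))  # text/html
print(negotiate("image/png", OFFERED))                                 # None
```

In a Django view you would pass it `request.META.get("HTTP_ACCEPT", "")` and render either the template or a JSON serialization from the same URL; a `None` result maps naturally to `406 Not Acceptable`.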