This is just a simple tale of making a specific Django app faster. Just a tale of the steps I took to give you an idea of what you can do to make your app faster. This isn’t the definitive guide, it just aims at giving ideas and giving a feel for the process.
First measure, then optimize.
Django’s caching helps.
The app is lizard-fewsjdbc. It is on github, though we’re basically the only ones using it. The basic setup is that we’ve got an external special-kind-of-database that talks xml-rpc to the outside world. Over the xml-rpc we can send SQL queries to the database that in turn uses a jdbc connection to execute the query. Don’t ask.
So my job was three-fold:
Speed up the request so it takes less than 30 seconds.
Use the Douglas-Peucker algorithm (explained below) to send over less data to the browser so that it doesn’t become unbearably slow.
When zooming in to a Douglas-Peuker-simplified-graph, dynamically grab new data to get the detail back when zoomed in enough. And don’t take as much time as the original request!
Douglas-Peucker is an algorithm that takes a line and tries to simplify it. It calculates which points it can leave out without changing the way the line looks. Take a look at the wikipedia animated gif and it should be clear.
The algorithm itself was used in two other projects, so I just copied the code. Copying? Yeah, the project’s branch I’m using it in is already a mess (and about to be phased out at the end of the year), so copying was easier than making something nice and reusable… From the comment at the top of the file, some of the code is originally by Enthought. If interested, the code is here.
The only thing I needed to do was to configure the algorithm. There’s an “epsilon” in the main method: the allowable deviation. I simply made an assumption that the graph would be 2000x2000 pixels (bigger than would occur in real life), the allowed deviation would then be one pixel: you cannot see smaller deviations, obviously.
So for purpose of the calculation I simply resized the x and y data to range from 0 to 2000 :-) This allowed me to use the algorithm as-is without any fancy extra tweaks.
Sometimes the algorithm has to run twice or even three times before the number of points went down enough. Each time I’d increase the allowed epsilon.
Typical numbers? From 250000 points to 5000 and in the second step to 1800. And pretty quick, too! Grabbing the data would take for instance 45 seconds, Douglas-Peucker 1.5 seconds. A small price to pay for not blowing up the browser with 250000 data points!
Grabbing the original data still took a lot of time. And after zooming, the Flot graph would request a fresh set of data to keep the level of detail up to scratch (we remove detail with Douglas-Peucker, remember?). Again a lot of time wasted.
For that, there’s django’s build-in caching mechanism. Request the data once and cache it for further use. You need to have a cache key (data ID plus start and end date of the range, basically) and an expiration time. I used a couple of tricks:
If the period is long (more than two months), I can cache for a couple of hours. You won’t see it in the graph anyway if new data shows up every minute as that’s simply not visible at this scale. If the period is shorter, cache for 15 minutes. If the period is a day or less, cache for only a minute.
For the longer periods, say a month, don’t use “now minus a month till now” as the range. If we’re a minute later, “now” will be a minute later so the range will be different so the cache will be invalid.
So I rounded down “now” to midnight of the next day and rounded “now minus a month” up till midnight of the previous day.
Actually, for bigger periods I even rounded up/down to the nearest month to get even more benefit out of my cache.
The code isn’t pretty, but it is here if you want to take a peek.
I had one problem: for bigger periods, the amount of data would be more than one megabyte. And that’s more than what django’s preferred memcached cache can handle by default. So I set up a separate disk cache for this. Redis or so would have been better, but I didn’t want to set it up right now. Disk cache was fast enough for my purpose, anyway.
One extra cache advantage: even when the browser connection would time out after 30 seconds, the query would continue running, eventually depositing the result in the cache. A browser refresh would then grab the data from the cache instead of ending in an unavoidable timeout again and again and again.
The result: something that originally took more than a minute to grab, suddenly only takes a second when grabbing it from the disk cache! It doesn’t fix the initial slowness, but it surely improves the subsequent requests (and especially the extra requests that come from zooming in and out on the data).
All during this work, I also looked at the raw speed of those initial requests. The basic rule of making something faster: measure first, then optimize. Don’t fix something blindly, but measure first. Otherwise you’re fixing the wrong things. And often there’s something completely unexpected that’s taking a long time: you’d miss this if you wouldn’t measure.
All the other improvements are fun and well, but actually grabbing the original data still took way too much time. Half a million data points could take up to two minutes. I asked a colleague to look in the logs of of the actual database and the maximum he could find was 30 seconds. So… something inside our own python was taking a lot of time!
First things first: I added logging all over the place with timers:
import time def takes_long(): # Pseudo code t1 = time.time() do_xmlrpc_query() t2 = time.time() convert_the_data_we_got_back() t3 = time.time() douglas_peucker() t4 = time.time() logger.debug("xmlrpc took %s seconds, ... %s... %s", t2 - t1, t3 - t2, t4 - t3)
This was where I found out that Douglas-Peucker was very quick. Not worth optimizing.
Converting the data was something else. In one of the cases, the query would take 20 seconds and converting the data 60 seconds. That’s weird, almost the only thing that happens in that code is converting a date string into a proper datetime in a specific timezone. Weird.
This turned out to be happening in a very ineffective way, partially because it used the nice python iso8601 library to parse the datestrings. Which, in this case, was overkill as the date format was always the same. Switching to a simple datetime call was all it took:
datetime_format = "%Y%m%dT%H:%M:%SZ" timestamp = datetime.datetime.strptime( timestamp_string, datetime_format).replace( tzinfo=pytz.UTC)
From 20 seconds (query) plus 60 seconds (conversion) it went to 20 seconds (query) plus 3 seconds (conversion). Almost four times quicker. I sure didn’t originally think this to be the spot where I could make the most improvements! To me, it really reinforced the measure first, then optimize mantra.
It still wasn’t quite quick enough. I then looked at the xmlrpc call. We were already using a slightly customized base class of one of python’s build-in xmlrpclib classes (a small modification to add a timeout). The modification made it easy to add…. logging, of course! Time the sucker.
Turned out, transferring half a million datapoints over xmlrpc isn’t that good an idea. xmlrpc means “XML”. So for one single pair of datetime+value you’d get a payload of 50 characters or so. A query results in 25 megabytes of XML this way. Which takes a measurable amount of time transferring over the wire. Which takes an even greater time parsing!
To add insult to injury, xmlrpclib turned out to convert the date string in the xml into a proper Python datetime, which would then get converted into a datetime string again before ending up in my nicely fixed-up datetime string parsing code… I could weep when I figured that one out.
Solution: brute force it.
Read the full response into a string.
Cut out the XML parser.
Feed the string to a regex that extracts the datetime and the value. (See here)
Use the fast build-in datetime parser.
In the end, this surprisingly didn’t actually make it faster for the big 25 MB responses. But for the smaller ones, it made it 2-4 times faster again. It is good enough now. If I have to re-visit it I think doing some search/replacing to make the 25 MB string smaller before feeding it to the regex might be worthwile. My feeling says that the regex is a bit sensitive to such long strings. (Or I might have to toy with greedy/non-greedy regexes).
The whole build-up of the application is borked. xmlrpc isn’t the best solution for this. Json might be better. And as much calculation as possible should be pushed towards the database instead of doing it inside the django app.
By careful measuring and digging into the code for a couple of days, I managed to make the initial request a lot quicker. At a minimum 4x as fast. For certain requests up to 10x.
Caching gives another 20x speed boost (at least) for those requests where it applies. Careful cache key and expiration time tuning makes sure a lot of requests hit the cache.
The Douglas-Peucker algorithm limits the amount of data that gets fed to the browser. This makes it possible to view huge data sets.
In the end… I’m happy with it. I got to dive deeply in to some code and managed to emerge victoriously. And I had fun :-)
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.
Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):