Pygrunn: Understanding PyPy and using it in production - Peter Odding/Bart Kroon

Tags: python, pygrunn

(One of my summaries of the one-day 2016 PyGrunn conference).

PyPy is “the faster version of python”.

There are actually quite a lot of python implementations. cpython is the main one. Some of the others come with a JIT compiler, and PyPy is one of them. It is by far the most mature. PyPy is a python implementation, compliant with 2.7.10 and 3.2.5. And it is fast!

Some advantages of pypy:

  • Speed. There are a lot of automatic optimizations. It didn’t use to be fast, but for about five years now it has actually been faster than cpython! It has a “tracing JIT compiler”.

  • Memory usage is often lower.

  • Multi core programming. Some stackless features. Some experimental work has been started (“software transactional memory”) to get rid of the GIL, the infamous Global Interpreter Lock.

What does having a “tracing JIT compiler” mean? JIT means “Just In Time”. It runs as an interpreter, but it automatically identifies the “hot path” and optimizes that a lot by compiling it on the fly.
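To make the “hot path” idea a bit more concrete, here is a minimal sketch of the kind of code where a tracing JIT typically pays off: a tight pure-python loop that runs many times. The function name and the loop size are my own illustration, not something from the talk. Run the same file with both python and pypy and compare the timings. The numbers will differ per machine, so none are given here.

```python
import platform
import timeit


def sum_of_squares(n):
    """Pure-python hot loop: the same few bytecodes run n times."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # default_timer picks a suitable wall-clock timer on python 2 and 3.
    start = timeit.default_timer()
    result = sum_of_squares(5000000)
    elapsed = timeit.default_timer() - start
    # python_implementation() returns e.g. "CPython" or "PyPy".
    print("{}: result={} in {:.3f}s".format(
        platform.python_implementation(), result, elapsed))
```

The print line also shows which interpreter actually ran the script, which is handy when both are installed.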

PyPy itself is written in RPython, a statically typed subset of python that is translated to C and then compiled to produce the interpreter. RPython also provides a framework for writing interpreters. “PyPy” really means “Python written in Python”.

How to actually use it? Well, that’s easy:

$ pypy your_python_file.py

Unless you’re using C modules. Lots of python extension modules use C code that compiles against CPython… There is a compatibility layer, but that covers only 40-60% of the cases. Ideally, all extension modules would use “cffi”, the C Foreign Function Interface, instead of “ctypes”. CFFI provides an interface to C that allows lots of optimizations, especially by pypy.
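As a rough illustration of what calling C through cffi looks like, here is a small sketch that calls strlen() from the standard C library. It assumes the cffi package is installed and a POSIX system (on Windows, dlopen(None) does not work this way). The c_strlen helper is a name I made up for the example.

```python
def c_strlen(data):
    """Return the length of a bytes object as measured by C's strlen()."""
    # Imported here so the file still loads where cffi is not installed.
    from cffi import FFI

    ffi = FFI()
    # Declare the C function we want to call, as in <string.h>.
    ffi.cdef("size_t strlen(const char *s);")
    # dlopen(None) opens the standard C library namespace on POSIX systems.
    libc = ffi.dlopen(None)
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"pypy"))
```

This is cffi’s simplest mode, where the library is loaded at runtime. Real extension modules usually declare many more functions and build a compiled module ahead of time.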

Peter and Bart work at paylogic, a company that sells tickets for big events. So you have half a million people trying to get a ticket to a big event, opening multiple browsers to improve their chances. “You are getting DDOSed by your own customers”.

Whatever you do: you still have to handle 500000 pageviews in just a few seconds. The solution: a CDN for the HTML and only small JSON requests to the servers. Even then you still need a lot of servers to handle the JSON requests. State synchronisation was a problem, as in the end you still had one single server for that single task.

Their results after using pypy for that task:

  • An 8-fold improvement. Initially 4x, but pypy has been optimized since, so they got an extra 2x for free. So: upgrade regularly.

  • Real savings on hosting costs

  • The queue has been tested to work for at least two million visitors now.

Guido van Rossum supposedly says “if you want your code to run faster, you should probably just use PyPy” :-)

Note: slides are online

 
Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
