PyUtrecht: FFIne, building foreign function interfaces - Marten Wijnja

Tags: pun, python

(One of my summaries of the Dutch PyUtrecht meetup in Utrecht, NL).

Full title: this is FFIne: building foreign function interfaces without shooting yourself in the foot.

He works at Channable (the host). They have both Haskell and Python codebases; Haskell is used for the more performance-critical code. They looked into calling Haskell from within Python through a foreign function interface (FFI). In some cases Haskell is faster. It is also a handy way to circumvent the “global interpreter lock” (GIL). And sometimes there’s a library for which there is no good alternative.

Just so you know: many of the libraries you commonly use actually use another more system-oriented language under the hood.

You need three things. First, a host language: Python, of course. Second, an embedded language that can compile to a dynamic library, such as C, Haskell and others. And third, Python’s ctypes module. With ctypes you can load external libraries, including some necessary bookkeeping like declaring input/output types. Behind the scenes, ctypes treats everything like C, including the problems with memory management, datatype mapping and undefined behaviour…
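To make the ctypes part concrete, here is a minimal sketch (mine, not from the talk). It assumes a POSIX system where the C standard library can be found, and it uses C's `strlen` purely as a stand-in for "some function in a dynamic library". The `c_strlen` wrapper name is made up for the example.

```python
import ctypes
import ctypes.util

# Load the C standard library. If find_library() returns None, CDLL(None)
# falls back to the symbols of the running process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# The "bookkeeping": tell ctypes the input and output types, otherwise it
# assumes C ints everywhere and can silently truncate or corrupt values.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper: length of a NUL-terminated byte string."""
    return libc.strlen(data)
```

Forgetting the `argtypes`/`restype` lines is exactly the kind of foot-shooting the talk title hints at: the call still "works", but with C's default assumptions.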

The alternative to including such a library via ctypes would be to run the other code in its own web server and call it over the network: much easier, but with much more overhead.

So… he tried his hand at a simpler FFI. One without C. (Note: if you want to bind to Rust, use https://github.com/PyO3/pyo3). His approach uses several layers.

  • The first layer of the solution is to have only one kind of item that’s passed through the boundary: a ByteBox which is a pointer to a list of bytes and the length. When you call a foreign function from python, you call it with two ByteBoxes: one for your input and one for the called function’s output. Going one way is called “lowering”, the other way “lifting”.

  • How can we go from generic data to bytes? The next layer is serialisation/deserialisation, which is included in most web frameworks or database layers. Like converting something to/from json. Or, better, a binary format for performance reasons.

  • Layer 3: exceptions. In many languages an exception is basically a name, a message, a cause and a traceback. He treats it as a list. In that way he can convert between languages. The tblib “traceback library” helps with that.

  • Layer 4: uncurrying to pass everything properly to the actual function.
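The four layers can be sketched in pure Python. This is my own illustration, not the virgil-ffi API: the `ByteBox` field names, the `lower`/`lift`/`call` helpers and the `{"ok": ...}`/`{"error": [...]}` envelope are all assumptions, JSON stands in for a faster binary format, and a plain Python function plays the role of the foreign side.

```python
import ctypes
import json
import traceback


class ByteBox(ctypes.Structure):
    """Layer 1 (assumed layout): pointer to bytes plus their length."""
    _fields_ = [("data", ctypes.POINTER(ctypes.c_ubyte)),
                ("length", ctypes.c_size_t)]


def lower(value):
    """Layer 2, outbound: serialise a value and wrap the bytes in a ByteBox."""
    raw = json.dumps(value).encode("utf-8")
    buf = (ctypes.c_ubyte * len(raw)).from_buffer_copy(raw)
    box = ByteBox(ctypes.cast(buf, ctypes.POINTER(ctypes.c_ubyte)), len(raw))
    box._keepalive = buf  # keep the buffer alive as long as the box lives
    return box


def lift(box):
    """Layer 2, inbound: read the bytes out of a ByteBox and deserialise."""
    return json.loads(ctypes.string_at(box.data, box.length).decode("utf-8"))


def foreign_call(fn, in_box, out_box):
    """Stand-in for the embedded-language side of the boundary."""
    try:
        args = lift(in_box)
        reply = {"ok": fn(*args)}  # layer 4: unpack the argument list
    except Exception as exc:
        # Layer 3: an exception travels as [name, message, cause, traceback].
        reply = {"error": [type(exc).__name__, str(exc), None,
                           traceback.format_exc()]}
    result = lower(reply)
    out_box.data, out_box.length = result.data, result.length
    out_box._keepalive = result._keepalive


class ForeignError(Exception):
    """Hypothetical host-side exception for errors from the other side."""


def call(fn, *args):
    """Host side: one ByteBox for the input, one for the output."""
    in_box, out_box = lower(list(args)), ByteBox()
    foreign_call(fn, in_box, out_box)
    reply = lift(out_box)
    if "error" in reply:
        name, message, _cause, _tb = reply["error"]
        raise ForeignError(f"{name}: {message}")
    return reply["ok"]
```

For example, `call(lambda a, b: a + b, 2, 3)` round-trips both arguments and the result through bytes, and a failing function surfaces as a `ForeignError` on the host side. In a real FFI the hard part that this sketch skips is deciding which side frees the output buffer.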

He did all this in a day during a hackathon. It is reasonably production ready and reasonably performant. The code (and link to the slides) is at https://github.com/channable/virgil-ffi

 

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
