(One of my summaries of the Dutch PyUtrecht meetup in Utrecht, NL).
Full title: this is FFIne: building foreign function interfaces without shooting yourself in the foot.
He works at Channable (the host). They have both Haskell and Python codebases; Haskell is used for the more performance-critical code. They looked into using Haskell from within Python through a foreign function interface (FFI). In some cases Haskell is faster. It is also handy for circumventing the “global interpreter lock” (GIL). And sometimes there’s a library for which there is no good alternative.
Just so you know: many of the libraries you commonly use actually use another more system-oriented language under the hood.
You need a host language: Python, of course. And an embedded language that can compile
to a dynamic library, such as C, Haskell and others. And third, you need Python’s ctypes
module. With ctypes you can load external libraries, including some necessary
bookkeeping like declaring input/output types. Behind the scenes, ctypes treats
everything like C, including the problems with memory management, datatype mapping
problems and undefined behaviour…
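To make the ctypes bookkeeping concrete, here is a minimal sketch that is not from the talk. It loads the C standard library (an assumption: this works on Linux and macOS, not on Windows) and calls strlen after declaring its input and output types.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library() may return None, in which case
# CDLL(None) exposes the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# The bookkeeping: tell ctypes which C types go in and which type comes out.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"PyUtrecht")
print(length)
```

Without the argtypes/restype declarations, ctypes falls back to guessing C types, which is one of the places where the datatype mapping problems come from.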
The alternative to including such a library via ctypes would be to run the other code in its own web server and call it: much easier, but much more overhead.
So… he tried his hand at a simpler FFI. One without C. (Note: if you want to bind to Rust, use https://github.com/PyO3/pyo3). His approach uses several layers.
The first layer of the solution is to have only one kind of item that’s passed through
the boundary: a ByteBox, which is a pointer to a list of bytes plus the length. When
you call a foreign function from Python, you call it with two ByteBoxes: one for your
input and one for the called function’s output. Going one way is called “lowering”,
the other way “lifting”.
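A rough sketch of what such a ByteBox could look like on the Python side. The field names and the lower/lift helpers are hypothetical; the real layout in virgil-ffi may well differ.

```python
import ctypes


class ByteBox(ctypes.Structure):
    # Hypothetical layout: pointer to the first byte, plus the number of bytes.
    _fields_ = [
        ("data", ctypes.POINTER(ctypes.c_ubyte)),
        ("length", ctypes.c_size_t),
    ]


def lower(payload: bytes):
    """Copy Python bytes into a C buffer and wrap it in a ByteBox."""
    buffer = (ctypes.c_ubyte * len(payload)).from_buffer_copy(payload)
    box = ByteBox(ctypes.cast(buffer, ctypes.POINTER(ctypes.c_ubyte)), len(payload))
    # Return the buffer too, so the caller keeps the memory alive explicitly.
    return box, buffer


def lift(box: ByteBox) -> bytes:
    """Copy the bytes a ByteBox points at back into a Python bytes object."""
    return ctypes.string_at(box.data, box.length)


box, keepalive = lower(b"hello")
print(box.length, lift(box))
```

The sketch only shows the round trip within Python. Who allocates and frees the output ByteBox across the language boundary is exactly the kind of memory management question the real library has to answer.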
How can we go from generic values to bytes? The next layer is serialisation/deserialisation, which is included in most web frameworks or database layers. Like converting something to/from JSON. Or, better, a binary format for performance reasons.
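As an illustration of this layer, a small sketch using JSON from the standard library. The function names are made up for this example, and the talk suggests a binary format would be the faster choice.

```python
import json


def serialise(value) -> bytes:
    """Turn a JSON-compatible Python value into bytes for the boundary."""
    return json.dumps(value).encode("utf-8")


def deserialise(payload: bytes):
    """Turn bytes received from the other side back into a Python value."""
    return json.loads(payload.decode("utf-8"))


arguments = {"name": "channable", "retries": 3}
wire_bytes = serialise(arguments)
print(wire_bytes)
```

The bytes produced here are what would go into the input ByteBox. The other language needs a matching decoder for the same format.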
Layer 3: exceptions. In many languages an exception is basically a name, a message, a
cause and a traceback. He treats it as a list of those items. That way he can convert
exceptions between languages. The tblib “traceback library” helps with that.
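A sketch of the "exception as a list" idea. To keep it self-contained it uses the standard library's traceback module instead of tblib, and the ForeignError class and both helper functions are hypothetical names, not the ones from the talk.

```python
import traceback


class ForeignError(Exception):
    """Hypothetical stand-in for an exception that came from the other language."""

    def __init__(self, name, message, tb_lines):
        super().__init__(f"{name}: {message}")
        self.name = name
        self.message = message
        self.tb_lines = tb_lines


def exception_to_list(exc):
    """Flatten an exception into [name, message, cause, traceback lines]."""
    cause = exc.__cause__
    return [
        type(exc).__name__,
        str(exc),
        exception_to_list(cause) if cause is not None else None,
        traceback.format_tb(exc.__traceback__),
    ]


def list_to_exception(fields):
    """Rebuild a generic exception from the flattened list."""
    name, message, cause, tb_lines = fields
    error = ForeignError(name, message, tb_lines)
    if cause is not None:
        error.__cause__ = list_to_exception(cause)
    return error


try:
    try:
        raise RuntimeError("inner problem")
    except RuntimeError as inner:
        raise ValueError("bad input") from inner
except ValueError as exc:
    fields = exception_to_list(exc)

rebuilt = list_to_exception(fields)
print(rebuilt, "caused by", rebuilt.__cause__)
```

Because the list only contains strings, nested lists and None, it can go through the same serialisation layer as ordinary values.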
Layer 4: uncurrying to pass everything properly to the actual function.
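Uncurrying here means turning a function of several arguments into one that takes a single bundle of arguments, which is what arrives after deserialisation. A minimal sketch of how that could tie the layers together, with JSON standing in for the wire format and all names (uncurry, export, add) invented for this example:

```python
import json


def uncurry(func):
    """Wrap func so it accepts one sequence instead of separate arguments."""

    def call_with_sequence(args):
        return func(*args)

    return call_with_sequence


def export(func):
    """Give func a bytes-in, bytes-out signature suitable for the boundary."""
    uncurried = uncurry(func)

    def bytes_in_bytes_out(payload: bytes) -> bytes:
        args = json.loads(payload.decode("utf-8"))  # deserialise the argument list
        result = uncurried(args)  # spread the list over the real parameters
        return json.dumps(result).encode("utf-8")  # serialise the result

    return bytes_in_bytes_out


def add(x, y):
    return x + y


exported_add = export(add)
print(exported_add(b"[2, 3]"))
```

The actual function stays an ordinary multi-argument function; only the exported wrapper knows about bytes.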
He did all this in a day during a hackathon. It is reasonably production ready and reasonably performant. The code (and link to the slides) is at https://github.com/channable/virgil-ffi
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.