Python Leiden (NL) meetup: serialisation in Python - John Labelle

Tags: pun, python

(One of my summaries of the second Python Leiden (NL) meetup in Leiden, NL).

Nice subtitle for the talk: “python serialisation: this ain’t your father’s cruesli”… :-)

He wants to show us how dangerous it is to deserialise content provided by someone else. His examples are at https://github.com/airza/deserialization_labs

Serialisation: converting a data structure living in memory into data you can store on disk or send over the network. Deserialisation is converting it back into a python object. There are interoperable formats like json and xml. Most languages also have their own specific methods: python has pickle.

Serialising a dict or list is often easy: json.dumps({"some": "structure"}). But what if you’ve got some non-standard data structure like a python object? json serialisation won’t work out of the box. And if you’ve got huge data structures, json (being human-readable) is slow and the output is huge.
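To make the "won't work out of the box" part concrete, here is a minimal sketch of my own. The Cat class and the to_jsonable helper are made up for illustration, they are not from the talk.

```python
import json


class Cat:
    """Hypothetical example class, not from the talk."""

    def __init__(self, fur, brain):
        self.fur = fur
        self.brain = brain


def to_jsonable(obj):
    """Fallback for json.dumps: turn a Cat into a plain dict."""
    if isinstance(obj, Cat):
        return {"fur": obj.fur, "brain": obj.brain}
    raise TypeError(f"Cannot serialise {type(obj).__name__}")


cat = Cat("Fluffy", "Empty")

# Plain dicts and lists work directly.
plain = json.dumps({"some": "structure"})

# A custom object raises TypeError unless you tell json what to do with it.
try:
    json.dumps(cat)
    failed = False
except TypeError:
    failed = True

# The default= hook is called for every object json cannot handle itself.
encoded = json.dumps(cat, default=to_jsonable)
```

Note that loading the result with json.loads gives you back a dict, not a Cat: re-creating the object is your own job, which is exactly the convenience pickle offers.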

Pickle stores python objects in a binary format on disk. Fun fact: pickle was added to python in 1995, json has only existed since 2006. I’ll paste one of his examples to show how pickle works:

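import pickle
from tsukimi import Tsukimi  # example class used in the talk

cat = Tsukimi("Fluffy", "Empty")
# Write the object to disk in pickle's binary format.
with open("tsukimi.pickle", "wb") as f:
    pickle.dump(cat, f)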

Deserialising works like this:

import pickle

# The Tsukimi class must be importable here: the pickle only holds its name.
with open("tsukimi.pickle", "rb") as f:
    cat = pickle.load(f)
print(cat.fur)
print(cat.brain)

Pickle just stores the name of the class it needs to re-create plus the contents of the attributes. The methods themselves are not stored: the class has to be importable when you load the pickle.
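You can check this yourself by peeking into the bytes. A small sketch of my own: I'm using types.SimpleNamespace from the standard library as a stand-in for the Tsukimi class, so that it runs without the lab repository.

```python
import pickle
import pickletools
from types import SimpleNamespace

# SimpleNamespace is a stand-in for the talk's Tsukimi class (an assumption
# made so that this snippet runs on its own).
cat = SimpleNamespace(fur="Fluffy", brain="Empty")
data = pickle.dumps(cat)

# The byte stream refers to the class by module and name only...
has_class_name = b"types" in data and b"SimpleNamespace" in data
# ...and contains the attribute values.
has_attributes = b"Fluffy" in data and b"Empty" in data

# Human-readable listing of the pickle opcodes.
pickletools.dis(data)

restored = pickle.loads(data)
```

The pickletools.dis() listing shows the module and class name as plain strings, followed by the attribute dict. There is no code in there.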

Pickle is explained here: https://docs.python.org/3/library/pickle.html . It has a nice warning right at the top: Warning: The pickle module is not secure. Only unpickle data you trust.

Pickle stores all attributes by default. If you don’t want that, you can define a special __reduce__() method that specifies just the attributes you want and the callable (normally your class) that can restore them. But… that callable is just looked up by name, there’s no validation. So you can also return something that’s not your class, but something like os.system, which runs whatever you pass it on the command line…:

import os
import pickle

class EvilCat:
    def __reduce__(self):
        # On load, pickle calls os.system('export > version.txt').
        return os.system, ('export > version.txt',)

evil = EvilCat()
with open("evil.pickle", "wb") as f:
    pickle.dump(evil, f)

If the code that loads this pickle reads the version.txt (as in the exercise that he had us run), you suddenly see all the server’s environment variables.
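If you want to try the mechanism without running shell commands, here is a harmless variant of my own (the Harmless class is made up). It uses the built-in len() instead of os.system.

```python
import pickle


class Harmless:
    """Hypothetical class: the same trick, with len() instead of os.system."""

    def __reduce__(self):
        # (callable, args): pickle.loads will call len("hello").
        return len, ("hello",)


payload = pickle.dumps(Harmless())

# Loading does not give back a Harmless instance; it gives back whatever
# the callable returned.
result = pickle.loads(payload)
```

So the callable runs during pickle.loads() itself. The loading code doesn't need to touch the result for the damage to be done.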

So: never let people give you pickles. Use json for user input. Or protobuf.

PyTorch (a pydata library) uses pickles. They recently started overriding the unpickler’s functionality to restrict what can be loaded, but he showed some ways to get around its “limitations”.
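For reference, overriding the unpickler looks roughly like the sketch below. This is not PyTorch's code: it follows the "restricting globals" pattern from the python pickle documentation, and the ALLOWED list and restricted_loads helper are my own assumptions. Given the bypasses shown in the talk, treat it as an illustration, not as a guarantee.

```python
import io
import os
import pickle

# Hypothetical allow-list: the only (module, name) pairs a pickle may reference.
ALLOWED = {("builtins", "set"), ("builtins", "frozenset")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global such as os.system.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but only allow-listed globals may be looked up."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class EvilCat:
    def __reduce__(self):
        return os.system, ("echo pwned",)


# Plain data and allow-listed globals still load.
ok = restricted_loads(pickle.dumps({"fur": "Fluffy", "toys": {1, 2}}))

# The os.system payload is rejected before anything runs.
try:
    restricted_loads(pickle.dumps(EvilCat()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```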

He recommended looking at https://github.com/b4rdia/HackTricks/tree/master/generic-methodologies-and-resources/python/bypass-python-sandboxes

 

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
