Djangocon: protecting personal data with django (because it’s the law) - Will Hardy

Tags: djangocon, django

(One of my summaries of a talk at the 2018 European djangocon.)

Will is a software developer with a law degree. Now that we have the GDPR, his law degree is suddenly very relevant. GDPR takes effect on 25 May 2018.

What is the GDPR? It is a law that regulates the use of personal data.

You’ll probably have had lots of emails from companies telling you that they’ll be good with your data and asking whether they’re still allowed to use it.

He encourages you to read the actual regulation. The first part is quite readable. The actual articles are quite detailed, but only the first 34 are relevant for us. He thinks we have a professional duty to be on top of this. We have to know about it.

As programmers, we’re in the front line. We might be the ones that can best advise the company on how to comply. We ought to know the details. If you help your company, you’re valuable to your company, so…

He has three categories in his talk: terms, rights, tasks.

Terms

  • Terms in the legal world aren’t defined as rigorously as in software standards. “Personal data is any information relating to an identified or identifiable natural person”. OK… is an IP address personal data, yes or no?

  • “Processing means…”: right, basically, it is everything you do with the data.

  • Data controller, that’s what you are when you know something in a professional context about someone else. Info between friends is OK, for instance.

  • Processor: someone who does something with personal data belonging to a “data controller”. Freelancers: this is for you.

  • Special categories: really watch out when you store things like religion or sexual orientation.

  • Profiling: also watch out. You might be getting too close to “special categories”.

Rights

  • Transparency. You now have the right to know what they know about you.

  • Access. You get to see what they know about you. As a data controller, you also have to make clear what you do with the data.

  • Rectification. Does the system allow itself to be changed or updated? If you have a django website with an admin, you’re fine.

  • Erasure. Deleting a user, can you do that? Does that include your backups?

  • Data portability. You can get your data in a structured, commonly used, machine-readable format. Django can help.

  • Restriction of processing. No deletion, but more “put me on hold”.

  • No automated decision-making. You have the right to be approved/disapproved by a human being.

  • Right of consent.
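
To make the data portability point a bit more concrete, here is a minimal sketch of what a “structured, machine readable” export could look like, using only Python’s standard library. The UserRecord class and its fields are made up for illustration; in a real Django project you would export your own models instead (Django ships serializers in django.core.serializers that can help there).

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class UserRecord:
    # Hypothetical record; a real project would use its own models.
    username: str
    email: str
    date_joined: date


def export_user_data(user: UserRecord) -> str:
    """Return everything we hold on one user as a JSON document."""
    data = asdict(user)
    # JSON has no date type, so dates are written as ISO 8601 strings.
    data["date_joined"] = user.date_joined.isoformat()
    return json.dumps(data, indent=2, sort_keys=True)


if __name__ == "__main__":
    user = UserRecord("alice", "alice@example.org", date(2018, 5, 25))
    print(export_user_data(user))
```

The same function could also back the “access” right: it is the one place that knows what is stored per user.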

Tasks for us

  • By design and default. Learn to do it properly. If you work with django, follow recommended django practices and feel that you’re behaving yourself, you’re probably OK.

    Important here is “data minimization”. Don’t pass along full user objects to other systems. Not even the user id. Generate a UUID or so.

    Separate personal data completely. “Pseudonymization”.

    For a medical database, does your database support staff need to see a person’s name? No. Only the doctor needs to know that. Then you might be better off encrypting the name.

  • Erasure. Can you split the backups? A separate one for personal data and one for the rest? That might make zapping personal data easier.

  • No discrimination. You can no longer discriminate on price based on the area where people live. If you have algorithms that make decisions, watch out for biases.

    Note: gender and age are not included here! So special prices for older or younger people are fine. But, again, watch out for indirect discrimination. There are other laws that you have to take into account.

    (See my summary of the great talk on biases)

    Your algorithms will get better because of it.

  • Explain machine learning. If you make an automatic decision, you might have to explain it. If it is an unclear pile of a neural net, it might be hard to explain…

  • Anonymization. True anonymization is rare. And hard. The question you have to ask is “is reidentification reasonably likely?”. And as a programmer, you’re probably the only person that can answer it.

    Again, anonymization is hard. You’ll probably have to get outside expert help.

  • Breach notification. If there is a breach, you have to report it. Otherwise you are liable. Even putting too many people in an email’s CC field could be a breach…
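
The data minimization advice above (hand other systems a generated UUID instead of the user object or user id) could look roughly like this. It is only a sketch: PseudonymRegistry and build_payload are names invented here, and the in-memory dict stands in for what would be a separate, access-restricted table in a real system.

```python
import uuid


class PseudonymRegistry:
    """Maps internal user ids to random UUIDs; keep this mapping private."""

    def __init__(self):
        self._by_user_id = {}

    def pseudonym_for(self, user_id: int) -> str:
        # uuid4 is random, so the pseudonym reveals nothing about the user id.
        if user_id not in self._by_user_id:
            self._by_user_id[user_id] = str(uuid.uuid4())
        return self._by_user_id[user_id]

    def forget(self, user_id: int) -> None:
        # Dropping the mapping removes the link stored in this registry.
        self._by_user_id.pop(user_id, None)


def build_payload(registry: PseudonymRegistry, user_id: int, order_total: float) -> dict:
    """Build the data sent to an external system: no name, email or user id."""
    return {"customer": registry.pseudonym_for(user_id), "order_total": order_total}


if __name__ == "__main__":
    registry = PseudonymRegistry()
    print(build_payload(registry, user_id=42, order_total=19.95))
```

Note that this is pseudonymization, not anonymization: as long as the mapping exists, the data can still be traced back to a person.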

What could django do?

  • Per-user encryption, so that deleting a user’s key makes that user’s encrypted personal data unreadable.

  • Documentation.

  • Tag personal data.
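
As an illustration of the “tag personal data” idea: this is not an existing Django feature, just a sketch of how such tagging might work, written with standard-library dataclasses so that it runs on its own. The “personal” metadata key, the Patient class and the helper functions are all hypothetical.

```python
from dataclasses import dataclass, field, fields, replace
from typing import Optional


def personal(default=None):
    """Declare a field that holds personal data (hypothetical convention)."""
    return field(default=default, metadata={"personal": True})


@dataclass
class Patient:
    record_id: int
    diagnosis_code: str
    name: Optional[str] = personal()
    email: Optional[str] = personal()


def personal_field_names(cls):
    """List the fields tagged as personal data, e.g. for documentation."""
    return [f.name for f in fields(cls) if f.metadata.get("personal")]


def erase_personal_data(obj):
    """Return a copy with every tagged field blanked out."""
    blanks = {name: None for name in personal_field_names(type(obj))}
    return replace(obj, **blanks)


if __name__ == "__main__":
    patient = Patient(1, "J45", name="Jane Doe", email="jane@example.org")
    print(personal_field_names(Patient))
    print(erase_personal_data(patient))
```

Once fields are tagged in one place, documentation, export and erasure can all be driven from the same list instead of being maintained by hand.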

The current situation isn’t clear yet. In a few years it probably will be.

https://abload.de/img/screen_shot_2016_02_007j20.png

Photo explanation: constructing a viaduct module (which spans a 2m staircase) for my model railway on my attic.

 

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
