Djangocon: strategies to edit production data - Julie Qiu

Tags: djangocon, django

(One of my summaries of a talk at the 2018 European Djangocon.)

She works on the catalog team at “spring”, a clothing website.

Internal tools are often not available. There are always edge cases. Time-sensitive changes are sometimes needed (“right now”).

You could just do a quick SQL query in the database. Normally, a colleague will look over your shoulder and double-check what you’re about to type. But when it is Friday afternoon and your highest-paying client wants a last-minute change… They do have a collection of horror stories…

Here are some strategies:

  • Develop a review process for manual edits. What they’ve done is to create a spreadsheet. In it, you write your name, the SQL code, what you want it to do and who you want to review it. Only after the review are you allowed to run it on the server.

    The advantage is that it is easy to implement and that you get an audit trail. You also teach engineers what the right thing to do is.

    A disadvantage is that you can still make mistakes. And it is fine for smaller changes, but not really for elaborate SQL and long-running queries.

  • Write scripts and run them locally. Write a Python script to make the change. Add command-line arguments so that you can re-use the script. Then you have to connect it to the database and run it.

    Advantage: it is also fine for more complex changes.

    Disadvantage: you run it locally, so logs are only available locally. You can still make mistakes. The scripts only live on your own machine: there’s no review for them. And you can have connection issues.

  • You can also run the scripts on an existing server. This way, you generally don’t have the connection issues. You do have to run it in ‘screen’, so that the script keeps running if your SSH connection drops.

    After writing the script, you have to get the script onto a server. SSH there and run it inside a screen session. Julie normally runs it on the Jenkins machine. But… one of her scripts once ate up all CPU resources, so Jenkins was down…

    Advantage, in general: you can have long-running scripts. And you have a much more reliable network connection.

    Disadvantage: you can affect the resources on the server. And you have to copy your script to the server.

  • Use a task runner. You can use Jenkins to run scripts.

    Now you have to get your script reviewed like the rest of your code. The latest version is automatically on Jenkins. Jenkins provides a way to pass arguments to such a script.

    A big advantage: the output of the run is stored in Jenkins. You have an audit trail. And you have code review.

    Disadvantage: it is hard to manage credentials. Also: you apparently can connect to your production database from your Jenkins test environment. This is asking for accidents to happen.

  • Then she decided to write a (Jenkins) “script runner” service. That way, it was customizable.

    Again: write the script, get code review and run tests. Then you can run it with a nice user interface in Jenkins. The custom script runner could be pre-configured with the various configs (dev, staging, production), so that managing the credentials was easy.
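To make the "Python script with command-line arguments" idea a bit more concrete, here is a minimal sketch of my own (it is not code from the talk). Everything in it is an assumption for illustration: the `product` table with `id` and `name` columns, the `DATABASES` mapping, the `--commit` flag and the use of sqlite3 as a stand-in for the real database. The idea is that the script defaults to a dry run and only writes when exactly one row matches.

```python
import argparse
import sqlite3

# Hypothetical database locations; they stand in for the pre-configured
# dev/staging/production configs and are not from the talk.
DATABASES = {
    "dev": "dev.sqlite3",
    "staging": "staging.sqlite3",
    "production": "production.sqlite3",
}


def parse_args(argv=None):
    """Command-line arguments make the script re-usable for other rows."""
    parser = argparse.ArgumentParser(description="Rename one product.")
    parser.add_argument("--env", choices=sorted(DATABASES), default="dev")
    parser.add_argument("--product-id", type=int, required=True)
    parser.add_argument("--new-name", required=True)
    parser.add_argument(
        "--commit",
        action="store_true",
        help="really write the change; the default is a dry run",
    )
    return parser.parse_args(argv)


def rename_product(conn, product_id, new_name, commit=False):
    """Rename one product and return the number of rows the UPDATE matched.

    The change is rolled back unless commit is True and exactly one row
    matched, so a wrong id cannot silently change anything.
    """
    cursor = conn.execute(
        "UPDATE product SET name = ? WHERE id = ?", (new_name, product_id)
    )
    matched = cursor.rowcount
    if commit and matched == 1:
        conn.commit()
    else:
        conn.rollback()
    return matched


def main(argv=None):
    args = parse_args(argv)
    conn = sqlite3.connect(DATABASES[args.env])
    try:
        matched = rename_product(
            conn, args.product_id, args.new_name, commit=args.commit
        )
    finally:
        conn.close()
    mode = "committed" if args.commit and matched == 1 else "rolled back"
    # When run by a task runner, this output ends up in the stored log.
    print(f"{args.env}: {matched} row(s) matched, {mode}")
```

To use it as a script, you would call `main()` under the usual `if __name__ == "__main__":` guard and run it first without `--commit` to see how many rows match. Keeping `rename_product` separate from the argument parsing means it can be tested against a throwaway database before anyone runs it on production.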

https://abload.de/img/screen_shot_2016_02_042jjd.png

Photo explanation: constructing a viaduct module (which spans a 2m staircase) for my model railway in my attic.

 
Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
