(One of my summaries of a talk at the 2017 PyCon.de conference).
Lightning talks, so I probably won’t get all the names right. If you have additions/corrections, mail me :-)
Raphael develops a web application. A web app needs documentation. Documentation needs screenshots.
Making screenshots by hand is a lot of work. You have to do it over and over again for every new version.
Selenium is a tool for instrumenting/steering your browser. He uses chrome --headless. So a proper browser, but without the GUI. py.test handles the various testcases, including the fixtures. Django has a LiveServerTestCase that is also handy.
The screenshots are written like tests. Really simple and powerful.
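To give an idea of what such a "screenshot as a test" could look like, here is a minimal sketch. It is not Raphael's actual code: the function names, the output directory and the window size are my own assumptions. It needs the third-party selenium package plus a locally installed Chrome with a matching driver.

```python
from pathlib import Path


def screenshot_path(output_dir, name):
    """Return the file the screenshot called `name` is written to (hypothetical helper)."""
    return Path(output_dir) / f"{name}.png"


def take_screenshot(url, target, width=1280, height=800):
    """Open `url` in headless Chrome and save a PNG at `target`."""
    # Imported here so the module can be imported without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # a real browser, just without the GUI
    options.add_argument(f"--window-size={width},{height}")  # assumed size
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        driver.save_screenshot(str(target))
    finally:
        # Always close the browser, also when the page fails to load.
        driver.quit()
```

In a py.test suite, a test function would call take_screenshot() with the URL of the live test server and screenshot_path("docs/images", "login") as target, so the documentation images are regenerated on every run.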
Note: djangocon.eu 2018 will be in Heidelberg, Germany. 23-27 May. https://2018.djangocon.eu
He had to make some counter-intuitive optimizations lately.
He gathers data about websites and stores the info in MongoDB. 500 million domains, more or less. He wanted, from an existing list, to know which domains weren’t in the system yet.
Doing it in Python and querying the DB was slow. In the end he exported everything to text files and did it with Linux command line tools like sort. Much faster…
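The talk only mentioned sort; my guess at why the text-file approach wins is that, once both lists are sorted, finding the missing entries is a single linear pass over both files instead of one database query per domain. A small Python sketch of that pass (the function name and the example domains are made up):

```python
def missing_from(candidates, known):
    """Yield the items of sorted `candidates` that are absent from sorted `known`.

    Both inputs must be sorted with the same ordering. Neither is loaded
    into memory as a whole, so they can be lines streamed from huge files.
    """
    known_iter = iter(known)
    current = next(known_iter, None)
    for item in candidates:
        # Advance the known list until it catches up with the candidate.
        while current is not None and current < item:
            current = next(known_iter, None)
        if current is None or current != item:
            yield item


new_list = ["a.com", "b.org", "c.net", "d.de"]  # sorted, made-up data
in_system = ["b.org", "d.de", "e.io"]  # sorted, made-up data
print(list(missing_from(new_list, in_system)))  # ['a.com', 'c.net']
```

One thing to watch out for: the sort order of the command line tool depends on the locale, so files sorted on the command line are not necessarily in the order Python's string comparison expects.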
See http://zestreleaser.readthedocs.io/ :-) Easy releases of (python) programs: tagging, updating version number, new changelog header, pypi upload, etc.
When you deploy a machine learning application, you normally want to expose some function as an API for others to use.
They’ve written a tool called “firefly” to make this easier. https://github.com/rorodata/firefly
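I didn't note down firefly's own interface, so the following is explicitly not firefly. It is a standard-library-only sketch of the general idea of exposing a plain function as a JSON-over-HTTP API. The predict() function is a hypothetical stand-in for a model and all the names are mine.

```python
import io
import json


def predict(x, y):
    """Hypothetical stand-in for a real model's prediction function."""
    return x + y


def make_app(functions):
    """Build a WSGI app that exposes every function in `functions` at /<name>."""

    def app(environ, start_response):
        name = environ.get("PATH_INFO", "").strip("/")
        func = functions.get(name)
        headers = [("Content-Type", "application/json")]
        if func is None:
            start_response("404 Not Found", headers)
            return [json.dumps({"error": "unknown function"}).encode()]
        # The JSON request body holds the keyword arguments.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        kwargs = json.loads(environ["wsgi.input"].read(length) or b"{}")
        start_response("200 OK", headers)
        return [json.dumps({"result": func(**kwargs)}).encode()]

    return app


def call(app, path, payload):
    """Invoke the WSGI app in-process, handy for trying it out without a server."""
    raw = json.dumps(payload).encode()
    environ = {
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    statuses = []
    chunks = app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], json.loads(b"".join(chunks))


app = make_app({"predict": predict})
print(call(app, "/predict", {"x": 1, "y": 2}))  # ('200 OK', {'result': 3})
```

To actually serve it you could hand the app to wsgiref.simple_server.make_server() from the standard library. A real tool takes care of the parts left out here, like validation and error handling.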
There is no database performance problem anymore. Regular queries ought to be 1ms or 1ns: way below the human perception speed of 25ms.
Problems start when you’ve got Big Data. But: what is Big Data? How big is big?
Big data: there are four basic solutions to it:
Horizontal partitioning/sharding.
Replication (including caching, views).
Hashing (may be persisted or not).
Differential files/LSM. Invented five times a year. Don’t re-invent it.
So: it is mostly solved already.
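As an illustration of the first item (with a bit of the third mixed in), this is roughly what hash-based horizontal partitioning boils down to. It is a sketch of the general technique, not something shown in the talk. The shard count and the key format are assumptions.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number in a stable, process-independent way."""
    # Python's built-in hash() is randomised per process for strings,
    # so a cryptographic digest is used to keep the mapping stable.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(keys, shard_count):
    """Distribute keys over `shard_count` lists, one list per shard."""
    shards = [[] for _ in range(shard_count)]
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


# Made-up example: spread 1000 keys over 4 shards.
shards = partition([f"user-{i}" for i in range(1000)], 4)
print([len(shard) for shard in shards])
```

The simple modulo has a known drawback: changing the number of shards moves almost every key to a different shard. Existing systems deal with that, which fits the "don't re-invent it" advice.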
She recently moved from university to industry…
You might not have as much data as they initially told you.
Watch out with ‘special’ values like -999. Those might actually mean ‘None’.
Essential data might be completely missing.
You sometimes need something to break. Only you’re not allowed to break it.
Boss: “I don’t care if it is impossible, I’ve already sold it to two customers”.
So: don’t trust anyone in the university if they tell you about perfect datasets without omissions and errors :-)
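The point about 'special' values like -999 is easy to show with a few lines. A small sketch with made-up measurements: which sentinel values a dataset uses is an assumption you have to check with whoever produced the data.

```python
def clean(values, sentinels=frozenset({-999})):
    """Replace sentinel values (assumed to mean 'no measurement') by None."""
    return [None if value in sentinels else value for value in values]


def mean_ignoring_missing(values):
    """Average the values that are present. Returns None if there are none."""
    present = [value for value in values if value is not None]
    if not present:
        return None
    return sum(present) / len(present)


measurements = [12.5, -999, 14.0]  # made-up data
print(sum(measurements) / len(measurements))  # the sentinel wrecks the average
print(mean_ignoring_missing(clean(measurements)))  # 13.25
```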
On 16 and 17 March 2018 there’ll be an open source conference in Copenhagen. Anything open source is fine.
Photo explanation: simply a picture from my train trip (with a nice planned detour through the Eifel) from Utrecht (NL) to Karlsruhe (DE). The disused ironworks in Völklingen, a UNESCO heritage site.
My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.