Full title: “slow food digest better - or how to maintain an 8.5 year old python project without getting lost”. Christopher had to maintain such a project - and actually liked it. It was https://addons.mozilla.org, actually.
It started out as a quickly-hacked-together php project. Now it is an almost modern django project. The transition from PHP to the django version took almost 16 months. During that time there were bugs, translation errors, downtime: irritating. The site went fully live in january 2010.
The big advantage of the move to django was that lots of tests were added at that time. The site wasn’t anything special. Mostly django. Still quite some raw SQL from the old system. Celery for some tasks.
Mozilla at one time had the “Firefox OS” for mobile phones. For that, they built the “firefox marketplace”. The work was based on the addons.mozilla.org code, but with some weird hacks based on which site it was running… During that time the addons.mozilla.org website itself was pretty much left alone.
So: complete rewrite or incremental improvement? They chose incremental improvement. Rewriting a huge site from scratch for a small team… no. And with the existing system they at least had the advantage of all the existing unittests!
The balance they had to strike was between “removing technical debt” and “new features”.
What they did was create a new react-based frontend as a single page app. This got released in december 2017. So they incrementally rewrote the backend (where they had unittests) and did a full rewrite of the frontend (which had no tests).
One thing they used a lot: feature flags/switches. They used “waffle” for that. It makes it much easier to revert broken implementations as you only have to flip a switch back.
They steered their django upgrades with waffle feature flags. Once the new django version was fully in production, they could remove the feature flags for the old version.
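The switch idea can be sketched in a few lines of plain python. This is only an illustration: the SwitchRegistry class, the "new-search" switch name and the two search functions are made up for the example, and it is not waffle's actual API (waffle keeps its flags and switches in the database).

```python
class SwitchRegistry:
    """Tiny in-memory stand-in for a feature-switch store (not waffle's API)."""

    def __init__(self):
        self._switches = {}

    def set(self, name, active):
        self._switches[name] = bool(active)

    def is_active(self, name):
        # Unknown switches count as "off", so new code stays dark by default.
        return self._switches.get(name, False)


def old_search(query):
    return f"old results for {query}"


def new_search(query):
    return f"new results for {query}"


def search(switches, query):
    # Both implementations stay deployed; the switch picks one at runtime.
    if switches.is_active("new-search"):
        return new_search(query)
    return old_search(query)


switches = SwitchRegistry()
before = search(switches, "adblock")   # switch is off: old code path
switches.set("new-search", True)
after = search(switches, "adblock")    # switch is on: new code path
switches.set("new-search", False)      # "revert" without a new deploy
reverted = search(switches, "adblock")
```

The point of the pattern is the last two lines: going back to the old behaviour is a data change, not a code change.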
A quality assurance team saves lives. Unittests are good, but a real QA team that really tests it discovers lots of problems. And purely the fact that you need to explain the upgrade process to the QA engineers already helps.
And… don’t panic. You’re there for the long run. Great food needs time, why should your software be different?
Photo explanation: constructing a viaduct module (which spans a 2m staircase) for my model railway on my attic.
She works at the catalog team of “spring”, a clothing website.
Internal tools are often not available. There are always edge cases. Time-sensitive changes are sometimes needed (“right now”).
You could just do a quick SQL query in the database. Normally, a colleague will look over your shoulder and double-check what you’re about to type. But when it is friday afternoon and your highest-paying client wants a last-minute change… They do have a collection of horror stories…
Here are some strategies:
Develop a review process for manual edits. What they’ve done is to create a spreadsheet. You’d have to write in there your name, the sql code, what you want it to do, who you want to review it. Only after the review, you are allowed to run it on the server.
The advantage is that it is easy to implement and that you get an audit trail. You also teach engineers what is the right thing to do.
A disadvantage is that you still can get mistakes. And it is fine for smaller changes, but not really for elaborate SQL and long-running queries.
Write scripts and run them locally. Write a python script to make the change. Add commandline arguments so that you can re-use the script. Then you have to connect it to the database and run it.
Advantage: it is also fine for more complex changes.
Disadvantage: you run it locally, so logs are only available locally. You can still make mistakes. The local scripts are local: there’s no review for them. And you can have connection issues.
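A minimal sketch of such a re-usable script, assuming a made-up "product" table and using an in-memory sqlite database as a stand-in for the real one. The --dry-run option is an addition for the example, not something from the talk.

```python
import argparse
import sqlite3


def build_parser():
    parser = argparse.ArgumentParser(description="Rename a product (hypothetical example).")
    parser.add_argument("--product-id", type=int, required=True)
    parser.add_argument("--new-name", required=True)
    parser.add_argument("--dry-run", action="store_true",
                        help="Report what would change, then roll back.")
    return parser


def rename_product(conn, product_id, new_name, dry_run=False):
    """Run the update in one transaction; return the number of rows touched."""
    cursor = conn.execute(
        "UPDATE product SET name = ? WHERE id = ?", (new_name, product_id)
    )
    if dry_run:
        conn.rollback()
    else:
        conn.commit()
    return cursor.rowcount


def main(argv, conn):
    args = build_parser().parse_args(argv)
    changed = rename_product(conn, args.product_id, args.new_name, args.dry_run)
    print(f"{changed} row(s) {'would be ' if args.dry_run else ''}updated")
    return changed


# Demo against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO product (id, name) VALUES (1, 'Blue shirt')")
conn.commit()

main(["--product-id", "1", "--new-name", "Navy shirt", "--dry-run"], conn)
name_after_dry_run = conn.execute("SELECT name FROM product WHERE id = 1").fetchone()[0]
main(["--product-id", "1", "--new-name", "Navy shirt"], conn)
name_after_run = conn.execute("SELECT name FROM product WHERE id = 1").fetchone()[0]
```

Printing the number of affected rows gives you at least a small local log, and the dry run catches the "oops, that matched every row" kind of mistake before it is committed.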
You can run the scripts also on an existing server. This way, you generally don’t have the connection issues. You do have to run it in ‘screen’.
After writing the script, you have to get the script onto a server. SSH there and run inside a session. Julie normally runs it on the jenkins machine. But…. one of her scripts once ate up all CPU resources, so jenkins was down…
Advantage, in general: you can have long-running scripts. And you have a much more reliable network connection.
Disadvantage: you can affect the resources on the server. And you have to copy your script to the server.
Use a task runner. You can use jenkins to run scripts.
Now you have to get your script reviewed like the rest of your code. The latest version is automatically on jenkins. Jenkins provides a way to pass arguments to such a script.
A big advantage: the output of the run is stored in jenkins. You have an audit trail. And you have code review.
Disadvantage: it is hard to manage credentials. Also: you apparently can connect to your production database from your jenkins test environment. This is asking for accidents to happen.
Then she decided to write a (jenkins) “script runner” service. So it was customizable.
Again: write the script and get code review and run tests. Then you can run it with a nice user interface in jenkins. The custom script runner could be pre-configured with the various configs (dev, staging, production), so that managing the credentials was easy.
We are humans and we make typos. So there are typos in our code.
The two common places for typos are documentation and the user interface.
Documentation is normally only provided in a single language and it consists of large text files, so spell checking is relatively easy. Django documentation is often built with Sphinx. For that, there is a sphinx extension: sphinxcontrib-spelling. You can even integrate it in your CI as a post-build check.
There will be words that are correct for your project but that aren’t in the regular dictionary: for that there’s a local “wordlist” you can use.
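As a rough sketch, enabling the extension in a Sphinx conf.py could look like this. The language and the wordlist filename are example values; check the sphinxcontrib-spelling documentation for the options your version supports.

```python
# conf.py (fragment)
extensions = [
    "sphinxcontrib.spelling",
]

# Language of the dictionary to check against.
spelling_lang = "en_US"

# Project-specific words, one per line, that the dictionary doesn't know.
spelling_word_list_filename = "spelling_wordlist.txt"
```

The check then runs as a separate Sphinx builder (something like "sphinx-build -b spelling docs docs/_build/spelling"), which is what makes it usable as a CI step.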
User interface strings are scattered throughout your code, which makes them harder to check. But there is a solution! Translations. Once your project gets bigger, you probably want to start translating it. Once you have set up your translation mechanism (gettext), you have all your strings gathered into one place: the .po files. Hurray, now we can do spell checking.
Gettext is the standard mechanism in Django to deal with translations. You’ll see lines like “from ... import ugettext as _” in your code.
He wrote a tool for it: potypo. polib + pyenchant = potypo. Polib can read and write the “gettext” files. Pyenchant is an interface to libenchant.
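The idea behind such a check can be sketched without either library. In the sketch below a plain python set stands in for the dictionary that libenchant would provide, and a list of strings stands in for the entries polib would read from a .po file. The function name and the sample data are made up; this is not potypo's code.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def find_typos(strings, dictionary, wordlist=()):
    """Return {string: [unknown words]} for every string with unknown words."""
    known = {w.lower() for w in dictionary} | {w.lower() for w in wordlist}
    typos = {}
    for text in strings:
        unknown = [w for w in WORD_RE.findall(text) if w.lower() not in known]
        if unknown:
            typos[text] = unknown
    return typos


# Stand-in for a real dictionary such as the one aspell-en provides.
dictionary = {"save", "your", "changes", "log", "in", "to"}
# Project-specific words that are correct but not in the dictionary.
wordlist = {"Django"}

ui_strings = ["Save your chnages", "Log in to Django"]
typos = find_typos(ui_strings, dictionary, wordlist)
```

A CI job would fail the build when the returned dict is not empty.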
You’re probably familiar with ispell or aspell (an ispell that handles unicode better). myspell is openoffice’s spellchecker, hunspell is a variant on this. For English, “aspell” is probably best, for other languages “hunspell”. Libenchant is a library that wraps them all. And pyenchant provides a python API to libenchant.
When you start using it, you’ll have to install language packages like aspell-en. Then add a bit of configuration and potypo can start checking your spelling. If desired, it can fail your build in CI. You can also switch that off for specific languages (for instance if you’ve just started translating).
Wordlists? You have multiple languages, so wordlists can be in a
wordlists/de.txt. You can also put just a
wordlist.txt inside the translations’
The “potypo” project is quite new, but it is already used in several projects. Ideas, features, pull requests: everything is welcome!
We have piles of data in our databases. We want to search in there.
What is “search”, actually? “Try to find something by looking or otherwise seeking carefully and thoroughly”. On the one hand, there’s “try”. On the other hand “careful and thorough”. So: search is hard.
What about searching in django? In a very basic sense, you already search if you call get_object_or_404(Article, pk=12)! And if you don’t find it, you return a “404 Not found”.
Searching text is more complicated. Text is unstructured data. So you’d have a filter like text__icontains=search_parameter. This is not efficient for your database, as it results in WHERE text ILIKE '%your search parameter%', and a pattern with a leading wildcard cannot use a regular index.
There is an index in postgres that’s much more efficient: trigrams. These chop up your text into groups of three consecutive characters (words are padded with spaces, so you also get pieces that contain only one or two letters). See https://www.postgresql.org/docs/current/static/pgtrgm.html
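To get a feel for what a trigram is, here is a small pure-python approximation. It assumes the padding pg_trgm documents (two spaces in front of a word, one behind it), but it only splits on whitespace, so treat it as an illustration rather than a re-implementation of the postgres extension.

```python
def trigrams(text):
    """Split text into pg_trgm-style trigrams (lowercased, space-padded words)."""
    result = set()
    for word in text.lower().split():
        padded = f"  {word} "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Share of trigrams the two strings have in common (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


cat_trigrams = trigrams("cat")
typo_score = similarity("django", "djagno")
```

Because a misspelled word still shares several trigrams with the correct one, an index on trigrams can find "close enough" matches, not just exact substrings.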
It is not included in django, but you can write a custom index for it. The code fitted (readable!) on one slide.
When we talk about searching text, we often mean full text search. Word order doesn’t matter, for instance. And the exact verb form also doesn’t matter. In linguistics there is the term “stemming”. Computer, compute, computation are all “stemmed” to just “comput”.
Often you also ignore stopwords. “Django is the best” becomes “django best”.
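Stemming and stopword removal can be illustrated with a toy normalizer. The stopword list and the three suffix rules below are made up to fit the examples from the talk; real stemmers (such as the ones postgres uses) have far richer rules.

```python
STOPWORDS = {"is", "the", "a", "an", "of", "and"}   # tiny illustrative list
SUFFIXES = ("ation", "er", "e")                      # toy rules, longest first


def stem(word):
    """Strip the first matching suffix; real stemmers use far richer rules."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, drop stopwords, and stem what is left."""
    words = text.lower().split()
    return [stem(w) for w in words if w not in STOPWORDS]


sentence = normalize("Django is the best")
family = normalize("Computer compute computation")
```

Both the indexed text and the search query go through the same normalization, which is why a search for “computation” can find a document that only says “computer”.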
You can get quite far with postgres’ built-in search. It is integrated in django, see the documentation.
If you want to go further, you can look at xapian, solr, whoosh, lucene, elasticsearch. Effectively a second database that you use just for textual search. As it is a second database, you need to keep them in sync. You could use transaction.on_commit() for this. Perhaps in your .save() method or in celery.
Same with deletion. But, as with the .save() method, watch out that there are some cases where those methods aren’t called. A queryset’s .delete() doesn’t call the individual objects’ .delete() method, for instance.
You could keep more info in your search engine, if you use elasticsearch for instance. Like complete article summaries and urls and so. Even if the search index isn’t completely up to date, the user will still get a result. Better than a 404.
You could even run most of your website from out of your search engine so that your website continues to work even when your regular database is temporarily switched off.
Will is a software developer with a law degree. Now that we have the GDPR, his law degree is suddenly very relevant. GDPR takes effect on 25 May 2018.
What is the GDPR? It is a law that regulates the use of personal data.
You’ll probably have had lots of emails from companies telling you that they’ll be good with your data and asking whether they’re still allowed to use it.
He encourages you to read the actual regulation. The first part is quite readable. The actual articles are quite detailed, but only the first 34 are relevant for us. He thinks we have a professional duty to be on top of this. We have to know about it.
As programmers, we’re in the front line. We might be the ones that can best advise the company on how to comply. We ought to know the details. If you help your company, you’re valuable to your company, so…
He has three categories in his talk: terms, rights, tasks.
By design and default. Learn to do it properly. If you work with django, follow recommended django practices and feel that you’re behaving yourself, you’re probably OK.
Important here is “data minimization”. Don’t pass along full user objects to other systems. Not even the user id. Generate a UUID or so.
Separate personal data completely. “Pseudonymization”.
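A minimal sketch of the “generate a UUID” tip. The Pseudonymizer class, the export_order function and the sample user are made up for the example, and a real system would store the mapping somewhere safer than a dict. Whether this is enough in a given situation is a legal question, not something the code can answer.

```python
import uuid


class Pseudonymizer:
    """Maps internal user ids to random UUIDs; the mapping stays in this system."""

    def __init__(self):
        self._by_user_id = {}

    def pseudonym_for(self, user_id):
        # uuid4 is random, so the pseudonym is not derived from the user id.
        if user_id not in self._by_user_id:
            self._by_user_id[user_id] = str(uuid.uuid4())
        return self._by_user_id[user_id]


def export_order(pseudonymizer, user, order_total):
    """Build the payload for an external system: no name, no email, no user id."""
    return {
        "customer": pseudonymizer.pseudonym_for(user["id"]),
        "total": order_total,
    }


pseudonymizer = Pseudonymizer()
user = {"id": 42, "name": "Alice Example", "email": "alice@example.com"}
first = export_order(pseudonymizer, user, 19.95)
second = export_order(pseudonymizer, user, 5.00)
```

The other system can still see that both orders belong to the same customer, but it never receives anything that identifies that customer on its own.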
For a medical database, does your database support staff need to see a person’s name? No. Only the doctor needs to know that. Then you might be better off encrypting the name.
Erasure. Can you split the backups? A separate one for personal data and one for the rest? That might make zapping personal data easier.
No discrimination. You cannot discriminate with prices on areas where people live, anymore. If you have algorithms that make decisions, watch out for biases.
Note: gender and age are not included here! So special prices for older or younger people are fine. But, again, watch out for indirect discrimination. There are other laws that you have to take into account.
(See my summary of the great talk on biases)
Your algorithms will get better because of it.
Explain machine learning. If you make an automatic decision, you might have to explain it. If it is an unclear pile of a neural net, it might be hard to explain…
Anonymization. True anonymization is rare. And hard. The question you have to ask is “is reidentification reasonably likely?”. And as a programmer, you’re probably the only person that can answer it.
Again, anonymization is hard. You’ll probably have to get outside expert help.
Breach notification. If there is a breach, you have to report it. Otherwise you are liable. Even putting too many people in an email’s CC field could be a breach…
What could django do?
The current situation isn’t clear yet. In a few years it probably will be.
He’s a longtime django user. “The web framework for perfectionists”: yes, that’s him.
He has also built lots of backends for iOS apps. For that, he used django REST framework. He got involved on the mailinglist and on stackoverflow. Now he’s a core team member of django rest framework. He also started maintaining django-crispy-forms and others. And now he’s the “django fellow”.
So far, so good. He has a job. Nothing million-euro-making, but fine. The problem: he’s getting older. There was a post on hacker news: “before you’re 40, make sure you have a plan B”.
You might get into problems when searching for django jobs: “you’re older, so you’re more expensive, so we’ll take a younger person”. You also might have a family, so moving (especially working abroad) is harder.
There’s a common path: become a manager. But he won’t do that. He’s a good programmer, but not a good manager. He wants to stay productive and creative.
So what if you just want to keep programming? He has some generic strategies:
Look outside the tech bubble. The software tech bubble. “Software is eating the world”, so who’s going to program that? Look at traditional industries!
(Note by Reinout: I’m working in a civil engineering construction firm now!)
Always ask “what’s next”? If you’re a freelancer, you can handle an occasional bad project if you have a good pipeline. But if you have a good project and a bad pipeline, you have a problem.
Also make sure you develop yourself constantly. Become more valuable. Become more valuable to your employer. Make sure he knows how valuable you are (especially how valuable you could be to other companies).
There are also specific strategies:
Be diligent. The number one priority is self care. Care about the pace. Don’t do “death march” projects. You cannot do 100 hour workweeks until your pension! Keep a good pace.
Similarly, make time for your family if you have one.
Be diligent: eliminate distractions. If you won’t work yourself to death in 100 hour workweeks, you have to make sure you work hard in the time you do work. Look hard at your distractions. How often do you check your email? Facebook? Your phone?
Procrastination is a big problem. You might have a big pile of underspecified work. The real job would be to specify/clarify the work, but it is much easier to let yourself be distracted.
Some apps he uses to prevent distraction:
Combined, these three tools help him eliminate
Regarding his phone? His tip is: delete everything. Don’t have all those apps in there that can distract you. News sites you check 20x a day: what is the use of having that on your phone?
Be diligent: develop good habits. By eliminating distractions, you’ve made time for other activities. What do you want to do?
Be prolific. Do one thing. And then another. And then another. The individual things you do might not be big in themselves, but it all adds up after 15 years.
He himself is not a superstar developer, but he’s done a lot!
A tip: contribute to open source. There’s a low barrier to entry (in theory at least). And it is visible! It is stuff you can talk about. Anything you stick on github, you can point to it. If your work-time code is all private, you cannot show it. At job interviews, you might be asked for a code sample. If you’ve got lots of stuff on github, you won’t get that question.
Contributing to open source is a “slow burner”. It is not something you can do quickly in order to make a good impression at a job interview. It is something you have to do for quite some period of time. Small contributions over time end up as a sizable and visible contribution.
Some comments on open source:
One thing to watch out for, when contributing to open source: make sure you don’t burn out. People will ask the same support questions over and over and over again. There are risks of “contributing to open source is eating my time” or “is making me sick” or “limiting the time with my family”. You don’t want that.
That’s the basic purpose of the django fellow: it is a paid job to do the menial work so that it doesn’t burn out the other developers.
As a regular contributor, limit your time beforehand.
Get your employer to help. Open source is often a better alternative than building your own.
When working on open source code, you’ll also learn a lot. And it will improve you. And it will increase your value for your company.
They should be funding django, django rest framework, etc. It is much, much cheaper to fund those projects to get your bugs fixed than to hire programmers directly to fix it.
Specifically for django: when you contribute a pull request, you’ll get lots of comments. They’re necessary to keep up django’s quality. But it can be a bit discouraging. But don’t despair! And let Carlton help you work through the comments. That’s his job as django fellow.
To close it off, the secret weapon: be social. Get involved in open source. Talk to people. Go to meetings. Genuine interaction instead of websites-with-an-algorithm.
It will help you find a good employer. As an employer, it will help you find good programmers. There are many good companies that are good for their employees. You’ll find them by talking to people! There are many good programmers looking for a nicer job, you’ll find them by talking to people.
Channels, started in 2015 as “django-onair”, had its 1.0 release in 2017. It used twisted, ran on python 2.7, and django ran synchronously.
Python 2.7 undermined it. Python 3 has asyncio support, 2.7 does not. Because of that, channels had to be too complex. The design was wrong.
Now, there’s channels 2.0. It requires python 3.5+. Native asyncio! Much simpler to deploy, also. It was quite a big rewrite: 75% was changed.
A big challenge was that django had to become partially asynchronous. The regular django ORM, views, middleware and url routing are still synchronous. Parallel to that, there’s channels (ASGI) middleware and so on. Two separate worlds.
But still, there are a few contact points. Towards the ORM, for instance. So he needed two functions, sync_to_async and async_to_sync, to move between the two worlds. They took two months to write! Synchronous code has to run in threads. The ThreadPoolExecutor does most of the hard work.
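The basic idea of crossing between the two worlds can be sketched with only the standard library. This is a simplified illustration and not the channels implementation (which handles many more edge cases); blocking_query is a made-up stand-in for a synchronous ORM call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def blocking_query(pk):
    """Stand-in for a synchronous ORM call."""
    time.sleep(0.05)
    return {"pk": pk}


async def query_from_async(pk):
    # Sync code called from async code: push it onto a thread so the
    # event loop stays free to serve other coroutines in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_query, pk)


async def fetch_many():
    return await asyncio.gather(*(query_from_async(pk) for pk in (1, 2, 3)))


def fetch_many_from_sync():
    # Async code called from sync code: start an event loop and block on it.
    return asyncio.run(fetch_many())


results = fetch_many_from_sync()
```

The three blocking calls run in separate threads, so the async side never has to wait on a single slow query.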
Both async and sync code are useful. Channels lets you write your code as both. Async is hard, so you don’t want to have to be forced to use it. Channels’ two functions make it possible.
But: the async interface is separate from the sync interface. You just cannot provide both through one API.
He has a blog post that further explains his thoughts about handling async and sync code: https://www.aeracode.org/2018/02/19/python-async-simplified/
ASGI. WSGI, the web server gateway interface, is used by all python web frameworks for handling requests/responses. There’s one problem: it is synchronous. And you cannot have something that starts synchronous, is async in between, and ends up calling the synchronous ORM: it will block the whole thread/process. You have to start asynchronous.
So: WSGI, but then async. So: ASGI :-) It is intended for general use, just like WSGI. The core is an Application object with an async __call__(self, receive, send) method.
It is “turtles all the way down”: routing is an ASGI app. Middleware is an ASGI app.
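A minimal sketch of such an application object, in the two-step style described above: the application is created with the connection’s scope and then called with receive and send. The drive() function is a made-up stand-in for a real server like daphne, and later versions of the ASGI spec merged the two steps into a single callable, so check the spec version you target.

```python
import asyncio


class HelloApplication:
    """ASGI 2-style application: built per connection, then called."""

    def __init__(self, scope):
        self.scope = scope

    async def __call__(self, receive, send):
        await receive()  # wait for the "http.request" event
        body = f"Hello from {self.scope['path']}".encode()
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": body})


async def drive(application_class, scope):
    """Hypothetical stand-in for a server: feeds one request in."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await application_class(scope)(receive, send)
    return sent


messages = asyncio.run(drive(HelloApplication, {"type": "http", "path": "/demo"}))
```

Routing and middleware fit the same shape: they are applications that wrap other applications and pass receive and send along.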
What does this mean for django? Should it be in core? The main question is “how much can we make django async”. You could progressively change pieces of django to be async and call it from synchronous code. But….
The problem, he thought, was the ORM. What would an async ORM look like? Is it even a sensible endeavour? Note that it has to be done in small steps. But after recent talks, he thinks it could be done.
Another question: do we really need to replace WSGI? How much demand is there for long-polling and websockets? A new standard (ASGI) is another new standard. And an extra standard is not necessarily good.
Websockets are a niche. Long-polling is less of a niche, but still a niche.
For ASGI to become a standard, you need multiple servers that implement it (apart from “daphne”, there is now also “uvicorn”). And you need more frameworks that use it.
Another question: do we want to have everyone writing async? It is a pain to debug and hard to design. What is the balance? He would like for django to keep the existing sync interface for the majority of the cases. And if you need more power, that you then can dive deeper. Just like in many other cases.
So: if you have an opinion on where django should be going, talk to him and to other developers!
SQL. The “language to talk to databases in a structured way”.
The ORM. Object Relational Mapper. The magic that makes it all work in django (even though there’s no magic in there).
The talk is about the experience of experienced programmers that, for the first time, have to dive into a django project. She used http://glasnt-orm.us.aldryn.io/ (“unicodex”) as an example.
So. There’s a missing icon on the sample page. You have to debug/fix that as an experienced-programmer-without-django-experience. You’re used to SQL, but not to the django ORM.
You get a tip “use the shell”. Which shell? “The c shell? bash?”. No, they mean manage.py shell. With a bit of _meta and some copy/paste you can get a list of the available models.
You could do the same with manage.py dbshell, which would dump you in the sql shell. List the databases and you get the same answer. Pick a promising table and do a select * from unicodex_codepoint.
You can do the same in the django shell with:
    from unicodex.models import Codepoint
    Codepoint.objects.all()
The rest of the presentation was a nice combination of showing what happens in SQL and what happens when you use the ORM.
Once you get to the double underscores, for following relations and field lookups, the ORM starts to get more useful and easier than raw SQL.
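To show what a double-underscore lookup saves you from typing, here is the kind of SQL that sits behind it, run against a throwaway sqlite database. The tables, columns and the ORM line in the comment are hypothetical: they are not the actual unicodex models from the talk, and the SQL is only roughly what django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codepoint (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE design (
        id INTEGER PRIMARY KEY,
        vendor TEXT,
        codepoint_id INTEGER REFERENCES codepoint(id)
    );
    INSERT INTO codepoint VALUES (1, 'Bug'), (2, 'Sparkles');
    INSERT INTO design VALUES (1, 'acme', 1), (2, 'acme', 2), (3, 'initech', 1);
""")

# Hypothetical ORM spelling of the query below:
#     Design.objects.filter(codepoint__name__icontains="bug")
# "codepoint__name" follows the foreign key; "__icontains" is the field lookup.
rows = conn.execute(
    """
    SELECT design.id, design.vendor
    FROM design
    INNER JOIN codepoint ON design.codepoint_id = codepoint.id
    WHERE codepoint.name LIKE '%' || ? || '%'
    ORDER BY design.id
    """,
    ("bug",),
).fetchall()
```

One line of ORM code replaces the join and the where clause, and you don’t have to remember which column holds the foreign key.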
It was a fun presentation. I can’t really do it justice in a textual summary: you should go and look at the video. It is much more convincing that way. Yeah, the video is already online: https://www.youtube.com/watch?v=AIke7IZdVJI
There’s a whole part about
Q() object and the magic ways in which you can
combine it with
How would it translate to SQL? You can show the latest SQL query (this only works when DEBUG is on):
    from django.db import connection
    connection.queries[-1]
At the end of the presentation, she went back to the original usecase and started bughunting with the ORM. In the end it was a bug (a unicode ‘bug’ character) at the end of a filename :-)
For Jessica, it all began at PythonNamibia2015. She went there, not because she wanted to learn python, but because she was bored. And the conference was free. It had all changed by the end of the conference! Thanks to the organizers that inspired a lot of people there to become active with python.
In 2017, she helped organize a ‘computer day’. Talks and panel discussions, poster presentations, software project presentations and workshops. It was aimed at kids!
Especially the panel discussions were aimed at the newcomers: trying to transfer experience. In 2018, there were separate workshop days, amongst others an introductory python course.
There are some differences from organizing a conference for adults:
Ok… all this is quite some work. Why go through all that trouble?
She teaches at a school with 215 students. But there is not a single computer. How to let those students get into contact with computers? Organizing such a computer day at the university helped. They could use the university’s computers in the weekend that way. And it helped get some sponsorship for computers for her school.
Help is needed here!
Thanks to the python and django foundations, as they were the two that made the computer day 2017 possible. In 2018, there were more sponsors: thanks!
Rachel has used django since it was created, but this is her very first djangocon.
She hasn’t had a “normal” salaried job for 23 years. She’s been freelancing, self employed, some contracting to the Scottish government, had her own company, etc. So here are some tips for those that think about such a life.
We as programmers are very lucky. We’re very flexible. We can work anywhere we like at anytime we like. As a bus driver, you have to show up when the bus timetable says…. We have lots of freedom, we ‘only’ have to arrange it so that who we work for agrees to it.
She has a side project, https://luzme.com, for searching for low ebook prices. Side project? Not really. She uses it to try out new techniques. Firebase, django channels, etc. Fun! And it helps her learn a lot more than when she would have just followed tutorials.
Ok, back to you. You want to do something for yourself. Perhaps your own company? If that is what you want, you have to learn how to run a company. You have to learn how to do finances (otherwise you work for your accountant instead of the other way around). You have to learn how to get customers. You have to learn to…
Tip: start small. Build a wordpress plugin or write an ebook. One-off. No support (though you could let people pay for it).
Support: that’s service. Real support. Or “software as a service”. The good thing: recurring revenue. But: it costs time. Your time. And that is your most precious resource. So don’t treat your time as being free. Charge for it (also internally in your company), as if it
Publishing. Blog, vlog, podcasts.
How can you monetize?
Yes, you can:
OK to start small.
You don’t learn until you start to fail.
Don’t be afraid to think big.
Choose business-to-business and charge more.
Regular customers want everything for free or cheap. Businesses are used to spending money to get something of value.
Choose the customers you want. Define the customers you want.
Don’t be afraid to say “no, you’re better served by someone else” to a customer you don’t want. If you say “yes” to “anyone” at “any price”, it won’t work.
Charge more. Yes, really. And even more. If you do valuable work, it is valuable.
You have to get past your “imposter syndrome”.
Above all: start a mailing list.
There are four things you need to learn to say: I don’t know, I need help, I was wrong, I’m sorry. It is OK to say those things. (She got it from a book, “Still Life”, by Louise Penny).
She’s a veteran programmer, but she doesn’t know everything. So it is fine to look something up. It is OK to say you’re sorry (because she deleted a whole project once: she apologized to her boss and therefore she was relieved enough to be able to think about the backup she could restore… :-) )
Also important: yet. I don’t know yet. I can’t write ruby. Yet. Next week, when I have the project, I’ll be able to.
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.