Ten years ago I made a time registration system for our company. Yeah, I know. Time registration systems are what everybody writes just for themselves instead of using one of the existing ones. Custom wishes: the same old excuses :-) But it was good fun to build and still is.
At that time it was becoming popular to build the frontend in javascript. So I build it with pure, unashamed django templates. Custom template tags for formatting; formsets; the whole hog.
After ten years, the website is still in heavy use, with only minimal work needed. Well, a bit too minimal: recently I really had to update the site as it was still running on python 3.5 and django 1.8. As it was pretty standard django, updating wasn’t that hard. (I gave a talk about upgrading old django websites with docker recently, which might be interesting).
Such a time registration system needs to output reports. Lots of different
reports. I picked plain *.csv
for that, as at the time there wasn’t a
handy xls exporter yet. The best one at the time needed you to install
openoffice on the webserver, which wasn’t ideal.
Probably due to my zope/plone (python cms) background, I liked subclassing a lot. So I made a base class for csv export:
class CsvResponseMixin(object):
prepend_lines = []
header_line = []
csv_lines = []
def csv_filename(self):
.... return filename generated from title or so
def render_to_response(self, context, **response_kwargs):
"""Return a csv response instead of a rendered template."""
response = HttpResponse(content_type="text/csv")
filename = self.csv_filename + ".csv"
response["Content-Disposition"] = 'attachment; filename="%s"' % filename
writer = csv.writer(response, delimiter=";")
for line in self.prepend_lines:
writer.writerow(line)
writer.writerow(self.header_line)
for line in self.csv_lines:
writer.writerow(line)
return response
Something like that. Django views that subclassed this only had to have @property
methods to return:
Optional extra lines at the top for some metadata (the “prepend lines”).
The line with column headers.
The lines with the actual csv content.
That worked quite well, apart from one problem: excel. And country preferences. Different excel versions (and openoffice/libreoffice, too, for that matter) reacted differently to what I outputted. Whether you installed the Dutch or English version of excel matters in which character is used as a decimal separator. In NL it is a comma, in EN it is a dot. Of course, the two persons that used my website the most both had a different version…
In the end the most reliable variant was to use delimiter=";"
and to just
output everything with dots for decimals, as is the default. And to tell those
two persons to adjust their import settings accordingly.
Like I said, I recently upgraded the website. Ubuntu 18.04 => 22.04, python
3.5 => 3.10, django 1.8 => 3.2. Somewhere in that chain, the output of
python’s decimal.Decimal
type, that I used for money, changed in the csv
output. It used to output 26643.04
, but started to output
26643.040000000
instead. I haven’t yet found the changelog entry that
explains where the change comes from. If you know, please mail
reinout@vanrees.org
and I’ll update it here.
The end effect was that excel’s import of the csv (and apple’s Numbers, too)
would show the 26643.040000000
as a string instead of as a number. Bah.
With a bit of string formatting, I could have probably fixed it. But I thought there probably was a pretty good xls outputter nowadays: why not make it more comfortable?
Yes: xlsxwriter fit the bill, so I added it to my requirements.
As I already had a base CsvResponseMixin
class, it was easy to change
every csv output to xls in one go by changing it to an
ExcelResponseMixin
.. The content type had to change to
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
, which
is terribly long, but which points at the joys of a bit of standardisation
instead of the old lock-in practices.
Python’s csv
module has a writer that happily accepts a bunch of
.writerow(some_list_of_items)
, but the xlsx write wants to know the row
and column of each cell. Well, there’s an .write_row()
method, but that
still needs the row number. So it is back to the good old row_number += 1
like I remembered from my Pascal years:
workbook = xlsxwriter.Workbook(response)
worksheet = workbook.add_worksheet()
# See text below, some lines have to be added here.
row_number = 0
for line in self.prepend_lines:
worksheet.write_row(row_number, 0, line)
row_number += 1 # yeah, right...
worksheet.write_row(row_number, 0, self.header_line)
row_number += 1
for line in self.excel_lines:
worksheet.write_row(row_number, 0, line)
row_number += 1
workbook.close()
return response
This worked like an absolute charm. Only problem was that some of my views
returned django model instances. Python’s csv module would just call str()
on those, but xlsxwriter would complain that it couldn’t call float()
on
it. Ok: different default. So I had to configure a mapping from my django
model classes to formatter. I added these lines (where I indicated it in the
code above):
worksheet.add_write_handler(Group, _django_model_instance_to_string)
worksheet.add_write_handler(Person, _django_model_instance_to_string)
worksheet.add_write_handler(Project, _django_model_instance_to_string)
And the function that does the conversion:
def _django_model_instance_to_string(
worksheet, row, col, instance, format=None):
# See https://xlsxwriter.readthedocs.io/working_with_data.html#writing-user-types
return worksheet.write_string(row, col, str(instance), format)
I updated the website with my changes. After 10 minutes I got a success message: “works!”.
With the csv export, I was used to error messages after every tweak. Apparently the xls export is much more robust. So: recommended!
Note: the relevant code (at moment I’m writing this) is available on github.
Addition: I got a message from Chris Adams on github with a note on streaming big responses. With csv, you can do that as it are just lines, but xlsxwriter needs you to have the entire thing ready before you can start returning the response, as an xlsx file is a zipfile. He points at using the “in-memory” mode to speed it up when you have large django querysets.
My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):