Django: excel output instead of csv

Tags: django, plone

Ten years ago I made a time registration system for our company. Yeah, I know. Time registration systems are what everybody writes just for themselves instead of using one of the existing ones. Custom wishes: the same old excuses :-) But it was good fun to build and still is.

At that time it was becoming popular to build the frontend in javascript. So I build it with pure, unashamed django templates. Custom template tags for formatting; formsets; the whole hog.

After ten years, the website is still in heavy use, with only minimal work needed. Well, a bit too minimal: recently I really had to update the site as it was still running on python 3.5 and django 1.8. As it was pretty standard django, updating wasn’t that hard. (I gave a talk about upgrading old django websites with docker recently, which might be interesting).

The old CSV export

Such a time registration system needs to output reports. Lots of different reports. I picked plain *.csv for that, as at the time there wasn’t a handy xls exporter yet. The best one at the time needed you to install openoffice on the webserver, which wasn’t ideal.

Probably due to my zope/plone (python cms) background, I liked subclassing a lot. So I made a base class for csv export:

class CsvResponseMixin(object):

    prepend_lines = []
    header_line = []
    csv_lines = []

    def csv_filename(self):
        .... return filename generated from title or so

    def render_to_response(self, context, **response_kwargs):
        """Return a csv response instead of a rendered template."""
        response = HttpResponse(content_type="text/csv")
        filename = self.csv_filename + ".csv"
        response["Content-Disposition"] = 'attachment; filename="%s"' % filename
        writer = csv.writer(response, delimiter=";")

        for line in self.prepend_lines:
            writer.writerow(line)
        writer.writerow(self.header_line)
        for line in self.csv_lines:
            writer.writerow(line)

        return response

Something like that. Django views that subclassed this only had to have @property methods to return:

  • Optional extra lines at the top for some metadata (the “prepend lines”).

  • The line with column headers.

  • The lines with the actual csv content.

That worked quite well, apart from one problem: excel. And country preferences. Different excel versions (and openoffice/libreoffice, too, for that matter) reacted differently to what I outputted. Whether you installed the Dutch or English version of excel matters in which character is used as a decimal separator. In NL it is a comma, in EN it is a dot. Of course, the two persons that used my website the most both had a different version…

In the end the most reliable variant was to use delimiter=";" and to just output everything with dots for decimals, as is the default. And to tell those two persons to adjust their import settings accordingly.

The new excel export

Like I said, I recently upgraded the website. Ubuntu 18.04 => 22.04, python 3.5 => 3.10, django 1.8 => 3.2. Somewhere in that chain, the output of python’s decimal.Decimal type, that I used for money, changed in the csv output. It used to output 26643.04, but started to output 26643.040000000 instead. I haven’t yet found the changelog entry that explains where the change comes from. If you know, please mail reinout@vanrees.org and I’ll update it here.

The end effect was that excel’s import of the csv (and apple’s Numbers, too) would show the 26643.040000000 as a string instead of as a number. Bah.

With a bit of string formatting, I could have probably fixed it. But I thought there probably was a pretty good xls outputter nowadays: why not make it more comfortable?

Yes: xlsxwriter fit the bill, so I added it to my requirements.

As I already had a base CsvResponseMixin class, it was easy to change every csv output to xls in one go by changing it to an ExcelResponseMixin.. The content type had to change to application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, which is terribly long, but which points at the joys of a bit of standardisation instead of the old lock-in practices.

Python’s csv module has a writer that happily accepts a bunch of .writerow(some_list_of_items), but the xlsx write wants to know the row and column of each cell. Well, there’s an .write_row() method, but that still needs the row number. So it is back to the good old row_number += 1 like I remembered from my Pascal years:

workbook = xlsxwriter.Workbook(response)
worksheet = workbook.add_worksheet()
# See text below, some lines have to be added here.

row_number = 0
for line in self.prepend_lines:
    worksheet.write_row(row_number, 0, line)
    row_number += 1  # yeah, right...
worksheet.write_row(row_number, 0, self.header_line)
row_number += 1
for line in self.excel_lines:
    worksheet.write_row(row_number, 0, line)
    row_number += 1

workbook.close()
return response

This worked like an absolute charm. Only problem was that some of my views returned django model instances. Python’s csv module would just call str() on those, but xlsxwriter would complain that it couldn’t call float() on it. Ok: different default. So I had to configure a mapping from my django model classes to formatter. I added these lines (where I indicated it in the code above):

worksheet.add_write_handler(Group, _django_model_instance_to_string)
worksheet.add_write_handler(Person, _django_model_instance_to_string)
worksheet.add_write_handler(Project, _django_model_instance_to_string)

And the function that does the conversion:

def _django_model_instance_to_string(
        worksheet, row, col, instance, format=None):
    # See https://xlsxwriter.readthedocs.io/working_with_data.html#writing-user-types
    return worksheet.write_string(row, col, str(instance), format)

The advantage of xls export instead of csv

I updated the website with my changes. After 10 minutes I got a success message: “works!”.

With the csv export, I was used to error messages after every tweak. Apparently the xls export is much more robust. So: recommended!

Note: the relevant code (at moment I’m writing this) is available on github.

Addition: I got a message from Chris Adams on github with a note on streaming big responses. With csv, you can do that as it are just lines, but xlsxwriter needs you to have the entire thing ready before you can start returning the response, as an xlsx file is a zipfile. He points at using the “in-memory” mode to speed it up when you have large django querysets.

 
vanrees.org logo

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.

Weblog feeds

Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):