This might sound stupid, but I ran into the infamous UnicodeDecodeError
yesterday. The stupid thing was that it happened in my Django model’s
__unicode__() method. Just a simple one:
# -*- coding: utf-8 -*-
from django.db import models


class DataSet(models.Model):
    name = models.CharField('name', max_length=80, blank=True)

    def __unicode__(self):
        return u'<DataSet %s>' % self.name
Django promises to always return unicode from the database, so the
u'<DataSet %s>' % self.name should be perfectly fine. No
UnicodeDecodeErrors.
Something nagged in the back of my brain, so I added a test for it:
class DataSetTest(TestCase):

    # ... other tests ...

    def test_unicode(self):
        non_ascii_data_set = DataSet(name='täääst')
        non_ascii_data_set.save()
        self.assertTrue(unicode(non_ascii_data_set))
        # ^^^ Just testing that something non-error-like comes
        # out of it.
And the sorry results:
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
What? So there’s no unicode coming out of the database?
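What’s really happening: I was getting the byte string 'täääst'
instead of a unicode string when I accessed non_ascii_data_set.name,
because I had just put it in there myself. Python 2 then tries to decode
that byte string with the ascii codec when it is interpolated into the
unicode string, and that fails on the first "ä". The core of the matter is
that I created the object myself, instead of Django creating the object
when it pulls it out of the database.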
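To make that concrete, here is a minimal sketch that spells out the ascii decode Python 2 performs behind the scenes. It doesn't use Django at all; raw_name and interpolate() are names I made up for the illustration, and the explicit .decode('ascii') call stands in for the implicit conversion.

```python
# -*- coding: utf-8 -*-
# Sketch only: raw_name and interpolate() are hypothetical names.

# The bytes that a plain 'täääst' literal holds in a UTF-8 source file
# under Python 2.
raw_name = u'täääst'.encode('utf-8')


def interpolate(name):
    """Mimic u'<DataSet %s>' % name for a byte-string name.

    Python 2 decodes the byte string with the ascii codec implicitly;
    here the decode is written out explicitly.
    """
    return u'<DataSet %s>' % name.decode('ascii')


try:
    interpolate(raw_name)
    error_message = None
except UnicodeDecodeError as error:
    error_message = str(error)

# Decoding with the codec that matches the bytes works fine. This is the
# kind of value Django hands back from the database.
decoded = u'<DataSet %s>' % raw_name.decode('utf-8')
```

The error message captured here is the same one as in the traceback: byte 0xc3 at position 1 is the first half of the UTF-8 encoded "ä".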
So: in normal usage, Django would pull the object out of the database and put everything into the object as unicode. For this I modified the (now too elaborate) test:
class DataSetTest(TestCase):

    # ... other tests ...

    def test_unicode(self):
        non_ascii_data_set = DataSet(name='täääst')
        non_ascii_data_set.save()
        id = non_ascii_data_set.id
        data_set_from_django = DataSet.objects.get(pk=id)
        self.assertTrue(unicode(data_set_from_django))
So the problem was only with the way I ran the test, and with wanting to test
the __unicode__() method in a too elaborate way.
(But I do want to test that method, as I’ve seen errors in there. And those errors only occurred while rendering error messages, so the actual error would be hidden behind a UnicodeDecodeError.)
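A less elaborate test might simply hand the object a unicode literal (u'täääst') in the first place, so no database round trip is needed. The sketch below assumes exactly that; FakeDataSet is a hypothetical stand-in for the real model so the example runs without Django, and it calls __unicode__() directly instead of going through unicode().

```python
# -*- coding: utf-8 -*-
import unittest


class FakeDataSet(object):
    """Hypothetical stand-in for the DataSet model, no database needed."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        return u'<DataSet %s>' % self.name


class FakeDataSetTest(unittest.TestCase):

    def test_unicode(self):
        # Note the u prefix: the name is unicode from the start, just
        # like the values Django returns from the database.
        data_set = FakeDataSet(name=u'täääst')
        self.assertEqual(data_set.__unicode__(), u'<DataSet täääst>')
```

With the real model the same idea would be DataSet(name=u'täääst'), though I haven't run that variant against the actual project.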
My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.