UnicodeDecodeError in Plone's catalog
For the second time in a few weeks I've been bitten by a UnicodeDecodeError in a collection (or smart folder or topic):
    Module Products.PluginIndexes.common.UnIndex, line 393, in _apply_index
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)
I ran into the error in the following situation:
- A catalog index (like a custom "organization" FieldIndex) contains both string and unicode values, for instance both "zest software" and u"Universit\xe9 de Paris".
- You have a criterion in which you select that Université de Paris.
- You view the collection... boom!
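The byte in the traceback fits this case. Assuming the selected criterion value reaches the index as a UTF-8 encoded byte string while the index holds the same name as a unicode value (an assumption; the post does not say where the query value comes from), position 9 is exactly the first byte of the encoded "é". The small Python 3 sketch below reproduces the message, with an explicit .decode("ascii") standing in for the implicit conversion Python 2 performs when it compares a byte string with a unicode value:

```python
# Assumption: the criterion value is a UTF-8 encoded byte string, while the
# index holds the same organization as a text (unicode) value.
query_value = "Universit\xe9 de Paris".encode("utf-8")

# "Universit" is 9 characters long, so index 9 is the first byte of the encoded "é".
print(hex(query_value[9]))  # 0xc3

# Python 2 decoded byte strings as ASCII behind the scenes when comparing
# them with unicode; doing that decode explicitly shows the same failure.
try:
    query_value.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)
    # 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)
```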
In my case, I parsed an XML file and the parser returned everything neatly as unicode. Afterwards, I did some string processing on it, like splitting a field on commas with organizations = orgfield.split(","). Surprisingly, the result was a mix of normal strings (the entries containing only ASCII characters) and unicode (the university with the accented character).
The solution was to call organization = organization.encode("utf-8") on each value before handing it to Plone, so that the index only contains UTF-8 encoded strings.
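A minimal sketch of that fix, written for Python 3 so it runs on a current interpreter: there, bytes corresponds to the Python 2 str and str to the Python 2 unicode. The helpers to_utf8 and normalize_index_values are hypothetical names for illustration, not part of Plone. Under Python 3 the type mismatch surfaces as a TypeError when bytes and str are ordered against each other, rather than as a UnicodeDecodeError.

```python
def to_utf8(value):
    """Return value as UTF-8 encoded bytes.

    Byte strings are assumed to be UTF-8 already and are returned as-is;
    text values are encoded.
    """
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


def normalize_index_values(values):
    """Encode every value so the index only ever sees one string type."""
    return [to_utf8(value) for value in values]


# The mixed values from the post: b"..." plays the role of a Python 2 str,
# "..." the role of a Python 2 unicode object.
mixed = [b"zest software", "Universit\xe9 de Paris"]

try:
    sorted(mixed)  # Python 3 refuses to order bytes against str
except TypeError as exc:
    print("mixed types:", exc)

normalized = normalize_index_values(mixed)
print(normalized)          # [b'zest software', b'Universit\xc3\xa9 de Paris']
print(sorted(normalized))  # [b'Universit\xc3\xa9 de Paris', b'zest software']
```

Passing byte strings through untouched is deliberate: in Python 2, calling .encode("utf-8") on a str that already contains non-ASCII bytes first decodes it as ASCII and fails with the same UnicodeDecodeError.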