Cheetah, CherryPy, and Unicode


I’ve been working on a web project that uses a Python stack: CherryPy for the web server, Cheetah for the templates, and SQLite for the database engine. I wanted Unicode to work throughout the application. It should be able to store and retrieve Unicode data in the database, contain Unicode text in the Python source files themselves, and display Unicode text, with UTF-8 as the encoding everywhere. It took a while to gather all the pieces, but it’s finally working, and here’s how.

First of all, if you want your Python source files to contain Unicode text, you need to tell Python which encoding they use. On line 1 or 2 of each source file, add a “coding” declaration. Here is what the header of all my Python files looks like:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# vim: et sw=4 ts=4
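To see what the coding line actually does, here is a small sketch (my own illustration; `load_greeting` is a hypothetical helper, not part of the project) that compiles a two-line module straight from raw bytes. The same non-ASCII literal is stored once as UTF-8 and once as Latin-1, and because each module’s first line declares its own encoding, Python decodes both back to the same unicode string. Non-ASCII characters are written as `\u` escapes so the sketch itself is plain ASCII; it should run on Python 2.7 as well as Python 3.3+.

```python
def load_greeting(declared_encoding):
    """Compile a tiny module whose bytes are stored in `declared_encoding`.

    The module's first line is a coding declaration naming that same
    encoding, which is how Python knows how to decode the raw bytes.
    """
    text = (u'# -*- coding: %s -*-\n'
            u'greeting = u"gr\u00fc\u00df dich"\n') % declared_encoding
    source = text.encode(declared_encoding)  # raw bytes, as they sit on disk
    namespace = {}
    exec(compile(source, '<example>', 'exec'), namespace)
    return namespace['greeting']

# The UTF-8 and Latin-1 bytes differ, but both decode to the same text
# because each module declares its own encoding.
print(load_greeting('utf-8') == load_greeting('latin-1'))  # True
```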

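It’s good to know that in Python 2, unicode strings and bytestrings (str) are both subclasses of basestring. So if you need to check whether an object is a string of either kind, use isinstance(obj, basestring). Unicode strings support the same methods as bytestrings, so most of your existing string-handling code will work on them unchanged. The one thing to watch for is mixing the two: combining a unicode string with a bytestring that contains non-ASCII bytes makes Python decode the bytestring as ASCII, which raises a UnicodeDecodeError.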
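A minimal sketch of that check. The `string_types` name and the `NameError` fallback are my additions (a common compatibility idiom), included only so the snippet also runs on Python 3, where `basestring` no longer exists and `str` is the unicode type.

```python
try:
    string_types = basestring  # Python 2: common base of str and unicode
except NameError:
    string_types = str  # Python 3 fallback, only so this sketch runs there too


def is_string(obj):
    """True for str and unicode on Python 2, and for str on Python 3."""
    return isinstance(obj, string_types)


print(is_string(u'caf\u00e9'))  # True
print(is_string(42))            # False
```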

In classes that need a default string representation, define __unicode__(self) instead of __str__(self). In my class hierarchy, I made the base class’s __str__ check for a __unicode__ method and delegate to it, so all child classes inherit this behavior automatically. (I stole this idea from Django.)

Here is the method:

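class Model(Resource):
    ...
    def __str__(self):
        # Delegate to __unicode__ if this class (or a subclass) defines it;
        # Python 2's __str__ must return a bytestring, so encode as UTF-8.
        if hasattr(self, '__unicode__'):
            return self.__unicode__().encode('utf-8')
        else:
            # Use the class name, not self: '%s' % self would call __str__ again.
            print('__unicode__ not defined on %s' % self.__class__.__name__)
            return ''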
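The `Model` snippet depends on the rest of my class hierarchy, so here is a self-contained sketch of the same delegation pattern. `UnicodeModel` and `Dish` are hypothetical names standing in for `Resource`/`Model` and a real model class, and this version simply returns an empty bytestring instead of printing a warning.

```python
class UnicodeModel(object):
    """Hypothetical base class standing in for Model/Resource."""

    def __str__(self):
        # Delegate to __unicode__ when a subclass defines it.
        unicode_method = getattr(self, '__unicode__', None)
        if unicode_method is None:
            return b''
        # Python 2's __str__ must return a bytestring, so encode as UTF-8.
        return unicode_method().encode('utf-8')


class Dish(UnicodeModel):
    """Hypothetical model that only defines __unicode__."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        return u'Dish: %s' % self.name


dish = Dish(u'cr\u00e8me br\u00fbl\u00e9e')
```

Under Python 2, `str(dish)` (and therefore `print dish`) produces the UTF-8 bytestring, while `unicode(dish)` calls `__unicode__` directly. The pattern is specific to Python 2: on Python 3, `__str__` must return text, so `str(dish)` would raise a `TypeError` there.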

For other cases, you may need to call decode() (bytestring to unicode) or encode() (unicode to bytestring) yourself, but as the code above shows, that’s easy to do.
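For those other cases, a pair of small helpers can keep the conversions in one place. `to_unicode` and `to_utf8` are hypothetical names, and the sketch assumes UTF-8 is the only encoding in play, as in this project: `decode()` turns bytes into unicode on the way in, and `encode()` turns unicode back into bytes on the way out.

```python
def to_unicode(value, encoding='utf-8'):
    """Decode a bytestring into unicode; leave other values unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_utf8(value):
    """Encode a unicode string as UTF-8 bytes; leave bytestrings unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode('utf-8')


raw = b'na\xc3\xafve caf\xc3\xa9'  # UTF-8 bytes, e.g. from a request body
text = to_unicode(raw)              # the unicode string u'na\u00efve caf\u00e9'
assert to_utf8(text) == raw         # round trip back to the same bytes
```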


SQLite stores text as UTF-8 by default, so you’ll have no problems on that front. I didn’t run into any Unicode issues with CherryPy either.
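A quick way to confirm this from Python, assuming the standard-library `sqlite3` module (the post doesn’t say which SQLite binding the project uses): create an in-memory database, ask it for its text encoding, and round-trip a non-ASCII value through a `?` placeholder. `roundtrip` and the `notes` table are hypothetical.

```python
import sqlite3


def roundtrip(value):
    """Store `value` in an in-memory SQLite table and read it back."""
    conn = sqlite3.connect(':memory:')
    try:
        # PRAGMA encoding reports the database's text encoding.
        encoding = conn.execute('PRAGMA encoding').fetchone()[0]
        conn.execute('CREATE TABLE notes (body TEXT)')
        # A ? placeholder lets the driver pass the unicode value through.
        conn.execute('INSERT INTO notes (body) VALUES (?)', (value,))
        stored, length = conn.execute(
            'SELECT body, length(body) FROM notes').fetchone()
    finally:
        conn.close()
    return encoding, stored, length


encoding, stored, length = roundtrip(u'\u00fcber \u20ac5')
print(encoding)  # UTF-8
```

SQLite’s `length()` counts characters for TEXT values, so the euro sign counts as one character even though it takes three bytes in UTF-8.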


Cheetah templates took a little more work to figure out, because the documentation doesn’t say much about the filter mechanism, and filters have changed since version 1. Still, what you need is pretty simple.

In every Cheetah .tmpl template, add the following lines before you output anything:

#encoding UTF-8
#filter EncodeUnicode
... everything else goes here ...
#end filter

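The #encoding directive ensures that the appropriate “coding” line appears at the top of the generated “.py” files, which allows you to use Unicode characters in your template source. The #filter EncodeUnicode directive passes every $placeholder value through Cheetah’s EncodeUnicode filter, which encodes unicode values as UTF-8 before they are written out; #end filter closes the filtered region.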
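To make the filter’s job concrete, here is a rough stand-in written in plain Python, not Cheetah’s actual implementation. `encode_unicode` is a hypothetical function that mimics what an EncodeUnicode-style filter does to each placeholder value, and the `page` line loosely imitates how literal template text and filtered values end up joined into UTF-8 output.

```python
def encode_unicode(val, encoding='utf-8'):
    """Hypothetical stand-in for an EncodeUnicode-style output filter.

    A Cheetah filter is applied to each $placeholder value before it is
    written out; this sketch mimics that one step outside Cheetah.
    """
    if val is None:
        return b''                 # render None as empty output
    if isinstance(val, bytes):
        return val                 # already encoded; pass through
    if not isinstance(val, type(u'')):
        val = u'%s' % (val,)       # numbers etc. become text first
    return val.encode(encoding)


# Literal template text (bytes) interleaved with filtered placeholder values,
# loosely imitating "<p>Hello, $name! You have $count messages.</p>".
page = b''.join([b'<p>Hello, ', encode_unicode(u'J\u00f6rg'),
                 b'! You have ', encode_unicode(3), b' messages.</p>'])
```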


That’s it! Now you have Unicode, UTF-8 goodness in your friendly pythonic application.


Posted in programming on November 16th, 2007

2 Responses

  1. Joerg Beyer Says:

    Can you give an example of how the controller should return the result of the Cheetah template substitution?

    I tried to build a minimal example and have this controller method:

    def unicodetest(self):
        return Template(file="mailfe/templates/unicodetest.tmpl", searchList=[{"text": u"test Text öäüÖÄÜ€"}])

    Which fails with a TypeError telling me the template object is not iterable.

    Returning a string works fine, but then I have no Unicode.

  2. Mychael Says:

    Hi there Joerg!

    Near the top of my template (.tmpl) file, I have the following directives:

    #encoding UTF-8
    #implements respond
    #filter EncodeUnicode
    

    In my controller code, I instantiate the template, set some instance variables for passing data into the template, and then at the end of the controller function I call:

    return template.respond()
    

    It looks like you are instantiating the template and sending in a dict. Maybe you just need to tell the template which method it is “implementing” and then call that method on it to return an iterable unicode string?
