Europython: domain specific language and bayesian classifier

Tags: europython, europython2006, plone

Anders Hammarquist - Python as a domain-specific language

They needed a new custom language for the BLMs (business logic modules) of their CAPS system. They had an older one, but it lacked decent inheritance and introspection, its source files were too large, etc.

Why not do it in Python? Well, Python has no typing constructs, and you can't really add new keywords to the language (which they needed). But Python had everything else they needed. So they modified Python a bit to fill the gaps: metaclasses (PyPy's __extend__), object typing and syntax checking. They also did some general syntax abuse, like using default attributes for type declarations.
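The talk didn't show any BLM source, so here is a hypothetical sketch of what "default attributes as type declarations", checked while the class statement runs, can look like in present-day Python 3. All names (`Field`, `Int`, `Str`, `Record`, `CheckedMeta`) are made up for illustration. The `__prepare__` hook used here to catch a second declaration of the same attribute only exists since Python 3.0 (PEP 3115), so it is certainly not what they used in 2006.

```python
class Field:
    """Hypothetical type declaration, used as a default class attribute."""
    pytype = object


class Int(Field):
    pytype = int


class Str(Field):
    pytype = str


class _DeclarationNamespace(dict):
    """Class-body namespace that refuses to declare the same field twice."""

    def __setitem__(self, key, value):
        if key in self and (isinstance(value, Field) or isinstance(self[key], Field)):
            raise TypeError(f"{key!r} is declared more than once")
        super().__setitem__(key, value)


class CheckedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body executes in this namespace, so a redeclaration
        # fails while the class statement runs, before any instance exists.
        return _DeclarationNamespace()

    def __new__(mcls, name, bases, namespace, **kwargs):
        # Collect declarations from the base classes first, then this body.
        fields = {}
        for base in reversed(bases):
            fields.update(getattr(base, "_fields", {}))
        fields.update({k: v for k, v in namespace.items() if isinstance(v, Field)})
        cls = super().__new__(mcls, name, bases, dict(namespace), **kwargs)
        cls._fields = fields
        return cls


class Record(metaclass=CheckedMeta):
    """Base class whose instances check assignments against the declarations."""

    def __setattr__(self, name, value):
        field = type(self)._fields.get(name)
        if field is not None and not isinstance(value, field.pytype):
            raise TypeError(f"{name} must be {field.pytype.__name__}, "
                            f"got {type(value).__name__}")
        object.__setattr__(self, name, value)


# Usage: the declarations read like plain default attributes.
class Customer(Record):
    name = Str()
    age = Int()


customer = Customer()
customer.name = "Annie"
customer.age = 42
```

Because the checks live in the metaclass, a broken declaration fails as soon as the module is imported, which is the nearest Python gets to a compile-time error for this kind of check; `Record.__setattr__` adds the runtime half of the "object typing".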

They use two pieces of code from PyPy: __extend__ to extend existing classes with extra attributes, and __new__ for compile-time checks of things like there being just one type declaration. In short: making sure the code (in the domain language) is sane.

What did they get? Python with some strange conventions, but with all of Python's features.

Tarek Ziadé: CPSBayes, naïve Bayesian classifier for CPS

Bayesian classifiers are simple probabilistic classifiers. They are used for document classification, spam detection, text mining, data mining, etc.

A Bayesian classifier is given texts sorted into several categories (like "ham" and "spam") and grabs the words out of them. It then calculates which words are probable indicators of ham or spam. (That bloody Frenchman had an example with "winners" and "losers", with France being the winner and Italy, Portugal and Germany the losers) :-)
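The talk didn't go into BayesCore's internals, so the following is a from-scratch sketch of a textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing, not CPSBayes code. `NaiveBayes`, `train` and `classify` are hypothetical names, and the training texts are toy data. Training just counts words per category; classifying picks the category with the highest log-probability (summing logs avoids floating-point underflow on long texts).

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of texts
        self.vocabulary = set()

    def train(self, category, words):
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocabulary.update(words)

    def log_scores(self, words):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(category) + sum of log P(word | category)
            score = math.log(self.doc_counts[category] / total_docs)
            for word in words:
                # +1 so a word never seen in this category doesn't zero the score
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[category] = score
        return scores

    def classify(self, words):
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


bayes = NaiveBayes()
bayes.train("spam", "cheap pills win money now".split())
bayes.train("spam", "win a free prize now".split())
bayes.train("ham", "meeting notes for the sprint".split())
bayes.train("ham", "see you at the europython sprint".split())
print(bayes.classify("free money now".split()))  # spam
```

Words like "now" and "win" end up as strong spam indicators simply because they occur more often in the spam texts than in the ham texts.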

The Bayesian classifier needs to extract the relevant words from the text (so: exclude "the", "it", etc.) and process them. For this it reuses textindexNG3's splitting (on spaces, tabs, periods and commas), normalising (lowercasing and, for French for instance, removing all accents) and stemming (reducing plurals to singular).
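A rough stand-in for the split/normalise/stem pipeline using only the standard library; it is not textindexNG3's code or API. The stopword list and the one-rule plural stemmer are deliberately tiny, hypothetical placeholders. Accent removal works by Unicode-decomposing (NFKD) each word and dropping the combining marks.

```python
import re
import unicodedata

# Tiny illustrative stopword list (English and French); real lists are much longer.
STOPWORDS = {"the", "it", "a", "an", "and", "le", "la", "les", "et"}


def split_words(text):
    # Split on whitespace (spaces, tabs, newlines), periods and commas.
    return [w for w in re.split(r"[\s.,]+", text) if w]


def normalise(word):
    # Lowercase, then strip accents: decompose and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(word):
    # Very naive plural-to-singular rule; real stemmers are far smarter.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def extract_words(text):
    words = (normalise(w) for w in split_words(text))
    return [stem(w) for w in words if w not in STOPWORDS]


print(extract_words("Les Équipes,\tthe winners. Décisions"))
# ['equipe', 'winner', 'decision']
```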

Their BayesCore is a pure Zope 3 product; CPSBayes is a tiny Zope 2 CMF layer around it.

Tarek experimented with it a bit, and it could be used to some extent for automatically filling in metadata and for automatically linking documents.

 

About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.
