This evening we had another of our regular Dutch python usergroup meetings. I regularly write summaries, and today is no exception.
And a certain other individual (hi brother!) also made a summary :-)
Konrad DeLong is now writing an MSc thesis on… python and concurrency. Concurrency means a computer is executing several things simultaneously, possibly even interacting with each other. There’s a lot to talk about. Here’s some of it.
Python has two libraries: thread (low level C) and threading (OO API, modeled on Java).
People don’t quite like threading in python because of the infamous GIL (Global Interpreter Lock). Only one thread is allowed to run interpreted python code at any one time. This is needed to make python thread-safe. So the interpreter itself effectively only runs in one thread.
The GIL is bad as there’s no gain on multicore machines. The OS cannot schedule various python threads on several cores: python locks itself continuously on one core and the rest of the python process has to wait even though other cores are sitting completely idle. There’s also a problem with responding quickly to events.
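To see the multicore point for yourself, here is a minimal sketch that runs the same CPU-bound busy work first sequentially and then in two threads. The function names (`count_down`, `timed_sequential`, `timed_threaded`) and the loop size are made up for illustration. On a regular CPython build with the GIL you would expect the threaded run to be no faster than the sequential one, but actual timings depend on your machine and python version, so no numbers are shown here.

```python
import threading
import time


def count_down(n):
    """Pure-python, CPU-bound busy work."""
    while n > 0:
        n -= 1


def timed_sequential(n, repeats=2):
    """Run the busy work `repeats` times in a row; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def timed_threaded(n, repeats=2):
    """Run the busy work in `repeats` threads; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(repeats)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2000000  # arbitrary size, just big enough to measure
    print("sequential: %.2fs" % timed_sequential(n))
    print("threaded:   %.2fs" % timed_threaded(n))
```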
On the other hand, the GIL is also good. It means you can optimise the hell out of it on a single core: lots of performance and safe to tweak. And, importantly, it is a safe environment for C extensions. And… there are other solutions apart from threads.
There’ve been a few tries at removing the GIL, mostly unsuccessful. The current policy for CPython is that a patch that removes it gets accepted if it doesn’t make python slower on single-processor machines…
Some alternative solutions:
Multiple processes instead of threads. The multiprocessing module mimics the threading module, only it runs in separate processes instead of separate threads. The invocation (apart from the import statement) is exactly the same. The end result is not completely the same. You don’t have implicit shared memory, for instance. If you want to communicate between processes you need to do it explicitly with inter-process communication.
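A small sketch of both points: the same invocation works for a thread and for a process, but only the thread shares memory with its parent. The helper names (`bump`, `run`) and the module-level `counter` are made up for this illustration.

```python
import multiprocessing
import threading

counter = {"value": 0}  # module-level state, to show what is (not) shared


def bump():
    """Increment the counter in whichever worker runs this."""
    counter["value"] += 1


def run(worker_class):
    """Start one worker, wait for it, return the counter as the parent sees it."""
    counter["value"] = 0
    worker = worker_class(target=bump)  # same call for Thread and Process
    worker.start()
    worker.join()
    return counter["value"]


if __name__ == "__main__":
    # The thread changes the parent's counter: implicit shared memory.
    print("thread: ", run(threading.Thread))
    # The process only changes its own copy; the parent's counter stays 0.
    print("process:", run(multiprocessing.Process))
```

To get the value back from the separate process you would have to send it over explicitly, for instance with a `multiprocessing.Queue`.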
Jython/IronPython.
Asynchronous solutions. There are a couple of them. Asyncore, twisted, tornado. The basic pattern is always “don’t call us, we’ll call you”. The one big gotcha is that you have to write your entire application this way. Your code is split in lots of small parts that are connected by callbacks and started by a central core (“reactor” in twisted). It is still a single CPU business, but the utilization is better. You can also have “hidden” async: eventlet, pyevent, gevent, epoll, kqueue, etc. They mostly react to kernel events (“yeah, I’ve read your full file, now you can have a go at it”). A core concept here is “greenlet”. (Reinout: You’d better google for it instead of fully trusting my summary here). Kamaelia functions in a similar way, only by tying everything to generators. And EVE Online (a massively multiplayer game) uses “stackless python”. Second Life is where greenlets originated.
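The “don’t call us, we’ll call you” pattern can be sketched with a toy central loop. This is not twisted’s API: the `Reactor` class, `call_later`, `fetch` and `handle_response` are hypothetical names, and there is no real I/O involved. It only shows how the code gets split into small parts that a central core calls one after the other.

```python
from collections import deque


class Reactor(object):
    """Toy central loop: it owns the control flow and calls your callbacks."""

    def __init__(self):
        self.pending = deque()

    def call_later(self, callback, *args):
        """Register a callback; it only runs once the loop gets to it."""
        self.pending.append((callback, args))

    def run(self):
        """Call callbacks in registration order until none are left."""
        while self.pending:
            callback, args = self.pending.popleft()
            callback(*args)


log = []


def fetch(reactor, name):
    # First small part: start the "request", then hand control back.
    log.append("requested " + name)
    reactor.call_later(handle_response, name)


def handle_response(name):
    # Second small part: called by the reactor, not by fetch() itself.
    log.append("handled " + name)


reactor = Reactor()
reactor.call_later(fetch, reactor, "a")
reactor.call_later(fetch, reactor, "b")
reactor.run()
print(log)
# ['requested a', 'requested b', 'handled a', 'handled b']
```

Note how both requests are started before either response is handled, all in a single thread.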
Albert Visser tries to work with python in non-python environments. He’s a COBOL programmer and discovered he could generate quite a lot of COBOL code with python, which cut down on the amount of menial work.
Another area was the batch language called JCL. He wrote a tool for it that converts from a visual JCL representation to the actual batch commands.
Roald de Vries wanted to know what really happened when he decorated a class’s method with the @property decorator:
    class MyClass:
        @property
        def my_method(self):
            return 'bla'
It isn’t an ordinary attribute. It turns out to be a special class attribute: a so-called descriptor. A descriptor is an object with a __get__ method (and optionally __set__ or __delete__).
The attribute search order on an object is:
1. If the attribute is a special attribute or a data descriptor: __get__ is called.
2. The object’s __dict__ key is looked up.
3. If the attribute is a (non-data) descriptor: __get__ is called.
4. The class’s __dict__ key is looked up.
Behind the scenes, methods implement the __get__ method, too (which returns a bound method).
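The lookup order can be tried out with a small sketch. `ReadOnly` is a made-up descriptor that only defines __get__ (so it is a non-data descriptor); `property` also defines __set__, which makes it a data descriptor. The attribute names are just for illustration.

```python
class ReadOnly(object):
    """Hypothetical non-data descriptor: it only defines __get__."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # looked up on the class instead of an instance
        return self.func(obj)


class MyClass(object):
    @ReadOnly
    def non_data(self):
        return 'from ReadOnly'

    @property  # property also defines __set__: a data descriptor
    def data(self):
        return 'from property'

    def method(self):
        return 'bla'


obj = MyClass()
first = obj.non_data  # nothing in obj.__dict__ yet, so ReadOnly.__get__ runs

# An entry in the instance __dict__ wins over a non-data descriptor...
obj.__dict__['non_data'] = 'from instance __dict__'
# ...but loses against a data descriptor such as property.
obj.__dict__['data'] = 'from instance __dict__'

print(first)         # from ReadOnly
print(obj.non_data)  # from instance __dict__
print(obj.data)      # from property

# Plain functions are descriptors too: __get__ returns the bound method.
function = MyClass.__dict__['method']
bound = function.__get__(obj, MyClass)
print(bound())       # bla
```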
Gijs Molenaar is still studying computer vision with python. Computer vision is extracting information from images, usually with some sort of predefined purpose (“is someone smiling in this image?”, “what is the number on the license plate?”). Information extraction uses several techniques. Color space conversion (RGB to Hue Saturation Value and only looking at hues, for instance). Edge detection. Line detection. Clustering.
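As a small illustration of the color space conversion step, here is a sketch that uses the standard library’s `colorsys` module instead of OpenCV. The helper name `hue_degrees` and the sample pixels are made up. It shows why looking only at the hue can be handy: a bright red and a dark red end up with the same hue.

```python
import colorsys


def hue_degrees(red, green, blue):
    """Return the hue (0-360 degrees) of an 8-bit RGB pixel."""
    hue, saturation, value = colorsys.rgb_to_hsv(
        red / 255.0, green / 255.0, blue / 255.0)
    return hue * 360.0


# Bright red, dark red, green, blue.
pixels = [(255, 0, 0), (128, 0, 0), (0, 255, 0), (0, 0, 255)]
hues = [round(hue_degrees(*pixel)) for pixel in pixels]
print(hues)
# [0, 0, 120, 240]
```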
Some use cases are character recognition, handwriting recognition, surveillance and augmented reality. A practical example is the Dutch section control (trajectcontrole), where they need to find the car in the images, then find the license plate, then do color normalization, then find the characters, then… There’s a free book from a Microsoft researcher on the generic computer vision subject.
OpenCV originated at Intel: an open source computer vision library with a python binding. The late 2009 2.0 release is pretty good. It includes a “ML” machine learning module for dealing with the fuzzy data that comes out of the computer vision part.
He demoed a 10-line python program that started up his webcam and did edge detection with him standing in front of his computer, including straight line detection that picked up the straight edges of the projection screen, his glasses and the ceiling fixtures. Wow.
He’s working on Debian installers for the new API.
On the whole, the installation procedure (unless you’ve got an official package) is hairy. The library itself is sometimes buggy. The mailing list and IRC channel are useless. And it is sometimes undocumented. But it is getting better. And… there’s an O’Reilly book: Learning OpenCV.
Klaas van Schelven lived in the stone age for a while as he didn’t know about virtualenv two months ago.
Virtualenv provides an isolated sandbox python environment so different installations don’t step on each other.
Pip installs packages and allows you to install a fixed known set of them. An easy_install replacement.
(Unrelated note to myself: try out the xmonad window manager.)
Guido Wesdorp likes small tools and frameworks, so he’s working a lot with small wsgi “frameworks”. But you cannot have, for instance, the ZODB object database that way as it is too heavy. But he likes object databases.
Enter pydirs. A simple codebase of just 500 lines. Everything is stored on disk as directories and text files. Great for debugging: you can use standard “cat”, “ls”, “grep” etc.
Basic python datatype attributes are stored as plain text files, everything else is a python pickle. Pydiritems (so: the dictionary-like tree of objects) are directories. It loads fast enough to render it usable for cgi and wsgi.
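To give an idea of the storage scheme, here is a rough sketch. This is not pydirs’s actual code or API: the function names (`save_item`, `load_item`), the file extensions and the use of `repr` for the text files are all assumptions made for illustration. Child items (subdirectories) are left out.

```python
import ast
import os
import pickle
import tempfile

BASIC_TYPES = (str, int, float)  # assumption: what counts as "basic"


def save_item(directory, attributes):
    """Store every attribute of one item as a separate file in `directory`."""
    os.makedirs(directory, exist_ok=True)
    for name, value in attributes.items():
        if isinstance(value, BASIC_TYPES):
            # Basic values become plain text, readable with cat and grep.
            with open(os.path.join(directory, name + ".txt"), "w") as f:
                f.write(repr(value))
        else:
            # Everything else is pickled.
            with open(os.path.join(directory, name + ".pickle"), "wb") as f:
                pickle.dump(value, f)


def load_item(directory):
    """Read all attribute files in `directory` back into a dictionary."""
    attributes = {}
    for filename in os.listdir(directory):
        name, extension = os.path.splitext(filename)
        path = os.path.join(directory, filename)
        if extension == ".txt":
            with open(path) as f:
                attributes[name] = ast.literal_eval(f.read())
        elif extension == ".pickle":
            with open(path, "rb") as f:
                attributes[name] = pickle.load(f)
    return attributes


item_dir = os.path.join(tempfile.mkdtemp(), "page1")
original = {"title": "Hello", "hits": 3, "tags": set(["a", "b"])}
save_item(item_dir, original)
print(sorted(os.listdir(item_dir)))
# ['hits.txt', 'tags.pickle', 'title.txt']
print(load_item(item_dir) == original)
# True
```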
It is simple and effective. Quite stable. He uses it in production on a website with some 20000 objects with 5 or 6 attributes each. He’s planning to release it Real Soon Now. Just one small modification left to make…
(Update 2010-03-29: pydirs is now available)
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.