Reinout van Rees’ weblog

Developer apprenticeships (Christian Theune)

2009-07-02

Tags: europython2009, europython

A quote by Joel Spolsky: “you can go through thousands of job applications and quite frankly never see a great software developer. Not a one”. That is a problem Christian had too with his company. Finding a good programmer is hard.

And hiring is delicate. If someone doesn’t work out, you could be blowing a lot of money. The programmer also has to fit into the culture. And what comes out of the university often isn’t immediately usable for various reasons. Stroustrup said something like “a lot of people can write a single function but haven’t built a full system before”.

Christian views programming as a craft. Craftsmanship is something you learn from a master. Apprenticeships are also relatively cheap compared to the money you lose on a mis-hire. Apprentices are often young and thus are quick learners. They also adapt quickly to their working environment.

One thing they have to learn is to look for information themselves (google, man pages, RFCs).

A risk in apprenticeships is that you only train code monkeys without much theoretical background. That can be fixed by following a couple of courses at a university, for instance.

Christian also supervises some apprentices in another company. He goes over every 6-8 weeks to work with them for a week: basically a sprint.

The most important things he learned:

  • Rubber ducking. Don’t immediately solve the apprentice’s problems but let them explain it to you. Most of the time they’ll discover it by explaining it to you. Keep yourself in check.
  • Allow them to make mistakes. Mistakes are what you learn from. Mistakes are important.

The trainer has to be a good programmer. Otherwise he’ll just try and solve problems instead of helping the apprentices figure it out themselves. The apprentices must actually be programming. And include them in your daily business. Three apprentices seems to be the optimal group size. Be prepared to do a lot of explaining over and over again. They’ll get it eventually.

Summary: it is worthwhile and cost-effective to coach apprentices.

Laura’s tip: ask prospective apprentices why they want to do this. If the answer is not “I want to create something”, pick another.

Hallway


Python software foundation (Steve Holden)

2009-07-02

Tags: europython2009, europython, python

His goal with this talk: connect the Python foundation to us, the community.

The PSF (Python software foundation) has 112 members, most of whom are not core developers. Anyone that has done something for Python (code, organizing, blogging) can be nominated by any of the current members. You must have demonstrated some commitment.

Nobody knows how many Python users there are. But usage is growing. So we can expect a lot of new users. The nice thing about Python is that it has always been a friendly and welcoming community.

One thing that needs improvement is the newcomer-friendliness of the Python.org website. The Ruby and OpenOffice sites are both examples of sites that are more enthusing and user-oriented. Steve’d like a new home page strategy: make it obvious you can do extremely cool things with Python. Help wanted! Python.org ought to be more of a portal that shows the vibrant community and the many projects that are available. News from projects. Job announcements. The Python planet links.

There are a lot of cool things done with Python. See the wx toolkit’s demo. Or the Phatch photo editor.

The conference activity is vigorous. Pycon, international pycons (from Brazil to the UK), EuroPython, etc. Unconferences are also appearing. PSF supports the conferences with grants.

There’s help needed for Python’s core developers. First patches people contribute often get effectively ignored. They fall in a black hole.

PSF has handed out development grants in the past. A big success here is the support given to Jython (just enough to keep the project alive until Sun took over sponsorship). Jython is now at the level of CPython 2.5.

Some recent changes: bylaws changes to make it easier to add PSF members. A full-time conference coordinator. The treasurer gets paid. And there’s a new administrative assistant to catch up on some long-standing tasks.

Leading an open source project is hard as there’s “constructive anarchy”, which is good, btw. The PSF tries to enable rather than push. You can’t push volunteers anyway. So it supports community initiatives like conferences and projects. A big issue is the limit on available (foundation) manpower to implement the many ideas that are available. So: encourage people to lead the projects and support them with resources (mostly money) as required. But also with equipment, like the investment made in video equipment as used at the latest PyCon to capture all the talks.

Money is not (yet) really the problem. (Wo)manpower is. Without people to run programs we cannot do everything we’d like to: grants, conferences, infrastructure, web site, publications, development, etc.

What’s also needed: engage the world. Python users are the best advocates. Give commercial users a channel to promote Python and to promote themselves. Publications! Coverage! And: Python is an excellent teaching language.

We, the attendees at the conference and all Python programmers, are the Python community. Python needs us. This is an opportunity! Pro-actively represent the community. Increasing Python’s popularity increases our own opportunities, too. Evangelism is OK. Let’s give something back to Python.

Kit and Jan


Metaprogramming and decorators (Bruce Eckel)

2009-07-02

Tags: europython2009, europython

Basically, metaprogramming is code that modifies other code. Several languages have something like that. C++’s template metaprogramming (which is too difficult for most people). Java has aspect oriented programming and annotations. Ruby can add methods to metaclass objects. Python first had metaclasses and moved on to the much simpler decorators afterwards.

You use metaprogramming when you start to repeat yourself over and over again but you cannot filter out a common method or base class.

Decorators that come with Python include @property, @classmethod and @contextmanager (the last one lives in contextlib). Twisted, turbogears, django and zope all have their own decorators for common tasks. Someone came up with an @accepts(int, string) argument checker for checking the type of method arguments. And a @memoize decorator can store a method’s result in a cache, so the method only has to calculate everything once and the cached value is returned afterwards. The method itself doesn’t have to deal with any of that, as the decorator handles it in a generic manner.
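To make the @memoize idea a bit more concrete, here is a minimal sketch. It is not the decorator shown in the talk: the names memoize, slow_square and calls are made up for this example, and it assumes all arguments are hashable positional arguments.

```python
import functools


def memoize(function):
    """Cache results per argument tuple (assumes hashable positional args)."""
    cache = {}

    @functools.wraps(function)  # keep the original name and docstring
    def wrapper(*args):
        if args not in cache:
            cache[args] = function(*args)
        return cache[args]

    return wrapper


calls = []  # records every real computation, only for demonstration


@memoize
def slow_square(n):
    calls.append(n)
    return n * n


first = slow_square(4)
second = slow_square(4)  # served from the cache, slow_square's body is skipped
```

In current Python versions the standard library’s functools.lru_cache does roughly the same job, so in practice you would probably reach for that first.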

An example:

def addstring(function):
    function.mystring = 'added string'
    return function

@addstring
def f():
    return 1

This results in:

>>> f()
1
>>> f.mystring
'added string'

So the addstring decorator gets passed the item it applies to and returns it again, modifying it in between. The way it works is extremely simple, but the effect is big and very useful. It gets a bit more involved when you want to pass an argument to the decorator (@addstring('hello')), but that’s what examples are for. See for instance the functools module in the standard library.
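A sketch of what the version with an argument could look like, extending the addstring example above. The extra level of nesting is the whole trick: addstring('hello') is called first and has to return the actual decorator. The function name g is just a placeholder.

```python
def addstring(text):
    """Decorator factory: called with the argument, returns the decorator."""
    def decorator(function):
        function.mystring = text
        return function
    return decorator


@addstring('hello')
def g():
    return 1
```

After this, g() still returns 1 and g.mystring holds the string that was passed in.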

You can also use decorators on classes. They work on a whole class instead of on a function. Typical use is to register the class somewhere or to augment it. For example (see activestate recipe) adding the rest of the ordering methods given one of __lt__, __gt__, etc.
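Both uses fit in a few lines. In the sketch below, register and REGISTRY are hypothetical names invented for the example; functools.total_ordering is the standard library’s later take on the ordering recipe and is assumed to be available (it is in any recent Python).

```python
import functools

REGISTRY = {}  # hypothetical registry: class name -> class


def register(cls):
    """Class decorator that records the class and returns it unchanged."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
@functools.total_ordering  # fills in __le__, __gt__ and __ge__
class Version:
    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __lt__(self, other):
        return self.number < other.number
```

Just like with function decorators, what happens to the class is visible right above the class statement.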

Metaclasses are still part of the python language. A big problem is that a metaclass is more or less hidden; decorators are visible right in front of your class, on the line above. In plain sight. Metaclasses are much harder to understand and should only be used in special cases.

In case you think you need metaclasses: have you looked at __new__() in addition to __init__()? The output of __new__() is what gets fed into __init__(), so if you need to do some fiddling with the inheritance tree or other specialized weird thing, you can just use __new__() instead of using metaclasses.
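A small sketch of that __new__()/__init__() interplay, with made-up Shape and Triangle classes. __new__() decides which class gets instantiated and creates the object; __init__() then receives that object as self.

```python
class Shape:
    def __new__(cls, sides):
        # Runs before __init__: pick a more specific class if appropriate.
        target = Triangle if (cls is Shape and sides == 3) else cls
        instance = super().__new__(target)
        instance.created_by_new = True
        return instance

    def __init__(self, sides):
        # Receives whatever __new__ returned.
        self.sides = sides


class Triangle(Shape):
    pass


three = Shape(3)  # ends up as a Triangle instance
four = Shape(4)   # stays a plain Shape
```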

Again, if you only want to add some attributes/methods or register the class or so: just use a simple decorator.

For an (according to Bruce excellent) presentation on “class decorators: radically simple”, see http://pycon.blip.tv/file/1949345 .

Summary: decorators are easy and very useful. Don’t forget to use functools.wraps. Metaclasses are hardly ever needed and when you need them you’ll know that you need them.

Photographer at europython


Things I helped create (Martijn Faassen)

2009-07-02

Tags: europython2009, europython

Comment beforehand: this is just a dump of some quotes hurriedly typed down while Martijn rattled on. Pick some nice ones out of the list and be happy.

This talk is about creativity in programming. Especially creating open source software. Some of his comments will contradict each other, which is fine. Both are valid, then, and just need to be weighed against each other. Sleep on it, pick one and change it later on if you’re wrong.

To create you need to pick a goal. You might not reach it, but you at least created something.

If you are inexperienced with the tools of creation, find a mentor. (He was helped by his father for his first project).

Inexperienced programmers often look for a trick or code snippet if they encounter a problem without understanding it. They lack the knowledge to really apply the basic idea in other situations. Knowledge helps to generalize and generalization helps creativity.

In creation, embrace your limitations.

Make the goal fit what you created. After the fact if needed.

Access to knowledge is greatly improved. Even early-80’s MSX assembly codes are easy to google now.

If all goals look infeasible, you might not create anything at all. Writing a computer game in the 1980’s seemed feasible. Writing a full 3D game now? Seems hard.

Creations unseen by others can still be valuable if you learn something from them.

Lack of knowledge means you cannot be discouraged by knowledge.

Recreating something is a great way to learn about it.

Networks stimulate creation. Networks bring creations in front of people.

Build on other people’s work.

Social software encourages creation by its users.

A creation is a project and a process as much as it is a product.

“Publish early” is a good idea. Not publishing in any case means no success.

Creation is not done in isolation. And it is often a group effort.

Martijn re-discovered the joy of quick creation in 1998 with Python, like he had had with BASIC (in contrast to C++).

Feedback stimulates creativity. (For instance on the python prompt).

Good tools stimulate creativity.

Creation can be just a matter of speaking up. What works for God (God said “let there be light” and there was light) sometimes also works for people. Speak up, announce something, start something.

A community around software forms more easily if the software aims at developers instead of end-users as developers are more easily converted to contributors.

Software is a human endeavour.

If your approach doesn’t work, go with the flow of your strengths.

To create a business together with someone else is one of the most educational things you can ever do with your life.

A creation is successful if it continues to evolve after you’re no longer involved.

It is easier to make a creation evolve without you if you are not alone in its creation.

Creation is often started by communication.

Creation by a team is a lot easier if you are in the same room physically.

Getting an idea is often overvalued. Implementation is what counts most of the time.

If you want a community to create something, avoid doing all the work yourself.

Name recognition is useful in creation. Just making regular noise on a mailing list might be enough.

Europython was basically started by Martijn. He mailed Guido that there was to be an EU python conference and asked whether he wanted to attend (“yes”). Then he mailed around in the EU that Guido was coming to the EU python conference and asked for volunteers to help organize it. So others did the work and we now have Europython conferences.

Create something because it is cool. Not because it will succeed but because you will learn something from it.

Integration is a form of creation. Integration of software can integrate communities. (Example: the Five project that brought zope2 and zope3 back together).

Talk about what you create. Use some oft-repeated phrases in those talks. “Evolution, not revolution” for instance. After a while people were picking up these phrases.

Separate concerns. Each component gets a clearer purpose and clearer boundaries. And the basic components become candidates for reuse.

Document your creation.

To create, be clever and use what is already there.

To create, be lazy and unoriginal.

Martijn preparing his talk


Lightning talks Thursday

2009-07-02

Tags: europython2009, europython

Collaborations in healthcare open source

Link: http://www.chos-wg.eu

It is about collaboration between healthcare professionals. A patient record shared between several health actors in charge of a common patient. The family doctor has a coordinating role.

There’s already a lot of health care open source software, but the programs don’t talk to each other. So there is a need for a more modular approach in order to share those components. Python seems a good match for the project, but they need technical help for this. See the open source working group of the International Society for Telemedicine.

Reimplementing the google app engine datastore in berkeley DB - java edition

The GAE isn’t really portable. That’s because of practical issues, not by design. One of the barriers is the datastore. So he chose to reimplement the datastore (as alternatives weren’t practical yet), targeting small to medium apps. In the end he went for BDB, which shares some features with the original Google datastore. He uses Java for it, for instance because Python’s protocol buffers are dog slow. And the Java implementation of BDB is pretty good and solves some issues with the normal BDB.

Python and excel (Chris Withers)

Python and Excel: you could use CSV or the “HTML hack”. There’s something better: xlrd, which reads Excel files directly. Its companion xlwt writes them.

Link: http://www.Python-excel.org/

tl.eggdeps (Thomas Lotze)

Link: http://pypi.Python.org/pypi/tl.eggdeps

It collects declared dependencies between eggs as a tree. And it can visualize them using dot/graphviz. You can filter out uninteresting packages (like setuptools) and zoom in on the dependencies of one specific package. You can also group for instance all zope.tal.* packages into one zope.tal node to make the image cleaner. And you can even filter by regular expression.
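To give an idea of the filtering and grouping, here is a rough sketch in plain Python. It does not use tl.eggdeps or its API at all: the DEPENDENCIES data, the to_dot function and its ignore/group parameters are all invented for illustration. It only shows how a dependency mapping can be turned into dot source that graphviz can render.

```python
import re

# Hypothetical declared dependencies: package -> packages it requires.
DEPENDENCIES = {
    "myapp": ["zope.tal.engine", "zope.tal.parser", "setuptools"],
    "zope.tal.engine": ["setuptools"],
    "zope.tal.parser": [],
    "setuptools": [],
}


def to_dot(dependencies, ignore=r"^setuptools$", group=r"^(zope\.tal)\."):
    """Return graphviz dot source, skipping ignored packages and
    collapsing grouped packages into a single node."""
    ignored = re.compile(ignore)
    grouped = re.compile(group)

    def node(name):
        match = grouped.match(name)
        return match.group(1) if match else name

    edges = set()
    for package, requirements in dependencies.items():
        if ignored.search(package):
            continue
        for requirement in requirements:
            if ignored.search(requirement):
                continue
            source, target = node(package), node(requirement)
            if source != target:  # drop edges inside one group
                edges.add((source, target))
    lines = ['    "%s" -> "%s";' % edge for edge in sorted(edges)]
    return "digraph dependencies {\n%s\n}" % "\n".join(lines)


dot_source = to_dot(DEPENDENCIES)
```

With the sample data, the two zope.tal.* packages collapse into one zope.tal node and setuptools disappears from the picture.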

MOAI (Kit Blake)

MOAI is an open access server platform for institutional repositories. The server manipulates OAI feeds that are in some xml format. One of the users is http://www.cwi.nl which actually is the birth place of Python. Infrae harvested the 8963 documents in CWI’s repository. So in the demo Kit was able to find 16 documents by a certain Guido van Rossum from 1995.

So 14 years after Guido left CWI, the first fully fledged Python application is installed at CWI. Python has come home!

The cloud in Five (Kevin Noonan)

He demonstrates Amazon’s cloud stuff. Cloudware is outsourced virtualization. So a virtual server or storage hosted on a remote platform. The two well-known ones are Amazon Web Services and Google’s App Engine. It is good for scaling on demand. And you can get some outsourcing of your infrastructure. The oldest of Amazon’s web services is S3, the storage solution. There are Python libraries to connect to it and manage it.

How to hack like an evil overlord (Jonathan Lang)

How to hack like an evil overlord in five easy steps:

  • Shooting is not too good for my enemies. There is no software bug for which shooting is not a good solution.
  • Take a 5 year old as an advisor. They will always spot the flaws in your plans. So always have someone else look at your code.
  • Do not consume any more energy than will fit in your head. So no metaclasses.
  • If I have an unstoppable superweapon, I will use it as early and often as possible. So pdb instead of print statements.
  • “Push the button” should be enough. Automate everything. Don’t do long manual steps or the hero will slay you.

Internet censorship (Holger Krekel)

Governments are turning the internet and the mobile networks into the greatest mass surveillance system ever. Iran likes that as it can use our surveillance software and our data-retaining mobile infrastructure to retain all the protesters’ SMS messages and analyze them afterwards... So our western measures, enacted for our protection, are going to send thousands of Iranians into the torture chambers.

And the government in some countries is allowed to put trojan horses on your devices. And the government in Germany can in the future censor any webpage they want. Just by executive decision instead of democratic process. And France tries to put a three-strikes law into effect.

And all that even though the internet is so great. But it is turning into a huge surveillance database for the government. There are technical counter-strategies. Unlocatable content. Untrackable connections. Untrackable access. And there are political counter-strategies like blogging, involvement, twitter, etc.

So support political actions and create cooler technology! For a free internet.

CMS in django (Tommi)

He showed an event-like system used for re-rendering placeholder-tags in an html page with the real content.

SciPy (Stefan Schwarzer)

At the end of July there’ll be a European conference for Python in science in Leipzig, Germany. http://www.euroscipy.org

PyCharm (Dmitry Jemerov)

They’re working on a new Python IDE: PyCharm. Currently a plugin for IntelliJ IDEA, but soon standalone. See http://www.jetbrains.com . Price not yet announced, should be available later this year.

He demoed it with nice warnings, quite intelligent autocompletion. Integrated test support. Snippets. Display of docstrings for methods with just a keyboard shortcut.

Psyco 2.0 (Christian Tismer)

He showed a speed test of a normal and a pure Python property implementation to see how much psyco2 can speed that up. On average, there was a 100x improvement. There’ll be a psyco2 release on Saturday.

Self-service terminals (Bernard Nikolaus)

There are self-service terminals at his university (Wirtschaftsuniversitaet Wien), implemented with Python, for paying study fees, taking photos, etc. So a card reader, printer, touch screen, etc. All vandal-proof.

The hardest part of the project was communicating with all the hardware. There’s a watchdog that monitors a subversion repo for updates. And it starts the main program. Zope is used with xmlrpc for the data. The data is stored in an Oracle database. They mostly use wxPython widgets for the user interface. Quite big buttons as the interface is a touch screen.

Jacob Hallen

Please use political activism to prevent the recording industry from being allowed to lift us from our beds in the middle of the night for horrid crimes like downloading stuff or even developing open source software. It is a very real danger. We prevailed against software patents after a long struggle. There are politicians that are willing to listen, but we must be the ones to do the talking.

Python system information

PSI is a Python C extension to get information from the kernel via system calls, kernel hooks and so on. It already supports most unixy platforms, but BSD and Windows are still lacking. They do want to add those.

Twisted interface to Erlang (Thomas Herve)

TwOTP: twisted interface to Erlang. http://launchpad.net/twotp . Erlang is a functional language with a focus on scalability. There is an “ERM” protocol to communicate between nodes, which is documented. CouchDB is one of the nice applications made with Erlang. But as Python programmers we want to access that too.

TwOTP has parsing/packing to/from Python types, an EPMD daemon implementation and implementations for server and client protocols. And monitoring of Erlang processes. Send and receive messages from/to Erlang processes. The good thing: every Python goodie is available to develop with, like UI, web libraries and database interfaces. Several of those things are hard(er) with Erlang.

Python for numerical analysis (Eric)

Most numerical analysis is still done with Fortran. Not nice. So he tried to do it with Python. All the goodies like eigenvalues, Green’s functions, plots of the Green’s functions. The number of lines in the program was much lower than with a comparable Fortran program. He hopes that the Python community will work towards promoting the scientific application of Python. It is so much nicer.

FilterPype network (Rob Collins)

Filterpype: “complex systems in 10 lines of code”. A classic case of a filter and pipe system is an oil refinery. With filterpype you can also work with pipes and filters: from source to sink.

There are base classes for all sorts of filters. Stuff you can do? Just pass it on, bzip compression, etc. You tie the filters together in a pipeline with a small config file that specifies the route between the various filters.
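The pipes-and-filters idea itself is easy to sketch with plain generators. The snippet below does not use filterpype or its config format: pass_through, upper_case, bzip and pipeline are made-up names, and the route is given as a simple argument list instead of a config file.

```python
import bz2


def pass_through(chunks):
    """Filter that just passes every chunk on."""
    for chunk in chunks:
        yield chunk


def upper_case(chunks):
    """Filter that transforms each chunk."""
    for chunk in chunks:
        yield chunk.upper()


def bzip(chunks):
    """Filter that bzip2-compresses the stream."""
    compressor = bz2.BZ2Compressor()
    for chunk in chunks:
        data = compressor.compress(chunk)
        if data:
            yield data
    yield compressor.flush()


def pipeline(source, *filters):
    """Tie the filters together: each one consumes the previous one."""
    stream = source
    for filter_ in filters:
        stream = filter_(stream)
    return stream


source = [b"crude ", b"oil"]
sink = b"".join(pipeline(source, pass_through, upper_case, bzip))
```

Decompressing sink with bz2.decompress gives the upper-cased input back, so the data really did flow from source to sink through every filter.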

Link: http://www.filterpype.org/

TPS reports in django (Felix)

Django’s admin interface is highly customizable. At his company he needed to make some modifications, especially adding reports for their issues database. In the end, he generated reports with Word by using a template in Word’s xml format where he could get the software to fill in data. The template was simply uploaded to the site. The documentation on how he did it is all on the wiki of djangoproject.org.

Distributed version control system (Radomir Dopieralski)

Use it, it really changes the way you program. It is really easy to make a clean copy of something, work on it and possibly push the changes back. He thinks there should be more tools like that. For instance for bug reports: there is a tool that embeds bug reports in such a DVCS.

Hatta is something that he made that embeds a wiki in the DVCS, including the whole page history. It comes with a small web server that you run straight from your DVCS checkout, so you can easily browse your version of the wiki and even change it via the web interface. And also directly in the source file, of course.

music stand, laptop stand


Tapping into the web of data (Cosmin Basca)

2009-07-01

Tags: europython2009, europython, rdf

Cosmin works for DERI which is one of the largest semantic web research institutes of the world. (In one of the rainiest parts of the world, too, btw).

Why would you want to use the semantic web? Well, there is a lot of data and you might want to use it. You can easily aggregate multiple sources. You can evolve data sources without too many worries about migration issues: the semantic web is by nature very robust. You might not support all the new features of a new format, but your application will probably still work.

Some popular much-used formats are FOAF and DOAP. Also interesting is DBPedia which provides information extracted from wikipedia. And as semantic web data is linked, you can actually already find a lot of linked information once you have a starting point. Start with a town and you find the name of a famous local musician which links to a music database in turn.

A starting point for the semantic web is the data format RDF. The core of RDF is a subject/predicate/object “triple”, for instance reinout/blogs_about/europython. With the addition that almost all items aren’t plain text but a url. So “http://reinout.vanrees.org” instead of “reinout”. The big advantage: a URL is a strong reference. It is unique.
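In Python terms, a triple is little more than a 3-tuple of URLs. A minimal sketch, in which the example.org predicate and object URLs and the objects helper are made up for illustration (real data would use an established vocabulary):

```python
# Each triple is (subject, predicate, object); URLs act as unique names.
TRIPLES = [
    ("http://reinout.vanrees.org",
     "http://example.org/blogs_about",
     "http://example.org/europython"),
    ("http://reinout.vanrees.org",
     "http://example.org/blogs_about",
     "http://example.org/python"),
    ("http://example.org/someone",
     "http://example.org/reads",
     "http://reinout.vanrees.org"),
]


def objects(triples, subject, predicate):
    """All objects linked to the subject through the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


topics = objects(TRIPLES,
                 "http://reinout.vanrees.org",
                 "http://example.org/blogs_about")
```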

SPARQL is a query language for semantic web files. But just as you have an ORM (object-relational mapper) for mapping pure sql queries onto objects, you can have an O-RDF (object-rdf mapper). This is handled by their SuRF tool. How do we see RDF data: as a set of triples or as a set of resources? Resources map much more naturally to objects. SuRF is inspired by ActiveRDF which was developed at DERI for the ruby language.

An RDF resource is defined as all triples (subject/predicate/object) with the same subject. So it only looks at the “outgoing” predicates/relations. If something else says something about you, it is not automatically included. The predicates are accessed as attributes, so instance.namespace_attribute. It uses lazy loading. So cosmin.foaf_knows returns the friends that Cosmin knows. There’s also a convention for looking up the “reverse” properties (so what other instances are saying about us). Using a dynamic language like Python really helps here.
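The instance.namespace_attribute trick is a nice showcase of Python’s dynamic attribute lookup. The sketch below is not SuRF’s implementation or API: the Resource class, NAMESPACES table and the example.org people are invented for this example (only the FOAF namespace URL is the real one). It only illustrates how __getattr__ can translate an attribute name into a predicate lookup on the outgoing triples.

```python
NAMESPACES = {"foaf": "http://xmlns.com/foaf/0.1/"}

TRIPLES = [
    ("http://example.org/cosmin", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/alice"),
    # Incoming triple: bob says something about cosmin.
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/cosmin"),
]


class Resource:
    """All triples with the same subject, exposed as attributes."""

    def __init__(self, subject, triples):
        self.subject = subject
        self._triples = triples

    def __getattr__(self, name):
        # Only called when normal lookup fails, e.g. for 'foaf_knows'.
        prefix, _, local = name.partition("_")
        if prefix not in NAMESPACES or not local:
            raise AttributeError(name)
        predicate = NAMESPACES[prefix] + local
        # Evaluated on access, so nothing is looked up until it is needed.
        return [o for s, p, o in self._triples
                if s == self.subject and p == predicate]


cosmin = Resource("http://example.org/cosmin", TRIPLES)
friends = cosmin.foaf_knows
```

Note that bob’s statement about cosmin is not part of cosmin.foaf_knows: only the outgoing relations count.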

SuRF has session handling which means that if you modify data, the changes aren’t written back to the various aggregated data stores until you call a commit(). For data sources, you can write a plugin for SuRF and provide an RDFReader and RDFWriter class.

The code is open source (BSD) and available on pypi. It is easy to get started with a simple read-only instance.

If you want to integrate SuRF with a web framework, try to pick a framework like pylons (and probably turbogears) that doesn’t have a lot of home-grown components. In pylons it is easy to plug in a different data source (so: SuRF).

Web of data talk


Turbogears: a framework reborn (Mark Ramm)

2009-07-01

Tags: europython2009, europython

Mark Ramm is the third maintainer (and the longest lasting one!) of Turbogears. With a couple of people they rewrote turbogears (with the same external API) on top of pylons.

Turbogears is a modern MVC platform. It is mostly cobbled together from various bits and pieces and libraries found on the Internet. Building a web framework from spare parts can be fun. It is no fun to write everything yourself: just use sqlalchemy for the database integration instead of writing your own, for instance.

Why would I learn Turbogears? Well, it is reusable learning. You have to learn sqlalchemy, which is useful in other projects and frameworks. Genshi templates, the same way. Most of the key elements of turbogears are useful elsewhere. That’s not always true of other tightly coupled web frameworks.

Turbogears 2.0 is better than 1.0 because there are so many good quality python libraries nowadays. For instance beaker for caching. And the whole WSGI middleware thing with handy tools like WebError.

When you make a whole framework yourself, it is hard to get the individual pieces right as it is hard to think about the interfaces between the pieces. If you re-use a template system instead, you make it inherently easy to switch to other template systems.

Mark plugs sqlalchemy as being the best ORM (object relational mapper) in any language.

A lot of new web frameworks don’t work that well scalability-wise. Many design decisions don’t work that well: too many small sql queries and so on. Mostly because of something called the active record pattern, where there’s a one-to-one mapping between objects and database tables. With the help of sqlalchemy, turbogears 2.0 uses a better mechanism: eager, lazy and dynamic object graphs. The data mapper pattern. Sourceforge is going to use/is already using turbogears for most of their pages!

Turbogears is an industrial strength web framework that can handle huge deployments. But it is also still easy to get started with for small tasks. “You should be able to easily create a form today and scale tomorrow”.

Birmingham at night


Introduction to PyObjC (Orestis Markou)

2009-07-01

Tags: europython2009, europython

If you fire up “xcode”, you get the option to create a python cocoa project. This sets up a basic structure for your Mac python application. Inside xcode, you can draw the user interface in the “Interface Builder” with input fields, labels, buttons and so on.

OSX uses a model/view/controller (MVC) architecture for the applications, so you need to write a controller now to react to events in your fresh user interface. The controller is a python file. You normally have placeholders for the events and actions and link them with xcode (drag/drop) to the user interface, though you also can do it with python.

In case you wonder about all the prefixed names (NSSomething, NSAnother, IBSomething): objective-C doesn’t have namespaces like python, so everything needs a prefix (NS for NextStep, IB for interface builder, etc.)

When you do this, you program with Cocoa. Cocoa is built with objective-C. Objective-C is open, but it is basically useless without Cocoa. Cocoa is a set of classes to create OSX (and iphone) programs. Strong MVC support. There are lots of libraries (webkit and so on).

PyObjC is a bridge that allows you to interact with Cocoa classes. Three warnings:

  • You must learn Cocoa.
  • You will curse all other UI toolkits.
  • You can’t develop for the iPhone.

Your python NSObject subclasses must adhere to Cocoa conventions as they’re just proxied objects. This means that they’re not really pythonic. You simply have to follow the cocoa conventions. The good news is that after a while you get used to it. A handy feature (as objective-C uses getter and setter methods that don’t look that good in python) is the key/value support, which translates pretty well to python properties (so something that calls a getter/setter behind the scenes).
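To illustrate only the “property calls a getter/setter behind the scenes” part, here is a sketch in plain Python. It does not import PyObjC or touch Cocoa: Person is a made-up stand-in whose methods merely follow the Cocoa naming style (an objective-C setName: method shows up as setName_ in python), and PersonView is a hypothetical wrapper.

```python
class Person:
    """Plain Python stand-in for a Cocoa-style object (not a real NSObject)."""

    def __init__(self):
        self._name = ""

    def name(self):            # Cocoa-style getter
        return self._name

    def setName_(self, value):  # Cocoa-style setter, python spelling
        self._name = value


class PersonView:
    """Hypothetical wrapper that offers a pythonic property instead."""

    def __init__(self, person):
        self._person = person

    @property
    def name(self):
        return self._person.name()

    @name.setter
    def name(self, value):
        self._person.setName_(value)


person = Person()
view = PersonView(person)
view.name = "Orestis"  # calls setName_ behind the scenes
```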

Conclusion: it is not as easy as you’d ideally want. But it is straightforward once you get used to it. And, to repeat, you need to learn Cocoa and objective-C anyway. It is possible to write a nicer wrapper, such as Ruby has apparently done. But it takes a lot of work. And there’s something to be said for the current straightforward system.

Two handy links: http://www.cocoabuilder.com and http://www.stanford.edu/class/cs193e

Comments on the iphone

I exchanged some emails with Mikko who said “actually you can develop for iPhone with PyObjC”.

I replied “The presenter said some people seemed to get it working after jailbreaking their phones and installing python manually on it.” and asked for more information from Mikko. Here it is:

Here is some old discussion:

http://www.telesphoreo.org/pipermail/iphone-python/2008-November/000244.html (check also December)

  • Jailbreaking definitely works
  • PyObjC without ctypes should work even without jailbreaking
  • It is possible to write Python apps for Appstore - there are no legal issues as long as you cripple Python a bit or sign your binaries
  • It might not be worth the trouble to use Python on iPhone, since you have nice HTML5 + Javascript frameworks with a direct native bridge like Phonegap: http://phonegap.com/ which you can use to accomplish almost everything
Reflection


The science of computing and the engineering of software (Sir Tony Hoare)

2009-07-01

Tags: europython2009, europython

In his younger days he worked on an Algol 60 compiler. It was much better than the legacy language of the time, Fortran, of course. Fortran was 4 years old at that time :-)

Science of computing or engineering of software: does either of those two exist? He hopes to show that both exist and that both exist because of the other.

There’s a scale from science to engineering. At one end the practicing engineer and at the other the pure scientist. With lots in between like an engineering scientist. A practicing engineer produces working software, but uses scientific thinking behind the scenes.

Some differences between the two, scientists and engineers:

  • Science is more long-term. It is interested in eternity. You hope that what you discover is valid forever. Commercial engineering products are often short-term. If you have an engineering product, it will at some time get outdated.
  • Scientists seek perfection and are idealists. There is no need for a practical application. Lasers were discovered long before they were used in consumer products. An engineer’s task is to not be idealistic. You need to be realistic as you have to compromise between conflicting interests.
  • A scientist is after certainty. The more proof, the more certainty. An engineer has a lot of uncertainties and has to learn to live with them. So an engineer is after confidence. Risk management. Confidence in the solution. As long as the customer can be given the same confidence.
  • Perfection for the scientist and adequacy for the engineer. Good enough is good enough.
  • A scientist wants to generalise. An engineer has to adapt to a particular marketplace or situation, and so is after particularity.
  • A scientist wants separation of concerns. Specialise. An engineer has to integrate several partial solutions in one whole solution.
  • Science: unification of a theory is important. An engineer is after diversity: have knowledge of many theories and solutions and combine them.
  • Science is mostly after originality. Plagiarism is The High Crime. Engineering is after best practice. Originality means risk. The worst thing that can happen is if a project fails because of undue risk-taking, possibly even leading to imprisonment.
  • Formality is science’s terrain. Detailed mathematical proof if possible. Nowadays, there’s a lot of computing going on behind the scenes. And who writes that software? Research results often end up in software rather than in formal articles. An engineer mostly relies on research results embedded in software. In places where the software cannot reach, the engineer relies on a finely-honed intuition. You won’t see the word intuition in a scientific article.
  • Scientists want the aforementioned programs to be correct. An engineer is after dependability in the software. Suitability to its purpose, for instance.

Sir Tony assumes most of us present at Europython to be software engineers. So he doesn’t have to prove that a software engineer exists.

But what about the science of computing? Well, the kinds of questions a scientist asks are “what does a program do?”, “how does it work?”, “why does it work?”, “how do we know?”. The same questions an animal researcher asks for instance.

  • What does it do? That’s described by a specification. But then we need to have a framework for making such specifications. Here there’s a clash with engineers. Most software doesn’t have the kind of specifications that would satisfy a scientist. On the other hand, it could be considered the task of a scientist to come up with the specifications. Take aircraft, for instance, as a reason for putting the task in the scientist’s hands: the first aircraft were built without specifications. They tried things. Some worked, some did not. And it took quite a while for science to come up with a framework for specifying the properties of an airplane so that calculations could be done on specifications and proof of airworthiness could be delivered.

  • How does it work? A start is to split a program into modules with defined interfaces. Then you can start looking at the modules.

  • Why does it work? You can answer it by the theory of program semantics. You have the programming language as a start. Try to find out if the program meets the specification.

  • How do we know? As in all other fields: by calculation and proof. And here we’ll have to use computers again. Code analysis tools. There is research interest in proving whether programs actually work. Moore’s law helps us here in lowering the cost of the research tools that look at this. You already have program analysis tools that warn of common errors. Tony has seen the successes of such tools in practice.
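
A small illustration of the “what does it do?” and “how do we know?” questions (my own sketch, not something shown in the talk): a specification can be written down as a precondition and a postcondition, and you can then check whether a program meets it. All names below are made up for the example, and checking a range of inputs is of course much weaker than the proof a scientist would want.

```python
def precondition(n):
    # Specification, part 1: the input must be a non-negative integer.
    return isinstance(n, int) and n >= 0


def postcondition(n, r):
    # Specification, part 2: r is the largest integer whose square is <= n.
    return r * r <= n < (r + 1) * (r + 1)


def integer_sqrt(n):
    # The program under study: integer square root by simple search.
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def meets_specification(program, inputs):
    # Check the postcondition for every input that satisfies the precondition.
    return all(postcondition(n, program(n)) for n in inputs if precondition(n))


print(meets_specification(integer_sqrt, range(200)))
```

The specification says nothing about how the result is computed, so the same two functions can be used to check a completely different implementation.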

One day...

  • Software will be the most reliable component of every product that contains it.
  • Software engineering will be the most dependable of all engineering professions. Software doesn’t rust. It cannot be eaten. It doesn’t decay. If software becomes unserviceable it is because of errors/bugs in the original product and not because of the passing of time.

All this goodness happens because of the successful application of research into a) the science of programming and b) the engineering of software.

Question: is there research that can help marketing come up with specifications so that we engineers can build the software? Answer: that’s the engineer’s job. Marketing doesn’t understand programming. Neither does marketing understand the customers. (Loud laughter at this point).

The most dangerous place for me: the book table


Semantic applications with CubicWeb (Nicolas Chauvat)

2009-07-01

Tags: europython2009, europython, rdf

CubicWeb: the semantic web is a construction game.

The value of a network like the current internet grows exponentially with the amount of information that is available and linked. The semantic web extends that linking to data instead of just documents.

The semantic web is a world-wide database with URLs as the keys into the “database”. Specific semantic web formats include RDF and OWL. And ontologies describe what you’re describing. There’s even a specific query language: SPARQL.
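
To make the “database with URLs as keys” idea a bit more concrete, here is a minimal sketch in plain Python (my own illustration, not from the talk and not using any RDF library). Facts are stored as subject/predicate/object triples. The example.org URLs and the facts are invented; the two FOAF property URLs are the usual ones for “name” and “knows”. The `match` helper is a hypothetical stand-in for what a SPARQL pattern does.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
NAME = FOAF + "name"
KNOWS = FOAF + "knows"

# Invented example resources: the URLs are the keys into the "database".
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

# Each fact is a (subject, predicate, object) triple.
TRIPLES = [
    (ALICE, NAME, "Alice"),
    (BOB, NAME, "Bob"),
    (ALICE, KNOWS, BOB),
]


def match(triples, subject=None, predicate=None, obj=None):
    # Return the triples that fit the pattern; None acts as a wildcard,
    # much like a variable in a query.
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def names_of_acquaintances(triples, person):
    # Follow "knows" links from one URL to the next, then look up the names.
    names = []
    for _, _, friend in match(triples, subject=person, predicate=KNOWS):
        for _, _, name in match(triples, subject=friend, predicate=NAME):
            names.append(name)
    return names


print(names_of_acquaintances(TRIPLES, ALICE))
```

Because the object of one triple can be the subject of another, data from different sites can link to each other in the same way documents do.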

Our current frameworks don’t really support all this, so we need adapted frameworks. For instance what Logilab did with CubicWeb. CubicWeb has a couple of core concepts. An entity-relationship model of the data. Views to present results of queries (RQL, soon SPARQL). The visible web app uses HTML and JSON views; there are also semantic views that output RDF and OWL.

CubicWeb’s back-end aggregates various sources such as SQL, RQL, LDAP and comma-separated files, and is queryable with RQL. So if you don’t like the second layer (the web engine), you can always reuse the back-end layer, as that only communicates through plain queries. The aggregation allows you to store, for instance, metadata (“last modification time” and so on) in an SQL database; CubicWeb will aggregate it with the other sources. In his demo at the end he showed TurboGears as a front-end, as proof that you can use the back-end separately.

For the front-end, there is support for partially generated user interfaces. And there is a library of reusable components (called “cubes”).

They’re keen on agile methodologies at Logilab, so CubicWeb supports that. For instance by generating a basic CRUD (create, read, update, delete) user interface based on a data model. That’s all you need to get started. You can then improve views progressively. And there’s support for data migration, as the data of course also evolves. They have been using CubicWeb internally at Logilab for 5 years now and have always managed to migrate the older content.

The aforementioned “cubes” are Python classes that define their attributes in a schema style (like Zope’s schema). More frameworks do that now, but 7 years ago it was much less common. Those entities are tied together by relations: a relation is also a class, but with a subject and an object attribute (so source/target for the relation). Semantic web style linking! CubicWeb aggregates all those cubes’ content so that you can query it.
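
A rough sketch of the idea in plain Python, to show what “entities with schema-style attributes, tied together by relations with a subject and an object” could look like. This is explicitly not CubicWeb’s actual API: the class names, the `works_on` relation and the `related` helper are all invented for illustration, and I call the object attribute `obj` to avoid clashing with Python’s built-in `object`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Person:
    # Entity: attributes are declared up front, schema style.
    name: str


@dataclass
class Project:
    # A second entity type with its own declared attributes.
    name: str
    license: str


@dataclass
class Relation:
    # A relation is a class too, with a subject (source) and an object (target).
    name: str
    subject: Any
    obj: Any


def related(relations, name, subject):
    # Query: all objects linked to `subject` by the relation called `name`.
    return [r.obj for r in relations if r.name == name and r.subject == subject]


alice = Person("Alice")
project = Project("CubicWeb", "LGPL")
relations = [Relation("works_on", alice, project)]

print([p.name for p in related(relations, "works_on", alice)])
```

The subject/relation/object shape is the same as a semantic web triple, which is presumably why exporting such a model as RDF is a natural fit.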

Some examples of built-in semantic views: OWL, FOAF (friend of a friend, social networking), DOAP (description of a project) and a couple of common formats such as RDFa, iCal and vCard.

Since last year, version 3.0 has been available as an LGPL-licensed download.

Nicolas Chauvat
