"The world is like a book that can be read in two different ways. We read in it the power of creation, to create something out of nothing. And we read in it the power to destroy, reducing something to nothing" - Rabbi Pinchas of Koritz
The goal of this chapter is to implement the vocabulary of the previous chapter. The implementation is meant as a proof-of-concept.
First, the implementation approach is explained. After that, the resulting code and files are presented in groups, starting with the generic utilities. Then the vocabulary and views are created and filled. The filled vocabulary and views are visualised and, finally, used for a search.
The goal of this section is to present the programming and the tools used.
In order to keep everything simple, I have chosen to follow the so-called UNIX style of programming. This means that I build many small programs (in this case, many XSL/T stylesheets) that each do only one thing, and do it well. These small programs are then linked together in various ways to provide the needed functionality. This is in contrast to the widely used style of programming that builds one big do-it-all program.
I used four programming languages/tools for the prototype. They were chosen for their ease of use and their technological appeal.
XSL/T is the most used and useful technology surrounding XML. This stylesheet language is used to transform one XML format into another. By transforming an XML file several times with consecutive stylesheets, quick results can be achieved. The extensive use that is made of this technology in the XML world makes it indispensable for this prototype. An explanation of XSL/T can be found in the section called eXtensible Stylesheet Language/Transformation part (XSL/T) in Appendix A.
There are many XSL/T stylesheet processors. I chose the Xalan processor from Apache's XML project because it is one of the most actively developed processors, keeping up with even the latest standards and achieving nearly 100% standards compliance. This turned out to be a good choice, because I did not find a single sign of strange, non-standard behaviour. Xalan is implemented in Java and therefore has to be called either from a Java program or from the command line (I chose the latter).
Python belongs to the so-called family of scripting languages. It has an easy and clear syntax, it allows for quick programming (typical Python programs are 10% of the size of comparable C++ or Java programs) and it has a large number of modules that provide extra functionality. It is perfectly suited to the task of gluing together bits and pieces of functionality into one program. Because it is an interpreted language, it makes it easy to try out many things, which proved very useful.
There are four areas in which I mainly used Python:
Processing files with XSL/T stylesheets by calling Java on the command-line and passing all the right arguments.
Guiding files through a series of XSL/T stylesheets using temporary files.
Reading and interpreting command-line parameters.
Serving as a CGI program, called from a web browser.
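The first two areas can be sketched as follows. This is a minimal illustration, not the prototype's actual code: the function names xalan_command and run_pipeline are hypothetical, and it assumes that Java and the Xalan jars are available on the CLASSPATH. org.apache.xalan.xslt.Process is the command-line entry point of Xalan-Java. The run argument exists only so that the command runner can be swapped out, for example to inspect the commands without starting Java.

```python
import os
import subprocess
import tempfile


def xalan_command(xml_in, stylesheet, xml_out):
    """Build the Java command line that applies one stylesheet with Xalan."""
    # Assumption: the Xalan jars are on the CLASSPATH of the calling shell.
    return ["java", "org.apache.xalan.xslt.Process",
            "-IN", xml_in, "-XSL", stylesheet, "-OUT", xml_out]


def run_pipeline(xml_in, stylesheets, xml_out, run=subprocess.check_call):
    """Guide xml_in through the stylesheets in order, using temporary files."""
    temp_files = []
    current = xml_in
    try:
        for i, stylesheet in enumerate(stylesheets):
            if i == len(stylesheets) - 1:
                # The last stylesheet writes the final result.
                target = xml_out
            else:
                # Every earlier stylesheet writes to an intermediate file.
                fd, target = tempfile.mkstemp(suffix=".xml")
                os.close(fd)
                temp_files.append(target)
            run(xalan_command(current, stylesheet, target))
            current = target
    finally:
        # Remove the intermediate files, even if a step failed.
        for path in temp_files:
            os.remove(path)
```

Each step starts a separate Java process, so the output of one stylesheet reaches the next one only through a file on disk.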
Sed is a small utility that processes text files according to command-line parameters. In this it serves a purpose similar to that of XSL/T, only on a much smaller scale. I used it for some tasks that were hard to do in XSL/T, like generating a DTD out of an XML file. A DTD is not XML compliant and contains a number of characters which XSL/T does not like. These characters therefore had to be left out and were later added to the file by means of a simple Sed command. I tried to use the right tool for each job, rather than treating every problem as a nail just because I had XSL/T's hammer in my hand.
For displaying some XML files, I used CSS. This allowed me to view an XML file in a browser without having to generate lots of HTML to create the desired look and feel, since CSS can be used to specify how a particular XML tag should be visualised. For an XML and CSS compliant browser, I used the newest release of Mozilla (the former Netscape browser), because it supported all the new standards well.
The goal of the prototype was to serve as a proof-of-concept. It was not intended to be the best-looking, fastest program possible. Therefore I made only a few browser interfaces. For most of the functionality it does not even make much sense to provide an interface, because it is of the server-side kind. The result is a number of command-line tools.
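The idea of adding the left-out characters afterwards can be sketched as below. The sketch is written in Python rather than Sed, and it rests on an assumption that the source does not confirm: that the stylesheet writes each declaration on its own line without the "<!" and ">" delimiters. The name add_dtd_delimiters is hypothetical.

```python
import re

# Assumed convention (hypothetical): the stylesheet output contains bare
# declaration lines such as "ELEMENT book (title, author)".
DECLARATION = re.compile(r"^(ELEMENT|ATTLIST)\b(.*)$", re.MULTILINE)


def add_dtd_delimiters(text):
    """Wrap every bare declaration line in the "<!" ... ">" markup of a DTD."""
    def wrap(match):
        return "<!" + match.group(1) + match.group(2).rstrip() + ">"
    return DECLARATION.sub(wrap, text)
```

For instance, the line "ELEMENT title (#PCDATA)" becomes "<!ELEMENT title (#PCDATA)>", while lines that do not start with one of the two keywords are left untouched.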
The speed of the programs also needs to be mentioned. Processing the XSL/T stylesheets takes up most of the time and is extremely slow. Every stylesheet takes about a second to process, which adds up to 5-10 seconds per program (each program runs several stylesheets). This is because the Python program executes every stylesheet by starting a shell with a Java command line. The results are therefore not passed directly from one XSL/T process to the next (which is possible), but are stored in intermediate files. So for every stylesheet, Java has to start, the file has to be read, the XSL/T processor has to run and the result has to be written back to a file. This is about the slowest way to implement it, but that does not matter much for a prototype. Implementing everything in Java would have made it faster, but it would also have meant much more programming effort and dealing with a big Java disadvantage: there is no direct way to pass parameters to a Java program when it is called as a CGI program from a web browser. For my prototype, passing such parameters was absolutely necessary. But the much larger amount of programming work that Java would have required was reason enough by itself not to choose it.
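To illustrate the parameter passing mentioned above: for a GET request, a web server hands the part of the URL after the "?" to a CGI program in the QUERY_STRING environment variable. A minimal Python sketch of reading it follows. The function name cgi_parameters and the parameter names in the usage note are hypothetical, and the sketch keeps only the first value of a repeated parameter.

```python
import os
from urllib.parse import parse_qs


def cgi_parameters(environ=os.environ):
    """Return the GET parameters of a CGI request as a dict of single values."""
    # The web server stores the query part of the URL in QUERY_STRING.
    query = environ.get("QUERY_STRING", "")
    # parse_qs returns a list of values per name; keep the first value only.
    return {name: values[0] for name, values in parse_qs(query).items()}
```

Called for a request such as search.py?view=authors&term=xml (a made-up URL), the function would return the two parameters view and term, which the program could then hand on to a stylesheet.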
In technical terms, you map an incoming XML tree into an outgoing XML tree according to rules specified in the XSL/T stylesheet. The outgoing format is allowed to be something different from XML, though, in case that should prove useful.
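The tree-to-tree mapping can be illustrated with a small Python analogy. This is not XSL/T and not how Xalan works internally; it only shows the principle of walking an incoming tree and building an outgoing tree according to rules. The element names and the RULES table are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each incoming tag name maps to an outgoing tag name.
RULES = {"vocabulary": "ul", "term": "li"}


def transform(node):
    """Map an incoming element and its subtree to a new outgoing element."""
    # Tags without a rule are copied unchanged; attributes are always copied.
    out = ET.Element(RULES.get(node.tag, node.tag), dict(node.attrib))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(transform(child))
    return out


source = ET.fromstring(
    "<vocabulary><term>author</term><term>title</term></vocabulary>")
result = ET.tostring(transform(source), encoding="unicode")
# result is "<ul><li>author</li><li>title</li></ul>"
```

A real stylesheet expresses such rules declaratively as templates that match parts of the incoming tree, instead of as a lookup table in program code.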