
Grammar: involving the computer meaningfully

This section analyses two data structuring technologies and a way of interacting with that data. The first technology is XML. The second is the data structuring aspect of RDF, of which the linking mechanism has already been analysed above. At the end, web services are discussed as a means of accessing the information.

Data format: XML

With XML [2], documents are no longer inaccessible to computers. An XML file is a text file containing information on a certain topic (like `I sell wooden doors 2 meters in height'), marked up with tags that explain what the text means. To state that `height' in the above example is a property, you would write it in an XML file like this: <property>height</property>. The mechanism is analogous to HTML, analysed earlier.
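As an illustration of how tagged text becomes accessible to a program, the following minimal sketch reads a marked-up version of the door example with Python's standard XML parser. Apart from `<property>`, the element names (`offer`, `product`, `value`) and the `unit` attribute are assumptions made for this example; they do not come from an existing standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical mark-up of `I sell wooden doors 2 meters in height'.
# Only <property> appears in the text; the other names are assumed.
DOC = """<offer>
  <product>wooden door</product>
  <property>height</property>
  <value unit="m">2</value>
</offer>"""

root = ET.fromstring(DOC)

# The tags tell the program what each piece of text means.
prop = root.findtext("property")
value = root.find("value")
height = float(value.text)
unit = value.get("unit")

summary = f"{root.findtext('product')}: {prop} = {height} {unit}"
print(summary)
```

The same file can be read by a person in a text editor, which is the point made about text-based formats.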

Text is the only truly portable data format, binary formats being inherently more difficult to use (footnote 6.4). XML files, containing all the information you need, can be fed directly into an application and can also be formatted quite easily for the human eye, either as a web page or in print. So XML covers both the human and the computer side of information exchange. XML contains the syntax, and mark-up can be applied to it separately: a separation of mark-up and syntax.

The syntax of XML documents is defined in a schema like XSD [64], DTD [2] or RNG [65]. Such a schema allows applications to know what data to expect.

XML was designed by a workgroup that cleaned up and simplified the existing SGML standard [66]. Hypertext was a good idea in general: the small, simple HTML made it attractive and widely used. Likewise, SGML was a good information structuring idea: the smaller and simpler XML made it acceptable and widely used. Tools are ubiquitous: parsers, writers, editors.

It is helpful to compare working with XML to working with STEP's Express-based data, as STEP also tries to make computer-readable data available. On the technical level, STEP data is more elaborate and powerful than plain XML. On the acceptance and usage level, XML is far ahead of the STEP data standards. The simplicity of XML is probably a big factor, as is the availability of tools. Every computer language nowadays allows you to use XML. When dealing with STEP, the choice of libraries is restricted and free libraries are hard to find, a drawback not encountered with XML. (STEP and XML have different origins and goals, so in that sense the comparison is not completely fair.)

Data format: RDF

As shown in the introduction of the previous section, RDF links information items to other information items with a named relation. The source, target and relation are all identified with a URL (footnote 6.5). Such a three-part combination of source, target and relation is called a `triple', see figure 4.3. With a set of `triples', everything that can be expressed with XML can be expressed with RDF, too. A set of triples is the simplest way to encode information.
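A set of triples can be sketched with plain Python data structures. This is only an illustration of the idea, not an RDF library; the `http://example.org/` URLs and the relation names are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a source linked to a target by a named relation."""
    source: str
    relation: str
    target: str


BASE = "http://example.org/"  # hypothetical namespace, for illustration only

triples = {
    Triple(BASE + "door1", BASE + "hasMaterial", BASE + "wood"),
    Triple(BASE + "door1", BASE + "isConnectedTo", BASE + "frame1"),
}


def targets(statements, source, relation):
    """Return every target linked from `source` by `relation`."""
    return {t.target for t in statements
            if t.source == source and t.relation == relation}


print(targets(triples, BASE + "door1", BASE + "hasMaterial"))
```

Because all three parts are URLs, any statement can refer to items defined elsewhere on the web.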

An advantage of triples is that they are a very flexible method. The flexibility comes from the built-in RDF schema language [67], which allows you to state, for instance, subclass relations. Take as an example a current property called `is connected to'. In the future, more fine-grained semantics are needed, so properties like `is bolted to' and `is screwed together with' are created. Existing software continues to work just fine, provided `is bolted to' and `is screwed together with' are defined as subproperties of `is connected to'.
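The subproperty mechanism can be sketched as follows. This is a simplified illustration under stated assumptions: each property has at most one superproperty (RDF schema allows more), the short names stand in for full URLs, and the helper names `generalisations` and `connected` are invented for this example.

```python
# Assumed subproperty declarations, mirroring the example in the text.
SUBPROPERTY_OF = {
    "isBoltedTo": "isConnectedTo",
    "isScrewedTogetherWith": "isConnectedTo",
}


def generalisations(relation):
    """Yield the relation itself followed by all of its superproperties."""
    seen = set()
    while relation is not None and relation not in seen:
        seen.add(relation)
        yield relation
        relation = SUBPROPERTY_OF.get(relation)


def connected(statements, relation):
    """Return (source, target) pairs related by `relation` or a subproperty."""
    return {(s, t) for (s, r, t) in statements
            if relation in generalisations(r)}


triples = [
    ("beam1", "isBoltedTo", "column1"),
    ("door1", "isScrewedTogetherWith", "frame1"),
    ("pipe1", "isConnectedTo", "pump1"),
    ("door1", "hasMaterial", "wood"),
]

# Older software only knows `isConnectedTo' and still sees all connections.
old_view = connected(triples, "isConnectedTo")
print(sorted(old_view))
```

Software written before the finer-grained properties existed keeps asking for `isConnectedTo` and receives the bolted and screwed connections as well.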

The flexibility comes at the cost of predictability of the physical data format. An RDF file contains a set of triples, and the order in which the triples are specified (or nested) does not matter. Though RDF is most often stored in RDF's XML representation format, the many different ways in which the same information can be stored make the use of standard XML tools impractical. Dedicated RDF tools are needed; they are available for most computer languages, but they are not as widespread as those for XML.
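Why order-sensitive tools have trouble here can be shown with a small sketch, assuming two hypothetical files that list the same triples in a different order. The triples are written as plain tuples with made-up names.

```python
# The same two statements, written down in a different order.
file_a = [
    ("door1", "hasMaterial", "wood"),
    ("door1", "isConnectedTo", "frame1"),
]
file_b = [
    ("door1", "isConnectedTo", "frame1"),
    ("door1", "hasMaterial", "wood"),
]

# A text-level comparison sees two different documents ...
same_text = repr(file_a) == repr(file_b)

# ... while a comparison as sets of triples sees identical information.
same_information = set(file_a) == set(file_b)

print(same_text, same_information)
```

A tool that works on the physical layout has to treat the two files as different, whereas an RDF-aware tool compares the sets of triples.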

RDF provides a lot of flexibility, but at the price of losing the use of some standard XML tools. The alternatives will have to be weighed for individual cases. A solution that is gaining some prominence in RDF/XML practice is to provide an RDF mapping [68] for XML-based formats [69], trying to get the best of both worlds. An advantage observed in those efforts is that the more formally and mathematically defined semantics of base RDF force you to think through the model more rigorously.

Accessing the information: web services

Web services are Internet-accessible applications that can be used directly by other applications; web services use the Internet and XML [58]. Web services enable the bridging of data islands [70], allowing applications an active role on the Internet. For an introduction, see [71].

As argued at the beginning of this chapter, `plain' HTTP access is advocated instead of SOAP. With HTTP, `webifying' is a useful term to indicate the work needed. Webifying data means that every piece of useful data should have a URL. There is no reason to restrict the number of URLs [72]. The success of the World Wide Web is entirely based on assigning a URL to every web page and image and enabling links between them. Every object on the Internet is accessible directly, be it http://www.cnn.com or a page describing the lodging facilities for 2002's ECPPM conference.

Webifying a door catalogue, for example, doesn't mean creating a human-readable page containing pictures and some text listing the available types. Instead it means having your catalogue available at http://company.co.uk/doorcatalog and the parts describing the individual doors at URLs such as http://company.co.uk/doorcatalog/door1. This makes it possible to link to a specific door in your catalogue from a project that wants to use that door.
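The URL layout of the door catalogue can be sketched as a small lookup function. The catalogue contents, the `door2` entry and the function name `resolve` are assumptions for illustration; only the two URLs come from the text.

```python
from urllib.parse import urlparse

# Hypothetical catalogue data; in practice this would come from a database.
CATALOGUE = {
    "door1": {"material": "wood", "height_m": 2.0},
    "door2": {"material": "steel", "height_m": 2.1},
}
PREFIX = "/doorcatalog"


def resolve(url):
    """Map a URL to the whole catalogue, to one door, or to None."""
    path = urlparse(url).path.rstrip("/")
    if path == PREFIX:
        # The catalogue URL lists the identifiers of all doors.
        return sorted(CATALOGUE)
    if path.startswith(PREFIX + "/"):
        # Everything after the prefix identifies one individual door.
        return CATALOGUE.get(path[len(PREFIX) + 1:])
    return None


print(resolve("http://company.co.uk/doorcatalog"))
print(resolve("http://company.co.uk/doorcatalog/door1"))
```

Each door has its own address, so a project description elsewhere can link to exactly that door instead of to the catalogue as a whole.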

With XML and RDF, we now have the means to access and to provide data using HTTP, instead of just text-oriented HTML pages. For applications that want to use web services to enable their integration in what is sometimes called the `Internet operating system' [73], what is needed is to choose XML/RDF models to support and to design URLs for their information.

Choosing XML/RDF models
Both for information being received by the application and for information provided by the application, a format must be chosen or designed. For this, either XML or RDF should be used, being the preferred standard data formats. For some applications, ifcXML, for instance, might be a good choice; for others, a self-made format specification. Such a self-made format should be well documented and publicly available to enable interoperability.
Designing URLs
This has been covered in this chapter's first section. Take care to make every useful item in the data available through a URL. As a minimum, other applications are then able to download that item's information. At a more advanced level, this opens the way to deleting, appending or changing the item from another application.
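How URLs open the way to downloading, changing, appending and deleting items can be sketched with the standard HTTP methods. This is an in-memory illustration under assumptions, not a real server: the mapping of GET to download, PUT to change or append, and DELETE to delete is one common convention, and the `store` contents and the `handle` function are hypothetical.

```python
# Hypothetical in-memory data, keyed by URL path.
store = {"/doorcatalog/door1": {"material": "wood"}}


def handle(method, path, body=None):
    """Return a (status code, payload) pair for one request."""
    if method == "GET":
        # Download the item's information.
        return (200, store[path]) if path in store else (404, None)
    if method == "PUT":
        # Change an existing item, or append a new one at a new URL.
        created = path not in store
        store[path] = body
        return (201 if created else 200, body)
    if method == "DELETE":
        if path in store:
            del store[path]
            return (204, None)
        return (404, None)
    # Any other method is not supported in this sketch.
    return (405, None)


print(handle("GET", "/doorcatalog/door1"))
print(handle("PUT", "/doorcatalog/door2", {"material": "steel"}))
print(handle("DELETE", "/doorcatalog/door1"))
print(handle("GET", "/doorcatalog/door1"))
```

The URL identifies the item and the HTTP method states what should happen to it, so no additional protocol layer is needed for these basic operations.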
Reinout van Rees 2006-12-13