Appendix A. Deeper introduction on XML and related technologies

Table of Contents
eXtensible Markup Language (XML) itself
eXtensible Stylesheet Language/Transformation part (XSL/T)
Stylesheet languages for visualisation
XML Namespaces
XML Linking
Cocoon - Apache's xml effort

Abstract

This section elaborates on the introduction given in the section called General working of XML in Chapter 4.

In this section, XML (eXtensible Markup Language) and a selection of related technologies are discussed. XML is in effect a standardised way of dealing with data. This makes it possible for a number of additional technologies to surface and blossom together with XML, thereby making XML more powerful and useful. The first section introduces XML as it stands by itself; the remaining sections deal with the standards that supplement and enhance XML.

eXtensible Markup Language (XML) itself

XML itself is only a way of tagging data with information about that data. This section describes this technology, which is central to the rest of the XML framework.

Introduction to XML

XML stands for eXtensible Markup Language. It is a subset of SGML, the Standard Generalized Markup Language. SGML was a promising technology, but it had a reputation for intense complexity stemming from its enormous levels of customisability and flexibility [St. Laurent, 1999], which made it too difficult and unattractive to gain a large following. XML is much simpler and smaller, though it allows for flexibility undreamed of by many developers and users. Its syntax is easy to understand and closely resembles that of another SGML descendant, HTML. For most uses, XML will probably replace SGML completely.

Example A-1. XML syntax example

		
	      <chapter>
	        <title>Coffee</title>
	        <para>
	          The delicious aroma surrounding the coffee machine puts
	          a smile on <emphasis>many</emphasis> faces.
	        </para>
	      </chapter>
		
	      

Structure of XML

XML - as a technology - consists of two parts.

Content

The XML file itself: the file containing the markup and the data, as in the example above.

Semantics

The XML Schema (newer technology) or Document Type Definition (DTD) (older technology). At the beginning of the XML file, a reference is made to a specific DTD. The file used to make this very document points to the DocBook 3.1 DTD, which specifies tags like <chapter> and <emphasis>. The DTD specifies the allowed tags and their hierarchy. A section2 is only allowed within a section1, for example. Also specified are the allowed attributes like colour="blue" strength="B35".
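The idea that a schema fixes which tags may appear inside which other tags can be illustrated with a small Python sketch. This is not DTD validation: the ALLOWED_CHILDREN table is a hypothetical, hand-written stand-in for a DTD, the tag names are taken from the examples in this section, and find_violations is an illustrative helper rather than part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: for each tag, the child tags it may
# contain. A real DTD or XML Schema can express far more than this.
ALLOWED_CHILDREN = {
    "chapter": {"title", "para", "section1"},
    "section1": {"title", "para", "section2"},
    "section2": {"title", "para"},
    "title": set(),
    "para": {"emphasis"},
    "emphasis": set(),
}


def find_violations(element, path=""):
    """Return messages for tags that are unknown or nested in the wrong place."""
    here = f"{path}/{element.tag}"
    if element.tag not in ALLOWED_CHILDREN:
        return [f"{here}: unknown tag"]
    problems = []
    for child in element:
        known = child.tag in ALLOWED_CHILDREN
        if known and child.tag not in ALLOWED_CHILDREN[element.tag]:
            problems.append(f"{here}: <{child.tag}> not allowed here")
        # Recurse so that problems deeper in the tree are reported as well.
        problems.extend(find_violations(child, here))
    return problems


if __name__ == "__main__":
    # A section2 placed directly in a chapter breaks the hierarchy rule.
    document = "<chapter><section2><para>text</para></section2></chapter>"
    for message in find_violations(ET.fromstring(document)):
        print(message)
```

In practice this check is left to a validating parser that reads the DTD or schema itself; the sketch only shows the kind of rule such a parser enforces.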

Additionally, for XML files that are not meant just for data storage or data exchange between computers, information on how to visualise the XML data is needed. For these needs, a third part is necessary:

Visualisation

One or more associated visualisation stylesheets. For every XML schema, there should be one or more stylesheets specifying how to display, print, export or save an XML file associated with that schema. For example, the XML file which contains this document can be combined with an HTML stylesheet, a print stylesheet, etc. Stylesheets are only needed for XML files that have to be viewed, printed or otherwise presented.

Usage of XML

By use of this threefold model (XML file, schema, stylesheet), this technology is a good example of the maxim "divide and conquer". The data can be stored neatly, readably and in an object-like fashion in a simple text format. The format itself (specified in the schema) can be adapted to whatever need there may be, in a well-defined way. Whatever format one chooses, any XML-enabled program can read the information, provided the schema is accessible to that program. To read it means, in this case, that the program can build a tree-like representation of the data, which is possible because of the hierarchical nature of XML. To some applications, this is enough to be able to use the data. This is the case when XML is used as a simple way to store information that is only meant to be read by a specific application which knows what to do with it.

For world wide web-like applications, an associated stylesheet is needed. A program which converts XML documents like this article, which uses the DocBook 3.1 schema, to a printable format will need information which specifies that a <para>...</para> pair indicates a block of text with 1 cm above and below, margins of 3 cm and a Times Roman font. Likewise, with the same data, the same schema and a different stylesheet, another program can easily generate a set of webpages from the same source.
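As a minimal sketch of both points, the Python fragment below reads the coffee example into a tree with the standard xml.etree.ElementTree module and then renders the same tree twice. The two "stylesheets" are hypothetical dictionaries mapping a tag to the text placed before and after its content; they are assumptions made for illustration and bear no relation to the syntax of a real stylesheet language.

```python
import xml.etree.ElementTree as ET

# The example document from this section, written compactly.
SOURCE = (
    "<chapter><title>Coffee</title>"
    "<para>The delicious aroma surrounding the coffee machine puts "
    "a smile on <emphasis>many</emphasis> faces.</para></chapter>"
)


def outline(element, depth=0):
    """List the tag names in the tree, indented by nesting depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical "stylesheets": tag -> (text before content, text after content).
HTML_STYLE = {"chapter": ("<div>", "</div>"), "title": ("<h1>", "</h1>"),
              "para": ("<p>", "</p>"), "emphasis": ("<em>", "</em>")}
TEXT_STYLE = {"chapter": ("", ""), "title": ("== ", " ==\n"),
              "para": ("", "\n"), "emphasis": ("*", "*")}


def render(element, style):
    """Render an element and its subtree; escaping of special characters is omitted."""
    before, after = style[element.tag]
    parts = [before, element.text or ""]
    for child in element:
        parts.append(render(child, style))
        parts.append(child.tail or "")  # text that follows the child element
    parts.append(after)
    return "".join(parts)


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print("\n".join(outline(root)))
    print(render(root, HTML_STYLE))
    print(render(root, TEXT_STYLE))
```

The data and the tree stay the same in both calls to render; only the mapping differs, which is the separation of content and visualisation described above.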

In a similar fashion, a set of XML files with various schemas and stylesheets can be used to represent a building. Some parts, like an elevator, get their own XML file because they are supplied by a subcontractor. The file elevator.xml is referenced in the main XML document. It is now possible to read all files and use a stylesheet (which must be supplied with all the schemas) to generate a nice-looking viewable model of the entire building, including a moving elevator. The stylesheets - in this case - must be able to express the information contained in the XML files in a way suited for VRML (Virtual Reality Modelling Language, a file format for generating three-dimensional images and animations). Current W3C (World Wide Web Consortium) research includes XSL/T (eXtensible Stylesheet Language/Transformation part), which allows transformations from one format into another (see the section called eXtensible Stylesheet Language/Transformation part (XSL/T)).
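How a main document might pull in a separately supplied file can be sketched as follows. Everything except the file name elevator.xml is an assumption: the <external href="..."/> element, the other tag and attribute names, and the in-memory FILES dictionary that stands in for files on disk are all hypothetical. Cyclic references are not handled.

```python
import xml.etree.ElementTree as ET

# In-memory stand-ins for files on disk; contents are hypothetical.
FILES = {
    "building.xml": (
        '<building name="Office">'
        '<floor level="0"/><floor level="1"/>'
        '<external href="elevator.xml"/>'
        "</building>"
    ),
    "elevator.xml": (
        '<elevator supplier="Subcontractor"><cabin capacity="8"/></elevator>'
    ),
}


def load(name):
    """Parse a document and splice in every document it references."""
    root = ET.fromstring(FILES[name])
    resolve(root)
    return root


def resolve(element):
    """Replace each <external href="..."/> child by the tree it points to."""
    for index, child in enumerate(list(element)):
        if child.tag == "external":
            replacement = load(child.get("href"))
            replacement.tail = child.tail
            element[index] = replacement  # one-for-one swap keeps indices stable
        else:
            resolve(child)


if __name__ == "__main__":
    building = load("building.xml")
    print([child.tag for child in building])
```

After loading, the building tree contains the elevator subtree as if it had been written in the main document, so a single stylesheet pass can process the whole model.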

In the same way, a complete list of needed parts can be generated, provided the schema ensures that the XML files contain that information and, again, provided a usable stylesheet is available.
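A parts list of this kind could, for instance, be produced along the following lines. The <part> element with its name and quantity attributes is a hypothetical convention that the schema would have to guarantee; here a short Python function plays the role that a stylesheet would play in the XML framework.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical building description; element and attribute names are assumed.
BUILDING = """<building>
  <floor level="0">
    <part name="door" quantity="4"/>
    <part name="window" quantity="10"/>
  </floor>
  <elevator>
    <part name="door" quantity="2"/>
    <part name="motor" quantity="1"/>
  </elevator>
</building>"""


def parts_list(xml_text):
    """Sum the quantities of all <part> elements, grouped by part name."""
    totals = Counter()
    for part in ET.fromstring(xml_text).iter("part"):
        # A missing quantity attribute is treated as a single item.
        totals[part.get("name")] += int(part.get("quantity", "1"))
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for name, quantity in parts_list(BUILDING).items():
        print(f"{quantity:3d} x {name}")
```

The function only works because every part is tagged in the same way, which is exactly the guarantee the schema is meant to provide.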

XML is, in essence, a standardised way to deal with metadata. The schema is the place to specify the metadata; the actual XML file is where the data is placed, using XML's standard way of tagging the various pieces of information with the metadata specified in the schema.