XML Design for Diverse Data

Issue: What is the best way to design an XML instance document that is to contain a collection of diverse data?

Example

I want to create an XML instance document containing information about a camera I recently purchased.

The camera is a hybrid: it has a Nikon body, an Olympus lens, and a Pentax manual adaptor. Nikon provides this information about the body: weight and description. Olympus provides this information about the lens: zoom and f-stop. And Pentax provides this information about the manual adaptor: speed.

Thus my instance document will contain a diverse collection of information: basic camera information (date of purchase and warranty), the Nikon body information, the Olympus lens information, and the Pentax manual adaptor information.

What's the best way to design this instance document? The following sections explore six designs, and at the end I make a recommendation.

Design #1: Use XML Schema Imports

Create an XML Schema for the basic camera information, and have it import the Nikon, Olympus, and Pentax XML Schemas.

Camera.xsd imports Nikon.xsd, Olympus.xsd, and Pentax.xsd
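Below is a minimal sketch of what Camera.xsd might look like under this design. The namespace URIs, the element names (purchase-date, warranty, body, lens, manual-adaptor), and the assumption that each vendor schema declares a global element are all hypothetical. The point is the xs:import statements, which bind the camera schema to the three vendor schemas at design time.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Camera.xsd (sketch): namespace URIs and element names are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.camera.org"
           xmlns:nikon="http://www.nikon.com"
           xmlns:olympus="http://www.olympus.com"
           xmlns:pentax="http://www.pentax.com"
           elementFormDefault="qualified">

  <!-- Pull in each vendor vocabulary; each lives in its own namespace -->
  <xs:import namespace="http://www.nikon.com" schemaLocation="Nikon.xsd"/>
  <xs:import namespace="http://www.olympus.com" schemaLocation="Olympus.xsd"/>
  <xs:import namespace="http://www.pentax.com" schemaLocation="Pentax.xsd"/>

  <xs:element name="camera">
    <xs:complexType>
      <xs:sequence>
        <!-- Basic camera information, owned by this schema -->
        <xs:element name="purchase-date" type="xs:date"/>
        <xs:element name="warranty" type="xs:string"/>
        <!-- Vendor parts, assumed to be global elements in the imported schemas -->
        <xs:element ref="nikon:body"/>
        <xs:element ref="olympus:lens"/>
        <xs:element ref="pentax:manual-adaptor"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because every vendor element is referenced by name, a change to Nikon.xsd, Olympus.xsd, or Pentax.xsd can ripple into Camera.xsd. This is tight coupling.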

The instance document conforms to the camera schema together with the schemas it imports.

MyCamera.xml conforms to Camera.xsd

Here are the Design #1 files.

Design #2: Use the XML Schema <any/> Element

Create an XML Schema for the basic camera information. Don't import the other schemas. Instead, make the camera schema extensible using the <any/> element.

Camera.xsd, Nikon.xsd, Olympus.xsd, and Pentax.xsd are independent
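Here is a sketch of the extensible camera schema, again with a hypothetical namespace URI and hypothetical element names. The xs:any wildcard accepts any number of elements from namespaces other than the camera namespace. The setting processContents="lax" asks the validator to check those elements only if it can find declarations for them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Camera.xsd (sketch): namespace URI and element names are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.camera.org"
           elementFormDefault="qualified">

  <!-- No xs:import: this schema knows nothing about Nikon, Olympus, or Pentax -->
  <xs:element name="camera">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="purchase-date" type="xs:date"/>
        <xs:element name="warranty" type="xs:string"/>
        <!-- Open area: zero or more elements from any other namespace -->
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Restricting the wildcard to ##other keeps it unambiguous with respect to the camera's own elements, which are in the target namespace.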

The instance document fills in the open area created by the <any/> element with the Nikon, Olympus, and Pentax information.

MyCamera.xml conforms to Camera.xsd
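A sketch of a matching MyCamera.xml follows. The element names, namespace URIs, and values are placeholders. The xsi:schemaLocation hints are one common way to let a lax wildcard find the vendor declarations. Without them, a validator may simply skip the vendor elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- MyCamera.xml (sketch): names, namespace URIs, and values are placeholders -->
<camera xmlns="http://www.camera.org"
        xmlns:nikon="http://www.nikon.com"
        xmlns:olympus="http://www.olympus.com"
        xmlns:pentax="http://www.pentax.com"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.camera.org Camera.xsd
                            http://www.nikon.com Nikon.xsd
                            http://www.olympus.com Olympus.xsd
                            http://www.pentax.com Pentax.xsd">
  <!-- Basic camera information, declared in Camera.xsd -->
  <purchase-date>2008-01-15</purchase-date>
  <warranty>1 year</warranty>
  <!-- Vendor information filling the <any/> open area -->
  <nikon:body>
    <nikon:weight>900 g</nikon:weight>
    <nikon:description>Digital SLR body</nikon:description>
  </nikon:body>
  <olympus:lens>
    <olympus:zoom>3x</olympus:zoom>
    <olympus:f-stop>2.8</olympus:f-stop>
  </olympus:lens>
  <pentax:manual-adaptor>
    <pentax:speed>1/500</pentax:speed>
  </pentax:manual-adaptor>
</camera>
```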

Here are the Design #2 files.

Design #3: Use NVDL to Validate the Compound Document

Create an XML Schema for the basic camera information. Don't import the other schemas, and don't make the camera schema extensible. In the instance document, assemble all the desired information: the basic camera information, the Nikon information, the Olympus information, and the Pentax information. Use NVDL to map each part of this compound document to the appropriate schema and to the XML Schema validator.

Camera.nvdl maps each part of the compound document to the appropriate schema
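Here is a sketch of Camera.nvdl, using the same hypothetical namespace URIs. Each namespace rule sends the elements in that namespace to one schema. NVDL validates each namespace section separately, so Camera.xsd only sees the camera-namespace elements. The nested Nikon, Olympus, and Pentax sections are detached and validated on their own. This is why the camera schema needs neither imports nor an <any/> wildcard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Camera.nvdl (sketch): namespace URIs are hypothetical -->
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- Basic camera information -->
  <namespace ns="http://www.camera.org">
    <validate schema="Camera.xsd"/>
  </namespace>
  <!-- Each vendor section is cut out and checked against its own schema -->
  <namespace ns="http://www.nikon.com">
    <validate schema="Nikon.xsd"/>
  </namespace>
  <namespace ns="http://www.olympus.com">
    <validate schema="Olympus.xsd"/>
  </namespace>
  <namespace ns="http://www.pentax.com">
    <validate schema="Pentax.xsd"/>
  </namespace>
</rules>
```

An NVDL processor then takes this script in place of a single schema when validating MyCamera.xml.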

Here are the Design #3 files.

Design #4: Use Different Types of Schemas (XML Schema, Relax NG)

In the prior designs, all the XML grammars were expressed in XML Schema. In this design, the Olympus schema is expressed in Relax NG. Otherwise, proceed as in design #3: in the instance document, assemble all the information (the basic camera information, the Nikon information, the Olympus information, and the Pentax information). NVDL maps each part of the compound document to the appropriate schema and to the appropriate validator (an XML Schema validator or a Relax NG validator).

MyCamera.nvdl maps each part of the compound document to the appropriate schema and the appropriate validator
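Below is a sketch of Olympus.rng in Relax NG's XML syntax; the element names and namespace URI are hypothetical. In the NVDL script, the only change would be that the Olympus rule points at the new schema, for example <validate schema="Olympus.rng"/>. The processor then hands that section to a Relax NG validator instead of an XML Schema validator.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Olympus.rng (sketch): element names and namespace URI are hypothetical -->
<element name="lens" ns="http://www.olympus.com"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- The ns and datatypeLibrary attributes are inherited by the patterns below -->
  <element name="zoom">
    <data type="string"/>
  </element>
  <element name="f-stop">
    <data type="decimal"/>
  </element>
</element>
```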

Here are the Design #4 files.

Design #5: Use Grammar-based and Rule-based Validation

Olympus has a Relax NG schema to express grammar constraints, and a Schematron schema to express a co-constraint between the lens size and the f-stop. The other schemas express only grammar constraints, using XML Schema. Use NVDL to map each part to the XML Schemas, the Relax NG schema, and the Schematron schema.

MyCamera.nvdl maps each part of the compound document to the appropriate schema and the appropriate validators
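Here is a sketch of Olympus.sch in ISO Schematron. The co-constraint is made up for illustration. It assumes a hypothetical olympus:size element (a focal length in millimetres) and requires an f-stop of at least 4 whenever the size exceeds 300. A grammar can require that size and f-stop both appear, but not how their values relate; that is the job of the rule-based schema. On the NVDL side, a rule may carry more than one action. The Olympus rule can therefore list two validate elements, one for Olympus.rng and one for Olympus.sch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Olympus.sch (sketch): element names and numeric thresholds are hypothetical -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <ns prefix="olympus" uri="http://www.olympus.com"/>
  <pattern>
    <!-- Evaluated once for each lens element -->
    <rule context="olympus:lens">
      <!-- Co-constraint: if size > 300 then f-stop >= 4 -->
      <assert test="not(number(olympus:size) &gt; 300) or number(olympus:f-stop) &gt;= 4">
        A lens larger than 300 mm must have an f-stop of at least 4.
      </assert>
    </rule>
  </pattern>
</schema>
```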

Here are the Design #5 files.

Design #6: Use Linking and Assembly of Parts

Modify the camera schema to incorporate optional hyperlinks. In the instance document, the Nikon, Olympus, and Pentax information can be either linked to from, or embedded within, the instance document. NVDL maps each part to the appropriate schemas.

The instance document has a hyperlink; in addition, it has parts that are matched up via NVDL
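Below is a sketch of how Camera.xsd might incorporate optional hyperlinks. It assumes a hypothetical part element whose href attribute points to a separately stored Nikon, Olympus, or Pentax document. NVDL detaches embedded vendor sections before the camera section is validated, so the schema only needs to describe the basic camera information and the links. How a processor follows and assembles the linked parts is outside this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Camera.xsd (sketch): the part element and its href attribute are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.camera.org"
           elementFormDefault="qualified">
  <xs:element name="camera">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="purchase-date" type="xs:date"/>
        <xs:element name="warranty" type="xs:string"/>
        <!-- Optional links to parts kept in separate documents -->
        <xs:element name="part" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="href" type="xs:anyURI" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A camera document could then link to the Olympus lens information with <part href="OlympusLens.xml"/> while embedding the Nikon and Pentax sections directly.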

Here are the Design #6 files.

Recommendation

Each successive design is more flexible and robust than the one before, culminating in design #6.

Design #6 has these features:

- Loose coupling
- Unlimited assembly of parts
- An unconstrained set of schema languages
- Independent schemas that are easier to develop and maintain

This is a powerful design. I recommend design #6.

Last updated: April 22, 2008, by Roger L. Costello.