Extending XML Schemas

(A Collectively Developed Set of Schema Design Guidelines)



Issue

What is the best practice for checking instance documents against constraints that are not expressible with XML Schemas?

Tutorial

This document contains an overview of the topic of extending XML Schemas. For a much more in-depth, hands-on tutorial on extending XML Schemas, please see the Best Practices Homepage. The tutorial contains fully worked examples, labs, and a (PowerPoint) tutorial.

Introduction

XML Schemas is very powerful. However, it is not "all powerful". There are many constraints which cannot be expressed with XML Schemas.

Example. Consider this simple instance document:

<?xml version="1.0"?>
<Demo xmlns="http://www.demo.org" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.demo.org demo.xsd">
    <A>10</A>
    <B>20</B>
</Demo>
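
With XML Schemas we can check the following constraints:

    - the Demo element contains a sequence of elements, A followed by B
    - the A element contains an integer
    - the B element contains an integer

In fact, here's an XML Schema which expresses these constraints:
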
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.demo.org"
            xmlns="http://www.demo.org"
            elementFormDefault="qualified">
    <xsd:element name="Demo">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="A" type="xsd:integer"/>
                <xsd:element name="B" type="xsd:integer"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

XML Schemas does not give us the capability to express the following constraint:

    - the value of the A element must be greater than the value of the B element

So what do we do to check this constraint? (Interestingly, the XML Schema shown above accepts the instance document as valid, even though the document violates this constraint: the value of A is less than the value of B. We need something else to check this constraint.) There are three options.

Three Options for Extending XML Schemas

(1) Supplement with Another Schema Language

There are many other schema languages besides XML Schemas:
    - Schematron
    - TREX
    - RELAX
    - SOX
    - XDR
    - HOOK
    - DSD
    - Assertion Grammars
    - xlinkit
Thus, the first option is to use one (or more) of these schema languages to express the additional constraints.

For example, using Schematron you can embed the additional constraints within the XSD document (within <appinfo> elements). The XSD document shown earlier has been enhanced (below) with Schematron directives:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.demo.org"
        xmlns="http://www.demo.org" 
        xmlns:sch="http://www.ascc.net/xml/schematron"
        elementFormDefault="qualified">
    <xsd:annotation>
        <xsd:appinfo>
            <sch:title>Schematron validation</sch:title>
            <sch:ns prefix="d" uri="http://www.demo.org"/>
        </xsd:appinfo>
    </xsd:annotation>
    <xsd:element name="Demo">
        <xsd:annotation>
            <xsd:appinfo>
                <sch:pattern name="Check A greater than B">
                    <sch:rule context="d:Demo">
                        <sch:assert test="d:A > d:B" 
                                    diagnostics="lessThan">
                                A should be greater than B.
                        </sch:assert>
                    </sch:rule>
                </sch:pattern>
                <sch:diagnostics>
                    <sch:diagnostic id="lessThan">
                        Error! A is less than B 
                        A = <sch:value-of select="d:A"/>
                        B = <sch:value-of select="d:B"/>
                    </sch:diagnostic>
                </sch:diagnostics>
            </xsd:appinfo>
        </xsd:annotation>
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="A" type="xsd:integer"/>
                <xsd:element name="B" type="xsd:integer"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
Schematron will extract the directives out of the XSD document to create a Schematron schema. Schematron will then validate the instance document against the Schematron schema.
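
The extraction step can be illustrated with a short Python sketch. This is not the Schematron tool itself; it only shows the idea of pulling every element in the Schematron namespace out of the appinfo elements and collecting them under a single sch:schema root. The function name extract_schematron and the abbreviated XSD string are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
SCH_NS = "http://www.ascc.net/xml/schematron"

def extract_schematron(xsd_text):
    """Collect every Schematron element found inside xsd:appinfo
    under one new sch:schema root (hypothetical helper)."""
    ET.register_namespace("sch", SCH_NS)
    schema = ET.Element(f"{{{SCH_NS}}}schema")
    root = ET.fromstring(xsd_text)
    # iter() walks the XSD in document order, so directives keep their order.
    for appinfo in root.iter(f"{{{XSD_NS}}}appinfo"):
        for child in appinfo:
            if child.tag.startswith(f"{{{SCH_NS}}}"):
                schema.append(child)
    return schema

# Abbreviated version of the enhanced XSD shown above.
xsd = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:sch="http://www.ascc.net/xml/schematron">
    <xsd:annotation>
        <xsd:appinfo>
            <sch:title>Schematron validation</sch:title>
            <sch:ns prefix="d" uri="http://www.demo.org"/>
        </xsd:appinfo>
    </xsd:annotation>
    <xsd:element name="Demo">
        <xsd:annotation>
            <xsd:appinfo>
                <sch:pattern name="Check A greater than B">
                    <sch:rule context="d:Demo">
                        <sch:assert test="d:A > d:B">A should be greater than B.</sch:assert>
                    </sch:rule>
                </sch:pattern>
            </xsd:appinfo>
        </xsd:annotation>
    </xsd:element>
</xsd:schema>"""

schema = extract_schematron(xsd)
# Local names of the collected directives, in document order.
print([child.tag.split("}")[1] for child in schema])  # ['title', 'ns', 'pattern']
```

The XSD declarations themselves (xsd:element, xsd:complexType, and so on) are ignored; only the embedded directives end up in the collected schema.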

The key points to note about using Schematron are:

(2) Write Code to Express Additional Constraints

The second option is to write code (in Java, Perl, C++, etc.) to check the additional constraints.
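
As a minimal sketch of this option, here is one way the A-greater-than-B constraint might be checked in Python using only the standard library. The function name check_a_greater_than_b and the wording of the error message are assumptions made for this illustration, and the sketch presumes the document has already passed schema validation (so A and B exist and contain integers).

```python
import xml.etree.ElementTree as ET

# Namespace of the Demo instance document shown earlier.
NS = {"d": "http://www.demo.org"}

def check_a_greater_than_b(xml_text):
    """Return a list of error messages; an empty list means the check passed.

    Hypothetical helper: assumes the document is already schema-valid.
    """
    root = ET.fromstring(xml_text)
    a = int(root.findtext("d:A", namespaces=NS))
    b = int(root.findtext("d:B", namespaces=NS))
    if a > b:
        return []
    return [f"Error! A is not greater than B (A = {a}, B = {b})"]

instance = """<?xml version="1.0"?>
<Demo xmlns="http://www.demo.org">
    <A>10</A>
    <B>20</B>
</Demo>"""

for message in check_a_greater_than_b(instance):
    print(message)  # Error! A is not greater than B (A = 10, B = 20)
```

Returning a list of messages rather than raising on the first failure makes it easy to add further constraint checks later and report all violations at once.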

(3) Express Additional Constraints with an XSLT/XPath Stylesheet

The third option is to write a stylesheet to check the constraints.

For example, the following stylesheet checks instance documents to see whether the content of the A element is greater than the content of the B element:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:d="http://www.demo.org"
                version="1.0">

    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:if test="/d:Demo/d:A &lt; /d:Demo/d:B">
            <xsl:text>Error! A is less than B&#10;</xsl:text>
            <xsl:text>A = </xsl:text><xsl:value-of select="/d:Demo/d:A"/>
            <xsl:text>&#10;B = </xsl:text><xsl:value-of select="/d:Demo/d:B"/>
        </xsl:if>   
        <xsl:if test="/d:Demo/d:A >= /d:Demo/d:B">
            <xsl:text>Instance document is valid</xsl:text>
        </xsl:if>   
    </xsl:template>
</xsl:stylesheet>
Upon running this stylesheet on the above XML data the following output is generated:
Error! A is less than B
A = 10
B = 20
This is exactly what is desired.

Thus, the methodology for this third option is:

- check as many constraints as you can using XML Schemas
- for all other constraints write a stylesheet to do the checking

If both the schema validator and the XSL processor report no errors, then you know that your instance document is valid.

This combination of XML Schemas plus stylesheets provides for a powerful constraint checking mechanism.

Advantages/Disadvantages of the Three Options

(1) Supplement with Another Schema Language

Advantages
Disadvantages

(2) Write Code to Express Additional Constraints

Advantages
Disadvantages

(3) Express Additional Constraints with an XSLT/XPath Stylesheet

Advantages
Disadvantages