The awesome power of Schematron + XPath 2.0
(Now I can express all my data requirements!)

A few days ago Rick Jelliffe mentioned some of the new capabilities that XPath 2.0 adds to Schematron.

The things he mentioned sounded very exciting to me, so I put together what is for me a typical set of data requirements. I then implemented those data requirements using Schematron+XPath 2.0. Then, for comparison, I attempted to implement the same data requirements using XML Schemas.

It was a very enlightening experience. Schematron+XPath 2.0 was able to implement all of my data requirements (including all grammar constraints). XML Schemas, by contrast, was only able to implement the grammar constraints (which are actually of lesser importance to me than my other data requirements).

Of course, this represents only one example; other examples must be explored. Nonetheless, the fact that Schematron+XPath 2.0 could implement all of my (fairly extensive) data requirements is very exciting.

Below is my set of data requirements followed by the Schematron+XPath 2.0 implementation, as well as the XML Schema implementation. Perhaps you have similar data requirements?

Highlights of What I Discovered

Schematron+XPath 2.0 was able to express all of the data requirements (#1 - #8), including the grammar constraints.

XML Schemas, by contrast, was only able to express the grammar constraints (data requirements #1 and #8). It was unable to express the other data requirements (#2 - #7).

Sample XML Instance Document (i.e. Sample Data)

<?xml version="1.0" encoding="UTF-8"?>
<Document classification="secret">
    <NumParas>4</NumParas>
    <Para classification="unclassified">
          One if by land, two if by sea;
    </Para>
    <Para classification="confidential">
          And I on the opposite shore will be,
          Ready to ride and spread the alarm
    </Para>
    <Para classification="unclassified">
          Ready to ride and spread the alarm
          Through every Middlesex, village and farm,
    </Para>
    <Para classification="secret">
          For the country folk to be up and to arm.
    </Para>
    <Hash>304</Hash>
</Document>

The listing above is the sample XML instance document.

Data Requirements

  1. Document Organization

    1. The document is comprised of one or more paragraphs.
    2. Each paragraph is labeled with a classification, which can be one of top-secret, secret, confidential, or unclassified.
    3. A paragraph's text must not exceed 200 characters in length, and shall be comprised of only these characters: a-z, A-Z, 0-9, whitespace, comma, period, colon, semi-colon.
    4. The document has an overall classification, which can also be one of top-secret, secret, confidential, or unclassified.
    5. The information in the document may be ordered in any way the author sees fit.
  2. Security Classification Policy

    1. No paragraph may have a classification higher than the overall document classification.
  3. Reserved Word Filter

    1. No paragraph may contain these reserved words: SCRIPT, FUNCTION.
  4. Data Integrity Checks

    1. The document must contain a count of the number of paragraphs in the document, and that count must match the actual number of paragraphs.
    2. The document must contain a hashcode, and that hashcode must match the hash of the document.
  5. Accreditation

    1. For accreditation purposes an implementation of any one of these requirements must reference the specific requirement that it is implementing.
  6. Future Requirements

    1. Additional future requirements must be backward and forward compatible.
  7. Validation in Stages

    1. It must be possible to validate the data in stages, e.g. check the data against the security policy and only perform the other checks if it succeeds.
  8. XML Grammar

    1. The root element is <Document>.
    2. <Document> has one attribute, classification, whose value can be one of top-secret, secret, confidential, or unclassified.
    3. <Document> is comprised of one <NumParas>, one or more <Para>, and one <Hash>.
    4. These child elements may occur in any order.
    5. Each <Para> has one attribute, classification, whose value can be one of top-secret, secret, confidential, or unclassified.
    6. The value of each <Para> is a string, constrained to a maximum of 200 characters, comprised of only these characters: a-z, A-Z, 0-9, whitespace, comma, period, colon, semi-colon.
    7. The value of <NumParas> is a nonNegativeInteger.
    8. The value of <Hash> is a long.
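
As a rough illustration of how requirements 2.1, 3.1, and 4.1 might look as XPath 2.0 assertions, here is a minimal sketch. It is not the implementation linked from this page: the pattern ids, the assertion ids, the `levels` variable, the message wording, and the case-insensitive reserved-word match are all assumptions made for the example. The hash check (4.2) is left out because the requirements do not say which hash algorithm is used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: ids, variable names, and messages are assumptions. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Bind the xs prefix so xs:integer can be used in the tests below. -->
  <sch:ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>

  <!-- Classification labels, lowest to highest; index-of() turns a label into a rank. -->
  <sch:let name="levels" value="('unclassified', 'confidential', 'secret', 'top-secret')"/>

  <sch:pattern id="security-policy">
    <sch:rule context="Document/Para">
      <sch:assert id="req-2.1"
          test="index-of($levels, @classification) le index-of($levels, ../@classification)">
        Requirement 2.1: a paragraph classified <sch:value-of select="@classification"/>
        may not appear in a document classified <sch:value-of select="../@classification"/>.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <sch:pattern id="reserved-words">
    <sch:rule context="Document/Para">
      <!-- The 'i' flag (case-insensitive) is an assumption; drop it for an exact-case match. -->
      <sch:assert id="req-3.1" test="not(matches(., 'SCRIPT|FUNCTION', 'i'))">
        Requirement 3.1: a paragraph may not contain the reserved words SCRIPT or FUNCTION.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <sch:pattern id="integrity">
    <sch:rule context="Document">
      <!-- The if/else avoids a cast error when NumParas is missing or not an integer. -->
      <sch:assert id="req-4.1"
          test="if (NumParas castable as xs:integer)
                then xs:integer(NumParas) eq count(Para)
                else false()">
        Requirement 4.1: NumParas is <sch:value-of select="NumParas"/>, but the document
        contains <sch:value-of select="count(Para)"/> paragraphs.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Each assertion carries the number of the requirement it implements, in both its id and its message, which is one possible way to meet the accreditation requirement (5.1).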

Schematron + XPath 2.0 Implementation

Here's the Schematron + XPath 2.0 implementation.
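
For readers who want a feel for the approach without opening the file, the following is a minimal sketch of how the grammar constraints (#8) and staged validation (#7) could be written. It is an assumption-laden illustration rather than a copy of the linked implementation: the phase ids, pattern ids, and messages are invented for the example, and the check on Hash is simplified to "is an integer" rather than the full xs:long range.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: phase ids, pattern ids, and messages are assumptions. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>

  <!-- Requirement 7.1: each phase activates one stage; the calling tool selects the phase. -->
  <sch:phase id="stage-structure">
    <sch:active pattern="structure"/>
  </sch:phase>
  <sch:phase id="stage-text">
    <sch:active pattern="paragraph-text"/>
  </sch:phase>

  <sch:pattern id="structure">
    <sch:rule context="/*">
      <sch:assert id="req-8.1" test="self::Document">
        Requirement 8.1: the root element must be Document.
      </sch:assert>
      <sch:assert id="req-8.2"
          test="@classification = ('top-secret', 'secret', 'confidential', 'unclassified')">
        Requirement 8.2: Document needs a valid classification attribute.
      </sch:assert>
      <!-- Counting children, rather than listing them in sequence, leaves their order free (8.4). -->
      <sch:assert id="req-8.3"
          test="count(NumParas) eq 1 and count(Para) ge 1 and count(Hash) eq 1
                and empty(* except (NumParas, Para, Hash))">
        Requirement 8.3: Document must hold one NumParas, one or more Para, and one Hash.
      </sch:assert>
      <!-- xs:integer plus a range check is used because a basic XSLT 2.0 processor
           may not recognize derived types such as xs:nonNegativeInteger. -->
      <sch:assert id="req-8.7"
          test="if (NumParas castable as xs:integer)
                then xs:integer(NumParas) ge 0 else false()">
        Requirement 8.7: NumParas must be a non-negative integer.
      </sch:assert>
      <!-- Simplified: checks only that Hash is an integer, not the xs:long range. -->
      <sch:assert id="req-8.8" test="Hash castable as xs:integer">
        Requirement 8.8: Hash must be an integer.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <sch:pattern id="paragraph-text">
    <sch:rule context="Document/Para">
      <sch:assert id="req-8.5"
          test="@classification = ('top-secret', 'secret', 'confidential', 'unclassified')">
        Requirement 8.5: Para needs a valid classification attribute.
      </sch:assert>
      <sch:assert id="req-8.6-length" test="string-length(.) le 200">
        Requirement 8.6: paragraph text must not exceed 200 characters.
      </sch:assert>
      <sch:assert id="req-8.6-chars" test="matches(., '^[a-zA-Z0-9\s,.:;]*$')">
        Requirement 8.6: paragraph text contains a character outside the permitted set.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

A phase only selects which patterns are active. The "run the next stage only if this one succeeds" logic would live in whatever drives the validator, and how a phase is chosen depends on the Schematron tool in use.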

Notes regarding the use of Schematron with XPath 2.0

XML Schema Implementation

Here's the XML Schema implementation.
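
To show the kind of constraint XML Schema handles well, here is a minimal sketch of the grammar requirements (#8). It is not the linked schema: the type names are assumptions, and for brevity the children of Document are fixed in the order NumParas, Para, Hash. The sketch therefore does not attempt requirement 8.4 (any order), since an `xs:all` group in XML Schema 1.0 cannot contain an element that occurs more than once.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: type names are assumptions, and child order is fixed. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Requirements 8.2 and 8.5: the permitted classification labels. -->
  <xs:simpleType name="ClassificationType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="top-secret"/>
      <xs:enumeration value="secret"/>
      <xs:enumeration value="confidential"/>
      <xs:enumeration value="unclassified"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Requirement 8.6: at most 200 characters drawn from the permitted set. -->
  <xs:simpleType name="ParaTextType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="200"/>
      <xs:pattern value="[a-zA-Z0-9\s,.:;]*"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A Para is constrained text plus a required classification attribute. -->
  <xs:complexType name="ParaType">
    <xs:simpleContent>
      <xs:extension base="ParaTextType">
        <xs:attribute name="classification" type="ClassificationType" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- Requirements 8.1, 8.3, 8.7, 8.8 (with a fixed child order as a simplification). -->
  <xs:element name="Document">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="NumParas" type="xs:nonNegativeInteger"/>
        <xs:element name="Para" type="ParaType" maxOccurs="unbounded"/>
        <xs:element name="Hash" type="xs:long"/>
      </xs:sequence>
      <xs:attribute name="classification" type="ClassificationType" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Nothing in this schema relates one value to another: it cannot compare a paragraph's classification with the document's, or NumParas with the number of Para elements.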

Acknowledgements

I would like to gratefully thank the following people for their help with creating my example and with helping me to realize the power of Schematron + XPath 2.0:

Last Updated: October 26, 2007