Two Types of XML Schema Language
  • Roger L. Costello and
  • Robin A. Simmons
  1. There are two types of XML schema language: grammar-based and rule-based.
  2. For specifying structure, form, and syntax, use a grammar-based language.
  3. For expressing data relationships, such as operational and business rules, use a rule-based language.
  4. DTD, XML Schema, and Relax NG are grammar-based schema languages.
  5. Schematron is a rule-based schema language.
  6. No current schema language supports both grammar-based and rule-based validation.

XML schema languages come in two types:

  1. Grammar-based schema languages
  2. Rule-based schema languages

This paper discusses which schema language is appropriate for a specific purpose.

Definition: A grammar-based schema language specifies the structure and contents of elements and attributes in an XML instance document. For example, a grammar-based schema language can specify the presence and order of elements in an XML instance document, the number of occurrences of each element, and the contents and datatype of each element and attribute.

Definition: A rule-based schema language specifies the relationships that must hold between the elements and attributes in an XML instance document. For example, a rule-based schema language can specify that the value of certain elements must conform to a rule or algorithm.

DTD, XML Schema, and Relax NG are grammar-based schema languages. Schematron is a rule-based schema language.

Example: The following XML instance document contains a classification attribute on the <Document> element and on the <Para> element:

  <?xml version="1.0"?>
  <Document classification="secret">
      <Para classification="unclassified">
          One if by land; two if by sea.
      </Para>
  </Document>

A grammar-based schema language can be used to specify, for example:

  1. The <Document> element must have a classification attribute, whose value is top-secret, secret, confidential, or unclassified.
  2. The <Document> element must contain one or more <Para> elements.
  3. The <Para> element must have a classification attribute, whose value is top-secret, secret, confidential, or unclassified.
  4. The <Para> element contains text.
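
As an illustration, the four grammar constraints above could be written in XML Schema roughly as follows. This is a minimal sketch, not a schema taken from this paper: it assumes the instance document uses no namespace, and the type name ClassificationType is a hypothetical name chosen for this example.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical shared type: the four permitted classification values -->
  <xs:simpleType name="ClassificationType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="top-secret"/>
      <xs:enumeration value="secret"/>
      <xs:enumeration value="confidential"/>
      <xs:enumeration value="unclassified"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="Document">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs defaults to 1, so this means "one or more Para" -->
        <xs:element name="Para" maxOccurs="unbounded">
          <xs:complexType>
            <!-- Text content plus a required classification attribute -->
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="classification"
                              type="ClassificationType" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="classification"
                    type="ClassificationType" use="required"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Each constraint here concerns a single element or attribute in isolation: which values it may take, whether it is required, and how often it may occur. Nothing in the schema relates the value of one classification attribute to another.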

A rule-based schema language can be used to specify, for example:

  1. The <Para> classification value cannot be more sensitive than the <Document> classification value.
  2. The <Para> element cannot contain any restricted keywords, and the list of restricted keywords is found in a separate file.
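
The two rules above might be sketched in Schematron as shown below. Several details are assumptions made for illustration only: the sensitivity ordering is taken to be unclassified, confidential, secret, top-secret (lowest to highest); the keyword file name restricted-keywords.xml and its keywords/keyword layout are hypothetical; and the second pattern relies on the document() function, which is typically available only in XSLT-based Schematron implementations.

```xml
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Rule 1: a Para may not be more sensitive than its Document.
       One rule per Document level; a top-secret Document permits
       any Para level, so it needs no rule. -->
  <sch:pattern>
    <sch:rule context="Document[@classification='unclassified']/Para">
      <sch:assert test="@classification='unclassified'">
        A Para in an unclassified Document must be unclassified.
      </sch:assert>
    </sch:rule>
    <sch:rule context="Document[@classification='confidential']/Para">
      <sch:assert test="@classification='unclassified'
                        or @classification='confidential'">
        A Para in a confidential Document must be unclassified or confidential.
      </sch:assert>
    </sch:rule>
    <sch:rule context="Document[@classification='secret']/Para">
      <sch:assert test="not(@classification='top-secret')">
        A Para in a secret Document must not be top-secret.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <!-- Rule 2: Para text must not contain a restricted keyword.
       Assumes a hypothetical file of the form
       <keywords><keyword>...</keyword></keywords> with no empty entries. -->
  <sch:pattern>
    <sch:rule context="Para">
      <sch:let name="text" value="string(.)"/>
      <sch:assert test="not(document('restricted-keywords.xml')
                        /keywords/keyword[contains($text, .)])">
        The Para contains a restricted keyword.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

</sch:schema>
```

Both patterns compare one piece of data with another (a Para against its parent Document, or against an external list). The keyword match is a plain case-sensitive substring test.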

The above examples illustrate that a grammar-based schema specifies syntax and a rule-based schema specifies business rules on the data. Neither schema specifies semantics.

Recommendation:

To constrain the structure, form, or syntax of XML instance documents, use a grammar-based schema language.

To specify data relationships, such as operational and business rules, use a rule-based schema language.

In some cases, both types of schema language might be necessary.


Recap: