
Implementing Substitution Group Element Hierarchies
(A Collectively Developed Set of Schema Design Guidelines)

Issue

How can we give a hierarchy to elements in a substitution group?

Introduction

Note: please read the variable content container section before reading this one.

In the variable content container section we saw several methods for implementing variable content containers. One method was to create an abstract element, and then create elements that are substitutable for it. For example:

<xsd:element name="Publication" abstract="true" 
             type="PublicationType"/>

<xsd:element name="Book" substitutionGroup="Publication" 
             type="BookType"/>
<xsd:element name="Magazine" substitutionGroup="Publication" 
             type="MagazineType"/>
where BookType and MagazineType both derive from PublicationType:
            PublicationType
              /         \
          BookType   MagazineType
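The definitions of these three types are not shown in this section. As a minimal sketch of what they might look like, assume that BookType and MagazineType derive by extension, and that the child elements Title, Date, ISBN, and Issue exist (these names are hypothetical and chosen only for illustration):
<!-- Hypothetical content: Title, Date, ISBN and Issue are assumed names -->
<xsd:complexType name="PublicationType">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
</xsd:complexType>

<!-- BookType adds to the base content, so it derives from PublicationType -->
<xsd:complexType name="BookType">
    <xsd:complexContent>
        <xsd:extension base="PublicationType">
            <xsd:sequence>
                <xsd:element name="ISBN" type="xsd:string"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>

<!-- MagazineType likewise derives from PublicationType -->
<xsd:complexType name="MagazineType">
    <xsd:complexContent>
        <xsd:extension base="PublicationType">
            <xsd:sequence>
                <xsd:element name="Issue" type="xsd:positiveInteger"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
Whatever the actual content, the derivation matters: an element can join a substitution group only if its type is the same as, or derived from, the type of the head element.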
The variable content container then is declared to contain the abstract element:
<xsd:element name="Catalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="Publication" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
Thus, the contents of <Catalogue> can be any element that is in the substitution group with Publication. Here's a sample instance document:
<Catalogue>
    <Book>...</Book>
    <Magazine>...</Magazine>
    <Book>...</Book>
</Catalogue>
Note that an alternative design is to pull out the anonymous complexType to create a named type:
<xsd:complexType name="PublicationContainer">
    <xsd:sequence>
        <xsd:element ref="Publication" maxOccurs="unbounded"/>
    </xsd:sequence>
</xsd:complexType>

<xsd:element name="Catalogue" type="PublicationContainer"/>
Now suppose that we want to declare an element just to hold <Book> elements. Here's what we could do:
<xsd:element name="BookCatalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="Book" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
Alternatively, we can again pull out the anonymous complexType to create a named type:
<xsd:complexType name="BookContainer">
    <xsd:sequence>
        <xsd:element ref="Book" maxOccurs="unbounded"/>
    </xsd:sequence>
</xsd:complexType>

<xsd:element name="BookCatalogue" type="BookContainer"/>
Likewise, if we want to declare an element just to hold <Magazine> elements, we would have a similar set of declarations:
<xsd:complexType name="MagazineContainer">
    <xsd:sequence>
        <xsd:element ref="Magazine" maxOccurs="unbounded"/>
    </xsd:sequence>
</xsd:complexType>

<xsd:element name="MagazineCatalogue" type="MagazineContainer"/>
Let's summarize to see what we have.

We have the original type hierarchy:

            PublicationType
              /         \
          BookType   MagazineType
We have a substitution group comprised of three elements:
     {Publication, Book, Magazine}
And we have three types which contain the substitution group elements:
     PublicationContainer, BookContainer, MagazineContainer
Note that these "container types" are unrelated, i.e., one does not derive from another.

Adding Relationships between the Container Types

As we know, there are benefits to creating type hierarchies. Namely, we can declare an element to be of the hierarchy's root type and then the content of the element can be substituted by any derived type (due to the principle of type substitution).

Thus, it would be beneficial if we could create a type hierarchy for the above container types. In fact, we can:

            PublicationContainer
              /              \
       BookContainer    MagazineContainer
where BookContainer and MagazineContainer derive-by-restriction from PublicationContainer:
<xsd:complexType name="BookContainer">
    <xsd:complexContent>
        <xsd:restriction base="PublicationContainer">
            <xsd:sequence>
                <xsd:element ref="Book" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>

<xsd:complexType name="MagazineContainer">
    <xsd:complexContent>
        <xsd:restriction base="PublicationContainer">
            <xsd:sequence>
                <xsd:element ref="Magazine" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>
You may ask, "I thought that when you derive by restriction, the restricted type (e.g., BookContainer) must repeat everything in the base type (PublicationContainer). The base type contains an element Publication, whereas the restricted type contains, for example, an element Book. Is this allowed?"

The answer is "yes". Think of a substitution group as a choice. Thus, in PublicationContainer there is a choice among Publication, Book, and Magazine (and since Publication is abstract, only Book or Magazine can actually appear in an instance). In BookContainer we are simply restricting the choice to just Book. Likewise, in MagazineContainer we are simply restricting the choice to just Magazine.
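
One way to picture this is to write the choice out by hand. The following sketch is for illustration only and is not part of the design; the type name PublicationContainerAsChoice is invented here. Given the three element declarations above, it should accept the same content as PublicationContainer:
<!-- Hypothetical type, shown only to make the implicit choice visible -->
<xsd:complexType name="PublicationContainerAsChoice">
    <xsd:sequence>
        <!-- Publication is abstract, so only its substitutes are listed -->
        <xsd:choice maxOccurs="unbounded">
            <xsd:element ref="Book"/>
            <xsd:element ref="Magazine"/>
        </xsd:choice>
    </xsd:sequence>
</xsd:complexType>
Seen this way, BookContainer drops the Magazine branch and MagazineContainer drops the Book branch, which is the kind of narrowing that derive-by-restriction permits.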

In a real sense, by embedding each substitution group element within its own type and then setting up a hierarchy among those types, we have created a hierarchy among the substitution group elements themselves:

            Publication
              /     \
            Book  Magazine
Previously the substitution group elements were flat. That is, there was no hierarchy among them. Now they have a hierarchy.

Summary

Let's recap what we've discussed. First, declare the abstract element and its substitution group elements:
<xsd:element name="Publication" abstract="true" 
             type="PublicationType"/>

<xsd:element name="Book" substitutionGroup="Publication" 
             type="BookType"/>
<xsd:element name="Magazine" substitutionGroup="Publication" 
             type="MagazineType"/>
Next, declare a container type for each element, and have the container type holding the head element be the root of the type hierarchy:
<xsd:complexType name="PublicationContainer">
    <xsd:sequence>
        <xsd:element ref="Publication" maxOccurs="unbounded"/>
    </xsd:sequence>
</xsd:complexType>
 
<xsd:complexType name="BookContainer">
    <xsd:complexContent>
        <xsd:restriction base="PublicationContainer">
            <xsd:sequence>
                <xsd:element ref="Book" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>

<xsd:complexType name="MagazineContainer">
    <xsd:complexContent>
        <xsd:restriction base="PublicationContainer">
            <xsd:sequence>
                <xsd:element ref="Magazine" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>
Lastly, declare <Catalogue> to be of type PublicationContainer:
<xsd:element name="Catalogue" type="PublicationContainer"/>
Here's a sample instance document:
<Catalogue>
    <Book>...</Book>
    <Magazine>...</Magazine>
    <Book>...</Book>
</Catalogue>
PublicationContainer contains an abstract element (Publication), so <Catalogue> may contain only elements that are in the substitution group with Publication, as we have shown here.
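
For contrast, here is a sketch of an instance that a schema validator should reject, assuming the declarations above. An abstract element cannot itself appear in an instance document:
<!-- Invalid: Publication is abstract and cannot appear in an instance -->
<Catalogue>
    <Publication>...</Publication>
</Catalogue>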

Because of the principle of type substitutability, we can alternatively substitute the PublicationContainer type with a derived type. For example:

<Catalogue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:type="BookContainer">
    <Book>...</Book>
    <Book>...</Book>
    <Book>...</Book>
</Catalogue>
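
Conversely, once xsi:type narrows the content to BookContainer, a Magazine is no longer allowed. The following sketch, again assuming the declarations above, should fail validation:
<!-- Invalid: BookContainer permits only Book elements -->
<Catalogue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:type="BookContainer">
    <Book>...</Book>
    <Magazine>...</Magazine>
</Catalogue>
Magazine is substitutable for Publication, not for Book, so it cannot take the place of the Book element that BookContainer requires.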

Best Practice

This approach gives us the advantages of both element substitutability and type substitutability. In our example we saw that the contents of <Catalogue> can be any element in the substitution group with Publication. We also saw that, by specifying xsi:type, we can alternatively restrict the contents of <Catalogue> to a particular substitution group element.

Acknowledgements

Special thanks to Jeni Tennison, Simon Cox, and Eddie Robertsson for bringing this issue to my attention, and enlightening me on its practice. Thanks!