Creating Extensible Content Models

(A Collectively Developed Set of Schema Design Guidelines)



Issue

What is Best Practice for creating extensible content models?

Definition

An element has an extensible content model if, in instance documents, authors can extend the contents of that element with additional elements beyond what was specified by the schema.

Introduction

Consider this schema snippet:
<xsd:element name="BookCatalogue">
    <xsd:complexType>
         <xsd:sequence>
             <xsd:element name="Book" maxOccurs="unbounded">
                 <xsd:complexType>
                     <xsd:sequence>
                         <xsd:element name="Title" type="xsd:string"/>
                         <xsd:element name="Author" type="xsd:string"/>
                         <xsd:element name="Date" type="xsd:gYear"/>
                         <xsd:element name="ISBN" type="xsd:string"/>
                         <xsd:element name="Publisher" type="xsd:string"/>
                     </xsd:sequence>
                 </xsd:complexType>
             </xsd:element>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
This schema snippet dictates that in instance documents each <Book> element must contain exactly five elements: <Title>, <Author>, <Date>, <ISBN>, and <Publisher>. For example:
<Book>
     <Title>The First and Last Freedom</Title>
     <Author>J. Krishnamurti</Author>
     <Date>1954</Date>
     <ISBN>0-06-064831-7</ISBN>
     <Publisher>Harper &amp; Row</Publisher>
</Book>
The schema specifies a fixed/static content model for the Book element. Book's content must conform exactly to what the schema specifies, and nothing more. Sometimes this rigidity is a good thing. Sometimes we want to give our instance documents more flexibility.

How do we design the schema so that Book's content model is extensible? Below are two methods for implementing extensible content models.

Extensibility via Type Substitution

Consider this version of the above schema, where Book's content model has been defined using a type definition:
<xsd:complexType name="BookType">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string" />
    </xsd:sequence>
</xsd:complexType>
<xsd:element name="BookCatalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Book" type="BookType"
                         maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
Recall that via the mechanism of type substitutability, the content of <Book> may conform to any type that derives from BookType. For example, if a type is created which derives from BookType:
<xsd:complexType name="BookTypePlusReviewer">
    <xsd:complexContent>
        <xsd:extension base="BookType" >
            <xsd:sequence>
                <xsd:element name="Reviewer" type="xsd:string"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
then instance documents can create a <Book> element that contains a <Reviewer> element, along with the other five elements:
<Book xsi:type="BookTypePlusReviewer">
     <Title>My Life and Times</Title>
     <Author>Paul McCartney</Author>
     <Date>1998</Date>
     <ISBN>94303-12021-43892</ISBN>
     <Publisher>McMillin Publishing</Publisher>
     <Reviewer>Roger Costello</Reviewer>
</Book>
In this example, BookTypePlusReviewer has been defined within the same schema as BookType. In general, however, this may not be the case. Other schemas can import/include the BookCatalogue schema and define types which derive from BookType. Thus, the contents of Book may be extended, without modifying the BookCatalogue schema!
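
As a minimal sketch of that situation, a separate schema document might include the BookCatalogue schema and derive its own type from BookType. The file name BookCatalogue.xsd and the namespace http://www.publishing.org are taken from the instance document shown later in this article. The derived type BookTypePlusEdition and its Edition element are hypothetical, and the sketch assumes that BookCatalogue.xsd declares that namespace as its targetNamespace with qualified local elements.

```xml
<?xml version="1.0"?>
<!-- Hypothetical extension schema. Assumes BookCatalogue.xsd has
     targetNamespace="http://www.publishing.org" and defines BookType. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.publishing.org"
            xmlns="http://www.publishing.org"
            elementFormDefault="qualified">

    <!-- include brings in components from the same target namespace -->
    <xsd:include schemaLocation="BookCatalogue.xsd"/>

    <!-- A new type derived from BookType; BookCatalogue.xsd is untouched -->
    <xsd:complexType name="BookTypePlusEdition">
        <xsd:complexContent>
            <xsd:extension base="BookType">
                <xsd:sequence>
                    <xsd:element name="Edition" type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>
</xsd:schema>
```

An instance document validated against this schema could then write <Book xsi:type="BookTypePlusEdition"> and place an <Edition> element after <Publisher>.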

Type substitutability is a powerful extensibility mechanism. However, it suffers from two problems:

Disadvantages

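First, the extensibility is restricted to one location: a derived type can only append elements to the end of Book's content model, after <Publisher>. It cannot add elements before <Title> or in the middle.

Second, the extensibility is unexpected: nothing in the declaration of Book indicates that its content may be extended, so anyone reading the schema or writing code against it has no warning that extra elements may appear.

It would be nice if there were a way to explicitly flag places where extensibility may occur: "hey, instance documents may extend <Book> at this point, so be sure to write your code taking this possibility into account." In addition, it would be nice if we could extend Book's content model at locations other than just the end. The <any> element gives us both capabilities, as discussed in the next section.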

Extensibility via the <any> Element

An <any> element may be inserted into a content model to enable instance documents to contain additional elements. Here's an example showing an <any> element at the end of Book's content model:
<xsd:element name="BookCatalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Book" maxOccurs="unbounded">
                <xsd:complexType>
                    <xsd:sequence>
                        <xsd:element name="Title" 
                                     type="xsd:string"/>
                        <xsd:element name="Author" 
                                     type="xsd:string"/>
                        <xsd:element name="Date"
                                     type="xsd:gYear"/>
                        <xsd:element name="ISBN" 
                                     type="xsd:string"/>
                        <xsd:element name="Publisher" 
                                     type="xsd:string"/>
                        <xsd:any namespace="##any" minOccurs="0"/>
                    </xsd:sequence>
                </xsd:complexType>
            </xsd:element>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
In this version of the schema it has been explicitly specified that after the <Publisher> element one additional well-formed XML element may occur, and that element may come from any namespace.

For example, suppose that the instance document author discovers a schema, containing a declaration for a Reviewer element:

<xsd:element name="Reviewer">
    <xsd:complexType>
         <xsd:sequence>
             <xsd:element name="Name">
                 <xsd:complexType>
                     <xsd:sequence>
                         <xsd:element name="First" 
                                      type="xsd:string"/>
                         <xsd:element name="Last" 
                                      type="xsd:string"/>
                     </xsd:sequence>
                 </xsd:complexType>
             </xsd:element>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
And suppose that for an instance document author it is important that, in addition to specifying the Title, Author, Date, ISBN, and Publisher of each Book, he/she specify a reviewer. Because the schema has been designed with extensibility in mind, the instance document author can use the Reviewer element in his/her BookCatalogue:

<BookCatalogue xmlns="http://www.publishing.org"
               xmlns:rev="http://www.PublishingCompany.org"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation=
                           "http://www.publishing.org
                            BookCatalogue.xsd
                            http://www.PublishingCompany.org
                            PublishingCompany.xsd">
    <Book>
        <Title>My Life and Times</Title>
        <Author>Paul McCartney</Author>
        <Date>1998</Date>
        <ISBN>94303-12021-43892</ISBN>
        <Publisher>McMillin Publishing</Publisher>
        <rev:Reviewer>
            <rev:Name>
                <rev:First>Roger</rev:First>
                <rev:Last>Costello</rev:Last>
            </rev:Name>
       </rev:Reviewer>
    </Book>
</BookCatalogue>
The instance document author has enhanced the instance document with an element that the schema designer may have never even envisioned. We have empowered the instance author with a great deal of flexibility in creating the instance document. Wow!
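
One detail the example relies on: a wildcard's processContents attribute defaults to strict, so a validator must be able to find a declaration for rev:Reviewer, which is why the instance lists PublishingCompany.xsd in xsi:schemaLocation. The variant below is a sketch rather than part of the original schema. It shows how the wildcard could be narrowed to other namespaces and relaxed so that added elements are validated only when a declaration happens to be available.

```xml
<!-- Sketch of an alternative wildcard for the end of Book's content model.
     namespace="##other": only elements from a namespace other than the
       schema's targetNamespace are admitted.
     processContents="lax": an added element is validated if a declaration
       for it can be found; otherwise it only has to be well-formed.
     maxOccurs="unbounded" is an illustrative choice, not from the original. -->
<xsd:any xmlns:xsd="http://www.w3.org/2001/XMLSchema"
         namespace="##other"
         processContents="lax"
         minOccurs="0"
         maxOccurs="unbounded"/>
```

Setting processContents="skip" instead would turn off validation of the added elements altogether.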

An alternate schema design is to create a BookType (as we did above) and embed the <any> element within the BookType:

<xsd:complexType name="BookType">
   <xsd:sequence>
       <xsd:element name="Title" type="xsd:string"/>
       <xsd:element name="Author" type="xsd:string"/>
       <xsd:element name="Date" type="xsd:gYear"/>
       <xsd:element name="ISBN" type="xsd:string"/>
       <xsd:element name="Publisher" type="xsd:string"/>
       <xsd:any namespace="##any" minOccurs="0"/>
   </xsd:sequence>
</xsd:complexType>
and then declare Book of type BookType:
<xsd:element name="Book" type="BookType" maxOccurs="unbounded"/>
However, we are then back to the "unexpected extensibility" problem. Namely, after the <Publisher> element any well-formed XML element may occur, and because a type derived from BookType may be substituted for it, anything could be present after that.

There is a way to control the extensibility and still use a type. We can add a block attribute to Book:

<xsd:element name="Book" type="BookType" block="#all" 
             maxOccurs="unbounded"/>
The block attribute prohibits derived types from being used in Book's content model. Thus, by this method we have created a reusable component (BookType), and yet we still have control over the extensibility.
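
To make the effect of block concrete, the fragment below sketches an instance that a validator should reject once Book is declared with block="#all". It assumes that the BookTypePlusReviewer type from the type substitution section is still defined in the schema.

```xml
<!-- Invalid when Book is declared with block="#all": xsi:type names a type
     derived from BookType, which the block attribute rules out. -->
<Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="BookTypePlusReviewer">
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
    <Reviewer>Roger Costello</Reviewer>
</Book>
```

The block attribute also accepts a narrower list drawn from extension, restriction, and substitution, for cases where only some kinds of derivation should be barred.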

With the <any> element we have complete control over where, and how much, extensibility we want to allow. For example, suppose that we want to allow at most two new elements at the top of Book's content model. Here's how to specify that using the <any> element:

<xsd:complexType name="BookType">
   <xsd:sequence>
       <xsd:any namespace="##any" minOccurs="0" maxOccurs="2"/>
       <xsd:element name="Title" type="xsd:string"/>
       <xsd:element name="Author" type="xsd:string"/>
       <xsd:element name="Date" type="xsd:gYear"/>
       <xsd:element name="ISBN" type="xsd:string"/>
       <xsd:element name="Publisher" type="xsd:string"/>
   </xsd:sequence>
</xsd:complexType>
Note how the <any> element has been placed at the top of the content model, with maxOccurs="2". Thus, in instance documents the <Book> content will always end with <Title>, <Author>, <Date>, <ISBN>, and <Publisher>. Before those, up to two well-formed XML elements may occur.
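
One caveat, offered as a note rather than something the original schema addresses: a wildcard with namespace="##any" placed directly before <Title> can also match <Title> itself, and XML Schema 1.0 processors may reject such a content model as ambiguous under the Unique Particle Attribution constraint. A hedged sketch of a variant that sidesteps this by admitting only elements from other namespaces:

```xml
<!-- Variant of BookType: up to two leading extension elements, limited to
     namespaces other than the schema's targetNamespace so that the wildcard
     never competes with the Title declaration that follows it. -->
<xsd:complexType name="BookType"
                 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:sequence>
       <xsd:any namespace="##other" minOccurs="0" maxOccurs="2"/>
       <xsd:element name="Title" type="xsd:string"/>
       <xsd:element name="Author" type="xsd:string"/>
       <xsd:element name="Date" type="xsd:gYear"/>
       <xsd:element name="ISBN" type="xsd:string"/>
       <xsd:element name="Publisher" type="xsd:string"/>
   </xsd:sequence>
</xsd:complexType>
```

The trade-off is that extension elements must then come from a namespace other than the one the schema itself defines.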

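In summary, the position of the <any> element determines where extensions may appear, and its minOccurs and maxOccurs attributes determine how many are allowed.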

Best Practice

The <any> element is an enabling technology. It turns instance documents from static/rigid structures into rich, dynamic, flexible data objects. It shifts the decision about what data makes sense from the schema designer to the instance document authors, who can decide for themselves what their documents need to carry.

As a schema designer you need to recognize your limitations. You have no way of anticipating all the varieties of data that an instance document author might need in creating an instance document. Be smart enough to know that you're not smart enough to anticipate all possible needs! Design your schemas with flexibility built-in.

Definition: an open content schema is one that allows instance documents to contain additional elements beyond what is declared in the schema. As we have seen, this may be achieved by using the <any> (and <anyAttribute>) elements in the schema.
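
Since <anyAttribute> does not appear in the schema examples in this article, here is a minimal sketch of BookType opened up for both extra elements and extra attributes. The namespace, processContents, and maxOccurs settings are illustrative choices, not taken from the original schema.

```xml
<!-- Sketch: BookType with open content for elements and attributes. -->
<xsd:complexType name="BookType"
                 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:sequence>
       <xsd:element name="Title" type="xsd:string"/>
       <xsd:element name="Author" type="xsd:string"/>
       <xsd:element name="Date" type="xsd:gYear"/>
       <xsd:element name="ISBN" type="xsd:string"/>
       <xsd:element name="Publisher" type="xsd:string"/>
       <!-- Any number of additional elements from other namespaces -->
       <xsd:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
   </xsd:sequence>
   <!-- Additional attributes from other namespaces; anyAttribute must
        follow the content model and any attribute declarations -->
   <xsd:anyAttribute namespace="##other" processContents="lax"/>
</xsd:complexType>
```

With this type, a <Book> element could carry a namespace-qualified attribute such as a hypothetical rev:rating="5", provided the rev prefix is bound to a namespace other than the schema's target namespace.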

Sprinkling <any> and <anyAttribute> elements liberally throughout your schema will yield benefits in terms of how evolvable your schema is:

Enabling Schema Evolution using Open Content Schemas

In today's rapidly changing market, static schemas will be less commonplace, as the market pushes schemas to quickly support new capabilities. For example, consider the cellphone industry. Clearly, this is a rapidly evolving market. Any schema that the cellphone community creates will soon become obsolete as hardware/software changes extend the cellphone capabilities. For the cellphone community, rapid evolution of a cellphone schema is not just a nicety; the market demands it!

Suppose that the cellphone community gets together and creates a schema, cellphone.xsd. Imagine that every week NOKIA sends out to the various vendors an instance document (conforming to cellphone.xsd), detailing its current product set. Now suppose that a few months after cellphone.xsd is agreed upon NOKIA makes some breakthroughs in their cellphones - they create new memory, call, and display features, none of which are supported by cellphone.xsd. To gain a market advantage NOKIA will want to get information about these new capabilities to its vendors ASAP. Further, they will have little motivation to wait for the next meeting of the cellphone community to consider upgrades to cellphone.xsd. They need results NOW. How does open content help? That is described next.

Suppose that the cellphone schema is declared "open". Immediately NOKIA can extend its instance documents to incorporate data about the new features. How does this change impact the vendor applications that receive the instance documents? The answer is: not at all. In the worst case, the vendor's application will simply skip over the new elements. More likely, however, the vendors are showing the cellphone features in a list box and these new features will be automatically captured with the other features. Let's stop and think about what has just been described: without modifying the cellphone schema and without touching the vendor's applications, information about the new NOKIA features has been instantly disseminated to the marketplace! Open content in the cellphone schema is the enabler for this rapid dissemination.
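
To picture what such an extended instance might look like, here is a purely illustrative fragment. The article does not show cellphone.xsd, so every element name and namespace URI below is hypothetical, and the fragment assumes the schema ends the content model of its phone element with an <xsd:any> wildcard.

```xml
<!-- Hypothetical instance: cellphone.xsd is not shown in this article, so
     all element names and namespace URIs here are made up for illustration. -->
<Phone xmlns="http://www.example.org/cellphone"
       xmlns:new="http://www.example.org/vendor-extensions">
    <!-- Elements the agreed schema is assumed to declare -->
    <Model>Example 100</Model>
    <Feature>Voice mail</Feature>
    <!-- Extension element admitted by the assumed wildcard at the end of
         Phone's content model; older applications can simply skip it -->
    <new:Feature>High-resolution display</new:Feature>
</Phone>
```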

Clearly some types of instance document extensions may require modification to the vendor's applications. Recognize, however, that the vendors are free to upgrade their applications in their own time. The applications do not need to be upgraded before changes can be introduced into instance documents. At the very worst, the vendor's applications will simply skip over the extensions. And, of course, those vendors do not need to upgrade in lock-step.

To wrap up this example, suppose that several months later the cellphone community reconvenes to discuss enhancements to the schema. The new features that NOKIA first introduced into the marketplace are then officially added into the schema. This completes the cycle: changes to the instance documents have driven the evolution of the schema.