Approaches to Extending the Semantics of a Community's Self-Interested XML Vocabulary

Purpose of this Article

The purpose of this article is to document the different approaches for extending the semantics of a Community's tag-set.

Let's start with an example to illustrate what is meant by "extending the semantics of a Community's tag-set."

Example

Community #1 has defined a set of tags for expressing a person's contact information. Here's a sample XML document that shows the Community's XML vocabulary:

<?xml version="1.0" encoding="UTF-8"?>
<Point-of-Contact>
    <Name>John Smith</Name>
    <Address>
        <Street>10 Tremont St.</Street>
        <City>Boston</City>
        <State>MA</State>
    </Address>
    <Telephone>617-123-4567</Telephone>
</Point-of-Contact>

Everyone in Community #1 understands the semantics of these tags, so within their Community they merrily interoperate.

Interoperating With Other Communities

At some point, Community #1 recognizes that to grow and thrive they must reach beyond their island, both sharing their information with other Communities and using information from them.

Unfortunately for Community #1, those other Communities use different tags to represent a person's contact information.

Below are five approaches that Community #1 may take to bridge the gap with the other Communities.

1. Out-of-Band Translation

The first approach is for Community #1 to leave their XML documents intact and to bridge the gap by building a translator, for example an XSLT stylesheet that maps Community #1's tag-set to Community #2's tag-set (with a separate translator for Community #3, #4, and so forth).
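The translation step can be sketched as follows. This is a minimal illustration in Python rather than XSLT, and Community #2's tag names (contact, full-name, and so on) are hypothetical, since Community #2's vocabulary is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Community #1 tags to Community #2 tags.
# Community #2's tag names are invented for illustration only.
TAG_MAP = {
    "Point-of-Contact": "contact",
    "Name": "full-name",
    "Address": "postal-address",
    "Street": "street",
    "City": "town",
    "State": "state",
    "Telephone": "phone",
}


def translate(element, tag_map):
    """Return a copy of element with every tag renamed via tag_map.

    Tags that are missing from the map are kept unchanged.
    """
    renamed = ET.Element(tag_map.get(element.tag, element.tag), dict(element.attrib))
    renamed.text = element.text
    renamed.tail = element.tail
    for child in element:
        renamed.append(translate(child, tag_map))
    return renamed


SOURCE = """<Point-of-Contact>
    <Name>John Smith</Name>
    <Address>
        <Street>10 Tremont St.</Street>
        <City>Boston</City>
        <State>MA</State>
    </Address>
    <Telephone>617-123-4567</Telephone>
</Point-of-Contact>"""

# The original document is left untouched; a translated copy is produced.
translated = translate(ET.fromstring(SOURCE), TAG_MAP)
print(ET.tostring(translated, encoding="unicode"))
```

Note that the mapping table is specific to one target Community. Supporting Community #3 would mean writing and maintaining a second table, and so on for each additional partner.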

Advantages:

Disadvantages:

2. Mimic the (X)HTML Model for Extending Semantics

The HTML specification says that one role of the class attribute is "general purpose processing by user agents". So, by adding class names to elements, authors of an HTML or XHTML document can extend the semantics of their documents.

Let's see how Community #1 can exploit this idea of using class attributes to extend the semantics of their documents. Suppose that Community #1 knows that some other Communities use the vCard specification for representing a person's contact information. Community #1 can then extend the semantics of their XML vocabulary as follows:

<?xml version="1.0" encoding="UTF-8"?>
<Point-of-Contact class="vcard">
    <Name class="fn">John Smith</Name>
    <Address class="adr">
        <Street class="street-address">10 Tremont St.</Street>
        <City class="locality">Boston</City>
        <State class="region">MA</State>
    </Address>
    <Telephone class="tel">617-123-4567</Telephone>
</Point-of-Contact>

Note that a class attribute has been added to each element, and the value of each class is a vCard term.

Now Community #1 can interoperate with any Community that understands vCard. And, of course, within Community #1 they simply ignore the class attributes, since the semantics of the elements are already understood.
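To make this concrete, here is a rough sketch of how a consuming application that knows only the class terms might read the document above. It looks elements up by class value and never inspects Community #1's element names. The helper name text_by_class is an assumption made for this illustration, not part of any specification.

```python
import xml.etree.ElementTree as ET

DOC = """<Point-of-Contact class="vcard">
    <Name class="fn">John Smith</Name>
    <Address class="adr">
        <Street class="street-address">10 Tremont St.</Street>
        <City class="locality">Boston</City>
        <State class="region">MA</State>
    </Address>
    <Telephone class="tel">617-123-4567</Telephone>
</Point-of-Contact>"""


def text_by_class(root, class_name):
    """Return the text of the first element whose class equals class_name.

    Returns None when no element carries that class. Element names are
    deliberately ignored: only the class attribute is consulted.
    """
    for element in root.iter():
        if element.get("class") == class_name:
            return (element.text or "").strip()
    return None


root = ET.fromstring(DOC)
card = {
    term: text_by_class(root, term)
    for term in ("fn", "street-address", "locality", "region", "tel")
}
print(card)
```

The exact-match comparison is only adequate while each class attribute holds a single value.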

The HTML specification also says: "Multiple class names must be separated by white space characters." So, a class attribute can carry several terms at once, one for each Community to be supported. For example, suppose some other Communities use EDI terminology to represent a person's contact information. Community #1 can accommodate those Communities as well:

<?xml version="1.0" encoding="UTF-8"?>
<Point-of-Contact class="vcard contact">
    <Name class="fn contact-name">John Smith</Name>
    <Address class="adr location">
        <Street class="street-address mailing-address">10 Tremont St.</Street>
        <City class="locality district">Boston</City>
        <State class="region province">MA</State>
    </Address>
    <Telephone class="tel phone">617-123-4567</Telephone>
</Point-of-Contact>

Note that each class attribute now has two values: a vCard term and an EDI term.

Now Community #1 can interoperate with any Community that understands vCard, as well as any Community that understands EDI. And, of course, within Community #1 they still ignore the class attributes, since the semantics of the elements are already understood.

Additional extensions can be made to the class attributes to support other Communities.
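Once a class attribute holds several terms, a consumer has to split the value on white space and compare whole tokens. The sketch below shows one way to do that, using a shortened copy of the document above; the function name find_by_class_token is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<Point-of-Contact class="vcard contact">
    <Name class="fn contact-name">John Smith</Name>
    <Telephone class="tel phone">617-123-4567</Telephone>
</Point-of-Contact>"""


def find_by_class_token(root, token):
    """Return every element whose white-space-separated class list contains token."""
    return [el for el in root.iter() if token in el.get("class", "").split()]


root = ET.fromstring(DOC)

# Two consumers using different terminologies reach the same element.
vcard_view = find_by_class_token(root, "fn")[0].text
edi_view = find_by_class_token(root, "contact-name")[0].text

# Whole-token matching: "contact" matches the root element only,
# not the element whose class list contains "contact-name".
contact_elements = find_by_class_token(root, "contact")
print(vcard_view, edi_view, [el.tag for el in contact_elements])
```

A plain substring test would wrongly treat "contact" as a match for "contact-name", which is why the value is split into tokens first.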

Advantages:

Disadvantages:

3. Universal XML Vocabulary

The third approach is for all the Communities to get together, throw out their existing tag-sets, and agree to use one standard, universal tag-set.

Advantages:

Disadvantages:

4. Common Data Exchange Format

The fourth approach is for all the Communities to agree to a common data exchange format. Thus, when Community #2 fetches data from Community #1, Community #1 translates its information into the common data format. Community #2 then translates the data that it receives (which is in the common data format) into its own format. Data exchange within Community #1 is unchanged.
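The two translation steps might look like the sketch below. It assumes a flat common format with a contact-record root element; that format, and Community #2's tag names, are hypothetical and chosen only to show the shape of the exchange.

```python
import xml.etree.ElementTree as ET

# All tag names other than Community #1's are hypothetical.
C1_TO_COMMON = {"Name": "name", "Street": "street", "City": "city",
                "State": "region", "Telephone": "telephone"}
COMMON_TO_C2 = {"name": "full-name", "street": "street", "city": "town",
                "region": "state", "telephone": "phone"}

C1_DOC = """<Point-of-Contact>
    <Name>John Smith</Name>
    <Address>
        <Street>10 Tremont St.</Street>
        <City>Boston</City>
        <State>MA</State>
    </Address>
    <Telephone>617-123-4567</Telephone>
</Point-of-Contact>"""


def community1_export(xml_text):
    """Community #1 side: flatten its document into the common format."""
    root = ET.fromstring(xml_text)
    common = ET.Element("contact-record")
    for element in root.iter():
        if element.tag in C1_TO_COMMON:
            ET.SubElement(common, C1_TO_COMMON[element.tag]).text = element.text
    return ET.tostring(common, encoding="unicode")


def community2_import(common_text):
    """Community #2 side: rebuild its own format from the common format."""
    common = ET.fromstring(common_text)
    local = ET.Element("contact")
    for field in common:
        ET.SubElement(local, COMMON_TO_C2.get(field.tag, field.tag)).text = field.text
    return local


wire = community1_export(C1_DOC)      # what travels between the Communities
received = community2_import(wire)    # what Community #2 works with internally
print(wire)
print(ET.tostring(received, encoding="unicode"))
```

Each Community needs only one pair of translators, to and from the common format, no matter how many other Communities take part.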

Advantages:

Disadvantages:

5. Consult an Ontology or Abstract Structure

The fifth approach is to create an ontology or abstract structure (e.g., UML models or XML Schema complexTypes) that specifies the relationships between the "concepts" used by the various Communities. When a Community #1 application processes an XML document that it has just fetched from Community #2, it consults the ontology or abstract structure to determine how to map the Community #2 tags and data into a form that it can understand and process.
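As a very rough sketch of the lookup step, the example below lets a plain Python dictionary stand in for the ontology. A real ontology or schema would be far richer; the concept names, Community #2's tags, and the sample Community #2 document (including its fax number) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# A toy "ontology": each shared concept lists the tag each Community uses for it.
# Concept names and Community #2's tags are hypothetical.
ONTOLOGY = {
    "PersonName": {"community1": "Name", "community2": "full-name"},
    "Locality": {"community1": "City", "community2": "town"},
    "PhoneNumber": {"community1": "Telephone", "community2": "phone"},
}


def equivalent_tag(tag, source, target, ontology=ONTOLOGY):
    """Return target's tag for the concept that source calls tag, or None."""
    for terms in ontology.values():
        if terms.get(source) == tag:
            return terms.get(target)
    return None


# A made-up document fetched from Community #2.
C2_DOC = (
    "<contact>"
    "<full-name>John Smith</full-name>"
    "<town>Boston</town>"
    "<fax>617-123-0000</fax>"
    "</contact>"
)

# A Community #1 application maps what it can and skips unknown concepts.
understood = {}
for element in ET.fromstring(C2_DOC):
    local_tag = equivalent_tag(element.tag, "community2", "community1")
    if local_tag is not None:
        understood[local_tag] = element.text
print(understood)
```

Unlike a fixed translator, the mapping is looked up at processing time, so adding a Community or a concept means updating the shared structure rather than each application.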

Advantages:

Disadvantages:

Acknowledgements

The following people contributed to the creation of this document:

Last Updated: November 25, 2007