
HTTP Extensions for Resource Metadata (HERM)
Last Updated: February 17, 2003

This work was developed collaboratively by The MITRE Corporation and members of the distributed-registry list group and made freely available in the public interest. Any use of this material must acknowledge The MITRE Corporation and the distributed-registry list group.

Contents:

Purpose - What We Prescribe And What We Don't Prescribe

Join the Discussion!

Design Goals

Evolving Today's HTML-based Web Services

Proposed Architecture

Conclusions

Contributors

FAQ

Purpose - What We Prescribe And What We Don't Prescribe

We DO prescribe how meta information about web resources (e.g., web services) should be discovered and published!

We do NOT prescribe what technology you should use to build your web services (e.g., REST, SOAP, RPC, etc)

We do NOT prescribe what technology you should use to encode meta information about your web services (WSDL, RDF, DAML, OWL, XHTML, RDDL, WSIL, etc)

Join the Discussion!

This architecture is being developed collaboratively by the Internet community on the distributed-registry list.

To subscribe to the list send an empty email message to: distributed-registry-subscribe@yahoogroups.com

To unsubscribe send an empty email message to: distributed-registry-unsubscribe@yahoogroups.com

To view the list archives go to: http://groups.yahoo.com/group/distributed-registry/

All those who contribute will be acknowledged (see the list of Contributors)


Design Goals

  1. There are often multiple resources that provide similar information or services. How does a client decide which resource to utilize? This technology must facilitate matchmakers in establishing resource reputations.

    There are currently two successful reputation systems on the Internet:

    • The Google-style PageRank[1], which uses the number of references to a URL to help gauge the reputation of the service.
    • The eBay-like system, which allows customers to provide direct feedback into a reputation system.

  2. Third parties should be able to add value to a resource description with their own description.

    For example, anyone should be able to publish metadata about any Web service by creating a description document about it.

  3. Ratings and "seals of approval" may be applied to third party resource descriptions.

  4. Clients must be able to find the resources that meet their needs.

    For static web pages, this is currently solved by keyword searches through a search engine. Solutions for web services can range from simple keyword matching against text descriptions, like with static pages, to semantic understanding of an XML vocabulary.

  5. Publication and retrieval of resource metadata should be independent of what is behind the resource (HTML, SOAP, REST, etc).

    This is where the majority of standards work has focused to date. The solution must support this ever-widening field of resource types.

  6. Resources may be described in many ways - using XHTML, WSDL, RDF, DAML, OWL, etc. The architecture must support these various formats. Additionally, the architecture must support the meta-meta service description formats - RDDL, WSIL, tModels, etc.

    The form of documents used to describe a resource will vary. The descriptions vary not just because the types of resources are different, but also because the providers of those resources will be targeting different audiences with different capabilities and different needs. The architecture must support this variation in how Web resources are described.

  7. Distributed maintenance: Each resource provider is responsible for maintaining and publishing its resource.

  8. There should be a low barrier to entry to using this technology.

Evolving Today's HTML-based Web Services

Consider what is involved in setting up a Web site today. I will take my Web site as an example. Creating my Web site was really straightforward - I created an HTML page that contained the content, and in the header section of the HTML I added a tag to provide subject and category metadata. In a short period of time many different search engines had indexed and categorized my Web site. Further, over time other Web sites have linked to my Web site.

Note: After creating my Web site I did not then "register" it (anywhere). The search engines found my Web site. I did not have to go out and find the search engines. This is a good thing. The responsibility is on the search engines to keep track of the Web sites that are out there. The responsibility is not on individual Web site developers to keep track of what search engines are out there.

My Web site is a Web service, in its simplest form. It is an HTML-based Web service. It provides an information service to Web clients. My service has metadata.

Today's Web service architectural model has three key ingredients - content, metadata, and search engines (matchmakers). This architectural model has many wonderful properties:

  1. It is massively scalable.
  2. It is a completely distributed architecture.
  3. It is a lightweight architecture - just build a Web service and you're done.
  4. Standard Web components are used - URIs to identify a Web service, the HTTP verbs (GET, PUT, POST, DELETE) to access and manipulate the service.
  5. Tremendous interconnectedness - others link to my Web service and I can link to other Web services. Recall Metcalfe's Law, which states that the value of a networked system grows with the square of the number of interconnections.
  6. A wonderful side-effect of the architecture is that a whole search-engine cottage industry has been created. This is because the Web services are decoupled from the search engines.

Since the current Web architecture has been so successful, it seems reasonable to use that architecture at least as the starting point for the next generation of Web services. Here's how the next generation Web services would operate if we simply apply the current approach:

A client invoking a next generation Web service will receive SOAP or XML or XML-RPC content, with metadata bundled into the response (analogous to today where the client receives HTML content with metadata bundled in).

However, there is a problem with simply applying the current approach to the next generation of Web services:

The next generation Web services will be a lot more complicated, so the metadata will need to be a lot more involved, i.e., bigger. You don't want clients to be forced to receive an extensive description of everything the service does, plus all the instructions on how to use it, every time they use the service. The first time they use a service they will want to view the service description metadata, but from then on they will not.

So we probably want to keep the metadata and the service representation separate. This has the additional benefit of flexibility with regard to the format of the metadata - it could be expressed as RDF or DAML or OWL or even XHTML.

However, keeping the metadata separate from the service representation introduces a new problem: given the URL of a service, how does a client discover where its metadata lives, and given a metadata document, how does a client find the service it describes?

In the following section we describe how to solve this problem.

Proposed Architecture

Meta-Location and Meta-About:
To implement the architecture we propose the addition of two new HTTP headers: Meta-Location and Meta-About.

The proposed architecture is best described with an example:

Parts Depot, Inc is a (fictitious) supplier of auto, bicycle, and roller blade parts. One of the Web services that it makes available is a list of parts. A client issues this URL to the Web service to obtain a list of parts:

   http://www.parts-depot.com/parts

A representation of the parts list is returned to the client. In the HTTP header is this:

   Meta-Location: http://www.parts-depot.com/parts/metadata

The URL in the Meta-Location HTTP header indicates the location of the service description metadata for the Web service.

Note: the URL to the service description metadata can be of any form. That is, "/metadata" is just an example.

If the client follows the metadata URL:

   http://www.parts-depot.com/parts/metadata

Then a representation of the parts metadata is returned. In the HTTP header is this:

   Meta-About: http://www.parts-depot.com/parts

The URL in the HTTP Meta-About header points to the service that this metadata describes.

Review: the URL in the HTTP Meta-Location header identifies the metadata description. Conversely, the URL in the HTTP Meta-About header identifies the Web service.

Note: the representation of the metadata returned may be influenced by content negotiation via the HTTP Accept header. That is, if the client is browser-based then an XHTML metadata description might be returned, whereas a Web service client might ask for a WSDL or RDDL representation.

If the client has the URL to a service, it can perform this two-step process to get the service's metadata:

  1. Issue an HTTP HEAD request to the Web service. The response contains only the HTTP headers; the Meta-Location header holds the URL to the metadata.
  2. Issue an HTTP GET request to that metadata URL (a sketch of this two-step lookup follows).
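Here is a minimal sketch of that two-step lookup, assuming Python's standard urllib library and the fictitious Parts Depot URLs used in this paper:

    from urllib.request import Request, urlopen

    SERVICE_URL = "http://www.parts-depot.com/parts"

    # Step 1: HEAD the service to read the Meta-Location header.
    with urlopen(Request(SERVICE_URL, method="HEAD")) as response:
        meta_url = response.headers.get("Meta-Location")

    # Step 2: GET the metadata itself, asking for a particular
    # representation via the Accept header (see the note above).
    if meta_url:
        request = Request(meta_url, headers={"Accept": "application/xhtml+xml"})
        with urlopen(request) as response:
            metadata = response.read()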

Third-party Resource Descriptions: Anyone, anywhere, at any time can publish metadata about any Web service simply by creating a description document and setting the Meta-About HTTP header to point to the service. While the Web service "owner" would have a pointer to the "authoritative" metadata, there is nothing preventing others from providing metadata in their own format. Over time a collection of description documents for a service will evolve. These service description documents will be found and indexed by search engines. Thus, search engines can act as the agents for creating a "federated registry". People who want alternative kinds of metadata for a service can see what's available by looking on the Web for pages "about" that service that provide the kinds of metadata, in the formats they desire. This then can help "select" the "winning" kinds of metadata. For example, if I were managing a Web service and determined that most people were using, say, RDF-based descriptions to find my service, and I **wasn't** using RDF, that might be a good reason to change my own description.
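As an illustration (not part of the proposal itself), a third party might serve its own RDF description of the Parts Depot service and attach the Meta-About header programmatically. The port, the description file name, and the use of Python's standard http.server below are assumptions:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    DESCRIBED_SERVICE = "http://www.parts-depot.com/parts"  # the service being described

    class ThirdPartyDescription(BaseHTTPRequestHandler):
        def do_GET(self):
            with open("parts-description.rdf", "rb") as f:   # hypothetical description document
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "application/rdf+xml")
            # Point back at the service this document is "about".
            self.send_header("Meta-About", DESCRIBED_SERVICE)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8080), ThirdPartyDescription).serve_forever()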

Ratings and Seals of Approval: Ratings and seals of approval can be applied to the alternative description pages, either by third party rating services or by the Web service providers themselves (that is, a vendor might put a seal on the description page he has written for a given service, saying in effect that the provider of the Web service has verified the content of **that** description as accurately describing the service). Thus, these alternative sources of metadata might be made just as authoritative as those provided by the creator of the Web service, while at the same time offloading from the service provider the burden of supporting all sorts of alternative formats and taxonomies.

Conclusions

The architecture proposed above levies no requirements on existing Web resources, but instead provides a framework for them to advertise their metadata should they choose to, and in the form they choose. At a minimum we recommend that public Web resources create an HTML/XHTML description to facilitate discovery and reputation ranking via Google. Since the architecture is agnostic towards particular metadata technologies, it allows each to thrive or die on its own merits.

There are many advantages to this architecture; in particular, it inherits the scalability, distribution, and decoupling properties of the current Web model described above.

In this paper we have provided a list of design goals for a Web architecture which supports the next generation Web resources. We then showed how these goals may be implemented by simply adding two new HTTP header fields, Meta-Location and Meta-About.


Contributors

This architecture was developed by:

We wish to gratefully acknowledge the following people. Their questions, comments, and suggestions have enabled us to advance and focus this architecture:


FAQ

I think it would be appropriate to document your vision for search engines of the future, since this is an assumption in defining the proposed web services architecture. I'm not sure, but it sounds like you expect them to do more than they currently do now. Can you state, at a high level, what the delta is? I think you should say what you expect them to do (even if you don't dictate how they'll do it). For instance, does the search engine of the future need to be able to distinguish a web service from other content? Does it need to know what to do with the new HTTP headers you propose?

The only new features that the architecture introduces are the Meta-Location and Meta-About HTTP headers.

We make no restrictions nor suggestions on how search engines handle these HTTP headers. However, I am certainly willing to offer ideas!

A search engine Web crawler will eventually encounter either the Web service or its description. In either case, once it has one it can obtain the other (using the HTTP headers above). So, the search engine can index both the service and its metadata.

Thus, when a user asks the search engine for information on xyz then the search engine can provide the user with the URL to the metadata (if the user needs instructions on how to use the service), or the URL to the service (and the user can immediately start to use the service).
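To make that idea concrete, here is a hedged sketch (in Python, with a hypothetical helper name) of how a crawler might pair a service with its metadata from whichever of the two URLs it encounters first:

    from urllib.request import Request, urlopen

    def service_and_metadata(url):
        """Return (service_url, metadata_url) for whichever URL was crawled."""
        with urlopen(Request(url, method="HEAD")) as response:
            meta_location = response.headers.get("Meta-Location")
            meta_about = response.headers.get("Meta-About")
        if meta_location:              # we crawled the service itself
            return url, meta_location
        if meta_about:                 # we crawled a description document
            return meta_about, url
        return url, None               # no linked metadata advertised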

It seems like the current concept you're proposing requires that Web service descriptions be contained in static HTML pages.

No. Let's take an example: suppose that Parts Depot, Inc provides a Web service to enable clients to get a list of parts. In addition to the parts list Web service it has a service description (i.e., service metadata). Here's the URL to the service description:

http://www.parts-depot.com/parts/metadata

It is very important to note that this is a "logical URL". That is, there is no reference to the "format" of the service description.

Now suppose that you are at a browser and you type in the above service description URL. You will receive an HTML version of the metadata.

Suppose that you are at an RDF tool and you issue the above service description URL. You will receive an RDF version of the metadata.

Suppose that you are at a DAML tool and you issue the above service description URL. You will receive a DAML version of the metadata.

You might ask: how can this be so? How can the same URL return different representations?

Answer: recall that the service description URL is a "logical URL". It is up to the resource identified by this logical URL to return a "representation" that is appropriate to the requester. The resource can accomplish this by examining the HTTP Accept header. If the requester is a browser then the HTTP Accept header will contain:

Accept: text/html, application/xhtml+xml

and the service description resource responds with an HTML/XHTML representation.

If the requester is an RDF inference engine then the HTTP Accept header will contain:

Accept: application/rdf+xml

and the service description resource responds with an RDF representation.

And so forth.

Thus, when this URL is invoked:

http://www.parts-depot.com/parts/metadata

the code that resides behind this URL will check the Accept header and subsequently return the metadata in the appropriate format.

So, with this approach a Web service will always return **one** URL for Meta-Location ... a URL to the "logical service description resource". When the service description URL is invoked the representation which is returned is influenced via content negotiation using the HTTP Accept header.
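A minimal sketch of such a "logical" metadata resource, assuming Python's standard wsgiref server, hypothetical file names for the stored representations, and a deliberately simplistic Accept check (real content negotiation would honor q-values):

    from wsgiref.simple_server import make_server

    SERVICE_URL = "http://www.parts-depot.com/parts"
    REPRESENTATIONS = {
        "application/rdf+xml": "parts-metadata.rdf",
        "text/html":           "parts-metadata.html",
    }

    def metadata_app(environ, start_response):
        # Pick the first representation the client says it accepts,
        # falling back to HTML for ordinary browsers.
        accept = environ.get("HTTP_ACCEPT", "text/html")
        content_type = next((t for t in REPRESENTATIONS if t in accept), "text/html")
        with open(REPRESENTATIONS[content_type], "rb") as f:
            body = f.read()
        start_response("200 OK", [
            ("Content-Type", content_type),
            ("Meta-About", SERVICE_URL),   # point back at the service described
        ])
        return [body]

    make_server("", 8080, metadata_app).serve_forever()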

How does UDDI fit in with this architecture?

UDDI is fundamentally a matchmaker tool, i.e., it's a tool to match a client to a service. You may note that Google, Yahoo, AltaVista, etc. are also fundamentally matchmaker tools. Thus, UDDI is a competitor of Google, Yahoo, and AltaVista.

UDDI requires Web service implementors to "register" with the UDDI registry. Google, Yahoo, and AltaVista have Web crawlers which are constantly finding "what's out there". Consequently, there is no need to register with them.

With UDDI the responsibility is on Web service implementors to register, determine what "category" the service belongs in, and perform a whole host of other duties.

With Google, Yahoo, and AltaVista the Web service implementor simply implements the service and that ends his/her responsibility. These search engines are responsible for finding the services and categorizing them.

The architecture that we present here advocates decoupling of the service implementation/description and matchmaker tools. That is, the service implementor's responsibilities end after the service has been implemented and the description completed. We advocate and encourage the marketplace to drive the evolution of matchmaker tools.

Forcing Web service implementors to register with matchmakers puts an undue burden on the implementors, is not consistent with the current Web architecture, and binds the services to the matchmakers.

How does this relate to IBM and Microsoft's suggestion for simple discovery called WS-Inspection or WSIL?

The WSIL format looks interesting and bears watching. However, I am not a fan of the publication/discovery method it describes in section 6. We specifically crafted our architecture so that it could support efforts like this, RDDL, and other metadata aggregation formats without having to choose a winner. WS-Inspection asks users to look around the URL of interest for an inspection.wsil document. This feels messy and error-prone because the standard does not guarantee where to find the document, but instead asks the publisher to scatter the inspection document to wherever they think a user might look for it. Of course it also presumes that a WSIL document is THE answer.

You seem to be punting on the matchmaking (service discovery) problem.

We did not "punt" on matchmaking. Quite the opposite. We recognized matchmaking as a part of the Web service equation, and that it is important that matchmaking should be decoupled from the service implementation and the service description. This is how it is today on the Web. The consequence of keeping matchmaking decoupled is that a wonderful search engine cottage industry has developed. We see this as a very good thing.

UDDI and ebXML support extensive taxonomies so that you can categorize the information to enable users to find service providers that meet their needs. How are services categorized in this architecture?

Taxonomies are fine as long as the matchmakers automatically generate them from the service description metadata (that is stored with the Web service). Forcing Web service implementors to register with a matchmaker tool and figure out what category to place the service into is putting an unacceptable burden on the Web service implementor. So, to answer your question: the matchmakers are able to automatically categorize services based upon their service description metadata. That is how it is today and that is how it should be.

I have read and/or thought that some of the first steps in establishing the Semantic Web would involve annotating existing web pages with RDF/DAML/OWL. Then a search engine could take this into consideration. However, existing search engines will have to be extended to take advantage of semantic markup. I think the same will be true of this web services approach. Existing search engines may have to change for the approach to work.

You are absolutely right! Service descriptions using technologies like RDF/DAML/OWL will drive the market to develop new search engines that can process such vocabularies. Our architecture encourages such developments in search tools! In fact, our architecture completely decouples Web service implementation and description from the discovery tool.

In your writeup you have specified "tModel" as a service description format. I don't think that a tModel is a service description format.

You are right that we should be more careful in distinguishing between a description format and a meta-description format when tossing the terms around. There are a number of meta-meta formats being bandied about (tModel, RDDL, WSIL, etc). The point we are trying to make is our architecture can support these formats as well as the description formats. They just add an additional layer of indirection.

Instead of introducing two new HTTP headers, Meta-Location and Meta-About, couldn't you just use one new HTTP header, Content-Alternatives? This HTTP header contains a comma separated list of content-types that can be served from the URL.

Thanks for the suggestion! Your suggestion is actually very close to what we are doing, only one step removed.

The problem with that idea arises when both the service and its description have the same content-type. For example, the service returns an XHTML representation of the service, and the service description returns an XHTML representation of the metadata.

We didn't want to overload the semantics of the service URL because of the confusion that occurs when the content type of the service and the content type of the metadata are the same.

Suppose that your company wishes to use a particular Web service. The service description that the Web service provides is not quite adequate for your company (the service is good, but the service description is inadequate). For example, the Web service doesn't currently support an RDF description and you need an RDF description. You would like to enhance the service description with an RDF description that your company can use. How can you do that?

When you invoke the Web service it returns a representation that has an HTTP header, Meta-Location, which contains the URL to the service description. Thus, the Web service "owner" has a pointer to the "preferred" service description. However, you are free to create your own service description, in your own format. For example, your company could create an RDF description of the service. When that description is invoked it returns an RDF description, and in the HTTP header is Meta-About, which contains a URL to the service. Thus, you have seamlessly extended the description of the service, without any impact on the service or its "preferred" description. Nice!

Will this effort be able to either federate registries or represent services from other registries in some way?

After we announced, the feedback we got made it clear that our choice of using the word registry was a mistake. When reading the word "registry" most people think of centralized repositories of metadata (e.g., UDDI), and if you add "distributed", then they think you are just creating a federation of these repositories.

Our goal was to make the web itself the repository for metadata. So federation is not an issue, since there are no repositories to federate.

As you might have noted however, this architecture, unlike UDDI, does not specify how aggregation and querying of this metadata should occur. We would expect Google to be the initial source of aggregation and searching, but as the formats for metadata settle down, we would expect either Google to be enhanced to recognize those formats or for specialized aggregators to appear.

One might want to declare more than one Meta-Location for a resource. How would that be handled, or is there a good reason to disallow a one-to-many relationship other than the fact that teeing up Meta-About with Meta-Location works well as long as there's only one of each?

How does one determine which Meta-Location is authoritative? [I'm assuming that it will be the one offered by the origin server of the resource in question, but it's not stated anywhere.]

These two questions are related. We thought it best for a resource to have one authoritative source of metadata. This is determined by the Meta-Location header returned by the resource. Of course, if you have a whole bunch of metadata documents you want associated with the resource, we would advocate having Meta-Location point to an RDDL document or other such aggregation document type.

With respect to Meta-About, there is nothing stopping third parties from creating their own metadata for a resource and pointing to it with the Meta-About header. So while there is only one authoritative set of metadata, there can be countless alternatives offered.

Wouldn't it be simpler to put the Meta-Location URL into a SOAP header?

First, SOAP is just one method for implementing Web services. There are other, competing technologies, such as REST and XML-RPC. We wanted our technology to be usable regardless of which implementation technology is employed. Second, in the discussion above I gave an example of a client which has the URL to a service, but wants the URL to the service description. The client has no interest in downloading the entire service document. The solution is for the client to issue an HTTP HEAD request. This results in returning to the client just the HTTP headers (which contain the Meta-Location URL).

How can I modify my Apache Web Server to return, in the HTTP headers, the Meta-Location and Meta-About headers?

Here is a URL that shows how easy it is to add the headers.

http://httpd.apache.org/docs/mod/mod_headers.html

Using this module all a publisher has to do is create a .htaccess file (if they don't already have one) and add the following style of directives:


    <Files document.htm>
        Header set Meta-Location "http://acme.com/document-meta.xml"
    </Files>

    <Files document-meta.xml>
        Header set Meta-About "http://acme.com/document.htm"
    </Files>


[1] PageRanking is an approach which computes and assigns a "score" to each potential candidate which "matches" the target of a query, where "matches" and "score" are determined by qualities such as number of page hits, recency of hits, number of links to the page, "who" is linking to the page, number of times search terms appear, proximity of search terms, etc.

 

This work was developed collaboratively by The MITRE Corporation and members of the distributed-registry list group and made freely available in the public interest. Any use of this material must acknowledge The MITRE Corporation and the distributed-registry list group.