Current Status of XSV: Coverage, Known Bugs, etc.

Applies to XSV 1.189/1.95 of 2001/05/07 08:38:12

Henry S. Thompson
Richard Tobin
7 May 2001

1.   What is XSV

XSV (XML Schema Validator) is an open source (GPLed) work-in-progress attempt at a conformant schema-aware processor, as defined by XML Schema Part 1: Structures, May 2, 2001 (REC) version. It has been developed at the Language Technology Group of the Human Communication Research Centre in the Division of Informatics at the University of Edinburgh, with support for one of us (Thompson) from the World Wide Web Consortium.

2.   How can I use XSV

2.1.   Using XSV online

The simplest way to use XSV is via a form-based interface on the web.

2.2.   Running XSV at your own installation

2.2.1.   Win32 one-click installation

New: I've packaged the latest version into a self-installing package for Win32 platforms: just fetch it, run it, and add the installation directory to your PATH, then run:

> xsv [flags] target [schemas ...]

-o errfile
    Output error file to errfile rather than stderr
-s stylefile
    Include an XSL style PI to stylefile in the error output
-r
    Reflect the PSVI as an XML file to stdout
-w
    Include warnings in error output
-t
    Show stage timings
-k
    Attempt instance validation even if schema(s) has/have errors
-i
    Input should all be schemas, assume they are meant to be complete and check them as such
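
For anyone driving XSV from a script, the flags above can be assembled programmatically. What follows is only a minimal sketch in present-day Python, not part of XSV itself: it assumes the installed command is called xsv and is on your PATH (as described above), and the function and parameter names (build_xsv_command, run_xsv, keep_going, and so on) are hypothetical names chosen for illustration.

```python
import subprocess


def build_xsv_command(target, schemas=(), errfile=None, stylefile=None,
                      reflect=False, warnings=False, timings=False,
                      keep_going=False, schemas_only=False):
    """Assemble an argument list of the form: xsv [flags] target [schemas ...].

    Only the flags listed in this document are used; the parameter
    names are hypothetical and exist purely for this illustration.
    """
    cmd = ["xsv"]                      # assumes xsv is on PATH
    if errfile:
        cmd += ["-o", errfile]         # error output to a file, not stderr
    if stylefile:
        cmd += ["-s", stylefile]       # XSL style PI in the error output
    if reflect:
        cmd.append("-r")               # reflect the PSVI to stdout
    if warnings:
        cmd.append("-w")               # include warnings
    if timings:
        cmd.append("-t")               # show stage timings
    if keep_going:
        cmd.append("-k")               # validate even if schemas have errors
    if schemas_only:
        cmd.append("-i")               # treat all input as schemas
    cmd.append(target)
    cmd.extend(schemas)
    return cmd


def run_xsv(*args, **kwargs):
    """Run xsv with the assembled arguments and return the completed process."""
    return subprocess.run(build_xsv_command(*args, **kwargs),
                          capture_output=True, text=True)
```

For example, build_xsv_command("po.xml", ["po.xsd"], errfile="errors.xml", warnings=True) yields the argument list for "xsv -o errors.xml -w po.xml po.xsd"; the file names here are placeholders.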

I hope to provide Solaris, FreeBSD and Linux packages shortly.

2.2.2.   Source distributions for the more adventurous

A Python library download of XSV is now available (Win32 binary and UN*X/Win32 source distributions).

You can download the (Python) sources from the W3C public CVS repository (main branch for all modules now), install PyLTXML to get the necessary XML validating library, install Python 1.6, and do:

> set PYTHONPATH=c:\progra~1\hcrclt~1\ltxml~1\python\lib
> python applyschema.py ...

No, the above instructions aren't sufficiently detailed, but you probably don't want the sources unless you can figure out how to make it work :-)

3.   What is implemented

The basic framework of schema checking and instance schema-validation is implemented. Many details of both are not yet filled in. Here's a brief tabulation (to be extended):

3.1.   Implemented at least in part

Content-model validation
Attribute validation
Include
Import
Equivalence classes
Local and global element and attribute declarations
Type definition derivation by extension and restriction
Identity-constraint checking
Content and attribute wildcards
xsi:schemaLocation, xsi:noNamespaceSchemaLocation and, as last resort, dereferencing of namespace URIs to find schema documents
xsi:null
xsi:type
Opportunistic validation inside <any> and <anyAttribute>
redefinition of simple and complex types
whitespace processing
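
The lookup order named in the list above (xsi:schemaLocation, then xsi:noNamespaceSchemaLocation, then the namespace URI itself as a last resort) can be illustrated with a short sketch. This is not XSV's actual code: it is a simplified illustration in present-day Python using the standard xml.etree library, it inspects only the document element (the hints may in fact appear on other elements), and the function name candidate_schema_locations is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi: attributes in the 2001 Recommendation.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def candidate_schema_locations(instance_xml):
    """Return (namespace, location) pairs in the order they would be tried.

    Simplification: only the document element is examined.
    """
    root = ET.fromstring(instance_xml)
    candidates = []

    # 1. xsi:schemaLocation holds whitespace-separated
    #    namespace/location pairs.
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    for ns, loc in zip(tokens[0::2], tokens[1::2]):
        candidates.append((ns, loc))

    # 2. xsi:noNamespaceSchemaLocation points at a schema
    #    with no target namespace.
    no_ns = root.get("{%s}noNamespaceSchemaLocation" % XSI)
    if no_ns:
        candidates.append((None, no_ns))

    # 3. Last resort: if nothing was supplied for the document
    #    element's namespace, try dereferencing the namespace URI.
    if root.tag.startswith("{"):
        ns = root.tag[1:].split("}", 1)[0]
        if all(known != ns for known, _ in candidates):
            candidates.append((ns, ns))

    return candidates
```

A caller would then attempt to fetch each location in turn; how failures are reported, and whether later candidates are still tried, is left open here.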

3.2.   Not implemented yet

Simple type conformance, other than enumerations and max/min for numeric types
Detailed enforcement of derivation by restriction
Redefinition of named groups and attribute groups
Occurrence ranges over 100 for elements or groups in content models (performance limitation)

3.3.   Recent Changes

Handle extending with empty content model correctly
Reflect more thoroughly
Complete standalone schema checking, i.e. assuming this is all you're going to get
Multiple keys is error, not warning
Handle bogus xsi attrs
Better crash logging
Obscure bug in defaulted NS attributes fixed
Added full independent schema check switch (-i)
Fixed bug in use of DTD to pre-validate schemas; this caused serious and baffling problems, sorry.
Supports 'decimal', not 'number'
Supports all renamings, http://www.w3.org/2001/XMLSchema namespace
Improve stylesheet wrt XML parsing error output
XPath implementation now implements NS prefixes properly
Improve efficiency for large schemas
support for whitespace processing added
PSVI now mostly supported, can be reflected with -r switch
Fixed missing DTD complaint
Vintage 2000-09-22 changes restricting what can be specified in conjunction with element declarations of the form <element ref="..."> now implemented
Important: All schemas are now validated against the DTD for schemas before being loaded, even if they lack a DOCTYPE of their own. This may mean errors are found where none were before, or a change in error message. Feedback on this change is welcome.
Handle shadowing of e.g. elementFormDefault in <include>d schemas correctly
(Partial) support for restricting a simpleContent complexType with a nested simpleType
Check 'fixed' attribute values for correctness
Provide control of whether instance validation warnings appear (default is that they don't)
Improve bomb-proofing and recovery after crashes
Backlog of bug fixes, including forestalling crashes when <restriction>/<extension> are missing
Allow references to 'anyType' to work (oops)
Display more (more useful) output even if validator crashes
add -o outfile for command line invocation on e.g. win98 where capturing stderr is hard
Fix xmlschema-instance namespace, so xsi:schemaLocation works (oops)
losing itemType no longer causes crash
fix bug in restriction of lists
new version syntax, support for redefine of simple and complex type defns
Chameleon include: including a schema document with no target namespace into one with a target namespace does the right thing
No files on command line causes read from stdin
Bug fixes: handle nested attribute group references, more than two explicitly supplied schemas
Bug fixes: handle lists as element content correctly; allow simple types as type for document element; don't crash on restriction of type with simple content; don't crash on min/max for date/time types
Bug fixes: don't crash on missing group defn or attr. type or bogus XPath
Bug fixes: allow (but still don't implement) minInclusive on string types, don't crash if XPath ends with a '/'
Bug fixes: catch bad 'content' attr, don't crash after empty <group>, missing base
Better handling of some content-model errors in schemas
Allow appropriate facets on types derived by 'list' (still not actually enforced :-()
Bug fixes: require 'value' on facets; don't crash if simple type has element content; handle min/max facets on float/double
More bug fixes: catch complex type used for attributes or base of simple type cleanly; don't die if xsi:type encountered during lax validation
Don't require 'fixed' attributes to appear, fix obscure bug in use of xsi:type and no-target-namespace schemas
Allow <unique> fields to be missing without comment
More thorough enforcement of Unique Attribution (== determinism) constraint on content models, including checking for <any>-derived ambiguities
Bug-fixes to catch logged crashes: missing basetypes, bogus attr type derivation, minInclusive
Upgraded stylesheets to report schema errors and warnings properly
Fixed bugs in cases of missing attribute group definition, missing attribute type definition, missing base type definition
16-bit, XML output version now the default
now checks enumerated types, user-defined min-max bug fixed
Partial check of QName simple type conformance
Fix bad knock-on effect of failed import
Improve fidelity of lax validation; validate laxly instead of throwing error if no declaration found for document element
Fix bug in opportunistic validation of attribute values
Support for xsi:null
Fix bug causing bogus errors when restricting elements declared to have the urtype definition
Fix bug which made the <anyAttribute namespace="other"> in the schema for schemas overly generous
Try to catch all 404 (not found) errors better
Fixed ref-to-undeclared-elt bug
Fixed a bug causing a crash if you used an element with no content model at all (i.e. the ur-type by default); it manifested itself as an 'Attribute Error, note' crash
Now checks for and handles gracefully case where supplied file is not a schema document

3.4.   Known bugs/features

Lists don't allow min/maxLength, but they should