XSV (XML Schema Validator) is an open-source (GPLed), work-in-progress attempt at a conformant schema-aware processor, as defined by the May 2, 2001 Recommendation (REC) version of XML Schema Part 1: Structures. It has been developed at the Language Technology Group of the Human Communication Research Centre in the Division of Informatics at the University of Edinburgh, with support for one of us (Thompson) from the World Wide Web Consortium.
The simplest way to use XSV is via a form-based interface on the web.
New: I've packaged the latest version up into a self-installing package for Win32 platforms: just fetch it, run it, and add the installation directory to your PATH, then
> xsv [flags] target [schemas ...]
where the flags can name an errfile to use rather than stderr and a stylefile to use in the error output.
I hope to provide Solaris, FreeBSD and Linux packages shortly.
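As a rough illustration of the command-line form `xsv [flags] target [schemas ...]`, the sketch below assembles an argument list in Python. The `-o outfile`, `-i` and `-r` switches are the ones mentioned in the change notes on this page; the function name `build_xsv_command`, the file names, and the assumption that flags simply precede the target are illustrative only and not part of XSV itself.

```python
from typing import List, Optional


def build_xsv_command(target: str,
                      schemas: Optional[List[str]] = None,
                      outfile: Optional[str] = None,
                      independent: bool = False,
                      reflect: bool = False) -> List[str]:
    """Assemble an argument list of the form: xsv [flags] target [schemas ...].

    Hypothetical helper: it only builds the list and does not run XSV.
    """
    cmd = ["xsv"]
    if outfile is not None:
        cmd += ["-o", outfile]   # -o outfile: write output to a file
    if independent:
        cmd.append("-i")         # -i: full independent schema check
    if reflect:
        cmd.append("-r")         # -r: reflect the PSVI
    cmd.append(target)
    cmd.extend(schemas or [])    # any explicitly supplied schema documents
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_xsv_command("order.xml", ["order.xsd"], outfile="errors.xml")))
```

The resulting list could be handed to `subprocess.run` on a machine where `xsv` is on the PATH.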
A Python library download is now available for installing XSV (Win32 binary and UN*X/Win32 source distributions).
You can download the (Python) sources from the W3C public CVS repository (main branch for all modules now), install PyLTXML to get the necessary XML validating library, install Python 1.6 and do:
> set PYTHONPATH=c:\progra~1\hcrclt~1\ltxml~1\python\lib
> python applyschema.py ...
No, the above instructions aren't sufficiently detailed, but you probably don't want the sources unless you can figure out how to make it work :-)
The basic framework of schema checking and instance schema-validation is implemented. Many details of both are not yet filled in. Here's a brief tabulation (to be extended):
Content-model validation
Attribute validation
Include
Import
Equivalence classes
Local and global element and attribute declarations
Type definition derivation by extension and restriction
Identity-constraint checking
Content and attribute wildcards
xsi:schemaLocation, xsi:noNamespaceSchemaLocation and, as a last resort, dereferencing of namespace URIs to find schema documents
xsi:null
xsi:type
Opportunistic validation inside <any> and <anyAttribute>
Redefinition of simple and complex types
Whitespace processing
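To make the xsi:schemaLocation and xsi:noNamespaceSchemaLocation entries above concrete, here is a minimal sketch of how such hints can be read from an element. It uses Python's standard `xml.etree.ElementTree` rather than XSV's own code, and the function name `schema_location_hints` and the example.org URIs are made up for illustration. The last-resort step of dereferencing namespace URIs is not shown.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location_hints(elem):
    """Return (namespace, location) pairs hinted at by the xsi attributes on one element.

    A namespace of None stands for the xsi:noNamespaceSchemaLocation hint.
    """
    hints = []
    # xsi:schemaLocation is a whitespace-separated list that alternates
    # namespace URI, schema location, namespace URI, schema location, ...
    tokens = elem.get("{%s}schemaLocation" % XSI_NS, "").split()
    for ns, loc in zip(tokens[0::2], tokens[1::2]):
        hints.append((ns, loc))
    # xsi:noNamespaceSchemaLocation carries a single location.
    no_ns = elem.get("{%s}noNamespaceSchemaLocation" % XSI_NS)
    if no_ns is not None:
        hints.append((None, no_ns.strip()))
    return hints


# Hypothetical instance document, for illustration only.
DOC = """<po:order xmlns:po="http://example.org/po"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.org/po po.xsd
                        http://example.org/addr addr.xsd"/>"""

if __name__ == "__main__":
    for namespace, location in schema_location_hints(ET.fromstring(DOC)):
        print(namespace, location)
```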
Simple type conformance, other than enumerations and max/min for numeric types
Detailed enforcement of derivation by restriction
Redefinition of named groups and attribute groups
Occurrence ranges over 100 for elements or groups in content models
(performance limitation)
Handle extending with empty content model correctly
Reflect more thoroughly
Complete standalone schema checking, i.e. assuming this is all
you're going to get
Multiple keys is error, not warning
Handle bogus xsi attrs
Better crash logging
Obscure bug in defaulted NS attributes fixed
Added full independent schema check switch (-i)
Fixed bug in use of DTD to pre-validate schemas; this caused serious and baffling problems, sorry.
Supports 'decimal', not 'number'
Supports all renamings, http://www.w3.org/2001/XMLSchema namespace
Improve stylesheet wrt XML parsing error output
XPath implementation now implements NS prefixes properly
Improve efficiency for large schemas
support for whitespace processing added
PSVI now mostly supported, can be reflected with -r switch
Fixed missing DTD complaint
Vintage 2000-09-22 changes restricting what can be specified in conjunction with element declarations of the form <element ref="..."> now implemented
Important: All schemas are now validated against the DTD for schemas before being loaded, even if they lack a DOCTYPE of their own. This may mean errors are found where none were before, or a change in error message. Feedback on this change is welcome.
Handle shadowing of e.g. elementFormDefault in <include>d schemas correctly
(Partial) support for restricting a simpleContent complexType with a
nested simpleType
Check 'fixed' attribute values for correctness
Provide control of whether instance validation warnings appear
(default is that they don't)
Improve bomb-proofing and recovery after crashes
Backlog of bug fixes, including forestalling crashes when
<restriction>/<extension> are missing
Allow references to 'anyType' to work (oops)
Display more (more useful) output even if validator crashes
add -o outfile for command line invocation on e.g. win98 where
capturing stderr is hard
Fix xmlschema-instance namespace, so xsi:schemaLocation works (oops)
losing itemType no longer causes crash
fix bug in restriction of lists
new version syntax, support for redefine of simple and complex type defns
Chameleon include: including a schema document with no target namespace into one with a target namespace does the right thing
No files on command line causes read from stdin
Bug fixes: handle nested attribute group references, more than two
explicitly supplied schemas
Bug fixes: handle lists as element content correctly; allow simple
types as type for document element; don't crash on restriction of type with
simple content; don't crash on min/max for date/time types
Bug fixes: don't crash on missing group defn or attr. type or bogus XPath
Bug fixes: allow (but still don't implement) minInclusive on string
types, don't crash if XPath ends with a '/'
Bug fixes: catch bad 'content' attr, don't crash after empty <group>, missing base
Better handling of some content-model errors in schemas
Allow appropriate facets on types derived by 'list' (still not
actually enforced :-()
Bug fixes: require 'value' on facets; don't crash if simple type has
element content; handle min/max facets on float/double
More bug fixes: catch complex type used for attributes or base of
simple type cleanly; don't die if xsi:type encountered during lax validation
Don't require 'fixed' attributes to appear, fix obscure bug in use
of xsi:type and no-target-namespace schemas
Allow <unique> fields to be missing without comment
More thorough enforcement of Unique Attribution (== determinism) constraint on content models, including checking for <any>-derived ambiguities
Bug-fixes to catch logged crashes: missing basetypes, bogus attr
type derivation, minInclusive
Upgraded stylesheets to report schema errors and warnings properly
Fixed bugs in cases of missing attribute group definition, missing
attribute type definition, missing base type definition
16-bit, XML output version now the default
now checks enumerated types, user-defined
min-max bug fixed
Partial check of QName simple type conformance
Fix bad knock-on effect of failed import
Improve fidelity of lax validation; validate laxly instead of
throwing error if no declaration found for document element
Fix bug in opportunistic validation of attribute values
Support for xsi:null
Fix bug causing bogus errors when restricting elements declared to
have the urtype definition
Fix bug which made the <anyAttribute namespace="other"> in the
schema for schemas overly generous
Try to catch all 404 (not found) errors better
Fixed ref-to-undeclared-elt bug
Fixed a bug causing a crash if you used an element with no
content model at all == the ur-type by default, manifesting itself as
an 'Attribute Error, note' crash
Now checks for and handles gracefully case where supplied file is
not a schema document
Lists don't allow min/maxLength, but they should
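The "chameleon include" entry in the notes above may be easier to follow with a small example. The sketch below is not XSV's implementation; it uses Python's standard `xml.etree.ElementTree`, and the function names and example.org namespace are hypothetical. It only shows the naming rule: components of an included schema document that has no target namespace take on the target namespace of the including document. Resolution of references inside the included document is not shown.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def effective_target_namespace(including, included):
    """Namespace that components of `included` end up in when <include>d by `including`."""
    outer = including.get("targetNamespace")
    inner = included.get("targetNamespace")
    if inner is None:
        # Chameleon case: the included document adopts the includer's
        # target namespace (which may itself be absent, i.e. None).
        return outer
    if inner != outer:
        raise ValueError("<include> requires matching target namespaces")
    return inner


def included_element_names(including, included):
    """Expanded names of the top-level elements declared in `included`."""
    ns = effective_target_namespace(including, included)
    names = []
    for decl in included.findall("{%s}element" % XSD_NS):
        local = decl.get("name")
        names.append("{%s}%s" % (ns, local) if ns else local)
    return names


# Hypothetical schema documents, for illustration only.
INCLUDING = ET.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="http://example.org/po"/>')
INCLUDED = ET.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="note"/></xs:schema>')

if __name__ == "__main__":
    print(included_element_names(INCLUDING, INCLUDED))
```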