Working with Namespaces in XML Schema

 

Dare Obasanjo
Microsoft Corporation

August 20, 2002

Summary: Dare Obasanjo discusses various aspects of W3C XML Schema and how they are affected by namespaces. Topics covered include proper usage of the targetNamespace, elementFormDefault and attributeFormDefault attributes, as well as the include, import, and redefine elements within a schema. (13 printed pages)

My Kingdom for Some Power Tools

This weekend, the bookcase I mentioned ordering in my last article finally arrived. Instead of going off to buy tools, I eagerly attempted to put it together with nothing more than a screwdriver and an old shoe to use as a hammer. Several blisters and a few hours later, my bookcase was assembled and a little wobbly.

After getting off the phone with my significant other, who couldn't help laughing at the fact that I had gotten blisters by simply "putting some furniture together," I decided to continue with my XML-based book catalog in a brazen attempt to restore my dignity.

I decided to create a schema for my XML instances so that I could not only check them for validity in applications I built, but could also use the cool features in .NET XML Serialization to convert the XML to C# objects as needed.

But first I needed a handy command line tool for performing validation of both instance documents and schemas. Below is the tool I built to simplify this process:

using System; 
using System.Xml; 
using System.Xml.Schema;

public class XsdValidate{
  
  static XmlSchemaCollection sc = new XmlSchemaCollection();
  static string xsdFile = null; 
  static string xmlFile = null; 
  static string nsUri = null; 

  static string usage = @"Usage: xsdvalidate.exe [-xml <xml-file>] 
   [-xsd <schema-file>] [-ns <namespace-uri>]

Sample:  xsdvalidate.exe -xml t.xml
Validate the XML file by loading it into XmlValidatingReader with
   ValidationType set to auto.  

Sample:  xsdvalidate.exe -xml t.xml -xsd t.xsd -ns ns1
This will validate the t.xml with the schema t.xsd with target namespace 'ns1'

Sample:  xsdvalidate.exe -xsd t.xsd -ns ns1
This will validate the schema t.xsd with target namespace 'ns1'";

  public static void ValidationCallback(object sender, ValidationEventArgs args) {

    if(args.Severity == XmlSeverityType.Warning)
      Console.Write("WARNING: ");
    else if(args.Severity == XmlSeverityType.Error)
      Console.Write("ERROR: ");
    
    Console.WriteLine(args.Message); // Print the error to the screen.
  }

  public static void Main(string[] args){

    if((args.Length == 0) || (args.Length %2 != 0)){
      Console.WriteLine(usage);
      return; 
    }
    
     for(int i = 0; i < args.Length; i++) {
       switch(args[i]){

       case "-xsd":     xsdFile = args[++i];     break; 
       case "-xml":     xmlFile = args[++i];     break;     
       case "-ns":    nsUri  = args[++i];     break; 
    
       default:     Console.WriteLine("ERROR: Unexpected argument " + args[i]);    return; 

       }//switch
     }//for

     if(xsdFile != null){       
       sc.ValidationEventHandler += new ValidationEventHandler(ValidationCallback);
       sc.Add( nsUri, xsdFile);
       Console.WriteLine("Schema Validation Completed");
     } 
     
     if(xmlFile != null){
       XmlValidatingReader vr = new XmlValidatingReader(new XmlTextReader(xmlFile));
       vr.Schemas.Add(sc); 
       vr.ValidationType = ValidationType.Schema;
       vr.ValidationEventHandler += new ValidationEventHandler(ValidationCallback);
       
       while(vr.Read());
       Console.WriteLine("Instance Validation Completed");
     }
  }//Main
}//XsdValidate

Target Namespace, Schema Location: What's the Difference?

The first decision I had to make was whether I wanted to create a schema with a target namespace or not. The target namespace of a schema specifies the namespace of the elements and attributes that can be validated by that schema. Since the instance document from my previous article used the namespace urn:xmlns:25hoursaday-com:my-bookshelf, the choice was really whether I wanted to use that as my target namespace or create instance documents without a namespace.

Given that I was effectively creating a new markup vocabulary and namespaces provide a mechanism for disambiguating markup vocabularies, I decided to go with a target namespace. Thus the global (or top level) element and attribute declarations in the schema will refer only to elements and attributes from the urn:xmlns:25hoursaday-com:my-bookshelf namespace. The same applies to the global type definitions in the schema. The first line of my schema is shown below:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    targetNamespace="urn:xmlns:25hoursaday-com:my-bookshelf"
    xmlns:bk="urn:xmlns:25hoursaday-com:my-bookshelf">

The second related decision I made was whether to use a schema location in my XML instance documents. The attributes schemaLocation and noNamespaceSchemaLocation from the http://www.w3.org/2001/XMLSchema-instance namespace are used in an instance document to provide hard-coded references to one or more schemas that can be used to validate the document. The referenced schemas apply to the entire document and not just the scope of the element on which they appear. However, it is an error to specify a schema location after the first occurrence of an attribute or element whose namespace name is the same as the target namespace of the schema.

The schemaLocation attribute has as its value one or more pairs of target namespaces and URI references to a schema's location. Below is a snippet of an instance document that uses a schemaLocation attribute to refer to the target namespace and location of a schema to use in validating the document:

<bk:books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:bk="urn:xmlns:25hoursaday-com:my-bookshelf" 
xsi:schemaLocation="urn:xmlns:25hoursaday-com:my-bookshelf file:///C:/books.xsd" >

The value of the noNamespaceSchemaLocation is a single URI reference to a schema without a target namespace.

Note   Both the schemaLocation and the noNamespaceSchemaLocation attributes are only hints to the validating processor that can be ignored if other means are used to specify the schema(s) for the document.
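As a rough illustration of how the two hint attributes are laid out, here is a minimal Python sketch (it is not part of the validation tool above). The function name schema_hints is hypothetical, and the sketch makes a simplifying assumption: it only looks at the root element, although the attributes can appear elsewhere in a document.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_hints(xml_text):
    """Return (namespace -> location dict, no-namespace location or None).

    Simplification: only the root element is inspected.
    """
    root = ET.fromstring(xml_text)
    # schemaLocation is a whitespace-separated list of namespace/URI pairs.
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    if len(tokens) % 2 != 0:
        raise ValueError("schemaLocation must hold namespace/location pairs")
    pairs = dict(zip(tokens[0::2], tokens[1::2]))
    # noNamespaceSchemaLocation is a single URI reference.
    return pairs, root.get(f"{{{XSI}}}noNamespaceSchemaLocation")

doc = """<bk:books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:bk="urn:xmlns:25hoursaday-com:my-bookshelf"
  xsi:schemaLocation="urn:xmlns:25hoursaday-com:my-bookshelf file:///C:/books.xsd" />"""

pairs, no_ns = schema_hints(doc)
print(pairs)
print(no_ns)
```

Because these values are only hints, a processor is free to ignore what this function returns and load schemas by other means.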

I decided against using the schemaLocation or the noNamespaceSchemaLocation attribute in my instance documents because I expect to utilize the documents on different machines that also may or may not have Internet connectivity, so a hard-coded reference to a schema would, in many cases, be inappropriate.

If at First You Don't Succeed

On rethinking the format for my XML books catalog, I decided to remove the on-loan attribute from the root element but keep the rest of the format unchanged. Thus, given the (slightly modified) instance document below from my last article:

<?xml version="1.0" encoding="UTF-8" ?> 
<bk:books xmlns:bk="urn:xmlns:25hoursaday-com:my-bookshelf">
 <bk:book publisher="IDG books" on-loan="Sanjay" >
  <bk:title>XML Bible</bk:title> 
  <bk:author>Elliotte Rusty Harold</bk:author>
 </bk:book>
 <bk:book publisher="QUE">
  <bk:title>XML By Example</bk:title> 
  <bk:author>Benoit Marchal</bk:author>
 </bk:book>
</bk:books>

I created the following schema to validate it and others of its ilk:

<?xml version="1.0" encoding="UTF-8" ?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    targetNamespace="urn:xmlns:25hoursaday-com:my-bookshelf"
    xmlns:bk="urn:xmlns:25hoursaday-com:my-bookshelf">

 <xs:element name="books"> 
  <xs:complexType>
   <xs:sequence> 
    <xs:element name="book" type="bk:bookType" maxOccurs="unbounded" />
   </xs:sequence> 
  </xs:complexType>
 </xs:element>

 <xs:complexType name="bookType">
  <xs:sequence>
   <xs:element name="title" type="xs:string" />
   <xs:element name="author" type="xs:string" />
  </xs:sequence>
  <xs:attribute name="publisher" type="xs:string" />
  <xs:attribute name="on-loan" type="xs:string" use="optional" />
 </xs:complexType>

</xs:schema>

Surprisingly, although the above schema validated successfully with my validation tool, multiple error messages were displayed once I attempted to validate the XML instance document. Specifically, when I executed the following command:

xsdvalidate -xml books.xml -xsd books.xsd -ns urn:xmlns:25hoursaday-com:my-bookshelf

I got the following output (line numbers trimmed):

Schema Validation Completed
ERROR: Element 'urn:xmlns:25hoursaday-com:my-bookshelf:books' has invalid child
element 'urn:xmlns:25hoursaday-com:my-bookshelf:book'. Expected 'book'
ERROR: The 'urn:xmlns:25hoursaday-com:my-bookshelf:book' element is not declared. 
WARNING: Could not find schema information for the attribute 'publisher'. 
WARNING: Could not find schema information for the attribute 'on-loan'. 
ERROR: The 'urn:xmlns:25hoursaday-com:my-bookshelf:title' element is not declared. 
ERROR: The 'urn:xmlns:25hoursaday-com:my-bookshelf:author' element is not declared. 
ERROR: The 'urn:xmlns:25hoursaday-com:my-bookshelf:book' element is not declared. 
WARNING: Could not find schema information for the attribute 'publisher'. 
ERROR: The 'urn:xmlns:25hoursaday-com:my-bookshelf:title' element is not declared. 
ERROR: The 'urn:xmlns:25hoursaday-com:my-bookshelf:author' element is not declared. 
Instance Validation Completed

The first error message gave me a clue as to what was wrong. I quickly changed the first line of the schema to the following:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    targetNamespace="urn:xmlns:25hoursaday-com:my-bookshelf"
    xmlns:bk="urn:xmlns:25hoursaday-com:my-bookshelf"
    elementFormDefault="qualified">

When I reran the tool, I got the following output:

Schema Validation Completed
Instance Validation Completed

The appearance of the error messages was due to the fact that the schema contains local element declarations and the default value of the elementFormDefault attribute on the xs:schema element is "unqualified". These concepts are explained in more detail in the following sections.

Think Globally, Act Locally

Element and attribute declarations that appear as children of the xs:schema element are considered to be global declarations. All other element and attribute declarations are considered to be local declarations. A local element or attribute declaration can reference a global declaration through the ref attribute, which effectively makes the local declaration the same as the global one. The names of global declarations are placed in a separate symbol space from those of local declarations. Also, the scope of a local declaration's name is that of its enclosing type definition. Thus, a schema can have two or more type definitions that contain element or attribute declarations with the same name and no naming conflict will ensue. This is also the case with a global element or attribute that shares the same name as one or more local elements or attributes.

Both local element declarations and references to global elements can have their cardinality expressed using occurrence constraints. The occurrence constraints are specified using the minOccurs and maxOccurs attributes.
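The check that an occurrence constraint implies can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name (occurs_ok); it relies on the W3C XML Schema defaults of 1 for both minOccurs and maxOccurs, and on "unbounded" as the value that removes the upper limit.

```python
def occurs_ok(count, min_occurs="1", max_occurs="1"):
    """Check an element count against minOccurs/maxOccurs attribute values.

    Both attributes default to "1"; maxOccurs may also be "unbounded".
    """
    if count < int(min_occurs):
        return False
    return max_occurs == "unbounded" or count <= int(max_occurs)

# maxOccurs="unbounded" accepts any number of repetitions.
print(occurs_ok(5, max_occurs="unbounded"))
# maxOccurs="10" rejects an eleventh occurrence.
print(occurs_ok(11, max_occurs="10"))
```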

Below is a sample schema that uses local and global elements, as well as references to a global element declaration:

<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
 targetNamespace="http://www.example.com"
 xmlns="http://www.example.com">

 <!-- global element declaration --> 
 <xs:element name="language" type="xs:string" />

 <!-- complex type with local element declaration -->
 <xs:complexType name="sequenceOfLanguages" >  
  <xs:sequence>
   <xs:element name="language" type="xs:NMTOKEN" maxOccurs="unbounded" />
  </xs:sequence>
 </xs:complexType>

 <!-- complex type with reference to global element declaration -->
 <xs:complexType name="sequenceOfLanguages2" >  
  <xs:sequence>
   <xs:element ref="language" maxOccurs="10" />
  </xs:sequence>
 </xs:complexType>
</xs:schema>

By default, global elements have a namespace name equivalent to that of the target namespace of the schema, while local elements have no namespace name. This means that for the above schema, the global language element declaration can validate language elements in an instance document that have http://www.example.com as their namespace name. However, the local declaration of the language element in the sequenceOfLanguages type can only validate language elements in an instance document that have no namespace name.
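To see what "namespace name" means on the instance side, the following Python sketch parses a small fragment and prints the namespace name and local name of each child. The wrapper element and the helper split_name are hypothetical and exist only for this illustration; the fragment is not meant to be valid against the schema above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one qualified and one unqualified language element.
doc = """<ex:wrapper xmlns:ex="http://www.example.com">
  <ex:language>English</ex:language>
  <language>en</language>
</ex:wrapper>"""

def split_name(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return None, tag

for child in ET.fromstring(doc):
    print(split_name(child.tag))
```

The first child is the kind of element the global language declaration can validate; the second, which has no namespace name, is the kind the local declaration in sequenceOfLanguages can validate.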

Type definitions that occur as children of the xs:schema element are considered to be global type definitions. Global type definitions must have a name. Global type definitions can be referenced through the type attribute of attribute and element declarations or the base attribute of derived types. Type definitions can also be created locally as part of an element or attribute declaration, in which case they must have no name and are considered anonymous types.

Below is a sample schema that uses anonymous and global type definitions as well as references to a global type definition:

<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
 targetNamespace="http://www.example.org" 
 xmlns:tns="http://www.example.org">

 <!-- element declaration that references a global complex type --> 
 <xs:element name="languages" type="tns:sequenceOfLanguages" />

 <!-- global complex type definition -->
 <xs:complexType name="sequenceOfLanguages" >  
  <xs:sequence>
   <xs:element name="language" type="xs:NMTOKEN" maxOccurs="unbounded" />
  </xs:sequence>
 </xs:complexType>

 <!-- attribute declaration with anonymous simple type  -->
 <xs:attribute name="positiveDecimal">
  <xs:simpleType>
   <xs:restriction base="xs:decimal">
    <xs:minExclusive value="0" />
   </xs:restriction>
  </xs:simpleType>
 </xs:attribute>
</xs:schema>

Type definitions, element declarations, and attribute declarations do not share the same symbol space for names. So, it is possible to have a schema where a type definition, global declaration, and local declaration share a single name. This practice is extremely confusing and should be avoided.

Are You Qualified?

In the last section, I mentioned that by default global declarations validate elements or attributes with a namespace name, while local declarations validate elements or attributes without a namespace name. The term used to describe elements or attributes with a namespace name is namespace qualified.

It is possible to override the default behavior with regard to whether local declarations validate namespace qualified elements and attributes or not. The xs:schema element has the elementFormDefault and attributeFormDefault attributes, which specify whether local declarations in the schema should validate namespace qualified elements and attributes respectively. The valid values for either attribute are qualified and unqualified. The default value of both attributes is unqualified.

The form attribute on local element and attribute declarations can be used to override the value of the elementFormDefault and attributeFormDefault attributes on the xs:schema element. This allows for finer grained control of how validation of elements and attributes in the instance document should operate in relation to local declarations.

The following examples highlight how one can control local declarations using the elementFormDefault, attributeFormDefault, and form attributes.

Schema:
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 targetNamespace="http://www.example.org"
 xmlns:tns="http://www.example.org">

 <!-- elementFormDefault and attributeFormDefault
      have value unqualified in this schema -->
 <xs:element name="root" type="tns:rootType" />

 <xs:complexType name="rootType" >
  <xs:sequence>
   <xs:element name="child1" type="xs:string" maxOccurs="2" />
   <xs:element name="child2" type="xs:string" form="qualified" />
  </xs:sequence>
  <xs:attribute name="attr" type="xs:string" />
 </xs:complexType>
</xs:schema>

Valid Instance Document:
<?xml version="1.0" encoding="UTF-8" ?>
<ex:root xmlns:ex="http://www.example.org" attr="unqualified">
 <child1>I am not namespace qualified</child1>
 <child1>Neither am I</child1>
 <ex:child2>I am namespace qualified</ex:child2>
</ex:root>

Schema:
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 targetNamespace="http://www.example.org"
 xmlns:tns="http://www.example.org"
 elementFormDefault="qualified"
 attributeFormDefault="qualified">

 <xs:element name="root" type="tns:rootType" />

 <xs:complexType name="rootType" >
  <xs:sequence>
   <xs:element name="child1" type="xs:string" maxOccurs="2" />
   <xs:element name="child2" type="xs:string" />
  </xs:sequence>
  <xs:attribute name="attr1" type="xs:string" />
  <xs:attribute name="attr2" type="xs:string" form="unqualified"/>
 </xs:complexType>
</xs:schema>

Valid Instance Document:
<?xml version="1.0" encoding="UTF-8" ?>
<ex:root xmlns:ex="http://www.example.org"
 ex:attr1="qualified" attr2="unqualified">
 <ex:child1>I am namespace qualified</ex:child1>
 <ex:child1>So am I</ex:child1>
 <ex:child2>Me too</ex:child2>
</ex:root>
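The rule these examples illustrate can be sketched in Python: a form attribute on a local element declaration wins, otherwise elementFormDefault applies, otherwise the declaration is unqualified. The function name local_element_namespaces is hypothetical, and the sketch makes simplifying assumptions: it handles element declarations only, and it keys the result by name, so same-named local declarations in different types would overwrite each other.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def local_element_namespaces(schema_text):
    """Map each local element declaration's name to the namespace name it
    validates (None means the element must be unqualified)."""
    schema = ET.fromstring(schema_text)
    target = schema.get("targetNamespace")
    default_form = schema.get("elementFormDefault", "unqualified")
    result = {}
    for parent in schema.iter():
        if parent is schema:
            continue  # children of xs:schema are global declarations
        for decl in parent.findall(XS + "element"):
            if decl.get("name") is None:
                continue  # ref="..." points at a global declaration
            # A form attribute overrides the schema-wide default.
            form = decl.get("form", default_form)
            result[decl.get("name")] = target if form == "qualified" else None
    return result

schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.example.org">
 <xs:complexType name="rootType">
  <xs:sequence>
   <xs:element name="child1" type="xs:string" maxOccurs="2" />
   <xs:element name="child2" type="xs:string" form="qualified" />
  </xs:sequence>
 </xs:complexType>
</xs:schema>"""

print(local_element_namespaces(schema_text))
```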

The Whole Is Greater than the Sum of Its Parts

A schema can be constituted from multiple schemas that are assembled into a single logical schema during validation. W3C XML Schema provides three elements that can be used to assemble global declarations and type definitions from external schemas into a target schema document. The three elements are xs:include, xs:import, and xs:redefine.

The xs:include element is used to bring in definitions from schemas that either have no target namespace or have the same target namespace as the enclosing schema. The xs:import element is similar to xs:include, with the difference being that the imported schema must have a different target namespace from the enclosing schema. If an imported schema has no target namespace, then the enclosing schema must have one.

The example below shows how imported declarations are referenced in the enclosing schema as well as how namespace qualification affects local declarations.

<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
 targetNamespace="http://www.example.org" 
 xmlns:tns="http://www.example.org"
 xmlns:imp="http://www.import.org">
 <xs:import namespace="http://www.import.org" schemaLocation="file:///c:/import.xsd" />
 <xs:element name="root" type="imp:rootType" />
</xs:schema>

Imported Schema: import.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 targetNamespace="http://www.import.org" elementFormDefault="qualified"> 

<xs:complexType name="rootType" >  
  <xs:sequence>
   <xs:element name="child1" type="xs:string" maxOccurs="2" />
   <xs:element name="child2" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

</xs:schema>

Instance Document: [when elementFormDefault="qualified" in import.xsd]
<?xml version="1.0" encoding="UTF-8" ?>
<ex:root xmlns:ex="http://www.example.org" xmlns:imp="http://www.import.org">
 <imp:child1>I am from imported schema </imp:child1>
 <imp:child1>So Am I </imp:child1>
 <imp:child2>Me too </imp:child2>
</ex:root>

Instance Document: [when elementFormDefault="unqualified" in import.xsd]
<?xml version="1.0" encoding="UTF-8" ?>
<ex:root xmlns:ex="http://www.example.org">
 <child1>Don't know where I come from </child1>
 <child1>neither do I </child1>
 <child2>Me too </child2>
</ex:root>

xs:redefine is used for type redefinition by performing what are essentially two tasks. The first is to act as an xs:include by bringing in declarations and definitions from another schema document and making them available as part of the current target namespace. The schema that supplies these declarations and types must either have the same target namespace as the enclosing schema or have no target namespace. The second is to redefine types in a manner similar to type derivation, with the new definition replacing the old one.
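The target namespace rules for the three composition elements can be summarized as a small check. This Python sketch only restates the rules described above; the function name composition_allowed is hypothetical, and None stands for a schema with no target namespace.

```python
def composition_allowed(kind, enclosing_tns, referenced_tns):
    """Return True if the referenced schema's target namespace is acceptable.

    kind is 'include', 'import', or 'redefine'; None means no target namespace.
    """
    if kind in ("include", "redefine"):
        # Same target namespace, or no target namespace at all.
        return referenced_tns is None or referenced_tns == enclosing_tns
    if kind == "import":
        # Must differ; two schemas without a target namespace do not differ.
        return referenced_tns != enclosing_tns
    raise ValueError(f"unknown composition element: {kind}")

print(composition_allowed("import", "http://www.example.org", "http://www.import.org"))
print(composition_allowed("include", "http://www.example.org", "http://www.import.org"))
```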

Examples and further explanations of xs:include, xs:import and xs:redefine are available in the W3C XML Schema Primer.

Karma Chameleon

Schemas without a target namespace are often referred to as chameleon schemas. Chameleon schemas can be included by any schema regardless of its target namespace, which then makes the type definitions and declarations in the chameleon schema acquire the target namespace of the enclosing schema.
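The renaming effect can be sketched in Python by listing the global components of an included schema together with the namespace they end up in. Everything in this sketch is hypothetical (the function name, the addressType and note components, and the enclosing namespace), and it models only the naming outcome, not actual schema assembly.

```python
import xml.etree.ElementTree as ET

def global_names_after_include(included_text, enclosing_tns):
    """List (namespace, name) for the included schema's global components."""
    included = ET.fromstring(included_text)
    own_tns = included.get("targetNamespace")
    if own_tns is not None and own_tns != enclosing_tns:
        raise ValueError("xs:include requires the same target namespace or none")
    # A chameleon schema (own_tns is None) takes on the enclosing namespace;
    # otherwise the two namespaces are already equal.
    return [(enclosing_tns, child.get("name"))
            for child in included if child.get("name")]

# Hypothetical chameleon schema: note the absence of targetNamespace.
chameleon = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:complexType name="addressType">
  <xs:sequence>
   <xs:element name="street" type="xs:string" />
  </xs:sequence>
 </xs:complexType>
 <xs:element name="note" type="xs:string" />
</xs:schema>"""

for name in global_names_after_include(chameleon, "http://www.example.org"):
    print(name)
```

Including the same chameleon schema from a schema with a different target namespace would yield the same component names in that other namespace, which is where the nickname comes from.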

Here's an example that uses chameleon schemas.

Dare Obasanjo is a member of Microsoft's WebData team, which among other things develops the components within the System.Xml and System.Data namespace of the .NET Framework, Microsoft XML Core Services (MSXML), and Microsoft Data Access Components (MDAC).

Feel free to post any questions or comments about this article on the Extreme XML message board on GotDotNet.