Understanding XML Schema

 

Aaron Skonnard
DevelopMentor

March 2003

Applies to:
   Type systems
   XML Schema definition language (XSD)
   Web Services development

Summary: XML Schema is poised to play a central role in the future of XML processing, especially in Web services where it serves as one of the fundamental pillars that higher levels of abstraction are built upon. This article describes how to use the XML Schema definition language in more detail. (22 printed pages)

Contents

Introduction
Datatypes: Value and Lexical Spaces
Defining Types in a Namespace
Defining Simple Types
Defining Complex Types
Locating and Managing Schemas
Conclusion
References

Introduction

1 + 2 = ?

In software, it's the type system that provides the information needed to answer such a question. Programming languages use type systems to simplify the task of producing quality code. A type system defines a set of types and operations that developers can choose to work with in their programs. A type defines a value space, or in other words, a set of possible values. For example, if the operands above are considered numeric types, the answer might be 3, but if they're considered strings, it might be "12", depending on how the + operator is defined.

One of the main benefits of a type system is that compilers can use it to determine whether code contains bugs before it ever runs, immediately avoiding a wide range of possible errors. Compilers also leverage type system information to generate code for operations on a given type. In addition, both compilers and runtimes rely heavily on the type system to determine how to allocate memory for a particular type when it's used, allowing developers to forget about such tedious concerns.

Many languages and runtimes also make it possible to programmatically inspect type information at runtime. This allows a developer to walk up to an arbitrary instance, ask questions about its type characteristics, and make decisions based on the answers. This technique of inspecting type information at runtime is generally referred to as reflection. Reflection plays a major role in today's mainstream managed programming environments like the Microsoft® .NET Framework and Java where a virtual machine (e.g., the common language runtime or JVM) provides additional services that most programs need, such as security, garbage collection, serialization, remote method invocation, and even Web service integration, effectively reducing the number of things developers have to worry about in their code.

Figure 1. Benefits of type information

A well-defined type system along with reflection also makes it possible to build better tools for working with the language. Developers have quickly grown used to things like Microsoft® Intellisense®, code completion, and those handy red squiggles that greatly speed up the development process. Overall a good type system offers many interesting benefits (see Figure 1), most of which are easy to take for granted but greatly missed when absent.

XML 1.0 is a good example of a language that lacked a sensible type system. Without a type system, the information found in an XML 1.0 document can only be treated as text. This requires developers to know about the "real type" ahead of time so they can perform the necessary coercions in their code.
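To make the coercion burden concrete, here is a minimal sketch in Python (chosen only as a convenient notation; it is not part of the original article). The order, quantity, and shipped element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML 1.0 fragment; the element names are made up for illustration.
doc = ET.fromstring("<order><quantity>3</quantity><shipped>true</shipped></order>")

# Without a schema, every value arrives as text...
raw_quantity = doc.findtext("quantity")   # the string "3"
raw_shipped = doc.findtext("shipped")     # the string "true"

# ...so the application must already know the "real type" and coerce by hand.
quantity = int(raw_quantity)
shipped = raw_shipped in ("true", "1")
```

Nothing in the document itself says that quantity is an integer or that shipped is a boolean; that knowledge lives only in the application code.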

The XML Schema definition language (XSD) provides a type system for XML processing environments. In a nutshell, XML Schema makes it possible to describe the types that you intend to use. An XML document that conforms to an XML Schema type is often referred to as an instance document, and the relationship between the two is very much like the traditional object-oriented (OO) relationship between classes and objects (see Figure 2). This is a conceptual shift away from the way Document Type Definitions (DTDs) fundamentally worked, one that offers more flexibility when mapping to traditional programming language or database type systems. In these environments, XML Schema largely supersedes DTDs.

Figure 2. OO v. XML concepts

XML Schema is capable of providing all of the benefits illustrated in Figure 1, only in a completely XML-centric way. A logical XML document that contains XML Schema type information is often referred to as a post-schema-validation Infoset (PSVI). The PSVI makes it possible to perform XML Schema-based reflection at runtime just like in other programming environments. Overall, XML Schema is poised to play a central role in the future of XML processing, especially in Web services where it serves as one of the fundamental pillars that higher levels of abstraction are built upon. The remainder of this article describes how to use the XML Schema definition language in more detail.

Datatypes: Value and Lexical Spaces

XML Schema provides a repertoire of built-in datatypes that developers can use to constrain text (see the W3C XML Schema Part 2: Datatypes Web page for a helpful figure). All of these types are found in the http://www.w3.org/2001/XMLSchema namespace. Each type has a defined value space. A type's value space is simply the set of values that can be used in an instance of the given type.

Figure 3. Byte value space

For example, XML Schema provides a built-in type named byte, which has a value space of -128 through 127. Another example is the XML Schema boolean type, whose value space is much simpler as it only consists of two values: true and false. In total, there are forty-four built-in types for you to choose from, each with different value spaces intended to satisfy a wide variety of data modeling needs.

Figure 4 illustrates that many of the built-in types are defined as subsets of another type's value space, also known as derivation by restriction. For example, byte's value space is a subset of short's value space, which is a subset of int's value space, which is a subset of long's value space, and so on. Hence, basic set theory tells us that an instance of a derived type is also a valid instance of any of its ancestor types. (Ultimately, every simple type's value space is a subset of anySimpleType's.)
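The subset relationship can be checked mechanically. The following Python sketch (an illustration, not something an XML Schema processor exposes) writes the value spaces of four built-in integer types as inclusive bounds; the helper names are hypothetical.

```python
# Value spaces of four built-in integer types as inclusive (min, max) bounds.
RANGES = {
    "byte":  (-2**7,  2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "int":   (-2**31, 2**31 - 1),
    "long":  (-2**63, 2**63 - 1),
}

def in_value_space(type_name: str, value: int) -> bool:
    """True if value belongs to the named type's value space."""
    low, high = RANGES[type_name]
    return low <= value <= high

def is_subset(derived: str, base: str) -> bool:
    """True if every value of `derived` is also a value of `base`."""
    d_low, d_high = RANGES[derived]
    b_low, b_high = RANGES[base]
    return b_low <= d_low and d_high <= b_high
```

For instance, 127 is a valid byte and therefore also a valid short, int, and long, whereas 128 is a valid short but not a valid byte.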

Although programming languages use value space information to figure out how much memory will be needed to represent values, developers seldom need to worry about representing them as text. With XML, however, one cannot ignore the fact that instances will most likely be serialized into an XML 1.0 file, requiring a lexical representation for the value. If every XML Schema processor were to decide how to do this independently, interoperability would quickly be lost. Hence, in addition to defining the value space of each type, XML Schema defines their allowed lexical representations as well.

Figure 4. Type subsets

For example, the boolean true value can be represented as either "true" or "1" while the boolean false value can be represented as "false" or "0". The double value 10 can be represented as "10", "10.0", "10.0000", or even "0.01E3". And the date value of January 1, 2003 can be lexically represented as "2003-01-01". Standardizing the lexical format (and any possible variations) for each type makes it possible for developers to deal exclusively with values in their code while ignoring the complexities of how those values are actually serialized.
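The mapping from several lexical forms to a single value can be sketched in a few lines of Python. This is only an approximation for illustration: Python's float and date parsers accept some forms that XML Schema does not (and vice versa), and parse_boolean is a hypothetical helper.

```python
from datetime import date

def parse_boolean(lexical: str) -> bool:
    """Map the four boolean lexical forms onto the two boolean values."""
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not a boolean lexical form: {lexical!r}")

# Several lexical forms, one double value.
doubles = [float(s) for s in ("10", "10.0", "10.0000", "0.01E3")]

# The plain YYYY-MM-DD form used in the text maps to a single date value.
new_year = date.fromisoformat("2003-01-01")
```

Once parsed, code compares values rather than strings: every entry in doubles equals 10.0 regardless of how it was written.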

Defining Types in a Namespace

In addition to providing built-in types, most programming languages let developers define their own types, often referred to as user-defined types (UDTs). When defining UDTs, most programming languages also allow you to qualify them with a namespace so as to not confuse them with other UDTs that coincidentally share the same name. See Understanding XML Namespaces for detailed information on how XML namespaces work. Figure 5 shows a C# namespace definition and a comparable XML Schema definition. As you can see, XML Schema also supports defining types within a namespace.

Figure 5. Defining types in a namespace

The xsd:schema element scopes what's in the namespace and the targetNamespace attribute specifies the namespace's name. For example, the following XML Schema template defines a new namespace called https://example.org/publishing:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   targetNamespace="https://example.org/publishing"
   xmlns:tns="https://example.org/publishing"
>

   <!-- type definitions -->
   <xsd:simpleType name="AuthorId">
      <!-- define value space details here -->
      ...
   </xsd:simpleType>

   <xsd:complexType name="AuthorType">
      <!-- define structural details here -->
      ...
   </xsd:complexType>

   <!-- global element/attribute declarations -->
   <xsd:element name="author" type="tns:AuthorType"/>
   <xsd:attribute name="authorId" type="tns:AuthorId"/>
   ...

</xsd:schema>

Everything placed within the xsd:schema element (as an immediate child) is considered global and therefore automatically associated with the target namespace. In the previous example, there are four things in the https://example.org/publishing namespace: AuthorId, AuthorType, author, and authorId. As a result, whenever you refer to one of these things within your schema you must use a namespace-qualified name.

To use a namespace-qualified name you'll need another namespace declaration that maps to the schema's targetNamespace value. The 'tns' namespace declaration shown above serves this purpose. Hence, whenever I need to reference something I've defined in my schema, I can prefix the name with 'tns' as illustrated in the example.

There are two classes of types that you can define within the xsd:schema element: simple types (using xsd:simpleType) and complex types (using xsd:complexType). Simple types can only be assigned to text-only elements and attributes since they don't define structure, but rather, value spaces. An element with additional structure, such as one that carries attributes or has child elements, must be defined as a complex type.

In addition to defining types, you can also define global elements (using xsd:element) and attributes (using xsd:attribute) within the schema and assign them a type. In the previous example, I defined a global element named author and a global attribute named authorId. Since these constructs are also global, they need to be qualified by the target namespace when I use them in instance documents. The following XML document contains an instance of the author element defined earlier:

<x:author xmlns:x="https://example.org/publishing">
  <!-- structure determined by complexType definition -->
  ...
</x:author>

And the following XML document contains the global authorId attribute:

<!-- authorId value constrained by simpleType definition -->
<publication xmlns:x="https://example.org/publishing"  
   x:authorId="333-33-3333"/>  
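What "qualified by the target namespace" means for a consumer can be seen with a namespace-aware parser. The sketch below uses Python's standard xml.etree.ElementTree, which reports names in {namespace}local form; the pub prefix is a hypothetical alternative used only to show that the prefix itself is irrelevant.

```python
import xml.etree.ElementTree as ET

NS = "https://example.org/publishing"

author = ET.fromstring(f'<x:author xmlns:x="{NS}"/>')
renamed = ET.fromstring(f'<pub:author xmlns:pub="{NS}"/>')
publication = ET.fromstring(f'<publication xmlns:x="{NS}" x:authorId="333-33-3333"/>')

# The element name is the (namespace, local name) pair; the prefix is gone.
print(author.tag)                 # {https://example.org/publishing}author
print(author.tag == renamed.tag)  # True

# publication itself is unqualified, but its authorId attribute is qualified.
print(publication.tag)                       # publication
print(publication.get(f"{{{NS}}}authorId"))  # 333-33-3333
```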

It's also possible to explicitly assign a type to an element in an instance document using the type attribute from the http://www.w3.org/2001/XMLSchema-instance namespace. This namespace contains a handful of attributes that can only be used in instance documents. Using the type attribute is similar to casting between types in some programming languages. The following example explicitly assigns the genericId element (which has not been defined in the schema) the AuthorId type:

<genericId 
  xmlns:x="https://example.org/publishing"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:type="x:AuthorId"
>333-33-3333</genericId>

Notice that AuthorId is the same type that we assigned to the global authorId attribute shown above. This illustrates that you can assign simple types to either attributes or text-only elements to constrain their values. Note also that the value of xsi:type is a namespace-qualified name, so its prefix must be declared in the instance document. Finally, the xsi:type technique for assigning a type only applies to elements, not attributes.
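Because the value of xsi:type is itself a qualified name, a consumer has to resolve its prefix against the namespace declarations in scope. The following Python sketch shows one way to do that for a root element with the standard library; resolve_xsi_type is a hypothetical helper, not part of any XML Schema API.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

def resolve_xsi_type(xml_text: str):
    """Return (namespace, local name) of the root element's xsi:type, or None."""
    prefixes = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload          # declarations arrive before the element
            prefixes[prefix] = uri
            continue
        qname = payload.get(XSI_TYPE)      # first "start" event is the root element
        if qname is None:
            return None
        prefix, _, local = qname.rpartition(":")
        if prefix and prefix not in prefixes:
            raise ValueError(f"undeclared prefix in xsi:type: {prefix!r}")
        return prefixes.get(prefix), local

doc = (
    '<genericId xmlns:x="https://example.org/publishing" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="x:AuthorId">333-33-3333</genericId>'
)
resolved = resolve_xsi_type(doc)
```

Here resolved names the AuthorId type in the https://example.org/publishing namespace; a prefix that is not declared in the instance cannot be resolved at all.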

Defining Simple Types

Most programming languages only allow developers to arrange the various built-in types into a structured type of some sort, but they don't allow developers to define new simple types that have user-defined value spaces. XML Schema is different in this regard because it allows users to define their own custom simple types, whose value spaces are subsets of the predefined built-in types.

You define a new simple type using the xsd:simpleType element as shown earlier. Within the xsd:simpleType element you specify a base type whose value space you wish to restrict (using the xsd:restriction element). Within the xsd:restriction element, you specify exactly how you wish to restrict the base type by constraining one or more of its facets. For example, the following simple types constrain the xsd:double and xsd:date value spaces down to more specific ranges using the xsd:minInclusive and xsd:maxInclusive facets:

...
   <xsd:simpleType name="RoyaltyRate">
      <xsd:restriction base="xsd:double">
         <xsd:minInclusive value="0"/>
         <xsd:maxInclusive value="100"/>
      </xsd:restriction>
   </xsd:simpleType>
   <xsd:simpleType name="Pubs2003">
      <xsd:restriction base="xsd:date">
         <xsd:minInclusive value="2003-01-01"/>
         <xsd:maxInclusive value="2003-12-31"/>
      </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="rate" type="tns:RoyaltyRate"/>
   <xsd:element name="publicationDate" type="tns:Pubs2003"/>
...

The following documents contain valid instances of the elements defined above:

    <x:rate xmlns:x="https://example.org/publishing">17.5</x:rate>
    <x:publicationDate xmlns:x="https://example.org/publishing"
    >2003-06-01</x:publicationDate>
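The effect of the two range facets can be approximated in ordinary code. The Python sketch below mirrors the RoyaltyRate and Pubs2003 definitions; the function names are hypothetical, and Python's float and date parsers are only stand-ins for the xsd:double and xsd:date lexical rules.

```python
from datetime import date

def is_royalty_rate(lexical: str) -> bool:
    """RoyaltyRate: a double restricted to 0 <= value <= 100."""
    try:
        value = float(lexical)
    except ValueError:
        return False
    return 0 <= value <= 100

def is_pubs_2003(lexical: str) -> bool:
    """Pubs2003: a date restricted to the calendar year 2003."""
    try:
        value = date.fromisoformat(lexical)
    except ValueError:
        return False
    return date(2003, 1, 1) <= value <= date(2003, 12, 31)
```

With a schema in place, checks like these no longer need to be written by hand; the schema processor performs them during validation.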

XML Schema defines the facets available for each type (see Table 1). Not every facet applies to every type; some only make sense on certain types. Most facets restrict a type's value space while the pattern facet restricts the type's lexical space. Restricting either the value or lexical space indirectly restricts the other. The previous examples constrained the base type's value space while the following examples constrain string's lexical space using regular expressions:

...
   <xsd:simpleType name="SSN">
      <xsd:restriction base="xsd:string">
         <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
      </xsd:restriction>
   </xsd:simpleType>
   <xsd:simpleType name="PublisherAssignedId">
      <xsd:restriction base="xsd:string">
         <xsd:pattern value="\d{2}-\d{8}"/>
      </xsd:restriction>
   </xsd:simpleType>
   <xsd:simpleType name="Phone">
      <xsd:restriction base="xsd:string">
         <xsd:pattern value="\(\d{3}\)\d{3}-\d{4}"/>
      </xsd:restriction>
   </xsd:simpleType>

   <xsd:element name="authorId" type="tns:SSN"/>
   <xsd:element name="pubsAuId" type="tns:PublisherAssignedId"/>
   <xsd:element name="phone" type="tns:Phone"/>
...

The following documents contain valid instances of the elements defined above:

    <x:authorId xmlns:x="https://example.org/publishing"
    >123-45-6789</x:authorId>
    <x:pubsAuId xmlns:x="https://example.org/publishing"
    >01-23456789</x:pubsAuId>
    <x:phone xmlns:x="https://example.org/publishing"
    >(801)390-4552</x:phone>

Only strings that match the regular expression specified in the pattern facet in their entirety are considered valid instances of the given type.
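The three patterns above happen to mean the same thing in Python's regular expression dialect, so their behavior can be tried out directly. This is a sketch only: the XML Schema regular expression language differs from Python's in general, and the matches helper is hypothetical.

```python
import re

# XML Schema patterns must match the whole string, hence re.fullmatch
# rather than re.search.
PATTERNS = {
    "SSN": r"\d{3}-\d{2}-\d{4}",
    "PublisherAssignedId": r"\d{2}-\d{8}",
    "Phone": r"\(\d{3}\)\d{3}-\d{4}",
}

def matches(type_name: str, lexical: str) -> bool:
    """True if the lexical form matches the named type's pattern in full."""
    return re.fullmatch(PATTERNS[type_name], lexical) is not None
```

Note that "(801)390-4552" is a valid Phone while "801-390-4552" is not, because the pattern requires the parentheses around the area code.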

Table 1. Facets

Facet Element        Description
xsd:enumeration      Specifies a fixed value that the type must match.
xsd:fractionDigits   Specifies the maximum number of decimal digits to the right of the decimal point.
xsd:length           Specifies the number of characters in a string-based type, the number of octets in a binary-based type, or the number of items in a list-based type.
xsd:maxExclusive     Specifies the exclusive upper bound on the value space of the type.
xsd:maxInclusive     Specifies the inclusive upper bound on the value space of the type.
xsd:maxLength        Specifies the maximum number of characters in a string-based type, the maximum number of octets in a binary-based type, or the maximum number of items in a list-based type.
xsd:minExclusive     Specifies the exclusive lower bound on the value space of the type.
xsd:minInclusive     Specifies the inclusive lower bound on the value space of the type.
xsd:minLength        Specifies the minimum number of characters in a string-based type, the minimum number of octets in a binary-based type, or the minimum number of items in a list-based type.
xsd:pattern          Specifies a pattern, based on a regular expression, that the type must match.
xsd:totalDigits      Specifies the maximum number of decimal digits for types derived from decimal.
xsd:whiteSpace       Specifies rules for whitespace normalization.

Another interesting facet is xsd:enumeration, which allows you to constrain a value space down to a list of enumerated values. The following example constrains the value space of xsd:NMTOKEN down to four specific enumerated values:

...
   <xsd:simpleType name="PublicationType">
      <xsd:restriction base="xsd:NMTOKEN">
         <xsd:enumeration value="Book"/>
         <xsd:enumeration value="Magazine"/>
         <xsd:enumeration value="Journal"/>
         <xsd:enumeration value="Online"/>
      </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="pubType" type="tns:PublicationType"/>
...

The following document contains a valid instance of the element defined above:

<x:pubType xmlns:x="https://example.org/publishing"
>Online</x:pubType>
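In application code, an enumerated value space amounts to a membership test. A minimal Python sketch follows (the names are hypothetical); note that the comparison is case-sensitive.

```python
# The four enumerated values of PublicationType.
PUBLICATION_TYPES = frozenset({"Book", "Magazine", "Journal", "Online"})

def is_publication_type(lexical: str) -> bool:
    """True only for one of the enumerated values, compared exactly."""
    return lexical in PUBLICATION_TYPES
```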

Table 2. Simple Type Construction Techniques

Derivation Element   Description
xsd:restriction      The new type is a restriction of the existing type, which means it has a narrower set of legal values.
xsd:list             The new type is a whitespace-delimited list of another simple type.
xsd:union            The new type is a union of two or more other simple types.

In addition to restricting a type's value space, it's also possible to construct new simple types that are lists or unions of other simple types. To do this you use either the xsd:list or xsd:union element instead of xsd:restriction (see Table 2). When using xsd:list, you're essentially defining a whitespace-delimited list of values from the specified value space. It's worth mentioning that when using xsd:list or xsd:union, there is no derivation hierarchy as with xsd:restriction so type compatibility doesn't apply in these cases. The following example defines a new type named AuthorList as a list of SSN values.

...
   <xsd:simpleType name="AuthorList">
      <xsd:list itemType="tns:SSN"/>
   </xsd:simpleType>
   <xsd:element name="authors" type="tns:AuthorList"/>
...

The following document contains a valid instance of the authors element:

<x:authors xmlns:x="https://example.org/publishing"
>111-11-1111 222-22-2222 333-33-3333 444-44-4444</x:authors>
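Reading a list type comes down to splitting on whitespace and checking each item against the item type. The Python sketch below does this for AuthorList, assuming the SSN pattern shown earlier; parse_author_list is a hypothetical helper.

```python
import re

SSN = re.compile(r"\d{3}-\d{2}-\d{4}")

def parse_author_list(lexical: str) -> list[str]:
    """Split a whitespace-delimited list and check every item against SSN."""
    items = lexical.split()   # splits on runs of whitespace
    for item in items:
        if SSN.fullmatch(item) is None:
            raise ValueError(f"not a valid SSN item: {item!r}")
    return items

authors = parse_author_list("111-11-1111 222-22-2222 333-33-3333 444-44-4444")
```

One consequence of the whitespace delimiter is that item values cannot themselves contain whitespace.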

In the case of xsd:union, you're creating a new type that combines multiple value spaces into a new value space. An instance of a union type can be a value from any of the specified value spaces. For example, the following type named AuthorId combines the SSN value space with the PublisherAssignedId value space:

...
   <xsd:simpleType name="AuthorId">
      <xsd:union memberTypes="tns:SSN tns:PublisherAssignedId"/>
   </xsd:simpleType>
   <xsd:element name="authorId" type="tns:AuthorId"/>
...

Each of the following documents shows a valid instance of the authorId element:

    <x:authorId xmlns:x="https://example.org/publishing"
    >111-11-1111</x:authorId>
    <x:authorId xmlns:x="https://example.org/publishing"
    >22-22222222</x:authorId>
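A union can be pictured as trying each member type in the order listed until one accepts the value. The Python sketch below does this for AuthorId, assuming the SSN and PublisherAssignedId patterns shown earlier; author_id_member is a hypothetical helper.

```python
import re

# Member types in the order they appear in memberTypes.
MEMBER_TYPES = [
    ("SSN", re.compile(r"\d{3}-\d{2}-\d{4}")),
    ("PublisherAssignedId", re.compile(r"\d{2}-\d{8}")),
]

def author_id_member(lexical: str):
    """Return the name of the first member type that accepts the value, or None."""
    for name, pattern in MEMBER_TYPES:
        if pattern.fullmatch(lexical):
            return name
    return None
```

A value that belongs to neither member value space, such as a phone number, is rejected.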

The XML Schema support for user-defined types, and more specifically custom value/lexical spaces, is one of the more powerful aspects of the language. The fact that most programming languages don't allow this forces developers to deal with such issues in their application code (typically via property setters). The ability to define custom value/lexical spaces that fit your exact needs makes it possible to push error handling and validation code down a layer.

Defining Complex Types

XML Schema makes it possible to arrange different simple types (or value spaces) into a structure, also known as a complex type. You use the xsd:complexType element to define a new complex type within the schema's target namespace as illustrated here:

...
   <xsd:complexType name="AuthorType">
      <!-- compositor goes here -->
   </xsd:complexType>
...

The xsd:complexType element contains what's known as a compositor, which describes the composition of the type's content, also known as its content model. XML Schema defines three compositors that can be used in complex type definitions: xsd:sequence, xsd:choice, and xsd:all (see Table 3).

Compositors contain particles, which include things like other compositors, element declarations, wildcards, and model groups. Attribute declarations are not considered particles because they don't repeat. Hence, attribute declarations are not placed within a compositor but after the compositor at the end of the complex type definition.

Table 3. Complex Type Compositors

Compositor     Definition
xsd:sequence   An ordered sequence of contained particles
xsd:choice     A choice of the contained particles
xsd:all        All of the contained particles in any order

An element declaration (xsd:element) is probably the most commonly used particle. The following complexType named AuthorType defines an ordered sequence of two element children and an attribute, each of a different simple type:

...
   <xsd:complexType name="AuthorType">
      <!-- compositor goes here -->
      <xsd:sequence>
         <xsd:element name="name" type="xsd:string"/>
         <xsd:element name="phone" type="tns:Phone"/>
      </xsd:sequence>
      <xsd:attribute name="id" type="tns:AuthorId"/>
   </xsd:complexType>
   <xsd:element name="author" type="tns:AuthorType"/>
...

The elements and attributes declared within the xsd:complexType element are considered local to the complex type. Local elements and attributes can only be used within the context where they're defined. This raises an interesting question about whether local elements/attributes need to be namespace qualified in instance documents. Since local elements and attributes will always have an ancestor element (typically a global element) qualified by the target namespace, one could argue that it's not necessary. This is similar to how things work in most programming languages: if you define a class within a namespace, only the class name is qualified by the namespace, not its local members.

Following this reasoning, local elements and attributes in XML Schema are unqualified by default. Hence, a valid instance of the author element looks like this:

<x:author xmlns:x="https://example.org/publishing"
   id="333-33-3333"
>
   <name>Aaron Skonnard</name>
   <phone>(801)390-4552</phone>
</x:author>

XML Schema makes it possible, however, to explicitly control whether a given local element/attribute should be qualified or unqualified using the form attribute on xsd:element/xsd:attribute or by using the elementFormDefault/attributeFormDefault attributes on xsd:schema as illustrated here:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   targetNamespace="https://example.org/publishing"
   xmlns:tns="https://example.org/publishing"
   elementFormDefault="qualified" 
   attributeFormDefault="qualified"
>
   ...
</xsd:schema>

With this schema in place, the following instance would be considered a valid instance (while the previous instance wouldn't be):

<x:author xmlns:x="https://example.org/publishing"
   x:id="333-33-3333"
>
   <x:name>Aaron Skonnard</x:name>
   <x:phone>(801)390-4552</x:phone>
</x:author>

In most situations it doesn't matter which namespace style you use for local elements as long as the instances agree with the schema.
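The difference between the two styles is easy to see from the consumer's side. The following Python sketch parses both author instances with the standard xml.etree.ElementTree module and lists the names a program would have to look for; child_tags is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = "https://example.org/publishing"

unqualified = ET.fromstring(
    f'<x:author xmlns:x="{NS}" id="333-33-3333">'
    "<name>Aaron Skonnard</name><phone>(801)390-4552</phone></x:author>"
)
qualified = ET.fromstring(
    f'<x:author xmlns:x="{NS}" x:id="333-33-3333">'
    "<x:name>Aaron Skonnard</x:name><x:phone>(801)390-4552</x:phone></x:author>"
)

def child_tags(element):
    """Names of the child elements in {namespace}local form."""
    return [child.tag for child in element]

print(child_tags(unqualified))  # ['name', 'phone']
print(child_tags(qualified))
# ['{https://example.org/publishing}name', '{https://example.org/publishing}phone']
```

In both cases the root element is {https://example.org/publishing}author; only the names of the local children and the id attribute differ, which is why code that reads instances has to agree with the form the schema uses.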

You can also reference global element/attribute declarations from within a complex type using the ref attribute as illustrated here:

...
   <!-- global definitions -->
   <xsd:attribute name="id" type="tns:AuthorId"/>
   <xsd:element name="name" type="xsd:string"/>
   <xsd:element name="author" type="tns:AuthorType"/>

   <xsd:complexType name="AuthorType">
      <!-- compositor goes here -->
      <xsd:sequence>
         <!-- reference to global element -->
         <xsd:element ref="tns:name"/>
         <xsd:element name="phone" type="tns:Phone"/>
      </xsd:sequence>
      <!-- reference to global attribute -->
      <xsd:attribute ref="tns:id"/>
   </xsd:complexType>
...

Since id and name are global declarations (an attribute and an element, respectively), they always need to be qualified in instance documents. Using "ref" specifies that the global declaration can also be used within the context of AuthorType, but it doesn't change the fact that it needs to be qualified. The phone element is still defined locally, which means it may or may not need to be qualified in an instance depending on the form in use. So assuming elementFormDefault="unqualified", a valid instance would look like this:

<x:author xmlns:x="https://example.org/publishing"
   x:id="333-33-3333"
>
   <x:name>Aaron Skonnard</x:name>
   <phone>(801)390-4552</phone>
</x:author>

Now for a slightly more sophisticated example that uses nested complex types, other compositors, and repeating particles:

...
   <xsd:complexType name="AddressType">
      <xsd:all>
         <xsd:element name="street" type="xsd:string"/>
         <xsd:element name="city" type="xsd:string" minOccurs="0"/>
         <xsd:element name="state" type="tns:State" minOccurs="0"/>
         <xsd:element name="zip" type="tns:Zip"/>
      </xsd:all>
   </xsd:complexType>
   <xsd:complexType name="PublicationsListType">
      <xsd:choice maxOccurs="unbounded">
         <xsd:element name="book" type="xsd:string"/>
         <xsd:element name="article" type="xsd:string"/>
         <xsd:element name="whitepaper" type="xsd:string"/>
      </xsd:choice>
   </xsd:complexType>
   <xsd:complexType name="AuthorType">
      <xsd:sequence>
         <xsd:choice>
            <xsd:element name="name" type="xsd:string"/>
            <xsd:element name="fullName" type="xsd:string"/>
         </xsd:choice>
         <xsd:element name="address" type="tns:AddressType"/>
         <xsd:element name="phone" type="tns:Phone" 
            minOccurs="0" maxOccurs="unbounded"/>
         <xsd:element name="recentPublications"
            type="tns:PublicationsListType"/>      
      </xsd:sequence>
      <xsd:attribute name="id" type="tns:AuthorId"/>
   </xsd:complexType>
   <xsd:element name="author" type="tns:AuthorType"/>
...

In this example, AuthorType contains a sequence whose first particle is another compositor, a choice, followed by three element declarations. Two of those elements (address and recentPublications) are of other user-defined complex types (AddressType and PublicationsListType), which effectively define nested structures within the type. The choice means that either the name or the fullName element, but not both, may appear at that location. Finally, the all compositor in AddressType indicates that the order of its elements is insignificant.

Notice also that the phone element declaration specifies occurrence constraints using the minOccurs and maxOccurs attributes. Occurrence constraints may be applied to any particle in a complex type. The default value for each is 1, which means the given particle must appear exactly once at the specified location. Specifying minOccurs="0" makes the given particle optional and specifying maxOccurs="unbounded" allows the particle to repeat without limit. You can also specify arbitrary limits like minOccurs="3" maxOccurs="77" if you like. An occurrence constraint on a compositor applies to the entire group as a whole (notice PublicationsListType, which applies an occurrence constraint to a choice). Here is an example of a valid instance of our new AuthorType:

<x:author xmlns:x="https://example.org/publishing"
   id="333-33-3333"
>
   <name>Aaron Skonnard</name>
   <address>
      <street>123 Main</street>
      <zip>84043</zip>
   </address>
   <phone>(801)729-0924</phone>
   <phone>(801)390-4555</phone>
   <phone>(801)825-3925</phone>
   <recentPublications>
     <whitepaper>Web Service Abstractions</whitepaper>
     <book>Essential XML Quick Reference</book>
     <article>Web Services and DataSets</article>
     <article>Understanding SOAP</article>
     <book>Essential XML</book>
   </recentPublications>
</x:author>
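The occurrence rules for AuthorType can be sketched as simple counting. The Python below is an illustration only: it checks how often each child appears but not the order a sequence also requires, occurs_ok is a hypothetical helper, and the sample instance is a shortened version of the one above with phone values written to match the Phone pattern.

```python
import xml.etree.ElementTree as ET

def occurs_ok(count, min_occurs=1, max_occurs=1):
    """Check a count against minOccurs/maxOccurs; None means "unbounded"."""
    return count >= min_occurs and (max_occurs is None or count <= max_occurs)

author = ET.fromstring("""
<x:author xmlns:x="https://example.org/publishing" id="333-33-3333">
   <name>Aaron Skonnard</name>
   <address><street>123 Main</street><zip>84043</zip></address>
   <phone>(801)729-0924</phone>
   <phone>(801)390-4555</phone>
   <recentPublications><book>Essential XML</book></recentPublications>
</x:author>
""")

def count(name):
    return len(author.findall(name))

checks = {
    # The choice as a whole must occur exactly once (defaults of 1 and 1).
    "name|fullName": occurs_ok(count("name") + count("fullName")),
    "address": occurs_ok(count("address")),
    # minOccurs="0" maxOccurs="unbounded"
    "phone": occurs_ok(count("phone"), min_occurs=0, max_occurs=None),
    "recentPublications": occurs_ok(count("recentPublications")),
}
```

Every entry in checks is True for this instance; removing the address element, or supplying both name and fullName, would make the corresponding entry False.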

By default complex types have closed content models. This means that only the specified particles are allowed to appear in an instance. XML Schema makes it possible, however, to define an open content model using what are known as wildcards. Using xsd:any within a complex type means that any element can appear at that location, effectively making it a placeholder for things that you cannot predict ahead of time. You can also use xsd:anyAttribute to define placeholders for attributes.

...
   <xsd:complexType name="AuthorType">
      <!-- compositor goes here -->
      <xsd:sequence>
         <xsd:element name="name" type="xsd:string"/>
         <xsd:element name="phone" type="tns:Phone"/>
         <xsd:any minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:anyAttribute/>
   </xsd:complexType>
   <xsd:element name="author" type="tns:AuthorType"/>
...

The following illustrates a valid instance of the author element defined above:

<x:author xmlns:x="https://example.org/publishing"
   xmlns:aw="https://www.aw.com/legal/contracts"
   aw:auId="01-3424383"
>

   <!-- explicitly defined by the complexType -->
   <name>Aaron Skonnard</name>
   <phone>(801)825-3925</phone>

   <!-- extra elements that replace wildcard -->
   <aw:contract xmlns:aw="https://www.aw.com/legal/contracts">
      <title>Essential Web Services Quick Reference</title>
      <deadline>2003-06-01</deadline>
   </aw:contract>
   ...
</x:author>

When using wildcards it's also possible to constrain the namespace the content actually comes from. Both xsd:any and xsd:anyAttribute come with an optional namespace attribute that may contain any of the values shown in Table 4. This makes it possible to be very specific about where the wildcard replacement content comes from.

Table 4. Wildcard Namespace Attribute

Attribute Value      Allowed Elements
##any                Any from any namespace
##other              Any in a namespace other than the targetNamespace
##targetNamespace    Any in the targetNamespace
##local              Any unqualified (no namespace)
list of ns strings   Any from listed namespaces
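The rules in Table 4 can be written down as a small decision function. The following Python sketch is an illustration under the XML Schema 1.0 reading of ##other (the content must be in some namespace, and that namespace must differ from the target namespace); wildcard_allows is a hypothetical helper, and None stands for "no namespace".

```python
def wildcard_allows(namespace_attr, target_ns, element_ns):
    """Decide whether a wildcard's namespace attribute admits element_ns.

    element_ns is None for an unqualified element or attribute.
    """
    if namespace_attr == "##any":
        return True
    if namespace_attr == "##other":
        # A namespace is required and must differ from the target namespace.
        return element_ns is not None and element_ns != target_ns
    allowed = set()
    for token in namespace_attr.split():   # whitespace-delimited list
        if token == "##targetNamespace":
            allowed.add(target_ns)
        elif token == "##local":
            allowed.add(None)
        else:
            allowed.add(token)             # an explicit namespace name
    return element_ns in allowed
```

For example, with a target namespace of https://example.org/publishing, a wildcard declared with namespace="##other" admits content from https://www.aw.com/legal/contracts but not content from the publishing namespace itself.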

With wildcards you can also specify how the schema processor should treat the wildcard content during validation. Both xsd:any and xsd:anyAttribute come with a processContents attribute that can specify one of three values: lax, strict, and skip. This value tells the processor whether it should perform schema validation on the content in place of the wildcard. Strict indicates that it must perform validation on the content. Lax indicates that the processor should perform validation if schema information is available. And skip indicates that it must not perform schema validation.
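The three processContents settings can be summarized as a decision about whether to validate the content that stands in for the wildcard. The Python sketch below is a simplification for illustration; should_validate and its declaration_available flag are hypothetical and do not correspond to any particular processor's API.

```python
def should_validate(process_contents, declaration_available):
    """Return True to validate the wildcard content, False to leave it alone."""
    if process_contents == "skip":
        return False                      # never validate
    if process_contents == "lax":
        return declaration_available      # validate only when it is possible
    if process_contents == "strict":
        if not declaration_available:
            # Strict content that cannot be validated makes the instance invalid.
            raise LookupError("strict wildcard: no declaration for the content")
        return True
    raise ValueError(f"unknown processContents value: {process_contents!r}")
```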

Let's look at an example that uses these attributes. The schema for SOAP 1.1 actually leverages wildcards and both of these attributes to define the structure of the soap:Header and soap:Body elements:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:tns="http://schemas.xmlsoap.org/soap/envelope/"
  targetNamespace="http://schemas.xmlsoap.org/soap/envelope/"
>
  ...
  <xs:element name="Header" type="tns:Header" />
  <xs:complexType name="Header" >
    <xs:sequence>
      <xs:any namespace="##other" minOccurs="0" 
       maxOccurs="unbounded" processContents="lax" />
    </xs:sequence>
    <xs:anyAttribute namespace="##other" 
     processContents="lax" />
  </xs:complexType>
  
  <xs:element name="Body" type="tns:Body" />
  <xs:complexType name="Body" >
    <xs:sequence>
      <xs:any namespace="##any" minOccurs="0" 
       maxOccurs="unbounded" processContents="lax" />
    </xs:sequence>
    <xs:anyAttribute namespace="##any" 
     processContents="lax" />
  </xs:complexType>
  ...
</xs:schema>

According to the schema, soap:Header may contain zero or more elements and any number of attributes from any namespace other than the targetNamespace, while soap:Body may contain zero or more elements and any number of attributes from any namespace whatsoever. In both cases, validation should only be performed if schema information is available at runtime (i.e., lax validation). Since there's no way to predict what's going to be placed in the soap:Header or soap:Body elements ahead of time, wildcards provide a way to define a flexible, open framework for XML messaging.

Locating and Managing Schemas

A question that always comes up at this point is how an XML Schema processor locates the required schema definitions for a given instance document at runtime. XML Schema processors key off of the instance document's namespaces to locate the corresponding schemas, but the XML Schema specification doesn't specify exactly how processors should do this. Most processors allow you to load a schema cache ahead of time that contains all of the schemas that you're going to need. Then at runtime you simply point the processor to the schema cache so it can efficiently look up the schemas it needs for a particular instance.

XML Schema also defines a way to provide a schema location hint in an instance document. This is done through the xsi:schemaLocation attribute as illustrated here:

<x:author xmlns:x="https://example.org/publishing"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="https://example.org/publishing pubs.xsd"
>
...

The xsi:schemaLocation attribute allows you to provide a whitespace-delimited list of pairs, each consisting of a namespace name followed by a URI location, that indicate where to look for a particular schema file. But again, this is only a hint and processors might not actually look there if a more efficient retrieval mechanism is available.
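Reading the hint is a matter of pairing up the tokens. The following Python sketch turns an xsi:schemaLocation value into a mapping from namespace name to location; parse_schema_location is a hypothetical helper, and what a processor then does with the hint remains up to the processor.

```python
def parse_schema_location(value: str) -> dict[str, str]:
    """Split an xsi:schemaLocation value into {namespace name: location}."""
    tokens = value.split()                 # whitespace-delimited
    if len(tokens) % 2 != 0:
        raise ValueError("xsi:schemaLocation must contain namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))

hints = parse_schema_location("https://example.org/publishing pubs.xsd")
```

Here hints maps https://example.org/publishing to pubs.xsd; an attribute value with an odd number of tokens is malformed and is rejected.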

Conclusion

XML Schema provides an expressive type system for XML capable of providing many powerful services. We've covered the basics of XML Schema definitions including simple and complex type definitions. Simple type definitions allow you to define custom value spaces for text-only elements and attributes. Complex type definitions, on the other hand, allow you to arrange simple types into structures.

XML Schema is actually capable of much more than what we had space to discuss here. For example, complex type definitions support derivation by extension and restriction, allowing you to define complex type hierarchies in a way that maps nicely to OO class hierarchies. With complex type hierarchies in place, it's also possible to leverage substitution techniques in instance documents. XML Schema also makes it possible to factor XML Schema definitions into multiple files and namespaces, which can then be included and/or imported to increase reuse and simplify maintenance. These more advanced topics, however, are better left for a future piece on XML Schema design.

For more information on XML Schema, check out the electronic version of the Essential XML Quick Reference (freely available online)—the XML Schema chapters contain simplified descriptions and examples of each construct and datatype.

References

XML Schema Part 0: Primer

XML Schema Part 1: Structures

XML Schema Part 2: Datatypes

Essential XML Quick Reference