Working with XML Schemas in InfoPath

In this article

Introduction

Using externally-authored XML Schemas

Unsupported XSD Constructs

XSD Constructs with Reduced Functionality

XSD Constructs with Special Meaning in InfoPath

Debugging Common XSD Errors

How to Edit or Author an XSD for InfoPath

Conclusion

Introduction

A form template that you create with Microsoft Office InfoPath 2007 uses an XML Schema (XSD) to perform structural and data validation on the XML that is input, edited, and output from an InfoPath form. Every form template created in InfoPath design mode contains at least one XSD schema file (.xsd) that is used for validation at run time.

Bb251017.vs_note(en-us,office.12).gif  Note
The information contained in this article applies to form templates designed for use in the Office InfoPath 2007 client. Browser-compatible form templates have stricter XSD Schema requirements. For more information, see the documentation about XML Schemas in browser-compatible form templates available on MSDN.

Using externally-authored XML Schemas

To load an XSD schema file that was authored outside of InfoPath, follow these steps:

  1. On the File menu, click Design a Form Template.
  2. In the Design a Form Template dialog box, click XML or Schema, and click OK.
Bb251017.vs_note(en-us,office.12).gif  Note
You can specify any XSD file you want to use by manually editing your template files outside of design mode. The XSD file you specify, however, must be one that Microsoft XML Core Services (MSXML) 5.0 considers valid.

For additional information about XML Schema standards and implementation, see the following resources:

Unsupported XSD Constructs

The following section describes XSD constructs that InfoPath cannot handle at run time. Avoid these constructs when creating a form template in InfoPath design mode.

ENTITY and ENTITIES Types

The ENTITY and ENTITIES types require a document type definition (DTD) for validation, which InfoPath does not support. InfoPath does not allow you to design a form template against such a schema and displays an error message that recommends changing the ENTITY type to the NCName type from which ENTITY derives.

Bb251017.vs_note(en-us,office.12).gif  Note
If you manually author an InfoPath form template outside of design mode, and it uses an XSD that includes ENTITY and ENTITIES types, the form template may work at run time if the Template.xml file contains the required DTD for these types.

Required xsd:any Element

An occurrence of an xsd:any wildcard element, that is, an occurrence of an xsd:any element with a minOccurs attribute value greater than zero ("required any"), prevents InfoPath from deterministically creating a valid instance for this schema fragment. InfoPath must be able to create a valid instance when generating a form that uses this schema fragment. As part of running the Data Source Wizard, schemas with required xsd:any elements require you to choose which schema element you want to use in place of the required xsd:any element.

Elements with an Abstract Complex Type

InfoPath design mode supports designing a form template against schemas that use abstract complex types. For example, if an element named shippingAddress has an abstract complex type named address that has two derivations, USAddress and CanadianAddress, InfoPath treats any instance of shippingAddress as a choice between shippingAddress with type USAddress and shippingAddress with type CanadianAddress. In this example, if the provided schemas contain no types that derive from address, then InfoPath requests an additional schema that fulfills this requirement.

XSD Constructs with Reduced Functionality

The following section describes XSD constructs that have reduced functionality when used to create a form template in InfoPath design mode.

Substitution Groups

All members of the substitution group appear in the Data Source task pane. InfoPath represents the substitution possibilities as a choice of all the substitution groups (including the defining element, if it is not abstract). If there are no substitution groups for an abstract element, InfoPath prompts you to provide a schema that contains at least one element that is a substitution group.

Unbounded Choice Elements

The following schema fragment shows an unbounded choice element:

  <xsd:choice maxOccurs="unbounded">
    <xsd:element name="my_element_1"/>
    <xsd:element name="my_element_2"/>
</xsd:choice>

InfoPath displays repeating choice elements as repeating choices in the Data Source task pane. There is a Repeating Choice Group control that you can use to represent the heterogeneous list defined by the repeating choice element in the XSD.

Repeating Sequence

The following schema fragment shows a repeating sequence:

  <xsd:sequence maxOccurs="unbounded">
    <xsd:element name="my_element_1"/>
    <xsd:element name="my_element_2" minOccurs="0"/>
</xsd:sequence>

As long as the repeating sequence contains a required element, InfoPath loads the XSD without modifying it and allows you to bind repeating section controls to the repeating sequence.

Choice of Model Groups

The following schema fragment shows the choice element containing several model groups:

  <xsd:choice>
    <xsd:element name="my_element_1"/>
    <xsd:sequence>
        <xsd:element name="my_element_2"/>
        <xsd:element name="my_element_3"/>
    </xsd:sequence>
</xsd:choice>

InfoPath design mode supports such XSD constructs without requiring any modification by the form designer. While InfoPath does not modify the meaning of the schema, it simplifies the choice construct above into an equivalent collapsed single choice in the Data Source task pane.

Optional Sibling with Same Qualified Name

The following schema fragment shows an optional sibling with same qualified name (QName):

  <xsd:sequence>
    <xsd:element name="my_element_1" minOccurs="0"/>
    <xsd:element name="my_element_2"/>
            <xsd:element name="my_element_1"/>
</xsd:sequence>

XPath expressions for these nodes can be complex because every potential XML instance must be accounted for in the InfoPath design mode. InfoPath does not expose parts of the schema for which it may have difficulty creating correct XPath bindings. Warnings appear regarding the portions of the schema that are ignored.

XSD Constructs with Special Meaning in InfoPath

The following sections describe XSD constructs that have special meaning when used in creating a form template in design mode. These sections describe how you can use the constructs in your schema to enable certain behaviors.

Adding New Element Fields and Groups with the Data Source Task Pane

You can construct your schema so that you can use the Data Source task pane to add new element fields and groups to an element at design time. To do so, you declare an element in your schema with an optional, unbounded xsd:any element that specifies the namespace attribute with the ##any wildcard. Then, in design mode, you can use the Data Source task pane to add new element fields and groups to that element. For example, you could add new content to the following element:

  <xsd:element name="open">
    <xsd:complexType>
         <xsd:sequence>
             <xsd:any minOccurs="0" maxOccurs="unbounded" namespace="##any"processContents="lax"/>
         </xsd:sequence>
    </xsd:complexType>
</xsd:element>

Adding New Attribute Fields with the Data Source Task Pane

Similarly to the element case, you can declare an attribute with an anyAttribute element that has the namespace attribute specified as the ##any wildcard. At design time, you can use the Data Source task pane to add new content to that schema attribute.

  <xsd:element name="open">
    <xsd:complexType>
        <xsd:anyAttribute namespace="##any" processContents="lax"/>
    </xsd:complexType>
</xsd:element>

Storing XML Signatures in the Data Source

To enable users to digitally sign a form at run time, the schema of the data source must declare an element named signature for storing the XML Signatures (digital signature) information that is created when a user signs the form. You make this declaration by using the xsd:any element with the namespace attribute specified as the XML Signatures namespace with a wildcard character, as follows: "http://www.w3c.org/2000/09/xmldsig#"

  <xsd:element name="signature">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:any namespace="http://www.w3c.org/2000/09/xmldsig#" 
             processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:sequence>
    </xsd:complexType>
</xsd:element>

Binding a Field to a Rich Text Box Control

Rich Text Box controls in InfoPath generate generic XHTML; consequently, your schema must specify that any number of text and XHTML nodes is valid in the XML of the form instance. You can achieve this specification with the following XSD construct:

  <xsd:element name="xhtml">
    <xsd:complexType mixed="true">
        <xsd:sequence>
            <xsd:any minOccurs="0" maxOccurs="unbounded" namespace="http://www.w3.org/1999/xhtml" processContents="lax"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>

Bb251017.vs_note(en-us,office.12).gif  Note
InfoPath never modifies the content of the schema file (.xsd), but it may logically infer a subset of it for design purposes. The schema file is always untouched within the form template at both design time and run time.

Debugging Common XSD Errors

If you load externally authored XSD files to create form templates in InfoPath design mode, you may receive either of two types of error messages: MSXML error messages or InfoPath error messages. MSXML error messages appear in the Details section of an InfoPath error message dialog box, and they always begin with a reference to the name or path of the schema file that is raising the error. Some valid XSD schema constructs are not supported by InfoPath; these are discussed in the Unsupported XSD Constructs section. This section describes some common errors that can cause schemas to fail to load successfully in InfoPath.

The XSD Namespace Declaration

Similar to all W3C standards, XML Schemas (XSD) went through a lengthy review process on its way to becoming a recommendation. There were many working drafts, and consequently, many XSD files were written based on these evolving standards. During this process, Microsoft created a proprietary schema language called XML-Data Reduced (XDR) that was included with MSXML 3.0. With the release of MSXML 4.0 and later, Microsoft XML Core Services supports the full recommendation of XSD. Many programs for creating schemas did not wait for XSD to become a full recommendation. Older versions of these programs may produce outdated XSD files that the MSXML 5.0 infrastructure, on which InfoPath depends, does not support.

To ensure that an XSD file supports the full XSD recommendation, it should contain the following XML namespace declaration in the <schema> tag:

  xmlns:xsd="http://www.w3.org/2001/XMLSchema"

Similar to all XML namespace declarations, the XML prefix (in this case 'xsd') can be any valid prefix string. Some common prefixes you may see in practice are 'xsd', 'xs', and '' (no prefix). MSXML 5.0 usually reports an error about the root not being properly defined if this namespace declaration is missing.

Importing and Including Schemas

XSD schemas are extensible and can import and include other schemas. Generally, you should import a schema if the schema specified in the targetNamespace attribute differs from the current schema. You should include it if the schema specified in the targetNamespace attribute is the same as the current schema.

The semantics for importing and including schemas are as follows:

  <xsd:import namespace = "[anyURI]" schemaLocation = "[anyURI]"/>
<xsd:include schemaLocation = "[anyURI]"/>

If the schemaLocation attribute is missing (as happens with some converters), then MSXML 5.0 raises an error because it cannot find the file. If you get this error, also check to make sure that the resource or location specified in the schemaLocation attribute is accessible by users of the form template. Obviously, errors occur if the schemaLocation attribute references a server or directory that is down or nonexistent or if users do not have access permissions. Also, be sure to examine all imported and included schemas to make sure they are valid.

Bb251017.vs_note(en-us,office.12).gif  Note
Errors caused by problems with the schemaLocation attribute are an issue only when InfoPath first imports the schemas; that is, when you first start designing a form based on an existing schema. After that, InfoPath works with cached versions of the schema files that are stored in the form template.

An empty namespace attribute is allowed when importing a schema, if that schema does not specify a targetNamespace attribute. In general, the namespace on the import must match the targetNamespace specified in the schema that you import.

Nondeterministic Schemas

The MSXML 5.0 infrastructure that InfoPath depends upon can reliably detect and raise errors to alert you to nondeterministic schemas, but the resultant error message does not provide a line number to tell you which part of the schema is raising the error. This section discusses why it is important for XSD schema files to be deterministic and what it means to be nondeterministic. It also shows some common errors to avoid.

XSD schemas exist for the purpose of validating XML data structure and type semantics. To accomplish this task, the validating system (in this case, MSXML 5.0) must map XML nodes to XSD declarations. Without this mapping, the validating system cannot accomplish its task. If a mapping can be guaranteed, then the schema is deterministic. If there is a single XML instance that makes this mapping impossible, then the schema is nondeterministic.

The following example schema is nondeterministic:

  <xsd:element name="file_Information">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="file_name"/>
            <xsd:choice>
                <xsd:element name="file_path"/>
                <xsd:sequence>
                    <xsd:element name="file_path" minOccurs="0"/>
                    <xsd:element name="URI"/>
                </xsd:sequence>
            </xsd:choice>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>

To illustrate why this XSD segment is nondeterministic, assume you have the following XML fragment that you want to validate with this schema:

  <file_Information>
    <file_name>my_Schema.xsd</file_name>
    <file_path>c:\xsd</file_path>
</file_Information>

In this XML fragment, it is not clear whether the <file_path> element is the required node from the first part of the choice declaration or the optional one from the second part of the choice declaration. This distinction is important for the following reasons:

  1. If the XML fragment is validated against the first part of the choice declaration, then the XML is valid against the schema.
  2. If the XML fragment is validated against the second part of the choice declaration, then the schema is not valid, because the required <URI> node is missing.

Some XSD validation systems err toward validating against this schema because there is a valid path. MSXML 5.0 is stricter and raises an error stating that the schema is nondeterministic.

What follows are a few more examples of nondeterministic schemas. The first deals with optional elements. These cases often arise from XDR to XSD converters because of differences in the default cardinalities in the two schema languages. The first case to consider is optional elements declared with xsd:choice and xsd:sequence elements. Optional elements declared in an xsd:sequence element usually validate properly, as long as you do not have elements with the same name more than once, with only optional elements in between. For example:

  <xsd:element name="container">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="aNode" />
            <xsd:element ref="anotherNode" minOccurs="0"/>
            <xsd:element ref="aNode" />
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>

To understand why this schema segment is nondeterministic, assume you have the following invalid XML fragment:

  <container>
    <aNode/>
    <aNode/>
    <anotherNode/>
</container>

Looking at this fragment, you can see why it is invalid: there are two <aNode> elements before the <anotherNode> element, when only one is allowed.

Now assume that you have the following XML instance to validate:

  <container>
    <aNode/>
    <aNode/>
</container>

The challenge is to determine whether this instance is valid. Do you have two <aNode> elements where only one is allowed, or do you have an <aNode> element where it is allowed and another where it is allowed? The schema is nondeterministic because there is no way to know.

Similarly, optional elements declared in an xsd:choice element are usually problematic. In the following simplified example, there is no way to determine whether the choice occurred once with the optional element not being there or whether it never occurred at all.

  <xsd:choice>
    <xsd:element name="node" minOccurs="0"/>
</xsd:choice>

The final questionable practice is using an xsd:any element without a namespace definition, as in <xsd:any namespace="##other"/>, after an xsd:sequence element. This construct is especially troublesome when it follows an optional element. If you revisit the previous example and change just the last node to an xsd:any element, you can see that all the previous arguments about nondeterminism still apply, as follows:

  <xsd:element name="container">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="aNode" />
            <xsd:element ref="anotherNode" minOccurs="0"/>
            <xsd:any />    
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>

Illegal Enumeration Values

XSD schemas typically do not perform any type validation until you validate an actual instance document. An exception to this is when you have an enumeration in your schema. In this case, the schema validates the enumeration values against the enumeration types to ensure they are proper node values. Here are two examples:

  <xsd:simpleType name="showTimes">
    <xsd:restriction base="xsd:time">
        <xsd:enumeration value="18:30:00"/>
        <xsd:enumeration value="20:45:00"/>
        <xsd:enumeration value="eleven o'clock"/>
    </xsd:restriction>
</xsd:simpleType>

This schema is invalid because "eleven o'clock" is not a valid value for an element of type xsd:time.

The following is a more complex example:

  <xsd:simpleType name="concession">
    <xsd:restriction base="xsd:NMTOKEN">
        <xsd:enumeration value="GummyBears"/>
        <xsd:enumeration value="SnowCaps"/>
        <xsd:enumeration value="M&Ms"/>
    </xsd:restriction>
</xsd:simpleType>

To understand why this example is invalid, you must understand how the type xsd:NMTOKEN is defined. The W3C datatypes specification defines the NMTOKEN type as follows: "An NMTOKEN (name token) is any mixture of name characters."

If you investigate further, you find that '&' is not a valid name character, and therefore "M&Ms" does not validate as an NMTOKEN type.

Empty Sequence or Choice Elements

MSXML 5.0 sometimes raises errors about schema declarations that contain empty xsd:choice or xsd:sequence elements, as shown in the following example:

  <xsd:element name="emptyContainer">
    <xsd:complexType>
        <xsd:choice />
    </xsd:complexType>
</xsd:element>

Removing the empty <xsd:choice /> tag should fix this problem.

Regular Expressions

MSXML 5.0 can have problems validating regular expression patterns on load. Regular expressions can be complicated, and you should be careful when using them. Every XSD parser seems to have flexible regular expression languages; that is, they implement the official XSD regular expression language plus elements from other regular expression languages. If InfoPath design mode has a problem parsing a regular expression, then the sample data InfoPath generates might be invalid or might not be generated at all. This is acceptable at design time, because InfoPath uses only sample data for formatting. However, if you use a regular expression that MSXML 5.0 does not support, then InfoPath cannot validate a value against it when a user is filling out a form in edit mode. XML Schema Part 0: Primer Second Edition describes what is supported in XSD regular expressions. For more information about XSD regular expressions and Unicode level 1 regular expressions, see the Unicode Home Page.

targetNamespace Attribute Issues

XSD is interesting in that, by default, the targetNamespace attribute refers to only the top-level declarations, although you can set attributeFormDefault=qualified and elementFormDefault=qualified to override this default behavior. As an example, assume you have the following XSD:

  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://ns" >
    <xsd:element name="root">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="local"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

Your XML instance document looks like the following:

  <ns:root xmlns:ns="http://ns">
    <local/>
</ns:root>

Local definitions do not require the target namespace because qualification is turned off by default. However, if you change your local definition to be global, then your reference must be qualified with the namespace prefix. For example, the following schema is invalid:

  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://ns" >
    <xsd:element name="root">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="global"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
&lt;xsd:element name="global"/&gt;

</xsd:schema>

This schema is invalid because "global" is in the namespace "http://ns". The simple ref="global" is not recognized because the default namespace is not "http://ns". To fix this, you must add a prefix for the target namespace and use that for all global references and type uses. The corrected schema looks like the following:

  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns:ns="http://ns" targetNamespace="http://ns" >
    <xsd:element name="root">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="ns:global"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
&lt;xsd:element name="global"/&gt;

</xsd:schema>

If your schema has the targetNamespace attribute specified, ensure that all global references are qualified with the correct namespace prefix.

XML Processing Instruction Encoding (Unicode vs. ANSII)

XML supports only Unicode character sets, so you may lose information if you save files that use ANSII characters. However, saving files as UTF-16 may be excessive for your particular use. To reduce the implementation cost of an XML reader, the XML author must state which encoding they are using in the top-level XML processing instruction. You may recognize something that looks like this:

  xml version="1.0" encoding="UTF-8"

This processing instruction tag specifies that the encoding of the file is UTF-8. You must ensure that the file encoding is the same as the encoding stated in the processing instruction tag. You can determine the encoding by looking at the bytes of the file and looking for the Unicode byte order marks, but there is an easier way. If you are having problems opening an XSD schema, specify the encoding as "UTF-8", open it in a text editor such as Notepad, and then save the file using UTF-8 encoding (Notepad provides the Encoding drop-down list in the Save As dialog box). If you still have problems opening the file, it is not an encoding issue.

maxOccurs Attribute Inside the xsd:all Element

Due to the way nondeterminism is defined in the XML Schema recommendation, the only legal value for the maxOccurs attribute of an xsd:element element inside of an xsd:all element is 1. For example, the following is valid:

  <xsd:all>
    <xsd:element name="x" minOccurs="0"/>
    <xsd:element name="docs" minOccurs="0"/>
</xsd:all>

However, this example is not valid:

  <xsd:all>    
    <xsd:element name="x" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="docs" minOccurs="0" maxOccurs="unbounded"/>
</xsd:all>

This example is invalid because the validation system cannot determine whether two occurrences of <x/> map to the single declaration or to the declaration and another illegal definition. Along the same lines, you cannot have two elements of the same name in an <xsd:all> tag.

This example is also interesting because it allows you to have any number of <x/> and <docs/> nodes inside a containing element in any order. Although this construct is illegal, there is a workaround. By using the xsd:choice element, you can achieve the same result, as demonstrated in the following example:

  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="x" />
    <xsd:element name="docs" />
</xsd:choice>

How to Edit or Author an XSD for InfoPath

The two examples in this section demonstrate how to edit or author a schema to produce specific results in InfoPath.

Allowing User-defined Elements to Be Inserted in the Data Source Task Pane

To allow user-defined elements to appear under a parent element in the Data Source task pane, you must insert an xsd:any element under the parent element. To allow user-defined elements to be inserted inside <your_node_name>, the XSD declaration should look something like the following:

  <xsd:element name="your_node_name">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:any namespace="##any | ##other" 
                minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>

If you want to allow user-defined attributes as well, then you need to add <xsd:anyAttribute namespace="##any | ##other"/> to the element declaration.

Allowing Rich Text Elements to be Bound in InfoPath Design and Edit Modes

If you want to declare an element that can be bound to a Rich Text Box control, then it should have the following form, which includes the xsd:any element with a namespace attribute set to "http://www.w3.org/1999/xhtml".

  <xsd:element name="your_node_name">
    <xsd:complexType mixed="true">
        <xsd:sequence>
            <xsd:any namespace="http://www.w3.org/1999/xhtml" 
                minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>

Conclusion

By taking advantage of InfoPath support for designing XML form solutions based on externally authored XML Schema (.xsd) files, you can create a form template that works with an industry-standard schema or custom schema created by your company or organization. With the information in this article, you can create custom XSD schema files that are compatible with InfoPath, and you can troubleshoot common issues that you may encounter when loading externally authored XSD files into the InfoPath design environment.