XML Options in Microsoft SQL Server 2005

 

Microsoft Corporation

January 2005

Applies to:
   Microsoft SQL Server 2005
   Microsoft Visual Studio 2005
   Microsoft .NET Framework 2.0
   XML and relational data

Summary: Three options for handling XML data in the Visual Studio 2005/SQL Server 2005 environment are discussed, including usage scenarios and guidelines to help you choose between them. (34 printed pages)

Contents

Introduction to System.Xml, SQLXML, and the XML Data Type
XML Usage Scenarios
XML Classes in the .NET Framework
SQLXML
Server-Side Support for Relational/XML Integration (FOR XML/OPENXML)
XML Data Type in SQL Server 2005
Comparison of Different Approaches
Conclusion
For More Information

Introduction to System.Xml, SQLXML, and the XML Data Type

This section contains a brief history of the XML support provided in Microsoft SQL Server 2000 and an overview of the three options provided in the Microsoft Visual Studio 2005/SQL Server 2005 environment for manipulating XML and relational data. The options are: 1) the classes in the System.Xml namespace, 2) the SQLXML classes, and 3) the XML data type provided in SQL Server 2005.

XML support was added to Microsoft SQL Server 2000 to provide the user with features such as:

  • Exposing relational data as XML
  • Shredding XML documents into row sets
  • Creating XML views by mapping XML schemas to database schemas using XML-Data Reduced (XDR) schemas
  • Creating queries on XML views using XPath
  • Exposing data in SQL Server through HTTP

This support was further enhanced in subsequent SQLXML Web releases. Enhancements include:

  • Updategrams and XML Bulkload to persist changes made to XML views
  • Support for annotated XML Schema definition language (XSD) to describe mappings (support for XDR still exists but its use is not recommended)
  • Client-side FOR XML
  • SQLXML Managed Classes
  • Support for Web services

The Microsoft .NET Framework 1.0 has extensive support for reading, writing, and processing XML documents. This support has been further enhanced in the .NET Framework 2.0 to improve the performance and usability of various XML classes. New classes provided by the .NET Framework in the System.Xml namespace can be used to map XML data to and from relational data.

SQLXML is a set of libraries and technologies that enables the seamless integration of relational data in SQL Server databases with XML. It is a middle-tier component and does not include the server-side XML support provided by FOR XML and OPENXML. SQLXML provides a schema-driven mapping approach to generating XML from relational source data and loading XML that represents relational information back into relational tables. SQLXML classes provide XML support to SQL Server 2000 databases and above.

Microsoft SQL Server 2005 adds built-in support for XML in the form of the XML data type. XML data can be stored natively in XML data type columns. Additionally, an XML data type column can be further constrained by associating an XML schema collection with this column. XML values stored in XML data type columns can be manipulated with the help of XQuery and an XML Data Modification Language (DML). Indexes can be built on XML data to improve query performance. Furthermore, FOR XML and OPENXML have been enhanced to support the new XML data type.

New features introduced in SQL Server 2005 to store and process XML data coupled with XML features offered in earlier versions of SQL Server provide developers with several ways in which XML data can be stored and processed in XML applications. As there are multiple ways in which XML applications can be built using the alternative approaches provided by SQL Server 2005, it is important to understand the scenarios, trade-offs, and synergies of the different technologies in order to make the right choice. This article provides guidance in selecting the appropriate alternative for developing XML applications with SQL Server 2005.

XML Usage Scenarios

The areas in which XML is used can be broadly classified as follows:

  • XML for Business Integration: Business integration*,* also known as enterprise application integration (EAI), includes A2A (application-to-application), B2B (business-to-business), and B2C (business-to-consumer) applications. Applications working on disparate systems communicate with each other using XML-based messages.
  • XML for Content Management: Content management systems based on XML enable users to store, retrieve, modify, and query XML documents. These systems store XML documents in their native format.

Next a few scenarios that fall under the previously mentioned categories are discussed. The solutions for these scenarios are presented in subsequent sections with detailed treatments of the different XML options available in the SQL Server 2005/Visual Studio 2005 environment.

Scenario 1: Insurance Claims

An auto insurance company providing services on the Internet allows its insurance buyers or agents to enter insurance claims through the company's Web site. These claims will be processed by the centralized system located at the corporate head quarters. Once the processing is complete, the system is required to store specific information related to the claim in a specified XML format. Exact copies of these XML documents must be maintained in the system for legal purposes. This scenario shows the use of XML for content management.

Scenario 2: Data Exchange Between Automobile Manufacturer and Parts Suppliers I

An automobile manufacturer interacts with multiple parts suppliers to procure the parts required for the company. Currently the manufacturer receives invoices from suppliers. The data corresponding to these invoices is then manually fed to a legacy invoice processing system. The invoice processing system stores the data in relational format. The manufacturer now wants to automate the invoice data feeding process to the legacy invoice processing system. This scenario is an example of the use of XML for business integration.

Scenario 3: Data Exchange Between Automobile Manufacturer and Parts Suppliers II

This scenario involves an automobile manufacturer that interacts with multiple parts suppliers as mentioned in the previous scenario. The manufacturer's present system does not provide the facility for the suppliers to check the status of an invoice or to get a copy of payment instructions from the manufacturer. Currently this information is available to the suppliers only over the phone. The automobile manufacturer needs to be able to expose this information over the Web so that the suppliers can perform these tasks automatically. This scenario demonstrates the use of XML for business integration.

Scenario 4: Content Management System

A company provides information in the fields of medicine, law, and technology to its customers through various channels including Web, books, and CD-ROM. The company wants to build a content management system to help it deliver high quality content to its customers in less time. This scenario illustrates the use of XML for content management.

Scenario 5: Customer Survey

A company provides air ticket booking services on the Internet and conducts surveys for each season to identify the most sought after destination by its customers for the current season. The questionnaire used for each season is different and the questionnaire may change in the future. The company will analyze the information, and the analysis results are used to design packaged travel deals that will satisfy the needs of the maximum number of customers. This scenario can be classified as the use of XML for content management.

XML Classes in the .NET Framework

The Microsoft .NET Framework has excellent support for developing XML-based products. In the .NET Framework, the core classes such as XmlTextReader, XmlTextWriter, XmlDocument, XmlValidatingReader, etc., are available in the System.Xml namespace, which is the root namespace for all XML classes. These core classes enable users to read, write, and validate XML documents using both stream-based and DOM-based (Document Object Model-based) navigation/access models. The System.Xml namespace contains the following subnamespaces:

  • System.Xml.Schema – Contains classes to deal with XML Schemas definition language (XSD) schemas.
  • System.Xml.Serialization – Provides classes for the serialization of objects into XML format documents or streams.
  • System.Xml.XPath – Includes classes for navigating through XML documents using XPath expressions.
  • System.Xml.Xsl – Has classes for performing Extensible Stylesheet Transformations (XSLT).

Enhancements in the System.Xml Namespace

In Visual Studio 2005, enhancements to some classes can be used to perform various operations on XML data, including modifying XML documents, applying XSL transformations, etc.

Some of the enhancements in Visual Studio 2005 with respect to XML classes in the System.Xml namespace are as follows:

  • XML Schema validation support has been added to the XmlDocument class.
  • The XmlReader and XmlWriter classes have been enhanced to provide significant performance improvements and to support XML schema types. Also, static Create methods have been added to provide an easier way to create instances of XmlReader and XmlWriter using an XmlReaderSettings and an XmlWriterSettings class to configure the type created.

For more information on enhancements in System.Xml see the What's New in System.Xml for Visual Studio 2005 and the .NET Framework 2.0 Release white paper.

The classes in the System.Xml namespace can be used to implement custom XML parsing, manipulation, and storage logic. Leveraging the common language runtime (CLR) hosting capability of SQL Server 2005 and using XML classes in Visual Studio 2005, XML processing can be performed either in the middle tier or in the database tier.

Use of the .NET Framework XML classes involves storing XML documents as a column of type [n]varchar(max) or varbinary(max) in a database, or as files on the file system, and processing these documents in the middle tier or in the database using classes in the System.Xml namespace. XML classes in the .NET Framework can also be used to operate on data stored in XML data type.

The .NET Framework XML classes are a suitable choice if:

  • You want to get access to all of the .NET Framework XML functionality, such as streaming parsers, document type definition (DTD) and XSD validation, XSLT processing, etc.
  • You want to use SQL Server merely as a data store for XML documents and do not require fine-grained access inside the database.
  • You do most or all processing on the XML document using the .NET Framework XML classes and perform updates at the document level.

You can use [n]varchar(max), varbinary(max), or the XML data type to store XML.

If you use [n]varchar(max) or varbinary(max) you get the following benefits:

  • You preserve textual fidelity for XML documents with an exact copy of the document including white space and formatting.
  • Your application gets the fastest possible performance for insert and retrieval operations on the document as a whole.

The benefits of using the XML data type are given in a later section.

Performing XML Processing in the Middle Tier

XML processing can be performed in the middle tier using the various XML classes provided by the .NET Framework. As mentioned earlier, when this approach is adopted, XML documents can be stored as a column of type [n]varchar(max) or XML in a database or as files on the file system. In the middle tier, these documents are retrieved from the database and processed according to the requirements of the user as follows:

  • If you need to read an XML document, use an XmlReader created via the XmlReader.Create() method to load the document obtained from the database. Navigate through the document using Read(). The XmlReader class provides fastest possible read-only, forward-only, non-cached access to the XML document.
  • If you need write access to an XML document and require full navigational access to XML data, use the XmlDocument class to load and access the XML document. XmlDocument is an implementation of the Document Object Model (DOM) in the .NET Framework, which is an in-memory tree representation of an XML document that enables navigation and editing of this document.
  • If you need to validate an XML document based on DTD/XSD or to obtain XSD information at runtime, use the XmlReader class. Create methods with either XsdValidation or DTDValidation set to true on the XmlReaderSettings class. A ValidationEventHandle() event handler can be also set to handle validation errors encountered during reading.
  • If you need to apply an XSL transformation to an XML document, use the XPathDocument class to load the XML document and apply the transformation using the XslCommand class. The XPathDocument class provides a fast, high-performance cache for XML document processing using XSLT.
  • If you need to query an XML document using XPath expressions, load the XML document using either XPathDocument (read-only) or XmlDocument (read/write). Create an instance of an XPathNavigator using the CreateNavigator() method, and pass the required XPath expression as an argument to the Select() method of XPathNavigator.

Performing XML Processing in the Database

The integration of SQL Server 2005 with CLR enables developers to use the XML classes provided by the .NET Framework to perform processing in the database tier as well. This integration offers the capability of writing stored procedures, functions, triggers, and user-defined types in any language supported by the .NET Framework. Furthermore, CLR hosting also provides access to the complete .NET Framework base class library. As a result, the different XML processing options described in the previous section can also be performed in the database.

The advantages of using CLR integration are as follows:

  • It provides the ability to use object-oriented languages like C# and Visual Basic .NET for authoring database objects in managed code.
  • Managed database objects are more secure than the extended stored procedures available in earlier versions of SQL Server.
  • It provides the ability to define user-defined data types and user-defined aggregates.
  • Under certain conditions, compiled managed database objects deliver improved performance over Transact-SQL.

In SQL Server 2005, database developers have two options for writing stored procedures, triggers, and user-defined functions. They are Transact-SQL and any language available in the .NET Framework, such as C# or Visual Basic .NET. The choice of the language depends on the type of operations that are performed on the data. Transact-SQL is best suited when the code will perform mostly data access with little or no procedural logic. Managed classes are best suited for mathematically intensive functions and procedures such as string handling, date operations, system resource access, file access, or image processing.

The steps involved in using the XML classes of the .NET Framework in the database tier are as follows:

  • Developing a managed assembly. Using any language available in the .NET Framework, implement processing functionality as an assembly and package this assembly as a DLL. Assemblies can also refer to other assemblies.
  • Registering an assembly and granting permissions. Assemblies developed using the .NET Framework can be registered with SQL Server by using the CREATE ASSEMBLY T-SQL statement. While registering an assembly, it is also possible to specify the code access permissions granted to the assembly. An assembly can be unregistered using the DROP ASSEMBLY T-SQL statement.
  • Exposing managed types to T-SQL. Processing functionality offered by an assembly can be exposed to T-SQL via scalar-valued user-defined functions, table-valued user-defined functions, user-defined procedures (UDPs), or user-defined triggers. Scalar user-defined functions can be used in any scalar expression. Table-valued user-defined functions can be used in any FROM clause. UDPs can be invoked in an EXEC statement.

Analyzing Scenarios

Insurance claims contain data-centric information such as Claim ID, Policy Number, Claim Settlement Data, etc., and document-centric information such as a description of the accident damage. XML documents are excellent at aggregating data-centric and document-centric information. The key requirement in the present scenario (see the Scenario 1: Insurance Claims section) is the need to maintain an exact copy of insurance claims in XML format. With SQL Server, this requirement can be met easily by storing the insurance claims as a column of type [n]varchar(max) or varbinary(max) in the database. It is important to note that the XML data type should not be used to store the documents if information such as insignificant white spaces, order of attributes, namespace prefixes, and the XML declaration need to be preserved.

Benefits

The benefits of using [n]varchar(max) or varbinary(max) as a storage medium and classes from the System.Xml namespace for the manipulation of XML documents are as follows:

  • It is flexible when you need to make changes to the schema of an XML document. It is also useful when you want to store XML documents with different schemas in the same column.
  • It provides textual fidelity for XML documents when [n]varchar(max) or varbinary(max) is used to store XML. This may be a requirement for applications that deal with legal documents such as insurance claims.
  • You can take advantage of database features such as transactional updates, concurrent access, backup, replication, etc., compared to storing XML instances as files in a file system.
  • As this approach does not depend on XML support offered by the database, the application can easily be extended to support multiple database servers such as SQL Server, Oracle, etc.
  • It can use the processing power of the client system, thereby reducing the load on the server. By performing CPU-intensive XML processing on the middle tier, the server is relieved of some of the load and is available for other important tasks.
  • It offers the best possible performance for document-level insertion and retrieval operations.
  • Complex operations such as XSL transforms can be done in the database as stored procedures, triggers, or functions.

Limitations

The limitations of using [n]varchar(max) or varbinary(max) for storing and classes from the System.Xml namespace for processing XML instances can be summarized as follows:

  • Coding complexity is high compared to the XML data type (see the XML Data Type in SQL Server 2005 section) or the SQLXML option (see the SQLXML section). Even though the database logic is simple, implementation of the code to handle the parsing and processing of XML in the middle tier or database tier is complicated.
  • The amount of code required to implement this solution is high. As a result, the cost of maintenance is also high compared to SQLXML option.
  • Since the XML document is stored as [n]varchar(max) in the database, fine-grained updates, inserts, or deletes on the XML document are not possible. It offers limited querying capabilities.
  • The size of an XML document that can be stored in [n]varchar (max) data type is limited to 2GB.
  • It is very costly to search a column of documents stored this way based on the XML content.

Example of Using XML Classes in the .NET Framework

Consider the insurance claim scenario that was discussed earlier in this paper (see the Scenario 1: Insurance Claims section). The insurance company, once the claim is approved, would like to store the claim information for legal purposes. The claim information can be stored in a database as [n]varchar(max) data type.

The flow of the application is as follows:

  1. After processing the claim, the application either approves or rejects the claim.
  2. Using classes in the System.Xml namespace, the XML document for the claim is generated.
  3. The generated XML document is sent to a stored procedure.
  4. The stored procedure inserts the XML document into the table.

The following code example generates an XML document using the claim details available with the system and inserts the XML document into the database.

using System;
using System.Xml;
using System.IO;
using System.Text;
using System.Data;
using System.Data.SqlClient;
namespace InsuranceClaim
{
    class Insurance
    {
        static void Main(string[] args)
        {
            Insurance.InsertInsuranceClaim();
        }
        static void InsertInsuranceClaim()
        {
        StringWriter strWriter = null;
     XmlWriter writer = null;
  XmlWriterSettings settings = null;
     SqlConnection connection = null;
     SqlCommand command = null;
     try
     {
strWriter = new StringWriter();
settings = new XmlWriterSettings();
//Use indenting for readability.
      settings.Indent = true;
      settings.Encoding = System.Text.Encoding.UTF8;
      writer = XmlWriter.Create(strWriter, settings);
      //Write the XML delcaration. 
      writer.WriteStartDocument();
      writer.WriteStartElement("InsuranceClaim");
      writer.WriteStartElement("ClaimInfo");
            writer.WriteElementString("ClaimID", "C1234");
            writer.WriteElementString("ClaimType", "3");
            writer.WriteStartElement("SettlementDetails");
            writer.WriteStartElement("PaymentDetails");
            writer.WriteElementString("PaidTo", "Jeff");
            writer.WriteElementString("Amount", "2000");
            writer.WriteElementString("Date", "05/12/2002");
            writer.WriteElementString("ApprovedBy", "Mike");
            writer.WriteEndElement();//End of PaymentDetails
            writer.WriteEndElement();//End of SettlementDetails
            writer.WriteEndElement();//End of ClaimInfo
            writer.WriteStartElement("DamageReport");
writer.WriteString("Minor accident occured on ");
      writer.WriteElementString("Address", "ABC Street, Sample City, Sample State");
      writer.WriteString(" due to ");
      writer.WriteElementString("Cause", "bad weather");
      writer.WriteString(" resulted in damage to ");
      writer.WriteElementString("DamagedItem", "Head Lights");
      writer.WriteElementString("DamagedItem", "Engine");
            writer.WriteEndElement();//End of DamageReport
            writer.WriteEndElement();//End of InsuranceClaim
      writer.WriteEndDocument();
      //Write the XML to file and close the writer.
      writer.Flush();
      connection = new SqlConnection();
      connection.ConnectionString = @"server=localhost;
      database=AdventureWorks;Integrated Security=SSPI;";
      command = connection.CreateCommand();
      command.CommandText = "InsertInsuranceClaim";
      command.CommandType = System.Data.CommandType.StoredProcedure;
command.Parameters.Add("@CustomerID", 
System.Data.SqlDbType.Char);
command.Parameters.Add("@Claim", 
System.Data.SqlDbType.VarChar);
      String xml = strWriter.ToString();
string strCustomerID = "1001";
            command.Parameters[0].Value = strCustomerID;
            command.Parameters[0].Size = strCustomerID.Length;
      command.Parameters[1].Value = xml;
      command.Parameters[1].Size = xml.Length;
      connection.Open();
      command.ExecuteNonQuery();
      connection.Close();
       }
       finally
       {
      if (connection.State == ConnectionState.Open)
         connection.Close();
      if (writer != null)
         writer.Close();
      if (strWriter != null)
         strWriter.Close();
       }
    }
    }
}

Here is the script that creates the database table:

CREATE TABLE [InsuranceClaim](
   [CustomerID] [char](4) NOT NULL,
   [Claim] [varchar](max) NOT NULL,
   [ModifiedDate] [datetime] NOT NULL DEFAULT (getdate())
)

The following stored procedure is used to insert the XML document into the database.

CREATE PROCEDURE [dbo].[InsertInsuranceClaim]
   @CustomerID [char](4),
   @Claim [varchar](max)
AS
BEGIN
       SET NOCOUNT ON;
   INSERT INTO [InsuranceClaim] ( CustomerID, Claim )
   VALUES ( @CustomerID, @Claim )
END;

SQLXML

Introduced with SQL Server 2000, SQLXML encompasses a whole range of features related to XML processing on the client side. It is a set of libraries and technologies enabling the seamless integration of relational data in SQL Server databases with XML describing relationally structured data.

In the pre-SQL 2000 days, the developer had to provide the code layer for interaction between the relational data and the data in XML format. With the advent of SQLXML, life was made easier, for it provided the link between relational data and XML. The topic of discussion in this paper is confined to the SQLXML Managed Classes. For more information on the applicability of other features of the library, see the SQLXML page on MSDN.

SQLXML comprises many of the features introduced in SQL Server to support XML. These features are:

  • Ability to transform the result of a query into XML on the client side
  • Ability to create an XML view of relational data by using an annotated XSD mapping schema file, which allows you to
    • Define an XPath query against the XML view
    • Perform an insert, update, or delete on data in the database using XML templates known as updategrams
    • Perform XML bulk load operations
  • Ability to access SQL Server using HTTP, which enables you to:
    • Specify SQL statements in the URL
    • Specify template queries in the URL
    • Specify template files in the URL
    • Specify XPath queries written against the annotated XSD mapping schema files in the URL
  • Ability to expose functionality offered by stored procedures, user-defined functions, and template queries as SOAP-based Web services
  • Ability to write code in the .NET Framework to take advantage of the XML features offered by SQLXML using SQLXML Managed Classes

Client-side XML formatting. Specifying a FOR XML clause on the client side causes the middle tier to perform the FOR XML transformation on the rowset returned by the server in response to a query. To perform XML formatting on the client side:

  • Set the ClientSideXml property of the SqlXmlCommand object to true if you are using SQLXML Managed Classes.
  • Set the ClientSideXML provider-specific property to True if you are using SQLXMLOLEDB Provider.
  • Specify client-side-xml="1" in the template, if you are using template queries.
  • Select the Run on the Client option on the virtual directory in the Settings tab if you are accessing SQL Server using HTTP.
  • The XML formatting modes that are valid with FOR XML on the client side are RAW, NESTED, and EXPLICIT. When RAW mode is used, the resulting XML document contains an XML element for each row in the query result and an attribute corresponding to each column in a row. When NESTED mode is specified, the base table names are returned as the element names of the resulting XML document. The EXPLICIT mode generates XML documents of any shape by allowing you to specify the format of the desired XML in the query itself.
  • XML views. XML views are created by using annotated XSD schemas that define the mappings between relational and XML data. These XML views can be queried using XPath queries. The relational data exposed through XML views can also be modified and the modifications can then be submitted to the database using updategrams. Furthermore, XML views are also useful for inserting huge XML documents into the database with the help of the COM-based XML Bulk Load object.
  • HTTP access to SQL Server. SQLXML provides a utility known as the IIS Virtual Directory Management utility, which can be used to set up an IIS virtual directory to expose the XML features of an SQL Server via HTTP. Support for specifying the SQL statements, stored procedures, templates queries, template files, and XPath queries directly at the URL is provided with the help of SQL ISAPI extension.
  • Web services support in SQLXML. Support for exposing the functionality of an SQL Server as a SOAP-based Web service was added in SQLXML 3.0. This feature allows the SQL Server to receive SOAP HTTP requests from clients to execute stored procedures, user-defined functions, and templates.
  • SQLXML Managed Classes. Access to SQLXML functionality in the .NET Framework is provided through SQLXML Managed Classes. There are three managed classes in SQLXML:
  • SqlXmlCommand - handles the database connectivity and query execution aspects
  • SqlXmlParameter - helps specify the parameters in queries
  • SqlXmlAdapter - facilitates interaction with the dataset in the .NET Framework

SQLXML Managed Classes allow you to:

  • Execute an SQL query with FOR XML clause
  • Execute an XPath query against a mapping schema
  • Execute a template query
  • Execute a template query file
  • Execute an updategram
  • Execute a DiffGram

Using SQLXML to expose relational data as XML documents is a suitable choice if:

  • Your application receives XML data, which is highly structured and maps well into relational tables.
  • Your application must load huge XML documents received from external applications into a database and persist them in relational format.
  • Your application does not have a requirement to preserve the document order.
  • Your application is required to present the same data to multiple data consumers in different formats.
  • Performance of DML operations is critical for your application.
  • Your application has a requirement to utilize the full potential of the optimizer for query optimization.
  • Your application performs fine-grained data manipulations.
  • Your application is required to expose existing relational data as XML.

Analyzing Scenarios

In the first data exchange scenario described in the XML usage scenarios (see the Scenario 2: Data Exchange Between Automobile Manufacturer and Parts Suppliers I section), the interaction between an automobile manufacturer and multiple parts suppliers presents a classic case for using SQLXML. The manufacturer has to communicate with different suppliers to exchange invoice data. The proposed solution uses a Web service and SQLXML to solve this problem. The manufacturer exposes a Web service that can be used by the suppliers to send invoices to the manufacturer. The Web service uses an XSLT that is specific to the customer to transform the invoice from the supplier format to a common format used by the manufacturer. The Web service then shreds the XML document using an XML view that maps the content of the invoice document to the columns of a relational table. The legacy invoice processing system will be able to pick the data from the relational table and proceed with processing. The advantages of using XML views for this scenario are as follows:

  • Low maintenance costs. Any changes made to the invoice schema by the supplier can be accommodated easily by modifying the XSLT file that is specific to the supplier.
  • The coding complexity is less compared to FOR XML (see the Server-Side Support for Relational/XML Integration (FOR XML/OPENXML) section).
  • New suppliers can be supported easily by creating an XSLT file specific to the supplier.

Benefits

The advantages of using SQLXML are summarized as follows:

  • Creating an annotated mapping schema to map relational data to the XML data is a relatively simple and more maintainable solution than the FOR XML EXPLICIT on the server side.
  • SQLXML offers the ability to create updatable bi-directional XML views compared to read-only XML representations of relational data created using FOR XML.
  • Mapping XSD provides the facility for accommodating the change requests in XML format without major code changes. This provides ease for maintenance.
  • SQLXML provides the ability to perform XML formatting on the client side by allowing the user to set the ClientSideXML property to true for the SqlXMLCommand class and thereby reducing the load on the server.

Limitations

On the downside, there are a few limitations when SQLXML is used from the client side:

  • XML views are not suitable for cases where the hierarchy of an XML document is too deep or recursive of unknown depth.
  • SQLXML is not suitable for descriptive documents containing mixed content markup and ordered data such as product catalogs, news reports, etc.
  • Because the document order is not preserved, it is difficult to reconstruct the original XML document.
  • While decomposing XML documents into relational tables gives excellent searching performance, the transformation to and from XML can be expensive.
  • When default mapping is used in XSD mapping schemas, it can expose the database table name and column information, which can lead to unintentional information leakage. This risk can be avoided by specifying explicit mapping for tables and columns.
  • SQL statements in the URL should only be used in trusted intranets. The use of such queries on the Internet can lead to potential security risks.

Example of Using SQLXML

With the overview of SQLXML complete, the next step is to delve into an example where SQLXML Managed Classes can be applied. Consider a simple example—exporting sales order details for a specific customer. The tables used for the example are available in the AdventureWorks database.

Data from the database has to be available in the client in XML format, which can be displayed by the presentation layer. Next you will see how the relational data in the SQL database can be manipulated as XML data using the SQLXML class. A mapping XML Schema is used to manipulate and map the XML node names to table fields. For more information on manipulating relational data using the SQLXML managed library, see the SQLXML page on MSDN.

The following annotated XSD schema defines the mappings between the relational tables [Sales.Customer], [Sales.SalesOrderHeader], [Sales.SalesOrderDetail] and the target XML representation of sales order details of a customer. Parent-child relationships in XML can also be defined using XSD mapping schemas as shown in the following XSD schema.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
<xsd:annotation>
  <xsd:appinfo>
    <sql:relationship name="CustomerOrderHeader"
        parent="Sales.Customer"
        parent-key="CustomerID"
        child="Sales.SalesOrderHeader"
        child-key="CustomerID" />
    <sql:relationship name="OrderHeaderOrderDetail"
        parent="Sales.SalesOrderHeader"
        parent-key="SalesOrderID"
        child="Sales.SalesOrderDetail"
        child-key="SalesOrderID" />
  </xsd:appinfo>
</xsd:annotation>
  <xsd:element name="Customer" sql:relation="Sales.Customer" >
   <xsd:complexType>
     <xsd:sequence>
        <xsd:element name="Order" sql:relation="Sales.SalesOrderHeader"  
            sql:relationship="CustomerOrderHeader" maxOccurs="unbounded" >
          <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="OrderDetail" 
                             sql:relation="Sales.SalesOrderDetail" 
                             sql:relationship="OrderHeaderOrderDetail" 
                             maxOccurs="unbounded" >
                  <xsd:complexType>
                    <xsd:attribute name="SalesOrderID" 
type="xsd:integer" />
                    <xsd:attribute name="ProductID" type="xsd:integer" />
                    <xsd:attribute name="OrderQty" type="xsd:integer" />
                  </xsd:complexType>
                </xsd:element>
              </xsd:sequence>
              <xsd:attribute name="SalesOrderID" type="xsd:integer" />
              <xsd:attribute name="CustomerID" type="xsd:integer" />
              <xsd:attribute name="OrderDate" type="xsd:date" />
              <xsd:attribute name="ShipDate" type="xsd:date" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="CustomerID" type="xsd:integer" />
     </xsd:complexType>
  </xsd:element>
</xsd:schema>
class ExportOrders
{
/// <summary>
///    This method use SqlXmlCommand class to select the records from 
///   Sales.Customer, Sales.SalesOrderHeader and Sales.SalesOrderDetail
///   tables.The data is fetched as such from server and formatted into
///   xml at client side. Note that the ClientSideXml is set to true.
/// </summary>
static void Main(string[] args)
{
   if (args.Length < 1)
   {
      Console.WriteLine("Usage");
      Console.WriteLine("CustomerOrders <CustomerID> [OrderID]");
      return;
   }

   try
   {
      StringBuilder strBuilder = new StringBuilder();
      strBuilder.Append("/Customer[@CustomerID='");
      strBuilder.Append(args[0]);
      strBuilder.Append("']");
      if (args.Length > 1)
      {
         strBuilder.Append("/Order[@SalesOrderID='");
         strBuilder.Append(args[1]);
         strBuilder.Append("']");
      }
SqlXmlCommand xmlCommand = new SqlXmlCommand(@"Provider=
SQLOLEDB; Server=localhost; database=AdventureWorks;
Integrated Security=SSPI;");
      xmlCommand.ClientSideXml   = true;
      xmlCommand.RootTag      = "CustomerOrders";
      xmlCommand.SchemaPath      = @"CustomerOrderDetails.xsd";
      xmlCommand.CommandType      = SqlXmlCommandType.XPath;
      xmlCommand.CommandText      = strBuilder.ToString();
      Stream reader         = xmlCommand.ExecuteStream( );
      FileStream fsOut = File.Create("CustomerOrder.xml");
      StreamWriter sw = new StreamWriter(fsOut);
      using (StreamReader sr = new StreamReader(reader))
      {
         sw.Write(sr.ReadToEnd());
      }
      sw.Flush();
      sw.Close();
      fsOut.Close();
   }
   catch (Exception exception)
   {
      Console.WriteLine( exception.ToString() );
   }
}
}

The method provided above exports the sales order details for a Customer ID specified as a command line argument to the application. The data is converted into XML format on the client side, thereby avoiding performance issues on the server side. Note that the annotated XSD schema mapping provided above has to be saved as CustomerOrderDetails.xsd for the above code fragment to work properly.

Note   The code that goes into retrieving the data as XML from the database is minimal as can be seen in the example.

Server-Side Support for Relational/XML Integration (FOR XML/OPENXML)

Support for returning the results of an SQL query as XML documents on the server side is provided in SQL Server using the FOR XML extension to the SELECT statement. On the other hand, the OPENXML keyword provides the ability to extract rowsets from an XML document.

FOR XML

Server-side FOR XML supports four XML transformation modes—RAW, AUTO, EXPLICIT, and PATH.

By default, RAW mode maps each row in the query result set to an XML element and each column in the row to an attribute. When the ELEMENTS option is specified with ROW mode, each column in a row is mapped to a subelement of the element that is generated for the row. It is also possible to request an inline schema for the XML that is generated by specifying XMLSCHEMA option.

AUTO mode provides support for generating nested XML elements where each table in the FROM clause, for which at least one column is listed in the SELECT clause, is mapped to an XML element and the columns listed in the SELECT clause are mapped to attributes (or subelements if the ELEMENTS option is specified) by default.

EXPLICIT mode provides the maximum control over the shape of the XML that is generated from a query result. It allows you to generate XML with any shape by specifying the format for the desired XML in the query itself.

Composing complex XML documents using the EXPLICT mode is cumbersome. Using the PATH mode together with the ability to write nested FOR XML queries and the TYPE directive to return XML type instances provides an alternative to writing complex EXPLICIT mode queries. The PATH mode maps the columns in the rowset returned by the SELECT query to attributes and subelements by interpreting the column names as XPath-like syntax. For more information on FOR XML enhancements in SQL Server 2005 see the What's New in FOR XML in Microsoft SQL Server 2005 white paper.

OPENXML

OPENXML, together with the sp_xml_preparedocument and sp_xml_removedocument system stored procedures, provides a relational rowset view of an XML document. In order to use OPENXML on an XML document, sp_xml_preparedocument must be used to create an in-memory representation of the XML document. This stored procedure parses the XML document using the MSXML parser and returns a handle to the XML document that can be used with OPENXML. Now, parameters such as XML document handle, rowpattern, which is an XPath expression that maps nodes of XML data to rows, rowset schema, and mapping between the rowset columns and the XML nodes can be passed to OPENXML to obtain the rowset. The XML document must be unloaded from memory using the sp_xml_removedocument stored procedure when it is no longer required.

Enhancements in FOR XML

FOR XML has been enhanced in SQL Server 2005 with the following features:

  • Ability to typecast the results of FOR XML using the new TYPE directive
  • Ability to assign the results of FOR XML to a variable of type XML
  • Ability to nest FOR XML queries to generate XML hierarchies
  • Ability to use the new PATH mode to generate complex XML documents
  • Ability to generate an inline schema in XDR or XSD format using the XMLDATA and XMLSCHEMA options, respectively
  • Ability to use the ELEMENTS directive with RAW mode to generate element-centric XML
  • Ability to use the XSINIL option with the ELEMENT directive to map NULL values to an element with an attribute xsi:nil="true"

Enhancements in OPENXML

In SQL Server 2005, OPENXML has been enhanced to support the following features:

  • Ability to pass XML type data to sp_xml_preparedocument
  • Ability to use new data types in the WITH clause

Using FOR XML and OPENXML to compose and decompose XML documents is a suitable choice if:

  • Your application has a requirement to store data relationally and expose this information to another application as XML.
  • Your application does not have the requirement to preserve the order of XML.
  • Your application performs heavy-element–level DML operations.
  • Your application requires fine-grained data access and update.
  • Your application requires you to expose the relational data through a Web service.

Analyzing Scenarios

The requirement in scenario 3 (see the Scenario 3: Data Exchange Between Automobile Manufacturer and Parts Suppliers II section) is to provide Web services that suppliers can use to obtain the status of an invoice or get a copy of payment instructions. FOR XML in conjunction with a Web service provides a solution that allows the manufacturer to expose these services on the Internet. The supplier uses a Web service to enquire about the status of his invoice. The Web service then uses the invoice ID provided by the supplier and generates the response in XML format from the relational data using a FOR XML statement. The generated XML document will be returned to the supplier. In the current scenario, using the FOR XML statement-based approach provides the following benefits:

  • FOR XML provides an easy way to dynamically compose simple XML documents from relational data.
  • Maintaining a FOR XML query is easy compared to an XML view when they are used to compose simple XML documents.

Benefits

Some of the benefits of using FOR XML/OPENXML are as follows:

  • FOR XML offers a simple way of generating XML from relational data on the server.
  • FOR XML provides the ability to expose business information through Web services.
  • OPENXML allows you to pass a rowset to a stored procedure in XML format which enables you to perform bulk INSERT, UPDATE, and DELETE operations with just one network roundtrip.
  • FOR XML together with XSL can be used for application integration or business integration.

Limitations

The limitations of using FOR XML/OPENXML for composing and decomposing XML documents are:

  • Shaping XML structures using FOR XML with the EXPLICT option is a difficult task.
  • It is difficult to maintain complex queries written using FOR XML EXPLICIT.
  • XML documents generated by FOR XML AUTO can expose the database table name and column information leading to inadvertent information disclosure. This situation can be avoided by specifying aliases for tables and columns.

Example of Using FOR XML and OPENXML

The following examples use the AdventureWorks database that comes with SQL Server 2005. Let us look at an example where FOR XML is used to obtain customer, order, and order detail information from the [Sales.Customer], [Sales.SalesOrderHeader], [Production.Product], and [Sales.SalesOrderDetail] tables for a specified range of customers.

Example: Using FOR XML

SELECT Cust.CustomerID, 
       OrderHeader.CustomerID,
       OrderHeader.SalesOrderID, 
       Detail.SalesOrderID, Detail.LineNumber,Detail.ProductID, 
       Product.Name,
       Detail.OrderQty
FROM Sales.Customer Cust, 
     Sales.SalesOrderHeader OrderHeader,
     Sales.SalesOrderDetail Detail,
     Production.Product Product
WHERE Cust.CustomerID = OrderHeader.CustomerID
AND   OrderHeader.SalesOrderID = Detail.SalesOrderID
AND   Detail.ProductID = Product.ProductID
AND   (Cust.CustomerID BETWEEN 44 AND 46)
ORDER BY OrderHeader.CustomerID,
         OrderHeader.SalesOrderID
FOR XML AUTO

Result of the query is as shown below:

<Cust CustomerID="44">
  <OrderHeader CustomerID="44" SalesOrderID="53575">
    <Detail SalesOrderID="53575" LineNumber="2" ProductID="952" OrderQty="2">
      <Product Name="Chain" />
    </Detail>
    <Detail SalesOrderID="53575" LineNumber="1" ProductID="969" OrderQty="1">
      <Product Name="Touring-1000 Blue, 60" />
    </Detail>
    <Detail SalesOrderID="53575" LineNumber="3" ProductID="972" OrderQty="1">
      <Product Name="Touring-2000 Blue, 54" />
    </Detail>
  </OrderHeader>
  <OrderHeader CustomerID="44" SalesOrderID="59024">

    <Detail SalesOrderID="59024" LineNumber="1" ProductID="972" OrderQty="3">
      <Product Name="Touring-2000 Blue, 54" />
    </Detail>
    <Detail SalesOrderID="59024" LineNumber="2" ProductID="957" OrderQty="2">
      <Product Name="Touring-1000 Yellow, 60" />
    </Detail>
  </OrderHeader>
</Cust>
<Cust CustomerID="46">
  <OrderHeader CustomerID="46" SalesOrderID="48354">
    <Detail SalesOrderID="48354" LineNumber="1" ProductID="730" OrderQty="1">
      <Product Name="LL Road Frame - Red, 62" />
    </Detail>
  </OrderHeader>
</Cust>

The following example extracts order details specified in the XML document using OPENXML and XPath expressions.

Example: Using OPENXML

DECLARE @XmlDocumentHandle int
DECLARE @XmlDocument nvarchar(max)
SET @XmlDocument = N'<ROOT>
<Cust CustomerID="44">
  <OrderHeader CustomerID="44" SalesOrderID="53575">
    <Detail SalesOrderID="53575" LineNumber="2" ProductID="952" OrderQty="2">
      <Product Name="Chain" />
    </Detail>
    <Detail SalesOrderID="53575" LineNumber="1" ProductID="969" OrderQty="1">
      <Product Name="Touring-1000 Blue, 60" />
    </Detail>
    <Detail SalesOrderID="53575" LineNumber="3" ProductID="972" OrderQty="1">
      <Product Name="Touring-2000 Blue, 54" />
    </Detail>
  </OrderHeader>
  <OrderHeader CustomerID="44" SalesOrderID="59024">
    <Detail SalesOrderID="59024" LineNumber="1" ProductID="972" OrderQty="3">
      <Product Name="Touring-2000 Blue, 54" />
    </Detail>
    <Detail SalesOrderID="59024" LineNumber="2" ProductID="957" OrderQty="2">
      <Product Name="Touring-1000 Yellow, 60" />
    </Detail>
  </OrderHeader>
</Cust>
<Cust CustomerID="46">
  <OrderHeader CustomerID="46" SalesOrderID="48354">
    <Detail SalesOrderID="48354" LineNumber="1" ProductID="730" OrderQty="1">
      <Product Name="LL Road Frame - Red, 62" />
    </Detail>
  </OrderHeader>
</Cust>
</ROOT>'
-- Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @XmlDocumentHandle OUTPUT, @XmlDocument
-- Execute a SELECT statement using OPENXML rowset provider.
SELECT *
FROM OPENXML (@XmlDocumentHandle, '/ROOT/Cust/OrderHeader/Detail',2)
WITH (CustomerID  varchar(10) '../@CustomerID',
      OrderID     int         '../@SalesOrderID',
      LineNumber  int         '@LineNumber',
      ProductID   int         '@ProductID',
      Quantity    int         '@OrderQty')
-- Remove the internal representation.
EXEC sp_xml_removedocument @XmlDocumentHandle

Result of the query is as follows:

--------------------------------------------------------
CustomerID   OrderID   LineNumber   ProductID   Quantity
--------------------------------------------------------
44      53575      2      952      2
44      53575      1      969      1
44      53575      3      972      1
44      59024      1      972      3
44      59024      2      957      2
46      48354      1      730      1
--------------------------------------------------------

XML Data Type in SQL Server 2005

The hierarchical nature of XML data makes it hard to model it as relational data as the structure of the data becomes complex (e.g., depth of hierarchy increases). Moreover, when XML data is mapped to relational data, the order of elements in XML instances is also not preserved and there is a significant cost involved in composing the original XML document from decomposed relational data. The limitations of the relational model to store XML data make it ideal to store XML instances natively. Native XML instances do not suffer from the limitations of relational models and offer features such as the ability to handle hierarchical or nested data, the ability to preserve the order of elements, a straightforward way to store and retrieve XML data, the flexibility to support multiple schemas, and so on.

Microsoft SQL Server 2005 provides extensive support for XML data processing. With SQL Server 2005, XML values can be stored natively in an XML data type column, which can be typed according to a collection of XML schemas, or can be left untyped. Fine-grained data manipulation is supported using XQuery and XML DML, the latter being an extension for data modification. Furthermore, the XML column can be indexed for improved query performance.

Typed XML

Typed XML is ideal for situations where you have XML schemas describing your XML data. In such cases, you can associate a collection of XML schemas with the XML column to yield typed XML. Validation on XML type columns is performed based on the XML schema collection associated with the column. Additionally, the performance of queries that involve typed XML data is better compared to untyped XML data, as it requires no runtime conversion of node values.

Untyped XML

Using untyped XML is suitable if you have schemas, but you do not want the server to validate the data, or if no schema is available. You may want to store untyped XML even when a schema is present if an application:

  • Has no fixed schema
  • Performs the validation on the client side before storing the data at the server
  • Temporarily stores XML data that is invalid according to the schema
  • Uses schema components that are unsupported at the server (e.g., key/keyref)

Untyped XML documents are checked to ensure that they are well formed, even though they are not associated with any schema. It is important to note that untyped XML imposes a performance penalty due to the runtime conversion of node values, as the node values are stored internally as Unicode strings.

Usage Scenarios of XML Data Type

With the new XML data type in SQL Server 2005, it is now possible to:

  • Create a table with one more columns of type XML in addition to the relational columns.
  • Create a typed XML column type by associating it with an XML schema collection.
  • Create a constraint on an XML column involving other XML or non-XML type columns to enforce business rules.
  • Create a variable of type XML that can be used to store an instance of an XML data type.
  • Create a parameter of type XML to a stored procedure or user-defined function.
  • Return an XML type value from a user-defined function.
  • Assign the result of a FOR XML query obtained using the new TYPE directive to a variable of type XML.
  • Run a subset of XQuery to query into the XML structure and transform XML data.
  • Create a computed column based on a column of type XML.
  • Create an XML index on a column of type XML to improve the performance of queries.
  • Perform element-level insert, delete, and update operations on XML instances using XML DML.
  • Pass an instance of XML type data to sp_xml_preparedocument to prepare an in-memory representation of the XML document.
  • Write cross-domain queries that will include both relational and XML columns using XQuery and XML DML.
  • Cast or convert the XML type to varchar or nvarchar types using CAST and CONVERT, respectively.
  • Convert or cast string data types such as [n]varchar, [n]text, varbinary, and image into XML type using CAST or CONVERT.

XML Data Type Methods and XML DML

Queries and manipulation on XML data type columns are supported through five methods. Fragments of an XML document can be extracted using the query() method of XML data type. The query() method accepts an XQuery expression as an argument and returns an untyped XML instance. Scalar values can be extracted from an XML instance using the value() method by specifying an XQuery expression and the desired SQL type to be returned. Existential checks on an XML instance can be performed using the exist() method. Decomposing an XML document into relational data is facilitated with the help of the nodes() method.

Data manipulation operations can be performed on an XML instance using the modify() method. Support for XML DML is provided through insert, delete, and update keywords added to XQuery. One or more nodes can be inserted, deleted, and updated using the insert, delete, and update keywords, respectively.

XML Indexing

The parsing and shredding involved in query processing operations on XML data type columns can consume a prohibitively long time if the size of the XML instance is very big. Query performance on the XML data type can be improved by creating indexes on these columns. The size of the XML data and usage scenario play an important role in dictating the type of index that is required. SQL Server supports two types of indexes—primary XML indexes and secondary XML indexes; the latter cannot exist without the former.

The creation of a primary XML index on an XML column shreds the XML BLOB and stores these values in an internal table. This results in improved performance during query execution time as there is no shredding involved at runtime. Depending on the usage scenario, query performance can be further improved by creating secondary XML indexes. Three types of secondary XML indexes—PATH, PROPERTY, and VALUE—can be created to improve the performance of queries based on path, property, and value respectively. More information on selecting an appropriate secondary index for an XML type column is available in the Performance Optimizations for the XML Data Type white paper.

Storing XML documents as an XML data type is a suitable choice if:

  • Your application has a requirement to preserve the Infoset content of XML instances. The Infoset content of an XML document includes document hierarchy, order of elements, values of elements and attributes, and so on. Information such as order of attributes, prefixes of namespaces, insignificant white spaces, and XML declaration is not preserved.
  • Your application requires element-level modify and query operations on XML documents.
  • Your application needs indexes on XML data type columns to speed up query processing.
  • Your XML data may or may not have a schema.
  • Your application uses XML documents with a wide range of structures, or XML documents conforming to different or complex schemas that are too hard to map to relational structures.

Analyzing Scenarios

Analyzing Scenario: Content Management System

Now, let us analyze the content management system described in the XML usage scenarios (see the Scenario 4: Content Management System section). Publishing companies deal with information in various forms including text, images, audio, video, etc. Blocks of information that can be used independently are collected from various sources and maintained in a database. These blocks of information are known as components. By assembling individual components, documents are created. Which components are included in a document depends on the requirements of the users. These documents will be delivered to the subscribed users through various channels. Content management systems typically have requirements such as the ability to store, retrieve, search, and update the content with high performance and scalability.

XML as a unified data model provides an attractive option for storing both XML data and XML content in the same document. XML also provides the ability to separate the presentation from the data itself, which is important because the presentation of the same information to different users may differ. The native XML data type offered by SQL Server 2005 can be used to meet the requirements of this content management system. With the XML data type, it is possible to store XML documents, modify XML documents at element-level using XML DML, and perform queries on XML documents using XQuery.

Analyzing Scenario: Customer Survey

In the customer survey XML usage scenarios (see Scenario 5: Customer Survey section), the key requirement is the ability to store the survey information with multiple schemas. Data that does not have a fixed schema cannot be modeled using a single relational table. A relational table with an XML column provides an excellent option for storing such information. An additional column can be added to the relational table to store the type of survey. Information corresponding to a particular type of survey can be analyzed by fetching all records for the survey type using the survey type column. In a typical survey, customers usually don't answer all the questions. Therefore, instead of creating several columns (one for each question in the survey) and storing NULL values in the database for unanswered questions, storing the per-customer survey information as an XML document in one column would be beneficial. Storing the customer survey information as an XML type column is more appropriate for this scenario because:

  • Using XML type columns allows the survey information with different schemas to be stored in a single XML type column. Associating the XML type column with an XML schema collection allows the user to store data for multiple survey types.
  • Validation of survey information can be performed by the user interface and there is no need to store this as a typed XML data type column to enforce validation at the database level.
  • XQuery can be used to perform analysis on the data.

Benefits

The benefits of storing XML data in an XML data type column can be summarized as follows:

  • The XML data type offers a simple and straightforward way of storing your XML data at the server while preserving document order and document structure. This is especially true in the case of documents where the document order and document structure are very important. Assume a sample scenario where your application gets an XML document from some data source and your application is supposed to store the document. Storing it in an nvarchar or text column does not guarantee well-formed XML and does not provide easy access to its content. In such cases, it is appropriate to store the incoming XML document natively in the XML column.
  • The XML data type provides the ability to perform fine-grained query and modify operations on XML data. Prior to SQL Server 2005, there was no way to store XML natively in the database. Hence, if you wanted to modify or query XML data, the data would have to be loaded from the nvarchar or text column, an XML document created out of the string, and then modified. Similar steps were required to write the modified data back to the database. Now with the XML data type, these steps become much easier.
  • With the XML data type, it is possible to create indexes on XML data type column for faster query processing.
  • When the XML data type is used, you can use XML schema collection and constraints on XML data to enforce business rules. The XML schemas are used to validate the data, add type-based operational semantics, perform more precise type checks than untyped XML during compilation of query and data modification statements, and optimize storage and query processing.
  • As the XML type data is stored in the database, it is included in a variety of database activities such as backup and restore, SQL Server security, transactions, logging, and so on.

Limitations

The following are some of the limitations that one must be aware of when working with the new XML data type:

  • An exact copy of the data is not stored. Insignificant white spaces, namespace prefixes, order of attributes, and XML declaration are not preserved.
  • The maximal depth of hierarchy in an XML document is 128.
  • The maximal size of the internal binary representation of an XML document is 2GB.
  • XML instances cannot be compared. Thus:
    • An XML column cannot be part of a primary or foreign key constraint.
    • An XML column cannot be used in a GROUP BY statement as a grouping value.
  • XML cannot be cast to text, ntext, and image data types as these types are deprecated in SQL Server 2005. However, the XML data type can be cast to [n]varchar and [n]varbinary types.

Example of Using the XML Data Type

The sample application uses the Sales.Store table from the AdventureWorks database. The Sales.Store table contains CustomerID as the primary key and Demographics as the XML column. The Demographics column contains the Store Survey information. The information that gets stored as part of Store Survey is optional. This means that the Demographics column may or may not contain all the elements. If the same information were stored in a relational format, these elements would need to be created as columns of the table. Since most of the Store Survey information is optional, these columns will contain NULL values for most of the data. This would lead to a waste of table space. To avoid such waste, the Store Survey information in the Demographics column is stored in XML format. The Demographics column contains the sales information, viz., Annual Sales, Annual Revenue, Bank name, etc. per customer. These fields are stored as XML elements in the data.

The sample application performs the following functions:

  • Display the list of all the customers along with their demographic information.
  • Display the demographic information for a particular customer.
  • Insert a new customer in the Sales.Store table along with the demographic information.
  • Modify some elements (such as Annual Sales, Annual Revenue, etc.) of the demographic information for a given customer.
  • Delete the demographic information for a given customer.

The application uses the System.Data.SqlTypes.SqlXml class to retrieve the data from an XML column. The SQLXML class is a direct mapping to the XML column.

Using the SQLXML class, it is possible to directly retrieve the data from the XML column without any mappings or conversions.

Here, let us look at an example of how to retrieve the Annual Revenue element for the customer ID 12. The following code example illustrates the second function mentioned above.

Public void RetreiveAnnualRevenue ()
{
SqlConnection conn = new SqlConnection();
conn.ConnectionString = @"Server=localhost; Database=AdventureWorks;
integrated security=SSPI";
   conn.Open();
   SqlCommand command = conn.CreateCommand();
   command.CommandText = @"select Demographics.query(
'declare namespace SS="https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey"
<StoreInfo>

   <AnnualRevenue>
{data(/SS:StoreSurvey/SS:AnnualRevenue)}
   </AnnualRevenue>
</StoreInfo>') as
Result from Sales.Store where CustomerID=12";
   SqlDataReader datareader = command.ExecuteReader();
System.Text.StringBuilder builder = new System.Text.StringBuilder();
   While (datareader.Read())
   {
      SqlXml sqlxml = datareader.GetSqlXml(0);
      builder.Append(sqlxml.Value);
   }
   //Note: xml1 is a XML web control
   this.xml1.DocumentContent = builder.ToString();
this.xml1.TransformSource = @"StoreInfo.xslt";
}

The above method creates an instance of the SqlConnection object to the AdventureWorks database. The command text property contains the query to retrieve the value of Annual Revenue field for the customer ID 12.

This is an XQuery that will get directly executed on the XML column. The return value of the query will be an XML fragment, which is then mapped to the SQLXML class. The XML fragment can then be retrieved using the Value property of the SQLXML class.

The retrieved XML fragment is then displayed in the client application, using an XML Web server control.

Comparison of Different Approaches

Feature XML classes in the .NET Framework FOR XML / OPENXML SQLXML XML Data Type
Code complexity High. There are no classes that directly map between XML data and relational data. Medium. Writing queries using FOR XML EXPLICIT is difficult. Low. The classes provide a mechanism to manipulate the relational data as XML data and the updategrams provide a facility to update the records as well. Low. Since XML data is stored as such in the column, complexity is less. Besides, Visual Studio 2005 provides classes to manipulate the XML data type. XML DML can be used for modifying the XML data.
Maintainability Complicated. Changes in the fields of the table or XML require code change. Difficult. Changes in the fields of the table or XML require changes to the query. Easy. In most cases, modifying the mapping XSD files can accommodate changes. Easy. XQuery provides easy syntax for querying the XML columns in the database.
Installation Requires no special installation other than the .NET Framework. No special installations required. Requires that SQLXML library is installed on the client machines. No special installation required.
Security Secure to a very good extent, since the data type and formats are not exposed on the client side in normal case. Secure if proper care is taken to avoid disclosure of table names and column names. The design has to secure the mapping XSD file if it is stored client side instead of mid-tier. Secure
Support for .NET Compact Framework Limited support. XmlDataDocument is not supported in the Microsoft .NET Compact Framework. Supported Not supported Not supported. If XML data type columns from SQL Server are synced to SQL Server mobile, it will be converted to ntext.
Data Validation Can be enforced by the client and the server. Can be enforced by the server. Can be performed by the client. Can be enforced by the server using XML Schema.
Data Storage [n]varchar(max), XML, or varbinary(max) Relational tables (can use XML as a field). Relational tables (can use XML as a field). XML data type
Fidelity Textual fidelity (preserves the XML data at byte level) Relational fidelity (preserves the hierarchical structure of the data, but the order among elements is ignored) Relational fidelity InfoSet fidelity (preserves the InfoSet content of the XML data)
Data access and update in store Supports updates at document level. Supports fine-grained data access and updates. Supports fine-grained data access and updates. Supports fine-grained data access and updates.

Conclusion

This paper provides the user with the different options for handling XML in SQL Server 2005. The System.Xml namespace, SQLXML, and the XML data type are discussed with the relative benefits and limitations of each, and include sample scenarios. The performance exhibited in ideal scenarios enables the user to choose the appropriate XML option for the user's application.

For More Information

Microsoft SQL Server 2005 Web site

XML Support in Microsoft SQL Server 2005

XML Best Practices for Microsoft SQL Server 2005

What's New in FOR XML in Microsoft SQL Server 2005

Performance Optimizations for the XML Data Type