Shankar Pal, Mark Fussell, and Irwin Dolobowsky
Microsoft Corporation
December 2005
Applies to:
Microsoft SQL Server
Microsoft .NET Framework 2.0
Summary: This paper provides an overview of the XML support that is built into Microsoft SQL Server 2005. It includes a discussion of how XML integrates with client-side programming support in both the .NET Framework 2.0 and native code such as OLEDB and SQLXML. (40 printed pages)
Click here to download the Word document of this article, XML_Support_in_SQL.doc.
Contents
Introduction
Motivating Scenarios for XML Storage
Server-Side XML Processing in SQL Server 2005
Client-Side XML Processing in SQL Server 2005
Conclusion
References
Introduction
eXtensible Markup Language (XML) has been widely adopted as a platform-independent format for data representation. It is useful for exchanging information among loosely coupled, disparate systems, such as in business-to-business (B2B) applications and workflow situations. Data interchange has been a major driver of XML technologies.
XML is increasingly present in enterprise applications that are used for modeling semi-structured and unstructured data. One such application is document management. Documents (e-mail messages, for example) are semi-structured by nature. If documents are stored inside a database server as XML, powerful applications can be developed such as:
- Applications that retrieve documents based on their content.
- Applications that query for partial content, such as finding the section whose title contains the word "Background."
- Applications that aggregate documents.
Such scenarios are becoming feasible with the increase in the development and availability of applications that generate and consume XML. For example, the Microsoft Office 2003 System allows users to generate Microsoft Word, Excel, Visio, and Infopath documents as XML markup.
Why Use Relational Databases for XML Data?
Storing XML data in a relational database provides benefits in the areas of data management and query processing. Microsoft SQL Server provides powerful query and data modification capabilities over relational data. In SQL Server 2005, these capabilities are extended to querying and modifying XML data. This allows your company to leverage investments made over past releases, such as investments in the areas of cost-based optimizations and data storage. For example, indexing techniques in relational databases are well-known. These have been extended to indexing XML data so that queries can be optimized using cost-based decisions.
XML data can interoperate with existing relational data and SQL applications. This means that XML can be introduced into the system as data modeling needs arise without disrupting existing applications. The database server also provides administrative functionality for managing XML data (for example, backup, recovery, and replication).
Native XML support within SQL Server 2005 is necessary to address increasing XML usage. Enterprise application development benefits from the XML support in SQL Server 2005.
The following sections give an overview of XML support in SQL Server 2000 and 2005, describe some of the motivating scenarios for XML usage, and include detailed discussions of the server-side and client-side XML feature sets.
XML Support in SQL Server 2000
This section provides a brief, high-level overview of the XML support in SQL Server 2000 and subsequent Web releases of the SQLXML client-side programming platform that provides rich support for mapping data back and forth between the relational and the XML data models.
Server-Side Support
At the server, XML data is generated from tables and query results by using a FOR XML clause in a SELECT statement. This is ideal for data interchange and Web service applications. The converse of FOR XML is a relational rowset generator function named OpenXML; it extracts values from the XML data into columns of a rowset by evaluating XPath 1.0 expressions. OpenXML is used by applications that shred incoming XML data into tables or for querying by using the Transact-SQL language.
Client-Side Support
The client programming support for SQL Server 2000 is referred to as SQLXML. (For more information, see SQLXML on the Microsoft Developer Network (MSDN)). At the center of this technology is XML view, which is a bi-directional mapping of an XML Schema to relational tables. SQL Server 2000 supports only the mapping of XDR schemas, though support for XSD was added in later Web releases. The XML view allows querying by using a subset of XPath 1.0 where the mapping is used to translate the path expressions into SQL queries on the underlying tables, and the query results are packaged into XML results.
SQL XML also enables you to create XML templates that you can use to create an XML document that has dynamic sections. Within the XML document, you can embed FOR XML queries and XPath 1.0 expressions over mapping queries. When the XML template is executed, the query block is replaced with the result of the query. In this way you can create XML documents that include both some static content and some dynamic content that is data-driven.
In SQL Server 2000, there are two main ways to access SQLXML functionality.
- SQLXMLOLEDB Provider—The SQLXMLOLEDB Provider is an OLE DB provider that exposes Microsoft SQLXML functionality through ADO.
- HTTP Access—SQLXML functionality in SQL Server 2000 can also be accessed via HTTP by using the SQLXML ISAPI filter. By using our configuration tool, you can set up a Web site to receive incoming requests to execute the XML templates, and the FOR XML and XPath 1.0 statements over XML views using HTTP.
Limitations in XML Support
The server and client programming platforms provide rich support for the generation and consumption of XML data based on mapping between tabular and XML data. This handles fairly structured XML data well. In SQLXML, the query language is a subset of XPath 1.0 and has some limitations. For example, the descendant axes (//-operator) are not supported. Consequently, restrictions exist that affect the development of certain solutions. For example, XML document order is not preserved, which is so crucial for applications such as document management. Furthermore, recursive XML schemas are not supported. In spite of these limitations, client SQLXML and server XML functionality have been widely used in application development. SQL Server 2005 addresses many of these limitations, extends the relational-XML interchange features, and provides native XML support.
Overview of XML Support in SQL Server 2005
This section provides a brief, high-level overview of the new XML support in SQL Server 2005, which is complemented by support in the .NET Framework 2.0 and by native client data access such as OLE DB.
XML Data Type
The XML data model has characteristics that make it very hard if not practically impossible to map to the relational data model. XML data has a hierarchical structure that may be recursive; relational databases provide weak support for hierarchical data (modeled as foreign key relationships). Document order is an inherent property of XML instances and must be preserved in query results. This is in contrast with relational data, which is unordered; order must be enforced with additional ordering columns. Re-assembling the result during querying is costly for realistic XML schemas that decompose the XML data into a large number of tables.
SQL Server 2005 introduces a native data type called XML. A user can create a table that has one or more columns of type XML in addition to relational columns. XML variables and parameters are also allowed. XML values are stored in an internal format as large binary objects (BLOBs) in order to support the XML model characteristics, such as document order and recursive structures, more faithfully.
SQL Server 2005 provides XML schema collections as a way to manage W3C XML Schemas as metadata. An XML data type can be associated with an XML schema collection to enforce schema constraints on XML instances. When the XML data is associated with an XML schema collection, it is called typed XML; otherwise it is called untyped XML. Both typed and untyped XML are accommodated within a single framework, the XML data model is preserved, and query processing enforces XML semantics. The underlying relational infrastructure is used extensively for this purpose. It supports interoperability between relational and XML data, thereby making way for more widespread adoption of the XML features.
XML Data Type Query and Data Modification
XML instances can be retrieved by using a Transact-SQL SELECT statement. Five built-in methods on the XML data type are provided to query and modify XML instances.
The XML data type methods accept XQuery. XQuery is an emerging W3C standard language (currently in Last Call). It includes the navigational language XPath 2.0. A language for modifying XML data, such as adding or deleting subtrees and updating scalar values, is also available. Together with a large set of functions, embedded XQuery and data modification languages provide rich support for manipulating XML data.
The XQuery type system is aligned with that of W3C XML schema types. Most of the SQL types are compatible with the XQuery type system (for example, decimal). A handful of types (for example, xs:duration) are stored in an internal format and suitably interpreted to be compatible with the XQuery type system.
The compilation phase checks static type correctness of XQuery expressions and data modification statements, and uses XML schemas for type inferences in the case of typed XML. Static type errors are raised if an expression could fail at run time due to a type safety violation.
XML Indexing
Query execution processes each XML instance at run time. This becomes expensive whenever the XML value is large in size or the query is evaluated on a large number of rows in a table. Consequently, a mechanism for indexing XML columns is provided to speed up queries.
B+trees have been extensively used for indexing relational data. A primary XML index on an XML column creates a B+tree index on all tags, values, and paths of the XML instances in the column. The primary XML index provides efficient evaluation of queries on XML data and reassembly of the XML result from the B+tree, while preserving document order and document structure.
Secondary XML indexes can be created on an XML column to speed up different classes of commonly occurring queries: PATH index for path-based queries, PROPERTY index for property bag scenarios, and VALUE index for value-based queries.
XML Schema Processing
XML columns, variables, and parameters can optionally be typed according to a collection of XML schemas that may be related (for example, by using <xs:import>) or unrelated to one another. Each typed XML instance specifies the target namespace from the XML schema collection it conforms to. During data assignment and modification, the database engine validates the instance according to the XML schema.
XML schema information is used in storage and query optimizations. Typed XML instances contain typed values in the internal, binary representation as well as in XML indexes. This provides efficient processing of typed XML data.
Relational and XML Integration
Users can store both relational and XML data within the same database. Briefly, the database engine knows how to honor the XML data model in addition to the relational data model. Relational data and SQL applications continue to behave correctly upon upgrade to SQL Server 2005. XML data residing in files and text or image columns can be moved into XML data type columns at the server. The XML column can be indexed, queried, and modified by using the XML data type methods.
The database leverages existing relational infrastructure and engine components such as the storage engine and the query processor for XML processing. For example, XML indexes create B+trees and query plans can be viewed in Showplan output. Because data management functionality, such as backup/restore and replication, is integrated into the relational framework, this functionality is available on XML data. In addition, the new data management features, such as database mirroring and snapshot isolation, work with the XML data type to provide a seamless user experience.
Structured data should be stored in tables and relational columns. The XML data type is a suitable choice for semi-structured and markup data using XML when the application needs to perform fine-grained query and modification of the data.
FOR XML and OpenXML enhancements
The existing FOR XML functionality has been enhanced in several ways. It works over XML data type instances and other new SQL types such as [n]varchar(max). For more information about FOR XML enhancements, see What's New in FOR XML in Microsoft SQL Server 2005 on MSDN.
The new TYPE directive generates an XML data type instance that can be assigned to an XML column, variable, or parameter, or queried by using XML data type methods. This allows the nesting of SELECT… FOR XML TYPE statements.
The PATH mode allows users to specify the path in the XML tree where a column's value should appear and—together with the aforementioned nesting—is more convenient to write than FOR XML EXPLICIT.
The directive XSINIL, used in conjunction with ELEMENTS, maps NULL to an element with the attribute xsi:nil="true". Also, the new ROOT directive allows a root node to be specified in all modes of FOR XML. The new XMLSCHEMA directive generates an XSD inline schema. FOR XML in SQL Server 2005 also allows users to specify element names to replace the default <row> in FOR XML RAW mode.
OpenXML functional enhancements consist of the ability to accept the XML data type in sp_preparedocument, and to generate XML and new SQL type columns in the rowset.
Client Access to XML Data Type
Clients can access XML data in the server in several ways. Native SQL client access by using ODBC and OLE DB delivers XML data as a Unicode string. OLE DB also provides ISequentialStream access to XML data for streaming Unicode data.
Managed access via ADO.NET in the .NET Framework 2.0 delivers XML data as a new class called SqlXml. It supports a method named CreateReader() that returns an XmlReader instance to read the returned XML. Equally, DataSet can load instances of the XML data type into columns on the mid-tier that can be edited as an XML document and saved back to SQL Server. Both of these enable SQL queries to be issued to the server to retrieve XML columns for manipulation on the mid-tier.
SOAP access directly to HTTP endpoints in SQL Server 2005 can be used to query, retrieve, and modify XML data.
Both native and managed client technologies provide new interfaces for retrieving the XML schema collection that types an XML column.
Motivating Scenarios for XML Storage
XML data is becoming more pervasive. It can represent customer data, with or without XML schemas that describe the data. The XML data and the XML schemas must be managed together. Quite often, the XML schemas for realistic applications are complex. Mapping such XML schemas to tables and columns is a complex task. Maintaining this mapping over time, when XML schemas change or new ones are added to the system, is troublesome. Quite often, XML data is stored in the file system or in text columns at the database server. Using a text column provides the benefits of data management capabilities, such as replication and backup/restore, but it does not allow query support based on the XML structure of the data. With native XML support, application development using XML becomes faster.
Custom Property Management
Some applications, such as user interface software, allow users to choose from a fixed set of properties. Others allow users to define their own custom properties of interest. These custom properties can be managed nicely if they are stored in XML format. Applications can support more types of properties than just scalar properties. They can also support:
- Multi-valued properties on objects, such as multiple phone numbers.
- Complex properties; for example, the author property of a document might be the author's contact information.
Object properties can be stored in an XML data type column and indexed for efficient query processing.
Data Exchange and Workflow
XML allows a platform-independent way of exchanging data among applications. The data can be modeled as messages with XML markup. Instead of constantly shredding and generating XML messages, it is prudent to store messages in XML format. This fits well with data flow requirements. An XML message reaching a workflow stage carries the current state. Each message is processed, the progress is recorded in the XML content (for example, state change), and the XML data is forwarded to the next stage of workflow processing. Because messages might be of different types or even semi-structured, and have different XML schemas associated with them, mapping them to tables is not always an easy task.
XML-based standards are emerging for different, vertical domains, such as for financial and geo-spatial data. These standards describe the structure of the data, based on which instance data can be queried and updated. Quite often, the actual data is in binary form, while the XML data provides metadata information regarding them.
As a simple example, to pass a table of input parameters to a stored procedure or function, an application converts the data to XML and passes it in as an XML data type parameter. Within the stored procedure or function, the rowset is regenerated from the XML parameter.
Document Management
Suppose a call center maintains patient records and conversations as an XML document. When a patient calls in, the call center wants to recall the previous conversation to set the context of the incoming call. This is possible by querying XML markup, which benefits the application. Furthermore, viewing the details of conversations that occurred on earlier occasions and recording the current conversation are facilitated.
Documents, such as e-mail messages, are semi-structured in nature. Documents with XML markup are becoming easier to create, for example, with Microsoft Office 2003. These XML documents can be stored in XML columns, indexed, queried, and updated. Thus, developers can do more with native XML support.
Server-Side XML Processing in SQL Server 2005
SQL Server 2005 support consists of providing one database in which you can store both relational and XML data.
The XML Data Type
You can create a table with an XML column by using the usual CREATE TABLE statement. The XML column can then be indexed in a special way.
Untyped XML
The SQL Server 2005 XML data type implements the ISO SQL-2003 standard XML data type. As such, it can store not only well-formed XML 1.0 documents but also so-called XML content fragments with text nodes and an arbitrary number of top-level elements. Checks for well-formedness of the data are performed, which does not require the XML data type to be bound to XML schemas. Data that is not well-formed is rejected.
Untyped XML is useful when the schema is not known a priori so that a mapping-based solution is not possible. It is also useful when the schema is known but mapping to a relational data model is very complex and hard to maintain, or multiple schemas exist and are late bound to the data based on external requirements.
Example: Untyped XML Column in Table
The following statement creates a table named docs with an integer primary key named pk and an untyped XML column named xCol.
|
CREATE TABLE docs (pk INT PRIMARY KEY, xCol XML not null)
|
A table can also be created with more than one XML or relational column with or without a primary key.
Typed XML
If you have XML schemas that describe your XML data in an XML schema collection, you can associate the XML schema collection with the XML column to yield typed XML. The XML schemas are used to validate the data, perform more precise type checks than untyped XML during compilation of query and data modification statements, and optimize storage and query processing.
Type information is stored in both the XML BLOB and the XML indexes, and uses more space for XML BLOBs. Typed XML yields better performance with value-based queries (that is, where the search value is more selective than, say, the path in which the value occurs) by avoiding run-time value conversions so that index seeks into the XML indexes become possible.
Typed XML columns, parameters, and variables can store XML documents or content, which you can specify as an option (DOCUMENT or CONTENT, respectively, with CONTENT as the default) at the time of declaration. Furthermore, you must provide the collection of XML schemas. Specify DOCUMENT if each XML instance has exactly one top-level element; otherwise, use CONTENT. The query compiler uses the DOCUMENT flag in type checks to infer Singleton top-level elements.
Example: Typed XML Column in Table
XML columns, variables, and parameters can be bound to a collection of XML schemas. (For more details and examples, see XML schema processing later in this paper.) Suppose myCollection is the name of one such collection. The following statement creates a table named XmlCatalog with an XML column Document typed using myCollection. The typed XML column is also specified to accept XML fragments, not just XML documents.
|
CREATE TABLE XmlCatalog (
ID INT PRIMARY KEY,
Document XML(CONTENT myCollection))
|
Constraining XML Data Type Columns
In addition to typing an XML column, you can use relational (column or row) constraints on typed and untyped XML data type columns. Most SQL constraints are applicable to XML columns as well. The notable exceptions to this are unique, primary key, and foreign key constraints, since XML data type instances cannot be compared. Thus, you can specify an XML column to be nullable or non-nullable, supply a default value, and define CHECK constraints on the column. For example, an untyped XML column can have a CHECK constraint to verify that the stored XML instances confirm to an XML schema.
Use constraints under the following conditions:
- Your business rules cannot be expressed in XML schemas. For example, the delivery address of a flower shop must be within 50 miles of its business location. This can be written as a constraint on the XML column. The constraint may involve XML data type methods.
- Your constraint involves other XML or non-XML columns in the table. An example is the enforcing the constraint so that the value of Customer ID (that is, the result of the path expression /Customer/@CustId) in an XML instance equals the value of an integer in the CustomerID column.
- You want to limit an XML schema collection to the top-level elements or the schema namespaces that are to be permitted in a typed XML column. This is useful when the same XML schema collection is used to type multiple XML columns and each XML column should contain a different top-level element. It is also useful when the insertion of XML instances conforming to only the latest version of an XML schema is allowed.
Example: Constraining XML Column
To ascertain that the <last-name> of an <author> of a <book> is different from the <author>'s <first-name>, the following CHECK constraint CK_name is specified. XML data type methods must be provided within a user-defined function, for which udf_Check_Names() is introduced.
|
CREATE FUNCTION udf_Check_Names (@xmlData XML)
RETURNS int AS
BEGIN
RETURN (SELECT @xmlData.exist('/book/author[first-name = last-name]'))
END
GO
CREATE TABLE docs (pk INT PRIMARY KEY,
xCol XML not null
CONSTRAINT CK_name CHECK (dbo.udf_Check_Names(xCol) = 0))
GO
|
Example: Limiting XML Schema Collection
Suppose the XML column Document in the table XmlCatalog (described in the typed XML column in table example) is to be constrained to allow the storage of only a single <dvdstore> element per row and to prevent the insertion of <bookstore> elements. The following CHECK constraint achieves this.
|
CREATE FUNCTION udf_Check_Top_Level_Nodes (
@xmlData XML(CONTENT myCollection))
RETURNS int AS
BEGIN
RETURN (@xmlData.value ('count(/*)', 'int'))
END
CREATE FUNCTION udf_Check_Dvdstore_Nodes (
@xmlData XML(CONTENT myCollection))
RETURNS int AS
BEGIN
RETURN (@xmlData.value('declare default element namespace "http://myDVD";
count(/dvdstore)', 'int'))
END
ALTER TABLE XmlCatalog
ADD CONSTRAINT dvdstore_only_check
CHECK (dbo.udf_Check_Top_Level_Nodes (Document) = 1 AND
dbo.udf_Check_Dvdstore_Nodes (Document) = 1)
|
Text Encoding
SQL Server 2005 stores XML data as Unicode (UTF-16). XML data retrieved from the server comes out in UTF-16 encoding as well. If you want a different encoding, you must perform the necessary conversion, either by casting or on the mid-tier, after retrieving the data. For example, you could cast your XML data to varchar type on the server, in which case the database engine serializes the XML with an encoding determined by the collation of the varchar.
Storing XML Data
You can supply an XML value for an XML column, parameter, or variable in multiple ways:
- As a character or binary SQL type that is implicitly converted to XML data type.
- As the content of a file.
- As the output of the XML publishing mechanism FOR XML with the TYPE directive that generates an XML data type instance.
The supplied value is checked for well-formedness. An XML column by default allows both XML documents and XML fragments to be stored. If the data fails the well-formedness check, it is rejected with an appropriate error message.
For typed XML, the supplied value is checked for conformance to XML schemas that are registered with the XML schema collection that is typing the XML column. The XML instance is rejected if it fails this validation. Furthermore, the DOCUMENT flag on typed XML restricts accepted values to XML documents only while CONTENT allows both XML document and content to be supplied.
Example: Inserting Data into Untyped XML Column
The following statement inserts a new row into the table named docs with the value 1 for the integer column pk and a <book> instance for the XML column. The <book> data, supplied as a string, is implicitly converted to XML data type and checked for well-formedness during insertion.
|
INSERT INTO docs VALUES (1, '<book genre="security"
publicationdate="2002" ISBN="0-7356-1588-2">
<title>Writing Secure Code</title>
<author>
<first-name>Michael</first-name>
<last-name>Howard</last-name>
</author>
<author>
<first-name>David</first-name>
<last-name>LeBlanc</last-name>
</author>
<price>39.99</price>
</book>')
INSERT INTO docs VALUES (2,
'<doc id="123">
<sections>
<section num="1"><title>XML Schema</title></section>
<section num="3"><title>Benefits</title></section>
<section num="4"><title>Features</title></section>
</sections>
</doc>')
|
Example: Inserting Data into Untyped XML Column from File
The INSERT statement in the following code segment reads the content of file C:\temp\xmlfile.xml as a BLOB by using OPENROWSET. A new row is inserted into the table named docs with a value of 10 for the primary key and the BLOB for the XML column xCol. A well-formedness check occurs when file content is assigned to the XML column.
|
INSERT INTO docs
SELECT 10, xCol
FROM (SELECT * FROM OPENROWSET
(BULK 'C:\temp\xmlfile.xml',
SINGLE_BLOB) AS xCol) AS R(xCol)
|
Example: Inserting Data into Typed XML Column
Typed XML columns require that the XML instance data specify the target namespace of the XML schema that is used to type it (the namespace may be empty). In the following code segment, this is done via the namespace declaration xmlns=http://myDVD.
|
INSERT XmlCatalog VALUES(2,
'<?xml version="1.0"?>
<dvdstore xmlns="http://myDVD">
<dvd genre="Comedy" releasedate="2003">
<title>My Big Fat Greek Wedding</title>
<price>19.99</price>
</dvd>
</dvdstore>')
|
Example: Storing XML Data Generated Using FOR XML with TYPE Directive
FOR XML has been enhanced with a TYPE directive to generate the result as an XML data type instance. The resulting XML can be assigned to an XML column, variable, or parameter. In the following statement, the XML instance generated by using FOR XML TYPE is assigned to an XML data type variable @xVar. The variable can be queried using XML data type methods.
|
DECLARE @xVar XML
SET @xVar = (SELECT * FROM docs FOR XML AUTO,TYPE)
|
Storage Representation
XML data type instances are stored in an internal, binary representation that is streamable and optimized for efficient parsing. Tags are mapped to integer values and the mapped values are stored in the internal representation. This yields some compression of the data, as well.
For untyped XML, node values are stored as Unicode (UTF-16) strings, so that run-time type conversion is required to perform operations. For example, to evaluate the predicate /book/price > 9.99, the value of the book's price is converted to decimal. On the other hand, for typed XML, values are encoded in the type specified in the XML schemas. This makes parsing of the data much more efficient and avoids run-time conversion.
The stored binary form is limited to 2 GB per XML instance, which can accommodate most XML data. Furthermore, the depth of XML hierarchy is limited to 128 levels.
The InfoSet content of the XML data is preserved. It may not be an exact copy of the text XML, since the following information is not retained: insignificant white spaces, order of attributes, namespace prefixes, and XML declaration. (For more information about InfoSet, see the WC3 recommendations for XML information at http://www.w3.org/TR/xml-infoset).
Data Modeling Considerations
Quite often, a combination of relational and XML data type columns is appropriate for data modeling. Some of the values from your XML data can be stored in relational columns, and the rest, or the entire XML value, stored in an XML column. This may yield better performance and locking characteristics.
Values within the XML data can be promoted to computed columns in the same table for Singleton values (that is, single-valued properties). A multi-valued property requires a separate table for the property, which must be populated and maintained by using triggers. Queries need to be written directly against the property table.
The granularity of the XML data stored in an XML column is critical for locking and update characteristic. SQL Server employs the same locking mechanism for both XML and non-XML data. When the granularity is large, locking large XML instances for updates causes throughput to decline in a multi-user scenario. On the other hand, severe decomposition loses object encapsulation and raises re-assembly cost.
Querying and Modifying XML Data
Querying XML instances stored in an XML column requires parsing binary XML data in the column. Parsing binary XML is much faster than parsing the text form of the XML data. XML indexing avoids reparsing and is discussed in Indexing XML data later in this paper.
Methods on XML Data Type
You can retrieve entire XML values or you can retrieve parts of XML instances. This is possible by using four XML data type methods that take an XQuery expression as argument: query(), value(), exist() and nodes(). A fifth method, modify(), allows modification of XML data and accepts an XML data modification statement as input.
- The query() method is useful for extracting parts of an XML instance. The XQuery expression evaluates to a list of XML nodes. The subtree rooted at each of these nodes is returned in document order. The result type is untyped XML.
- The value() method extracts a scalar value from an XML instance. It returns the value of the node the XQuery expression evaluates to. This value is converted to a Transact-SQL type specified as the second argument of the value() method.
- The exist() method is useful for existential checks on an XML instance. It returns 1 if the XQuery expression evaluates to non-null node list; otherwise it returns 0.
- The nodes() method yields instances of a special XML data type, each of which has its context set to a different node that the XQuery expression evaluates to. The special XML data type supports the query(), value(), nodes(), and exist() methods, and can be used in count(*) aggregations and NULL checks. All other uses result in an error.
- The modify() method permits modifying parts of an XML instance, such as adding or deleting subtrees, or replacing scalar values such as the price of a book from 9.99 to a 39.99.
Example: Using Query() Method
Consider the following query on the XML column named xCol of table docs that extracts <section> elements anywhere under the <doc> element whose id is 123. The query also retrieves the value from the integer primary key column. The query() method in the SELECT list is evaluated for each row in the table yielding a sequence of <section> elements, which, together with their subtrees, are retrieved in document order. Each XML instance without a <doc> element having id 123 or without a <section> element below it returns no result; the return value of the query() method is empty XML.
|
SELECT pk, xCol.query('/doc[@id = 123]//section')
FROM docs
|
Empty XML values can be filtered in an outer SELECT statement. Alternatively, the exist() method may be used, as shown in the next example.
Example: Using Exist() Method
Consider the following query. It involves the query() and exist() methods on the XML column xCol of table docs. The exist() method evaluates the path expression /doc[@id = 123], checking for the existence of a top-level <doc> element that has an attribute named id with the value 123. For each such row, the query() method in the SELECT clause is evaluated; in this example, the query() method yields a sequence of <section> elements anywhere under the <doc> element. Any row that returns 0 from the exist() method is skipped.
|
SELECT xCol.query('/doc[@id = 123]//section')
FROM docs
WHERE xCol.exist ('/doc[@id = 123]') = 1
|
Example: Using Value() Method
The following query extracts the title of the third section of a document as Unicode string using the value() method. The SQL type nvarchar(max) of the result is specified as the second argument of the value() method. The XQuery function data() extracts a scalar value from the <title> node.
|
SELECT xCol.value(
'data((/doc//section[@num = 3]/title)[1])', 'nvarchar(max)')
FROM docs
|
Example: Using GROUP BY with XML Data Type Methods
XML data type methods are not allowed in the Transact-SQL GROUP BY clause. However, you can extract values from an XML column in a subquery, alias the grouping columns, and use the aliases in the GROUP BY clause. The following query illustrates this by computing the number of books published by authors with the same first name.
|
SELECT Fname, count(Fname)
FROM
(SELECT nref.value('(author/first-name)[1]', 'nvarchar(max)') Fname
FROM docs CROSS APPLY xCol.nodes('/book') T(nref)
WHERE nref.exist ('author/first-name') = 1) Result
GROUP BY FName
ORDER BY Fname
|
Example: Executing SQLCMD
Executing XQuery and XML data modification statements requires that the connection option QUOTED_IDENTIFIER be ON. The default value of this option in SQLCMD is OFF. This must be changed to ON by using the –I switch.
|
sqlcmd -E -I -d <database> -Q "SELECT xCol.query('//author') FROM docs" |
Example: Difference in Behavior Between cast() and value() Method
The query() method returns XML data type instances, which have special XML characters entitized when converted to a string type. A value() method, on the other hand, returns an SQL type value, which does not entitize the special characters. This difference is evident in event notification. While the first of the following two queries may contain entitized carriage returns:
|
SELECT EVENTDATA().value('
(/EVENT_INSTANCE/TSQLCommand/CommandText/text())[1]','nvarchar(max)')
|
the second query does not:
|
SELECT CAST (EVENTDATA().query('
/EVENT_INSTANCE/TSQLCommand/CommandText/text()') AS nvarchar(max))
|
The XQuery Language
There are numerous sources of XML from Office documents stored on the file system, Web services, or configuration files. In fact, data is being increasingly generated either in XML format or as virtual XML documents. To cope with this increasing amount of data with a powerful query language, XQuery was conceived. The justification for XQuery is described in the W3C XQuery Language specification as:
- A query language that uses the structure of XML intelligently can express queries across all kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.
- XQuery is designed to meet the requirements identified by the W3C XML Query Working Group XML Query 1.0 Requirements and the use cases in XML Query Use Cases. It is designed to be a language in which queries are concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents.
- XQuery can also be summarized in the following statement: The XQuery language is to XML as the SQL language is to relational databases.
A subset of XQuery embedded in Transact-SQL is the language supported for querying the XML data type. The language is under development (currently in Last Call) by the World Wide Web Consortium (W3C) with the participation of all major database vendors including Microsoft Corporation. Our implementation is aligned with the July 2004 draft of XQuery.
XQuery includes XPath 2.0 as a navigation language. The implementation of XQuery by SQL Server 2005 provides constructs for iteration over nodes (FOR), node check (WHERE), returning values (RETURN), and sorting (ORDER BY). It also provides element construction for reshaping data during querying.
SQL Server 2005 also provides language constructs for data modification language (DML) of the XML data type. (For more information, see XML Data Modification later in this document). The following example shows how to use XQuery on the XML data type.
Example: Using Rich Language Constructs in XQuery
The following query shows several XQuery language constructs used together. It returns the title, wrapped in a new tag <topic>, of sections with section number 3 and higher from a document with id 123.
|
SELECT pk, xCol.query('
FOR $s in /doc[@id = 123]//section
WHERE $s/@num >= 3
RETURN <topic>{data($s/title)}</topic>')
FROM docs
|
The FOR iterates over all <section> elements under <doc> elements with id 123, and binds each such <section> to the variable $s. The WHERE clause ensures that the section number (@num attribute of the <section> element) is 3 or higher. The query returns the value of the section <title> in document order wrapped in a constructed element called <topic>.
Query Compilation and Execution
The SQL statement is parsed by the SQL parser. When it encounters the XQuery expression, it jumps into the XQuery compiler, which then compiles the XQuery expression. This yields a query tree that is grafted into the tree for the overall query.
The overall query tree undergoes query optimization and produces a physical query plan that it picked based on cost-based estimates. The Showplan output shows mostly relational operators and some new operators such as UDX for XML processing.
Query execution is tuple-oriented as in the rest of the relational framework. The WHERE clause is evaluated on each row of the table named docs; this involves parsing the XML BLOB at run time to evaluate XML data type methods. If the condition is satisfied, the row is locked and the SELECT clause is evaluated in the row. The result is produced as XML data type for query() method and converted into the specified target type for value() method.
If, on the other hand, the row does not satisfy the conditions in the WHERE clause, it is skipped and execution moves to the next row.
XML Data Modification
SQL Server 2005 provides constructs for data modification as an extension to XQuery. Subtrees can be inserted before or after a specified node, or as the leftmost or rightmost child. Furthermore, a subtree can be inserted into a parent node, in which case it becomes the rightmost child of the parent. Attribute, element, and text node insertions are all supported.
Deletion of subtrees is supported. In this case, the entire subtree is removed from the XML instance.
Scalar values can be replaced with new scalar values.
Example: Insertion of Subtree into XML Instances
This example shows the use of the modify() method to insert a new <section> element to the right of the <section> element whose number is 1.
|
UPDATE docs SET xCol.modify('
insert
<section num="2">
<title>Background</title>
</section>
after (/doc//section[@num=1])[1]')
|
Example: Update Price of this Book to $49.99
The following UPDATE statement replaces the <price> of a book whose ISBN is 1-8610-0311-0 to $49.99. The XML instance is typed with the XML schema http://myBooks, hence the namespace declaration in the XML data modification statement.
|
UPDATE XmlCatalog
SET Document.modify ('
declare namespace bk = "http://myBooks";
replace value of (/bk:bookstore/bk:book
[@ISBN="1-861003-11-0"]/bk:price)[1] with 49.99')
|
Type Checking and Static Errors
XQuery introduces type checking. The compilation phase checks the static type correctness of XQuery expressions and data modification statements, and uses XML schemas for type inferences in case of typed XML. It raises static type errors if an expression could fail at run time due to a type safety violation. Examples of static errors are the addition of a string to an integer, receiving a sequence of values where the operation expects a single value, and querying for a non-existent node for typed data. Explicit casting to the proper type is a workaround for static errors resulting from type mismatches. XQuery run-time errors are converted into empty sequences.
Location steps, function parameters, and operators (for example, eq) that require Singletons return an error if the compiler cannot determine whether a Singleton is guaranteed at run time. The problem arises often with untyped data. For example, the lookup of an attribute requires a Singleton parent element; an ordinal selecting a single parent node is adequate.
Example: Type Checks in value() Method
The following query on an untyped XML column requires an ordinal specification on //author/last-name since the value() method expects a Singleton node as the first argument. Without it, the compiler cannot determine whether only one <last-name> node will occur at run time.
|
SELECT xCol.value('(//author/last-name)[1]',
'nvarchar(50)') LastName
FROM docs
|
The evaluation of the node()-value() combination to extract attribute values may not require the ordinal specification, as shown in the next example.
Example: Known Singleton
The nodes() method in the following statement generates a separate row for each <book> element. The value() method evaluated on a <book> node extracts the value of @genre, which, being an attribute, is a Singleton.
|
SELECT nref.value('@genre', 'varchar(max)') LastName
FROM docs CROSS APPLY xCol.nodes('//book') AS R(nref)
|
Binding Relational Data in XQuery and XML DML
When your data resides in a combination of relational and XML data type columns, you might want to write queries that combine relational and XML data processing. You can convert the data in relational and XML columns to an XML data type instance using FOR XML with the TYPE directive and query it by using XQuery. Conversely, you can generate a rowset from XML values and query it by using Transact-SQL.
A more convenient and efficient way of writing cross-domain queries is to use the value of an SQL variable or column within XQuery or XML data modification expressions as follows:
- Apply the value of a SQL variable in your XQuery or XML DML expression using sql:variable().
- Use the values from a relational column in XQuery or XML DML context with sql:column().
This approach allows applications to parameterize queries, as shown in the following example. sql:column() is used in a similar way and provides additional benefits. Indexes over the column can be used for efficiency as decided by the cost-based query optimizer. Furthermore, a computed column can be used.
XML and user-defined types cannot be used with sql:variable() and sql:column().
Example: Binding Relational Data Using sql:variable()
In this query, the ISBN of a <book> element is passed in using a SQL variable named @isbn. Instead of using a constant, sql:variable() supplies the value of the ISBN, and the query can be used to search for any ISBN, not just the one whose ISBN is 0-7356-1588-2.
|
DECLARE @isbn varchar(20)
SET @isbn = '0-7356-1588-2'
SELECT xCol
FROM docs
WHERE xCol.exist ('/book[@ISBN = sql:variable("@isbn")]') = 1
|
Example: Specifying Ordinal Position Using sql:variable()
The following query retrieves the <last-name> of the N-th <author> of a <book> where the value of N is specified by using sql:variable():
|
DECLARE @aNum int
SET @aNum = 2
SELECT xCol.value ('(/book/author,
[sql:variable("@aNum")]/last-name)[1]''nvarchar(64)')
FROM docs
|
Rowset Generation from XML Data
In custom property management and data interchange scenarios, applications quite often map some part of the XML data to a rowset. For example, to pass a table of input parameters to a stored procedure or function, an application converts the data to XML and passes it in as an XML data type parameter. Within the stored procedure or function, the rowset is regenerated from the XML parameter.
SQL Server 2000 provides OpenXml() for this purpose. It is a facility for generating a rowset from an XML instance by specifying the relational schema for the rowset and how values inside the XML instance map to columns in the rowset.
Alternatively, the nodes() method can be used to generate node contexts within an XML instance, and use the node contexts in value(), query(), exist(), and nodes() methods to generate the desired rowset. The nodes() method accepts an XQuery expression, evaluates it on each XML instance in an XML column, and uses XML indexes effectively. The next example illustrates the use of the nodes() method for rowset generation.
Example: Extract Properties from XML Instance
Suppose you want to extract first and last names of authors, whose first name is not "David", as a rowset consisting of two columns, FirstName and LastName. Using the nodes() and value() methods, you can achieve this as follows:
|
SELECT nref.value('first-name[1]', 'nvarchar(50)') FirstName,
nref.value('last-name[1]', 'nvarchar(50)') LastName
FROM docs CROSS APPLY xCol.nodes('//author') AS R(nref)
WHERE nref.exist('.[first-name != "David"]') = 1
|
In this example, nodes('//author') yields a rowset of references to <author> elements for each XML instance. The first and last names of authors are obtained by evaluating value() methods relative to those references. For good performance, the XML column should be indexed, which is the topic of the next section.
Example: Extract Properties from XML Variable
The CROSS APPLY operator in the previous query is not required when properties are extracted from an XML variable or parameter. This example considers an XML variable named @xVar to which a <book> instance is assigned and retrieves the <first-name> and <last-name> of authors.
|
DECLARE @xVar XML
SET @xVar =
'<book genre="security" publicationdate="2002" ISBN="0-7356-1588-2">
<title>Writing Secure Code</title>
<author>
<first-name>Michael</first-name>
<last-name>Howard</last-name>
</author>
<author>
<first-name>David</first-name>
<last-name>LeBlanc</last-name>
</author>
<price>39.99</price>
</book>'
SELECT nref.value('first-name[1]', 'nvarchar(50)') FirstName,
nref.value('last-name[1]', 'nvarchar(50)') LastName
FROM @xVar.nodes('//author') AS R(nref)
WHERE nref.exist('.[first-name != "David"]') = 1
|
Indexing XML Data
XML data is stored in an internal binary form and can be up to 2 GB in storage. Each query parses the XML blob at run time one or more times in each row of the table. This makes query processing slow. If querying is common in the workload, it is beneficial to index the XML column, although the cost of XML index maintenance during data modification must be taken into account.
XML indexes are created by using a new DDL statement on typed and untyped XML columns. This creates a B+tree for all XML instances in the column. The first index on an XML column is the primary XML index. Using it, three types of secondary XML indexes are supported on the XML column to speed up common classes of queries, as described in the following section.
Primary XML Index
The primary XML index requires a clustered index on the primary key of the base table (that is, the table in which the XML column is defined). It creates a B+tree on a subset of the Infoset items of XML nodes. Columns of the B+tree represent tags such as element and attribute names, node values, and node types. Other columns capture the document order and structure in the XML data, and the path from the root of the XML instance to each node for efficient evaluation of path expressions. The primary key of the base table is duplicated in the primary XML index to correlate index rows with base table rows.
Tags and type names given in XML schemas are mapped to integer values and the mapped values are stored in the B+tree to optimize storage. The path column in the index stores a concatenation of the mapped values in reversed order, that is, from a node to the root of the XML instance. The reverse representation allows path values to be matched when the path suffix is known (for example, in a path expression such as //author/last-name).
If the base table is partitioned, the primary XML index is partitioned the same way; that is, using the same partitioning function and partitioning scheme.
Full XML instances are retrieved from XML columns (for example, SELECT * FROM docs or SELECT xCol FROM docs). Queries involving XML data type methods use the primary XML index, returning scalar values, or XML subtrees from the index itself.
Example: Creating Primary XML Index
The following statement creates an XML index named idx_xCol on the XML column xCol of table docs.
|
CREATE PRIMARY XML INDEX idx_xCol on docs (xCol)
|
Secondary XML Indexes
Once the primary XML index is created, secondary XML indexes can be created to speed up different classes of queries within a workload. Three types of secondary XML indexes—PATH, PROPERTY, and VALUE—benefit path-based queries, custom property management scenarios, and value-based queries, respectively.
- The PATH index builds a B+-tree on the columns (path, value) of the primary XML index. A value for the path is computed from a path expression; a node's value, if one is provided, is used as well. Since the leading fields of the PATH index are known, index seek into the PATH index speeds up evaluation of the path expression. The most common case is the use of the exist() method on XML columns in the WHERE clause of a SELECT statement.
- The PROPERTY index creates a B+-tree on the columns (PK, path, value) of the primary XML index, where PK is the primary key of the base table. This index benefits property value lookups within an XML instance.
- The VALUE index creates a B+-tree on the columns (value, path) of the primary XML index. This index benefits queries where a node's value is known but its path is imprecisely specified in the query. This typically occurs with descendant axes lookups, such as //author/last-name [. ="Howard"], where <author> elements can occur at any level of the hierarchy. It also occurs in wildcard queries, such as /book [@* = "novel"], where the query looks for <book> elements with some attribute having the value "novel". Furthermore, the VALUE index is useful for value-based range scan of typed XML.
Up to 128 levels of the XML hierarchy are accommodated; XML instances containing longer paths are rejected during insertion and modification.
Similarly, up to the first 128 bytes of a node's value are indexed; longer values are accommodated within the system and are not indexed.
Example: Path-Based Lookup
Suppose the following query is common in your workload.
|
SELECT xCol
FROM docs
WHERE xCol.exist ('/book[@genre = "security"]') = 1
|
The path expression /book/@genre and the value securitycorrespond to the key fields of the PATH index. Consequently, a secondary XML index of type PATH is helpful for this workload.
|
CREATE XML INDEX idx_xCol_Path on docs (xCol)
USING XML INDEX idx_xCol FOR PATH
|
Example: Fetching Properties of an Object
Consider the following query that retrieves the properties genre, title, and ISBN of a book from each row in table T.
|
SELECT xCol.value ('(/book/@genre)[1]', 'varchar(50)'),
xCol.value ('(/book/title)[1]', 'varchar(50)'),
xCol.value ('(/book/@ISBN)[1]', 'varchar(50)')
FROM docs
|
The property index is useful in this case and is created as follows:
|
CREATE XML INDEX idx_xCol_Property on docs (xCol)
USING XML INDEX idx_xCol FOR PROPERTY
|
Example: Value-Based Query
In the following query, the descendant-or-self axis (//-operator) specifies a partial path so that the lookup based on the value of ISBN benefits from the use of the VALUE index.
|
SELECT xCol
FROM docs
WHERE xCol.exist ('//book/@ISBN[. = "0-7356-1588-2"]') = 1
|
The VALUE index is created as follows:
|
CREATE XML INDEX idx_xCol_Value on docs (xCol)
USING XML INDEX idx_xCol FOR VALUE
|
Content Indexing
You can create a full-text index on XML columns; this indexes the content of the XML values while ignoring the XML markup. Attribute values are not full-text indexed (since they are considered part of the markup) and element tags are used as token boundaries. You can create both XML and full-text indexes on an XML column, and combine full-text search with XML index usage. Full-text search using CONTAINS() and XQuery fn:contains() have different semantics. The latter as implemented is a case-sensitive, substring search, while the former is a token match using stemming.
Use the full-text index as the first filter to narrow down the choices and then apply XQuery to filter further. The overall semantics is that of case-sensitive substring searches. Alternatively, a part of the XML data can be promoted to a computed XML column XC on which a full-text index is created. This defines XC as the full-text search context.
Example: Create Full-Text Index on XML Column
The steps for creating full-text index on an XML column are the same as those for other SQL type columns. A unique key column on the base table is required. The DDL statements are as follows, in which PK__docs__7F60ED59 is the single-column primary key index of the table.
|
CREATE FULLTEXT CATALOG ft AS DEFAULT
CREATE FULLTEXT INDEX ON dbo.docs (xCol) KEY INDEX PK__docs__7F60ED59
|
Example: Combining Full-Text Search with XML Querying
The following query checks that an XML value contains the word Secure in the title of a book.
|
SELECT *
FROM docs
WHERE CONTAINS(xCol,'Secure')
AND xCol.exist('/book/title/text()[fn:contains(.,"Secure")]') =1
|
The CONTAINS() method uses the full-text index to subset the XML values that contain the word Secure anywhere in the document. The exist() method ensures that the word Secure occurs in the title of a book.
Example: Search Context Defined by XML Data Type Methods
An alternative to the exist() method is to use the LIKE operator on the text value of the context node as shown in the following code. This yields a case-insensitive, substring search.
|
SELECT pk, xCol
FROM docs CROSS APPLY xCol.nodes('/book/title/text()') title(tRef)
WHERE CONTAINS (xCol, 'Secure')
AND title.tRef.value ('.', 'NVARCHAR(MAX)') LIKE '%Secure%'
|
Query Execution Using XML Indexes
XML indexes speed up query execution. Queries are always compiled against the primary XML index on an XML column, if one exists. A single query plan is produced for the entire query (both relational and XML parts), which is optimized by using the database engine's cost-based optimizer. Secondary XML indexes are selected based on the query optimizer's cost estimates.
Catalog Views for XML Indexes
Catalog views exist to provide metadata information regarding XML indexes. The catalog view sys.indexes contains entries for XML indexes with the index "type" 3. The name column contains the name of the XML index.
XML indexes are also recorded in the catalog view sys.xml_indexes, which contains all the columns of sys.indexes and a few special ones meaningful for XML indexes. The value NULL in the column secondary_type indicates a primary XML index; the values 'P', 'R', and 'V' stand for PATH, PROPERTY, and VALUE secondary XML indexes, respectively. The column secondary_type_desc contains NULL for a primary XML index, and the strings "PATH," "PROPERTY," and "VALUE" for the three types of secondary XML indexes.
Space usage of XML indexes can be found by using the table-valued function sys.dm_db_index_physical_stats(). This function provides information such as the number of disk pages occupied, average row size in bytes, number of records, and other information for all index types, including XML indexes. This information is available for each database partition; XML indexes use the same partitioning scheme and partitioning function of the base table.
Example: Space Usage of XML Indexes
|
SELECT sum (page_count)
FROM sys.dm_db_index_physical_stats (db_id(), object_id('docs'),
DEFAULT, DEFAULT, 'DETAILED') SDPS
JOIN sys.xml_indexes SXI ON (SXI.index_id = SDPS.index_id)
WHERE SXI.name = 'idx_xCol_Path'
|
This statement yields the number of disk pages occupied by the XML index idx_xCol_Path in table T across all partitions. Without the sum() function, the result would return the disk page usage per partition.
XML Schema Processing
XML schemas are optional in the system. As previously mentioned, XML data not bound to XML schemas is considered to be untyped—XML node values are stored as Unicode strings, and XML instances are checked for well-formedness. An untyped XML column can be indexed.
XML typing is done by associating an XML data type with XML schemas that are registered with an XML schema collection. A new DDL statement allows the creation of an XML schema collection with which one or more XML schemas may be registered. An XML column, parameter, or variable bound to an XML schema collection is typed according to all the XML schemas in the collection. Within an XML schema collection, the type system identifies each XML schema using its target namespace.
Each top-level XML element in an XML instance must specify a possible empty target namespace it conforms to. Data is validated during insertion and modification according to the target namespace of each top-level element. The binary XML representation encodes typed values based on the associated XML schema information and is fully described, so that reparsing it is more efficient as compared to reparsing untyped XML. Values are properly typed in XML indexes as well (for example, /book/price is stored as decimal if it is defined in the XML schema as xs:decimal).
During query compilation, XML schemas are used for type checking and static errors are issued for type mismatches. The query compiler also uses XML schemas for query optimizations.
The database engine's metadata subsystem contains XML type information such as XML schema collections and their contained XML schemas, and the mapping between the primitive XSD and relational type systems. Almost all of the W3C XML Schema 1.0 specification is supported. (For more information on the W3C XML Schema 1.0 specification, see http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/ and http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/) Comments and annotations in XML schema documents are not preserved, and key/keyref is not supported.
XML Schema Collections
An XML schema collection is a metadata entity, scoped by a relational schema. It contains one or more XML schemas that may be related (for example, using <xs:import>) or unrelated. Individual XML schemas within an XML schema collection are identified by using their target namespace. XML schema collections are securable entities, much like tables.
An XML schema collection is created by using CREATE XML SCHEMA COLLECTION syntax and providing one or more XML schemas. You can then type an XML column by using the XML schema collection. This design yields a flexible data model in which XML that is typed according to different XML schemas can be stored in the same column. This is especially convenient when the number of XML schemas is large. Furthermore, this design supports XML schema evolution to some extent.
Additionally, the option DOCUMENT or CONTENT on a typed XML column specifies whether XML trees or fragments, respectively, can be stored in the XML column. The default behavior is CONTENT. For DOCUMENT, each XML instance must specify the target namespace of its top-level element, according to which it is validated and typed. For CONTENT, on the other hand, each top-level element can specify any one of the target namespaces in the XML schema collection. The XML instance is validated and typed according to all the target namespaces occurring in the instance.
Example: Creating an XML Schema Collection
Suppose you want to use an XML schema with target namespace http://myBooks to type your XML instances. Create an XML schema collection named myCollection and supply the XML schema as the content of myCollection as shown in the following code segment.
|
CREATE XML SCHEMA COLLECTION myCollection AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://myBooks"
elementFormDefault="qualified"
targetNamespace="http://myBooks">
<xsd:element name="bookstore" type="bookstoreType" />
<xsd:complexType name="bookstoreType">
<xsd:sequence maxOccurs="unbounded">
<xsd:element name="book" type="bookType" />
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="bookType">
<xsd:sequence>
<xsd:element name="title" type="xsd:string" />
<xsd:element name="author" type="authorName" />
<xsd:element name="price" type="xsd:decimal" />
</xsd:sequence>
<xsd:attribute name="genre" type="xsd:string" />
<xsd:attribute name="publicationdate" type="xsd:string" />
<xsd:attribute name="ISBN" type="xsd:string" />
</xsd:complexType>
<xsd:complexType name="authorName">
<xsd:sequence>
<xsd:element name="first-name" type="xsd:string" />
<xsd:element name="last-name" type="xsd:string" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>'
|
A new metadata entity is created for myCollection in which the XML schema is registered. A new row can be added to the table named XmlCatalog (for the table definition, see Example: typed XML column in table earlier in this document) as follows:
|
INSERT XmlCatalog VALUES(1, '<?xml version="1.0"?>
<bookstore xmlns="http://myBooks">
<book genre="autobiography" publicationdate="1981"
ISBN="1-861003-11-0">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel" publicationdate="1967"
ISBN="0-201-63361-2">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
<book genre="philosophy" publicationdate="1991"
ISBN="1-861001-57-6">
<title>The Gorgias</title>
<author>
<first-name>Sidas</first-name>
<last-name>Plato</last-name>
</author>
<price>9.99</price>
</book>
</bookstore>
')
|
Modifying XML Schema Collections
The ALTER XML SCHEMA COLLECTION statement supports extending an XML schema in an XML schema collection with new top-level schema components, and registering new XML schemas with the XML schema collection. This is illustrated in the following example.
Example: Altering an XML Schema Collection
The following statement shows how a new XML schema with target namespace http://myDVD can be added to the XML schema collection myCollection.
|
ALTER XML SCHEMA COLLECTION myCollection ADD
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://myDVD"
elementFormDefault="qualified"
targetNamespace="http://myDVD">
<xsd:element name="dvdstore" type="dvdstoreType" />
<xsd:complexType name="dvdstoreType">
<xsd:sequence maxOccurs="unbounded">
<xsd:element name="dvd" type="dvdType" />
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="dvdType">
<xsd:sequence>
<xsd:element name="title" type="xsd:string" />
<xsd:element name="price" type="xsd:decimal" />
</xsd:sequence>
<xsd:attribute name="genre" type="xsd:string" />
<xsd:attribute name="releasedate" type="xsd:string" />
</xsd:complexType>
</xsd:schema>'
|
Catalog Views for XML Schema Collections
SQL catalog views of XML schema collections allow users to reconstruct the contents of individual XML schema namespaces. XML schema collections are enumerated in the catalog view sys.xml_schema_collections. The XML schema collection "sys" is defined by system and contains predefined namespaces that can be used in all user-defined XML schema collections without having to load them explicitly. This list contains the namespaces for xml, xs, xsi, fn, and xdt.
Two other catalog views worth mentioning are sys.xml_schema_namespaces, which enumerates all namespaces within each XML schema collection; and sys.xml_schema_components, which enumerates all XML schema components within each XML schema.
The built-in function XML_SCHEMA_NAMESPACE() accepts the names of a relational schema, an XML schema collection, and optionally the target namespace of an XML schema. It returns an XML data type instance containing the XML schema. If the target namespace argument is left out, the built-in function returns an XML instance that contains all the XML schemas in the XML schema collection, except for the predefined XML schemas.
Example: Enumerate XML Namespaces in XML Schema Collection
Use the following query for the XML schema collection myCollection.
|
SELECT XSN.name
FROM sys.xml_schema_collections XSC
JOIN sys.xml_schema_namespaces XSN ON
(XSC.xml_collection_id = XSN.xml_collection_id)
WHERE XSC.name = 'myCollection'
|
Example: Output a Specified XML Schema from an XML Schema Collection
The following statement outputs the XML schema with target namespace http://myBooks from the XML schema collection myCollection within the (relational) schema dbo.
|
SELECT XML_SCHEMA_NAMESPACE (N'dbo', N'myCollection',
N'http://myBooks')
|
Access Control on XML Schema Collections
XML schema collections can be secured like any SQL object by using the security model in SQL Server 2005. You can grant a principal the privilege to create XML schema collections within a database. Each XML schema collection supports the permissions ALTER, CONTROL, TAKE OWNERSHIP, REFERENCES, EXECUTE, and VIEW DEFINITION.
- The ALTER permission is required to execute an ALTER XML SCHEMA COLLECTION statement.
- The TAKE OWNERSHIP permission is required to transfer ownership of the XML schema collection from one principal to another by executing an ALTER AUTHORIZATION statement.
- The REFERENCES permission authorizes the principal to use the XML schema collection wherever schema binding is required, such as to type or constraint XML columns and parameters.
- The EXECUTE permission is required to validate values inserted or updated by the principal against the XML schema collection. It is also required to query values from a typed XML column, variable, or parameter using the XML data type.
- The VIEW DEFINITION permission allows the principal to access rows in catalog views corresponding to the XML schema collection, all XML schemas contained in it, and all schema components contained in those XML schemas.
- The CONTROL permission gives the principal the permission to perform any operation on the XML schema collection, including dropping the XML schema collection by using the DROP XML SCHEMA COLLECTION statement. It implies other permissions on the XML schema collection.
Permission on an XML schema collection is required in addition to other permissions on a table or an XML column. For example, to create a table T with an XML column X typed according to an XML schema collection C, the principal requires permission to create tables and REFERENCES permission on the XML schema collection C. A principal with permission to insert data into column X can do so provided the principal has EXECUTE permission on the XML schema collection C. Similarly, a principal requires SELECT permission on column X and EXECUTE permission on C to query the data in column X using XML data type methods. However, SELECT permission on X is adequate to retrieve entire XML values from column X, such as in SELECT X FROM T or SELECT * FROM T.
Permissions can be revoked from a principal, and a principal can be denied permissions as allowed by the security model of SQL Server 2005.
Visibility of Catalog Views
A principal having ALTER, TAKE OWNERSHIP, REFERENCES, VIEW DEFINITION, or CONTROL permission on an XML schema collection can access catalog view rows for the XML schema collection, its contained XML schemas, and their XML schema components. With any of these permissions, the principal can also access the contents of the XML schema collection by using the built-in function XML_SCHEMA_NAMESPACE() and in FOR XML… XMLSCHEMA.
If the principle is denied VIEW DEFINITION permission, the principal cannot access the XML schema collection in catalog views, or using XML_SCHEMA_NAMESPACE() or FOR XML… XMLSCHEMA.
Enhancements to FOR XML
The TYPE directive generates an XML data type instance that can be assigned to an XML column, variable, or parameter, or queried using XML data type methods. This allows the nesting of SELECT… FOR XML TYPE statements.
The PATH mode allows users to specify the path in the XML tree where a column's value should appear and—together with the aforementioned nesting—is more convenient to write than FOR XML EXPLICIT. However, it may not perform as well for deep hierarchies.
The directive XSINIL used in conjunction with ELEMENTS maps NULL to an element with the attribute xsi:nil="true". The new ROOT directive allows a root node to be specified in all modes of FOR XML. The new XMLSCHEMA directive generates an XSD inline schema.
Performance Guidelines
The XML data model is richer and more complex than the relational one. Not only does the XML data model allow you to model complex data, but it also must preserve hierarchical relationships and document order within the data. Document order is maintained by sorting based on XML