XInclude, Anyone?

06/30/2006

Chris Lovett
Microsoft Corporation

May 29, 2000

The Problem
ASP Solution?
An XSL Solution
The Ultimate Solution?

How do you create an XML document that includes chunks of XML from other places? This is something programmers know all about, because programming languages have provided this feature for decades. Any time you have a team of people working on the same product, you tend to want to break it down into manageable chunks. But what about XML? I ran into this problem while maintaining an XML-based intranet Web site for our team.

The Problem

There were too many ways to slice the information people wanted to publish on our internal Web site. We had two axes. The first axis was by the type of information, including specifications, development information, test, documentation, and release information.

The second axis was by team. We have four sub-teams, each working on a specific area. We also have other people within Microsoft who want to see everything that is going on; a centralized user-education team that works on the documentation for the entire product; and one release team that manages the actual build and release of all the bits.

The sub-teams wanted to see stuff that was immediately relevant to what they were doing, but not have this information crowded by anything the other teams were doing. We also needed master lists for the people who wanted to see everything.

ASP Solution?

The brute force way to solve this would be to assemble the master lists by writing some Active Server Pages (ASP) code, as follows:

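<%@LANGUAGE=JSCRIPT%>
<%
    // Spec lists maintained by each sub-team.
    var list = new Array("/team1/specs.xml",
                         "/team2/specs.xml",
                         "/team3/specs.xml",
                         "/team4/specs.xml");
    var master = new ActiveXObject("Microsoft.XMLDOM");
    master.async = false;  // async defaults to true; the load must finish before use
    master.load(Server.MapPath("/central/allspecs.xml"));
    var root = master.documentElement;
    var doc = new ActiveXObject("Microsoft.XMLDOM");
    doc.async = false;
    // Move each team's document element under the master root.
    for (var i = 0; i < list.length; i++) {
         doc.load(Server.MapPath(list[i]));
         root.appendChild(doc.documentElement);
    }
    var xsl = new ActiveXObject("Microsoft.XMLDOM");
    xsl.async = false;
    xsl.load(Server.MapPath("/central/allspecs.xsl"));
    // Transform the combined document and write the result to the response.
    master.transformNodeToObject(xsl, Response);
%>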

This would allow each sub-team to maintain its own local lists and to view them by going to its specific area of the Web site. However, it would still provide the master list for the other people to enjoy.

There were a couple of problems with this approach. The first was that blindly copying everything from a team's spec pages was not good enough. I needed a smarter solution. Second, this didn't participate in client-side XSL, so my Web server was spending precious CPU time building these pages.

An XSL Solution

To improve on the above, I prototyped a way of doing this using XSL, with the help of our resident XSL guru, Jonathan Marsh. If you have Internet Explorer 5 installed, you can see what the result looks like:

Main Pages	Specifications
Team1.xml	Specs.xml
Team2.xml	Specs.xml
Master.xml	MasterSpecs.xml

Teams 1 and 2 have their own local pages that list specs, dev, test, and release information. (Note: Only the spec pages are provided in this demo.) The master page points to each team, with a little introduction, and to a MasterSpecs page.

The MasterSpecs page is where things get interesting. This page is an aggregation of the specs from both Team1 and Team2. You will probably notice that the page populates asynchronously as the information is compiled from the other pages. This is authored as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="/voices/website.xsl"?>
<!DOCTYPE website SYSTEM "common.dtd">
<website xmlns:x="http://www.w3.org/1999/XML/xinclude">
    <title>Master List of all Specs</title>
    <p>
    <font color="red"><i>Our team has been busy this week!</i></font>
    </p>
   <h2>Team 1</h2>
    <x:include href="team1/specs.xml#xpointer(/website/link)"/>
   <h2>Team 2</h2>
    <x:include href="team2/specs.xml#xpointer(/website/link)"/>
    &disclaimer;
</website>

Note the x:include elements. Each one has an href attribute that points to a team's specs, and that href carries an XPointer URL fragment that defines exactly which elements in those specs are to be included in the master list. (Actually, in my prototype, this is not a real XPointer; it is just an XPath expression.)
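To make the href format concrete, here is a minimal sketch of how such a value could be split into the document URL and the expression inside xpointer(...). It is written in present-day JavaScript rather than the prototype's JScript, and splitIncludeHref is a hypothetical helper name, not a function from xinclude.js.

```javascript
// Hypothetical helper, not taken from the article's xinclude.js: split an
// x:include href into the document URL and the expression inside xpointer(...).
function splitIncludeHref(href) {
    var hash = href.indexOf("#");
    if (hash < 0) {
        // No fragment: treat this as a request for the whole document.
        return { url: href, pointer: null };
    }
    var url = href.substring(0, hash);
    var fragment = href.substring(hash + 1);
    // Accept only the xpointer(...) form used in the article's examples.
    var match = /^xpointer\((.*)\)$/.exec(fragment);
    return { url: url, pointer: match ? match[1] : null };
}

var parts = splitIncludeHref("team1/specs.xml#xpointer(/website/link)");
// parts.url is "team1/specs.xml"; parts.pointer is "/website/link"
```

Returning null for a missing or unrecognized fragment is an assumption of this sketch; a real processor would need to decide how to report a malformed pointer.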

The website.xsl style sheet finds these x:include elements; loads the pages referenced by them; replaces the x:include elements with the nodes matching the XPointer expression; then re-runs the style sheet over the resulting aggregate document.

This means that the MasterSpecs author has complete control over the look and feel of the MasterSpecs page, exactly where the content is included. At the same time, the MasterSpecs author doesn't have to worry about synchronizing with the continually changing list of specs that the individual teams maintain.

How does this work?

I was afraid you would ask that. First, the XSL style sheet includes a helper JScript® file called xinclude.js. The style sheet also contains a script block that fires when the page load is complete:

<SCRIPT src="/voices/xinclude.js"></SCRIPT>
<SCRIPT for="window" event="onload">
<xsl:comment><![CDATA[
    ProcessXIncludes(document.XMLDocument);
]]></xsl:comment>
</SCRIPT>

The XML Viewer built into Internet Explorer 5 provides the XMLDocument property. The xinclude.js file contains four functions.

Function	Purpose
ProcessXIncludes	This function takes the original XML document, clones it, then calls StartProcessing. The clone forms the basis for the new combined document.
StartProcessing	This function finds all the x:include elements and builds an array of XML documents (_docs), which download asynchronously. It also parses each href, splitting off the XPointer fragment, and saves the two parts in separate arrays (_hrefs and _pointers). Then it saves the actual x:include nodes (_nodes), so that CombineXML can replace them with the real included nodes.
HandleComplete	This function is the onreadystatechange callback, which is called whenever an XML document's ready state changes. When the ready state reaches 4 (complete), this function calls CombineXML to process the included document. When all the included documents are done, HandleComplete also re-runs the original transform and updates the body of the HTML page with the result.
CombineXML	This function runs the XPointer selection over the downloaded document, and replaces the x:include element with what it finds. If the downloaded document failed to load, CombineXML inserts an error description instead.
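The bookkeeping described in the table can be sketched as follows. This is not the article's xinclude.js: it is present-day JavaScript with hypothetical names, it tracks plain values instead of XML documents, and a pluggable load function stands in for the asynchronous download. It only illustrates the idea of one slot per x:include plus a count of downloads still outstanding.

```javascript
// Sketch only: hypothetical names, and a pluggable `load(href, callback)`
// stands in for the asynchronous XML download the prototype performs.
function startProcessing(hrefs, load, onAllDone) {
    var results = new Array(hrefs.length); // one slot per x:include, in document order
    var pending = hrefs.length;            // downloads still outstanding

    if (pending === 0) {
        onAllDone(results);
        return;
    }
    hrefs.forEach(function (href, index) {
        load(href, function handleComplete(error, content) {
            // Store the content, or an error description when the load failed.
            results[index] = error ? "Error loading " + href + ": " + error : content;
            pending -= 1;
            if (pending === 0) {
                onAllDone(results); // the point where the transform would be re-run
            }
        });
    });
}
```

Because each callback writes to its own slot, the downloads can finish in any order and the combined result still follows document order.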

Altogether, this is 133 lines of code, including the script blocks in the XSL style sheet. In my case, that was well worth the effort. So far, I have used this for maintaining master lists of specs, release information, and standards that we track, and it is working quite well.

The Ultimate Solution?

Ultimately, it would be even better if XInclude processing were supported lower in the XML stack—deep inside the XML parser or XML DOM. Then my style sheets could be completely oblivious to this step, and the style sheets would work the same on both the client and server.

The World Wide Web Consortium (W3C) is, in fact, working on a standard way of doing exactly this. It is called XInclude, and it's related to XLink, another spec on which the W3C is working.

You could use XLinks with the attribute value show="embed" to achieve a similar effect. The resource would then be embedded graphically into the display of the document. But this has a number of differences from inclusion. The embedded resource retains its own characteristics—a distinct document tree, with object model and style characteristics inherited from the original document instead of from the host document. One approximation would be to use XSLT to transform such an XLink into an IFRAME that showed the desired portion of the destination resource.

XInclude, on the other hand, specifies the manipulations necessary to merge two XML resources at the tree level. The result is a single tree, not two linked trees, and a single style sheet can style this tree. This process can be performed at a low level (parsing and creating a document tree), rather than at a higher level (manipulating display elements).
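As a rough illustration of what merging at the tree level means, the sketch below replaces each x:include node in a small in-memory tree with the nodes it refers to, producing one tree. Everything here is an assumption made for the example: the { name, attrs, children } node shape, the mergeIncludes name, and the fetchNodes lookup that stands in for loading a document and applying the pointer. It is not drawn from the XInclude draft.

```javascript
// Sketch under assumptions: a node is { name, attrs, children }, and
// `fetchNodes(href)` is a hypothetical lookup returning the nodes that
// an href selects. Cyclic includes are not detected.
function mergeIncludes(node, fetchNodes) {
    var merged = [];
    (node.children || []).forEach(function (child) {
        if (child.name === "x:include") {
            // Splice the included nodes in where the x:include element stood.
            fetchNodes(child.attrs.href).forEach(function (included) {
                merged.push(mergeIncludes(included, fetchNodes));
            });
        } else {
            merged.push(mergeIncludes(child, fetchNodes));
        }
    });
    return { name: node.name, attrs: node.attrs || {}, children: merged };
}

var master = {
    name: "website",
    children: [
        { name: "h2" },
        { name: "x:include", attrs: { href: "team1/specs.xml#xpointer(/website/link)" } }
    ]
};
var sources = {
    "team1/specs.xml#xpointer(/website/link)": [{ name: "link" }, { name: "link" }]
};
var result = mergeIncludes(master, function (href) { return sources[href] || []; });
// result.children now holds h2, link, link in a single tree
```

The function returns a new tree and leaves the input untouched, which mirrors the prototype's choice to work on a clone of the original document.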

Some things that could improve this prototype:

Complete the XPointer support. The ability to specify a range of markup to be included would be especially useful.
Port the XSL style sheet to the W3C XSLT format.
Handle relative links in the included documents, which the prototype currently ignores. The proper fix involves another W3C spec, XML Base (XBase).
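To show why relative links are a problem, the sketch below resolves a link against the URL of the document it came from rather than the URL of the master page. It covers only the URL arithmetic, not XML Base itself. The resolveIncludedLink name is hypothetical, example.com stands in for the real host, and the code relies on the standard URL constructor available in current browsers and Node.js rather than anything in the prototype.

```javascript
// Sketch only: resolveIncludedLink is a hypothetical helper, and
// example.com stands in for the real intranet host.
function resolveIncludedLink(link, sourceDocumentUrl) {
    // The URL constructor resolves `link` against the base URL, using the
    // same rules a browser applies to relative hrefs.
    return new URL(link, sourceDocumentUrl).href;
}

var base = "http://example.com/team1/specs.xml";
var a = resolveIncludedLink("drafts/parser.xml", base);
// a is "http://example.com/team1/drafts/parser.xml"
var b = resolveIncludedLink("/central/allspecs.xml", base);
// b is "http://example.com/central/allspecs.xml"
```

Without this step, a link such as drafts/parser.xml copied into the master page would be resolved against the master page's own location and point at the wrong directory.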

Chris Lovett is a program manager for Microsoft's XML team.