Streamlining Your Web Site Using XML

Chris Lovett
Microsoft Corporation

January 17, 2000

Contents

XML Schema Design
Content Development
Design
Web Publishing
Globalization
Business Development
Conclusion
References

You can use XML to break down the tangled mess of HTML that accumulates on your Web site, transforming it into manageable chunks that different members of your team can work on in parallel to help achieve a more compelling site. Large Web sites have different teams of people working on all aspects of the site. I have seen such teams made up of the following groups:

  • Content development, which generates the XML content
  • Design, which establishes the look and feel of the site using XSL
  • Web publishing, which controls the actual publishing process
  • Globalization, which endeavors to reach as many readers worldwide as possible
  • Business development, which focuses on business-to-business partnering opportunities

The following diagram illustrates the workflow between these different groups:

Let's pick a concrete example and explore how these different teams might work together to build and deploy something useful. The example I have chosen has been written about in detail on MSDN: the msdn.microsoft.com menus, otherwise known as the Table of Contents (TOC). The MSDN articles are listed in the References section at the end of this article.

XML Schema Design

To kick off the whole process, get all your people together to figure out the right XML schema for the content they are going to develop and deploy. This schema becomes the interface between all the different teams, and will probably evolve over time as more and more features are added to your Web site. For our TOC example, the initial schema can be very simple, such as the following:

<Schema xmlns="urn:schemas-microsoft-com:xml-data">
    <AttributeType name="href"/>
    <AttributeType name="title"/>
    <ElementType name="item" model="closed">
        <attribute type="href"/>
        <attribute type="title" required="yes"/>
        <element type="item" maxOccurs="*" minOccurs="0"/>
    </ElementType>
    <ElementType name="toc" model="closed" content="eltOnly">
        <element type="item" maxOccurs="*" minOccurs="0"/>
    </ElementType>
</Schema>

For those of you who are not familiar with reading schemas, this schema simply says that <toc> elements are made up of zero or more <item> elements, and each item has an optional href attribute and a required title attribute. Also, <item> elements can contain zero or more child items. See XML Schema Developer's Guide for the details on the XML schema support provided by the Microsoft® XML Parser (Msxml.dll) in Microsoft Internet Explorer 5.

The following is an example TOC that complies with the schema listed above:

<toc xmlns="x-schema:toc-schema.xml">
  <item title="General Information">
    <item href="https://msdn.microsoft.com/xml/general/intro.asp" title="Introduction"/>
    <item href="https://msdn.microsoft.com/xml/general/whyxml.asp" title="Why XML?"/>
    <item href="https://msdn.microsoft.com/xml/general/benefits.asp" title="Benefiting from XML"/>
  </item>
  <item title="XML Reference">
    <item href="https://msdn.microsoft.com/xml/reference/xmldom/start.asp" title="XML DOM Reference"/>
    <item href="https://msdn.microsoft.com/xml/reference/schema/start.asp" title="XML Schema Reference"/>
    <item href="https://msdn.microsoft.com/xml/reference/schema/datatypes.asp" title="XML Data Types Reference"/>
    <item href="https://msdn.microsoft.com/xml/reference/xmldom/error-messages.asp" title="XML Error Messages"/>
  </item>
</toc>

Though simple, this schema is enough for us to explore how different people can use it. Schema design is the hardest part of the process; when done right, it lays an excellent foundation for everything that follows. BizTalk.org is tackling this difficulty by providing access to prebuilt schemas. Perhaps the schema you need has already been defined by someone else; if so, you may find it in the BizTalk schema repository.

Content Development

Once the schema is in place, your content developers can run full steam ahead creating and reviewing content. Content development actually includes two steps: authoring the content, and getting that content into XML format. One way to streamline these steps is to give your authors a tool that lets them author content directly in XML. Once the content has been generated, it can be delivered to the Web publishing team. For a large Web site, the task of developing TOC content could even be distributed to many people across your entire company. This works well here because the XML above is clean and simple enough for anyone to edit and pass around as plain .xml files. Team members can even view and validate it themselves using simple Internet Explorer 5 style sheets and JScript® validators written using the XML Document Object Model (DOM).
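A JScript validator of this kind might look like the following minimal sketch. It is hypothetical: to stay self-contained it checks a TOC modeled as plain JavaScript objects rather than a document loaded through the XML DOM, and the function names are made up. It enforces the same rules as the schema: every item needs a title, href is optional, and (because the model is closed) nothing else is allowed.

```javascript
// Hypothetical validator mirroring the TOC schema rules:
//   - every item must have a non-empty title; href is optional
//   - model="closed": no other attributes are allowed
// Assumption: nodes are plain objects { title, href?, items? }, where "items"
// holds the child <item> elements.

var ALLOWED = { title: true, href: true, items: true };

function validateItems(items, path, errors) {
    for (var i = 0; i < items.length; i++) {
        var item = items[i];
        var where = path + "/item[" + (i + 1) + "]";
        if (typeof item.title !== "string" || item.title === "") {
            errors.push(where + ": missing required title");
        }
        for (var key in item) {
            if (item.hasOwnProperty(key) && !ALLOWED[key]) {
                errors.push(where + ": unexpected '" + key + "'");
            }
        }
        if (item.items) {
            validateItems(item.items, where, errors);  // check child items too
        }
    }
    return errors;
}

// Returns an array of error messages; an empty array means the TOC is valid.
function validateToc(toc) {
    return validateItems(toc.items || [], "toc", []);
}

var good = { items: [{ title: "General Information",
                       items: [{ title: "Introduction", href: "intro.asp" }] }] };
var bad = { items: [{ title: "XML Reference",
                      items: [{ href: "orphan.asp", id: "42" }] }] };
var goodErrors = validateToc(good);
var badErrors = validateToc(bad);
```

Here `badErrors` reports both the missing title and the undeclared `id` attribute on the nested item, which is exactly what the content developer would need to fix before handing the file on.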

Design

The perfect way to present this data -- the right look and feel for any user interactions -- can be worked out completely independently from the content. Your graphic designers can work with developers to come up with XSL style sheets that convert the XML data into HTML and can tweak that look and feel a million times a day with no impact on the productivity of your content developers.

Designers hand off these XSL style sheets to the Web site publishers, who will in turn use the XSL and XML to generate the final HTML. XSL is the interface between the designers and the Web site publishers. The following are examples of what your designers might create.

Basic Nested Unordered List View

This view is plain HTML 3.0 -- ideal for down-level HTML clients. See toc-simple.xsl
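As a rough illustration of what a basic view's style sheet produces, the following hypothetical JavaScript function builds the same kind of nested unordered list from a TOC modeled as plain objects. It is not toc-simple.xsl, only a sketch of comparable output under that assumption; in this sketch, grouping items without an href are rendered as plain text.

```javascript
// Hypothetical stand-in for a basic TOC style sheet: renders nested <ul> HTML.
// Assumption: each node is { title: string, href?: string, items?: node[] }.

function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")   // must run first so later entities stay intact
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

function renderItems(items) {
    if (!items || items.length === 0) {
        return "";
    }
    var html = "<ul>";
    for (var i = 0; i < items.length; i++) {
        var item = items[i];
        var label = escapeHtml(item.title);
        // Items with an href become links; grouping items stay plain text.
        if (item.href) {
            label = '<a href="' + escapeHtml(item.href) + '">' + label + "</a>";
        }
        html += "<li>" + label + renderItems(item.items) + "</li>";
    }
    return html + "</ul>";
}

function renderToc(toc) {
    return renderItems(toc.items);
}

var toc = {
    items: [
        { title: "General Information", items: [
            { title: "Introduction", href: "intro.asp" },
            { title: "Why XML?", href: "whyxml.asp" }
        ] }
    ]
};

var html = renderToc(toc);
```

Because the output uses only `<ul>`, `<li>`, and `<a>`, it stays within what down-level browsers understand.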

Interactive List View

This view looks the same as the basic view, except that this view interactively expands and collapses items that the user clicks. (In this snapshot, the user has clicked the "XSL Reference" item). See toc-int.xsl
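The expand-and-collapse behavior can be modeled in a few lines. In this hypothetical sketch (not the code in toc-int.xsl, which does the work with DHTML in the browser), each item carries an `expanded` flag that a click toggles, and only the children of expanded items are listed.

```javascript
// Hypothetical model of the interactive view.
// Assumption: TOC nodes are plain objects { title, href?, items?, expanded? }.

// Called when the user clicks an item; leaf items have nothing to expand.
function toggle(item) {
    if (item.items && item.items.length > 0) {
        item.expanded = !item.expanded;
    }
}

// Lists the titles currently visible, indented two spaces per nesting level.
function visibleLines(items, depth, out) {
    depth = depth || 0;
    out = out || [];
    var indent = new Array(depth + 1).join("  ");
    for (var i = 0; i < items.length; i++) {
        out.push(indent + items[i].title);
        if (items[i].expanded && items[i].items) {
            visibleLines(items[i].items, depth + 1, out);
        }
    }
    return out;
}

var toc = [
    { title: "General Information", items: [{ title: "Introduction", href: "intro.asp" }] },
    { title: "XML Reference", items: [{ title: "XML DOM Reference", href: "start.asp" }] }
];

var before = visibleLines(toc);  // both groups start collapsed
toggle(toc[1]);                  // the user clicks "XML Reference"
var after = visibleLines(toc);   // now also shows "  XML DOM Reference"
```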

Polished MSDN look and feel

This view has a bit more polish on it to make the list really look like a table of contents. It uses advanced features of DHTML and CSS, and displays icon images instead of plain list bullets. It is therefore targeted at rich HTML 4.0 clients, such as Internet Explorer 4.0 and 5. (In this snapshot, the user has clicked the "XML Reference" item). See toc-final.xsl

If you are running Internet Explorer 5, you can interact with the final result directly. See XML-based TOC. (Be sure to select Source from the View menu to see the XML.)

Web Publishing

The Web site publishing team is responsible for maximizing:

  • Site availability and performance
  • The number of different HTML clients that can view your Web site
  • The richness of experience for clients using high-powered browsers

The first thing the publishing team should do is run a "validation" pass on the XML content to ensure that it complies with the schema. Any content that fails is sent back to the content developer.

Because content arrives as validated XML, your Web site publishing team can get out of the "content massaging" business and into the business of producing a leading-edge Web site. Enhancements can include browser sniffing, performance improvements, and more dynamic content.

Browser Sniffing

To achieve maximum reach across all browser types, the Web publishing team can tailor what it sends to each client. Using Active Server Pages (ASP) script, the team can sniff the client browser type and then decide either to run the XSL processing on the server and send plain HTML to the client or, if the client browser is Internet Explorer 5, to send the XML itself. In the Design section, we saw that the Web publishing team uses the XML and XSL to generate HTML. Internet Explorer 5 and later, however, can parse XML and apply XSL on the client, so sending the XML directly to those browsers lets the team provide a sleeker user experience.

To make the XML viewable in Internet Explorer 5, you can simply add a processing instruction to the top of the XML TOC file, as follows:

<?xml-stylesheet href="toc.xsl" type="text/xsl"?>

The following ASP script, running on your Web server, sniffs the client browser type. If the client is Internet Explorer 5, the server redirects it to the TOC XML file, which carries the processing instruction shown above. Otherwise, the server transforms the XML into HTML on the fly and sends that HTML to the client.

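<%@LANGUAGE=JSCRIPT%>
<%
    // Read the browser's User-Agent header, for example
    // "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)".
    var useragent = "" + Request.ServerVariables("HTTP_USER_AGENT");
    if (useragent.indexOf("MSIE 5") != -1)
    {
        // Internet Explorer 5: send the XML itself. The xml-stylesheet
        // processing instruction in toc.xml tells the browser which XSL to apply.
        Response.Redirect("toc.xml");
    }
    else
    {
        // Down-level browsers: load the XML and XSL on the server.
        // async = false makes load() finish before the next statement runs.
        var xmldoc = new ActiveXObject("Microsoft.XMLDOM");
        xmldoc.async = false;
        xmldoc.load(Server.MapPath("toc.xml"));
        var xsldoc = new ActiveXObject("Microsoft.XMLDOM");
        xsldoc.async = false;
        xsldoc.load(Server.MapPath("toc-simple.xsl"));
        // Apply the style sheet and write the resulting HTML to the response.
        Response.Write(xmldoc.transformNode(xsldoc));
    }
%>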
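The decision this ASP page makes can be pulled out into a small function that is easy to test on its own. The sketch below is hypothetical (the function names are made up) and differs in one deliberate way: instead of looking for the literal text "MSIE 5", it parses the version number, so Internet Explorer 5.x and later versions all receive the XML.

```javascript
// Hypothetical helper isolating the page's choice between raw XML and
// server-rendered HTML, based on the "MSIE n" token in the User-Agent header.

function ieMajorVersion(userAgent) {
    var match = /MSIE (\d+)/.exec(String(userAgent));
    return match ? parseInt(match[1], 10) : 0;   // 0 means "not Internet Explorer"
}

function chooseResponse(userAgent) {
    return ieMajorVersion(userAgent) >= 5 ? "xml" : "html";
}

var ie5 = chooseResponse("Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");  // "xml"
var ie4 = chooseResponse("Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)");      // "html"
var nav = chooseResponse("Mozilla/4.7 [en] (WinNT; I)");                          // "html"
```

User-Agent sniffing is a heuristic: some browsers report an MSIE token without supporting the same XML features, so the HTML path remains the safe default.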

Other Performance Improvements

The Web publishing team can implement improvements to performance, such as:

  • Storing the XSL style sheet in a shared scope so all clients hitting this ASP page share the same XSL style sheet DOM document. This can improve the throughput by as much as 30 to 40 percent.
  • Switching to a C++ XSL ISAPI extension, which includes smarter XSL caching algorithms. This can improve the performance even more.
  • Pre-processing the XML and XSL into HTML in batch mode so the XSL processing doesn't have to happen on the fly. This can improve throughput by an order of magnitude, because the Internet Information Services (IIS) server is simply returning static HTML pages. This approach doesn't work if you have more dynamic content (such as stock quotes, time of day, or other time-critical data).
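The first technique can be sketched as a cache keyed by style sheet path. This is a hypothetical, host-independent illustration: `loadStylesheet` stands in for loading an XSL file into a DOM document, and in ASP the cache would live in shared state such as the Application object. Objects shared across requests must be safe for concurrent use; with the XML DOM this generally means using its free-threaded version.

```javascript
// Hypothetical shared style-sheet cache: the first request for a style sheet
// pays the load cost, and every later request reuses the loaded result.
// Assumption: loadStylesheet stands in for loading an XSL file into a DOM document.

function createStylesheetCache(loadStylesheet) {
    var cache = {};  // path -> loaded style sheet, shared by all callers
    return function getStylesheet(path) {
        if (!cache.hasOwnProperty(path)) {
            cache[path] = loadStylesheet(path);
        }
        return cache[path];
    };
}

var loads = 0;
var getStylesheet = createStylesheetCache(function (path) {
    loads++;                  // count how often the expensive load runs
    return { source: path };  // placeholder for a parsed XSL document
});

var first = getStylesheet("toc-simple.xsl");
var second = getStylesheet("toc-simple.xsl");   // served from the cache
getStylesheet("toc-final.xsl");
// loads is now 2, and first === second
```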

Dynamic Material

In addition to improving performance, the Web publishing team might be interested in making the site more dynamic. They might want to store all the XML TOC data in a database so that as soon as the content developers change one item in the database, the "live" TOC on the Web immediately reflects that change. They could use the ActiveX® Data Objects (ADO) XML persistence features to turn this database data back into XML for processing via XSL. They could also use the upcoming SQL Server/XML integration features.
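Turning database rows back into TOC XML amounts to rebuilding the tree from parent references. The following sketch is hypothetical: it assumes a table whose rows carry made-up `id`, `parentId`, `title`, and `href` columns, already sorted in display order, and it builds the TOC format directly rather than going through ADO's own XML persistence format.

```javascript
// Hypothetical sketch: rebuild TOC XML from flat database rows.
// Assumption: rows look like { id, parentId, title, href? }, with parentId
// null for top-level items, and arrive in display order.

function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

function rowsToTocXml(rows) {
    // Group rows by parent, preserving their original order.
    var children = {};
    for (var i = 0; i < rows.length; i++) {
        var key = rows[i].parentId == null ? "root" : String(rows[i].parentId);
        (children[key] = children[key] || []).push(rows[i]);
    }
    // Emit the <item> elements under one parent, recursing into each child.
    function emit(key) {
        var list = children[key] || [];
        var xml = "";
        for (var j = 0; j < list.length; j++) {
            var row = list[j];
            var attrs = "";
            if (row.href) {
                attrs += ' href="' + escapeXml(row.href) + '"';
            }
            attrs += ' title="' + escapeXml(row.title) + '"';
            var inner = emit(String(row.id));
            xml += inner ? "<item" + attrs + ">" + inner + "</item>"
                         : "<item" + attrs + "/>";
        }
        return xml;
    }
    return '<toc xmlns="x-schema:toc-schema.xml">' + emit("root") + "</toc>";
}

var rows = [
    { id: 1, parentId: null, title: "General Information" },
    { id: 2, parentId: 1, title: "Introduction", href: "intro.asp" },
    { id: 3, parentId: 1, title: "Why XML?", href: "whyxml.asp" }
];
var xml = rowsToTocXml(rows);
```

The resulting string follows the same schema as the hand-written TOC, so the existing XSL style sheets can process it unchanged.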

Last, the Web publishing team may have other tools that help them manage the entire Web site, and so they may want to tie the XML TOC content into these tools by writing some script code using the XML DOM.
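For example, a link checker or site-map tool might need only every href in the TOC. The sketch below walks a TOC modeled as plain objects (an assumption to keep it self-contained; the function name is made up). Against the real XML DOM, the same walk could call getElementsByTagName("item") and read each element's href with getAttribute("href").

```javascript
// Hypothetical sketch: collect every href in the TOC, depth first, for use by
// other site tools such as a link checker.
// Assumption: TOC nodes are plain objects { title, href?, items? }.

function collectHrefs(items, out) {
    out = out || [];
    for (var i = 0; i < items.length; i++) {
        if (items[i].href) {
            out.push(items[i].href);
        }
        if (items[i].items) {
            collectHrefs(items[i].items, out);
        }
    }
    return out;
}

var toc = [
    { title: "General Information", items: [
        { title: "Introduction", href: "https://msdn.microsoft.com/xml/general/intro.asp" }
    ] },
    { title: "XML Reference", items: [
        { title: "XML DOM Reference", href: "https://msdn.microsoft.com/xml/reference/xmldom/start.asp" }
    ] }
];
var hrefs = collectHrefs(toc);
```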

Notice that your Web publishing team is doing all this completely in parallel to the content development and the design teams.

Globalization

Globalization involves translating key pieces of your Web site into different languages to maximize the reach of your site across language barriers. The globalization team may want to translate the TOC contents into different languages. Translating the XML is much easier than translating complex HTML pages, because the team needs to worry only about the TOC entries. Because all the XML/XSL processing is Unicode based, the XSL style sheets that produce the HTML will work with any language, although the design may need to accommodate differences in word length or reading direction.

In other cases, the globalization team may also have to translate the XSL style sheets produced by the design team, if those style sheets contain other visual DHTML elements, such as buttons with text. In this case, some Web sites store all the localizable strings in separate XML files, which are loaded during XSL processing so that their strings can be substituted in. The result is an HTML page that is both targeted at the specific client browser type and fully localized for the client's language on the fly. This can greatly reduce the amount of static HTML: the Microsoft.com site, www.microsoft.com, reported that tens of thousands of static HTML pages could be reduced to fewer than 100 XML files using this technique.
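The string-substitution idea can be sketched as follows. The sketch is hypothetical and simplified: the `{{name}}` placeholder syntax and the shape of the string tables are made up, the tables are inline objects rather than separate XML files, and substitution happens on a finished string rather than during XSL processing. A string missing from the requested language falls back to the default language.

```javascript
// Hypothetical sketch: fill {{name}} placeholders from per-language string
// tables, falling back to the default language when a translation is missing.
// Assumptions: placeholder syntax and table shape are made up; a real site
// would load the tables from separate XML files.

function localize(template, strings, lang, defaultLang) {
    var table = strings[lang] || {};
    var fallback = strings[defaultLang] || {};
    return template.replace(/\{\{(\w+)\}\}/g, function (placeholder, name) {
        if (table.hasOwnProperty(name)) {
            return table[name];
        }
        if (fallback.hasOwnProperty(name)) {
            return fallback[name];
        }
        return placeholder;  // leave unknown placeholders visible for review
    });
}

var strings = {
    en: { search: "Search", home: "Home" },
    fr: { search: "Rechercher" }
};
var page = '<button>{{search}}</button> <a href="/">{{home}}</a>';
var frenchPage = localize(page, strings, "fr", "en");
```

Here `frenchPage` gets the French button label and, because no French "home" string exists, the English link text.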

Business Development

Lastly, suppose someone in your company identifies a business opportunity for sharing the TOC information with third-party vendors who provide some value-added service (such as publishing relevant pieces of your Web site in an e-mail promotion). Your e-commerce team can build XSL style sheets that transform the TOC format into the standard business-to-business formats required to tap into these opportunities.

In this example, the XML schema for our TOC may need to be improved. Perhaps the schema developers will need to add an "id" attribute to each item, so that when a customer responds to the e-mail promotion with a query about an item in your menu, the query can carry that item's id. Even if the item title has been localized or changed since the promotion was sent out, the id tells you exactly what information the customer wanted. XML is fantastic at handling incremental refinement like this: because the new attribute is simply an addition, all the XSL and other code built on the old schema will still work after the schema changes. Again, http://www.biztalk.org/ is an excellent resource for information on these sorts of issues.
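With an id in place, resolving a customer's query is a simple tree search. The following sketch is hypothetical (the ids and titles are made up, and the TOC is modeled as plain objects); it shows that the lookup keeps working even though the titles have been translated into French.

```javascript
// Hypothetical sketch: find a TOC item by its stable id, depth first.
// Assumption: TOC nodes are plain objects { id?, title, href?, items? }.

function findById(items, id) {
    for (var i = 0; i < items.length; i++) {
        if (items[i].id === id) {
            return items[i];
        }
        if (items[i].items) {
            var found = findById(items[i].items, id);
            if (found) {
                return found;
            }
        }
    }
    return null;  // no item carries this id
}

var frenchToc = [
    { id: "xml-ref", title: "Référence XML", items: [
        { id: "xml-dom-ref", title: "Référence du DOM XML",
          href: "https://msdn.microsoft.com/xml/reference/xmldom/start.asp" }
    ] }
];
var hit = findById(frenchToc, "xml-dom-ref");
var miss = findById(frenchToc, "no-such-id");
```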

Conclusion

So now we have discussed an end-to-end solution using XML, XSL, DHTML, CSS, and ASP. We have caught a glimpse of how your entire Web team can gain productivity with XML, with different people working in parallel on different aspects of your Web site. Even if only one or two people work on your Web site, you can still see productivity gains. When you extrapolate this TOC example to all the other kinds of data that you publish on your site, and then consider other ways of publishing your data (such as e-mail bulletins and printed newsletters) and other types of Web clients (such as Microsoft® WebTV® and Palm-size PCs), you can begin to understand why www.microsoft.com is rapidly adopting XML for just about everything it does.

References

In this article, I've used an XML TOC for my example. For in-depth information on how to build and implement an XML TOC, see the following articles:

"DXML": Taking a TOC from XML to DHTML (April 1999)

"DXML" Redux: Building Dynamic HTML Menus from XML (May 1999)

"DXML" in Action: Implementing the DHTML Menus and TOCs on Your Site (June 1999)

Chris Lovett is a program manager for Microsoft's XML team.