Use XPath Explorer to Unlock Data in Word 2007 XML Files

Summary: XPath Explorer is a new tool that you can use to view the XML element hierarchy in any XML file, including Word 2003 and Word 2007 XML files. (6 printed pages)

Bill Coan, Microsoft Word MVP

February 2007

Applies to: 2007 Microsoft Office System, Microsoft Office Word 2007

Contents

  • Microsoft Office System and XPath Explorer

  • Navigating Hierarchical Data Structures

  • Navigating an XML Data Hierarchy

  • Using XPath with Word Documents

  • Unlocking the Power of XPath with XPath Explorer

  • Additional Resources

  • About the Author

Microsoft Office System and XPath Explorer

With the release of Microsoft Office 2003 in October 2003, Microsoft placed a big bet on the future of XML. Three years later, XML is more popular than ever, and Microsoft has now released the 2007 Microsoft Office system, the most XML-friendly version of Office yet.

As Office XML files proliferate, the volume of data inside those files grows larger by the day. Fortunately, data tagged with XML markup does not need to remain locked inside the files where the data is stored. Instead, with the help of the XML Path Language (more commonly known as XPath), you can extract the data in XML files with the speed and precision of a robotic arm.

If this strikes you as something that only a crazed data-holic or an IT specialist would find exciting, wait until you see how easy it is to navigate an XML hierarchy using XPath Explorer.

Navigating Hierarchical Data Structures

Keep in mind that for as long as you have been using Microsoft Windows, you have been navigating hierarchical data structures. For example, Figure 1 shows a familiar one: a set of nested folders on a hard drive.

Figure 1. Nested folders on a hard drive.


If you started at the root folder of the hard drive (C:\) and had to find your way to the FIRSTNAME folder near the bottom of Figure 1, how would you get there?

I suppose you could check every folder on the hard drive until you found one called FIRSTNAME, but what if there were multiple folders called FIRSTNAME? How could you be certain that you had found the folder shown at the bottom of Figure 1?

The answer is both simple and obvious. You would start at the root folder (C:\), open the BOOKS folder inside it, open the BOOK folder inside that, then open the AUTHOR folder, and finally open the FIRSTNAME folder.

We sometimes refer to this as "walking the hierarchy" or "walking the path" to the desired folder. Indeed, in Windows (as in other operating systems), the full name of a folder is called its path name. The path name of the folder at the bottom of Figure 1 is C:\BOOKS\BOOK\AUTHOR\FIRSTNAME.
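To make the idea of walking a path concrete, here is a minimal Python sketch that models a folder tree as nested dictionaries. It assumes a hypothetical second FIRSTNAME folder (under an invented CONTACTS folder) so that the name alone is ambiguous, while the full path name still leads to exactly one folder.

```python
from pathlib import PureWindowsPath

# A toy folder tree modeled as nested dicts, mirroring Figure 1.
# The CONTACTS folder is hypothetical; it adds a second FIRSTNAME folder.
drive = {
    "BOOKS": {
        "BOOK": {
            "AUTHOR": {"FIRSTNAME": {}, "LASTNAME": {}},
        },
    },
    "CONTACTS": {"FIRSTNAME": {}},
}

def walk_path(root: dict, path: str) -> dict:
    """Follow a Windows path name one folder at a time, starting at the root."""
    parts = PureWindowsPath(path).parts  # ('C:\\', 'BOOKS', 'BOOK', ...)
    node = root
    for name in parts[1:]:  # skip the drive/root component
        node = node[name]   # raises KeyError if a folder is missing
    return node

def count_named(node: dict, name: str) -> int:
    """Count every folder with the given name anywhere in the tree."""
    return sum((child == name) + count_named(sub, name) for child, sub in node.items())

print(count_named(drive, "FIRSTNAME"))                      # 2: the name alone is ambiguous
print(walk_path(drive, r"C:\BOOKS\BOOK\AUTHOR\FIRSTNAME"))  # {}: the full path is not
```

Searching by name alone finds two candidates; walking the full path name reaches exactly one folder.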

Navigating an XML Data Hierarchy

An XML data hierarchy closely resembles a set of nested folders on a hard drive. If you think it is useful to walk a path to a particular folder on a hard drive, just imagine being able to walk a path inside an XML file until you find the exact element or collection of elements that your boss wants on her desk this instant! That is exactly what the XPath language enables you to do.

Consider the XML markup inside the Microsoft Word document shown in Figure 2. As you can see, the markup describes a hierarchical data structure very similar to the nested folders in Figure 1.

Figure 2. Nested XML elements inside a Microsoft Word document.


In this case, there is a BOOKS element (instead of a BOOKS folder) with a BOOK element nested inside it. Inside the BOOK element are a TITLE element and an AUTHOR element, and inside the AUTHOR element are a FIRSTNAME element and a LASTNAME element.
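The following Python sketch prints an XML element hierarchy as an indented outline, much like a folder tree. The markup is hypothetical: it assumes that TITLE and AUTHOR both sit directly inside BOOK (matching the AUTHOR folder's position in Figure 1), the element values are invented, and the namespace is left out here to keep the focus on nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modeled on the description of Figure 2.
# Element values are invented; the namespace is omitted for simplicity.
SAMPLE = """<BOOKS>
  <BOOK>
    <TITLE>Sample Title</TITLE>
    <AUTHOR>
      <FIRSTNAME>Jane</FIRSTNAME>
      <LASTNAME>Doe</LASTNAME>
    </AUTHOR>
  </BOOK>
</BOOKS>"""

def outline(elem: ET.Element, depth: int = 0) -> list[str]:
    """List the element hierarchy as indented names, like a folder tree view."""
    lines = ["  " * depth + elem.tag]
    for child in elem:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))
```

Each level of indentation corresponds to one level of nesting, just as each backslash in a path name corresponds to one level of folders.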

Note

Although you cannot tell just by looking at the document, the elements in the document belong to a particular namespace that distinguishes them from similarly named elements in other namespaces. The namespace for this particular XML is "http://www.wordsite.com/books".

To walk the path to the data you are interested in, you must specify the namespace that the data belongs to. You do this with a namespace prefix declaration such as xmlns:x="http://www.wordsite.com/books", which maps the prefix x to that namespace.

Then you need to specify the path that you want to walk, qualifying each element name with the x prefix. To specify the path that starts at the BOOKS element and winds its way down to the BOOK element, you would use the XPath expression /x:BOOKS/x:BOOK.
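Outside of Word, the same namespace mapping and path can be tried with Python's standard library. This is a sketch under two assumptions: the sample document is hypothetical (only the namespace URI comes from this article), and ElementTree supports only a limited subset of XPath, so the first step of /x:BOOKS/x:BOOK is checked by hand and the rest is evaluated relative to the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the namespace URI comes from the article.
SAMPLE = """<BOOKS xmlns="http://www.wordsite.com/books">
  <BOOK><TITLE>First Book</TITLE></BOOK>
  <BOOK><TITLE>Second Book</TITLE></BOOK>
</BOOKS>"""

# The Python counterpart of xmlns:x="http://www.wordsite.com/books":
# a dict that maps the prefix x to the namespace URI.
NS = {"x": "http://www.wordsite.com/books"}

root = ET.fromstring(SAMPLE)

# ElementTree will not evaluate an absolute path such as /x:BOOKS/x:BOOK
# against an element, so check the first step (x:BOOKS) by hand and then
# search for the remaining step (x:BOOK) relative to the root.
if root.tag != "{" + NS["x"] + "}BOOKS":
    raise ValueError("document element is not x:BOOKS")
books = root.findall("x:BOOK", NS)
print(len(books))                 # 2

# Without the prefix mapping, a bare name matches nothing, because every
# element in the sample belongs to the wordsite.com namespace.
print(len(root.findall("BOOK")))  # 0
```

The second query shows why the namespace declaration matters: an unqualified BOOK does not match elements that live in the wordsite.com namespace.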

The point here is not to fully explain the XPath language, because many articles already do that (see Additional Resources at the end of this article). Rather, the point is to help you recognize that walking the path to a particular element of data inside an XML file has a lot in common with walking the path to a particular folder on a hard drive.
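The parallel can be made literal with a small hypothetical helper that turns the path name from Figure 1 into the equivalent XPath location path: each folder name becomes one prefixed location step. This is purely a string mapping for illustration; it works here only because the folder names in Figure 1 happen to match the element names in Figure 2.

```python
from pathlib import PureWindowsPath

def folder_path_to_xpath(path: str, prefix: str = "x") -> str:
    """Turn a Windows folder path into an absolute XPath location path.

    Each folder name becomes one namespace-qualified location step, so
    C:\\BOOKS\\BOOK becomes /x:BOOKS/x:BOOK. Hypothetical helper.
    """
    p = PureWindowsPath(path)
    steps = p.parts[1:] if p.anchor else p.parts  # drop 'C:\\' if present
    return "".join(f"/{prefix}:{step}" for step in steps)

print(folder_path_to_xpath(r"C:\BOOKS\BOOK\AUTHOR\FIRSTNAME"))
# /x:BOOKS/x:BOOK/x:AUTHOR/x:FIRSTNAME
```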

Using XPath with Word Documents

Starting with Word 2003, you can use XPath expressions in INCLUDETEXT fields to pull data from an XML document into a Word document. The XML document can be an ordinary XML text file or a Microsoft Office document (such as an Excel spreadsheet or a Word document) that you have saved in XML format. For more information about the use of XPath expressions in INCLUDETEXT fields, look up INCLUDETEXT in the Word 2003 or Word 2007 Help system, or read Field codes: IncludeText field.
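To get a feel for what an INCLUDETEXT field with an XPath expression does, the following Python sketch performs a similar job: it opens an external XML file, walks an absolute path, and returns the text it finds. It is not how Word implements the field, and the function name, file name, and sample data are hypothetical. It also supports only the simple location paths that ElementTree understands.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def include_text(xml_file: str, xpath: str, ns: dict[str, str]) -> str:
    """Return the text of the first element selected by an absolute XPath.

    A rough stand-in for the effect of an INCLUDETEXT field with an XPath:
    open an external XML file and pull out one piece of data.
    """
    root = ET.parse(xml_file).getroot()
    first, _, rest = xpath.lstrip("/").partition("/")
    prefix, sep, local = first.partition(":")
    expected = "{" + ns[prefix] + "}" + local if sep else first
    if root.tag != expected:
        return ""  # the first step does not name the document element
    target = root.find(rest, ns) if rest else root
    return "".join(target.itertext()) if target is not None else ""

# Hypothetical sample file; only the namespace URI comes from the article.
SAMPLE = """<BOOKS xmlns="http://www.wordsite.com/books"><BOOK><AUTHOR>
<FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR></BOOK></BOOKS>"""
NS = {"x": "http://www.wordsite.com/books"}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "books.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    print(include_text(path, "/x:BOOKS/x:BOOK/x:AUTHOR/x:FIRSTNAME", NS))  # Jane
```

In Word itself, the file name, namespace mapping, and XPath expression go into the field code; see the INCLUDETEXT topic in Word Help for the exact switches.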

Starting with Word 2007, you can use XPath expressions to link content controls to XML data in the document's XML data store. Because external programs can access the XML data store, content controls in the Word document that are linked to the data store can automatically display XML data from external programs. For more information about the use of XPath expressions with content controls, look up content controls in the Word 2007 Help system, or read Application Development using the Open XML File Formats.
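In the Open XML file format, a content control's link to the data store is recorded in a w:dataBinding element. Its w:prefixMappings attribute holds namespace declarations, its w:xpath attribute holds the XPath expression, and its w:storeItemID attribute identifies the custom XML part. The Python sketch below resolves such a binding against a custom XML part to show which text the content control would display. The binding and the data are hypothetical samples, w:storeItemID is omitted for brevity, and only simple paths are supported.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical w:dataBinding element, as it might appear inside a content
# control's properties (w:storeItemID omitted for brevity).
BINDING = f"""<w:dataBinding xmlns:w="{W}"
    w:prefixMappings="xmlns:ns0='http://www.wordsite.com/books'"
    w:xpath="/ns0:BOOKS[1]/ns0:BOOK[1]/ns0:TITLE[1]"/>"""

# Hypothetical custom XML part held in the document's data store.
DATA = """<BOOKS xmlns="http://www.wordsite.com/books">
  <BOOK><TITLE>Sample Title</TITLE></BOOK>
</BOOKS>"""

def parse_prefix_mappings(text: str) -> dict[str, str]:
    """Turn "xmlns:ns0='uri' xmlns:ns1='uri2'" into {'ns0': 'uri', ...}."""
    return dict(re.findall(r"xmlns:(\w+)='([^']*)'", text))

def resolve(binding: ET.Element, data_root: ET.Element) -> Optional[str]:
    """Return the text a content control with this binding would show."""
    ns = parse_prefix_mappings(binding.get(f"{{{W}}}prefixMappings", ""))
    first, _, rest = binding.get(f"{{{W}}}xpath").lstrip("/").partition("/")
    prefix, _, local = first.split("[")[0].partition(":")  # drop the [1] predicate
    if data_root.tag != "{" + ns[prefix] + "}" + local:
        return None  # the first step does not name the document element
    node = data_root.find(rest, ns) if rest else data_root
    return node.text if node is not None else None

print(resolve(ET.fromstring(BINDING), ET.fromstring(DATA)))  # Sample Title
```

If an external program changes the TITLE element in the custom XML part, the same binding resolves to the new text, which is why a linked content control can display data supplied from outside Word.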

Unlocking the Power of XPath with XPath Explorer

Although the fundamentals of the XPath language are simple, the language itself is very powerful. The best way to learn how to harness that power is to test XPath expressions to see which data a particular XPath expression returns. Using XPath Explorer, a new freeware tool I developed, you can test any XPath expression that you want. Figure 3 shows the main screen of the XPath Explorer tool.

Figure 3. XPath Explorer


XPath Explorer is compatible with Word 2003 and Word 2007, and it works with any document that contains XML markup, including arbitrary XML files opened in Word. If you want to try out XPath Explorer but you don't have an XML document handy, XPath Explorer can generate one for you. After XPath Explorer has generated a sample XML document, you can experiment with 30 built-in XPath expressions for that sample. In addition, you can enter any arbitrary XPath expression that you want to test.

XPath Explorer is also flexible: it lets you view the results of an XPath expression either complete with XML markup (including WordprocessingML markup, if desired) or as plain text with no markup.
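The difference between the two views can be sketched in a few lines of Python. This is not XPath Explorer's code; it is a hypothetical illustration that evaluates a path (relative to the BOOKS root, because of ElementTree's limited XPath support) and formats each match either with its tags or as plain text. The sample data is invented; only the namespace URI comes from this article.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.wordsite.com/books"}

# Hypothetical sample with two books; only the namespace URI comes from the article.
SAMPLE = (
    '<BOOKS xmlns="http://www.wordsite.com/books">'
    "<BOOK><AUTHOR><FIRSTNAME>Jane</FIRSTNAME></AUTHOR></BOOK>"
    "<BOOK><AUTHOR><FIRSTNAME>John</FIRSTNAME></AUTHOR></BOOK>"
    "</BOOKS>"
)

def show_results(root: ET.Element, path: str, with_markup: bool) -> list[str]:
    """Evaluate a path relative to the root and format each match."""
    matches = root.findall(path, NS)
    if with_markup:
        # Serialize each match with its tags; ElementTree picks its own prefixes.
        return [ET.tostring(m, encoding="unicode") for m in matches]
    # Plain text: concatenate the text content and drop all markup.
    return ["".join(m.itertext()) for m in matches]

root = ET.fromstring(SAMPLE)
print(show_results(root, "x:BOOK/x:AUTHOR/x:FIRSTNAME", with_markup=False))  # ['Jane', 'John']
for item in show_results(root, "x:BOOK/x:AUTHOR/x:FIRSTNAME", with_markup=True):
    print(item)
```

The markup view is useful when you plan to reuse the XML elsewhere; the plain-text view is closer to what a reader sees in the document.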

Download a free copy of XPath Explorer.

Additional Resources

For more information about Word 2007, XML, and the XPath language, visit the following Web pages.

About the Author

Bill Coan is a Microsoft Word MVP and a developer of custom solutions for Microsoft Word. For more information, visit his Web page at Wordsite Office Automation.