Unlocking Office XML Data Through the XML Document Object Model

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

 

Paul Cornell and Frank C. Rice
Microsoft Corporation

January 2002

Applies to:
     Microsoft® Office XP
     Microsoft XML Core Services 4.0

Summary: This article demonstrates how to work with Office XML-based data using Microsoft XML Core Services (MSXML) 4.0, formerly known as the Microsoft XML Parser. (7 printed pages)

Contents

Introduction Installing Microsoft XML Core Services 4.0 Referencing the MSXML 4.0 DLL Working with the XML DOM

Introduction

Microsoft® Access 2002 and Microsoft Excel 2002 have the ability to import and export data in Extensible Markup Language (XML) format, an industry-standard, vendor-neutral, application-neutral data exchange format.

Although XML data is plain text and can be viewed with any text editor, you may want to programmatically access XML data to do things like:

  • Query individual columns, rows, or fields of XML data (for example, validate general ledger account classifications made by sales representatives on individual expense items).
  • Change XML data from one format to another (for example, change a proprietary quote request created in an Access forms-based application to a RosettaNet PIP 3A1 Request Quote shared with supply chain partners).
  • Apply or change data display information (for example, summarize key information on an Excel spreadsheet for executives or other business decision-makers without tinkering around hiding or locking the original data cells).

This article describes how to use Microsoft XML parser technology to unlock your Office XML-based data to accommodate these types of solutions.

Installing Microsoft XML Core Services 4.0

Microsoft provides XML parser technology to programmatically access XML data. This parser is known as the Microsoft XML Parser (for versions 3.0 and earlier) or as Microsoft XML Core Services (for version 4.0). Both the Microsoft XML Parser and Microsoft XML Core Services 4.0 are abbreviated as MSXML, followed by the version number; for example, MSXML 3.0 or MSXML 4.0.

MSXML 4.0, the latest release at the time this article was written, can work without any collisions with previous (or future) versions of XML. So the code you write with MSXML 4.0 today can be used for a long time to come, provided your users have MSXML 4.0 installed on their computers.

MSXML 4.0 comes in three versions:

  • msxml4.msi, which installs MSXML 4.0 onto your computer.
  • msxml4msms.exe, which consists of Windows Installer Merge Modules and an SDK for you to distribute MSXML 4.0 along with your Office solutions.
  • msxml4cab.exe, which can be used to install MSXML 4.0 over the Internet.

For purposes of this article, I assume you have installed msxml4.msi onto your computer for development and testing. Download and install MSXML 4.0. By default:

  • The MSXML 4.0 DLL, Microsoft XML, v4.0 (msxml4.dll), installs into (and can be referenced from) your WINNT\system32 folder.
  • Additional MSXML 4.0 support and Help files install into your C:\Program Files\MSXML 4.0 folder.
  • A shortcut to the Microsoft XML 4.0 Parser SDK Help installs to your Desktop.

Referencing the MSXML 4.0 DLL

To work with MSXML 4.0 programmatically you must first set a reference to Microsoft XML, v4.0 (msxml4.dll):

  1. In the Office Visual Basic® Editor, on the Tools menu, point to References. The References dialog box appears.
  2. Check the box next to Microsoft XML, v4.0, and click OK.

You now have access to the XML Document Object Model (XML DOM), which provides a standard programming model for working with XML data files. The XML DOM is fully described in the World Wide Web Consortium (W3C) DOM specification, and is an industry standard.

Working with the XML DOM

The XML DOM provides a generic container, the node, that can be used to represent individual pieces of XML data in an XML data file such as elements, attributes, text, comments, processing instructions, entities, and so on. The following sections provide information about how to work with these individual pieces of XML data in XML data files.

Accessing XML Data Files

The DOMDocument40 object contains all of the information in the XML data file, and acts as the root node for all of the other nodes in the XML data file. The child nodes of the root node appear in the same sequence in which they appeared in the XML data file.

To access the XML data file, you must create a new instance of the DOMDocument40 object and initialize it to the path of the XML data file, with code similar to the following:

Public Sub GetXMLDOMDocument()

    ' Purpose: Accesses an XML data file from a
    ' hard-coded file path.
    ' You must set a reference to Microsoft XML, v4.0
    ' before calling this code.
    
    Dim objDoc As MSXML2.DOMDocument40
    
    Set objDoc = New MSXML2.DOMDocument40

    ' Load XML from a hard-coded file path.
    objDoc.Load xmlSource:="C:\My XML\Employees.xml"
    
    ' Display the results in the Immediate Window.
    Debug.Print objDoc.XML
    
End Sub

Using the new FileDialog object in Office XP, you could dynamically determine the XML data file path at run time with code similar to the following:

Public Sub GetXMLDOMDocumentUsingDynamicFilePath()

    ' Purpose: Accesses an XML data file from a
    ' dynamic file path.
    ' You must set a reference to Microsoft XML, v4.0
    ' and include the code for the GetXMLFilePath()
    ' function before calling this code.
    
    Dim objDoc As MSXML2.DOMDocument40
    
    Set objDoc = New MSXML2.DOMDocument40
    
    ' Load XML from a user-defined file path.
    objDoc.Load xmlSource:=GetXMLFilePath()
    
    ' Display the results in the Immediate Window.
    Debug.Print objDoc.XML
    
End Sub

Public Function GetXMLFilePath() As String

    ' Purpose: Gets the path to a selected XML file.
    ' Works only in the context of Office XP applications.
    ' Set a reference to the Microsoft Office 10.0
    ' Object Library before calling this code.
    
    Dim objDlg As Office.FileDialog
    Const OPEN_BUTTON = -1
    
    Set objDlg = Application.FileDialog _
        (FileDialogType:=msoFileDialogOpen)
    
    ' Show the File > Open dialog box.
    ' Only allow one XML file to be selected.
    With objDlg
        .AllowMultiSelect = False
        .ButtonName = "Open"
        .Filters.Clear
        .Filters.Add Description:="XML Files", _
            Extensions:="*.xml"
        .Title = "Open XML File"
        
        ' If the Open button is selected,
        ' return the path to the XML file.
        If .Show = OPEN_BUTTON Then
            GetXMLFilePath = .SelectedItems(1)
        End If
               
    End With
    
End Function

From this point, you can begin to access individual parts of the XML data file. When you are finished, you can save your results back to the original XML data file or a new XML data file by calling the DOMDocument40 object's save method and supplying a file path to save the XML data file.

**Note   **The FileDialog object also exposes an msoFileDialogSaveAs enumeration for working with the File > Save As dialog box to dynamically determine the file location to save the XML data file to at run time.

Working with XML Data Nodes

XML data files can contain groups of XML data nodes, accessible through the IXMLDOMNodeList collection. In each IXMLDOMNodeList collection, individual XML data nodes are accessible through the IXMLDOMNode object, using code similar to the following:

Public Sub ListAllNodes()

    ' Purpose: Lists all nodes in an XML data file.
    ' You must set a reference to Microsoft XML, v4.0
    ' and include the code for the GetXMLFilePath()
    ' function and GetChildNodes() subroutine
    ' before calling this code.
    
    Dim objDoc As MSXML2.DOMDocument40
    Dim objNode As MSXML2.IXMLDOMNode
    
    Set objDoc = New MSXML2.DOMDocument40
    
    ' Load XML from a user-defined file path.
    objDoc.Load xmlSource:=GetXMLFilePath()
    
    ' Recursively list each node's name and data value.
    Call GetChildNodes(nodeList:=objDoc.childNodes)
    
End Sub

Public Sub GetChildNodes(ByVal nodeList As MSXML2.IXMLDOMNodeList)

    ' Purpose: Recursively searches through the entire node's
    ' child nodes and prints all of their names and data values.
    
    Dim objNode As IXMLDOMNode
    
    For Each objNode In nodeList
        
        ' If there are child nodes, call this subroutine again.
        If objNode.hasChildNodes = True Then
            Call GetChildNodes(nodeList:=objNode.childNodes)
        ' If there are no child nodes, print the node name and
        ' node data value.
        Else
            Debug.Print objNode.parentNode.nodeName & _
                ": " & objNode.nodeTypedValue
        End If
    
    Next objNode
            
End Sub

Because each node type has different access methods and content limitations, it is sometimes easier to work with nodes belonging to a particular type, rather than with the generic IXMLDOMNode object. In order to do this, you need to work with XML data elements and attributes using the IXMLDOMElement object. Just as elements can contain child elements, text, comments, processing instructions, CDATA sections, and entity references within XML data files, IXMLDOMElement objects can contain IXMLDOMElement, IXMLDOMText, IXMLDOMComment, IXMLDOMProcessingInstruction, IXMLDOMCDATASection, and IXMLDOMEntityReference objects. Attributes are accessed by using the IXMLDOMElement object's getAttribute and setAttribute methods, or manipulated as an IXMLDOMNamedNodeMap object through the IXMLDOMElement object's attributes property. See the MSXML 4.0 SDK Help for more information about working with these methods and objects.