Using Visual Basic .NET from VBA to Serialize Word Documents as XML

Michael Corning
Microsoft Corporation

October 2002

Applies to:
   Microsoft® Word 2002
   Microsoft Visual Studio® .NET

Summary: Learn how to quickly serialize large Word documents as XML by leveraging .NET code from a Microsoft Office Visual Basic for Applications (VBA) program. (35 printed pages)

Download Setup.msi.

Contents

Introduction
The WordXml.Net Sample
Deploying WordXml.Net
Related Literature
Conclusion
Appendix

Introduction

This article describes Microsoft® .NET technologies and techniques that enabled a Microsoft Office Visual Basic® for Applications (VBA) program to serialize a 100-page Microsoft Word 2002 document as XML in 33 seconds (the original VBA code took over ten minutes to serialize the same document).

I have three goals in writing this article. The first is to show how to use the System.Xml classes in the .NET Framework to bypass the more expensive XML DOMDocument object and serialize XML directly to the file system. The second is to show how to create and debug a Microsoft Visual Basic .NET class library that interoperates with Component Object Model (COM) technology so that VBA can access the managed XML component. The third is to show how to create a Microsoft Windows® Installer package that installs the Visual Basic .NET application, including debug symbols, on the user's machine.

Evidently, there is some concern in the Office development community about whether and how Microsoft will incorporate the .NET Framework into Office. Frankly, I'm not worried. So far, I've been more than satisfied with relying on COM interop to give me the best of both worlds. My legacy VBA code remains in place, but I've replaced one VBA function that used the native Microsoft XML Core Services DLL (msxml4.dll) with a call to my Visual Basic .NET component that uses managed XML. Whatever performance hit I took with COM interop was eclipsed by the performance gains I accrued by serializing Word content without first caching the XML in a DOMDocument object.

My story begins near the end of a development cycle. I had completed a VBA program that converts software test specifications (also known as test specs) written in Word format to XML. Everything worked fine until one of our test engineers gave me his real-world test spec. My VBA took over ten minutes to grind out a DOMDocument object containing the spec's tests. I knew right then that my days of procrastination were over; it was time to come to terms with the .NET System.Xml namespace's managed classes.

When I was done, I had a very effective alternative to cached XML in VBA, and I had a blast putting the .NET solution together. In the process, though, I noticed that precious few articles and samples in the literature addressed the needs of Office and XML developers like me. So I decided to share my story in the hope of saving other Microsoft customers thousands of hours of development time. I did find some very useful articles after I had essentially finished the development on my own, and I list them at the end of this article.

I have simplified the test specification authoring system I wrote for our testers and have included the Visual Basic .NET solution and Word template as a download for this article. The simplified program is called WordXml.Net, and I'll document the Visual Basic .NET source code in sections below. In case it's helpful in reading the WordXml.Net source code to understand what the original authoring system did, I've included a functional and program specification for the system, called Socrates, in the appendix of this document. The downloadable WordXml.dot template can only serialize Word content; I removed all the Socrates source code for authoring a test specification.

The WordXml.Net Sample

When you create a new document from the WordXml.dot file, you will see a sample Socrates test specification. The serialization process has three steps, each of which produces at least one XML file. In Socrates, the final XML file is used by software test automation to actually run the specified test. See the appendix for further details. As a Word developer, the XML file you will be most interested in is the first one, which has the same name as the test spec but uses the .xml suffix. This first XML file represents the actual serialization of the Word document.

The WordXml.Net sample includes the rest of the Socrates pipeline because I wanted to highlight the importance of using the most appropriate XML technology for each different programming problem. That is, using the streaming XML techniques in the System.Xml namespace is great when a cache of XML is unnecessary; but other processing tasks are better implemented using XSLT, and for that you must use an XML cache. Finally, XSLT has its own limitations, so I'll show you how to use the System.Xml classes that optimize XML caches.

One more note before we begin. In the Socrates system the first generated XML file is called an IML file; IML stands for intermediate markup language. The second generated XML file is called an XIML file because it's an expanded IML file. The third and final XML file is called a varmap because it is input to the test automation framework. Again, these terms of art, as well as a diagram that shows the relationships of all the serialization components, are more fully documented in the appendix.

The Visual Basic .NET Components

This section covers the Visual Basic .NET source code. There are two Visual Basic .NET programs that WordXml.Net users can call on. The class library is the first and contains two classes and three functions between them; Word 2002 calls two of the functions and a Visual Basic .NET EXE calls the third function.

The XmlProvider class in the WordXml.Net.dll assembly contains the serialize function that can generate IML from a Word 2002 test specification authored with the WordXml.dot template. This functionality is exposed only through Word 2002 because calling Word from a .NET EXE imposes a huge cost due to cross-process marshalling between Word and the .NET EXE (or XML Web service).

Word 2002 also calls the compileXimlFromWord function in the XimlCompiler class to transform the spec's IML into XIML and generate one or more varmaps from the XIML data. The .NET process updates progress in the Word Status Bar.

The XimlCompiler class also provides the compileXimlFromExe function so that the second Visual Basic .NET program, WordXmlHost.exe, can display compilation progress in the console. Both compileXimlFrom... functions call the private compile function that does all the real work.

The class library

The class library (the component), WordXml.Net.dll, provides objects for both VBA script in Word and the Visual Basic .NET EXE from the file system. The component has two roles: it uses managed XML classes to serialize Word 2002 binary content into text-based IML, and it transforms the IML into XIML and then into as many varmap XML files as there are varmap nodes in the XIML data.

The key justification for using Visual Basic .NET is that we gain access to the efficient XML classes in the Microsoft .NET Framework, a framework that can process multi-megabyte documents in seconds instead of minutes. As noted at the beginning of this article, one real-world test spec (about 100 pages long) took over ten minutes to serialize using VBA and a mere 33 seconds to convert by using the Visual Basic .NET component.

Another obstacle overcome by the Visual Basic .NET component is that XSLT (used to transform IML into XIML) can write to a single file only. However, depending on the spec, a single spec can generate any number of different (but related) varmap files. The WordXml.Net component can stream XIML to as many separate varmap files as necessary. The managed XML classes are so efficient that even XIML files of almost three megabytes can be saved to disk in just over one second.

Source code

First, let's look at the skeleton of the component. The first step in developing the .NET component is to click the Add Reference menu command on the Visual Basic .NET class library project's References shortcut menu. Clicking the Add Reference dialog box's COM tab enables you to navigate to the folder holding WINWORD.EXE (for Word 2002). Selecting the EXE adds the primary interop assembly (PIA) for Word 2002 and Microsoft Office XP to the project's references (see Figure 1).

Figure 1. The Add Reference dialog box

The second step is to make it easier to type class references in the Visual Basic .NET source code by adding the following Imports statements to the component's WordXml.Net.vb source code file.

Imports Microsoft.Office
Imports System.Text.RegularExpressions
Imports System.Xml
Imports System.Xml.Xsl
Imports System.Xml.XPath
Imports Word

Listing 1. Imports statements in the class library

Next, I lay out the classes, functions, and function signatures that will implement our WordXml.Net compilers. I'll discuss the empty constructor later when I go into the details of enabling COM interop. From that high-level overview I'll drill into the pseudocode for each function. The actual source code can be pretty complicated, so in the interest of space, I'll document the crucial stuff and leave it to the reader to explore the remaining code and comments in the downloaded source code.

Public Class XmlProvider

    Public Sub New()
        ' public constructor required by COM interop
    End Sub

    Public Function Serialize (ByVal rngTestAreas As Range) _
        As Boolean
    End Function

End Class

Public Class XimlCompiler

    Public Sub New()
      ' public constructor required by COM interop
    End Sub

    Private Function Compile(ByVal imlPath As String, _
        ByVal imlFileName As String, _
        ByVal reader As XmlNodeReader, _
        ByRef result As String, _
        ByVal templatePath As String) As Integer
    End Function

    Public Function CompileXimlFromWord (ByVal app As Application, _
        ByVal xsltPath As String, _
        ByVal imlPath As String, _
        ByVal imlFileName As String) _
        As String
    End Function

    Public Function CompileXimlFromExe(ByVal xsltPath As String, _
        ByVal imlPath As String, _
        ByVal imlFileName As String) _
        As String
    End Function

    Private Function IncludeXml( _
        ByVal dataNodeIterator As XPathNodeIterator, _
        ByRef writer As XmlTextWriter)
    End Function

End Class

Listing 2. Component overview of classes

The XmlProvider class

The Serialize function does the most work of all the compilers because it must traverse a complex Word document object (all the functions in the XimlCompiler class merely process well-formed XML). Since this Word document traversal is stateless, we can exploit the substantially better performance offered by managed XML. To minimize the cross-process marshaling, and since Word does not yet directly support the common language runtime, VBA is the client language of choice. Once we get into text-processing of XML in the WordXmlHost.exe, we switch to Visual Basic .NET as both the client and server language.

Public Function Serialize (ByVal rngTestAreas As Range) _
        As Boolean
    Try
        ' input Range object from Word XP
        ' serialize Introduction, Projects, and Contexts sections
        ' traverse all Test Areas sections beginning with first Set
        ' Serialize Set and its setText styled paragraphs
        ' Serialize Level and its heading
        ' traverse all paragraphs in the Level's section
        ' Serializing Vars
        ' serialize the varText table
        ' if present, serialize the Declared test table
        ' if present, serialize the Defined test table
        ' if present, serialize sibling Level (same Level number
        ' but different category)
        ' repeat Var serialization

    Catch
        ' handle the exception so we have a chance to stop the
        ' component at run time and debug

    Finally
        ' close writer to persist contents to disk (close even if
        ' there's an exception)
    End Try

End Function

Listing 3. XmlProvider Serialize pseudocode

The main reason Serialize is so much faster than the serializeTestAreasWithDOM function in the WordXml.dot WordXmlDotNet module is that the Serialize method only maps the Word information set (infoset) to the Socrates infoset (see the appendix for a detailed description of the Socrates infoset). The Serialize method does not create a rich XML object for each node; it simply uses the System.Xml.XmlTextWriter class to write the XML syntax strings to the file system. The disadvantage of the XmlTextWriter is that it streams the XML and does not permit random access to the XML data. Later I'll show you how I used both XSLT and the optimized XML caches in managed XML to implement solutions that require random access to XML.
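
The difference between the two approaches can be sketched in a few lines. This is only an illustration and not the WordXml.Net source: the element names, the paragraph count, and the temporary file names are placeholders I made up for the example, and the cached version uses the managed XmlDocument class as a stand-in for a DOM.

```vbnet
Imports System.IO
Imports System.Xml

Module StreamingVersusCached

    ' Cached approach: build an object for every node, then save.
    Sub WriteWithDom(ByVal fileName As String, ByVal count As Integer)
        Dim doc As New XmlDocument()
        Dim root As XmlElement = doc.CreateElement("spec")
        doc.AppendChild(root)
        For i As Integer = 1 To count
            Dim p As XmlElement = doc.CreateElement("p")
            p.InnerText = "paragraph " & i.ToString()
            root.AppendChild(p)
        Next
        doc.Save(fileName)
    End Sub

    ' Streaming approach: write the markup straight to the file.
    ' Nothing written earlier can be revisited.
    Sub WriteWithWriter(ByVal fileName As String, ByVal count As Integer)
        Dim writer As New XmlTextWriter(fileName, Nothing)
        Try
            writer.WriteStartElement("spec")
            For i As Integer = 1 To count
                writer.WriteElementString("p", "paragraph " & i.ToString())
            Next
            writer.WriteEndElement()
        Finally
            writer.Close()
        End Try
    End Sub

    Sub Main()
        ' Placeholder output locations (hypothetical file names).
        Dim folder As String = Path.GetTempPath()
        WriteWithDom(Path.Combine(folder, "cached.xml"), 1000)
        WriteWithWriter(Path.Combine(folder, "streamed.xml"), 1000)
    End Sub

End Module
```

Both routines produce the same elements. The streaming version never holds more than the current element in memory, which is why it suits a one-pass traversal like Serialize.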

Linking Word documents to Visual Basic .NET

In terms of source code, the first thing that's interesting about Serialize is the way we connect the Range object in Word with the function in Visual Basic .NET. The first step is to pass the Range object (containing the test areas from the edited test spec) as the argument to the Serialize call. Once inside the function, we instantiate a Word.Document object (specDoc) with the Range object's Parent property. The specDoc object then gives us access to the Word document's name (which we will use with the .xml suffix as the argument for the XmlTextWriter constructor) as well as many other values needed by the Serialize function.

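' specDocConvertedName is declared elsewhere in the component; it is
' assumed here to be a Regex that matches the document's file extension
Dim specDoc As Document = CType(rngTestAreas.Parent, Document)
Dim imlFilePathName As String
Dim writer As XmlTextWriter
imlFilePathName = specDoc.Path & "\" & _
                  specDocConvertedName.Replace(specDoc.Name, ".xml")
writer = New XmlTextWriter(imlFilePathName, Nothing)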

Listing 4. Linking Word document to Visual Basic .NET program

Serializing hierarchical tables

While I'm talking about how I used managed XML to cut serialization time from over ten minutes to 33 seconds, I thought I'd show you a few techniques I used to actually serialize the Word content.

So the next point of interest is how I used Word tables to simulate a nested hierarchy and how I serialized such a table as XML. If you look at the sample WordXml.Net test spec (in the source code download), you will see that set 1 uses two classes to implement test setup and cleanup procedures. The second row in the spec's Context table is indented. The following source code for serializing a Context table looks for these indents to nest XML elements:

writer.WriteStartElement("contextSection", NSURI_IML)
Do While True
    rng = rng.Next(WdUnits.wdParagraph)
    If rng.Tables.Count = 0 Then
        If Len(rngTextString) > 0 Then
            writer.WriteElementString("p", rngTextString)
        End If
    Else
        thisTable = rng.Tables.Item(FLD_CONTEXT_CLS)
        writer.WriteStartElement("grp")
        For intCt = 3 To thisTable.Rows.Count
            ' get current Group class name and Sets list
            cellName = _
                thisTable.Rows.Item(intCt).Cells.Item(FLD_CONTEXT_CLS)
            cellSets = _
                thisTable.Rows.Item(intCt).Cells.Item(FLD_CONTEXT_SETS)
            If thisTable.Rows.Item(intCt).IsLast Then
                nextLeftIndent = 0
            Else
                nextLeftIndent = _
                    thisTable.Rows.Item(intCt + _
                    1).Cells.Item(FLD_CONTEXT_CLS). _
                    Range.Paragraphs.LeftIndent
            End If
            ' this is a child row if leftIndent > 0
            leftIndent = cellName.Range.Paragraphs.LeftIndent

            writer.WriteStartElement("grp", NSURI_CONTEXT)
            writer.WriteAttributeString( _
                "cls", _
                cleanCell.Match(cellName.Range.Text).Value)
            ' use space as the delimiter between Set values
            setList = Split(cleanCell.Match(cellSets.Range.Text).Value)
            ' add a varref child for each Set cited
            For Each setRef In setList
                If "" <> setRef Then
                    writer.WriteStartElement("varref", NSURI_CONTEXT)
                    writer.WriteAttributeString("set", Trim(setRef))
                    writer.WriteEndElement()
                End If
            Next setRef
            If nextLeftIndent = leftIndent Then
                ' close current grp
                writer.WriteEndElement()
            ElseIf nextLeftIndent < leftIndent Then
                ' close current grp
                writer.WriteEndElement()
                ' close parent grp
                writer.WriteEndElement()
            End If
        Next intCt
        ' close containing grp tag
        writer.WriteEndElement()
        Exit Do
    End If ' rng.Tables.Count = 0
Loop
' close contextSection tag
writer.WriteEndElement()

Listing 5. Serializing contexts table in NewTestSpec.doc

I'll note a couple of things about this code. First, I found the Range object's Next method a very convenient and very fast way to step through a document one paragraph at a time. The code above serializes textual paragraphs when the range does not include any tables, and it serializes a table object when one is found inside a paragraph. Each row of the Contexts table in the spec is a <grp> tag. If there is more than one row and the next row has a leftIndent property greater than the current row's, the current <grp> tag is left open so that the next row's <grp> tag becomes its child. If the next row's leftIndent property is less than the current row's, both the current <grp> tag and the parent <grp> tag must be closed before the next row's <grp> tag gets created.
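
The indent rule is easier to see without the Word object model in the way. The following sketch is not the component's code: the Row structure, the indent values, and the NestRows function are hypothetical stand-ins for table rows, and it swaps the look-ahead for a stack of open indents, which also handles nesting deeper than two levels.

```vbnet
Imports System
Imports System.Collections
Imports System.IO
Imports System.Xml

Module IndentNesting

    ' Hypothetical stand-in for one row of a Context table.
    Structure Row
        Public Name As String
        Public LeftIndent As Single
        Public Sub New(ByVal name As String, ByVal leftIndent As Single)
            Me.Name = name
            Me.LeftIndent = leftIndent
        End Sub
    End Structure

    Function NestRows(ByVal rows() As Row) As String
        Dim buffer As New StringWriter()
        Dim writer As New XmlTextWriter(buffer)
        Dim openIndents As New Stack()

        writer.WriteStartElement("grp")          ' containing grp
        For Each r As Row In rows
            ' close every open grp that is not an ancestor of this row
            While openIndents.Count > 0 AndAlso _
                    CSng(openIndents.Peek()) >= r.LeftIndent
                writer.WriteEndElement()
                openIndents.Pop()
            End While
            writer.WriteStartElement("grp")
            writer.WriteAttributeString("cls", r.Name)
            openIndents.Push(r.LeftIndent)
        Next
        ' close whatever is still open, then the containing grp
        While openIndents.Count > 0
            writer.WriteEndElement()
            openIndents.Pop()
        End While
        writer.WriteEndElement()
        writer.Flush()
        Return buffer.ToString()
    End Function

    Sub Main()
        ' The second row is indented, so it nests inside the first.
        Dim rows() As Row = { _
            New Row("CSetup", 0), _
            New Row("CSpecial", 18), _
            New Row("CExtraSpecial", 0)}
        Console.WriteLine(NestRows(rows))
    End Sub

End Module
```

With these three rows, CSpecial ends up as a child of CSetup, and CExtraSpecial becomes a sibling of CSetup.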

After serializing the NewTestSpec.doc file (from the download), here's what the Context table data looks like in one of the spec's varmap XML files:

<grp>
    <grp cls="CSetup">
        <grp cls="CSpecial">
            <rec key="arg1">First Arg</rec>
            <rec key="arg2">Second Arg</rec>
            <varref set="1" />
        </grp>
    </grp>
    <grp cls="CExtraSpecial">
        <rec key="arg1">Another Arg</rec>
        <varref set="2" />
    </grp>
</grp>

Listing 6. Contexts table serialized as grp node

The rec tag child nodes of the grp tag were added by merging external XML data files with the test spec's XML file in the Compile() method I'll document shortly. It was this need for merging specific XML nodes into a parent XML file (and the need to split up the XIML file into constituent XML files) that motivated the change from streaming XML (using the XmlTextWriter) to cached XML.

Maintaining hierarchy in XmlTextWriter

Since the Top property of the XmlTextWriter class is private and there is no other public property that tells me how far into a hierarchy the XML stream has gone, I found using the System.Collections.Stack class invaluable while developing Visual Basic .NET code to serialize a hierarchical XML output from a flat Word document. Interestingly, once the code works properly, I'm not sure the stack will be necessary, but it will always be helpful if you want to use asserts in your program to ensure that any given incoming Word document conforms to the XML schema you want to serialize.

Put differently, unless your serialization mechanism follows (and is limited to) the level number of the last heading, a Word document is essentially flat. To make matters worse, when you use the XmlTextWriter to write XML to some base store (such as the file system) you're writing a stream of text, and the XmlTextWriter doesn't give you any public property to help you keep track of when you need to close an open tag. Consequently, you can very easily get to a place where you try to close a tag when there is no open tag to close. Ironically, the XmlTextWriter has a private top property, but that only helps you while you're debugging code. The good news is that if you forget to close any open tags, XmlTextWriter will do that for you, though the outcome may not be what you expected. Actually, I've found this default behavior very useful in development. When my Catch clauses automatically close the writer object, I can see how far my serializer got and where it failed by looking at the resulting (partial) XML file.
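
Both behaviors are easy to reproduce in isolation. This is a minimal sketch that writes to an in-memory StringWriter instead of the file system; the element names are borrowed from the spec hierarchy only for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module WriterSafetyNet

    Sub Main()
        Dim buffer As New StringWriter()
        Dim writer As New XmlTextWriter(buffer)

        writer.WriteStartElement("set")
        writer.WriteStartElement("level")
        writer.WriteElementString("var", "unfinished")
        ' Pretend the serializer failed here: level and set are open.
        writer.Close()              ' Close ends the open elements
        Console.WriteLine(buffer.ToString())

        Dim second As New XmlTextWriter(New StringWriter())
        second.WriteStartElement("set")
        second.WriteEndElement()
        Try
            second.WriteEndElement()    ' nothing is left to close
        Catch ex As InvalidOperationException
            Console.WriteLine("No open element: " & ex.Message)
        End Try
    End Sub

End Module
```

The first writer yields a well-formed fragment even though the code never closed level or set, which is what makes a partial output file readable after a failure. The second writer shows the error you get when the bookkeeping goes wrong in the other direction.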

The other thing I learned by using this stack-based approach was not to try to optimize the algorithm. The Serialize method was my earliest laboratory for experimenting with these stack-based techniques. The last method I worked on, the Compile method of the XimlCompiler class (see below), is more primitive but far easier to follow (with respect to the hierarchy problem). In this section I'll show you a relevant snippet of code from Serialize. You'll see how I try to always keep track of how deep I am in the hierarchy and what's coming next in the serialization as I decide whether to pop the stack or not. Later, you'll see that I used an alternative strategy in the Compile method that always pops the stack.

Dim testTree As New System.Collections.Stack()
If testTree.Count > 1 Then ' this is not the first Set
    Try
        If testTree.Count = 3 Then
            ' close Var
            testTree.Pop()
            writer.WriteEndElement()
        End If
        Debug.Assert(testTree.Count = 2)
        If testTree.Count = 2 Then
            ' close Level
            testTree.Pop()
            writer.WriteEndElement()
        End If
        Debug.Assert(testTree.Count = 1)
        If testTree.Count = 1 Then
            ' close Set
            ' but don't pop() Set off stack
            writer.WriteEndElement()
        End If
    Catch
        MsgBox("Attempting to close unopened element.", _
            MsgBoxStyle.Critical, "SerializeTestAreas -- New Set")
    End Try
Else
    ' push this Set onto stack
    testTree.Push("set")
End If

writer.WriteStartElement("set", NSURI_IML)

Listing 7. Safely using WriteEndElement

My strategy here is to push the stack just before I start a new element. The code snippet above runs when the current paragraph uses a Set document style (use the NewTestSpec.doc in the download to follow along here). But before I can create a new <set> tag I have to be sure I've closed off all children. The If clause assumes that the maximum depth I can reach is three levels (otherwise the first assert kicks in). So if I had just processed the previous Set's last Var paragraph, I'd pop the third level off the stack. If the code has just processed a Level paragraph, I would be only two levels deep and would pop that second level off the stack, safely writing the end tag, </level>. The testTree.Count = 1 test ensures that I don't raise the error that occurs when I try to close a non-existent tag; practically speaking, the test isn't necessary.

After the Serialize function runs, an IML file resides in the same folder as the Word document that the IML serializes (see Figure 16).

The XimlCompiler class

As noted above, two functions in the XimlCompiler class, compileXimlFromWord and compileXimlFromExe, call the private Compile function. In each calling function, the code opens an IML file (or degrades gracefully if the IML file is missing) and runs the IML through an XSLT transform, writing the resulting XIML to disk. That XIML is reloaded into an XmlDocument object, which is used to instantiate an XmlNodeReader object that is passed to the Compile function in the reader argument. The Compile function uses the reader to traverse the XIML file, inserting XML data from an external data file and writing individual varmap files to disk. The pseudocode for this process is described in Listing 8 below.

Private Function Compile(ByVal imlPath As String, _
        ByVal imlFileName As String, _
        ByVal reader As XmlNodeReader, _
        ByRef result As String, _
        ByVal templatePath As String) As Integer

    Try
        ' After calling program has transformed IML to XIML:
        ' Read each node of XIML
        ' if nodeName="varmap" then create new XmlTextWriter with
        ' filename based on owner and framework
        ' if nodeName="var" 
        ' add var element and all but its "nr" attribute to output
        ' if nodeName="rec" create start tag for <rec>
        ' if nodeName="grp" write grp element
        ' if nodeName="varref" write varref tag with attributes
        ' if nodeType is comment and if nodeName="rec",
        ' write comment node
        ' if nodeType is text, write text node
        ' if nodeType is varmap endElement, write end element and
        ' close writer to persist varmap node to file
        ' (reopened by subsequent varmap nodes)
    Catch
        ' throw exception back to calling routine with message that
        ' partial xml may be available for examination
    Finally
        ' close reader and writer (even if an exception occurs)
    End Try
End Function

Listing 8. XimlCompiler class's Compile function pseudocode
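
The steps that the calling functions perform before Compile runs (transform, reload into a cache, wrap the cache in a reader) might look roughly like this. Treat it as a sketch under stated assumptions: the file names, the sample input, and the identity stylesheet are placeholders, and it uses XslCompiledTransform, which postdates this article (the .NET Framework 1.0 counterpart is XslTransform).

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Module TransformThenRead

    Sub Main()
        Dim folder As String = Path.GetTempPath()
        Dim imlFile As String = Path.Combine(folder, "sample.xml")
        Dim xsltFile As String = Path.Combine(folder, "sample.xslt")
        Dim ximlFile As String = Path.Combine(folder, "sample.ximl.xml")

        ' Placeholder input and an identity stylesheet (both made up).
        File.WriteAllText(imlFile, "<ximl><varmap owner='a'/></ximl>")
        File.WriteAllText(xsltFile, _
            "<xsl:stylesheet version='1.0' " & _
            "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" & _
            "<xsl:template match='@*|node()'><xsl:copy>" & _
            "<xsl:apply-templates select='@*|node()'/>" & _
            "</xsl:copy></xsl:template></xsl:stylesheet>")

        ' 1. Transform the input and write the result to disk.
        Dim xslt As New XslCompiledTransform()
        xslt.Load(xsltFile)
        xslt.Transform(imlFile, ximlFile)

        ' 2. Reload the result into an in-memory cache.
        Dim doc As New XmlDocument()
        doc.Load(ximlFile)

        ' 3. Give the consumer a forward-only reader over the cache.
        Dim reader As New XmlNodeReader(doc)
        Try
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element Then
                    Console.WriteLine(reader.Name)
                End If
            End While
        Finally
            reader.Close()
        End Try
    End Sub

End Module
```

Reading from an XmlNodeReader keeps the consuming code in the same pull style as a file reader, while the XmlDocument behind it remains available for random access.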

The complete source code for Compile follows, but I will comment only on the most interesting parts of the code. First, I'll show you how Compile implements a simpler strategy for safely serializing flat content like a Word document. Also, I'll show you how to use XPathNavigator to include external XML into your main XML document (in my case I need to include runtime data for the Context classes in my test executable).

One way to optimize a serialization algorithm is to not pop the stack when the next node in the incoming infoset is the same as the current node. In other words, as long as you're processing siblings, don't pop the stack. But when the family tree gets to three levels deep (see Listing 11), this strategy becomes counterproductive. Since the <grp> tag can have an arbitrarily deep nesting of <grp> and <varref> tags, I decided to simply pop the stack each time I encountered a close tag in the incoming infoset.

Private Function Compile(ByVal imlPath As String, _
        ByVal imlFileName As String, ByVal reader As XmlNodeReader, _
        ByRef result As String, ByVal templatePath As String) _
        As Integer

    Dim dataInc() As String
    Dim dataNodeIterator As XPathNodeIterator
    Dim doc As XmlDocument = New XmlDocument()
    Dim includeDataClass As Boolean
    Dim nt As New NameTable()
    Dim nav As XPathNavigator
    Dim nsuri As String = _
        "http://wordXml.net/schemas/mcf/2002/01/varmap"
    Dim varCount As Integer
    Dim varmapCount As Integer = -1
    Dim varmapFileName As String
    Dim varmapTree As New Stack()
    Dim writer As XmlTextWriter

    nt = reader.NameTable

    reader.Read()
    Try
        While reader.Read()
        Select Case reader.NodeType
            Case XmlNodeType.Element
                Select Case reader.Name
                    Case nt.Get("varmap")
                    If reader.GetAttribute("framework") <> _
                            "Manual" Then
                        varmapTree.Push(reader.Name)
                        varmapCount = varmapCount + 1
                        varmapFileName = imlPath & imlFileName & _
                        IIf("" <> reader.GetAttribute("owner"), _
                            reader.GetAttribute("owner"), _
                            CStr(varmapCount))
                        If "" = reader.GetAttribute("framework") Then
                            varmapFileName = varmapFileName & _
                            ".varmap.xml"
                        Else
                            varmapFileName = varmapFileName & "." & _
                                reader.GetAttribute("framework") & _
                                ".varmap.xml"
                        End If
                        result = result & vbTab & varmapFileName & vbCr
                        writer = New XmlTextWriter _
                            (varmapFileName, Nothing)
                        writer.Formatting = Formatting.Indented
                        writer.Indentation = 2
                        writer.WriteStartElement(reader.Name, nsuri)
                        ' skip over wordxml.net attributes
                        Do While reader.MoveToNextAttribute()
                            If InStr("revision.framework", reader.Name) = _
                                    0 Then
                                writer.WriteAttributeString(reader.Name, _
                                    reader.Value)
                            End If
                        Loop
                    Else
                        ' skip over the rest of this manual test...
                        Do Until reader.Name = "varmap" And _
                                reader.NodeType = XmlNodeType.EndElement
                            reader.Read()
                        Loop
                    End If

            Case nt.Get("var")
                ' ensures no nested vars
                If varmapTree.Count = 1 Then _
                    varmapTree.Push(reader.Name)
                varCount = varCount + 1
                writer.WriteStartElement(reader.Name)
                Do While reader.MoveToNextAttribute()
                    If reader.Name <> nt.Get("nr") Then
                        writer.WriteAttributeString(reader.Name, _
                            reader.Value)
                    End If
                Loop
                reader.MoveToElement()
                If reader.IsEmptyElement Then
                    writer.WriteEndElement()
                    varmapTree.Pop()
                End If

            Case nt.Get("rec")
                varmapTree.Push(reader.Name)
                writer.WriteStartElement(reader.Name)
                writer.WriteAttributes(reader, False)

            Case nt.Get("grp")
                varmapTree.Push(reader.Name)
                writer.WriteStartElement(reader.Name)
                Do While reader.MoveToNextAttribute()
                    If reader.Name = nt.Get("dataCls") Then
                        dataInc = Split(reader.Value, "#")
                        includeDataClass = True
                        Exit Do
                    Else
                        includeDataClass = False
                        writer.WriteAttributeString(reader.Name, _
                            reader.Value)
                    End If
                Loop
                reader.MoveToElement()
                If Not dataInc Is Nothing Then
                    If dataInc(0) <> "" Then
                        ' load dataCls file
                        doc = New XmlDocument()
                        doc.Load(imlPath & dataInc(0))
                        nav = doc.CreateNavigator()
                        If dataInc.Length = 2 Then
                            dataNodeIterator = nav.Select(dataInc(1))
                        Else
                            dataNodeIterator = _
                            nav.Select("//*[@xlink='" & _
                                reader.GetAttribute("cls") & "']")
                        End If
                        If Not dataNodeIterator Is Nothing Then
                            IncludeXml(dataNodeIterator, writer)
                            dataInc = Nothing
                            dataNodeIterator= Nothing
                         End If
                     End If
                 End If

            Case nt.Get("varref")
                varmapTree.Push(reader.Name)
                writer.WriteStartElement(reader.Name)
                writer.WriteAttributes(reader, False)
                If reader.IsEmptyElement Then
                    writer.WriteEndElement()
                    varmapTree.Pop()
                End If
        End Select
    
        Case XmlNodeType.Comment
            If nt.Get(varmapTree.Peek()) = nt.Get("rec") Then
                writer.WriteComment(reader.Value)
            End If

        Case XmlNodeType.Text
            writer.WriteString(reader.Value)

        Case XmlNodeType.EndElement
            If nt.Get("ximl") <> reader.Name And _
                varmapTree.Count > 0 Then
                writer.WriteEndElement()
                varmapTree.Pop()
                If nt.Get("varmap") = reader.Name Then
                    writer.Close()
                End If
            End If
        End Select
    End While

    Catch e As Exception
        MsgBox(e.Message)
        Throw New Exception("Could not complete compilation of ximl. " & _
            "Please consult the file " & varmapFileName & _
            " for the point at " & _
            "which the failure occurred.")

    Finally
        writer.Close()
        reader.Close()
    End Try

    Return varCount
End Function

Listing 9. XimlCompiler class's Compile function source code

The Case statement that handles IML <grp> tags looks for a dataCls attribute, which indicates that the tester needs part of the test executable to fetch separate XML data at run time. Compile splits the attribute's value on the # character. If the value includes a #, the text before it names the external data file and the text after it is an XPath expression that Compile uses to isolate the correct nodes in that file. If, instead, the value is only a file name, the value of the <grp> element's cls attribute (which comes from the Contexts table, see Listing 11) becomes the value to find in the special xlink attribute of the external data file. Compile handles both techniques for binding run-time data to executing classes and passes the selected XML data to IncludeXml (see Listing 10), which merges the external data into the resulting varmap file for execution.

Private Function IncludeXml(ByVal dataNodeIterator As _
  XPathNodeIterator, ByRef writer As XmlTextWriter)
    Dim nav As XPathNavigator
    While (dataNodeIterator.MoveNext())
        ' Work on a clone so the iterator's own position is not disturbed.
        nav = dataNodeIterator.Current.Clone()
        writer.WriteStartElement(nav.Name)

        ' Copy every attribute except the xlink binding attribute.
        ' MoveToFirstAttribute returns False when the node has none.
        If nav.MoveToFirstAttribute() Then
            Do
                If nav.Name <> "xlink" Then
                    writer.WriteAttributeString(nav.Name, nav.Value)
                End If
            Loop While nav.MoveToNextAttribute()
            nav.MoveToParent()
        End If

        ' Back on the element: write its text content and close it.
        writer.WriteString(nav.Value)
        writer.WriteEndElement()

    End While
End Function

Listing 10. XimlCompiler class's IncludeXml function source code

The code from Listing 9 and Listing 10 merges the XML from Listing 11 and Listing 12 to produce the XML in Listing 6.

<grp>
    <grp cls="CSetup" >
      <grp cls="CSpecial"  
          dataCls="grpClass.xml#data/grp[@xlink=&quot;CX&quot;]/rec"
      >
          <varref set="1" />
      </grp>
    </grp>
    <grp cls="CExtraSpecial" dataCls="grpClass.xml">
        <varref set="2" />
    </grp>
</grp>

Listing 11. IML file's serialized Contexts table

<data>
    <grp xlink="CX">
        <rec key="arg1" xlink="CSpecial">First Arg</rec>
        <rec key="arg2" xlink="CSpecial">Second Arg</rec>
    </grp>
    <rec key="arg1" xlink="CExtraSpecial">Another Arg</rec>
</data>

Listing 12. External XML data file processed by IncludeXml
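
The two binding styles that Compile supports can be tried in isolation. The following is a minimal sketch, not the component's actual code: BuildDataXPath is a hypothetical helper that mirrors the Split-on-# logic in Listing 9, and the XML from Listing 12 is loaded from an inline string rather than from grpClass.xml.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.XPath

Module DataClsBindingSketch

    ' Hypothetical helper (not part of WordXml.Net): turns a dataCls
    ' value and the grp element's cls value into the XPath expression
    ' that is run against the external data file.
    Function BuildDataXPath(ByVal dataCls As String, _
        ByVal cls As String) As String
        Dim parts() As String = dataCls.Split("#"c)
        If parts.Length = 2 Then
            ' "file#xpath": the text after # is already an XPath expression.
            Return parts(1)
        End If
        ' "file" only: look for nodes whose xlink attribute equals cls.
        Return "//*[@xlink='" & cls & "']"
    End Function

    Sub Main()
        ' Inline copy of the external data file shown in Listing 12.
        Dim xml As String = _
            "<data>" & _
            "<grp xlink=""CX"">" & _
            "<rec key=""arg1"" xlink=""CSpecial"">First Arg</rec>" & _
            "<rec key=""arg2"" xlink=""CSpecial"">Second Arg</rec>" & _
            "</grp>" & _
            "<rec key=""arg1"" xlink=""CExtraSpecial"">Another Arg</rec>" & _
            "</data>"
        Dim doc As New XmlDocument()
        doc.LoadXml(xml)
        Dim nav As XPathNavigator = doc.CreateNavigator()

        ' The two dataCls values used in Listing 11.
        Dim explicitPath As String = BuildDataXPath( _
            "grpClass.xml#data/grp[@xlink=""CX""]/rec", "CSpecial")
        Dim byClassName As String = BuildDataXPath( _
            "grpClass.xml", "CExtraSpecial")

        Console.WriteLine(explicitPath & " selects " & _
            nav.Select(explicitPath).Count.ToString() & " node(s)")
        Console.WriteLine(byClassName & " selects " & _
            nav.Select(byClassName).Count.ToString() & " node(s)")
    End Sub

End Module
```

With the data from Listing 12, the explicit XPath should pick up the two rec elements under the CX group, and the file-name-only form should pick up the single rec element whose xlink value is CExtraSpecial.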

The Visual Basic .NET EXE Client

In Socrates, the EXE is necessary for testers who generate IML from some source other than Word 2002. The most common sources are Microsoft SQL Server™ or legacy specs written in a different XML schema. The Visual Basic .NET EXE only compiles XIML and varmaps from IML. It does not interact with Word 2002 in any way (note the absence of any Word references in the WordXmlHost node in Figure 2). To get the circled reference to the class library, I clicked the Browse button on the References shortcut menu and navigated to the folder containing my new .NET component's DLL.

Figure 2. References for the EXE and the class library

The EXE can be called from the command line, or the IML file can be dropped on a shortcut to the EXE. The EXE instantiates the class library (see Listing 14 below), passes it three arguments, and displays the text message returned from the component:

Dim a() As String
Dim x As String
...
result = XimlCompiler.compileXimlFromExe(xsltPath, imlPath, _
    imlFileName)

a = Split(result, vbCr)
For Each x In a
    Console.WriteLine(x)
Next x

If promptUser Then
    Console.WriteLine()
    Console.WriteLine("Enter any key to finish")
    Console.ReadLine()
End If

Listing 13. Compiling XIML from the EXE

The rest of the EXE source code determines how many arguments were passed in and generates from that information the three arguments required by the component.
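
The argument handling itself is not listed in this article. As a rough sketch of one way it could work, the following code assumes (hypothetically) that the first command-line argument is the path to the IML file and that an optional second argument names the folder that holds the XSLT files. BuildArguments and WithTrailingSeparator are illustrative names, not members of the real EXE.

```vbnet
Imports System
Imports System.IO

Module ExeArgumentSketch

    ' Appends a directory separator unless the folder already ends with one.
    Function WithTrailingSeparator(ByVal folder As String) As String
        Dim sep As String = Path.DirectorySeparatorChar.ToString()
        If folder.EndsWith(sep) Then
            Return folder
        End If
        Return folder & sep
    End Function

    ' Hypothetical rule set (an assumption, not taken from the article):
    '   args(0) = path to the IML file (required)
    '   args(1) = folder holding the XSLT files (optional; defaults to
    '             the folder that contains the IML file)
    ' Returns xsltPath, imlPath, and imlFileName, in that order.
    Function BuildArguments(ByVal args() As String) As String()
        Dim imlFullPath As String = Path.GetFullPath(args(0))
        Dim imlPath As String = _
            WithTrailingSeparator(Path.GetDirectoryName(imlFullPath))
        Dim imlFileName As String = Path.GetFileName(imlFullPath)

        Dim xsltPath As String = imlPath
        If args.Length > 1 Then
            xsltPath = WithTrailingSeparator(Path.GetFullPath(args(1)))
        End If

        Return New String() {xsltPath, imlPath, imlFileName}
    End Function

    Sub Main(ByVal args() As String)
        If args.Length = 0 Then
            Console.WriteLine("Usage: pass the path to an IML file.")
            Return
        End If

        Dim values() As String = BuildArguments(args)
        Console.WriteLine("xsltPath:    " & values(0))
        Console.WriteLine("imlPath:     " & values(1))
        Console.WriteLine("imlFileName: " & values(2))
    End Sub

End Module
```

Both paths end with a separator because the component concatenates a path and a file name directly (for example, imlPath & dataInc(0) in Listing 9).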

Compiling the Visual Basic .NET Code

This section will highlight the dialog boxes in Visual Studio .NET that contain information used by COM and by the Visual Basic .NET EXE.

The two features of Figure 3 that are important are the Assembly name and Root namespace boxes.

Figure 3. Specifying the assembly and namespace

The Visual Basic .NET EXE imports the component with a statement that cites (most of) the component's root namespace in the Imports statement and uses the last level of the namespace plus the class name to instantiate the object:

Imports WordXml.Net
Dim XimlCompiler As New Authoring.XimlCompiler()

Listing 14. Instantiating the component in the EXE

The VBA client uses the full namespace string (the string is actually taken from the AssemblyInfo.vb file that Visual Studio .NET generates for the component, as shown in Listing 18) to add a reference to the component, and uses the assembly name plus class name to instantiate the XimlCompiler object (see Listing 15).

Figure 4. Adding a reference to the WordXml.Net compilers

Dim XimlCompiler As New WordXml_Net.XimlCompiler

Listing 15. Instantiating the XimlCompiler in VBA

Debugging the Visual Basic .NET Class Library

To debug a .NET component, you need to go through a .NET EXE. I use the same .NET EXE described above (used primarily to generate XIML and varmap files from IML files) to handle this chore. I have three debugging scenarios. The first is debugging the Serialize function called by Word. The second is debugging the XimlCompiler class (either from Word or from the .NET EXE itself). The third is debugging the source code in the .NET EXE alone.

Figure 5 shows how to set up the .NET EXE Debugging property sheet to run WINWORD.EXE and open a Word document. Set the Start Action area in the Configuration Properties Debugging page to Start external program and enter the file path to the host (in our case, this is Word 2002). In the Start Options area enter the path name to the Word document that you want to open. When you press F5, the .NET EXE will start Word (instead of itself) and open your document. When your VBA code in Word calls on the .NET component, you can set breakpoints in Visual Studio .NET to stop processing during a call from Word.

Figure 5. Setting up debugging from Word

In the second debugging scenario, I need to run the .NET EXE but debug the .NET component's functions. To switch scenarios, I select the Start Project area and change the command line to open an IML file. I can then set breakpoints in the XimlCompiler class and debug that code.

Figure 6. Debugging an XML file

To debug in the third scenario, I leave the Debugging properties as set in Figure 6, but I set breakpoints in the .NET EXE source code instead.

Enabling COM Interop

Clearly, it is crucial that the .NET component and Word 2002 interoperate, and for reasons of performance it is crucial that the predominant direction of this interoperability is .NET hosted by COM (not the other way round). That is, all component classes and all but one component function have at least a little interaction with Word (to update the Status Bar), but when the Word document is traversed, the .NET component is running in the address space of Word. Navigating the document from a separate .NET EXE would impose serious performance hits due to cross-process marshaling.

Enabling COM interop for a .NET component requires setting one switch in the component's property sheet and requires some additional statements in the component's source code, as we'll see next. It also requires a setting in the Deployment Project's Detected Dependencies section of the project's Property page (as I'll show you below).

The switch you need to set is in the Configuration Properties Build property sheet (see Figure 7). If the Register for COM Interop check box is selected, Visual Studio .NET will put a COM callable wrapper (CCW) around the component, enabling COM to interact with it as if the component were written with COM constructs. The check box also causes Visual Studio .NET to make any necessary Microsoft Windows® registry entries (by calling RegAsm.exe), and it exports the component's type library (by calling TlbExp.exe). If the project is rebuilt, this check box will delete any previous registry entries and type library files before recreating them with updated source code.

Figure 7. Enabling COM interop

The second step in enabling COM interop is to add the correct attribute to our component's AssemblyInfo.vb file. Here's the line we added to the file that is automatically generated by Visual Studio .NET when we first created the project:

<Assembly: ClassInterfaceAttribute(ClassInterfaceType.AutoDual)>

We could have added this attribute (without the Assembly: prefix) to both of the component's classes, but adding it in one place is easier. If, for some reason, we add a class that should remain invisible to COM, we'll need to move the ClassInterfaceAttribute out of the AssemblyInfo.vb file and into the component's class file, marking only those classes we want to expose.

The final step needed to enable COM to instantiate an object based on our .NET component is to add a blank constructor for each class (see Listing 2).
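
For comparison, here is a minimal sketch of the class-level alternative combined with the blank constructor. It is an illustration under stated assumptions rather than the component's source: the Describe member and the InternalHelper class are hypothetical, and the ComVisible(False) attribute is my addition, shown because it is the standard way to hide a class from COM outright. Main only reads the attributes back through reflection, so the sketch runs as a console program without registering anything.

```vbnet
Imports System
Imports System.Runtime.InteropServices

' Class-level form of the attribute: no Assembly: prefix.
<ClassInterface(ClassInterfaceType.AutoDual)> _
Public Class XimlCompiler
    ' COM creates the object through this parameterless constructor.
    Public Sub New()
    End Sub

    ' Hypothetical member, present only to give the class a surface.
    Public Function Describe() As String
        Return "exposed to COM through an AutoDual class interface"
    End Function
End Class

' Hypothetical class that should stay hidden from COM clients.
<ComVisible(False)> _
Public Class InternalHelper
    Public Sub New()
    End Sub
End Class

Module ClassInterfaceSketch
    Sub Main()
        ' Read both attributes back from the compiled metadata.
        Dim ci As ClassInterfaceAttribute = CType( _
            Attribute.GetCustomAttribute(GetType(XimlCompiler), _
            GetType(ClassInterfaceAttribute)), ClassInterfaceAttribute)
        Dim cv As ComVisibleAttribute = CType( _
            Attribute.GetCustomAttribute(GetType(InternalHelper), _
            GetType(ComVisibleAttribute)), ComVisibleAttribute)

        Console.WriteLine("XimlCompiler class interface: " & _
            ci.Value.ToString())
        Console.WriteLine("InternalHelper ComVisible: " & _
            cv.Value.ToString())
    End Sub
End Module
```

Marking classes one at a time costs an attribute per class, but it makes the COM surface of the component explicit in the class files themselves.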

The VBA Client

This section documents how the VBA client uses the WordXml.Net.dll. Word 2002 has a reference to the DLL (a COM callable wrapper (CCW) generated by Visual Studio .NET after it compiles the Visual Basic .NET source code). The next two sections describe how VBA code interoperates with the CCW. Note that the reference to the CCW is done the same way any traditional COM component is referenced in VBA (from the References menu command of the Visual Basic Editor Tools menu as in Figure 4).

Compile test areas to IML

The serializeTestAreas function and the compileIml procedure run from the WordXmlDotNet module in the WordXml.dot template. The compileSpec procedure calls the serializeTestAreas function when the user selects the Compile Test Areas to IML option from the WordXml.dot Tools menu.

Function serializeTestAreas()
    Dim XmlProvider As New WordXml_Net.XmlProvider
    Dim result As Boolean
    Dim datestart As Date

    datestart = Now()
    Application.ScreenUpdating = False
    On Error GoTo handler
    result = XmlProvider.serializeTestAreas(rngTestAreas)
    On Error GoTo 0
    Application.ScreenUpdating = True

    Application.StatusBar = "Serialized " & _
        rngTestAreas.Paragraphs.Count & " nodes in " & _
        DateDiff("s", datestart, Now()) & " seconds."
    serializeTestAreas = True

Exit Function

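handler:
    ' Restore screen updating, which is still off when an error occurs.
    Application.ScreenUpdating = True
    MsgBox ("Error serializing Test Areas:" & vbCr & _
        Err.Description)
    serializeTestAreas = False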

End Function

Listing 16. The VBA serializeTestAreas function

Compile IML to varmaps

After compiling a spec to IML, Socrates compiles the IML into XIML and varmap files. The test executable uses the varmap files directly at run time. In fact, if the varmap is unavailable or invalid, the test executable won't even compile.

Sub compileIml()
    Dim XimlCompiler As New WordXml_Net.XimlCompiler
    Dim xsltPath As String
    Dim imlPath As String
    Dim imlFileName As String

    xsltPath = ActiveDocument.AttachedTemplate.Path & "\"
    imlPath = ActiveDocument.Path & "\"
    imlFileName = Replace(ActiveDocument.name, ".doc", ".xml")

    MsgBox XimlCompiler.compileXiml(Application, xsltPath, imlPath, _
      imlFileName)

End Sub

Listing 17. The compileIml procedure

Again, note the use of the .NET assembly's name in the ProgID for the XimlCompiler. Also, a reference to the currently running Application object is sent to the .NET component so that the compileXiml function can update the Word Status Bar with progress and elapsed time information.

Deploying WordXml.Net

In this final section you'll see how to use Visual Studio .NET to build an MSI file for deployment.

There are a few details to remember and a special case to avoid, but other than that, creating MSI files in Visual Studio .NET is a snap. I'll elaborate on that special case in the appendix to this article. The process of coming to terms with the issue raised by an MSI file containing a DLL used by both a Word template and a .NET EXE was the most vexing part of my adventure in .NET land, and I really want to keep you from having to work as hard as I did to get the deployment project to work flawlessly.

So to begin, with my solution node selected, I chose the New Project option on the Add menu. From the listed types, I selected Setup and Deployment Projects, completing the dialog box with a project name and a click of the OK button.

The next few steps are important to get right. There are three kinds of things I needed to add to my deployment project: the .NET EXE (and symbols), the Word template, and all of the related XML files.

To add the .NET EXE and symbols, I right-clicked the setup project node in my Solution Explorer window, pointed to Add, and clicked the Project Output menu command. I then selected the WordXmlHost option from the Projects list and the Primary output and Debug Symbols items from the listed groups of files (see Figure 8).

Figure 8. Add the .NET EXE and symbols

To add the symbols for the .NET component, I repeated the steps for the .NET EXE except that I left out the Primary Output group.

At this point I need to warn you about something I did during my first attempts to build a deployment project for my Socrates application: I included the .NET component's Primary Output group. As a result, I had two entries under the Detected Dependencies node (see the circled entry for SqrtsDotNetAuthoring.dll in Figure 9).

In those early days when the setup project listed two instances of the SqrtsDotNetAuthoring.dll in Detected Dependencies, we often found that the VBA code couldn't instantiate the .NET DLL. Once we removed the explicit reference to the .NET DLL in the deployment project (leaving only the entry that got there because SqrtsDotNetAuthoring.dll is already a dependency of the sqrts.exe), this error went away (but it took many hours of troubleshooting before we discovered this double reference as the problem with the object activation in VBA). Don't make my mistake: don't add your own .NET component to the setup project; let your own .NET EXE do that for you. You'll thank me later.

Returning to the steps I took to create the WordXml.Net setup project, I added the Word template, the sample test specification document, and the XSL and XML files to the setup project by pointing to Add and clicking File on the shortcut menu for the setup project node in the Solution Explorer window. I then pressed CTRL and clicked all of the files (except the .NET files) that I needed in the MSI file. (When building the Socrates setup project I included other files, such as an XML configuration file and several XML Schema Definition files, but those files are for editing test specs and were not needed for the simplified WordXml.Net application.)

Finally, I was ready to take the last configuration steps:

  • Exclude some detected dependencies.
  • Configure the .NET component to be registered in the Windows registry during setup.
  • Add the WordXml.Net authoring option to the Windows Programs menu.

Note each of the Detected Dependencies that has a little strikeout symbol in the lower left corner of the icon (for example, MSWord.OLB in Figure 9). The dotnetfxredist_x86_enu.msm is excluded by default, and to exclude the other three dependencies, I pressed CTRL and clicked each icon, and with all three dependencies selected, I right-clicked one of the selected items and clicked Exclude on the shortcut menu.

Figure 9. The setup files

I'm almost ready to ship. All I need to do now is set the Register property on the WordXml.Net.dll node in Detected Dependencies. I selected the node and clicked the Properties tab (in my Visual Studio .NET configuration, this tab is one tab to the right of the Solution Explorer tab) to see Figure 10. With the Register property set to vsdraCOM, I had only one more task: provide my users with a menu option off of their Programs menu.

Figure 10. Enabling COM registration during installation

The last task is creating a shortcut to the Word template and making that shortcut available from the Windows Programs menu. This task requires three steps in the following order:

  1. Create a menu off the Windows Programs menu.
  2. Create a shortcut for the Word template.
  3. Move the shortcut to the new menu.

You need to take all three steps from the setup project's File System view. To open the File System view, right-click the setup project node in the Solution Explorer window, point to View, and click File System on the shortcut menu. To create a Programs menu option, select the User's Programs Menu option, right-click, point to Add, and click Folder. Give the folder a name, as I did in Figure 11.

Figure 11. Create Programs menu option

Next, you need to create a shortcut to your Word template. I did that by selecting the Application Folder option on the left of the screen and then the wordXml.dot file in the file list that appears on the right side (see Figure 12). Right-click wordXml.dot and click Create shortcut to put a shortcut at the end of the listed files.

Figure 12. Create a shortcut to the Word template

To get the new shortcut to the Programs menu I had but to drag the shortcut I just created to the WordXml.Net Authoring node of the User's Programs Menu option (see Figure 13). Note the hyphen in the shortcut name. I tried using a colon, but the compiler complained, so I reverted to my second choice, the hyphen.

Figure 13. Add the shortcut to the Programs menu

Now when users select WordXml.Net Authoring on their Programs menu, they will see the menu command to launch the Word template. When they click the menu command, Word will start and create a new test spec.

After I added the Setup project to the WordXml solution, Visual Studio .NET skipped the Setup project during the build and reported, "Project configuration skipped because it is not selected in this solution configuration." I hadn't seen this before, and couldn't find anything in the Visual Studio .NET Help system to suggest what I had done wrong. So I selected the Solution node in the Solution Explorer and opened the node's property sheet (see Figure 14). In the Configuration Properties panel I noticed that the check box under the Build column was cleared for the Setup project. Once I selected that check box, I could completely rebuild the solution.

Figure 14. Solution property sheet

Also, I remembered one other detail I haven't mentioned here: in the property sheet for the Setup project I set the RemovePreviousVersions property to True. Should I ever ship a later version, the setup program will first uninstall the earlier WordXml installation, assuming I increment the Version property for the later version. Changing the Version property prompts the Setup project to generate a new ProductCode, which I will confirm when prompted; the UpgradeCode stays the same so that the installer can recognize the earlier version.

Related Literature

This article wouldn't be possible without the help of some very generous Microsoft authors and software development engineers. My thanks to Siew-Moi Khor, Misha Shneerson, Ralf Westphal, Paul Cornell, David Guyer, and Kenny Jones. I'm sure my poor scholarship has missed others, but the following articles are worth the time to read.

Paul Cornell has an excellent survey article, Introducing .NET to Office Developers, that lays out all the various .NET based technologies that you can use with Microsoft Office. Paul also writes on a topic related to mine. The difference is that Paul's article, Creating Office Managed COM Add-Ins with Visual Studio .NET, shows you how to write add-ins for Word by using .NET and this article is more primitive since it shows you how to use VBA to access the power of the .NET Framework.

Siew-Moi Khor and Misha Shneerson teamed up to write a trilogy on using managed code in unmanaged hosts such as VBA. These articles, like Paul's, are for the more advanced work required to use COM add-ins in Word.

I'll be referring back to all these articles in the future when I rearchitect Socrates' VBA code as a COM add-in that uses smart tags.

Ralf Westphal has written a great article that describes a more advanced approach to using the XmlTextReader class than I used in Socrates. In Implementing XmlReader Classes for Non-XML Data Structures and Formats, Ralf documents a generic XmlTextReader object derived from the System.Xml abstract base class, XmlReader. Though Ralf does not have examples of an XmlWordReader class, I plan to use as many of Ralf's ideas as I can when I rewrite the SerializeTestAreas function in my XmlProvider class. One point will differentiate my approach from Ralf's: Ralf uses XSD to inform the design of his custom XmlReader object but does not use the XSD at run time. My experiments do include the XSD at run time to make a validating XmlWordReader object. That, however, is another story.

For details pertaining to Setup and Deployment Projects, Kenny Jones is the man. His collection of papers is the best source for all the facts pertaining to MSI files and Visual Studio .NET. I just scratched the MSI surface in my article. See Kenny for the rest of the story.

And finally, my thanks go to David Guyer. His patience and dedication to tracking down the unknown was inspirational. David got me out of a lot of jams that I got myself into because I was experimenting as I was learning. With David's help, I hope this article demystifies the MSI process, making it simple and straightforward to implement for other developers who want to turbo-charge their legacy VBA code with .NET components.

Conclusion

This document has described the constellation of programs that start with Word 2002 and end with XML files that can be consumed by software test automation. The programs are organized to optimize performance and ease of use. A Visual Basic .NET component uses managed XML to exploit the streaming XML techniques of the System.Xml.XmlTextReader and System.Xml.XmlTextWriter classes to do the heavy processing of large Word documents. A few switches are set in Visual Studio .NET before compiling the component, and a few lines of code are added to the source files with the result being a .NET component that can be used inside of Word 2002.

I thought it might be helpful if I closed this article with a table summarizing the decision rules that I used to best match available XML processing technology with my programming needs.

Need XML cache?   State-based traversal?   I used. . .                     In. . .
No                No                       XmlTextWriter, XmlNodeReader    XmlProvider.Serialize, XimlCompiler.Compile
Yes               Yes                      XPathNavigator and XSLT         XimlCompiler.CompileXiml

Appendix

AssemblyInfo.vb

Two of the assembly attributes in Listing 18 deserve attention:

  • AssemblyDescription supplies the text used by VBA to add a reference to this assembly (see Figure 4).
  • ClassInterfaceAttribute enables COM to see the component's interface.

The Imports System.Runtime.InteropServices statement is necessary for the unqualified reference to the ClassInterfaceAttribute attribute.

Imports System.Reflection
Imports System.Runtime.InteropServices
' General Information about an assembly is controlled through the
' following set of attributes. Change these attribute values to
' modify the information associated with an assembly.

' Review the values of the assembly attributes

<Assembly: AssemblyTitle("WordXml.Net")> 
' following attribute is friendly name when adding assembly to 
' COM references
<Assembly: AssemblyDescription("WordXml.Net.Authoring")>
<Assembly: AssemblyCompany("Microsoft Corporation")>
<Assembly: AssemblyProduct("WordXml.Net Authoring Template")>
<Assembly: AssemblyCopyright("2002")>
<Assembly: AssemblyTrademark("Microsoft Corporation")>
<Assembly: CLSCompliant(True)>
<Assembly: ClassInterfaceAttribute(ClassInterfaceType.AutoDual)>

' The following GUID is for the ID of the typelib if this project
'is exposed to COM
<Assembly: Guid("88A80136-9318-4798-B0A4-5FE3121A0D96")>

' Version information for an assembly consists of the following four
' values:
'
'      Major Version
'      Minor Version
'      Build Number
'      Revision
'
' You can specify all the values or you can default the Build and
' Revision Numbers by using the '*' as shown below:

<Assembly: AssemblyVersion("1.0.*")>

Listing 18. AssemblyInfo.vb

Precautions

Care should be taken when deciding on assembly names because they may conflict with VBA module or class names.

Depending on the sequence of events, you may see the following error when compiling the Visual Studio .NET solution:

The file 'SqrtsDotNetAuthoring.dll' cannot be copied to the run
  directory.
The process cannot access the file because it is being used by another
  process.

This is because at least one instance of WINWORD.EXE has a reference to the component. To ensure all WINWORD.EXE processes are closed, open Task Manager and sort by Image Name. Be sure that you've closed Word, and then select each remaining WINWORD.EXE image and click the End Process button.

Early in the development cycle (before I realized I shouldn't add an explicit Project Output reference to my .NET component), the following dialog box periodically appeared after the sqrts.dll was rebuilt.

Figure 15. The Automation error dialog box

This anomaly is related to the one I documented above when VBA complains that it cannot create an object based on the .NET classes. I have also seen these errors when I move my Visual Studio .NET projects around (for example, from my development computer at work to my laptop computer) then immediately recompile and use a spec attached to another instance of the SqrtsDotNetAuthoring.dll.

The solution that generally works (and always works in the scenario that I just outlined about using an old Word document with a new instance of the .NET DLL) is to:

  1. Open the VBA editor for the Word document having trouble.
  2. Select (in my case) the Sqrts template.
  3. Uncheck (in my case) the Smx.Test.Infra.Sqrts.Net.Authoring reference (see Figure 4).
  4. Click OK.
  5. Repeat the process, this time reselecting the .NET component.

This process rebinds the correct DLL to the spec and is necessary when the sqrts.dot and any documents based on the sqrts.dot template move around.

So be a little careful when you move development code, and you should have very little trouble keeping your own .NET-enabled VBA template working. I should note here that none of these caveats apply to your users, only to you as a developer. Your users get the added assurance that the Windows Installer will install all files and register everything properly so that your users' first impressions will be as good as they get.

Sqrts.Net Functional Specification

This section describes what the system, code named "Socrates" (also known as "Sqrts"), does. The section following, "Program Specification," describes how Socrates does it.

Intended audience

Socrates is designed for software testers and their managers. Socrates helps testers write test specifications in a highly structured, yet very flexible way. The key to success for Socrates is that it leverages the processing power of XML. That is, once the Word document content is serialized to XML, the test specification becomes, essentially, executable. More precisely, the test specification becomes the foundation for, and provides runtime data to, data-driven test executables. Since the XML data includes class names of implemented test code, the test spec is not a dead document but a living document that is always synchronized with test code. In other words, if the test spec and test executable aren't in synch, the test executable won't run.

Because the test spec is not only XML but is also tightly bound to test executables, when a test fails, an XML-based failure manager can display to the test run investigator the relevant part of the actual test spec describing what the failed test was supposed to do. This shortens the time required to debug failures and enables testers who were not the authors of the test to execute the tests, investigate failures, and enter bugs against product code.

Managers benefit because reports of designed, implemented, and executed tests are free. That is, testers can focus on designing really cool tests instead of spending cycles updating a Microsoft Excel spreadsheet detailing how many positive, negative, security, or globalization tests have been written; or what percentage of all tests are basic verification, functional, integration, stress, or acceptance tests. Reports are merely an XSL transformation of the same XML that drives the tests themselves.

Microsoft Word, in effect, then becomes an XML editor.

Design goals

  • Enable testers to write well-designed and highly structured test specifications in the shortest possible time.
  • Ensure all test specifications are written and displayed in the same format, shortening spec review time.
  • Enable testers to write data-driven tests to increase test coverage without increasing the time to write and maintain test executables. In other words, test executables are more efficient because run-time data is not statically compiled into the executable.
  • Enable testers to use test automation frameworks, which reduces the time needed to write tests and increases the quality and consistency of all tests written by the team's testers.
  • Automate the manual tests by displaying the manual steps in a Web Form and storing the results of the manual tests in XML files or database tables for easy reporting and analysis.

Features

  • A user interface that consistently uses Word objects to capture test data in a way that's easy to serialize to XML.
  • Two XML compilers: one that serializes Word content to XML and a second XML compiler that processes the first XML into XML formats suitable for test code automation and Web Form rendering (for automated and manual tests, respectively).
  • HTML version of a spec (virtually identical in appearance to original Word document) that transforms the spec's XML into HTML.
  • Browser-based, Windows Form-based and XML Web Service-based form factors for rendering manual tests and recording results of executing those manual tests.
  • Integration with test failure management software so that investigators can see test specification data for each failed (automated or manual) test variation.

Workflow

Testers begin a test engagement by describing their tests in terms of Sets, Levels, and Vars. The test specification also provides a table to specify which Sets belong to which test executable and a table to specify which Sets use which classes to execute common setup and cleanup operations. Testers specify run-time data either by listing the arguments that a test variation inputs along with the possible values of each argument, or by specifying each argument's value for each variation. In the former case, Socrates generates one test case for each combination of argument values, so the number of test cases equals the product of the numbers of possible values of the arguments (the cross product of the value lists). In the latter case, the tester specifies each variation's input data explicitly.
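
The cross-product expansion can be illustrated with a short sketch. This is not the Socrates compiler's code; CrossProduct is a hypothetical function, and the two arguments and their values are invented purely for illustration.

```vbnet
Imports System
Imports System.Collections

Module VariationSketch

    ' Hypothetical illustration: argValues(i) holds the possible values
    ' of argument i. Returns an ArrayList of String arrays, one array
    ' per combination of argument values.
    Function CrossProduct(ByVal argValues()() As String) As ArrayList
        Dim results As New ArrayList()
        ' Start with a single empty combination and extend it one
        ' argument at a time.
        results.Add(New String() {})

        Dim i As Integer
        For i = 0 To argValues.Length - 1
            Dim extended As New ArrayList()
            Dim prefix() As String
            For Each prefix In results
                Dim value As String
                For Each value In argValues(i)
                    ' Copy the existing prefix and append the new value.
                    Dim combo() As String = New String(prefix.Length) {}
                    Array.Copy(prefix, combo, prefix.Length)
                    combo(prefix.Length) = value
                    extended.Add(combo)
                Next value
            Next prefix
            results = extended
        Next i

        Return results
    End Function

    Sub Main()
        ' Invented sample data: two arguments with 2 and 3 possible values.
        Dim argValues()() As String = New String()() { _
            New String() {"admin", "guest"}, _
            New String() {"en", "de", "ja"}}

        Dim variations As ArrayList = CrossProduct(argValues)
        Console.WriteLine(variations.Count.ToString() & " variations")

        Dim combo() As String
        For Each combo In variations
            Console.WriteLine(String.Join(", ", combo))
        Next combo
    End Sub

End Module
```

Two arguments with two and three possible values expand to 2 x 3 = 6 variations, which is why a short list of arguments in the spec can produce far more actual tests than specified tests.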

Once the test specification is designed and signed off by test management, the tester uses menu options from the Socrates user interface to generate an HTML copy of the Word document and an XML file representing all of the data entered in the spec. The XML vocabulary is called the intermediate markup language (IML), and it serves the same purpose for the test executable that Microsoft Intermediate Language (MSIL) serves for the Microsoft Common Language Runtime (CLR).

Once the tester is done editing the IML, it's time to generate full test case data. The result can be a test with far more actual tests than specified tests. Each actual test includes the runtime data specified in the design phase. The expanded IML file is called XIML. Socrates automatically processes the XIML one more time into individual XML files called varmaps.

Once the tester has written the test implementation code, they edit the spec to add the names of the classes that implement each test variation. In addition, the tester can edit the spec in ways that ultimately add or modify attributes on the XML nodes; these attributes activate and deactivate tests (based on the status of bugs previously found in the software under test). Finally, testers can assign different owners to different test executables.

At this point it's time to execute the tests by running the varmap files (generated by the Socrates XML compilers) through the test automation framework. Each variation specifies which class to run and what data to use. In this way, only the specified classes run, and this is what keeps the test specification and the test executable in sync.
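To make the idea of a varmap-driven run concrete, here is a minimal sketch in Python (not the article's Visual Basic .NET, and not the actual test automation framework). The varmap layout, the `class` and `active` attributes, the `arg` elements, and the two test classes are all assumptions made for illustration; the point is only that each variation names the class to run and supplies its data.

```python
import xml.etree.ElementTree as ET

# Hypothetical varmap; element and attribute names are assumptions.
VARMAP = """<varmap>
  <var id="1" class="AddTest" active="true">
    <arg name="a">2</arg><arg name="b">3</arg><arg name="expected">5</arg>
  </var>
  <var id="2" class="UpperTest" active="true">
    <arg name="text">xml</arg><arg name="expected">XML</arg>
  </var>
  <var id="3" class="AddTest" active="false">
    <arg name="a">1</arg><arg name="b">1</arg><arg name="expected">3</arg>
  </var>
</varmap>"""

class AddTest:
    def run(self, a, b, expected):
        return int(a) + int(b) == int(expected)

class UpperTest:
    def run(self, text, expected):
        return text.upper() == expected

# Only classes registered here can be named by a variation.
REGISTRY = {cls.__name__: cls for cls in (AddTest, UpperTest)}

def run_varmap(xml_text):
    """Run each active variation with the class and data it specifies."""
    results = {}
    for var in ET.fromstring(xml_text).findall("var"):
        if var.get("active", "true") != "true":
            continue  # deactivated in the spec, so it is skipped
        test_class = REGISTRY[var.get("class")]
        data = {arg.get("name"): arg.text for arg in var.findall("arg")}
        results[var.get("id")] = test_class().run(**data)
    return results

results = run_varmap(VARMAP)
print(results)
```

Because the runner looks up classes by the names found in the varmap, changing the spec (and regenerating the varmap) changes what runs without touching the runner itself.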

Socrates renders manual tests in Web Forms with radio buttons for pass or fail on each variation. Steps detailed in the test spec appear on the Web Form so that the tester can run the manual test. Each click of a Web Form radio button updates the manual test's varmap file. When implemented as an XML Web service, these varmaps can serve multiple users so that more than one tester can run the manual tests. Every time a tester updates the varmap, the Web Form is re-rendered from the varmap so that every tester sees the changes made by any other tester.
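The update-then-re-render cycle can be sketched as follows. This is a Python illustration under stated assumptions, not the Web Form code: the varmap layout, the `result` attribute, the `step` element, and the helper names `record_result` and `render_rows` are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical manual-test varmap; names are assumptions for illustration.
MANUAL_VARMAP = """<varmap type="manual">
  <var id="1"><step>Open the spec in Word</step></var>
  <var id="2"><step>Choose the serialize menu option</step></var>
</varmap>"""

def record_result(path, var_id, result):
    """Store a pass/fail result on one variation and save the varmap."""
    tree = ET.parse(path)
    for var in tree.getroot().findall("var"):
        if var.get("id") == var_id:
            var.set("result", result)
            tree.write(path, encoding="utf-8", xml_declaration=True)
            return True
    return False  # no such variation

def render_rows(path):
    """Re-read the varmap and return (id, step, result) rows for display."""
    root = ET.parse(path).getroot()
    return [(v.get("id"), v.findtext("step"), v.get("result", "not run"))
            for v in root.findall("var")]

path = os.path.join(tempfile.mkdtemp(), "manual.varmap.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write(MANUAL_VARMAP)

updated = record_result(path, "2", "pass")
rows = render_rows(path)
print(rows)
```

Since every render reads the saved file, any tester who refreshes the form sees the results recorded by the others; a real multi-user service would also need to guard against concurrent writes, which this sketch does not attempt.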

Program Specification

This section describes the specific managed XML and XSL classes used to implement the executable test specification system. These XML classes are called from VBA in Word, from the command line with a .NET EXE and from a Web Form.

Architecture

The main goal of the Sqrts.NET system is to serialize binary content from Microsoft Word 2002 documents into an XML format suitable for use by test code execution engines such as the Managed Code Framework (see step 3 in Figure 16).

Figure 16. Sqrts.NET compilers overview

As you can see, the key player in this architecture is SqrtsDotNetAuthoring.dll. That component is called by both Word 2002 (specifically, by the sqrts.dot template) and (optionally) by a .NET EXE (sqrts.net.varmap.compiler.exe). SqrtsDotNetAuthoring.dll first converts the Word binary content to IML (step 1 in Figure 16). IML is an XML schema. The IML may contain XML nodes that represent multiple test cases for a given test variation, so the .NET component must then process the IML, expanding all tacit test cases into explicit vars (step 2 in Figure 16).

This expanded IML (XIML) may not yet be suitable for consumption by test code frameworks; for example, the XIML file may contain multiple varmap nodes (one or more for automated tests and one or more for manual tests). Therefore, one final step is necessary. In this final step, the .NET component uses managed XML to save each automated test varmap node to a separate file, leaving all manual test varmap nodes in the XIML file (step 3 in Figure 16).
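The final splitting step can be pictured with a small sketch. It uses Python's standard XML library rather than the managed XML classes the component actually uses, and the XIML layout shown here (a `ximl` root with `varmap` children carrying `type` and `name` attributes) is an assumption, as is the output file naming.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical XIML; element and attribute names are assumptions.
XIML = """<ximl>
  <varmap type="automated" name="parser"><var id="1"/></varmap>
  <varmap type="manual" name="ui"><var id="2"/></varmap>
  <varmap type="automated" name="writer"><var id="3"/></varmap>
</ximl>"""

def split_varmaps(root, out_dir):
    """Save each automated varmap to its own file and remove it from root."""
    written = []
    for varmap in list(root.findall("varmap")):  # copy: root is modified below
        if varmap.get("type") == "automated":
            path = os.path.join(out_dir, varmap.get("name") + ".varmap.xml")
            ET.ElementTree(varmap).write(path, encoding="utf-8",
                                         xml_declaration=True)
            root.remove(varmap)
            written.append(path)
    return written

root = ET.fromstring(XIML)
out_dir = tempfile.mkdtemp()
files = split_varmaps(root, out_dir)
remaining = [v.get("name") for v in root.findall("varmap")]
print([os.path.basename(f) for f in files], remaining)
```

After the call, the two automated varmaps exist as separate files and only the manual varmap remains in the in-memory XIML tree, mirroring the division of labor described for step 3.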

To summarize, Word content passes through an XML pipeline and is changed along the way by a series of XSLT and managed XML steps. The process produces three XML files, each with a different purpose. The first XML file is a direct serialization of the Word content; the second XML file expands some of the first XML into much more test data than was entered in Word; and the third file is in a format suitable for execution by software test automation. Our goal was to use a loosely coupled technology to tightly bind our test specs to our test executables. Our motto is "change the spec, change the code."