How to: Use Annotations to Minimize Serialization and Deserialization by Using the Open XML API


The Office Open XML Package specification defines a set of XML files that contain the content and define the relationships for all of the document parts stored in a single package. These packages combine the parts that comprise the document files for Microsoft® Office Excel® 2007, Microsoft Office PowerPoint® 2007, and Microsoft Office Word 2007. The Open XML Application Programming Interface (API) allows you to create packages and manipulate the files that comprise the packages. This topic walks through the code and steps to use annotations to minimize serialization and deserialization while associating data with a document part (file) of an Office Open XML package in Office Word 2007, although the steps are the same for each of the three 2007 Microsoft Office system programs that support the Office Open XML Format.

Note

The code samples in this topic are in Microsoft Visual Basic .NET and Microsoft Visual C#. You can use them in an add-in created in Microsoft Visual Studio 2008. For more information about how to create an add-in in Visual Studio 2008, see Getting Started with the Open XML Format SDK 1.0.

Using Annotations to Minimize Serialization and Deserialization

When you develop Open XML Format solutions, you often want to associate arbitrary data with a particular document part. After opening an Open XML document, you may want to read a document part into a System.Xml.Linq.XDocument object, query the XDocument object using LINQ to XML, perhaps modify the XDocument, and then serialize the XDocument back into the package.

Reading the XML from the document part, parsing it, modifying it, and serializing it back into the package every time you access the XML performs poorly. It is more efficient to read the XML from the document part only once, use it as needed, and then serialize it back into the part. If, after reading the XML, you add the XDocument instance as an annotation on the document part, you can retrieve the annotation instead of rereading the XML each time you need it. Annotations allow you to associate any object with an OpenXmlPartContainer object (the base class of OpenXmlPart) in a type-safe way.

Note

The same approach applies if you are using a System.Xml.XmlDocument object.
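To see what "type-safe" means for annotations, the following minimal sketch uses the annotation methods on LINQ to XML's XObject (AddAnnotation and Annotation<T>), which have the same shape as the ones this topic uses on OpenXmlPartContainer. It is an analogy that runs without an Open XML package, not the SDK's own code. The SourceInfo and ReviewState types are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Xml.Linq;

public static class AnnotationDemo {
    // Hypothetical payload types, introduced only for this illustration.
    public sealed class SourceInfo { public string Path; }
    public sealed class ReviewState { public bool Approved; }

    // Attaches two unrelated objects to one element and reads them back by type.
    public static string Describe() {
        XElement element = new XElement("part");
        element.AddAnnotation(new SourceInfo { Path = "word/document.xml" });
        element.AddAnnotation(new ReviewState { Approved = true });

        // Annotation<T>() returns the first annotation of type T, or null
        // if no annotation of that type was added. No casts are needed.
        SourceInfo info = element.Annotation<SourceInfo>();
        ReviewState state = element.Annotation<ReviewState>();
        bool hasVersion = element.Annotation<Version>() != null;

        return info.Path + "|" + state.Approved + "|" + hasVersion;
    }

    public static void Main() {
        Console.WriteLine(Describe());   // word/document.xml|True|False
    }
}
```

Because the lookup is keyed by type, unrelated pieces of code can attach their own data to the same object without agreeing on string keys or casting the result.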

Using Microsoft Visual C# 3.0 and Microsoft Visual Basic .NET 9.0, you can write an extension method that easily retrieves an XDocument from the document part. The extension method first checks for the existence of an annotation of type XDocument. If it exists, it is returned. If it does not exist, then the method populates the XDocument from the document part, and adds it as an annotation to the document part. The following code shows an extension method.

Note

The following code samples are edited to facilitate online viewing. Change the indentation and remove extra lines to work with this code.

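' Visual Basic
Imports System.IO
Imports System.Runtime.CompilerServices
Imports System.Xml
Imports System.Xml.Linq
Imports Microsoft.Office.DocumentFormat.OpenXml.Packaging

Module LocalExtensions
    ' Returns the part's XML as an XDocument, loading it at most once.
    <Extension()> _
    Function GetXDocument(ByVal part As OpenXmlPart) As XDocument
        ' Reuse the XDocument if it is already annotated on the part.
        Dim xdoc As XDocument = part.Annotation(Of XDocument)()
        If xdoc IsNot Nothing Then
            Return xdoc
        End If

        ' Otherwise load the XML from the part and store it as an annotation.
        Using reader As New StreamReader(part.GetStream())
            xdoc = XDocument.Load(XmlReader.Create(reader))
            part.AddAnnotation(xdoc)
        End Using

        Return xdoc
    End Function
End Module

// C#
using System.IO;
using System.Xml;
using System.Xml.Linq;
using Microsoft.Office.DocumentFormat.OpenXml.Packaging;

public static class LocalExtensions {
    // Returns the part's XML as an XDocument, loading it at most once.
    public static XDocument GetXDocument(this OpenXmlPart part) {
        // Reuse the XDocument if it is already annotated on the part.
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
            return xdoc;

        // Otherwise load the XML from the part and store it as an annotation.
        using (StreamReader streamReader = new StreamReader(part.GetStream()))
            xdoc = XDocument.Load(XmlReader.Create(streamReader));
        part.AddAnnotation(xdoc);
        return xdoc;
    }
}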

The following example shows the simplest use of this extension method.

' Visual Basic
Module Module1

    ' Get the XDocument using the GetXDocument function.
    ' This call returns quickly because the XDocument is already stored as
    ' an annotation on the part.
    Sub ModifyDocument(ByVal mainDocumentPart As OpenXmlPart)
        Dim xdoc As XDocument = mainDocumentPart.GetXDocument()
        Console.WriteLine("Count of nodes:{0}", _
                           xdoc.DescendantNodes().Count())
    End Sub

    Sub Main()
        Using wordDoc As WordprocessingDocument = _
                    WordprocessingDocument.Open("C:\Test.docx", True)
            Dim xdoc As XDocument = _
                    wordDoc.MainDocumentPart.GetXDocument()

            ' Query the document, and modify it as necessary.
            Console.WriteLine("Count of nodes:{0}", _
                             xdoc.DescendantNodes().Count())

            ' Call another function, passing the MainDocumentPart part.
            ModifyDocument(wordDoc.MainDocumentPart)

            ' Serialize the XDocument back to the package.
            Using xw As XmlWriter = XmlWriter.Create( _
                    wordDoc.MainDocumentPart.GetStream( _
                        FileMode.Create, FileAccess.Write))
                xdoc.Save(xw)
            End Using
        End Using
    End Sub

End Module

// C#
class Program {

    // Get the XDocument using the GetXDocument function.
    // This call returns quickly because the XDocument is already stored as
    // an annotation on the part.
    static void ModifyDocument(OpenXmlPart mainDocumentPart) {
        XDocument xdoc = mainDocumentPart.GetXDocument();
        Console.WriteLine("Count of nodes:{0}", 
                           xdoc.DescendantNodes().Count());
    }

    // Simple use of extension method.
    static void Main(string[] args) {
        using (WordprocessingDocument wordDoc = 
                WordprocessingDocument.Open(@"C:\Test.docx", true)) {
            XDocument xdoc = wordDoc.MainDocumentPart.GetXDocument();

            // Query the document, and modify it as necessary.
            Console.WriteLine("Count of nodes:{0}",
                               xdoc.DescendantNodes().Count());

            // Call another function, passing the MainDocumentPart part.
            ModifyDocument(wordDoc.MainDocumentPart);

            // Serialize the XDocument object back to the package.
            using (XmlWriter xw =
                XmlWriter.Create(wordDoc.MainDocumentPart.GetStream
                (FileMode.Create, FileAccess.Write))) {
                xdoc.Save(xw);
            }
        }
    }
}
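The earlier note says the same approach applies to System.Xml.XmlDocument. A minimal sketch of that variant might look like the following. The GetXmlDocument name is hypothetical and simply mirrors GetXDocument above. The sketch assumes the same Open XML Format SDK 1.0 namespace used in this topic's samples and is not part of the SDK.

```csharp
using System.IO;
using System.Xml;
using Microsoft.Office.DocumentFormat.OpenXml.Packaging;

public static class XmlDocumentExtensions {
    // Hypothetical helper: returns the part's XML as an XmlDocument,
    // loading it from the part at most once.
    public static XmlDocument GetXmlDocument(this OpenXmlPart part) {
        // Reuse the XmlDocument if it is already annotated on the part.
        XmlDocument doc = part.Annotation<XmlDocument>();
        if (doc != null)
            return doc;

        // Otherwise load the XML from the part's stream.
        doc = new XmlDocument();
        using (Stream stream = part.GetStream())
            doc.Load(stream);

        // Store the loaded document on the part for later calls.
        part.AddAnnotation(doc);
        return doc;
    }
}
```

Because annotations are looked up by type, an XmlDocument annotation and an XDocument annotation can coexist on the same part. The two copies are independent, though, so in practice a solution would pick one representation per part.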

The following procedure walks through a more sophisticated approach.

To use an extension method

  1. Add an event handler to the XDocument object that watches for any changes to the tree.

  2. If the event handler is called, then remove the event handler, and add a semaphore annotation to the XDocument that indicates that the XDocument was changed.

  3. When finished with the Open XML document, before serializing back into the document part, check for the existence of the semaphore annotation. Only serialize back into the package if the semaphore annotation exists.
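The three steps can be tried in isolation on an in-memory XDocument, without an Open XML package. The following is a minimal sketch under that assumption. The ChangeTracking, Track, IsDirty, and ChangedMarker names are hypothetical, and the marker class plays the role of the semaphore annotation.

```csharp
using System;
using System.Xml.Linq;

public static class ChangeTracking {
    // Marker type: its presence as an annotation means "the tree was modified".
    private sealed class ChangedMarker { }

    // Step 1: subscribe a handler that watches the whole tree under doc.
    public static void Track(XDocument doc) {
        EventHandler<XObjectChangeEventArgs> handler = null;
        handler = (sender, e) => {
            // Step 2: on the first change, unsubscribe and leave the marker.
            doc.Changed -= handler;
            doc.AddAnnotation(new ChangedMarker());
        };
        doc.Changed += handler;
    }

    // Step 3: callers check this before deciding whether to serialize.
    public static bool IsDirty(XDocument doc) {
        return doc.Annotation<ChangedMarker>() != null;
    }

    public static void Main() {
        XDocument doc = XDocument.Parse("<root><a/><b/></root>");
        Track(doc);
        Console.WriteLine(IsDirty(doc));   // False: nothing modified yet
        doc.Root.Element("a").Remove();
        Console.WriteLine(IsDirty(doc));   // True: a node was removed
    }
}
```

One design note: this sketch marks the XDocument captured by the lambda instead of reading the sender's Document property. For a removal, the Changed event is raised after the node has been detached, so the sender's Document property is null at that point.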

The following code example demonstrates this technique.

Note

The following code samples are edited to facilitate online viewing. Change the indentation and remove extra lines to work with this code.

' Visual Basic
Imports Microsoft.Office.DocumentFormat.OpenXml.Packaging
Imports System
Imports System.IO
Imports System.Linq
Imports System.Runtime.CompilerServices
Imports System.Text
Imports System.Xml
Imports System.Xml.Linq

Module LocalExtensions

    Private Class ChangedSemaphore

    End Class

    Private ElementChanged As EventHandler(Of XObjectChangeEventArgs)

    ' Handles the XDocument.Changed event, which is raised for any change
    ' to the tree.
    Private Sub ElementChangedHandler(ByVal sender As Object, _
                              ByVal e As XObjectChangeEventArgs)
        Dim xSender As XObject = CType(sender, XObject)
        Dim xDocument As XDocument = xSender.Document

        ' Sometimes while moving a node, this event handler may receive 
        ' an event for a node that has been removed from its parent 
        ' (and therefore its document), in which case it is not
        ' necessary to remove the event handler and add an annotation.

        ' Otherwise, remove the event handler and add a semaphore
        ' annotation to the XDocument to indicate that the XDocument
        ' changed.
        If xDocument IsNot Nothing Then
            RemoveHandler xDocument.Changed, ElementChanged
            xDocument.AddAnnotation(New ChangedSemaphore())
        End If
    End Sub

    <Extension()> _
    Function GetXDocument(ByVal part As OpenXmlPart) As XDocument
        If ElementChanged Is Nothing Then
            ElementChanged = New EventHandler(Of XObjectChangeEventArgs)( _
                AddressOf ElementChangedHandler)
        End If

        ' Reuse the XDocument if it is already annotated on the part.
        Dim xdoc As XDocument = part.Annotation(Of XDocument)()
        If xdoc IsNot Nothing Then
            Return xdoc
        End If

        Using reader As New StreamReader(part.GetStream())
            xdoc = XDocument.Load(XmlReader.Create(reader))
            part.AddAnnotation(xdoc)
            ' Watch for changes to the tree from now on.
            AddHandler xdoc.Changed, ElementChanged
        End Using

        Return xdoc

    End Function

    <Extension()> _
    Sub PutXDocument(ByVal part As OpenXmlPart)
        Dim xdoc As XDocument = part.GetXDocument()
        If xdoc IsNot Nothing Then
            ' Before serializing back into the document part, check for
            ' existence of the semaphore annotation. Only serialize
            ' back into the package if the semaphore annotation exists.
            If xdoc.Annotation(Of ChangedSemaphore)() IsNot Nothing Then
                Console.WriteLine("The XDocument was changed. Serialize back into the part.")

                ' Serialize the XDocument back to the package.
                Using xw As XmlWriter = XmlWriter.Create( _
                        part.GetStream(FileMode.Create, FileAccess.Write))
                    xdoc.Save(xw)
                End Using
            Else
                Console.WriteLine("No need to serialize back to part. " & _
                                  "XDocument was not changed.")
            End If
        End If
    End Sub
End Module

Module Module1
    Sub Main()
        Using wordDoc As WordprocessingDocument = _
                      WordprocessingDocument.Open("C:\Test.docx", True)
            Dim xdoc As XDocument = _
                      wordDoc.MainDocumentPart.GetXDocument()

            ' Query the document, and modify it as necessary.
            Console.WriteLine("Count of nodes:{0}", _
                               xdoc.DescendantNodes().Count())

            ' Call another function, passing the MainDocumentPart part.
            ModifyDocument(wordDoc.MainDocumentPart)

            ' Serialize back into the part only if the XDocument changed.
            wordDoc.MainDocumentPart.PutXDocument()
        End Using
    End Sub

    ' This function changes the first paragraph to uppercase.
    Sub ModifyDocument(ByVal mainDocumentPart As OpenXmlPart)
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

        ' Get the XDocument using the GetXDocument function.
        ' This function executes quickly, as the XDocument object is stored
        ' as an annotation.
        Dim body As XElement = _
            mainDocumentPart.GetXDocument().Root.Element(w + "body")
        Dim paraNode As XElement = _
            body.Descendants(w + "p").FirstOrDefault()

        ' Nothing to do if the document contains no paragraphs.
        If paraNode Is Nothing Then
            Return
        End If

        ' Concatenate the text of all runs in the paragraph.
        Dim textNodes As IEnumerable(Of XElement) = _
            paraNode.Elements(w + "r").Elements(w + "t")
        Dim paraText As String = textNodes.Aggregate(New StringBuilder(), _
            Function(s, i) s.Append(CStr(i)), Function(s) s.ToString())

        ' Remove all text runs.
        paraNode.Descendants(w + "r").Remove()
        ' Change the first paragraph to uppercase.
        paraNode.Add(<r
         xmlns="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
                         <t><%= paraText.ToUpper() %></t>
                     </r>)
    End Sub

End Module

// C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using Microsoft.Office.DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

namespace OpenXmlFormatSDKAnnotationSamples {
    public static class LocalExtensions {

        private class ChangedSemaphore { }

        private static EventHandler<XObjectChangeEventArgs>
                                                 ElementChanged = null;

        // Handles the XDocument.Changed event, which is raised for any
        // change to the tree.
        private static void ElementChangedHandler(object sender,
                                            XObjectChangeEventArgs e) {
            XObject xSender = (XObject)sender;
            XDocument xDocument = xSender.Document;

            // Sometimes while moving a node, this event handler may 
            // receive an event for a node that has been removed from 
            // its parent (and therefore its document), in which case
            // it is not necessary to remove the event handler and add
            // an annotation.

            // Otherwise, remove the event handler and add a semaphore
            // annotation to the XDocument object to indicate that the
            // XDocument object changed.
            if (xDocument != null) {
                xDocument.Changed -= ElementChanged;
                xDocument.AddAnnotation(new ChangedSemaphore());
            }
        }

        public static XDocument GetXDocument(this OpenXmlPart part) {
            if (ElementChanged == null)
                ElementChanged = 
       new EventHandler<XObjectChangeEventArgs>(ElementChangedHandler);

            XDocument xdoc = part.Annotation<XDocument>();
            if (xdoc != null)
                return xdoc;
            using (StreamReader streamReader = new StreamReader(part.GetStream()))
                xdoc = XDocument.Load(XmlReader.Create(streamReader));
            part.AddAnnotation(xdoc);
            xdoc.Changed += ElementChanged;
            return xdoc;
        }

        public static void PutXDocument(this OpenXmlPart part) {
            XDocument xdoc = part.GetXDocument();
            if (xdoc != null) {
                // Before serializing back into the document part, check for
                // existence of the semaphore annotation. Only serialize back
                // into the package if the semaphore annotation exists.
                if (xdoc.Annotation<ChangedSemaphore>() != null) {
                    Console.WriteLine("The XDocument was changed. Serialize back into the part.");

                    // Serialize the XDocument object back to the package.
                    using (XmlWriter xw = XmlWriter.Create(
                        part.GetStream(FileMode.Create, FileAccess.Write))) {
                        xdoc.Save(xw);
                    }
                }
                else {
                    Console.WriteLine("No need to serialize back to part. " +
                                      "XDocument was not changed.");
                }
            }
        }
    }

    class Program {
        // This function changes the first paragraph to upper case.
        static void ModifyDocument(OpenXmlPart mainDocumentPart) {
            XNamespace w = 
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

            // Get the XDocument object using the GetXDocument function.
            // This function executes quickly, as the XDocument object is 
            // stored as an annotation.
            XElement paraNode = mainDocumentPart.GetXDocument()
                                .Root
                                .Element(w + "body")
                                .Descendants(w + "p")
                                .FirstOrDefault();

            // Nothing to do if the document contains no paragraphs.
            if (paraNode == null)
                return;

            string paraText = paraNode
                              .Elements(w + "r")
                              .Elements(w + "t")
                              .Aggregate(new StringBuilder(), (s, i) => 
                               s.Append((string)i), s => s.ToString());

            // Remove all text runs.
            paraNode.Descendants(w + "r").Remove();

            paraNode.Add(
                new XElement(w + "r",
                    new XElement(w + "t", paraText.ToUpper())
                )
            );
        }

        static void Main(string[] args) {
            using (WordprocessingDocument wordDoc = 
                  WordprocessingDocument.Open(@"C:\Test.docx", true)) {
                XDocument xdoc = 
                               wordDoc.MainDocumentPart.GetXDocument();

                // Query the document, and modify it as necessary.
                Console.WriteLine("Count of nodes:{0}",
                                       xdoc.DescendantNodes().Count());

                // Call another function, passing the MainDocumentPart.
                ModifyDocument(wordDoc.MainDocumentPart);

                wordDoc.MainDocumentPart.PutXDocument();
            }
        }
    }
}

To use an extension method and annotations

  1. In this console application, first you open a document as a WordprocessingDocument object.

  2. Then, you retrieve an XDocument from the MainDocumentPart part.

  3. Next, you query the document and modify it as necessary. In this case, the ModifyDocument method changes the first paragraph of the document to upper case.

  4. Finally, you serialize the document part back into the package if the semaphore annotation exists.