Manipulating Word 2007 Files with the Open XML Format API (Part 2 of 3)

Summary: This is the second in a series of three articles that describes the Open XML object model code that you can use to access and manipulate Microsoft Office Word 2007 files. (8 printed pages)

Frank Rice, Microsoft Corporation

August 2007 (Revised August 2008)

Applies to: Microsoft Office Word 2007

Contents

  • Overview

  • Removing Hidden Text from Documents

  • Retrieving Document Application Properties

  • Converting Macro-Enabled Documents to DOCX Files

  • Retrieving Core Document Properties

  • Retrieving Custom Document Properties

  • Setting Core Document Properties

  • Conclusion

View Part 1: Manipulating Word 2007 Files with the Open XML Format API (Part 1 of 3).

Overview

The 2007 Microsoft Office system introduces new XML-based file formats called the Open XML Formats. Microsoft Office Word 2007, Microsoft Office Excel 2007, and Microsoft Office PowerPoint 2007 all use these formats as their default file format. The Open XML Formats are useful because they are an open standard and are based on well-known technologies: ZIP and XML. Microsoft provides a library for accessing these files as part of the .NET Framework 3.0 technologies, in the DocumentFormat.OpenXml namespace of the Open XML Format SDK 1.0. The Open XML Format API provides strongly typed part classes for manipulating Open XML documents. It also encapsulates many common tasks that developers perform on Open XML Format packages, so you can perform complex operations with just a few lines of code.
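Because an Open XML package is an ordinary ZIP archive, its parts can be listed even without the SDK. The following is a minimal sketch, not part of the Open XML Format SDK. It assumes a later version of the .NET Framework (4.5 or newer), where the System.IO.Compression.ZipArchive class is available. The PackageInspector class and the ListPartNames method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the Open XML Format SDK.
public static class PackageInspector
{
    // Returns the names of all ZIP entries (parts and relationship items)
    // found in an Open XML package stream. The stream must be seekable.
    public static List<string> ListPartNames(Stream packageStream)
    {
        List<string> names = new List<string>();

        // leaveOpen: true keeps the caller's stream usable afterward.
        using (ZipArchive archive = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                names.Add(entry.FullName);
            }
        }

        return names;
    }
}
```

For a typical Word 2007 document, the returned list would be expected to include entries such as word/document.xml, docProps/core.xml, and docProps/app.xml, which are the parts that the procedures in this article read and modify.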

Removing Hidden Text from Documents

In the following code, you remove nodes from the main document part that contain hidden text in a Word 2007 document.

Public Sub WDDeleteHiddenText(ByVal docName As String)
   ' Given a document name, delete all the hidden text.
   Const wordmlNamespace As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
   Dim wdDoc As WordprocessingDocument = WordprocessingDocument.Open(docName, true)
   Using (wdDoc)
      ' Manage namespaces to perform XPath queries.
      Dim nt As NameTable = New NameTable
      Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
      nsManager.AddNamespace("w", wordmlNamespace)
      ' Get the document part from the package.
      ' Load the XML in the part into an XmlDocument instance.
      Dim xdoc As XmlDocument = New XmlDocument(nt)
      xdoc.Load(wdDoc.MainDocumentPart.GetStream)
      Dim hiddenNodes As XmlNodeList = xdoc.SelectNodes("//w:vanish", nsManager)
      For Each hiddenNode As System.Xml.XmlNode In hiddenNodes
         Dim topNode As XmlNode = hiddenNode.ParentNode.ParentNode
         Dim topParentNode As XmlNode = topNode.ParentNode
         topParentNode.RemoveChild(topNode)
         If Not topParentNode.HasChildNodes Then
            topParentNode.ParentNode.RemoveChild(topParentNode)
         End If
      Next
      ' Save the document XML back to its part.
      xdoc.Save(wdDoc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))
   End Using
End Sub
public static void WDDeleteHiddenText(string docName)
{
   // Given a document name, delete all the hidden text.
   const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
   using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(docName, true))
   {
      // Manage namespaces to perform XPath queries.
      NameTable nt = new NameTable();
      XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
      nsManager.AddNamespace("w", wordmlNamespace);
      // Get the document part from the package.
      // Load the XML in the part into an XmlDocument instance.
      XmlDocument xdoc = new XmlDocument(nt);
      xdoc.Load(wdDoc.MainDocumentPart.GetStream());
      XmlNodeList hiddenNodes = xdoc.SelectNodes("//w:vanish", nsManager);
      foreach (System.Xml.XmlNode hiddenNode in hiddenNodes)
      {
         XmlNode topNode = hiddenNode.ParentNode.ParentNode;
         XmlNode topParentNode = topNode.ParentNode;
         topParentNode.RemoveChild(topNode);
         if (!(topParentNode.HasChildNodes))
         {
            topParentNode.ParentNode.RemoveChild(topParentNode);
         }
      }

      // Save the document XML back to its part.
      xdoc.Save(wdDoc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write));
   }
}

The WDDeleteHiddenText method takes the path of the Word 2007 document. It first creates a WordprocessingDocument object from the input document, which represents the Office Open XML Format package. Next, it creates a namespace manager to set up the XPath queries. It then creates a memory-resident XML document as a temporary holder and loads it with the markup and data from the main document part. The remaining code uses the XPath expression //w:vanish to compile a list of the nodes that mark text as hidden. The code then loops through the list. For each node, it removes the element two levels up (for hidden run text, this is the run whose run properties contain the w:vanish element) and, if that leaves the run's parent with no child nodes, removes the parent as well.

For Each hiddenNode As System.Xml.XmlNode In hiddenNodes
   Dim topNode As XmlNode = hiddenNode.ParentNode.ParentNode
   Dim topParentNode As XmlNode = topNode.ParentNode
   topParentNode.RemoveChild(topNode)
   If Not topParentNode.HasChildNodes Then
      topParentNode.ParentNode.RemoveChild(topParentNode)
   End If
Next
foreach (System.Xml.XmlNode hiddenNode in hiddenNodes)
{
   XmlNode topNode = hiddenNode.ParentNode.ParentNode;
   XmlNode topParentNode = topNode.ParentNode;
   topParentNode.RemoveChild(topNode);
   if (!(topParentNode.HasChildNodes))
   {
      topParentNode.ParentNode.RemoveChild(topParentNode);
   }
}

Finally, save the updated WordprocessingML markup back to the main document part.
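To see the effect of the removal loop without opening a package, the same logic can be applied to a markup string. The following is a minimal sketch that uses only System.Xml. The HiddenTextSketch class, the RemoveHiddenText method, and the sample markup are assumptions made for illustration. Unlike the procedure above, the sketch copies the query results into a list before removing nodes, as a precaution against changing the document while the result list is still being enumerated.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; it works on a markup string
// instead of a document part.
public static class HiddenTextSketch
{
    private const string WordmlNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Sample markup (an assumption): one paragraph with a visible and a
    // hidden run, and one paragraph that contains only a hidden run.
    public const string SampleXml =
        "<w:document xmlns:w='" + WordmlNamespace + "'><w:body>" +
        "<w:p><w:r><w:t>Visible</w:t></w:r>" +
        "<w:r><w:rPr><w:vanish/></w:rPr><w:t>Secret</w:t></w:r></w:p>" +
        "<w:p><w:r><w:rPr><w:vanish/></w:rPr><w:t>Gone</w:t></w:r></w:p>" +
        "</w:body></w:document>";

    public static string RemoveHiddenText(string documentXml)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", WordmlNamespace);

        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.LoadXml(documentXml);

        // Copy the matches first so that removing nodes cannot
        // disturb the enumeration of the query result.
        List<XmlNode> hiddenNodes = new List<XmlNode>();
        foreach (XmlNode node in xdoc.SelectNodes("//w:vanish", nsManager))
        {
            hiddenNodes.Add(node);
        }

        foreach (XmlNode hiddenNode in hiddenNodes)
        {
            // w:vanish -> w:rPr -> w:r, so topNode is the hidden run.
            XmlNode topNode = hiddenNode.ParentNode.ParentNode;
            XmlNode topParentNode = topNode.ParentNode;
            topParentNode.RemoveChild(topNode);

            // Remove the paragraph too if the hidden run was its only child.
            if (!topParentNode.HasChildNodes)
            {
                topParentNode.ParentNode.RemoveChild(topParentNode);
            }
        }

        return xdoc.OuterXml;
    }
}
```

Calling HiddenTextSketch.RemoveHiddenText(HiddenTextSketch.SampleXml) returns markup in which only the first paragraph and its "Visible" run remain: the hidden run is removed from the first paragraph, and the second paragraph is removed entirely because it is left empty.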

Retrieving Document Application Properties

In the following code, you retrieve the value of an application property contained in the extended properties part (app.xml) in a document.

Public Shared Function WDRetrieveAppProperty(ByVal docName As String, ByVal propertyName As String) As String
   ' Given a document name and an app property, retrieve the value of the property.
   ' Note that because this code uses the SelectSingleNode method,
   ' the search is case sensitive. That is, looking for "Words" is not 
   ' the same as looking for "words".
   Const appPropertiesSchema As String = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
   Dim propertyValue As String = string.Empty
   Dim wdDoc As WordprocessingDocument = WordprocessingDocument.Open(docName, False)
   ' The Using block closes the package when the code is finished with it.
   Using (wdDoc)
      ' Manage namespaces to perform XML XPath queries.
      Dim nt As NameTable = New NameTable
      Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
      nsManager.AddNamespace("d", appPropertiesSchema)

      ' Get the properties from the package.
      Dim xdoc As XmlDocument = New XmlDocument(nt)

      ' Load the XML in the part into an XmlDocument instance.
      xdoc.Load(wdDoc.ExtendedFilePropertiesPart.GetStream)
      Dim searchString As String = String.Format("//d:Properties/d:{0}", propertyName)
      Dim xNode As XmlNode = xdoc.SelectSingleNode(searchString, nsManager)
      ' Reference comparisons in Visual Basic use Is, not =.
      If Not (xNode Is Nothing) Then
         propertyValue = xNode.InnerText
      End If
   End Using

   Return propertyValue
End Function
public static string WDRetrieveAppProperty(string docName, string propertyName)
{
   // Given a document name and an app property, retrieve the value of the property.
   // Note that because this code uses the SelectSingleNode method,
   // the search is case sensitive. That is, looking for "Words" is not 
   // the same as looking for "words".

   const string appPropertiesSchema = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

   string propertyValue = string.Empty;

   using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(docName, false))
   {
      // Manage namespaces to perform XML XPath queries.
      NameTable nt = new NameTable();
      XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
      nsManager.AddNamespace("d", appPropertiesSchema);

      // Get the properties from the package.
      XmlDocument xdoc = new XmlDocument(nt);

      // Load the XML in the part into an XmlDocument instance.
      xdoc.Load(wdDoc.ExtendedFilePropertiesPart.GetStream());

      string searchString = string.Format("//d:Properties/d:{0}", propertyName);
      XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
      if (!(xNode == null))
      {
         propertyValue = xNode.InnerText;
      }
   }
   return propertyValue;
}

The WDRetrieveAppProperty method takes the path of the Word 2007 document and the name of the application property that you want to retrieve. First, you create a WordprocessingDocument object from the input document, representing the Office Open XML Format package. Next, you create a namespace manager to set up the XPath queries. Then you create a memory-resident XML document as a temporary holder and load the document with the markup and data from the extended properties part (app.xml). Next, you set up the search string as an XPath query that searches for the named property under the d:Properties node.

Dim searchString As String = String.Format("//d:Properties/d:{0}", propertyName)
Dim xNode As XmlNode = xdoc.SelectSingleNode(searchString, nsManager)
If Not (xNode Is Nothing) Then
   propertyValue = xNode.InnerText
End If
string searchString = string.Format("//d:Properties/d:{0}", propertyName);
XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
if (!(xNode == null))
{
   propertyValue = xNode.InnerText;
}

Then you select the node that contains the specified application property and assign its text to a return variable.
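The case sensitivity mentioned in the code comments can be checked against a markup string. The following is a minimal sketch that uses only System.Xml. The AppPropertySketch class, the GetProperty method, and the sample app.xml markup are assumptions made for illustration and are not taken from a real document.

```csharp
using System.Xml;

// Hypothetical helper for illustration; it queries a markup string
// instead of the extended properties part.
public static class AppPropertySketch
{
    private const string AppPropertiesSchema =
        "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Sample markup (an assumption) shaped like a small app.xml part.
    public const string SampleXml =
        "<Properties xmlns='" + AppPropertiesSchema + "'>" +
        "<Application>Microsoft Office Word</Application>" +
        "<Words>42</Words>" +
        "</Properties>";

    // Returns the text of the named property, or an empty string
    // if no element with exactly that name exists.
    public static string GetProperty(string appXml, string propertyName)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("d", AppPropertiesSchema);

        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.LoadXml(appXml);

        string searchString = string.Format("//d:Properties/d:{0}", propertyName);
        XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
        return xNode != null ? xNode.InnerText : string.Empty;
    }
}
```

With the sample markup, GetProperty(SampleXml, "Words") returns "42", whereas GetProperty(SampleXml, "words") returns an empty string because XML element names are case sensitive.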

Converting Macro-Enabled Documents to DOCX Files

The following code removes all Microsoft Visual Basic for Applications (VBA) parts from a document. It also converts the document from a .docm file, which has macros enabled, to a .docx file, which has macros disabled.

Public Sub WDConvertDOCMToDOCX(ByVal docName As String)
   ' Given a DOCM file (with macro storage), remove the VBA 
   ' project, reset the document type, and save the document with a new name.
   Dim vbaRelationshipType As String = "http://schemas.microsoft.com/office/2006/relationships/vbaProject"
   Dim wdDoc As WordprocessingDocument = WordprocessingDocument.Open(docName, True)
   Using (wdDoc)
      wdDoc.ChangeDocumentType(WordprocessingDocumentType.Document)
      Dim partsToDel As List(Of ExtendedPart) = New List(Of ExtendedPart)()
      For Each part As ExtendedPart In wdDoc.MainDocumentPart.GetPartsOfType(Of ExtendedPart)()
         If (part.RelationshipType = vbaRelationshipType) Then
            partsToDel.Add(part)
         End If
      Next part
      wdDoc.MainDocumentPart.DeleteParts(partsToDel)
   End Using

   ' Generate the new file name.
   ' Path.ChangeExtension also handles a file name that has no directory part.
   Dim newFileName As String = Path.ChangeExtension(docName, ".docx")
   ' If the new file exists, delete it. You may
   ' want to make this code less destructive.
   If File.Exists(newFileName) Then
      File.Delete(newFileName)
   End If
   File.Move(docName, newFileName)
End Sub
public static void WDConvertDOCMToDOCX(string docName)
{
   // Given a DOCM file (with macro storage), remove the VBA 
   // project, reset the document type, and save the document with a new name.

   const string vbaRelationshipType = "http://schemas.microsoft.com/office/2006/relationships/vbaProject";

   using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(docName, true))
   {
      wdDoc.ChangeDocumentType(WordprocessingDocumentType.Document);
      List<ExtendedPart> partsToDel = new List<ExtendedPart>();
      foreach (ExtendedPart part in wdDoc.MainDocumentPart.GetPartsOfType<ExtendedPart>())
      {
         if (part.RelationshipType == vbaRelationshipType) partsToDel.Add(part);
      }
      wdDoc.MainDocumentPart.DeleteParts(partsToDel);
   }

   // Generate the new file name.
   // Path.ChangeExtension also handles a file name that has no directory part.
   string newFileName = Path.ChangeExtension(docName, ".docx");
   // If the new file exists, delete it. You may
   // want to make this code less destructive.
   if (File.Exists(newFileName))
   {
      File.Delete(newFileName);
   }
   File.Move(docName, newFileName);
}

First, you call the WDConvertDOCMToDOCX method, passing in a reference to the Word 2007 document. Then, you create a WordprocessingDocument object from the input document, representing the Open XML File Format package. Next, call the ChangeDocumentType method of the WordprocessingDocument object to change the type of the document from a macro-enabled file to the default document format, which does not have macros enabled.

wdDoc.ChangeDocumentType(WordprocessingDocumentType.Document)
wdDoc.ChangeDocumentType(WordprocessingDocumentType.Document);

The procedure then loops through the extended parts in the package and tests for parts that have a VBA reference. The procedure then adds all parts that include a VBA reference to a list of parts to delete.

For Each part As ExtendedPart In wdDoc.MainDocumentPart.GetPartsOfType(Of ExtendedPart)()
   If (part.RelationshipType = vbaRelationshipType) Then
      partsToDel.Add(part)
   End If
Next part
wdDoc.MainDocumentPart.DeleteParts(partsToDel)
foreach (ExtendedPart part in wdDoc.MainDocumentPart.GetPartsOfType<ExtendedPart>())
{
   if (part.RelationshipType == vbaRelationshipType) partsToDel.Add(part);
}
wdDoc.MainDocumentPart.DeleteParts(partsToDel);

The parts are deleted by calling the DeleteParts method of the main document part. Finally, the name of the original document is changed to reflect its updated type.
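The comments in the procedure suggest making the rename step less destructive. One possible approach, sketched here under stated assumptions, is to choose a target name that does not exist yet instead of deleting an existing file. The SafeRenameSketch class, the GetAvailableDocxName method, and the " (1)" numbering scheme are hypothetical choices made for illustration.

```csharp
using System.IO;

// Hypothetical helper for illustration; not part of the article's procedure.
public static class SafeRenameSketch
{
    // Returns a .docx path next to docName that is not used by an existing file.
    // If "name.docx" exists, tries "name (1).docx", "name (2).docx", and so on.
    public static string GetAvailableDocxName(string docName)
    {
        string candidate = Path.ChangeExtension(docName, ".docx");
        string directory = Path.GetDirectoryName(candidate) ?? string.Empty;
        string baseName = Path.GetFileNameWithoutExtension(candidate);

        int counter = 1;
        while (File.Exists(candidate))
        {
            candidate = Path.Combine(directory, baseName + " (" + counter + ").docx");
            counter++;
        }

        return candidate;
    }
}
```

The result could then be passed to File.Move in place of the delete-then-move sequence, so that an existing .docx file with the same name is left untouched.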

Retrieving Core Document Properties

In the following code, you retrieve the value of a core property contained in the core properties part (core.xml) in a document.

Public Function WDRetrieveCoreProperty(ByVal docName As String, ByVal propertyName As String) As String
   ' Given a document name and a core property, retrieve the value of the property.
   ' Note that because this code uses the SelectSingleNode method, 
   ' the search is case sensitive. That is, looking for "Author" is not 
   ' the same as looking for "author".

   Const corePropertiesSchema As String = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
   Const dcPropertiesSchema As String = "http://purl.org/dc/elements/1.1/"
   Const dcTermsPropertiesSchema As String = "http://purl.org/dc/terms/"
   Dim propertyValue As String = String.Empty
   ' Open the package read-only because the code only reads from it.
   Dim wdPackage As WordprocessingDocument = WordprocessingDocument.Open(docName, False)
   ' The Using block closes the package when the code is finished with it.
   Using (wdPackage)
      ' Get the core properties part (core.xml).
      Dim corePropertiesPart As CoreFilePropertiesPart = wdPackage.CoreFilePropertiesPart

      ' Manage namespaces to perform XML XPath queries.
      Dim nt As NameTable = New NameTable
      Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
      nsManager.AddNamespace("cp", corePropertiesSchema)
      nsManager.AddNamespace("dc", dcPropertiesSchema)
      nsManager.AddNamespace("dcterms", dcTermsPropertiesSchema)

      ' Get the properties from the package.
      Dim xdoc As XmlDocument = New XmlDocument(nt)
      ' Load the XML in the part into an XmlDocument instance.
      xdoc.Load(corePropertiesPart.GetStream)

      Dim searchString As String = String.Format("//cp:coreProperties/{0}", propertyName)
      Dim xNode As XmlNode = xdoc.SelectSingleNode(searchString, nsManager)
      If Not (xNode Is Nothing) Then
         propertyValue = xNode.InnerText
      End If
   End Using
   Return propertyValue
End Function
public static string WDRetrieveCoreProperty(string docName, string propertyName)
{
   // Given a document name and a core property, retrieve the value of the property.
   // Note that because this code uses the SelectSingleNode method, 
   // the search is case sensitive. That is, looking for "Author" is not 
   // the same as looking for "author".

   const string corePropertiesSchema = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
   const string dcPropertiesSchema = "http://purl.org/dc/elements/1.1/";
   const string dcTermsPropertiesSchema = "http://purl.org/dc/terms/";

   string propertyValue = string.Empty;

   // Open the package read-only because the code only reads from it.
   using (WordprocessingDocument wdPackage = WordprocessingDocument.Open(docName, false))
   {
      // Get the core properties part (core.xml).
      CoreFilePropertiesPart corePropertiesPart = wdPackage.CoreFilePropertiesPart;

      // Manage namespaces to perform XML XPath queries.
      NameTable nt = new NameTable();
      XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
      nsManager.AddNamespace("cp", corePropertiesSchema);
      nsManager.AddNamespace("dc", dcPropertiesSchema);
      nsManager.AddNamespace("dcterms", dcTermsPropertiesSchema);

      // Get the properties from the package.
      XmlDocument xdoc = new XmlDocument(nt);

      // Load the XML in the part into an XmlDocument instance.
      xdoc.Load(corePropertiesPart.GetStream());

      string searchString = string.Format("//cp:coreProperties/{0}", propertyName);

      XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
      if (!(xNode == null))
      {
         propertyValue = xNode.InnerText;
      }
   }

   return propertyValue;
}

The WDRetrieveCoreProperty method takes the path of the Word 2007 document and the name of the core property that you want to retrieve. First, you create a WordprocessingDocument object from the input document, representing the Office Open XML Format package. Next, retrieve the CoreFilePropertiesPart part. Then, you create a namespace manager to set up the XPath query. Create a memory-resident XML document as a temporary holder and load the document with the markup and data from the core file properties part (core.xml). Next, you set up the search string as an XPath query that searches for the requested property under the cp:coreProperties node. Because the property name is inserted into the query as is, it must include one of the namespace prefixes registered with the namespace manager, for example dc:creator or cp:lastModifiedBy.

Dim searchString As String = String.Format("//cp:coreProperties/{0}", propertyName)
Dim xNode As XmlNode = xdoc.SelectSingleNode(searchString, nsManager)
If Not (xNode Is Nothing) Then
   propertyValue = xNode.InnerText
End If
string searchString = string.Format("//cp:coreProperties/{0}", propertyName);

XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
if (!(xNode == null))
{
   propertyValue = xNode.InnerText;
}

Then the code selects the node that contains the specified property and assigns its text to a return variable.
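Because core properties live in several namespaces, the property name passed to the query has to be a prefixed name. The following is a minimal sketch that uses only System.Xml to show the difference. The CorePropertySketch class, the GetProperty method, and the sample core.xml markup are assumptions made for illustration.

```csharp
using System.Xml;

// Hypothetical helper for illustration; it queries a markup string
// instead of the core properties part.
public static class CorePropertySketch
{
    private const string CorePropertiesSchema =
        "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    private const string DcPropertiesSchema = "http://purl.org/dc/elements/1.1/";

    // Sample markup (an assumption) shaped like a small core.xml part.
    public const string SampleXml =
        "<cp:coreProperties xmlns:cp='" + CorePropertiesSchema + "'" +
        " xmlns:dc='" + DcPropertiesSchema + "'>" +
        "<dc:creator>Sample Author</dc:creator>" +
        "<cp:lastModifiedBy>Sample Editor</cp:lastModifiedBy>" +
        "</cp:coreProperties>";

    // propertyName must carry a prefix registered below, such as "dc:creator".
    public static string GetProperty(string coreXml, string propertyName)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("cp", CorePropertiesSchema);
        nsManager.AddNamespace("dc", DcPropertiesSchema);

        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.LoadXml(coreXml);

        string searchString = string.Format("//cp:coreProperties/{0}", propertyName);
        XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
        return xNode != null ? xNode.InnerText : string.Empty;
    }
}
```

With the sample markup, GetProperty(SampleXml, "dc:creator") returns "Sample Author". GetProperty(SampleXml, "creator") returns an empty string, because an unprefixed name in an XPath query matches only elements that are in no namespace.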

Retrieving Custom Document Properties

In the following code, you retrieve the value of a custom property contained in the custom properties part (custom.xml) in a document.

Public Function WDRetrieveCustomProperty(ByVal docName As String, ByVal propertyName As String) As String
   ' Given a document name and a custom property, retrieve the value of the property.
   ' Note that because this code uses the SelectSingleNode method,
   ' the search is case sensitive. That is, looking for "Author" is not 
   ' the same as looking for "author".

   Const customPropertiesSchema As String = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
   Const customVTypesSchema As String = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"
   Dim propertyValue As String = String.Empty
   ' Open the package read-only because the code only reads from it.
   Dim wdPackage As WordprocessingDocument = WordprocessingDocument.Open(docName, False)
   ' The Using block closes the package when the code is finished with it.
   Using (wdPackage)
      ' Get the custom properties part (custom.xml).
      Dim customPropertiesPart As CustomFilePropertiesPart = wdPackage.CustomFilePropertiesPart
      ' There may not be a custom properties part.
      If (Not (customPropertiesPart) Is Nothing) Then
         ' Manage namespaces to perform XML XPath queries.
         Dim nt As NameTable = New NameTable
         Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
         nsManager.AddNamespace("d", customPropertiesSchema)
         nsManager.AddNamespace("vt", customVTypesSchema)

         ' Get the properties from the package.
         Dim xdoc As XmlDocument = New XmlDocument(nt)

         ' Load the XML in the part into an XmlDocument instance.
         xdoc.Load(customPropertiesPart.GetStream)
         Dim searchString As String = String.Format("d:Properties/d:property[@name='{0}']", propertyName)
         Dim xNode As XmlNode = xdoc.SelectSingleNode(searchString, nsManager)
         If (Not (xNode) Is Nothing) Then
            propertyValue = xNode.InnerText
         End If
      End If
   End Using
   Return propertyValue
End Function
public static string WDRetrieveCustomProperty(string docName, string propertyName)
{
   const string customPropertiesSchema = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";
   const string customVTypesSchema = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes";

   string propertyValue = string.Empty;

   // Open the package read-only because the code only reads from it.
   using (WordprocessingDocument wdPackage = WordprocessingDocument.Open(docName, false))
   {
      // Get the custom properties part (custom.xml).
      CustomFilePropertiesPart customPropertiesPart = wdPackage.CustomFilePropertiesPart;

      // There may not be a custom properties part.
      if (customPropertiesPart != null)
      {
         // Manage namespaces to perform XML XPath queries.
         NameTable nt = new NameTable();
         XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
         nsManager.AddNamespace("d", customPropertiesSchema);
         nsManager.AddNamespace("vt", customVTypesSchema);

         // Get the properties from the package.
         XmlDocument xdoc = new XmlDocument(nt);

         // Load the XML in the part into an XmlDocument instance.
         xdoc.Load(customPropertiesPart.GetStream());

         string searchString = string.Format("d:Properties/d:property[@name='{0}']", propertyName);
         XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
         if ((xNode != null))
         {
            propertyValue = xNode.InnerText;
         }
      }
   }
   return propertyValue;
}

The WDRetrieveCustomProperty method takes the path of the Word 2007 document and the name of the custom property that you want to retrieve. First, you create a WordprocessingDocument object from the input document, representing the Office Open XML Format package. Next, retrieve the CustomFilePropertiesPart part and assign it to a variable. Because the package may not contain a custom properties part, you should test whether the variable holds a null reference.

If a custom properties part exists, you create a namespace manager to set up the XPath query. Create a memory-resident XML document as a temporary holder and load the document with the markup and data from the custom file properties part (custom.xml). Next, you set up the search string as an XPath query that searches for the d:Properties/d:property node whose name attribute matches the specified property.

xdoc.Load(customPropertiesPart.GetStream)
Dim searchString As String = String.Format("d:Properties/d:property[@name='{0}']", propertyName)
Dim xNode As XmlNode = xdoc.SelectSingleNode(searchString, nsManager)
If (Not (xNode) Is Nothing) Then
   propertyValue = xNode.InnerText
End If
string searchString = string.Format("d:Properties/d:property[@name='{0}']", propertyName);
XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
if ((xNode != null))
{
   propertyValue = xNode.InnerText;
}

Then the code example selects the node that contains the specified property and assigns its text to a return variable.
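Unlike application and core properties, custom properties are all stored in elements named property and are told apart by their name attribute, with the value held in a typed child element. The following is a minimal sketch that uses only System.Xml. The CustomPropertySketch class, the GetProperty method, and the sample custom.xml markup, including its property names and values, are assumptions made for illustration.

```csharp
using System.Xml;

// Hypothetical helper for illustration; it queries a markup string
// instead of the custom properties part.
public static class CustomPropertySketch
{
    private const string CustomPropertiesSchema =
        "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";
    private const string CustomVTypesSchema =
        "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes";

    // Sample markup (an assumption) shaped like a small custom.xml part.
    public const string SampleXml =
        "<Properties xmlns='" + CustomPropertiesSchema + "'" +
        " xmlns:vt='" + CustomVTypesSchema + "'>" +
        "<property pid='2' name='Reviewer'><vt:lpwstr>Sample Reviewer</vt:lpwstr></property>" +
        "<property pid='3' name='Approved'><vt:bool>true</vt:bool></property>" +
        "</Properties>";

    // Returns the text of the typed value element inside the matching
    // property element, or an empty string if there is no match.
    public static string GetProperty(string customXml, string propertyName)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("d", CustomPropertiesSchema);
        nsManager.AddNamespace("vt", CustomVTypesSchema);

        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.LoadXml(customXml);

        // The predicate filters on the name attribute rather than the element name.
        string searchString = string.Format("d:Properties/d:property[@name='{0}']", propertyName);
        XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
        return xNode != null ? xNode.InnerText : string.Empty;
    }
}
```

With the sample markup, GetProperty(SampleXml, "Reviewer") returns "Sample Reviewer" and GetProperty(SampleXml, "Approved") returns "true". The value always comes back as text, whatever the type of the vt child element, so the caller is responsible for any conversion.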

Setting Core Document Properties

In the following code, you set the value of a core property in a document. If you successfully set the property to a new value, the original value is returned to the calling procedure. If the property does not exist in the core properties part, the code throws an exception.

Public Function WDSetCoreProperty(ByVal docName As String, ByVal propertyName As String, ByVal propertyValue As String) As String
   ' Given a document name, a property name, and a value, update the document.
   ' Note that you cannot set the value of a property that does not
   ' exist within the core.xml part. If you successfully updated the property
   ' value, return its old value.
   ' Attempting to modify a non-existent property raises an exception.

   ' Property names are case-sensitive.

   Const corePropertiesSchema As String = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
   Const dcPropertiesSchema As String = "http://purl.org/dc/elements/1.1/"
   Dim retVal As String = Nothing
   Dim wdPackage As WordprocessingDocument = WordprocessingDocument.Open(docName, True)
   ' The Using block closes the package so that the changes are written to the file.
   Using (wdPackage)
      Dim corePropertiesPart As CoreFilePropertiesPart = wdPackage.CoreFilePropertiesPart

      ' Manage namespaces to perform XML XPath queries.
      Dim nt As NameTable = New NameTable
      Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
      nsManager.AddNamespace("cp", corePropertiesSchema)
      nsManager.AddNamespace("dc", dcPropertiesSchema)

      ' Get the properties from the package.
      Dim xdoc As XmlDocument = New XmlDocument(nt)

      ' Load the XML in the part into an XmlDocument instance.
      xdoc.Load(corePropertiesPart.GetStream)
      Dim searchString As String = String.Format("//cp:coreProperties/{0}", propertyName)
      Dim xNode As XmlNode = xdoc.SelectSingleNode(searchString, nsManager)
      If (xNode Is Nothing) Then
         ' Trying to set the value of a property that 
         ' does not exist? Throw an exception.
         Throw New ArgumentException("Invalid property name.")
      Else
         ' Get the current value.
         retVal = xNode.InnerText
         ' Now update the value.
         xNode.InnerText = propertyValue
         ' Save the properties XML back to its part. FileMode.Create replaces
         ' the old content, so a shorter document leaves no trailing markup.
         xdoc.Save(corePropertiesPart.GetStream(FileMode.Create, FileAccess.Write))
      End If
   End Using
   Return retVal
End Function
public static string WDSetCoreProperty(string docName, string propertyName, string propertyValue)
{
   // Given a document name, a property name, and a value, update the document.
   // Note that you cannot set the value of a property that does not
   //  exist within the core.xml part. If you successfully updated the property
   // value, return its old value.

   const string corePropertiesSchema = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
   const string dcPropertiesSchema = "http://purl.org/dc/elements/1.1/";

   string retVal = null;

   using (WordprocessingDocument wdPackage = WordprocessingDocument.Open(docName, true))
   {
      CoreFilePropertiesPart corePropertiesPart = wdPackage.CoreFilePropertiesPart;

      // Manage namespaces to perform Xml XPath queries.
      NameTable nt = new NameTable();
      XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
      nsManager.AddNamespace("cp", corePropertiesSchema);
      nsManager.AddNamespace("dc", dcPropertiesSchema);

      // Get the properties from the package.
      XmlDocument xdoc = new XmlDocument(nt);

      // Load the XML in the part into an XmlDocument instance:
      xdoc.Load(corePropertiesPart.GetStream());

      string searchString = string.Format("//cp:coreProperties/{0}", propertyName);
      XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
      if (xNode == null)
      {
         // Trying to set the value of a property that 
         // does not exist? Throw an exception.
         throw new ArgumentException("Invalid property name.");
      }
      else
      {
         // Get the current value.
         retVal = xNode.InnerText;

         // Now update the value.
         xNode.InnerText = propertyValue;

         // Save the properties XML back to its part.
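         // FileMode.Create replaces the old content, so a shorter
         // document leaves no trailing markup behind.
         xdoc.Save(corePropertiesPart.GetStream(FileMode.Create, FileAccess.Write));
      }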
   }
   return retVal;
}

The WDSetCoreProperty method takes the path of the Word 2007 document, the name of the core property, and the new value that you want to set the property to. First, you create a WordprocessingDocument object from the input document, representing the Office Open XML Format package. Next, you retrieve the CoreFilePropertiesPart part and create a namespace manager to set up the XPath query. Create a memory-resident XML document as a temporary holder and load the document with the markup and data from the core file properties part (core.xml). Then, you set up the search string as an XPath query that searches for the specified property under the cp:coreProperties node. Because the part may not contain the property, you should test whether the query returned a node.

Dim searchString As String = String.Format("//cp:coreProperties/{0}", propertyName)
Dim xNode As XmlNode = xdoc.SelectSingleNode(searchString, nsManager)
If (xNode Is Nothing) Then
   ' Trying to set the value of a property that 
   ' does not exist? Throw an exception.
   Throw New ArgumentException("Invalid property name.")
Else
   ' Get the current value.
   retVal = xNode.InnerText
   ' Now update the value.
   xNode.InnerText = propertyValue
   ' Save the properties XML back to its part.
   xdoc.Save(corePropertiesPart.GetStream(FileMode.Create, FileAccess.Write))
End If
Return retVal
string searchString = string.Format("//cp:coreProperties/{0}", propertyName);
XmlNode xNode = xdoc.SelectSingleNode(searchString, nsManager);
if (xNode == null)
{
   // Trying to set the value of a property that 
   // does not exist? Throw an exception.
   throw new ArgumentException("Invalid property name.");
}
else
{
   // Get the current value.
   retVal = xNode.InnerText;

   // Now update the value.
   xNode.InnerText = propertyValue;

   // Save the properties XML back to its part.
   xdoc.Save(corePropertiesPart.GetStream(FileMode.Create, FileAccess.Write));
}
return retVal;

If the property does not exist, the code throws an exception. Otherwise, it retrieves the current value of the property and sets the node that contains the property to the new value. Finally, it saves the updated markup back to the package and then returns the original value to the calling procedure.
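One detail deserves care when saving markup back to a part: the first procedure in this article opens the target stream with GetStream(FileMode.Create, FileAccess.Write), which discards the old content before writing. If updated markup is instead written over an existing stream without emptying it, and the new markup is shorter than the old, the tail of the old markup can be left behind. The following is a minimal sketch of that effect. It uses a MemoryStream as a stand-in for a part stream, which is an assumption, and the StreamOverwriteSketch class and Overwrite method are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for illustration; a MemoryStream stands in
// for the stream of a package part.
public static class StreamOverwriteSketch
{
    // Writes text at the start of the stream and returns the full stream
    // content afterward. When truncate is true, the stream is emptied
    // first, which mimics opening the stream with FileMode.Create.
    public static string Overwrite(MemoryStream stream, string text, bool truncate)
    {
        if (truncate)
        {
            stream.SetLength(0);
        }

        stream.Position = 0;
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);

        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

Starting from a stream that holds "<a>long value</a>", calling Overwrite with "<a>x</a>" and truncate set to false leaves "<a>x</a>value</a>" in the stream, which is no longer well-formed XML. The same call with truncate set to true leaves exactly "<a>x</a>".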

Conclusion

As this article demonstrates, working with Word 2007 files is much easier with the Open XML Format SDK 1.0. In part three of this series of articles, I describe other common tasks that you can perform with the Open XML Format SDK.
