Office Open XML Formats: Removing Comments from a Word 2007 Document

Summary:   Use Office Open XML Formats to remove comments programmatically from a Microsoft Office Word 2007 document.

Office Visual How To

Applies to:   2007 Microsoft Office System, Microsoft Office Word 2007, Microsoft Visual Studio 2005

Ken Getz, MCW Technologies, LLC

September 2007

Overview

If you insert comments into a Microsoft Office Word 2007 document or set of documents, you might want to remove those comments before publishing or distributing your documents. You could open each document individually to remove the comments, or you can use the Office Open XML File Formats to remove the comments programmatically, without opening the documents in Word 2007. This technique requires a significant amount of code, but because the code works directly on the file and never starts Word, it is efficient and performs well. Working with the Office Open XML File Formats requires knowledge of how Word 2007 stores the content, the System.IO.Packaging API, and XML programming.

See It

Watch the Video

Length: 08:46 | Size: 7.0 MB | Type: WMV file

Code It

To get started, download the 2007 Office System Sample: Open XML File Format Code Snippets for Visual Studio 2005, a set of forty code snippets for Microsoft Visual Studio 2005, each of which demonstrates a technique for working with the Office Open XML File Formats. After you install the code snippets and have a sample Word document to use for testing, you are ready to go. For more information, see the Read It section later in this topic.

Create a Microsoft Windows Application project in Microsoft Visual Studio 2005. Open the code editor, right-click, select Insert Snippet, and then select the Word: Remove Comments snippet from the list of available Office 2007 snippets. If you use Microsoft Visual Basic, inserting the snippet also adds a reference to WindowsBase.dll and the following Imports statements.

Imports System.IO.Packaging
Imports System.Xml
Imports System.IO

If you use Microsoft Visual C#, you must add the reference to the WindowsBase.dll assembly and the corresponding using statements yourself so that you can compile the code. (Code snippets in C# cannot set references or insert using statements.) If the WindowsBase.dll reference does not appear on the .NET tab of the Add Reference dialog box, click the Browse tab, locate the C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0 folder, and then click WindowsBase.dll.

The WDDeleteComments snippet loads the contents of the Word document and removes the comments from the document. To test it, create a sample document that contains comments, and save your sample document somewhere easy to find (for example, C:\Comments.docx). In a Windows application, insert the WDDeleteComments snippet and then use the following code example to call it, modifying the names to meet your needs. After you finish, open the Word document to verify that you removed all the comments.

WDDeleteComments("C:\comments.docx")

WDDeleteComments(@"C:\comments.docx");

The WDDeleteComments snippet code starts with the following code.

Public Sub WDDeleteComments(ByVal docName As String)
  Const documentRelationshipType As String = _
   "http://schemas.openxmlformats.org/officeDocument/2006/" & _
   "relationships/officeDocument"
  Const wordmlNamespace As String = _
   "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  Const commentRelationshipType As String = _
   "http://schemas.openxmlformats.org/officeDocument/2006/" & _
   "relationships/comments"

  Using wdPackage As Package = Package.Open( _
   docName, FileMode.Open, FileAccess.ReadWrite)
    Dim documentPart As PackagePart = Nothing
    Dim documentUri As Uri = Nothing

  ' Code removed here…

  End Using
End Sub

public void WDDeleteComments(string docName)
{
  const string documentRelationshipType = 
    "http://schemas.openxmlformats.org/officeDocument/2006/" + 
    "relationships/officeDocument";
  const string wordmlNamespace = 
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
  const string commentRelationshipType = 
    "http://schemas.openxmlformats.org/officeDocument/2006/" + 
    "relationships/comments";

  using (Package wdPackage = Package.Open(
    docName, FileMode.Open, FileAccess.ReadWrite))
  {
    PackagePart documentPart = null;
    Uri documentUri = null;

  // Code removed here…
  }
}

This code creates constants that it uses to refer to the various schemas and namespaces required by the procedure, and retrieves a reference to the package itself by calling the Package.Open method. The code also creates variables that it uses to refer to the document part and the document Uniform Resource Identifier (URI).

Copy the following code and replace the Code removed here… comment in the previous code.

' Get the main document part (document.xml).
For Each relationship As PackageRelationship In _
 wdPackage.GetRelationshipsByType(documentRelationshipType)
  documentUri = PackUriHelper.ResolvePartUri( _
   New Uri("/", UriKind.Relative), relationship.TargetUri)
  documentPart = wdPackage.GetPart(documentUri)
  ' There is only one document.
  Exit For
Next
' Code removed here…

//  Get the main document part (document.xml).
foreach (System.IO.Packaging.PackageRelationship relationship in 
  wdPackage.GetRelationshipsByType(documentRelationshipType))
{
  documentUri = PackUriHelper.ResolvePartUri(
    new Uri("/", UriKind.Relative), relationship.TargetUri);
  documentPart = wdPackage.GetPart(documentUri);
  //  There is only one document.
  break;
}
// Code removed here…

Given a reference to the package, the code finds the document part by calling the Package.GetRelationshipsByType method and passing in the constant that contains the document relationship type (see Figure 1). The code loops through the returned relationships and resolves the document URI relative to the root of the package. Because GetRelationshipsByType returns a collection, you must loop through the PackageRelationship objects to retrieve the one you want, even though this loop executes only one time.
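As a standalone aside (not part of the WDDeleteComments snippet), the same lookup can be wrapped in a small helper. The following C# sketch assumes a reference to WindowsBase.dll; the class name MainPartSketch and the method name GetMainDocumentPart are hypothetical names chosen for illustration. Unlike the snippet, it returns null if the relationship or the part is missing.

```csharp
using System;
using System.IO.Packaging;

public static class MainPartSketch
{
    // Relationship type that points from the package root to document.xml.
    private const string DocumentRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/" +
        "relationships/officeDocument";

    // Hypothetical helper: returns the main document part,
    // or null if the package does not contain one.
    public static PackagePart GetMainDocumentPart(Package package)
    {
        foreach (PackageRelationship relationship in
            package.GetRelationshipsByType(DocumentRelationshipType))
        {
            // Resolve the relationship target against the package root ("/").
            Uri partUri = PackUriHelper.ResolvePartUri(
                new Uri("/", UriKind.Relative), relationship.TargetUri);

            // Only the first matching relationship is considered.
            return package.PartExists(partUri)
                ? package.GetPart(partUri)
                : null;
        }
        return null;
    }
}
```

A caller might open the package with Package.Open, pass it to GetMainDocumentPart, and check the result for null before continuing.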

Copy the following code and replace the Code removed here… comment in the previous code.

' Delete the comments relationship. There can only be one of these.
For Each relationship As PackageRelationship In _
 documentPart.GetRelationshipsByType(commentRelationshipType)
  Dim commentUri As Uri = PackUriHelper.ResolvePartUri( _
   documentUri, relationship.TargetUri)
  Dim commentsPart As PackagePart = wdPackage.GetPart(commentUri)
  documentPart.DeleteRelationship(relationship.Id)
  wdPackage.DeletePart(commentUri)
  ' There is only one comments part.
  Exit For
Next
' Code removed here…

//  Delete the comments relationship. There can only be one of these.
foreach (System.IO.Packaging.PackageRelationship relationship in 
  documentPart.GetRelationshipsByType(commentRelationshipType))
{
  Uri commentUri = PackUriHelper.ResolvePartUri(
    documentUri, relationship.TargetUri);
  PackagePart commentsPart = wdPackage.GetPart(commentUri);
  documentPart.DeleteRelationship(relationship.Id);
  wdPackage.DeletePart(commentUri);
  //  There is only one comments part.
  break;
}
// Code removed here…

This code performs an important task: it finds the relationship for the comments part (see Figure 2), deletes the relationship, and then deletes the comments part. Note that as in the previous search for a particular relationship type, the code must loop through all the matching relationships, even though there is only one relationship to a comments part in a well-formed Word 2007 document.
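As a standalone aside, the same relationship query can be used to check whether a document part has a comments relationship at all, for example before and after running the snippet. This C# sketch assumes a reference to WindowsBase.dll; CommentsRelationshipSketch and HasCommentsRelationship are hypothetical names, not part of the snippet.

```csharp
using System.IO.Packaging;

public static class CommentsRelationshipSketch
{
    // Relationship type that points from document.xml to the comments part.
    private const string CommentRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/" +
        "relationships/comments";

    // Hypothetical helper: returns true if the given part has at least
    // one relationship of the comments type.
    public static bool HasCommentsRelationship(PackagePart documentPart)
    {
        foreach (PackageRelationship relationship in
            documentPart.GetRelationshipsByType(CommentRelationshipType))
        {
            // Finding any match is enough; no need to inspect it.
            return true;
        }
        return false;
    }
}
```

The helper only reads relationships, so it also works on a package opened with FileAccess.Read.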

Copy the following code and replace the Code removed here… comment in the previous code.

' Manage namespaces to perform Xml XPath queries.
Dim nt As New NameTable()
Dim nsManager As New XmlNamespaceManager(nt)
nsManager.AddNamespace("w", wordmlNamespace)

' Get the document part from the package.
' Load the XML in the part into an XmlDocument instance:
Dim xdoc As XmlDocument = New XmlDocument(nt)
xdoc.Load(documentPart.GetStream())
' Code removed here…

//  Manage namespaces to perform Xml XPath queries.
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("w", wordmlNamespace);

//  Get the document part from the package.
//  Load the XML in the part into an XmlDocument instance:
XmlDocument xdoc = new XmlDocument(nt);
xdoc.Load(documentPart.GetStream());
// Code removed here…

After it finds the document part, the code creates an XmlNamespaceManager instance loaded with the namespace that it uses to search the document, and creates an XmlDocument instance to contain the contents of the document part. The code then loads the XML content into the XmlDocument instance.
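As a standalone aside, the following C# sketch shows why the namespace manager is needed. XPath matches elements by namespace URI, so the "w" prefix registered with the manager does not have to match the prefix used in the XML itself. The class name NamespaceQuerySketch and the method name CountParagraphs are hypothetical, and the sketch works on an in-memory string rather than a document part.

```csharp
using System.Xml;

public static class NamespaceQuerySketch
{
    private const string WordmlNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: counts paragraph (w:p) elements
    // in a WordprocessingML string.
    public static int CountParagraphs(string xml)
    {
        // Share one NameTable between the manager and the document,
        // as the snippet does.
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", WordmlNamespace);

        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.LoadXml(xml);

        // "w" here is resolved through nsManager, not through the XML.
        return xdoc.SelectNodes("//w:p", nsManager).Count;
    }
}
```

Without the nsManager argument, SelectNodes cannot resolve the "w" prefix and the query fails.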

Copy the following code and replace the Code removed here… comment in the previous code.

' Retrieve a list of nodes representing the 
' comment start elements, and delete them all.
Dim nodes As XmlNodeList = _
 xdoc.SelectNodes("//w:commentRangeStart", nsManager)
For Each node As XmlNode In nodes
  node.ParentNode.RemoveChild(node)
Next

' Retrieve a list of nodes representing the 
' comment end elements, and delete them all.
nodes = xdoc.SelectNodes("//w:commentRangeEnd", nsManager)
For Each node As XmlNode In nodes
  node.ParentNode.RemoveChild(node)
Next
' Code removed here…

//  Retrieve a list of nodes representing the comment 
// start elements, and delete them all.
XmlNodeList nodes = xdoc.SelectNodes(
  "//w:commentRangeStart", nsManager);
foreach (System.Xml.XmlNode node in nodes)
{
  node.ParentNode.RemoveChild(node);
}

//  Retrieve a list of nodes representing the comment 
// end elements, and delete them all.
nodes = xdoc.SelectNodes("//w:commentRangeEnd", nsManager);
foreach (System.Xml.XmlNode node in nodes)
{
  node.ParentNode.RemoveChild(node);
}
// Code removed here…

The Word 2007 document contains a start element and an end element for each comment in the document. This code retrieves a collection of nodes corresponding to each type of element, and deletes all of the commentRangeStart and commentRangeEnd nodes.
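As a standalone aside, the effect of this step can be tried on a small in-memory fragment. The C# sketch below is a variation on the snippet, not a copy of it: it selects both element types with one XPath union and copies the matches into a list before removing them, a defensive choice made here so that removal cannot interfere with enumeration. CommentRangeSketch and StripCommentRanges are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class CommentRangeSketch
{
    private const string WordmlNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: removes w:commentRangeStart and
    // w:commentRangeEnd elements and returns the resulting XML.
    public static string StripCommentRanges(string xml)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", WordmlNamespace);

        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.LoadXml(xml);

        // Collect the matches first, then remove them in a second pass.
        List<XmlNode> matches = new List<XmlNode>();
        foreach (XmlNode node in xdoc.SelectNodes(
            "//w:commentRangeStart | //w:commentRangeEnd", nsManager))
        {
            matches.Add(node);
        }
        foreach (XmlNode node in matches)
        {
            node.ParentNode.RemoveChild(node);
        }
        return xdoc.OuterXml;
    }
}
```

The commented text itself (the w:r runs between the two markers) is left in place; only the range markers are removed.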

Copy the following code and replace the Code removed here… comment in the previous code.

' Retrieve a list of nodes representing the 
' comment reference elements, and delete them all.
nodes = xdoc.SelectNodes( _
"//w:r[.//w:rStyle[@w:val='CommentReference']]", nsManager)
For Each node As XmlNode In nodes
  node.ParentNode.RemoveChild(node)
Next
' Code removed here…

//  Retrieve a list of nodes representing the comment 
// reference elements, and delete them all.
nodes = xdoc.SelectNodes(
  "//w:r[.//w:rStyle[@w:val='CommentReference']]", nsManager);
foreach (System.Xml.XmlNode node in nodes)
{
  node.ParentNode.RemoveChild(node);
}
// Code removed here…

This code block handles the comment references: runs (w:r elements) whose run style (w:rStyle) has the value CommentReference. These runs contain the references to the comments in the comments part, and the code must remove them as well. As before, the code retrieves a collection of nodes that match the XPath expression that defines the correct reference nodes, and then deletes each of the nodes.
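As a standalone aside, the XPath expression can be tried against an in-memory fragment to see which runs it selects. In this C# sketch, CommentReferenceSketch and CountCommentReferenceRuns are hypothetical names, and the sample markup in the usage note is a simplified assumption about what Word writes.

```csharp
using System.Xml;

public static class CommentReferenceSketch
{
    private const string WordmlNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: counts the runs that the snippet's XPath
    // expression would select for deletion.
    public static int CountCommentReferenceRuns(string xml)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", WordmlNamespace);

        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.LoadXml(xml);

        // Same expression as the snippet: any run that contains an
        // rStyle element whose w:val attribute is CommentReference.
        return xdoc.SelectNodes(
            "//w:r[.//w:rStyle[@w:val='CommentReference']]",
            nsManager).Count;
    }
}
```

For a paragraph that holds one plain text run and one run styled with CommentReference, only the styled run matches, so ordinary text runs are not affected by the deletion.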

Copy the following code and replace the Code removed here… comment in the previous code. This code saves the XML content back to the document part.

xdoc.Save(documentPart.GetStream( _
 FileMode.Create, FileAccess.Write))

xdoc.Save(documentPart.GetStream(
  FileMode.Create, FileAccess.Write));
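As a standalone aside, one way to check the result after saving is to reload the document part and count any comment markup that remains. This C# sketch assumes a reference to WindowsBase.dll; VerifySketch and CountCommentMarkup are hypothetical names. It also counts w:commentReference elements directly, which is a choice made for this sketch rather than something the snippet does.

```csharp
using System.IO;
using System.IO.Packaging;
using System.Xml;

public static class VerifySketch
{
    private const string WordmlNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the number of comment-related
    // elements still present in the given document part.
    public static int CountCommentMarkup(PackagePart documentPart)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", WordmlNamespace);

        XmlDocument xdoc = new XmlDocument(nt);

        // Dispose the stream as soon as the XML has been loaded.
        using (Stream stream = documentPart.GetStream(
            FileMode.Open, FileAccess.Read))
        {
            xdoc.Load(stream);
        }

        return xdoc.SelectNodes(
            "//w:commentRangeStart | //w:commentRangeEnd | " +
            "//w:commentReference", nsManager).Count;
    }
}
```

A result of 0 for the main document part suggests that the range markers and references were removed; it does not check the comments part or its relationship.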

Read It

It is important to understand the file structure of a simple Word 2007 document, so that you can work with the comments. To do that, create a Word 2007 document, and add some comments to the document. Save the document in a convenient location, and close Word. (This how-to topic assumes that you named your document C:\Comments.docx.)

To investigate the contents of the document

  1. In Windows Explorer, rename the document Comments.docx.zip.

  2. Open the ZIP file using Windows Explorer or a ZIP-management application.

  3. View the _rels\.rels file, shown in Figure 1. This file contains information about the relationships between the parts in the document. Note the value for the document.xml part, as highlighted in the figure; this information allows you to find the specific part you need.

    Figure 1. References to top-level document parts in the .rels file

  4. View the \word\_rels\document.xml.rels file. You will find the relationship between the document and the associated comments (see Figure 2). This relationship makes it possible to find the comments part, so that the code can delete it.

    Figure 2. References to document-related parts in the document.xml.rels file

  5. View the document specified in the .rels file, \word\document.xml. Locate the commentRangeStart, commentRangeEnd, and commentReference elements (see Figure 3). The code in this article shows how to remove these items.

    Figure 3. Comment-related elements

  6. Close the tool you are using to investigate the document, and rename the file with a .docx extension.
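As an alternative to renaming the file, the parts in the package can also be listed from code. This C# sketch assumes a reference to WindowsBase.dll; PartListingSketch and ListPartUris are hypothetical names. It opens the package read-only, so it does not modify the document.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

public static class PartListingSketch
{
    // Hypothetical helper: returns the URI of each part in the package,
    // for example "/word/document.xml".
    public static List<string> ListPartUris(string docName)
    {
        List<string> uris = new List<string>();

        // Read-only access is enough for inspection.
        using (Package package = Package.Open(
            docName, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                uris.Add(part.Uri.ToString());
            }
        }
        return uris;
    }
}
```

Calling ListPartUris(@"C:\Comments.docx") and printing each entry gives a quick view of the package contents. The XML inside each part still has to be read separately, for example with an XmlDocument.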

Explore It