How to: Retrieve Property Values from a Word Processing Document

Applies to: Excel 2010 | Office 2010 | PowerPoint 2010 | Word 2010

In this article
Open the Existing Document for Read-only Access
Basic WordprocessingML Document Structure
Extended File Properties Part Element
How the Sample Code Works
Sample Code

This topic describes how to use the classes in the Open XML SDK 2.0 for Microsoft Office to programmatically retrieve a property value from a document part in a word processing document.

The following assembly directives are required to compile the code in this topic.

using System.Windows.Forms;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

Imports System.Windows.Forms
Imports System.Xml
Imports DocumentFormat.OpenXml.Packaging

Open the Existing Document for Read-only Access

To open an existing document, instantiate the WordprocessingDocument class as shown in the following using statement. In the same statement, open the word processing file at the specified file path by using the Open(String, Boolean) method. To open the file for editing, set the Boolean parameter to true. In this example you only need to read the file, so you can open it for read-only access by setting the Boolean parameter to false.

using (WordprocessingDocument wordDoc = 
       WordprocessingDocument.Open(document, false)) 
{ 
    // Insert other code here. 
}

Using wordDoc As WordprocessingDocument = _
        WordprocessingDocument.Open(document, False)
    ' Insert other code here.
End Using

The using statement provides a recommended alternative to the typical .Open, .Save, .Close sequence. It ensures that the Dispose method (internal method used by the Open XML SDK to clean up resources) is automatically called when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement, in this case wordDoc.
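To illustrate the disposal behavior described above, the following minimal sketch uses a hypothetical TraceResource class (an assumption for illustration, not part of the Open XML SDK) in place of WordprocessingDocument. It records each call so you can see that Dispose runs when control leaves the using block, even if the body throws.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for WordprocessingDocument; it only records calls.
public sealed class TraceResource : IDisposable
{
    private readonly List<string> log;

    public TraceResource(List<string> log)
    {
        this.log = log;
        log.Add("open");
    }

    public void Dispose()
    {
        log.Add("dispose");
    }
}

public static class UsingSketch
{
    // Runs a using block whose body throws, and returns the recorded calls.
    public static List<string> Run()
    {
        List<string> log = new List<string>();
        try
        {
            using (TraceResource resource = new TraceResource(log))
            {
                log.Add("body");
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }
}
```

Calling UsingSketch.Run() yields the sequence open, body, dispose, caught: the resource is disposed before the exception reaches the outer catch block.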

Basic WordprocessingML Document Structure

The basic document structure of a WordprocessingML document consists of the document and body elements, followed by one or more block-level elements such as p, which represents a paragraph. A paragraph contains one or more r elements. The r stands for run, which is a region of text with a common set of properties, such as formatting. A run contains one or more t elements. The t element contains a range of text. For example, the WordprocessingML markup for a document that contains only the text "Example text." is shown in the following code example.

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Example text.</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>

Using the Open XML SDK 2.0, you can create document structure and content by using strongly typed classes that correspond to WordprocessingML elements. You will find these classes in the DocumentFormat.OpenXml.Wordprocessing namespace. The following table lists the classes that correspond to the document, body, p, r, and t elements.

| WordprocessingML Element | Open XML SDK 2.0 Class | Description |
|---|---|---|
| document | Document | The root element for the main document part. |
| body | Body | The container for the block-level structures such as paragraphs, tables, annotations, and others specified in the ISO/IEC 29500 specification. |
| p | Paragraph | A paragraph. |
| r | Run | A run. |
| t | Text | A range of text. |
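As a minimal sketch of how these classes map to the markup, the following code builds the same document, body, p, r, t hierarchy in memory. The class name DocumentStructureSketch and the method name BuildExampleDocument are hypothetical names chosen for illustration. The resulting tree is not attached to a package or saved to a file.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocumentStructureSketch
{
    // Builds the in-memory element tree document > body > p > r > t.
    public static Document BuildExampleDocument()
    {
        return new Document(
            new Body(
                new Paragraph(
                    new Run(
                        new Text("Example text.")))));
    }
}
```

Each constructor accepts its child elements, so the nesting of the constructor calls mirrors the nesting of the XML elements. Inspecting the OuterXml property of the returned Document is one way to compare the result with the markup shown earlier.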

Extended File Properties Part Element

The following text from the ISO/IEC 29500 specification introduces this element.

An instance of this part contains properties specific to an Office Open XML document. [Example: A PresentationML document specifies the number of slides in this presentation when last saved by a producer. end example] A package shall contain at most one Extended File Properties part, and that part shall be the target of a relationship in the package-relationship item for the document.

[Example:

<Relationships xmlns="…">
   <Relationship Id="rId4"
      Type="http://…/extended-properties" Target="docProps/app.xml"/>
</Relationships>

end example]

The root element for a part of this content type shall be Properties.

[Example: Here's some content markup from a WordprocessingML document:

<Properties …>
   <Template>Normal.dotm</Template>
   <TotalTime>0</TotalTime>
   <Pages>1</Pages>
   <Words>3</Words>
   <Characters>22</Characters>
   <Application>Sample Producer</Application>
   <DocSecurity>0</DocSecurity>
   <Lines>1</Lines>
   <Paragraphs>1</Paragraphs>
   …
   <AppVersion>12.0000</AppVersion>
</Properties>

end example]

© ISO/IEC 29500:2008.

How the Sample Code Works

After you have opened the word processing file for read-only access, you can get the ExtendedFilePropertiesPart instance from the document's ExtendedFilePropertiesPart property and examine it. By using the GetStream method, you can get the part's content stream, load it into an XmlDocument, and read the Characters element, which holds the character count.

using (WordprocessingDocument wordDoc =
    WordprocessingDocument.Open(document, false))
{
    ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;

    // Load the part's XML content (the extended properties) into xmlProperties.
    xmlProperties.Load(appPart.GetStream());
}
XmlNodeList chars = xmlProperties.GetElementsByTagName("Characters");

Dim appPart As ExtendedFilePropertiesPart = _
    wordDoc.ExtendedFilePropertiesPart
xmlProperties.Load(appPart.GetStream)
Dim chars As XmlNodeList = _
    xmlProperties.GetElementsByTagName("Characters")
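GetElementsByTagName matches on the element name alone. As an alternative sketch, the following code performs a namespace-qualified lookup and returns null when the element is absent. The class name ExtendedPropertiesSketch, the method name ReadProperty, and the "ep" prefix are hypothetical names chosen for illustration. The code takes the XML as a string so that it depends only on System.Xml.

```csharp
using System.Xml;

public static class ExtendedPropertiesSketch
{
    // Namespace used by elements in the Extended File Properties part.
    private const string ExtendedPropertiesNamespace =
        "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Returns the text of the named property element, or null if it is absent.
    public static string ReadProperty(string propertiesXml, string propertyName)
    {
        XmlDocument xmlProperties = new XmlDocument();
        xmlProperties.LoadXml(propertiesXml);

        // XPath needs a prefix to address elements in a default namespace.
        XmlNamespaceManager namespaces =
            new XmlNamespaceManager(xmlProperties.NameTable);
        namespaces.AddNamespace("ep", ExtendedPropertiesNamespace);

        XmlNode node = xmlProperties.SelectSingleNode(
            "/ep:Properties/ep:" + propertyName, namespaces);
        return node == null ? null : node.InnerText;
    }
}
```

Returning null for a missing element lets the caller decide how to report the problem, rather than failing on an empty node list.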

Sample Code

The following code example shows how to retrieve the number of characters in a word processing document. To call the GetPropertyFromDocument method, you can use the code in the following example, which retrieves the number of characters in a file named "Word17.docx" and displays the result in a message box.

string document = @"C:\Users\Public\Documents\Word17.docx";
GetPropertyFromDocument(document);

Dim document As String = "C:\Users\Public\Documents\Word17.docx"
GetPropertyFromDocument(document)

Following is the complete sample code in both C# and Visual Basic.

// To retrieve the properties of a document part.
public static void GetPropertyFromDocument(string document)
{
    XmlDocument xmlProperties = new XmlDocument();

    using (WordprocessingDocument wordDoc = 
        WordprocessingDocument.Open(document, false))
    {
        ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;

        xmlProperties.Load(appPart.GetStream());
    }
    XmlNodeList chars = xmlProperties.GetElementsByTagName("Characters");

    MessageBox.Show("Number of characters in the file = " +
        chars.Item(0).InnerText, "Character Count"); 
}

' To retrieve the properties of a document part.
Public Sub GetPropertyFromDocument(ByVal document As String)
    Dim xmlProperties As XmlDocument = New XmlDocument

    ' The Using block disposes the document even if an exception occurs.
    Using wordDoc As WordprocessingDocument = _
        WordprocessingDocument.Open(document, False)
        Dim appPart As ExtendedFilePropertiesPart = _
            wordDoc.ExtendedFilePropertiesPart
        xmlProperties.Load(appPart.GetStream)
    End Using

    Dim chars As XmlNodeList = _
        xmlProperties.GetElementsByTagName("Characters")
    MessageBox.Show("Number of characters in the file = " & _
            chars.Item(0).InnerText, "Character Count")
End Sub
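The sample assumes that the document has an Extended File Properties part and that the part contains a Characters element. The following is a minimal sketch of a more defensive variant that returns the value instead of displaying it. The class name PropertyReaderSketch and the method name TryGetExtendedProperty are hypothetical names chosen for illustration and are not part of the Open XML SDK.

```csharp
using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

public static class PropertyReaderSketch
{
    // Returns the text of the first element with the given name in the
    // Extended File Properties part, or null if the part or element is missing.
    public static string TryGetExtendedProperty(string document, string propertyName)
    {
        XmlDocument xmlProperties = new XmlDocument();

        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open(document, false))
        {
            ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
            if (appPart == null)
            {
                return null; // The package has no Extended File Properties part.
            }

            // Dispose the part stream as soon as the XML has been loaded.
            using (Stream partStream = appPart.GetStream())
            {
                xmlProperties.Load(partStream);
            }
        }

        XmlNodeList matches = xmlProperties.GetElementsByTagName(propertyName);
        return matches.Count > 0 ? matches[0].InnerText : null;
    }
}
```

For example, TryGetExtendedProperty(document, "Characters") would return the same text that the sample displays, and the caller can check for null before showing a message box.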
