Building Word 2007 Documents Using Office Open XML Formats

Office Visual How To

Erika Ehrli, Brian Jones, Microsoft Corporation

Applies to: 2007 Microsoft Office System, Microsoft Office Word 2007, Office Open XML Formats

Overview

The Office Open XML Formats are based on XML and ZIP archive technologies. The new file format in Microsoft Office Word 2007 divides the file into document parts, each of which defines a part of the overall contents of the file. You can easily create, change, add, or delete data in a Word 2007 file programmatically or manually.

Code It

To illustrate how document parts, content type items, and relationship items work together, this section walks through the process of building a Word XML format document in Word 2007.

To create a Word 2007 document that contains content type and relationship items, you need to create a root folder that contains a specific folder and file structure, as shown in Figure 1.

Figure 1. Folder and file structure for a Word 2007 document


The following sections walk you through creating each folder and file and adding the required XML code to each document part.

Creating the Document Properties

First, you need to create two XML files for the document properties:

  1. Create a folder and name it root.

  2. Create a folder inside the folder root and name it docProps.

  3. Open Notepad or any other XML editor.

  4. Copy the following code into a new file and save it as app.xml inside the docProps folder.

  5. Open Notepad or any other XML editor.

  6. Copy the following code into a new file and save it as core.xml inside the docProps folder.
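
The two property files can also be written by a short script instead of by hand. The following is a minimal sketch in Python, not the original listing: the function name write_document_properties is hypothetical, and the XML is an assumed minimal form of the extended and core property parts with placeholder title and creator values.

```python
from pathlib import Path

# Assumed minimal content for the extended (application) properties part.
APP_XML = """\
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
  <Application>Microsoft Office Word</Application>
</Properties>
"""

# Assumed minimal content for the core properties part.
# The title and creator values are placeholders.
CORE_XML = """\
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Simple document</dc:title>
  <dc:creator>Author name</dc:creator>
</cp:coreProperties>
"""


def write_document_properties(root):
    """Create root/docProps and write app.xml and core.xml into it."""
    doc_props = Path(root) / "docProps"
    doc_props.mkdir(parents=True, exist_ok=True)
    (doc_props / "app.xml").write_text(APP_XML, encoding="utf-8")
    (doc_props / "core.xml").write_text(CORE_XML, encoding="utf-8")
    return doc_props
```

Calling write_document_properties("root") creates the root and docProps folders if they do not exist yet. The text of APP_XML and CORE_XML can equally be pasted into Notepad and saved manually as described in the steps above.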

Creating the Document

Next, you need to create an XML file for the document part. This is the only required part in the new Word XML format.

  1. If you have not already done so, create a folder and name it root.

  2. Create a folder inside the root folder and name it word.

  3. Open Notepad or any other XML editor.

  4. Copy the following code into a new file and save it as document.xml inside the word folder.
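
As an illustration of what the document part can contain, the following Python sketch writes a document.xml file with a single paragraph. The function name write_document_part and the "Hello World!" text are assumptions made for this example; the XML is an assumed minimal WordprocessingML body, not the original listing.

```python
from pathlib import Path

# Assumed minimal main document part: one paragraph (w:p) that holds
# one run (w:r) with one text element (w:t). The text is a placeholder.
DOCUMENT_XML = """\
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
"""


def write_document_part(root):
    """Create root/word and write document.xml into it."""
    word_dir = Path(root) / "word"
    word_dir.mkdir(parents=True, exist_ok=True)
    target = word_dir / "document.xml"
    target.write_text(DOCUMENT_XML, encoding="utf-8")
    return target
```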

Creating a Relationship

Next, you need to create a relationship to this part. This relationship is documented in the root _rels folder, which means that the relationship is off the root (or start part) of the package. To create the relationship:

  1. Create a folder inside the folder root and name it _rels.

  2. Open Notepad or any other XML editor.

  3. Copy the following code into a new file and save it as .rels inside the _rels folder.

  4. Notice that this XML creates a relationship of type officeDocument with ID rId1 to the document.xml file in the folder named word.
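
A relationship file of the kind described in step 4 can be sketched as follows. This Python example is an assumption-labeled reconstruction rather than the original listing: the function name write_root_relationships is hypothetical, while the relationship ID rId1, the officeDocument type, and the word/document.xml target follow the description above.

```python
from pathlib import Path

# One relationship from the package root to the main document part.
RELS_XML = """\
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
      Target="word/document.xml"/>
</Relationships>
"""


def write_root_relationships(root):
    """Create root/_rels and write the .rels file into it."""
    rels_dir = Path(root) / "_rels"
    rels_dir.mkdir(parents=True, exist_ok=True)
    target = rels_dir / ".rels"
    target.write_text(RELS_XML, encoding="utf-8")
    return target
```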

Defining the Content Type

Next, you need to define the content type of this file.

  1. Note that the structure of a content type definition file looks like the following code.

  2. Open Notepad or any other XML editor.

  3. Copy the above code into a new file and save it as [Content_Types].xml inside the root folder.

    Note
    This reserved file name is used by the Open Packaging Conventions to define the content types of all files in the package.
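
A content type definition for this small package might look like the XML in the following Python sketch. It is an assumed minimal form, not the original listing: it declares default content types for the .rels and .xml extensions and overrides the type of the main document part. The function name write_content_types is hypothetical.

```python
from pathlib import Path

# Default entries map file extensions to content types; the Override
# entry assigns the WordprocessingML main document type to one part.
CONTENT_TYPES_XML = """\
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels"
      ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml"
      ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
"""


def write_content_types(root):
    """Write [Content_Types].xml directly inside the root folder."""
    root_dir = Path(root)
    root_dir.mkdir(parents=True, exist_ok=True)
    target = root_dir / "[Content_Types].xml"
    target.write_text(CONTENT_TYPES_XML, encoding="utf-8")
    return target
```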

Creating the Package

Finally, you can put these files into a ZIP package to create a valid Word 2007 document:

  1. Using any ZIP utility, add all the contents of the root folder to a ZIP archive: the docProps, word, and _rels subfolders and the [Content_Types].xml file.

    Important
    Do not simply add the complete root folder to a ZIP file, or you get an internal error when you open the file in Word 2007. Add the contents of the root folder instead, so that [Content_Types].xml and the subfolders sit at the top level of the archive.

  2. Save the archive as simpledocument.docx.

Now, you can open this file in Word 2007 and see the contents of the package.
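
The packaging step can also be scripted. The following is a minimal sketch that uses the Python standard library zipfile module in place of a ZIP utility; the function name create_package is hypothetical. It stores each file under its path relative to the folder that holds the parts, so the folder itself is not included in the archive.

```python
import zipfile
from pathlib import Path


def create_package(root, docx_path):
    """Zip the contents of root (not root itself) into docx_path.

    docx_path should be outside root so that the archive does not
    try to include itself.
    """
    root = Path(root)
    docx_path = Path(docx_path)
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as package:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Entry names are relative to root and use forward slashes,
                # for example "word/document.xml".
                package.write(path, path.relative_to(root).as_posix())
    return docx_path
```

For example, create_package("root", "simpledocument.docx") produces an archive whose top-level entries are [Content_Types].xml and the docProps, word, and _rels folders.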

Read It

The file format in Word 2007 consists of a compressed ZIP file, called a package. This package holds all of the content that is contained within the document. You can extract and open the files in the package to reveal component parts that give you access to the structures that compose the file. Figure 2 shows the file structure of a sample Word 2007 document.

Figure 2. Hierarchical file structure of a typical Word 2007 document


To understand the structure of a Word 2007 document, you must understand the three major components of the new file format:

  • Part items. Each part item corresponds to one file in the unzipped package. For example, if you right-click a Microsoft Office Excel workbook and choose to extract it, you see a workbook.xml file, several sheetn.xml files, and other files. Each of those files is a document part in the package.

  • Content Type items. Content type items describe what file types are stored in a document part. For example, image/jpeg denotes a JPEG image. This information enables Microsoft Office, and third-party tools, to determine the contents of any part in the package and to process its contents accurately.

  • Relationship items. Relationship items specify how the collection of document parts comes together to form a document. Each relationship specifies the connection between a source part and a target resource. Relationships are stored within XML parts in the document package, for example, /_rels/.rels.
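
To see these three components in an existing package, you can read them straight from the ZIP archive. The following Python sketch is one way to do that with the standard library; the function name describe_package and the shape of the returned dictionary are assumptions made for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def describe_package(package):
    """Summarize the files, content types, and root relationships.

    package is a path to a .docx file or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        files = zf.namelist()
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        rels = ET.fromstring(zf.read("_rels/.rels"))
    return {
        # One entry per file stored in the ZIP archive.
        "files": files,
        # Extension -> content type, from the Default elements.
        "defaults": {d.get("Extension"): d.get("ContentType")
                     for d in types.findall(CT_NS + "Default")},
        # Part name -> content type, from the Override elements.
        "overrides": {o.get("PartName"): o.get("ContentType")
                      for o in types.findall(CT_NS + "Override")},
        # (Id, Type, Target) for each relationship off the package root.
        "relationships": [(r.get("Id"), r.get("Type"), r.get("Target"))
                          for r in rels.findall(REL_NS + "Relationship")],
    }
```

The sketch reads only the root relationship file, /_rels/.rels. Relationship files that belong to individual parts would need to be read separately.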

You can replace and add entire document parts to change the content, properties, or formatting of Word 2007 documents. For more information about the Word 2007 file format, read the article Walkthrough: Word 2007 XML Format.
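
Replacing a part can be sketched with the Python standard library as well. Because the zipfile module cannot rewrite an entry in place, this example copies the package and substitutes new data for one part along the way. The function name replace_part is hypothetical, and the sketch covers replacement only; adding a new part would also require matching content type and relationship entries.

```python
import zipfile


def replace_part(source, destination, part_name, new_data):
    """Copy a package to destination, replacing the data of one part.

    part_name is the ZIP entry name, for example "word/document.xml".
    new_data is the replacement content as bytes or str.
    """
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(destination, "w", zipfile.ZIP_DEFLATED) as dst:
        if part_name not in src.namelist():
            raise KeyError("part not found in package: " + part_name)
        for item in src.infolist():
            if item.filename == part_name:
                data = new_data
            else:
                data = src.read(item.filename)
            # Entries keep their original names and order.
            dst.writestr(item.filename, data)
```

For example, replace_part("simpledocument.docx", "updated.docx", "word/document.xml", new_xml) writes a second package that differs from the first only in its main document part, where new_xml is a variable that holds the replacement markup.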

You can also build Word 2007 documents programmatically by using the classes in the Microsoft WinFX System.IO.Packaging namespace. For example, you can create a document part by calling the Package.CreatePart method. For more information about package parts, see the PackagePart Class reference documentation in the Microsoft Windows SDK.

See It

Watch the Video

Video Length: 00:08:24

File Size: 9.55 MB

File Type: WMV file

Explore It