About the Open XML Format SDK 1.0

This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This page may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

Office Open XML (OpenXML) is an open standard for word-processing documents, presentations, and spreadsheets that can be freely implemented by multiple applications on different platforms. OpenXML is designed to faithfully represent existing word-processing documents, presentations, and spreadsheets that are encoded in binary formats defined by Microsoft® Office applications. The motivation for OpenXML is simple: billions of documents already exist, but the information in them is tightly coupled to the programs that created them. The OpenXML standard decouples documents created by Microsoft Office applications from those programs, so that other applications can manipulate the documents without depending on proprietary formats and without losing data.

Structure of an OpenXML Package

An OpenXML file is a ZIP archive, which provides both packaging and compression, so you can view the structure of any OpenXML file with a ZIP viewer. An OpenXML document is built from multiple document parts, and the relationships between those parts are themselves stored in document parts. Because the ZIP format supports random access to each part, an application can work with one part without reading the others. For example, an application can move a slide from one Microsoft Office PowerPoint® 2007 presentation to another presentation without parsing the slide content. Likewise, an application can strip all of the comments out of a word-processing document without parsing any of its contents.
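
As a rough illustration of the ZIP packaging described above, the following Python sketch lists the entries in a package using only the standard library. It does not use the Open XML API. The helper names `list_parts` and `build_demo_package` are hypothetical, and the demo package is a tiny in-memory stand-in rather than a valid Word document.

```python
import io
import zipfile


def list_parts(package):
    """Return the sorted names of all ZIP entries in an OpenXML package.

    `package` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        return sorted(zf.namelist())


def build_demo_package():
    """Build a tiny illustrative package in memory (hypothetical content,
    not a valid Word document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr("word/document.xml", "<document/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Each entry can be read on its own, without touching the others.
    for name in list_parts(build_demo_package()):
        print(name)
```

To inspect a real file, pass its path instead of the demo buffer, for example `list_parts("report.docx")`, where the file name is only a placeholder.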

The document parts in an OpenXML package are written as XML markup. Because XML is structured plain text, you can view the contents of a document part in any text reader, or you can parse the contents with standard XML tools and query them with a language such as XPath.
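
The sketch below shows one way to query a document part with XPath-style expressions, here using Python's standard `xml.etree.ElementTree` module, which supports only a limited XPath subset. The sample markup is a hand-written WordprocessingML fragment, and `paragraph_texts` is a hypothetical helper, not part of the SDK.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, mapped to the conventional "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hand-written sample standing in for the contents of a main document part.
SAMPLE_PART = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraph_texts(part_xml):
    """Return the text of each paragraph (w:p) by joining its text runs (w:t)."""
    root = ET.fromstring(part_xml)
    return [
        "".join(t.text or "" for t in p.findall(".//w:t", NS))
        for p in root.findall(".//w:p", NS)
    ]


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE_PART))
```

For the sample above, the function returns `["Hello, world", "Second paragraph"]`.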

Structurally, an OpenXML document is an Open Packaging Conventions (OPC) package. As stated previously, a package is composed of a collection of document parts. Each part has a part name, which is a path made up of a sequence of segments, such as "/word/theme/theme1.xml". The package contains a [Content_Types].xml item that allows you to determine the content type of every document part in the package. The set of explicit relationships for a source package or part is contained in a relationships part whose name ends with the .rels extension.
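
To make the roles of [Content_Types].xml and the .rels parts more concrete, here is a minimal Python sketch that resolves a part's content type and lists the package-level relationships. It assumes a simplified in-memory package built by the hypothetical helper `build_demo_package`. The function names are illustrative and are not Open XML API calls.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = {"ct": "http://schemas.openxmlformats.org/package/2006/content-types"}
REL_NS = {"r": "http://schemas.openxmlformats.org/package/2006/relationships"}

CONTENT_TYPES = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

ROOT_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""


def build_demo_package():
    """Build a simplified package in memory (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("word/document.xml", "<document/>")
    buf.seek(0)
    return buf


def content_type_of(package, part_name):
    """Look up a part's content type: an Override for the exact part name
    wins; otherwise fall back to the Default for the file extension."""
    with zipfile.ZipFile(package) as zf:
        types = ET.fromstring(zf.read("[Content_Types].xml"))
    for override in types.findall("ct:Override", CT_NS):
        if override.get("PartName").lower() == part_name.lower():
            return override.get("ContentType")
    extension = part_name.rpartition(".")[2].lower()
    for default in types.findall("ct:Default", CT_NS):
        if default.get("Extension").lower() == extension:
            return default.get("ContentType")
    return None


def package_relationships(package):
    """Return (Id, Type, Target) for each package-level relationship."""
    with zipfile.ZipFile(package) as zf:
        rels = ET.fromstring(zf.read("_rels/.rels"))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in rels.findall("r:Relationship", REL_NS)
    ]


if __name__ == "__main__":
    print(content_type_of(build_demo_package(), "/word/document.xml"))
    print(package_relationships(build_demo_package()))
```

In this sketch the package-level relationship of type officeDocument is what identifies the main part, so a consumer would follow that relationship rather than rely on a fixed path.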

Microsoft Office Word 2007 documents are defined using WordprocessingML markup. A document is composed of a collection of stories where each story is one of the following:

  • Main document (the only required story)

  • Glossary document

  • Header and footer

  • Comments

  • Text box

  • Footnote and endnote

Microsoft Office PowerPoint 2007 presentations are described by PresentationML markup. Presentation packages can contain document parts such as the following:

  • Slide master

  • Notes master

  • Handout master

  • Slide layout

  • Notes

A Microsoft Office Excel® 2007 workbook is described by SpreadsheetML markup. Workbook packages can contain:

  • Workbook part (required part)

  • One or more worksheets

  • Charts

  • Tables

  • Custom XML

The Open XML Format SDK 1.0

The Open XML Format SDK 1.0 simplifies the manipulation of OpenXML packages. The Open XML Application Programming Interface (API) encapsulates many of the tasks that you typically perform on OpenXML packages, so you can perform complex operations with just a few lines of code. Common tasks include the following:

  • Search. With a few lines of code, you can search a collection of Excel 2007 worksheets for some arbitrary data.

  • Document assembly. You can create documents by combining the document parts of existing documents programmatically. For example, you can pull slides from various PowerPoint 2007 presentations to create a single presentation.

  • Validation. With a few lines of code, you can validate the document parts in a package or validate an entire package against a schema.

  • Data update. With the Open XML object model, you can easily modify the data in multiple packages.

  • Privacy. With a few lines of code, you can remove comments and other personal information from a document before it is distributed.
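
To give a feel for the kind of work the API wraps, the following Python sketch implements the search task from the list above by hand, using only the standard library. It is not the Open XML API. It assumes that worksheets are stored under xl/worksheets/ and shared strings in xl/sharedStrings.xml, which is the layout Excel typically writes but is not required by the standard. The names `find_cells`, `shared_strings`, and `build_demo_workbook` are hypothetical, and the demo workbook contains only the parts this search needs.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"s": MAIN}


def shared_strings(zf):
    """Load the shared string table, or an empty list if the part is absent."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    table = []
    for si in root.findall("s:si", NS):
        # Plain strings use <t>; rich-text strings use <r><t> runs.
        runs = si.findall("s:t", NS) + si.findall("s:r/s:t", NS)
        table.append("".join(t.text or "" for t in runs))
    return table


def find_cells(package, needle):
    """Return (part name, cell reference) pairs whose value equals `needle`."""
    hits = []
    with zipfile.ZipFile(package) as zf:
        strings = shared_strings(zf)
        for name in sorted(zf.namelist()):
            # Assumption: worksheet parts live under xl/worksheets/.
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            sheet = ET.fromstring(zf.read(name))
            for cell in sheet.iterfind(".//s:c", NS):
                v = cell.find("s:v", NS)
                if v is None or v.text is None:
                    continue
                # t="s" means the value is an index into the shared strings.
                value = strings[int(v.text)] if cell.get("t") == "s" else v.text
                if value == needle:
                    hits.append((name, cell.get("r")))
    return hits


def build_demo_workbook():
    """Build a two-sheet demo in memory (illustrative, not a complete workbook)."""
    sst = f'<sst xmlns="{MAIN}"><si><t>Alice</t></si><si><t>Bob</t></si></sst>'
    sheet1 = (f'<worksheet xmlns="{MAIN}"><sheetData><row r="1">'
              '<c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c>'
              '</row></sheetData></worksheet>')
    sheet2 = (f'<worksheet xmlns="{MAIN}"><sheetData>'
              '<row r="1"><c r="A1" t="s"><v>1</v></c></row>'
              '<row r="2"><c r="A2" t="s"><v>0</v></c></row>'
              '</sheetData></worksheet>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/sharedStrings.xml", sst)
        zf.writestr("xl/worksheets/sheet1.xml", sheet1)
        zf.writestr("xl/worksheets/sheet2.xml", sheet2)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(find_cells(build_demo_workbook(), "Alice"))
```

A more robust version would locate worksheets by following the workbook part's relationships instead of matching on path names. Handling details like that is the sort of bookkeeping the SDK is meant to take off your hands.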

You can use the Open XML API in any language supported by the Microsoft .NET Framework®. The help topics presented in this SDK provide code samples in Microsoft Visual C#® and Microsoft Visual Basic® .NET.

Using the code samples in the help topics in this SDK as a starting point, you can take advantage of the OpenXML standards in the 2007 Microsoft Office system. The Open XML API relieves much of the tedium of working with Open Packaging Conventions documents and is well worth your time to explore.