Office Open XML Formats: Retrieving Lists of PowerPoint 2007 Slides

Summary: Learn how to retrieve lists of PowerPoint slides programmatically using code snippets for use with Visual Studio 2005.

Office Visual How To

Applies to: 2007 Microsoft Office System, Microsoft Office PowerPoint 2007, Microsoft Visual Studio 2005

Ken Getz, MCW Technologies, LLC

March 2007

Overview

Imagine that you need to retrieve a list of slide titles from one or more Microsoft Office PowerPoint 2007 presentations, perhaps to populate a database table, or to provide a report. The ability to perform this operation without requiring you to load PowerPoint 2007 and then load the presentations, one after another, can be an incredible time saver. The Office Open XML File Formats make this task possible. Working with the Office Open XML File Formats requires knowledge of the way PowerPoint stores the content, the System.IO.Packaging API, and XML programming.

Code It

To help you get started, download a set of forty code snippets for Microsoft Visual Studio 2005, each of which demonstrates a technique for working with the Open XML File Formats. The download is named 2007 Office System Sample: Open XML File Format Code Snippets for Visual Studio 2005. After you install the code snippets, create a sample Microsoft Office PowerPoint presentation to test with (for more information, see Read It). Create a Windows Application project in Visual Studio 2005, open the code editor, right-click and select Insert Snippet, and then select the PowerPoint: Get List of Slide Titles snippet from the list of available 2007 Microsoft Office snippets. If you are using Microsoft Visual Basic, inserting the snippet adds a reference to WindowsBase.dll and adds the following Imports statements:

Imports System.IO
Imports System.IO.Packaging
Imports System.Xml

If you use Microsoft Visual C#, you need to add the reference to the WindowsBase.dll assembly and the corresponding using statements yourself so that the code compiles. (Code snippets in C# cannot set references or insert using statements for you.) If the WindowsBase.dll reference does not appear on the .NET tab of the Add Reference dialog box, click the Browse tab, locate the C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0 folder, and then click WindowsBase.dll.
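
For C#, the directives below are a sketch of what corresponds to the Visual Basic Imports statements shown earlier. The System and System.Collections.Generic lines are an assumption added here, because the snippet uses Uri and List<string>; a new Visual Studio code file typically contains them already.

```csharp
using System;                      // Uri
using System.Collections.Generic;  // List<string>
using System.IO;                   // FileMode, FileAccess
using System.IO.Packaging;         // Package, PackagePart (requires the WindowsBase.dll reference)
using System.Xml;                  // XmlDocument, XmlNamespaceManager
```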

The PPTGetSlideTitles snippet delves programmatically into the various document parts and relationships between the parts to retrieve a list of slide titles. To test it out, store your sample presentation somewhere easy to find (for example, C:\Demo.pptx). In a Windows application, insert the PPTGetSlideTitles snippet, and then call it using the sample below. You see a list of slide titles in the Output window.

Dim titles As List(Of String) = PPTGetSlideTitles("C:\demo.pptx")
For Each title As String In titles
  Debug.Print(title)
Next

List<string> titles = PPTGetSlideTitles("C:\\demo.pptx");
foreach (string title in titles)
{
  System.Diagnostics.Debug.Print(title);
}

The snippet code starts with the following block:

  Public Function PPTGetSlideTitles( _
   ByVal fileName As String) As List(Of String)
    ' Return a generic list containing all 
    ' the slide titles.
    Const documentRelationshipType As String = _
     "http://schemas.openxmlformats.org/officeDocument/2006/" & _
     "relationships/officeDocument"
    Const presentationmlNamespace As String = _
     "http://schemas.openxmlformats.org/" & _
     "presentationml/2006/main"

    ' Fill this collection with a list of all 
    ' the titles of all the slides in the 
    ' requested slide deck.
    Dim titles As New List(Of String)

    ' Next block goes here.

    Return titles
  End Function

public List<string> PPTGetSlideTitles(
  string fileName)
{
  //  Return a generic list containing 
  // all the slide titles.
  const string documentRelationshipType =
    "http://schemas.openxmlformats.org/officeDocument/2006/" +
    "relationships/officeDocument";
  const string presentationmlNamespace =
    "http://schemas.openxmlformats.org/" + 
    "presentationml/2006/main";

  //  Fill this collection with a list of 
  // all the titles of all the slides in 
  // the requested slide deck.
  List<string> titles = new List<string>();

  // Next block goes here.

  return titles;
}

The code returns a generic List containing a string value for each slide in the document you specify. As with any other work with the Open XML File Formats, you want to use relationships between document parts to find the various parts you need. The code includes a constant, documentRelationshipType, that contains the fixed relationship type you need to find the document part within the PowerPoint package. The presentationmlNamespace constant contains the namespace you need when searching. The code declares a generic List to contain the results. At the end of the procedure, it returns that generic list.

Nearly every procedure that interacts with the Office Open XML File Formats needs to open a package, either for reading only or for both reading and writing. In this exercise, you only read content from the file, so you can open the package in read-only mode. The next block of code does this for you:

Dim documentPart As PackagePart = Nothing
Dim documentUri As Uri = Nothing

Using pptPackage As Package = _
 Package.Open(fileName, FileMode.Open, FileAccess.Read)

  ' Next block goes here.

End Using

PackagePart documentPart = null;
Uri documentUri = null;

using (Package pptPackage = 
  Package.Open(fileName, FileMode.Open, FileAccess.Read))
{
  // Next block goes here.
}

The code creates the pptPackage variable, using the System.IO.Packaging.Package type, and fills it by calling the Package.Open method, passing in the name of the file to open, the mode to use, and the access method. When you are finished with the package, close it. The snippet completes its work in a using block, which closes the package when it is finished.

Every 2007 Office document contains a single document part, which acts as the start part. This document part contains the document itself. In just about every situation, the goal is to find that part first. The next code block finds the document's start part, that is, the XML part representing the document content. It calls the Package.GetRelationshipsByType method, passing in the constant that contains the document relationship type (see Figure 2). The code then loops through the returned relationships and resolves the target URI relative to the root of the package. Because the method returns a collection, you must loop through the PackageRelationship objects to retrieve the one you want. A package has only one document part, so the loop body executes only once, and the code exits the loop explicitly:

For Each relationship As PackageRelationship _
 In pptPackage.GetRelationshipsByType( _
   documentRelationshipType)
  documentUri = PackUriHelper.ResolvePartUri( _
   New Uri("/", UriKind.Relative), relationship.TargetUri)
  documentPart = pptPackage.GetPart(documentUri)

  Exit For
Next

' Next block goes here.

foreach (PackageRelationship relationship
  in pptPackage.GetRelationshipsByType(
  documentRelationshipType))
{
  documentUri = PackUriHelper.ResolvePartUri(
    new Uri("/", UriKind.Relative), relationship.TargetUri);
  documentPart = pptPackage.GetPart(documentUri);

  break;
}
// Next block goes here.

To find the list of slide relationship IDs, the code starts by setting up an XmlNamespaceManager instance. The namespace manager includes a namespace abbreviated “p”, which refers to the namespace stored in the presentationmlNamespace constant, discussed above. Next, the code creates an XmlDocument instance, and loads the XML content from the document part into the new XML document. Finally, this code example calls the XmlDocument.SelectNodes method, passing in a query string to find the nodes shown in Figure 3. Note that the variable names and comments in this snippet refer to sheets in many places instead of slides. These are copy-and-paste errors from the snippet's creation; wherever you see sheet, read slide.

' Manage namespaces to perform Xml XPath queries.
Dim nt As New NameTable()
Dim nsManager As New XmlNamespaceManager(nt)
nsManager.AddNamespace("p", presentationmlNamespace)

'  Iterate through the slides and extract 
' the title string from each.
Dim xDoc As New XmlDocument(nt)
xDoc.Load(documentPart.GetStream())

Dim sheetNodes As XmlNodeList = _
 xDoc.SelectNodes("//p:sldIdLst/p:sldId", nsManager)
If sheetNodes IsNot Nothing Then

  ' Next block goes here.

End If

// Manage namespaces to perform Xml 
// XPath queries.
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("p", presentationmlNamespace);

// Iterate through the slides and 
// extract the title string from each.
XmlDocument xDoc = new XmlDocument(nt);
xDoc.Load(documentPart.GetStream());

XmlNodeList sheetNodes =
  xDoc.SelectNodes("//p:sldIdLst/p:sldId", nsManager);
if (sheetNodes != null)
{
  // Next block goes here.
}

The code next loops through each item in the node list, retrieving the r:id attribute for each item. This attribute provides the relationship ID the code needs to load the individual slides:

Dim relAttr As XmlAttribute = Nothing
Dim sheetRelationship As PackageRelationship = Nothing
Dim sheetPart As PackagePart = Nothing
Dim sheetUri As Uri = Nothing
Dim sheetDoc As XmlDocument = Nothing
Dim titleNode As XmlNode = Nothing

' Look at each sheet node, retrieving 
' the relationship id.
For Each xNode As XmlNode In sheetNodes
  relAttr = xNode.Attributes("r:id")
  If relAttr IsNot Nothing Then

    ' Next block goes here.

  End If
Next

XmlAttribute relAttr = null;
PackageRelationship sheetRelationship = null;
PackagePart sheetPart = null;
Uri sheetUri = null;
XmlDocument sheetDoc = null;
XmlNode titleNode = null;

//  Look at each sheet node, retrieving 
// the relationship id.
foreach (System.Xml.XmlNode xNode in sheetNodes)
{
  relAttr = xNode.Attributes["r:id"];
  if (relAttr != null)
  {
    // Next block goes here.
  }
}
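
The lookup by the literal name r:id depends on the document using the r prefix for the relationships namespace. As a hedged alternative that is not part of the snippet, the attribute can also be looked up by local name and namespace URI, which does not depend on the prefix. The following self-contained sketch uses a hand-written XML fragment; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class RelIdLookupDemo
{
    const string PresentationNs =
        "http://schemas.openxmlformats.org/presentationml/2006/main";
    const string RelationshipsNs =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Returns the relationship ID of every p:sldId element, in document order.
    public static List<string> GetSlideRelationshipIds(string presentationXml)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("p", PresentationNs);

        XmlDocument doc = new XmlDocument(nt);
        doc.LoadXml(presentationXml);

        List<string> ids = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//p:sldIdLst/p:sldId", nsManager))
        {
            // Look up by local name and namespace URI instead of the "r:id" string.
            XmlAttribute relAttr = node.Attributes["id", RelationshipsNs];
            if (relAttr != null)
            {
                ids.Add(relAttr.Value);
            }
        }
        return ids;
    }

    public static void Main()
    {
        // Hand-written fragment for illustration only (not copied from a real file).
        string xml =
            "<p:presentation xmlns:p='" + PresentationNs + "'" +
            " xmlns:r='" + RelationshipsNs + "'>" +
            "<p:sldIdLst>" +
            "<p:sldId id='256' r:id='rId2'/>" +
            "<p:sldId id='257' r:id='rId3'/>" +
            "</p:sldIdLst></p:presentation>";

        foreach (string id in GetSlideRelationshipIds(xml))
        {
            Console.WriteLine(id);  // prints rId2, then rId3
        }
    }
}
```

Note that the unqualified id attribute (256, 257) is a different attribute from r:id; the namespace-qualified lookup never returns it.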

For each slide relationship, the code uses the PackagePart.GetRelationship method to retrieve the relationship corresponding to the specific ID (listed in Figure 4). For each relationship, the code resolves the URI it finds in the relationships part, and retrieves a reference to the individual slide part:

' Retrieve the PackageRelationship object 
' for the sheet:
sheetRelationship = documentPart.GetRelationship(relAttr.Value)
If sheetRelationship IsNot Nothing Then
  sheetUri = PackUriHelper.ResolvePartUri( _
   documentUri, sheetRelationship.TargetUri)
  sheetPart = pptPackage.GetPart(sheetUri)
  If sheetPart IsNot Nothing Then

    ' Next block goes here.

  End If
End If

//  Retrieve the PackageRelationship 
// object for the sheet.
sheetRelationship = documentPart.GetRelationship(relAttr.Value);
if (sheetRelationship != null)
{
  sheetUri = PackUriHelper.ResolvePartUri(
    documentUri, sheetRelationship.TargetUri);
  sheetPart = pptPackage.GetPart(sheetUri);
  if (sheetPart != null)
  {
    // Next block goes here.
  }
}

Finally, the code has a reference to the slide part. It loads a new XmlDocument instance with the XML content of the slide, and then searches for the placeholder element that marks the title of the slide. If the search finds a matching node, the code adds the InnerText property of the node three levels above it (the enclosing shape element) to the list of titles. You may wonder why it reads the text from several levels above the placeholder rather than from a single text element. This becomes an issue if you use several different fonts or styles in the title, because the text is then broken up among multiple elements. By retrieving the inner text of a common ancestor node, you retrieve all the text. At this point, the code has retrieved the title of a single slide. The code repeats this routine for each slide in the presentation:

sheetDoc = New XmlDocument(nt)
sheetDoc.Load(sheetPart.GetStream())

titleNode = sheetDoc.SelectSingleNode( _
 "//p:sp//p:ph[@type='title' or @type='ctrTitle']", nsManager)
If titleNode IsNot Nothing Then
  titles.Add(titleNode.ParentNode.ParentNode. _
   ParentNode.InnerText)
End If

sheetDoc = new XmlDocument(nt);
sheetDoc.Load(sheetPart.GetStream());
titleNode = sheetDoc.SelectSingleNode(
  "//p:sp//p:ph[@type='title' or @type='ctrTitle']", nsManager);

if (titleNode != null)
{
  titles.Add(titleNode.ParentNode.
    ParentNode.ParentNode.InnerText);
}
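
To illustrate why the code climbs three levels before reading InnerText, here is a minimal, self-contained sketch. It assumes a hand-trimmed slide fragment (not copied from a real file) in which the title is split across two runs because part of it is bold; the class and method names are hypothetical.

```csharp
using System;
using System.Xml;

public static class TitleRunsDemo
{
    const string PresentationNs =
        "http://schemas.openxmlformats.org/presentationml/2006/main";
    const string DrawingNs =
        "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Returns the title text of the slide XML, or null if no title placeholder exists.
    public static string GetTitle(string slideXml)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("p", PresentationNs);

        XmlDocument doc = new XmlDocument(nt);
        doc.LoadXml(slideXml);

        XmlNode ph = doc.SelectSingleNode(
            "//p:sp//p:ph[@type='title' or @type='ctrTitle']", nsManager);
        if (ph == null)
        {
            return null;
        }

        // p:ph -> p:nvPr -> p:nvSpPr -> p:sp (the shape that also holds the text runs).
        XmlNode shape = ph.ParentNode.ParentNode.ParentNode;
        return shape.InnerText;
    }

    public static void Main()
    {
        // Illustrative fragment: "Main " is bold, "Title" is not, so there are two runs.
        string slideXml =
            "<p:sld xmlns:p='" + PresentationNs + "' xmlns:a='" + DrawingNs + "'>" +
            "<p:cSld><p:spTree><p:sp>" +
            "<p:nvSpPr><p:nvPr><p:ph type='ctrTitle'/></p:nvPr></p:nvSpPr>" +
            "<p:txBody><a:p>" +
            "<a:r><a:rPr b='1'/><a:t>Main </a:t></a:r>" +
            "<a:r><a:t>Title</a:t></a:r>" +
            "</a:p></p:txBody>" +
            "</p:sp></p:spTree></p:cSld></p:sld>";

        Console.WriteLine(GetTitle(slideXml));  // prints "Main Title"
    }
}
```

Reading the text of either a:t element alone would return only part of the title, whereas InnerText on the p:sp element concatenates both runs.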

Read It

It is important to understand the file structure of a simple PowerPoint document so that you can find the data you need. In this case, you want the title for each slide in the presentation. To do that, create a PowerPoint document with several slides in it, giving each slide a title. I named my document Demo.pptx, and it contains four slides, as shown in Figure 1.

Figure 1. The sample document contains four slides with unique titles

Main Title in Outline view

To investigate the contents of the document, follow these steps:

  1. In Windows Explorer, rename the document, adding a .zip extension (for example, Demo.pptx.zip).

  2. Open the ZIP file, using either Windows Explorer or another ZIP application.

  3. View the _rels\.rels file, shown in Figure 2. This document contains information about the relationships between the parts in the document. Note the value for the presentation.xml part, as highlighted in the figure—this information allows you to find specific parts.

    Figure 2. Use relationships between document parts to find specific parts

  4. Open ppt\presentation.xml, shown in Figure 3. The highlighted element, p:sldIdLst, contains one reference for each slide in the deck. The snippet you will investigate uses each of these slide references to locate the slide and retrieve its title.

    Figure 3. Use the r:id attribute to find each slide.

  5. Open ppt\_rels\presentation.xml.rels, as shown in Figure 4. This document contains information about the relationships between the document part and all the subsidiary parts. The code snippet uses this information to find each of the slides so that it can retrieve the title from the slide. Note, for example, that the slide whose relationship ID is rId2 refers to slides/slide1.xml.

    Figure 4. Each slide relationship appears in the presentation.xml.rels file

  6. Open ppt\slides\slide1.xml, as shown in Figure 5—this part contains the slide title. The code snippet uses XML-searching techniques to find this particular element within the XML content. The code repeats the actions for each slide in the presentation.

    Figure 5. In ppt\slides\slide1.xml, the slide title appears within the XML content for the slide.

  7. Close the tool you are using to review the presentation, and rename the file so that it has its original .pptx extension again.
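
As an alternative to renaming the file and browsing it as a ZIP archive, the same structure can be listed through System.IO.Packaging. The following is a minimal sketch, not part of the downloadable snippets; the class and method names are hypothetical, and it assumes the WindowsBase.dll reference described in Code It and a presentation at C:\Demo.pptx (or a path passed on the command line).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

public static class PackageLister
{
    // Returns one line per package-level relationship and one line per part.
    public static List<string> Describe(string fileName)
    {
        List<string> lines = new List<string>();

        using (Package pptPackage =
            Package.Open(fileName, FileMode.Open, FileAccess.Read))
        {
            // Package-level relationships: the information stored in _rels\.rels.
            foreach (PackageRelationship rel in pptPackage.GetRelationships())
            {
                lines.Add(String.Format("relationship {0} -> {1} ({2})",
                    rel.Id, rel.TargetUri, rel.RelationshipType));
            }

            // Every part in the package, with its content type.
            foreach (PackagePart part in pptPackage.GetParts())
            {
                lines.Add(String.Format("part {0} [{1}]",
                    part.Uri, part.ContentType));
            }
        }

        return lines;
    }

    public static void Main(string[] args)
    {
        // The default path is an assumption; pass a different file as the first argument.
        string fileName = args.Length > 0 ? args[0] : @"C:\Demo.pptx";

        foreach (string line in Describe(fileName))
        {
            Console.WriteLine(line);
        }
    }
}
```

For the sample document, the output should include the relationship that targets ppt/presentation.xml and one part for each slide under /ppt/slides.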

Explore It