Extending the Office (2007) Fixed-Format Export Feature

Summary: Create a COM add-in for an application in the 2007 Microsoft Office release, such as Microsoft Office Publisher 2007, that extends the Office fixed-format export feature to support new formats. The technique described requires knowledge of C++ and COM. (28 printed pages)

Carl Parker, Microsoft Corporation

May 2006

Applies to: 2007 Microsoft Office Suites, Microsoft Office Access 2007, Microsoft Office Excel 2007, Microsoft Office InfoPath 2007, Microsoft Office OneNote 2007, Microsoft Office PowerPoint 2007, Microsoft Office Publisher 2007, Microsoft Office Visio 2007, Microsoft Office Word 2007

Contents

  • Introduction to the Office (2007) Fixed-Format Export Feature

  • Initializing Add-Ins

  • IMsoDocExporter

  • Call Flow

  • GetOutputOption and SetOutputOption

  • EnableCancel

  • HrBeginStructNode

  • HrEndStructNode

  • HrCreateDoc

  • HrSetDefaultLcid

  • HrAddPageFromEmf

  • DocExComment_EPSColor

  • Extended Color Support

  • SetDocExporterSite

  • HrSetPageHeightForPagination

  • HrGetPageBreaks

  • HrAddOutlineNode

  • HrAddDocumentMetadataString

  • HrAddDocumentMetadataDate

  • HrFinalize

  • Conclusion

  • Additional Resources

Introduction to the Office (2007) Fixed-Format Export Feature

This article explains how to hook in to the fixed-format export feature available in the 2007 Microsoft Office release so that third-party software developers can add support for new file formats.

The 2007 release supports export to fixed-file formats such as the Microsoft XML Paper Specification (XPS) and the Adobe Portable Document Format (PDF). Fixed-file formats expose the content of a document in a paginated form that is both application-independent and platform-independent. To export from a Microsoft Office application that has the Ribbon user interface (UI), click the Microsoft Office Button, and then click Publish as PDF or XPS. For all other applications, on the File menu, click Publish as PDF or XPS.

Software developers can add support for additional fixed formats by writing an Office add-in that implements the IMsoDocExporter COM interface. This article describes IMsoDocExporter and its interaction with a hosting Microsoft Office application, such as Microsoft Office Publisher 2007.

Important

The fixed-format export feature is available in all the applications listed in the preceding Applies to section. However, the discussion below uses Publisher 2007 as an example application, except in those cases where an explanation is more relevant to a different application.

Initializing Add-Ins

For the user to access add-in functionality, the add-in should add a new menu item or a new toolbar button to Publisher 2007. When the user selects this menu item or button, the add-in should use the Microsoft Office Object Model to obtain a pointer to the active document. It should then call the active document's ExportAsFixedFormat method with an IUnknown interface pointer that supports the IMsoDocExporter interface through a call to the QueryInterface method. The object model parameter for the interface pointer is a VARIANT with VT_UNKNOWN type.

Note

For Microsoft Office OneNote 2007, the add-in calls the Publish method with a string parameter that is the class ID of the add-in's implementation of the IMsoDocExporter interface. OneNote 2007 then calls CoCreateInstance with the class ID to get an IUnknown interface pointer from the add-in's class factory.

After Publisher 2007 has a pointer to the IMsoDocExporter interface, it calls back the add-in through the methods exposed by IMsoDocExporter. Through these callbacks, Publisher 2007 provides the add-in with document content and other information about the document.

An excellent source of information about building COM add-ins for Microsoft Office applications is the codeproject.com article Building an Office2K COM Add-in with VC++/ATL.

IMsoDocExporter

The IMsoDocExporter interface is declared in the fixedformatext.h header file. This interface exposes the following methods.

Table 1. Methods exposed by the IMsoDocExporter interface

| Method | Description |
| --- | --- |
| HrCreateDoc | Called at the start of the fixed-format export process. |
| HrAddPageFromEmf | Called to pass the add-in an enhanced metafile (EMF) that represents a rendered view of the content to export. |
| HrAddDocumentMetadataString | Called to specify string-format metadata for the document. |
| HrAddDocumentMetadataDate | Called to specify date-format metadata for the document. |
| HrSetDefaultLcid | Called to specify the default locale ID (LCID) for the content to export. |
| HrAddOutlineNode | Called to specify user-navigable document outline information. |
| HrGetPageBreaks | Called to obtain pagination information from the add-in. |
| HrSetPageHeightForPagination | Called to specify the page height to enable the add-in to paginate the document. |
| HrFinalize | Called at the end of the fixed-format export process. Allows the add-in to perform any final processing. |
| HrBeginStructNode | Called to pass the add-in the starting structure for a document-structure node that spans multiple pages. |
| HrEndStructNode | Called to pass the add-in the ending structure for a document-structure node that spans multiple pages. |
| EnableCancel | Called to pass the add-in a pointer to an IMsoDocExCancel interface. |
| GetOutputOption | Called to retrieve fixed-format output options. |
| SetOutputOption | Called by Office to set fixed-format output options. |
| SetDocExporterSite | Called to provide the add-in with a pointer to an IMsoDocExporterSite interface for extended color support. |

In addition, IMsoDocExporter also exposes the following methods that are inherited from the IUnknown interface.

Table 2. Methods inherited from the IUnknown interface

| Method | Description |
| --- | --- |
| AddRef | Increments the reference count. |
| QueryInterface | Returns pointers to supported interfaces. The add-in's implementation of QueryInterface should support returning an IMsoDocExporter interface pointer from IID_IMsoPdfWriter. |
| Release | Decrements the reference count. |

For information about implementing the IUnknown interface methods, see IUnknown (COM).

Call Flow

The following diagram shows the sequence in which Publisher 2007 calls the methods exposed in IMsoDocExporter. Not all of the methods are used by each Microsoft Office application and not all of the methods are used for every document that is exported.

Figure 1. Calling methods from the IMsoDocExporter interface


The following sections further describe the methods exposed by the IMsoDocExporter interface. The methods are described in approximately the order in which they would be called by Publisher 2007.

GetOutputOption and SetOutputOption

Publisher 2007 calls the GetOutputOption and SetOutputOption methods to retrieve and set output options for the fixed-format export process.

void GetOutputOption(
    MSODOCEXOPTION docexoption,
    DWORD* pdwVal
);

void SetOutputOption(
    MSODOCEXOPTION docexoption,
    DWORD dwVal
);

For both GetOutputOption and SetOutputOption, the method is called once for each option to be retrieved or set. The docexoption parameter specifies the output option and the (p)dwVal parameter specifies the value for the option. The value of docexoption must be one of the values from the MSODOCEXOPTION enumeration type defined in the fixedformatext.h file.

The GetOutputOption and SetOutputOption methods are used primarily by the implementation of fixed-format export in the 2007 Office release. An add-in implementation typically has its own options dialog box and its own method of storing output options.

Microsoft Office Calls GetOutputOption Only with msodocexOptionTargetDPIColor for Fixed-Format Add-Ins

For the implementation of fixed-format export in the 2007 release, Publisher 2007 calls the GetOutputOption method to retrieve output options for display to the user in the Publish as PDF or XPS dialog box. For add-ins developed by third-party software developers, Publisher 2007 calls GetOutputOption with only the msodocexOptionTargetDPIColor value. This is the only value that an add-in needs to support. If the add-in's implementation of GetOutputOption is called with this value, it should return the target dots-per-inch (DPI) for 3-D effect rasterization.

Microsoft Office Calls SetOutputOption for Fixed-Format Add-Ins

For both the implementation of fixed-format export in the 2007 release and for add-in implementations, Publisher 2007 calls SetOutputOption at the beginning of the fixed-format export process. In the implementation in the 2007 release, the parameter values passed in specify fixed-format output options. However, if the add-in implements its own set of options, the add-in can disregard the options passed to it by Publisher 2007.
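The behavior described above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the MSODOCEXOPTION enumeration here is a stand-in with a single enumerator (the real definition is in fixedformatext.h, and the numeric value shown is a placeholder), and the ExporterOptions class, kDefaultTargetDpi, and the value 300 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins so the sketch compiles on its own. A real add-in would include
// <windows.h> and fixedformatext.h instead of declaring these.
using DWORD = std::uint32_t;
enum MSODOCEXOPTION {
    msodocexOptionTargetDPIColor = 0  // placeholder numeric value
};

// Hypothetical holder for the add-in's own export settings.
class ExporterOptions {
public:
    // Hypothetical default; a real add-in would take this from its own UI.
    static constexpr DWORD kDefaultTargetDpi = 300;

    // Same shape as IMsoDocExporter::GetOutputOption.
    void GetOutputOption(MSODOCEXOPTION docexoption, DWORD* pdwVal) const {
        assert(pdwVal != nullptr);
        if (docexoption == msodocexOptionTargetDPIColor) {
            // Target DPI for 3-D effect rasterization.
            *pdwVal = targetDpi_;
        }
        // Any other option is left untouched in this sketch.
    }

    // Same shape as IMsoDocExporter::SetOutputOption. This add-in keeps its
    // own settings, so the values passed by the host are disregarded.
    void SetOutputOption(MSODOCEXOPTION /*docexoption*/, DWORD /*dwVal*/) {}

private:
    DWORD targetDpi_ = kDefaultTargetDpi;
};
```

Ignoring SetOutputOption is only appropriate when the add-in presents its own options to the user, as the text above notes.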

EnableCancel

Publisher 2007 calls the EnableCancel method to pass the add-in a pointer to an IMsoDocExCancel interface. The add-in can use this interface to query whether a user chooses to cancel a long document-export operation.

void EnableCancel(
    IMsoDocExCancel* pdec
)

HrBeginStructNode

Publisher 2007 calls the HrBeginStructNode method to specify the start of a document-structure node for content that encompasses multiple complete pages in the document. Document-structure nodes for elements of the document that reside entirely within a page (for example, paragraphs) are embedded by Publisher 2007 in the enhanced metafile (EMF) itself using the DocExComment_BeginStructNode and DocExComment_EndStructNode structures. For more information about document-structure nodes, see the sections HrAddPageFromEmf and DocExComment_BeginStructNode in this article.

HRESULT HrBeginStructNode (
    int idNodeParent, 
    int iSortOrder, 
    const MSODOCEXSTRUCTNODE * pnode, 
    BOOL fNoEndNode
)

The idNodeParent parameter specifies the ID of the node that is the parent of the node being passed to the add-in. If this parameter is 0, the node is located under the root of the document-structure tree. Multiple sibling nodes may be located under the root. If this parameter is -1, the node is located under the currently open node, that is, under the last node specified by HrBeginStructNode that has not been closed by a call to HrEndStructNode.

The iSortOrder parameter specifies the sort order of the structure node among its siblings. No two nodes can have the same sort order. However, the set of integers that constitute the sort order need not be contiguous. A value of -1 indicates that the sibling sort order is the same order in which the nodes appear in the EMF comments.
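As a rough sketch of the sort-order rule, an add-in might order the children of one parent as shown here. SiblingNode and OrderSiblings are hypothetical names. The choice to keep arrival order whenever any sibling reports -1 is an assumption, because the text does not say how explicit and implicit orders mix.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record an add-in might keep for each child of one parent node.
struct SiblingNode {
    int idNode;      // node ID as passed by Office
    int iSortOrder;  // -1 means "use the order of appearance"
};

// Returns the siblings in reading order. The input must be in the order in
// which the nodes were received from Office.
std::vector<SiblingNode> OrderSiblings(std::vector<SiblingNode> siblings) {
    const bool anyImplicit = std::any_of(
        siblings.begin(), siblings.end(),
        [](const SiblingNode& n) { return n.iSortOrder == -1; });
    if (anyImplicit) {
        return siblings;  // assumption: fall back to arrival order
    }
    std::stable_sort(
        siblings.begin(), siblings.end(),
        [](const SiblingNode& a, const SiblingNode& b) {
            return a.iSortOrder < b.iSortOrder;
        });
    // No two siblings may share a sort order.
    assert(std::adjacent_find(
               siblings.begin(), siblings.end(),
               [](const SiblingNode& a, const SiblingNode& b) {
                   return a.iSortOrder == b.iSortOrder;
               }) == siblings.end());
    return siblings;
}
```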

The pnode parameter points to an MSODOCEXSTRUCTNODE structure, which has the following declaration:

typedef struct _MsoDocexStructNode
{
    int idNode;
    MSODOCEXSTRUCTTYPE nodetype;
    WCHAR * pwchAltText; 
    union
        {
        int iHeadingLevel; 
        int iPage;     
        WCHAR * pwchActualText;
        long cpLim;       // This element only appears in Office 2016 and later.
        };
 } MSODOCEXSTRUCTNODE;

The idNode member specifies the ID of the node being passed in the call to HrBeginStructNode. This member may not have a value of 0. A value of -1 indicates that child nodes do not use the idNodeParent parameter to specify this node as their parent. Instead, this node can be a parent only by enclosing child nodes in the EMF. Multiple nodes can have an ID of -1. If the ID is not -1, the value is unique across the document.

The embedded union at the end of the MSODOCEXSTRUCTNODE is interpreted as a heading level (iHeadingLevel) for heading structure nodes or a page number (iPage) for page structure nodes. For table structure nodes, the union is interpreted as an ordering of the table ends relative to other tables by using cpLim (available in Office 2016 and later), which can be used to determine the nesting order of tables within tables. In the context of HrBeginStructNode, the add-in can ignore the pwchActualText member of this union.

The pwchAltText member specifies alternate text for the structure node.

The fNoEndNode parameter to HrBeginStructNode specifies whether Publisher 2007 calls the HrEndStructNode method to mark the end of the structure node. If fNoEndNode is false, then Publisher 2007 calls HrEndStructNode to close off the content bounded by the node. If this parameter has a true value, then the node does not bound any content.

The fNoEndNode parameter affects the interpretation of the parent ID value of subsequent nodes. If fNoEndNode is false, nodes inserted between this call to HrBeginStructNode and the subsequent call to HrEndStructNode, and that have a parent ID of -1, are children of this node. However, if fNoEndNode is true, then nodes inserted after this call to HrBeginStructNode, and that have a parent ID of -1, are not children of this node but are children of the next-most-recently specified node that has fNoEndNode equal to false.

Document structure nodes can be nested to arbitrary depth.

The nodes specified by HrBeginStructNode and those specified by DocExComment_BeginStructNode share the same ID space and exist in the same document structure tree. HrBeginStructNode and DocExComment_BeginStructNode are two alternative ways of adding nodes to this tree. For example, if the most recently opened node was opened by HrBeginStructNode and the next node encountered is from a DocExComment_BeginStructNode EMF comment record with idNodeParent equal to -1, the node from HrBeginStructNode is the parent of the node from the DocExComment_BeginStructNode record.
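The parent-resolution rules above can be sketched with a small tree builder. TreeNode and StructTreeBuilder are hypothetical helpers, not part of fixedformatext.h. The sketch assumes that a parent ID of -1 with no open node, or an explicit parent ID that has not been seen yet, attaches the node to the root. Because both sources of nodes share one tree, the same Begin and End calls could be driven from HrBeginStructNode and from DocExComment_BeginStructNode records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for one node of the structure tree.
struct TreeNode {
    int idNode;  // ID supplied by Office (-1 for anonymous nodes)
    int parent;  // index of the parent in nodes(), or -1 for the root
};

class StructTreeBuilder {
public:
    // Records a node and returns its index in nodes().
    int Begin(int idNodeParent, int idNode, bool fNoEndNode) {
        assert(idNode != 0);  // 0 is reserved for the root
        int parent = -1;      // root unless resolved below
        if (idNodeParent == -1) {
            // Child of the currently open node, if there is one.
            if (!open_.empty()) parent = open_.back();
        } else if (idNodeParent != 0) {
            parent = FindById(idNodeParent);
        }
        nodes_.push_back({idNode, parent});
        const int index = static_cast<int>(nodes_.size()) - 1;
        // A node without an end marker bounds no content, so it never
        // becomes the "currently open" node.
        if (!fNoEndNode) open_.push_back(index);
        return index;
    }

    // Closes the most recently opened node.
    void End() {
        assert(!open_.empty());
        open_.pop_back();
    }

    const std::vector<TreeNode>& nodes() const { return nodes_; }

private:
    int FindById(int id) const {
        for (std::size_t i = 0; i < nodes_.size(); ++i) {
            if (nodes_[i].idNode == id) return static_cast<int>(i);
        }
        return -1;  // assumption: unknown parent IDs attach to the root
    }

    std::vector<TreeNode> nodes_;
    std::vector<int> open_;  // indices of nodes awaiting their end marker
};
```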

HrEndStructNode

Publisher 2007 calls the HrEndStructNode method to specify the end of a document-structure node for content that encompasses multiple pages in the document. The structure node ended by the HrEndStructNode was begun previously by a call to the HrBeginStructNode method. For more information, see HrBeginStructNode in this article.

HRESULT HrEndStructNode()

HrCreateDoc

Publisher 2007 calls the HrCreateDoc method to specify the creation of a new, empty fixed-format document.

HRESULT HrCreateDoc(
    const WCHAR * wzDocExFile
);

Publisher 2007 makes this call at the beginning of the fixed-format export process. The wzDocExFile parameter specifies the name of the output file to which to write the fixed-format document.

For an add-in implementation, Publisher 2007 calls HrCreateDoc with the file name that the add-in provided in the call to the ExportAsFixedFormat method in the Microsoft Office object model. However, because add-ins typically provide configuration UI to allow the user to specify an output file name, the add-in could disregard this file name during the export process.

For Microsoft Office applications that require the add-in to paginate the document, HrCreateDoc is called twice, once at the start of the pagination-calling sequence, and again after the add-in has paginated the document. For more information, see the descriptions for the HrSetPageHeightForPagination method and the HrGetPageBreaks method.

HrSetDefaultLcid

Publisher 2007 calls the HrSetDefaultLcid method to specify the default locale ID (LCID) for the content to be exported.

HRESULT HrSetDefaultLcid(
    DWORD lcid
);

For a list of valid LCIDs, see List of Locale ID (LCID) Values as Assigned by Microsoft.

HrAddPageFromEmf

Publisher 2007 calls the HrAddPageFromEmf method to pass the add-in a handle to an in-memory EMF that represents the content in the document to export.

HRESULT HrAddPageFromEmf(
    HENHMETAFILE hemf
);

The EMF passed by Microsoft Office to the add-in is the primary source of the content that the add-in exports as a fixed-format file. Microsoft Office calls HrAddPageFromEmf once for each page of content in the application's source document.

For more information about enhanced metafiles, see the Microsoft Knowledge Base article How to Create & Play Enhanced Metafiles in Win32 and Metafiles in the Microsoft .NET Framework Developer's Guide.

EMF Comments Convey Semantic Information

An EMF is a sequence of drawing commands (GDI and GDI+ commands) that specify how to render the visual elements of the document. The EMF does not contain any information beyond these commands (for example, "draw an image here," or "draw a line over there"). In particular, conventional EMFs do not support semantic aspects of the document, such as hyperlinks, locale information, and accessibility information. To preserve semantic information in the exported document, Publisher 2007 injects special records in the EMF. These records contain the semantic information.

The records that represent the semantic information are implemented as specially formatted EMF comments. The EMF format allows for comment record types that are ignored by the rendering engine for Graphics Device Interface (GDI), but can contain arbitrary information.

As an example, consider a document that contains alternate text. (Alternate text is used by document readers to describe images for users with sight impairments.) Publisher 2007 injects EMF comments before and after rendering the image, and these EMF comments specify the alternate text for the image. The add-in interprets the comments and writes the information to the fixed-format export file.

The following table shows the semantic record types supported by the Microsoft Office fixed-format export feature. These types are enumerated by the MSODOCEXCOMMENT enumeration. Each type corresponds to a structure type that describes the format for the record.

Table 3. Semantic record types supported by fixed-format export

| Comment Value | Structure Type |
| --- | --- |
| msodocexcommentExternalHyperlink | DocExComment_ExternalHyperlink |
| msodocexcommentInternalHyperlink | DocExComment_InternalHyperlink |
| msodocexcommentColorInfo | DocExComment_ColorInfo |
| msodocexcommentColorMapEnable | DocExComment_ColorEnable |
| msodocexcommentBeginTextRun | DocExComment_BeginTextRun |
| msodocexcommentEndTextRun | DocExComment_EndTextRun |
| msodocexcommentBeginStructNode | DocExComment_BeginStructNode |
| msodocexcommentEndStructNode | DocExComment_EndStructNode |
| msodocexcommentExternalHyperlinkRctfv | DocExComment_ExternalHyperlink |
| msodocexcommentInternalHyperlinkRctfv | DocExComment_InternalHyperlink |
| msodocexcommentUnicodeForNextTextOut | DocExComment_UnicodeForNextTextOut |
| msodocexcommentEPSColor | DocExComment_EPSColor |
| msodocexcommentEPSCMYKJPEG | DocExComment_EPSColorCMYKJPEG |
| msodocexcommentEPSSpotImage | DocExComment_EPSColorSpotImage |
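Every structure listed in the table begins with the same two DWORD members, ident and iComment, so an add-in can inspect those before deciding how to interpret the rest of a comment. The following is a minimal sketch of that check. It assumes that the caller has already extracted the comment's data bytes from the EMF record and passes in the msodocexsignature constant from fixedformatext.h. DocExHeader and ReadDocExHeader are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Result of inspecting the payload of one EMF comment record.
struct DocExHeader {
    bool isDocEx;            // true if the payload starts with the signature
    std::uint32_t iComment;  // MSODOCEXCOMMENT value; valid only if isDocEx
};

// Reads a little-endian 32-bit value (EMF data is little-endian).
static std::uint32_t ReadLE32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// `data` and `size` describe the comment payload. `signature` is the value
// of msodocexsignature, supplied by the caller.
DocExHeader ReadDocExHeader(const unsigned char* data, std::size_t size,
                            std::uint32_t signature) {
    assert(data != nullptr || size == 0);
    DocExHeader header = {false, 0};
    if (size < 8) return header;                      // too short for ident + iComment
    if (ReadLE32(data) != signature) return header;   // some other kind of comment
    header.isDocEx = true;
    header.iComment = ReadLE32(data + 4);
    return header;
}
```

Comments that do not carry the signature are ordinary EMF comments and can be skipped.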

DocExComment_ExternalHyperlink(Rctfv)

The DocExComment_ExternalHyperlink(Rctfv) structure describes a hyperlink that links to outside of the document, for example to a Web site on the Internet.

struct DocExComment_ExternalHyperlink
{
    DWORD ident;
    DWORD iComment;
    union
        {
        RECT  rcdvRegion;
        struct 
            {
            float xLeft;
            float yTop;
            float dxWidth;
            float dyHeight;
            } rctfvRegion;
        };
    WCHAR wzLink[MAX_PATH];
};

The members of the DocExComment_ExternalHyperlink(Rctfv) structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentExternalHyperlink or msodocexcommentExternalHyperlinkRctfv.

  • rcdvRegion and rctfvRegion   A union that specifies the region of the page that is the source location of the hyperlink. The region can be represented as a RECT type (rcdvRegion) that uses device pixels as the unit of measure, or as a structure that contains floating-point coordinates (rctfvRegion), in which case the unit of measure is points.

    If the iComment member is equal to msodocexcommentExternalHyperlink, the add-in should use rcdvRegion. In this case, the add-in needs to apply the current EMF transformation matrix to rcdvRegion to convert it to the page space.

    If the iComment member is equal to msodocexcommentExternalHyperlinkRctfv, the add-in should use rctfvRegion. In this case, rctfvRegion is already in the page space, so no transformation is needed.

  • wzLink[MAX_PATH]   Specifies the destination URL for this hyperlink.
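The choice between the two region members can be sketched as follows. Rect, Xform, and RectF are stand-ins that mirror the Win32 RECT and XFORM types and the anonymous rctfvRegion structure so that the example compiles on its own. RegionToPageSpace is a hypothetical helper, and the sketch assumes that the caller supplies the current EMF transformation as an XFORM-style matrix that maps device units to page space.

```cpp
#include <algorithm>
#include <cassert>

// Stand-ins mirroring Win32 RECT and XFORM, and the rctfvRegion member.
struct Rect  { long left, top, right, bottom; };
struct Xform { float eM11, eM12, eM21, eM22, eDx, eDy; };
struct RectF { float xLeft, yTop, dxWidth, dyHeight; };

// Returns the hyperlink source region in page space.
// isRctfv is true when iComment is one of the ...Rctfv comment values.
RectF RegionToPageSpace(bool isRctfv, const Rect& rcdvRegion,
                        const RectF& rctfvRegion, const Xform& xf) {
    if (isRctfv) {
        return rctfvRegion;  // already in page space, no transformation needed
    }
    // Transform all four corners and take their bounding box, so the result
    // stays a rectangle even if the matrix rotates or flips.
    const float xs[2] = {static_cast<float>(rcdvRegion.left),
                         static_cast<float>(rcdvRegion.right)};
    const float ys[2] = {static_cast<float>(rcdvRegion.top),
                         static_cast<float>(rcdvRegion.bottom)};
    float minX = 0, maxX = 0, minY = 0, maxY = 0;
    bool first = true;
    for (float x : xs) {
        for (float y : ys) {
            const float px = x * xf.eM11 + y * xf.eM21 + xf.eDx;
            const float py = x * xf.eM12 + y * xf.eM22 + xf.eDy;
            if (first) {
                minX = maxX = px;
                minY = maxY = py;
                first = false;
            } else {
                minX = std::min(minX, px);
                maxX = std::max(maxX, px);
                minY = std::min(minY, py);
                maxY = std::max(maxY, py);
            }
        }
    }
    const RectF out = {minX, minY, maxX - minX, maxY - minY};
    assert(out.dxWidth >= 0.0f && out.dyHeight >= 0.0f);
    return out;
}
```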

DocExComment_InternalHyperlink(Rctfv)

The DocExComment_InternalHyperlink(Rctfv) structure describes a hyperlink that links to a location within the document. Note that, although Publisher 2007 passes a separate EMF for each page of the document, the destination of the hyperlink specified by DocExComment_InternalHyperlink(Rctfv) could be on a different page than the source location.

struct DocExComment_InternalHyperlink
{
    DWORD ident;
    DWORD iComment;
    union
        {
        RECT  rcdvRegion;
        struct 
            {
            float xLeft;
            float yTop;
            float dxWidth;
            float dyHeight;
            } rctfvRegion;
        };
    DWORD iTargetPage;
    float xtfvTarget;
    float ytfvTarget;
    float dytfTargetPage;
};

The members of the DocExComment_InternalHyperlink(Rctfv) structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentInternalHyperlink or msodocexcommentInternalHyperlinkRctfv.

  • rcdvRegion and rctfvRegion   As with the DocExComment_ExternalHyperlink structure, this member is a union that specifies the region of the page that is the source location of the hyperlink. The region can be represented as a RECT type (rcdvRegion) that uses device pixels as the unit of measure, or as a structure that contains floating-point coordinates (rctfvRegion), in which case the unit of measure is points.

    If the iComment member is equal to msodocexcommentInternalHyperlink, the add-in should use rcdvRegion. In this case, the add-in needs to apply the current EMF transformation matrix to rcdvRegion to convert it to the page space.

    If the iComment member is equal to msodocexcommentInternalHyperlinkRctfv, the add-in should use rctfvRegion. In this case, rctfvRegion is already in the page space, so no transformation is needed.

  • iTargetPage   Specifies the page number of the destination page within the document.

  • xtfvTarget   Specifies the x-coordinate of the target location on the destination page. The unit of measure for this value is points.

  • ytfvTarget   Specifies the y-coordinate of the target location on the destination page. The unit of measure for this value is points.

  • dytfTargetPage   The height of the destination page in points. The offset specified by the ytfvTarget member is relative to the upper-left corner of the page. However, some fixed-format types use a coordinate system that is relative to the bottom-left corner of the page. For these types of documents, the page height is required to convert the offset.
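For output formats whose origin is the bottom-left corner of the page, the conversion described for dytfTargetPage amounts to subtracting the top-relative offset from the page height. A minimal sketch, with BottomLeftPoint and ToBottomLeftOrigin as hypothetical names:

```cpp
#include <cassert>

// A location in points, measured from the bottom-left corner of the page.
struct BottomLeftPoint {
    float x;
    float y;
};

// Converts the hyperlink target from the top-left origin used by the
// DocExComment_InternalHyperlink structure to a bottom-left origin.
BottomLeftPoint ToBottomLeftOrigin(float xtfvTarget, float ytfvTarget,
                                   float dytfTargetPage) {
    assert(dytfTargetPage >= 0.0f);
    // The x-coordinate is unchanged; only the vertical axis is flipped.
    return {xtfvTarget, dytfTargetPage - ytfvTarget};
}
```

For example, a target 100 points below the top of a page that is 792 points high lies 692 points above the bottom edge.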

DocExComment_ColorInfo

The DocExComment_ColorInfo structure specifies color-state information for the EMF. For more information about this structure, see the section Extended Color Support.

struct DocExComment_ColorInfo
{
    DWORD ident;
    DWORD iComment;
    COLORREF clr;
    BOOL fForeColor;
};

The members of the DocExComment_ColorInfo structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentColorInfo.

  • clr   Specifies a color ID that represents a current color state in the EMF.

  • fForeColor   Specifies whether the color ID in the clr member represents a foreground color or a background color. If this member has a value of true, the color ID represents a foreground color. If this member has a value of false, the color ID represents a background color.

DocExComment_ColorEnable

The DocExComment_ColorEnable structure specifies whether color mapping is enabled for subsequent content in the EMF. For more information about this structure, see the section Extended Color Support.

struct DocExComment_ColorEnable
{
    DWORD ident;
    DWORD iComment;
    BOOL  fEnable;
};

The members of the DocExComment_ColorEnable structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentColorMapEnable.

  • fEnable   Specifies whether color mapping is enabled for subsequent content. A value of true indicates that color mapping is enabled. A value of false indicates that color mapping is disabled.

DocExComment_BeginStructNode

The DocExComment_BeginStructNode structure marks the start of a document structure node. Structure nodes serve one of two possible purposes:

  • Structure nodes can identify the type of content they contain and specify the hierarchical relationship between that content and other content in the document.

  • Structure nodes can specify alternate text for elements in the document.

If the fContentNode member has a true value, the DocExComment_BeginStructNode is followed later in the document by a DocExComment_EndStructNode. The DocExComment_EndStructNode marks the end of the content that is wrapped by the information in the DocExComment_BeginStructNode.

The collection of structure nodes within the document forms a tree; each node has a parent node and may also have sibling nodes. The idNodeParent and iSortOrder members describe the structure of this tree. Note that a child node may or may not appear between the DocExComment_BeginStructNode and DocExComment_EndStructNode structures of the parent node in the EMF.

struct DocExComment_BeginStructNode
{
    DWORD ident;
    DWORD iComment;
    int idNodeParent;
    int iSortOrder;
    MSODOCEXSTRUCTNODE  desn;
    BOOL fContentNode;
    int cwchAltText;
};

The members of the DocExComment_BeginStructNode structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentBeginStructNode.

  • idNodeParent   Specifies the ID of the parent node. A value of 0 specifies the root node. A value of -1 specifies the currently open structure node, that is, the enclosing structure node.

  • iSortOrder   Specifies the sort order of the structure node among its sibling nodes. The sort order enables the add-in to order the content correctly in the exported document.

    No two nodes can have the same sort order. However, the set of integers that constitute the sort order do not need to be contiguous.

    A value of -1 indicates that the sibling order is the same order in which the nodes appear in the EMF comments. Note that the order in which the content appears in the EMF is not necessarily the order in which the content is consumed by a user of the document.

  • desn   Specifies an MSODOCEXSTRUCTNODE structure, which has the following declaration:

    typedef struct _MsoDocexStructNode
    {
        int idNode;
        MSODOCEXSTRUCTTYPE nodetype;
        WCHAR * pwchAltText; 
        union
            {
            int iHeadingLevel; 
            int iPage;     
            WCHAR * pwchActualText;
            long cpLim;        // This element only appears in Office 2016 and later.
            };
     } MSODOCEXSTRUCTNODE;
    

The idNode member specifies the ID of the node. This member may not have a value of 0. A value of -1 indicates that child nodes do not use the idNodeParent member to specify this node as their parent. Instead, this node can be a parent only by enclosing child nodes in the EMF. Multiple nodes can have an ID of -1. If the ID is not -1, the value is unique across the document.

The nodetype specifies the type of structure node. This member is equal to one of the values from the MSODOCEXSTRUCTTYPE enumeration type. The following table lists examples of document structure node types.

Table 4. Document structure node types

| Type Value | Description |
| --- | --- |
| msodocexStructTypeArticle | A group of nodes forming a single flow of text that should be read or searched as a contiguous block of content. Some documents have a single article and others have multiple articles. |
| msodocexStructTypePara | A block of text within an article. Its parent node must be an article. |
| msodocexStructTypePage | A page in the document. |
| msodocexStructTypeFigure | A graphical element (for example, an image or collection of shapes) that has a textual representation. The textual representation is the alternate text used for reading or searching the document. |
| msodocexStructTypeHeading | A heading in the text. |
| msodocexStructTypeTable | A block of text forming a table. |
| msodocexStructTypeTR | A block of text forming a single row of a table. |
| msodocexStructTypeTD | A block of text forming a single cell in a table row. |

The add-in should ignore the pwchAltText member.

The embedded union at the end of the MSODOCEXSTRUCTNODE is interpreted as a heading level (iHeadingLevel) for heading structure nodes or a page number (iPage) for page structure nodes. For table structure nodes, the union is interpreted as an ordering of the table ends relative to other tables, which can be used to determine nesting order of tables within tables. In the context of the DocExComment_BeginStructNode, the add-in can ignore the pwchActualText member of this union.

  • fContentNode   Specifies whether a DocExComment_EndStructNode structure marks the end of this structure node. If fContentNode is true, a DocExComment_EndStructNode structure closes off the content bounded by the node. If fContentNode is false, the node does not bound any content.

    The fContentNode member affects the interpretation of the parent ID value of subsequent nodes. If fContentNode is true, nodes that are inserted between this DocExComment_BeginStructNode and a subsequent DocExComment_EndStructNode, and that have a parent ID of -1, are children of this node. However, if fContentNode is false, nodes inserted after this DocExComment_BeginStructNode, and that have a parent ID of -1, are not children of this node. They are children of the next-most-recently specified node that has fContentNode equal to true.

You can nest document structure nodes to arbitrary depth.

  • cwchAltText   Specifies the number of Unicode characters in the block of alternate text that follows the structure. This Unicode string specifies alternate text for the node (for example, alternate text for an image).
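Because the alternate text follows the fixed-size structure in the comment data, an add-in has to read it from the bytes that come after the structure. The following sketch shows one way to do that. ReadTrailingText is a hypothetical helper. The byte offset at which the text starts is passed in by the caller (assumed here to be the size of DocExComment_BeginStructNode as compiled from fixedformatext.h), because the exact layout depends on the real header and its packing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies `cwch` UTF-16 code units that start `offset` bytes into `data`.
// Returns an empty string if there is no text or the buffer is too short.
std::u16string ReadTrailingText(const unsigned char* data, std::size_t size,
                                std::size_t offset, int cwch) {
    assert(data != nullptr || size == 0);
    std::u16string text;
    if (cwch <= 0 || offset > size) return text;
    const std::size_t count = static_cast<std::size_t>(cwch);
    if ((size - offset) / 2 < count) return text;  // truncated comment
    text.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned char* p = data + offset + 2 * i;
        // EMF data is little-endian: low byte first.
        text.push_back(static_cast<char16_t>(p[0] | (p[1] << 8)));
    }
    return text;
}
```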

DocExComment_EndStructNode

The DocExComment_EndStructNode structure marks the end of the content that is decorated by the information in the DocExComment_BeginStructNode.

struct DocExComment_EndStructNode
{
    DWORD ident;
    DWORD iComment;
};    

The members of the DocExComment_EndStructNode structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEndStructNode.

DocExComment_BeginTextRun

The DocExComment_BeginTextRun structure identifies the language of a sequence of text in the document and provides the Unicode code points for the text.

Although some text-rendering EMF records use Unicode as the text representation, others use the glyphs that are drawn on the screen, rather than the original source text. A glyph is the index of a given shape in the font, which can be different from font to font.

There can be cases where several Unicode code points are combined into a single glyph or where a single Unicode code point is broken into multiple glyphs. Because the mapping from code points to glyphs is context-dependent, a user cannot text search or copy/paste in a document that contains only glyphs. Therefore, Publisher 2007 sometimes provides the Unicode text as well as the glyphs.

struct DocExComment_BeginTextRun
{
    DWORD ident;
    DWORD iComment;
    DWORD lcid;
    int cGlyphIndex;
    int cwchActualText;
};

The members of the DocExComment_BeginTextRun structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentBeginTextRun.

  • lcid   Specifies the LCID for the text sequence.

  • cGlyphIndex   Specifies the size of an array that follows this structure. This array implements a glyph index table that maps Unicode code points in the actual text to the corresponding glyphs in the EMF. Each element of the array corresponds to a code point in the text. The value of that element specifies the first glyph used to render that code point in the EMF. Two or more adjacent code points may have the same value in the array, which means that they both resolve to the same glyph. The value can also be 0, which means that this code point does not map to any glyph.

  • cwchActualText   Specifies the size of the sequence of Unicode code points that follow the glyph index table. This is the text that a consumer of the document can use for searching, copying/pasting, and accessibility. The value of this member can be 0, which means that no Unicode text is provided.
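As an illustration of how the glyph index table can be read, the following sketch groups adjacent code points that share the same first-glyph value, which is the case described above where several code points resolve to one glyph. It assumes the table has already been copied into a vector of int values; the element type of the array, the CodePointCluster structure, and the GroupCodePointsByGlyph function are assumptions made for this example. The sketch treats the values as opaque keys and does not give the value 0 any special handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: a run of adjacent code points in the actual
// text that all map to the same first glyph.
struct CodePointCluster
{
    std::size_t firstCodePoint;  // index of the first code point in the run
    std::size_t codePointCount;  // number of code points in the run
    int firstGlyph;              // value stored in the glyph index table
};

// Walks the glyph index table (one element per code point) and merges
// adjacent elements that have the same value into one cluster.
std::vector<CodePointCluster> GroupCodePointsByGlyph(
    const std::vector<int>& glyphIndexTable)
{
    std::vector<CodePointCluster> clusters;
    for (std::size_t i = 0; i < glyphIndexTable.size(); ++i)
    {
        if (!clusters.empty() &&
            clusters.back().firstGlyph == glyphIndexTable[i])
        {
            ++clusters.back().codePointCount;  // same glyph as previous code point
        }
        else
        {
            clusters.push_back({ i, 1, glyphIndexTable[i] });
        }
    }
    return clusters;
}
```

An exporter could use clusters like these to attach the source text to the drawn glyphs so that search and copy operations select whole clusters.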

DocExComment_EndTextRun

The DocExComment_EndTextRun structure marks the end of a text sequence, the beginning of which was marked by a DocExComment_BeginTextRun structure.

struct DocExComment_EndTextRun
{
    DWORD ident;
    DWORD iComment;
};    

The members of the DocExComment_EndTextRun structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEndTextRun.

DocExComment_UnicodeForNextTextOut

The DocExComment_UnicodeForNextTextOut structure functions similarly to the DocExComment_BeginTextRun and DocExComment_EndTextRun structures. However, DocExComment_UnicodeForNextTextOut specifies Unicode code points for only the following EMF TextOut record, rather than for a block of EMF content bounded by begin and end structures.

struct DocExComment_UnicodeForNextTextOut
{
    DWORD ident;
    DWORD iComment;
    int cGlyphIndex;
    int cwchActualText;
};

The members of the DocExComment_UnicodeForNextTextOut structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentUnicodeForNextTextOut.

  • cGlyphIndex   Specifies the size of an array that follows this structure. This array implements a glyph index table that maps Unicode code points in the actual text to the corresponding glyphs in the EMF. Each element of the array corresponds to a code point in the text. The value of that element specifies the first glyph used to render that code point in the EMF. Two or more adjacent code points may have the same value in the array, which means that they both resolve to the same glyph.

  • cwchActualText   Specifies the size of the sequence of Unicode code points that follow the glyph index table. This is the text that a consumer of the document can use for searching, copying/pasting, and accessibility.

DocExComment_EPSColor

The DocExComment_EPSColor structure specifies color information for an encapsulated PostScript (EPS) file embedded in the EMF. For more information about this structure, see the section Extended Color Support.

typedef struct
{
    DWORD ident;
    DWORD iComment;
    BYTE colorInfo[];
} DocExComment_EPSColor;

The members of the DocExComment_EPSColor structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEPSColor.

  • colorInfo[]   Specifies the color information for the EPS file. The add-in should pass this information to Publisher 2007 using the IMsoDocExporterSite::SetEPSInfo method.

DocExComment_EPSColorCMYKJPEG

The DocExComment_EPSColorCMYKJPEG structure specifies the start, in the EMF, of a binary object that is a CMYKJPEG file stream. For more information about this structure, see the section Extended Color Support.

typedef struct
{
    DWORD ident;
    DWORD iComment;
} DocExComment_EPSColorCMYKJPEG;

The members of the DocExComment_EPSColorCMYKJPEG structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEPSCMYKJPEG.

DocExComment_EPSColorSpotImage

The DocExComment_EPSColorSpotImage structure provides spot color information for the subsequent RGB image. For more information about this structure, see the section Extended Color Support.

typedef struct
{
    DWORD ident;
    DWORD iComment;
    COLORREF cmykAlt;
    COLORREF rgbAlt;
    float flTintMin;
    float flTintMax;
    char szSpotName[1];
} DocExComment_EPSColorSpotImage;

The members of the DocExComment_EPSColorSpotImage structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEPSSpotImage.

  • cmykAlt   Specifies a CMYK color ID.

  • rgbAlt   Specifies an RGB color ID.

  • flTintMin   Specifies the minimum tint.

  • flTintMax   Specifies the maximum tint.

  • szSpotName[1]   Specifies a variable length, zero-terminated string that contains the spot name.
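Because szSpotName is declared with a single element but holds a variable-length string, the add-in has to read past the end of the declared structure. The following sketch shows one cautious way to do that, assuming the add-in knows the total size in bytes of the comment record. The DWORD and COLORREF aliases are stand-ins for the Windows types, and TryReadSpotName is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using DWORD = std::uint32_t;     // stand-in for the Windows DWORD type
using COLORREF = std::uint32_t;  // stand-in for the Windows COLORREF type

struct DocExComment_EPSColorSpotImage
{
    DWORD ident;
    DWORD iComment;
    COLORREF cmykAlt;
    COLORREF rgbAlt;
    float flTintMin;
    float flTintMax;
    char szSpotName[1];
};

// Hypothetical helper: reads the zero-terminated spot name that starts at
// szSpotName. Returns false if the record is truncated or if no terminator
// is found inside the record.
bool TryReadSpotName(const unsigned char* record, std::size_t size,
                     std::string* name)
{
    const std::size_t nameOffset =
        offsetof(DocExComment_EPSColorSpotImage, szSpotName);
    if (record == nullptr || name == nullptr || size <= nameOffset)
        return false;
    // Search for the terminator only within the bytes that belong to the record.
    const void* terminator =
        std::memchr(record + nameOffset, 0, size - nameOffset);
    if (terminator == nullptr)
        return false;
    const char* first = reinterpret_cast<const char*>(record + nameOffset);
    name->assign(first, static_cast<const char*>(terminator));
    return true;
}
```

Bounding the search by the record size keeps a malformed record from causing a read beyond the comment data.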

Extended Color Support

To support extended color spaces in Publisher 2007, additional EMF semantic records and interfaces are needed because EMF only supports RGB (red-green-blue) colors. Extended color spaces include CMYK (cyan-magenta-yellow-black) and spot color space, which are commonly used in commercial printing.

Publisher 2007 uses color mapping to represent extended colors in the document EMF. Publisher 2007 builds a color table for all colors used in the document and replaces actual colors with color IDs in the EMF. The type for the color ID is COLORREF, which is the same type that is used for RGB color. For information about the COLORREF structure, see COLORREF.

To resolve color IDs in the EMF back to the extended color space, the add-in calls back to Publisher 2007 through the HrResolveColor method of the IMsoDocExporterSite interface. The add-in passes Publisher 2007 an interface pointer to an IDOCEXCOLOR interface as one of the parameters to HrResolveColor. Publisher 2007 takes the color IDs, also specified in the call to HrResolveColor, converts them to extended color (RGB, CMYK, or spot color), and passes them back to the add-in through the methods in the IDOCEXCOLOR interface.

Vector Color and Recolored Images

Vector colors are any COLORREF values that the add-in receives from Publisher 2007, for example, text color, line stroke color, and the colors used to recolor metafiles. When color mapping is enabled, Publisher 2007 uses a color ID for COLORREF rather than a real RGB color value. If Publisher 2007 provides the add-in an IMsoDocExporterSite interface pointer by calling the SetDocExporterSite method of the IMsoDocExporter interface, the add-in should always call the IMsoDocExporterSite::HrResolveColor method to convert the COLORREF to an extended color, which the add-in receives through the methods in the IDOCEXCOLOR interface.

To support vector color mapping, the add-in needs to do the following:

  • Implement class support for an IDOCEXCOLOR interface. The methods in this interface enable Publisher to pass extended color back to the add-in.

  • Cache the following color state values from the semantic records in the EMF:

      • The foreground color for recoloring. This is set through the DocExComment_ColorInfo structure.

      • The background color for recoloring. This is set through the DocExComment_ColorInfo structure.

      • Whether color mapping is enabled. This is set through the DocExComment_ColorEnable structure.

  • For a vector color, create an IDOCEXCOLOR interface with the color ID, so that IDOCEXCOLOR::GetUnresolvedRGB returns the color ID. The add-in should call the IMsoDocExporterSite::HrResolveColor method with the IDOCEXCOLOR interface and cached color states. Publisher 2007 calls the IDOCEXCOLOR interface methods with the final color, which can be RGB, CMYK, spot, or registration tint.

  • When either foreground color or background color for recoloring is specified from an EMF semantic record, the add-in should recolor images in the add-in (for example, metafiles or raster pictures).

Non-Recolored Images

EMF supports CMYK images using GDI+. Therefore, images in the EMF may be either RGB or CMYK. If the image is a CMYK image, the add-in needs to convert the image to the target color space.

Publisher 2007 maintains a target color space for the document. The add-in can use this target color space by calling the IMsoDocExporterSite::HrConvertImageColorSpace method with the image's color space.

Color from EPS Files

Encapsulated PostScript (EPS) is a metafile type that supports extended color spaces. Users who embed EPS images in a Publisher 2007 document expect the color information to be used in the fixed-format output. Inside Publisher 2007, the EPS is converted to an EMF with EPS-related semantic records. This EMF is then embedded in the page EMF file that the application passes to the add-in.

To support color in EPS files, the add-in needs to do the following:

  • Call the IMsoDocExporterSite::SetEPSInfo method for DocExComment_EPSColor records encountered in the EMF.

  • Extract the CMYK image from the DocExComment_EPSColorCMYKJPEG record in the EMF. This record contains a binary object that is the actual CMYK JPEG file stream. Use it to replace the RGB image specified in the subsequent call to the StretchDIBits function.

  • The DocExComment_EPSColorSpotImage record provides spot color information for the subsequent RGB image, which is always an index image. The add-in needs to convert the spot image to the target color space.

  • The add-in can optionally call the IMsoDocExporterSite::HrGetSpotRecolorInfo method to obtain the document's target color from Publisher 2007. Then the add-in can recolor the subsequent RGB image by mapping colors from the palette of the RGB image to the flTintMin and flTintMax tints specified in the DocExComment_EPSColorSpotImage record. The luminosity for each color of the palette is used for the mapping.

Note that the DocExComment_EPSStart record is only informational. The add-in can ignore this record.
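The optional spot recoloring step described in the list above maps each palette color to a tint by using its luminosity. The following sketch shows one possible mapping. It is an illustration under stated assumptions, not the mapping that Publisher 2007 itself uses: the luminosity weights (the common Rec. 601 weights), the linear interpolation, and the choice that darker colors receive the higher tint are all assumptions, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Luminosity of an 8-bit RGB palette entry, scaled to the range 0 to 1.
// Assumption: Rec. 601 weights.
float Luminosity(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return (0.299f * r + 0.587f * g + 0.114f * b) / 255.0f;
}

// Maps one palette color to a tint between flTintMin and flTintMax.
// Assumption: the darkest color maps to flTintMax, the lightest color
// maps to flTintMin, and values in between are interpolated linearly.
float TintForPaletteColor(std::uint8_t r, std::uint8_t g, std::uint8_t b,
                          float flTintMin, float flTintMax)
{
    const float darkness = 1.0f - Luminosity(r, g, b);
    return flTintMin + darkness * (flTintMax - flTintMin);
}
```

An add-in would apply the resulting tint to the target spot color for every entry in the palette of the indexed image.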

SetDocExporterSite

Publisher 2007 calls SetDocExporterSite to provide the add-in with a pointer to an IMsoDocExporterSite interface. The IMsoDocExporterSite interface exposes methods that enable extended color support.

void SetDocExporterSite (
    IMsoDocExporterSite * pDocExporterSite
) ;

The pDocExporterSite parameter specifies the interface pointer to the IMsoDocExporterSite interface.

HrSetPageHeightForPagination

InfoPath 2007 calls the HrSetPageHeightForPagination method to specify the page height in points.

HRESULT HrSetPageHeightForPagination (
    float dytfPageHeight
)

Some Microsoft Office applications, for example, InfoPath 2007, maintain the user's document in an unpaginated format. In these cases, the add-in paginates the document using the page height specified by the application in the call to HrSetPageHeightForPagination. The dytfPageHeight parameter specifies the page height in points.

After specifying the page height information, the application passes the add-in the entire document as a single in-memory EMF file in a call to HrAddPageFromEmf. The add-in then uses the page-height and EMF file to paginate the document.

The add-in returns the pagination information back to the application in subsequent calls to the HrGetPageBreaks method.
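The simplest pagination strategy places a break at every multiple of the page height. The following sketch shows that strategy only. It assumes the add-in has already determined the total height of the document in points (for example, from the bounds of the EMF), and ComputePageBreaks is a hypothetical helper. A real add-in would likely move breaks to avoid splitting a line of text or an image across pages.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns page-break locations in points, in
// increasing order. The final element marks the end of the document.
std::vector<float> ComputePageBreaks(float dytfDocumentHeight,
                                     float dytfPageHeight)
{
    std::vector<float> breaks;
    if (dytfDocumentHeight <= 0.0f || dytfPageHeight <= 0.0f)
        return breaks;
    // Multiply rather than accumulate to avoid floating-point drift.
    for (int page = 1; page * dytfPageHeight < dytfDocumentHeight; ++page)
        breaks.push_back(page * dytfPageHeight);
    breaks.push_back(dytfDocumentHeight);  // last break is the end of the document
    return breaks;
}
```

For a document that is 2000 points tall and a page height of 792 points, this produces breaks at 792, 1584, and 2000.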

HrGetPageBreaks

InfoPath 2007 calls the HrGetPageBreaks method to obtain the number and location of page breaks for documents that are paginated by the add-in.

        
HRESULT HrGetPageBreaks (
    float * rgdytfPageBreaks, 
    int * pcchPageBreaks, 
    BOOL * pfCanTrustLastBreakIsEndOfDocument
)

After the add-in paginates a document using the page height specified by the HrSetPageHeightForPagination method, it returns the pagination information in subsequent calls that InfoPath 2007 makes to the HrGetPageBreaks method.

The rgdytfPageBreaks parameter is a pointer to an array of float values that specify the locations of the page breaks in points. The first element in the array (index 0) is the location of the first page break, the second element is the location of the second page break, and so on. Therefore, the values of these elements are successively increasing.

The pcchPageBreaks parameter is a pointer to an integer value that specifies the number of page breaks in the document.

The pfCanTrustLastBreakIsEndOfDocument parameter specifies whether the location of the last page break is the end of the document or the beginning of the last page of the document. A true value indicates that the last page break is the end of the document.

InfoPath 2007 calls HrGetPageBreaks twice to obtain the pagination information. On the first call, the application calls HrGetPageBreaks to obtain the number of page breaks.

HrGetPageBreaks(NULL, &nPageBreaks, NULL)

InfoPath 2007 then calls HrGetPageBreaks a second time to obtain the actual locations. On the second call, the application passes a buffer of sufficient size to hold the array of page-break locations.

HrGetPageBreaks(rgPageBreaks, &nPageBreaks, &fCanStopAtLastPageBreak)

After receiving the page break information from the add-in, the application re-initiates the fixed-format export process, beginning with a call to the HrCreateDoc method, followed by a call to HrAddPageFromEmf for each of the pages given by the page-break information.
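The two-call pattern can be handled on the add-in side as in the following sketch. The HRESULT and BOOL typedefs are stand-ins for the Windows types, the failure code is a placeholder, and the PaginationResult class is hypothetical. The sketch also assumes that on the second call the value pointed to by pcchPageBreaks holds the capacity of the caller's buffer, which the text above does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef long HRESULT;        // stand-in for the Windows HRESULT type
typedef int BOOL;            // stand-in for the Windows BOOL type
const HRESULT kHrOk = 0;     // same value as S_OK
const HRESULT kHrFail = -1;  // placeholder for a real failure code

// Hypothetical holder for the page breaks computed by the add-in.
class PaginationResult
{
public:
    PaginationResult(const std::vector<float>& breaks, bool lastBreakIsEnd)
        : m_breaks(breaks), m_lastBreakIsEnd(lastBreakIsEnd) {}

    HRESULT HrGetPageBreaks(float* rgdytfPageBreaks, int* pcchPageBreaks,
                            BOOL* pfCanTrustLastBreakIsEndOfDocument) const
    {
        if (pcchPageBreaks == nullptr)
            return kHrFail;
        const int count = static_cast<int>(m_breaks.size());
        if (rgdytfPageBreaks == nullptr)
        {
            *pcchPageBreaks = count;  // first call: report the count only
            return kHrOk;
        }
        // Second call. Assumption: *pcchPageBreaks is the buffer capacity.
        if (*pcchPageBreaks < count)
            return kHrFail;
        for (int i = 0; i < count; ++i)
            rgdytfPageBreaks[i] = m_breaks[static_cast<std::size_t>(i)];
        *pcchPageBreaks = count;
        if (pfCanTrustLastBreakIsEndOfDocument != nullptr)
            *pfCanTrustLastBreakIsEndOfDocument = m_lastBreakIsEnd ? 1 : 0;
        return kHrOk;
    }

private:
    std::vector<float> m_breaks;
    bool m_lastBreakIsEnd;
};
```

Accepting NULL for the array and flag parameters lets the same method serve both the counting call and the retrieval call.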

HrAddOutlineNode

Publisher 2007 calls the HrAddOutlineNode method to pass the add-in a structure that describes a node within a user-navigable outline for the exported document.

HRESULT HrAddOutlineNode(
    int idNodeParent,
    const MSODOCEXOUTLINENODE * pNode
);

The fixed-format export code can use the information passed by the HrAddOutlineNode method to construct a user-navigable outline of the export document. From the user's perspective, each node in the outline is represented by some title text that maps to a particular location within the document.

Each call to HrAddOutlineNode specifies information for a single node in this outline. Each node is identified by a node ID that is unique within the outline. An ID of 0 is reserved for the root node. The outline is hierarchical, that is, it has a tree structure in which each node has a single parent and zero or more child nodes.

The first parameter to HrAddOutlineNode provides the ID of the node that is the parent of the node being passed in.

   int idNodeParent

Publisher 2007 always calls HrAddOutlineNode for a parent node before calling the method for any of the parent node's children. In other words, the export code is assured of already having the node information for the node identified by the idNodeParent parameter. The only exception is the initial call to HrAddOutlineNode that specifies the root node. For this call, the value of idNodeParent is 0.

Additional information that the export code needs for each node is passed by HrAddOutlineNode in an MSODOCEXOUTLINENODE structure pointed to by the pNode parameter.

typedef struct _MsoDocexOutlineNode
{
    int idNode;                                 
    WCHAR rgwchNodeText[cwchMaxNodeText];       
    int iDestPage;
    float dytfvDestPage;
    float dxtfvDestOffset;
    float dytfvDestOffset;
} MSODOCEXOUTLINENODE;

The members of the MSODOCEXOUTLINENODE are described as follows:

  • idNode   The ID for the node. A value of -1 indicates that this node cannot have child nodes in the outline. Otherwise, this member has a value that is unique across the document.

  • rgwchNodeText   A Unicode string that represents the title text for each node. This text is not required to be unique across the outline.

  • iDestPage   The page number of the page that contains the destination location within the document.

  • dytfvDestPage   The height of the destination page in points. The offset specified by the dytfvDestOffset member is relative to the upper-left corner of the page. However, some fixed-format types use a coordinate system that is relative to the bottom-left corner of the page. For these types of documents, the page height is required to convert the offset.

  • dxtfvDestOffset   The horizontal offset of the destination location on the destination page.

  • dytfvDestOffset   The vertical offset of the destination location on the destination page.
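The following sketch shows one way export code might collect outline nodes and convert a destination offset for a format whose origin is the bottom-left corner of the page. OutlineEntry is a simplified stand-in for MSODOCEXOUTLINENODE (it uses std::wstring rather than a fixed-size WCHAR array), and OutlineBuilder and BottomRelativeOffset are hypothetical names. Children are stored by parent ID because several nodes can share an idNode value of -1.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for MSODOCEXOUTLINENODE.
struct OutlineEntry
{
    int idNode;
    std::wstring text;
    int iDestPage;
    float dytfvDestPage;    // page height in points
    float dxtfvDestOffset;  // horizontal offset from the left edge
    float dytfvDestOffset;  // vertical offset from the top edge
};

// Converts the top-relative vertical offset to a bottom-relative one.
float BottomRelativeOffset(const OutlineEntry& entry)
{
    return entry.dytfvDestPage - entry.dytfvDestOffset;
}

// Hypothetical collector that the add-in could fill from HrAddOutlineNode.
class OutlineBuilder
{
public:
    void AddNode(int idNodeParent, const OutlineEntry& entry)
    {
        if (entry.idNode == 0)
        {
            m_root = entry;  // the root is passed with a parent ID of 0
            return;
        }
        m_children[idNodeParent].push_back(entry);
    }

    const std::vector<OutlineEntry>& ChildrenOf(int idNode) const
    {
        static const std::vector<OutlineEntry> kNone;
        std::map<int, std::vector<OutlineEntry> >::const_iterator it =
            m_children.find(idNode);
        return it == m_children.end() ? kNone : it->second;
    }

private:
    OutlineEntry m_root;
    std::map<int, std::vector<OutlineEntry> > m_children;
};
```

Because a parent node always arrives before its children, the export code could also write each node to the output as it arrives instead of collecting the whole tree first.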

HrAddDocumentMetadataString

Publisher 2007 calls the HrAddDocumentMetadataString method to specify document metadata in the form of a Unicode string.

HRESULT HrAddDocumentMetadataString (
    MSODOCEXMETADATA metadataType, 
    const WCHAR * pwchValue
)

The metadataType parameter specifies the type of metadata represented by the string. The metadataType parameter must be one of the following values from the MSODOCEXMETADATA enumeration type.

Table 5. Enumerated values of MSODOCEXMETADATA

Value                       Description
msodocexMetadataTitle       The title of the document.
msodocexMetadataAuthor      The author of the document.
msodocexMetadataSubject     String that describes the subject matter of the document (for example, business or science).
msodocexMetadataKeywords    Keywords relevant to the document content.
msodocexMetadataCreator     The creator of the document, possibly distinct from the author.
msodocexMetadataProducer    The producer of the document, possibly distinct from the author or creator.
msodocexMetadataCategory    String that describes the type of document (for example, memo, article, or book).
msodocexMetadataStatus      Status of the document. This field can reflect where the document is in the publication process (for example, draft or final).
msodocexMetadataComments    Miscellaneous comments relevant to the document.

For a given document, each metadata type can have only one string associated with it. So, for example, if the document has multiple keywords, they are passed to the add-in as one concatenated string.

The pwchValue parameter specifies a Unicode string that contains the metadata itself.

How the add-in incorporates the text-string metadata into the exported document depends on the implementation details of the export code and the type of fixed-format used in the exported document.
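As one possible approach, the add-in can simply collect the strings as they arrive and write them to the output when the document is finalized. The following sketch illustrates this. The typedefs are stand-ins for the Windows types, the failure code is a placeholder, and the SketchMetadataType enumeration and MetadataStore class are hypothetical; the sketch does not reproduce the real MSODOCEXMETADATA values.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef long HRESULT;        // stand-in for the Windows HRESULT type
typedef wchar_t WCHAR;       // stand-in for the Windows WCHAR type
const HRESULT kHrOk = 0;     // same value as S_OK
const HRESULT kHrFail = -1;  // placeholder for a real failure code

// Hypothetical enumeration used only in this sketch.
enum SketchMetadataType
{
    sketchMetadataTitle,
    sketchMetadataAuthor,
    sketchMetadataKeywords
};

// Hypothetical store: keeps one string per metadata type.
class MetadataStore
{
public:
    HRESULT HrAddDocumentMetadataString(SketchMetadataType metadataType,
                                        const WCHAR* pwchValue)
    {
        if (pwchValue == nullptr)
            return kHrFail;
        // Each type has a single string, so a later value replaces an earlier one.
        m_strings[metadataType] = pwchValue;
        return kHrOk;
    }

    // Returns the stored string, or an empty string if none was provided.
    std::wstring Get(SketchMetadataType metadataType) const
    {
        std::map<SketchMetadataType, std::wstring>::const_iterator it =
            m_strings.find(metadataType);
        return it == m_strings.end() ? std::wstring() : it->second;
    }

private:
    std::map<SketchMetadataType, std::wstring> m_strings;
};
```

Copying the string into the store matters because the sketch does not assume that the pointer passed by the application remains valid after the call returns.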

HrAddDocumentMetadataDate

Publisher 2007 calls the HrAddDocumentMetadataDate method to specify document metadata in the form of a FILETIME structure.

HRESULT HrAddDocumentMetadataDate (
    MSODOCEXMETADATA metadataType, 
    const FILETIME * pftLocalTime
);

The metadataType parameter specifies the type of metadata represented by the FILETIME structure. The metadataType parameter must be one of the following values from the MSODOCEXMETADATA enumeration type.

Table 6. Enumerated values of MSODOCEXMETADATA

Value                           Description
msodocexMetadataCreationDate    The creation date for the document.
msodocexMetadataModDate         The last-modified date for the document.

The pftLocalTime parameter specifies a pointer to a FILETIME structure that contains the date and time information for the metadata. The following code snippet demonstrates how to extract this information from the structure.

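// Convert the FILETIME passed by the application into its date and time fields.
SYSTEMTIME st = {0};
WCHAR s[100];
FileTimeToSystemTime(pftLocalTime, &st);
// Format the fields as a date-time string. The modulo operations keep each
// field within the width reserved for it in the format string.
swprintf(s, 99, L" %04d-%02d-%02dT%02d:%02d:%02dZ", st.wYear % 10000, 
st.wMonth % 100, st.wDay % 100, st.wHour % 100, st.wMinute % 100, 
st.wSecond % 100);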

How the add-in incorporates the date and time metadata into the exported document depends on the implementation details of the export code and the type of fixed-format used in the exported document.

HrFinalize

Publisher 2007 calls the HrFinalize method at the end of the document-export process.

HRESULT HrFinalize();

The code that implements fixed-format export should use HrFinalize to perform tasks such as flushing data buffers, writing remaining data to disk, and freeing memory and other resources.

Conclusion

You can extend the fixed-format export feature of the 2007 release of Microsoft Office by implementing the IMsoDocExporter interface. The methods of this interface provide a channel for Microsoft Office applications to communicate to the add-in the visual content and semantic information in the document to export. The visual content of the document is provided to the add-in as one or more in-memory enhanced metafiles. The semantic information is provided as specially formatted comment records within this EMF. Additional methods in the interface enable Office 2007 applications to communicate metadata and structural information about the document.

Additional Resources

For more information, see the following resources: