Understanding Word's XML Markup [Word 2003 XML Reference] --  Microsoft Office Word 2003 XML Software Development Kit

Important: The information set out in this topic is presented exclusively for the benefit and use of individuals and organizations outside the United States and its territories or whose products were distributed by Microsoft before January 2010, when Microsoft removed an implementation of particular functionality related to custom XML from Word. This information may not be read or used by individuals or organizations in the United States or its territories whose products were licensed by Microsoft after January 10, 2010; those products will not behave the same as products licensed before that date or licensed for use outside the United States.

XML markup

When you apply custom XML elements to a Microsoft® Office Word 2003 document, XML tags are inserted within its content. The tags describe the portion of the document to which they are applied. For example, in the following image the content of a document is marked up with custom XML that tells Office Word 2003 that the first paragraph is a <logon/>, the second paragraph is a <shutdown/>, and so on.

To apply markup, select the appropriate portion of the document and then click the desired element in the XML Structure task pane.

Before Office Word 2003 applies markup to a document, it makes sure the result is well-formed XML. It performs two tasks when XML markup is applied to selected content:

  • Trim the selection so that the inserted XML tags fall in a location that keeps the XML well-formed. This is called snapping.
  • Classify the resulting XML element as either block-level or inline. This ensures that future edits to the marked-up content do not invalidate the XML.
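Well-formedness is something any XML parser can check. As a rough illustration only (not how Word itself performs the check), the following Python sketch uses the standard library's xml.etree.ElementTree to test two simplified fragments: one in which a <CUSTOM_XML> element opens in one paragraph and closes in the next, and one in which it opens and closes inside a single paragraph. The fragments, the <w:body> wrapper, and the is_well_formed helper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Bind the "w:" prefix to the Word 2003 WordprocessingML namespace so the
# fragments below can be parsed on their own.
W_NS = 'xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"'

# Not snapped: <CUSTOM_XML> opens inside the first <w:p> but closes inside the second.
UNSNAPPED = f"""<w:body {W_NS}>
  <w:p>
    <w:r><w:t>The quick brown fox </w:t></w:r>
    <CUSTOM_XML>
      <w:r><w:t>jumps over</w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:t>the lazy dog.</w:t></w:r>
    </CUSTOM_XML>
  </w:p>
</w:body>"""

# Snapped: <CUSTOM_XML> opens and closes inside the same <w:p>.
SNAPPED = f"""<w:body {W_NS}>
  <w:p>
    <w:r><w:t>The quick brown fox </w:t></w:r>
    <CUSTOM_XML>
      <w:r><w:t>jumps over</w:t></w:r>
    </CUSTOM_XML>
  </w:p>
  <w:p>
    <w:r><w:t>the lazy dog.</w:t></w:r>
  </w:p>
</w:body>"""


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(UNSNAPPED))  # False: </w:p> arrives while <CUSTOM_XML> is still open
print(is_well_formed(SNAPPED))    # True
```

The parser rejects the first fragment with a mismatched-tag error, which is exactly the situation snapping is meant to prevent.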

Snapping

When content is selected in Office Word 2003, the selection can include parts of one or more paragraphs, tables, fields, etc. For example, the following image demonstrates how you might select the end of one paragraph and the start of the next paragraph.

If Office Word 2003 put XML markup around this selection unchanged, the resulting XML would not be well-formed. The following XML illustrates what that invalid XML might look like.

<w:p>
  <w:r>
    <w:t>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.</w:t>
  </w:r>
  <!-- *******************************************
       <CUSTOM_XML/> represents the custom XML that Word inserts
       when the user marks selected text with an XML tag.
       ******************************************* -->
  <CUSTOM_XML>
    <w:r>
      <w:t>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.</w:t>
    </w:r>
    <!-- Notice that the <CUSTOM_XML/> tag is not closed before
         the first paragraph (i.e. <w:p/>) is closed. -->
</w:p>
<w:p>
  <w:r>
    <w:t>The quick brown fox jumps over the lazy dog.</w:t>
  </w:r>
  </CUSTOM_XML>
  <!-- Notice that the <CUSTOM_XML/> tag is improperly closed here,
       within the next <w:p/> element (i.e. within the second paragraph).
       This is invalid XML. -->
  <w:r>
    <w:t>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.</w:t>
  </w:r>
</w:p>

To prevent the invalid XML in the preceding example, Office Word 2003 changes the user's selection so that XML tags are always placed in a valid location. You can think of this as Word "snapping" the selection: it trims the selection until the custom XML is contained within a single paragraph (<w:p/>) element. The following image illustrates how a selection with an XML tag applied looks after Word has snapped it.

Snapping ensures that the XML in the document is well-formed after custom XML is inserted. Other cases in which Office Word 2003 snaps the selection include:

  • Selecting all of one paragraph and part of another paragraph. The selection snaps to contain the first paragraph only.
  • Selecting part of one paragraph and all of another paragraph. The selection snaps to contain part of the first paragraph only.
  • Selecting text and part of a table. The selection snaps to contain the largest valid block of text only.
  • Selecting a region that overlaps with the start or end tag of an existing XML element. The selection snaps so that the new element's tags and the existing element's tags do not overlap.
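The paragraph cases above can be modeled with a small, simplified Python sketch. It treats the document as a flat run of characters divided into paragraphs and represents a selection as a pair of character offsets. The Selection type and the snap function are hypothetical; the sketch assumes, consistent with the cases listed above, that a selection of whole paragraphs is left alone and any other selection is trimmed to the part inside the first paragraph it touches. It ignores tables, fields, and existing XML tags, so it is not Word's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def snap(sel: Selection, paragraphs: list[tuple[int, int]]) -> Selection:
    """Trim sel so that a tag wrapped around it cannot straddle a paragraph boundary.

    paragraphs is an ordered list of (start, end) character ranges.
    """
    # Paragraphs that overlap the selection at all.
    touched = [(s, e) for (s, e) in paragraphs if s < sel.end and sel.start < e]
    if not touched:
        return sel
    # A selection made only of whole paragraphs is already a valid location.
    if sel.start == touched[0][0] and sel.end == touched[-1][1]:
        return sel
    # Otherwise keep only the part of the selection inside the first paragraph.
    first_start, first_end = touched[0]
    return Selection(max(sel.start, first_start), min(sel.end, first_end))


# Three hypothetical 10-character paragraphs.
PARAS = [(0, 10), (10, 20), (20, 30)]
print(snap(Selection(5, 15), PARAS))  # end of one + start of next: Selection(start=5, end=10)
print(snap(Selection(0, 15), PARAS))  # all of one + part of next:  Selection(start=0, end=10)
print(snap(Selection(5, 20), PARAS))  # part of one + all of next:  Selection(start=5, end=10)
print(snap(Selection(0, 20), PARAS))  # two whole paragraphs:       Selection(start=0, end=20)
```

Keeping whole-paragraph selections intact matters because those are exactly the selections that can become block-level elements.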

Block-level vs. inline elements

After Office Word 2003 ensures that the content selected for an XML tag is valid, it determines whether the applied tag will be block-level or inline.

A block-level tag is an XML tag that spans one or more complete paragraphs, table cells, or table rows. For example, if the contents of two complete paragraphs are selected and an XML tag is applied to them, the element is block-level. The following illustration shows what a block-level tag looks like within a document.

If you place the cursor immediately before a block-level start tag or immediately after a block-level end tag and begin typing, the entered text is placed in its own paragraph. This happens because Word enforces the definition of a block-level tag, which may contain only whole paragraphs. Note that if Show/Hide ¶ is selected, the character immediately after a block-level end tag is ¶, and the character immediately before a block-level start tag is ¶ (or there is no character if the paragraph is the first in the document).

An inline XML tag is contained completely within the contents of a single paragraph or table cell. For example, if you select only a portion of a paragraph and assign an XML tag to it, the tag is an inline tag. Note that when Show/Hide ¶ is selected, a ¶ mark never appears inside an inline tag, because inline tags cannot span the beginning or end of a paragraph.
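In saved WordprocessingML, the distinction should show up in where the custom element sits: assuming the usual Word 2003 layout, a block-level element wraps whole <w:p> elements, while an inline element sits inside a <w:p> and wraps runs (<w:r>). The following Python sketch classifies custom elements on that basis, treating any custom element with a <w:p> ancestor as inline. The sample document, the urn:example:custom namespace, and the classify_custom_elements helper are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
CUSTOM = "urn:example:custom"  # hypothetical namespace for the custom schema

DOC = f"""<w:body xmlns:w="{W}" xmlns:c="{CUSTOM}">
  <c:logon>
    <w:p><w:r><w:t>First paragraph.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </c:logon>
  <w:p>
    <w:r><w:t>The quick brown </w:t></w:r>
    <c:animal><w:r><w:t>fox</w:t></w:r></c:animal>
    <w:r><w:t> jumps over the lazy dog.</w:t></w:r>
  </w:p>
</w:body>"""

PARAGRAPH_TAG = f"{{{W}}}p"  # ElementTree's "{namespace}localname" form of w:p


def classify_custom_elements(xml_text: str) -> dict[str, str]:
    """Map each custom element's local name to "block-level" or "inline"."""
    root = ET.fromstring(xml_text)
    result: dict[str, str] = {}

    def walk(elem: ET.Element, inside_paragraph: bool) -> None:
        for child in elem:
            if child.tag.startswith("{" + CUSTOM + "}"):
                local_name = child.tag.split("}", 1)[1]
                # A custom element nested in a <w:p> cannot contain whole paragraphs.
                result[local_name] = "inline" if inside_paragraph else "block-level"
            walk(child, inside_paragraph or child.tag == PARAGRAPH_TAG)

    walk(root, root.tag == PARAGRAPH_TAG)
    return result


print(classify_custom_elements(DOC))  # {'logon': 'block-level', 'animal': 'inline'}
```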

Inline tags look different from block-level tags, so you can tell them apart in a document. A block-level tag shows its name in both its start and end tags, as shown in the preceding image. An inline tag shows its name in the start tag but not in the end tag, as illustrated in the following image.

Because inline tags by definition cannot span multiple paragraphs, paragraphs (and tables) cannot be inserted inside them. If you place the cursor inside an inline tag and press Enter, Office Word 2003 moves all text that is inside the tag and to the right of the cursor out of the tag. This is how Office Word 2003 maintains well-formed XML in its documents. The following image illustrates the result after a user places the cursor after the word "fox" in the preceding illustration and then presses Enter. Notice that a new paragraph is created immediately after the word "fox" and the words "jumps over the lazy dog" move into this new paragraph.

For more information, see Inserting XML Markup.
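The Enter behavior for inline tags described above can be sketched with a toy model in which a paragraph is untagged text before an inline tag, the tag's content, and untagged text after it. The InlineParagraph type, the press_enter and to_xml functions, and the <sentence> tag name are hypothetical; the sketch ignores formatting, nested tags, and XML escaping. It shows only that the text to the right of the cursor leaves the tag, so the tag never has to span two paragraphs.

```python
from dataclasses import dataclass


@dataclass
class InlineParagraph:
    before: str  # untagged text before the inline tag
    tagged: str  # text inside the inline tag
    after: str   # untagged text after the inline tag
    tag: str     # name of the inline tag


def press_enter(para: InlineParagraph, offset: int) -> tuple[InlineParagraph, str]:
    """Split para at a cursor offset inside the tagged text.

    Returns the first paragraph (still holding the shortened inline tag) and the
    plain text of the new paragraph, which now lies outside the tag.
    """
    if not 0 <= offset <= len(para.tagged):
        raise ValueError("cursor must be inside the inline tag")
    first = InlineParagraph(para.before, para.tagged[:offset], "", para.tag)
    new_paragraph_text = para.tagged[offset:] + para.after
    return first, new_paragraph_text


def run(text: str) -> str:
    """Wrap text in a simplified <w:r><w:t> run (no escaping)."""
    return f"<w:r><w:t>{text}</w:t></w:r>" if text else ""


def to_xml(para: InlineParagraph) -> str:
    return (
        "<w:p>" + run(para.before)
        + f"<{para.tag}>" + run(para.tagged) + f"</{para.tag}>"
        + run(para.after) + "</w:p>"
    )


sentence = InlineParagraph("", "The quick brown fox jumps over the lazy dog.", "", "sentence")
first, second = press_enter(sentence, len("The quick brown fox"))
print(to_xml(first))              # <w:p><sentence><w:r><w:t>The quick brown fox</w:t></w:r></sentence></w:p>
print("<w:p>" + run(second) + "</w:p>")  # <w:p><w:r><w:t> jumps over the lazy dog.</w:t></w:r></w:p>
```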

©2004 Microsoft Corporation. All rights reserved. Permission to copy, display and distribute this document is available at: http://msdn.microsoft.com/library/en-us/odcXMLRef/html/odcXMLRefLegalNotice.asp