Creating a Seed Document [Word 2003 XML Reference] --  Microsoft Office Word 2003 XML Software Development Kit

The Transform Inference tool makes it easy to create XSL Transformations (XSLT) that transform similar XML files into documents with complex, rich formatting. The process begins with a raw XML file that is representative of the XML files you want to transform. You then format that file in Microsoft® Office Word 2003 so that it becomes a template for how all other similar XML files should be formatted. The finished document is called the seed document, and the Transform Inference tool uses it to create the transform.

  1. Creating the seed document

    To create a seed document, open a raw XML file in Office Word 2003, apply formatting, and use Save As to save it. In the Save As dialog box, choose XML Document (*.xml) as the type, and clear the Save data only check box; otherwise, the information needed to create the transform is removed.

    You can create a seed document from a pre-existing XML file, or create the XML file directly in Office Word 2003. The idea is to format each element in the file the way you want that element formatted in other similar files to which you will apply the transform. The files that receive the transform are called input files. Therefore, as a general rule, the seed document should contain at least one of every element that appears in the input files.
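    The rule of thumb above can be checked before any formatting work begins. The following is a minimal Python sketch, not part of the SDK, that compares the element names in the raw XML behind a seed document with those in an input file. It assumes both are plain XML (not yet saved from Word); the <hibernate> element and the function names are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# The sample log file from this topic stands in for the raw XML behind the seed.
SEED_XML = """<log xmlns="logs">
    <logon>Logon</logon>
    <shutdown>Shutdown</shutdown>
    <restart>Restart</restart>
    <logoff>Logoff</logoff>
</log>"""

# Hypothetical input file; <hibernate> is invented for this illustration.
INPUT_XML = """<log xmlns="logs">
    <logon>08:00</logon>
    <hibernate>12:30</hibernate>
    <logoff>17:00</logoff>
</log>"""


def element_names(xml_text):
    """Return the set of namespace-qualified element names in an XML string."""
    return {element.tag for element in ET.fromstring(xml_text).iter()}


def missing_from_seed(seed_xml, input_xml):
    """Return element names used by the input file but absent from the seed."""
    return sorted(element_names(input_xml) - element_names(seed_xml))


print(missing_from_seed(SEED_XML, INPUT_XML))
```

    Any name the check reports is an element the seed document gives no formatting for, so it is a candidate to add to the seed before creating the transform.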

    Once you have formatted all the elements and saved the seed document, you pass it as an argument to the Transform Inference tool to create a transformation.

    Note  For more information, see Applying an XSLT Transform and XSLT Inference Tool.

  2. Formatting the seed document

    Note  The following simple raw XML file is used in subsequent topics for demonstration purposes:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <log xmlns="logs">
        <logon>Logon</logon>
        <shutdown>Shutdown</shutdown>
        <restart>Restart</restart>
        <logoff>Logoff</logoff>
    </log>
    

    Simple formatting

    Most XML schemas contain elements that can repeat, either a limited or an unlimited number of times. For example, an element may denote tasks in a "To do" list or purchases in an invoice, which should be formatted the same way every time they appear in a document. When you format repeating elements in the seed document, the WML2XSLT tool recognizes them and includes them in the resulting transformation. In the seed document, the formatting of an element consists of any formatting applied to its contents as well as to certain surrounding elements. An element therefore takes the exact formatting of the text contained within it. Formatting that surrounds an element, such as a table, is discussed in subsequent topics. For a simple example, consider the following raw XML file, which is representative of generated log XML files:

    View in Office Word 2003:

    Now you can apply distinctly different formatting to each of the child elements of the <log> element, such as different fonts, colors, and highlighting:

    All formatting applied to the text between an element's begin and end tags in the seed document is applied to the contents of every instance of that element in the raw XML input files when you apply the resulting transform to them. For example, this seed document results in a transform that formats the text of all <logon> elements in red Times New Roman, the text of all <logoff> elements in dark blue, underlined Rockwell with yellow highlighting, and so on.

    Formatting repeated elements

    You are not limited to just one type of formatting for each element. If you have three instances of the same element in your seed document, the first three instances in an input document get the exact formatting applied to the corresponding instances in the seed document, and any further instances get the formatting of the last instance specified. For example:

    The resulting transform applies the Lucida Console font with black text and green highlighting to the first <restart> element it encounters and the Rockwell font with dark blue text and turquoise highlighting to all the remaining <restart> elements. You can uniquely format as many separate instances of the same element as you like.

    When you use this seed document to create a transform, the resulting file includes the following two lines:

    /ns0:log/ns0:restart[1]

    /ns0:log/ns0:restart[position() >= 2]

    The first template is associated with only the first <restart> element, as indicated by the number one in square brackets. The second template is associated with the second and any other <restart> elements, as signified by the position() >= 2 in its square brackets.
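    To make the effect of the two predicates concrete, here is a minimal Python sketch that emulates them with list slicing. It is only an illustration of which elements each path selects, not of how the tool or an XSLT processor works internally; the input file and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input file with three <restart> elements.
INPUT_XML = """<log xmlns="logs">
    <logon>08:00</logon>
    <restart>09:15</restart>
    <restart>11:40</restart>
    <restart>15:05</restart>
    <logoff>17:00</logoff>
</log>"""

NAMESPACES = {"ns0": "logs"}


def split_restarts(xml_text):
    """Split <restart> elements the way the two match paths would.

    restart[1] selects the first <restart> child of <log>;
    restart[position() >= 2] selects every later one.
    """
    root = ET.fromstring(xml_text)
    restarts = root.findall("ns0:restart", NAMESPACES)
    first = [element.text for element in restarts[:1]]
    rest = [element.text for element in restarts[1:]]
    return first, rest


first, rest = split_restarts(INPUT_XML)
print("restart[1]:", first)
print("restart[position() >= 2]:", rest)
```

    Position is counted among the <restart> siblings only, so the <logon> element that precedes them does not affect which <restart> counts as the first.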

    Formatting based on attributes

    With the seed document alone, you cannot format an element differently based solely on the value of one of its attributes. However, you can do so by combining repeated-element formatting in the seed document with a slight edit of the .xsl file created by the XSLT Inference tool. For example, suppose a seed document has a <restart> element with a Boolean attribute called automatic, and you want to format it one way when the attribute is true and another way when it is false. In this case, you format the <restart> element twice, once for each possible value of the attribute: one instance has the formatting you want when the attribute is false, the other the formatting you want when it is true. At this point the seed document looks like it is formatted for repeating elements, just as in the previous topic. The purpose is to get two versions of the same element in the transform. Next, you use the XSLT Inference tool to create the transform and then open it in an editor. The tool puts two lines in the XSL file for the <restart> element, which you can modify so that they differ by attribute rather than by position. The following seed document example results in a transform with the following two lines:

    /ns0:log/ns0:restart[1]

    /ns0:log/ns0:restart[position() >= 2]

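    You can now remove the position data within the square brackets and replace it with attribute data:

    /ns0:log/ns0:restart[not(@automatic)]

    /ns0:log/ns0:restart[@automatic]

    Now <restart> elements that have no automatic attribute are formatted like the first <restart> element in the seed document, and those that have the attribute are formatted like the second. Note that these predicates test whether the attribute is present, not what its value is. If the attribute is always written out as true or false, compare values instead, for example with [@automatic='false'] and [@automatic='true'].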
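    The difference between testing for an attribute's presence and testing its value is easy to overlook. The following Python sketch, offered only as an illustration, emulates both kinds of predicate on a hypothetical input file; in XPath itself, negation is written as the function call not(@automatic). The function names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one element without the attribute, two with it.
INPUT_XML = """<log xmlns="logs">
    <restart automatic="true">02:00</restart>
    <restart>09:15</restart>
    <restart automatic="false">11:40</restart>
</log>"""

NAMESPACES = {"ns0": "logs"}


def classify_by_presence(xml_text):
    """Emulate restart[not(@automatic)] and restart[@automatic]."""
    restarts = ET.fromstring(xml_text).findall("ns0:restart", NAMESPACES)
    without = [e.text for e in restarts if e.get("automatic") is None]
    with_attr = [e.text for e in restarts if e.get("automatic") is not None]
    return without, with_attr


def classify_by_value(xml_text):
    """Emulate restart[@automatic='true'] and restart[@automatic='false']."""
    restarts = ET.fromstring(xml_text).findall("ns0:restart", NAMESPACES)
    is_true = [e.text for e in restarts if e.get("automatic") == "true"]
    is_false = [e.text for e in restarts if e.get("automatic") == "false"]
    return is_true, is_false


print(classify_by_presence(INPUT_XML))
print(classify_by_value(INPUT_XML))
```

    With a presence test, the element marked automatic="false" lands in the same group as the one marked automatic="true"; with a value test, an element that omits the attribute matches neither template.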

    Formatting with mixed content

    Mixed content is content that lies outside XML elements. When the WML2XSLT tool encounters mixed content, it treats it as a fixed part of the formatting and carries it into every document that the transform produces. This is a useful way to bring in banners, footers, or any other piece of static formatting. A transformation created from the seed document below applies the "My Log File" banner once at the top of the parent <log> element because the banner is not contained within the start and end tags of any of the leaf elements within the <log> element. The transform positions it above the first child element (<logon>) because that is where it is positioned in the seed document. The Clip Art inserted at the beginning of the <shutdown> element appears at the beginning of every <shutdown> element because it is part of the contents of the <shutdown> element. Similarly, the divider appears at the end of every <logoff> element.

    Working with extraneous content

    Before using a seed document to create a transform, the tool cleans up various items in the document that do not make sense for formatting XML elements. For example, the tool removes all comments, revisions, smart tags, and spelling and grammar issues. Therefore, do not add these types of items to seed documents. Since annotations, smart tags, and revisions may contain text, and that text cannot be captured by the tool, the tool displays a warning if it encounters any of them.

    Formatting with tables

    You can use tables in seed documents either to group elements together or to wrap single elements by themselves. The difference, which is subtle but very important, lies in whether entire elements are contained in a table or only the contents of a single element are contained in a table. Consider the following seed document:

    Here we have three tables. The first is a single-cell, single-row table that encloses all of the child nodes of the <log> element. Notice that it does not include the banner. The second, also a one-row, one-cell table, contains only the contents of the <logon> element; the begin and end tags are outside the table. The third table has two one-cell rows, one with the second <restart> element and one with the <logoff> element. Each row contains not only the element's contents (the text) but also its begin and end tags. This tells the transform to append the second and any subsequent <restart> elements and all <logoff> elements to the same table, each in its own row. Note that the first <restart> element never appears in this table.

    Review the following raw XML file as it appears in Word:

    If you take the transform created with the seed document described with tables and apply it to this raw XML, the formatting looks as follows:

    Notice how the tables follow the same pattern as in the seed document. Also notice how the elements that are not encapsulated in a table row, <shutdown> and the first <restart>, appear to stand alone, though they are included with all the other leaf nodes inside the single-cell table in the <log> element.

    Seeing the end result

    Once you have your seed document, you can create and then apply the transform. At this point, you see the result of the formatting from the seed document. For more information, see XSLT Inference Tool and Applying an XSLT Transform.

    Note  To see the result of table formatting, see Formatting with Tables.

    The seed document below contains simple text formatting, formatting of multiple elements (the <restart> element), and mixed content in the form of a banner, Clip Art, and a divider.

    Here is a sample raw XML file to which the transform will be applied. Note that it contains the same elements that are in the seed document. Raw XML files that have the transform applied to them are referred to as input files.

    Using the seed file, the XSLT Inference tool generates a transform (.xsl file) that captures all of its formatting. Here we have the input file shown after the transform created with the sample seed document is applied to it.

    Notice how the text of each element assumes the exact formatting of its respective element in the seed document and how static items such as banners and Clip Art come through. Also note the difference between the first <restart> element and the rest of the <restart> elements, and recall that you can format the same element more than once in the seed document.

    Data binding

    Data binding is a mode in which the Transform Inference tool is able to smartly take into account elements that contain only child elements. In this case, there is no need to create a template in the transform for elements that do not contain any content to format, so the tool is able to create XPaths to the relevant leaf elements that do contain content when it runs in data binding mode. The data is, in essence, bound within XPaths comprised of some elements that do not require formatting.

    As an example, take this sample raw XML file:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <purchase xmlns="Purchases">
        <customer>
            <name>
                <first></first>
                <last></last>
            </name>
        </customer>
        <item>
            <total></total>
        </item>
    </purchase>
    

    Suppose you want a table that has one row for each purchase, with three columns in each row: one for the first name, one for the last name, and one for the total. This is a good case for data binding because no formatting is associated with the <customer>, <name>, or <item> elements.

    Seed documents created for data binding are put together differently from other types of seed documents. Rather than begin with a raw XML file, it is best to load the schema on which your input files are based and then create the document from scratch. For more information on how to load a schema, see Schema Library.

    In this example, you begin by constructing a table with one row that acts as a model customer row, plus any header rows, and then add the necessary elements. When adding the elements, make sure that each one is added within its parent element so that proper XPaths are constructed in the seed document. For example, the <first> element's path must look like <purchase><customer><name><first>...</first></name></customer></purchase>. In data binding mode, if an element contains nothing but other nodes, the tool interprets it as part of an XPath and creates no template for it in the transform. The table might look something like this:

    Notice how the <first>, <last>, and <total> elements are wrapped in a hierarchy that resolves to a valid XPath up to the parent node of <purchase>.
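    The paths that the hierarchy resolves to can be listed mechanically. The following Python sketch, which is an illustration rather than a description of the tool's internals, walks the sample purchase file and prints the path of every leaf element, skipping the container elements that hold only other elements. The function names are hypothetical, and namespace prefixes are omitted from the printed paths for readability.

```python
import xml.etree.ElementTree as ET

# The sample purchase file from this topic.
PURCHASE_XML = """<purchase xmlns="Purchases">
    <customer>
        <name>
            <first></first>
            <last></last>
        </name>
    </customer>
    <item>
        <total></total>
    </item>
</purchase>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def leaf_paths(element, prefix=""):
    """Return the path of every element that has no child elements."""
    path = f"{prefix}/{local_name(element.tag)}"
    children = list(element)
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(leaf_paths(child, path))
    return paths


for path in leaf_paths(ET.fromstring(PURCHASE_XML)):
    print(path)
```

    The three paths printed correspond to the three table columns: first name, last name, and total.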

    To use this seed document in data binding mode, save it as an XML file (make sure to clear the Save data only check box) and pass it to the tool in a Command Prompt window with the -db option. If the file is named purchases.xml, type the following:

    wml2xslt purchases.xml -db

    The resulting transform is called purchases.xsl. When you apply it to an input file that is based on the schema used to create the seed document, a table with the same format as the one in the seed file appears, with a row for each customer that contains a column for the first name, the last name, and the total.
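    If you create transforms for several seed documents, the command can be scripted. The following Python sketch is one possible wrapper, not something the SDK provides. It assumes that wml2xslt is on the system path, uses only the -db option shown above, and guesses the output name by swapping the file extension for .xsl, a rule inferred from the single purchases.xml example. All function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(seed_path, data_binding=False):
    """Build the wml2xslt argument list for a seed document."""
    command = ["wml2xslt", str(seed_path)]
    if data_binding:
        command.append("-db")  # data binding mode, as described in this topic
    return command


def expected_transform_path(seed_path):
    """Guess the output name by swapping the extension for .xsl (assumption)."""
    return Path(seed_path).with_suffix(".xsl")


def run_tool(seed_path, data_binding=False):
    """Run the tool; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(seed_path, data_binding), check=True)
    return expected_transform_path(seed_path)


# Show the command without running it, so the sketch works without the tool.
print(build_command("purchases.xml", data_binding=True))
print(expected_transform_path("purchases.xml"))
```

    Calling run_tool("purchases.xml", data_binding=True) would then execute the same command as the one typed by hand above.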

©2004 Microsoft Corporation. All rights reserved. Permission to copy, display and distribute this document is available at: http://msdn.microsoft.com/library/en-us/odcXMLRef/html/odcXMLRefLegalNotice.asp