Converting Journal Notes to XML, SVG, and OneNote

Article
03/02/2007

casey chesnut
brains-N-brawn.com LLC

March 2005

Applies to:
   Microsoft Tablet PC Platform SDK
   Microsoft Windows Journal
   Microsoft Windows Journal Reader Supplemental Component
   Microsoft Office 2003: XML Reference Schemas
   Microsoft Office OneNote 2003 SP1

Summary: Shows how to use the Journal Reader Supplemental Component to convert Journal notes to XML and then convert them to Scalable Vector Graphics (SVG) for viewing on the Web or a Pocket PC. Also provides the code to import a Journal Note into OneNote. (13 printed pages)

Click here to download the code sample for this article.

Introduction
Using the Journal Reader Supplemental Component
Working with Journal XML
Using Journal Types in Your Application
Introduction to SVG
Converting Journal XML to SVG
Converting Ink to SVG
Importing Journal Notes to OneNote
Conclusion
Biography

Introduction

One of the first applications to support ink was Windows Journal. For a while, it was the shining example of how a Tablet PC running an ink-enabled application could provide a great user experience. For me, it entirely replaced pen and paper for taking notes. I used it extensively in meetings, at presentations, and for brainstorming. Some businesses have also been using Journal to replace paper forms, creating numerous Journal files in the process. Journal usage is widespread enough that Microsoft released Windows Journal Viewer for Windows 2000 and Windows XP.

The problem is that the Journal file format is a proprietary binary format. It is not possible to open and view Journal notes in your own application or to write an ink file format that Journal can read. For example, OneNote cannot import and display Journal notes, and you cannot view a Journal note on a Pocket PC.

However, Microsoft recently released the Journal Reader Supplemental Component, which addresses this problem. Using this component, you can develop an application that converts Journal notes to an XML format. From the Journal XML format you can either open and view Journal notes in your own application or convert them to a new format.

This article shows how to use the Journal Reader Supplemental Component, how to parse the Journal XML and use it in your own application, and how to convert the Journal XML to Scalable Vector Graphics (SVG) so that you can view your notes on a Pocket PC or the Web. Finally, it provides the code to import a Journal note into OneNote.

Using the Journal Reader Supplemental Component

First, you must install the Journal Reader Supplemental Component. The installation registers a DLL that can be called from COM, along with a managed assembly that wraps it. In this article, I'll use the managed assembly: Microsoft.Ink.JournalReader.dll. The assembly exposes only one public method, ReadFromStream, on the JournalReader class. Contrary to the first release of the documentation, it is a static method. Its input is a stream of the Journal note and its output is a stream of XML. There is no complementary method to convert a Journal note XML file back to the Journal binary format. The following code example shows how to call JournalReader to convert a Journal stream to an XML document.

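using System.IO;
using System.Xml;
using Microsoft.Ink;

// Converts a Journal note stream to an XML document.
private XmlDocument ReadJntToXml(Stream jntStream)
{
   // ReadFromStream is static; it returns the note as a stream of Journal XML.
   using (Stream xmlStream = JournalReader.ReadFromStream(jntStream))
   {
      XmlDocument xmlDoc = new XmlDocument();
      xmlDoc.Load(xmlStream);
      return xmlDoc;
   }
}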

Working with Journal XML

Now that the Journal note is in an XML format, we can do something meaningful with it. Fortunately, the Journal Reader Supplemental Component installation provides the XSD schema for the Journal XML format. Instead of listing the entire XSD schema here, I have included the following XML, which shows a skeleton of a Journal XML file. For simplicity, I removed the XML attributes and other content to show you only the core XML elements with which we will be working.

<JournalDocument>
  <Stationery>XML</Stationery>
  <JournalPage>
    <TitleInfo>
      <Text>TEXT</Text>
      <Date>TEXT</Date>
    </TitleInfo>
    <Content>
      <Paragraph>
        <Line>
          <InkWord>
            <AlternateList>
              <Alternate>WORD</Alternate>
            </AlternateList>
            <InkObject>BASE64</InkObject>
          </InkWord>
        </Line>
      </Paragraph>
      <Drawing>
        <InkObject>BASE64</InkObject>
      </Drawing>
      <Text>RTF</Text>
      <Flag>BASE64</Flag>
      <Image>BASE64</Image>
      <GroupNode>XML</GroupNode>
    </Content>
  </JournalPage>
</JournalDocument>

The XML skeleton contains the following elements:

The root element, JournalDocument, typically contains a single Stationery element and multiple JournalPage elements.
The Stationery element stores background information such as line layout, title box, and margin rules.
The JournalPage element contains a TitleInfo element and a Content element.
The child elements of the Content element store the actual content for each page. These can be any of the following elements: Paragraph, Drawing, Text, Flag, Image, or GroupNode.
The Paragraph element can be further broken down into Line elements and InkWord elements.
The InkWord element contains the child element, InkObject, which stores the actual ink in Base64 format.
The AlternateList element of an InkWord element stores the top ten recognition results for that ink along with the word. The AlternateList element enables you to search for text in a Journal note.
The Drawing element also contains ink in Base64 format in an InkObject element. This is ink that was not recognized as text, and it also includes highlighter ink strokes.
The Text element contains text that was entered into a text box in a Journal note. It is stored in RTF format, retaining the font and color information.
The Flag and Image elements both contain Base64 image data.
The GroupNode element is created when you group content in a Journal note. It can contain any of the same elements that the Content element can contain.
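To make the role of the AlternateList element concrete, here is a minimal sketch of a text search over the Journal XML. It is not part of the sample application. The JournalSearch class and FindPages method are hypothetical names, and the sketch assumes each Alternate element holds its word as plain text content, as in the skeleton above. It matches on local names so that it does not depend on whatever namespace the real documents declare.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the article's sample code.
public static class JournalSearch
{
    // Returns the 1-based numbers of the pages on which any recognition
    // alternate equals the search term (case-insensitive).
    public static List<int> FindPages(XmlDocument journalXml, string term)
    {
        List<int> hits = new List<int>();

        // local-name() matches elements regardless of their XML namespace.
        XmlNodeList pages = journalXml.SelectNodes("//*[local-name()='JournalPage']");
        for (int i = 0; i < pages.Count; i++)
        {
            // ".//" also reaches alternates nested inside GroupNode elements.
            XmlNodeList alternates = pages[i].SelectNodes(".//*[local-name()='Alternate']");
            foreach (XmlNode alternate in alternates)
            {
                if (string.Equals(alternate.InnerText.Trim(), term,
                                  StringComparison.OrdinalIgnoreCase))
                {
                    hits.Add(i + 1);
                    break; // One hit per page is enough.
                }
            }
        }
        return hits;
    }
}
```

Because every alternate is compared, not just the top one, a word can still be found when the recognizer's first guess was wrong.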

Now that we have some understanding of the Journal XML, we can begin using it. Instead of manually parsing the XML, I chose to deserialize the Journal XML into objects. To do this, I used the XML Schema Definition Tool (Xsd.exe) available with the .NET Framework SDK to generate C# classes from the Journal XSD schema. These generated classes hold the type information that the XmlSerializer class uses to deserialize a Journal XML file.

The first step is to make two modifications to the XSD schema. From experience, I know that Xsd.exe has problems with the <xs:group/> element. The first problem is with the ContentGroup definition. I added the attributes minOccurs="0" and maxOccurs="unbounded" to every element in that group. Otherwise, the generated code would deserialize only one Drawing element or Paragraph element, instead of an array of Content objects.

The second problem is in the GroupNodeType definition. It has an <xs:element/> for ScalarTransform followed by an <xs:group/> element referencing the ContentGroup. In this situation, Xsd.exe does not generate a collection to loop over. To work around this, I copied the ContentGroup definition and renamed it GroupNodeContentGroup. To this group, I added the ScalarTransform element. I then removed the <xs:element/> for ScalarTransform and changed the <xs:group/> to reference the new GroupNodeContentGroup instead of ContentGroup. Then I ran Xsd.exe to generate the classes to be used by XmlSerializer.

xsd.exe JntSchema.xsd /classes /l:CS /n:Microsoft.Ink

This generated the JntSchema.cs file that I added to my Visual Studio .NET project. With these generated classes, I am finally ready to use XmlSerializer to deserialize a Journal XML file into an object graph.

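// Deserializes a Journal XML file into the classes generated by Xsd.exe.
protected JournalDocumentType DeserializeJournalDocument(string fileName)
{
   XmlSerializer serializer = new XmlSerializer(typeof(JournalDocumentType));
   // serializer_UnknownNode is raised for XML that the generated classes do not map.
   serializer.UnknownNode += new XmlNodeEventHandler(serializer_UnknownNode);
   // Open read-only so that read-only files can be converted; disposing the
   // stream also releases the reader's underlying resources.
   using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
   {
      XmlReader reader = new XmlTextReader(stream);
      return (JournalDocumentType) serializer.Deserialize(reader);
   }
}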
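The reason for the first schema change is easier to see with a small example of the shape we want XmlSerializer to work with: a single array that receives every repeated content element in document order. The following sketch uses hand-written stand-in classes (ContentSketch, ParagraphSketch, DrawingSketch, and ContentSketchReader are hypothetical names), not the classes that Xsd.exe actually generates from the Journal schema, so treat it only as an illustration of the technique.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical, cut-down stand-ins for the generated classes.
[XmlRoot("Content")]
public class ContentSketch
{
    // One array receives every child element, in document order.
    // The element name selects the type of each entry.
    [XmlElement("Paragraph", typeof(ParagraphSketch))]
    [XmlElement("Drawing", typeof(DrawingSketch))]
    public object[] Items;
}

public class ParagraphSketch
{
}

public class DrawingSketch
{
    // XmlSerializer decodes Base64 element content into a byte array.
    [XmlElement(DataType = "base64Binary")]
    public byte[] InkObject;
}

public static class ContentSketchReader
{
    // Deserializes a Content fragment held in a string.
    public static ContentSketch Read(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(ContentSketch));
        using (StringReader reader = new StringReader(xml))
        {
            return (ContentSketch)serializer.Deserialize(reader);
        }
    }
}
```

With classes shaped like this, a Content element holding two Drawing elements and a Paragraph element deserializes to an Items array with three entries, and each entry can be tested with the is operator to decide how to render it.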

Using Journal Types in Your Application

With the Journal XML deserialized into a typed object, we can now use that data in our own Tablet PC applications. For this sample application, I created a Windows Form with four ListBox controls containing pages, contents, lines, and words, a PictureBox control to display images, a RichTextBox control to display text, a Panel control to display ink, and a fifth ListBox control to display alternates when recognizing ink. Then I bound the objects of JournalDocumentType to the form.

Note The serialization objects generally end with the word "Type." For example, the Journal XML root element is named JournalDocument, while its serialization object is named JournalDocumentType.

Remember that in the schema, the JournalDocument element contains JournalPage elements. So I bound each of the JournalPageType objects to the first ListBox. When the user selects a JournalPageType object from the ListBox, the application binds all of the ContentType objects for that page to the second ListBox. These ContentType objects could be TextType, DrawingType, ParagraphType, ImageType, FlagType, or GroupNodeType objects. If the user selected one of the ParagraphType objects, then its LineType objects are bound to the third ListBox. Similarly, when the user selects a LineType object, its InkWordType objects are bound to the fourth ListBox.

Figure 1. Journal Note conversion application

The previous step merely set up the data so that we can easily work with it. Next, I extended the selection events of each ListBox to render the Journal object data in the application. I did this from the bottom up starting with the InkWordType object ListBox. When an InkWordType object is selected, its ink data is loaded into an Ink object as a byte [].

private void RenderWord(InkWordType iwt)
{
   byte [] ba = iwt.InkObject;
   Ink ink = new Ink();
   ink.Load(ba);
   RenderInk(ink, ba);
   RenderAlternates(iwt.AlternateList);
}

That Ink object is then loaded into an InkOverlay object bound to the Panel area. After refreshing the Panel, you can see the ink from the Journal file in the application. Notice how the ink from the Journal document must be scaled to display at its original size in the Panel. This is due to the ink coordinate system and must be taken into consideration when importing the Journal notes, as well as when exporting to another format.

private void RenderInk(Ink ink, byte [] baInk)
{
   inkOverlay.Enabled = false;
   Rectangle rect = ink.GetBoundingBox();
   double adjust = (53d / 50d ) * 2d; //See MSDN Ink.GetBoundingBox docs.
   Point origin = new Point((int)(rect.X * adjust), (int)(rect.Y * adjust));
   Size size = new Size((int)(rect.Width * adjust), (int)(rect.Height * adjust));
   Rectangle adjRect = new Rectangle(origin, size);
   inkOverlay.Ink.AddStrokesAtRectangle(ink.Strokes, adjRect);
   ink.Dispose();
   inkOverlay.Enabled = true;
   panel1.Refresh();
}

Better yet, you can still perform recognition on that ink.

Next, I had the third ListBox render each InkWordType object in turn for the selected LineType object. The second ListBox is more complicated than the third because it holds several different types. For a ParagraphType object, it renders each LineType object in turn as ink. A DrawingType object contains an InkObjectType object that can be loaded into an Ink object and rendered the same way as an InkWordType object. A TextType object is an RTF string (and not ink), so I just display it in the RichTextBox when it is selected.

private void RenderText(TextType tt)
{
   richTextBox1.Rtf = tt.Value;
}

The FlagType and ImageType objects are both images, so you can load their byte [] into a Bitmap and render them however you see fit. This application just displays them in the PictureBox control.

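private void RenderImage(ImageType it)
{
   byte [] baImage = it.Value;
   using (MemoryStream ms = new MemoryStream(baImage))
   using (Bitmap decoded = new Bitmap(ms))
   {
      // GDI+ requires the stream to stay open for the lifetime of a Bitmap
      // created from it, so display a copy that does not depend on the stream.
      pictureBox1.Image = new Bitmap(decoded);
   }
}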

The GroupNodeType object is a little special because it can contain all of the above ContentType objects, but you can step through its collection and render its items using the same methods as I just described. Following this approach, we can recover individual data items that were contained in our Journal notes, either as ink, images, or text. The lack of a complementary Journal Writer supplemental component is a not-so-subtle hint to move away from the Journal format, and in the next section, I will show how to do just that.

Introduction to SVG

I need to convert my Journal notes to a more flexible file format. To make the decision about which format would be most useful, I reflected on some of the constraints I have experienced with ink. One thing that bothers me is that there is no Journal Viewer for the Pocket PC. Another issue I have is that when I render ink on the Web as a raster image, I lose the vector graphic capabilities to scale and zoom.

It just so happens that there is a file format that has the potential to solve both of these problems: Scalable Vector Graphics (SVG). SVG is a W3C Recommendation for representing two-dimensional graphics in XML. Because it is an open standard, SVG viewers exist for many platforms and devices, including the Pocket PC, which solves the first problem. And because it is a vector format, it retains the scaling and zooming capabilities of ink, so it makes perfect sense as a format for rendering ink on the Web.

Converting Journal XML to SVG

An SVG file is just XML, so at a high level, our conversion program will write out an XML document that contains the information from the JournalDocument object, just in a different format. To make things easier, the program does not convert an entire JournalDocument object to a single SVG file. Instead, in this program, when the user selects a JournalPageType object from a JournalDocumentType object, then that page is converted to an SVG file.

To get started, let's map the Journal content to SVG elements. The table below shows how I chose to represent each content type in SVG.

Journal Type	SVG Element
JournalDocument	Not applicable
JournalPage	svg
Stationery	g
Stationery\Background	rect
Stationery\Title\TitleArea	rect
Stationery\LineLayout\Horizontal	line
Stationery\LineLayout\Vertical	line
Stationery\LineLayout\Margin	line
Content	g
Content\Drawing	path
Content\Paragraph	g
Content\Paragraph\Line	g
Content\Paragraph\Line\Word	path
Content\Text	text
Content\Flag	image
Content\Image	image

This mapping covers the majority of Journal XML documents and gives you an almost one-to-one mapping from Journal content to SVG. A JournalPage element becomes an svg element, which is the root of an SVG document. The g element is for grouping, and mainly makes the SVG easier to read. The rect and line elements are self-explanatory. I will explain the other SVG elements used for the conversion a bit later.

Next, I created a class called JntToSvg. It contains the logic to take a JournalDocumentType object and create an SVG XmlDocument object that represents a selected JournalPageType object. Images are copied directly over. So, for example, if the Journal XML looked like the following:

<Image Left="3482" Top="6652" Width="3703" Height="3175">/9j/4A . . .</Image>

Then the SVG representation would be as follows:

<image x="3482" y="6652" width="3703" height="3175" xlink:href="data:;base64,/9j/4A . . ."></image>
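As a rough illustration of this mapping, the following sketch builds an svg root and appends an image element whose xlink:href carries the Base64 data unchanged. It is not the JntToSvg class from the sample. SvgImageSketch, CreatePage, and AppendImage are hypothetical names, and the data URI deliberately mirrors the form shown above, which omits a media type.

```csharp
using System.Globalization;
using System.Xml;

// Hypothetical helper, not the article's JntToSvg class.
public static class SvgImageSketch
{
    private const string SvgNs = "http://www.w3.org/2000/svg";
    private const string XlinkNs = "http://www.w3.org/1999/xlink";

    // Creates an empty SVG document whose root declares the xlink prefix.
    public static XmlDocument CreatePage()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement svg = doc.CreateElement("svg", SvgNs);
        svg.SetAttribute("xmlns:xlink", XlinkNs);
        doc.AppendChild(svg);
        return doc;
    }

    // Appends an image element that copies the Journal position, size,
    // and Base64 payload straight across.
    public static XmlElement AppendImage(XmlDocument doc, int left, int top,
                                         int width, int height, string base64)
    {
        XmlElement image = doc.CreateElement("image", SvgNs);
        image.SetAttribute("x", left.ToString(CultureInfo.InvariantCulture));
        image.SetAttribute("y", top.ToString(CultureInfo.InvariantCulture));
        image.SetAttribute("width", width.ToString(CultureInfo.InvariantCulture));
        image.SetAttribute("height", height.ToString(CultureInfo.InvariantCulture));

        // The href attribute lives in the XLink namespace.
        XmlAttribute href = doc.CreateAttribute("xlink", "href", XlinkNs);
        href.Value = "data:;base64," + base64;
        image.Attributes.Append(href);

        doc.DocumentElement.AppendChild(image);
        return image;
    }
}
```

Calling CreatePage once per Journal page and AppendImage once per Image or Flag element would cover the two image rows of the mapping table. Some viewers may expect a media type in the data URI, so that part might need adjusting.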

Journal Text elements are more complicated because the text is stored as RTF.

<Text Left="9682" Top="6099" Width="3589" Height="3167">
{\rtf1\ansi\ansicpg1252\deff0\deflang1033
{\fonttbl{\f0\fnil\fcharset0 Arial;}}
{\colortbl ;\red0\green0\blue0;}
{\*\generator Msftedit 5.41.15.1507;}
\viewkind4\uc1\pard\cf1\fs24 hello world\par}</Text>

I really was not interested in writing the code to parse that, especially after seeing the Rich Text Format Specification. Doing so would definitely be out of scope for our discussion. To work around the issue and expose the text, font, and color information from that string, I assigned the RTF string to the Rtf property of a Windows Forms RichTextBox control. The RichTextBox.Text property then returns the plain text. Additionally, the SelectionFont and SelectionColor properties return the font and color information respectively.

This solution does not require too much effort, but it has serious limitations. It only returns the font and color information for the first word. If the font or color changes for subsequent words, then that data is ignored. Also, it does not return hyperlinks, which could be represented in SVG as an a element. The SVG for the previous Journal text in RTF looks like this:

<text x="9682" y="6099" stroke="rgb(0,0,0)" font-family="Arial" font-size="420">hello world</text>

The JntToSvg class handles stationery, images, flags, and text. We still need to handle ink.

Converting Ink to SVG

Because it makes sense to render ink in SVG outside of the context of a Journal page, I broke this out to a separate class called InkToSvg. The JntToSvg class calls InkToSvg when it needs to render a Drawing or InkWord element from a Journal page. It would also make sense to use this class to write an SVG file for rendering ink in Internet Explorer. Until Internet Explorer supports SVG natively, you can use the Adobe SVG Viewer 3.0 to view SVG documents. It's what I used for testing.

Let's start by looking at how Journal XML represents an ink Drawing element:

<Drawing Left="3533" Top="17699" Width="1666" Height="2614">
   <InkObject>ALACAT. . .</InkObject>
</Drawing>

The Drawing element contains position and size information, while the InkObject element contains Base64 ink. To use the InkObject element, first call the Ink.Load() method. Then iterate over each Stroke object in the Ink.Strokes collection. The DrawingAttributes property on a Stroke object contains information about color, pen shape and size, and so on. The points that make up the x and y coordinate path of the Stroke are in the BezierPoints property. Those x and y coordinates have to be concatenated into a long string and added to the SVG path element, which can represent ink in this form:

<path stroke-linecap="squared" stroke-linejoin="squared" fill="none" stroke="rgb(0,0,0)" stroke-width="38" d="M 2793 5460 C 2811 5518 2802 5573 . . ." />
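The concatenation step can be sketched without the Tablet PC SDK by treating the control points as plain coordinate arrays. InkPathSketch and ToPathData are hypothetical names, and the sketch assumes the points arrive as one start point followed by three control points per cubic segment. Any trailing points that do not complete a segment are dropped.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not the article's InkToSvg class.
public static class InkPathSketch
{
    // Builds the value of the SVG "d" attribute from Bezier control points.
    // xs[i] and ys[i] are the coordinates of control point i.
    public static string ToPathData(int[] xs, int[] ys)
    {
        if (xs == null || ys == null || xs.Length != ys.Length)
        {
            throw new ArgumentException("xs and ys must have the same length.");
        }
        if (xs.Length == 0)
        {
            return string.Empty;
        }

        StringBuilder d = new StringBuilder();
        // "M" moves to the first point without drawing.
        d.Append("M ").Append(Format(xs[0])).Append(' ').Append(Format(ys[0]));

        // Keep only whole cubic segments: 1 start point + 3 points per segment.
        int usable = 1 + ((xs.Length - 1) / 3) * 3;
        if (usable > 1)
        {
            // A single "C" command accepts any number of coordinate triples.
            d.Append(" C");
            for (int i = 1; i < usable; i++)
            {
                d.Append(' ').Append(Format(xs[i])).Append(' ').Append(Format(ys[i]));
            }
        }
        return d.ToString();
    }

    // Culture-invariant formatting keeps the output stable on any locale.
    private static string Format(int value)
    {
        return value.ToString(CultureInfo.InvariantCulture);
    }
}
```

For example, the made-up points (10, 20), (30, 40), (50, 60), and (70, 80) produce the string "M 10 20 C 30 40 50 60 70 80". In the real conversion, the arrays would be filled from each Stroke object's BezierPoints property.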

The DrawingAttributes property also exposes the RasterOperation property. If it is set to MaskPen, then you know that it represents a transparent highlighter stroke (for example, a yellow highlighter over black text). By adding the XML attribute opacity="0.5", the path element can also represent this feature. Anyway, enough chatter, I’m assuming you want to see some screen shots of the results.

Figure 2 is the original Journal note rendered in the Journal accessory. It demonstrates stationery that looks like ruled paper, handwriting, drawings, grouped ink, highlighted ink words, an embedded image, text, a hyperlink, a flag, and different thicknesses and colors of ink.

Figure 2. The original Journal note

Figure 3 shows the Journal note converted to SVG and rendered in Internet Explorer. You can see that it is almost identical to the original Journal note. Plus, you can now view it from the Web and other platforms. Though the hyperlink text transferred in the conversion, you cannot actually click it and follow it. With a little more work, you could make the link active.

Figure 3. Journal note converted to SVG

Figure 4 is the same SVG file rendered on a Pocket PC using the trial program from PocketSVG.

Figure 4. SVG note on a Pocket PC

Figure 5 shows the power of vector graphics – they allow you to zoom in to read the text on a small device.

Figure 5. SVG note on a Pocket PC with zoom

Figure 6 shows that SVG is also powerful enough to convert Journal notes that have been created with the Journal Note Printer.

Note SVG files converted from Journal notes with Journal Note Printer might not open on a Pocket PC due to the limited resources of the device.

Figure 6. Journal note from Journal Note Printer converted to SVG

Importing Journal Notes to OneNote

SVG took care of the problems I had with rendering ink to raster images to display in Internet Explorer, as well as being able to view my Journal notes on a Pocket PC. But I was also bothered that OneNote does not import Journal notes. Well, it just so happens that OneNote 2003 Service Pack 1 exposes a method for importing pictures, ink, and HTML into OneNote pages from an XML file.

Granted, it's not as simple as it sounds, because the format in which OneNote imports XML is not the same as the Journal XML. But the Import schema for OneNote was made public with the Office 2003: XML Reference Schemas. So, all we have to do is take the Journal XML and transform it into the XML format that OneNote expects.

Note OneNote must be installed for this conversion to work.

Using Xsd.exe once again, I generated classes for the OneNote schema. It took a number of changes to the schema so that Xsd.exe could generate useful classes with it. The modified schema is included with the code that accompanies this article.

Next, I extended the application to traverse the object graph of the JournalDocumentType and populate the appropriate OneNote Import objects. Journal DrawingType and InkWordType objects become the Ink type in OneNote. Journal FlagType and ImageType objects become the Image type in OneNote. And Journal TextType becomes HTML for OneNote. Once the Import objects are populated, we can use XmlSerializer to serialize the Import object to XML. Finally, we can call OneNote to import the data. The code to do this is in the class called JntToOneNote.

The import creates a folder in OneNote called Journal. Each Journal note that you import is added to that folder as a separate file. Unlike the SVG conversion, which writes one page per file, a OneNote file can hold all of the pages of a single Journal note. To import a Journal note, click JNT-ONE in the sample application and select a Journal note. If the import succeeds, a message box is displayed when the process is done. OneNote will then have the Journal folder with a tab named after the imported Journal note (with multiple pages if appropriate).

Figure 7 shows a Journal note that has been imported into OneNote. If you set OneNote to display rules, then it looks very similar to the original Journal note, although certain behavior does not transfer. Multiple pages are represented as tabs to the side of the document. All of the imported ink is initially treated as a drawing. You must select the ink and explicitly tell OneNote to treat it as text. Also, the flags transfer to OneNote as images. They do not operate in the same way as OneNote's flags. Finally, OneNote does not parse the RTF from text elements to properly display the font style and hyperlinks. OneNote expects HTML, so this would involve converting RTF to HTML.

Figure 7. Journal note imported into OneNote

Figure 8 shows the Journal Note Printer file imported into OneNote. Notice how this OneNote file has multiple page elements.

Figure 8. Journal Note Printer file imported into OneNote

Conclusion

The Journal note file format has proven very useful for Tablet PC users. Now the Journal Reader Supplemental Component provides access to the contents of Journal notes so that we can migrate that data to new formats. This article has shown how to use the Journal Reader Supplemental Component, how to import the data into your own application, how to export the data to SVG for viewing on the Web or a Pocket PC, and finally how to import your Journal notes into OneNote.

Biography

casey chesnut is an independent consultant specializing in Seamless Computing (Mobility, Web Services, Speech, and Location). This includes playing with the Compact Framework, WS-*, Tablet PC, Speech SDK, MapPoint, and Artificial Intelligence. His blog and other articles can be found at www.brains-N-brawn.com.