Ink Analysis with the Tablet PC SDK

 

Jamie Wakeam
Microsoft Corporation

May 2003

Applies to:
    Microsoft® Tablet PC Platform SDK

Summary: Learn the benefits of analyzing ink using the Divider APIs in the Tablet PC Platform SDK version 1.5. This paper also outlines some example scenarios where ink analysis is particularly useful. (7 printed pages)

Contents

Introduction
What is Ink Analysis?
Why is Classification Useful?
How is Divider Different from RecognizerContext?
How Well Does the Classification Work?
Conclusion

Introduction

This paper assumes that the reader has a basic familiarity with the Microsoft® Tablet PC Platform SDK APIs. For details about the Tablet PC Platform SDK, see Windows XP Tablet PC Edition.

What is Ink Analysis?

As more applications integrate ink as a primary or alternate form of input, there is an increased need for more powerful tools that help manipulate that ink. In common practice, users tend to author their ink notes all over the page. Although convention dictates that we write between the lines, the truth is that unless we are writing a traditional letter, most of us don't always stay between the lines. We usually write in various sizes, add content at odd angles, use differing line lengths, sketch in quick doodles, and insert drawings, flow charts, and bulleted and numbered lists. To the author and most readers this content is perfectly natural and quite legible. We understand that margin comments are distinct lines not to be mixed in with the main body of a note.

To the computer, however, ink is nothing more than a linear collection of strokes on the screen. By using the Divider object and related APIs found in version 1.5 of the Microsoft Tablet PC Platform SDK, application developers can add clear and useful structure to the ink captured in their programs. This structure is available to the developer as two types of classification: handwriting and drawing.

The handwriting classification is further divided into paragraphs, lines, and segments. In a word-based recognizer, a segment is associated with a word, while in a character-based recognizer, a segment is associated with a character. For example, the Microsoft English (US) Handwriting Recognizer is a word-based recognizer, and the Microsoft Chinese (Traditional) Handwriting Recognizer is a character-based recognizer.

The Divider object analyzes the ink strokes in a given stroke collection and computes a classification for the strokes. To start, all ink strokes are divided only into the two collections of handwriting and drawing. The drawing collection contains any strokes that the Divider has determined to be drawings. All strokes that are not added to the drawing collection are segmented into collections of paragraphs, lines and segments.

As noted earlier, a segment corresponds to a word in a word-based recognizer and to a character in a character-based recognizer. Each handwriting stroke is classified exactly once at each level. That is, once a stroke has been classified as handwriting, it has membership in one paragraph collection, one line collection, and one segment collection.
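The hierarchy described above can be pictured with a small data model. The following Python sketch is not the Tablet PC SDK API: the names Segment, Line, Paragraph, DivisionResult, and membership are hypothetical, and strokes are represented only by made-up integer ids. It illustrates the membership rule, under those assumptions: a handwriting stroke sits in one segment, one line, and one paragraph, while drawing strokes are kept in a separate collection.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    stroke_ids: List[int]      # a word or a character, depending on the recognizer


@dataclass
class Line:
    segments: List[Segment]


@dataclass
class Paragraph:
    lines: List[Line]


@dataclass
class DivisionResult:
    drawing_stroke_ids: List[int] = field(default_factory=list)
    paragraphs: List[Paragraph] = field(default_factory=list)


def membership(result: DivisionResult, stroke_id: int) -> Optional[Tuple[int, int, int]]:
    """Return (paragraph, line, segment) indexes for a handwriting stroke.

    Returns None when the stroke is a drawing stroke or is unknown.
    """
    for p_index, paragraph in enumerate(result.paragraphs):
        for l_index, line in enumerate(paragraph.lines):
            for s_index, segment in enumerate(line.segments):
                if stroke_id in segment.stroke_ids:
                    return (p_index, l_index, s_index)
    return None


# Hypothetical ids: strokes 0-8 form a drawing; 9-32 form two lines of two words.
example = DivisionResult(
    drawing_stroke_ids=list(range(0, 9)),
    paragraphs=[
        Paragraph(lines=[
            Line(segments=[Segment(list(range(9, 16))), Segment(list(range(16, 21)))]),
            Line(segments=[Segment(list(range(21, 28))), Segment(list(range(28, 33)))]),
        ])
    ],
)

print(membership(example, 17))  # paragraph 0, line 0, segment 1
print(membership(example, 3))   # a drawing stroke has no handwriting membership
```

A nested structure like this makes the one-paragraph, one-line, one-segment rule easy to check, because each stroke id appears in exactly one leaf list.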

Figure 1 illustrates a sample handwritten ink note. In this example there are a total of 33 strokes on the page.

Figure 1.

This collection of strokes is broken up into the following classifications.

Collection                Strokes       Figure
Drawing Collection        9 strokes     Figure 2
Paragraph Collection      24 strokes    Figure 3
Line Collection One       12 strokes    Figure 4
Line Collection Two       12 strokes    Figure 5
Word Collection One       7 strokes     Figure 6
Word Collection Two       5 strokes     Figure 7
Word Collection Three     7 strokes     Figure 8
Word Collection Four      5 strokes     Figure 9

The Divider object orders the collections that it returns based primarily on the time at which the strokes were authored. For example, if the user in the previous example had written "Hello Rover" first and then "Hello World", the Divider object would return "Hello Rover" as the first line, even though "Hello World" is spatially higher on the page than "Hello Rover".
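The ordering rule can be illustrated with a short Python sketch. This is a simplification and not the SDK's implementation: the source says only that ordering is based "primarily" on time, so sorting purely by the first stroke's timestamp is an assumption, and the TimedLine type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedLine:
    text: str                 # label used only for illustration
    top_y: float              # vertical position; smaller means higher on the page
    first_stroke_time: float  # seconds since the note was started


def order_by_time(lines: List[TimedLine]) -> List[TimedLine]:
    """Order lines by when they were written, ignoring page position."""
    return sorted(lines, key=lambda line: line.first_stroke_time)


lines = [
    TimedLine("Hello World", top_y=100.0, first_stroke_time=12.0),
    TimedLine("Hello Rover", top_y=180.0, first_stroke_time=3.0),
]

# "Hello Rover" was written first, so it comes first despite being lower.
print([line.text for line in order_by_time(lines)])
```

An application that needs reading order rather than writing order could re-sort the returned lines by position, for example by the top_y value in this sketch.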

Why is Classification Useful?

There may be many scenarios where ink classification is useful in your application; however, there are two key scenarios that are enabled by classification and are worth pointing out explicitly.

Improved Recognition

One of the first things users try after they launch an ink-enabled application is converting their handwriting into text. A recognizer works best when a single, horizontal line of ink is passed in for conversion. Some applications compensate for this by providing a rectangular guide in the input area to indicate where ink should be written. However, free-form ink applications do not have guides of any kind within the inking area. At best, stationery lines may be present, but relying on them contradicts the common intent of allowing users to write freely on the page.

The Divider object aids the free-form scenario by analyzing the ink and classifying the strokes into complete lines. The collections of strokes composing a line can then be separately passed to the recognizer for conversion. There is no need to define horizontal rectangles for input. The application makes line determinations for the user.

The Divider object improves recognition for lines that are written at an angle. Because users are not confined to a particular rectangle for each line, slightly angled, vertical, or even upside-down lines can be written and recognized with the same degree of accuracy as horizontal lines. It is common for users to write comments at 45-degree angles in the margins when annotating a document. When converting these comments to text, you can use the Transform property associated with each line or segment. This property allows the developer to rotate the strokes to a 0° angle, so that they can then be passed to the recognizer for best results.
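The geometry behind rotating an angled line to 0° can be sketched in a few lines of Python. This is only an illustration of the idea, not the SDK's transform: the function names are hypothetical, and the angle is estimated here from the first and last points, whereas in the SDK the rotation is supplied by the property described above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def baseline_angle(points: List[Point]) -> float:
    """Estimate the writing angle in radians from the first and last points."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return math.atan2(y1 - y0, x1 - x0)


def rotate_to_horizontal(points: List[Point]) -> List[Point]:
    """Rotate the points about the first point so the baseline lies at 0 degrees."""
    angle = baseline_angle(points)
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    x0, y0 = points[0]
    rotated = []
    for x, y in points:
        dx, dy = x - x0, y - y0
        rotated.append((x0 + dx * cos_a - dy * sin_a,
                        y0 + dx * sin_a + dy * cos_a))
    return rotated


# A margin comment written along a 45-degree baseline.
margin_comment = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
leveled = rotate_to_horizontal(margin_comment)

# After rotation every point shares the y coordinate of the first point.
print([(round(x, 3), round(y, 3)) for x, y in leveled])
```

The rotated points keep their spacing along the baseline, so the line has the same length and shape and differs only in orientation.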

Furthermore, the application can hold back the strokes that are classified as drawings, to prevent extra characters from being included in the recognition result. For example, circling the phrase "Hello World" can cause the converted text to include an unneeded letter "o" at the end. When only the stroke collection for the line is passed to the recognizer, the correct converted text is obtained and the extra character is not added.
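The effect of holding back drawing strokes can be shown with a toy example in Python. Nothing here is a real recognizer or an SDK call: toy_recognize, TOY_GLYPHS, and the stroke ids are all hypothetical stand-ins, chosen so that the circle around the phrase is misread as the letter "o".

```python
from typing import Callable, Iterable, List, Set


def recognize_handwriting_only(
    stroke_ids: Iterable[int],
    drawing_ids: Set[int],
    recognize: Callable[[List[int]], str],
) -> str:
    """Drop strokes classified as drawings, then recognize what remains."""
    handwriting = [s for s in stroke_ids if s not in drawing_ids]
    return recognize(handwriting)


# Toy stand-in for a recognizer: each hypothetical stroke id maps to fixed text.
# Stroke 3 is the circle drawn around the phrase.
TOY_GLYPHS = {1: "Hello", 2: " World", 3: "o"}


def toy_recognize(stroke_ids: List[int]) -> str:
    return "".join(TOY_GLYPHS[s] for s in stroke_ids)


all_strokes = [1, 2, 3]
print(toy_recognize(all_strokes))                                   # circle included
print(recognize_handwriting_only(all_strokes, {3}, toy_recognize))  # circle held back
```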

Finally, although not necessarily a direct recognition scenario, users often mix handwriting and drawing together. Maps are a common example of this mix. If you use the Divider object to classify these strokes, you can store the recognition results with the ink and search the ink in future sessions.

Improved Selection or Hit Testing

In most free-form inking scenarios, immediate conversion of handwriting to text is generally not required. Recognition usually happens during a later stage in the document's life. Various ink editing operations such as selection, deletion, moving, changing attributes, and others add extra value to ink left as ink within the document. Classification of the ink strokes aids end users in selecting groups of strokes to apply such editing operations.

Selection is perhaps the most obvious editing operation where the grouping information can be put to powerful use. Your application can implement a "tap to select" feature that selects all of the strokes in a given word when only one of those strokes is tapped. This is accomplished by searching for the stroke id tag of the tapped stroke in the collection of strokes associated with each word or segment unit. Once the hit stroke is determined to be in one of the word units, all of the strokes in that unit become the collection of strokes in the selection. This feature can be extended to select an entire line or paragraph. In addition, a deletion feature can also be set to delete entire words, instead of the single stroke, by using a similar technique.
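The "tap to select" lookup described above can be sketched as follows. This Python example is a minimal illustration under stated assumptions, not SDK code: word units are modeled as plain lists of hypothetical stroke ids, the function names are invented for this sketch, and the fallback of selecting a lone stroke when it belongs to no unit is an assumption.

```python
from typing import List


def select_unit_for_tap(units: List[List[int]], tapped_stroke_id: int) -> List[int]:
    """Return every stroke id in the unit that contains the tapped stroke.

    Assumption: a stroke that belongs to no unit (for example, a drawing
    stroke) is selected on its own.
    """
    for unit in units:
        if tapped_stroke_id in unit:
            return list(unit)
    return [tapped_stroke_id]


def delete_unit_for_tap(all_stroke_ids: List[int], units: List[List[int]],
                        tapped_stroke_id: int) -> List[int]:
    """Return the stroke ids that remain after deleting the tapped unit."""
    doomed = set(select_unit_for_tap(units, tapped_stroke_id))
    return [s for s in all_stroke_ids if s not in doomed]


# Hypothetical word units, each a list of stroke ids.
words = [[9, 10, 11], [12, 13], [14, 15, 16, 17]]

print(select_unit_for_tap(words, 13))                       # the whole second word
print(delete_unit_for_tap(list(range(9, 18)), words, 13))   # everything but that word
```

Passing line or paragraph units instead of word units extends the same lookup to whole-line or whole-paragraph selection.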

How is Divider Different from RecognizerContext?

Users of the Tablet PC Platform SDK version 1.1 may wonder how the Divider object differs from the RecognizerContext object. The Divider object is intended to help normalize the ink as a preprocessing step before either recognizing the text or performing a user editing function that takes advantage of the stroke groupings. If you assign a recognizer context to the Divider object, the Divider object internally calls the recognizer to determine the word or character segments based on the context of the language, but only the top recognition result for each word or character is returned. Applications that require full recognition results need to pass the lines to the recognizer directly by using the RecognizerContext object.

Systems that do not have recognizers installed can still use the Divider object; however, the segmentation grouping may be less accurate, especially for character-based recognizers. Thus, the Divider object is similar to the RecognizerContext object in that it makes calls to the recognizers, but there are some key distinctions worth mentioning. The following comparison illustrates some of the similarities and explains the distinctions.

Feature: Single line recognition
    RecognizerContext object: Returns a full RecognitionResult object containing the full lattice for the line. Developers needing to perform advanced recognition operations such as exposing recognition alternates should make calls directly to the RecognizerContext object.
    InkDivider object: Returns only the top alternate for each segment. These results are also concatenated together and exposed as the top alternate for the line and paragraph. It is exposed through the RecognitionString property on each DivisionUnit object.

Feature: Multiple line recognition
    RecognizerContext object: Multiple line input can be passed into the RecognizerContext object; however, it is strongly recommended that only a few lines be passed in for best results.
    InkDivider object: Multiple line input can be passed in. There is no limit to the number of lines passed in.

Feature: Angled line recognition
    RecognizerContext object: Angled lines are limited to plus or minus 15 degrees.
    InkDivider object: Lines at any angle may be passed in.

Feature: Asynchronous recognition
    RecognizerContext object: Does not natively support recognizing strokes as they are authored.
    InkDivider object: Can continually analyze the ink as it is being written. This potentially provides for speedy results but does vary with the types of content authored.

Feature: Recognition alternates
    RecognizerContext object: Returns the full recognition results object.
    InkDivider object: Only returns the top alternate for each segment. Lines and paragraphs contain a concatenation of the top alternate for each word in their collections.

Feature: Recognition of only handwriting
    RecognizerContext object: Returns recognition results for all ink passed in. Depending on the amount of ink this can become a time-intensive task.
    InkDivider object: Only recognizes the collection of handwriting strokes and not the drawing strokes. Depending on the ink written this may save valuable time by not unnecessarily recognizing drawing strokes.
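The comparison above notes that line and paragraph results are a concatenation of the top alternate for each segment. The following Python sketch illustrates that idea only; it does not call the SDK. The function names are hypothetical, alternates are modeled as lists of strings ordered best first, and the separators are assumptions (the source does not say how segments are joined, and a character-based recognizer would presumably join without spaces).

```python
from typing import List


def top_alternate(alternates: List[str]) -> str:
    """Pick the best candidate; alternates are assumed to be ordered best first."""
    return alternates[0]


def line_string(segment_alternates: List[List[str]], separator: str = " ") -> str:
    """Join the top alternate of every segment in one line."""
    return separator.join(top_alternate(a) for a in segment_alternates)


def paragraph_string(lines: List[List[List[str]]], line_separator: str = "\n") -> str:
    """Join the per-line strings of one paragraph."""
    return line_separator.join(line_string(line) for line in lines)


# Hypothetical alternates for two lines of two words each.
paragraph = [
    [["Hello", "Hallo", "Hells"], ["World", "Would"]],
    [["Hello", "Hollo"], ["Rover", "Rower"]],
]

print(line_string(paragraph[0]))
print(paragraph_string(paragraph))
```

The lower-ranked candidates such as "Hallo" or "Rower" are discarded in this model, which is why an application that wants to offer alternates has to go to the recognizer directly.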

How Well Does the Classification Work?

In general, ink classification works well; however, there are some situations where the analysis of the ink makes mistakes. General free-form notes, where the majority of the ink is written in a normal paragraph style, are classified quite well. More random whiteboard-type scenarios, where the ink noise increases, do not yield perfect results. You should consider which types of scenarios your applications apply to and adjust the use of the Divider object accordingly.

Conclusion

Perhaps the most apparent benefit of the Divider object is its use as a preprocessing step to normalize ink before that ink is converted to text. Other capabilities, such as easier selection and asynchronous recognition, are powerful aids that become less time-consuming to implement with the Divider object. For further details about the Divider object and its uses, see About Ink Analysis with the Divider Object and the Divider Sample in the Tablet PC Platform SDK version 1.5.