Friday, August 31, 2007

Paper #3 - Specifying Gestures By Example

Paper:
Specifying Gestures By Example
(Dean Rubine)

Summary:
A gesture-based drawing program (GDP) is an example of a gesture-based application: the user draws a gesture, and the program classifies it and acts accordingly. Input begins when the user positions the cursor with a mouse press and draws the gesture, and ends when the user either releases the mouse button or stops moving the mouse while the button is still held. Gestures are designed by adding them, through a click-and-drag interface, to an application; both the interface and the application are built with the GRANDMA object-oriented toolkit. A GDP's output is handled by its View class hierarchy, which divides into two classes: instances of GdpTopView represent the window in which the GDP runs, and instances of GraphicObjectView represent sets of lines, rectangles, ellipses, text objects, or combinations of these. Editing a gesture's attributes consists of entering semantic expressions for its three semantic components: 1) recog (evaluated at gesture recognition), 2) manip (evaluated on each subsequent mouse point), and 3) done (evaluated when the mouse button is released).
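To make the recog/manip/done lifecycle concrete, here is a minimal Python sketch; the class and handler names are my own invention for illustration, not GRANDMA's actual API:

    class GestureSemantics:
        """Bundles the three semantic expressions attached to a gesture class."""
        def __init__(self, recog, manip, done):
            self.recog = recog   # evaluated when the gesture is recognized
            self.manip = manip   # evaluated on each subsequent mouse point
            self.done = done     # evaluated when the mouse button is released

    # Hypothetical semantics for a "line" gesture in a drawing program.
    line = GestureSemantics(
        recog=lambda pts: print("start a line at", pts[0]),
        manip=lambda pt: print("rubber-band the endpoint to", pt),
        done=lambda: print("commit the line"),
    )

    def handle_gesture(collected, subsequent, semantics):
        # collected: mouse points gathered before classification
        # subsequent: points that arrive while the button is still held
        semantics.recog(collected)
        for pt in subsequent:
            semantics.manip(pt)
        semantics.done()

    handle_gesture([(0, 0), (5, 2)], [(8, 3), (12, 5)], line)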

The gesture recognition problem begins with each gesture class being specified by example gestures, where there are C′ gesture classes labeled 0 through C′ − 1. An input gesture is classified as the gesture class it most closely resembles. Statistical gesture recognition consists of computing a vector of features from the input gesture and then classifying it as one of the C′ possible classes. This paper uses thirteen geometric and algebraic features. Ambiguous gestures, i.e., input gestures that score similarly on more than one gesture class, are generally rejected.
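As a rough illustration of the statistical scheme, the sketch below computes a handful of the thirteen features and evaluates a per-class linear discriminant v_c = w_c0 + Σ w_ci·f_i, picking the class with the highest score. The feature subset and the already-trained weights are my simplifications; the paper derives the weights from per-class statistics of the example gestures:

    import math

    def features(points):
        # A subset of Rubine's thirteen features; points is [(x, y), ...].
        # (The full set also uses timestamps, for speed and duration.)
        (x0, y0), (x2, y2) = points[0], points[2]
        d = math.hypot(x2 - x0, y2 - y0) or 1.0
        f1 = (x2 - x0) / d                                # cos of initial angle
        f2 = (y2 - y0) / d                                # sin of initial angle
        xs, ys = zip(*points)
        f3 = math.hypot(max(xs) - min(xs), max(ys) - min(ys))    # bbox diagonal
        f5 = math.hypot(points[-1][0] - x0, points[-1][1] - y0)  # end-to-end distance
        f8 = sum(math.hypot(xb - xa, yb - ya)                    # total path length
                 for (xa, ya), (xb, yb) in zip(points, points[1:]))
        return [f1, f2, f3, f5, f8]

    def classify(points, weights):
        # weights: {class_name: [w0, w1, ..., wF]}, assumed already trained.
        # Linear evaluation v_c = w_c0 + sum_i(w_ci * f_i); take the argmax.
        f = features(points)
        scores = {c: w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))
                  for c, w in weights.items()}
        return max(scores, key=scores.get), scores

    example_weights = {'line':   [0.0,  1.0, 0.5, 0.01, 0.02, 0.01],
                       'delete': [0.0, -1.0, 0.3, 0.02, 0.01, 0.03]}
    print(classify([(0, 0), (1, 1), (2, 2), (5, 5)], example_weights))

A full recognizer would also estimate how likely the top class is to be correct and reject the gesture when a runner-up scores nearly as high, which is how ambiguous inputs get filtered out.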

In the evaluation of GDP performance, a 98% classification rate was achieved for classifiers trained with 15 or more examples per gesture class on sets of 15 or fewer classes. Fewer examples dropped recognition to 96%. Some GDP versions employed two recognition extensions. One is eager recognition, in which the GDP recognizes a gesture while it is still being drawn. The other is multi-finger recognition, in which gestures drawn with multiple fingers are treated as multi-path data; the single-stroke algorithm is applied to each path individually, and the results are combined to classify the multi-path gesture.
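The combination step leaves room for interpretation; one plausible (and much simplified) version, assuming a single-stroke classifier like the sketch above and a hypothetical lookup table, is:

    def classify_multipath(paths, classify_path, combination_table):
        # classify_path: a single-stroke classifier returning a class name.
        # combination_table: hypothetical, e.g. {('line', 'line'): 'two-finger-pan'}.
        per_path = tuple(classify_path(p) for p in paths)
        return combination_table.get(per_path)  # None -> unrecognized combination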

Discussion:
Mapping an input that a user has drawn to a member of a set of examples is no easy task, yet the GDP in Rubine's paper takes a novel approach by requiring that the input be drawn in a single stroke. Classifying multi-stroke gestures compounds the problem, so restricting the user to a single stroke does away with those complexities. The significance of the GDP is that it makes this simplification while maintaining a high rate of accuracy.

One fault that I see in the approach demonstrated in the Rubine paper is that it forces the user to conform to a certain style instead of having the program do the conforming. Compressing naturally multi-stroke symbols into one stroke feels unnatural to the user. Limiting gestures to a single stroke can also cause clashes between gestures, so that an input gesture absent from the example set is misclassified as one that is present. For example, if the mathematical equals sign is drawn like the letter Z to turn it into a one-stroke gesture, problems arise when the user wants a Z drawn the same way to be classified as the letter Z. This is analogous to taking the absolute value of an integer or converting a real number to an integer: information is lost in the mapping.

A natural evolution of the approach in the Rubine paper would be to incorporate multi-stroke gestures into the classification process. But the Rubine method seems to have restricted itself to single-stroke gestures precisely to avoid, as stated earlier, the problems inherent in multi-stroke gestures. Expanding the Rubine method to classify such gestures seems to be no better than not having the Rubine method at all. Perhaps the ideas of the multi-finger recognition extension covered near the end of Rubine's paper could be exploited to classify the more complex multi-stroke gestures.

1 comment:

Grandmaster Mash said...

One issue with multi-stroke gestures is how do you know that the gesture is finished? You could draw a vertical straight line that could be a '1', or it could be the first part of a '4', '7', '6' or '9'.

It's a significant user interface issue to decide when the user is done drawing. Gesture systems are supposed to be simple and intuitive and allow a user to not press a button to accomplish an action. But if we need to be explicitly told when the gesture ends, should we force the user to press a button? Do we pause?

Or do we force the user to draw the vertical line in a '4' going from bottom-up and a '1' top-down? And how is that any better than forcing the user to draw a '4' in one stroke?

Just some things to think about.