Sunday, November 25, 2007

Paper #22 - Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes

Paper:
Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes
(Jacob O. Wobbrock, Andrew D. Wilson, Yang Li)

Summary:
The authors created an easy, cheap, and highly portable gesture recognizer called the $1 Gesture Recognizer. The algorithm requires only about one hundred lines of code and uses only basic geometry and trigonometry. The algorithm's contributions include being easy for novice user interface prototypers to implement, serving as a measuring stick against more advanced algorithms, and giving insight into which gestures are "best" for people and computer systems. Challenges for gesture recognizers in general include being resilient to sampling variations, supporting optional and configurable rotation, scale, and position invariance, requiring no advanced mathematical techniques, being easy to write in a few lines of code, being teachable with a single example, returning an N-best list with sensible scores independent of the number of points, and providing recognition rates competitive with more advanced algorithms.

$1 copes with those challenges in a four-step algorithm: 1) resample the stroke to N points, where 32 <= N <= 256, 2) rotate once based on the indicative angle, the angle formed between the gesture's centroid and its starting point, so that this angle is at 0 degrees, 3) scale non-uniformly to a reference square and translate so the centroid sits at the origin, and 4) recognize by finding, for each template, the rotation angle that yields the best score. Analyzing rotation invariance shows that there is no guarantee the candidate points and template points will optimally align after rotating the indicative angle to 0 degrees, so $1 uses a Golden Section Search (GSS), which repeatedly narrows the search interval using the Golden Ratio, to converge on the optimal angle.
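To make the four steps concrete, here is a minimal Python sketch following the paper's published pseudocode. The constant SQUARE_SIZE and helper names like make_template and recognize are my own choices rather than the paper's, and error handling (empty strokes, degenerate gestures) is omitted:

```python
import math

N = 64              # resampled point count (paper suggests 32 <= N <= 256)
SQUARE_SIZE = 250.0 # side of the reference square; name is my own choice

def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def centroid(pts):
    xs, ys = zip(*pts)
    return sum(xs) / len(pts), sum(ys) / len(pts)

def resample(pts, n=N):
    """Step 1: resample the stroke into n equidistantly spaced points."""
    interval = path_length(pts) / (n - 1)
    D, out, pts = 0.0, [pts[0]], list(pts)
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if D + d >= interval:
            t = (interval - D) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            D = 0.0
        else:
            D += d
        i += 1
    if len(out) < n:          # rounding can leave us one point short
        out.append(pts[-1])
    return out

def rotate_by(pts, angle):
    cx, cy = centroid(pts)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [((x - cx) * cos_a - (y - cy) * sin_a + cx,
             (x - cx) * sin_a + (y - cy) * cos_a + cy) for x, y in pts]

def rotate_to_zero(pts):
    """Step 2: rotate so the indicative angle (centroid -> first point) is 0."""
    cx, cy = centroid(pts)
    return rotate_by(pts, -math.atan2(pts[0][1] - cy, pts[0][0] - cx))

def scale_and_translate(pts):
    """Step 3: scale non-uniformly to a square, move centroid to the origin.
    A 1D gesture (pure horizontal/vertical line) gives w or h of zero,
    which is exactly the non-uniform-scaling limitation the paper notes."""
    xs, ys = zip(*pts)
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    pts = [(x * SQUARE_SIZE / w, y * SQUARE_SIZE / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    """Average point-to-point distance; dividing by the point count keeps
    scores independent of the number of points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def best_angle_distance(pts, tmpl, a=-math.pi / 4, b=math.pi / 4,
                        threshold=math.radians(2.0)):
    """Step 4: Golden Section Search over rotation for the best alignment."""
    phi = (math.sqrt(5) - 1) / 2  # golden ratio
    x1, x2 = phi * a + (1 - phi) * b, (1 - phi) * a + phi * b
    f1, f2 = (path_distance(rotate_by(pts, x), tmpl) for x in (x1, x2))
    while abs(b - a) > threshold:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = phi * a + (1 - phi) * b
            f1 = path_distance(rotate_by(pts, x1), tmpl)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = (1 - phi) * a + phi * b
            f2 = path_distance(rotate_by(pts, x2), tmpl)
    return min(f1, f2)

def make_template(name, raw_points):
    """Templates go through the same steps 1-3 as candidates."""
    return name, scale_and_translate(rotate_to_zero(resample(raw_points)))

def recognize(raw_points, templates):
    """Return (name, score) of the closest template; score falls in [0, 1]."""
    cand = scale_and_translate(rotate_to_zero(resample(raw_points)))
    name, d = min(((n, best_angle_distance(cand, t)) for n, t in templates),
                  key=lambda nd: nd[1])
    half_diag = 0.5 * math.sqrt(2) * SQUARE_SIZE
    return name, 1.0 - d / half_diag
```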

Limitations of $1 include being unable to distinguish gestures whose identities depend on specific orientations, aspect ratios, or locations; distorting horizontal and vertical lines through non-uniform scaling; and being unable to differentiate gestures by speed, since it does not use time. To handle variation with $1, several templates can be registered under a single name, each capturing one variation (see the usage sketch below). A study compared $1 with a modified Rubine classifier and a Dynamic Time Warping (DTW) template matcher: $1 and DTW were more accurate than Rubine, and $1 and Rubine executed faster than DTW.
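As a usage sketch, building on the hypothetical make_template and recognize helpers above and with placeholder point lists standing in for captured strokes, capturing variation just means registering several example strokes under the same name:

```python
# Hypothetical strokes: each is a list of (x, y) points captured from the user.
templates = [
    make_template("arrow", arrow_left_to_right),
    make_template("arrow", arrow_right_to_left),  # same name, new variation
    make_template("x", x_drawn_one_way),
    make_template("x", x_drawn_other_way),
]
name, score = recognize(candidate_stroke, templates)
```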

Discussion:
What I find most interesting about this paper is the creation of a simple algorithm for a difficult problem like sketch recognition. I agree with the authors' point that implementing a gesture recognizer should be a surmountable challenge at the undergraduate level. The upside of the algorithm's simplicity is also its downside. As the authors note, $1 is severely limited by this simplicity, and handling more complex gestures would require serious additions, such as computing features and coding more complex math, to the point that the algorithm is no longer $1. Despite these shortcomings, I do think this algorithm would make an excellent "Hello, world!" introduction for aspiring sketch recognition students. At the least, having students implement $1 and then Rubine should give them the incentive to go from Rubine to Sezgin. Hopefully.

2 comments:

Grandmaster Mash said...

I like its simplicity, but it can also mislead by introducing students to sketch recognition and saying, "Hey, this is easy!"

The good thing about Rubine is that it is a very simple introduction to the wide world of machine learning classification techniques, and the concepts in Rubine's method reappear in more complicated techniques such as SVMs.

The concept behind $1 has little room for growth. There are more complex bitmap recognizers, and there are more ways to manipulate the stroke to generate new mappings, but the overall technique is not robust enough to be used outside of simple symbols.

- D said...

But at the same time, read the title of the paper again: user interface prototypes. This isn't something you're going to use when you need 95% accuracy. This is something to whip together to impress the boss/client an hour before demo time, or something to fall back on in your class project when all the available handwriting recognizers fail to build on your machine. Booo!