Sunday, September 9, 2007

Paper #6 - Sketch Based Interfaces: Early Processing for Sketch Understanding

Paper:
Sketch Based Interfaces: Early Processing for Sketch Understanding
(Tevfik Metin Sezgin, et al)

Summary:
The system described in the Sezgin paper, called the Sezgin system from now on, is one which focuses on the task of computers understanding sketches. Unlike image analysis and graphics recognition, sketch understanding poses the problem of recognizing noisy, incomplete sketches in real-time. The approach in the Sezgin system uses single stroke gestures as input, and then does early sketch processing in three phases: approximation (fitting primitive geometries of lines and curves to a set of pixels), beautification (making the approximation more appealing without changing meaning), and basic recognition.

In the first step (stroke approximation), the Sezgin system detects vertices at endpoints of the linear segments, and then detects and characterizes the curved segments. The approach detects endpoints from extrema in the curvature (maxima of change in direction with respect to arc length) and speed (minima of pen’s speed while turning corners) of the stroke to locate these vertices. Due to noise in the stroke giving false positives of the extrema, average-based filtering is used by filtering out extrema below a certain threshold (mean of the curvature data, 90% mean of the speed data). In other words, extrema where curvature’s already high and stroke speed is already low is selected. Hybrid fits, which are sets of candidate vertices, are generated from both the curvature and speed data, and the best hybrid fit is used in determining those vertices. In the second step (beautification), minor adjustments are made to the previous approximation to give the intended look. This is done by adjusting the line segments’ slopes so that lines with the same slope end up being parallel. To do so, the system first looks for a cluster of slopes from the final hybrid fit. The weighted average of those slopes in the cluster is then used to rotate the lines at their midpoints. Finally, the intersections of those lines become the new endpoints, which may result in vertices moving. In the last step (basic object recognition), the most basic geometric objects are built from the simple properties examined by the beautified sketch.

A user study was conducted to measure the system’s ease of use and accuracy of shape recognition. With the users familiarizing themselves with the system and then drawing ten shapes into it, the system had a 96% recognition accuracy and only one user did not prefer the system. The authors believe that a sketching system should have: the ability to draw arbitrary single-stroke objects, automatic feature point detection, one sketching mode to draw objects, and natural user feel, properties they believe lacking in other current freehand sketching systems.

Discussion:
The algorithm used in the Rubine paper takes into account speed data, which was abandoned in the subsequent Long paper and was also not used at all in the Paulson paper. In the Sezgin paper, speed data is embraced once more as a much more significant metric than in the previous three papers. It’s because of this that I found the contrast with the importance of time data quite interesting between the Sezgin paper and the papers by Long and Paulson. After reading the Long and Paulson papers, I wasn’t convinced that time was a completely useless metric, even though it made sense to not incorporate in the techniques used in their papers. Sezgin’s paper (so far) solidified my point that time data does have its uses.

I still have an issue with the authors point of describing a sketch system that should have the property of feeling natural to the user, due to the fact that the Sezgin system takes only single strokes as input. Obviously, it’s not possible to sketch numerous objects naturally with just single strokes. I’m guessing though that those properties were qualified in the minds of those authors, since the domain in their system consisted of just simple geometric objects.

4 comments:

Brian David Eoff said...

I had this belief that time data could be used for something, so I bucked when Long threw it out. It made sense there though, as Brandon said, people sketch differently so averaging those speeds across training data would have been pointless. Sezgin's idea though makes sense because it is looking for a change of speed, and isn't making comparisons to others.

On the multiple strokes, if they can recognize a single stroke, and have a definition for complex objects (square - four lines, similar length) aren't they moving towards multiple strokes? If they can recognize low level strokes, and combine them intelligently due to proximity....how much farther do they need to go to be a multi-stroke recognizer?

rg said...

I think there are some domains that lend themselves to single strokes, but generally agree that it is less common. I think in that case, you have to appreciate where they're coming from. At least they aren't forcing you to draw it in a certain way. Besides, at the basic recognition level they do see multiple stroke things as one object like the gears. Who really wants to draw multiple stroke circles and squares?

Test said...

You all have really good points. By finding corners, you can find higher level objects formed from multiple strokes. However, one question that is still being looked at is - how do you combine strokes to form a single low level object. This are some algorithms for this, which we will talk about later.

Grandmaster Mash said...

I typically draw large circles with many, shorter dashes. The dashes themselves might not have much curvature, which could definitely hurt any algorithms that are trying to form my group of strokes into a circle.

And I agree with you, Paul. Sketching with a single stroke isn't very natural. But when corner finding is combined with geometric recognizers, it is natural since a square can now be composed of 1, 2, 3 or 4 strokes (assuming we're not looking into stroke merging). What the user draws is broken up into four separate lines, which the geometric recognizer can use to determine that the user drew a square.