Sunday, September 16, 2007

Paper #7 - A Domain-Independent System for Sketch Recognition

Paper:
A Domain-Independent System for Sketch Recognition
(Bo Yu and Shijie Cai)

Summary:
According to the paper, unnaturally strict restrictions on drawing manner and lack of ability to analyze curves into primitive shapes are two main deficiencies in existing methods of sketch recognition systems. Their solution is a domain-independent system to recognize freehand sketches as primitive shapes and objects. The system’s process is divided into two stages: imprecise stroke approximation (given a stroke, approximate it to one of several predefined shapes) and post-process (analyze finished drawing, retrieve shape relations, and represent user’s intent).

In the first stage, values from direction, curvature, and feature area (a stroke’s constituent points in relation to a line, point, and arc) are extracted from a stroke. Vertex approximation takes these extracted visual features on the given stroke to check for possible shape approximation. If it can’t, the stroke is split at the highest curvature peak and approximated again. Line segment approximation is done by trying to fit a horizontal line to both the direction graph and the actual coordinates with a least-squares best fit line. In curve approximation, the system attempts to approximate a line to the direction values in order to approximate. If it can and the lengths add up to 2*pi, a circle is approximated with its center chosen as the middle of the candidate’s bounding box. If the lengths are less than 2*pi, then an algorithm for arc recognition is used instead. The special case involves strokes with self-intersections, which will fail to be recognized by the system using the approximation methods above. Instead, stroke approximation is modified by making two copies of that case, applying different strategies on those copies, and choosing the approximation with the better result.

The second stage takes the primitive shape from the first stage as input into three steps. In simple relation retrieval, the system tries to form various relations based on the shape. In cleanup, three rules are applied on the shape to remove meaningless line segments and arcs that cause abnormal curvature changes. In basic object recognition, the system processes the shape from the cleanup stage into basic geometric objects without relying on domain-specific knowledge.

In addition to the recognition system, the writers employed a powerful user interface to facilitate user input through the use of various command symbols that uses sketch recognition as well for creating, modifying, and deleting primitive shapes. When a user study of ten students was conducted on the system, primitive shape and polylines yielded 98% accuracy, arcs yielded 94% accuracy, hybrid shapes which had lines and arcs smoothly connected yielded 70% accuracy, and other types of hybrid shapes yielded 90%.

Discussion:
Based on the tone of the Yu paper, it seemed like it basically stated why the Sezgin paper is flawed and how their system fixed those flaws. From the integration of vertex detection, the handling of curve detection, and the omission of speed data, the Yu paper pretty much took the concepts that had potential in the Sezgin paper and then tried to solve the same problem by starting all over again. I might be oversimplifying it a bit, but the Yu paper basically said that it can do vertex detection just as well without the use of speed data. That's pretty bold, considering that speed data was the foundation of vertex finding in the Sezgin paper. In the end though, the problems that were discussed in the Sezgin paper did seem to have been resolved (to an extent) in the Yu paper. Bravo.

One of two interesting aspects I saw in this paper is its domain-independent approach of approximating curves. Approximating circles were an especially huge problem in the Sezgin paper, and with the approach Yu used by using the direction graph to approximate a circle is quite elegant. Using the direction graph to try to fit a line of length 2*pi in order to classify the shape as a circle and having the center as the center of the bounding box makes it seem like circle approximation is a trivial problem.

The other aspect I found interesting was how the system’s handling of overtracing was better explained in this paper compared to the Sezgin one. The issue I had with the Sezgin paper was how it would handle the difference between a circle that was overtraced and a spiral. Once again, the Yu paper had a simple yet effective solution: classify a stroke of circles with similar size as a circle of mean size, and classify a stroke with circles of ascending or descending radii as a spiral. Simple solutions to difficult problems never stop being amazing.

3 comments:

rg said...

I think the most difficult problems are yet to be solved. 70% on smooth transitioning hybrid shapes is not nearly high enough. While the paper is honest and presents a very strong set of heuristics for dealing with different shapes, there is a long road ahead for the serious practical use they describe.

Grandmaster Mash said...

I'd think spirals would still be awkward. As mentioned in class, the circles would be oddly shaped. And in beautifying the spiral with semicircles and parts of arcs might make the spiral look really choppy.

- D said...

Sure, circle/arc approximation may be relatively simple in terms of algorithmic complexity, but what if the arc is skewed and not circular?

Oops.

Simple solutions to difficult problems are more than likely approximations. The more simple, the more approximate. (of course this statement is insanely oversimplified and generalized)