Wednesday, November 7, 2007

Paper #21 - SketchREAD: A Multi-Domain Sketch Recognition Engine

Paper:
SketchREAD: A Multi-Domain Sketch Recognition Engine
(Christine Alvarado and Randall Davis)

Summary:
The author proposes the SketchREAD system, a multi-domain sketch recognition engine capable of interpreting free-hand sketches. Four challenges are introduced in sketch understanding: 1) incremental sketch recognition, which is the ability for autonomously seamless computer recognition, 2) handling of messy strokes, 3) segmenting of strokes, and handling of shape variations.

The SketchREAD framework was created to deal with these challenges. In knowledge representation, the domain’s shapes are described in a hierarchical shape description language (probably LADDER), where context is given by higher-level shapes and domain patterns. An overview of the recognition involves parsing the strokes in a visual language. Possible interpretations of the strokes are parsed in a two-stage generate and test recognition process. The algorithm involves a bottom-up approach of generating the most likely interpretation, and a top-down approach of seeking out missing parts in the interpretation. Finding possible interpretations involve evaluating partial hypotheses. A current set of hypotheses is evaluated in a dynamic Bayesian network, which is a framework to reason with uncertainty from multiple sources. Since each interpretation’s probability is affected by stroke data (from children nodes) and context (from parent nodes), the system can cope with noise. Thus, the system can assess the likelihood of incomplete interpretations from prior information, because partial interpretations have probabilities. A major challenge is balancing correctness and a manageable number of interpretations. A hypothesis generation algorithm is used to compensate: the bottom-up step parses strokes into primitive objects as the user sketches, the top-down step tries to find subshapes missing from the partial interpretation, and the pruning step removes unlikely interpretations. In selecting an interpretation, the system uses a greedy algorithm.

The SketchREAD system was tested on sketching two domains: family trees and circuit diagrams. On average, the system correctly identified 77% of the symbols in the family tree examples (54% over baseline interpretation) and 62% of the symbols in the circuit diagram examples (17% reduction over baseline interpretation). According to the author, the system is best utilized in simple domains where the shapes are not too similar and distanced far enough, but suffers in accuracy in most cases especially for complicated sketches and domains.

Discussion:
One of the important steps in the sketch recognition process is the ability to generate possible interpretations of a sketch. I found the approach used in the paper quite interesting by resorting to a modified Bayesian network to evaluate the current possible interpretations. Usually, Bayesian networks aren’t used in a dynamic domain such as sketch recognition, since such a network works on a priori data. Thus, the interesting part in the approach was how the author modified it to take in new stroke input by creating a sort of library to describe the shapes and domain patterns of new input. The faults in the paper remain consistent to the disadvantages introduced in the second Oltmans paper about stroke-based recognition approaches, and those faults were also brought up by the author in the results being low when used in most general cases. Besides the preprocessing step used in the paper to make improvements to the system, I think one improvement would be to employ some more specific context recognition, since it might remedy the weaknesses generally inherent in stroke-based approaches.

One thing I did want to mention was during my first two readings of this paper. In my first reading, the shape descriptions looked a lot like LADDER. On my second reading, I read the citation on the part where the author states using a hierarchical shape description language to describe those shapes. It cites Dr. Hammond’s LADDER paper. I was a bit surprised the paper didn’t explicitly state the language name in the paper.

1 comment:

- D said...

Again, I think this is a case of one specific approach just not being able to stand up on its own. I think DBNs are infantile right now. Really it seems Sezgin and Alvarado are among the first. They will improve as they mature. But using strictly temporal information will not be a magic bullet. Diversify.