Wednesday, October 31, 2007

Paper #18 - Constellation Models for Sketch Recognition

Paper:
Constellation Models for Sketch Recognition
(D. Sharon and M. van de Panne)

Summary:

The problem the authors wish to tackle is having a system figure out what the user drew and which object part each stroke corresponds to, using techniques that have strong parallels to single-image interpretation in computer vision. The general approach models a sketch's strokes as a constellation, or 'pictorial structure', to be recognized, based on both local and pairwise features. The approach's contributions include handling wide variation among sketched objects and allowing the number of parts in a sketched object to vary. The assumptions are that similar parts are drawn with similar strokes, and that each mandatory object part has only one instance. Within those assumptions, the work makes it possible for systems to understand sketches with a known stroke structure but unknown stroke ordering.

A constellation (or pictorial structure) model, the kind used in this paper, is generally composed of local parts plus a geometry model that defines the relative locations and distances of those parts. In this paper, each sketch object is modeled as one such constellation, whose features are computed over individual parts and over pairs of parts. Each part label is classified as either mandatory or optional, and pairwise features are computed only for strokes carrying mandatory labels, which keeps computation tractable as sketches scale up. Recognition then searches the space of assignments of mandatory labels to strokes, and afterwards finds optional labels for the remaining unlabeled strokes. Assigning a label to a stroke is scored with a cost function measuring the quality of the match, and the likelihood of a complete labeling is the product of two products: one over the likelihoods of individual strokes, and one over the likelihoods of stroke pairs. A maximum-likelihood (ML) search finds the best labeling over all strokes using a branch-and-bound tree: at each step, the search branches on the best node and bounds all the others. To make the search faster, multi-pass thresholding and hard constraints are used.
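To make the search concrete, here is a toy sketch of a branch-and-bound maximum-likelihood labeling in that spirit: mandatory part labels are assigned to distinct strokes so as to maximize the product of unary (per-stroke) and pairwise likelihoods. All part names, likelihood values, and the pairwise geometry term below are invented for illustration; the actual paper learns these likelihoods from trained models over stroke features, which this sketch omits.

```python
# Toy branch-and-bound ML search over stroke labelings (illustrative only).
# Score of a labeling = product of unary likelihoods * product of pairwise
# likelihoods, maximized by best-first search with an optimistic bound.
import heapq
import itertools
import math

# Hypothetical unary likelihoods: P(stroke s plays mandatory part p).
UNARY = {
    ("head", 0): 0.9, ("head", 1): 0.1, ("head", 2): 0.2,
    ("body", 0): 0.1, ("body", 1): 0.8, ("body", 2): 0.3,
    ("tail", 0): 0.1, ("tail", 1): 0.2, ("tail", 2): 0.7,
}

PARTS = ["head", "body", "tail"]   # mandatory parts, one stroke each
STROKES = [0, 1, 2]

def pairwise(p1, s1, p2, s2):
    # Hypothetical stand-in for the geometry model: reward the expected
    # relative arrangement of the two strokes.
    return 0.9 if s1 < s2 else 0.2

def score_log(assign):
    # Exact log-likelihood of the parts assigned so far.
    total = sum(math.log(UNARY[(p, s)]) for p, s in assign.items())
    for (p1, s1), (p2, s2) in itertools.combinations(assign.items(), 2):
        total += math.log(pairwise(p1, s1, p2, s2))
    return total

def bound_log(assign):
    # Optimistic completion: best remaining unary for each unassigned
    # part; pairwise terms are bounded above by 1 (log 1 = 0).
    used = set(assign.values())
    total = score_log(assign)
    for p in PARTS:
        if p not in assign:
            total += math.log(max(UNARY[(p, s)]
                                  for s in STROKES if s not in used))
    return total

def best_labeling():
    # Best-first: always expand ("branch on") the node with the highest
    # optimistic bound; all others stay bounded in the queue.
    tie = itertools.count()
    queue = [(-bound_log({}), next(tie), {})]
    while queue:
        _, _, assign = heapq.heappop(queue)
        if len(assign) == len(PARTS):
            # Bound of a complete labeling is exact, so this is optimal.
            return assign, math.exp(score_log(assign))
        part = PARTS[len(assign)]
        used = set(assign.values())
        for s in STROKES:
            if s not in used:
                child = dict(assign)
                child[part] = s
                heapq.heappush(queue, (-bound_log(child), next(tie), child))
    return None, 0.0
```

With these made-up numbers, `best_labeling()` returns the assignment head→0, body→1, tail→2, since it maximizes both the unary and pairwise products. The multi-pass thresholding from the paper would additionally prune queue entries whose bound falls below a threshold, tightening the threshold on each pass.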

The recognition algorithm in the paper was tested on five different object classes, with accuracy measured by how well the individual parts of known objects in the sketch were labeled. In general, the algorithm completed faster with multi-pass thresholding than without. Hard constraints could also reduce recognition times, but some failed recognitions were attributed to the use of hard constraints, owing to the lack of training examples.

Discussion:
I think that the approach used by the authors really is a creative solution for recognizing more complex shapes that can be drawn with such wide variation. I would imagine that applying vanilla stroke-based SR techniques to this set of sketches, even given the authors' strong assumptions, would fail entirely.

Outside of those cases, I'm not sure it's really that useful compared to typical stroke-based SR techniques, such as those applied to the circuit diagram symbols in the Oltmans paper. Then again, I can see why this approach would not be ideal for the shapes evaluated in that paper, since I can't imagine an intuitive way to divide circuit diagram symbols into a sufficient set of mandatory and optional parts for this paper's approach to recognize them reasonably. Nonetheless, this approach is useful for simple objects that can be split into parts easily. For simple objects that can't, I would use traditional SR techniques; for more complex objects that can, one might as well resort to computer vision or image processing techniques.

1 comment:

- D said...

I disagree somewhat. I think this method is useful in that it's one of the few that uses spatial relationships so fully. While I do agree with you that I don't see much of a future in this approach, I think this was a neat paper just because it was so different. And wouldn't this method be considered computer vision?