Sunday, November 25, 2007

Paper #23 - Sketch Interpretation Using Multiscale Models of Temporal Patterns

Paper:
Sketch Interpretation Using Multiscale Models of Temporal Patterns
(Tevfik M. Sezgin and Randall Davis)

Summary:
The authors present a sketch-recognition framework that automatically learns, from data, the orders in which users commonly draw objects, and uses those orderings for recognition. Key features include learning object-level patterns from data, handling multistroke objects and multiobject strokes, and supporting continuous observable features. Given a sequence of time-ordered primitives from a sketch, the goal is to find the segmentation and classification of primitives that maximizes the joint likelihood of the stroke-level and object-level patterns. The approach assumes that both kinds of patterns can be modeled as products of first-order Markov processes, which makes the maximum-likelihood computation, and therefore recognition, efficient. The temporal patterns are modeled with dynamic Bayesian networks (DBNs). The stroke-level portion behaves like a regular hidden Markov model (HMM), while the combined stroke- and object-level model is described as a DBN.
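
To make the first-order Markov assumption concrete, here is a minimal sketch (mine, not the authors' code) of Viterbi decoding on a toy stroke-level HMM. The state names, observation symbols, and all probabilities are invented for illustration. The paper's actual model works on continuous features and adds an object level on top, which this sketch does not attempt.

```python
import math

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a first-order HMM."""
    # best[s] = log-probability of the best labeling that ends in state s
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    back = []  # one dict of back-pointers per time step after the first
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # Only the previous state matters (first-order Markov assumption).
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][obs]
            pointers[s] = p
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path

# Hypothetical toy model: two object classes, two discrete primitive types.
STATES = ["wire", "resistor"]
LOG_START = to_log({"wire": 0.6, "resistor": 0.4})
LOG_TRANS = to_log({"wire": {"wire": 0.7, "resistor": 0.3},
                    "resistor": {"wire": 0.4, "resistor": 0.6}})
LOG_EMIT = to_log({"wire": {"line": 0.9, "zigzag": 0.1},
                   "resistor": {"line": 0.2, "zigzag": 0.8}})

score, labels = viterbi(["line", "zigzag", "zigzag"],
                        STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(labels, math.exp(score))
```

With these made-up numbers the three primitives are labeled wire, resistor, resistor. Because each step only looks at the previous state, the cost grows linearly with the number of primitives, which is the efficiency argument the summary refers to.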

The framework does have two implementation issues. First, although the model can be represented as either a DBN or an HMM, it is essential to use (or convert to) the DBN representation, since the HMM representation is expensive and complicated. Second, numerical instabilities occur during training, so a specialized algorithm is used to compensate. The authors ran an experiment to see whether modeling object-level patterns improves on modeling stroke-level patterns alone. The test domain was electronic circuit diagrams, chosen because its characteristics are shared by a group of other domains. Drawing styles turned out to differ significantly between users, so a personalized model was created for each user in order to measure recognition accurately. Quantitatively, modeling object-level patterns always improved performance; qualitatively, the model learned and used different kinds of patterns within a mathematically sound framework.
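
I did not note which specialized algorithm the authors use for the numerical instabilities, so the following is only an illustration of the general problem, not their method. Multiplying many small probabilities underflows to zero in floating point, and a standard workaround is to run the forward algorithm in log space with a log-sum-exp. The uniform two-state model below is an assumption chosen so the answer is easy to check by hand.

```python
import math

def logsumexp(values):
    """Stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m  # every term has zero probability
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(observations, states, log_start, log_trans, log_emit):
    """Log-likelihood of an observation sequence under a first-order HMM."""
    # alpha[s] = log P(observations so far, current state = s)
    alpha = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: log_emit[s][obs]
               + logsumexp([alpha[r] + log_trans[r][s] for r in states])
            for s in states
        }
    return logsumexp(list(alpha.values()))

# Hypothetical uniform model: every probability is 0.5, so the likelihood
# of a sequence of length T is exactly 0.5 ** T.
HALF = math.log(0.5)
UNIFORM_STATES = ["a", "b"]
UNIFORM_START = {"a": HALF, "b": HALF}
UNIFORM_TRANS = {"a": {"a": HALF, "b": HALF}, "b": {"a": HALF, "b": HALF}}
UNIFORM_EMIT = {"a": {"x": HALF}, "b": {"x": HALF}}

long_sequence = ["x"] * 2000
print(0.5 ** 2000)  # the direct product underflows in double precision
print(forward_log_likelihood(long_sequence, UNIFORM_STATES,
                             UNIFORM_START, UNIFORM_TRANS, UNIFORM_EMIT))
```

The direct product is too small for a double to represent, while the log-space version stays finite and close to 2000 * log(0.5). Training algorithms for HMMs and DBNs run into the same issue on long stroke sequences.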

A limitation of relying solely on temporal patterns is that the system at times classifies strokes as the wrong shape when that shape has the right temporal character. Another issue is interspersing, where users start drawing other objects before completing the current one, which could cause misrecognitions by the system.

Discussion:
An interesting aspect of this paper is its heavy use of the user's stroke-ordering information as an approach to sketch recognition. Although stroke ordering is common in previous papers, this paper focuses more on improving recognition rates by building a system personalized for each user. I don't believe that capability was covered in the previous papers, which focused on more general recognizers. As the paper brings up, interspersing is a key issue for this framework. It's quite a difficult problem, and much more so in other domains, so I would like to see how the authors compensate for it in later improvements to their framework.

2 comments:

Grandmaster Mash said...

Many authors looking at free-form domains don't use time information, or at least not to the degree of an HMM. Interspersing becomes much less of a problem when using computer vision approaches (Oltmans, Constellation Models).

- D said...

But computer vision methods lack the information that is present in online stroke processing, namely that there is a lot of temporal context available. A method that uses both is the key.