Paul Taele's Blog on Sketch Recognition

Sunday, November 25, 2007

Paper #24 - Properties of Real-World Digital Logic Diagrams

Paper:
Properties of Real-World Digital Logic Diagrams
(Christine Alvarado and Michael Lazzareschi)

Summary:
The authors of this paper strive to tackle the problem of creating a sketch recognition system that handles the trade-off between ease of recognition and drawing freedom. The focus is on how students naturally draw digital logic diagrams for an educational design of a recognition system. Three aspects are stroke order, stroke timing, and stroke number. The motivation stems from students drawing differently from experienced designers, and also from little work done on the digital logic domain that presents an important challenge for recognizers. The authors state their contributions in discovering that drawing patterns of students violate standard restrictions up to 34%, having analysis that informs free-sketch sketch recognition systems for digital circuit diagrams, and also having analysis that provides a framework to investigate the belief that similar drawing patterns exist in other domains.

In the data collection and analysis portion of the paper, a complete set of sketches and notes were taken from digital design and computer architecture students using tablet computers. Stroke order was done by counting how many symbols students drew and didn’t draw that were temporarily contiguous. Stroke timing was done by comparing the pause time between consecutive strokes in the same symbol. Stroke number was done by computing the number of strokes drawn for each symbol.

Different results were produced from the three aspects above. For stroke order, two different drawing patterns were seen. Students either drew a majority of a symbol with consecutive strokes or came back to that symbol later to add a single stroke due to correction or overtracing, or the student left the symbol obviously unfinished for later completion to invert the output. For stroke time, it turned out that students paused longer when switching symbols. Pauses varied per student, and error bars of pauses showed that it was impossible to find a reliable time threshold to indicate a new symbol being drawn. For stroke number, the number of strokes varied greatly per student. Students varied in consistency of drawing each individual type of symbol and rarely used a single stroke to span multiple symbols. The authors note that such trends may be domain-specific. In conclusion, the authors believe that pause times can aid stroke grouping, user-specific learning may aid recognition, recognition systems need to identify symbols that are not drawn with consecutive strokes, and symbol recognizers must incorporate a wide range of drawing styles.

Discussion:
Out of all the papers we've read thus far in this course, I found it most interesting that the sketch recognition domain in this paper may be the most specific of the bunch. Previous papers we've read have also analyzed sketch recognition using digital logic as a domain (Oltmanns, Sezgin, etc.), but this paper deals more specifically on drawing styles by students. Some people may say that the contributions from this paper would be more limited since the domain is narrower, but I believe that this paper may serve as a useful reference in developing sketch recognizers targeting university students in order to enhance current curriculum methods.

Paper #23 - Sketch Interpretation Using Multiscale Models of Temporal Patterns

Paper:
Sketch Interpretation Using Multiscale Models of Temporal Patterns
(Tevfik M. Sezgin and Randall Davis)

Summary:
The authors of this paper discuss their sketch-recognition framework which uses data to automatically learn object orderings of common user sketchings for sketch recognition. Key features include elarning object-level patterns from data, handling multistroke object and multiobject strokes, and supporting continuous observable features. According to the framework, given a sequence of time-ordered primitives from a sketch, the goal is to find a segmentation and classification of primitives which maximizes the joint likelihoods of stroke- and object-level patterns. The approach assumes that both patterns can be modeled as products of first-order Markov processes, which would efficiently compute maximum likelihoods and enable efficient recognition. This framework uses a temporal patterns model represented as dynamic Bayesian networks (DBNs). The stroke-based portion of it acts like a regular hidden Markov model (HMM), while the combined stroke- and object-based model can be described as a DBN.

The framework does have two implementation issues. The first one is that since the framework model can be dual-represented as a DBN and HMM, it is essential for the model to use or convert to DBN representatives since HMMs are expensive and complicated. The second is that since numerical instabilities occur during training, a specialized algorithm is used to compensate. An experiment was conducted on the framework by seeing if modeling object-level patterns improve over stroke-level patterns. The testing domain was electronic circuit diagrams, since it can be characterized as a group of other domains. It turned out that significant drawing styles occurred by the users, so personalized models were created for each user to accurately measure the level of recognition. Modeling object-level patterns ended up always improving performance from quantitative results, while the model learned and used different kinds of patterns with a mathematically sound framework from qualitative results.

A limitation of solely relying on temporal patterns is that the system would classify strokes as the wrong shape but with the right temporal character at times. Another issue included interspersing, the situation where users would draw other objects before completing the original object, which would could misrecognitions by the system.

Discussion:
An interesting aspect of this paper is the strong use of stroke-ordering information by the user as an approach to handle sketch recognition. Even though this is a common aspect from previous papers, this paper focuses more on improving recognition rates by creating a system personalized for each user. I don't believe this was a capable covered from previous papers, which focused on more general recognizers. As the paper brought up, interspersing was a key issue with this paper's sketch recognition framework. It's quite a difficult problem, and much more so in other domains, so I would like to see how the authors were able to compensate in improvements to their framework.

Paper #22 - Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes

Paper:
Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes
(Jacob O. Wobbrock, Andrew D. Wilson, Yang Li)

Summary:
The authors created an easy, cheap, and highly portable gesture recognizer called the $1 Gesture Recognizer. The algorithm requires one hundred lines of code and handles only basic geometry and trigonometry. The algorithm’s contributions include being easy to implement for novice user interface prototypers, capable as a measuring stick against more advanced algorithms, and used to give insight to gestures that are “best” for people and computer systems. Challenges to gesture recognizers in general include having resilience to sampling variations, supporting optimal and configurable rotation, scale, and position invariance, requiring no advanced math techniques, writing it easily in a few lines of code, being teachable with one example, returning a list of N-best points with sensible scores independent of number of points, and providing competitive recognition rates to more advanced algorithms.

$1 is able to cope with those challenges in its four-step algorithm: 1) re-sample to N points, where 32 <= N <= 256, 2) rotate once based on indicative angle, which is the angle formed between the gesture’s centroid and starting point, 3) scale non-uniformly and translate to the centroid, which is set as the origin, and 4) do recognition by finding the optimal angle for the best score. Analyzing the rotation invariance shows that there’s no guarantee that candidate points and template points of the gesture will optimally align after rotating the indicative angle to 0 degrees, so $1 uses a Golden Section Search (GSS) to find minimum ranges using the Golden Ratio in order to find the optimal angle.

Limitations of $1 include being unable to distinguish gestures whose identities depend on specific orientations, aspect ratios, or locations, abusing horizontal and vertical lines by non-uniform scaling, and being unable to differentiate gestures by speed since it doesn’t use time. In order to handle variations with $1, new templates can be defined to capture variation with a single name. A study was done to compare $1 with a modified Rubine classifier and Dynamic Time Warping (DTW) template matcher. The study showed that $1 and DTW were more accurate than Rubine, and that $1 and Rubine executed faster than DTW.

Discussion:
What I find most interesting about the work in this paper is the creation of a simple algorithm to deal with a difficult problem such as sketch recognition. I agree with the points the authors make in which implementing sketch recognition algorithms at an undergraduate level is a surmountable challenge. The upside of the algorithm's simplicity is also its downside. As the authors have noted, $1 is severely limited by this simplicity, and handling of more complex gestures would require serious additions, such as using features and coding more complex math, up to the point that the algorithm is no longer $1. Despite the shortcomings, I do think that this algorithm would be an excellent "Hello, world!" example to the world of sketch recognition for aspiring sketch recognition students. At least the transition of having students implement $1 to Rubine will give them incentive to then go from Rubine to Sezgin. Hopefully.

Wednesday, November 7, 2007

Paper #21 - SketchREAD: A Multi-Domain Sketch Recognition Engine

Paper:
SketchREAD: A Multi-Domain Sketch Recognition Engine
(Christine Alvarado and Randall Davis)

Summary:
The author proposes the SketchREAD system, a multi-domain sketch recognition engine capable of interpreting free-hand sketches. Four challenges are introduced in sketch understanding: 1) incremental sketch recognition, which is the ability for autonomously seamless computer recognition, 2) handling of messy strokes, 3) segmenting of strokes, and handling of shape variations.

The SketchREAD framework was created to deal with these challenges. In knowledge representation, the domain’s shapes are described in a hierarchical shape description language (probably LADDER), where context is given by higher-level shapes and domain patterns. An overview of the recognition involves parsing the strokes in a visual language. Possible interpretations of the strokes are parsed in a two-stage generate and test recognition process. The algorithm involves a bottom-up approach of generating the most likely interpretation, and a top-down approach of seeking out missing parts in the interpretation. Finding possible interpretations involve evaluating partial hypotheses. A current set of hypotheses is evaluated in a dynamic Bayesian network, which is a framework to reason with uncertainty from multiple sources. Since each interpretation’s probability is affected by stroke data (from children nodes) and context (from parent nodes), the system can cope with noise. Thus, the system can assess the likelihood of incomplete interpretations from prior information, because partial interpretations have probabilities. A major challenge is balancing correctness and a manageable number of interpretations. A hypothesis generation algorithm is used to compensate: the bottom-up step parses strokes into primitive objects as the user sketches, the top-down step tries to find subshapes missing from the partial interpretation, and the pruning step removes unlikely interpretations. In selecting an interpretation, the system uses a greedy algorithm.

The SketchREAD system was tested on sketching two domains: family trees and circuit diagrams. On average, the system correctly identified 77% of the symbols in the family tree examples (54% over baseline interpretation) and 62% of the symbols in the circuit diagram examples (17% reduction over baseline interpretation). According to the author, the system is best utilized in simple domains where the shapes are not too similar and distanced far enough, but suffers in accuracy in most cases especially for complicated sketches and domains.

Discussion:
One of the important steps in the sketch recognition process is the ability to generate possible interpretations of a sketch. I found the approach used in the paper quite interesting by resorting to a modified Bayesian network to evaluate the current possible interpretations. Usually, Bayesian networks aren’t used in a dynamic domain such as sketch recognition, since such a network works on a priori data. Thus, the interesting part in the approach was how the author modified it to take in new stroke input by creating a sort of library to describe the shapes and domain patterns of new input. The faults in the paper remain consistent to the disadvantages introduced in the second Oltmans paper about stroke-based recognition approaches, and those faults were also brought up by the author in the results being low when used in most general cases. Besides the preprocessing step used in the paper to make improvements to the system, I think one improvement would be to employ some more specific context recognition, since it might remedy the weaknesses generally inherent in stroke-based approaches.

One thing I did want to mention was during my first two readings of this paper. In my first reading, the shape descriptions looked a lot like LADDER. On my second reading, I read the citation on the part where the author states using a hierarchical shape description language to describe those shapes. It cites Dr. Hammond’s LADDER paper. I was a bit surprised the paper didn’t explicitly state the language name in the paper.

Paper #20 - Speech and Sketching: An Empirical Study of Multimodal Interaction

Paper:
Speech and Sketching: An Empirical Study of Multimodal Interaction
(Aaron Adler and Randall Davis)

Summary:
The problem in this paper concerns the difficulty of sketching alone to convey the entire meaning of its design in the early stages. The author’s goal is to create a more natural speech and sketch recognition system which is capable of understanding enough of the user’s input. The limitations of such current systems include: unnatural speech consisting of short commands, the system’s inability to respond or add to sketches, a lack of freehand drawing, and the requirement of knowing a fixed symbol library. A study was conducted on 18 students requiring them to sketch: a familiar floor plan, an AC/DC transformer plan, a full adder design, and their digital current design project for their class. The setup consisted of interactions between an experimenter and a participant sitting across from each other using Tablet PCs.

The focus of study analysis was how speech and sketch worked together when people interacted with each other. Observations were fitted into five categories. In the sketching observation, sketching consisted of two types: stroke statistics and user of color. In the former, there were four types of strokes: creation, modification, selection, and writing. The percentage of ink ended up being more reflective of the participants’ drawing than was stroke count, and even though selection stroke was least common, it was still important since it was key to understanding the participant’s input. In the language observation, speech tended to have frequent word and phase repetition. Participants tended to respond to the experimenter’s question by repeating words used in the question. Speech utterances were also related to sketches. In the multimodal observation, there were three variants in the speech and sketch interaction: referencing list of items, referencing written words, and coordination between input modalities. In the question observation, participants would often make revisions or elaborate beyond what the question asked. It caused the participants to make the sketch more accurate, spurred explanations on other parts, and encouraged more detailed explanations of the sketch. In the comments observation, comments by the participants were not directly related to the sketch, yet were still valuable. Their comments aided the system in understanding their actions. The study analysis provided architectural implications for multimodal digital whiteboards. It demonstrated that implementations of them should integrate color knowledge and recognize the importance in switching colors, and should take advantage of the relationship between sketch and speech.

Discussion:
Like Oltmans' first paper (his master's thesis), the author's paper is also a multimodal paper which involves combining speech and sketch. Naturally, a comparison of the two papers would be warranted. What I did find most interesting about this paper is how it actually strives for natural speech as well, unlike the short command speech used in Oltmans' first paper. What I didn't see on my first reading of the paper is a mechanism to handle the special case when a user misspeaks repeatedly and then eventually corrects himself. If a user errs in drawing, the erasing functionality was enabled for correction, but it didn't seem like there was a speech correction mechanism. A possible solution to that problem may be to allow the system to employ some sort of NLP techniques to parse the later correct speech segments from the earlier incorrect speech segments. The solution I propose is very vague in scope though, and such parsing would appear as difficult of a problem as segmenting strokes in a stroke-based sketching approach.

Thursday, November 1, 2007

Paper #19 - Three main concerns in sketch recognition and an approach to addressing them

Paper:
Three main concerns in sketch recognition and an approach to addressing them
(James V. Mahoney and Markus P.J. Fromherz)

Summary:
The problem which the author wishes to tackle is matching models of curvilinear configurations to hand-drawn sketches. Three sketch recognition concerns are introduced: 1) coping with variability and ambiguity of sketches, 2) providing interactive performance or better, and 3) offering easy or automatic extensibility to new shapes and configurations. Each of these concerns is addressed in a framework.

For the first concern, variability and ambiguity need to be addressed. Since variability also corrupts one-to-one mapping between a model of a figure and its instance, a lack of one-to-one mapping between segmentation primitives and model elements causes segmentation ambiguity. The authors handle this ambiguity by using dedicated pre-processing to address the effects of drawing variability. This is done by first creating subgraphs in the data graph of descriptions, applying alternative descriptions using a general perceptual organization. Then the descriptions are rectified by finding the alternative descriptions which are preferred towards the rest: proximity linking, link closure, virtual junction splitting, spurious segment jumping, and continuity tracing.

For the second concern, a solution is sought out for coping with the difficulty in efficient structural matching. The first step is accomplished by exploiting internal constraints to the sketch problem. The authors develop a constrained optimization approach to subgraph matching, using a data graph from standard image processing operations, for recognition in this case. The second step uses any available external constraints focused on as added criteria to limit the search in the prior step in order to improve processing time.

For the last concern, a technique was proposed to automatically learn models from user specified examples. The proposed automated model would acquire new examples using a highly effective learning scheme. This would be done by having the scheme use as few training examples as possible, preferably positive examples, in an incremental fashion.

Even though this matching approach deals with mostly curvilinear configurations, the authors claim it can extend naturally to other abstract relations among strokes.

Discussion:
I can see why Dr. Hammond had this paper originally assigned as reading prior to the Sharon paper, since the Sharon paper shares the same idea of labeling parts for more complex shapes. Naturally, the Sharon paper cites this paper, but it was cited by stating that matching parts could be done as an isomorphic graphing problem, something which that paper did not do at all. I completely agree with the authors stating that the three main concerns introduced in the paper are important in sketch recognition system, but their approach to cope with those concerns suffer the same faults that I saw in the Sharon of doing part matching of symbols. There are some important classes of symbols that can't be split into parts intuitively, thereby limiting the usefulness of the approach in this paper similarly.

Wednesday, October 31, 2007

Paper #18 - Constellation Models for Sketch Recognition

Paper:
Constellation Models for Sketch Recognition
(D. Sharon and M. van de Panne)

Summary:

| View | Upload your own

The problem that the author wishes to tackle involves having a system figure out what the user drew and what each stroke corresponded to using techniques that have strong parallelism to single-image interpretations in computer vision. The general approach involves modeling the sketches’ strokes as a constellation or ‘pictorial structure’ to be recognized, and this model is based on both local and pairwise features. Contributions by this approach include being able to deal with wide variations of sketch objects and allowing the parts in sketch objects to vary in number. The assumptions do involve similar parts drawn with similar strokes, and also mandatory object parts having only one instance. But the problems addressed make it possible for systems to understand sketches with known a stroke structure and unknown stroke ordering.

A constellation (or pictorial structure) model, the one used in this paper, is generally composed of local parts and a geometry model which defines the relative locations and distances of those parts. The way it’s used in this paper involve the sketch objects being modeled as one, where its features are composed of individual parts and pairs of parts. Each part has a label which is classified as being either mandatory or optional, and features are computed for strokes which have mandatory labels in order to improve computation time as sketches are scaled up. The sketch recognition process involves searching the space to assign mandatory labels and to find optional labels for the remaining unlabeled stroke. Labeling of a stroke to a part is done using a cost function to score the quality of the match between a label and stroke, and the probability of a stroke being a label is the product of two different projects: the product for all likelihoods of individual strokes, and one similarly for stroke pairs. A maximum likelihood (ML) search is done to find the best likelihood of all strokes, and this search uses a branch-and-bound tree. During search, the tree branches on the best node and bounds on all the others. To make the search faster, a multi-pass threshold and hard constraints are used.

The recognition algorithm in the paper was tested on 5 different object classes. Sketch recognition was based on the accuracy of labeling the individual parts of known objects in the sketch. The algorithm with multi-pass thresholding completed faster than without, in general. Even though hard constraints could also lead to reduced recognition times, failed recognition was attributed from using hard constraints due to lack of training examples.

Discussion:
I think that the approach used by the author really is a creative solution to recognizing more complex shapes that can be drawn in such a wide variety. I would imagine that applying vanilla stroke-based SR techniques on this set of sketches, even given the strong assumptions by the author, would fail entirely.

Outside of those cases, I'm not sure that it's really that useful compared to the typical stroke-based SR techniques, such as the circuit diagram symbols processed in the Oltmans paper. But then, I would see why this approach would not be very ideal with the shapes evaluated on in the other paper, since I can't imagine what would be an intuitive way to divide the circuit diagram symbols into a sufficient set of mandatory and optional parts in order to generate reasonable recognition from this paper's approach. Nonetheless, this approach is useful for simple objects which can be split into parts easily. For simple objects that can't, I would use traditional SR techniques. For more complex objects that can, might as well resort to computer vision or image processing techniques.