Sunday, November 25, 2007

Paper #24 - Properties of Real-World Digital Logic Diagrams

Paper:
Properties of Real-World Digital Logic Diagrams
(Christine Alvarado and Michael Lazzareschi)

Summary:
The authors of this paper tackle the problem of creating a sketch recognition system that balances the trade-off between ease of recognition and drawing freedom. The focus is on how students naturally draw digital logic diagrams, with an eye toward designing a recognition system for education. Three aspects are examined: stroke order, stroke timing, and stroke number. The motivation stems from students drawing differently from experienced designers, and from the little work done on the digital logic domain, which presents an important challenge for recognizers. The authors state three contributions: discovering that students' drawing patterns violate standard drawing-style assumptions up to 34% of the time, providing analysis that informs free-sketch recognition systems for digital circuit diagrams, and providing a framework for investigating the belief that similar drawing patterns exist in other domains.

In the data collection and analysis portion of the paper, a complete set of sketches and notes was collected from digital design and computer architecture students using tablet computers. Stroke order was analyzed by counting how many symbols students drew with temporally contiguous strokes and how many they did not. Stroke timing was analyzed by comparing the pauses between consecutive strokes within a symbol against the pauses between symbols. Stroke number was analyzed by counting the number of strokes drawn for each symbol.
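To make the methodology concrete for myself, here is a minimal Python sketch of the three measurements. The Stroke fields and function names are my own invention, not the authors' code; I'm assuming each stroke carries pen-down/pen-up timestamps and a ground-truth symbol label from the hand-labeled data.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Stroke:
    start: float   # pen-down timestamp (seconds)
    end: float     # pen-up timestamp (seconds)
    symbol: str    # hand-labeled symbol this stroke belongs to

def pause_times(strokes):
    """Split inter-stroke pauses into within-symbol and between-symbol lists."""
    within, between = [], []
    for prev, cur in zip(strokes, strokes[1:]):
        pause = cur.start - prev.end
        (within if cur.symbol == prev.symbol else between).append(pause)
    return within, between

def strokes_per_symbol(strokes):
    """Count how many strokes each labeled symbol took (stroke number)."""
    counts = defaultdict(int)
    for s in strokes:
        counts[s.symbol] += 1
    return dict(counts)
```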

Different results were produced for the three aspects above. For stroke order, two drawing patterns emerged. Students either drew the majority of a symbol with consecutive strokes, returning to it later only to add a single stroke for correction or overtracing, or they left the symbol obviously unfinished for later completion (e.g., to invert the output). For stroke timing, it turned out that students paused longer when switching symbols. Pauses varied per student, however, and the error bars on the pauses showed that no reliable time threshold could indicate when a new symbol was being drawn. For stroke number, the number of strokes varied greatly per student. Students varied in how consistently they drew each individual type of symbol, and they rarely used a single stroke to span multiple symbols. The authors note that such trends may be domain-specific. In conclusion, the authors believe that pause times can aid stroke grouping, that user-specific learning may aid recognition, that recognition systems need to identify symbols not drawn with consecutive strokes, and that symbol recognizers must accommodate a wide range of drawing styles.

Discussion:
Out of all the papers we've read thus far in this course, I found it most interesting that the sketch recognition domain in this paper may be the most specific of the bunch. Previous papers we've read have also analyzed sketch recognition using digital logic as a domain (Oltmans, Sezgin, etc.), but this paper deals more specifically with the drawing styles of students. Some may say that the contributions of this paper are more limited since the domain is narrower, but I believe it may serve as a useful reference in developing sketch recognizers that target university students in order to enhance current curriculum methods.

Paper #23 - Sketch Interpretation Using Multiscale Models of Temporal Patterns

Paper:
Sketch Interpretation Using Multiscale Models of Temporal Patterns
(Tevfik M. Sezgin and Randall Davis)

Summary:
The authors of this paper discuss their sketch recognition framework, which uses data to automatically learn the object orderings common in users' sketches. Key features include learning object-level patterns from data, handling multistroke objects and multiobject strokes, and supporting continuous observable features. In the framework, given a sequence of time-ordered primitives from a sketch, the goal is to find the segmentation and classification of primitives that maximizes the joint likelihood of the stroke- and object-level patterns. The approach assumes that both kinds of patterns can be modeled as products of first-order Markov processes, which allows maximum likelihoods to be computed efficiently and enables efficient recognition. The framework represents these temporal patterns as dynamic Bayesian networks (DBNs). The stroke-based portion acts like a regular hidden Markov model (HMM), while the combined stroke- and object-based model is described as a DBN.
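To see what the first-order Markov assumption buys, here is a toy Viterbi-style decoder over time-ordered primitives. This is my drastic simplification, not the authors' DBN implementation; the prior, transition, and emission tables are assumed inputs with nonzero probabilities.

```python
import math

# Toy first-order Markov decoder over time-ordered primitives (mine, not the
# authors' code). prior[s] = P(s at t=0); trans[s][t] = P(t follows s);
# emit[s][p] = P(primitive p | state s).
def most_likely_labels(primitives, states, prior, trans, emit):
    best = {s: math.log(prior[s] * emit[s][primitives[0]]) for s in states}
    back = []
    for p in primitives[1:]:
        step, choice = {}, {}
        for t in states:
            s_best = max(states, key=lambda s: best[s] + math.log(trans[s][t]))
            step[t] = best[s_best] + math.log(trans[s_best][t] * emit[t][p])
            choice[t] = s_best
        best = step
        back.append(choice)
    last = max(states, key=best.get)   # maximum-likelihood final label
    path = [last]
    for choice in reversed(back):      # trace back the best labeling
        path.append(choice[path[-1]])
    return list(reversed(path))
```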

The framework does have two implementation issues. The first is that since the model can be represented as either a DBN or an HMM, it is essential to use or convert to the DBN representation, because the equivalent HMM is expensive and complicated. The second is that numerical instabilities occur during training, so a specialized algorithm is used to compensate. An experiment was conducted to see whether modeling object-level patterns improves over stroke-level patterns alone. The testing domain was electronic circuit diagrams, since it shares characteristics with a group of other domains. It turned out that users exhibited significantly different drawing styles, so personalized models were created for each user to accurately measure the level of recognition. Quantitatively, modeling object-level patterns always improved performance; qualitatively, the model learned and used different kinds of patterns within a mathematically sound framework.

A limitation of relying solely on temporal patterns is that the system would at times classify strokes as the wrong shape simply because they had the right temporal character. Another issue was interspersing, the situation where users draw other objects before completing the original object, which could cause misrecognitions by the system.

Discussion:
An interesting aspect of this paper is the strong use of the user's stroke-ordering information as an approach to sketch recognition. Even though this is a common aspect of previous papers, this paper focuses more on improving recognition rates by creating a system personalized for each user. I don't believe this was a capability covered in previous papers, which focused on more general recognizers. As the paper brought up, interspersing was a key issue with this paper's sketch recognition framework. It's quite a difficult problem, and much more so in other domains, so I would like to see how the authors compensate for it in improvements to their framework.

Paper #22 - Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes

Paper:
Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes
(Jacob O. Wobbrock, Andrew D. Wilson, Yang Li)

Summary:
The authors created an easy, cheap, and highly portable gesture recognizer called the $1 Gesture Recognizer. The algorithm requires about one hundred lines of code and uses only basic geometry and trigonometry. The algorithm's contributions include being easy to implement for novice user interface prototypers, serving as a measuring stick against more advanced algorithms, and giving insight into which gestures are "best" for people and computer systems. Challenges for gesture recognizers in general include being resilient to sampling variations; supporting optional and configurable rotation, scale, and position invariance; requiring no advanced math techniques; being easy to write in a few lines of code; being teachable with one example; returning an N-best list with sensible scores independent of the number of input points; and providing recognition rates competitive with more advanced algorithms.

$1 copes with those challenges in its four-step algorithm: 1) re-sample the stroke to N points, where 32 <= N <= 256; 2) rotate once based on the indicative angle, the angle formed between the gesture's centroid and its starting point; 3) scale non-uniformly to a reference square and translate the centroid to the origin; and 4) recognize by finding, for each template, the optimal angle yielding the best score. Analyzing rotation invariance shows there is no guarantee that a candidate's points and a template's points will optimally align after rotating the indicative angle to 0 degrees, so $1 uses a Golden Section Search (GSS), which narrows the search interval using the Golden Ratio, to find the optimal angle.
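Since the whole point of $1 is that it fits in about a hundred lines, here is my condensed Python rendering of steps 2 through 4 plus the GSS. Step 1's re-sampling is assumed done, and the reference-square size, tolerance, and search window are the paper's defaults as I remember them; treat this as a sketch, not the authors' code.

```python
import math

PHI = 0.5 * (math.sqrt(5) - 1)   # Golden Ratio, used by the GSS

def centroid(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def rotate(pts, theta):
    (cx, cy), c, s = centroid(pts), math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def normalize(pts, size=250.0):
    """Steps 2-3: rotate the indicative angle to zero, scale non-uniformly
    to a reference square, translate the centroid to the origin.
    (The scale divides by width and height, so degenerate 1D gestures
    like straight lines break here; see the limitations.)"""
    cx, cy = centroid(pts)
    pts = rotate(pts, -math.atan2(cy - pts[0][1], cx - pts[0][0]))
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def best_distance(cand, tmpl, lo=-math.pi / 4, hi=math.pi / 4, tol=0.035):
    """Step 4: Golden Section Search for the rotation of `cand` minimizing
    path distance to `tmpl` (both already re-sampled and normalized)."""
    x1, x2 = PHI * lo + (1 - PHI) * hi, (1 - PHI) * lo + PHI * hi
    f1 = path_distance(rotate(cand, x1), tmpl)
    f2 = path_distance(rotate(cand, x2), tmpl)
    while abs(hi - lo) > tol:
        if f1 < f2:                       # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = PHI * lo + (1 - PHI) * hi
            f1 = path_distance(rotate(cand, x1), tmpl)
        else:                             # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * lo + PHI * hi
            f2 = path_distance(rotate(cand, x2), tmpl)
    return min(f1, f2)
```

Note that the non-uniform scale in normalize is exactly what mangles 1D gestures like lines, which comes up in the limitations below.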

Limitations of $1 include being unable to distinguish gestures whose identities depend on specific orientations, aspect ratios, or locations; mangling horizontal and vertical lines through non-uniform scaling; and being unable to differentiate gestures by speed, since it doesn't use time. To handle variations with $1, new templates can be defined to capture each variation under a single name. A study compared $1 with a modified Rubine classifier and a Dynamic Time Warping (DTW) template matcher. The study showed that $1 and DTW were more accurate than Rubine, and that $1 and Rubine executed faster than DTW.

Discussion:
What I find most interesting about the work in this paper is the creation of a simple algorithm to deal with a problem as difficult as sketch recognition. I agree with the authors' point that implementing sketch recognition algorithms should be a surmountable challenge at the undergraduate level. The upside of the algorithm's simplicity is also its downside. As the authors note, $1 is severely limited by this simplicity, and handling more complex gestures would require serious additions, such as using features and coding more complex math, to the point that the algorithm is no longer $1. Despite the shortcomings, I do think this algorithm would be an excellent "Hello, world!" introduction to the world of sketch recognition for aspiring students. At least the transition of having students implement $1 and then Rubine will give them incentive to go from Rubine to Sezgin. Hopefully.

Wednesday, November 7, 2007

Paper #21 - SketchREAD: A Multi-Domain Sketch Recognition Engine

Paper:
SketchREAD: A Multi-Domain Sketch Recognition Engine
(Christine Alvarado and Randall Davis)

Summary:
The authors propose the SketchREAD system, a multi-domain sketch recognition engine capable of interpreting free-hand sketches. Four challenges are introduced in sketch understanding: 1) incremental sketch recognition, the ability for autonomously seamless computer recognition, 2) handling of messy strokes, 3) segmenting of strokes, and 4) handling of shape variations.

The SketchREAD framework was created to deal with these challenges. For knowledge representation, the domain's shapes are described in a hierarchical shape description language (probably LADDER), where context is given by higher-level shapes and domain patterns. Recognition amounts to parsing the strokes of a visual language. Possible interpretations of the strokes are parsed in a two-stage generate-and-test recognition process. The algorithm combines a bottom-up approach, which generates the most likely interpretations, with a top-down approach that seeks out missing parts of an interpretation. Finding possible interpretations involves evaluating partial hypotheses. The current set of hypotheses is evaluated in a dynamically constructed Bayesian network, a framework for reasoning with uncertainty from multiple sources. Since each interpretation's probability is affected by stroke data (from child nodes) and by context (from parent nodes), the system can cope with noise. The system can also assess the likelihood of incomplete interpretations from prior information, because partial interpretations carry probabilities. A major challenge is balancing correctness against a manageable number of interpretations. A hypothesis generation algorithm compensates: the bottom-up step parses strokes into primitive objects as the user sketches, the top-down step tries to find subshapes missing from a partial interpretation, and the pruning step removes unlikely interpretations. In selecting a final interpretation, the system uses a greedy algorithm.

The SketchREAD system was tested on sketches from two domains: family trees and circuit diagrams. On average, the system correctly identified 77% of the symbols in the family tree examples (a 54% reduction in error over the baseline interpretation) and 62% of the symbols in the circuit diagram examples (a 17% error reduction). According to the authors, the system is best utilized in simple domains where the shapes are not too similar and are spaced far enough apart, but it suffers in accuracy in many cases, especially for complicated sketches and domains.

Discussion:
One of the important steps in the sketch recognition process is the ability to generate possible interpretations of a sketch. I found the approach used in the paper quite interesting in its resort to a modified Bayesian network for evaluating the current possible interpretations. Usually, Bayesian networks aren't used in a dynamic domain such as sketch recognition, since such a network works on a priori data. Thus, the interesting part of the approach was how the authors modified it to take in new stroke input by creating a sort of library describing the shapes and domain patterns of new input. The faults in the paper remain consistent with the disadvantages of stroke-based recognition approaches introduced in the second Oltmans paper, and those faults were also acknowledged by the authors when the results came out low in the more general cases. Besides the preprocessing step used in the paper to improve the system, I think one improvement would be to employ more specific context recognition, since it might remedy the weaknesses generally inherent in stroke-based approaches.

One thing I did want to mention came out of my first two readings of this paper. On my first reading, the shape descriptions looked a lot like LADDER. On my second reading, I checked the citation where the authors state they use a hierarchical shape description language to describe those shapes: it cites Dr. Hammond's LADDER paper. I was a bit surprised the paper didn't explicitly state the language's name.

Paper #20 - Speech and Sketching: An Empirical Study of Multimodal Interaction

Paper:
Speech and Sketching: An Empirical Study of Multimodal Interaction
(Aaron Adler and Randall Davis)

Summary:
The problem in this paper concerns the difficulty of sketching alone conveying the entire meaning of a design in its early stages. The authors' goal is to create a more natural speech and sketch recognition system capable of understanding enough of the user's input. The limitations of current such systems include: unnatural speech consisting of short commands, the system's inability to respond or add to sketches, a lack of freehand drawing, and the requirement of knowing a fixed symbol library. A study was conducted on 18 students, requiring them to sketch: a familiar floor plan, an AC/DC transformer plan, a full adder design, and their current digital design project for their class. The setup consisted of interactions between an experimenter and a participant sitting across from each other using Tablet PCs.

The focus of the analysis was how speech and sketching work together when people interact with each other. Observations fell into five categories. In the sketching observation, the analysis covered two areas: stroke statistics and use of color. For the former, there were four types of strokes: creation, modification, selection, and writing. The percentage of ink ended up being more reflective of the participants' drawing than stroke count was, and even though selection strokes were least common, they were still important, since they were key to understanding the participant's input. In the language observation, speech tended to have frequent word and phrase repetition. Participants tended to respond to the experimenter's questions by repeating words used in the question, and speech utterances were also related to the sketches. In the multimodal observation, there were three variants in the speech and sketch interaction: referencing lists of items, referencing written words, and coordination between input modalities. In the question observation, participants would often make revisions or elaborate beyond what a question asked; questions caused the participants to make the sketch more accurate, spurred explanations of other parts, and encouraged more detailed explanations of the sketch. In the comments observation, comments by the participants were not directly related to the sketch yet were still valuable, since they aided the system in understanding the participants' actions. The analysis provided architectural implications for multimodal digital whiteboards: implementations should integrate color knowledge and recognize the importance of switching colors, and should take advantage of the relationship between sketch and speech.

Discussion:
Like Oltmans' first paper (his master's thesis), this paper is also a multimodal paper combining speech and sketch, so naturally a comparison of the two is warranted. What I found most interesting about this paper is that it actually strives for natural speech, unlike the short-command speech used in Oltmans' first paper. What I didn't see on my first reading is a mechanism to handle the special case where a user misspeaks repeatedly and then eventually corrects himself. If a user errs in drawing, erasing functionality is available for correction, but there didn't seem to be a corresponding speech correction mechanism. A possible solution may be to have the system employ NLP techniques to separate the later, correct speech segments from the earlier, incorrect ones. The solution I propose is very vague in scope though, and such parsing appears to be as difficult a problem as segmenting strokes in a stroke-based sketching approach.

Thursday, November 1, 2007

Paper #19 - Three main concerns in sketch recognition and an approach to addressing them

Paper:
Three main concerns in sketch recognition and an approach to addressing them
(James V. Mahoney and Markus P.J. Fromherz)

Summary:
The problem the authors wish to tackle is matching models of curvilinear configurations to hand-drawn sketches. Three sketch recognition concerns are introduced: 1) coping with the variability and ambiguity of sketches, 2) providing interactive performance or better, and 3) offering easy or automatic extensibility to new shapes and configurations. Each of these concerns is addressed in a framework.

For the first concern, variability and ambiguity need to be addressed. Variability corrupts the one-to-one mapping between a model of a figure and its instance, and this lack of one-to-one mapping between segmentation primitives and model elements causes segmentation ambiguity. The authors handle this ambiguity with dedicated pre-processing that addresses the effects of drawing variability. This is done by first augmenting the data graph with subgraphs of alternative descriptions, produced by general perceptual organization operations: proximity linking, link closure, virtual junction splitting, spurious segment jumping, and continuity tracing. The descriptions are then rectified by finding the alternative descriptions that are preferred over the rest.

For the second concern, a solution is sought for the difficulty of efficient structural matching. The first step exploits constraints internal to the sketch problem: the authors develop a constrained optimization approach to subgraph matching, operating on a data graph built from standard image processing operations. The second step uses any available external constraints as added criteria to limit the search in the prior step and improve processing time.

For the last concern, a technique is proposed to automatically learn models from user-specified examples. The learning scheme would acquire new models from as few training examples as possible, preferably positive ones, in an incremental fashion.

Even though this matching approach deals mostly with curvilinear configurations, the authors claim it can extend naturally to other abstract relations among strokes.

Discussion:
I can see why Dr. Hammond originally assigned this paper as reading prior to the Sharon paper, since the Sharon paper shares the same idea of labeling parts of more complex shapes. Naturally, the Sharon paper cites this paper, but only to state that matching parts could be done as a graph isomorphism problem, something that paper did not do at all. I completely agree with the authors that the three main concerns introduced in the paper are important for sketch recognition systems, but their approach to those concerns suffers the same faults I saw in the Sharon paper's part matching of symbols. There are important classes of symbols that can't be split into parts intuitively, which similarly limits the usefulness of this paper's approach.

Wednesday, October 31, 2007

Paper #18 - Constellation Models for Sketch Recognition

Paper:
Constellation Models for Sketch Recognition
(D. Sharon and M. van de Panne)

Summary:

The problem the authors wish to tackle is having a system figure out what the user drew and what each stroke corresponds to, using techniques with strong parallels to single-image interpretation in computer vision. The general approach models a sketch's strokes as a constellation, or 'pictorial structure', to be recognized, based on both local and pairwise features. Contributions of this approach include being able to deal with wide variations of sketched objects and allowing the number of parts in a sketched object to vary. The assumptions are that similar parts are drawn with similar strokes, and that mandatory object parts have only one instance. The problems addressed, though, make it possible for systems to understand sketches with a known stroke structure but unknown stroke ordering.

A constellation (or pictorial structure) model, the kind used in this paper, is generally composed of local parts and a geometry model defining the relative locations and distances of those parts. Here, each sketch object is modeled as one, with features computed over individual parts and over pairs of parts. Each part has a label classified as either mandatory or optional, and features are computed only for strokes with mandatory labels in order to keep computation manageable as sketches scale up. The recognition process searches the space of assignments of mandatory labels and then finds optional labels for the remaining unlabeled strokes. Labeling a stroke with a part uses a cost function scoring the quality of the match between label and stroke, and the likelihood of a labeling is the product of two products: one over the likelihoods of all individual strokes, and a similar one over stroke pairs. A maximum likelihood (ML) search finds the best labeling of all strokes using a branch-and-bound tree: during the search, the tree branches on the best node and bounds all the others. To make the search faster, multi-pass thresholding and hard constraints are used.
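To make the scoring concrete, here is a toy Python version: the labeling likelihood as a product of unary and pairwise terms, with brute-force search standing in for the paper's branch-and-bound. The dict-based likelihood tables are my assumption, not the authors' data structures.

```python
from itertools import permutations
import math

def labeling_log_likelihood(assignment, unary, pairwise):
    """assignment: stroke -> label; unary[(stroke, label)] and
    pairwise[(s1, l1, s2, l2)] are likelihoods in (0, 1]."""
    ll = sum(math.log(unary[(s, l)]) for s, l in assignment.items())
    strokes = list(assignment)
    for i, s1 in enumerate(strokes):
        for s2 in strokes[i + 1:]:
            ll += math.log(pairwise[(s1, assignment[s1], s2, assignment[s2])])
    return ll

def best_labeling(strokes, labels, unary, pairwise):
    """Exhaustive stand-in for the paper's branch-and-bound ML search;
    assumes at least as many labels as strokes and one stroke per label
    (consistent with mandatory parts having a single instance)."""
    best, best_ll = None, -math.inf
    for perm in permutations(labels, len(strokes)):
        assignment = dict(zip(strokes, perm))
        ll = labeling_log_likelihood(assignment, unary, pairwise)
        if ll > best_ll:
            best, best_ll = assignment, ll
    return best
```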

The recognition algorithm in the paper was tested on 5 different object classes. Recognition was judged by the accuracy of labeling the individual parts of known objects in the sketch. The algorithm generally completed faster with multi-pass thresholding than without. Even though hard constraints could also reduce recognition times, some failed recognitions were attributed to the hard constraints combined with a lack of training examples.

Discussion:
I think the approach used by the authors really is a creative solution to recognizing more complex shapes that can be drawn in such wide variety. I would imagine that applying vanilla stroke-based SR techniques to this set of sketches, even given the authors' strong assumptions, would fail entirely.

Outside of those cases, I'm not sure it's really that useful compared to typical stroke-based SR techniques, such as those applied to the circuit diagram symbols processed in the Oltmans paper. Then again, I can see why this approach would not be ideal for the shapes evaluated in that other paper, since I can't imagine an intuitive way to divide circuit diagram symbols into a sufficient set of mandatory and optional parts to get reasonable recognition out of this paper's approach. Nonetheless, this approach is useful for simple objects which can be split into parts easily. For simple objects that can't, I would use traditional SR techniques. For more complex objects that can, one might as well resort to computer vision or image processing techniques.

Saturday, October 27, 2007

Paper #17 - Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Paper:
Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches
(Michael Oltmans)

Summary:
The sketch recognition system created by the author strives to handle freehand sketching without constant system feedback or rigid drawing constraints. The first challenge in freehand sketch recognition is shape variation, both at the signal level (i.e., noise the user did not intend) and at the conceptual level (i.e., variation reflecting the diversity of user sketching). The second challenge is unnecessary overtracing by the user, and the third is segmenting groups of strokes into particular shapes. The author relies on vision and machine learning techniques to cope with signal noise and conceptual variation, a visual approach to process overtraced strokes like original strokes, and an appearance-based approach that sidesteps stroke segmentation because it deals with a sketch's visual patterns instead of individual strokes.

Recognition of sketches is handled through visual parts and shapes. Visual parts are represented with a shape context feature called the bullseye: a circular region consisting of a center point and concentric rings of wedges, where the wedges serve as histogram bins for the points that fall within them. Shapes are built from the bullseye parts via a codebook, a standard collection of parts analogous to a vocabulary: a set of parts is selected from the training set, clustered into similar parts, and divided into codebook entries. Shape identification is then done by applying a distance metric to match vectors, a representation that expresses a shape as a set of bullseye parts in terms of the codebook.
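Here is roughly how I picture the bullseye feature in Python; the ring and wedge counts and the normalization are my guesses, not the author's parameters.

```python
import math

def bullseye(points, center, radius, n_rings=4, n_wedges=8):
    """Histogram the ink points around `center` into concentric rings of
    angular wedges; returns a normalized feature vector."""
    cx, cy = center
    hist = [[0] * n_wedges for _ in range(n_rings)]
    for x, y in points:
        r = math.hypot(x - cx, y - cy)
        if r >= radius:
            continue                                   # outside the region
        ring = int(r / radius * n_rings)
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        wedge = int(theta / (2 * math.pi) * n_wedges) % n_wedges
        hist[ring][wedge] += 1
    total = sum(map(sum, hist)) or 1
    return [count / total for row in hist for count in row]
```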

The author's system handles two tasks: isolated shape classification and shape localization. Using the methods discussed above, shape classification trains a classifier on the visual part representation to distinguish the shapes. For the second task, a three-stage process performs shape localization: candidate locations are scanned in the sketch, a classifier is applied at those locations, and a final set of predictions is selected from the sorted candidates. This shape localization is the final output of the system. With this approach, the author achieved 89.5% shape recognition on the isolated task, and a 74% recognition rate at 24% precision on shapes in complete sketches.

Discussion:
Up to this paper, a majority of the sketch recognition techniques covered this semester concerned stroke-based recognizers. What was very interesting about the approach used by the author of this paper is that it abandons the primary techniques of stroke-based recognition for a more machine learning / computer vision-based one. The author was able to back up this departure with impressive recognition rates on the outstandingly sloppy sketches of various symbols shown in the paper.


It's difficult to find noticeable faults, right off hand, with the author's sketch recognition approach. The bullseye feature, used as a histogram of points, has a functional approach to shape orientation, and it can also scale well by altering the radius for varying shape sizes. Furthermore, information available to stroke-based recognition is still accessible to the author's hybrid technique, thus making use of the strengths of the traditional stroke-based approaches without much of their weaknesses, if any. The only thing that comes to mind involves the shapes incorrectly classified by the author's recognizer, which he attributes to strong similarity to shapes of other classes.


I can't blame the recognizer for mis-recognizing those terribly drawn symbols, as they probably would have been difficult for the stroke-based recognizers introduced earlier in the semester anyway. One thing that might push the already high recognition rates further would be to employ context recognition. It is unrealistic to expect the recognizer to correctly classify symbols that are difficult even for a person to recognize in isolation. But these symbols appear in a domain where a symbol can be disambiguated by its context, i.e., its relation to surrounding symbols that may already have been correctly classified. That's the only improvement I can see, and even that seems quite an ambitious addition to an approach already ambitious in its original aim of recognizing informal sketches. Plus, I think recognition through context is already a difficult problem in itself.

Friday, October 26, 2007

Paper #16 - Naturally Conveyed Explanations of Device Behavior

Paper:
Naturally Conveyed Explanations of Device Behavior
(Michael Oltmans and Randall Davis)

Summary:
The authors of this paper created ASSISTANCE, a multi-modal system which can interpret simple two-dimensional kinematic devices. The program interprets these devices using sketch and verbal input. To describe a device to a computer as easily as to a person, a description needs to contain the device's structure and behavior. ASSISTANCE is capable of enhancing descriptions by understanding both a device's graphical representation and a verbal description of its behavior. To use ASSISTANCE, a designer first sketches the device, then explains its structure to the system, and finally has the system interpret the utterances and gestures. To keep the system manageable, the domain is limited to two-dimensional kinematic devices.

The system's structure is composed of three parts. First, the device's sketch (i.e., physical bodies and arrows) and speech (parsed textual phrases) are handled by the ASSIST sketch recognizer and the ViaVoice voice recognizer. Second, their output, in the form of propositional statements, is sent to ASSISTANCE for interpretation. Last, the event and causal links that ASSISTANCE generates from a truth maintenance and rule system are passed to the final stage, which finds a consistent causal structure that most closely matches the designer's descriptions of the device.

The authors evaluate ASSISTANCE by comparing it to alternatives and by analyzing its usability. The system strives to combine the mathematical precision of formal alternatives, like CAD tools, with the constraint-independence of informal alternatives, such as written and verbal explanations. It has also demonstrated the ability to interpret a designer's explanations of a device in the early design stage.

Discussion:
Previous papers, besides Abam's, introduced us to novel approaches for solving various sketch recognition problems. What those papers had in common was that they dealt solely with sketch recognition. With this paper, it's very interesting that we're introduced to a multi-modal method which employs speech recognition in addition to the already familiar sketch recognition. This is quite an ambitious combination for the authors to use in interpreting two-dimensional kinematic devices in a domain as difficult as physics.

I agree with one fault brought up by Dr. Hammond during class about using two different recognition systems: how does the system cope with speech input that completely contradicts sketch input? This can be remedied in several ways, one being to give one input priority over the other. The other fault I see lies in how useful this kind of multi-modal system would be, given the amount of overhead needed to see benefits over alternative approaches. The authors argue that their system gives the flexibility of descriptions made in normal communication along with the precision of computing tools. A sketch-voice recognition approach is intriguing, but it's a bit difficult for me to gauge whether the effort it takes to employ such an approach outweighs simply resorting to an all-sketch recognition method instead.

Thursday, October 25, 2007

Paper #15 - Ambiguous Intentions: a Paper-like Interface for Creative Design

Paper:
Ambiguous Intentions: a Paper-like Interface for Creative Design
(Mark D. Gross and Ellen Yi-Luen Do)

Summary:
The authors of the paper create a pen-based interface called the Electronic Cocktail Napkin, which handles freehand input by acquiring its ambiguous and precise information, representing it internally, and echoing it back to users visually and through edit behavior. Created as a design application, its goal is to support informal drawing: not only to provide creation, modification, and management mechanisms for diagrams, but also to use freehand drawings as keys for information retrieval. The approach includes mechanisms for recognizing configurations (i.e., groups of elements, which may be drawn abstractly by the user), handling ambiguity and context (i.e., maintaining a representation of unknown and ambiguous elements until context resolves them), representing imprecision with constraints (i.e., an internal constraint representation that provides interactive edit behavior), and refining and formalizing incrementally (i.e., making improvements with the ultimate aim of having the program make definite decisions).

Three main mechanisms comprise the Cocktail Napkin application: low-level recognition of glyphs, high-level recognition of configurations, and maintenance of constraints and spatial relations among drawing elements. Low-level recognition identifies the best matches given an input's context and certainty value, and its recognizer can be trained by giving it examples. Configuration recognition finds patterns among all element pairs of the drawing, and its recognizer is constructed by extracting constraints on the element types and their spatial relations from the trained examples. Context is maintained in the Napkin as a "current context" and a "current context chain," where a chain specifies a sequence of other contributing contexts. Recognition in the previous two mechanisms is done through this context; context recognition begins with an initial context of only shapes, alphanumeric characters, and lines, and context is then adjusted based on new drawings.

The Napkin application's interface was tested on undergraduate students and design instructors in the University of Colorado's architecture program. New testers were given half an hour to familiarize themselves with the application. It was discovered that users primarily wanted to draw, and therefore ignored the recognition facilities. This resulted in a focus on low-level tuning and tweaking of the program instead.

Discussion:
An interesting aspect I found in the Napkin interface was the authors' aim of creating a system that handles like a very flexible CAD system. Even with the amount of work done toward the authors' aim of a smart, paper-like system, I'm not sure it's enough for the target audience to make a complete switch-over from conventional sketching or traditional drawing programs, though. The Napkin seems a little too generic in its current implementation, and would need additions or improvements, such as broadening its functionality as a design program and making use of a sketching language, to be taken more seriously by its target audience.

Paper #14 - Graphical Input Through Machine Recognition of Sketches

Paper:
Graphical Input Through Machine Recognition of Sketches
(Christopher F. Herot)

Summary:
Three experiments were conducted in the Herot paper. In the first, the HUNCH system, a set of programs designed to process sketches, was employed to test for the existence of a sketch syntax and the possibility of a domain-independent sketch interpreter. Raw data was taken from users sketching with a special pencil on paper taped over a data tablet. In finding corners, curves, and line ends, the paper discovered that corners could be found from speed minima. Curves were treated as special cases of corners, while unconnected line ends were first handled with a simple latching program. Latching, the joining of endpoint pairs within a certain radius, produced strange results when scaled. Overtracing was not dealt with in the original HUNCH system, but Herot mentions the need for the program to distinguish between two carefully drawn parallel lines and a wide, overtraced one. Other inference levels included inferring a third dimension and finding rooms in a floor plan sketch.

Mixed results from the first experiment led to a second experiment with a system that specified context at the lowest levels, in order to avoid the difficulties of the machine inferring context from the sketches.

The last experiment emphasized user involvement in the input process wherever a machine's capabilities fall short, while not being cumbersome to the user during sketching. The system involves a database and a set of programs that use it. Three types of programs were used: 1) inference programs, which test the reasonableness of output, 2) display programs, which allow any level of the database to be displayed, and 3) manipulation programs, which are combined with a graphical editor for user modification. A set of functions in this new program determined lines and curves in real time. The two most useful functions were speed, which measures interval length relative to the number of points, and "bentness," which measures the maximum distance from a point to the chord of the interval. Bentness is used to find actual corners, while speed indicates corner intention.
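Both functions are simple enough to sketch from their one-line definitions; this is my Python reading of them, not HUNCH's code (which predates Python by a couple of decades anyway).

```python
import math

def speed(interval):
    """With points sampled at a fixed time rate, length per point tracks pen
    speed; a low value (the pen slowing down) hints at corner intention."""
    length = sum(math.dist(a, b) for a, b in zip(interval, interval[1:]))
    return length / len(interval)

def bentness(interval):
    """Maximum perpendicular distance from any point to the interval's chord."""
    (x0, y0), (x1, y1) = interval[0], interval[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0:
        return 0.0
    return max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / chord
               for x, y in interval)
```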

Discussion:
When viewed in isolation, this paper isn't really that amazing compared to the previous papers covered thus far. It wasn't until I paid attention to the photo of the hippy guy in the paper that I realized how old it is in comparison to its successors. The innovative idea of using velocity and curvature to resolve the difficult problem of finding corners in a stroke? It may have been a happy coincidence for them, but that's quite a feat considering the idea didn't come up again for another two decades. It's a shame it wasn't picked up for such a long time, and amusing that it was only mentioned in passing here when the Sezgin paper was dedicated entirely to it.

I do see a fault with the latching approach the author uses to resolve unconnected lines, something covered in more detail than the more novel corner-detection approach. Latching intuitively makes sense, but the way it's done in this paper doesn't seem to perform well on images scaled smaller. The only fix I see is either using a variable radius instead of the fixed one used in the paper, or doing away with latching completely and using something else. I'm guessing the latter happened, as I haven't heard of latching or similar approaches in the more recent papers.

Paper #13 - Perceptually Based Learning of Shape Descriptions for Sketch Recognition

Paper:
Perceptually Based Learning of Shape Descriptions for Sketch Recognition
(Olya Veselova and Randall Davis)

Summary:
The authors are interested in better understanding natural multi-domain sketch interaction. Their goal is to capture shape descriptions sufficient for system recognition. Much of their insight stems from Goldmeier's work on perceived shape similarity, in which people are either biased toward or ignorant of certain geometric properties. Goldmeier refers to these properties as singularities, and these singularities, plus additions by the authors, create the qualitative vocabulary for their approach. The authors then use Goldmeier's second set of experiments, which gave those singularities different perceptual importance. From their own analysis, the authors created a default relevance ranking of the constraints their approach supports. Using heuristics, they adjust those relevance scores based on obstruction (i.e., the number of geometric primitives between two shape parts), tension lines (i.e., alignment of the corners of a shape), and grouping (i.e., treating a combination of individual elements of a shape as a whole).

A user study was conducted to determine whether their system produced the same classifications as people. 33 people were shown 20 variations of 9 different symbols, and the system's performance was measured by the proportion of times it agreed with the majority answer. When the majority agreed over 80% of the time, the system was in that majority 83% of the time; a system guessing randomly would agree with the majority only 50% of the time. Of the system's errors, most were false negatives, attributed to the system paying more attention to individual symbol details than people do in general. The false positives stemmed from the lack of global symmetry detection and of "must not" constraints.

Discussion:
Unlike the papers that primarily attack sketch recognition problems by employing techniques to better recognize shapes in isolation, this paper interestingly focuses on understanding shapes at the perceptual level. This is important too, since systems will attempt to recognize shapes that are not in isolation. The authors' goal of finding constraints that coincide with how people differentiate shapes seems just as important for building more powerful sketch recognizers. In addition, this paper cites the oldest source thus far (1923!).

In this particular paper, I see a fault in the shapes being limited to only lines and ellipses. These cover just a fraction of the symbols people realistically draw, so a logical next step would be to expand the study to incorporate more complex shapes.

Paper #12 - Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples

Paper:
Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples
(Tracy Hammond and Randall Davis)

Summary:
LADDER is a sketching language which lets developers structurally describe a shape in a way that allows it to be recognized, displayed, and edited in their domain of choice. In a multi-domain recognition system, these descriptions can be automatically given as input to recognizers, exhibitors, and editors. When an incorrect number of constraints describes a shape, false positives (from too few constraints) and false negatives (from too many constraints) occur. Shape descriptions generated automatically from one example can be difficult to produce and too vague, while descriptions created by hand can be time-consuming and too specific. The authors of this paper therefore propose a visual debugger which uses active learning to generate suspected near-misses (shapes which differ from a positive example in a single aspect) for the developer to classify as either positive or negative examples.

The approach requires as an initial condition a positive hand-drawn example and its description. Descriptions can either be generated automatically or written with syntax checking. After conceptual errors are corrected, the system finds the variable assignment for the description that yields the fewest failed constraints, and the user is then required to remove enough constraints to achieve recognition of the shape; the description is thereby guaranteed not to be over-constrained on that single example. For over-constrained testing on general examples, a complete list of constraints true of the initial sketch is generated up front. Constraints known to belong in a correct description are then identified by listing the true constraints that do not hold in an encountered negative example. For under-constrained testing on general examples, a list of possibly missing constraints is created; from it are removed constraints that already exist in the description, are more general than existing ones, or follow transitively from the description. Afterwards, each remaining constraint is tested for being missing by adding its negation to the description.

Shapes generated from the new descriptions are shown to the user and classified as positive or negative examples. In each instance, the description is modified by adding or removing constraints, depending on the type of example encountered. After this process runs, a shape description that is neither over- nor under-constrained is produced.

Discussion:
It's not easy for a system to correctly recognize a sketch, simply because a target symbol is bound to be drawn with variety. Being able to encapsulate that variety correctly eases that burden on the system. Using a sketching language like LADDER helps construct shapes so that designers can focus on constraining a shape just right, but perfecting those constraints still seems more of an art than a science. The procedure in this paper takes a nice step in the right direction by trying to keep a shape's description from being over- or under-constrained, and that is what I found important in this paper. One fault mentioned by the authors is that the process can be expensive, so heuristics were employed to reduce that cost. From my prior experience with LADDER, I think improvements could be made by removing constraints that are subsumed by more valuable constraints. An alternative solution…more heuristics?

Monday, October 1, 2007

Paper #11 - LADDER, A Sketching Language for User Interface Developers

Paper:
LADDER, A Sketching Language for User Interface Developers
(Tracy Hammond and Randall Davis)

Summary:
Shapes and groups of shapes, along with how they are drawn, edited, and displayed, can be described in the sketch description language LADDER. The language gives interface designers a tool to handle these actions as well as make use of additional information such as stroke order or direction. It incorporates predefined shapes, constraints, editing behaviors, display methods, and a specification syntax for defining domains.

Describable shapes are limited to those that have a fixed graphical grammar, are composed completely of LADDER primitives and constraints, contain curves that are few in number or not highly relevant, and have high regularity. Shape definitions contain components, geometric constraints on those components, aliases for element simplification, editing behaviors, and display methods. Definitions are simplified through a hierarchical structure. Shapes can be made abstract by leaving certain components to be defined later. Shapes can also be made into a collection by grouping shapes of the same domain. The first advantage of grouping is allowing the system top-down recognition, and the second is allowing shapes of the same group to move together. The language itself builds on various predefined shapes, constraints, editing tools, and display methods. In addition, components of variable number can be specified as a vector.
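To get a feel for what such a definition carries, here is a hypothetical LADDER-style description of an arrow, rendered as a Python dict since I'm not reproducing the exact grammar; the component names and constraint vocabulary are illustrative only, not the language's actual syntax.

```python
# Hypothetical LADDER-style shape definition, expressed as plain Python data.
arrow = {
    "shape": "arrow",
    "components": {            # sub-shapes the definition is built from
        "shaft": "Line",
        "head1": "Line",
        "head2": "Line",
    },
    "constraints": [           # geometric constraints on the components
        ("coincident", "shaft.p2", "head1.p1"),
        ("coincident", "shaft.p2", "head2.p1"),
        ("equalLength", "head1", "head2"),
        ("acuteAngle", "shaft", "head1"),
        ("acuteAngle", "shaft", "head2"),
    ],
    "aliases": {"tip": "shaft.p2"},                    # element simplification
    "editing": [("drag", "tip", "rubber-band arrow")], # editing behavior
    "display": ["ideal-strokes"],                      # display method
}
```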

The LADDER recognition system is a multi-domain system comprising recognition of primitive shapes, domain shapes, and editing gestures. Low-level recognition is done on primitive shapes, and fact finding in a Jess rule-based system (Prolog-like syntax in Java) is performed for domain shapes. The system also checks strokes to determine whether each is an editing gesture rather than a drawing gesture. Since shapes can be displayed with their original strokes, best-fit primitives, ideal strokes, or Java Swing objects, the shapes' components, properties, and constraints are converted into mathematical equations to find a globally optimal solution satisfying all of them. With the new positional information obtained from the description, the system creates and outputs a beautified shape.

Discussion:
Long discussed in one of his papers the lack of user interfaces employing sketch recognition, owing to the difficulties interface designers face in employing it. Having a sketching language to alleviate these difficulties would facilitate the creation of more sketch-based interfaces, and the significance of LADDER lies in it being such a sketching language for interface designers. What makes LADDER even more significant is the fact that it is the very first sketching language. Shortcomings do exist, four of which are listed as limitations on shape design (those limitations are repeated in my summary above).

One aspect of LADDER that caught my attention was a particular facet of its multi-domain recognition system. Primitive shape recognition handles as one would expect, but the use of the Jess rule-based system for domain shape recognition was very intriguing. I had heard claims that Jess is an expressive language, but I've only seen it applied in multi-agent systems. Recognition built on a rule-based system is a novel idea I have not seen used in other systems. I wonder if this has something to do with LADDER being a geometric recognizer, as opposed to previous papers which focused on gesture recognition.

Wednesday, September 26, 2007

Paper #10 - Recognizing and Beautifying Low-level Sketch Shapes with Two New Features and Ranking Algorithm

Paper:
Recognizing and Beautifying Low-level Sketch Shapes with Two New Features and Ranking Algorithm
(Brandon Paulson and Tracy Hammond)

Summary:
Problems with current recognition systems include making users learn the system instead of vice versa. Paulson's recognition system poses no such constraints on the user and only requires users to draw single-stroke primitive shapes. Additional features include returning multiple interpretations and higher recognition rates than popular low-level recognizers. Currently recognized shapes are: lines, polylines, circles, ellipses, arcs, curves, spirals, and helixes.

The recognizer's overall system architecture first performs pre-recognition calculations on single-stroke input. In this stage, various graphs and values are computed, and corners are calculated using a multi-pass corner finding algorithm. Two new features are added in the recognition process: normalized distance between direction extremes (NDDE) and direction change ratio (DCR). NDDE is calculated by computing the stroke length between the point with the highest directional value and the point with the lowest, divided by the total stroke length. DCR is calculated as the maximum change in direction divided by the average change in direction. In general, polylines have lower NDDE values and higher DCR values than curved strokes. Furthermore, two tests are applied before recognition: a test for overtracing and a test for a closed figure.
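From those definitions, the two features are easy to sketch in Python. This is my reading of them, assuming the pre-recognition stage already produced per-point direction values and cumulative arc lengths; it is not the authors' implementation.

```python
def ndde(directions, arc_lengths):
    """Normalized distance between direction extremes: stroke length between
    the points of maximum and minimum direction, over total stroke length.
    `arc_lengths[i]` is the cumulative stroke length up to point i."""
    hi = max(range(len(directions)), key=directions.__getitem__)
    lo = min(range(len(directions)), key=directions.__getitem__)
    return abs(arc_lengths[hi] - arc_lengths[lo]) / arc_lengths[-1]

def dcr(directions):
    """Direction change ratio: max direction change over mean direction change."""
    changes = [abs(b - a) for a, b in zip(directions, directions[1:])]
    return max(changes) / (sum(changes) / len(changes))
```

A smooth arc keeps its direction extremes near the endpoints (high NDDE) and changes direction evenly (low DCR), while a polyline concentrates its direction change at corners, which is what makes the pair discriminative.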

Shape tests are conducted for each of the shapes the system recognizes. One special note is the complex test, which serves as the default interpretation for strokes that pass no other shape test. After the shape tests, a hierarchy is created to sort the interpretations. Using inherent properties of the multi-pass corner finding, a ranking algorithm determines the best fit of a stroke as a complex, curve, or polyline fit. The study of the recognizer collected two sets of data totaling 900 shapes from 10 users; the first set was used for training, and the second for testing. The correct shape interpretation was returned 99.89% of the time, and 98.56% of the time it was the top interpretation. Complex interpretations made up half of the errors.

Discussion:
I wish I had read this paper earlier. The low-level techniques used here would have sure helped me on the earlier homework assignments. Anyway, I really like the architecture used in this paper, especially the pre-recognition step. Those two new features seem like really novel ways for arc-polyline differentiation. In addition, the shape tests seem quite developed. Unfortunately, I wouldn't be able to comment much more without actually seeing it in action. For now, I shall use faith.

My main observation concerns the shapes that were misrecognized in the study. The recognizer does a fine job on normal strokes, but the strokes it didn't identify correctly were quite sloppy; Figure 7 was especially the case. Just by looking at those example strokes, I wouldn't be able to figure out whether they were truly just polylines myself (seriously). If this is difficult for people to differentiate, I believe it is unrealistic to expect the recognizer to make such a distinction accurately on such sloppy input data. If I found the user who made those stroke inputs, I'd vote him or her off the island in the next tribal council vote.

Paper #9 - Streaming Algorithms for Line Simplification

Paper:
Streaming Algorithms for Line Simplification
(Mohammad Ali Abam et al)

Summary:
Line simplification is a technique for approximating (i.e., simplifying) a line of possibly infinitely many streaming points in order to accommodate limited storage. While the paper focuses primarily on Geographical Information Systems (GIS), the technique also has practical applications in sketch recognition. Even though line simplification exists to overcome limited storage, the known algorithms are computationally costly. The goal of the paper is to present an algorithm that reduces these costs to something more manageable.

Two distance metrics, both of which measure the distance between two point sets, aid the computation, while a ratio determines efficiency. The Hausdorff distance takes the largest among the smallest distances from each point of one set to the other set, while the Frechet distance measures the distance between two paths when both are traversed forward, possibly at varying speeds. A competitive ratio, the ratio of the error of the algorithm in question to that of an optimal algorithm, determines how efficient the algorithm is.

The Abam line simplification method claims a good competitive ratio under two conditions: the path is monotonic and an error-predicting oracle is available. The algorithm itself consists of four steps, given a simplification of the points streamed so far. The first step sets the newly streamed point as the next point of the simplification. The second step computes, with the oracle, the error of the previous point based on the line segment between the current point and the point two positions back, and adds the previous point with its priority (its error if removed from the simplification) into a priority queue. The third step extracts the point with minimum priority from the priority queue and removes it from the simplification. The last step updates the priorities of the predecessor and successor of the extracted point.

The key component of the algorithm is this use of the error oracle. Three types of paths determine how the oracle responds. For convex paths, which form a convex polygon when the last point is connected back to the first, and for xy-monotone paths, which any horizontal or vertical line intersects in at most one point, the Hausdorff distance is used. For arbitrary paths, the Frechet distance is used. With the given results, an O(1) competitive ratio was achieved for the Hausdorff and Frechet cases using O(k) and O(k²) additional storage, respectively.
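Here is a deliberately naive Python sketch of the streaming loop (mine, not Abam et al.'s): it recomputes all priorities on each eviction for clarity, whereas the paper maintains them in a priority queue and only refreshes the removed point's neighbors (step four). The oracle is passed in as a black box, since its form (Hausdorff- or Frechet-based) depends on the path class.

```python
def simplify_stream(points, k, oracle):
    """Keep at most k points of a streamed path; `oracle(p, a, b)` estimates
    the error committed if p is dropped from between its neighbors a and b."""
    kept = []
    for p in points:
        kept.append(p)                    # step one: admit the new point
        if len(kept) > k:
            # steps two and three: evict the interior point whose removal
            # the oracle prices cheapest (endpoints are always kept)
            i = min(range(1, len(kept) - 1),
                    key=lambda j: oracle(kept[j], kept[j - 1], kept[j + 1]))
            kept.pop(i)
    return kept
```

For the monotone cases, a Hausdorff-style oracle could be as simple as the distance from the dropped point to the segment joining its two neighbors.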

Discussion:
Line simplification on potentially infinite streaming points with traditional techniques is a very costly operation, and the reduction of that cost with the Abam method seems impressive. As Dr. Hammond elaborated, the Abam method has potential for certain sketches, such as scribbles in which the user packs far more points into a fixed region than general sketch recognition methods can handle.

One issue, which was brought up by fellow namesake Paul (Corey) during my presentation of the paper and which I hadn't realized was a problem until then, was the removal of the point with the lowest priority each time a new point is streamed into the simplified line. I was unable to hear his entire argument, but I would intuitively think that the computation used to determine which points to remove would lose its relevance if the storage savings were no better than those of an algorithm which simply removes the even-numbered points.

Thursday, September 20, 2007

Paper #8 - A Curvature Estimation for Pen Input Segmentation in Sketch-Based Modeling

Paper:
A Curvature Estimation for Pen Input Segmentation in Sketch-Based Modeling
(Dae Hyun Kim and Myoung-Jun Kim)

Summary:
Kim and Kim's approach to sketch recognition involves segmenting pen input at high-curvature points in order to enhance shape recognition of pen strokes. Curvature estimation is done on the fly using only local shape information of the stroke. Several features are extracted from the stroke to aid in the process.

The first feature is the direction at each point, taken as the angle between the preceding and succeeding points. Points are re-sampled beforehand in this approach, meaning each point is chosen a fixed distance from the previous one. The second feature is the set of support points: the k preceding and k succeeding points (usually k = 3) that aid in a point's curvature estimation. Because of the re-sampling, a point's curvature can be taken directly from its direction change.
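Re-sampling itself is simple enough to show inline. This is a minimal sketch, not the paper's code, that walks the raw points and emits one point every step units of arc length (the step size here is arbitrary):

import math

def resample(points, step=5.0):
    out = [points[0]]
    residual = 0.0  # arc length already covered toward the next sample
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        while residual + seg >= step:
            # Place a sample exactly 'step' units past the previous one.
            t = (step - residual) / seg
            x0, y0 = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            out.append((x0, y0))
            seg -= step - residual
            residual = 0.0
        residual += seg
    return out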

Two additional features are used to aid in locating segmentation points: local convexity and local monotonicity. In local convexity, a point's curvature is computed as the sum of the curvature values of itself and those support points which share its local convexity (the same direction sign). Segmentation confusion can occur with local convexity: points in the same neighborhood can end up with the same curvature values, and only complicated heuristics can then remove the unnecessary features. To remedy this, local monotonicity is used as an alternative. In local monotonicity, curvature estimation for a point proceeds as follows. The running sum starts at the current point's curvature value, and its absolute value is taken as the minimum. Looping outward through the support points, whenever a support point's absolute curvature is less than the current minimum, its curvature is added to the running sum and its absolute value becomes the new minimum. The process is repeated over the support points on the other side to get the final curvature value of the current point.
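Here is my reading of that procedure as a sketch, operating on the signed direction changes of a re-sampled stroke; the wrap-around handling, the helper names, and the k = 3 default are my assumptions:

import math

def direction_changes(points):
    # Signed direction change at each interior point of a re-sampled stroke.
    dirs = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        d = math.atan2(y2 - y1, x2 - x1) - math.atan2(y1 - y0, x1 - x0)
        while d > math.pi:
            d -= 2 * math.pi
        while d <= -math.pi:
            d += 2 * math.pi
        dirs.append(d)
    return dirs

def monotone_curvature(dirs, i, k=3):
    # Local monotonicity: walk outward from point i on both sides; a support
    # point contributes only while the absolute curvature keeps shrinking.
    total = dirs[i]
    left = range(i - 1, max(i - k, 0) - 1, -1)
    right = range(i + 1, min(i + k, len(dirs) - 1) + 1)
    for side in (left, right):
        minimum = abs(dirs[i])
        for j in side:
            if abs(dirs[j]) < minimum:
                total += dirs[j]
                minimum = abs(dirs[j])
    return total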

With the above definitions, the stroke can be scanned just once to compute the curvature at all input points. Segmentation occurs at local maxima of positive curvature and local minima of negative curvature. Such a point is accepted as a segmentation point only if the absolute value of its curvature exceeds a threshold. Pen speed data is consulted when a low threshold value gives incorrect recognition (by admitting unnecessary segmentation points).
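Continuing the sketch above, the selection step would look roughly like this; the threshold value is invented purely for illustration:

def segmentation_points(dirs, threshold=0.7, k=3):
    # Local extrema of the estimated curvature whose magnitude clears the
    # threshold; indices are offset by one because dirs[i] describes
    # stroke point i + 1.
    curv = [monotone_curvature(dirs, i, k) for i in range(len(dirs))]
    segs = []
    for i in range(1, len(curv) - 1):
        is_max = curv[i] > 0 and curv[i] >= curv[i - 1] and curv[i] >= curv[i + 1]
        is_min = curv[i] < 0 and curv[i] <= curv[i - 1] and curv[i] <= curv[i + 1]
        if (is_max or is_min) and abs(curv[i]) > threshold:
            segs.append(i + 1)
    return segs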

In one of the evaluation tests (different drawing styles on a standard set of drawings), the method achieved a 95% recognition rate.

Discussion:
Shape recognition by segmentation of re-sampled points at high-curvature points is a novel alternative to Sezgin's time-dependent approach. Unlike Yu's solution to the problems of the Sezgin method, which employs an approach independent of time, the Kim method does not entirely do away with time. The time component is only used sparingly though, employed as a reference rather than a dominant feature. Instead, the Kim method addresses the corner-finding difficulties of Sezgin's method through what appears to be heavy attention to computing curvature estimates of points in relation to their neighbors. Adding local convexity and local monotonicity as features to aid shape recognition does seem to hold promise, given the recognition results from the evaluation tests.

One of the issues mentioned in the paper was the inability of their method, along with segmentation algorithms in general, to deal with strokes drawn in smaller areas. Since their method employs re-sampling, it is obvious why they restrict strokes to a minimum 30x20 drawing area. Thus, images which are scaled down pose a problem for the Kim method. Judging by the writers' tone, it seems like the kind of restriction that simply has to be accepted with their method as a fact of life. Scaling up the stroke's size, extracting the line that runs through the middle of the scaled-up stroke, and smoothing that line for shape recognition would be the obvious solution to a naïve person. If only smoothing were such a trivial operation…

Sunday, September 16, 2007

Paper #7 - A Domain-Independent System for Sketch Recognition

Paper:
A Domain-Independent System for Sketch Recognition
(Bo Yu and Shijie Cai)

Summary:
According to the paper, unnaturally strict restrictions on drawing manner and the inability to analyze curves into primitive shapes are the two main deficiencies in existing sketch recognition systems. The authors' solution is a domain-independent system that recognizes freehand sketches as primitive shapes and objects. The system's process is divided into two stages: imprecise stroke approximation (given a stroke, approximate it with one of several predefined shapes) and post-processing (analyze the finished drawing, retrieve shape relations, and represent the user's intent).

In the first stage, values of direction, curvature, and feature area (a stroke's constituent points in relation to a line, point, or arc) are extracted from a stroke. Vertex approximation uses these extracted features to check whether the stroke can be approximated by a shape; if it can't, the stroke is split at the highest curvature peak and each piece is approximated again. Line segment approximation is done by trying to fit a horizontal line to the direction graph and a least-squares best-fit line to the actual coordinates. In curve approximation, the system attempts to fit a line to the direction values. If it can, and the direction change adds up to 2*pi, a circle is approximated, with its center chosen as the middle of the candidate's bounding box; if the change is less than 2*pi, an arc recognition algorithm is used instead. The special case involves strokes with self-intersections, which the approximation methods above fail to recognize. For these, stroke approximation is modified by making two copies of the stroke, applying a different strategy to each copy, and choosing the approximation with the better result.
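A toy version of that circle test, as I understand it, might look like this; the tolerance, the helper names, and the exact least-squares fit are my own choices:

import math

def direction_graph(points):
    # Cumulative direction along the stroke; a circle shows up as a
    # near-straight line in this graph with a total change near 2*pi.
    graph, total, prev = [0.0], 0.0, None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        a = math.atan2(y1 - y0, x1 - x0)
        if prev is not None:
            d = a - prev
            while d > math.pi:
                d -= 2 * math.pi
            while d <= -math.pi:
                d += 2 * math.pi
            total += d
            graph.append(total)
        prev = a
    return graph

def looks_like_circle(points, tol=0.35):
    g = direction_graph(points)
    n = len(g)
    xs = range(n)
    mx, my = (n - 1) / 2, sum(g) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, g)) / sum((x - mx) ** 2 for x in xs)
    # The fit must be tight and the total direction change close to 2*pi.
    worst = max(abs(y - (my + slope * (x - mx))) for x, y in zip(xs, g))
    return worst < tol and abs(g[-1]) > 2 * math.pi - tol

def circle_center(points):
    # Center chosen as the middle of the bounding box, as in the paper.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)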

The second stage takes the primitive shapes from the first stage as input to three steps. In simple relation retrieval, the system tries to form various relations between the shapes. In cleanup, three rules are applied to remove meaningless line segments and arcs that cause abnormal curvature changes. In basic object recognition, the system processes the cleaned-up shapes into basic geometric objects without relying on domain-specific knowledge.

In addition to the recognition system, the writers built a powerful user interface that facilitates input through various command symbols, themselves recognized by sketch recognition, for creating, modifying, and deleting primitive shapes. In a user study of ten students, primitive shapes and polylines yielded 98% accuracy, arcs yielded 94%, hybrid shapes with smoothly connected lines and arcs yielded 70%, and other types of hybrid shapes yielded 90%.

Discussion:
Based on the tone of the Yu paper, it seemed like it basically stated why the Sezgin paper is flawed and how their system fixed those flaws. From the integration of vertex detection, the handling of curve detection, and the omission of speed data, the Yu paper pretty much took the concepts that had potential in the Sezgin paper and then tried to solve the same problem by starting all over again. I might be oversimplifying it a bit, but the Yu paper basically said that it can do vertex detection just as well without the use of speed data. That's pretty bold, considering that speed data was the foundation of vertex finding in the Sezgin paper. In the end though, the problems that were discussed in the Sezgin paper did seem to have been resolved (to an extent) in the Yu paper. Bravo.

One of the two interesting aspects I saw in this paper is its domain-independent approach to approximating curves. Approximating circles was an especially huge problem in the Sezgin paper, and Yu's approach of using the direction graph to approximate a circle is quite elegant. Fitting a line of total direction change 2*pi to the direction graph in order to classify the shape as a circle, with the center taken as the center of the bounding box, makes circle approximation seem like a trivial problem.

The other aspect I found interesting was how the system’s handling of overtracing was better explained in this paper compared to the Sezgin one. The issue I had with the Sezgin paper was how it would handle the difference between a circle that was overtraced and a spiral. Once again, the Yu paper had a simple yet effective solution: classify a stroke of circles with similar size as a circle of mean size, and classify a stroke with circles of ascending or descending radii as a spiral. Simple solutions to difficult problems never stop being amazing.

Sunday, September 9, 2007

Paper #6 - Sketch Based Interfaces: Early Processing for Sketch Understanding

Paper:
Sketch Based Interfaces: Early Processing for Sketch Understanding
(Tevfik Metin Sezgin, et al)

Summary:
The system described in the Sezgin paper, called the Sezgin system from now on, is one which focuses on the task of computers understanding sketches. Unlike image analysis and graphics recognition, sketch understanding poses the problem of recognizing noisy, incomplete sketches in real-time. The approach in the Sezgin system uses single stroke gestures as input, and then does early sketch processing in three phases: approximation (fitting primitive geometries of lines and curves to a set of pixels), beautification (making the approximation more appealing without changing meaning), and basic recognition.

In the first step (stroke approximation), the Sezgin system detects vertices at the endpoints of linear segments, then detects and characterizes the curved segments. The approach locates vertices at extrema of the stroke's curvature (maxima of change in direction with respect to arc length) and speed (minima of the pen's speed while turning corners). Because noise in the stroke produces false extrema, average-based filtering is used: curvature maxima are kept only if they exceed the mean of the curvature data, and speed minima only if they fall below 90% of the mean speed. In other words, the selected extrema are where curvature is already high and stroke speed is already low. Hybrid fits, which are sets of candidate vertices, are generated from both the curvature and speed data, and the best hybrid fit determines the final vertices. In the second step (beautification), minor adjustments are made to the previous approximation to give the intended look. This is done by adjusting the line segments' slopes so that lines with nearly the same slope end up parallel: the system looks for clusters of slopes in the final hybrid fit, rotates each line at its midpoint to the weighted average of its cluster's slopes, and takes the intersections of those lines as the new endpoints, which may move the vertices. In the last step (basic object recognition), the most basic geometric objects are built from simple properties of the beautified sketch.
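A bare-bones rendering of just the average-based filtering step, assuming per-point curvature and speed arrays have already been computed (the helper name and the local-extremum test are mine):

def candidate_vertices(curvature, speed):
    # Curvature candidates: local maxima above the mean curvature.
    # Speed candidates: local minima below 90% of the mean speed.
    n = len(curvature)
    curv_thresh = sum(curvature) / n
    speed_thresh = 0.9 * sum(speed) / n
    from_curvature, from_speed = [], []
    for i in range(1, n - 1):
        if (curvature[i] > curv_thresh and
                curvature[i] >= curvature[i - 1] and curvature[i] >= curvature[i + 1]):
            from_curvature.append(i)
        if (speed[i] < speed_thresh and
                speed[i] <= speed[i - 1] and speed[i] <= speed[i + 1]):
            from_speed.append(i)
    return from_curvature, from_speed

The hybrid-fit generation then mixes and matches these two candidate sets and scores each combined fit against the stroke.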

A user study was conducted to measure the system's ease of use and accuracy of shape recognition. After the users familiarized themselves with the system and drew ten shapes into it, the system had a 96% recognition accuracy, and only one user did not prefer the system. The authors believe that a sketching system should have the ability to draw arbitrary single-stroke objects, automatic feature point detection, one sketching mode for drawing objects, and a natural user feel, properties they believe are lacking in other current freehand sketching systems.

Discussion:
The algorithm in the Rubine paper takes speed data into account, which was abandoned in the subsequent Long paper and not used at all in the Paulson paper. In the Sezgin paper, speed data is embraced once more, as a much more significant metric than in the previous three papers. Because of this, I found the contrast in the importance of time data between the Sezgin paper and the Long and Paulson papers quite interesting. After reading the Long and Paulson papers, I wasn't convinced that time was a completely useless metric, even though it made sense not to incorporate it in the techniques those papers used. Sezgin's paper (so far) solidified my view that time data does have its uses.

I still have an issue with the authors' claim that a sketch system should feel natural to the user, given that the Sezgin system takes only single strokes as input. Obviously, it's not possible to sketch numerous objects naturally with just single strokes. I'm guessing, though, that those properties were qualified in the authors' minds, since the domain of their system consisted of just simple geometric objects.

Friday, September 7, 2007

Paper #5 - MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier

Paper:
MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier
(Brandon Paulson and Tracy Hammond)

Summary:
The MARQS (Media Albums Retrieved by a Query Sketch) application uses a search-by-sketch algorithm with the goals of 1) recognizing unconstrained sketch symbols, 2) recognizing symbols from only one training example, and 3) learning from successful queries for improved accuracy. Previous work on sketch and gesture recognition stems from Rubine's and Long's work on feature-based recognizers, Sezgin's and Yu's work on geometric recognizers, Kato's, Leung's, and Chen's work on media retrieval through sketches, and Hammond's LADDER language.

The recognition algorithm used in MARQS employs two different classifiers, depending on the number of available training examples. Before any feature calculation, the sketch's major axis, the line formed by the two furthest-apart points of the sketch, is computed and rotated to lie horizontally, ensuring all sketches are oriented similarly. When there is only one example sketch, a single-example classifier is used: the error between the query sketch's features and each example's features is calculated, and the four example sketches with the lowest errors are ranked. With more than one example sketch, a linear classifier is used instead. Each query sketch entered into the system is added to the example set for future use by the linear classifier. Four global features are used in the algorithm: 1) bounding box aspect ratio, 2) pixel density, 3) average curvature, and 4) number of perceived corners.
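The four global features lend themselves to a very small sketch. This version is mine, with crude stand-ins (point density instead of true pixel density, a turn-angle threshold instead of the paper's perceived-corner detector), and it omits the major-axis rotation step:

import math

def global_features(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w = (max(xs) - min(xs)) or 1e-6
    h = (max(ys) - min(ys)) or 1e-6
    turns = []
    for i in range(1, len(points) - 1):
        a1 = math.atan2(ys[i] - ys[i - 1], xs[i] - xs[i - 1])
        a2 = math.atan2(ys[i + 1] - ys[i], xs[i + 1] - xs[i])
        d = abs(a2 - a1)
        turns.append(min(d, 2 * math.pi - d))
    return [
        w / h,                                # 1) bounding box aspect ratio
        len(points) / (w * h),                # 2) stand-in for pixel density
        sum(turns) / max(len(turns), 1),      # 3) average curvature
        sum(t > math.pi / 4 for t in turns),  # 4) stand-in corner count
    ]

def rank_examples(query, examples):
    # Single-classifier path: rank stored sketches by feature-space error.
    qf = global_features(query)
    def err(example):
        return sum((a - b) ** 2 for a, b in zip(qf, global_features(example)))
    return sorted(examples, key=err)[:4]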

The MARQS system, an application for organizing photos into albums, was used to test the recognition algorithm. When the user inputs a sketch into the application, the top four photo candidates are displayed, and the user can click on the correct photo or view candidates outside the top four. The classification rate was measured on a scale where 1.0 is the top search rank and higher values correspond to lower ranks. Across 10 experiments with 1350 different sketches, the single classifier was used 27% of the time with an average search rank of 1.7, and the linear classifier 73% of the time with an average of 1.51. Results were ranked first, in the top two, top three, and top four 70%, 87.6%, 95%, and 98% of the time, respectively. More examples improved the search rank results.

Notable disadvantages of the algorithm were the slowdown of query searches over time with the single classifier, the lack of additional features, and the lack of training on extreme sketch cases.

Discussion:
At this point in the class, we've only covered the feature-based recognizers from the Rubine and Long papers, so the Paulson paper's algorithm, which recognizes multi-stroke sketches at decent success rates, is quite impressive right after reading the single-stroke recognition papers. That these recognition rates were achieved with much work still to be done is even more impressive. It would be interesting to see how the recognition algorithm fares in other types of applications.

As Paulson brought up during class, the recognition algorithm has an issue handling sketches which differ only in the orientation of a particular area within the sketch. If this is still an issue, I wonder how it would be resolved with the current implementation. Could it be fixed by fine-tuning the current implementation, or would another technique need to be appended to it?

There was only one minor issue I had with the paper itself. One of the algorithm's global features is pixel density, the ratio of filled (black) pixels to total pixels within the sketch's bounding box. Since the algorithm was programmed in Java, sketching with pressure, a feature not possible in Java at the time, isn't taken into account. The paper used a Chinese character as an example, though, and even as a mere example I think a more practical character should have been chosen. The example character can only be realistically drawn with a brush, something not possible with a computer pen; I guess this is analogous to the impossibility of sketching gothic-styled Roman letters. Such a character might be relevant to image recognition, but not so relevant to sketch recognition.

Sunday, September 2, 2007

Paper #4 - "These Look Similar!" Issues in Automating Gesture Design Advice

Paper #1:
"These Look Similar!" Issues in Automating Gesture Design Advice
(Chris Long, et al)

Summary:
New interaction technologies in state-of-the-art interfaces can be difficult for interface designers to adopt, since they aren't familiar enough with these technologies to incorporate them into their interfaces. Traditional interface tools provide little help, so the quill design tool was created to give unsolicited advice to designers on creating and improving their gestures. Gestures are fast, commonly used, and easier to remember than text commands, but they can be difficult for people to learn and for computers to recognize. quill tries to resolve this by continually analyzing gestures for both kinds of problems.

Experiments were done to gauge how consistently people judge gesture similarity, which yielded 99.8% accuracy for dissimilar gestures and 22.4% for similar ones. Thus, quill was developed to warn designers of gestures that people would likely find similar. The designer trains the Rubine-based gesture recognizer by drawing ten to fifteen example gestures per gesture class, organizing classes into gesture groups if there are many, and then tests recognition by drawing gestures. quill uses an active feedback system to offer unsolicited advice about any problems: it analyzes gestures continually, then offers advice when the designer pauses. When gestures are similar or misrecognized, warnings appear with advice on fixing them.

Pilot studies of quill exposed various issues related to giving advice. The first concerns the user interface. Users were originally forced to ask for advice, but most did not, so quill was changed to give unsolicited advice instead. When advice was given too late, users usually ignored it; given at the earliest opportunity, it could distract or interrupt the user; computed too soon, the information could become out-of-date as the designer kept working. Thus, advice is delayed and given once a gesture is tested and evaluated. Advice is given as concise messages in English prose and graphics, with hyperlinks to details. On the implementation side, analyses run in the background with the aim of increasing user freedom and decreasing user confusion. In quill, analysis can be started automatically by the system or manually by the user; user actions are disabled during advice computation for user-initiated analyses, while any action is allowed during system-initiated ones, with the affected analyses cancelled. The last issue was the metric quill's models use to predict human-perceived similarity. Users disagreed with it at times, especially when the models overestimated the similarity of gestures with letter-like characteristics. The flaw was attributed to the models being based on non-letter data, so separate models for letter and non-letter shapes would be ideal.

Paper #2:
Visual Similarity of Pen Gestures
(Chris Long, et al)

Summary:
Gestures, or pen-based commands, are desirable features in pen-based user interfaces for their speed, frequency of use, and iconic meanings. Problems exist with current gesture use because gestures are difficult for people to remember and for computers to recognize. The researchers in this paper conduct gesture similarity experiments whose results are meant to feed into gesture design tools. Gestures have previously been used in devices, such as Apple's Newton and 3Com's Pilot, and in various applications, and psychological experiments on geometric shapes simpler than gestures have been done to determine perceived similarity. INDSCAL, a version of multi-dimensional scaling (MDS), is the technique the researchers use to reduce the data set's dimensions in order to see patterns when plotted.

Two trials were conducted to understand the principles people use to judge gesture similarity. In the first trial, a widely diverse set of gestures was shown to participants on a monitor in triads (groups of three gestures). The triads were generated randomly, and the participants had to choose the gesture that looked least similar to the other two. The goal of the experiment was to determine measurable similarity properties of gestures and to produce a model of gesture similarity. From the MDS plots, short, wide gestures registered as similar to narrow, tall ones, and both types differed from square ones. The angle of the bounding box contrasted thin and square gestures, but not tall vertical and short horizontal ones. Aspect features were created to represent this.

For the second trial, three new gesture sets were designed to explore 1) total absolute angle and aspect, 2) length and area, and 3) rotation-related features such as the cosine and sine of the total angle. The relative importance of the features was also tested. The procedure and analysis were similar to those of the first trial. In the results, the significance of absolute angle couldn't be determined in the first set, neither length nor area was significant in the second set, and gestures with horizontal and vertical lines were perceived as more similar than those with diagonal lines.

Based on both trials, length and area were insignificant features in judging similarity, while the logarithm of aspect was more significant than the aspect itself. The two experiments resulted in different similarity models, but the researchers prefer the model from the first trial since it was a slightly better similarity predictor.

Discussion:
I found the advice timing aspect of the user interface challenges in the first paper quite humorous. For some reason, the problem of when to display unsolicited advice reminds me a lot of the paper clip from older versions of Microsoft Office applications. That paper clip was so annoying. I can understand the difficulties the authors faced in trying to balance user convenience with program accuracy.

For the second paper, there were two areas that the authors brought up without going into much detail. I'm not exactly sure why they were omitted from the paper, because both seemed quite relevant. The first concerns the gestures created by one of the authors for the two trials. According to the paper, the gestures were based on that author's personal intuition about spanning a wide range of possible gesture types and orientations. I wish there were more elaboration on the specifics behind those gestures, since the trials were built on them. The second concerns the results of the trials. According to the authors, the participants' results from the first trial split into two different groups. The authors mention analyzing those groups, but don't say what the differences were (to my knowledge). A pity they left that information out as well.

Friday, August 31, 2007

Paper #3 - Specifying Gestures By Example

Paper:
Specifying Gestures By Example
(Dean Rubine)

Summary:
A gesture-based drawing program (GDP) is an example of a gesture-based application in which the user inputs a gesture for the program to classify accordingly. Input begins with the user positioning the cursor on the screen with a mouse press and drawing the gesture, and ends with the user either releasing the mouse button or ceasing movement of the mouse while the button is still pressed. Designing gestures is done by sending those gestures from a click-and-drag interface to an application, both of which are built with the GRANDMA object-oriented toolkit. A GDP's output mechanism works through its View class hierarchy, which is further divided into two classes: instances of the GdpTopView class refer to the window in which the GDP runs, and instances of the GraphicObjectView class are sets of lines, rectangles, ellipses, text objects, or combinations of them. Editing gesture attributes consists of entering semantic expressions for three semantic components: 1) recog (evaluated during gesture recognition), 2) manip (evaluated during subsequent mouse points), and 3) done (evaluated upon mouse button release).

The gesture recognition problem first involves having each gesture class specified by example gestures, where there are C' gesture classes labeled from 0 to C' - 1. An input gesture is classified to the gesture class it most closely resembles. Statistical gesture recognition consists of computing a vector of features from an input gesture, then classifying it to one of the C' possible classes. In this particular paper, thirteen geometric and algebraic features are used. Ambiguous gestures, input gestures that score nearly the same for more than one gesture class, are generally rejected.
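The classification step itself is just a linear discriminant per class. Here is a minimal sketch under the assumption that per-class weights have already been produced by the training procedure (which I don't reproduce); the margin-based rejection is a crude stand-in for the paper's actual rejection rule:

def classify(features, weights, margin=0.0):
    # weights[c] = [w_c0, w_c1, ..., w_cF]; class c scores
    # v_c = w_c0 + sum_i(w_ci * f_i), and the highest score wins.
    scores = [w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
              for w in weights]
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    # Reject ambiguous gestures: two classes scoring nearly the same.
    if len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] < margin:
        return None
    return ranked[0]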

In evaluating the GDP's performance, a 98% classification rate was achieved for classifiers trained with 15 or more examples per gesture class over groups of 15 or fewer classes; fewer examples dropped recognition to 96%. Some GDP versions employed a couple of recognition extensions. One is eager recognition, in which the GDP recognizes a gesture mid-creation. The other is multi-finger recognition, in which gestures drawn with multiple fingers are treated as multi-path data: the single-stroke algorithm is applied to each path individually, and the results are combined to classify the multi-path gesture.

Discussion:
Trying to map an input which a user has drawn to a member of a set of examples is no easy task, yet the GDP in Rubine's paper takes a novel approach by requiring that the input be drawn in one stroke. Problems compound when attempting to classify a gesture with multiple strokes, so having the user draw gestures with a single stroke does away with those complexities. The significance of the GDP is that it simplifies the problem to one stroke while maintaining a high rate of accuracy.

One fault that I see in the approach demonstrated in the Rubine paper is the idea of forcing the user to conform to a certain style instead of having the program do all the conforming. This introduces an unnatural feel for the user when writing naturally multiple-stroke gestures as one stroke. Limiting gestures to one stroke can also introduce clashes between input gestures, resulting in an input gesture not in the example set being incorrectly classified as one that is. For example, when the mathematical equals sign is drawn like the letter Z to make the symbol a one-stroke gesture, problems arise when the user wishes to have an actual Z classified as the letter Z. This is analogous to taking the absolute value of an integer or converting a real number to an integer: precision is lost in the mapping.

A natural evolution of the approach in the Rubine paper would be to incorporate multi-stroke gestures into the classification process. But it seems the Rubine method resorted to classifying single-stroke gestures precisely to avoid, as stated earlier, the inherent problems of multi-stroke gestures; naively expanding the method to classify such gestures seems no better than not having the Rubine method at all. Perhaps the ideas of the multi-finger recognition extension covered near the end of Rubine's paper could be exploited in an approach to classifying the more complex multi-stroke gestures.

Thursday, August 30, 2007

Paper #2 - Sketchpad: A Man-machine Graphical Communications System

Paper:
Sketchpad: A Man-machine Graphical Communications System
(Ivan Sutherland)

Summary:
Sketchpad is a system which allows a human user to interact with the computer by drawing lines on the screen. With the system, a set of buttons issues specific commands, switches activate functions, a light pen indicates position information and points to existing objects, and knobs rotate and magnify picture parts. Using Sketchpad to construct drawings serves as a model for the design process, giving the advantages of 1) storing and updating drawings, 2) gaining understanding of operations graphically, 3) serving as a topological input device for circuit simulations, and 4) producing highly repetitive drawings.

A hierarchical data structure is employed by Sketchpad, separating information about the general and specific parts of a drawing. This structure allows modifying details of the specifics without needing to change the general parts. Input is handled by a light pen, which serves the dual role of positioning new parts of a drawing and pointing to existing parts for modification. Fairly extensive drawings are achievable through the system's locking and tracking of picture parts, via pseudo pen location and demonstrative language programs. The display information is stored in a large table, magnification is handled by operating the knobs, and geometric shapes are generated with difference equations. There is an additional option to display modifiable abstractions, or user-desired properties, through circular symbols and numerical values.

Operations in Sketchpad are handled recursively through three very generalized functions, so that any fixed-geometry subpicture can be handled: 1) expansion of instances (unlimited levels of subpictures, for desirability), 2) recursive deletion (removal of all related subpictures upon the original's deletion, for consistency), and 3) recursive merging (combining two similar picture parts, for complexity). The use of atomic operations for creating basic geometric objects thus makes it possible to create more complex ones. One major feature of Sketchpad is the user's ability to apply specific mathematical conditions to already-drawn parts: by defining a type of constraint for the computer to satisfy, a drawing can take the exact shape desired.

Sketchpad has been and can potentially be applied to various applications, ranging from producing patterns to artistic drawings. What Sketchpad hopes to achieve is the ability to make it more worthwhile to draw on a computer than by hand.

Discussion:
This is a paper about a system which allows a user to draw lines on the screen with a pen device and to modify those lines with a series of knobs, dials, and switches. That in itself, to a modern reader, is not too exciting. But for a paper written over four decades ago, it's amazing and quite groundbreaking. Subjects covered in the paper, such as precursors to graphical user interfaces and object-oriented programming, are significant in themselves, but one of the things I find most interesting about this paper is how easily readable it still is.

One fault I did see with the technology presented in the paper is the light pen's practicality as the dominant input device for computers, had the mouse not existed. One advantage of the mouse that people take for granted is that it can be used seamlessly with a keyboard without tiring the user much over prolonged use, probably because the mouse rests on the surface, so the user's hand never has to stay lifted for an extended time. That is not the case with the light pen. If the light pen were used in conjunction with a keyboard the way we use a mouse and keyboard nowadays, I believe it would be far more tiring. Sketchpad's input system would have been a viable supplement to computers equipped with a mouse had it been economically feasible at its introduction, but it would not have been a successful substitute.

Since the paper was written, much progress has been made in touch- and pen-based input with Tablet PCs (covered to an extent in the Hammond paper in the previous blog post). But if the Sketchpad system had enjoyed the success the mouse had in that period, a logical evolution would have been a dual-monitor system: one monitor standing upright to suit keyboard input, the other lying flat on the surface for the comfort of writing on paper. In addition, I would have incorporated supplementary buttons on the pen, analogous to the ones found on the mouse.

Paper #1 - Introduction to Sketch Recognition

Paper:
Introduction to Sketch Recognition
(Tracy Hammond and Kenneth Mock)

Summary:
Sketch recognition had its beginnings in 1963 with Dr. Ivan Sutherland's vector-based Sketchpad system, which used light pen and keyboard input to let the user produce complicated 2-D graphics. Despite the graphical superiority of its smooth lines, raster-based computers with mouse input prevailed on lower cost and superiority in other graphical features. Difficulties in sketching with a mouse, as well as recent technological advances, have allowed the emerging Tablet PC technology to penetrate the market.

Tablet PCs are, in essence, notebook computers with the added feature of touch or pen-based input. To track an input's location, the Tablet PC employs a digitizer. Three types of digitizers exist: 1) passive (i.e., touch), 2) active (i.e., a specialized pen), and 3) hybrid (i.e., a combination of both). Two common physical styles also exist for the Tablet PC: 1) slate (i.e., keyboard-less) and 2) convertible (i.e., with a twist/rotate monitor). The operating system most commonly supported by Tablet PCs is Windows, but aftermarket modifications and additional drivers allow support for Mac OS X and Linux too.

Instructors who adopt Tablet PCs in their lectures gain flexibility in their teaching methods by adding dynamic functionality to traditionally static teaching materials, though implementing them costs more money and resources than traditional methods. Student use of Tablet PCs in the classroom showed the advantages of more detailed note-taking, greater collaboration with teachers and classmates, and additional academic resources at the students' disposal.

One framework to help instructors produce sketch recognition interfaces for their curriculum is the FLUID framework, which combines the LADDER language and the GUILD sketch recognition generator so that instructors do not have to write sketch recognition code themselves. The technology instead lets the instructor focus on specifying the shapes to be used in the curriculum. Two particular case studies of Tablet PCs in the classroom involved the use of video recordings in the first case and tablet usage for posting curriculum material in the second.

Discussion:
By combining the features conventional notebook PCs already have with the added functionality of touch and pen-based input, Tablet PCs open up new uses and extend current ones without losing any existing capabilities. Their adoption alongside long-standing traditional pedagogy is also significant: instructors can advance their teaching by extending or easing current methods with Tablet PCs, while students can use them to supplement and better elaborate what they are already learning.

Despite the advantages of Tablet PCs, there will be cases where the technology isn't needed or is simply overkill, as a parody of Microsoft's Surface demonstrates:

[Embedded video: a parody of Microsoft's Surface]

Since the paper's release, there has been some progress in overcoming the limitations of passive digitizers. For example, the iPhone overcame the passive digitizer's inability to distinguish single-click from double-click by using a two-finger tap for double-tap instead of one finger. The video below of touch screen typing on the iPhone demonstrates how the convenience of passive digitizers can be merged with the versatility of active digitizers:

[Embedded video: touch screen typing on the iPhone]

Along with the advances made by the iPhone, touch- and pen-based Tablet PCs can definitely benefit from combining the advantages of passive and active digitizers while overcoming the imprecision of passive digitizers and the overhead inherent in current active digitizers.

About Me


Hi, I'm Paul Taele. I got my Bachelor's in Computer Sciences and Mathematics at the University of Texas at Austin (yes, I'm a Longhorn). I also studied Chinese for three semesters at National Chengchi University (Taipei, Taiwan).

Year:
Masters in Comp Sci, 1st Year

E-mail:
ptaele [at] gmail [dot] com

Academic Interests:
Don't quote me on this, but I'll say Neural Networks and Multi-Agents for now.

Relevant Experience:
I doubled in CS and math at UT. I primarily programmed in Java while there, since that's all that was taught. I took a variety of AI courses at the time: AI, Neural Networks, Robotics, Cognitive Science, Independent Research (on the CS/EE-side of neural networks), and Independent Study (on the math-side of neural networks). That was fun.

Why I'm taking this class?
Because Dr. Choe told me to. Haha, not really (he did highly recommend it, though). Besides Dr. Choe's recommendation, I found the course topic interesting, especially after I saw an application of sketch recognition (I think?) called the T.E.D.D.Y. system on YouTube several months earlier:

[Embedded video: the T.E.D.D.Y. system]

What do I hope to gain?
Without sounding too philosophical, I hope this course gives me better insight into how AI is done outside of (or besides) my main interest in neural networks.

What do I think I will be doing in 5 years?
Still being a student? I hope that doesn't happen, but it won't surprise me if I still am.

What do I think I will be doing in 10 years?
I'm not sure. Thinking that far scares me.

What are my non-academic interests?
I'm a big fan of East Asian movies, TV shows, and music. This is a consequence of studying Asian languages for several years, I guess.

My funny story:
I didn't plan on being a CS major during undergrad (I was originally a business major at the University of Southern California). When I went to UT afterwards, I decided to do pre-CS since all my friends were doing it (at UT, students typically have to apply to the CS major in their third year). Out of my friends, I was the only one who got accepted into the CS program. I made new friends and decided to double major in math as well, since they were all CS and math majors. It turns out I was the only one of my friends who didn't drop math as a major. I made some more friends, and we all vowed to go to grad school after we graduated. Well...yeah, I'm the only one of them who went immediately to grad school. Wait a minute, that's not a funny story at all. That's just plain sad...

Random Stuff #1 - Why is my blog pink?
It's not pink, it's salmon. And I think it looks hilarious. I figured no one else would choose this template, especially due to the lack of female classmates in this course.

Random Stuff #2 - Why am I doing grad school at A&M?
Haha, I sometimes wonder what I'm doing here after doing undergrad at UT. When I was thinking of doing grad school, my professors told me to go somewhere else for CS grad to gain a different perspective. I focused on A&M and UT Dallas because they were both in-state schools with decent CS programs, even though it turned out my profs originally wanted me to go out-of-state. In the end, I chose A&M for two reasons: 1) they gave me more money, and 2) Dr. Choe, one of the professors I would like to have on my advising committee, also went to UT (in fact, my prof for AI and neural nets and my prof for robotics at UT were his advisers when he was working on his PhD). Anyway, my friends gave me a new nickname: Traitor.

Random Stuff #3 - What do I think of College Station/Bryan?
It sucks.