Wednesday, October 31, 2007

Paper #18 - Constellation Models for Sketch Recognition

Paper:
Constellation Models for Sketch Recognition
(D. Sharon and M. van de Panne)

Summary:

The problem that the authors tackle is having a system figure out what the user drew and what each stroke corresponds to, using techniques with strong parallels to single-image interpretation in computer vision. The general approach models a sketch's strokes as a constellation, or 'pictorial structure', to be recognized, where the model is based on both local and pairwise features. Contributions of this approach include handling wide variations of sketch objects and allowing the parts in sketch objects to vary in number. The assumptions are that similar parts are drawn with similar strokes, and that mandatory object parts have only one instance. Within those assumptions, the problems addressed make it possible for systems to understand sketches with a known stroke structure and unknown stroke ordering.

A constellation (or pictorial structure) model, the one used in this paper, is generally composed of local parts and a geometry model which defines the relative locations and distances of those parts. In this paper, each sketch object is modeled as one such constellation, with features computed for individual parts and for pairs of parts. Each part has a label which is classified as either mandatory or optional, and features are computed only for strokes with mandatory labels in order to keep computation tractable as sketches are scaled up. The recognition process searches the space of assignments for the mandatory labels, then finds optional labels for the remaining unlabeled strokes. Labeling a stroke with a part uses a cost function to score the quality of the match between label and stroke, and the likelihood of a complete labeling is the product of two products: one over the likelihoods of individual strokes, and one similarly over stroke pairs. A maximum likelihood (ML) search finds the best joint labeling of all strokes using a branch-and-bound tree: during search, the tree branches on the best node and bounds all the others. To make the search faster, multi-pass thresholding and hard constraints are used.
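
To make the search concrete for myself, here is a minimal Python sketch of a best-first branch-and-bound labeling. The toy strokes, part labels, and cost tables are my own invented stand-ins for the paper's learned likelihoods, not the authors' implementation:

```python
# Minimal sketch of branch-and-bound ML labeling: assign part labels to
# strokes so that the sum of negative log-likelihoods (unary + pairwise)
# is minimized. All costs below are made-up stand-ins.
import heapq

STROKES = ["s0", "s1", "s2"]
LABELS = ["head", "body", "tail"]          # mandatory parts

def unary_cost(stroke, label):
    # Stand-in for -log P(stroke features | label).
    table = {("s0", "head"): 0.1, ("s1", "body"): 0.2, ("s2", "tail"): 0.3}
    return table.get((stroke, label), 2.0)

def pairwise_cost(s_a, l_a, s_b, l_b):
    # Stand-in for -log P(pairwise features | label pair).
    return 0.1 if (l_a, l_b) == ("head", "body") else 0.5

def branch_and_bound():
    # Nodes are partial assignments; cost-so-far is an admissible bound
    # because remaining costs are non-negative. We always branch on the
    # cheapest node (best-first), bounding all the others.
    heap = [(0.0, [])]
    while heap:
        cost, assigned = heapq.heappop(heap)
        i = len(assigned)
        if i == len(STROKES):
            return cost, dict(zip(STROKES, assigned))
        for label in LABELS:
            if label in assigned:          # mandatory parts: one instance each
                continue
            new = cost + unary_cost(STROKES[i], label)
            for j, prev in enumerate(assigned):
                new += pairwise_cost(STROKES[j], prev, STROKES[i], label)
            heapq.heappush(heap, (new, assigned + [label]))

print(branch_and_bound())   # -> (1.7, {'s0': 'head', 's1': 'body', 's2': 'tail'})
```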

The recognition algorithm in the paper was tested on 5 different object classes. Sketch recognition was measured by the accuracy of labeling the individual parts of known objects in the sketch. The algorithm with multi-pass thresholding generally completed faster than without it. Hard constraints could also reduce recognition times, but some failed recognitions were attributed to the hard constraints in combination with a lack of training examples.

Discussion:
I think the approach used by the authors is a genuinely creative solution for recognizing more complex shapes that can be drawn with such wide variety. I would imagine that applying vanilla stroke-based SR techniques to this set of sketches, even given the authors' strong assumptions, would fail entirely.

Outside of those cases, I'm not sure it's really that useful compared to typical stroke-based SR techniques, such as those applied to the circuit diagram symbols in the Oltmans paper. Then again, I can see why this approach would not be ideal for the shapes evaluated in that other paper, since I can't imagine an intuitive way to divide circuit diagram symbols into a sufficient set of mandatory and optional parts to get reasonable recognition from this paper's approach. Nonetheless, this approach is useful for simple objects which can be split into parts easily. For simple objects that can't, I would use traditional SR techniques. For more complex objects that can, one might as well resort to computer vision or image processing techniques.

Saturday, October 27, 2007

Paper #17 - Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Paper:
Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches
(Michael Oltmans)

Summary:
The sketch recognition system created by the author strives to handle freehand sketching without constant system feedback or rigid drawing constraints. The first challenge in freehand sketch recognition is shape variation, both at the signal level (i.e., noise the user did not intend) and at the conceptual level (i.e., reflecting the diversity of how users sketch). The second challenge is unnecessary overtracing by the user, and the third is segmenting groups of strokes into particular shapes. The author relies on vision and machine learning techniques to cope with signal noise and conceptual variation, a visual approach that treats overtraced strokes the same as original strokes, and an appearance-based approach which sidesteps stroke segmentation entirely since it deals with a sketch's visual patterns instead of individual strokes.

Recognition of sketches is handled through visual parts and shapes. Visual parts are represented by a shape context feature called the bullseye: each part is analyzed in a circular region consisting of a center point and concentric rings of wedges, where the wedges serve as a histogram of the ink points falling within them. Shapes then make use of the visual parts from the bullseye feature, grouped together based on a codebook, a standard collection of parts analogous to a vocabulary. A set of parts is selected from the range of parts in the training set, clustered into similar parts, and divided into entries of this codebook. Shape identification is then done by applying a distance metric to match vectors, a representation which describes a shape as a set of bullseye parts in terms of the codebook.
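
Here is how I picture the bullseye feature in code. This is a minimal sketch assuming evenly spaced rings and wedges; the paper's exact binning and normalization may differ:

```python
# Minimal bullseye-style descriptor: histogram ink points into
# concentric rings of wedges around a center point.
import math

def bullseye(points, center, n_rings=3, n_wedges=8, radius=1.0):
    cx, cy = center
    hist = [0] * (n_rings * n_wedges)
    for x, y in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        if r == 0 or r > radius:
            continue
        ring = min(int(r / radius * n_rings), n_rings - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        wedge = min(int(theta / (2 * math.pi) * n_wedges), n_wedges - 1)
        hist[ring * n_wedges + wedge] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]   # normalize for scale/density

# Example: points sampled from strokes near the chosen center point.
print(bullseye([(0.1, 0.2), (0.5, 0.1), (-0.3, -0.4)], center=(0.0, 0.0)))
```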

The author's system handles two tasks: isolated shape classification and shape localization. Using the methods discussed previously, shape classification trains a classifier on the visual part representation to distinguish the shapes. For the second task, a three-stage process performs shape localization: candidate locations are first scanned on the sketch, a classifier is then applied at those locations, and a final set of predictions is selected from the sorted candidates. This shape localization is the final output of the system. With this approach, the author achieved 89.5% shape recognition on the isolated task, and a 74% shape recognition rate at 24% precision on shapes in complete sketches.
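
To make the codebook and match-vector idea concrete: each shape becomes a vector of its best distances to each codebook entry, then gets classified against labeled training vectors. The tiny codebook, distance metric, and nearest-neighbor rule below are my stand-ins, not necessarily the author's exact choices:

```python
# Sketch of codebook-based shape classification over bullseye parts.
def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def match_vector(part_histograms, codebook):
    # For each codebook entry, the best (smallest) distance to any of
    # the shape's bullseye parts.
    return [min(dist(entry, p) for p in part_histograms) for entry in codebook]

def classify(shape_parts, codebook, training):
    # training: list of (match_vector, label) pairs from labeled shapes.
    mv = match_vector(shape_parts, codebook)
    return min(training, key=lambda t: dist(mv, t[0]))[1]

# Toy 4-bin "bullseye" histograms standing in for real descriptors.
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
training = [([0.1, 0.9, 0.2], "resistor"), ([0.9, 0.1, 0.8], "capacitor")]
print(classify([[1, 0, 0, 0], [0, 0, 1, 1]], codebook, training))  # -> resistor
```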

Discussion:
Up to this paper, the majority of the sketch recognition techniques covered this semester concerned stroke-based recognizers. What was very interesting about this author's approach was that it abandoned the primary techniques of stroke-based recognition for a more machine learning / computer vision-based one. The author was able to back up his departure with impressive recognition rates on outstandingly sloppy sketches of various symbols shown in the paper.


It's difficult to find noticeable faults, right off hand, with the author's approach. The bullseye feature, being a histogram of points, handles shape orientation reasonably, and it can also scale well by altering the radius for varying shape sizes. Furthermore, information that is available in stroke-based recognition is still accessible to the author's hybrid technique, thus making use of the strengths of the traditional stroke-based approaches without many of their weaknesses, if any. The only thing that comes to mind involves the shapes that were incorrectly classified by the author's recognizer, which he attributes to strong similarity with shapes of other classes.


I can't blame the recognizer for mis-recognizing those terribly drawn symbols, as they probably would have been difficult for the stroke-based recognizers already introduced this semester anyway. One thing that might improve the already high recognition rates would be to employ context-based recognition. It is unrealistic to expect this recognizer to correctly recognize, in isolation, symbols that are difficult even for a person to recognize in isolation. But these symbols appear in a domain in which a symbol can be disambiguated from its context, i.e., from surrounding symbols that may already have been correctly classified. That's the only improvement I can see, and it seems quite an ambitious addition to an already ambitious approach originally designed to recognize informal sketches. Plus, I think recognition through context is already a difficult problem in itself.

Friday, October 26, 2007

Paper #16 - Naturally Conveyed Explanations of Device Behavior

Paper:
Naturally Conveyed Explanations of Device Behavior
(Michael Oltmans and Randall Davis)

Summary:
The authors of this paper created ASSISTANCE, a multi-modal system which can interpret simple two-dimensional kinematic devices. The program interprets these devices using sketch and verbal input. In order to describe a device to a computer as easily as to a person, a description needs to contain the device's structure and behavior. ASSISTANCE is capable of building such enhanced descriptions by understanding both a device's graphical representation and a verbal description of its behavior. To use ASSISTANCE, a designer first sketches the device, then explains the structure to the system, and finally has the system interpret the utterances and gestures. To make the system manageable, the domain is limited to two-dimensional kinematic devices.

The system's structure is composed of three parts. First, the device's sketched description (i.e., physical bodies and arrows) and spoken description (parsed textual phrases) are handled by the ASSIST sketch recognizer and the ViaVoice voice recognizer. Second, their output, in the form of propositional statements, is sent to ASSISTANCE for interpretation. Last, the events and causal links that ASSISTANCE generates from a truth maintenance and rule system are passed to the final stage, which finds a consistent causal structure that most closely matches the designer's descriptions of the device.

The authors evaluate ASSISTANCE by comparing it to alternatives and by analyzing its usability. The system strives to combine the mathematical precision of formal alternatives like CAD tools with the constraint-independence of informal alternatives such as written and verbal explanations. It also demonstrated the ability to interpret a designer's explanations of a device in the early design stage.

Discussion:
Previous papers, besides Abam's, introduced us to novel approaches for solving various sketch recognition problems. What those papers had in common was that they dealt solely with sketch recognition. With this paper, it's very interesting that we're introduced to a multi-modal method which employs speech recognition in addition to the already familiar sketch recognition. This is quite an ambitious combination for the authors to use in interpreting two-dimensional kinematic devices in a domain as difficult as physics.

I agree with one fault brought up by Dr. Hammond during class about the use of two different recognition systems: how does the system cope with speech input that completely contradicts the sketch input? This can be remedied in several ways, one being to give one input priority over the other. The other fault I see lies in how useful this kind of multi-modal system really is, given the amount of overhead needed to see benefits over alternative approaches. The authors' argument for the system is that it gives the flexibility of descriptions made in normal communication along with the precision of computing tools. A sketch-voice recognition approach is intriguing, but it's difficult for me to gauge whether the effort it takes to employ such an approach outweighs simply resorting to an all-sketch recognition method instead.

Thursday, October 25, 2007

Paper #15 - Ambiguous Intentions: a Paper-like Interface for Creative Design

Paper:
Ambiguous Intentions: a Paper-like Interface for Creative Design
(Mark D. Gross and Ellen Yi-Luen Do)

Summary:
The authors of the paper create a pen-based interface called the Electronic Cocktail Napkin, which handles freehand input by acquiring its ambiguous and precise information, representing it internally, and echoing it back to users both visually and through edit behavior. Created as a design application, its goal is to support informal drawing: not only to provide creation, modification, and management mechanisms for diagrams, but also to use freehand drawings as a core means of information retrieval. The approach includes mechanisms for recognizing configurations (i.e., groups of elements, which may or may not be drawn abstractly by the user), handling ambiguity and context (i.e., maintaining a representation of unknown and ambiguous elements until they are resolved by context), representing imprecision with constraints (i.e., using an internal constraint representation which provides interactive edit behavior), and refining and formalizing incrementally (i.e., making improvements with the ultimate aim of having the program make definite decisions).

Three main mechanisms comprise the Cocktail Napkin application: low-level recognition of glyphs, high-level recognition of configurations, and maintenance of constraints and spatial relations among drawing elements. Low-level recognition identifies the best matches given an input's context and certainty value, and its recognizer is trained simply by giving it examples. Configuration recognition finds patterns over all element pairs in the drawing, and its recognizer is constructed by extracting constraints from the element types and their spatial relations in the trained examples. Context is maintained in the Napkin as a "current context" and a "current context chain", where a chain specifies a sequence of other contributing contexts. Recognition in the previous two mechanisms is filtered through this context; context recognition begins with an initial context of only shapes, alphanumeric characters, and lines, and the context is then adjusted based on new drawings.
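
Here is a toy rendering of how I read the "current context" idea: the same list of glyph candidates gets re-ranked by what the active context expects. The candidate glyphs, contexts, and weights are invented for illustration, not taken from the paper:

```python
# Toy context-guided disambiguation: re-rank recognizer candidates by a
# per-context prior. All names and numbers below are made up.
CANDIDATES = [("circle", 0.6), ("letter O", 0.55), ("zero", 0.5)]

CONTEXT_PRIOR = {
    "floor plan": {"circle": 1.0, "letter O": 0.2, "zero": 0.1},
    "math notes": {"circle": 0.3, "letter O": 0.4, "zero": 1.0},
}

def recognize(candidates, context):
    prior = CONTEXT_PRIOR[context]
    scored = [(cert * prior.get(name, 0.1), name) for name, cert in candidates]
    return max(scored)[1]

print(recognize(CANDIDATES, "floor plan"))   # -> circle
print(recognize(CANDIDATES, "math notes"))   # -> zero
```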

The Napkin interface was tested on undergraduate students and design instructors in the University of Colorado's architecture program. New users were given half an hour to familiarize themselves with the application. It was discovered that users primarily wanted to draw, and therefore ignored the recognition facilities. This resulted in the authors focusing on low-level tuning and tweaking of the program instead.

Discussion:
An interesting aspect that I found in the Napkin interface was the authors' aim of creating a system that handles like a very flexible CAD system. Even with the amount of work done toward the authors' aim of a smart paper-like system, I'm not sure it's enough for the target audience to make a complete switch from conventional sketching on paper or traditional drawing programs. Napkin seems a little too generic in its current implementation, and would need some additions or improvements, such as broadening its functionality as a design program and making use of a sketching language, to be taken more seriously by its target audience.

Paper #14 - Graphical Input Through Machine Recognition of Sketches

Paper:
Graphical Input Through Machine Recognition of Sketches
(Christopher F. Herot)

Summary:
Three experiments were conducted in the Herot paper. In the first, the HUNCH system, a set of programs designed to process sketches, was employed to investigate the existence of a sketch syntax and a domain-independent sketch interpreter. Raw data was taken from users sketching with a special pencil on paper taped over a data tablet. In finding curves, corners, and line ends, the paper discovered that corners could be found at speed minima. Curves were treated as special cases of corners, while unconnected line ends were first handled with a simple latching program. Latching, the joining of endpoint pairs within a certain radius, produced strange results when drawings were scaled. Overtracing was not dealt with in the original HUNCH system, but Herot mentions the need for the program to distinguish between two carefully drawn parallel lines and a single wide, overtraced one. Other inference levels included inferring a third dimension and finding rooms in a floor plan sketch.
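
A minimal latching sketch, assuming a fixed snap radius, also shows why scaling caused trouble; the radius value and data here are mine, not HUNCH's:

```python
# Join endpoint pairs that fall within a fixed radius of each other.
import math

def latch(segments, radius=5.0):
    segments = [list(map(list, seg)) for seg in segments]
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            for a in (0, 1):                 # endpoints of segment i
                for b in (0, 1):             # endpoints of segment j
                    p, q = segments[i][a], segments[j][b]
                    if math.dist(p, q) <= radius:
                        mid = [(p[0] + q[0]) / 2, (p[1] + q[1]) / 2]
                        segments[i][a] = segments[j][b] = mid
    return segments

# Two nearly touching strokes get joined; shrink the whole drawing and
# the same fixed radius starts merging endpoints never meant to meet.
print(latch([[(0, 0), (10, 0)], [(12, 1), (20, 5)]]))
```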

Mixed results from the first experiment led to a second experiment: a system which specified context at the lowest levels, in order to avoid the difficulty of having the machine infer context from the sketches.

The last experiment emphasized user involvement in the input process wherever a machine's capabilities fall short, while not being cumbersome to the user during sketching. This system involves a database and a set of programs that use it. Three types of programs were used: 1) inference programs, which test the reasonableness of output, 2) display programs, which allow any level of the database to be displayed, and 3) manipulation programs, which are combined with a graphical editor for user modification. A set of functions in this new program determined lines and curves in real time. The two most useful functions were speed, which measures an interval's length against its number of points, and "bentness," which measures the maximum distance from a point to the chord of its interval. Bentness is used to find actual corners, while speed indicates intended corners.
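
Here is my hedged reconstruction of those two functions; the paper's exact definitions may differ, but the geometry is standard:

```python
# "Speed" approximated as path length per sample (tablets sample at a
# fixed rate, so slow, corner-like intervals score low), and "bentness"
# as the maximum perpendicular distance from any point to the chord.
import math

def bentness(points):
    (x0, y0), (x1, y1) = points[0], points[-1]
    chord = math.hypot(x1 - x0, y1 - y0) or 1e-9
    def dist_to_chord(p):
        # Perpendicular distance from p to the line through the endpoints.
        return abs((x1 - x0) * (y0 - p[1]) - (x0 - p[0]) * (y1 - y0)) / chord
    return max(dist_to_chord(p) for p in points)

def speed(points):
    length = sum(math.dist(points[i], points[i + 1])
                 for i in range(len(points) - 1))
    return length / len(points)

corner = [(0, 0), (1, 0.05), (2, 0.1), (2.1, 1), (2.2, 2)]
print(bentness(corner), speed(corner))
```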

Discussion:
When viewed in isolation, this paper isn't really that amazing compared to the previous papers covered thus far. It wasn't until I paid attention to the photo of the hippy guy in the paper that I realized how old this paper is in comparison to the succeeding papers. The innovative idea of using velocity and curvature to attack the difficult problem of finding corners in a stream of points? It may have been a happy coincidence for them, but that's quite a feat considering the idea didn't come up again for another two decades. It's a shame that it wasn't picked up for so long, and amusing that the idea was only mentioned in passing here when an entire paper, Sezgin's, was later dedicated to it.

I do see a fault with the latching approach the author uses to resolve unconnected lines, something covered in more detail than the more novel corner-detection approach. Latching intuitively makes sense, but the way it's done in this paper doesn't seem to perform well on images scaled smaller. The only fixes I can see are either using a scale-aware radius as opposed to the fixed one used in the paper, or simply doing away with latching completely and using something else. I'm guessing the latter happened, as I haven't heard anything about latching or similar approaches in the more recent papers.

Paper #13 - Perceptually Based Learning of Shape Descriptions for Sketch Recognition

Paper:
Perceptually Based Learning of Shape Descriptions for Sketch Recognition
(Olya Veselova and Randall Davis)

Summary:
The authors are interested in better understanding natural multi-domain sketch interaction. Their goal is to capture a shape description sufficient for system recognition. Much of their insight stems from Goldmeier's work on perceived shape similarity, in which people are either biased toward or ignorant of certain geometric properties. Goldmeier refers to these properties as singularities, and these singularities, with additions by the authors, form the qualitative vocabulary of their approach. The authors then use Goldmeier's second set of experiments, which assigned those singularities different perceptual importance. From their own analysis, the authors created a default relevance ranking of the constraints their approach supports. Using heuristics, they adjust those relevance scores based on obstruction (i.e., the number of geometric primitives between two shape parts), tension lines (i.e., alignments of a shape's corners), and grouping (i.e., the perception of a shape's individual elements as a whole), as sketched below.
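
A toy rendering of that scoring idea: each constraint starts from a default relevance rank, and the three heuristics nudge it. The ranks and adjustment factors below are invented for illustration, not the paper's values:

```python
# Invented default relevance ranks for a few constraint types.
DEFAULT_RELEVANCE = {
    "connected": 0.9, "parallel": 0.7, "same-length": 0.5, "same-angle": 0.4,
}

def adjusted_relevance(constraint, obstruction_count, on_tension_line, in_group):
    score = DEFAULT_RELEVANCE[constraint]
    score *= 1.0 / (1 + obstruction_count)   # primitives in between weaken it
    if on_tension_line:
        score *= 1.5                         # corner alignments stand out
    if in_group:
        score *= 1.2                         # grouped elements act as a whole
    return score

print(adjusted_relevance("parallel", obstruction_count=2,
                         on_tension_line=True, in_group=False))
```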

A user study was conducted to determine whether their system produced the same classifications as people. 33 people were shown 20 variations of 9 different symbols, and the system's performance was measured by the proportion of times it agreed with the majority answer. When the majority agreed over 80% of the time, the system was in that majority 83% of the time; guessing randomly would agree with the majority only 50% of the time. Of the system's errors, most were false negatives, attributed to the system paying more attention to individual symbol detail than people generally do. The false positives stemmed from the lack of global symmetry detection and of "must not" constraints.

Discussion:
Unlike the papers which primarily attack sketch recognition problems by employing techniques to better recognize shapes in isolation, this paper interestingly focuses on understanding shapes at the perceptual level. This is important as well, since systems will have to recognize shapes that are not in isolation. The authors' goal of finding constraints that coincide with how people differentiate between shapes seems just as important in making more powerful sketch recognizers. In addition, this paper cites the oldest source thus far (1923!).

In this particular paper, I see a fault in the shapes being limited to only lines and ellipses. These cover just a fraction of the symbols people realistically draw, so a logical next step would be to expand the study to incorporate more complex shapes.

Paper #12 - Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples

Paper:
Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples
(Tracy Hammond and Randall Davis)

Summary:
LADDER is a sketching language which lets developers structurally describe a shape in a way that can be recognized, displayed, and edited in their domain of choice. In a multi-domain recognition system, these descriptions can be given automatically as input to recognizers, exhibitors, and editors. When the wrong number of constraints describes a shape, false positives (from too few constraints) and false negatives (from too many constraints) occur. Shape descriptions generated automatically from one example can be too vague, while descriptions created by hand can be time-consuming and too specific. The authors of this paper therefore propose a visual debugger which uses active learning to generate suspected near-misses (shapes which differ from positive examples by one stroke) for the developer to classify as either positive or negative examples.

The approach requires as an initial condition a positive hand-drawn example and its description. Descriptions can either be generated automatically or written by hand with syntax-checking. After conceptual errors are corrected, the system finds a variable assignment for the description that fails the fewest constraints, and the user is then asked to remove enough constraints for the shape to be recognized; the description is thus guaranteed not to be over-constrained on that single example. To test for over-constraining on general examples, a complete list of constraints true of the initial sketch is generated up front; constraints known to belong in a correct description are then found by listing the true constraints that do not hold in an encountered negative example. For under-constrained testing on general examples, a list of possibly missing constraints is created, from which are removed constraints that already exist in the description, are more general versions of existing ones, or follow transitively from the description. Each remaining constraint is then tested for being missing by adding its negation to the description.

Shapes generated from these modified descriptions are shown to the user, who labels each as a positive or negative example. In each instance, the description is modified by adding or removing constraints, depending on the type of example encountered. After this process, a shape description which is neither over- nor under-constrained is created, as sketched below.
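
Here is a hedged sketch of that loop as I understand it: negate one suspect constraint, present the resulting near-miss, and let the developer's label decide whether the constraint belongs. The constraint representation is simplified to named predicates of my own invention:

```python
# Toy near-miss loop: one constraint is negated per generated example.
def near_miss_loop(description, candidate_constraints, ask_user):
    for c in candidate_constraints:
        trial = set(description) | {("not", c)}   # negate one constraint
        label = ask_user(trial)                   # developer says pos/neg
        if label == "positive":
            # Shape still valid without c: c was over-constraining.
            description.discard(c)
        else:
            # Breaking c broke the shape: c is a real, needed constraint.
            description.add(c)
    return description

desc = {"horizontal(shaft)", "coincident(shaft.p2, head.p1)"}
cands = ["equal_length(head1, head2)"]
print(near_miss_loop(set(desc), cands, ask_user=lambda t: "negative"))
```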

Discussion:
It's not easy for a system to correctly recognize a sketch, simply because a target symbol is bound to have variety. Being able to encapsulate that variety correctly would ease that aspect for the system. Using a sketching language like LADDER helps construct shapes so that designers can focus on constraining a shape just right, but perfecting those constraints still seems more art than science. The procedure in this paper takes a nice step in the right direction toward keeping a shape's description from being either over- or under-constrained, and that is what I found important in this paper. One fault mentioned by the authors is that it can be an expensive process, so heuristics were employed to reduce that cost. From my prior experience with LADDER, I think improvements could be made by removing constraints that are subsumed by more valuable constraints. An alternative solution... more heuristics?

Monday, October 1, 2007

Paper #11 - LADDER, A Sketching Language for User Interface Developers

Paper:
LADDER, A Sketching Language for User Interface Developers
(Tracy Hammond and Randall Davis)

Summary:
Shapes and groups of shapes, along with their drawing, editing, and displaying actions, can be described with the sketch description language LADDER. The language gives interface designers a tool to handle these actions as well as to use additional information such as stroke order or direction. It incorporates predefined shapes, constraints, editing behaviors, display methods, and a specification syntax for defining domains.

Describable shapes are limited to those that have a fixed graphical grammar, are composed entirely of LADDER primitives and constraints, contain few curves (or curves that are not highly relevant), and exhibit high regularity. Shape definitions contain components, geometric constraints on those components, aliases for element simplification, editing behaviors, and display methods. Definitions are simplified through a hierarchical structure. Shapes can be made abstract by requiring certain components to be defined, and shapes in the same domain can be put into a group to form a collection. The first advantage of grouping is allowing the system to do top-down recognition, and the second is allowing shapes of the same group to move together. The language itself builds on various predefined shapes, constraints, editing tools, and display methods. In addition, a variable number of components can be specified as a vector. A paraphrased example of the kind of description this enables is sketched below.
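
To give a flavor of what such a definition holds, here is an arrow description rendered as Python data rather than LADDER's actual syntax (which this sketch does not reproduce); the component names, constraint names, and editing/display entries are my paraphrase:

```python
# A LADDER-style shape definition, paraphrased: typed components,
# named geometric constraints over their parts, plus editing and
# display behavior.
ARROW = {
    "components": {
        "shaft": "Line",
        "head1": "Line",
        "head2": "Line",
    },
    "constraints": [
        ("coincident", "shaft.p2", "head1.p1"),
        ("coincident", "shaft.p2", "head2.p1"),
        ("equal-length", "head1", "head2"),
        ("acute-angle", "head1", "shaft"),
    ],
    "editing": [("drag", "shaft", "moves-whole-shape")],
    "display": "ideal-strokes",
}

print(ARROW["constraints"])
```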

The LADDER recognition system is a multi-domain system comprising recognition of primitive shapes, domain shapes, and editing gestures. Low-level recognition is done on primitive shapes, and fact-finding over a Jess rule-based system (Prolog-like syntax in Java) is performed for domain shapes. The system also checks strokes to determine whether they are editing gestures rather than drawing gestures. Since shapes can be displayed with their original strokes, best-fit primitives, ideal strokes, or Java Swing objects, the shapes' components, properties, and constraints are converted into mathematical equations, and a global optimum satisfying all of them is found. With the new positional information obtained, the system creates and outputs a beautified shape, as sketched below.
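
A toy of that beautification step, for one easy case: the constraints reduce to equations with a closed-form least-squares answer, rather than the general optimizer the system would need. The rectangle scenario and corner ordering are my assumptions:

```python
# Beautify a sloppy rectangle: "horizontal"/"vertical" edge constraints
# on shared coordinates are satisfied in least-squares by averaging.
def beautify_rectangle(corners):
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners  # clockwise, top-left first
    left   = (x0 + x3) / 2    # shared x of the left (vertical) edge
    right  = (x1 + x2) / 2
    top    = (y0 + y1) / 2    # shared y of the top (horizontal) edge
    bottom = (y2 + y3) / 2
    return [(left, top), (right, top), (right, bottom), (left, bottom)]

print(beautify_rectangle([(0, 0.2), (10, -0.1), (10.3, 5), (-0.2, 5.1)]))
```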

Discussion:
Long discussed in one of his papers the lack of user interfaces employing sketch recognition, owing to the difficulties interface designers face in building them. Having a sketching language to alleviate these difficulties would facilitate the creation of more sketch-based interfaces, and the significance of LADDER lies in it being such a sketching language for interface designers. What makes LADDER even more significant is that it is the very first such sketching language. Shortcomings do exist, four of which are listed as limitations on shape design (those limitations are repeated in my summary above).

One aspect of LADDER that caught my attention was a particular facet of its multi-domain recognition system. Primitive shape recognition handles as it should, but the use of the Jess rule-based system for domain shape recognition was very intriguing. I had heard claims that Jess was an expressive language, but I've only seen it applied to multi-agent systems. Recognition based on a rule-based system is a novel idea that I have not seen used in other systems. I wonder if this has something to do with LADDER being a geometric recognizer, as opposed to previous papers which focused on gesture recognition.