Wednesday, September 26, 2007

Paper #10 - Recognizing and Beautifying Low-level Sketch Shapes with Two New Features and Ranking Algorithm

Paper:
Recognizing and Beautifying Low-level Sketch Shapes with Two New Features and Ranking Algorithm
(Brandon Paulson and Tracy Hammond)

Summary:
Problems with current recognition systems include forcing users to learn the system instead of vice versa. Paulson’s recognition system does not place such constraints on the user and only requires users to draw single-stroke primitive shapes. Additional contributions include returning multiple interpretations and achieving higher recognition rates than popular low-level recognizers. Currently recognized shapes are: lines, polylines, circles, ellipses, arcs, curves, spirals, and helices.

The recognizer’s overall architecture first performs pre-recognition calculations on single-stroke inputs. In this stage, various graphs and values are computed, and corners are found using a multi-pass corner-finding algorithm. Two new features are added in the recognition process: normalized distance between direction extremes (NDDE) and direction change ratio (DCR). NDDE is calculated by computing the stroke length between the points with the highest and lowest direction values and normalizing by the total stroke length. DCR is calculated as the maximum change in direction divided by the average change in direction. In general, polylines have lower NDDE values and higher DCR values than curved strokes. Furthermore, two tests are applied before recognition: existence of overtracing and existence of a closed figure.
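As a concrete illustration, here is a minimal sketch of how NDDE and DCR could be computed from a stroke given as a list of (x, y) tuples. The helper names are mine, and a real implementation would first resample the stroke and unwrap the direction values; treat this as a sketch of the idea, not the paper's code.

```python
import math

def direction_values(points):
    """Direction (angle) of each segment between consecutive stroke points.
    Note: a real implementation would unwrap angles to avoid 2*pi jumps."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def stroke_length(points, start=0, end=None):
    """Arc length of the stroke between two point indices."""
    end = len(points) - 1 if end is None else end
    return sum(math.dist(points[i], points[i + 1]) for i in range(start, end))

def ndde(points):
    """Normalized distance between direction extremes (sketch): arc length
    between the points with the highest and lowest direction values,
    divided by the total stroke length."""
    dirs = direction_values(points)
    i_max, i_min = dirs.index(max(dirs)), dirs.index(min(dirs))
    lo, hi = sorted((i_max, i_min))
    return stroke_length(points, lo, hi) / stroke_length(points)

def dcr(points):
    """Direction change ratio (sketch): maximum change in direction
    divided by the average change in direction."""
    dirs = direction_values(points)
    changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    return max(changes) / (sum(changes) / len(changes))
```

As described in the paper, a near-circular arc should give an NDDE close to 1 and a low DCR, while a polyline with sharp corners should give a lower NDDE and a much higher DCR.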

Shape tests are then conducted for each of the shapes the system recognizes. One special note is the complex test, which serves as the default interpretation for strokes that do not pass the other shape tests. After the shape tests, a hierarchy is created to sort the interpretations. Using inherent properties of the multi-pass corner finding, a ranking algorithm determines whether a complex, curve, or polyline fit best describes the stroke. The study of the recognizer involved collecting two sets of 900 shapes from 10 users. The first set was used for training, and the second set was used for testing. The correct shape interpretation was returned 99.89% of the time and was the top-ranked interpretation 98.56% of the time. Complex interpretations made up half of the error.

Discussion:
I wish I had read this paper earlier. The low-level techniques used here would have sure helped me on the earlier homework assignments. Anyway, I really like the architecture used in this paper, especially the pre-recognition step. Those two new features seem like really novel ways for arc-polyline differentiation. In addition, the shape tests seem quite developed. Unfortunately, I wouldn't be able to comment much more without actually seeing it in action. For now, I shall use faith.

My main observation concerns the shapes that were misrecognized in the study. The recognizer does a fine job on normal strokes, but the strokes it didn’t identify correctly were quite sloppy; Figure 7 is an especially clear case. Just by looking at those example strokes, I wouldn’t be able to tell whether they were truly just polylines myself (seriously). If this is difficult for people to differentiate, it seems unrealistic to expect the recognizer to accurately make such a distinction on such sloppy input data. If I found the user who made those stroke inputs, I’d vote him or her off the island in the next tribal council vote.

Paper #9 - Streaming Algorithms for Line Simplification

Paper:
Streaming Algorithms for Line Simplification
(Mohammad Ali Abam et al)

Summary:
Line simplification is a technique for approximating (i.e., simplifying) a line built from a possibly infinite stream of points in order to accommodate limited storage. While the paper primarily focuses on Geographical Information Systems (GIS), the technique also has practical applications in sketch recognition. Even though line simplification is meant to overcome limited storage, the known algorithms are computationally costly. The goal of the paper is to present an algorithm that reduces these costs to something more manageable.

Two distance metrics and a ratio are used to aid in computation and to measure efficiency, respectively. The first distance metric is the Hausdorff distance and the second is the Fréchet distance; both measure the distance between two sets of points. The Hausdorff distance takes the largest of the smallest distances from each point in one set to the other set, while the Fréchet distance measures the distance between two paths when both are traversed forward, possibly at different speeds. A competitive ratio, which compares the error of the streaming algorithm to that of an optimal simplification, determines how efficient the algorithm is.
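For concreteness, the discrete Hausdorff distance can be computed directly from that definition; a small sketch (my own helper, not from the paper):

```python
import math

def hausdorff(a, b):
    """Discrete Hausdorff distance between point sets a and b:
    the largest of the smallest distances from each point in one set
    to the other set, taken in both directions."""
    def one_sided(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(one_sided(a, b), one_sided(b, a))
```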

The Abam line simplification method claims a good competitive ratio under two conditions: the error must be monotonic and an error-predicting oracle must be available. The algorithm itself consists of four steps, assuming the previously streamed points have already been processed into a simplification. The first step is to set the newly streamed point as the next line-simplified point. The second step is to compute, with the oracle, the error of the previous point with respect to the line segment between the current point and the point two positions back, and to add the previous point with its priority (its error if removed from the simplification) to a priority queue. The third step is to extract the point in the priority queue with the minimum priority. The last step is to update the priorities of the points preceding and succeeding the extracted point.
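To make the loop concrete, here is a rough sketch of how those four steps might fit together. The error_oracle function, the dictionary-based priority bookkeeping, and the assumptions that points are hashable (x, y) tuples and that the budget k is at least 3 are all mine; the paper's actual algorithm maintains its invariants differently, so this is only an illustration of the priority-queue idea.

```python
def simplify_stream(points, k, error_oracle):
    """Sketch of a streaming line-simplification loop (not the paper's
    exact algorithm).  error_oracle(p, q, r) estimates the error of
    representing q by the segment p-r.  Points must be hashable tuples
    and k should be at least 3."""
    simp = []        # current simplified point sequence
    priority = {}    # interior point -> error if it were removed

    def recompute(i):
        # priority of simp[i] given its current neighbors
        priority[simp[i]] = error_oracle(simp[i - 1], simp[i], simp[i + 1])

    for pt in points:
        simp.append(pt)                      # step 1: accept the new point
        if len(simp) >= 3:
            recompute(len(simp) - 2)         # step 2: priority of the previous point
        if len(simp) > k:                    # over budget: drop the cheapest interior point
            victim = min(priority, key=priority.get)   # step 3
            i = simp.index(victim)
            del priority[victim]
            simp.pop(i)
            for j in (i - 1, i):             # step 4: neighbors get new priorities
                if 0 < j < len(simp) - 1:
                    recompute(j)
    return simp
```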

The key component in the algorithm is this use of the error oracle. The type of path along which the points stream determines which distance the oracle uses. For convex paths, which form a convex polygon when the last point is connected to the first, and for xy-monotone paths, which any horizontal or vertical line intersects in at most one point, the Hausdorff distance is used. For arbitrary paths, the Fréchet distance is used. With these assumptions, an O(1) competitive ratio was achieved for the Hausdorff and Fréchet cases with O(k) and O(k²) additional storage, respectively.
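The paper works with the continuous Fréchet distance, but the discrete version conveys the idea and is easy to compute with dynamic programming; a sketch (a standard algorithm, not taken from the paper):

```python
import math
from functools import lru_cache

def discrete_frechet(p, q):
    """Discrete Fréchet distance between polygonal paths p and q:
    the smallest 'leash length' needed when both paths are traversed
    forward, never backtracking."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = math.dist(p[i], q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
    return c(len(p) - 1, len(q) - 1)
```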

Discussion:
Line simplification on potentially infinite streaming points with traditional techniques is a very costly operation, and the cost reduction of the Abam method seems impressive. As Dr. Hammond elaborated, the Abam method has potential for certain sketches, such as scribbles in which the user draws far more points in a fixed region than general sketch recognition methods can handle.

One issue, which was brought up by fellow namesake Paul (Corey) during my presentation of the paper and which I hadn’t realized was a problem until then, was the removal of the point with the lowest priority each time a new point is streamed into the simplified line. I was unable to hear his entire argument, but I would intuitively think that the computation used to determine which points to remove would lose its relevance if the storage savings were no better than those of an algorithm which simply removes even-numbered points.

Thursday, September 20, 2007

Paper #8 - A Curvature Estimation for Pen Input Segmentation in Sketch-Based Modeling

Paper:
A Curvature Estimation for Pen Input Segmentation in Sketch-Based Modeling
(Dae Hyun Kim and Myoung-Jun Kim)

Summary:
Kim and Kim’s approach to sketch recognition involves segmenting pen input at high-curvature points in order to enhance shape recognition of pen strokes. Curvature estimation is done on the fly, using only the stroke’s local shape information. Several features are extracted from the stroke to aid in the process.

The first feature is the direction at a point, taken as the angle between its preceding and succeeding points. Points are first re-sampled in this approach, which means choosing each point a fixed distance from the previous one. The second feature is the use of support points, a set of k preceding and k succeeding points (k = 3 usually), which aid in a point’s curvature estimation. Because of the re-sampling, a point’s curvature can be taken directly from its direction change.
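A minimal sketch of what distance-based re-sampling and the direction feature might look like, assuming the stroke is a list of (x, y) tuples; the interpolation details are my own simplification, not the paper's.

```python
import math

def resample(points, spacing):
    """Re-sample a stroke so consecutive points are 'spacing' apart,
    using linear interpolation along the original polyline (sketch)."""
    out = [points[0]]
    leftover = 0.0                      # distance accumulated toward the next sample
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        seg = math.dist((x1, y1), (x2, y2))
        if seg == 0:
            continue
        t = (spacing - leftover) / seg  # fraction along this segment of the next sample
        while t <= 1.0:
            out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
            t += spacing / seg
        leftover = (leftover + seg) % spacing
    return out

def directions(points):
    """Direction at each interior point: the angle from its
    predecessor to its successor."""
    return [math.atan2(nxt[1] - prev[1], nxt[0] - prev[0])
            for prev, nxt in zip(points, points[2:])]
```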

Two additional features are used to aid in locating segmentation points: local convexity and local monotonicity. With local convexity, a point’s curvature is computed as the sum of the curvature values of the point itself and of the support points that share its local convexity (the same direction sign). Segmentation confusion can occur with local convexity: points in the same neighborhood can end up with the same curvature values, and only complicated heuristics can then remove the unnecessary segmentation points. To remedy this, local monotonicity is used instead. With local monotonicity, curvature estimation at a point proceeds as follows. The running sum is initialized to the current point’s curvature value and the minimum to its absolute curvature. Looping outward through the support points, a support point’s curvature is added to the sum only if its absolute curvature is less than the current minimum; when it is added, the minimum is updated to that support point’s absolute curvature. The final sum is the curvature estimate for the current point.
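A sketch of the local-monotonicity rule as I read it, where curv holds the raw per-point curvature (direction change) of the re-sampled stroke. The decision to stop a side once monotonicity breaks, and the omission of a sign check, are my own reading; the paper's exact rule may differ.

```python
def curvature_local_monotonicity(curv, i, k=3):
    """Curvature estimate at index i using local monotonicity (sketch).
    Support points are the k predecessors and k successors of point i;
    walking outward in each direction, a support point contributes only
    while the absolute curvature keeps strictly decreasing."""
    total = curv[i]
    for step in (-1, +1):                    # walk backward, then forward
        minimum = abs(curv[i])
        for offset in range(1, k + 1):
            j = i + step * offset
            if j < 0 or j >= len(curv):
                break
            if abs(curv[j]) >= minimum:      # monotone decrease broken: stop this side
                break
            total += curv[j]
            minimum = abs(curv[j])
    return total
```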

With the above definitions, the stroke can be scanned just once to compute the curvature at all input points. Segmentation occurs at local maxima of positive curvature and local minima of negative curvature. These points are accepted as segmentation points if the absolute value of their curvature exceeds a threshold. Pen speed data is used in case a low threshold value gives incorrect recognition (by adding unnecessary segmentation points).

In one of the evaluation tests (different drawing styles on a standard set of drawings), the method achieved a 95% recognition rate.

Discussion:
Shape recognition by segmenting re-sampled points at high-curvature points is a novel alternative to the Sezgin time-dependent approach. Unlike Yu’s solution to the problems of the Sezgin method, which does away with time entirely, the Kim method does not abandon time altogether. The time component is only used sparingly, though, as a reference rather than a dominant feature. Instead, the Kim method addresses the corner-finding difficulties of Sezgin’s method by focusing on segmentation at high curvatures, with what appears to be heavy attention on estimating a point’s curvature in relation to its neighboring points. Adding local convexity and local monotonicity as features to aid in shape recognition does seem to hold promise, given the recognition results from the evaluation tests.

One of the issues mentioned in the paper was the inability of their method, and of segmentation algorithms in general, to deal with strokes drawn in smaller drawing areas. Since their method employs re-sampling, it is obvious why they restrict strokes to a minimum 30x20 drawing area. Thus, images which are scaled down pose a problem for the Kim method. Judging by the writers’ tone, it seems like the kind of restriction that simply has to be accepted with their method as a fact of life. Scaling up the stroke, extracting the line that runs through the middle of the scaled-up stroke, and smoothing that line for shape recognition would be the obvious solution to a naïve person. If only smoothing were a trivial operation…

Sunday, September 16, 2007

Paper #7 - A Domain-Independent System for Sketch Recognition

Paper:
A Domain-Independent System for Sketch Recognition
(Bo Yu and Shijie Cai)

Summary:
According to the paper, unnaturally strict restrictions on drawing manner and the inability to decompose curves into primitive shapes are the two main deficiencies of existing sketch recognition systems. The authors’ solution is a domain-independent system that recognizes freehand sketches as primitive shapes and objects. The system’s process is divided into two stages: imprecise stroke approximation (given a stroke, approximate it with one of several predefined shapes) and post-processing (analyze the finished drawing, retrieve shape relations, and represent the user’s intent).

In the first stage, direction, curvature, and feature-area values (a stroke’s constituent points in relation to a line, point, or arc) are extracted from the stroke. Vertex approximation uses these extracted visual features to check whether the stroke can be approximated by a primitive shape. If it can’t, the stroke is split at the highest curvature peak and each piece is approximated again. Line segment approximation is done by fitting a horizontal line to the direction graph and a least-squares best-fit line to the actual coordinates. In curve approximation, the system attempts to fit a line to the direction graph. If it can, and the total direction change adds up to 2*pi, the stroke is approximated as a circle whose center is the middle of the candidate’s bounding box. If the direction change is less than 2*pi, an arc recognition algorithm is used instead. A special case involves strokes with self-intersections, which the approximation methods above fail to recognize. For these, stroke approximation is modified by making two copies of the stroke, applying a different strategy to each copy, and choosing the approximation with the better result.
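To illustrate the idea, here is a rough sketch of circle detection via the direction graph. The angle unwrapping, the least-squares fit, and the residual and closure thresholds are my own stand-ins, not the paper's actual procedure.

```python
import math

def looks_like_circle(points, residual_tol=0.3, closure_tol=0.5):
    """Rough circle test (sketch): if the cumulative direction of a stroke
    grows roughly linearly with point index and the total direction change
    is about 2*pi, call it a circle centered at the middle of the bounding
    box.  Thresholds are arbitrary placeholders."""
    # cumulative (unwrapped) direction along the stroke
    dirs = [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]
    unwrapped = [dirs[0]]
    for d in dirs[1:]:
        delta = d - unwrapped[-1]
        while delta > math.pi:
            delta -= 2 * math.pi
        while delta < -math.pi:
            delta += 2 * math.pi
        unwrapped.append(unwrapped[-1] + delta)

    # least-squares line fit to the direction graph (index vs. direction)
    n = len(unwrapped)
    xs = range(n)
    mx, my = sum(xs) / n, sum(unwrapped) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, unwrapped))
             / sum((x - mx) ** 2 for x in xs))
    residual = sum((y - (my + slope * (x - mx))) ** 2
                   for x, y in zip(xs, unwrapped)) / n

    total_turn = abs(unwrapped[-1] - unwrapped[0])
    if residual < residual_tol and abs(total_turn - 2 * math.pi) < closure_tol:
        px, py = zip(*points)
        return ((min(px) + max(px)) / 2, (min(py) + max(py)) / 2)  # circle center
    return None
```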

The second stage takes the primitive shapes from the first stage as input to three steps. In simple relation retrieval, the system tries to form various relations based on the shapes. In cleanup, three rules are applied to the shapes to remove meaningless line segments and arcs that cause abnormal curvature changes. In basic object recognition, the system processes the cleaned-up shapes into basic geometric objects without relying on domain-specific knowledge.

In addition to the recognition system, the writers built a powerful user interface to facilitate user input through various command symbols, themselves recognized by sketch recognition, for creating, modifying, and deleting primitive shapes. In a user study with ten students, primitive shapes and polylines yielded 98% accuracy, arcs 94%, hybrid shapes with smoothly connected lines and arcs 70%, and other types of hybrid shapes 90%.

Discussion:
Based on the tone of the Yu paper, it seemed like it basically stated why the Sezgin paper is flawed and how their system fixed those flaws. From the integration of vertex detection, the handling of curve detection, and the omission of speed data, the Yu paper pretty much took the concepts that had potential in the Sezgin paper and then tried to solve the same problem by starting all over again. I might be oversimplifying it a bit, but the Yu paper basically said that it can do vertex detection just as well without the use of speed data. That's pretty bold, considering that speed data was the foundation of vertex finding in the Sezgin paper. In the end though, the problems that were discussed in the Sezgin paper did seem to have been resolved (to an extent) in the Yu paper. Bravo.

One of two interesting aspects I saw in this paper is its domain-independent approach to approximating curves. Approximating circles was an especially big problem in the Sezgin paper, and the approach Yu used, based on the direction graph, is quite elegant. Fitting a line to the direction graph, checking that the total direction change spans 2*pi, and taking the center of the bounding box as the circle’s center makes circle approximation seem like a trivial problem.

The other aspect I found interesting was how the system’s handling of overtracing was better explained in this paper compared to the Sezgin one. The issue I had with the Sezgin paper was how it would handle the difference between a circle that was overtraced and a spiral. Once again, the Yu paper had a simple yet effective solution: classify a stroke of circles with similar size as a circle of mean size, and classify a stroke with circles of ascending or descending radii as a spiral. Simple solutions to difficult problems never stop being amazing.

Sunday, September 9, 2007

Paper #6 - Sketch Based Interfaces: Early Processing for Sketch Understanding

Paper:
Sketch Based Interfaces: Early Processing for Sketch Understanding
(Tevfik Metin Sezgin, et al)

Summary:
The system described in the Sezgin paper, called the Sezgin system from now on, is one which focuses on the task of computers understanding sketches. Unlike image analysis and graphics recognition, sketch understanding poses the problem of recognizing noisy, incomplete sketches in real-time. The approach in the Sezgin system uses single stroke gestures as input, and then does early sketch processing in three phases: approximation (fitting primitive geometries of lines and curves to a set of pixels), beautification (making the approximation more appealing without changing meaning), and basic recognition.

In the first step (stroke approximation), the Sezgin system detects vertices at the endpoints of linear segments, and then detects and characterizes the curved segments. The approach locates these vertices by finding extrema in the stroke’s curvature (maxima of change in direction with respect to arc length) and speed (minima of the pen’s speed as it turns corners). Because noise in the stroke produces false extrema, average-based filtering is used: curvature maxima below the mean curvature and speed minima above 90% of the mean speed are discarded. In other words, only extrema where the curvature is already high and the stroke speed is already low are selected. Hybrid fits, which are sets of candidate vertices, are generated from both the curvature and speed data, and the best hybrid fit determines the final vertices. In the second step (beautification), minor adjustments are made to the approximation to give the stroke its intended look. This is done by adjusting the line segments’ slopes so that lines with roughly the same slope end up parallel. To do so, the system first looks for clusters of slopes in the final hybrid fit. The weighted average of the slopes in a cluster is then used to rotate its lines about their midpoints. Finally, the intersections of those lines become the new endpoints, which may move some vertices. In the last step (basic object recognition), the most basic geometric objects are built from simple properties of the beautified sketch.
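A simplified sketch of the average-based filtering step, assuming per-point curvature and speed values have already been computed. The real system keeps the curvature and speed candidates separate and generates hybrid fits from them; the naive union at the end is only for illustration.

```python
def candidate_vertices(curvature, speed):
    """Average-based filtering (sketch): keep curvature maxima above the
    mean curvature and speed minima below 90% of the mean speed, then
    return the union of the two candidate index sets."""
    curv_thresh = sum(curvature) / len(curvature)
    speed_thresh = 0.9 * sum(speed) / len(speed)

    def local_maxima(values, above):
        return [i for i in range(1, len(values) - 1)
                if values[i - 1] <= values[i] >= values[i + 1]
                and values[i] > above]

    def local_minima(values, below):
        return [i for i in range(1, len(values) - 1)
                if values[i - 1] >= values[i] <= values[i + 1]
                and values[i] < below]

    curvature_candidates = local_maxima(curvature, curv_thresh)
    speed_candidates = local_minima(speed, speed_thresh)
    return sorted(set(curvature_candidates) | set(speed_candidates))
```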

A user study was conducted to measure the system’s ease of use and accuracy of shape recognition. After the users familiarized themselves with the system and drew ten shapes into it, the system showed 96% recognition accuracy and only one user did not prefer it. The authors believe a sketching system should have the ability to draw arbitrary single-stroke objects, automatic feature point detection, a single sketching mode for drawing objects, and a natural user feel, properties they believe are lacking in other current freehand sketching systems.

Discussion:
The algorithm in the Rubine paper takes speed data into account, which was abandoned in the subsequent Long paper and not used at all in the Paulson paper. In the Sezgin paper, speed data is embraced once more, as a much more significant metric than in the previous three papers. Because of this, I found the contrast in the importance of time data between the Sezgin paper and the papers by Long and Paulson quite interesting. After reading the Long and Paulson papers, I wasn’t convinced that time was a completely useless metric, even though it made sense not to incorporate it in the techniques those papers used. Sezgin’s paper (so far) solidified my view that time data does have its uses.

I still take issue with the authors’ claim that a sketch system should feel natural to the user, given that the Sezgin system takes only single strokes as input. Obviously, it’s not possible to sketch numerous objects naturally with just single strokes. I’m guessing, though, that those properties were qualified in the minds of the authors, since the domain of their system consisted of just simple geometric objects.

Friday, September 7, 2007

Paper #5 - MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier

Paper:
MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier
(Brandon Paulson and Tracy Hammond)

Summary:
The MARQS (Media Albums Retrieved by a Query Sketch) application uses a search-by-sketch algorithm with the goals of 1) recognizing unconstrained sketch symbols, 2) recognizing symbols from only one training example, and 3) learning from successful queries to improve accuracy. Previous work in sketch and gesture recognition stems from Rubine and Long’s work on feature-based recognizers, Sezgin and Yu’s work on geometric-based recognizers, Kato, Leung, and Chen’s work on media retrieval through sketches, and Hammond’s LADDER language.

The recognition algorithm used in MARQS chooses between two classifiers depending on the number of available training examples. Before any features are calculated, the sketch’s major axis, the line formed by the two furthest points of the sketch, is computed. The sketch is then rotated so the major axis lies horizontally, ensuring all sketches are oriented similarly. When there is only one example sketch, a single-example classifier is used: the error between the query sketch and each sketch in the example set is calculated, and the four example sketches with the lowest errors are returned as the ranked results. When more than one example sketch is available, a linear classifier is used instead. Each query sketch entered into the system is added to the example set for future use by the linear classifier. Four global features are used in the algorithm: 1) bounding box aspect ratio, 2) pixel density, 3) average curvature, and 4) perceived corner count.
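A rough sketch of the major-axis normalization and the four global features, with simplified stand-ins for pixel density (point count over bounding-box area) and perceived corners (a crude direction-change test); the actual MARQS features are computed differently, so this only conveys the flavor.

```python
import math
from itertools import combinations

def normalize_orientation(points):
    """Rotate a sketch so its major axis (the segment between its two
    furthest points) lies horizontally."""
    (x1, y1), (x2, y2) = max(combinations(points, 2),
                             key=lambda pair: math.dist(*pair))
    theta = -math.atan2(y2 - y1, x2 - x1)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]

def global_features(points):
    """Four MARQS-style global features, simplified: bounding-box aspect
    ratio, a pixel-density stand-in, average curvature, and a naive
    corner count standing in for 'perceived corners'."""
    pts = normalize_orientation(points)
    xs, ys = zip(*pts)
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    aspect = w / h if h else float('inf')

    # density stand-in: drawn points relative to bounding-box area
    density = len(pts) / (w * h) if w * h else 0.0

    dirs = [math.atan2(b[1] - a[1], b[0] - a[0]) for a, b in zip(pts, pts[1:])]
    turns = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    avg_curvature = sum(turns) / len(turns) if turns else 0.0
    corners = sum(1 for t in turns if t > math.pi / 4)   # crude corner test
    return aspect, density, avg_curvature, corners
```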

The MARQS system, an application for organizing photos and albums, was used to test the recognition algorithm. When the user inputs a sketch into the application, the top four photo candidates are displayed, and the user can click on the correct photo or view candidates outside the top four results. The classification performance was measured on a scale where 1.0 is the top search rank and higher values correspond to lower search ranks. Across 10 experiments with 1350 different sketches, the single classifier was used 27% of the time with an average search rank of 1.7, and the linear classifier was used 73% of the time with an average search rank of 1.51. Overall, the correct result was ranked first, in the top two, top three, and top four 70%, 87.6%, 95%, and 98% of the time, respectively. More examples improved search rank results.

Notable disadvantages of the algorithm were slowdown of query searches over time with the single classifier, the lack of additional features, and the lack of training on extreme sketch cases.

Discussion:
At this point in the class we’ve only covered the feature-based recognizers in the Rubine and Long papers, so the Paulson paper’s algorithm, which recognizes multi-stroke sketches at decent success rates, is quite impressive right after reading the single-stroke recognition papers. That these recognition rates were achieved in MARQS with much work still to be done is even more impressive. It would be interesting to see how the recognition algorithm performs in other types of applications.

As Paulson brought up during class, there was an issue with the recognition algorithm handling sketches that differ only in the orientation of a particular area within the sketch. If this is still an issue, I wonder how it would be resolved with the current implementation. Could it be fixed by fine-tuning the current implementation, or would another technique need to be added to it?

There was only one minor issue I had with the paper itself. One of the algorithm’s global features is pixel density, the ratio of filled (black) pixels to total pixels within the sketch’s bounding box. Since the algorithm was programmed in Java, sketching with pressure, a feature not currently available in Java, isn’t taken into account. The paper used a Chinese character as an example, though, and even though it was simply an example, I think a more practical character should have been used. The example character couldn’t realistically be drawn with a computer pen, since it can only be produced with a brush; this is analogous to the impossibility of sketching gothic-styled Roman letters. Such a character might be relevant to image recognition, but not so much to sketch recognition.

Sunday, September 2, 2007

Paper #4 - "These Look Similar!" Issues in Automating Gesture Design Advice

Paper #1:
"These Look Similar!" Issues in Automating Gesture Design Advice
(Chris Long, et al)

Summary:
New interaction technologies in state-of-the-art interfaces can be difficult for interface designers to adopt, since they aren’t familiar enough with these technologies to incorporate them into their interfaces. Traditional interface tools provide little help, so the quill design tool was created to give unsolicited advice to designers on creating and improving their interfaces. Gestures are fast, commonly used, and easier to remember than text commands, but they can be difficult for people to learn and for computers to recognize. quill tries to resolve this by continually analyzing gestures for both kinds of problems.

Experiments were done to gauge how consistently people judged gesture similarity, which yielded 99.8% accuracy for dissimilar gestures and 22.4% accuracy for similar ones. Thus, quill was developed to warn designers of gestures that people would likely find similar. The designer trains the gesture recognizer, which is based on the Rubine algorithm, by drawing ten to fifteen examples per gesture class, organizing classes into gesture groups when there are many of them. Designers then test gesture recognition by drawing gestures. quill uses an active feedback system to offer unsolicited advice about any problems: it first analyzes the gestures, then offers advice when the designer pauses. When gestures are similar or misrecognized, warnings appear with advice on fixing them.

Pilot studies of quill exposed various issues related to advice. The first concerns the user interface. Users were originally forced to ask for advice, but most did not, so quill was changed to give unsolicited advice instead. When advice was given too late, users usually ignored it; given at the earliest opportunity, it could distract or interrupt the user; given too soon, the information could become out of date. Thus, advice is delayed and given once a gesture has been tested and evaluated. Advice is given as concise English prose and graphics, with hyperlinks to further details. On the implementation side, analyses are run in the background with the aim of increasing user freedom and decreasing user confusion. In quill, analysis can be started automatically by the system or manually by the user. During advice computation, user actions are disabled for user-initiated analyses, while for system-initiated analyses any action is allowed and the affected analysis is cancelled. The last issue was the metric quill’s models used to predict human-perceived similarity. Users disagreed with it at times, especially when the models overestimated similarity for gestures with letter-like characteristics. The flaw was attributed to the models being based on non-letter data, so separate models for letter and non-letter shapes would be ideal.

Paper #2:
Visual Similarity of Pen Gestures
(Chris Long, et al)

Summary:
Gestures, or pen-based commands, are desirable features in pen-based user interfaces for their speed, common use, and iconic meanings. Problems exist with current gesture use because gestures are difficult for people to remember and for computers to recognize. The researchers in this paper conducted gesture similarity experiments whose results can inform gesture design tools. Previous related work includes the use of gestures in devices, such as Apple’s Newton and 3Com’s Pilot, and in various applications. Earlier psychological experiments on geometric shapes simpler than gestures were done to determine how people perceive similarity. INDSCAL, a version of multi-dimensional scaling (MDS), is the technique the researchers used to reduce the dimensionality of the data set so that patterns could be seen when plotted.

Two trials were conducted to understand the principles people use to judge gesture similarity. In the first trial, a set of widely diverse gestures was shown to participants on a monitor in triads (groups of three gestures). All triads were generated randomly, and the participants had to choose the gesture that looked least similar to them. The goal of the experiment was to determine measurable similarity properties of gestures and to produce a model of gesture similarity. From the MDS plots, short, wide gestures registered as similar to narrow, tall ones, and both types differed from square ones. The angle of the bounding box distinguished thin gestures from square ones, but not tall vertical gestures from short horizontal ones. Aspect features were created to represent this.

For the second trial, three new gesture sets were designed to explore: 1) total absolute angle and aspect, 2) length and area, and 3) rotation-related features such as the cosine and sine of the total angle. The relative importance of the features was also tested. The procedure and analysis were similar to those of the first trial. As for the results, the significance of the absolute angle couldn’t be determined in the first set, neither length nor area was significant in the second set, and gestures with horizontal and vertical lines were perceived as more similar than those with diagonal lines.

Based on both trials, length and area were insignificant features for judging similarity, while the logarithm of aspect was more significant than the aspect itself. The two experiments resulted in different similarity models, but the researchers prefer the model from the first trial since it was a slightly better similarity predictor.

Discussion:
I found the advice timing aspect of the user interface challenges in the first paper quite humorous. For some reason, the problem of when to display unsolicited advice reminds me a lot of the paper clip from older versions of Microsoft Office applications. That paper clip was so annoying. I can understand the difficulties the authors faced in trying to balance user convenience with program accuracy.

For the second paper, there were two areas that the authors brought up without going into much detail. I’m not exactly sure why they were omitted from the paper, because they seemed quite relevant. The first concerns the gestures created by one of the authors for the two trials. According to the paper, the gestures were created based on that author’s personal intuition, with the aim of spanning a wide range of gesture types and orientations. I wish there had been more elaboration on the specifics behind those gestures, since the trials were based on them. The second area concerns the results of the trials. According to the authors, the participants’ results from the first trial split into two different groups. The authors mentioned analyzing those groups, but didn’t say what the differences were (to my knowledge). A pity they left out that information as well.