Friday, August 31, 2007

Paper #3 - Specifying Gestures By Example

Paper:
Specifying Gestures By Example
(Dean Rubine)

Summary:
A gesture-based drawing program (GDP) is an example of a gesture-based application: the user inputs a gesture, and the program classifies it accordingly. Input begins when the user positions the cursor on the screen with a mouse press and draws the gesture, and it ends when the user either releases the mouse button or stops moving the mouse while the button is still pressed. Gestures are designed by sending them from a click-and-drag interface to an application, both of which are built with the GRANDMA object-oriented toolkit. A GDP handles output through its View class hierarchy, which is divided into two classes: instances of the GdpTopView class refer to the window in which the GDP runs, while instances of the GraphicObjectView class are sets of lines, rectangles, ellipses, text objects, or combinations of them. Editing a gesture's attributes consists of entering semantic expressions for its three components: 1) recog (evaluated when the gesture is recognized), 2) manip (evaluated on each subsequent mouse point), and 3) done (evaluated when the mouse button is released).
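
To make those three components concrete, here is a minimal Python sketch of how I picture the interface; everything besides the recog/manip/done names is my own illustration, not GRANDMA's actual API:

    # A minimal sketch (assumed interface, not GRANDMA's actual API) of the
    # three semantic components attached to a gesture class.
    class GestureSemantics:
        def recog(self, gesture, view):
            """Evaluated once, at the moment the gesture is recognized."""

        def manip(self, point, view):
            """Evaluated on each mouse point that follows recognition,
            while the button is still held."""

        def done(self, view):
            """Evaluated when the mouse button is released."""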

The gesture recognition problem first involves specifying each gesture class by example gestures, where there are C' gesture classes labeled from 0 to C' - 1. An input gesture is then classified as the gesture class it most closely resembles. Statistical gesture recognition consists of computing a vector of features from an input gesture and classifying it as one of the C' possible gestures; this particular paper uses thirteen geometric and algebraic features. Ambiguous gestures, i.e., input gestures that come close to matching more than one gesture class, are generally rejected.
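
To illustrate the statistical scheme, here is a rough Python sketch of my own (not code from the paper): it computes a few of the thirteen features, namely Rubine's f1, f2, f5, and f8, and picks the class whose linear evaluation over the features is largest; the per-class weights are assumed to come from training on the example gestures:

    import math

    def features(points):
        # points: list of (x, y) samples for one single-stroke gesture
        # (assumes at least three sampled points)
        x0, y0 = points[0]
        xn, yn = points[-1]
        # f1, f2: cosine and sine of the initial angle (measured from the
        # first point to the third to smooth out jitter)
        dx, dy = points[2][0] - x0, points[2][1] - y0
        d = math.hypot(dx, dy) or 1.0
        f1, f2 = dx / d, dy / d
        # f5: distance between the first and last points
        f5 = math.hypot(xn - x0, yn - y0)
        # f8: total gesture length (sum of the segment lengths)
        f8 = sum(math.hypot(x2 - x1, y2 - y1)
                 for (x1, y1), (x2, y2) in zip(points, points[1:]))
        return [f1, f2, f5, f8]   # the paper uses all thirteen features

    def classify(points, weights):
        # weights: {class: (w0, [w1..wF])}, learned from the examples
        f = features(points)
        scores = {c: w0 + sum(wi * fi for wi, fi in zip(w, f))
                  for c, (w0, w) in weights.items()}
        return max(scores, key=scores.get)   # near-ties would be rejected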

In the evaluation, the GDP achieved a 98% classification rate for classifiers trained with 15 or more examples per gesture class on sets of 15 or fewer classes; fewer examples dropped the rate to 96%. Some GDP versions employed a couple of recognition extensions. One is eager recognition, in which the GDP recognizes a gesture while it is still being drawn. The other is multi-finger recognition, in which gestures drawn with multiple fingers are treated as multi-path data: the single-stroke algorithm is applied to each path individually, and the results are combined to classify the multi-path gesture.
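
Here is how I picture that per-path combination in code, with the caveat that the summing rule below is my own guess rather than the paper's exact scheme:

    def classify_multipath(paths, weights, score_fn):
        # paths: list of strokes; score_fn(path, weights) -> {class: score}
        # Summing per-class scores across paths is an assumption on my
        # part; the paper combines per-path results, but not necessarily
        # with this exact rule.
        totals = {}
        for path in paths:
            for c, s in score_fn(path, weights).items():
                totals[c] = totals.get(c, 0.0) + s
        return max(totals, key=totals.get)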

Discussion:
Mapping an input that a user has drawn to a member of a set of examples is no easy task, yet the GDP in Rubine's paper takes a novel approach by requiring that the input be drawn in one stroke. The problems compound when attempting to classify a gesture with multiple strokes, so having the user draw gestures in a single stroke does away with those complexities. What makes the GDP significant is that it achieves this simplification while maintaining a high rate of accuracy.

One fault that I see in the approach demonstrated in the Rubine paper is that it forces the user to conform to a certain style instead of having the program do all the conforming. Writing naturally multi-stroke gestures as one stroke feels unnatural to the user. Limiting gestures to one stroke can also introduce clashes between input gestures, where an input gesture not in the example set is incorrectly classified as one that is. For example, if the mathematical equals sign is drawn like the letter Z to turn the symbol into a one-stroke gesture, problems arise when the user wishes to have the Z character classified as the letter Z. This is analogous to taking the absolute value of an integer or converting a real number to an integer: precision is lost in the mapping.

A natural evolution of the approach in the Rubine paper would be to incorporate multi-stroke gestures into the classification process. But the Rubine method seems to have restricted itself to single-stroke gestures precisely to avoid, as stated earlier, the problems inherent in multi-stroke gestures, and a naive extension to such gestures would seem no better than not having the Rubine method at all. Perhaps the ideas of the multi-finger recognition extension covered near the end of Rubine's paper could be exploited in an approach to classifying the more complex multi-stroke gestures.

Thursday, August 30, 2007

Paper #2 - Sketchpad: A Man-Machine Graphical Communication System

Paper:
Sketchpad: A Man-Machine Graphical Communication System
(Ivan Sutherland)

Summary:
Sketchpad is a system that allows a human user to interact with the computer by drawing lines on the screen. In the system, a set of buttons issues specific commands, switches activate functions, a light pen indicates position information and points at existing objects, and knobs rotate and magnify picture parts. Using Sketchpad to construct drawings serves as a model for the design process, giving the advantages of 1) storing and updating drawings, 2) gaining understanding of operations graphically, 3) serving as a topological input device for circuit simulations, and 4) producing highly repetitive drawings.

Sketchpad employs a hierarchical data structure that separates information about the general and the specific parts of a drawing; this structure allows details of the specifics to be modified without changing the general. Input is handled by the light pen, which serves the dual role of positioning new parts of a drawing and pointing at existing parts for modification. Fairly extensive drawings are made possible by the system's power to lock onto and track picture parts through the pseudo pen location and demonstrative language programs. The display is stored in a large table, magnification is handled by operating the knobs, and geometric shapes are generated with difference equations. There is an additional option to display modifiable abstractions, or user-desired properties, through circular symbols and numerical values.

Operations in Sketchpad are handled recursively by three very generalized functions, so that any fixed-geometry subpicture can be handled: 1) expansion of instances (allowing unlimited levels of nested subpictures), 2) recursive deletion (removing, along with the originally deleted part, all subpictures that depend on it, for consistency), and 3) recursive merging (combining two similar picture parts into one). The use of atomic operations for creating basic geometric objects thus makes it possible to create more complex geometric objects. One major feature of Sketchpad is the user's ability to apply specific mathematical conditions to existing drawn parts: by defining a type of constraint for the computer to satisfy, a drawing can be made to take the exact shape desired.
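
To make the recursion concrete, here is a toy Python sketch of my own (Sketchpad was of course not written this way) showing how recursive deletion keeps a hierarchical drawing consistent:

    # A toy sketch (assumed structure, not Sketchpad's implementation) of a
    # hierarchical drawing: parts may have dependents that must be removed
    # whenever the part they refer to is removed.
    class Part:
        def __init__(self, name):
            self.name = name
            self.dependents = []   # parts/constraints that refer to this one

        def delete(self, drawing):
            # recursive deletion: removing a part removes everything that
            # depends on it, so the drawing stays consistent
            for d in list(self.dependents):
                d.delete(drawing)
            if self in drawing:
                drawing.remove(self)

    drawing = []
    shaft = Part("line")
    head = Part("arrowhead")     # drawn relative to the shaft
    shaft.dependents.append(head)
    drawing.extend([shaft, head])
    shaft.delete(drawing)        # the arrowhead goes with it
    assert drawing == []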

Sketchpad has been applied, and can potentially be applied, to a variety of uses, ranging from producing patterns to artistic drawings. What Sketchpad hopes to achieve is to make drawing on a computer more worthwhile than drawing by hand.

Discussion:
This is a paper about a system that allows a user to draw lines on the screen with a pen device and modify those lines with a series of knobs, dials, and switches. That in itself, to a modern reader, is not too exciting. But for a paper written over four decades ago, that's amazing and quite groundbreaking. Subjects covered in the paper, such as precursors to graphical user interfaces and object-oriented programming, are significant in themselves, but one of the things I find most interesting about this paper is how easily readable it still is.

One fault I did see with the technology presented in the paper is the light pen's practicality as the dominant input device for computers, had the mouse not existed. One advantage of the mouse that people take for granted is that it can be used seamlessly with a keyboard without the user tiring much from prolonged use, probably because the mouse rests on the desk, so the user's hand never has to stay lifted for an extended time. That is not the case with the light pen. If the light pen were used in conjunction with a keyboard in the same fashion as we use a mouse and keyboard nowadays, I believe it would be far more tiring. Sketchpad's input system would have been a viable supplement to computers equipped with a mouse had it been economically feasible at its introduction, but it would not have been a successful substitute.

Since the paper was written, much progress has been made in touch- and pen-based input with Tablet PCs (covered to an extent in the Hammond paper in the previous blog post). But if the Sketchpad system had enjoyed the success the mouse had in that time period, a logical evolution would have been a dual-monitor setup: one monitor standing upright on the desk to accommodate keyboard input, the other lying flat to offer the same comfort as writing on paper. In addition, I would have incorporated supplementary buttons on the pen, analogous to the ones found on the mouse.

Paper #1 - Introduction to Sketch Recognition

Paper:
Introduction to Sketch Recognition
(Tracy Hammond and Kenrick Mock)

Summary:
Sketch recognition had its beginnings in 1963 with Dr. Ivan Sutherland's vector-based Sketchpad system, which used light pen and keyboard input to let the user produce complicated 2-D graphics. Despite the vector display's graphical superiority of smooth lines, raster-graphics computers with mouse input prevailed thanks to their lower cost and superiority in other graphical features. The difficulty of sketching with a mouse, along with recent technological advances, has allowed the emerging Tablet PC technology to penetrate the market.

Tablet PCs are, in essence, notebook computers with the added feature of touch or pen-based input. To track an input's location, the Tablet PC employs a digitizer. There exist three types of digitizers: 1) passive (i.e., touch), 2) active (i.e., a specialized pen), and 3) hybrid (i.e., a combination of both). Two common physical styles also exist for the Tablet PC: 1) slate (i.e., keyboard-less) and 2) convertible (i.e., a monitor that twists and rotates over the keyboard). The operating system most commonly supported on Tablet PCs is Windows, but aftermarket modifications and additional drivers allow support for Mac OS X and Linux as well.

Instructors who adopt Tablet PCs in their lectures gain flexibility in their teaching methods, since the tablets add dynamic functionality to traditionally static teaching materials; the disadvantage is that implementing them costs more money and resources than traditional teaching methods do. Student use of Tablet PCs in the classroom has shown the advantages of more detailed note-taking, greater collaboration with teachers and classmates, and additional academic resources at the students' disposal.

One framework that helps instructors produce sketch recognition interfaces for their curriculum is FLUID. The framework combines the LADDER language with the GUILD sketch recognition generator so that instructors do not have to write sketch recognition code; instead, they can focus on specifying the shapes to be used in the curriculum. Two particular case studies of Tablet PCs in the classroom involved the use of video recordings in the first case and the posting of curriculum material from the tablet in the second.
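
To give a flavor of what specifying shapes instead of programming them means, here is a hypothetical, Python-flavored analogue of a declarative shape description; the field names and constraint vocabulary are invented for illustration and are not actual LADDER syntax:

    # A hypothetical analogue (invented names, not real LADDER/GUILD
    # syntax) of a declarative shape description: the instructor lists the
    # components and constraints, and a generator turns the description
    # into a recognizer.
    arrow = {
        "name": "Arrow",
        "components": {"shaft": "Line", "head1": "Line", "head2": "Line"},
        "constraints": [
            ("coincident", "shaft.end", "head1.start"),
            ("coincident", "shaft.end", "head2.start"),
            ("shorter", "head1", "shaft"),
            ("shorter", "head2", "shaft"),
        ],
    }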

Discussion:
By combining the features that conventional notebook PCs already have with the added functionality of touch and pen-based input, Tablet PCs open up new uses and extend current ones without losing any of the notebook's capabilities, and that is where their significance lies. Their significance in pedagogy is similar: instructors can advance their teaching by extending or easing their current methods with Tablet PCs, while students can use Tablet PCs to supplement and better elaborate on what they are already learning.

Despite the advantages of Tablet PCs, there will be cases where the technology is not needed or is simply overkill, as a parody of Microsoft's Surface demonstrates:

[Embedded video: Microsoft Surface parody]

Since the paper's release, there has been some progress in overcoming the limitations of passive digitizers. For example, the iPhone overcame the passive digitizer's inability to distinguish a single-click from a double-click by using a two-finger tap in place of the double-tap. The video below of touch-screen typing on the iPhone demonstrates how the convenience of passive digitizers can be merged with the versatility of active digitizers:

[Embedded video: touch-screen typing on the iPhone]

Along with the advances made by the iPhone, touch and pen-based Tablet PCs can definitely benefit from combining the advantages of passive and active digitizers while overcoming the imprecision of passive digitizers and the overhead inherent in current active digitizers.

About Me


Hi, I'm Paul Taele. I got my Bachelor's in Computer Sciences and Mathematics at the University of Texas at Austin (yes, I'm a Longhorn). I also studied Chinese for three semesters at National Chengchi University (Taipei, Taiwan).

Year:
Masters in Comp Sci, 1st Year

E-mail:
ptaele [at] gmail [dot] com

Academic Interests:
Don't quote me on this, but I'll say Neural Networks and Multi-Agents for now.

Relevant Experience:
I doubled in CS and math at UT. I primarily programmed in Java while there, since that's all that was taught. I took a variety of AI courses at the time: AI, Neural Networks, Robotics, Cognitive Science, Independent Research (on the CS/EE-side of neural networks), and Independent Study (on the math-side of neural networks). That was fun.

Why I'm taking this class?
Because Dr. Choe told me to. Haha, not really (he did highly recommend it though). Besides Dr. Choe's recommendation, I found the course topic interesting, especially after I saw an application of sketch recognition (I think?) called the T.E.D.D.Y. system on YouTube several months earlier:

[Embedded video: the T.E.D.D.Y. system]

What do I hope to gain?
Without sounding too philosophical, I hope this course gives me better insight into how AI is done outside of (or besides) my main interest in neural networks.

What do I think I will be doing in 5 years?
Still being a student? I hope that doesn't happen, but it won't surprise me if I still am.

What do I think I will be doing in 10 years?
I'm not sure. Thinking that far scares me.

What are my non-academic interests?
I'm a big fan of East Asian movies, TV shows, and music. This is a consequence of studying Asian languages for several years, I guess.

My funny story:
I didn't plan on being a CS major during undergrad (I was originally a business major at the University of Southern California). When I went to UT afterwards, I decided to do pre-CS since all my friends were doing it (at UT, students typically have to apply to the CS major in their third year). Out of my friends, I was the only one who got accepted into the CS program. I made new friends and decided to double major in math as well, since they were all CS and math majors. It turns out I was the only one of my friends who didn't drop math as a major. I made some more friends, and we all vowed to go to grad school after we graduated. Well...yeah, I'm the only one of them who went immediately to grad school. Wait a minute, that's not a funny story at all. That's just plain sad...

Random Stuff #1 - Why is my blog pink?
It's not pink, it's salmon. And I think it looks hilarious. I figured no one else would choose this template, especially due to the lack of female classmates in this course.

Random Stuff #2 - Why am I doing grad school at A&M?
Haha, I sometimes wonder what I'm doing here after doing undergrad at UT. When I was thinking of doing grad school, my professors told me to go somewhere else for CS grad to gain a different perspective. I focused on A&M and UT Dallas because they were both in-state schools with decent CS programs, even though it turned out my profs originally wanted me to go out-of-state. In the end, I chose A&M for two reasons: 1) they gave me more money, and 2) Dr. Choe, one of the professors I would like to have on my advising committee, also went to UT (in fact, my prof for AI and neural nets and my prof for robotics at UT were his advisers when he was working on his PhD). Anyway, my friends gave me a new nickname: Traitor.

Random Stuff #3 - What do I think of College Station/Bryan?
It sucks.