Friday, September 7, 2007

Paper #5 - MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier

Paper:
MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier
(Brandon Paulson and Tracy Hammond)

Summary:
The MARQS (Media Albums Retrieved by a Query Sketch) application uses a search-by-sketch algorithm with the goals of 1) recognizing unconstrained sketch symbols, 2) recognizing symbols from only one training example, and 3) learning from successful queries for improved accuracy. Previous work in sketch and gesture recognition includes Rubine's and Long's work on feature-based recognizers, Sezgin's and Yu's work on geometric-based recognizers, Kato's, Leung's, and Chen's work on media retrieval by sketch, and Hammond's LADDER language.

The recognition algorithm in MARQS uses two different classifiers, depending on the number of available training examples. Before any features are calculated, the sketch's major axis, the line formed between its two furthest points, is found, and the sketch is rotated so that this axis lies horizontally, ensuring all sketches are oriented similarly. When only one example sketch is available, a single classifier is used: the query sketch's features are compared against each example, the error of each comparison is calculated, and the four example sketches with the lowest errors are ranked. When more than one example sketch is available, a linear classifier is used instead. Each successfully matched query sketch is added to the example set for future use by the linear classifier. Four global features are used in the algorithm: 1) bounding box aspect ratio, 2) pixel density, 3) average curvature, and 4) perceived corner count.
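Since the paper doesn't give code, here is a minimal sketch in Java of what the preprocessing step and these four global features might look like. The Point2D representation, the grid-based pixel-density approximation, and the angle-threshold corner counter are my own assumptions for illustration, not the paper's actual implementation.

import java.awt.geom.Point2D;
import java.util.List;

// Rough reconstruction of MARQS-style preprocessing and global features.
public class MarqsFeatures {

    // Rotate the sketch so its major axis -- the line joining the two
    // furthest-apart points -- lies horizontally.
    static void alignToMajorAxis(List<Point2D.Double> pts) {
        Point2D.Double a = pts.get(0), b = pts.get(0);
        double best = -1;
        for (Point2D.Double p : pts)
            for (Point2D.Double q : pts) {
                double d = p.distanceSq(q);
                if (d > best) { best = d; a = p; b = q; }
            }
        double theta = -Math.atan2(b.y - a.y, b.x - a.x);
        double c = Math.cos(theta), s = Math.sin(theta);
        for (Point2D.Double p : pts)
            p.setLocation(p.x * c - p.y * s, p.x * s + p.y * c);
    }

    // Bounding box as { minX, minY, maxX, maxY }.
    static double[] boundingBox(List<Point2D.Double> pts) {
        double minX = Double.MAX_VALUE, minY = Double.MAX_VALUE;
        double maxX = -Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (Point2D.Double p : pts) {
            minX = Math.min(minX, p.x); maxX = Math.max(maxX, p.x);
            minY = Math.min(minY, p.y); maxY = Math.max(maxY, p.y);
        }
        return new double[] { minX, minY, maxX, maxY };
    }

    // Feature 1: bounding box aspect ratio (width / height).
    static double aspectRatio(List<Point2D.Double> pts) {
        double[] bb = boundingBox(pts);
        return (bb[2] - bb[0]) / Math.max(bb[3] - bb[1], 1e-6);
    }

    // Feature 2: pixel density, approximated here as the fraction of cells
    // in a coarse grid over the bounding box that contain a stroke point.
    static double pixelDensity(List<Point2D.Double> pts, int grid) {
        double[] bb = boundingBox(pts);
        double w = Math.max(bb[2] - bb[0], 1e-6), h = Math.max(bb[3] - bb[1], 1e-6);
        boolean[][] filled = new boolean[grid][grid];
        for (Point2D.Double p : pts) {
            int gx = (int) Math.min(grid - 1, (p.x - bb[0]) / w * grid);
            int gy = (int) Math.min(grid - 1, (p.y - bb[1]) / h * grid);
            filled[gy][gx] = true;
        }
        int count = 0;
        for (boolean[] row : filled) for (boolean f : row) if (f) count++;
        return count / (double) (grid * grid);
    }

    // Absolute turning angle at point i, wrapped into [0, pi].
    static double turnAngle(List<Point2D.Double> pts, int i) {
        double a1 = Math.atan2(pts.get(i).y - pts.get(i - 1).y, pts.get(i).x - pts.get(i - 1).x);
        double a2 = Math.atan2(pts.get(i + 1).y - pts.get(i).y, pts.get(i + 1).x - pts.get(i).x);
        double d = Math.abs(a2 - a1);
        return d > Math.PI ? 2 * Math.PI - d : d;
    }

    // Feature 3: average curvature, taken as the mean turning angle.
    static double averageCurvature(List<Point2D.Double> pts) {
        double sum = 0;
        for (int i = 1; i < pts.size() - 1; i++) sum += turnAngle(pts, i);
        return pts.size() > 2 ? sum / (pts.size() - 2) : 0;
    }

    // Feature 4: perceived corner count, crudely estimated as the number of
    // points whose turning angle exceeds a fixed threshold.
    static int cornerCount(List<Point2D.Double> pts, double thresholdRadians) {
        int corners = 0;
        for (int i = 1; i < pts.size() - 1; i++)
            if (turnAngle(pts, i) > thresholdRadians) corners++;
        return corners;
    }
}

In the real system the pixel density would presumably be computed from the rendered stroke image rather than from raw sample points as above.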

The MARQS system, an application for organizing photos and albums, was used to test the recognition algorithm. When the user inputs a query sketch, the top four photo candidates are displayed, and the user can click on the correct photo or view candidates outside the top four. The recognition algorithm was evaluated by search rank, where 1.0 means the correct result was ranked first and higher values correspond to lower ranks. Across 10 experiments with 1350 different sketches, the single classifier handled 27% of queries with an average search rank of 1.7, and the linear classifier handled 73% with an average of 1.51. Overall, results were ranked first, in the top two, top three, and top four 70%, 87.6%, 95%, and 98% of the time, respectively. More examples improved search rank results.
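To make the metric concrete, here is a small illustration (my own, not from the paper) of how the average search rank and top-k percentages could be tallied from per-query ranks, assuming a rank of 1 means the correct sketch was returned first.

import java.util.List;

// Tallies the average search rank and top-k hit rates from per-query ranks.
public class RankStats {
    static void report(List<Integer> ranks) {
        double sum = 0;
        int[] topK = new int[5]; // topK[k] = queries whose rank is <= k
        for (int r : ranks) {
            sum += r;
            for (int k = 1; k <= 4; k++) if (r <= k) topK[k]++;
        }
        System.out.printf("average search rank: %.2f%n", sum / ranks.size());
        for (int k = 1; k <= 4; k++)
            System.out.printf("ranked in top %d: %.1f%%%n", k, 100.0 * topK[k] / ranks.size());
    }
}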

Notable disadvantages of the algorithm were the query search slowdown over time with the single classifier, the lack of additional features, and the lack of training on extreme sketch cases.

Discussion:
At this point in the class, we've only covered the feature-based recognizers from the Rubine and Long papers, so the Paulson paper's algorithm, which recognizes multi-stroke sketches at decent success rates, is quite impressive after reading only single-stroke recognition papers. That these recognition rates were achieved with much work still left to be done makes them even more impressive. It would be interesting to see how the recognition algorithm performs in other types of applications.

As Paulson brought up during class, the recognition algorithm has an issue handling sketches that differ only in the orientation of a particular area within the sketch. If this is still a problem, I wonder how it would be resolved: could it be fixed by fine-tuning the current implementation, or would another technique need to be added?

There was only one minor issue I had with the paper itself. One of the algorithm's global features is pixel density, the ratio of filled (black) pixels to total pixels within the sketch's bounding box. Since the algorithm was programmed in Java, sketching with pressure, a feature not available in Java at the time, isn't taken into account. The paper also used a Chinese character as an example. Even though it was simply an example, I think a more practical character should have been chosen: the one shown can only realistically be drawn with a brush, not a computer pen, which is analogous to the impossibility of sketching gothic-styled Roman letters. Such a character might be relevant to image recognition, but not so relevant to sketch recognition.

3 comments:

rg said...

I think that the problem of orientation and style will always haunt you when you push for a style, scale, speed, and domain independent recognizer. It is a huge constraint on what features you can use, but it helps the user. Many problems come down to similar tradeoffs; this one just hasn't been fully explored.

- D said...

It seems like since this application is being tailored to one specific user (I'm not going to search your journal for a sketch you made, on my honor as an Aggie), you would want, in this case, to have features that /were/ user dependent. Perhaps not so much that you fail to account for variations from day to day. But I don't need to be general enough to detect the way you draw a slice of pizza when I'm the only one searching my photos for those pictures of pizza. If you are searching my photos, how will you know I tagged them with a sketch of a slice of pizza and not a whole pie?

Grandmaster Mash said...

If Java ever does incorporate pressure, though, the pixel density algorithm could still work, since we could then determine which pixels should be colored using the pressure and point location information. Still, I think the Chinese character was more of an example of why the system should avoid geometric-based models.