Distance Measures - Combining Representations

5.3 Combining Representations

5.3.2 Distance Measures

A key component of this thesis is judging the credibility of a given representa-tion/system on a particular domain or symbol in order to combine systems most effectively. Section 4.2.2 introduced three distance metrics (spatial, temporal, and conceptual) to calculate the similarity of two sketched symbols and gauge their con-fusability. These methods are related to several existing applications of distance measures for the comparison of simple images and handwriting. Calculating the simi-larity of an image or a handwritten character to templates or to members of a training data set is a common method of classifying such data. The key difference between the application of distance measures in this thesis and exiting applications is that in this work, distance measures are not used directly for classification or recogni-tion. Instead, we use distance measures to provide guidance for combining separate recognition methods.

Spatial

We define the spatial distance between symbols in terms of distances between edge direction histograms. Similar approaches have been well studied as a means of image comparison. Jain and Vailaya [30] apply edge direction histograms, similar to those used in this work, to the problem of identifying similar trademark images, while Veltkamp and Latecki [65] compare the properties and performance of a number of spatial similarity measures, including edge direction histograms, for image retrieval.

Temporal

We represent a symbol temporally as a sequence of events, which is then encoded as a string. Temporal distances are calculated as string edit distances. Cortelazzo et al.

also represent images as strings and apply a string edit distance to determine similarity and identify trademark images [10]. Their strings, however, represent spatial elements of images, rather than temporal elements as in this work, and although their distance measure is very similar to ours, their representation is fundamentally different.

Conceptual

We calculate a conceptual distance between symbols by computing the distance be-tween corresponding graph representations. This application of a distance measure is the most closely related to previous work. Similar graph distance measures have been applied to pen-based data, while the distance measures discussed above were applied to static, pixel-based images.

Sanfeliu and Fu apply a graph distance measure to the problem of classifying hand written English characters [52]. Their graph distance measure is similar to that used in this work; however, nodes and edges represent individual strokes and connections between strokes, while in our work a stroke may correspond to more than one node and edges encode additional relationships between nodes. WeeSan Lee et al. [37] and Seong-Whan Lee[36] each perform sketched symbol classification by matching graphs that contain geometric information and define alternatives to the graph distance measure used in this work.

Chapter 6 Future Work

This chapter discusses three areas of possible future work: developing more integrated combination schemes, using recognitions systems other than those employed in this thesis, and applying a combination of representations and systems to other domains and tasks.

Alternate Models for Combination

The combination methods we have developed employ representations separately, i.e.

the constituent recognition systems do not consider all available information and they do not exchange intermediate information, only final labels or segments. Combination schemes that allow more information exchange between representations would be an interesting avenue for future work. Such schemes are particularly relevant for sketch recognition because in many cases, subdecisions affect one another, and improving one subdecision may impact several neighboring decisions and result in a cascading improvement in the final result. For example, changing the interpretation of a stroke from a curve to a line may result in a collection of strokes being reinterpreted as a square, which may then affect the interpretation of a higher level symbol, and if domain knowledge is incorporated, the reinterpretation of a symbol may affect the meaning of a nearby symbol.

One way of allowing this information exchange between representations would be a combination scheme that uses iterative refinement. Each constituent system, based on

a single representation, would evaluate a sketch, share intermediate information with other systems, and then reevaluate based on new information. In this work we have treated existing systems as near black boxes, making relatively few changes to internal structures; however, such an iterative method would require different constituent systems (or significant changes to current systems) since most stand alone systems are not capable of sharing and accepting intermediate information.

A second way of combining information from different representations would be to combine evidence from multiple representations in a single probabilistic model.

Several recognition systems, in particular the conceptual system used in this work [4] and a related temporal system [56], use such models; however, these systems ignore some possible sources of information from other representations, which could be includes as observations.

Alternate Implementations of Representations

We have considered only a single recognition system based on each of the three repre-sentations (spatial, temporal and conceptual). However, it would be useful to examine different recognition implementations in order to better understand the relationship between a representation and a particular implementation of that representation. In particular, there is a question of which recognition results are due to the implemen-tation and which are due to the underlying represenimplemen-tation. A challenge in answering this question is that creating implementations requires significant development time (the three implementations used in this thesis are theses themselves). Furthermore, there are currently few systems which act on the kinds of unconstrained sketches examined in this work.

Other Symbolic Domains and Sketching Tasks

This work examines three symbolic domains (family trees, flow charts and circuit diagrams); however, there are many others to which this work could be applied. For example, military planning diagrams, chemistry molecules, and mechanical or physical systems have been used in other sketch recognition work [17, 42, 3]. Expanding our

work to new sketching domains would provide more reference points to better gauge the strengths and weaknesses of each representation/system. Including more domains with different combinations of characteristics, like the degree of temporal interspersing or pixel overlap between symbols, could help to refine the complexity metrics for domains and symbols in Chapter 4. One challenge to incorporating new domains in this work is data collection. Locating knowledgeable subjects, who are willing to provide sufficient training data, can be difficult for many specialized domains (e.g.

military planning diagrams).

In addition, we consider only recognition of symbolic sketches, but there are many other pen-based tasks for which a combination of multiple representation or systems could be useful, for example the beautification of artistic sketches. In particular, the segmentation approaches presented in Chapter 3, which do not use domain knowledge, could be useful as part of a user interface application that groups strokes to facili-tate editing or indexing, but does not perform recognition (for example the digital notebook created by Wang et al. [67]).

Chapter 7 Conclusion

This thesis is based on the idea that looking at a problem from different perspectives may make it appear easier or harder. It explores ways of improving recognition accuracy by combing different sketch representations, and recognition systems that are based on those representations. There are four main contributions of this work.

• We have developed spatial, temporal and conceptual approximate segmentation methods, which do not rely on domain knowledge.

• We have applied the segmentation methods to improve recognition accuracy.

• We have introduced confusability as a means of judging the credibility of a recognition system and as a means of combining systems.

• We have developed representation specific measures, which approximate con-fusability without the need to build systems.

We have presented two methods for combining recognition systems. The first improves recognition by improving segmentation, while the second seeks to predict how well a particular system will recognize a given domain or symbol. We have shown that combining several recognition systems based on different representations can improve the accuracy of existing recognition methods. This work brings us closer to the level of recognition accuracy necessary to apply sketch recognition in a realistic design setting.

Bibliography

[1] Aaron Adler and Randall Davis. Symmetric multimodal interaction in a dynamic dialogue. In 2009 Intelligent User Interfaces Workshop on Sketch Recognition.

ACM Press, February 2009.

[2] Fevzi Alimoglu and Ethem Alpaydin. Combining multiple representations for pen-based handwritten digit recognition. In Proceedings of the Fourth Interna-tional Conference on Document Analysis and Recognition, 1997.

[3] Christine Alvarado. A natural sketching environment: Bringing the computer into early stages of mechanical design. Master’s thesis, MIT, 2000.

[4] Christine Alvarado. Multi-Domain Sketch Understanding. PhD thesis, Mas-sachusetts Institute of Technology, August 2004.

[5] Christine Alvarado and Randall Davis. Sketchread: A multi-domain sketch recog-nition engine. In Proceedings of UIST 2004, pages 23–32, New York, New York, October 24-27 2004. ACM Press.

[6] Derek Anderson, Craig Bailey, and Marjorie Skubic. Hidden markcov model symbol recognition for sketch-based interfaces. In AAAI Fall Symposium Series, Making Pen-Based Interaction Intelligent and Natural, 2004.

[7] Ajay Apte, Van Vo, and Takayuki Dan Kimura. Recognizing multistroke ge-ometric shapes: an experimental evaluation. In UIST ’93: Proceedings of the 6th annual ACM symposium on User interface software and technology, pages 121–128, New York, NY, USA, 1993. ACM.

[8] Samy Bengio, Christine Marcel, S´ebastien Marcel, and Johnny Mari´ethoz. Confi-dence measures for multimodal identity verification. Information Fusion, 3:267–

276, 2002.

[9] Sonya Cates. Using context to resolve ambiguity in sketch understanding. Mas-ter’s thesis, MIT, 2006.

[10] G. Cortelazzo, G.A. Mian, G. Vezzi, and P. Zamperoni. Trademark shapes de-scription by string-matching techniques. Pattern Recognition, 27(8):1005–1018, August 1994.

[11] Randall Davis. Sketch understanding in design: Overview of work at the MIT AI lab. Sketch Understanding, Papers from the 2002 AAAI Spring Symposium, pages 24–31, March 25-27 2002.

[12] Randall Davis, Howard Shrobe, and Peter Szolovits. What is a knowledge rep-resentation? AI Magazine, 14(1):17–34, 1993.

[13] Vincenzo Deufemia and Michele Risi. A dynamic stroke segmentation technique for sketched symbol recognition. pages 328–335. 2005.

[14] D. C. Engelbart. X-y position indicator for a display system. U.S. Patent 3,541,541, 1970.

[15] Manuel J. Fonseca, Csar Pimentel, and Joaquim A. Jorge. Cali: An online scribble recognizer for calligraphic interfaces. In Sketch Understanding, Papers from the 2002 AAAI Spring Symposium, pages 51–58, 2002.

[16] Kenneth Forbus, R. Ferguson, and J. Usher. Towards a computational model of sketching. In Proceedings of QR2000, 2000.

[17] Kenneth D. Forbus, Jeffrey Usher, and Vernell Chapman. Sketching for military courses of action diagrams. In IUI ’03: Proceedings of the 8th international conference on Intelligent user interfaces, pages 61–68, New York, NY, USA, 2003. ACM.

[18] Leslie Gennari, Levent Burak Kara, Thomas F. Stahovich, and Kenji Shimada.

Combining geometry and domain knowledge to interpret hand-drawn diagrams.

Computers and Graphics, 29(4):547–562, 2005.

[19] E. Goldmeier. Similarity in visually perceived forms. In Psychological Issues, volume 8:1, 1972.

[20] A. L. Gorin, B. A. Parker, R. M. Sachs, and J. G. Wilpon. How may I help you?

Speech Communication, 23:113–127, 1997.

[21] Barbara Grosz and Julia Hirschberg. Some intonational characteristics of dis-course structure. In Proceedings of the International Conference on Spoken Lan-guage Processing, pages 429–432, 1992.

[22] Jungpil Hahn and Jinwoo Kim. Why are some diagrams easier to work with?

effects of diagrammatic representation on the cognitive integration process of sys-tems analysis and design. ACM Transactions on Computer-Human Interaction, 6(3):181–213, 1999.

[23] Tracy Hammond and Randall Davis. Ladder, a sketching language for user interface developers. Computers and Graphics, 28:518–532, 2005.

[24] Tracy Anne Hammond. LADDER: A Perceptually-based Language to Simplify Sketch Recognition User Interface Development. PhD thesis, Massachusetts In-stitute of Technology, Cambridge, MA, January 2007.

[25] Hongwei Hao, Cheng-Lin Liu, and H. Sako. Confidence evaluation for combining diverse classifiers. In Seventh International Conference on Document Analysis and Recognition, pages 760–765, 2003.

[26] Heloise Hse and A. Richard Newton. Sketched symbol recognition using zernike moments. In ICPR ’04: Proceedings of the Pattern Recognition, 17th Interna-tional Conference on (ICPR’04) Volume 1, pages 367–370, Washington, DC, USA, 2004. IEEE Computer Society.

[27] Jianying Hu, Michael K. Brown, and William Turin. HMM based on-line hand-writing recognition. IEEE Transactions on Pattern Analysis and Machine Intel-ligence, 18:1039–1045, October 1996.

[28] J.J. Hull. A database for handwritten text recognition research. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(5):550–554, May 1994.

[29] Takeo Igarashi, Satoshi Matsuoka, and Hidehiko Tanaka. Teddy: A sketching interface for 3d freeform design. SIGGRAPH 99, pages 409–416, August 1999.

[30] Anil K. Jain and Aditya Vailaya. Image retrieval using color and shape. Pattern Recognition, 29(8):1233–1244, 1996.

[31] Levent Burak Kara and Thomas F. Stahovich. An image-based, trainable symbol recognizer for hand-drawn sketches. Computers and Graphics, 29(4):501–517, 2005.

[32] Manolya Kavakli and John S. Gero. Sketching as mental imagery processing.

Design Studies, 22(4):101–122, 2001.

[33] Josef Kittler, Mohamad Hatef, Robert P.W. Duin, and Jiri Matas. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226–239, 1998.

[34] James A. Landay and Brad A. Myers. Sketching interfaces: Toward more human interface design. Computer, 34(3):56–64, 2001.

[35] Jill H. Larkin and Herbert A. Simon. Why a diagram is (sometimes) worth 10, 000 word. Cognitive Science, pages 65–99, 1987.

[36] Seong-Whan Lee. Recognizing hand-drawn electrical circuit symbols with at-tributed graph matching. In Structured Document Image Analysis, pages 340–

358. Springer-Verlag, 1992.

[37] Weesan Lee, Levent Burak Kara, and Thomas F. Stahovich. An efficient graph-based symbol recognizer. In EUROGRAPHICS Workshop on Sketch-Based In-terfaces and Modeling, pages 11–18, 2006.

[38] Hector J. Levesque and Ronald J. Brachman. A fundamental tradeoff in knowl-edge representation and reasoning (revised version). In Ronald J. Brachman and Hector J. Levesque, editors, Readings in Knowledge Representation, pages 41–70.

1985.

[39] A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels.

Visual similarity of pen gestures. In SIGCHI conference on Human factors in computing systems, pages 360–367, 2000.

[40] Michael Oltmans. Envisioning Sketch Recognition: A Local Feature Based Ap-proach to Recognizing Informal Sketches. PhD thesis, Massachusetts Institute of Technology, May 2007.

[41] Michael Oltmans, Christine Alvarado, and Randall Davis. Etcha sketches:

Lessons learned from collecting sketch data. In Making Pen-Based Interaction Intelligent and Natural, pages 134–140, Menlo Park, California, October 21-24 2004. AAAI Fall Symposium.

[42] Tom Y. Ouyang and Randall Davis. Recognition of hand drawn chemical dia-grams. In Proceedings of AAAI, pages 846–851, 2007.

[43] Tom Y. Ouyang and Randall Davis. A visual approach to sketched symbol recog-nition. In Proceedings of the 2009 International Joint Conference on Artificial Intelligence (IJCAI), 2009.

[44] Charles Peirce. What is a sign?, volume 2 of The Essential Peirce, chapter 2.

Indiana University Press, 1998.

[45] Rejean Plamondon and Sargur N. Srihari. On-line and off-line handwriting recog-nition: A comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:63–84, January 2000.

[46] A. T. Purcell and J. S. Gero. Drawing and the design process. Design Studies, 19:389–430, 1998.

[47] A.F.R. Rahman and M.C. Fairhurst. A study of some multi-expert recognition strategies for industrial applications: Issues of speed and reliability. In Interna-tional Conference on Vision Interfaces, pages 569–574, 1999.

[48] Ahmad Fuad Rezaur Rahman and Michael C. Fairhurst. Multiple classifier decision combination strategies for character recognition: A review. IJDAR, 5(4):166–194, 2003.

[49] Fuad Rahman, Yuliya Tarnikova, Aman Kumar, and Hassan Alam. Second guessing a commercial ’black box’ classifier by an ’in house’ classifier: Serial classifier combination in a speech recognition application. In Multiple Classifier Systems, pages 374–383, 2004.

[50] Dean Rubine. Specifying gestures by example. In Computer Graphics, volume 25(4), pages 329–337, 1991.

[51] Dymitr Ruta and Bogdan Gabrys. An overview of classifier fusion methods. In Computing and Information Systems, volume 7, pages 1–10, 2000.

[52] Alberto Sanfeliu and King-Sun Fu. A distance measure between attributed rela-tional graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, 13(3):353–362, 1983.

[53] Eric Saund, David Fleet, Daniel Larner, and James Mahoney. Perceptually supported image editing of text and graphics. In Proceedings of UIST ’03, 2003.

[54] Andrew W. Senior and Anthony J. Robinson. An off-line cursive handwriting recognition system. IEEE Transactions on Pattern Analysis and Machine Intel-ligence, 20:309–321, March 1998.

[55] Tevfik Metin Sezgin and Randall Davis. Hmm-based efficient sketch recognition.

In Proceedings of the International Conference on Intelligent User Interfaces (IUI’05), pages 281–283, New York, New York, January 9-12 2005. ACM Press.

[56] Tevfik Metin Sezgin and Randall Davis. Sketch interpretation using multi-scale models of temporal patterns. IEEE Computer Graphics and Applications, 27(1):28–37, January 2007.

[57] Tevfik Metin Sezgin, Thomas Stahovich, and Randall Davis. Sketch based in-terfaces: Early processing for sketch understanding. In The Proceedings of 2001 Perceptive User Interfaces Workshop (PUI’01), Orlando, FL, November 2001.

[58] M. Shilman, H. Pasula, and S. Russel R. Newton. Statistical visual language models for ink parsing. In Proceedings of the AAAI Spring Symposium on Sketch Understanding, pages 126–32, 2002.

[59] M. Shilman, P. Viola, and K. Chellapilla. Recognition and grouping of hand-written text in diagrams and equations. In Ninth International Workshop on Frontiers in Handwriting Recognition, pages 569–574, Oct. 2004.

[60] Saul Simhon and Gregory Dudek. Sketch interpretation and refinement using statistical models. pages 23–32. Eurographics Symposium on Rendering, 2004.

[61] Salvatore J. Stolfo, Zvi Galil, Kathleen McKeown, and Russell Mills. Speech recognition in parallel. In Human Language and Technology Conference Work-shop on Speech and Natural Language, pages 353–373, 1989.

[62] Ivan B. Sutherland. Sketchpad, a man-machine graphical communication system.

Proceedings of the Spring Joint Computer Conference, pages 329–346, 1963.

[63] David G. Ullman, Stephen Wood, and David Craig. The importance of drawing in the mechanical design process. Computer and Graphics, 14(2):263–274, 1990.

[64] Remko van der Lugt. How sketching can affect the idea generation process in design group meetings. Design Studies, 26(2):101–122, March 2005.

[65] Remco C. Veltkamp and Longin Jan Latecki. Properties and performances of shape similarity measures. In Content-Based Retrieval, volume 06171, 2006.

[66] Olya Veselova and Randall Davis. Perceptually based learning of shape descrip-tions. Proceedings of the Nineteenth National Conference on Artificial Intelli-gence (AAAI-04), pages 482–487, 2004.

[67] Xin Wang, Michael Shilman, and Sashi Raghupathy. Parsing ink annotations on heterogeneous documents. In Eurographics Workshop on Sketch-Based Interfaces and Modeling, pages 43–50, 2006.

[68] Xiaogang Xu, Wenyin Liu, Xiangyu Jinand, and Zhengxing Sun. Sketch-based user interface for creative tasks. In Asia Pacific Conference on Computer Human Interaction, 2002.

[69] Jianfeng Yin and Zhengxing Sun. An Online Multi-stroke Sketch Recognition Method Integrated with Stroke Segmentation, pages 803–810. Lecture Notes for Computer Science. Springer-Verlag Berlin Heidelberg, 2005.

[70] Shane W. Zamora and Eyrun A. Eyjolfsdottir. Circuitboard: Sketch-based cir-cuit design and analysis. In IUI Workshop on Sketch Recognition, February 2009.

In document Combining Representations for Improved Sketch Recognition (Page 81-96)