5.4 Stage-2 training and classifier
5.5.2 Gesture recognition
Preliminary results for gesture recognition are presented in this section. Two variants of the gesture classifier are considered, which differ in their training data: one is trained on a general skin dataset and the other is trained for a specific user. The results of both variants for each user in the database are presented in Table 5-2.
Table 5-2: Recognition rates (%) for gesture recognition approaches with general and user-specific training.
User                 1     2     3     4
General training     79    64    100   91
Specific training    82    72    100   93
Generic gesture recognition was applied to the command videos from a subset of the database. The gesture classifier was trained with two command videos from each user (therefore using 4 training videos per gesture), and testing was performed on the remaining video. The overall correct classification rate was 83.5%. User-specific gesture recognition was also applied to all users for comparison, and the results for each user are presented in Table 5-2. Signers wore clothing with no sleeves, which resulted in incorrect hand localisation and hand shape classification for all users. As can be seen, every user shows an improvement in classification performance with the user-specific classifier except the 3rd user, whose rate is 100% with both classifiers.
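The overall figure quoted above is the unweighted mean of the per-user rates for general training in Table 5-2. The short sketch below reproduces that arithmetic and the per-user gain from user-specific training. The numbers are copied from Table 5-2; the variable and function names are illustrative only.

```python
# Recognition rates (%) per user, copied from Table 5-2.
general = {1: 79, 2: 64, 3: 100, 4: 91}
specific = {1: 82, 2: 72, 3: 100, 4: 93}


def mean_rate(rates):
    """Unweighted mean of the per-user recognition rates."""
    return sum(rates.values()) / len(rates)


# Gain (percentage points) from user-specific training, per user.
gain = {user: specific[user] - general[user] for user in general}

print(mean_rate(general))   # overall rate for general training
print(mean_rate(specific))  # overall rate for user-specific training
print(gain)                 # user 3 gains nothing: already at 100%
```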
For the next experiments, hand tracking was performed with different techniques: our approach, the Aux particle filter and mean shift iteration. The sequences of left- and right-hand positions produced by each technique were then passed to the gesture recognition mechanism. The results are given in Table 5-3.
Table 5-3: Comparison of gesture recognition rates (%) from different hand tracking techniques
User                  1     2     3     4
Proposed approach     78    78    92    85
Aux particle filter   71    71    71    78
Mean shift            64    64    64    64
This set of experiments indicates the performance of each tracking technique when its hand positions are used in the gesture recognition decision. Classification using the positions from our approach is superior to classification using the other tracking approaches. The overall correct classification rates were 83.25% with our approach, 72.75% with the Aux particle filter and 64% with mean shift iteration. Incorrect classifications occur mostly with the withdraw command and often arise with the see balance, print statement and see statement commands. The classification failures are caused by ambiguity between the hand and the face, or between the left and right hands.
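The overall rates above follow from averaging the four per-user rates in Table 5-3 for each tracker. As a quick check, the sketch below recomputes them and ranks the techniques. The figures are taken from Table 5-3; the dictionary layout and names are our own choice for the illustration.

```python
# Recognition rates (%) for users 1-4 with each tracker, copied from Table 5-3.
rates = {
    "Our approach": [78, 78, 92, 85],
    "Aux particle filter": [71, 71, 71, 78],
    "Mean shift": [64, 64, 64, 64],
}

# Overall rate = unweighted mean over the four users.
overall = {name: sum(r) / len(r) for name, r in rates.items()}

# Rank the trackers from best to worst overall rate.
ranking = sorted(overall, key=overall.get, reverse=True)

for name in ranking:
    print(f"{name}: {overall[name]:.2f}%")
```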
5.6 Conclusions
In this chapter, the visual analysis module for an intelligent cash machine application has been presented and used to assess the performance of the hand tracking schemes. Although it has been developed primarily for this application, it might be used in other scenarios. It combines the detection and tracking of persons and their body parts (face, hands) with the recognition of their identity and gestures. The experiments suggest that the system could be used as an implementation of the environmental analysis module. The results also compare the quality of each hand tracking technique: the particle filter that embeds the mean shift vector achieved higher accuracy than the other techniques. Future work will focus on integrating the visual analysis module with the other modules required for a fully functional intelligent cash machine.
Chapter 6
Conclusion and Future Research
6.1 Summary
The hand is an important part of human communication, and for this reason several research directions in the field of hand tracking are presented in this thesis. The main contributions of this thesis are a hand tracking algorithm based on the particle filter framework and the evaluation of tracking by incorporating the tracker into a gesture recognition process. In Chapter 3, the integration of multiple cues for robust tracking was presented. In Chapter 4, tracking of two hands using skin colour alone was developed. In Chapter 5, the hand tracking was evaluated to assess its quality compared with ground truth positions.
It is noticeable that many tracking problems in vision-based interaction share common difficulties, i.e., the lack of appropriate supervised information and the highly articulated and deformable shape of the hand. Through feature selection, a hand tracker can also be made to cope with conditions such as cluttered backgrounds and objects blurred by motion. For this reason, hand tracking with multiple cues is employed to deal with these difficulties. The particle filter is able to cope with non-linear problems such as hand tracking. The features used in tracking are selected according to the problems introduced by hand movement. When hands move, blurred blobs may occur. The cues that can track in this scenario are the detection of the blurred blob and the motion vectors with respect to the previous frame. Based on this observation, features such as colour and hand motion are used as the observation model in the particle filter. To obtain the skin colour in the frame, we employ the parametric skin colour model, together with the diffusion, to detect skin. The results from this tracking scheme indicate that its accuracy is superior to that of mean shift iteration. However, the scheme suffers from mis-tracking and lost tracking. Mis-tracking occurs when the hands are moved near to the face. Lost tracking can arise in cluttered backgrounds that contain colours similar to skin colour.
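As a rough illustration of how colour and motion cues can be combined in the observation model of a particle filter, the sketch below performs repeated predict, weight and resample steps for a 2-D hand position. It is a minimal sketch and not the implementation used in this thesis: the random-walk motion model, the Gaussian cue likelihoods, the product fusion of the two cues, the multinomial resampling, and all names and parameters (`cue_likelihood`, `particle_filter_step`, `sigma_move`, `sigma_colour`, `sigma_motion`) are assumptions made for the example.

```python
import math
import random


def cue_likelihood(particle, cue_pos, sigma):
    """Gaussian likelihood of a particle given one cue's position estimate."""
    d2 = (particle[0] - cue_pos[0]) ** 2 + (particle[1] - cue_pos[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def particle_filter_step(particles, colour_pos, motion_pos, rng,
                         sigma_move=5.0, sigma_colour=10.0, sigma_motion=15.0):
    """One predict / weight / resample step with two fused cues."""
    # Predict: random-walk motion model (an assumption of this sketch).
    predicted = [(x + rng.gauss(0.0, sigma_move), y + rng.gauss(0.0, sigma_move))
                 for x, y in particles]

    # Weight: treat the cues as independent and multiply their likelihoods.
    weights = [cue_likelihood(p, colour_pos, sigma_colour)
               * cue_likelihood(p, motion_pos, sigma_motion)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All particles are far from both cues: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted particles.
    estimate = (sum(w * p[0] for w, p in zip(weights, predicted)),
                sum(w * p[1] for w, p in zip(weights, predicted)))

    # Resample: multinomial resampling proportional to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


rng = random.Random(0)
particles = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(500)]
for _ in range(10):
    # Toy input: both cues report the hand near (60, 40).
    particles, estimate = particle_filter_step(particles, (60, 40), (62, 41), rng)
print(estimate)
```

Multiplying the cue likelihoods means that a particle needs support from both cues to receive a high weight, which is one simple way of letting the motion cue disambiguate skin-coloured distractors.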
In Chapter 4, a technique for tracking two hands with a single feature is described. In contrast to Chapter 3, only skin colour is employed as the feature for tracking the hands, in order to reduce the computational complexity. The tracker still employs the particle filter framework. We enhance the tracking performance by changing the re-sampling process: the mean shift vector is applied before propagation into the observation model. The improvement in tracking was demonstrated by comparison with other tracking schemes, namely the Aux particle filter and mean shift iteration. The experiments compare the most important factors, accuracy and robustness, and our approach performs better than the others on both. Throughout the experiments, the speed of the hand movement, background clutter and the light sources are the causes of mis-tracking and lost tracking. For example, in some experiments the skin tone is similar to the background colour, especially when the subjects have white skin. In uncontrolled environments, bright light can cause the hands to appear too bright in some poses. The position of the light source can also significantly affect tracking performance, because light can produce shadows and bright areas on the hands.
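To illustrate the idea of moving each particle along a mean shift vector before it is evaluated by the observation model, the following sketch shifts particles towards the local centroid of a skin-probability map. It is a hedged illustration under stated assumptions rather than the tracker described in Chapter 4: the flat square kernel, the window radius, the toy probability map and the names `mean_shift_vector` and `shift_particles` are all hypothetical.

```python
def mean_shift_vector(prob_map, x, y, radius):
    """Offset from (x, y) to the probability-weighted centroid of a square window."""
    height, width = len(prob_map), len(prob_map[0])
    total = sum_x = sum_y = 0.0
    for j in range(max(0, y - radius), min(height, y + radius + 1)):
        for i in range(max(0, x - radius), min(width, x + radius + 1)):
            w = prob_map[j][i]
            total += w
            sum_x += w * i
            sum_y += w * j
    if total == 0.0:
        return 0.0, 0.0  # no skin evidence in the window: leave the particle
    return sum_x / total - x, sum_y / total - y


def shift_particles(prob_map, particles, radius=3, iterations=5):
    """Move each particle along its mean shift vector for a few iterations."""
    shifted = []
    for x, y in particles:
        for _ in range(iterations):
            dx, dy = mean_shift_vector(prob_map, x, y, radius)
            nx, ny = int(round(x + dx)), int(round(y + dy))
            if (nx, ny) == (x, y):
                break  # converged to a local mode (or no evidence nearby)
            x, y = nx, ny
        shifted.append((x, y))
    return shifted


# Toy 12x12 skin-probability map with one 3x3 skin blob centred at (8, 4).
prob_map = [[0.0] * 12 for _ in range(12)]
for j in range(3, 6):
    for i in range(7, 10):
        prob_map[j][i] = 1.0

particles = [(6, 3), (9, 6), (1, 10)]
print(shift_particles(prob_map, particles))  # → [(8, 4), (8, 4), (1, 10)]
```

The first two particles are pulled onto the blob centre, while the third has no skin evidence within its window and stays where it is. In this toy setting, that is the kind of particle that would contribute to lost tracking.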
To evaluate the hand tracking algorithm, we employ a gesture recognition framework as the benchmark. The comparison indicates the performance of each tracking scheme with respect to the ground truth positions. From the number of correctly recognised commands, we can conclude that the evaluation framework can demonstrate the difference in performance between methods.
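One simple way to compare a tracker's output with ground truth positions is the mean Euclidean distance per frame. The sketch below is an assumption about how such a comparison could be computed, not the measure reported in this thesis. The function name `mean_tracking_error` and the sample coordinates are invented for the illustration.

```python
import math


def mean_tracking_error(tracked, ground_truth):
    """Mean Euclidean distance (pixels) between tracked and true positions."""
    if len(tracked) != len(ground_truth):
        raise ValueError("sequences must have the same number of frames")
    if not tracked:
        raise ValueError("sequences must not be empty")
    distances = [math.hypot(tx - gx, ty - gy)
                 for (tx, ty), (gx, gy) in zip(tracked, ground_truth)]
    return sum(distances) / len(distances)


# Made-up positions for three frames (not data from the experiments).
ground_truth = [(10, 10), (20, 15), (30, 20)]
tracked = [(13, 14), (20, 15), (30, 25)]
print(mean_tracking_error(tracked, ground_truth))
```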
6.2 Future research directions
This section identifies a number of directions for future work. An advantage of the tracking approach taken in this thesis is that application-specific requirements can be included easily.
• Combination of features: Although the accuracy of hand tracking is adequate with skin as the single feature, multiple features such as skin colour, motion or shape can be used to increase the efficiency.
• Observation features: The design of better similarity measures will directly increase the robustness and accuracy of the tracker. It will be interesting to investigate the use of different filters as features. The integration of other features in different combinations could also lead to different cost functions.
• Skin model: Where a single feature is used, stationary skin modelling is commonly employed. An alternative for coping with problems such as blurring and shadowing on the hands is non-stationary skin modelling.
• Particle filter: Several factors in the particle filter can be changed and adjusted to increase performance, such as the number of samples and the re-sampling technique.
• Extension to multiple views: The particle-filtering framework presented in this thesis works with a single view. To extend the possible positions of the hand, capture from multiple views can be utilized in order to improve the accuracy of the tracker.
• Experimental comparison with other approaches: The step from the research laboratory to commercial applications typically leads via an evaluation of competing methods on benchmark data. Possibly due to the ambiguity of the task, the difficulty of obtaining ground truth positions, and differing application goals, there is no publicly available database for hand tracking.
• Evaluation framework for hand tracking: A benchmark for evaluating hand tracking still needs to be set up as a general framework. Although we can compare the performance of different tracking schemes, the comparison can be unfair because the schemes use different skin detection or motion cues. Moreover, such a framework will help with fine-tuning the tracker.
• Other applications: There are open issues for hand tracking in several applications, mainly in the interface design of hand recognition, such as operating machines in the real world and controlling equipment such as a collaborative table with hand signs.
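As one example of the re-sampling choices mentioned in the list above, the sketch below shows systematic resampling, a common alternative to multinomial resampling that uses a single random offset and evenly spaced pointers. It is given only as an illustration of the kind of variation that could be explored and is not a technique evaluated in this thesis. The function name `systematic_resample` and the example weights are assumptions.

```python
import random


def systematic_resample(weights, rng):
    """Return particle indices chosen by systematic resampling."""
    n = len(weights)
    total = sum(weights)
    # Cumulative distribution of the normalised weights.
    cumulative = []
    running = 0.0
    for w in weights:
        running += w / total
        cumulative.append(running)
    cumulative[-1] = 1.0  # guard against floating-point round-off

    # One random offset, then n evenly spaced pointers in [0, 1).
    start = rng.random() / n
    indices = []
    i = 0
    for k in range(n):
        pointer = start + k / n
        while cumulative[i] < pointer:
            i += 1
        indices.append(i)
    return indices


rng = random.Random(1)
weights = [0.1, 0.1, 0.6, 0.2]
print(systematic_resample(weights, rng))
```

With these weights the third particle (index 2) is always selected at least twice, because two of the four evenly spaced pointers necessarily fall inside its share of the cumulative distribution.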