Recognizing & Interpreting Indian Sign Language Gesture for Human Robot Interaction
Anup Nandy, Soumik Mondal, Jay Shankar Prasad, Pavan Chakraborty and G.C. Nandi
Robotics & Artificial Intelligence Lab
Indian Institute of Information Technology Allahabad, India
{ nandy.anup , mondal.soumik }@gmail.com, { jsp , pavan , gcnandi }@iiita.ac.in
Abstract: This paper describes an approach to recognizing Indian Sign Language (ISL) gestures for Human Robot Interaction (HRI). We classify ISL gestures and use them as the means of communication between a human and the humanoid robot HOAP-2. The approach applies several image processing techniques, followed by a generic algorithm for feature extraction. Classification uses the Euclidean distance metric. The resulting HRI system supports an imitation based learning mechanism. The real time robotics simulation software WEBOTS is used to simulate the classified ISL gestures on the HOAP-2 robot, and Java based software has been developed to handle the entire HRI process.
Keywords: Indian Sign Language; Gesture; Real Time; Euclidean distance; WEBOTS simulation software; HOAP-2 humanoid robot.
I. INTRODUCTION
Gestures are used for communication in daily life. Sign language is used to exchange ideas, messages, and thoughts among people who are deaf or speech impaired. The sign language used in India is called Indian Sign Language (ISL), just as American Sign Language (ASL) and British Sign Language (BSL) are tied to their own communities [1, 2]. ISL includes both static and dynamic gestures, and recognizing them raises several challenges: gestures may be two handed, both hands may move at once, one hand may move fast while the other moves slowly, hand shapes vary, and the hands may touch the body. Most of the ISL gestures considered here are dynamic, which makes humanoid robot interaction harder. Unlike ASL gestures, ISL gestures are predominantly formed with both hands [1, 2].
Numerous approaches have been adopted for gesture based communication with humanoid robots and for teaching them by demonstration. Programming by demonstration is a learning architecture in which a human demonstrator teaches a humanoid robot, and the robot generalizes the task to different contexts through probabilistic estimation [3]. It allows a humanoid robot to acquire new skills and goals through imitation learning and observation [5]. A mathematical framework has been proposed for reproducing gestures on a humanoid robot, in which the joint angle trajectories of hand motion are encoded in Hidden Markov Models (HMM) [4]. A probabilistic representation based on Principal Component Analysis (PCA) and Gaussian Mixture Models (GMM) has been used for incremental learning by a humanoid robot through kinesthetic teaching [6]. A learning algorithm based on Gaussian Mixture Regression (GMR) has been used to learn several tasks from a set of kinesthetic demonstrations [7]. An imitation game has been used to teach a humanoid robot to recognize gestures with an HMM [8]. Many of the techniques proposed for gesture classification and recognition could be adapted for teaching humanoid robots by demonstration.
One prominent approach is vision based recognition [9], which captures visual information in the form of a feature vector. Several techniques have been developed for pattern classification of dynamic gestures in real time [10, 11]. Classifying moving gestures in Indian Sign Language, which requires retaining all the information about the hand motions [10], is the most complicated task. A dynamic gesture is a sequence of images. The approach in [11] is based on the 2-D locations of fingertips and palms. Gesture classification can also be based on hand trajectory, hand shape, and hand motion [12], with a Hidden Markov Model (HMM) [13] serving as a robust classifier.
In this paper we propose a practical framework for ISL gesture based human robot interaction. A gesture based learning mechanism plays a significant role in letting a humanoid robot acquire new skills efficiently. ISL video is used to compose several hand gestures that train the HOAP-2 robot in real time. An orientation histogram feature is extracted for real time classification using the Euclidean distance metric. Commands are given to the humanoid robot as joint angle values. A look up table maps each human gesture to a humanoid robot gesture, which avoids the overhead of solving inverse kinematics in real time.
The rest of the paper is organized as follows. Section II describes ISL gesture acquisition and the image processing applied to the captured video: background uniformity, image enhancement using histogram equalization, and noise elimination using Gaussian filtering. Section III presents the ISL gesture classification approach, including feature selection and evaluation, and describes how the Euclidean distance is computed between a new test ISL gesture and the training set of ISL gestures. Section IV introduces the real time simulation software WEBOTS and the specification of HOAP-2. Section V describes how the humanoid robot HOAP-2 learns from the classified ISL gestures. Section VI presents the simulation results generated by HOAP-2 on the WEBOTS platform. Finally, Section VII gives the conclusion and future work. References are included at the end of the paper.
II. GESTURE COLLECTION AND PREPROCESSING
ISL video was captured in real time with a webcam for several dynamic gestures (i.e., sequences of frames), as shown in Fig. 1. We chose the gestures arbitrarily from a standard ISL dictionary. A basic preprocessing step is background uniformity: a dark background is chosen so that the gray scale images can be processed effectively [1]. The ISL dynamic gestures were recorded under different illumination conditions, so histogram equalization is applied for normalization. Histogram equalization is based on the CDF (Cumulative Distribution Function) shown in equation (1).
$$s = T(r) = \int_{0}^{r} p_r(w)\,dw \qquad (1)$$
where $s$ is the output intensity obtained by contrast stretching and $p_r$ is the probability density (normalized histogram) of the input intensity $r$. The discrete form of the transformation function for histogram equalization is given by equation (2).
$$s_k = T(r_k) = \sum_{j=0}^{k} p_r(r_j) = \sum_{j=0}^{k} \frac{n_j}{n} \qquad (2)$$
where $k = 0, 1, \ldots, L-1$, with $r_k$ the normalized intensity value (gray level value), $L$ the number of gray levels in the image, $n_j$ the number of pixels with gray level $r_j$, and $n$ the total number of pixels.
The input image is transformed to the output image using the transformation function $s_k = T(r_k)$. Because this transformation function equals the CDF of the image, i.e., the normalized cumulative sum of its histogram, applying it gives the histogram equalization of the input image.
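The discrete transformation of equation (2) can be illustrated with a short sketch. This is a minimal Python illustration, not the Java implementation used in the paper; the function name `equalize_histogram`, the flat list input, and the rescaling of the result to integer gray levels are assumptions made for the example.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels - 1]."""
    n = len(pixels)  # n: total number of pixels
    if n == 0:
        return []
    # n_j: number of pixels with gray level j
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    # s_k = sum_{j=0..k} n_j / n, the discrete CDF of equation (2)
    cdf = []
    running = 0
    for c in counts:
        running += c
        cdf.append(running / n)
    # Rescale s_k from [0, 1] to integer gray levels (assumed output convention)
    return [int(cdf[p] * (levels - 1) + 0.5) for p in pixels]


# A low-contrast strip that uses only two neighbouring gray levels
print(equalize_histogram([100, 100, 101, 101]))
```

In this toy input the two neighbouring levels 100 and 101 are spread apart to 128 and 255, which is the contrast stretching effect the section describes.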
Next section deals with feature selection and classification.
Fig 1. ISL video gestures under different illumination conditions. For each gesture the figure shows the initial and final positions: Shoot, River, Recover, Quiet Down, Miss, Meet, Heap, Go.
III. ISL RECOGNITION APPROACH
Learning through extensive interaction with sign language gestures is a complicated task for the humanoid robot HOAP-2. The robot must first understand an ISL gesture and then perform a specified task. Feature selection plays an important role in pattern classification [13]. We use the orientation of edges in the image frames as the feature vector of an ISL video, and compute direction histograms with 18 and 36 bins. The feature vector used for pattern classification is evaluated with the algorithm discussed in [17, 18].
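A direction histogram of this kind can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: it assumes each edge or motion vector is available as a (dx, dy) pair, that directions are measured over the full 0 to 360 degree range (so 18 bins are 20 degrees wide and 36 bins are 10 degrees wide), and that the histogram is normalized to sum to 1. The function name `direction_histogram` is hypothetical.

```python
import math


def direction_histogram(vectors, bins=18):
    """Normalized histogram of the directions of 2-D vectors (dx, dy)."""
    hist = [0] * bins
    width = 360.0 / bins  # angular width of one bin in degrees
    for dx, dy in vectors:
        if dx == 0 and dy == 0:
            continue  # a zero vector has no direction
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        # The extra modulo guards against angle rounding up to exactly 360.0
        hist[int(angle // width) % bins] += 1
    total = sum(hist)
    if total == 0:
        return [0.0] * bins
    return [h / total for h in hist]


# Four vectors pointing roughly right, up, left and down
features = direction_histogram([(1, 0), (0, 1), (-1, -0.1), (0, -1)], bins=18)
print(features)
```

Normalizing the counts keeps feature vectors comparable across videos with different numbers of vectors; whether the original software normalizes in this way is not stated in the text.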
3.1 Feature Extraction Algorithm
Step 1: Divide each image frame into grids of 60x40 pixels, and let P(x, y) denote a point in a grid.
Step 2: Find the motion vector between frames I and I+1 for each grid of Step 1.
Step 3: Generate a direction histogram from these vectors using the gradient direction.
Step 5: Compute the mean feature F.
Step 6: Find the mean of the feature vector, F'.
3.2 Classification Algorithm
Classification is based on the Euclidean distance.
Step 1: Apply all the steps above to the test gesture up to the feature vector computation. Let G be the resulting feature vector.
Step 2: Find the minimum distance between the vector G and the mean feature vectors F' of the training gestures, using the Euclidean distance $d(G, F') = \sqrt{\sum_{i} (G_i - F'_i)^2}$, where $i$ runs over the histogram bins.
Step 3: Assign the test gesture to the class that gives the minimum distance in Step 2.
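The classification steps can be sketched as a nearest-mean classifier under the Euclidean distance. This is a minimal Python sketch under stated assumptions: each training gesture class is summarized by the mean of its feature vectors, the function names are hypothetical, and the two-dimensional feature values are toy numbers chosen only for illustration (the paper uses 18 or 36 histogram bins).

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(test_vector, class_means):
    """Return (label, distance) for the class mean closest to test_vector."""
    scored = ((label, euclidean(test_vector, mean))
              for label, mean in class_means.items())
    return min(scored, key=lambda pair: pair[1])


# Toy training data: two feature vectors per gesture class
class_means = {
    "GO": mean_vector([[1.0, 0.0], [0.8, 0.2]]),
    "MEET": mean_vector([[0.0, 1.0], [0.2, 0.8]]),
}
label, distance = classify([0.7, 0.3], class_means)
print(label, distance)
```

Because only one distance per class is computed, the cost of classifying a test gesture grows linearly with the number of classes, which is consistent with the real time use described in the text.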
We trained our classifier on twenty ISL gestures, some of which are shown in Fig. 1. Computing the direction of edges of an ISL gesture is fast enough for real time use, which makes it suitable for training the humanoid robot HOAP-2. Fig. 2 shows the geometrical representation of the training gestures and a test gesture in feature space.
Fig 2. Representation of the 2D feature space for ISL test and training gestures, where D3 is the minimum Euclidean distance, so the test gesture is classified as gesture A.
Fig 3. Flowchart of ISL gesture recognition for the simulation of HOAP-2
Fig. 3 explains the entire classification process together with the simulation of HOAP-2.
IV. WEBOTS & HOAP-2 SPECIFICATION
WEBOTS is robotics simulation software [15] that provides facilities for modeling, programming (in C, C++ and Java) and simulating many kinds of robots. It was developed for working with different algorithms and building different types of robots in different environments. Several libraries are available for controller programs, which are needed to transfer the programs to the real robot. The controller program reads a Comma Separated Value (CSV) file, which is required for the various jobs executed by the HOAP-2 robot. Generating the CSV files programmatically is a challenge in itself. Several CSV files have been constructed for the humanoid robot. Fig. 4 shows the WEBOTS simulation platform running HOAP-2 in real time.
HOAP-2 (Humanoid Open Architecture Platform) is a well known humanoid robot with two arms, two legs and other body joints. Applications are built on HOAP-2 by controlling the movements of its joints. Its mechanical architecture is well organized for coordination between all the body joints. A controller program generates each motion by reading a CSV file that holds the joint angle of every joint of the robot at each time step. Every joint has a specific name and designation, and a unique column is reserved for it in the CSV file. Each pattern generated by the HOAP-2 controller carries useful information. Every CSV file has twenty-seven columns for a controller with twenty-five joints: two arms with five degrees of freedom each, two legs with six degrees of freedom each, a head with two degrees of freedom, and one body joint. HOAP-2 runs on a Real Time Linux platform and has been used in challenging applications such as imitation learning [16], human portrait drawing and teaching by human demonstration. The next section explores learning on the WEBOTS platform using real time Java based software.
Fig 4. Demonstration of HOAP-2 on WEBOTS platform.
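The CSV layout described in this section, one row per time step and one column per joint, can be illustrated with a short sketch. This is a hypothetical Python example, not the actual HOAP-2 file format: the joint names, the `step` column, and the default value of 0 for joints that are not mentioned are all assumptions, and only four of the twenty-five joints are shown.

```python
import csv
import io

# Hypothetical joint column names; the real HOAP-2 names are not given in the text
JOINTS = ["R_ARM_1", "R_ARM_2", "L_ARM_1", "L_ARM_2"]


def write_motion_csv(frames, joints=JOINTS):
    """Serialize a motion as CSV text: one row per time step, one column per joint.

    frames is a list of dicts mapping joint name to its value at that step;
    joints missing from a frame are written as 0 (an assumed default).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["step"] + list(joints))
    for step, frame in enumerate(frames):
        writer.writerow([step] + [frame.get(joint, 0) for joint in joints])
    return buffer.getvalue()


print(write_motion_csv([{"R_ARM_1": 10}, {"R_ARM_1": 20, "L_ARM_2": -5}]))
```

Reserving a fixed column per joint means a controller can read any motion file with the same parsing code, whichever gesture the file encodes.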
V. LEARNING OF ISL GESTURE
A challenging aspect is human robot interaction in an arbitrary environment. The DFD (Data Flow Diagram) in Fig. 5 describes the learning process of the humanoid robot. The Java based software works together with HOAP-2 in real time. Fig. 6 and Fig. 7 show the step by step process of learning ISL gestures. The software can capture any type of ISL gesture using a webcam. Data acquisition is followed by teaching the ISL gesture to the humanoid robot HOAP-2. The software computes the feature vector, performs classification, and runs the simulation on HOAP-2. The learning mechanism depends on correct classification of the ISL gesture, which plays a significant role throughout the simulation process of HOAP-2.
Fig 5. Data Flow Diagram of Java based HRI software. Diagram elements: User, User Command, Video Capture, Feature Extraction, Store Test Gesture Direction Histogram, Recognition of the Test Gesture, Store Recognized Test Gesture, Classification Result of the Test Gesture, Simulation of Humanoid Robot, Simulated Output.
Fig 6. Real time ISL video (gesture) capturing tool
For feature evaluation, the software represents the orientation of edges of an ISL gesture as histogram bins, with 18 or 36 column values of edge orientation. The software provides a generic architecture for the learning mechanism of HOAP-2, which performs a specific gesture according to the newly classified gesture. The learned gestures are detailed below.
Human robot interaction with HOAP-2 takes place through the following gestures:
a) BYE BYE. b) WALKING. c) MAY I PLEASE. d) GREETING. e) DRINKING. f) TRAFFIC. g) MAKING PHONE CALL. h) MARTIAL ART.
Each of these gestures is performed by the humanoid robot in association with a classified ISL gesture. This learning process gives HOAP-2 an intelligent behavior that sustains its learning capability in different environments. The learning process is handled by the HOAP-2 robot controller, which invokes a CSV file to perform the gestures listed above in real time. Each predefined gesture carries information about all the joints of the upper and lower body of the humanoid robot. The gesture applications were developed on the HOAP-2 robot by considering each joint angle precisely. Fig. 8(a) illustrates the gesture MAY I PLEASE on the WEBOTS platform using HOAP-2.
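The association between a classified ISL gesture and a predefined robot gesture can be sketched as a look up table. This is a hypothetical Python illustration: the text does not state which ISL gesture triggers which robot gesture, so the pairings and file names below are placeholders, and the function name `motion_file_for` is an assumption.

```python
# Placeholder pairings of ISL gesture label to robot motion file;
# the actual association used by the software is not given in the text.
GESTURE_TO_MOTION = {
    "GO": "walking.csv",
    "MEET": "greeting.csv",
    "QUIET DOWN": "may_i_please.csv",
}


def motion_file_for(label, table=GESTURE_TO_MOTION):
    """Return the motion CSV registered for a classified ISL gesture label."""
    key = label.strip().upper()
    if key not in table:
        raise KeyError(f"no robot motion registered for ISL gesture {label!r}")
    return table[key]


print(motion_file_for("go"))
```

A table of this kind replaces a per-gesture inverse kinematics computation with a constant time lookup of precomputed joint values.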
Fig 7. Java based software for HRI process
Fig 8. (a) MAY I PLEASE and (b) TRAFFIC gestures
The CSV files keep all the pulse values needed to execute the movement of the humanoid robot, and each joint stays within its allowable range while performing the above gestures. The next section discusses the patterns generated by the HOAP-2 robot along with the simulation results.
VI. RESULT ANALYSIS
The classified ISL gestures, with an average accuracy of 90%, are mapped to specific gesture based applications on the humanoid robot. Each gesture generates a pattern that comprises movement of a body joint or of other parts of the body. Classified ISL gesture patterns are presented in Fig. 8 and Fig. 9; these patterns correspond to the ISL gestures GO and QUIET DOWN, which are among our predefined classes. Each pattern carries its own signature, which can be used as a learning entity for the humanoid robot HOAP-2. Every pattern is built from the orientation histogram feature vector with 18 bins and 36 bins respectively. The Y axis of the gesture pattern describes the distribution of the edge orientation values, whereas the X axis indicates the total number of samples of each histogram bin. The learning pattern of the humanoid robot is obtained from the classification of the ISL gesture in real time. It is constructed using CSV files, as shown in Fig. 10 and Fig. 11. Each gesture is performed by the humanoid robot HOAP-2 using the joint angle values stored in the CSV file. The position command of the robot is estimated from the joint angle value as
$$P = C \cdot \theta$$
where $P$ is the position command of the humanoid robot in pulses, $\theta$ is the joint angle in degrees, and $C$ is the conversion coefficient in pulse/deg. HOAP-2 joints can move in both the clockwise and the anticlockwise direction, so the CSV controller needs to know which joint moves in which direction, with values changing in the +ve or the -ve direction. Each joint has a unique device ID and a unique column name in the CSV controller files. To move a particular joint, the robot rotates the corresponding motor.
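The conversion from a joint angle in degrees to a position command in pulses can be sketched as follows. This is a minimal Python sketch under stated assumptions: the coefficient value used in the example is a placeholder rather than a documented HOAP-2 constant, the sign convention for the direction of rotation is assumed, and the function name `position_command` is hypothetical.

```python
def position_command(angle_deg, pulses_per_deg, direction=1):
    """Convert a joint angle in degrees to a position command in pulses.

    direction is +1 or -1 and selects the sense of rotation of the joint
    (an assumed convention for the +ve and -ve directions in the text).
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return int(round(direction * pulses_per_deg * angle_deg))


# Placeholder coefficient of 209 pulse/deg, chosen only for illustration
print(position_command(30, 209))
print(position_command(30, 209, direction=-1))
```

Keeping the direction as a separate per-joint sign lets the same joint angle be written to the CSV file for joints whose motors are mounted in opposite senses.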
Fig 8. Pattern for GO gesture
Fig 9. Pattern for QUIET DOWN gesture
Fig 10. MAY I PLEASE gesture for HOAP-2
Fig 11. TRAFFIC gesture for HOAP-2
The Y axis of each pattern indicates the position values in counts (pulses), whereas the X axis indicates the time sequence, which increases monotonically along the trajectory. Fig. 12 illustrates the estimated manipulation of all the gestures.
Fig 12. Estimated manipulations of ISL gestures in 3-D
Fig. 12 plots the mean of the estimated feature vectors, the total number of gestures accepted, and the values of the average features computed. The 3-D plot shows that each gesture has its own average edge orientation feature vector. These values are used for classification by computing the minimum distance from the mean of each training gesture.
VII. CONCLUSION AND FUTURE WORK
Learning ISL gestures is a daunting task for HOAP-2. Trajectory generation for a particular gesture was a challenging aspect. A comprehensive software architecture was used to perform all the image processing and classification computations. The correct generation and execution of trajectories indicates that the classification rate over the training set of ISL gestures was satisfactory. The controller program for HOAP-2 used CSV files to generate those patterns. The Java based software associates each predefined ISL gesture with a trajectory generated by the humanoid robot HOAP-2.
Future work is to make the software directly available to people who are deaf or speech impaired, helping users understand ISL gestures and translating the gestures into sentences. It would provide an efficient way of learning to cope with different circumstances. Real time social interaction with people who have speech and hearing impairments would raise several issues in human computer interaction.
REFERENCES
[1] Tirthankar Dasgupta, Sambit Shukla, Sandeep Kumar, Synny Diwakar, Anupam Basu, "A Multilingual Multimedia Indian Sign Language Dictionary Tool", The 6th Workshop on Asian Language Resources, pp. 57-64, 2008.
[2] M.K. Bhuyan, D. Ghosh, P.K. Bora, "A Framework for Hand Gesture Recognition with Applications to Sign Language", India Conference, 2006 Annual IEEE, pp. 1-6, Sept. 2006.
[3] Sylvain Calinon, Florent Guenter and Aude Billard, "On Learning, Representing and Generalizing a Task in a Humanoid Robot", IEEE Trans. on Systems, Man and Cybernetics, Part B, Vol. 37, No. 2, pp. 286-298, 2007.
[4] Sylvain Calinon and Aude Billard, "Stochastic Gesture Production and Recognition Model for a Humanoid Robot", In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vol. 3, pp. 2769-2774, 2004.
[5] Sylvain Calinon, Florent Guenter and Aude Billard, "Goal-Directed Imitation in a Humanoid Robot", In Proceedings of the International Conference on Robotics and Automation (ICRA), pp. 299-304, 2005.
[6] M. Lopes, J. Santos-Victor, "Visual learning by imitation with motor representations", Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 35, no. 3, pp. 438-449, June 2005.
[7] M. Hersch, F. Guenter, S. Calinon, A.G. Billard, "Learning Dynamical System Modulation for Constrained Reaching Tasks", 6th IEEE-RAS International Conference on Humanoid Robots, pp. 444-449, 4-6 Dec. 2006.
[8] M. Pantic, L.J.M. Rothkrantz, "Toward an affect-sensitive multimodal human-computer interaction", Proceedings of the IEEE, vol. 91, no. 9, pp. 1370-1390, Sept. 2003.
[9] D. Kelly, Reilly Delannoy, J. Mc Donald, and Markham, "A framework for continuous multimodal sign language recognition", In Proceedings of the 2009 International Conference on Multimodal Interfaces (Cambridge, Massachusetts, USA), pp. 351-358, November 02-04, 2009.
[10] Isaac Garcia Incertis, Jaime Gomez Garcia-Bermejo, Eduardo Zalama Casanova, "Hand Gesture Recognition for Deaf People Interfacing", 18th International Conference on Pattern Recognition (ICPR'06), Vol. 2, pp. 100-103, 2006.
[11] Thomas Coogan, George Awad, Junwei Han and Alistair Sutherland, "Real time hand gesture recognition including hand segmentation and tracking", ISVC 2006 - 2nd International Symposium on Visual Computing, 6-8 November 2006, Lake Tahoe, NV, USA. ISBN 978-3-540-48628-2.
[12] S. Kettebekov, M. Yeasin, R. Sharma, "Prosody based audiovisual coanalysis for coverbal gesture recognition", Multimedia, IEEE Transactions on, vol. 7, no. 2, pp. 234-242, April 2005.
[13] J S Prasad, G.C. Nandi, "Clustering Method Evaluation for Hidden Markov Model Based Real-Time Gesture Recognition", Advances in Recent Technologies in Communication and Computing, ARTCom '09, pp. 419-423, 27-28 Oct. 2009.
[14] J. Alon, V. Athitsos, Quan Yuan, S. Sclaroff, "A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation", Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 9, pp. 1685-1699, Sept. 2009.
[15] Webots software, http://www.cyberbotics.com/products/webots/.
[16] M. A. Wood, J. J. Bryson, "Skill Acquisition Through Program-Level Imitation in a Real-Time Domain", Systems, Man and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 37, no. 2, pp. 272-285, April 2007.
[17] Anup Nandy, Jay Shankar Prasad, Pavan Chakraborty, G. C. Nandi, Soumik Mondal, "Classification of Indian Sign Language In Real Time", In the proceedings of International Journal on Computer Engineering and Information Technology (IJCEIT), Vol. 10, No. 15, pp. 52-57, Feb. 2010.
[18] Anup Nandy, Jay Shankar Prasad, Soumik Mondal, Pavan