Sign Language Recognition by Pattern Matching


Available online at www.ijiere.com

International Journal of Innovative and Emerging Research in Engineering

e-ISSN: 2394 - 3343 e-ISSN: 2394 - 5494

Sign Language Recognition System by Pattern Matching

Amruta S. Talreja (a), Darshana Tekade (a), Shailesh Bharad (a) and Lovely Mutneja (b)

(a) Department of Computer Science, PRMCEAM, Badnera, Amravati, India
(b) Faculty at Department of Computer Science, PRMCEAM, Badnera, Amravati, India

ABSTRACT:

A major issue today is the widening communication gap between people with hearing or speech impairments and the rest of society. A Sign Language Recognition System facilitates communication between people with hearing impairment (i.e. deaf people), people with speech impairment, and hearing people. This paper proposes an efficient pattern matching technique for an automatic translation system that converts numbers, alphabets and words into their sign language gestures, and sign language gestures back into the respective numbers, alphabets and words. The proposed technique does not rely on gloves or visual markings to accomplish recognition. Instead, it deals with images of bare hands, which allows the user to interact with the system in a natural way. The proposed system uses the concepts of Boundary Tracing and Pattern Matching.

Keywords: Sign language Recognition, Image Processing, Visual markings, Pattern Matching, Boundary Tracing, Automatic Translation System.

I. INTRODUCTION

A sign language is a language that uses manual communication and body language to convey meaning. Normally, there is no problem when two deaf persons communicate using their common sign language. The problem arises when a deaf person wants to communicate with a hearing person; usually both become dissatisfied within a very short time. Signing has always been part of human communication. For thousands of years, deaf people have created and used signs among themselves, and for many deaf people these signs were the only form of communication available. Within the variety of deaf cultures all over the world, signing evolved to form complete languages. Sign language is one of the most natural ways of communication for most people in the deaf community. Research interest in recognizing human hand gestures has been surging.

The aim of a sign language recognition system is to provide an accurate and convenient mechanism for transcribing sign gestures into meaningful text or speech, so that people with hearing or speech impairments and the hearing society can communicate easily. Like spoken language, sign language is not universal; it varies from country to country, and even from region to region. Arabic Sign Language, for example, has only recently been revised and documented. Many efforts have been made to establish the sign language used in individual countries, including Jordan, Egypt, and the Gulf States, by trying to standardize the language and spread it among members of the deaf community and those concerned. Most previous studies on sign languages are based on vision based or glove based methods. In the glove based method, the user needs to wear special devices, such as gloves, that provide the system with data on hand shape and motion. In the vision based method, the system uses image processing techniques to recognize the gestures without imposing any limitation on the user. Some methods use webcams to trace the hands, after which recognition is performed.

In the proposed work, the first phase trains the system to recognize alphabets, numbers and words together with their respective signs. A database is maintained in this phase to store the data, which is later used to retrieve the output. The second phase performs image processing on the input images, i.e. it converts the input images to their text output using the pattern matching technique. The user is then provided with the text or image corresponding to the given input.

II. RELATED WORK


[...] convert a normal English sentence into sign language and, conversely, convert sign language into the corresponding English sentence. The authors use a language processing engine for both conversions, which are based on computer vision: from sign language to English sentence and from English sentence to sign language.

Figure 1: A typical sign language interpretation system

According to Annelies Braffort, linguistic ethics encourage impaired people and sign language trainees and also reduce the communication gap between hearing people and deaf people. Figure 2 shows the Indian Sign Language (ISL) hierarchy [2], which helps in designing the architecture of an interpretation system for sign language. The hierarchy is divided into three main categories: one-handed, two-handed and non-manual.

Figure 2: A hierarchical classification of ISL [2]

The study concluded that there is some interlink between hand gesture and speech. The hand gesture system has two subsets: 1) hand posture and 2) hand gesture. The first subset consists of static signs that require no hand movement, whereas the second consists of dynamic signs that require movement of the hands. The study also surveys various other techniques, namely 1) hand tracking and segmentation, 2) feature extraction, and 3) classification and recognition, as well as various sign languages with their respective modalities.

Another approach, proposed by C. S. Myers and L. R. Rabiner [3], addresses the recognition of connected words formed from isolated words. The authors discuss three algorithms, both experimentally and theoretically: 1) the two-level dynamic programming matching approach, 2) the sampling approach and 3) the level-building approach [3]. All of these connected-word recognition algorithms are related to a general information-theory-based algorithm. Figure 3 shows the block diagram of a generic connected-word recognition system. Typical features used by such a system include the energies of a set of band-pass filters and LPC coefficients.

Figure 3: Block diagram of a generic connected-word recognition system

[Diagram block labels for Figures 1 and 3: Sentence, Process, Output sign; Sign, Image processing, Output text & voice; Sign information, Input parameters, Feature Extraction, Time and Distance calculation, Decidability Rule]

The main aim of a connected-word recognizer is to find the sequence of concatenated reference patterns that best matches the test pattern, subject to the given syntactical constraints.

A. Two-level DP-matching algorithm:

The two-level DP-matching (TLDP) algorithm is regarded as the best recognizer for matching a given sequence against a concatenation of reference patterns of length M. In the first stage, the optimal reference pattern is determined for every part of the given pattern. In the second stage, dynamic programming is used to find the best concatenation of these isolated partial matches. No partial match is finally decided until the second stage of the TLDP algorithm is completed; hence TLDP is not a sequential decision method.

B. The sampling approach:

The sampling approach is a sequential decision process. It uses a local-minimum DTW algorithm to match a reference pattern to a part of the given pattern. Unlike the TLDP algorithm, it tests only a small subset of the given pattern rather than all of it. The syntactical constraints are applied directly within the sampling algorithm instead of in a post-processing stage. The grammar is traced on a graph simultaneously with the matching of reference patterns against the test pattern, which keeps track of both the current state and the current ending pattern; this method was proposed by Levinson and Rosenberg.

C. The level-building algorithm:

The basic idea of the level-building (LB) algorithm is a constrained-endpoint DTW algorithm in which the slope of the warping function lies between ½ and 2, which gives the best match when the test pattern is matched against the reference patterns. Like the TLDP algorithm, the LB algorithm can recover from mistakes and is not a sequential decision process. One drawback of the LB algorithm concerns the generation of alternative candidates. The LB algorithm uses sequential constraints, and the ease of incorporating these constraints into its structure is an important aspect of the algorithm. It was proposed by Myers and Levinson.
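
All three algorithms above build on dynamic time warping (DTW). The following is a minimal sketch of plain DTW only, assuming patterns are one-dimensional sequences of numbers; the slope constraint, the endpoint constraints and the level-building structure described above are deliberately omitted, and the function names `dtw_distance` and `best_reference` are illustrative rather than taken from [3].

```python
def dtw_distance(test, reference):
    """Plain dynamic time warping distance between two 1-D sequences."""
    n, m = len(test), len(reference)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning test[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(test[i - 1] - reference[j - 1])
            # Allowed moves: advance the test, advance the reference, or both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def best_reference(test, references):
    """Return the label of the reference pattern closest to the test pattern."""
    return min(references, key=lambda label: dtw_distance(test, references[label]))
```

For example, `best_reference([1, 2, 3], {"a": [1, 2, 2, 3], "b": [5, 5, 5]})` selects `"a"`, because the repeated sample in the reference can be absorbed by the warping path at no cost.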

Chin-Chen Chang, I-Yen Chen and Yea-Shuan Huang [4] work in the area of human-computer interaction, one of the most renowned and widely researched topics concerning interaction between humans and computers [5]. Gesture analysis uses two main methods: 1) 3-D model based approaches and 2) 2-D appearance based approaches. The 3-D model based approach relies on a spatial description of the hands, whereas the 2-D appearance based approach relies on the appearance of the hands in 2-D visual images. Gestures are mainly of two types: hand poses, which characterize static information about the hand, and dynamic gestures, which are hand movements. The authors propose a 2-D approach based on Curvature Scale Space (CSS) for translation, scale, and rotation invariant recognition of hand poses. The first stage represents the contours of hand poses using CSS images, and the second stage overcomes the problem of deep concavities in the contours by extracting multiple sets of CSS features. The CSS image was first proposed by Mokhtarian and Mackworth as an object-contour-based shape descriptor. The major steps of feature extraction are: input the hand pose images, produce binary contour images through image processing, and finally acquire the CSS features.

Joyeeta Singha and Karen Das [6] worked on a technique for recognizing sign language in live video without gloves, i.e. using bare hands. The authors use an Eigen vector based algorithm, which gives a recognition rate of 96.25%. The first step of this technique is to input a video stream of the sign language. The hand is extracted from the video stream by a technique called Skin Filtering, whose output is a white region on a black background. Histogram matching is then performed on the skin colored region. Finally, an effective sign recognition method based on Eigen values and Eigen vectors is proposed.

For preprocessing, the image is thresholded, which converts the RGB colors of the input image into a binary representation of 0 and 1. For conversion from image to text or speech, the authors use predefined methods.
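
The thresholding step can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of rows of `(r, g, b)` tuples with values in 0..255, the grayscale conversion uses the common BT.601 luma weights, and the cut-off of 128 is a hypothetical default; the source does not specify how the threshold is chosen.

```python
def to_binary(rgb_image, threshold=128):
    """Convert an RGB image (rows of (r, g, b) tuples) into a 0/1 image."""
    binary = []
    for row in rgb_image:
        binary_row = []
        for r, g, b in row:
            # Weighted grayscale intensity in the range 0..255.
            gray = 0.299 * r + 0.587 * g + 0.114 * b
            # Assumed rule: bright pixels become 1, dark pixels become 0.
            binary_row.append(1 if gray >= threshold else 0)
        binary.append(binary_row)
    return binary
```

With a dark background, as in the webcam setups surveyed here, the hand region maps to 1 and the background to 0.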

Figure 4: Architecture of recognition of sign language by webcam

Chance M. Glenn, Divya Mandloi, Kanthi Sarella, and Muhammed Lonon [8] note that a great deal of research and development has already been done on sign language, from which many researchers have derived different solutions and established the need for sign language. Translation of ASL Finger-Spelling to Digital Audio or Text [8] is one such technique, focusing on image processing and finger-spelling. The main challenge was to design a standard set of minimal physical measurement criteria for ASL finger-spelling and signing with respect to image sampling. The method captures an image from the video stream, eliminates the unnecessary background, and then crops and resizes the image as needed. A typical embodiment of the Sign2 system is a stereo imaging device connected to a storage system that leads to a video/image processor, as shown in Figure 5 [8].
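
The cropping and resizing steps can be illustrated with a short sketch. It assumes the background has already been eliminated, so the image is a list of rows of 0/1 values with the hand marked as 1; the function names and the nearest-neighbour resizing are illustrative choices, not details taken from [8].

```python
def crop_to_foreground(binary):
    """Crop a 0/1 image to the bounding box of its foreground (1) pixels."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    if not rows:
        return []  # no foreground pixels at all
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def resize_nearest(image, height, width):
    """Resize an image to height x width by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]
```

Cropping first and resizing second brings every hand image to the same size regardless of where the hand appeared in the frame, which simplifies any later comparison against stored images.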

Figure 5: Image processing

The authors conclude that they envision imaging devices, taking advantage of current technology, that will allow communication with hearing non-signers, as well as devices that establishments can erect to let people with disabilities communicate with others (ASL accessible) [8]. This method would be well suited to the learning and socialization of children and to those suffering from sudden hearing loss.

Thad Starner, Joshua Weaver, and Alex Pentland [9] describe two extensible systems, each of which uses one color camera [9] to track plain, bare hands in real time. The technique relies on the camera to detect the hands' motion and on the natural color of the signer's hands, so no gloves are required. The camera can be placed in two positions: on a desk, facing the signer, or on the signer's cap, giving the signer's point of view, as shown in the figure [9]. Whenever a signer needs to communicate with a non-signer, the signer can wear the cap with the mounted camera so that the signs are recognized. When implementing this method, it is important to consider random head motion and facial gestures, which play significant roles in signing; handling these is a tedious job for the wearable system.

Arti Thorat, Varsha Satpute, Arati Nehe, Tejashri Atre and Yogesh R Ngargoje [10] capture the image through an imaging device such as a webcam and store it in a database. The stored images are then matched with the input image. Images are acquired from the webcam through MATLAB [3] and saved in a separate directory. The signs are then converted to the corresponding characters and alphabets. This helps signers understand non-signers' language and vice versa.

The authors conclude that the method is user friendly and that a signer can easily access the system. It is not limited to a single domain: an impaired user can use it for learning as well as for social communication with non-signers. To avoid misunderstanding, it also plays audio corresponding to the displayed text. According to the authors, the biggest challenge that SLR now faces is designing methods that solve large-vocabulary continuous sign recognition.

Gaolin Fang, Wen Gao, and Debin Zhao [11] propose large-vocabulary sign language recognition based on transition-movement models (TMMs) [11]. The input devices of the proposed system are cyber gloves and a position tracker: the gloves collect the sign input provided by the signer, whereas the position trackers detect the exact motion of the signer's hands and gestures. TMMs model the transition between two adjacent signs and are used for continuous SLR. From their experiments, the authors conclude that continuous SLR based on TMMs performs well on a large vocabulary [11].

[Diagram block labels for Figures 4 and 5: Image stream, Image extraction, Edge detection, Eliminate background, Cropping and resizing; Input image through webcam, Compare image with database]

Daniel Kelly, John Mc Donald, and Charles Markham [12] focus on developing a pattern recognition framework that detects sign language gestures automatically. The main aim of the paper was to design a weakly supervised system that automatically extracts isolated samples of signs [12], which can then be used for hand posture recognition. As part of the method, the authors introduce a feature notation [12]. Feature tracking is the main domain of this work: the hands are tracked using colored gloves (see figure b) [13]. After implementing the method, the authors conclude that sign language research should be advanced in the same way as speech recognition was in earlier research. Speech recognition has a very large vocabulary set and correspondingly large databases; the same should be built for sign languages, although this is a difficult and time-consuming procedure [14].

Table 1: Comparative Analysis

Parameter | Signs considered | Background used | Remark / accuracy
Recognition of signs by capturing images from webcam and performing image processing [7] | Single hand (static signs) | Simple black background | The authors state that the gap between hearing people and hearing-impaired people can be reduced by this inexpensive method using webcams.
Sign language interpretation system [1] | Hands (dynamic signs, static signs) | Complex background | Specifies the significance of Indian Sign Language together with various other related techniques of sign language interpretation systems. As per Annelies Braffort, the bridge between impaired people and hearing people can be built effectively.
Connected-word algorithm [3] | Speech | No background | The modified reference pattern increases the accuracy and overall performance. For TLDP, the error rate decreases from 11.5% to 4.6%. For the sampling approach there is a loss in accuracy from 2 to 4.4%. Increased computation increases the cost factor. In spite of this, the method is used in many applications.
Feature extraction through CSS images [4] | Hand poses | Plain black background | Used for the recognition of 6 different hand poses. The recognition rate of 98.3% is quite high compared with other methods.
Eigen vector algorithm for recognition through live videos [6] | Both hands | Black background | An efficient method has been designed for 24 different alphabets in a video stream. The recognition rate is 96.25%.
Image extraction, real time input parameters [8] | Edge detection and cropping, resizing the image | Background elimination | This technology will allow signing people to communicate with hearing non-signers.
Mounted camera without the aid of gloves [9] | Natural color of hands | Natural color | With this technique, hearing and impaired persons can communicate without any communication barriers.
Main parameter is webcam [10] | Feature extraction | Images taken as input through webcam | The proposed system is user friendly.
Gloves and position trackers [11] | Hand gestures using the gloves | No background is used | SLR with TMMs is shown to be efficient over a large vocabulary.
Colored gloves [12] | Continuous SLR system | |


III. PROPOSED WORK

The main aim of the proposed work is to create a system for sign language recognition. Many researchers have already introduced sign language recognition systems and implemented them using different techniques and methods. The proposed SLR system works on both ‘Signs’ and ‘Text’, so that it can be understood by people with hearing or speech impairments as well as by hearing people. The system performs its main task in two ways: it takes text input from the user and matches it with the corresponding sign, and vice versa (i.e. sign to text, as shown in Figure 6).

Figure 6: Processing Structure of System

Figure 7: An example of sign to text and text to sign conversion

In the first way, the user gives text as input; the system matches it against the already created database entries and outputs the corresponding sign to the requesting user. The same technique is used to process letters, numbers, words and eventually phrases.
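
A minimal sketch of this text-to-sign lookup is given below. The database contents, the file paths and the fall-back to letter-by-letter signs for unknown words are all assumptions made for illustration; the paper does not describe its database schema.

```python
# Hypothetical database: maps a text entry to the path of its sign image.
SIGN_DATABASE = {
    "a": "signs/a.png",
    "b": "signs/b.png",
    "1": "signs/1.png",
    "hello": "signs/hello.png",
}


def text_to_signs(text, database=SIGN_DATABASE):
    """Return the sign images for a text, preferring whole-word entries."""
    signs = []
    for word in text.lower().split():
        if word in database:
            signs.append(database[word])
        else:
            # Assumed fall-back: spell the word out character by character,
            # silently skipping characters that have no database entry.
            signs.extend(database[ch] for ch in word if ch in database)
    return signs
```

Because words, letters and digits share one lookup table, adding a new word later only requires inserting one more entry with its image.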

The second way uses image processing. The input, given by another user as a sign in image format, is processed by the system on the basis of the outer portion (boundary) of the fingers and hand in the image. If the sign is valid, the system generates its text form, which is displayed on screen to the user.
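
The sign-to-text direction can be sketched as boundary extraction followed by template matching. This is a simplified stand-in, not the authors' implementation: it collects the set of boundary pixels instead of tracing them in order, it scores candidates by Jaccard overlap, and it assumes that the query and all templates are 0/1 images of the same size that have already been aligned. The names `boundary_pixels`, `match_sign` and the `min_score` cut-off are hypothetical.

```python
def boundary_pixels(binary):
    """Return the set of (row, col) foreground pixels that touch background."""
    height, width = len(binary), len(binary[0])
    boundary = set()
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                # Positions outside the image count as background.
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not binary[ny][nx]:
                    boundary.add((y, x))
                    break
    return boundary


def match_sign(binary, templates, min_score=0.8):
    """Return the best-matching template label, or None for an invalid sign."""
    query = boundary_pixels(binary)
    best_label, best_score = None, 0.0
    for label, template in templates.items():
        reference = boundary_pixels(template)
        union = query | reference
        # Jaccard overlap of the two boundaries: 1.0 means identical outlines.
        score = len(query & reference) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None
```

Returning `None` when no template scores high enough corresponds to the "valid sign" check described above; a practical system would likely need a matching score that tolerates small shifts and scale changes.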

The proposed system is user friendly, i.e. the user need not know the technical details of the system in order to use it efficiently.

IV. CONCLUSIONS

In this paper, sign language recognition is effectively performed, and the signs for all the alphabets and digits of the English language can be recognized. The proposed work also helps people with hearing and speaking disabilities to learn basic words and sentences in both image and text format. Hearing people who are keen to learn sign language in order to educate impaired people can also learn it in a user friendly way through this work. New words together with their images can be added over time to improve the work, and the data already present can easily be modified. The proposed system can simply be used as a means of educating people with hearing or speech impairments.

[Diagram block labels for Figure 6: Text, Processing, Sign]

Acknowledgement

The authors would like to thank Prof. Lovely Mutneja for his guidance and support in carrying out this research and publishing this paper. The authors would also like to thank the Department of Computer Science for providing the required resources and feedback during the course of this work.

REFERENCES

[1] Archana S. Ghotkar, Dr. Gajanan K. Kharate, “Study Of Vision Based Hand Gesture Recognition Using Indian Sign Language”, International Journal On Smart Sensing And Intelligent Systems, Vol. 7, No. 1, March 2014.

[2] Dasgupta, Shulka, S. Kumar, D. Basu, “A Multilingual Multimedia Indian Sign Language Dictionary Tool”, the 6th Workshop on Asian Language Resources, 2008, pp. 57-64.

[3] C. S. Myers and L. R. Rabiner, “A Level Building Dynamic Time Warping Algorithm for Connected Word Recognition”, IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-29, No. 2 (April 1981), pp. 284-97.

[4] Chin-Chen Chang, I-Yen Chen and Yea-Shuan Huang, “Hand Pose Recognition Using Curvature Scale Space”, Advanced Technology Center, Computer & Communications Research Laboratories, Industrial Technology Research Institute, Chutung, Hsinchu, Taiwan 310, R.O.C.

[5] D.S. Banarse and A.W.G. Duller, “Deformation Invariant Pattern Classification for Recognizing Hand Gestures”, in IEEE International Conference on Neural Networks, Vol. 3, pp. 1812-1817, 1996.

[6] Joyeeta Singha, Karen Das, Department of Electronics and Communication, DBCET, Assam Don Bosco University, Guwahati, Assam, International Journal of Computer Applications (0975 – 8887), Volume 70, No. 19, May 2013.

[7] Sawant Pramada, Deshpande Saylee, Nale Pranita, Nerkar Samiksha, Mrs. Archana S. Vaidya, GES’s R. H. Sapat College of Engineering, Management Studies and Research, Nashik (M.S.), India, “Intelligent Sign Language Recognition Using Image Processing”, IOSR Journal of Engineering (IOSRJEN), e-ISSN: 2250-3021, p-ISSN: 2278-8719, Vol. 3, Issue 2 (Feb. 2013), V2, pp. 45-51.

[8] Chance M. Glenn, Divya Mandloi, Kanthi Sarella, Muhammed Lonon, “An Image Processing Technique for the Translation of ASL Finger-Spelling to Digital Audio or Text”, The Laboratory for Advanced Communications Technology/CASCI, ECTET Department/CAST, Rochester Institute of Technology, Rochester, New York 14623.

[9] Thad Starner, Joshua Weaver, Alex Pentland, “Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 12, December 1998.

[10] Arti Thorat, Varsha Satpute, Arati Nehe, Tejashri Atre, Yogesh R Ngargoje, “Indian Sign Language Recognition System for Deaf People”, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 3, March 2014.

[11] Gaolin Fang, Wen Gao, and Debin Zhao, “Large-Vocabulary Continuous Sign Language Recognition Based on Transition-Movement Models”, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, Vol. 37, No. 1, January 2007.

[12] Daniel Kelly, John Mc Donald, and Charles Markham, “Weakly Supervised Training of a Sign Language Recognition System Using Multiple Instance Learning Density Matrices”, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 41, No. 2, April 2011.

[13] D. Comaniciu, V. Ramesh, P. Meer, “Real-time tracking of non-rigid objects using mean shift”, in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2000, vol. 2, pp. 142–149.

[14] P. Buehler, A. Zisserman, and M. Everingham, “Learning sign language by watching TV (using weakly aligned subtitles)”, in Proc. IEEE Comput. Soc. Conf. CVPR Workshops, Jun. 2009, pp. 2961–2968.