Clustering Method Evaluation for Hidden Markov Model Based Real-Time Gesture Recognition

Academic year: 2020

Jay Shankar Prasad, G. C. Nandi
Robotics & AI Lab

Indian Institute of Information Technology Allahabad, India

e-mail: [email protected], [email protected]

Abstract— This paper deals with the development of a high-performance real-time system for complex dynamic gesture recognition. Various motion features are extracted from the video frames and used by an HMM classifier. We used several clustering techniques to evaluate the performance of the classifier. Our system vectorises gestures into sequential symbols for both training and testing. We found very encouraging results, and the proposed method has potential applications in the field of human-machine interaction.

Keywords—Clustering; HMM; Gesture

I. INTRODUCTION

Gestures are used to communicate because of their versatile appeal, and they play a vital role in many aspects of communication. Gesture recognition, however, is difficult because of its inherent ambiguity. A classroom tutor's gestures suggest a series of steps that, if followed, will lead to a correct solution to a problem. Merely copying these steps will not always produce the correct solution: to figure out how to solve the problem, the learner must not only reproduce the tutor's gestures but also understand what those gestures represent.

Through a gesture-based user interface, users can specify commands by simple actions. For example, a system or a robot may be programmed by showing it specific actions, and the robot may not only follow the human commands but also grasp the meaning of those commands and respond accordingly. Because gestures are stochastic in nature, this requires accurate gesture sensing and recognition. Such gesture-based imitation learning can be deployed for monitoring industrial processes, for criminal investigations, for helping disabled people and for many other public services. A good gesture-based system must be adaptable to change and able to interact in a real-time environment. It can be implemented for anything from simple facial gesture learning to complete bodily human action learning.

Gestures can be represented by hand actions and by the position, orientation and movement of the arms and other body parts of a human being. They can be applied to conversational functions, to control and to communication. Gestures used for communication contain a lot of information, and their role is highly structured. Recognising them requires extracting features, which is a crucial step. Two types of features are selected. 'Spatial' features consist of the two-dimensional or three-dimensional location of the hand, arm or some other body part. 'Temporal' features need statistical methods to represent the body parts independently of time. We use an HMM for gesture recognition and evaluate the recognition performance based on different clustering methods.

II. ANALYSIS OF PREVIOUS RESEARCH WORK

Gestures have been studied and used for machines since the 1990s. In [1], the authors describe a gesture-based interaction and communication method for the classification of hand gesture contours. Computer-vision-based techniques for gesture recording, segmentation, filtering and representation are used because of their ease of use; code optimization and classification time are the payoff. In [2, 3], a method for simultaneous spotting and recognition of whole-body key gestures was used. Recognising gestures in video needs fine segmentation of the useful gestures from the body gesture sequence, and several body joints were used to find the structural features of the body in 3D. To analyse a gesture, [4] used hand detection, splitting of meaningful gestures from images, and 2D and 3D feature extraction and recognition. The hidden Markov model approach is used by various researchers for gesture-based learning and recognition [3], [4]. A dynamic gesture recognition system [7], which requires no special hardware other than a webcam, is based on a novel method combining Principal Component Analysis (PCA) with hierarchical multi-scale theory and Discrete Hidden Markov Models (DHMM). A decision tree is used for recognition, and the search time they report is low. A dimension reduction technique was applied to reduce the input space and use the gesture data efficiently [7].

The neural network approach is another method for gesture learning and classification [10, 12]. Its main disadvantage in gesture learning is the time required to train a network [12].

Many gesture capturing devices have been used, such as electro-mechanical, optical and sensor-based devices. The performance of any recognition process also depends on how accurately the data has been captured [6]. Motion capturing devices are costly, but gesture recognition through video images becomes slow because of the large amount of image processing and feature computation, which relies on complex methodologies. Accurate motion capture needs multiple cameras and requires a lot of processing power to process the video images and extract the joint angle values.

2009 International Conference on Advances in Recent Technologies in Communication and Computing

2.1 Data Collection for Gesture Recognition:

The preliminary work in gesture recognition is to gather the raw data for use in different learning algorithms. The three approaches widely used for gesture data gathering are as follows:

1. Use of data gloves or joint angle measuring instruments worn by the demonstrator.

2. Use of multiple cameras to capture body motion, followed by computer vision algorithms to extract useful data from it.

3. A combination of the first and second approaches to improve the recognition result.

2.1.1. Vision based motion data capturing instrument:

While utilizing video techniques, the most important thing is the efficient use of the camera. Our first concern is the number of cameras to be used: it may be one, two cameras are needed for stereo vision, and multi-camera systems are used to obtain motion capture data accurately. Rajesh Rao et al. [12] have shown that it is possible to identify gestures using one camera. When depth information is needed, two or more cameras are used. Since the hand is the most important body part for making gestures, it is of utmost importance to recognize and understand hand motion correctly. LEDs (light-emitting diodes) are placed on various points of the hand for easy recognition by the camera. Another way is to wear differently colored clothing to highlight the hand and its different joints. Colored gloves make it easier to extract hand gestures through video image processing. For gesture recognition, Rao [12] used a single camera for dynamic imitation in a humanoid robot through learning full body motions.

2.2 Gesture recognition techniques:

The most popular and interesting technique used for gesture recognition since the 1990s is the hidden Markov model [3, 4, 5, 7, 8]. A hidden Markov model (HMM) is a statistical classifier in which the system being modeled is assumed to be a Markov process with unknown parameters. The challenge is to determine the hidden parameters from the observable data. The extracted model parameters are then used to perform pattern recognition. An HMM can be considered the simplest dynamic Bayesian network [9]. In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters [8]. In a hidden Markov model, the state is not directly visible, but variables influenced by the state are visible [4]. Each state has a probability distribution over the possible output tokens. Hence the sequence of tokens generated by an HMM gives information about the sequence of states. Hidden Markov models are especially applied in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics [3, 7, 10]. There are three problems associated with HMMs:

1. If the parameters of the model are known, the probability of a particular output sequence, and the probabilities of the hidden state values given that output sequence, are calculated using the forward-backward algorithm.

2. If the parameters of the model are known, the most likely sequence of hidden states that could have generated a given output sequence is calculated using the Viterbi algorithm.

3. If an output sequence or a set of such sequences is given, the most likely set of state transition and output probabilities is calculated using the Baum-Welch algorithm.
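
As an illustration of the first problem, the sketch below computes the probability of an output sequence with the forward recursion for a discrete HMM. It is a minimal example and not the implementation used in this work: the function name `forward_likelihood`, the two-state toy model and all numerical values are assumptions made purely for illustration.

```python
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Return P(obs | model) for a discrete HMM using the forward recursion.

    A  : (N, N) state transition matrix, A[i, j] = P(state j | state i)
    B  : (N, M) observation matrix, B[i, k] = P(symbol k | state i)
    pi : (N,)   initial state distribution
    obs: sequence of integer symbol indices
    """
    alpha = pi * B[:, obs[0]]            # initialisation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction over time
    return float(alpha.sum())            # termination

# Toy two-state, two-symbol model (hypothetical values, for illustration only).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])

print(forward_likelihood(A, B, pi, [0, 1, 0]))
```

For long sequences the unscaled recursion underflows, so a practical version would rescale `alpha` at each step or work with logarithms.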

HMM techniques have been used by many researchers for gesture recognition, and the steps we follow are almost the same. We examine the effect of clustering on the training and testing of the HMM, since the clustering method affects the overall recognition process. We applied several clustering methods and analysed the results.

The next section explains the Methodology applied for Gesture training and testing.

III. METHODOLOGY

Our methodology for gesture recognition is shown in Fig. 1. We first captured the gesture using a webcam in real time and stored the video(s).

Figure 1. Overview of Gesture Recognition. [Flow diagram: Gesture Capture; Feature Extraction (mx, my, vx, vy, intensity); Vector quantization (Clustering, Code book design); Hidden Markov Model Training on a five-state model with self-transitions a11 to a55, forward transitions a12 to a45 and output distributions b1 to b5; Find the parameters of HMM.]

We considered these five gestures for experimentation: right hand straight, head gesture showing yes, swinging both hands, bending the upper torso, and turning towards the right.

We converted the color image frames extracted from the video into gray images and calculated the features from these frames. The features we considered are the mean and variance of motion in both the x and y directions and the intensity of motion [16]. First, the image differences were obtained, which helped us eliminate the background and the static part of the image. The difference image contains '+', '-' and '0' pixel values for black, white and gray respectively. The difference image is calculated by subtracting the pixel values at the same position (x, y) of adjacent frames of the original image sequence using equation (1).

D(x, y, t) = B(x, y, t) - B(x, y, t-1) (1)

Noise is eliminated from the difference image by applying a threshold $T$, as in equation (2).

$$D(x, y, t) = \begin{cases} 0, & |D(x, y, t)| < T \\ D(x, y, t), & \text{otherwise} \end{cases} \qquad (2)$$

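The centre of motion of the image, $(m_x(t), m_y(t))$, is given by

$$m_x(t) = \frac{\sum_{x,y} x\,|D(x, y, t)|}{\sum_{x,y} |D(x, y, t)|} \qquad (3)$$

and

$$m_y(t) = \frac{\sum_{x,y} y\,|D(x, y, t)|}{\sum_{x,y} |D(x, y, t)|} \qquad (4)$$

Another feature is the mean absolute deviation from the centre of motion, $(v_x(t), v_y(t))$, where

$$v_x(t) = \frac{\sum_{x,y} |D(x, y, t)|\,|x - m_x(t)|}{\sum_{x,y} |D(x, y, t)|} \qquad (5)$$

$$v_y(t) = \frac{\sum_{x,y} |D(x, y, t)|\,|y - m_y(t)|}{\sum_{x,y} |D(x, y, t)|} \qquad (6)$$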

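$I_i(x, y)$ represents the intensity value of the point $(x, y)$ of the $i$th frame, where $i = 0, 1, 2, \ldots$ A two-valued indicator (1 or 0) is then defined over the frames $1, 2, 3, \ldots$ [defining conditions lost in this copy], and the frames are grouped into $p$ intensity stable intervals $1, 2, \ldots, p$. The motion intensity value is the average intensity value of the $s$th intensity stable interval and is given by equation (7) [expression lost in this copy], in which the offset $b$ is a sum over the intervals $1, 2, \ldots, s-1$ if $s = 2, 3, \ldots, p$, and $b = 0$ for $s = 1$.

Based on the features obtained from equations (3), (4), (5), (6) and (7), our feature vector is the 5-tuple $(m_x; m_y; v_x; v_y; \text{intensity})$.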
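
The first four components of the feature vector can be sketched as follows. This is an illustrative example, not the code used in our experiments: the function name `motion_features`, the default threshold of 10 grey levels and the convention that array rows correspond to y and columns to x are all assumptions. It also assumes the usual forms of the centre of motion and mean absolute deviation (weighted by the absolute difference image), and the intensity feature of equation (7) is deliberately left out.

```python
import numpy as np

def motion_features(prev_frame, curr_frame, threshold=10.0):
    """Return (m_x, m_y, v_x, v_y) for two consecutive grey-level frames.

    Returns None when no pixel difference survives the threshold.
    """
    # Difference image between adjacent frames, equation (1).
    d = curr_frame.astype(float) - prev_frame.astype(float)
    # Noise suppression by thresholding, equation (2); threshold is hypothetical.
    d[np.abs(d) < threshold] = 0.0

    w = np.abs(d)                      # weights |D(x, y, t)|
    total = w.sum()
    if total == 0.0:
        return None                    # no motion between the two frames

    ys, xs = np.indices(w.shape)       # assumed convention: row = y, column = x
    m_x = (xs * w).sum() / total       # centre of motion
    m_y = (ys * w).sum() / total
    v_x = (np.abs(xs - m_x) * w).sum() / total   # mean absolute deviation
    v_y = (np.abs(ys - m_y) * w).sum() / total
    return m_x, m_y, v_x, v_y
```

Applying the function to every pair of adjacent frames of a video yields one feature vector per frame transition, which is the sequence passed on to vector quantization.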

Next step is the vector quantization which encompasses the clustering and code book generation procedure.

3.1 Clustering

Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. It is an unsupervised learning technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.

We applied different clustering algorithms for analyzing the effect of clustering on recognition. They are:

(i) Hierarchical clustering (ii) Mean shift clustering (iii) K-means clustering (iv) Fuzzy C-means clustering (v) Gaussian mixture clustering

3.1.1 Hierarchical clustering

Hierarchical clustering follows either a bottom-up or a top-down approach. To determine which clusters should be combined or split, a measure of dissimilarity between sets of observations is required. This is provided by a metric, such as the Euclidean distance, and a linkage criterion, which specifies the dissimilarity of sets as a function of the pairwise distances of the observations in the sets [11].

Complete linkage clustering: $\max\{\, d(a, b) : a \in A,\ b \in B \,\}$ (8)

Single linkage clustering: $\min\{\, d(a, b) : a \in A,\ b \in B \,\}$ (9)

Average linkage clustering: $\frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$ (10)

where $A$ and $B$ are sets of observations and the distance metric is $d(a, b) = \sqrt{\sum_i (a_i - b_i)^2}$.
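
The three linkage criteria can be illustrated with a few lines of NumPy. The sketch assumes the Euclidean metric mentioned above; the helper names are hypothetical and the code only evaluates the dissimilarity between two given sets, it does not build the full cluster hierarchy.

```python
import numpy as np

def pairwise_distances(A, B):
    """Euclidean distance between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def complete_linkage(A, B):
    """Largest pairwise distance between the two sets."""
    return float(pairwise_distances(A, B).max())

def single_linkage(A, B):
    """Smallest pairwise distance between the two sets."""
    return float(pairwise_distances(A, B).min())

def average_linkage(A, B):
    """Mean of all |A| * |B| pairwise distances."""
    return float(pairwise_distances(A, B).mean())

# Two small sets of 2D observations (illustrative values).
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])
print(complete_linkage(A, B), single_linkage(A, B), average_linkage(A, B))
```

In an agglomerative (bottom-up) run, the pair of clusters with the smallest linkage value would be merged at each step.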

3.1.2 Mean shift clustering

3.1.3 K-means clustering

Given an initial set of $k$ means $m_1^{(1)}, m_2^{(1)}, \ldots, m_k^{(1)}$, which can be obtained stochastically or heuristically, the algorithm alternates between two stages. In the assignment stage, each observation is assigned to the cluster with the closest mean, as explained in [14], using equation (11).

$$S_i^{(t)} = \{\, x_j : \|x_j - m_i^{(t)}\| \le \|x_j - m_{i^*}^{(t)}\| \ \text{for all } i^* = 1, 2, \ldots, k \,\} \qquad (11)$$

In the update stage, the new mean is calculated as the centroid of the observations in the cluster using equation (12).

$$m_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j \qquad (12)$$

The algorithm converges when the assignments no longer change.
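
The assignment and update stages can be sketched as below. This is a minimal illustration under stated assumptions rather than the implementation evaluated in this paper: the function name `k_means`, the iteration cap and the rule that an empty cluster keeps its previous mean are choices made for the example.

```python
import numpy as np

def k_means(X, init_means, max_iter=100):
    """Plain k-means: returns (means, labels) for data X of shape (n, d)."""
    means = np.asarray(init_means, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # Assignment stage: each observation goes to the closest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Converged when the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update stage: each mean becomes the centroid of its cluster.
        for i in range(len(means)):
            members = X[labels == i]
            if len(members) > 0:       # assumption: empty cluster keeps its mean
                means[i] = members.mean(axis=0)
    return means, labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
means, labels = k_means(X, init_means=[[0.0, 0.0], [10.0, 10.0]])
print(means, labels)
```

The result depends on the initial means, which is why they are passed in explicitly here instead of being drawn at random inside the function.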

3.1.4 Fuzzy c-means clustering

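Initially, the number of clusters is selected, and each point is randomly assigned coefficients for being in the clusters. The centroid of each cluster is computed [15] using equation (13).

$$c_k = \frac{\sum_x u_k(x)^m\, x}{\sum_x u_k(x)^m} \qquad (13)$$

where $u_k(x)$ is the membership of $x$ in cluster $k$, obtained using equation (14).

$$u_k(x) = \frac{1}{d(c_k, x)}; \quad 1 \le m < \infty \qquad (14)$$

The memberships are then updated using equation (15).

$$u_k(x) = \frac{1}{\sum_j \left( \frac{d(c_k, x)}{d(c_j, x)} \right)^{2/(m-1)}} \qquad (15)$$

Stop when the algorithm converges.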
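
A compact sketch of the fuzzy c-means iteration is given below. It assumes the standard formulation with a fuzzifier m > 1, in which the centroids are membership-weighted means and the memberships are renormalised inverse distances; the function name, the default m = 2, the convergence tolerance and the random seed are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centroids, u) where u[n, k] is the membership of point n in cluster k."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each point's memberships sum to 1.
    u = rng.random((len(X), n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        # Centroid of each cluster: membership-weighted mean of the data.
        centroids = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                 # guard against division by zero
        # ratio[n, k, j] = d(c_k, x_n) / d(c_j, x_n)
        ratio = d[:, :, None] / d[:, None, :]
        new_u = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=2)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centroids, u

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, u = fuzzy_c_means(X, n_clusters=2)
hard_labels = u.argmax(axis=1)   # crisp labels, e.g. for code book construction
```

Taking the cluster of highest membership, as in the last line, is one simple way to turn the fuzzy result into the discrete symbols that a discrete HMM needs.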

3.1.5 Gaussian mixture clustering

Initialize the Gaussian means $\mu_i$, $i = 1, 2, \ldots, G$. Initialize the covariance matrices $V_i$ to the distance to the nearest cluster. Initialize the weights $\pi_i = 1/G$ so that all Gaussians are equally likely [2]. Present each pattern $X$ of the training set and model each of the classes $K$ as a weighted sum of Gaussians:

$$p(X \mid \theta_s) = \sum_{i=1}^{G} \pi_i\, p(X \mid G_i) \qquad (16)$$

where $G$ is the number of Gaussians, the $\pi_i$ are the weights, and

$$p(X \mid G_i) = \frac{1}{(2\pi)^{d/2}\, |V_i|^{1/2}} \exp\left[ -\tfrac{1}{2} (X - \mu_i)^T V_i^{-1} (X - \mu_i) \right] \qquad (17)$$

Calculate

$$\tau_{ip} \equiv P(G_i \mid X) = \frac{\pi_i\, p(X \mid G_i, C_k)}{p(X)} = \frac{\pi_i\, p(X \mid G_i, C_k)}{\sum_{j=1}^{G} \pi_j\, p(X \mid \theta_j, C_k)} \qquad (18)$$

Iteratively update the weights, means and covariances using equations (19), (20) and (21) respectively:

$$\pi_i(t+1) = \frac{1}{N_c} \sum_{p=1}^{N_c} \tau_{ip}(t) \qquad (19)$$

$$\mu_i(t+1) = \frac{1}{N_c\, \pi_i(t)} \sum_{p=1}^{N_c} \tau_{ip}(t)\, X_p \qquad (20)$$

$$V_i(t+1) = \frac{1}{N_c\, \pi_i(t)} \sum_{p=1}^{N_c} \tau_{ip}(t) \left( (X_p - \mu_i(t)) (X_p - \mu_i(t))^T \right) \qquad (21)$$
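
One iteration of this procedure can be sketched as follows. The code is an illustrative example with hypothetical function names, and it uses the common expectation-maximisation variant in which the mean and covariance updates are normalised by the sum of the responsibilities of each Gaussian and the covariance is computed around the newly updated mean; this differs slightly from a normaliser built from the previous weights.

```python
import numpy as np

def gaussian_pdf(X, mu, V):
    """Multivariate normal density of each row of X with mean mu and covariance V."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(V)
    expo = -0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(V))
    return np.exp(expo) / norm

def em_step(X, weights, means, covs):
    """One EM iteration for a Gaussian mixture; returns updated parameters and tau."""
    n, G = len(X), len(weights)
    # Responsibilities tau[p, i] = P(G_i | X_p).
    dens = np.stack(
        [weights[i] * gaussian_pdf(X, means[i], covs[i]) for i in range(G)], axis=1
    )
    tau = dens / dens.sum(axis=1, keepdims=True)
    # Parameter updates (assumed variant: normalise by the responsibility sums).
    nk = tau.sum(axis=0)
    new_weights = nk / n
    new_means = (tau.T @ X) / nk[:, None]
    new_covs = []
    for i in range(G):
        diff = X - new_means[i]
        new_covs.append((tau[:, i, None] * diff).T @ diff / nk[i])
    return new_weights, new_means, np.array(new_covs), tau

# Two well-separated groups of 2D points (illustrative data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [10.0, 10.0], [11.0, 10.0], [10.0, 11.0], [11.0, 11.0]])
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
covs = np.array([np.eye(2), np.eye(2)])
weights, means, covs, tau = em_step(X, weights, means, covs)
```

Repeating `em_step` until the parameters stop changing gives the mixture; each pattern can then be assigned to the Gaussian with the largest responsibility.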

3.2 Code book design

The clusters obtained from the clustering algorithms above are given numerical labels. The code book is thus a look-up table covering the entire frame sequence of a particular gesture.
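
The look-up step can be illustrated with a nearest-centre rule: each feature vector is replaced by the index of the closest code book entry. This is a sketch under the assumption that every cluster is summarised by a single centre; the function name `quantize` and the sample values are hypothetical.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector (row) to the index of its nearest code book entry."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical code book with two entries and three frame-level feature vectors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
features = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]])
symbols = quantize(features, codebook)
print(symbols)
```

The resulting integer sequence is the discrete observation sequence on which the HMM is trained and tested.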

3.3 Hidden Markov model training

We create a 5-state HMM and use the Baum-Welch algorithm to train it. The parameters (A, B, pi1, loglike) of the HMM are obtained, where A is the transition matrix, B is the observation matrix, pi1 is the a-priori transition probability and loglike is the maximum likelihood of the event.

3.4 Testing a new sample

All the above steps up to vector quantization are performed on the new sample. For recognition we use the Viterbi algorithm: the trained parameters of each model and the quantized test sequence are given as input to the Viterbi algorithm. We estimate the log-likelihood for each model and report the best-matching gesture as the result.
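
The recognition step can be sketched as scoring the quantized test sequence against every trained model and choosing the model with the highest score. The example below is an assumption-laden illustration: the function names, the dictionary layout of the models and the single-state toy models are hypothetical, and the score is the log-probability of the best state path found by the Viterbi recursion.

```python
import numpy as np

def viterbi_log_score(A, B, pi, obs):
    """Log-probability of the most likely state path for a discrete sequence."""
    with np.errstate(divide="ignore"):      # log(0) = -inf is intended here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = logpi + logB[:, obs[0]]
    for o in obs[1:]:
        # Best predecessor for each state, then emit the current symbol.
        delta = (delta[:, None] + logA).max(axis=0) + logB[:, o]
    return float(delta.max())

def recognise(models, obs):
    """Return (best_label, scores) over a dict of label -> (A, B, pi)."""
    scores = {name: viterbi_log_score(*params, obs) for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Two toy single-state models (hypothetical values, for illustration only).
models = {
    "gesture_a": (np.array([[1.0]]), np.array([[0.9, 0.1]]), np.array([1.0])),
    "gesture_b": (np.array([[1.0]]), np.array([[0.1, 0.9]]), np.array([1.0])),
}
label, scores = recognise(models, [0, 0, 0])
print(label)
```

Working in the log domain avoids numerical underflow on long frame sequences, which matters when every frame of a video contributes one symbol.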

IV. RESULTS

[…] gestures we considered so far. We took 5 gestures from 10 persons, 5 times each, for a total of 250 training samples. Based on these videos we trained the model. A new gesture was tested; it could be of the same person who participated in training or of a new person. We found extremely encouraging results. We tested 150 samples: 100 samples of the same persons who participated in gesture training and 50 samples of those who did not participate in training. All the testing was done in real time. The Gaussian mixture model gave the best performance in our case, and the minimum recognition achieved by us was in the case of the mean shift clustering method.

Figure 2. Gesture recognition using various clustering techniques. [Bar chart: Recognition % (axis from 70 to 100) against Type of Gestures for Hierarchical, Mean Shift, K-means, Fuzzy c-means and Gaussian Mixture clustering.]

V. CONCLUSION

We applied various clustering algorithms for the HMM vectorisation process. We used the clustering result for code book design for training and testing samples. We observed that hierarchical clustering is fast, whereas k-means and fuzzy c-means take almost equal time but more than hierarchical and mean shift clustering. Gaussian mixture clustering is very slow; if a compromise between time and recognition rate is tolerable, then we suggest using the k-means classifier for the vectorisation process. The developed framework is useful for gesture recognition in a robust manner without the use of any special hardware devices. Systems developed in future can be interfaced with a robot for obeying the master's commands with a larger gesture vocabulary.

REFERENCES

[1] Lalit Gupta and Suwei Ma, "Gesture-Based Interaction and Communication: Automated Classification of Hand Gesture Contours", IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, Vol. 31, No. 1, pp. 114-120, February 2001.

[2] Hee-Deok Yang, A-Yeon Park, and Seong-Whan Lee, "Robust Spotting of Key Gestures from Whole Body Motion Sequence", Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR'06), IEEE.

[3] Seong-Whan Lee, "Automatic Gesture Recognition for Intelligent Human-Robot Interaction", Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR'06), IEEE.

[4] Kye Kyung Kim, Keun Chang Kwak and Su Young Chi, "Gesture Analysis for Human-Robot Interaction", ICACT2006, pp. 1824-1827, Feb. 20-22, 2006.

[5] Pengyu Hong, Matthew Turk, Thomas S. Huang, "Gesture Modeling and Recognition Using Finite State Machines", IEEE Conference on Face and Gesture Recognition, March 2000.

[6] Deyou Xu, "A Neural Network Approach for Hand Gesture Recognition in Virtual Reality Driving Training System of SPG", The 18th International Conference on Pattern Recognition (ICPR'06).

[7] Hai Wu, "Dynamic Gesture Recognition Using PCA with Multi-scale Theory and HMM", Image Extraction, Segmentation, and Recognition, Proceedings of SPIE Vol. 4550, pp. 132-140, 2001.

[8] Junghyun Kwon and Frank C. Park, "Using Hidden Markov Models to Generate Natural Humanoid Movement", Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1990-1995, 2006.

[9] Matthew W. Hoffman, David B. Grimes, Aaron P. Shon, Rajesh P.N. Rao, "A probabilistic model of gaze imitation and shared attention", Neural Networks 19, pp. 299–310, 2006.

[10] Masato Ito, Kuniaki Noda, Yukiko Hoshino, Jun Tani, "Dynamic and interactive generation of object handling behaviors by a small humanoid robot using a dynamic neural network model", Neural Networks 19, pp. 323–337, 2006.

[11] Chuanjun Li, Punit R Kulkarni and B. Prabhakaran, "Segmentation and Recognition of Motion Capture Data Stream by Classification", Multimedia Tools and Applications, Vol. 20.

[12] Fels, S. and Hinton, G., "Glove-TalkII: A neural network interface which maps gestures to parallel formant speech synthesizer controls", IEEE Transactions on Neural Networks, Vol. 9, No. 1, pp. 205-212, 1998.

[13] D. Comaniciu, P. Meer, "Mean shift: a robust approach toward feature space analysis," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 5, pp. 603-619, May 2002.

[14] Kanungo, T.; Mount, D. M.; Netanyahu, N. S.; Piatko, C. D.; Silverman, R.; Wu, A. Y., "An efficient k-means clustering algorithm: Analysis and implementation", IEEE Trans. Pattern Analysis and Machine Intelligence 24: 881–892, 2002.

[15] Krishnapuram, R.; Keller, J. M., "A possibilistic approach to clustering," Fuzzy Systems, IEEE Transactions on, vol. 1, no. 2, pp. 98-110, May 1993.

[16] G. Rigoll, A. Kosmala, and S. Eickeler, "High Performance Real-Time Gesture Recognition Using Hidden Markov Models," In Proc. Gesture Workshop, Bielefeld, Germany, vol 137
