Abstract - Hand gestures are an important modality for human-computer interaction. Compared to many existing interfaces, hand gestures have the advantages of being easy to use, natural, and intuitive. Successful applications of hand gesture recognition include computer game control, human-robot interaction, and sign language recognition, to name a few. Vision-based recognition systems can give computers the capability of understanding and responding to hand gestures. This paper gives an overview of the field of hand-gesture-based human-computer interaction and describes the early stages of a project on gestural command sets, an issue that has often been neglected. We have built a first prototype for exploring the use of pie and marking menus in gesture-based interaction. The purpose is to study whether such menus, with practice, could support the development of autonomous gestural command sets. The scenario is remote control of home appliances, such as TV sets and DVD players, which could in the future be extended to the more general scenario of ubiquitous computing in everyday situations. Some early observations are reported, mainly concerning problems with user fatigue and the precision of gestures. Future work is discussed, such as introducing flow menus to reduce fatigue and control menus for continuous control functions. The computer vision algorithms will also have to be developed further.
Abstract — Expressive and meaningful body motions involving physical movements of the hands, arms, and face are well suited for conveying information and interacting with the surroundings. Such a motion constitutes a posture or a gesture, a dynamic movement of a body part. Generally, there exist many-to-one mappings from concepts to gestures, and vice versa; hence gestures are ambiguous and incompletely specified. Each hand gesture to be recognized corresponds to a character in sign language. The related study areas of sign gesture recognition are Human-Computer Interaction (HCI) and image processing, which assist in the solution of this problem. This technology has the potential to change the traditional way in which users interact with computers, by eliminating input devices such as joysticks, mice, and keyboards and allowing the human body to give signals to the computer through gestures such as finger pointing. This paper reviews ANN-based gesture recognition approaches that have been implemented over the years.
In today’s age, a great deal of research is devoted to finding effective techniques and methods to make existing systems more reliable and efficient. One of the most important factors in making a system efficient and reliable is Human-Computer Interaction (HCI). Many systems provide simple techniques for HCI; the most common input techniques include the use of a mouse, keyboard, etc., which are physically in contact with the system. Recently, new techniques have been developed to make interaction with the system more efficient. This paper presents three interactive techniques for Human-Computer Interaction using hand detection. The paper provides efficient and reliable hand-based interaction techniques, which can be implemented using specific methodologies according to the user’s application requirements. The techniques described in the paper require preprocessing for implementation, such as transformation of the user’s hand image from one form to another, enhancement, etc. These preprocessing tasks can be done using various methods depending on the environment in which the system is being used by the specific user. This paper compares the different methods for all the required preprocessing tasks, creates a pipeline of these tasks, and then uses the detected hand as an interaction device for HCI applications. The next part describes the three different types of interaction techniques.
Once the set of fingertip candidates E has been found, there may be up to two false-positive extrema due to folded fingers and wrist extrema. Candidates that do not belong to fingertips must be filtered out to leave a subset of valid fingertips. A number of approaches could be applied: template matching has been used in the past; however, the fingertip can have a wide range of appearances due to finger orientation, which is challenging to normalise given the limited resolution. Curvature-based approaches would also be unstable because of the contour noise inherent to depth sensors. Optimisation of a kinematic model offers robust localisation [6,59] but with increased computational complexity. However, the route of the shortest path across the hand provides insight. The proposed approach considers the path taken to each extremum location relative to the wrist. Due to kinematic limitations, it is highly unlikely that a genuine fingertip path should approach the wrist's location. For frontal interaction it is reasonable to assume that the fingers project towards the camera; therefore a path that traverses towards the camera with decreasing depth is favourable. A penalty metric is formulated that considers both of these criteria. It is highly efficient to compute, allowing four hands to be processed simultaneously. The penalty criterion seeks to remove falsely identified tips that reside around the wrist. The first stage is to localise the wrist using the segmentation computed in Section 3.2. The wrist centroid p^c_W is determined using the centroid of the set of points
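The two criteria above can be combined into a simple penalty score. The following is a minimal sketch, not the paper's actual formulation: the weights, the inverse-distance proximity term, the summed positive depth increments, and the function names `path_penalty` and `filter_fingertips` are all illustrative assumptions.

```python
import numpy as np

def path_penalty(path, wrist_centroid, w_prox=1.0, w_depth=1.0):
    """Penalty for a candidate fingertip, following the two criteria in
    the text: paths that pass near the wrist are penalised, as are paths
    whose depth increases (moves away from the camera). `path` is an
    (N, 3) array of points ending at the candidate extremum; z is depth."""
    path = np.asarray(path, dtype=float)
    # Proximity term: inverse of the minimum distance from the path to the wrist.
    d_min = np.linalg.norm(path - wrist_centroid, axis=1).min()
    prox = 1.0 / (d_min + 1e-6)
    # Depth term: total positive depth increase along the path
    # (a finger projecting towards the camera should have decreasing depth).
    dz = np.diff(path[:, 2])
    depth = np.clip(dz, 0.0, None).sum()
    return w_prox * prox + w_depth * depth

def filter_fingertips(candidates, paths, wrist_centroid, threshold):
    """Keep only candidates whose path penalty falls below `threshold`."""
    return [c for c, p in zip(candidates, paths)
            if path_penalty(p, wrist_centroid) < threshold]
```

A genuine fingertip path (far from the wrist, decreasing depth) thus scores low, while a wrist extremum scores high on both terms.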
3.3.1 k-means: This algorithm [111, 112] is an unsupervised classifier that determines k centre points to minimize the clustering error, defined as the sum of the distances of all data points to their respective cluster centres. The classifier randomly locates k cluster centres in the feature space. Each point in the input dataset is assigned to the nearest cluster centre, and the centre locations are updated to the average location of each cluster. This process is repeated until a stopping condition is met, which can be either a user-specified maximum number of iterations or a distance threshold for the movement of cluster centres. Ghosh and Ari  used a k-means clustering based radial basis function neural network (RBFNN) for static hand gesture recognition; in this work, k-means clustering is used to determine the RBFNN centres. 3.3.2 Mean shift clustering: Mean shift is a nonparametric clustering method that does not require prior knowledge of the number or distribution of the clusters . The points in the d-dimensional feature space are treated as samples from an empirical probability density function, with dense regions corresponding to the local maxima or modes of the underlying distribution. A gradient ascent procedure
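The assign-update loop described above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the one used in the cited work; the function name, the tolerance-based stopping test, and the random seed are assumptions.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Plain k-means as described in the text: randomly place k centres,
    assign each point to its nearest centre, move each centre to the mean
    of its cluster, and repeat until the centres stop moving or the
    iteration limit is reached."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Assignment step: nearest centre for every point.
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centre becomes the mean of its assigned points.
        new_centres = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centres[j]
                                for j in range(k)])
        # Stopping condition: centre movement below the distance threshold.
        if np.linalg.norm(new_centres - centres) < tol:
            centres = new_centres
            break
        centres = new_centres
    return centres, labels
```

The same loop caps out at `max_iter`, covering both stopping conditions mentioned in the text.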
A Liquid Crystal Display (LCD) is a flat panel display, electronic visual display, or video display that uses the light-modulating properties of liquid crystals. Liquid crystals do not emit light directly. LCDs are available to display arbitrary images (as in a general-purpose computer display) or fixed images which can be displayed or hidden, such as preset words, digits, and seven-segment displays as in a digital clock. They use the same basic technology, except that arbitrary images are made up of a large number of small pixels, while other displays have larger elements. They are common in consumer devices such as clocks, watches, calculators, and telephones, and have replaced cathode ray tube (CRT) displays in most applications. They are available in a wider range of screen sizes than CRT and plasma displays and, since they do not use phosphors, they do not suffer image burn-in. LCDs are, however, susceptible to image persistence.
In glove-based systems (see Figure 1), the common technique for hand pose tracking is to instrument the hand with a glove equipped with a number of sensors that provide input to the computer about hand position, orientation, and flex of the fingers, using magnetic or inertial tracking devices. An example of such a device is the DataGlove, which was the first commercially available hand tracker. While it is easier to collect hand configuration and movement with this approach, the major drawback is that the required devices are quite expensive and cumbersome. Also, the ease and naturalness with which the user can interact with the computer-controlled environment is hampered by the load of cables attached to the user. More details about data-glove approaches are available in the survey on data gloves by Dipietro et al. .
Foreground segmentation aims to extract moving regions of interest, though moving shadows are often detected as well; these are unwanted and should be removed. In the gesture recognition domain, and for static cameras, the popular approach to foreground extraction is 'background modeling' . The background model can either be acquired in advance (Haritaoglu et al., 2000) or estimated adaptively online (Wren et al., 1997). The features and statistical models used have been well studied in the literature, often depending on the type of imaging sensor or the scene properties under study. Some notable examples of feature/model combinations are: RGB colour / mixture-of-Gaussians model (Stauffer and Grimson, 2000); YUV colour and depth / MoG model (Harville et al., 2001); chromaticity and gradient / single Gaussian model (McKenna et al., 2000); normalised rg / kernel density function (Elgammal et al., 2002).
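As a minimal illustration of adaptive online background modelling in the spirit of the single-Gaussian-per-pixel models cited above (not the full mixture model of Stauffer and Grimson), one can maintain a running mean and variance per pixel and label outliers as foreground. The class name and all parameter values below are illustrative assumptions.

```python
import numpy as np

class RunningGaussianBackground:
    """Sketch of a per-pixel Gaussian background model: the mean and
    variance are updated online with learning rate alpha, and pixels
    further than k standard deviations from the mean are labelled
    foreground."""
    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, 25.0)  # initial variance guess
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(float)
        diff = frame - self.mean
        fg = diff ** 2 > (self.k ** 2) * self.var   # foreground mask
        # Update the model only where the pixel looks like background,
        # so foreground objects are not absorbed into the model.
        bg = ~fg
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2
        return fg
```

Selective updating (only on background pixels) is one simple way to keep a slow-moving hand from being learned into the background; adaptive schemes in the cited works handle this more carefully.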
Mihai Gavrilescu  recognized emotions from facial expressions and body postures based on a stochastic context-free grammar (SCFG), using eight hand gestures and body postures. Anger, sadness, fear, happiness, surprise, and disgust were accounted for. Montgomery and Haxby  studied the mirror neuron system (MNS), which maps actions onto motor representations. Functional magnetic resonance imaging experiments were used in which participants imitated, viewed, and produced facial expressions and social hand gestures; there are distinct representations of different types of social nonverbal communication in the MNS. A media-player system controlled by facial expressions and hand gestures  was proposed by Agarwal and Umer. To detect movement, one landmark point for a finger and 18 landmark points for the lips were captured using support vector machines. A system aimed especially at the hearing impaired  was built on trajectories computed with the CamShift algorithm for face and hand motions. An HMM was applied for hand tracking, finger tracking, face detection, and feature extraction. Efficiency was evaluated by the number of objects present in the video or image, such as one or two hands.
ABSTRACT: Considerable effort has been put toward the development of intelligent and natural interfaces between users and computer systems. In line with this endeavor, several modes of information (e.g., visual, audio, and pen), used either individually or in combination, have been proposed. The use of gestures to convey information is an important part of human communication. Hand gesture recognition is widely used in many applications, such as computer games, machinery control (e.g., cranes), and mouse replacement. Computer recognition of hand gestures may provide a natural computer interface that allows people to point at or rotate a computer-aided design model by rotating their hands. Hand gestures can be classified into two categories: static and dynamic. The use of hand gestures as a natural interface serves as a motivating force for research on gesture taxonomy, representations, and recognition techniques. This paper summarizes the surveys carried out in human-computer interaction (HCI) studies and focuses on different application domains that use hand gestures for efficient interaction. This exploratory survey aims to provide a progress report on static and dynamic hand gesture recognition (i.e., gesture taxonomies, representations, and recognition techniques) in HCI and to identify future directions on this topic.
A study was carried out by  on eye-gaze interaction for mobile phone use, following two methods: the standard dwell-time-based method and the gaze gesture method. Proposing to implement an eye tracker on a mobile phone platform, they designed a number of gaze gestures which, upon recognition, can trigger certain actions such as scrolling up and down a phone book or opening and closing an internet browser. The study concludes that gaze gestures are robust to head movement, since they capture only relative eye movement rather than absolute eye fixation. Calibration is also unnecessary, which makes eye gestures more suitable for real-world applications. The two interaction methods are further compared by , where participants used either gaze gestures or dwell icons in the context of a 3D immersive game. At the end of the experiment, task completion time, selection errors, and missed gestures or clicks were evaluated to compare the two types of command input. The authors suggested that “gaze gestures are not only a feasible means of issuing commands in the course of game play, but they also exhibited performance that was at least as good as or better than dwell selections”. Another study  achieved gaze gesture recognition for HCI under more general circumstances, employing the hierarchical temporal memory pattern recognition algorithm to recognise predefined gaze gesture patterns; the recognition of 10 different intentional gaze gesture patterns achieved 98% accuracy. Other works on gaze gestures dedicated to HCI share similar limitations: firstly, they all depend on active NIR lighting for eye centre localisation; secondly, the eye centre localisation algorithms only work at short distances.
With the development of science and technology, human-computer interaction has gradually become a part of people's lives , and human-computer interaction technology has been constantly developed . Gesture recognition is an important part of human-computer interaction  and plays an indispensable role in people’s daily life, with wide application in computer games, virtual reality, medical care, and other areas [5,6]. Kuang et al.  used the ZED stereo camera to capture gesture depth images, segmented the gesture image using depth and color information, performed fingertip detection, and recognized five kinds of digital gestures with a support vector machine (SVM). The average recognition rate was 94.9%, which indicated the high validity of the method. Huang et al.  proposed a Gabor filter and SVM based hand gesture recognition method which eliminated the limitation of illumination conditions and obtained a recognition rate of 96.1% in their experiments. Moreover, the use of the Gabor filter improved the recognition accuracy from 72.8% to 93.7%, which suggested the high feasibility of the method. Li et al.  designed a gesture recognition system that segmented gestures with an adaptive skin region segmentation algorithm based on prior facial knowledge and then used an SVM to recognize the gesture. The experimental results showed a recognition rate of 95.88%, indicating that the method performed well in gesture recognition and could be applied in real life. Nagarajan et al.  proposed a gesture recognition system based on edge histogram features and a multi-class SVM to recognize American Sign Language.
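All of the systems above share an SVM classification step applied to a gesture feature vector. As a hedged sketch of that step (using Pegasos-style sub-gradient training on the hinge loss rather than any of the cited implementations), a minimal linear SVM could look like this; the function names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Tiny linear SVM trained with Pegasos-style sub-gradient descent on
    the regularised hinge loss. Labels y must be in {-1, +1}; X is an
    (n, d) array of feature vectors (e.g. gesture features)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                # decaying step size
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                       # margin violation: hinge gradient
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                                # only the regulariser acts
                w = (1 - eta * lam) * w
    return w, b

def predict(w, b, X):
    """Classify feature vectors by the sign of the decision function."""
    return np.sign(X @ w + b)
```

The cited works use kernel or multi-class SVMs from standard libraries; this linear two-class version only shows the underlying max-margin training idea.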
Since accelerometer (ACC) and surface electromyography (EMG) signals each have their own advantages in capturing hand gestures, the combination of both sensing approaches may improve the performance of hand gesture recognition. Although some studies have utilized both EMG and ACC signals, few combined them to realize a gesture-based interaction system. In our pilot studies, a series of promising applications with gestural interfaces relying on portable ACC and EMG signals were developed, including sign language recognition and human-computer interaction. We further designed a wearable gesture-capturing device and then realized a gesture-based interface for a mobile phone to demonstrate the feasibility of gesture-based interaction in a mobile application . In that preliminary work, EMG and ACC signals were not actually fused together in the interface, and only nine gestures were supported.
Defining the cursor position with yellow: the user might be left- or right-handed, so a rectangular sub-portion of the video frame, smaller than the full frame, is considered. Due to captured noise and vibrations of the hand, the detected centres keep drifting around the mean position, so setcursorposition() sets the new centre close to the old centre. The three centres are then used to decide which action to perform, depending on their relative positions. This is done by chooseAction() in the code; depending on its output, performAction() carries out one of the following pyautogui library functions: click, select, drag, or scroll.
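A minimal sketch of the smoothing and action-selection logic described above. The helper names `smooth_center` and `choose_action` stand in for the chooseAction()/performAction() helpers in the text, the mapping from centre count to action is a guess at the marker layout, and the pyautogui dispatch is shown only in comments.

```python
def smooth_center(old, new, alpha=0.3):
    """Exponentially smooth the detected hand centre so sensor noise and
    hand tremor do not make the cursor jitter; this plays the role the
    text assigns to keeping the new centre close to the old one."""
    return (old[0] + alpha * (new[0] - old[0]),
            old[1] + alpha * (new[1] - old[1]))

def choose_action(centers):
    """Map the relative configuration of the detected centres to an
    action name. The real mapping depends on the marker layout; the
    count-based rule here is purely illustrative."""
    if len(centers) == 1:
        return "move"
    if len(centers) == 2:
        return "click"
    return "drag"

# Dispatch sketch (not executed here; requires the pyautogui package):
# import pyautogui
# ACTIONS = {"move": pyautogui.moveTo, "click": pyautogui.click,
#            "drag": pyautogui.dragTo, "scroll": pyautogui.scroll}
```

With alpha = 0.3, a sudden 10-pixel jump of the detected centre moves the cursor only 3 pixels per frame, which damps the vibration the text describes.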
Abstract - Hand gestures are a powerful means of human communication, with many potential applications in the area of human-computer interaction. Vision-based hand gesture recognition techniques have many proven advantages compared with traditional devices, giving users a simpler and more natural way to communicate with electronic devices. This work proposes a generic system architecture based on computer vision and machine learning, able to be used with any interface for human-computer interaction. The proposed solution is mainly composed of three modules: a pre-processing and hand segmentation module, a static gesture interface module, and a dynamic gesture interface module. The experiments showed that the core of vision-based interaction systems can be the same for all applications, which facilitates implementation. For hand posture recognition, an SVM (Support Vector Machine) model was trained and used, achieving a final accuracy of 99.4%. For dynamic gestures, an HMM (Hidden Markov Model) was trained for each gesture the system can recognize, with a final average accuracy of 93.7%. The proposed solution has the advantage of being generic, with the trained models able to work in real time, allowing its application in a wide range of human-machine applications. To validate the proposed framework, two applications were implemented. The first is a real-time system able to interpret Portuguese Sign Language. The second is an online system able to help a robotic soccer referee judge a game in real time.
The developed methodology makes use of the previously described information and, for each depth image received, if the face is detected, performs the following process. The background is removed using the distance of the user to the RGB-D sensor as a threshold value; this distance is computed with a histogram-based approach. A morphological close and a region-growing operation are then used to remove any noise present in the image. Another threshold with the same distance value is then applied to the original depth image to obtain the arms of the user and any other objects close to the camera. An AND operation is performed between the two resulting images and the original depth image, yielding the arms of the user. After the segmentation process, with the arms obtained, a PCA is computed for each individual object to detect its orientation and the tip of the hand, which is then tracked. The tracking is performed with a Kalman filter using a constant-position model, which proved to be efficient. To identify the three parameterized gestures, three simple FSMs were implemented, since the gestures are very distinct and not difficult to model. The FSMs are computationally inexpensive and easy to implement.
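A constant-position Kalman filter of the kind mentioned above can be sketched as follows. The noise values q and r, the per-axis scalar formulation, and the class name are illustrative assumptions, not the cited system's parameters.

```python
import numpy as np

class ConstantPositionKalman:
    """Per-axis Kalman filter with a constant-position motion model, as
    used in the text to track the hand tip: the state is just (x, y),
    the prediction step assumes no movement, and the process noise q
    absorbs the actual motion between frames."""
    def __init__(self, x0, q=5.0, r=2.0):
        self.x = np.asarray(x0, dtype=float)   # state estimate (x, y)
        self.p = np.ones(2)                    # per-axis estimate variance
        self.q, self.r = q, r                  # process / measurement noise

    def update(self, z):
        # Predict: position is assumed constant, so only uncertainty grows.
        self.p += self.q
        # Correct: blend the prediction with the new measurement z.
        k = self.p / (self.p + self.r)         # Kalman gain per axis
        self.x += k * (np.asarray(z, dtype=float) - self.x)
        self.p *= (1.0 - k)
        return self.x.copy()
```

The appeal of the constant-position model is exactly what the text reports: it needs no velocity estimation, yet tracks a slowly moving hand tip smoothly because q lets the filter follow the measurements.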
in terms of pointing gestures. In order to distinguish a pointing gesture from other unintentional hand movements, we define certain requirements that a hand movement has to meet to be recognized as a pointing gesture. Without these requirements, background noise or other random hand motions may interfere with the identification process and reduce the effectiveness of human-robot interaction. People are not prone to randomly lifting their hands to shoulder level and holding them there, as this motion is likely to cause discomfort. We use this observation to set up the triggering mode of detection initialization. Also, to avoid complexity of notation and to construct a uniform standard, we propose the following criterion: a meaningful pointing gesture is recognized only when a human partner lifts either the left or the right hand to the same level as his or her shoulder. This leads to a computationally inexpensive method. To give the method some flexibility, we allow a range for the relative vertical distance between the hand and the shoulder, e.g., 13 cm. A holding phase is also required to initiate the detection process, which must last for more than a certain time, e.g., 1 s.
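The triggering criterion above reduces to a simple check. The function below is a sketch, with the 13 cm band and 1 s hold taken from the example values in the text; the function and parameter names are assumptions.

```python
def is_pointing_trigger(hand_y, shoulder_y, hold_duration,
                        band_m=0.13, min_hold_s=1.0):
    """Trigger detection only when the hand is within band_m metres of
    shoulder height AND has stayed there for at least min_hold_s seconds,
    mirroring the two requirements stated in the text."""
    return abs(hand_y - shoulder_y) <= band_m and hold_duration >= min_hold_s
```

A tracking loop would feed this function the vertical coordinates of the detected hand and shoulder joints each frame and accumulate `hold_duration` while the height condition stays satisfied.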
Interaction with robots is often carried out in industrial environments using control panels. However, control panels are often not user-friendly. Ideally, in service tasks, interaction with people should be intuitive and user-friendly, through the use of voice and gestures. The use of voice is usually constrained by environmental noise. Therefore, visual methods are interesting for human-robot interaction, specifically the recognition of hand gestures to indicate actions to the robot.
This chapter presented a summary of the results for the components of a gesture recognition system. The presented method for hand segmentation yields better results than using only depth or colour information and works for a wide variety of users, even with a non-uniform background and with skin-coloured objects in the background. This method satisfies the user adaptability, reconfigurability, and environmental requirements specified in Section 1.2. An accuracy of 75% was achieved using an ANN classifier with joint angles for hand-gesture recognition. This was shown to be an improvement over using the positions of the fingertips or a k-means classifier. The per-subject accuracy of the hand-gesture recognition system has a small standard deviation, demonstrating the robustness of the system across a variety of users. While the joint-angle feature vector is superior to the position- or state-based feature vectors, the method relies on the correct detection of fingertips that are fully extended. Further improvements to the fingertip detection algorithms must be made to improve the accuracy of the hand-gesture recognition system. The designed cascaded neural network architecture with MBSGD achieves upper-body gesture recognition of 100% across all gestures in three different datasets, implying that the combination of ANNs with joint angles as a feature vector is well suited to classifying dynamic upper-body gestures. Real-time performance is achieved for both the hand gestures and the upper-body gestures, fulfilling the responsiveness requirement specified in Section 1.2.
Many researchers in the fields of robotics and human-computer interaction have tried to control mouse movement using video devices, though different methods were used to generate a clicking event. One approach, by Erdem et al., used fingertip tracking to control the motion of the mouse; a click of the mouse button was implemented by defining a screen region such that a click occurred when the user's hand passed over it [5, 6]. Another approach was developed by Chu-Feng Lien , in which only the fingertips were used to control the mouse cursor movements. Clicks were based on image density, and the user needed to hold the cursor on the desired spot for a short period of time. Paul et al. used yet another mechanism to click: the motion of the thumb from a 'thumbs-up' position to a fist marked a clicking event . Moving the hand while making a special hand gesture moved the mouse pointer.