Future work - Independent hand-tracking from a single two-dimensional view and its application

Chapter 7. Conclusion and Directions for Future Research

Although this thesis made substantial contributions to hand tracking and achieved its research objectives, there is still room for improvement. This section discusses several directions in which the approaches presented in this thesis could be improved and extended, including suggestions that could increase the overall performance of the hand-tracking framework.

7.3.1 Arm-based hand-tracking

As an extension to the hand-tracking framework, a method could be integrated that allows the hands to be tracked when short-sleeved clothing is worn. One way to address this problem is to detect skin pixels on both the hands and the arms and to create a binary image of skin and non-skin pixels. Because the arms move whenever the hands move from one location to another, the hands and arms can be isolated together using a background-subtraction technique. Combining the foreground image of the moving objects with the skin-pixel binary image would leave only skin-coloured foreground objects, represented as silhouettes. The hands and arms could then be identified by segmenting the elongated silhouettes. The hands could be separated from the arms by computing the skeleton of each silhouette and using it to identify the end at which the hand is situated. Since the arms are bound by physical constraints, the area in which they can move is limited. This fact can then be used to track the hands and to distinguish between them.
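The combination of the foreground image with the skin-pixel binary image can be illustrated with a minimal sketch. It assumes grey-level frames stored as lists of rows and a skin mask that has already been computed; the function names, the threshold of 30 and the tiny example images are illustrative assumptions and are not part of the framework described in this thesis.

```python
def subtract_background(frame, background, threshold=30):
    """Mark a pixel as foreground (1) when it differs from the background
    image by more than `threshold` grey levels (assumed value)."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def combine_masks(skin_mask, foreground_mask):
    """Pixel-wise logical AND of two equally sized binary masks."""
    if len(skin_mask) != len(foreground_mask):
        raise ValueError("masks must have the same number of rows")
    return [
        [s & f for s, f in zip(skin_row, fg_row)]
        for skin_row, fg_row in zip(skin_mask, foreground_mask)
    ]


# Hypothetical 3x4 example: a static background and a frame with a moving object.
background = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
frame = [[10, 200, 10, 10], [10, 200, 180, 10], [10, 10, 10, 10]]
# Assumed output of a skin detector; the pixel at row 0, column 3 is static skin-coloured clutter.
skin = [[0, 1, 0, 1], [0, 1, 0, 0], [0, 0, 0, 0]]

moving = subtract_background(frame, background)
moving_skin = combine_masks(skin, moving)
```

In this sketch the static skin-coloured pixel and the moving non-skin pixel are both discarded, so only the moving skin-coloured column remains as a candidate hand or arm silhouette.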

7.3.2 Enhancing skin-colour segmentation

A fundamental function of the tracking algorithm is to identify the hands based on the skin colour of the individual, since it provides shape and scale invariance. In the current research, the individual’s skin colour was identified based on a selected area around the nose. This requires the individual to face forward towards the camera, which is a valid assumption, since facial expressions are the non-manual features that are used to distinguish between signs.

If the individual turns his/her face or looks away from the camera, the skin colour can no longer be sampled in this way, and a learned model, for example a Gaussian model, should be used instead. An extension to the skin-segmentation algorithm in the hand-tracking framework would therefore be to continue segmenting skin-coloured regions even when the individual does not face the camera. A possible approach would be to train an online Gaussian model on the skin-colour distribution sampled from the face while the individual is facing the camera, and to employ the model when either the individual’s face cannot be detected or the individual does not face the camera. An alternative approach would be to apply an online-trained model in every frame and logically ‘AND’ its output with the output of the skin-segmentation algorithm of the hand-tracking framework, which could provide a more accurate skin-colour segmentation.
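One possible form of such an online Gaussian model is sketched below. It assumes an independent Gaussian per colour channel whose mean and variance are updated incrementally with Welford's method; the class name, the 2.5-sigma acceptance band and the minimum standard deviation are assumptions made for illustration and are not taken from this thesis.

```python
import math


class OnlineGaussianSkinModel:
    """Per-channel Gaussian updated one pixel at a time (Welford's method)."""

    def __init__(self, channels=3):
        self.n = 0
        self.mean = [0.0] * channels
        self.m2 = [0.0] * channels  # running sum of squared deviations

    def update(self, pixel):
        """Add one skin-colour sample, e.g. taken from the face region."""
        self.n += 1
        for i, x in enumerate(pixel):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def std(self):
        """Sample standard deviation of each channel."""
        if self.n < 2:
            raise ValueError("at least two samples are needed")
        return [math.sqrt(m / (self.n - 1)) for m in self.m2]

    def is_skin(self, pixel, max_sigma=2.5, min_std=1.0):
        """Accept a pixel when every channel lies within max_sigma standard
        deviations of the mean (both thresholds are assumed values)."""
        return all(
            abs(x - mu) <= max_sigma * max(s, min_std)
            for x, mu, s in zip(pixel, self.mean, self.std())
        )


# Train while the face is visible, then classify when it is not.
model = OnlineGaussianSkinModel()
for sample in [(200, 150, 130), (210, 160, 140), (190, 140, 120)]:
    model.update(sample)

likely_skin = model.is_skin((205, 155, 135))
likely_background = model.is_skin((50, 60, 200))
```

For the alternative approach, the per-pixel output of `is_skin` would be combined with the existing segmentation result using a logical ‘AND’.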

Another possible approach would be to employ a semantic-segmentation algorithm similar to the work of Liu et al. [95] and use the semantic labels to isolate the individual in the scene. The individual’s skin colour could then be determined, or pixels could be grouped together based on their similarity, so that more non-hand objects or areas can be excluded from the scene, which would lead to a more accurate and efficient tracking process.

7.3.3 Improved hand detection

The ELBP-RF hand-detection algorithm proposed in this study takes an image region as input and determines whether that region contains a hand. Given an image region, the extended LBPs are computed and used to populate a histogram using a global feature approach. The histogram features are then used to train a random forest model, which is used to classify the image region. As an extension to this algorithm and an improvement to the hand-tracking framework, the algorithm should include a ‘sliding window’ mask that scans the entire image from the top-left to the bottom-right corner and assigns a probability value to each region in the image using the trained random forest model. These probability values define each image region’s likelihood of containing a hand. Based on these likelihoods, regions with low probability values can be excluded, which would decrease the probability of hand-tracking failure. In SASL, hand shape is an additional feature that, together with hand-tracking and facial expressions, is used to describe sign language gestures. Therefore, to further improve hand detection, the hand shape can be used. The hand shape can be determined using 3D model-based approaches, in which templates are extracted and matched, or appearance-based methods that, for example, predict the hand shape from extracted low-level features and a trained model. Knowing the hand shape helps to further eliminate areas in an image that do not contain a hand.
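The sliding-window scan described in this section can be sketched as follows. The scoring function is passed in as a parameter; in the framework it would be the trained random forest applied to the ELBP histogram of the region, whereas the `skin_fraction` function used here is a hypothetical stand-in chosen only to keep the example self-contained. The window size, stride and probability threshold are likewise assumed values.

```python
def sliding_windows(width, height, win, stride):
    """Yield the top-left corner of each window, scanning from the
    top-left to the bottom-right corner of the image."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield x, y


def score_regions(image, win, stride, hand_probability, min_prob=0.5):
    """Score every window and keep those whose probability of containing
    a hand is at least min_prob."""
    height, width = len(image), len(image[0])
    kept = []
    for x, y in sliding_windows(width, height, win, stride):
        region = [row[x:x + win] for row in image[y:y + win]]
        p = hand_probability(region)
        if p >= min_prob:
            kept.append((x, y, p))
    return kept


def skin_fraction(region):
    """Stand-in scorer (assumption): fraction of non-zero pixels.
    A trained classifier's probability output would be used instead."""
    pixels = [p for row in region for p in row]
    return sum(1 for p in pixels if p) / len(pixels)


# Hypothetical 4x4 binary image with a blob in the bottom-right corner.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
candidates = score_regions(image, win=2, stride=2, hand_probability=skin_fraction)
```

Only the windows that survive the threshold would need to be passed on to the tracker, which is how low-probability regions are excluded.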


7.3.4 Optimised hand-tracking framework

The current hand-tracking framework was aimed at improving hand-tracking accuracy using several algorithms; however, these algorithms were not optimised to achieve real-time performance, and a performance evaluation was therefore not carried out. Nevertheless, several areas can be optimised to increase hand-tracking performance. To speed up the face-detection algorithm, only the top half of the image should be scanned for a face rather than the entire image. Another improvement would be to determine the foreground objects and the skin-coloured objects in parallel and then to combine these areas using a logical ‘AND’ function. To identify the skin-coloured clusters in an image, the connected-components labelling algorithm was used. By applying the optimised connected-components labelling algorithm of Wu et al. [165], the time required to scan an image can be reduced by half and the total execution time can be reduced by a factor of five or more.
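For reference, a basic two-pass connected-components labelling with a union-find equivalence table is sketched below. This is the classical textbook scheme with 4-connectivity, not the optimised variant of Wu et al. [165], and it is not claimed to match the implementation used in this thesis; it only shows the kind of image scan that the optimisation targets.

```python
def label_components(mask):
    """Two-pass, 4-connected labelling of a binary mask.
    Returns the label image and the number of components."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # parent[i] is the union-find parent of label i; 0 is background

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # First pass: assign provisional labels and record equivalences.
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new label is its own root
                labels[y][x] = len(parent) - 1
            elif up and left:
                root_up, root_left = find(up), find(left)
                root = min(root_up, root_left)
                parent[max(root_up, root_left)] = root
                labels[y][x] = root
            else:
                labels[y][x] = up or left

    # Second pass: replace provisional labels by consecutive final labels.
    final = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)


# Hypothetical skin mask: a U-shaped cluster and a separate vertical bar.
skin_mask = [
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1],
]
labelled, count = label_components(skin_mask)
```

The two arms of the U receive different provisional labels in the first pass and are merged only when the scan reaches the bottom row, which is why the second pass is needed.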

Furthermore, the time required to compare features in and around the hand with features from the respective databases can be reduced by excluding features that are identical in both the right- and left-hand databases. These features are often extracted from the region where the hands overlap and are then stored in both databases. Only storing and comparing the features that are unique to each hand will not only reduce the computational time, but might also improve the accuracy of the hand-tracking framework. In order to further improve the performance of the hand-tracking framework, the proposed algorithms can be ported to run on a GPU, thereby exploiting the GPU’s parallel processing capabilities. These capabilities are attributed to the GPU’s architecture, which contains hundreds of cores that allow it to run thousands of threads concurrently.
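The exclusion of shared features can be sketched as a set difference, assuming that each feature is stored as a hashable descriptor and that "identical" means an exact match. Both assumptions are made for illustration only; real descriptors would more likely be compared with a distance threshold, and the function name and example values are hypothetical.

```python
def unique_features(right_db, left_db):
    """Remove features that occur in both databases, keeping the
    original order of the remaining features."""
    shared = set(right_db) & set(left_db)
    right_only = [f for f in right_db if f not in shared]
    left_only = [f for f in left_db if f not in shared]
    return right_only, left_only


# Hypothetical descriptors; (3, 4) was stored for both hands during an overlap.
right_db = [(1, 2), (3, 4), (5, 6)]
left_db = [(3, 4), (7, 8)]
right_only, left_only = unique_features(right_db, left_db)
```

After this step each database holds only descriptors that discriminate between the two hands, so fewer comparisons are needed per frame.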

In document Independent hand-tracking from a single two-dimensional view and its application to South African sign language recognition (Page 164-167)