Human Identification Based on Gait Paths
Adam Świtoński1,2, Andrzej Polański1,2, Konrad Wojciechowski1,2
1 Polish-Japanese Institute of Information Technology, Aleja Legionów 2, 41-902 Bytom, Poland
{aswitonski, apolanski, kwojciechowski}@pjwstk.edu.pl
2 Silesian University of Technology, ul. Akademicka 16, 41-100 Gliwice, Poland
{adam.switonski, andrzej.polanski, konrad.wojciechowski}@polsl.pl
Abstract. We have proposed and evaluated features extracted from gait paths, based on motion capture data, for the human identification task. We have used the paths of the following body parts: the skeleton root element, feet, hands and head. We have collected a gait database containing 353 different motions of 25 actors. We have proposed four approaches to extracting features from motion clips: statistical, histogram, Fourier transform and timeline. We have prepared motion filters that reduce the impact of the capture location and the actor's height on the gait path, as well as a method of step detection. We have applied supervised machine learning techniques to classify gaits described by the proposed feature sets. For every approach we have prepared feature selection scenarios and repeated the classification experiments. On the basis of the obtained classification results we have discovered the most remarkable features for the identification task. We have achieved almost 97% identification accuracy for reliable normalized paths.
Keywords: motion capture, human identification, gait recognition, supervised learning, feature extraction, feature selection, biometrics
1 Introduction
Biometrics is the discipline of recognizing humans based on their individual traits. It is used in numerous areas, among them criminal, civil and consumer identification, authorization and access control, work time registration, monitoring and supervision of public places, and border control. Biometric methods are most often based on fingerprints, palm prints and footprints, face, ear, retina and iris recognition, the way of typing, speech, DNA profile matching, spectral analysis, hand geometry, and gait.
The great advantage of gait identification is that it does not require the awareness of the identified person. Unfortunately, it is not as accurate as, for instance, fingerprint or especially DNA methods. Gait identification is therefore useful when very high accuracy is not required. It can be used for the preliminary detection or selection of suspicious or wanted people. It could also be used for customer identification: if a customer is identified and the profile of his interests is evaluated on the basis of earlier visits, a special offer can be addressed to him by a salesperson, by displaying a banner or by playing recordings of his favorite type of music in a music shop. The applications are numerous.
Gait can be defined as a coordinated, cyclic combination of movements which results in human locomotion [5]. This means that even a short fragment of a gait is representative and shares common features with the rest of it.
2 Motion capture
The gait can be captured by traditional two-dimensional video cameras of monitoring systems or by much more accurate motion capture systems.
A motion capture system acquires motion as a time sequence of poses. There are numerous formats for representing a single pose. In the basic C3D format, without an applied skeleton model, we obtain only the direct coordinates of the markers located on the human body and tracked by specialized cameras. Comparing and processing such data is difficult, because individual markers have no predefined meaning and, what is more, the same markers can carry different labels in different motions or even in different frames of a single motion. The raw data has to be processed to estimate the pose, which gives direct information about the location and state of the body parts. A skeleton model has to be applied to label the markers properly; then, on the basis of the labeled markers, the pose can be calculated.
A well-known format for pose description is ASF/AMC. It describes a pose by a tree-like skeleton structure with measured bone lengths. The root object is placed at the top of the tree and is described by its position in the global coordinate system. Child objects are connected to their parents and store their rotation relative to the parent, represented by Euler angles.
Direct application of motion capture systems to human identification is limited by the inconvenience of the capture process. The identified person has to put on a special suit with attached markers and can move only within a narrow region monitored by the mocap cameras. However, motion capture has one great advantage: the precision of its measurements. It minimizes the influence of capture errors and allows the most remarkable features of human gait to be discovered. Thus, using motion capture in the development phase of a human identification system is reasonable. It makes it possible to focus first on evaluating how individual the motion features are, and only then to work on detecting those features from 2D images.
3 Related work
It is believed that a human is able to recognize people by their gait. The experiment presented in [6] only partially confirms this thesis. Gaits were captured for a group of six students who knew each other, in the form of moving light displays of human silhouettes. Afterwards, the students tried to identify randomly presented gaits. Their accuracy was only 38%, which is about twice as good as guessing, but still very poor. A human is probably able to recognize only some characteristic gaits which differ strongly. In general, the number of gait features is too large for a human to notice their small variations effectively.
Gait identification methods can be divided into two categories: model-based and motion (or appearance) based ones. In motion-based approaches we have only the outline of a human extracted from a 2D image, called a silhouette. In [9] a modified ICA is applied to the skeletons of silhouettes extracted by background subtraction, in order to project the original gait features from a high-dimensional measurement space to a low-dimensional eigenspace, and the L2 norm is used to compare the transformed gaits. A similar approach is proposed in [10], but it is based on the PCA reduction technique instead of ICA. In [11] recognition is performed by temporal correlation of silhouettes. To track silhouettes we can use optical flow methods or calculate special images: motion energy and motion history images [5].
In model-based approaches a model of the observed human is defined and its configurations at successive moments are captured. The above-mentioned ASF/AMC format assumes a given skeleton model. Many methods have been proposed to estimate the model directly from 2D images. In [7] the authors use the particle swarm optimization algorithm to find the configuration of particles, corresponding to the model parts, which best matches the image.
In [12] the time sequences of all model configuration parameters are transformed into the frequency domain and the first two Fourier components are chosen. Finally, this description is reduced by the PCA method.
Time sequences, including sequences of motion frames, can be compared directly by dynamic time warping [13]. This requires a robust method of calculating the similarity between motion frames. The authors of [14] propose a 3D point cloud distance measure. First they build point clouds for the compared frames and their temporal context. Then they find the global transformation that aligns both clouds, and finally they calculate the sum of distances between corresponding points of the aligned clouds. For configurations coded by unit quaternions, the distance can be evaluated as a sum of quaternion distances. In [15] the frame distance is a weighted sum of quaternion distances, because the influence of the transformations differs from joint to joint. Binary relational motion features proposed in [8] and [13] offer a new way of describing motion. A binary relational feature is enabled if the given joints and bones are in a defined relation, for example the left knee is behind the right knee or the right ankle is higher than the left knee. However, it is very difficult to prepare a single set of such features which is applicable to the recognition of every gait. Such features are usually dedicated to specialized detection tasks, owing to their relatively easy interpretation. Large feature vectors can be generated from the generic feature set proposed in [8], but because significant features are difficult to point out, this leads to long pose descriptions and redundant data.
We have not found a comprehensive study which evaluates the most remarkable features among those that can be calculated from precise motion capture data and are also relatively easy to extract from 2D video recordings. This is what we have set out to do.
4 Collected database of human gaits
We have used the PJWSTK laboratory with a Vicon motion capture system [1] to acquire human gaits. We have collected a database of 353 gaits of 25 different males aged 20 to 35. The gait route was specified as a straight line 5 meters long. Each acquisition started and ended with a T-pose, as required by the Vicon calibration process. An example collected gait is presented in Fig. 1.
Fig. 1. Example gait
The actor walks along the Z axis; the Y axis has its default up-down orientation, and the perpendicular X axis registers slight deviations from the specified route. We have defined two motion types, slow gait and fast gait, without strict rules for the actors, so slow and fast gaits were interpreted individually. A typical slow gait lasts up to 5 seconds and contains several steps; a fast gait usually lasts up to 4 seconds. The motions are stored in the ASF/AMC format.
The gait path can be defined as the time sequence of three-dimensional coordinates of the path:

$$P:[1:T] \rightarrow (X, Y, Z) \subset \mathbb{R}^3 \qquad (1)$$

It can be estimated by the location of the root element of the ASF/AMC frames, which points to the lower end of the spine. Two example motions of different actors with the plotted line of root positions are presented in Fig. 2. As we can notice, the root position strongly depends on the height of the actor, or more exactly on the length of his legs. In such a case, identification based on this path would depend strongly on the actors' heights rather than only on the gait path itself. To minimize the influence of the actor's height, we can apply a simple transformation that translates the path relative to the first motion frame:
$$P_{Translated} = P - P(1) \qquad (2)$$

In fact, it would suffice to do this only for the Y attribute, but translating the X and Z attributes in the same way makes the gait path independent of the position of the captured gait in the global coordinate system.
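Equation (2) amounts to subtracting the first frame from every frame. The sketch below is a minimal Python illustration, assuming a path is stored as a list of (X, Y, Z) tuples, one per frame; the function names are hypothetical. It also shows the Y-only variant mentioned above.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (X, Y, Z) coordinates of one frame


def translate_to_first_frame(path: List[Point]) -> List[Point]:
    """Eq. (2): subtract the coordinates of the first frame from every frame."""
    x0, y0, z0 = path[0]
    return [(x - x0, y - y0, z - z0) for x, y, z in path]


def translate_y_only(path: List[Point]) -> List[Point]:
    """Variant removing only the vertical (Y) offset, i.e. the height component."""
    y0 = path[0][1]
    return [(x, y - y0, z) for x, y, z in path]
```

For example, `translate_to_first_frame([(1.0, 2.0, 3.0), (1.5, 2.5, 4.0)])` returns `[(0.0, 0.0, 0.0), (0.5, 0.5, 1.0)]`.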
Another way to reduce the dependency of the gait path on the actor's height and on the location of the gait is to normalize the attributes to a specified range. This can be done linearly; the transformation for the default range (0,1) is presented below:
$$P_{scaled} = \left( \frac{X - X_{min}}{X_{max} - X_{min}}, \frac{Y - Y_{min}}{Y_{max} - Y_{min}}, \frac{Z - Z_{min}}{Z_{max} - Z_{min}} \right) \qquad (3)$$

where X_min, X_max, Y_min, Y_max, Z_min, Z_max are respectively the minimum and maximum values of the X, Y and Z attributes in the given motion path.
Scaling seems to work better than translation. Besides the global location of the path, the actor's height also affects the path variations; in contrast to normalization, translation relative to a specified frame does not reduce this dependency. What is more, a common range makes paths indistinguishable with respect to path length, which results from the duration of the capture process.
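A minimal sketch of the linear scaling in Eq. (3), under the same hypothetical list-of-tuples representation. Mapping a constant attribute (maximum equal to minimum) to 0 is our own assumption, since Eq. (3) is undefined in that case.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (X, Y, Z) coordinates of one frame


def scale_to_unit_range(path: List[Point]) -> List[Point]:
    """Eq. (3): map each attribute of one motion path linearly onto (0, 1)."""
    columns = list(zip(*path))          # [(X...), (Y...), (Z...)]
    lows = [min(c) for c in columns]    # X_min, Y_min, Z_min
    highs = [max(c) for c in columns]   # X_max, Y_max, Z_max
    scaled = []
    for frame in path:
        scaled.append(tuple(
            # assumption: a constant attribute (max == min) is mapped to 0
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(frame, lows, highs)
        ))
    return scaled
```

Because the minimum and maximum are taken per motion, the scaled path no longer carries the absolute height or location of the gait, which is exactly the information the normalization is meant to remove.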
A probably more common way to obtain the gait path is to track the movements of the feet. In such a case we have to transform the pose representation from the kinematic chain of the ASF/AMC format to a point cloud and take the proper point of each frame. It is disputable whether to choose the left or the right foot. To take both of them into consideration, we can calculate the midpoint between them; we will call such a path the center foot path.
Fig. 2. Example collected gaits with plotted gait paths
In Fig. 2 we have presented two example gaits of different actors with their gait paths plotted. The first panel contains raw root paths, the second root paths with the Y attribute translated relative to the first frame, the third left and right foot paths and the fourth the center foot path.
Fig. 3. Main cycle detection
As described above, each motion starts with a T-pose, which contains some individual features of the actors. They reflect the individual ability to stand in a static pose: keeping a right angle between the spine and the arms and differences between the arms, slight movements of the hands, straightness of the arms and legs, distance between the feet and many others. However, the T-pose is not a natural pose during a typical gait. Thus, gait identification using features of a pose that is normally absent from gait would artificially improve the results.
Fig. 4. Example gait paths for five randomly selected actors. a) raw root paths, b) root paths translated relative to the first frame, c) raw left foot paths, d) left foot paths scaled to the default range, e) left foot trajectories of the Y attribute scaled to the default range, f) left foot trajectories of the Z attribute scaled to the default range
That is why we have prepared a special filter for detecting the main cycle of the gait. The gait can be represented as a repeated sequence of steps with the left and right legs. Steps of a given leg are almost identical, hence global gait features can be calculated from only two adjacent steps. To detect successive steps it is sufficient to track the distance between the two feet and analyze its extremes. The longest distance occurs when the current step is finishing and the next one is starting; the shortest distance marks the middle phase of a step.
In Fig. 3 we have visualized the process of detecting the main cycle, which contains two adjacent steps, for a randomly chosen motion. The left chart presents the distances between the right and the left legs in successive motion frames. The right figure shows the analyzed motion with the main cycle labeled by the green line.
There is one more issue to consider in main cycle detection. To compare the main cycles of different motions directly, they should start with a step of the same leg. This means we should choose the proper minimum of the leg distances. If we assume that the first step should start with the left leg in front and the right leg behind, we have to remove those minima for which the left leg is closer to the starting point than the right leg.
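The main cycle detection can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes foot positions are given as (X, Y, Z) tuples per frame and that the actor walks towards increasing Z, and it treats the local maxima of the feet distance as step boundaries, following the description of the extremes above; which extremes serve as boundaries is therefore an assumption. Real mocap data would likely need smoothing before peak picking. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (X, Y, Z) position of a foot in one frame


def local_maxima(values: List[float]) -> List[int]:
    """Indices of strict interior local maxima (assumes noise-free input)."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


def main_cycle(left: List[Point], right: List[Point]) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices spanning two adjacent steps.

    Hypothetical rules: step boundaries are maxima of the feet distance, the
    actor walks towards +Z, and the cycle starts with the left foot in front.
    """
    dist = [math.dist(l, r) for l, r in zip(left, right)]
    bounds = local_maxima(dist)
    for k in range(len(bounds) - 2):
        start = bounds[k]
        if left[start][2] > right[start][2]:  # left foot ahead along Z
            return start, bounds[k + 2]       # boundary two steps later
    return None  # fewer than two complete steps detected
```

Cutting every motion to the returned window gives cycles that start with the same leg, so the same features of different motions refer to comparable gait phases.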
In Fig. 4 we have presented fifteen randomly chosen gait paths of five actors. Paths of a single actor are labeled with the same color. The first chart presents raw ASF/AMC root paths. We can notice clear boundaries between actors, especially for the Y coordinates and slightly less for the X coordinates. This means that the actors have different heights and walked in slightly different places. The second chart presents root paths after translating all attributes relative to the first frame; the differences are much less clear than without translation. For the raw foot paths, the height of the actors does not have such an impact on the position of the feet, hence we can easily notice differences only for the X coordinate, similarly to the root paths. For the paths with the proposed filtering applied, it is difficult to state simple, general rules for recognizing the actors. In the trajectories of the Y attribute we can notice loops which reflect successive steps. For the Z attribute there are no loops, because the actors move along the Z axis, but the T-pose can be easily detected.
5 Experiments, results and conclusions
On the basis of the gait paths we have tried to identify actors. In the experiment we have chosen paths for the following body parts:
•root
•left, right and center foot
•head
•left and right hand
The root and feet paths seem to reflect the way of human gait most directly, and this should carry some individual features. The reason for testing head paths is the relative simplicity of detecting them in 2D video images. Extracting hands from video images also does not seem to be very complicated and, what is more, we expected that their movements could provide some information useful in the identification task.
Head and hand paths are obtained in the same way as the feet paths, by choosing the proper points from the point cloud representation.
We have generated new paths by cutting the motion to the main cycle window and then by applying the previously described filters: translation relative to the first frame and linear scaling of each attribute to the default range (0,1). We have taken all combinations into consideration.
The complexity of the problem and the difficulty of proposing general rules to identify the gait paths presented above led us to choose supervised machine learning techniques. The crucial problem was to prepare a proper set of features describing each motion, able to separate different actors. We have proposed four different approaches:
•Statistical
•Histogram
•Fourier transform
•Timeline
In the statistical approach we calculate the mean value and variance of each pose attribute. In the histogram-based approach we build a separate histogram for each attribute with different numbers of bins: five, ten, twenty, fifty and one hundred. This means that there are five different histogram representations of every gait.
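For a single attribute time series, the statistical and histogram features could be computed as in the sketch below. The use of the population variance, equal-width bins spanning the attribute's own range, and normalization of bin counts by the number of frames are our assumptions; the text does not specify these details. Applying the functions to every attribute of every path and concatenating the results gives the feature vector of a gait.

```python
import statistics
from typing import List, Sequence


def statistical_features(series: Sequence[float]) -> List[float]:
    """Mean and variance of one attribute (population variance assumed)."""
    return [statistics.fmean(series), statistics.pvariance(series)]


def histogram_features(series: Sequence[float], bins: int) -> List[float]:
    """Relative frequencies in `bins` equal-width bins over the series' own range.

    Equal-width bins and normalization by the number of frames are assumptions.
    """
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for a constant series
    counts = [0] * bins
    for v in series:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes to the last bin
        counts[idx] += 1
    return [c / len(series) for c in counts]
```

Calling `histogram_features` with bins set to 5, 10, 20, 50 and 100 yields the five histogram representations mentioned above.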
In the Fourier approach we transform the motion into the frequency domain and take the first twenty components with the lowest frequencies. The number of components has been chosen based on motion reconstruction with the inverse Fourier transform: twenty components are sufficient to restore the motion in the time domain without visible damage. The feature set includes the modulus of each complex coefficient, which gives the total intensity of a given frequency, and its phase, which indicates the time shift. We expected the Fourier transform to be useful only for the gait representation with main cycle detection, since only in that case do the same Fourier components store similar information and are directly comparable. What is more, because of different gait speeds, we have decided to build an additional representation by linearly scaling the time domain to an equal number of frames. This further improves the direct comparability of the same Fourier components.
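The Fourier features and the optional time rescaling can be sketched as follows. This is a minimal standard-library illustration using an explicit DFT sum; counting the zero-frequency (mean) term among the twenty components and using linear interpolation for the time scaling are assumptions. In practice an FFT routine would replace the explicit sum.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def resample(series: Sequence[float], n_frames: int) -> List[float]:
    """Linearly rescale the time axis of one attribute to n_frames samples."""
    if n_frames < 2 or len(series) < 2:
        raise ValueError("need at least two input and two output frames")
    last = len(series) - 1
    out = []
    for j in range(n_frames):
        t = j * last / (n_frames - 1)   # position on the original time axis
        i = min(int(t), last - 1)       # index of the left neighbour
        frac = t - i
        out.append(series[i] * (1 - frac) + series[i + 1] * frac)
    return out


def fourier_features(series: Sequence[float],
                     n_components: int = 20) -> List[Tuple[float, float]]:
    """(modulus, phase) of the DFT coefficients k = 0 .. n_components - 1.

    Assumption: the zero-frequency term (k = 0) counts as one of the components.
    """
    n = len(series)
    feats = []
    for k in range(n_components):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        feats.append(cmath.polar(coeff))  # (|c_k|, arg c_k)
    return feats
```

For the time-normalized representation, each attribute would first be passed through `resample` with a common frame count and only then through `fourier_features`.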
We have called the last approach timeline. Its feature set stores the values of every attribute as a time sequence. The moments at which attribute values are taken into the set are determined by dividing the motion into a given number of intervals. For the same reason as in the previous approach, timeline feature sets are expected to be most informative for motions with main cycle detection. We have prepared timeline motion representations with sequences of five, ten, twenty, fifty and one hundred time moments.
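A possible sketch of the timeline sampling. We assume, hypothetically, that the moments are evenly spaced and include the first and last frames, and that the nearest recorded frame is taken rather than an interpolated value.

```python
from typing import List, Sequence


def timeline_features(series: Sequence[float], n_moments: int) -> List[float]:
    """Attribute values at n_moments evenly spaced time moments.

    Hypothetical choice: the moments include the first and last frame and the
    nearest recorded frame is used (no interpolation).
    """
    if n_moments < 2:
        raise ValueError("need at least two time moments")
    last = len(series) - 1
    return [series[round(j * last / (n_moments - 1))] for j in range(n_moments)]
```

Because the moments are defined relative to the motion length, the j-th feature of a main cycle refers to roughly the same gait phase regardless of walking speed.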
In the statistical approach we have also calculated velocities and accelerations along the paths and included them in the feature set in the same way as coordinate values. Thus, the statistical feature set contains mean values and variances calculated for the path coordinates, velocities and accelerations. As described below, the results with velocities and accelerations included were promising, much better than without them. That is why we have repeated the tests with velocities and accelerations added to the Fourier and timeline approaches. Once again they have been treated like coordinates: we have calculated their Fourier components and taken their values at the selected time moments.
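Velocities and accelerations can be obtained with finite differences, as in the sketch below. The forward-difference scheme and the frame-rate parameter `fps` are assumptions for illustration; each derived sequence is one sample shorter than its input. The resulting sequences are then passed to the same feature functions as the coordinates.

```python
from typing import Dict, List, Sequence


def derivative(series: Sequence[float], fps: float) -> List[float]:
    """Forward-difference derivative per second; the result is one sample shorter."""
    return [(b - a) * fps for a, b in zip(series, series[1:])]


def with_derivatives(series: Sequence[float], fps: float) -> Dict[str, List[float]]:
    """Coordinate, velocity and acceleration sequences of one attribute."""
    velocity = derivative(series, fps)
    return {
        "position": list(series),
        "velocity": velocity,
        "acceleration": derivative(velocity, fps),
    }
```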
The number of features depends strongly on the proposed approach. The entire motion is described by seven separate three-dimensional gait paths, and each path can be divided into three time sequences: coordinates, velocities and accelerations. In the statistical approach each dimension of each sequence is described by its mean and variance, which gives 126 features. The histogram-based approach, which has no velocities and accelerations, gives 105 features for five-bin histograms and 2100 for one-hundred-bin histograms. The Fourier sets contain 2520 features, and the number of timeline features ranges from 315 to 6300, depending on the number of time moments.
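The feature counts above follow from simple products, which the short check below reproduces. The constants restate the text: 7 paths, 3 axes, 3 sequence types, mean and variance in the statistical approach, and modulus and phase for each of the 20 Fourier components. The function name is hypothetical.

```python
PATHS = 7      # root, left/right/center foot, left/right hand, head
AXES = 3       # X, Y, Z
SEQUENCES = 3  # coordinates, velocities, accelerations


def n_features(approach: str, param: int = 0) -> int:
    """Feature-vector length; `param` is the number of bins, components or moments."""
    series = PATHS * AXES  # 21 coordinate time series per gait
    if approach == "statistical":
        return series * SEQUENCES * 2            # mean + variance
    if approach == "histogram":
        return series * param                    # coordinates only
    if approach == "fourier":
        return series * SEQUENCES * param * 2    # modulus + phase
    if approach == "timeline":
        return series * SEQUENCES * param
    raise ValueError(f"unknown approach: {approach}")
```

For example, `n_features("fourier", 20)` gives 2520 and `n_features("timeline", 5)` gives 315.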
It seems that we do not need such a large number of features to identify the actors, especially for the Fourier and timeline datasets. Some of the features are probably useless and introduce noise, which usually worsens classification results. More importantly, such a huge feature set does not allow the individual features to be evaluated. To verify the hypothesis of useless features and to discover the most remarkable ones, we have prepared feature selection scenarios separately for every dataset type. After applying the selection we have repeated the classification and analyzed the results. At the current stage we have not used automatic selection techniques [2] based on attribute subset evaluation, because of the complexity of the problem. Attribute ranking methods [2] appear too naive for the task. What is more, manual selection allows us to obtain clearer results.
The selection scenarios we have prepared are as follows. In all cases we have selected every combination of attributes associated with:
•axes of the global coordinate system: X, Y and Z,
•gait paths: root, left foot, right foot, center foot, left hand, right hand, head,
•position, velocity and acceleration.
For the statistical datasets we have made additional combinations by selecting means and variances, and for the Fourier datasets we have limited the number of Fourier components and selected the moduli and phases of the complex coefficients.
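Assuming that a scenario is any non-empty subset of axes, crossed with any non-empty subset of paths and any non-empty subset of quantities, the basic scenarios can be enumerated as below. Under this assumption there are 7 × 127 × 7 = 6223 basic scenarios; the additional statistical and Fourier variants described above are not included, and the full crossing itself is our assumption rather than a figure from the text.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

AXES = ("X", "Y", "Z")
PATHS = ("root", "left foot", "right foot", "center foot",
         "left hand", "right hand", "head")
QUANTITIES = ("position", "velocity", "acceleration")

Scenario = Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]


def non_empty_subsets(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty subsets of `items`, smallest first."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]


def selection_scenarios() -> Iterator[Scenario]:
    """Yield (axes, paths, quantities); hypothetical full crossing of the groups."""
    for axes in non_empty_subsets(AXES):
        for paths in non_empty_subsets(PATHS):
            for quantities in non_empty_subsets(QUANTITIES):
                yield axes, paths, quantities
```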
The number of experiments to execute was very large, so we could not apply classifiers that are slow to train and test. In the introductory step we have used two statistical classifiers:
•k Nearest Neighbours [3]
•Naive Bayes [4]
For the nearest neighbours classifier we have varied the number of analyzed nearest neighbours from 1 to 10. For Naive Bayes we have modeled the attributes both with a normal distribution and with a distribution estimated by a kernel-based method.
We have tested every combination of applied preprocessing filters, feature set approaches with all their feature selection scenarios, classifiers and their parameters. This gives almost three million different experiments and, because the leave-one-out method [2] was applied to split the dataset into training and test parts, over one billion training cycles and tests.
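The evaluation protocol for the k nearest neighbours classifier with leave-one-out splitting can be sketched as follows. This is an illustrative plain-Python sketch, not the implementation used in the experiments; Euclidean distance and majority voting with ties broken in favour of the label whose member is nearest are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[Hashable],
                query: Sequence[float], k: int) -> Hashable:
    """Majority vote among the k training gaits nearest to `query` (Euclidean)."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    # most_common keeps first-encountered order on ties, so the nearer label wins
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(x: List[Sequence[float]], y: List[Hashable], k: int) -> float:
    """Share of gaits identified correctly when each gait in turn is the test
    sample and all remaining gaits form the training set."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_x, train_y, x[i], k) == y[i]:
            correct += 1
    return correct / len(x)
```

Repeating the call for k = 1, ..., 10 corresponds to the range of neighbour counts tested above; with 353 gaits, every such call trains and tests 353 times.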
In Figs. 5, 6, 7, 8 and 9 we have visualized aggregated classification results. We define classifier efficiency as the percentage of correctly identified gaits. In the aggregation we have chosen the highest efficiency among the experiments performed for the specified approaches and attributes.
Fig. 5. Total classification results
Fig. 6. Classification results for normalized datasets with translation relative to the first frame and linear scaling of all attributes to the default range (0,1)
Fig. 7. Evaluation of coordinate values, velocity and acceleration attributes
Fig. 9. Evaluation of Fourier components
For the raw paths the most informative are the hand paths, with the feet paths slightly worse. Surprisingly, the root path, which stores information about the actor's height, is less informative than the feet paths. The best total efficiency, 96.6%, was achieved by the timeline approach with 50 time points and main cycle detection. Main cycle detection not only makes the results more reliable, but also improves them noticeably. As expected, this is observed for the timeline and Fourier approaches, which have obtained the highest efficiencies.
Normalization of the paths, which removes data about the actor's height and the gait location, worsens the results. For the stronger normalization with attribute scaling the best efficiency is 93.5%, and for the weaker one with translation it is 94.3%. In contrast to the previous case, the Fourier approach is slightly better than timeline, and again the statistical and histogram approaches are the worst. Normalization has caused a significant loss of information in the root, hand and head paths.
In the evaluation of attributes and Fourier components presented in Figs. 7, 8 and 9 we have taken into consideration only the reliable paths normalized by attribute scaling, with main cycle detection.
Another surprising observation is that velocities and accelerations contain more individual data than coordinate values. This is particularly noticeable for the root and feet paths. We can conclude that how energetic the movements are matters more than their shape. We did not repeat the histogram tests with velocities and accelerations because of preliminary tests with root paths only, in which the histogram approach obtained much lower efficiencies than the Fourier and timeline ones; we therefore regarded it as less promising. On the basis of Fourier components calculated for coordinate values we can reconstruct the entire original paths, which means that these components contain indirect information about velocities and accelerations. However, this knowledge is hidden and the simple classifiers applied were not able to exploit it. It was necessary to add features directly representing velocities and accelerations to improve the results.
As we expected, the most informative are the Z and Y directions, that is, the main direction of the gait and the up-down direction. This means that the actor should be observed from the side. Although the X attributes are of fairly good quality, sufficient for 80% efficiency, they contain some noise: adding X attributes to the Y and Z ones worsens the results in most cases.
The most informative are the feet paths, except for non-normalized paths, for which including height data and hand paths is preferable. Unfortunately, head paths, which are relatively easy to extract from 2D video recordings, have obtained the worst results, and root paths are the second worst. Root and head paths are static and reflect the general gait path, in contrast to feet and hand paths, which have greater variations that make individual features easier to extract. What is more, feet and hand paths contain data on step length, the height of foot lifting and hand waving, which are surely individual.
For normalized paths with linearly scaled attribute values, the best result was obtained by the Fourier approach with 5 components of the Y and Z directions and the feet, root and hand paths with a complete description: coordinate values, velocities and accelerations. It reaches 93.5% classifier efficiency, which means that 23 of the 353 motions were misclassified. Substituting the head path for the root path causes only one more mistake, and removing the root path causes five more mistakes.
Individual features are not concentrated in a single path but are dispersed across the movements of different body parts, so more details have to be tracked to achieve high accuracy. This probably explains the difficulty, mentioned above, that humans have in recognizing gait.
For the best discovered feature set of normalized paths we have tested a more sophisticated functional classifier with greater computational cost, a multilayer perceptron [2]. We have repeated the tests for different network structure complexities, learning rates and numbers of learning cycles. The multilayer perceptron has roughly halved the number of misclassifications: it reaches 96.9% classifier efficiency, which means only 11 mistakes in 353 tests.
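The reported efficiencies and mistake counts can be cross-checked against the size of the database with a few lines of arithmetic. The helper names are hypothetical.

```python
N_GAITS = 353  # size of the collected database


def efficiency(n_mistakes: int, n: int = N_GAITS) -> float:
    """Percentage of correctly identified gaits."""
    return 100.0 * (n - n_mistakes) / n


def misclassified(efficiency_percent: float, n: int = N_GAITS) -> int:
    """Number of misclassified gaits implied by a reported efficiency."""
    return round(n * (1 - efficiency_percent / 100))
```

For example, 93.5% and 96.9% correspond to 23 and 11 misclassified gaits respectively, matching the counts given above.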
References
1. PJWSTK Human Motion Group webpage: http://hm.pjwstk.edu.pl
2. Witten I., Frank E.: Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2005
3. Aha D., Kibler D.: Instance-based learning algorithms. Machine Learning, 1991
4. John G.H., Langley P.: Estimating Continuous Distributions in Bayesian Classifiers. In: Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, 338-345, 1995
5. Boyd J.E., Little J.J.: Biometric Gait Identification. Lecture Notes in Computer Science 3161, Springer, 2005
6. Cutting J.E., Kozlowski L.T.: Recognizing friends by their walk: gait perception without familiarity cues. Bulletin of the Psychonomic Society, 1977
7. Krzeszowski T., Kwolek B., Wojciechowski K.: Articulated Body Motion Tracking by Combined Particle Swarm Optimization and Particle Filtering. ICCVG 2010, Lecture Notes in Computer Science, Springer Verlag, 2010
8. Muller M., Roder T.: A Relational Approach to Content-based Analysis of Motion Capture Data. Vol. 36 of Computational Imaging and Vision, ch. 20, 477-506, 2007
9. Pushpa Rani M., Arumugam G.: An Efficient Gait Recognition System for Human Identification Using Modified ICA. International Journal of Computer Science and Information Technology, vol. 2, no. 1, 2010
10. Wang L., Tan T., Ning H., Hu W.: Silhouette Analysis-Based Gait Recognition for Human Identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, 2003
11. Sarkar S., Phillips J., Liu Z., Vega I.R., Grother P., Bowyer K.: The HumanID Gait Challenge Problem: Data Sets, Performance, and Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, 2005
12. Zhang Z., Troje N.F.: View-independent person identification from human gait. Neurocomputing 69, 2005
13. Roder T.: Similarity, Retrieval, and Classification of Motion Capture Data. PhD thesis, University of Bonn, 2006
14. Kovar L., Gleicher M., Pighin F.: Motion Graphs. ACM Transactions on Graphics, 2002
15. Johnson M.: Exploiting Quaternions to Support Expressive Interactive Character Motion. PhD thesis, Massachusetts Institute of Technology, 2003
Acknowledgement
This paper has been supported by the project "System with a library of modules for advanced analysis and an interactive synthesis of human motion" co-financed by the European Regional Development Fund under the Innovative Economy Operational Programme, Priority Axis 1. Research and development of modern technologies, measure 1.3.1 Development projects.