Evaluating the robustness of machine learning algorithm on Human Activity Recognition
Federal University of Pernambuco www.cin.ufpe.br/~secgrad
Recife 2019
Roberto Costa Fernandes
A B.Sc. Dissertation presented to the Center of Informatics of Federal University of Pernambuco in partial fulfillment of the requirements for the degree of Bachelor in Computer Engineering.
Concentration Area: Machine Learning
Advisor: George Darmiton da Cunha Cavalcanti
I would like to thank my whole family, who guided me throughout my life, made me who I am, and always did everything to give me the best. I especially thank my sisters and my parents for always putting me and my education first, and for providing me with the best education at home and at school.
I would like to thank my advisor, Prof. George Darmiton, who guided and motivated me on my worst days; without his support, this work would not have been possible. I would also like to thank my friend Walber Macedo, who shared his knowledge, listened to me, and helped me make some project decisions.
I would also like to thank all my friends, who always supported and helped me through my university years. I especially thank Prof. Edna Barros and all my colleagues from RobôCIn, who became my friends and taught me to become a better person and a better teammate. I also thank my friends Lucas Cavalcanti, Renato Sousa, Caio Lima, Geovanny Lucas, and all the others who were part of my journey.
ABSTRACT
Physical inactivity can cause several diseases, such as diabetes, obesity, hypertension, cardiovascular diseases, depression, osteoporosis, and many other physical problems. To diagnose physical inactivity and avoid these problems, it is necessary to monitor the movements performed by a person and classify them as one of the Activities of Daily Living, such as walking, laying down, and standing. In this context, signals from smartphone sensors can be used as input to machine learning models that perform this classification. The objective of this work is to bring new results to the classification phase using Multiple Classifier Systems. Two public data sets are used, with data collected from smartphone triaxial sensors: gyroscope and accelerometer. This work uses these two data sets to compare monolithic classifiers with Multiple Classifier Systems using static and dynamic selection, in terms of accuracy, precision, recall, and F1-score. For one of the data sets the best classifier is K-Nearest-Oracle-Union (KNORA-U), with an accuracy of 95.62%, and for the other it is Random Forest, with an accuracy of 88.04%. It was not possible to determine which is the best classifier for Human Activity Recognition, because the monolithic classifiers and the Multiple Classifier Systems have similar results.
A lack of physical activity can cause several diseases, such as diabetes, obesity, hypertension, cardiovascular diseases, depression, osteoporosis, and various other physical problems. To diagnose inactivity and avoid such problems, it is necessary to monitor the activities performed by a person and classify them into day-to-day activities, such as walking, lying down, standing, and others. In this context, signals from the sensors present in mobile phones can be used as input to machine learning models to classify such activities. The present work aims to present new results for the classification phase using multiple classifier systems. Two public databases were used, with data collected from the triaxial sensors of mobile phones: gyroscope and accelerometer. To this end, this work uses these two databases to compare monolithic classifiers with multiple classifier systems using static selection and dynamic selection, using metrics such as accuracy, precision, recall, and F1 measure. For one of these databases the best classifier was KNORA-U, reaching an accuracy of 95.62%, and for the other the best classifier was Random Forest, with an accuracy of 88.04%. It was not possible to determine which type of model is better for Human Activity Recognition, since the monolithic classifiers and the multiple classifier systems had similar results.
LIST OF FIGURES
Figure 1 – Monolithic train architecture
Figure 2 – Monolithic test architecture
Figure 3 – MCS train architecture
Figure 4 – MCS test architecture
LIST OF TABLES
Table 1 – Protocol followed by (Anguita et al.,2013) to acquire data
Table 2 – Signals obtained to create data set (Anguita et al.,2013)
Table 3 – Features extracted for signals (Anguita et al.,2013)
Table 4 – Data Sets characteristics
Table 5 – Class distribution on Data Set 1. Inside of parenthesis are the percentage of instances from a class in the train and the test set
Table 6 – Class distribution on Data Set 2. Inside of parenthesis are the percentage of instances from a class in the train and the test set
Table 7 – Monolithic classifiers parameters
Table 8 – Results obtained using raw data of test set from Data Set 1. The best result for each metric is highlighted
Table 9 – Results obtained using raw data of test set from Data Set 2. The best result for each metric is highlighted
Table 10 – Results obtained using normalized data with L2 distance of test set from Data Set 1. The best result for each metric is highlighted
Table 11 – Results obtained using normalized data with L2 distance of test set from Data Set 2. The best result for each metric is highlighted
Table 12 – Results obtained using normalized data with MinMaxScaler of test set from Data Set 1. The best result for each metric is highlighted
Table 13 – Results obtained using normalized data with MinMaxScaler of test set from Data Set 2. The best result for each metric is highlighted
Table 14 – Parameters used in the main classifier
Table 15 – Parameters used in MCS
Table 16 – Results from MCS using Perceptron as main classifier of test set from Data Set 1. The best result for each metric is highlighted
Table 17 – Results from MCS using Perceptron as main classifier of test set from Data Set 2. The best result for each metric is highlighted
Table 18 – Results from MCS using Decision Tree as main classifier of test set from Data Set 1. The best result for each metric is highlighted
Table 19 – Results from MCS using Decision Tree as main classifier of test set from Data Set 2. The best result for each metric is highlighted
Table 20 – Best results from monolithic and MCS for Data Set 1
Table 21 – Best results from monolithic and MCS for Data Set 2
Table 22 – Best results for Data Set 1 and results obtained by (Anguita et al.,2013)
LIST OF ACRONYMS
AAL Ambient Assisted Living
ADL Activity of Daily Living
DCS Dynamic Classifier Selection
DES Dynamic Ensemble Selection
DSP Digital Signal Processing
HAR Human Activity Recognition
KNN k-Nearest Neighbor
KNORA-E K-Nearest-Oracle-Eliminate
KNORA-U K-Nearest-Oracle-Union
MCS Multiple Classifier Systems
MLP Multi-Layer Perceptron
NB Naive Bayes
OLA Overall Local Accuracy
SVM Support Vector Machine
UCI University of California, Irvine
WHO World Health Organization
1 INTRODUCTION
1.1 MOTIVATION
1.2 OBJECTIVES
1.3 WORK STRUCTURE
2 BACKGROUND
2.1 DATA ACQUISITION
2.2 MULTIPLE CLASSIFIER SYSTEMS
3 SYSTEM ARCHITECTURE
4 EXPERIMENTS
4.1 DATABASE
4.2 EXPERIMENTS METHODOLOGY
4.3 RESULTS
4.3.1 Monolithic Classifier
4.3.2 Multiple Classifier Systems
4.4 FINAL CONSIDERATIONS
5 CONCLUSION AND FUTURE WORKS
REFERENCES
1
INTRODUCTION
1.1 MOTIVATION
The World Health Organization (WHO) classifies physical inactivity as the fourth most significant risk factor for global mortality (World Health Organization,2010). This risk emerges because physical inactivity can cause some health problems (Warburton et al., 2006), like cardiovascular diseases, diabetes, cancer, hypertension, obesity, depression, and osteoporosis.
One of the most important actions to address this sedentary behavior is to identify whether the amount of activity performed by a person reaches the active time recommended by the WHO (World Health Organization,2010).
Human Activity Recognition (HAR) is the research field that aims to identify the activities performed by a person, and its main task is to classify the Activities of Daily Living (ADL). The ADL can be defined as the activities performed by a person during a day, such as walking, sitting, standing, laying down, walking upstairs, or walking downstairs. The HAR process can be divided into two parts: first, data must be acquired from the person; second, these data are analyzed to classify the ADL.
There are two different ways to execute the first step of HAR. The first one is to use cameras and other sensors to identify activities in a well-defined environment. Although this approach has the downside of working only in a controlled area, it can detect a much wider range of ADL (Ni et al.,2011). The second approach is to place sensors at different positions on the body (Lukowicz et al.,2004) (Karantonis et al.,2006) (Ravi et al.,2005), such as the wrist, waist, and chest; these are also known as body-worn sensors. This method solves the problem of the first approach, but it cannot be used as a long-term solution, because wearing a sensor daily is not always comfortable, and the need to recalibrate the sensors every time after dressing is another problem.
The rising popularity of smartphones in recent years has created a new way to acquire data for HAR. These devices already have a built-in triaxial accelerometer and gyroscope, which makes this approach very similar to body-worn sensors, with the advantage that almost everyone carries a smartphone during their ADL. Some recent works propose this methodology to obtain data (Wang et al.,2016) (Jain & Kanhangad,2017) (Anguita et al.,2013), and this direction is proving to be very promising.
These data obtained in the first part of HAR are used to identify and classify the ADL.
The first approach to classify ADL was using Digital Signal Processing (DSP) (Lukowicz et al., 2004) (Karantonis et al., 2006). One of the approaches (Lukowicz et al.,2004) using DSP combines accelerometer data with sound data to recognize workshop activity, and achieves an accuracy of 84.4%. Another technique using DSP (Karantonis et al.,2006) is based on a waist-mounted triaxial accelerometer, aiming to identify 12 activities and reaches an accuracy of 90.8%.
Although the results using only DSP can be considered good, recent works propose using DSP to extract features and then feeding those features to machine learning models that classify the ADL. One of these approaches (Jain & Kanhangad,2017) extracts features using a histogram of gradients and a Fourier descriptor and then uses a Support Vector Machine (SVM) for classification, achieving an accuracy of 97.12%. Other techniques extract only simple features of the signal from a sample window and use a multiclass SVM (Anguita et al.,2013), achieving an accuracy of 96%, or a static ensemble of classifiers, achieving an accuracy of 99.22% (Elamvazuthi et al.,2018).
Some results (Britto Jr et al.,2014) show that Multiple Classifier Systems (MCS) with dynamic selection perform better than MCS with static selection, at a significance level of α = 0.05 in Friedman's test. None of the HAR works cited above considers the use of MCS with dynamic selection.
1.2 OBJECTIVES
The main objective of this work is to compare the results of monolithic classifiers and MCS and to identify the best classifiers of each type. The techniques analyzed are SVM, Naive Bayes (NB), Random Forest, k-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Decision Tree, and some MCS.
Three specific objectives must be accomplished to achieve the main one:
Train the chosen monolithic classifiers using raw data and normalized data, to analyze the impact of the normalization techniques on the data sets and decide whether to use them.
Train and test the MCS, using the chosen normalization technique, to determine which base classifier is the best.
Compare the results using the metrics defined in Section 4.2 and present the best classifiers among those evaluated.
1.3 WORK STRUCTURE
This work is organized as follows. Chapter 2 presents a background on how to acquire data for HAR and the methodology used by the author of the data sets to acquire the data. Chapter 3 presents the system architecture used in this work and the differences between the architecture for a monolithic classifier and the one for MCS. Chapter 4 presents information about the data sets used in the experiments, describes the methodology of the experiments, and shows the results obtained. Chapter 5 reports the findings of this work and some future works.
2
BACKGROUND
HAR can be divided into two steps, the first one is data acquisition, and the second is classification, so this chapter presents some technical fundamentals for the first step and some concepts of machine learning techniques. Section 2.1 presents the technique used by (Anguita et al.,2013) to collect the data used to create the data sets. Section 2.2 shows some concepts about MCS.
2.1 DATA ACQUISITION
There are many ways to acquire data for HAR (Ni et al.,2011) (Lukowicz et al.,2004) (Anguita et al.,2013), each with its own advantages, but this work focuses on data collected by smartphone sensors. This method was chosen because of the rising number of smartphones per person, which increases the amount of data available and can lead to better predictions.
However, creating a new data set would be a problem, because it would require defining a methodology to acquire the data. To avoid this problem, this work uses two existing data sets available in the University of California, Irvine (UCI) Machine Learning Repository (Dua & Graff,2017). One advantage of these data sets is that they were created by the same author and follow the same methodology (Anguita et al.,2013).
There are many possible ADL that could be identified, but to make the classification process more manageable, the author of these data sets chose to define six basic ADL: walking, walking downstairs, walking upstairs, sitting, standing, and laying. Thirty volunteers performed those activities, and data were acquired with a smartphone in their pocket. A specific sequence of actions was defined to keep each data collection similar; Table 1 shows this sequence. A rest time of 5 seconds was given between activities to avoid tiring the volunteer. Each volunteer performed this protocol once with the smartphone in the left pocket and once with the smartphone in the right pocket.
Table 1: Protocol followed by (Anguita et al.,2013) to acquire data

| No. | Static | Time (sec) | No. | Dynamic | Time (sec) |
|---|---|---|---|---|---|
| 0 | Start (Standing Pos) | 0 | 7 | Walk (1) | 15 |
| 1 | Stand (1) | 15 | 8 | Walk (2) | 15 |
| 2 | Sit (1) | 15 | 9 | Walk Downstairs (1) | 12 |
| 3 | Stand (2) | 15 | 10 | Walk Upstairs (1) | 12 |
| 4 | Lay Down (1) | 15 | 11 | Walk Downstairs (2) | 12 |
| 5 | Sit (2) | 15 | 12 | Walk Upstairs (2) | 12 |
| 6 | Lay Down (2) | 15 | 13 | Walk Downstairs (3) | 12 |
| | | | 14 | Walk Upstairs (3) | 12 |
| | | | 15 | Stop | 0 |
| | | | | Total | 192 |
The linear acceleration and angular velocity along the three axes were acquired during the protocol from the smartphone's accelerometer and gyroscope, at a sampling rate of 50 Hz. A median filter and a 3rd-order low-pass Butterworth filter with a cutoff frequency of 20 Hz were used to reduce the noise in the signals. After that, another low-pass Butterworth filter was used to separate the acceleration due to body motion from the gravity acceleration. Then the Euclidean magnitude of the signals was calculated, and their derivatives were taken to obtain jerk and angular acceleration. All those signals were sampled using a sliding window of 2.56 seconds with 50% overlap, resulting in the 17 signals shown in Table 2.
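The windowing, magnitude, and derivative steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the derivative is approximated with a simple finite difference (the text does not say how the derivatives were computed), and the filtering stages are left out.

```python
import math

SAMPLE_RATE_HZ = 50      # sampling rate stated in the text
WINDOW_SECONDS = 2.56    # window length stated in the text
OVERLAP = 0.5            # 50% overlap stated in the text


def sliding_windows(signal, sample_rate=SAMPLE_RATE_HZ,
                    window_seconds=WINDOW_SECONDS, overlap=OVERLAP):
    """Split a 1-D sequence into fixed-size, overlapping windows."""
    size = round(window_seconds * sample_rate)   # 128 samples per window
    step = round(size * (1 - overlap))           # a new window every 64 samples
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]


def magnitude(x, y, z):
    """Euclidean magnitude of a triaxial signal, sample by sample."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def derivative(signal, sample_rate=SAMPLE_RATE_HZ):
    """Finite-difference time derivative (assumed), e.g. acceleration -> jerk."""
    dt = 1.0 / sample_rate
    return [(b - a) / dt for a, b in zip(signal, signal[1:])]
```

With these settings each window holds 128 samples, and consecutive windows share 64 of them.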
Table 2: Signals obtained to create data set (Anguita et al.,2013)

| Name | Time | Frequency |
|---|---|---|
| Body Acc. | Yes | Yes |
| Gravity Acc. | Yes | No |
| Body Acc. Jerk | Yes | Yes |
| Body Angular Speed | Yes | Yes |
| Body Angular Acc. | Yes | No |
| Body Acc. Mag. | Yes | Yes |
| Gravity Acc. Mag. | Yes | No |
| Body Acc. Jerk Mag. | Yes | Yes |
| Body Angular Speed Mag. | Yes | Yes |
| Body Angular Acc. Mag. | Yes | Yes |
Some of the resulting 17 signals carry information from a triaxial sensor and were therefore divided into three components, X, Y, and Z; the remaining signals, such as the magnitudes, are single values. A set of 17 features was defined to be extracted from all of those signals; Table 3 shows these features. This feature extraction results in the 561 features available in the data sets used. The author also already divided each data set into a training set and a test set, assigning 70% of the data to the training set and the remaining data to the test set.
Table 3: Features extracted for signals (Anguita et al.,2013)

| Function | Description |
|---|---|
| mean | Mean value |
| std | Standard deviation |
| mad | Median absolute value |
| max | Largest value in array |
| min | Smallest value in array |
| sma | Signal magnitude area |
| energy | Average sum of squares |
| iqr | Interquartile range |
| entropy | Signal entropy |
| arCoeff | Autoregression coefficients |
| correlation | Correlation coefficient |
| maxFreqInd | Largest frequency component |
| meanFreq | Frequency signal weighted average |
| skewness | Frequency signal skewness |
| kurtosis | Frequency signal kurtosis |
| energyBand | Energy of a frequency interval |
| angle | Angle between two vectors |
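As an illustration of how a few of the time-domain features in Table 3 could be computed for one window, the sketch below uses only the Python standard library. It is not the extraction code of the data set author: the function name is hypothetical, `mad` is assumed to be the median absolute deviation around the median, and `std` is assumed to be the population standard deviation.

```python
import statistics


def window_features(window):
    """Compute a few of the Table 3 features for one window of samples."""
    med = statistics.median(window)
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),   # assumption: population std
        # assumption: "mad" is the median absolute deviation around the median
        "mad": statistics.median(abs(v - med) for v in window),
        "max": max(window),
        "min": min(window),
        # "energy": average sum of squares, as described in Table 3
        "energy": sum(v * v for v in window) / len(window),
    }
```

Applying such a function to every window of every signal, and concatenating the outputs, yields one feature vector per window.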
2.2 MULTIPLE CLASSIFIER SYSTEMS
In machine learning, researchers are always trying to improve their models or create new ones to achieve better results, but this is leading to overly complex models, and it is becoming clear that it is almost impossible for a single model to cover all the variability of a classification task. Given this obstacle, some researchers are studying MCS because of their advantage of being more diverse while achieving the same or better results (Britto Jr et al.,2014). This diversity is created by training multiple simpler models, such as Perceptrons or Decision Trees, in different ways, so that each trained model is better at one sub-problem and all the trained models together achieve a better result.
MCS are divided into three phases (Britto Jr et al.,2014): generation, selection, and integration. The generation phase consists of generating a pool of classifiers from the training set by training different classifiers. This phase receives two parameters: the number of base classifiers in the pool and the method used to create the pool, which can be bagging or boosting. Bagging, proposed in 1996, consists of training classifiers on different random subsets of the training set, with each classifier trained independently (Breiman,1996). Boosting is very similar to bagging, but instead of training each classifier independently, the result of one trained classifier is used to give more weight to the instances that were not well classified (Freund et al.,1996).
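The generation phase with bagging can be sketched as follows. This is a minimal illustration and not the code used in the experiments: `train_fn` is a hypothetical callable that trains one base classifier, and each random subset is drawn as a bootstrap sample (sampling with replacement, same size as the training set), which is the usual reading of bagging.

```python
import random


def bootstrap_indices(n_instances, n_classifiers, seed=0):
    """Draw one bootstrap sample (with replacement) per base classifier."""
    rng = random.Random(seed)
    return [[rng.randrange(n_instances) for _ in range(n_instances)]
            for _ in range(n_classifiers)]


def generate_pool(X, y, train_fn, n_classifiers=10, seed=0):
    """Generation phase: train each base classifier on its own bootstrap sample.

    `train_fn(X_subset, y_subset)` is a hypothetical callable that returns a
    trained model; each call is independent of the others, as in bagging.
    """
    pool = []
    for idx in bootstrap_indices(len(X), n_classifiers, seed):
        pool.append(train_fn([X[i] for i in idx], [y[i] for i in idx]))
    return pool
```

Fixing `seed` makes the pool reproducible, in the same spirit as fixing a random seed for every model that needs one.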
The selection phase can be static, where the same subset of classifiers is chosen for all instances, or dynamic, where specific classifiers are chosen for each test instance (Britto Jr et al.,2014). Dynamic selection chooses a classifier based on criteria such as accuracy (Woods et al.,1997), ranking of classifiers (Sabourin et al.,1993), and probabilistic information (Giacinto & Roli,1999) (Kurzynski et al.,2010). Dynamic selection can choose a single classifier, called Dynamic Classifier Selection (DCS), or an ensemble of classifiers, called Dynamic Ensemble Selection (DES). After the classifiers are selected, their individual classifications must be combined, and this integration phase has several strategies (Kittler et al.,1996), such as the Max Rule, the Median Rule, or Majority Vote.
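The integration strategies named above can be illustrated with a short sketch. It assumes that each selected classifier returns either a class label (for Majority Vote) or a score per class (for the Max and Median rules); the function names are hypothetical.

```python
from collections import Counter
from statistics import median


def majority_vote(labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]


def combine_scores(scores, rule):
    """Combine per-classifier class scores with `rule` (e.g. max or median).

    `scores` is a list with one dict per classifier, mapping class -> score.
    The class with the highest combined score is returned.
    """
    classes = scores[0].keys()
    combined = {c: rule([s[c] for s in scores]) for c in classes}
    return max(combined, key=combined.get)
```

For example, `combine_scores(scores, max)` applies the Max Rule and `combine_scores(scores, median)` the Median Rule. Ties in `majority_vote` are resolved in favor of the label seen first, which is an arbitrary choice of this sketch.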
MCS also present better results than monolithic classifiers when compared using Friedman's test at a significance level of α = 0.05 (Britto Jr et al.,2014). Although MCS have the advantage of using simpler and more diverse models, their training phase is computationally slower than that of monolithic classifiers, because more models have to be trained.
3
SYSTEM ARCHITECTURE
This work analyzes two types of classifiers, monolithic and MCS, and each requires a different architecture for training and testing. For the monolithic classifiers, training a model requires only the training set and the parameters of the model; after training, the model returns the trained function, as shown in Figure 1. In the test step, the model classifies each instance of the test set using the trained function, which returns the class of each example; Figure 2 shows the test architecture for the monolithic classifiers. The parameters that must be defined differ from one monolithic model to another.
Figure 1: Monolithic train architecture
Figure 2: Monolithic test architecture
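The monolithic train and test steps can be sketched as two functions: one that takes the training set and the model parameters and returns a trained function, and one that applies that function to every test instance. The sketch below uses a k-nearest-neighbor rule as a stand-in for any monolithic model; the function names are hypothetical and the code is not the one used in the experiments.

```python
import math


def train(train_X, train_y, k=1):
    """Train step: return a function that maps an instance to a class label.

    A k-nearest-neighbour rule stands in for any monolithic model; `k` plays
    the role of the model parameters.
    """
    def trained_function(x):
        nearest = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], x))[:k]
        labels = [label for _, label in nearest]
        return max(set(labels), key=labels.count)
    return trained_function


def classify_test_set(trained_function, test_X):
    """Test step: classify every instance of the test set."""
    return [trained_function(x) for x in test_X]
```

Any other model with the same two steps (fit on the training set, predict on the test set) fits this architecture.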
The MCS use an ensemble of classifiers and aim to achieve better results with simpler classifiers as the base of the pool. Their architecture differs from the monolithic one in both the train and the test step. The MCS are divided into three phases (Britto Jr et al.,2014): generation, selection, and integration. The first one is part of the training step, and the remaining two are part of the test step. Figure 3 shows an overview of the MCS train architecture, which trains a pool of classifiers from the training set.
Figure 3: MCS train architecture
The test step consists of, given one instance of the test set and the pool of classifiers already trained, select one or an ensemble of these classifiers, classify this instance, and then combine the results. Figure 4 shows the selection and integration on the MCS test architecture.
Figure 4: MCS test architecture
For the MCS, besides the parameters of the base classifiers, other parameters and methods must be chosen for each phase. As this work aims to analyze the results of different methods for the selection phase, the settings for the generation and integration phases are the same in all the experiments.
4
EXPERIMENTS
This chapter describes the methodology of the experiments that create reference (ground truth) results for the data sets. It also discusses some decisions taken for further experiments, such as whether to use normalization techniques and the values of some classifier parameters, and compares the results of monolithic classifiers and multiple classifier systems.
This chapter is organized as follows: Section 4.1 describes the data sets used in the experiments, Section 4.2 explains the experimental methodology and the classifiers used, Section 4.3 presents the results obtained, and Section 4.4 draws some conclusions from these experiments.
4.1 DATABASE
Two HAR data sets available in the UCI Machine Learning Repository (Dua & Graff,2017) were chosen. The first one is HAR Using Smartphones Data Set (Anguita et al.,2013); for simplicity, it is named Data Set 1 in the rest of this work. The second one is the Smartphone Data Set for HAR in Ambient Assisted Living (AAL), called Data Set 2.
Table 4: Data Sets characteristics

| | Data Set 1 | Data Set 2 |
|---|---|---|
| Number of classes | 6 | 6 |
| Number of attributes | 561 | 561 |
| Number of instances - Train set | 7352 | 4252 |
| Number of instances - Test set | 2947 | 1492 |
| Number of instances - Total | 10299 | 5744 |
The main reason for selecting these two data sets instead of other public ones is that the same author created them and they are complementary to each other, which brings two significant advantages. First, they were created following the same methodology; second, they have the same number of classes and the same number of attributes. They do not have the same number of instances (Data Set 1 is bigger than Data Set 2), and there is no intersection between the data sets. The author of these data sets divided each of them into a train set (about 70% of the total data) and a test set (about 30% of the total data); Table 4 shows the number of attributes and how many instances are in the train and test sets.
Another aspect of these data sets is that the six classes have almost the same number of instances, so the data sets can be considered balanced. In addition, every class is split between the train and test sets in a proportion close to 70% in the train set and 30% in the test set.
Therefore, no strategy was needed to balance the data sets. Table 5 and Table 6 show the distribution of instances in each class.
Table 5: Class distribution on Data Set 1. The percentages in parentheses are the share of the instances of a class in the train and the test set

| Class | Instances on Train Set | Instances on Test Set | Instances on Data Set 1 |
|---|---|---|---|
| WALKING | 1226 (71.20%) | 496 (28.80%) | 1722 |
| WALKING_UPSTAIRS | 1073 (69.50%) | 471 (30.50%) | 1544 |
| WALKING_DOWNSTAIRS | 986 (70.13%) | 420 (29.87%) | 1406 |
| SITTING | 1286 (72.37%) | 491 (27.63%) | 1777 |
| STANDING | 1374 (72.09%) | 532 (27.91%) | 1906 |
| LAYING | 1407 (72.38%) | 537 (27.62%) | 1944 |
Table 6: Class distribution on Data Set 2. The percentages in parentheses are the share of the instances of a class in the train and the test set

| Class | Instances on Train Set | Instances on Test Set | Instances on Data Set 2 |
|---|---|---|---|
| WALKING | 769 (75.99%) | 243 (24.01%) | 1012 |
| WALKING_UPSTAIRS | 629 (73.31%) | 229 (26.69%) | 858 |
| WALKING_DOWNSTAIRS | 691 (74.30%) | 239 (25.70%) | 930 |
| SITTING | 834 (74.27%) | 289 (25.73%) | 1123 |
| STANDING | 775 (75.32%) | 254 (24.68%) | 1029 |
| LAYING | 554 (70.00%) | 238 (30.00%) | 792 |
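The percentages in Tables 5 and 6 follow directly from the instance counts, and the same arithmetic gives the overall train and test shares from Table 4. A small sketch, with a hypothetical helper name:

```python
def split_percentages(train_count, test_count):
    """Percentage of instances in the train and test sets, rounded to 2 places."""
    total = train_count + test_count
    return (round(100 * train_count / total, 2),
            round(100 * test_count / total, 2))


# Counts taken from Table 4 (whole data sets) and Table 5 (WALKING class).
data_set_1 = split_percentages(7352, 2947)   # about 71% train, 29% test
data_set_2 = split_percentages(4252, 1492)   # about 74% train, 26% test
walking_ds1 = split_percentages(1226, 496)
```

The overall split is therefore close to, but not exactly, 70%/30% in both data sets.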
4.2 EXPERIMENTS METHODOLOGY
Some works present good results for HAR (Anguita et al.,2013) (Jain & Kanhangad,2017) (Elamvazuthi et al.,2018), but they use different experimental methodologies, so to be able to compare our results with those presented in the literature, we had to choose one of these methods. As said before, the data sets are already divided into train and test sets, so there was no need to use any technique to split them. Therefore, for each model and each data set, the model was trained on the train set and tested on the test set.
After defining the procedure for the experiments, the classifiers to be evaluated had to be chosen. Two types of classifiers were chosen: Monolithic Classifiers and MCS.
Monolithic Classifiers use a single model to classify an instance; these classifiers are simpler, and only their parameters need to be defined. In the MCS, a pool of classifiers is trained, and for a given instance the system decides whether one classifier or an ensemble of those classifiers is used to categorize it. In some areas, MCS achieve better results than monolithic classifiers, but they are somewhat more complex to train and have more parameters to define. The Monolithic Classifier models used were SVM, Decision Tree, Bernoulli NB, Gaussian NB, KNN, Random Forest, and MLP.
The MCS models use monolithic classifiers as their main (base) classifier and differ in the method used to choose among those classifiers, which can be static or dynamic. The static MCS use the same classifier for all patterns. Two static MCS models were used in this work: Static Selection and Single Best. The dynamic models select specific classifiers for a given pattern and can be divided into two classes: DCS, which chooses the best classifier in a pool of classifiers, and DES, which uses a combination of those classifiers. The DCS model used was Overall Local Accuracy (OLA), and the DES models used were K-Nearest-Oracle-Eliminate (KNORA-E), KNORA-U, and META-DES. All of these models use an ensemble of simpler classifiers, such as Perceptrons or Decision Trees, as their main classifier, aiming to achieve a better result than complex classifiers. Experiments were performed with all the MCS using each of these two classifiers as the main one.
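To make the difference between the dynamic selection methods concrete, the sketch below implements simplified versions of OLA, KNORA-E, and KNORA-U. It is an illustration under stated assumptions, not the implementation used in the experiments: classifiers are plain callables, a labeled validation set (`val_X`, `val_y`) defines the region of competence, the helper names and the parameter `k` are hypothetical, and the selected classifiers are combined by majority vote.

```python
import math
from collections import Counter


def region_of_competence(x, val_X, k):
    """Indices of the k validation instances closest to x."""
    order = sorted(range(len(val_X)), key=lambda i: math.dist(val_X[i], x))
    return order[:k]


def hits(clf, val_X, val_y, region):
    """Number of neighbours in the region that `clf` classifies correctly."""
    return sum(clf(val_X[i]) == val_y[i] for i in region)


def ola(pool, x, val_X, val_y, k):
    """DCS: use the single classifier with the best local accuracy."""
    region = region_of_competence(x, val_X, k)
    best = max(pool, key=lambda clf: hits(clf, val_X, val_y, region))
    return best(x)


def knora_e(pool, x, val_X, val_y, k):
    """DES: keep classifiers that are right on all neighbours, shrinking
    the region until at least one classifier qualifies."""
    region = region_of_competence(x, val_X, k)
    while region:
        chosen = [c for c in pool if hits(c, val_X, val_y, region) == len(region)]
        if chosen:
            break
        region = region[:-1]          # drop the farthest neighbour
    else:
        chosen = pool                 # nothing qualifies: fall back to the whole pool
    return Counter(c(x) for c in chosen).most_common(1)[0][0]


def knora_u(pool, x, val_X, val_y, k):
    """DES: every classifier gets one vote per neighbour it gets right."""
    region = region_of_competence(x, val_X, k)
    votes = Counter()
    for clf in pool:
        votes[clf(x)] += hits(clf, val_X, val_y, region)
    return votes.most_common(1)[0][0]
```

In all three functions the decision depends on the test instance `x`, which is what distinguishes dynamic from static selection: a classifier that is weak overall can still be chosen where it is locally accurate.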
One technique widely used to try to obtain better results is normalizing the data. Although other works (Anguita et al.,2013) do not use any normalization technique, this work tested two normalization techniques, Normalizer and MinMaxScaler, as well as no normalization. The first experiment using monolithic classifiers was performed to choose which normalization technique to use in further tests, so raw data and those two normalization techniques were used.
Normalizer uses the L2 norm of a vector for scaling. Equation 4.1 defines the norm of a feature vector $\mathbf{f}$ of size $n$, computed as the square root of the sum of the squares of its $n$ features $f_i$. Dividing the feature vector of an instance by this norm rescales it to unit norm.

$$\lVert \mathbf{f} \rVert_2 = \sqrt{\sum_{i=1}^{n} f_i^2} = \sqrt{f_1^2 + \dots + f_n^2} \tag{4.1}$$

MinMaxScaler uses the min-max technique (Han et al.,2011) to normalize the attributes. This technique maps the value of a feature to the range between 0 and 1, where 0 corresponds to the minimum value that the feature assumes and 1 to its maximum value. Equation 4.2 shows the mapping, where $f'$ is the value of the feature after normalization, $f$ is its original value, $\min_A$ is the minimum value of that feature, and $\max_A$ is its maximum value.

$$f' = \frac{f - \min_A}{\max_A - \min_A} \tag{4.2}$$
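Equations 4.1 and 4.2 can be written directly in Python. The sketch below is a plain re-implementation for illustration, with hypothetical function names; it assumes that the L2 norm is taken over the feature vector of one instance and that the minimum and maximum of each feature are learned from the training set and then reused for the test set.

```python
import math


def l2_normalize(sample):
    """Scale one feature vector to unit L2 norm (Equation 4.1)."""
    norm = math.sqrt(sum(v * v for v in sample))
    return [v / norm for v in sample] if norm else list(sample)


def fit_min_max(train_X):
    """Per-feature minimum and maximum, learned from the training set."""
    columns = list(zip(*train_X))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_scale(sample, mins, maxs):
    """Map each feature with Equation 4.2; constant features map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(sample, mins, maxs)]
```

Because the minimum and maximum come from the training set, a test value outside the training range maps outside the interval from 0 to 1.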
Some metrics had to be defined to make it possible to compare the models on each data set. To evaluate those models and create a reference (ground truth) against which results on data with noisy labels can later be compared, accuracy (Equation 4.3), precision (Equation 4.4), recall (Equation 4.5), and F1 score (Equation 4.6) were measured on the test set. Accuracy is the ratio of correct predictions to all predictions made. Precision is the ratio of correctly classified positive instances to all positive predictions made. Recall is the number of correct positive classifications divided by the number of relevant (actually positive) samples. The F1 score is the harmonic mean of precision and recall.
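$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4.3}$$

$$\text{precision} = \frac{TP}{TP + FP} \tag{4.4}$$

$$\text{recall} = \frac{TP}{TP + FN} \tag{4.5}$$

$$F1 = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}} \tag{4.6}$$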
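Equations 4.3 to 4.6 are stated for a single positive class, while the data sets have six classes, so the per-class values must be averaged in some way. The sketch below computes the per-class metrics and averages them weighted by the number of true instances of each class. The averaging scheme is an assumption, since the text does not state which one was used, and the function name is hypothetical.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)        # true instances per class (TP + FN)
    predicted = Counter(y_pred)      # predictions per class (TP + FP)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(tp.values()) / n
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        p = tp[cls] / predicted[cls] if predicted[cls] else 0.0   # Equation 4.4
        r = tp[cls] / count                                       # Equation 4.5
        f = 2 * p * r / (p + r) if p + r else 0.0                 # Equation 4.6
        weight = count / n           # assumption: weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With support-weighted averaging, recall is always equal to accuracy, which is the pattern seen in most rows of the result tables; this is consistent with, but does not prove, the use of this averaging scheme.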
4.3 RESULTS
This section presents the results obtained for each group of classifiers, evaluated using accuracy, precision, recall, and F1 score. For each of the classifiers used, it presents all the parameters, with the same random seed fixed whenever the model needs one. The next sections are organized as follows: Section 4.3.1 presents the parameters and results obtained for the monolithic classifiers and the comparison between normalization techniques, and Section 4.3.2 shows the parameters and results obtained for the MCS and discusses which classifier is best to use as the main one.
4.3.1 Monolithic Classifier
The monolithic classifiers evaluated with these data sets were SVM, Decision Tree, Bernoulli NB, Gaussian NB, KNN, Random Forest, and MLP. The SVM uses the same parameters as previous work (Anguita et al.,2013). Table 7 shows the parameters of each monolithic classifier. The parameters chosen were the default ones; no optimization was performed.
Table 7: Monolithic classifiers parameters

| Model | Parameters |
|---|---|
| SVM | Radial basis kernel; C = 1; random_state = 0 |
| Decision Tree | Gini criterion; splitter best; random_state = 0 |
| Bernoulli NB | α = 1 |
| Gaussian NB | α = 1 |
| KNN | uniform weight; k = 3 |
| MLP | 1000 hidden neurons; Adam optimization; α = 10⁻⁴; random_state = 0 |
| Random Forest | Gini criterion; 100 estimators; random_state = 0 |
The monolithic classifiers were also used to compare the normalization techniques: each was tested on raw data, on data normalized using the L2 distance, and on data normalized using MinMaxScaler. These results determine whether a normalization technique should be used in the remaining experiments. Tables 8 and 9 show the results without normalization on Data Set 1 and Data Set 2, respectively. Tables 10 and 11 show the results using L2 distance normalization on Data Set 1 and Data Set 2, and Tables 12 and 13 show the results for data normalized using MinMaxScaler on Data Set 1 and Data Set 2.
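The two normalization techniques can be illustrated with a minimal pure-Python sketch. It assumes that L2 distance normalization rescales each sample (row) to unit Euclidean norm and that MinMaxScaler rescales each feature (column) to the range [0, 1]; the function names and the toy data are hypothetical.

```python
import math


def l2_normalize(rows):
    """Scale each sample (row) so that its Euclidean norm equals 1."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm if norm > 0 else 0.0 for v in row])
    return out


def min_max_scale(rows):
    """Rescale each feature (column) to [0, 1] using its min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Hypothetical toy data: 3 samples with 2 features each.
X = [[3.0, 4.0], [6.0, 8.0], [0.0, 2.0]]
X_l2 = l2_normalize(X)
X_mm = min_max_scale(X)
print(X_l2)
print(X_mm)
```

In this toy example the first two samples differ only in magnitude, so per-sample L2 normalization maps them to the same vector, whereas min-max scaling keeps them distinct.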
Table 8: Results obtained using raw data of test set from Data Set 1. The best result for each metric is highlighted
Model Accuracy Precision Recall F1
SVM 0.9403 0.9410 0.9403 0.9401
Decision Tree 0.8595 0.8605 0.8595 0.8590
Bernoulli NB 0.8500 0.8555 0.8500 0.8467
Gaussian NB 0.7703 0.7947 0.7703 0.7688
KNN 0.8907 0.8936 0.8907 0.8899
MLP 0.9477 0.9513 0.9477 0.9479
Random Forest 0.9267 0.9253 0.9251 0.9253
Table 9: Results obtained using raw data of test set from Data Set 2. The best result for each metric is highlighted
Model Accuracy Precision Recall F1
SVM 0.8063 0.8070 0.8063 0.8053
Decision Tree 0.8097 0.8134 0.8097 0.8102
Bernoulli NB 0.6823 0.7134 0.6823 0.6836
Gaussian NB 0.6133 0.6454 0.6133 0.6077
KNN 0.7520 0.7563 0.7520 0.7512
MLP 0.8646 0.8707 0.8646 0.8653
Random Forest 0.8804 0.8780 0.8780 0.8780
Table 10: Results obtained using normalized data with L2 distance of test set from Data Set 1.
The best result for each metric is highlighted.
Model Accuracy Precision Recall F1
SVM 0.2505 0.1242 0.3505 0.1834
Decision Tree 0.8541 0.8556 0.8541 0.8531
Bernoulli NB 0.8500 0.8555 0.8500 0.8467
Gaussian NB 0.8215 0.8425 0.8215 0.8144
KNN 0.8907 0.8936 0.8907 0.8897
MLP 0.9555 0.9573 0.9555 0.9555
Random Forest 0.9345 0.9338 0.9335 0.9338
Table 11: Results obtained using normalized data with L2 distance of test set from Data Set 2.
The best result for each metric is highlighted.
Model Accuracy Precision Recall F1
SVM 0.2426 0.1155 0.2426 0.1279
Decision Tree 0.7976 0.8035 0.7976 0.7981
Bernoulli NB 0.6823 0.7134 0.6823 0.6836
Gaussian NB 0.6294 0.6742 0.6294 0.6278
KNN 0.7507 0.7527 0.7507 0.7497
MLP 0.8740 0.8757 0.8740 0.8740
Random Forest 0.8723 0.8686 0.8684 0.8686
Table 12: Results obtained using normalized data with MinMaxScaler of test set from Data Set 1.
The best result for each metric is highlighted.
Model Accuracy Precision Recall F1
SVM 0.9050 0.9089 0.9050 0.9039
Decision Tree 0.8595 0.8605 0.8595 0.8590
Bernoulli NB 0.6288 0.6319 0.6288 0.6054
Gaussian NB 0.7703 0.7947 0.7703 0.7688
KNN 0.8907 0.8936 0.8907 0.8899
MLP 0.9620 0.9626 0.9620 0.9619
Random Forest 0.9269 0.9257 0.9253 0.9257
Table 13: Results obtained using normalized data with MinMaxScaler of test set from Data Set 2.
The best result for each metric is highlighted.
Model Accuracy Precision Recall F1
SVM 0.7621 0.7631 0.7621 0.7592
Decision Tree 0.8097 0.8134 0.8097 0.8102
Bernoulli NB 0.4638 0.4690 0.4638 0.4381
Gaussian NB 0.6133 0.6454 0.6133 0.6077
KNN 0.7554 0.7620 0.7554 0.7551
MLP 0.8613 0.8646 0.8613 0.8617
Random Forest 0.8804 0.8780 0.8780 0.8780
For Data Set 1, MLP is clearly the best monolithic classifier, whether with raw data or with data normalized using either technique. For Data Set 2, the best classifier with raw data and with MinMaxScaler is Random Forest, while with L2 distance normalization it is MLP. The results show that most classifiers achieve similar results with or without a normalization technique. Moreover, in some cases, such as SVM with L2 normalization or Bernoulli NB with MinMaxScaler, the results were worse than those obtained with raw data. Thus, normalizing the data did not bring a consistent advantage and in some cases degraded the results, so only raw data was used in the further experiments.
4.3.2 Multiple Classifier Systems
The MCS used in these experiments were Single Best, Static Selection, OLA, KNORA-U, KNORA-E, and META-DES. Each of these models uses an ensemble of monolithic classifiers. In this work, each model uses 100 classifiers, with either Perceptron or Decision Tree as the main classifier; the following results serve as the baseline for choosing which of the two is better.
For the Decision Tree, the same parameters were used as when this model was evaluated as a monolithic classifier. For the Perceptron, the maximum number of iterations was set to 1000 and the tolerance to $10^{-4}$. Table 14 shows all of these parameters, and Table 15 shows the parameters used in each of the MCS models.
Table 14: Parameters used in the main classifier
Main Classifier    Parameters
Decision Tree      Gini criterion, best splitter, random_state = 0
Perceptron         tolerance = 0.001, max. iterations = 1000, random_state = 0
Table 15: Parameters used in MCS
Model               Parameters
Static Selection    selection 50%
KNORA-E             k = 7, no prune, no indecision
KNORA-U             k = 7, no prune, no indecision
META-DES            Multinomial NB, Kp = 5, k = 7
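To make the 50% selection parameter of Static Selection concrete, the sketch below ranks a pool of classifiers by validation accuracy, keeps the best half, and combines the survivors by majority vote. It assumes this common definition of static selection and is not the implementation used in the experiments; the function names, the toy pool of four threshold classifiers (instead of 100), and the validation data are all hypothetical.

```python
from collections import Counter


def static_selection(pool, X_val, y_val, fraction=0.5):
    """Keep the best `fraction` of classifiers, ranked by validation accuracy."""

    def accuracy(clf):
        hits = sum(clf(x) == y for x, y in zip(X_val, y_val))
        return hits / len(y_val)

    ranked = sorted(pool, key=accuracy, reverse=True)
    n_keep = max(1, int(len(pool) * fraction))
    return ranked[:n_keep]


def majority_vote(ensemble, x):
    """Return the label predicted by most classifiers in the ensemble."""
    votes = Counter(clf(x) for clf in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical pool: each classifier maps one feature value to class 0 or 1.
pool = [
    lambda x: int(x > 0.5),  # accurate on the validation data below
    lambda x: int(x > 0.4),  # accurate on the validation data below
    lambda x: int(x > 0.9),  # threshold too high
    lambda x: 0,             # always predicts class 0
]
X_val = [0.1, 0.3, 0.6, 0.8]
y_val = [0, 0, 1, 1]

selected = static_selection(pool, X_val, y_val, fraction=0.5)
print(len(selected), majority_vote(selected, 0.7))
```

The selection happens once, before any test sample is seen, which is what distinguishes static selection from the dynamic methods (OLA, KNORA-U, KNORA-E, META-DES) that choose classifiers per test sample.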
Table 16 shows the results for Data Set 1 from the MCS using Perceptron as the main classifier, and Table 17 shows the results for Data Set 2. With Perceptron as the main classifier, KNORA-U has the best result in all metrics for Data Set 1, and META-DES has the best result in all metrics for Data Set 2.
Table 18 shows the results for Data Set 1 using Decision Tree as the main classifier, and Table 19 shows the results for Data Set 2. With Decision Tree as the main classifier, META-DES has the best results in all metrics on Data Set 1, whereas Static Selection has the best results in all metrics on Data Set 2.
Table 16: Results from MCS using Perceptron as main classifier of test set from Data Set 1. The best result for each metric is highlighted
Model Accuracy Precision Recall F1
Single Best 0.9528 0.9532 0.9528 0.9528
Static Selection 0.9532 0.9538 0.9532 0.9530
OLA 0.9528 0.9533 0.9528 0.9528
KNORA-U 0.9562 0.9565 0.9562 0.9561
KNORA-E 0.9555 0.9558 0.9555 0.9554
META-DES 0.9539 0.9540 0.9539 0.9537
Table 17: Results from MCS using Perceptron as main classifier of test set from Data Set 2. The best result for each metric is highlighted
Model Accuracy Precision Recall F1
Single Best 0.7198 0.7514 0.7198 0.7112
Static Selection 0.8190 0.8192 0.8190 0.8189
OLA 0.8190 0.8219 0.8190 0.8197
KNORA-U 0.8224 0.8228 0.8224 0.8225
KNORA-E 0.8277 0.8298 0.8277 0.8284
META-DES 0.8365 0.8373 0.8365 0.8365
Table 18: Results from MCS using Decision Tree as main classifier of test set from Data Set 1.
The best result for each metric is highlighted
Model Accuracy Precision Recall F1
Single Best 0.8351 0.8366 0.8351 0.8342
Static Selection 0.8921 0.8926 0.8921 0.8913
OLA 0.8324 0.8335 0.8324 0.8314
KNORA-U 0.8979 0.8981 0.8979 0.8972
KNORA-E 0.8907 0.8908 0.8907 0.8901
META-DES 0.8985 0.8997 0.8985 0.8978
Table 19: Results from MCS using Decision Tree as main classifier of test set from Data Set 2.
The best result for each metric is highlighted
Model Accuracy Precision Recall F1
Single Best 0.6944 0.6887 0.6944 0.6857
Static Selection 0.8398 0.8453 0.8398 0.8398
OLA 0.8190 0.8219 0.8190 0.8197
KNORA-U 0.8224 0.8228 0.8224 0.8225
KNORA-E 0.8277 0.8298 0.8277 0.8284
META-DES 0.8365 0.8373 0.8365 0.8365
Analyzing the results with Perceptron and Decision Tree as the main classifier, it is possible to see that, for a given data set and main classifier, all models achieve very similar results. The exception is the Single Best model, which has worse results than the other MCS for Data Set 2, regardless of the main classifier. Besides, the Perceptron results are noticeably better than the Decision Tree results, mainly on Data Set 1. For Data Set 1, the best result was obtained using KNORA-U with Perceptron, and for Data Set 2 using Static Selection with Decision Tree.
4.4 FINAL CONSIDERATIONS
The results obtained using monolithic classifiers and MCS can now be compared: the best results for Data Set 1 are in Table 20 and those for Data Set 2 in Table 21. Analyzing the data sets separately, the results are quite similar, but for Data Set 1 the MCS using KNORA-U with Perceptron was better than the best monolithic classifier, MLP. Besides, both results were better than the result obtained here by the SVM configured with the parameters from previous work (Anguita et al., 2013), shown in Table 8. For Data Set 2, the monolithic model was better than the MCS.
Table 20: Best results from monolithic and MCS for Data Set 1
Model Accuracy Precision Recall F1 score
MLP 0.9477 0.9513 0.9477 0.9479
KNORA-U (Perceptron) 0.9562 0.9565 0.9562 0.9561
Table 21: Best results from monolithic and MCS for Data Set 2
Model Accuracy Precision Recall F1 score
Random Forest 0.8804 0.8780 0.8780 0.8780
Static Selection (Decision Tree) 0.8398 0.8453 0.8398 0.8398
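The size of the differences in Tables 20 and 21 can be made explicit with a short calculation. The sketch below uses only the accuracies reported in those tables; the variable names are hypothetical.

```python
# Best accuracies per data set, taken from Tables 20 and 21.
best = {
    "Data Set 1": {
        "monolithic": ("MLP", 0.9477),
        "mcs": ("KNORA-U (Perceptron)", 0.9562),
    },
    "Data Set 2": {
        "monolithic": ("Random Forest", 0.8804),
        "mcs": ("Static Selection (Decision Tree)", 0.8398),
    },
}

# Accuracy gap (MCS minus monolithic), expressed in percentage points.
gaps = {}
for data_set, row in best.items():
    gap = (row["mcs"][1] - row["monolithic"][1]) * 100
    gaps[data_set] = round(gap, 2)
    print(f"{data_set}: {row['mcs'][0]} vs {row['monolithic'][0]}: {gaps[data_set]:+.2f} points")
```

The MCS leads by less than one percentage point on Data Set 1, while the monolithic classifier leads by about four points on Data Set 2.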
Comparing the results for both data sets, it is interesting to notice that the models performed better on Data Set 1. As Table 4 shows, this was expected, because Data Set 1 has more instances than Data Set 2: the first has 10299 instances and the second has 5744. Another observation is that using MCS does not achieve a clearly better result than monolithic classifiers. Table 22 shows the result obtained by Anguita et al. (2013) together with the best result obtained here for Data Set 1; the two are similar.
Table 22: Best results for Data Set 1 and results obtained by Anguita et al. (2013)
Model Accuracy Precision Recall
SVM (Anguita et al., 2013) 0.9600 0.9600 0.9600
KNORA-U (Perceptron) 0.9562 0.9565 0.9562
5 CONCLUSION AND FUTURE WORKS
Physical inactivity causes health problems, and the decrease in active time per person is one of the biggest concerns of the WHO. One of the initial efforts to address it is to identify and measure the activity performed by a person, and HAR is the tool for this. The HAR process is divided into two steps: the acquisition phase, which uses a sensor to collect information about the movement performed by a person, and the classification phase, which consists of classifying the data as one of the ADL.
The present work tested several monolithic classifiers and several MCS, comparing their results using accuracy, precision, recall, and F1 score. This work also compared the results of the monolithic classifiers on data normalized using L2 distance normalization and MinMaxScaler against their results on non-normalized data, showing how these techniques degrade the results of some classifiers, such as SVM and Bernoulli NB. Given these worse results, the MCS were trained on non-normalized data.
Analyzing the results, it is notable that the MCS do not achieve clearly better results than the monolithic classifiers. Even on Data Set 1, the best MCS, KNORA-U using Perceptron, is only slightly better than the best monolithic classifier, MLP: KNORA-U obtains an accuracy of 0.9562 and MLP an accuracy of 0.9477. On Data Set 2, the best monolithic classifier, Random Forest, achieves an accuracy of 0.8804, better than the accuracy of 0.8398 achieved by the best MCS, Static Selection with Decision Tree.
Beyond these results, several points can be addressed in future work to complete the analysis of HAR using MCS:
Compare the computational time needed to train all the models, to quantify the trade-off of using MCS instead of a monolithic classifier.
Perform a grid search over the parameters of some classifiers, such as MLP, SVM, and the MCS, to find the best results for these classifiers.
Use both data sets to train a single model, providing more instances in the training phase, and observe whether that leads to better results.
Compare the achieved results using a statistical test to determine precisely which is better, monolithic classifiers or MCS.
REFERENCES
Anguita, D., Ghio, A., Oneto, L., Parra Perez, X., & Reyes Ortiz, J. L. (2013). A public domain dataset for human activity recognition using smartphones. In Proceedings of the 21st International European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 437–442.
Breiman, L. (1996). Bagging predictors. Machine learning, 24(2):123–140.
Britto Jr, A. S., Sabourin, R., & Oliveira, L. E. (2014). Dynamic selection of classifiers—a comprehensive review. Pattern Recognition, 47(11):3665–3680.
Dua, D. & Graff, C. (2017). UCI machine learning repository.
Elamvazuthi, I., Izhar, L., Capi, G., et al. (2018). Classification of human daily activities using ensemble methods based on smartphone inertial sensors. Sensors, 18(12):4132.
Freund, Y., Schapire, R. E., et al. (1996). Experiments with a new boosting algorithm. In ICML, 96:148–156.
Giacinto, G. & Roli, F. (1999). Methods for dynamic classifier selection. In Proceedings 10th International Conference on Image Analysis and Processing, 659–664.
Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
Jain, A. & Kanhangad, V. (2017). Human activity classification in smartphones using accelerometer and gyroscope sensors. IEEE Sensors Journal, 18(3):1169–1177.
Karantonis, D. M., Narayanan, M. R., Mathie, M., Lovell, N. H., & Celler, B. G. (2006). Implementation of a real-time human movement classifier using a triaxial accelerometer for ambulatory monitoring. IEEE transactions on information technology in biomedicine, 10(1):156–167.
Kittler, J., Hatef, M., & Duin, R. P. (1996). Combining classifiers. In Proceedings of 13th international conference on pattern recognition, 2:897–901.
Kurzynski, M., Woloszynski, T., & Lysiak, R. (2010). On two measures of classifier competence for dynamic ensemble selection-experimental comparative analysis. In 2010 10th International Symposium on Communications and Information Technologies, 1108–1113.
Lukowicz, P., Ward, J. A., Junker, H., Stäger, M., Tröster, G., Atrash, A., & Starner, T. (2004). Recognizing workshop activity using body worn microphones and accelerometers. In International conference on pervasive computing, 18–32.
Ni, B., Wang, G., & Moulin, P. (2011). RGBD-HuDaAct: A color-depth video database for human daily activity recognition. In 2011 IEEE international conference on computer vision workshops (ICCV workshops), 1147–1153.
Ravi, N., Dandekar, N., Mysore, P., & Littman, M. L. (2005). Activity recognition from accelerometer data. In AAAI, 5(2005):1541–1546.
Sabourin, M., Mitiche, A., Thomas, D., & Nagy, G. (1993). Classifier combination for hand-printed digit recognition. In Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR’93), 163–166.
Wang, A., Chen, G., Yang, J., Zhao, S., & Chang, C.-Y. (2016). A comparative study on human activity recognition using inertial sensors in a smartphone. IEEE Sensors Journal, 16(11):4566–4578.
Warburton, D. E., Nicol, C. W., & Bredin, S. S. (2006). Health benefits of physical activity: the evidence. CMAJ, 174(6):801–809.
Woods, K., Kegelmeyer, W. P., & Bowyer, K. (1997). Combination of multiple classifiers using local accuracy estimates. IEEE transactions on pattern analysis and machine intelligence, 19(4):405–410.
World Health Organization (2010). Global recommendations on physical activity for health. World Health Organization.