Roberto Costa Fernandes. Evaluating the robustness of machine learning algorithm on Human Activity Recognition


Federal University of Pernambuco, www.cin.ufpe.br/~secgrad

Recife 2019


A B.Sc. Dissertation presented to the Center of Informatics of Federal University of Pernambuco in partial fulfillment of the requirements for the degree of Bachelor in Computer Engineering.

Concentration Area: Machine Learning

Advisor: George Darmiton da Cunha Cavalcanti


I would like to thank my whole family, who guided me throughout my life, made me who I am today, and always did everything to give me the best. I especially thank my sisters and my parents for always putting me and my education first, and for providing me with the best upbringing and schooling.

I would like to thank my advisor, Prof. George Darmiton, who guided and motivated me on my worst days; without his support, this work would not have been possible. I would also like to thank my friend Walber Macedo, who shared his knowledge, listened to me, and helped me make some project decisions.

I would also like to thank all my friends who always supported and helped me through my university years. I especially thank Prof. Edna Barros and all my work partners from RobôCIn, who became my friends and taught me to become a better person and a better work partner. I also thank my friends Lucas Cavalcanti, Renato Sousa, Caio Lima, Geovanny Lucas, and all the others who were part of my journey.


ABSTRACT

Physical inactivity can cause several diseases, such as diabetes, obesity, hypertension, cardiovascular diseases, depression, osteoporosis, and many other physical problems. To diagnose physical inactivity and avoid these problems, it is necessary to monitor the movements performed by a person and classify them as one of the Activities of Daily Living, such as walking, laying down, and standing. In this context, signals from smartphone sensors can be used as input to machine learning models to perform such classification. The objective of the present work is to bring new results to the classification phase using Multiple Classifier Systems. Two public data sets are used, with data collected from smartphone triaxial sensors: gyroscope and accelerometer. This work uses these two data sets to compare monolithic classifiers with Multiple Classifier Systems using static and dynamic selection, with metrics such as accuracy, precision, recall, and F1-score. For one of the data sets, the best classifier is K-Nearest-Oracle-Union (KNORA-U), achieving an accuracy of 95.62%; for the other, it is Random Forest, with an accuracy of 88.04%. It was not possible to determine which classifier is best for Human Activity Recognition, because the monolithic classifiers and the Multiple Classifier Systems have similar results.


Lack of physical activity can cause several diseases, such as diabetes, obesity, hypertension, cardiovascular diseases, depression, osteoporosis, and many other physical problems. To diagnose inactivity and avoid such problems, it is necessary to monitor the activities carried out by a person and classify them into day-to-day activities, such as walking, lying down, standing, and others. In this context, signals from the sensors present in mobile phones can be used as input to machine learning models to classify such activities. The present work aims to present new results for the classification phase using multiple classifier systems. Two public databases were used, with data collected from the triaxial sensors of mobile phones: gyroscope and accelerometer. To that end, this work uses these two databases to compare monolithic classifiers with multiple classifier systems using static selection and dynamic selection, with metrics such as accuracy, precision, recall, and F1 measure. For one of these databases, the best classifier was KNORA-U, reaching an accuracy of 95.62%, and for the other the best classifier was Random Forest, with an accuracy of 88.04%. It was not possible to determine which type of model is best for Human Activity Recognition, since the monolithic classifiers and the multiple classifier systems had similar results.


LIST OF FIGURES

Figure 1 – Monolithic train architecture

Figure 2 – Monolithic test architecture

Figure 3 – MCS train architecture

Figure 4 – MCS test architecture

LIST OF TABLES

Table 1 – Protocol followed by (Anguita et al., 2013) to acquire data

Table 2 – Signals obtained to create data set (Anguita et al., 2013)

Table 3 – Features extracted for signals (Anguita et al., 2013)

Table 4 – Data Sets characteristics

Table 5 – Class distribution on Data Set 1. In parentheses is the percentage of instances of each class in the train and the test set

Table 6 – Class distribution on Data Set 2. In parentheses is the percentage of instances of each class in the train and the test set

Table 7 – Monolithic classifiers parameters

Table 8 – Results obtained using raw data of test set from Data Set 1. The best result for each metric is highlighted

Table 9 – Results obtained using raw data of test set from Data Set 2. The best result for each metric is highlighted

Table 10 – Results obtained using normalized data with L2 distance of test set from Data Set 1. The best result for each metric is highlighted

Table 11 – Results obtained using normalized data with L2 distance of test set from Data Set 2. The best result for each metric is highlighted

Table 12 – Results obtained using normalized data with MinMaxScaler of test set from Data Set 1. The best result for each metric is highlighted

Table 13 – Results obtained using normalized data with MinMaxScaler of test set from Data Set 2. The best result for each metric is highlighted

Table 14 – Parameters used in the main classifier

Table 15 – Parameters used in MCS

Table 16 – Results from MCS using Perceptron as main classifier of test set from Data Set 1. The best result for each metric is highlighted

Table 17 – Results from MCS using Perceptron as main classifier of test set from Data Set 2. The best result for each metric is highlighted

Table 18 – Results from MCS using Decision Tree as main classifier of test set from Data Set 1. The best result for each metric is highlighted

Table 19 – Results from MCS using Decision Tree as main classifier of test set from Data Set 2. The best result for each metric is highlighted

Table 20 – Best results from monolithic and MCS for Data Set 1

Table 21 – Best results from monolithic and MCS for Data Set 2

Table 22 – Best results for Data Set 1 and results obtained by (Anguita et al., 2013)


LIST OF ACRONYMS

AAL Ambient Assisted Living

ADL Activity of Daily Living

DCS Dynamic Classifier Selection

DES Dynamic Ensemble Selection

DSP Digital Signal Processing

HAR Human Activity Recognition

KNN k-Nearest Neighbor

KNORA-E K-Nearest-Oracle-Eliminate

KNORA-U K-Nearest-Oracle-Union

MCS Multiple Classifier Systems

MLP Multi-Layer Perceptron

NB Naive Bayes

OLA Overall Local Accuracy

SVM Support Vector Machine

UCI University of California, Irvine

WHO World Health Organization

CONTENTS

1 INTRODUCTION
1.1 MOTIVATION
1.2 OBJECTIVES
1.3 WORK STRUCTURE
2 BACKGROUND
2.1 DATA ACQUISITION
2.2 MULTIPLE CLASSIFIER SYSTEMS
3 SYSTEM ARCHITECTURE
4 EXPERIMENTS
4.1 DATABASE
4.2 EXPERIMENTS METHODOLOGY
4.3 RESULTS
4.3.1 Monolithic Classifier
4.3.2 Multiple Classifier Systems
4.4 FINAL CONSIDERATIONS
5 CONCLUSION AND FUTURE WORKS
REFERENCES


1

INTRODUCTION

1.1 MOTIVATION

The World Health Organization (WHO) classifies physical inactivity as the fourth most significant risk factor for global mortality (World Health Organization, 2010). This risk arises because physical inactivity can cause health problems (Warburton et al., 2006) such as cardiovascular diseases, diabetes, cancer, hypertension, obesity, depression, and osteoporosis.

One of the most important actions to address this sedentary behavior is to determine whether the amount of activity performed by a person reaches the active time recommended by the WHO (World Health Organization, 2010).

Human Activity Recognition (HAR) is the research field that aims to identify the activities performed by a person, and its main task is to classify the Activities of Daily Living (ADL). ADL can be defined as the activities performed by a person during a day, such as walking, sitting, standing, laying down, walking upstairs, or walking downstairs. The HAR process can be divided into two parts: first, data must be acquired from the person; second, these data are analyzed to classify the ADL.

There are two different ways to execute the first step of HAR. The first is to use cameras and other sensors to identify activities in a well-defined environment. Although this approach has the downside of working only in a controlled area, it can detect a much wider range of ADL (Ni et al., 2011). The second approach is to place sensors at different positions on the body (Lukowicz et al., 2004) (Karantonis et al., 2006) (Ravi et al., 2005), such as the wrist, waist, and chest; these are also known as body-worn sensors. This method solves the problem of the first approach, but it cannot be used as a long-term solution because wearing a sensor daily is not always comfortable. The need to recalibrate the sensors every time after dressing is another problem.

The rising popularity of smartphones in recent years is creating a new method of acquiring data for HAR. These devices already have a triaxial accelerometer and gyroscope built in, which makes this approach very similar to body-worn sensors, with the advantage that almost everyone carries a smartphone during their ADL. Some works propose using this methodology to obtain data (Wang et al., 2016) (Jain & Kanhangad, 2017) (Anguita et al., 2013), and this direction is proving to be very promising.

The data obtained in the first part of HAR are used to identify and classify the ADL. The first approach to classifying ADL used Digital Signal Processing (DSP) (Lukowicz et al., 2004) (Karantonis et al., 2006). One of these approaches (Lukowicz et al., 2004) combines accelerometer data with sound data to recognize workshop activities and achieves an accuracy of 84.4%. Another DSP technique (Karantonis et al., 2006) is based on a waist-mounted triaxial accelerometer, aims to identify 12 activities, and reaches an accuracy of 90.8%.

Although the results using only DSP can be considered good, recent works propose using DSP to extract features and then feeding those features to machine learning models to classify the ADL. One of these approaches (Jain & Kanhangad, 2017) extracts features using histogram of gradient and Fourier descriptors and then uses a Support Vector Machine (SVM) for classification, achieving an accuracy of 97.12%. Other techniques extract only simple features of the signal from a sample window and use either a multiclass SVM (Anguita et al., 2013), achieving 96% accuracy, or a static ensemble of classifiers, achieving an accuracy of 99.22% (Elamvazuthi et al., 2018).

Some results (Britto Jr et al., 2014) show that Multiple Classifier Systems (MCS) with dynamic selection perform better than MCS with static selection, at a significance level of α = 0.05 in Friedman’s test. None of the works cited above considers the use of MCS with dynamic selection for HAR.

1.2 OBJECTIVES

The main objective of this work is to compare the results of monolithic classifiers and MCS, and to identify the best classifiers under each technique. The techniques analyzed are SVM, Naive Bayes (NB), Random Forest, k-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Decision Tree, and some MCS.

Three specific objectives must be accomplished to achieve the main one:

Train the chosen monolithic classifiers using raw data and normalized data, to analyze the impact of the normalization techniques on the data sets and decide whether to use them.

Train and test the MCS, using the chosen normalization technique, to determine which base classifier is the best.

Compare the results using the metrics defined in Section 4.2 and present the best classifiers among those used.

1.3 WORK STRUCTURE

This work is organized as follows. Chapter 2 presents background on how to acquire data for HAR and the methodology used by the authors of the data sets to acquire the data. Chapter 3 presents the system architecture used in this work and the differences between the architecture for a monolithic classifier and for MCS. Chapter 4 presents information about the data sets used in the experiments, describes the methodology of the experiments, and shows the obtained results. Chapter 5 reports the findings of this work and some future work.


2

BACKGROUND

HAR can be divided into two steps: the first is data acquisition, and the second is classification. This chapter therefore presents some technical fundamentals of the first step and some machine learning concepts. Section 2.1 presents the technique used by Anguita et al. (2013) to collect the data used to create the data sets. Section 2.2 presents some concepts of MCS.

2.1 DATA ACQUISITION

There are many ways to acquire data for HAR (Ni et al., 2011) (Lukowicz et al., 2004) (Anguita et al., 2013), each with its own advantages, but this work focuses on data collected by smartphone sensors. This method was chosen because of the rising number of smartphones per person, which increases the amount of data available and can lead to better predictions.

Creating a new data set, however, would require defining a methodology to acquire the data. To avoid this problem, this work uses two existing data sets available in the University of California, Irvine (UCI) Machine Learning Repository (Dua & Graff, 2017). One advantage of these data sets is that they were created by the same authors and follow the same methodology (Anguita et al., 2013).

Many possible ADL could be targeted, but to make the classification process more manageable, the authors of these data sets defined six basic ADL: walking, walking downstairs, walking upstairs, sitting, standing, and laying. Thirty volunteers performed those activities, and data were acquired using a smartphone in their pocket. A specific sequence of actions was defined to keep each data collection similar; Table 1 shows this sequence. A rest time of 5 seconds was given between activities to avoid tiring the volunteers. Each volunteer performed this protocol twice: once with the smartphone in the left pocket and once with it in the right pocket.

Table 1: Protocol followed by (Anguita et al., 2013) to acquire data

| No. | Static | Time (sec) | No. | Dynamic | Time (sec) |
|---|---|---|---|---|---|
| 0 | Start (Standing Pos) | 0 | 7 | Walk (1) | 15 |
| 1 | Stand (1) | 15 | 8 | Walk (2) | 15 |
| 2 | Sit (1) | 15 | 9 | Walk Downstairs (1) | 12 |
| 3 | Stand (2) | 15 | 10 | Walk Upstairs (1) | 12 |
| 4 | Lay Down (1) | 15 | 11 | Walk Downstairs (2) | 12 |
| 5 | Sit (2) | 15 | 12 | Walk Upstairs (2) | 12 |
| 6 | Lay Down (2) | 15 | 13 | Walk Downstairs (3) | 12 |
|  |  |  | 14 | Walk Upstairs (3) | 12 |
|  |  |  | 15 | Stop | 0 |
|  |  |  |  | Total | 192 |

The linear acceleration and angular velocity along the three axes were acquired during the protocol from the smartphone’s accelerometer and gyroscope, at a sampling rate of 50 Hz. A median filter and a 3rd-order low-pass Butterworth filter with a cutoff frequency of 20 Hz were used to reduce the noise in the signals. After that, another low-pass Butterworth filter was used to separate the body-motion acceleration from the gravity acceleration. Then, the Euclidean magnitude of the signals was calculated, and their derivatives were taken to obtain jerk and angular acceleration. All those signals were sampled using a sliding window of 2.56 seconds with 50% overlap, resulting in the 17 signals shown in Table 2.
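The windowing, magnitude, and derivative steps described above can be illustrated with a minimal sketch. It assumes each signal is a plain Python list of samples; the function names (`sliding_windows`, `magnitude`, `derivative`) are hypothetical, the derivative is approximated by a simple finite difference, and the median and Butterworth filtering stages are left out. It is not the code used by Anguita et al. (2013).

```python
import math

SAMPLE_RATE_HZ = 50    # sampling rate stated in the text
WINDOW_SECONDS = 2.56  # window length stated in the text
OVERLAP = 0.5          # 50% overlap stated in the text


def sliding_windows(signal, sample_rate=SAMPLE_RATE_HZ,
                    seconds=WINDOW_SECONDS, overlap=OVERLAP):
    """Split a sequence of samples into fixed-size overlapping windows."""
    size = round(sample_rate * seconds)  # 50 Hz * 2.56 s = 128 samples
    step = round(size * (1 - overlap))   # 64 samples between window starts
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]


def magnitude(x, y, z):
    """Euclidean magnitude of a triaxial signal, sample by sample."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def derivative(signal, sample_rate=SAMPLE_RATE_HZ):
    """Finite-difference time derivative (for example, jerk from acceleration)."""
    return [(b - a) * sample_rate for a, b in zip(signal, signal[1:])]


# Usage: 256 samples (5.12 s at 50 Hz) yield three windows of 128 samples.
windows = sliding_windows(list(range(256)))
```

With these values each window holds 128 samples and consecutive windows share 64 of them.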

Table 2: Signals obtained to create data set (Anguita et al., 2013)

| Name | Time | Frequency |
|---|---|---|
| Body Acc. | Yes | Yes |
| Gravity Acc. | Yes | No |
| Body Acc. Jerk | Yes | Yes |
| Body Angular Speed | Yes | Yes |
| Body Angular Acc. | Yes | No |
| Body Acc. Mag. | Yes | Yes |
| Gravity Acc. Mag. | Yes | No |
| Body Acc. Jerk Mag. | Yes | Yes |
| Body Angular Speed Mag. | Yes | Yes |
| Body Angular Acc. Mag. | Yes | Yes |

Some of the resulting 17 signals carry information from a triaxial sensor and were therefore divided into three components (X, Y, and Z); the remaining ones, such as the magnitudes, have only a single value.

For all of those signals, a set of 17 features to extract was defined; Table 3 shows all the features extracted from these signals. This feature extraction results in the 561 features available in the data sets used. The authors also already divided each data set into a training set and a testing set, assigning 70% of the data to the train set and the remaining data to the test set.

Table 3: Features extracted for signals (Anguita et al., 2013)

| Function | Description |
|---|---|
| mean | Mean value |
| std | Standard deviation |
| mad | Median absolute value |
| max | Largest value in array |
| min | Smallest value in array |
| sma | Signal magnitude area |
| energy | Average sum of squares |
| iqr | Interquartile range |
| entropy | Signal entropy |
| arCoeff | Autoregression coefficients |
| correlation | Correlation coefficient |
| maxFreqInd | Largest frequency component |
| meanFreq | Frequency signal weighted average |
| skewness | Frequency signal skewness |
| kurtosis | Frequency signal kurtosis |
| energyBand | Energy of a frequency interval |
| angle | Angle between two vectors |
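As an illustration of how a few of the functions in Table 3 turn one window of samples into numbers, the sketch below computes `mean`, `std`, `max`, `min`, and `energy` for a single window. The function name `window_features` is hypothetical, and the use of the population standard deviation is an assumption; the exact definitions used to build the data sets may differ.

```python
import statistics


def window_features(window):
    """Compute a handful of the Table 3 features for one window of samples."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),  # population std (assumption)
        "max": max(window),
        "min": min(window),
        # "Average sum of squares", as described in Table 3.
        "energy": sum(v * v for v in window) / len(window),
    }


features = window_features([1.0, 2.0, 3.0, 4.0])
```

Applying functions of this kind to every signal and axis is what produces one fixed-length feature vector per window.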

2.2 MULTIPLE CLASSIFIER SYSTEMS

In machine learning, researchers are always trying to improve their models or create new ones to achieve better results, but this is leading to overly complex models, and it is becoming clear that it is almost impossible for a single model to cover all the variability in a classification task. Given this obstacle, some researchers are studying MCS, because of their advantage of being more diverse while achieving the same or better results (Britto Jr et al., 2014). This diversity is created in MCS by training multiple simpler models, such as Perceptrons or Decision Trees, in different ways, so that each trained model is better at one sub-problem and all the trained models together achieve a better result.

MCS are divided into three phases (Britto Jr et al., 2014): generation, selection, and integration. The generation phase consists of generating a pool of classifiers from the train set by training different classifiers. This phase receives two parameters: the number of base classifiers generated in the pool and the method used to create the pool, which can be bagging or boosting. Bagging, proposed in 1996, consists of training classifiers on different random subsets of the training set, with each classifier trained independently (Breiman, 1996). Boosting is very similar to bagging, but instead of training each classifier independently, the results of a trained classifier are used to give more weight to the instances that were not well classified (Freund et al., 1996).
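The generation phase with bagging can be sketched as follows. This is a minimal illustration under stated assumptions: each random subset is drawn as a bootstrap sample (sampling with replacement, as in Breiman's formulation), `train_fn` stands for any base learner, and `train_majority_class` is a hypothetical toy learner included only so the example runs on its own.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Draw len(X) instances from the training set, with replacement."""
    indices = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in indices], [y[i] for i in indices]


def generate_pool(X, y, train_fn, pool_size, seed=0):
    """Generation phase: train pool_size classifiers independently."""
    rng = random.Random(seed)  # fixed seed so the pool is reproducible
    pool = []
    for _ in range(pool_size):
        X_boot, y_boot = bootstrap_sample(X, y, rng)
        pool.append(train_fn(X_boot, y_boot))
    return pool


def train_majority_class(X, y):
    """Toy base learner (hypothetical): always predicts the most common label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


X_train = [(0.1,), (0.2,), (0.8,), (0.9,)]
y_train = ["sitting", "sitting", "walking", "walking"]
pool = generate_pool(X_train, y_train, train_majority_class, pool_size=10)
```

Because every classifier sees a different sample of the training set, the members of the pool can disagree, which is the diversity the later phases rely on.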

The selection step can be static, in which the same subset of classifiers is chosen for all instances, or dynamic, in which specific classifiers are chosen for each test instance (Britto Jr et al., 2014). Dynamic selection chooses classifiers based on some criterion, such as accuracy (Woods et al., 1997), ranking of classifiers (Sabourin et al., 1993), or probabilistic information (Giacinto & Roli, 1999) (Kurzynski et al., 2010). Dynamic selection can choose only one classifier, called Dynamic Classifier Selection (DCS), or an ensemble of classifiers, called Dynamic Ensemble Selection (DES). After the classifiers are selected, their classifications must be combined, and this phase has several strategies (Kittler et al., 1996), such as the Max Rule, the Median Rule, or Majority Vote.
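To make the difference between selection and integration concrete, the sketch below shows one accuracy-based dynamic selection rule and one integration rule. It assumes classifiers are plain callables, that a labelled validation set (`X_val`, `y_val`) defines the neighbourhood of a test instance, and that the neighbourhood size `k` is a free parameter; all names and the toy classifiers are hypothetical.

```python
import math
from collections import Counter


def k_nearest(x, X_val, k):
    """Indices of the k validation instances closest to x (Euclidean distance)."""
    order = sorted(range(len(X_val)), key=lambda i: math.dist(x, X_val[i]))
    return order[:k]


def select_by_local_accuracy(pool, x, X_val, y_val, k=7):
    """DCS: pick the single classifier most accurate in the neighbourhood of x."""
    region = k_nearest(x, X_val, k)

    def local_accuracy(clf):
        return sum(clf(X_val[i]) == y_val[i] for i in region) / len(region)

    return max(pool, key=local_accuracy)


def majority_vote(classifiers, x):
    """Integration: return the label predicted by most of the classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy classifiers (hypothetical): a threshold rule and a constant predictor.
threshold_clf = lambda x: 0 if x[0] < 0.5 else 1
always_zero = lambda x: 0

X_val = [(0.0,), (0.1,), (0.9,), (1.0,)]
y_val = [0, 0, 1, 1]
chosen = select_by_local_accuracy([always_zero, threshold_clf],
                                  (0.95,), X_val, y_val, k=2)
```

In this toy case the threshold rule is selected for the instance at 0.95, because it is correct on both nearby validation instances while the constant predictor is wrong on both.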

MCS also present better results than monolithic classifiers when compared using Friedman’s test with a significance level of α = 0.05 (Britto Jr et al., 2014). Although MCS have the advantage of using simpler and more diverse models, their training phase is computationally slower than that of monolithic classifiers, because more models have to be trained.


3

SYSTEM ARCHITECTURE

This work aims to analyze the use of two types of classifiers, monolithic and MCS, and a different architecture is needed to train and test each one. Training a monolithic classifier requires only the train set and the parameters of the model; after training, the model returns the trained function. Figure 1 shows this architecture. In the test step, the model classifies each instance of the test set using the trained function, which returns the class of each example. Figure 2 shows the test architecture for the monolithic classifiers. Each monolithic model has its own parameters, which must be defined and differ from model to model.

Figure 1: Monolithic train architecture

Figure 2: Monolithic test architecture

MCS use an ensemble of classifiers, with simpler classifiers as the base of the pool, to achieve better results. Their architecture differs from the monolithic one in both the train and the test step. MCS are divided into three phases (Britto Jr et al., 2014): generation, selection, and integration. The first is part of the training step, and the remaining two are part of the test step. Figure 3 shows an overview of the MCS train architecture, which trains a pool of classifiers from the train set.

Figure 3: MCS train architecture

In the test step, given one instance of the test set and the pool of already trained classifiers, one classifier or an ensemble of classifiers is selected, the instance is classified, and the results are combined. Figure 4 shows selection and integration in the MCS test architecture.

Figure 4: MCS test architecture

For the MCS, besides the parameters of the base classifiers, other parameters and methods must be chosen for each of the phases. As this work aims to analyze the results of different methods for the selection phase, the settings of the generation and integration phases are the same in all experiments.


4

EXPERIMENTS

This chapter describes the methodology of the experiments used to create ground truth results for the data sets. It also discusses some decisions taken for further experiments, such as whether to use normalization techniques and the values of some classifier parameters, and compares the results of monolithic classifiers and multiple classifier systems.

This chapter is organized as follows: Section 4.1 describes the data sets used in the experiments, Section 4.2 explains the experiment methodology and the classifiers used, Section 4.3 presents the results obtained, and Section 4.4 presents some conclusions derived from these experiments.

4.1 DATABASE

Two HAR data sets available in the UCI Machine Learning Repository (Dua & Graff, 2017) were chosen. The first is the HAR Using Smartphones Data Set (Anguita et al., 2013), which, for simplicity, is called Data Set 1 in the rest of this work. The second is the Smartphone Data Set for HAR in Ambient Assisted Living (AAL), called Data Set 2.

Table 4: Data Sets characteristics

|  | Data Set 1 | Data Set 2 |
|---|---|---|
| Number of classes | 6 | 6 |
| Number of attributes | 561 | 561 |
| Number of instances - Train set | 7352 | 4252 |
| Number of instances - Test set | 2947 | 1492 |
| Number of instances - Total | 10299 | 5744 |

The main reason for selecting these two data sets instead of other public data sets is that the same authors created them and they are complementary to each other, which brings two significant advantages. First, they were created following the same methodology; second, they have the same number of classes and the same number of attributes. They do not have the same number of instances (Data Set 1 is bigger than Data Set 2), and there is no intersection between the data sets. The authors of these data sets divided each of them into a train set (70% of the total data) and a test set (30% of the total data). Table 4 shows the number of attributes and how many instances are in the train and test sets.

Another aspect of these data sets is that the six classes have almost the same number of instances, so the data sets can be considered balanced. In addition, all the classes are well divided between the train and test sets, keeping a proportion of approximately 70% in the train set and 30% in the test set. Therefore, no balancing strategy was needed. Table 5 and Table 6 show the distribution of instances in each class.

Table 5: Class distribution on Data Set 1. In parentheses is the percentage of instances of each class in the train and the test set

| Class | Instances on Train Set | Instances on Test Set | Instances on Data Set 1 |
|---|---|---|---|
| WALKING | 1226 (71.20%) | 496 (28.80%) | 1722 |
| WALKING_UPSTAIRS | 1073 (69.50%) | 471 (30.50%) | 1544 |
| WALKING_DOWNSTAIRS | 986 (70.13%) | 420 (29.87%) | 1406 |
| SITTING | 1286 (72.37%) | 491 (27.63%) | 1777 |
| STANDING | 1374 (72.09%) | 532 (27.91%) | 1906 |
| LAYING | 1407 (72.38%) | 537 (27.62%) | 1944 |

Table 6: Class distribution on Data Set 2. In parentheses is the percentage of instances of each class in the train and the test set

| Class | Instances on Train Set | Instances on Test Set | Instances on Data Set 2 |
|---|---|---|---|
| WALKING | 769 (75.99%) | 243 (24.01%) | 1012 |
| WALKING_UPSTAIRS | 629 (73.31%) | 229 (26.69%) | 858 |
| WALKING_DOWNSTAIRS | 691 (74.30%) | 239 (25.70%) | 930 |
| SITTING | 834 (74.27%) | 289 (25.73%) | 1123 |
| STANDING | 775 (75.32%) | 254 (24.68%) | 1029 |
| LAYING | 554 (70.00%) | 238 (30.00%) | 792 |

4.2 EXPERIMENTS METHODOLOGY

Some works present good results for HAR (Anguita et al., 2013) (Jain & Kanhangad, 2017) (Elamvazuthi et al., 2018), but they use different experiment methodologies, so to be able to compare our results with those in the literature, we had to choose one of the methods used. As stated before, the data sets are already divided into train and test sets, so there was no need to use any technique to split them. Therefore, for each data set, each model was trained on the train set and tested on the test set.

After defining the procedure for the experiments, the classifiers to be evaluated had to be chosen. Two different types of classifiers were chosen: monolithic classifiers and MCS. Monolithic classifiers use only one model to classify an instance; they are simpler, and only their parameters need to be defined. In MCS, a pool of classifiers is trained, and for a given instance the system decides whether one classifier or an ensemble of those classifiers is used to categorize it. In some areas, MCS achieve better results than monolithic classifiers, but they are somewhat more complex to train and have more parameters to define. The monolithic classifiers used were SVM, Decision Tree, Bernoulli NB, Gaussian NB, KNN, Random Forest, and MLP.

The MCS models use monolithic classifiers as their main (base) classifier and differ in the method used to choose among those classifiers, which can be static or dynamic. Static MCS use the same classifiers for all patterns. Two static MCS models were used in this work: Static Selection and Single Best. Dynamic models select specific classifiers for a given pattern and can be divided into two classes: DCS, which chooses the best classifier in a pool of classifiers, and DES, which uses a combination of those classifiers. The DCS model used was Overall Local Accuracy (OLA), and the DES models used were K-Nearest-Oracle-Eliminate (KNORA-E), KNORA-U, and META-DES. All of these models use an ensemble of simpler classifiers, such as Perceptrons or Decision Trees, as their main classifier, aiming to achieve a better result than complex classifiers. Experiments were performed with all MCS using each of these two classifiers as the main one.
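Of the dynamic ensemble methods named above, KNORA-U is simple enough to sketch in a few lines: every classifier that is correct on at least one of the k nearest validation instances takes part, and it receives one vote per neighbour it classifies correctly. The sketch below is an illustration only, not the implementation used in the experiments. Classifiers are assumed to be plain callables, `X_val` and `y_val` are a hypothetical labelled validation set, `k` is a free parameter, and the fallback to a plain majority vote when no classifier is correct in the neighbourhood is an assumption of this sketch.

```python
import math
from collections import Counter


def knora_u_predict(pool, x, X_val, y_val, k=7):
    """Predict the label of x with a KNORA-U style weighted vote."""
    # Region of competence: the k validation instances closest to x.
    order = sorted(range(len(X_val)), key=lambda i: math.dist(x, X_val[i]))
    region = order[:k]

    votes = Counter()
    for clf in pool:
        # One vote for each neighbour this classifier labels correctly.
        hits = sum(clf(X_val[i]) == y_val[i] for i in region)
        if hits > 0:
            votes[clf(x)] += hits

    if not votes:
        # Assumed fallback: no classifier is competent here, use the whole pool.
        votes = Counter(clf(x) for clf in pool)
    return votes.most_common(1)[0][0]


# Toy classifiers (hypothetical): a threshold rule and a constant predictor.
threshold_clf = lambda x: 0 if x[0] < 0.5 else 1
always_zero = lambda x: 0

X_val = [(0.0,), (0.1,), (0.9,), (1.0,)]
y_val = [0, 0, 1, 1]
label = knora_u_predict([always_zero, always_zero, threshold_clf],
                        (0.95,), X_val, y_val, k=2)
```

Here the two constant predictors outnumber the threshold rule, so a plain majority vote would answer 0, but they are wrong on both neighbours of the test instance and therefore get no votes.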

One technique widely used to try to obtain better results is normalizing the data. Although other works (Anguita et al., 2013) do not use any normalization technique, this work tested two normalization techniques, Normalizer and MinMaxScaler, as well as no normalization. The first experiment with monolithic classifiers was performed to choose which normalization technique to use in further tests, so raw data and data normalized with each of the two techniques were used.

Normalizer uses the norm of a vector to scale the data. Equation 4.1 defines the L2 norm of a feature vector f of size n, computed as the square root of the sum of the squares of its n features f_i. This technique divides each feature vector by its norm, so that the vector has unit norm.

$$\lVert \mathbf{f} \rVert_2 = \sqrt{\sum_{i=1}^{n} f_i^2} = \sqrt{f_1^2 + \dots + f_n^2} \qquad (4.1)$$

MinMaxScaler uses the min-max technique (Han et al., 2011) to normalize the attributes.

This technique maps the values of a feature to the range between 0 and 1, where 0 corresponds to the minimum value that the feature assumes and 1 to its maximum value. Equation 4.2 shows the mapping, where f' is the value of the feature after normalization, f is the original value of the feature, min_A is the minimum value of that feature, and max_A is its maximum value.

$$f' = \frac{f - \min_A}{\max_A - \min_A} \qquad (4.2)$$
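Equations 4.1 and 4.2 can be checked on small numbers with the sketch below. It is a plain-Python illustration, not the library code used in the experiments; the function names are hypothetical, and the handling of a zero norm or a constant feature is an assumption added so that the functions never divide by zero.

```python
import math


def l2_normalize(f):
    """Scale a feature vector to unit Euclidean norm (Equation 4.1)."""
    norm = math.sqrt(sum(v * v for v in f))
    # Assumption: an all-zero vector is returned unchanged.
    return [v / norm for v in f] if norm > 0 else list(f)


def min_max_scale(values):
    """Map the values of one feature to the range [0, 1] (Equation 4.2)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


unit = l2_normalize([3.0, 4.0])            # norm is 5, so the result is [0.6, 0.8]
scaled = min_max_scale([2.0, 4.0, 6.0])    # [0.0, 0.5, 1.0]
```

Note the different directions of the two operations in this sketch: the L2 normalization acts on one feature vector at a time, while the min-max mapping acts on all the values of one feature.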

Some metrics had to be defined to make it possible to compare the models on each data set. To evaluate the models and create a ground truth that can later be compared with results on noisy label data, accuracy (Equation 4.3), precision (Equation 4.4), recall (Equation 4.5), and F1 score (Equation 4.6) were measured on the test set. In these equations, TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. Accuracy is the ratio of correct predictions to all predictions made. Precision is the ratio of correctly classified positive instances to all positive predictions made. Recall is the number of correct positive classifications divided by the number of relevant samples. F1 score is the harmonic mean of precision and recall.

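$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4.3)$$

$$\text{precision} = \frac{TP}{TP + FP} \qquad (4.4)$$

$$\text{recall} = \frac{TP}{TP + FN} \qquad (4.5)$$

$$F1 = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}} \qquad (4.6)$$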
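The four metrics can be computed from a list of predictions as in the sketch below. Equations 4.4 to 4.6 are written for a single positive class; with six classes, the per-class values have to be averaged in some way. The sketch assumes support-weighted averaging, which is an assumption: the text does not state which averaging was used. Under this assumption the averaged recall equals the accuracy. The function name `evaluate` is hypothetical.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over all classes."""
    n = len(y_true)
    support = Counter(y_true)  # number of true instances per class
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    precision = recall = f1 = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0               # Equation 4.4
        r_c = tp / (tp + fn) if tp + fn else 0.0               # Equation 4.5
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0  # Equation 4.6
        weight = support[c] / n  # share of the test set in class c
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


scores = evaluate([0, 0, 1, 1], [0, 1, 1, 1])
```

For this toy input the accuracy and the weighted recall are both 0.75, the weighted precision is 5/6, and the weighted F1 is 11/15.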

4.3 RESULTS

This section presents the results obtained for each group of classifiers, evaluated using accuracy, precision, recall, and F1 score. For each classifier, it presents all the parameters used; the same random seed is fixed whenever a model needs one. The following sections are organized as follows: Section 4.3.1 presents the parameters and results obtained for the monolithic classifiers and the comparison between normalization techniques, and Section 4.3.2 presents the parameters and results obtained for the MCS and discusses which classifier is best to use as the main one.

4.3.1 Monolithic Classifier

The monolithic classifiers chosen to be evaluated with these data sets were SVM, Decision Tree, Bernoulli NB, Gaussian NB, KNN, Random Forest and MLP. The SVM uses the same parameters as previous works (Anguita et al.,2013). Table 7 shows the parameters for each one of the monolithic classifiers. The parameters chosen were the default one, it was not performed any optimization.


Table 7: Monolithic classifiers parameters

| Model | Parameters |
|---|---|
| SVM | Radial basis kernel, C = 1, random_state = 0 |
| Decision Tree | Gini criterion, splitter best, random_state = 0 |
| Bernoulli NB | α = 1 |
| Gaussian NB | α = 1 |
| KNN | uniform weight, k = 3 |
| MLP | 1000 hidden neurons, Adam optimization, α = 10⁻⁴, random_state = 0 |
| Random Forest | Gini criterion, 100 estimators, random_state = 0 |
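The parameter names in Table 7, such as random_state, resemble scikit-learn keyword arguments, although the library is not named in this section. Assuming scikit-learn, the table could be written down roughly as the configuration below. The class names and keyword names are assumptions of this sketch, and the dictionary only records the settings; it does not build or train any model.

```python
# Table 7 as keyword arguments, assuming scikit-learn estimators (an assumption;
# the class and keyword names below are not given in the text).
MONOLITHIC_PARAMS = {
    "SVC": {"kernel": "rbf", "C": 1.0, "random_state": 0},
    "DecisionTreeClassifier": {"criterion": "gini", "splitter": "best",
                               "random_state": 0},
    "BernoulliNB": {"alpha": 1.0},
    # Table 7 lists alpha = 1 for Gaussian NB; this sketch leaves it unmapped
    # because no keyword counterpart is assumed here.
    "GaussianNB": {},
    "KNeighborsClassifier": {"n_neighbors": 3, "weights": "uniform"},
    # "1000 hidden neurons" is read as a single hidden layer and alpha as the
    # regularization term; both readings are assumptions.
    "MLPClassifier": {"hidden_layer_sizes": (1000,), "solver": "adam",
                      "alpha": 1e-4, "random_state": 0},
    "RandomForestClassifier": {"criterion": "gini", "n_estimators": 100,
                               "random_state": 0},
}

for name, params in MONOLITHIC_PARAMS.items():
    print(name, params)
```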

The monolithic classifiers were also used to compare the normalization techniques: each one was tested on raw data, on data normalized using L2 distance and on data normalized using MinMaxScaler. These results make it possible to decide whether to use a normalization technique. Table 8 shows the results without any normalization technique on Data Set 1 and Table 9 shows the results for Data Set 2. Table 10 shows the results obtained using the L2 distance normalization technique on Data Set 1 and Table 11 shows the results for Data Set 2. Table 12 shows the results obtained from data normalized using MinMaxScaler on Data Set 1 and Table 13 shows the results for Data Set 2.
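The L2 distance normalization is not defined in this section. A minimal sketch, assuming it means scaling each sample (feature vector) to unit Euclidean length, is given below; the function name and the sample values are hypothetical.

```python
import math


def l2_normalize(sample):
    """Scale one sample (feature vector) to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in sample))
    if norm == 0.0:
        return list(sample)  # an all-zero vector is left unchanged
    return [x / norm for x in sample]


# Hypothetical sample with two features.
print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```

Under this assumption, L2 normalization acts on each sample across its features, whereas the min-max mapping acts on each feature across the samples, so the two techniques can affect a classifier in different ways.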

Table 8: Results obtained using raw data of test set from Data Set 1. The best result for each metric is highlighted

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| SVM | 0.9403 | 0.9410 | 0.9403 | 0.9401 |
| Decision Tree | 0.8595 | 0.8605 | 0.8595 | 0.859 |
| Bernoulli NB | 0.8500 | 0.8555 | 0.8500 | 0.8467 |
| Gaussian NB | 0.7703 | 0.7947 | 0.7703 | 0.7688 |
| KNN | 0.8907 | 0.8936 | 0.8907 | 0.8899 |
| MLP | 0.9477 | 0.9513 | 0.9477 | 0.9479 |
| Random Forest | 0.9267 | 0.9253 | 0.9251 | 0.9253 |

Table 9: Results obtained using raw data of test set from Data Set 2. The best result for each metric is highlighted

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| SVM | 0.8063 | 0.8070 | 0.8063 | 0.8053 |
| KNN | 0.7520 | 0.7563 | 0.7520 | 0.7512 |
| MLP | 0.8646 | 0.8707 | 0.8646 | 0.8653 |
| Random Forest | 0.8804 | 0.8780 | 0.8780 | 0.8780 |

Table 10: Results obtained using normalized data with L2 distance of test set from Data Set 1. The best result for each metric is highlighted.

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| SVM | 0.2505 | 0.1242 | 0.3505 | 0.1834 |
| KNN | 0.8907 | 0.8936 | 0.8907 | 0.8897 |
| MLP | 0.9555 | 0.9573 | 0.9555 | 0.9555 |
| Random Forest | 0.9345 | 0.9338 | 0.9335 | 0.9338 |

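Table 11: Results obtained using normalized data with L2 distance of test set from Data Set 2.

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| SVM | 0.2426 | 0.1155 | 0.2426 | 0.1279 |
| KNN | 0.7507 | 0.7527 | 0.7507 | 0.7497 |
| MLP | 0.8740 | 0.8757 | 0.8740 | 0.8740 |
| Random Forest | 0.8723 | 0.8686 | 0.8684 | 0.8686 |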


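Table 12: Results obtained using normalized data with MinMaxScaler of test set from Data Set 1.

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| SVM | 0.9050 | 0.9089 | 0.9050 | 0.9039 |
| KNN | 0.8907 | 0.8936 | 0.8907 | 0.8899 |
| MLP | 0.9620 | 0.9626 | 0.9620 | 0.9619 |
| Random Forest | 0.9269 | 0.9257 | 0.9253 | 0.9257 |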

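Table 13: Results obtained using normalized data with MinMaxScaler of test set from Data Set 2.

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| SVM | 0.7621 | 0.7631 | 0.7621 | 0.7592 |
| KNN | 0.7554 | 0.7620 | 0.7554 | 0.7551 |
| MLP | 0.8613 | 0.8646 | 0.8613 | 0.8617 |
| Random Forest | 0.8804 | 0.8780 | 0.8780 | 0.8780 |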

For Data Set 1, MLP is clearly the best monolithic classifier, whether with raw data or with data normalized using either normalization technique. For Data Set 2, the best classifier with raw data and with MinMaxScaler is Random Forest, and with data normalized using L2 distance it is MLP. The results show that most of the classifiers achieve about the same result with or without a normalization technique. Moreover, in some cases, such as SVM using L2 normalization or Bernoulli NB using MinMaxScaler, the results were worse than the ones obtained using raw data. Thus, normalizing the data brings no consistent advantage and in some cases leads to worse results, so only raw data was used in further experiments.

4.3.2 Multiple Classifier Systems

The MCS used in these experiments were Single Best, Static Selection, OLA, KNORA-U, KNORA-E and META-DES. Each of these models uses an ensemble of monolithic classifiers. In this work, each model uses 100 classifiers, with either Perceptron or Decision Tree as the main classifier, so the following results serve as the baseline to choose which one is best.
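The two static approaches in this list can be illustrated with a small sketch. It assumes that each pool member has already produced predictions on a validation set and on the test set, that validation accuracy is the selection criterion, and that the selected members are combined by majority vote; these choices, the function names and the toy pool of six members are assumptions for illustration only.

```python
from collections import Counter


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def rank_pool(val_preds, val_truth):
    """Indices of the pool members, best validation accuracy first."""
    scores = [accuracy(p, val_truth) for p in val_preds]
    return sorted(range(len(val_preds)), key=lambda i: scores[i], reverse=True)


def single_best(val_preds, val_truth, test_preds):
    """Use only the classifier with the highest validation accuracy."""
    best = rank_pool(val_preds, val_truth)[0]
    return list(test_preds[best])


def static_selection(val_preds, val_truth, test_preds, fraction=0.5):
    """Keep the top `fraction` of the pool and combine it by majority vote."""
    ranked = rank_pool(val_preds, val_truth)
    keep = ranked[:max(1, int(len(ranked) * fraction))]
    n_samples = len(test_preds[0])
    return [Counter(test_preds[i][j] for i in keep).most_common(1)[0][0]
            for j in range(n_samples)]


# Hypothetical pool of six classifiers: one row of predictions per classifier.
val_truth = [0, 1, 1, 0]
val_preds = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 1, 1],
             [1, 0, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1]]
test_preds = [[0, 1, 1], [0, 1, 0], [1, 1, 0],
              [1, 0, 1], [1, 0, 0], [1, 0, 1]]

print(single_best(val_preds, val_truth, test_preds))       # [0, 1, 1]
print(static_selection(val_preds, val_truth, test_preds))  # [0, 1, 0]
```

Both approaches fix their choice once, before any test sample is seen, which is what separates them from the dynamic selection models in the same list.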


For the Decision Tree, the same parameters were used as when this model was evaluated as a monolithic classifier. For Perceptron, the maximum number of iterations chosen was 1000, and the tolerance was 10^-4. Table 14 shows all of these parameters. Table 15 shows the parameters used in each of the MCS models.

Table 14: Parameters used in the main classifier

| Main Classifier | Parameters |
|---|---|
| Decision Tree | Gini criterion, splitter best, random_state = 0 |
| Perceptron | tolerance = 0.001, max. iteration = 1000, random_state = 0 |

Table 15: Parameters used in MCS

| Model | Parameters |
|---|---|
| Static Selection | selection 50% |
| KNORA-E | k = 7, no prune, no indecision |
| KNORA-U | k = 7, no prune, no indecision |
| META-DES | Multinom. NB, Kp = 5, k = 7 |
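The parameter k in Table 15 is the number of validation neighbors that define the region of competence of a test sample. The sketch below illustrates how OLA and KNORA-U could use that region, assuming Euclidean distance and precomputed pool predictions on the validation set. The function names, the fallback for the case where no classifier is locally correct, and the toy data (which uses k = 3 instead of 7) are assumptions of this sketch, not the implementation used in the experiments.

```python
import math
from collections import Counter


def nearest(query, dsel_X, k):
    """Indices of the k validation samples closest to the query (Euclidean)."""
    order = sorted(range(len(dsel_X)), key=lambda i: math.dist(query, dsel_X[i]))
    return order[:k]


def local_hits(neighbors, dsel_y, dsel_preds):
    """For each classifier, how many neighbors it labels correctly."""
    return [sum(preds[i] == dsel_y[i] for i in neighbors) for preds in dsel_preds]


def ola(query, query_preds, dsel_X, dsel_y, dsel_preds, k=7):
    """OLA: trust the single classifier with the best local accuracy."""
    hits = local_hits(nearest(query, dsel_X, k), dsel_y, dsel_preds)
    return query_preds[hits.index(max(hits))]


def knora_u(query, query_preds, dsel_X, dsel_y, dsel_preds, k=7):
    """KNORA-U: each classifier gets one vote per neighbor it labels correctly."""
    hits = local_hits(nearest(query, dsel_X, k), dsel_y, dsel_preds)
    votes = Counter()
    for label, weight in zip(query_preds, hits):
        votes[label] += weight
    if max(votes.values()) == 0:
        # No classifier is locally correct: fall back to a plain majority vote
        # (an assumption of this sketch).
        votes = Counter(query_preds)
    return votes.most_common(1)[0][0]


# Hypothetical validation set with one feature and three pool members.
dsel_X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
dsel_y = [0, 0, 0, 1, 1, 1]
dsel_preds = [[0, 0, 0, 0, 0, 0],   # always predicts class 0
              [1, 1, 1, 1, 1, 1],   # always predicts class 1
              [0, 0, 1, 1, 0, 1]]   # mixed behavior

# query_preds holds what each pool member predicts for the query itself.
print(ola([1.05], [0, 1, 1], dsel_X, dsel_y, dsel_preds, k=3))      # 1
print(knora_u([0.05], [0, 1, 0], dsel_X, dsel_y, dsel_preds, k=3))  # 0
```

In both calls the decision depends on which classifiers are correct near the query, so a classifier that is poor overall can still be chosen where it is locally reliable.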

Table 16 shows the results for Data Set 1 from the MCS using Perceptron as the main classifier and Table 17 shows the results for Data Set 2. With Perceptron as the main classifier, the KNORA-U model has the best result in all metrics for Data Set 1, and META-DES has the best results in all metrics for Data Set 2.

With Decision Tree as the main classifier, META-DES has the best results in all metrics on Data Set 1, whereas Static Selection has the best result in all metrics when the models are evaluated on Data Set 2. Table 18 shows the results for Data Set 1 using Decision Tree as the main classifier and Table 19 shows the results for Data Set 2.

Table 16: Results from MCS using Perceptron as main classifier of test set from Data Set 1. The best result for each metric is highlighted

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Single Best | 0.9528 | 0.9532 | 0.9528 | 0.9528 |
| Static Selection | 0.9532 | 0.9538 | 0.9532 | 0.9530 |
| OLA | 0.9528 | 0.9533 | 0.9528 | 0.9528 |
| KNORA-U | 0.9562 | 0.9565 | 0.9562 | 0.9561 |
| KNORA-E | 0.9555 | 0.9558 | 0.9555 | 0.9554 |
| META-DES | 0.9539 | 0.9540 | 0.9539 | 0.9537 |


Table 17: Results from MCS using Perceptron as main classifier of test set from Data Set 2. The best result for each metric is highlighted

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| OLA | 0.8190 | 0.8219 | 0.8190 | 0.8197 |

Table 18: Results from MCS using Decision Tree as main classifier of test set from Data Set 1. The best result for each metric is highlighted

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| OLA | 0.8324 | 0.8335 | 0.8324 | 0.8314 |

Table 19: Results from MCS using Decision Tree as main classifier of test set from Data Set 2. The best result for each metric is highlighted

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| OLA | 0.8190 | 0.8219 | 0.8190 | 0.8197 |

Analyzing the results with Perceptron and Decision Tree as the main classifier, it is possible to see that, for a given data set and base classifier, all models achieve very similar results. However, the Single Best model has worse results on Data Set 2 than the other MCS, regardless of the main classifier. Besides, the Perceptron results are better than the Decision Tree results, mostly on Data Set 1. For Data Set 1, the best result was obtained using KNORA-U with Perceptron, and for Data Set 2 using Static Selection with Decision Tree.


4.4 FINAL CONSIDERATIONS

The results obtained using monolithic classifiers and MCS can now be compared: the best results for Data Set 1 are in Table 20 and those for Data Set 2 are in Table 21. Analyzing the data sets separately, the results are quite similar, but for Data Set 1 the MCS using KNORA-U with Perceptron was better than the best monolithic classifier, MLP. Both results were also better than the one obtained here by the SVM with the parameters of previous work (Anguita et al., 2013), which reached an accuracy of 0.9403 (Table 8). For Data Set 2, the monolithic model was better than the MCS one.

Table 20: Best results from monolithic and MCS for Data Set 1

| Model | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| MLP | 0.9477 | 0.9513 | 0.9477 | 0.9479 |
| KNORA-U (Perceptron) | 0.9562 | 0.9565 | 0.9562 | 0.9561 |

Table 21: Best results from monolithic and MCS for Data Set 2

| Model | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| Random Forest | 0.8804 | 0.8780 | 0.8780 | 0.8780 |
| Static Selection (Decision Tree) | 0.8398 | 0.8453 | 0.8398 | 0.8398 |

Comparing all results for both data sets, it is interesting to notice that the models performed better on Data Set 1. As Table 4 shows, that was expected, because Data Set 1 has more instances than Data Set 2: the first has 10299 instances and the second has 5744. Another point is that using MCS does not consistently achieve better results than monolithic classifiers. Table 22 shows the result obtained by Anguita et al. (2013) and the best result obtained for Data Set 1; the two are similar.

Table 22: Best results for Data Set 1 and results obtained by (Anguita et al., 2013)

| Model | Accuracy | Precision | Recall |
|---|---|---|---|
| SVM (Anguita et al., 2013) | 0.9600 | 0.9600 | 0.9600 |
| KNORA-U (Perceptron) | 0.9562 | 0.9565 | 0.9562 |


5 CONCLUSION AND FUTURE WORKS

Physical inactivity causes health problems, and the decrease in active time per person is one of the biggest concerns of the WHO. One of the first steps to address it is to identify and measure the activity performed by a person, and HAR is the tool for that. The HAR process is divided into two steps: the acquisition phase, which uses a sensor to collect information about the movement performed by a person, and the classification phase, which consists of classifying the data as one of the ADL.

This work tested several monolithic classifiers and several MCS, comparing their results using accuracy, precision, recall, and F1 score. It also compared the results of the monolithic classifiers on data normalized using L2 distance and MinMaxScaler with their results on non-normalized data, showing how these techniques affect the results of some classifiers, such as SVM and Bernoulli NB. Because normalization led to worse results for these classifiers, the MCS were trained using non-normalized data.

Analyzing the results, it is notable that the MCS do not achieve clearly better results than the monolithic classifiers. Even on Data Set 1, the best model, KNORA-U using Perceptron, is only slightly better than the best monolithic classifier, MLP: KNORA-U obtains an accuracy of 0.9562 and MLP an accuracy of 0.9477. On Data Set 2, the best monolithic classifier, Random Forest, achieves an accuracy of 0.8804, better than the accuracy of 0.8398 achieved by the best MCS, Static Selection with Decision Tree.

Besides these results, some points can be addressed in future work to complete the analysis of HAR using MCS:

- Compare the computational time of training all the models, to quantify the trade-off of using an MCS instead of a monolithic classifier.

- Perform a grid search over the parameters of some classifiers, such as MLP, SVM and the MCS, to find the best results for these classifiers.

- Use both data sets to train a single model, providing more instances in the training phase, and observe whether that leads to better results.

- Compare the achieved results using a statistical test to determine precisely which is better, monolithic classifiers or MCS.


REFERENCES

Anguita, D., Ghio, A., Oneto, L., Parra Perez, X., & Reyes Ortiz, J. L. (2013). A public domain dataset for human activity recognition using smartphones. In Proceedings of the 21st International European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 437–442.

Breiman, L. (1996). Bagging predictors. Machine learning, 24(2):123–140.

Britto Jr, A. S., Sabourin, R., & Oliveira, L. E. (2014). Dynamic selection of classifiers—a comprehensive review. Pattern Recognition, 47(11):3665–3680.

Dua, D. & Graff, C. (2017). UCI machine learning repository.

Elamvazuthi, I., Izhar, L., Capi, G., et al. (2018). Classification of human daily activities using ensemble methods based on smartphone inertial sensors. Sensors, 18(12):4132.

Freund, Y., Schapire, R. E., et al. (1996). Experiments with a new boosting algorithm. In icml, 96:148–156.

Giacinto, G. & Roli, F. (1999). Methods for dynamic classifier selection. In Proceedings 10th International Conference on Image Analysis and Processing, 659–664.

Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.

Jain, A. & Kanhangad, V. (2017). Human activity classification in smartphones using accelerometer and gyroscope sensors. IEEE Sensors Journal, 18(3):1169–1177.

Karantonis, D. M., Narayanan, M. R., Mathie, M., Lovell, N. H., & Celler, B. G. (2006). Implementation of a real-time human movement classifier using a triaxial accelerometer for ambulatory monitoring. IEEE transactions on information technology in biomedicine, 10(1):156–167.

Kittler, J., Hatef, M., & Duin, R. P. (1996). Combining classifiers. In Proceedings of 13th international conference on pattern recognition, 2:897–901.

Kurzynski, M., Woloszynski, T., & Lysiak, R. (2010). On two measures of classifier competence for dynamic ensemble selection-experimental comparative analysis. In 2010 10th International Symposium on Communications and Information Technologies, 1108–1113.

Lukowicz, P., Ward, J. A., Junker, H., Stäger, M., Tröster, G., Atrash, A., & Starner, T. (2004). Recognizing workshop activity using body worn microphones and accelerometers. In International conference on pervasive computing, 18–32.

Ni, B., Wang, G., & Moulin, P. (2011). Rgbd-hudaact: A color-depth video database for human daily activity recognition. In 2011 IEEE international conference on computer vision workshops (ICCV workshops), 1147–1153.

Ravi, N., Dandekar, N., Mysore, P., & Littman, M. L. (2005). Activity recognition from accelerometer data. In Aaai, 5(2005):1541–1546.


Sabourin, M., Mitiche, A., Thomas, D., & Nagy, G. (1993). Classifier combination for hand-printed digit recognition. In Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR’93), 163–166.

Wang, A., Chen, G., Yang, J., Zhao, S., & Chang, C.-Y. (2016). A comparative study on human activity recognition using inertial sensors in a smartphone. IEEE Sensors Journal, 16(11):4566–4578.

Warburton, D. E., Nicol, C. W., & Bredin, S. S. (2006). Health benefits of physical activity: the evidence. Cmaj, 174(6):801–809.

Woods, K., Kegelmeyer, W. P., & Bowyer, K. (1997). Combination of multiple classifiers using local accuracy estimates. IEEE transactions on pattern analysis and machine intelligence, 19(4):405–410.

World Health Organization (2010). Global recommendations on physical activity for health. World Health Organization.