that the classifier can use to locate analogous traits. Therefore, the traits of each instance are judged by the attribute values which are contained in every instance. The use of the term pattern is common in machine learning. It is defined as a sample of data that conveys useful information that can be repeated in a recognizable way. Generally it is possible to distinguish between:
• Numerical pattern: are measurable values, properties or characteristics;
• Categorical patterns: properties and qualitative characteristics of an object that cannot be mapped numerically;
• Sequential patterns: data sequences of fixed or variable length.
2.2.2 Machine Learning Techniques
Machine learning can be divided into multiple sectors on the basis of the types of work to be performed. Three major areas of studies are identified in the literature: Supervised, Unsupervised and Reinforcement method [59].
Supervised learning is the most popular paradigm for performing machine learning operations. The algorithms are presented with a set of classified instances from which they learn a way of classifying not-seen instances. It is called supervised because the scheme works under supervision by being provided with the actual outcome for each of the training instances. It is widely used for data where there is a precise mapping between input-output data. The dataset is labeled, meaning that the algorithm identifies the features explicitly and carries out predictions or classification accordingly. As the training period progresses, the algorithm is able to identify the relationships between the two variables such that it can predict a new outcome. The success of the classification can be measured by testing the generated model with an independent set of instances for which the true classifications are known but are hidden to the classifier. Supervised learning algorithms are task-oriented. Providing it with more and more examples, it becomes able to learn more properly and provide an output more accurately. To evaluate the performance of different methods, a common practice is that to divide the set of instances into two sets: training, test and usage. The training set is used to build the classifier model, while the test set is used to measure the accuracy of the classifier, i.e, it is a measure on how well it generalizes to unseen instances. Finally, usage phase uses the model for classification on new data whose class labels are unknown. Supervised learning could be distinguished in two main categories: predictive or directed so as well divided into two branches: Classification (Support Vector Machine, Naïve Bayesan, Decision Tree, K-Nearest Neighbors, Logistic Regression) and Regression (Linear Regression, Support Vector
25
Regression, Ensemble Method, Neural Networks). Three of the most popular families of classifiers that have been considered in this research work: Naive-Bayes (NB), Decision Tree (DT) and K-Nearby Neighbors (KNN). Their characteristic will be deeply analyzed in 2.2.3Unsupervised learning is the machine learning method of trying to locate hidden structure in unlabeled training data. The model is able to learn from data by finding implicit patterns.
Unsupervised Learning algorithms identify the data based on their densities, structures, segments and other similar features. Unsupervised learning can be descriptive or undirected and can be divided into: Clustering (K. Means, Hierarchical, Hidden Markov Model, Gauss Mixture Model) and Associations (Apriori, FP-Growth).
Reinforcement Learning covers more area of Artificial Intelligence that allows machines to interact with their dynamic environment in order to reach their goals. With this, machines and software agents are able to evaluate the ideal behavior in a specific context. This type of learning is different from Supervised Learning in the sense that the training data in the former has output mapping provided such that the model is capable of learning the correct answer.
Whereas, in the case of reinforcement learning, there is no key answer provided to the agent when they have to perform a particular task. When there is no training dataset, it learns from its own experience. Q-Learning and Monte Carlo Method are the most implemented algorithms. Figure 4 shows some of the machine learning techniques proposed in the literature.
Figure 4: Machine Learning techniques
26 2.2.3 Classification
In machine learning, classification is a supervised learning approach in which the algorithm learns from the data input given to it and then uses this learning to classify new observation.
The success of the classification can be measured by testing the generated model with an independent set of instances for which the true classifications are known but are hidden to the classifier [60]. When evaluating the performance of different methods, it is a common practice to divide the classification in three steps: Training, Testing and Usage. Training phase is used to build the classifier model from training instances. The classification algorithm finds the relationships between predictors and targets, then the relationships found are summarized in a model. Testing phase is used to measure the accuracy of the classifier checking the model on a
test sample whose class labels are known but not used for training
the model.Usage phase uses the model for classification on new data whose class labels are unknown.
A classifier refers to a mathematical function that maps input data to a category. In this section, will be discussed three popular families of classifiers that were employed in this thesis. The classifiers chosen are those most commonly used in the state of the art and collectively represent a range of different approaches.
2.2.3.1 Naïve Bayes
To Naïve Bayes classifiers family belong simple probabilistic classifiers based on applying Bayes' theorem with strong (naïve) independence assumptions between the features.
[61].
Let be 𝐕 a space of d-dimensional patterns and W = {w1, w2,… wn} a set of s disjoint classes consisting of elements of 𝐕. For each 𝐱 ∈ 𝐕 and for every wi ∈ W, we denote by p(𝐱|wi) the conditional probability density of 𝐱 given wi that is the probability density that the next pattern is 𝐱 under the assumption that its class is wi. For each wi ∈ W, we denote by P(wi) the a priori probability of wi or the probability, regardless of observation, that the next pattern to be classified is of class wi. For each 𝐱 ∈ 𝐕 we denote by 𝑝(𝐱) the probability density absolute of 𝐱, or the probability density of the next one patterns to be classified is let be x
For each wi∈ W and for each 𝐱 ∈ 𝐕 we denote by P(wi|𝐱) the posterior probability of wi given x, or the probability that having observed the pattern 𝐱, the membership class is wi. For the Bayes theorem: