Section II Material and Methods
4. Acoustical analysis of the infant cry
4.4. Cry Classification Methods
As reported in Sect. II, several studies show that the newborn cry contains specific features that enable automatic techniques to classify various diseases and conditions. In this PhD work, the aim of classification was to find cry features capable of differentiating newborns (that is, the development of their phonatory apparatus) during the first six months of life, in order to define normative ranges, which is one of the main objectives of the GR3 project.
All experiments were conducted using the Weka [249] machine learning software package with its standard settings.
For reasons of clarity, the classification methods are described below in the order in which they are used in the present work.
Then, their use through the WEKA tool is described. Results are reported in Chapter 7.
Fig. 4.13 shows the procedure used for the classification with WEKA. It consists of the following steps: dataset, feature extraction, feature selection, classification and evaluation.
Figure 4.12 – Flow-chart of the data mining process.
4.4.1 Pattern Recognition Methods
Generally, Automatic Cry Recognition is carried out in two steps. The first step is the acoustic processing, or feature extraction, while the second is known as pattern recognition or classification. Some systems perform an extra step between them, called feature selection. In the acoustical analysis, the cry signal is analysed to extract its most important characteristics. The set of characteristics obtained is represented by a vector which, for the purposes of the process, represents a pattern. The set of all vectors is then used to train the classifier. Later, a set of previously unseen feature vectors is presented to the trained classifier to measure its classification performance.
Genetic Algorithms
Genetic Algorithms, proposed by John Holland, are a family of computational models inspired by biological evolution. They encode a potential solution to a specific problem in a simple chromosome-like data structure and apply recombination operators to these structures so as to preserve relevant information. Two main problems must be solved when designing a genetic algorithm: the encoding and the evaluation function, also called the fitness function. The encoding of the chromosomes is fundamental in determining the best recombination and mutation. The evaluation function can be formulated mathematically or obtained through a simulation process. Alternatively, the fitness function can be based on the performance of the system under evaluation. In this type of genetic algorithm, a population of possible solutions (chromosomes or individuals) evolves in order to optimize the solution. Each candidate solution has a number of properties that can be altered. Individuals are selected stochastically from the population, with the fittest ones favoured, and each individual genome is modified (recombined and possibly randomly mutated) to produce a new generation. This method is used here to select the best attributes.
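The evolutionary loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the WEKA implementation: the bit-string encoding (bit i set means attribute i is selected), the function name genetic_feature_selection, the toy fitness function and all parameter values are assumptions made for the example.

```python
import random

def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              crossover_rate=0.6, mutation_rate=0.05, seed=0):
    """Evolve bit-string chromosomes; bit i == 1 means attribute i is selected.

    All parameter values are illustrative assumptions. Requires n_features >= 2.
    """
    rng = random.Random(seed)
    # Initial population of random chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        # Shift scores so that every selection weight is strictly positive.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        next_generation = []
        while len(next_generation) < pop_size:
            # Fitness-proportional (stochastic) selection of two parents.
            a, b = rng.choices(population, weights=weights, k=2)
            child = list(a)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point recombination
                child = a[:cut] + b[cut:]
            # Random mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_generation.append(child)
        population = next_generation
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

def toy_fitness(chromosome):
    """Hypothetical fitness: attributes 0 and 2 are useful, the others cost 0.5."""
    useful = {0, 2}
    hits = sum(1 for i, bit in enumerate(chromosome) if bit and i in useful)
    extras = sum(1 for i, bit in enumerate(chromosome) if bit and i not in useful)
    return hits - 0.5 * extras

selected = genetic_feature_selection(6, toy_fitness)
```

In a real attribute selection task the toy fitness would be replaced by a measure of subset quality, such as the score returned by an attribute evaluator.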
Random Forest
Random forests are built by training several decision trees in isolation and then combining their predictions [249]. A Random Forest ensemble uses a large number of individual, unpruned decision trees. Each tree is constructed with a simple top-down decision tree induction algorithm in which the tree is not pruned and, at each node, the inducer randomly samples N of the attributes and chooses the best split among them. An unlabeled instance is classified by majority vote. Three main choices must be made when constructing a random tree: (1) the method for splitting the leaves, (2) the type of predictor to use in each leaf, and (3) the method for injecting randomness into the trees. Another way of randomizing the decision tree is through histograms. The use of histograms has long been suggested as a way of making the features discrete while reducing the time needed to handle very large datasets. Typically, a histogram is created for each feature, and the bin boundaries are used as potential split points. Randomization is introduced by selecting the split point randomly in an interval around the best bin boundary. One important advantage of the random forest method is its ability to handle a very large number of input attributes. Another is that it is fast.
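The two ingredients named above, random sampling of attributes and majority voting, can be illustrated with a deliberately reduced sketch. It is not the WEKA RandomForest: each "tree" is a single-split stump, each stump is trained on a bootstrap sample (an assumption, since the text does not say how the training data of each tree are drawn), and the four data rows are invented for the example.

```python
import random
from collections import Counter

def train_stump(rows, labels, attributes):
    """Return the (attribute, threshold, left_label, right_label) split
    with the fewest training errors among the given attributes."""
    best = None
    for a in attributes:
        for threshold in sorted({r[a] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[a] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[a] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, a, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (None, None, majority, majority)
    return best[1:]

def predict_stump(stump, row):
    a, threshold, left_label, right_label = stump
    if a is None:
        return left_label
    return left_label if row[a] <= threshold else right_label

def train_forest(rows, labels, n_trees=25, n_sampled=1, seed=0):
    rng = random.Random(seed)
    n_attributes = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample of the training instances (assumption of this sketch).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random subset of n_sampled attributes considered for the split.
        attrs = rng.sample(range(n_attributes), n_sampled)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], attrs))
    return forest

def predict_forest(forest, row):
    """Classify an unlabeled instance by majority vote over all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Invented toy data: attribute 0 separates the classes, attribute 1 is noise.
rows = [(1.0, 5.0), (2.0, 3.0), (8.0, 4.0), (9.0, 6.0)]
labels = ["A", "A", "B", "B"]
forest = train_forest(rows, labels, n_trees=25, n_sampled=2)
prediction = predict_forest(forest, (1.5, 4.5))
```

A full random forest grows deep unpruned trees and repeats the attribute sampling at every node; the stump keeps only one node so that the voting mechanism stays visible.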
Neural Networks (Multilayer Perceptron)
Neural network methods construct a model using a network of interconnected units called neurons [Rokach (2010)]. The multilayer feedforward neural network is the most widely studied neural network, because it is suitable for representing functional relationships between a set of input attributes and one or more target attributes. In order to construct a classifier from a neural network inducer, a training step must be employed. The training step calculates the connection weights that optimize a given evaluation function on the training data. Various search methods can be used to train these networks, of which the most widely applied is backpropagation. Most neural networks are based on a unit called the perceptron.
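As an illustration of how a multilayer feedforward network maps input attributes to an output, the following sketch computes one forward pass. The sigmoid activation, the layer sizes and all weight values are assumptions chosen for the example. The training step (backpropagation) is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One unit: weighted sum of its inputs followed by a sigmoid activation."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))

def mlp_forward(inputs, hidden_layer, output_layer):
    """Forward pass through one hidden layer and one output layer.

    Each layer is a list of (weights, bias) pairs, one pair per neuron.
    """
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    return [neuron(hidden, w, b) for w, b in output_layer]

# Hypothetical weights for a network with 2 inputs, 2 hidden units, 1 output.
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_layer = [([1.0, -1.0], 0.0)]
outputs = mlp_forward([0.2, 0.8], hidden_layer, output_layer)
```

Training would adjust the weights and biases so that the outputs approach the target attribute values on the training data.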
Support Vector Machines (SVM or SMO)
SVMs are a set of supervised learning methods used for classification. For each group of objects divided into two classes, an SVM identifies the hyperplane having the maximum margin of separation. Support Vector Machines are based on the Structural Risk Minimization principle from computational learning theory. An SVM is a binary classifier that makes its decisions by constructing a linear decision boundary, or hyperplane, that optimally separates two classes. SVMs can also be used to separate classes that cannot be separated by a linear classifier; otherwise their application to cases of real interest would not be possible. In these cases, the coordinates of the objects are mapped into a space called the "feature space" by means of non-linear functions called "feature functions". The feature space is a high-dimensional space in which the two classes can be separated by a linear classifier.
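The role of the feature space can be shown with a small hand-built example. The four one-dimensional points below are invented. They cannot be separated by a single threshold on x, but after the assumed mapping x -> (x, x^2) a linear boundary separates them. The hyperplane is chosen by hand for illustration rather than learned by an SVM solver.

```python
def feature_map(x):
    """Hypothetical non-linear feature function: maps x to (x, x^2)."""
    return (x, x * x)

def linear_decision(z, w=(0.0, 1.0), b=-2.5):
    """Linear classifier in the feature space: sign of w.z + b."""
    score = w[0] * z[0] + w[1] * z[1] + b
    return 1 if score >= 0 else -1

# Invented samples (x, class): the outer points belong to class +1,
# the inner points to class -1, so no single threshold on x separates them.
samples = [(-2.0, 1), (-1.0, -1), (1.0, -1), (2.0, 1)]
predictions = [linear_decision(feature_map(x)) for x, _ in samples]
```

In the mapped space the two classes lie at x^2 = 4 and x^2 = 1, so the line x^2 = 2.5 lies midway between them.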
Logistic Regression
This is a regression model applied to cases where the dependent variable is of binary type.
Linear regression can easily be used for classification in domains with numeric attributes. Indeed, any regression technique, whether linear or nonlinear, can be used for classification. The trick is to perform a regression for each class, setting the output equal to 1 for training instances that belong to the class and 0 for those that do not. The result is a linear expression for the class. The statistical technique called logistic regression, by contrast, does not approximate the 0 and 1 values directly, which would risk illegitimate probability values when the target is overshot. Instead, logistic regression builds a linear model based on a transformed target variable.
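A common choice for the transformed target variable is the logit, log(P/(1-P)), and it is the choice assumed in the sketch below. The function names, weights and attribute values are hypothetical. The linear model predicts the logit, and inverting the transformation always yields a value strictly between 0 and 1.

```python
import math

def logit(p):
    """Transformed target variable: log-odds of the class probability p."""
    return math.log(p / (1.0 - p))

def logistic_probability(attributes, weights, bias):
    """Invert the logit of a linear model: the result lies strictly in (0, 1)."""
    z = bias + sum(w * a for w, a in zip(weights, attributes))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model with two numeric attributes.
p = logistic_probability([2.0, 1.0], weights=[1.5, -0.5], bias=-1.0)
```

Whatever the value of the linear expression, the resulting probability cannot overshoot 0 or 1, which is the problem that arises when 0 and 1 are approximated directly.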
Data Evaluation
The evaluation of the performance of classifiers is an important tool in pattern recognition, because it helps to understand the quality of an algorithm and to adjust its parameters. There are several metrics for evaluating the predictive performance of classifiers. Here we briefly describe those used in our experiments.
ROC Curves
One widely used performance measure is the Receiver Operating Characteristic (ROC) curve, which shows the trade-off between the true positive (TP) rate and the false positive (FP) rate. Precision indicates how many of the instances assigned to a class are classified correctly, thus giving a measure of the performance: Precision = TP/(TP+FP). Performance can also be measured through the so-called F-measure, the harmonic mean of precision and sensitivity: F-measure = 2*Sensitivity*Precision/(Sensitivity+Precision), where Sensitivity = TP/(TP+FN) and FN is the number of false negative instances. This measure is taken as an alternative to the area under the ROC curve.
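The formulas above translate directly into code. The following sketch uses invented counts (TP = 40, FP = 10, FN = 20) purely as a worked example. They are not results from this work.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)

def sensitivity(tp, fn):
    """Sensitivity = TP / (TP + FN)."""
    return tp / (tp + fn)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and sensitivity."""
    p = precision(tp, fp)
    s = sensitivity(tp, fn)
    return 2 * s * p / (s + p)

# Hypothetical confusion counts, for illustration only.
tp, fp, fn = 40, 10, 20
scores = (precision(tp, fp), sensitivity(tp, fn), f_measure(tp, fp, fn))
```

With these counts the precision is 0.8, the sensitivity is 2/3 and the F-measure is 80/110, about 0.727.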
4.4.2 WEKA
All the experiments were carried out by means of algorithms implemented in WEKA [249]. WEKA is an open source software issued under the GNU General Public License. It is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. WEKA contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
For the classification processing we have used two techniques included in WEKA, which are described below:
Select Attributes
In order to reduce the processing time, it is desirable to reduce the size of the input vectors without degrading the efficiency of the classification algorithm. To make this reduction systematic while keeping the most relevant features for the pattern recognition process, the Select Attributes command is applied. This command evaluates the attributes and selects those with the best performance for the input data considered. As Attribute Evaluator we chose the CfsSubsetEval operator, which evaluates the predictive ability of each attribute and the degree of redundancy between them, preferring sets of attributes that are highly correlated with the class but have low correlation with each other. Its output is a measure that guides the search for the most relevant features. The search method used together with CfsSubsetEval is GeneticSearch, which explores the solution space by means of a Genetic Algorithm.
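The trade-off between predictive ability and redundancy can be made concrete with the merit heuristic usually given for correlation-based feature selection, merit = k*r_cf / sqrt(k + k*(k-1)*r_ff), where k is the number of attributes in the subset, r_cf the mean attribute-class correlation and r_ff the mean attribute-attribute correlation. The sketch below assumes this formula. The text above does not state it, and whether CfsSubsetEval matches it term by term is an assumption. The numerical values are invented.

```python
import math

def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Merit of a subset of k attributes under the assumed CFS heuristic.

    High correlation with the class raises the merit; high correlation
    among the attributes themselves (redundancy) lowers it.
    """
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator

# Same class correlation, different redundancy (hypothetical values).
merit_independent = cfs_merit(4, 0.5, 0.0)  # attributes uncorrelated with each other
merit_redundant = cfs_merit(4, 0.5, 1.0)    # attributes fully redundant
```

With fully redundant attributes the merit of the subset falls back to that of a single attribute, so adding them brings no benefit.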
Classify
Classification is a machine learning task that covers any context in which a decision based on available data is made. The learned models from training data are then evaluated using a different test dataset to determine whether the models can be generalized to new cases.
4.4.3 Test and Validation Experiments
In the Experimental Results (Sect. III), the classification of the infant cry into two classes and into five classes will be presented. The first classification aims at distinguishing between low risk (LR) and high risk (HR) infants with 33 parameters. The second classification concerns children clinically validated as typically developing (TD) and was carried out with 22 parameters. The aim is to classify these newborns into 5 classes corresponding to the recording time points: 10 days and 6, 12, 18 and 24 weeks after birth.
The first action with WEKA was aimed at finding the most relevant features for a proper classification. This is done by running the Select Attributes command with CfsSubsetEval as Attribute Evaluator and GeneticSearch as search method. The best parameters selected with this function will be reported in the Experimental Results.
The classification algorithms used for our experiments are: Random Forest, SMO (Support Vector Machines), Multilayer Perceptron and Logistic (Logistic Regression). For all these methods, 10-fold cross-validation was used.
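The principle of 10-fold cross-validation can be sketched as follows: the instances are split into ten folds, and each fold is used once for testing while the other nine are used for training. This is a plain, unstratified sketch. The function name, the seed and the dataset size of 25 are assumptions for illustration, and refinements such as stratifying the folds by class are omitted.

```python
import random

def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Hypothetical dataset of 25 instances.
splits = list(k_fold_indices(25, k=10))
```

Each instance appears in exactly one test fold, so every instance is used for testing once and for training nine times. The performance figures of the ten runs are then averaged.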
The analyses presented in the results section were performed iteratively, starting from 22 attributes (mean, median, SD, min, max of F0, F1, F2, F3, CU length and number of CUs) and varying the number of attributes with the Select Attributes command, which identifies the most effective ones. For testing, the following options were used: full training set, percentage split at 66% and 10-fold cross-validation. The best results were obtained with all parameters.