CHAPTER 1: SETTING THE SCENE
1.4 REMOTE SENSING PRODUCT GENERATION
1.4.4 Machine learning classifiers
One of the core technologies affecting the fourth industrial revolution is the advancement in artificial intelligence and machine learning techniques (Schwab 2016). Figure 1-3 illustrates the various types of algorithms available.
Brownlee (2017)
Figure 1-3 Types of machine learning algorithms
Machine learning can be formulated as Equation 1-4, whereby an output variable (y) is predicted by a function (f) in conjunction with an input variable (x).
$y = f(x)$    Equation 1-4
where y is the output variable;
f is the function that best maps the input variable to the output variable; and
x is the input variable.
One can discriminate between parametric and non-parametric machine learning algorithms. In a parametric model, the form of the function is selected beforehand, and the coefficients of the function are subsequently calculated from the data. Examples of parametric models are linear regression and logistic regression. In non-parametric machine learning, the form of the function is itself learned from the data. Non-parametric algorithms are normally more computationally intensive and require more data. Support vector machines (SVM) and neural network classifications (NNC), as well as decision tree classifiers such as random forests (RF), are good examples of non-parametric machine learning algorithms. Machine learning classification techniques generally involve a trade-off between the reduction of bias, introduced by using simplifying assumptions, and the reduction of variance, both of which should be as small as possible for a good classification. Decision tree classifiers normally produce low bias values, whereas linear regression leads to high bias values.
The following is a description of the most important machine learning algorithms, taken from Brownlee (2017):
Linear regression
Linear regression relates one set of variables to another set of variables by means of a linear function, given in Equation 1-5. A coefficient of determination (R²) is calculated together with a linear regression, and expresses how well the relationship between the two variables can be described by a linear function.
$y = gx + b$    Equation 1-5
where y is the dependent variable;
x is the explanatory variable;
g is the gain or fractional relationship between the two variables; and
b is the bias between the two variables.
Logistic regression
Logistic regression uses a non-linear function to provide binary classifications of values 0 or 1, and can also provide the probability of a data instance belonging to either class. This method requires feature properties that are related to the class and that are not highly correlated with one another.
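The logistic function and the resulting class probability can be illustrated with a minimal sketch. It assumes a single, mean-centred input variable and fits the two coefficients by plain gradient descent. The function names, the learning rate and the sample values are hypothetical and do not come from Brownlee (2017).

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, rate=0.1, epochs=2000):
    """Estimate intercept b0 and coefficient b1 by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= rate * sum(residuals) / n
        b1 -= rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

def predict_probability(x, b0, b1):
    """Probability that an instance with value x belongs to class 1."""
    return sigmoid(b0 + b1 * x)

# Hypothetical, mean-centred feature values with binary class labels.
xs = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
p_low = predict_probability(-2.0, b0, b1)
p_high = predict_probability(2.0, b0, b1)
label_low = 1 if p_low >= 0.5 else 0    # threshold the probability at 0.5
label_high = 1 if p_high >= 0.5 else 0
```

Thresholding the probability at 0.5 yields the binary class, while the probability itself indicates how confident the assignment is.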
Linear discriminant analysis
Linear discriminant analysis is a technique preferred to logistic regression, especially when dealing with more than two classes. For each input variable, the mean value per class and the variance calculated across all classes are required. This technique assumes that the data have a normal distribution and provides a probability for each class, ultimately allocating it to the class with the highest prediction value.
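For a single input variable, the quantities listed above (mean per class, variance across all classes, and a score per class) can be sketched as follows. The discriminant function used here is the standard one for Gaussian classes with a shared variance. The class names and values are hypothetical.

```python
import math
from collections import defaultdict

def fit_lda(xs, labels):
    """Return per-class means, per-class priors and the pooled variance."""
    groups = defaultdict(list)
    for x, label in zip(xs, labels):
        groups[label].append(x)
    n, k = len(xs), len(groups)
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    priors = {c: len(v) / n for c, v in groups.items()}
    pooled_var = sum((x - means[c]) ** 2
                     for c, v in groups.items() for x in v) / (n - k)
    return means, priors, pooled_var

def discriminant_scores(x, means, priors, pooled_var):
    """Linear discriminant score of x for every class."""
    return {c: x * means[c] / pooled_var
               - means[c] ** 2 / (2 * pooled_var)
               + math.log(priors[c])
            for c in means}

def posteriors(x, model):
    """Convert the scores into class probabilities that sum to one."""
    scores = discriminant_scores(x, *model)
    top = max(scores.values())            # subtract the maximum for stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def classify(x, model):
    """Allocate x to the class with the highest probability."""
    probs = posteriors(x, model)
    return max(probs, key=probs.get)

# Hypothetical single-variable training data for three classes.
xs = [2.0, 2.5, 3.0, 5.0, 5.5, 6.0, 8.0, 8.5, 9.0]
labels = ["pine"] * 3 + ["eucalyptus"] * 3 + ["wattle"] * 3
model = fit_lda(xs, labels)
```

Calling `classify(5.4, model)` allocates the value to the class whose mean it lies closest to, here "eucalyptus", because the priors are equal in this example.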
Classification and regression trees
Classification and regression trees (CART) are binary classifications splitting a group of classes into the most dissimilar sub-groups until ending at the individual class/leaf level. Each node/split uses a single input variable. Decision trees have a high variance and perform best in an ensemble.
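The recursive binary splitting can be sketched as below. The sketch assumes Gini impurity as the measure of how dissimilar the two sub-groups are, which is a common choice but is not stated in the text above. The feature values and class names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a group: 0 when all labels are identical."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (variable index, threshold, score) of the purest binary split."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def build_tree(rows, labels):
    """A leaf is a class label; a node is (variable, threshold, left, right)."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels)
    if split is None:  # identical rows with different labels: majority leaf
        return max(set(labels), key=labels.count)
    j, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] >= t]
    return (j, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))

def predict(tree, row):
    """Follow the splits, one input variable per node, down to a leaf."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] < t else right
    return tree

# Hypothetical samples: (vegetation index, canopy height in metres).
rows = [(0.05, 0.1), (0.10, 0.2), (0.60, 0.3),
        (0.65, 0.5), (0.80, 15.0), (0.85, 20.0)]
labels = ["water", "water", "grass", "grass", "forest", "forest"]
tree = build_tree(rows, labels)
```

Because the tree is grown until every leaf is pure, it fits the training samples exactly, which is one way to see why single trees tend to have a high variance.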
Naïve Bayes algorithm
The Naïve Bayes algorithm calculates two types of probabilities from training data assumed to be normally distributed, namely the probability of each class and the conditional probability of each class given each x value.
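A minimal Gaussian sketch of this calculation is given below. It assumes that each class is summarised by its prior probability and by a mean and standard deviation per input variable, and that the per-variable densities are simply multiplied. All names and values are hypothetical.

```python
import math
from collections import defaultdict

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    exponent = math.exp(-((x - mean) ** 2) / (2 * std ** 2))
    return exponent / (math.sqrt(2 * math.pi) * std)

def fit(rows, labels):
    """Summarise each class by its prior and per-variable (mean, std)."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    model = {}
    for label, members in groups.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / (len(column) - 1)
            stats.append((mean, math.sqrt(var)))
        model[label] = (len(members) / len(rows), stats)
    return model

def class_scores(model, row):
    """Class prior multiplied by the Gaussian density of every x value."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        for x, (mean, std) in zip(row, stats):
            score *= gaussian_pdf(x, mean, std)
        scores[label] = score
    return scores

def predict(model, row):
    """Assign the class with the highest score."""
    scores = class_scores(model, row)
    return max(scores, key=scores.get)

# Hypothetical two-variable training data for two classes.
rows = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0),
        (3.0, 4.1), (3.2, 3.9), (2.8, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit(rows, labels)
```

The sketch needs at least two samples per class and some spread in every variable, otherwise the standard deviation is undefined or zero.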
K-nearest neighbour
In k-nearest neighbour, the entire training dataset is considered and the algorithm attempts to find the k training samples with properties closest to those of the object being classified, where proximity is measured as Euclidean distance. This method requires a large amount of computer memory and is computationally intensive, as it needs to compare an observation with each individual training sample.
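The comparison against every training sample can be sketched in a few lines. The sketch assumes that the k closest samples vote and that the most frequent class wins. The value of k, the function name and the sample values are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training samples."""
    # Every training sample is compared with the query, which is why the
    # whole training set must be kept in memory.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-variable training samples.
train_rows = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
              (6.0, 6.5), (7.0, 7.0), (6.5, 6.0)]
train_labels = ["grass", "grass", "grass", "forest", "forest", "forest"]

near_grass = knn_predict(train_rows, train_labels, (1.2, 1.4))
near_forest = knn_predict(train_rows, train_labels, (6.4, 6.6))
```

Setting `k=1` reduces the vote to the single closest training sample.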
Learning vector quantisation
This algorithm uses neural networks to reduce the training samples to only the most relevant ones defining the classes before applying a k-nearest neighbour classification. It thus produces results similar to those of k-nearest neighbour, but is computationally more efficient.
Support vector machine (SVM)
SVMs find boundaries between classes in feature space that are equal distances away from the classes in terms of class standard deviation. These boundaries are termed “hyperplanes” and can have non-linear multidimensional shapes. Hyperplanes are mathematically described by support vectors, which rely on clean samples, but SVMs are able to deal with a limited number of samples that do not need to be distributed equally among all classes.
Random forest (RF)
RF uses ensembles of decision tree classifiers, run across subsets of samples and variables. By applying bootstrap aggregation, or bagging, this classifier improves model performance because it reduces the variance without increasing the bias. The classifications from the individual trees are combined, and each sample is assigned the class with which it is associated most frequently. RFs require a larger number of samples, distributed evenly across all classes.
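Bagging and the majority vote can be sketched as follows. To keep the example short, each "tree" is reduced to a single split on one randomly chosen variable, which is a simplification and not a full random forest. The function names, the number of trees, the seed and the sample values are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """One-split tree on a single variable, minimising misclassified samples."""
    values = sorted(set(r[feature] for r in rows))
    majority = Counter(labels).most_common(1)[0][0]
    best = (len(rows) + 1, None, majority, majority)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = Counter(y for r, y in zip(rows, labels) if r[feature] < t)
        right = Counter(y for r, y in zip(rows, labels) if r[feature] >= t)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0]
        errors = len(rows) - left[left_label] - right[right_label]
        if errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return feature, t, left_label, right_label

def stump_predict(stump, row):
    feature, t, left_label, right_label = stump
    if t is None:  # no split was possible: always predict the majority class
        return left_label
    return left_label if row[feature] < t else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train every tree on a bootstrap sample and a random variable."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random variable subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def forest_predict(forest, row):
    """Assign the class predicted most frequently across all trees."""
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical samples: (vegetation index, canopy height in metres).
rows = [(0.20, 3.0), (0.30, 4.0), (0.25, 5.0), (0.35, 4.5), (0.30, 3.5),
        (0.70, 18.0), (0.80, 20.0), (0.75, 22.0), (0.85, 19.0), (0.90, 25.0)]
labels = ["shrub"] * 5 + ["forest"] * 5
forest = fit_forest(rows, labels)
```

Each tree sees a slightly different version of the data, so individual errors tend to differ between trees and are outvoted in the final class assignment.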
AdaBoost
This is an ensemble technique that creates a strong classifier from several weak decision tree classifiers by creating several layers of models on top of each other, where each model attempts to describe the error of its parent. Models are added until the prediction is correct or a maximum number of layers has been produced. Very clean training data is needed for this approach.
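The layering of weak models can be sketched with the classic AdaBoost weight update. The sketch assumes one-split decision trees (stumps) on a single variable and class labels of +1 and -1. The function names and sample values are hypothetical.

```python
import math

def stump_predict(stump, x):
    """A stump predicts `sign` below its threshold and `-sign` above it."""
    threshold, sign = stump
    return sign if x < threshold else -sign

def best_stump(xs, ys, weights):
    """Pick the threshold and orientation with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = ([values[0] - 1]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 1])
    best, best_error = None, float("inf")
    for t in thresholds:
        for sign in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict((t, sign), x) != y)
            if error < best_error:
                best, best_error = (t, sign), error
    return best, best_error

def ensemble_predict(ensemble, x):
    """Weighted vote of all stumps added so far."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def fit_adaboost(xs, ys, max_rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(max_rounds):
        stump, error = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, so the next stump focuses on them.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Stop as soon as the combined prediction is correct for all samples.
        if all(ensemble_predict(ensemble, x) == y for x, y in zip(xs, ys)):
            break
    return ensemble

# Hypothetical data that no single stump can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = fit_adaboost(xs, ys)
```

On this data a single stump misclassifies at least two samples, whereas the weighted vote of three stumps reproduces all six labels, after which no further models are added.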
Artificial and deep learning neural networks
Neural networks are computational models based on mathematics and threshold logic algorithms. They consist of several layers, iteratively analysing and defining relationships between variables of input data and output classes, using mathematical functions such as addition and subtraction. Neural networks have been rebranded over the decades as they became more complex and approximated the function of a human brain, with descriptors ranging from “neural networks” and “artificial neural networks” to “deep learning”. The classification process entails the definition and training of a neural network, after which it can be applied to data to classify it.
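The layered structure and the threshold logic can be illustrated with a very small network. In this sketch the weights are chosen by hand rather than trained, purely to show how two layers of threshold units combine additions and subtractions into a decision that a single unit cannot make (the exclusive-or of two binary inputs). All names and weight values are hypothetical.

```python
def step(z):
    """Threshold activation: the unit outputs 1 when its input sum is >= 0."""
    return 1 if z >= 0 else 0

def layer(inputs, weights, biases):
    """Each unit adds its weighted inputs and bias, then applies the threshold."""
    return [step(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
            for unit_weights, b in zip(weights, biases)]

# Hand-chosen (not trained) weights, for illustration only.
HIDDEN_WEIGHTS = [[1, 1], [1, 1]]
HIDDEN_BIASES = [-0.5, -1.5]     # unit 0 acts as OR, unit 1 acts as AND
OUTPUT_WEIGHTS = [[1, -1]]
OUTPUT_BIASES = [-0.5]           # fires when OR is on and AND is off

def forward(inputs):
    """Pass the inputs through the hidden layer and then the output layer."""
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]

outputs = {pair: forward(pair) for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

In practice the weights are not set by hand but are adjusted iteratively during training until the network output matches the known classes of the training samples.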
This section provided an overview of the remote sensing technologies applicable to forestry and summarised several product generation techniques. The next section formulates the research problem.