Chapter 4 Materials and methods
4.3 Data partitioning, cross-validation and testing
4.3.2 Cross-validation
In $Z$-fold cross-validation, the training data is split into $Z$ approximately equally sized subsamples or "folds", and the validation process is repeated $Z$ times. During repetition $z$ ($z = 1, 2, \ldots, Z$), the images in fold $z$ are used as validation data, with the images in the remaining $Z - 1$ folds being used as training data. The hyperparameter set with the lowest validation error, averaged across all folds, is selected as the best hyperparameter set for the algorithm and used during the final training and testing phase. The same partitioning of the data into folds was used for all hyperparameter sets tested. Pseudocode for the cross-validation procedure used in this work is given in figure 4-8. This information is also depicted visually in figures 4-9 and 4-10.
           Number of images
Class      Training             Test    Total
1          60 (12 per fold)     20      80
2          90 (18 per fold)     30      120
3          60 (12 per fold)     20      80
Totals:    210                  70      280

           Number of images
Class      Training             Test    Total
1          30 (6 per fold)      10      40
2          116 (23 per fold)    38      154
3          80 (16 per fold)     26      106
Totals:    226                  74      300
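The fold sizes in the tables above (for example 60 training images at 12 per fold) imply five folds. The following is a minimal sketch of how image indices could be partitioned into approximately equal folds and how the training and validation indices for one repetition could be assembled. The function names, the random seed and the use of NumPy are assumptions made for illustration; this sketch does not stratify by class and is not necessarily the partitioning routine used in this work.

```python
import numpy as np


def partition_into_folds(n_images, n_folds, seed=0):
    """Shuffle image indices and split them into n_folds near-equal folds."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_images)
    # array_split accepts sizes that do not divide evenly; the first
    # (n_images % n_folds) folds receive one extra index.
    return np.array_split(shuffled, n_folds)


def train_validation_split(folds, z):
    """Return (training, validation) indices for repetition z (0-based)."""
    validation = folds[z]
    training = np.concatenate([f for k, f in enumerate(folds) if k != z])
    return training, validation


# 116 images in 5 folds, as for class 2 in the second table.
folds = partition_into_folds(n_images=116, n_folds=5)
train_idx, val_idx = train_validation_split(folds, z=0)
print([len(f) for f in folds], len(train_idx), len(val_idx))
```

Because 116 is not divisible by 5, one fold holds 24 indices and the others 23, which matches the "approximately equally sized" wording and the rounded per-fold count in the table.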
given number of folds Z;
given all training images, partitioned into folds 1, 2, ..., Z;
given feature extraction hyperparameter options H_F;
given classification hyperparameter options H_C;
given known class information (labels) of all training images;

% do cross-validation
for each cross-validation run z = 1, 2, ..., Z
    training images = all training images not in fold z;
    validation images = all training images in fold z;
    for each feature extraction hyperparameter combination i in H_F
        for each classification hyperparameter combination j in H_C
            training features = features extracted from ...
                training images using hyperparameter combination i;
            classifier = model trained using training features ...
                and hyperparameter combination j;
            validation features = features extracted from validation ...
                images using hyperparameter combination i;
            predicted labels = classes of validation images ...
                predicted by classifier;
            error(z,i,j) = fraction of incorrectly predicted labels ...
                of validation images;
        end
    end
end

% calculate average errors to determine optimal hyperparameter set
for each i
    for each j
        error(i,j) = average of error(z,i,j) across all z;
    end
end
best hyperparameter combination = i and j with lowest error(i,j);
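The pseudocode can be turned into a runnable sketch along the following lines. The callbacks `extract_features` and `train_classifier`, and the toy data, feature extractor and threshold classifier used to exercise them, are hypothetical stand-ins and not the methods of this work. One deliberate difference from the pseudocode: features are extracted once per fold and feature setting rather than inside the innermost loop, which gives the same errors when extraction is deterministic.

```python
import numpy as np


def cross_validate(images, labels, folds, H_F, H_C,
                   extract_features, train_classifier):
    """Return error[z, i, j] and the (i, j) with the lowest fold-averaged error.

    extract_features(images, h_f) -> feature array       (hypothetical callback)
    train_classifier(features, labels, h_c) -> predict()  (hypothetical callback)
    """
    Z = len(folds)
    error = np.zeros((Z, len(H_F), len(H_C)))
    for z in range(Z):
        val_idx = folds[z]
        train_idx = np.concatenate([folds[k] for k in range(Z) if k != z])
        for i, h_f in enumerate(H_F):
            # Features depend only on z and i, so extract them once here.
            train_feats = extract_features(images[train_idx], h_f)
            val_feats = extract_features(images[val_idx], h_f)
            for j, h_c in enumerate(H_C):
                predict = train_classifier(train_feats, labels[train_idx], h_c)
                predicted = predict(val_feats)
                # Fraction of incorrectly predicted validation labels.
                error[z, i, j] = np.mean(predicted != labels[val_idx])
    mean_error = error.mean(axis=0)  # average across folds
    best = np.unravel_index(np.argmin(mean_error), mean_error.shape)
    return error, (int(best[0]), int(best[1]))


# --- Toy stand-ins, for illustration only ---------------------------------
def toy_extract_features(imgs, scale):
    return imgs * scale


def toy_train_classifier(features, labels, offset):
    # Threshold halfway between the two class means, shifted by `offset`.
    midpoint = (features[labels == 0].mean() + features[labels == 1].mean()) / 2
    threshold = midpoint + offset
    return lambda f: (f[:, 0] >= threshold).astype(int)


images = np.arange(20, dtype=float).reshape(20, 1)   # 20 one-pixel "images"
labels = (images[:, 0] >= 10).astype(int)            # two classes
folds = [np.arange(k, 20, 4) for k in range(4)]      # Z = 4 interleaved folds

error, best = cross_validate(images, labels, folds,
                             H_F=[1.0, 2.0], H_C=[0.0, 100.0],
                             extract_features=toy_extract_features,
                             train_classifier=toy_train_classifier)
print(error.mean(axis=0), best)
```

Passing the feature extractor and classifier as callbacks keeps the loop structure independent of any particular texture feature or model, mirroring the way the pseudocode treats both as interchangeable steps.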
Figure 4-9: Detailed development of an inferential sensor showing the calculation of error rates $\mathcal{E}_{z,i,j}$ for $Z$-fold cross-validation. [Flow diagram not reproduced. Labelled elements: process / product scene; image acquisition; image set; data partitioning; training images; test images; training images for CV; validation images; pre-processing; dimensionality reduction; dimensionality reduction parameters; training texture feature set; validation texture feature set; model training; trained classifier; model application; predicted class information; known class information; feature extraction hyperparameters; classification hyperparameters; error rate. Loops: repeat for each fold; repeat for each hyperparameter combination. Key: external input; step input / output; final output; algorithm step.]
Figure 4-9 adds more detail to the diagram of the development of a visual-based inferential sensor shown at the beginning of this chapter (figure 4-1, p. 64). It includes the data partitioning step (explained in section 4.3.1) and the procedure for calculating the error rates $\mathcal{E}_{z,i,j}$ for cross-validation. The subscript $z$ refers to the fold, $i$ indicates that the $i$th feature extraction hyperparameter set was used, and $j$ indicates that the $j$th classification hyperparameter set was used.
Figure 4-10: The error rates $\mathcal{E}_{z,i,j}$ are used to determine the optimal hyperparameter combination $\{h_F, h_C\}$. Here $n_F$ is the number of feature extraction hyperparameter settings and $n_C$ is the number of classification hyperparameter settings. [Diagram not reproduced: the error rates are averaged across folds for each hyperparameter combination, and the minimum of these averages is selected.]

An error rate for a specific combination of $z$, $i$ and $j$ is simply the fraction of incorrectly predicted labels of the validation data. It is calculated by comparing the predicted labels (class information) of the images with the known labels:

$$\mathcal{E}_{z,i,j} = 1 - \frac{\sum_{n=1}^{n_{val}} \left( y_{p,n} == y_{k,n} \right)}{n_{val}} \tag{4-1}$$

If there are $n_{val}$ validation images, $\mathbf{y}_p$ and $\mathbf{y}_k$ are vectors of length $n_{val}$ containing the predicted and known class labels, respectively. The "==" is a logical equality operator, so that $A == B$ returns 1 when $A = B$ and 0 when $A \neq B$. The operator is applied in a pairwise fashion to each pair of entries $y_{p,n}$ and $y_{k,n}$ of $\mathbf{y}_p$ and $\mathbf{y}_k$.
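Equation 4-1 amounts to one minus the classification accuracy on the validation images. A minimal sketch is given below; the function name `error_rate` and the example labels are illustrative assumptions, not values from this work.

```python
import numpy as np


def error_rate(predicted, known):
    """Fraction of incorrectly predicted labels, following equation 4-1."""
    predicted = np.asarray(predicted)
    known = np.asarray(known)
    if predicted.shape != known.shape:
        raise ValueError("predicted and known labels must have the same length")
    n_val = known.size
    # (predicted == known) is 1 where the labels agree and 0 elsewhere.
    return 1.0 - np.sum(predicted == known) / n_val


# Four validation images, one of which is misclassified.
example = error_rate([1, 2, 3, 3], [1, 2, 2, 3])
print(example)
```

With three of four labels matching, the sum in the numerator is 3 and the error rate is 1 - 3/4.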
The dimensionality reduction and modelling steps shown in figure 4-9 are explained in sections 4.5 and 4.6, respectively.
Figure 4-10 shows how the error rates $\mathcal{E}_{z,i,j}$ are used to determine the optimal hyperparameter combination. Let the set of possible feature extraction hyperparameter settings be $H_F = \{h_{F,1}, h_{F,2}, \ldots, h_{F,n_F}\}$ and let the set of possible classification hyperparameter settings be $H_C = \{h_{C,1}, h_{C,2}, \ldots, h_{C,n_C}\}$, where $n_F$ is the number of feature extraction hyperparameter settings and $n_C$ is the number of classification hyperparameter settings. The error rate $\mathcal{E}_{i,j}$ for each hyperparameter combination $\{h_{F,i}, h_{C,j}\}$ is calculated as the average error rate across all folds $z = 1, 2, \ldots, Z$ for that hyperparameter combination:

$$\mathcal{E}_{i,j} = \frac{1}{Z} \sum_{z=1}^{Z} \mathcal{E}_{z,i,j}$$
The minimum $\mathcal{E}_{i,j}$ is then determined, and the hyperparameter combination that led to this minimum error rate is the optimal combination $\{h_F, h_C\}$. These optimal hyperparameters are used during the final training and test phase.
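As a small worked illustration of the averaging and minimum selection, the sketch below stores the per-fold error rates in a three-dimensional array indexed by fold, feature extraction setting and classification setting. The numbers are made up for the example and are not results from this work; the array layout is an assumption.

```python
import numpy as np

# Made-up error rates E[z, i, j]: 2 folds, 2 feature extraction settings,
# 3 classification settings. Illustrative values only.
E = np.array([
    [[0.30, 0.20, 0.25],
     [0.10, 0.40, 0.35]],
    [[0.20, 0.30, 0.25],
     [0.20, 0.30, 0.45]],
])

# Average across folds (axis 0) to obtain one error rate per (i, j).
E_mean = E.mean(axis=0)

# Locate the smallest average; argmin works on the flattened array, so
# unravel_index converts the flat position back into (i, j).
i_best, j_best = np.unravel_index(np.argmin(E_mean), E_mean.shape)
print(E_mean, int(i_best), int(j_best))
```

Note that `np.argmin` returns the first minimum it encounters, so if several combinations tie for the lowest average error this sketch picks the one with the lowest index; how ties are resolved in this work is not stated in the text.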