
Chapter 4 Materials and methods

4.3 Data partitioning, cross-validation and testing

4.3.2 Cross-validation

In $Z$-fold cross-validation, the training data is split into $Z$ approximately equally sized subsamples or “folds”, and the validation process is repeated $Z$ times. During repetition $z$ ($z = 1, 2, \ldots, Z$), the images in fold $z$ are used as validation data, with the images in the remaining $Z - 1$ folds being used as training data. The hyperparameter set with the lowest validation error, averaged across all folds, is selected as the best hyperparameter set for the algorithm and used during the final training and testing phase. The same partitioning of the data into folds was used for all hyperparameter sets tested. Pseudocode for the cross-validation procedure used in this work is given in figure 4-8. This information is also depicted visually in figures 4-9 and 4-10.

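| Class   | Training images  | Test images | Total images |
|---------|------------------|-------------|--------------|
| 1       | 60 (12 per fold) | 20          | 80           |
| 2       | 90 (18 per fold) | 30          | 120          |
| 3       | 60 (12 per fold) | 20          | 80           |
| Totals: | 210              | 70          | 280          |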

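| Class   | Training images   | Test images | Total images |
|---------|-------------------|-------------|--------------|
| 1       | 30 (6 per fold)   | 10          | 40           |
| 2       | 116 (23 per fold) | 38          | 154          |
| 3       | 80 (16 per fold)  | 26          | 106          |
| Totals: | 226               | 74          | 300          |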
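The per-fold counts in the tables above suggest that the folds were formed class by class. The following is a minimal sketch of one way to do that, not the procedure used in this work (section 4.3.1 describes the actual partitioning). The function name `assign_folds`, the random seed and the use of five folds (inferred from 60 images at 12 per fold) are assumptions made for illustration.

```python
import numpy as np

def assign_folds(labels, n_folds, seed=0):
    """Return a fold index (0 .. n_folds-1) for every training image.

    Images are shuffled within each class and then dealt out to the folds,
    so every fold receives roughly the same number of images per class.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(labels.shape[0], dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # array_split tolerates class sizes that are not divisible by n_folds
        for fold, chunk in enumerate(np.array_split(idx, n_folds)):
            fold_of[chunk] = fold
    return fold_of

# Training-set class sizes from the first table: 60, 90 and 60 images.
labels = np.repeat([1, 2, 3], [60, 90, 60])
fold_of = assign_folds(labels, n_folds=5)
for cls in (1, 2, 3):
    print(cls, np.bincount(fold_of[labels == cls], minlength=5))
```

For the first table this gives 12, 18 and 12 images per fold for classes 1, 2 and 3. For class sizes that do not divide evenly, such as the 116 images in the second table, the folds differ in size by at most one image.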

given number of folds Z;
given all training images, partitioned into folds 1, 2, ..., Z;
given feature extraction hyperparameter options H_F;
given classification hyperparameter options H_C;
given known class information (labels) of all training images;

% do cross-validation
for each cross-validation run z = 1, 2, ..., Z
    training images = all training images not in fold z;
    validation images = all training images in fold z;
    for each feature extraction hyperparameter combination i in H_F
        for each classification hyperparameter combination j in H_C
            training features = features extracted from ...
                training images using hyperparameter combination i;
            classifier = model trained using training features ...
                and hyperparameter combination j;
            validation features = features extracted from validation ...
                images using hyperparameter combination i;
            predicted labels = classes of validation images ...
                predicted by classifier;
            error(z,i,j) = fraction of incorrectly predicted labels ...
                of validation images;
        end
    end
end

% calculate average errors to determine optimal hyperparameter set
for each i
    for each j
        error(i,j) = average of error(z,i,j) across all z;
    end
end
best hyperparameter combination = i and j with lowest error(i,j);
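The cross-validation loop in the pseudocode above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: `cross_validation_errors`, `extract_features` and `train_classifier` are hypothetical names, and the toy feature extractor and nearest-neighbour classifier are stand-ins for the methods described in sections 4.5 and 4.6. Feature extraction is moved outside the innermost loop because it does not depend on the classification hyperparameters; the resulting errors are the same.

```python
import numpy as np

def cross_validation_errors(images, labels, fold_of, feature_options,
                            classifier_options, extract_features,
                            train_classifier):
    """Return error[z, i, j] for every fold z and hyperparameter pair (i, j)."""
    images, labels, fold_of = map(np.asarray, (images, labels, fold_of))
    folds = np.unique(fold_of)
    error = np.empty((len(folds), len(feature_options), len(classifier_options)))
    for z, fold in enumerate(folds):
        val = fold_of == fold                      # validation images: fold z
        for i, h_f in enumerate(feature_options):
            train_x = extract_features(images[~val], h_f)
            val_x = extract_features(images[val], h_f)
            for j, h_c in enumerate(classifier_options):
                predict = train_classifier(train_x, labels[~val], h_c)
                # fraction of incorrectly predicted validation labels
                error[z, i, j] = np.mean(predict(val_x) != labels[val])
    return error

# Hypothetical stand-ins so that the sketch runs end to end.
def extract_features(images, h_f):
    # Keep only the first h_f columns of each (already vectorised) image.
    return images[:, :h_f]

def train_classifier(features, labels, h_c):
    # k-nearest-neighbour vote with k = h_c; returns a prediction function.
    def predict(x):
        d = np.linalg.norm(x[:, None, :] - features[None, :, :], axis=2)
        nearest = labels[np.argsort(d, axis=1)[:, :h_c]]
        return np.array([np.bincount(row).argmax() for row in nearest])
    return predict

# Synthetic data for illustration only (not data from this work).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
images = rng.normal(size=(60, 4)) + labels[:, None]
fold_of = np.tile(np.arange(5), 12)                # 5 folds, 4 images per class each
error = cross_validation_errors(images, labels, fold_of, [1, 2, 4], [1, 3],
                                extract_features, train_classifier)
print(error.shape)  # (5, 3, 2): folds x feature settings x classifier settings
```

The fold assignment `fold_of` is computed once and reused for every hyperparameter combination, matching the statement that the same partitioning was used for all hyperparameter sets tested.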


Figure 4-9: Detailed development of an inferential sensor showing the calculation of error rates $\mathcal{E}_{z,i,j}$ for $Z$-fold cross-validation.

[Flowchart. Algorithm steps: image acquisition, data partitioning, pre-processing, dimensionality reduction, model training, model application. Inputs and outputs: process / product scene, image set, training images, test images, training images for CV, validation images, training texture feature set, validation texture feature set, dimensionality reduction parameters, trained classifier, predicted class information, known class information, feature extraction hyperparameters, classification hyperparameters, error rate. Loops: repeat for each fold; repeat for each hyperparameter combination. Key: external input, step input / output, final output, algorithm step.]


Figure 4-9 adds more detail to the diagram of the development of a visual-based inferential sensor shown at the beginning of this chapter (figure 4-1, p. 64). It includes the data partitioning step (as explained in section 4.3.1) and the procedure for calculating the error rates $\mathcal{E}_{z,i,j}$ for cross-validation. The subscript $z$ refers to the fold, $i$ indicates that the $i$th feature extraction hyperparameter set was used and $j$ indicates that the $j$th classification hyperparameter set was used.

An error rate for a specific combination of $z$, $i$ and $j$ is simply the fraction of incorrectly predicted labels of the validation data, and is calculated by comparing the predicted labels (class information) of the images with the known labels:

$$\mathcal{E}_{z,i,j} = 1 - \frac{\sum_{n=1}^{N_{val}} \left(\mathcal{C}_{p,n} == \mathcal{C}_{k,n}\right)}{N_{val}} \qquad \text{(4-1)}$$

Here $N_{val}$ is the number of validation images, and $\mathcal{C}_p$ and $\mathcal{C}_k$ are vectors of length $N_{val}$ containing the predicted and known class labels, respectively. The “==” is a logical equality operator, so that $A == B$ returns 1 when $A = B$ and 0 when $A \neq B$. The operator is applied in a pairwise fashion to each entry $\mathcal{C}_{p,n}$ and $\mathcal{C}_{k,n}$ of $\mathcal{C}_p$ and $\mathcal{C}_k$.

The details regarding the dimensionality reduction and modelling steps in figure 4-9 will be explained in sections 4.5 and 4.6, respectively.

Figure 4-10: The error rates $\mathcal{E}_{z,i,j}$ are used to determine the optimal hyperparameter combination $\{\mathcal{H}_F, \mathcal{H}_C\}$. Here $h_f$ is the number of feature extraction hyperparameter settings and $h_c$ is the number of classification hyperparameter settings.

[Diagram: for each hyperparameter combination, the error rates are averaged across the folds; the minimum of these averages is then selected.]
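The error rate of equation 4-1 can be computed directly from the two label vectors. The snippet below is a minimal sketch; the function name `error_rate` and the example labels are illustrative assumptions, not taken from this work.

```python
import numpy as np

def error_rate(predicted, known):
    """Fraction of validation images whose predicted label differs from the known label."""
    predicted = np.asarray(predicted)
    known = np.asarray(known)
    if predicted.shape != known.shape:
        raise ValueError("predicted and known labels must have the same length")
    # (predicted == known) is 1 where the labels agree and 0 elsewhere
    return 1.0 - np.sum(predicted == known) / known.size

# Illustrative labels only: 2 of the 5 predictions are wrong.
print(error_rate([1, 2, 2, 3, 1], [1, 2, 3, 3, 2]))  # 0.4
```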

Figure 4-10 shows how the error rates $\mathcal{E}_{z,i,j}$ are used to determine the optimal hyperparameter combination. Let the set of possible feature extraction hyperparameters be $\boldsymbol{\mathcal{H}}_F = \{\mathcal{H}_{F,1}, \mathcal{H}_{F,2}, \ldots, \mathcal{H}_{F,h_f}\}$ and let the set of possible classification hyperparameters be $\boldsymbol{\mathcal{H}}_C = \{\mathcal{H}_{C,1}, \mathcal{H}_{C,2}, \ldots, \mathcal{H}_{C,h_c}\}$, where $h_f$ is the number of feature extraction hyperparameter settings and $h_c$ is the number of classification hyperparameter settings. The error rate $\mathcal{E}_{i,j}$ for each hyperparameter combination $\{\mathcal{H}_{F,i}, \mathcal{H}_{C,j}\}$ is calculated as the average error rate across all folds $z = 1, 2, \ldots, Z$ for that hyperparameter combination:

$$\mathcal{E}_{i,j} = \frac{1}{Z} \sum_{z=1}^{Z} \mathcal{E}_{z,i,j}$$

The minimum $\mathcal{E}_{i,j}$ is then determined, and the hyperparameter combination that led to this minimum error rate is the optimal combination $\{\mathcal{H}_F, \mathcal{H}_C\}$. These optimal hyperparameters will be used during the final training and test phase.
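The averaging and selection steps can be illustrated with a small array of error rates. This is a sketch only: the numbers are invented for illustration and are not results from this work, and the array layout (folds along the first axis) is an assumption.

```python
import numpy as np

# Illustrative error rates only:
# Z = 2 folds, h_f = 2 feature extraction settings, h_c = 3 classification settings.
error_zij = np.array([
    [[0.30, 0.20, 0.25],
     [0.15, 0.10, 0.20]],
    [[0.20, 0.30, 0.25],
     [0.25, 0.10, 0.10]],
])

# Average over the folds z to obtain one error rate per combination (i, j).
error_ij = error_zij.mean(axis=0)

# Locate the smallest average; indices are zero-based.
i_best, j_best = np.unravel_index(np.argmin(error_ij), error_ij.shape)
print(error_ij)
print(i_best, j_best)  # 1 1
```

Here the second feature extraction setting combined with the second classification setting has the lowest average error (0.10). If several combinations tie for the minimum, `np.argmin` returns the first one in row-major order, so a tie-breaking rule would have to be chosen separately.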