
Chapter 4 Materials and methods

4.3 Data partitioning, cross-validation and testing

4.3.2 Cross-validation

In 𝑍-fold cross-validation, the training data is split into 𝑍 approximately equally sized subsamples or “folds”, and the validation process is repeated 𝑍 times. During repetition 𝑧 (𝑧 = 1, 2, …, 𝑍), the images in fold 𝑧 are used as validation data, and the images in the remaining 𝑍 − 1 folds are used as training data. The hyperparameter set with the lowest validation error, averaged across all folds, is selected as the best hyperparameter set for the algorithm and used during the final training and testing phase. The same partitioning of the data into folds was used for all hyperparameter sets tested. Pseudocode for the cross-validation procedure used in this work is given in figure 4-8. This information is also depicted visually in figures 4-9 and 4-10.
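The partitioning step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in this work: the function names `make_folds` and `split_for_run`, the round-robin assignment, the fixed random seed, and the values of 210 images and 5 folds are all assumptions made for the example.

```python
import random


def make_folds(n_images, n_folds, seed=0):
    """Assign image indices 0..n_images-1 to n_folds roughly equal folds."""
    indices = list(range(n_images))
    # A fixed seed gives the same partition every time, so every
    # hyperparameter set can be evaluated on identical folds.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices out round-robin; fold sizes differ by at most one.
    return [indices[z::n_folds] for z in range(n_folds)]


def split_for_run(folds, z):
    """Return (training indices, validation indices) for run z (0-based)."""
    validation = list(folds[z])
    training = [idx for k, fold in enumerate(folds) if k != z for idx in fold]
    return training, validation


folds = make_folds(n_images=210, n_folds=5)
train_idx, val_idx = split_for_run(folds, z=0)
print(len(train_idx), len(val_idx))  # 168 42
```

The sketch ignores class membership when shuffling. If each fold must contain a fixed number of images per class, the same round-robin assignment could be applied to each class separately.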

Class      Number of images
           Training             Test    Total
1          60 (12 per fold)     20      80
2          90 (18 per fold)     30      120
3          60 (12 per fold)     20      80
Totals:    210                  70      280

Class      Number of images
           Training             Test    Total
1          30 (6 per fold)      10      40
2          116 (23 per fold)    38      154
3          80 (16 per fold)     26      106
Totals:    226                  74      300

    given number of folds Z;
    given all training images, partitioned into folds 1, 2, ..., Z;
    given feature extraction hyperparameter options H_F;
    given classification hyperparameter options H_C;
    given known class information (labels) of all training images;

    % do cross-validation
    for each cross-validation run z = 1, 2, ..., Z
        training images = all training images not in fold z;
        validation images = all training images in fold z;
        for each feature extraction hyperparameter combination i in H_F
            for each classification hyperparameter combination j in H_C
                training features = features extracted from ...
                    training images using hyperparameter combination i;
                classifier = model trained using training features ...
                    and hyperparameter combination j;
                validation features = features extracted from validation ...
                    images using hyperparameter combination i;
                predicted labels = classes of validation images ...
                    predicted by classifier;
                error(z,i,j) = fraction of incorrectly predicted labels ...
                    of validation images;
            end
        end
    end

    % calculate average errors to determine optimal hyperparameter set
    for each i
        for each j
            error(i,j) = average of error(z,i,j) across all z;
        end
    end

    best hyperparameter combination = i and j with lowest error(i,j);
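The pseudocode above translates fairly directly into Python. The following is a hedged sketch rather than the implementation used in this work: `cross_validate`, `extract_features` and `train_classifier` are hypothetical names, the "images" are short lists of pixel values, the feature is a single summary statistic, and the classifier is a toy k-nearest-neighbour rule. All of these stand in for the real texture features and models.

```python
from itertools import product
from statistics import mean


def cross_validate(images, labels, folds, feature_options, classifier_options,
                   extract_features, train_classifier):
    """Return (best_i, best_j, mean_errors) for the given folds and options."""
    combos = list(product(range(len(feature_options)), range(len(classifier_options))))
    errors = {}  # errors[(z, i, j)]: validation error of run z for combination (i, j)
    for z, fold in enumerate(folds):
        val_idx = list(fold)
        train_idx = [n for k, f in enumerate(folds) if k != z for n in f]
        for i, j in combos:
            train_feats = [extract_features(images[n], feature_options[i]) for n in train_idx]
            train_labels = [labels[n] for n in train_idx]
            classifier = train_classifier(train_feats, train_labels, classifier_options[j])
            predicted = [classifier(extract_features(images[n], feature_options[i]))
                         for n in val_idx]
            wrong = sum(p != labels[n] for p, n in zip(predicted, val_idx))
            errors[(z, i, j)] = wrong / len(val_idx)
    # Average each combination's error across all folds, then pick the lowest.
    mean_errors = {(i, j): mean([errors[(z, i, j)] for z in range(len(folds))])
                   for i, j in combos}
    best_i, best_j = min(mean_errors, key=mean_errors.get)
    return best_i, best_j, mean_errors


def extract_features(image, statistic):
    """Toy feature (assumption): one summary statistic of the pixel values."""
    return mean(image) if statistic == "mean" else max(image) - min(image)


def train_classifier(features, labels, k):
    """Toy k-nearest-neighbour classifier (assumption) on one scalar feature."""
    def predict(x):
        nearest = sorted(zip(features, labels), key=lambda fl: abs(fl[0] - x))[:k]
        votes = [label for _, label in nearest]
        return max(set(votes), key=votes.count)
    return predict


# Made-up data: the two classes differ in mean intensity but not in range.
images = [[0, 1, 2], [1, 2, 3], [0, 2, 1], [1, 3, 2],
          [10, 11, 12], [11, 12, 13], [10, 12, 11], [11, 13, 12]]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
folds = [[0, 4], [1, 5], [2, 6], [3, 7]]  # Z = 4, one image per class per fold
feature_options = ["mean", "range"]       # plays the role of H_F
classifier_options = [1, 3]               # plays the role of H_C (k in k-NN)

best_i, best_j, mean_errors = cross_validate(
    images, labels, folds, feature_options, classifier_options,
    extract_features, train_classifier)
print(feature_options[best_i], classifier_options[best_j])
```

As in the pseudocode, features are re-extracted inside the inner loop. Caching the features per fold and per feature extraction setting would avoid repeated work, at the cost of departing slightly from the structure shown above.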


Figure 4-9: Detailed development of an inferential sensor showing the calculation of error rates ℰ_{𝑧,𝑖,𝑗} for 𝑍-fold cross-validation.

[Flow diagram. Elements: process / product scene; image acquisition; image set; data partitioning; training images; test images; training images for CV; validation images; pre-processing; dimensionality reduction; dimensionality reduction parameters; training texture feature set; validation texture feature set; model training; trained classifier; model application; predicted class information; known class information; error rate; feature extraction hyperparameters; classification hyperparameters. Loops: repeat for each fold; repeat for each hyperparameter combination. Key: external input; step input / output; final output; algorithm step.]


Figure 4-9 adds more detail to the diagram explaining the development of a visual-based inferential sensor shown at the beginning of this chapter (figure 4-1, p. 64). It includes the data partitioning step (as explained in section 4.3.1) and the procedure for calculating the error rates ℰ_{𝑧,𝑖,𝑗} for cross-validation. The subscript 𝑧 refers to the fold, 𝑖 indicates that the 𝑖th feature extraction hyperparameter set was used and 𝑗 indicates that the 𝑗th classification hyperparameter set was used.

The error rate for a specific combination of 𝑧, 𝑖 and 𝑗 is simply the fraction of incorrectly predicted labels of the validation data, and is calculated by comparing the predicted labels (class information) of the images with the known labels:

$$\mathcal{E}_{z,i,j} = 1 - \frac{1}{N_{val}} \sum_{n=1}^{N_{val}} \left( \mathcal{C}_{p,n} == \mathcal{C}_{k,n} \right) \tag{4-1}$$

If there are 𝑁_{𝑣𝑎𝑙} validation images, 𝒞_𝑝 and 𝒞_𝑘 are vectors of length 𝑁_{𝑣𝑎𝑙} containing the predicted and known class labels, respectively. The “==” is a logical equality operator, so that 𝐴 == 𝐵 returns 1 when 𝐴 = 𝐵 and 0 when 𝐴 ≠ 𝐵. The operator is applied in a pairwise fashion to each entry 𝒞_{𝑝,𝑛} and 𝒞_{𝑘,𝑛} of 𝒞_𝑝 and 𝒞_𝑘.

The details regarding the dimensionality reduction and modelling steps in figure 4-9 will be explained in sections 4.5 and 4.6, respectively.

Figure 4-10: The error rates ℰ_{𝑧,𝑖,𝑗} are used to determine the optimal hyperparameter combination {ℋ_𝐹, ℋ_𝐶}. Here ℎ_𝑓 is the number of feature extraction hyperparameter settings and ℎ_𝑐 is the number of classification hyperparameter settings.

[Diagram: for each hyperparameter combination the error rates are averaged across the folds, and the minimum of the averaged error rates is selected.]
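The error rate defined above, the fraction of incorrectly predicted validation labels, is straightforward to compute. A minimal sketch follows, in which the function name `error_rate` and the example labels are assumptions made for illustration:

```python
def error_rate(predicted_labels, known_labels):
    """Fraction of validation images whose predicted label differs from the known label."""
    if len(predicted_labels) != len(known_labels):
        raise ValueError("label vectors must have the same length")
    n_val = len(known_labels)
    if n_val == 0:
        raise ValueError("at least one validation image is required")
    # Pairwise comparison: each matching pair contributes 1, each mismatch 0.
    n_correct = sum(1 for p, k in zip(predicted_labels, known_labels) if p == k)
    return 1 - n_correct / n_val


predicted = [1, 2, 2, 3]  # made-up predicted class labels
known = [1, 2, 3, 3]      # made-up known class labels
print(error_rate(predicted, known))  # 0.25
```

Three of the four labels match, so the fraction correct is 0.75 and the error rate is 0.25.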

Figure 4-10 shows how the error rates ℰ_{𝑧,𝑖,𝑗} are used to determine the optimal hyperparameter combination. Let the set of possible feature extraction hyperparameters be 𝓗_𝐹 = {ℋ_{𝐹,1}, ℋ_{𝐹,2}, …, ℋ_{𝐹,ℎ𝑓}} and let the set of possible classification hyperparameters be 𝓗_𝐶 = {ℋ_{𝐶,1}, ℋ_{𝐶,2}, …, ℋ_{𝐶,ℎ𝑐}}, where ℎ_𝑓 is the number of feature extraction hyperparameter settings and ℎ_𝑐 is the number of classification hyperparameter settings. The error rate ℰ_{𝑖,𝑗} for each hyperparameter combination {ℋ_{𝐹,𝑖}, ℋ_{𝐶,𝑗}} is calculated as the average error rate across all folds 𝑧 = 1, 2, …, 𝑍 for that hyperparameter combination:

$$\mathcal{E}_{i,j} = \frac{1}{Z} \sum_{z=1}^{Z} \mathcal{E}_{z,i,j}$$

The minimum ℰ_{𝑖,𝑗} is then determined, and the hyperparameter combination that led to this minimum error rate is the optimal combination {ℋ_𝐹, ℋ_𝐶}. These optimal hyperparameters will be used during the final training and test phase.
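As a small worked example of the averaging and selection step, consider 𝑍 = 3 folds, ℎ_𝑓 = 2 feature extraction settings and ℎ_𝑐 = 1 classification setting. The error rates below are made up purely for illustration and are not results from this work.

```latex
\begin{align*}
\mathcal{E}_{1,1} &= \tfrac{1}{3}\left(0.30 + 0.20 + 0.40\right) = 0.30 \\
\mathcal{E}_{2,1} &= \tfrac{1}{3}\left(0.10 + 0.20 + 0.15\right) = 0.15 \\
\min_{i,j}\, \mathcal{E}_{i,j} &= \mathcal{E}_{2,1} = 0.15
\end{align*}
```

With these illustrative numbers, the second feature extraction setting combined with the single classification setting would be selected as the optimal combination.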

In document Image texture analysis for inferential sensing in the process industries (Page 87-91)