FH model evaluation - Logistic regression model

4 ML model development

4.2 ML model building

4.2.2 Logistic regression model

4.2.2.1 FH model evaluation

The confusion matrix for the FH wear prognostic model is provided in Table 6. The accuracy metrics discussed in Section 3.2.5.2 were calculated from Table 6 and are provided in Table 7. An ROC curve for the model's performance on the test set, illustrated in Figure 33, was produced using the R package ROCR. The associated AUC value was 0.831.

Table 6: Confusion matrix for logistic regression model of FH wear prognostics

Predicted Class = 1 Predicted Class = 0

Actual Class = 1 3’633 6’405

Table 7: Confusion matrix metrics for logistic regression model of FH wear prognostics

Metric Value

Sensitivity 0.362

Specificity 0.993

Accuracy 0.917

F1 0.954

Figure 33: ROC curve for logistic regression model of FH wear prognostics
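To make the link between an ROC curve and its AUC value concrete, the following is a minimal Python sketch; it is not the ROCR workflow used for the results reported here. The function names `roc_points` and `auc`, as well as the toy labels and scores, are hypothetical and chosen for illustration only. The sketch sweeps the classification threshold from high to low, records the false positive rate and true positive rate at each step, and integrates the resulting curve with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) obtained by sweeping the threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sort by descending score so that each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Handle tied scores together so that a tie yields a single segment.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data (not from the thesis): 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

curve = roc_points(labels, scores)
toy_auc = auc(curve)
print(f"AUC = {toy_auc:.3f}")  # AUC = 0.750
```

A curve that hugs the point (0,1) yields an AUC close to 1, whereas a curve close to the diagonal yields an AUC close to 0.5.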

4.2.2.2 TD model evaluation

As with FH, the confusion matrix for the TD wear prognostic model is provided in Table 8 and the metrics discussed in Section 3.2.5.2 are provided in Table 9. An ROC curve for the model's performance on the test set, illustrated in Figure 34, was produced using the R package ROCR. The associated AUC value was 0.975.

Table 8: Confusion matrix for logistic regression model of TD wear prognostics

Predicted Class = 1 Predicted Class = 0

Actual Class = 1 37’944 3’454

Table 9: Confusion matrix metrics for logistic regression model of TD wear prognostics

Metric Value

Sensitivity 0.917

Specificity 0.998

Accuracy 0.957

F1 0.959

Figure 34: ROC curve for logistic regression model of TD wear prognostics

4.2.2.3 HW model evaluation

The confusion matrix for the HW wear prognostic model is provided in Table 10 and the metrics discussed in Section 3.2.5.2 are provided in Table 11. An ROC curve, which is illustrated in Figure 35, was produced for the model performance on the test set. The associated AUC value was 0.757.

Table 10: Confusion matrix for logistic regression model of HW wear prognostics

Predicted Class = 1 Predicted Class = 0

Actual Class = 1 619 6’481

Actual Class = 0 218 75’584

Table 11: Confusion matrix metrics for logistic regression model of HW wear prognostics

Metric Value

Sensitivity 0.087

Specificity 0.997

Accuracy 0.919

F1 0.958

Figure 35: ROC curve for logistic regression model of HW wear prognostics
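As a worked check, the metrics in Table 11 can be recomputed from the counts in Table 10. The Python sketch below assumes the standard definitions of sensitivity, specificity, accuracy and F1; Section 3.2.5.2 is not reproduced here, so these definitions are an assumption, and the function name `confusion_metrics` is hypothetical. Under these definitions, the tabulated F1 of 0.958 is reproduced only when class 0 is treated as the positive class. That reading is inferred from the counts and is not stated in the text.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Compute common metrics from the four cells of a binary confusion matrix.

    tp: actual 1, predicted 1    fn: actual 1, predicted 0
    fp: actual 0, predicted 1    tn: actual 0, predicted 0
    """
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        # F1 with class 1 treated as the positive class.
        "f1_class_1": 2 * tp / (2 * tp + fp + fn),
        # F1 with class 0 treated as the positive class.
        "f1_class_0": 2 * tn / (2 * tn + fn + fp),
    }


# Counts taken from Table 10 (HW model).
hw = confusion_metrics(tp=619, fn=6481, fp=218, tn=75584)
for name, value in hw.items():
    print(f"{name}: {value:.3f}")
# sensitivity: 0.087
# specificity: 0.997
# accuracy: 0.919
# f1_class_1: 0.156
# f1_class_0: 0.958
```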

4.2.2.4 FS model evaluation

The confusion matrix for the FS wear prognostic model is provided in Table 12 and the metrics discussed in Section 3.2.5.2 are provided in Table 13. An ROC curve, which is illustrated in Figure 36, was produced for the model performance on the test set. The associated AUC value was 0.742.

Table 12: Confusion matrix for logistic regression model of FS wear prognostics

Predicted Class = 1 Predicted Class = 0

Actual Class = 1 10’821 16’409

Actual Class = 0 6’805 48’867

Table 13: Confusion matrix metrics for logistic regression model of FS wear prognostics

Metric Value

Sensitivity 0.397

Specificity 0.878

Accuracy 0.720

F1 0.482

Figure 36: ROC curve for logistic regression model of FS wear prognostics

4.2.2.5 FT model evaluation

The confusion matrix for the FT wear prognostic model is provided in Table 14 and the metrics discussed in Section 3.2.5.2 are provided in Table 15. An ROC curve, which is illustrated in Figure 37, was produced for the model performance on the test set. The associated AUC value was 0.756.

Table 14: Confusion matrix for logistic regression model of FT wear prognostics

Predicted Class = 1 Predicted Class = 0

Actual Class = 1 1’150 6’256

Actual Class = 0 73 75’423

Table 15: Confusion matrix metrics for logistic regression model of FT wear prognostics

Metric Value

Sensitivity 0.155

Specificity 0.999

Accuracy 0.924

F1 0.985

Figure 37: ROC curve for logistic regression model of FT wear prognostics

4.2.2.6 Logistic regression summary

The logistic regression model performed well in providing FH prognostics. The model achieved an accuracy of 0.917 and an AUC of 0.831. The ROC curve was indicative of a healthy model: it was smooth and tended toward the point (0,1) before bending away toward the point (1,1). This indicates that the model was capable of separating the target variable classes with high accuracy.

The model performed exceptionally well in terms of TD prognostics. It achieved an accuracy of over 95% and an AUC of over 0.95. The ROC curve was indicative of an extremely performant model that is capable of accurately separating the target variable classes. The ROC curve tended sharply toward the (0,1) point before cornering toward the (1,1) point. This jagged shape of the curve raises concerns over the behaviour of the model and could indicate either that the target variable classes were very easily separable or that the data exhibited strange behaviour.

The model struggled to perform well when it came to HW prognostics. Although it achieved an accuracy of over 90%, it had an exceptionally low sensitivity of less than 10%. This indicates that the model struggled to predict the positive cases of the target variable. The model’s inability to separate the target variable classes is reflected in its ROC curve, which is relatively straight and flat. Furthermore, the model achieved a relatively low AUC of 0.753. The poor performance might be due to the relatively biased nature of the logistic regression model.

The model also struggled to perform well in terms of FS prognostics. It achieved a relatively low accuracy of 72% as well as a relatively low AUC of 0.742. The ROC curve was also relatively straight and flat, indicating that the model had trouble separating the target variable classes. Again, the poor performance might be due to the relatively biased nature of logistic regression.

Finally, the logistic regression model performed moderately well on the FT measurements. The model achieved an accuracy of 92%; however, its AUC was only 0.756. This indicates that the high accuracy was largely attributable to target variable class imbalance, and that the model still struggled to identify cases with a positive target variable class.
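The effect of class imbalance on accuracy can be illustrated with the FT counts from Table 14. The Python sketch below compares the model's accuracy with that of a trivial classifier that always predicts the majority class (class 0). It also computes the balanced accuracy, that is, the mean of sensitivity and specificity. Balanced accuracy is not used elsewhere in this chapter and is introduced here purely for illustration, as are the variable names.

```python
# Counts taken from Table 14 (FT model).
tp, fn = 1150, 6256   # actual class 1: predicted 1, predicted 0
fp, tn = 73, 75423    # actual class 0: predicted 1, predicted 0
total = tp + fn + fp + tn

# Accuracy of the fitted model.
model_accuracy = (tp + tn) / total

# Accuracy of a trivial classifier that always predicts the majority class.
baseline_accuracy = max(tp + fn, fp + tn) / total

# Mean of sensitivity and specificity, which is insensitive to class proportions.
balanced_accuracy = (tp / (tp + fn) + tn / (tn + fp)) / 2

print(f"model accuracy:    {model_accuracy:.3f}")     # 0.924
print(f"baseline accuracy: {baseline_accuracy:.3f}")  # 0.911
print(f"balanced accuracy: {balanced_accuracy:.3f}")  # 0.577
```

The model improves on the majority-class baseline by only about one percentage point, which is consistent with the interpretation that the high accuracy is driven mainly by the large share of negative cases.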

4.2.3 ANN model

The ANN models were built with the help of the ANN2 R package. This package makes the neuralnet function available, which accepts various hyperparameters as well as the set of input features and the target variable as parameters. The following section describes the process by which the five ANN models were developed, as per the Model Building phase depicted in Figure 20.

In document Implementation of machine learning techniques for railway wheel prognostics (Page 84-90)