Model Training and Phase Division - Case Study 1

CHAPTER 4 Case Study 1 – Three Tank Simulation

4.4 Model Training and Phase Division

MPLS models were trained on the NOC data to relate the five measured variables to the quality variable. The effect of building separate models for different phases was also investigated. To choose the number of principal components (PCs) to retain for the global and operational phase models, cross validation was applied to the NOC training data. A compromise was made between the cumulative variance explained in Y and the overall mean squared error (MSE) between the actual and predicted quality variable. Information about the model sets for each type of partitioning is given in table 4.5.

Table 4.5: Data Partitioning and Cumulative Variance Explained for Different Model Sets – Case Study 1

Phase Division      Phase  Reference Intervals  No. PCs Retained  Cumulative Variance Explained in Y  Cumulative Variance Explained in X
Global              1      1 – 435              2                 98.7 %                              27.5 %
Operational Phases  1      1 – 36               4                 78.2 %                              52.2 %
                    2      37 – 68              3                 86.5 %                              50.9 %
                    3      69 – 140             1                 88.8 %                              50.5 %
                    4      141 – 435            1                 94.7 %                              12.2 %
MP algorithm        1      1 – 67               1                 38.1 %                              24.5 %
                    2      68 – 435             2                 99.2 %                              25.8 %

The MPPLS algorithm was initialised with a_ini = 1 PC, and was able to find a split that would minimise the mean squared error (MSE) across the entire batch duration. Two correlation phases were identified, split at the 67th interval. This coincides almost exactly with the start of the 3rd operational phase (interval 69, table 4.5). At this point, both pump flow rates have started to decrease and tank level control is implemented as the levels approach their set points. The tank levels are influenced not only by the pump flow rates, but also by the controlled valve positions between the tanks. This changes the correlation structure of the variables, so the split is a reasonable phase division.
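The idea of searching for a split that minimises the MSE over the whole batch can be sketched as an exhaustive scan over candidate split points. This is only an illustration under stated assumptions, not the MPPLS algorithm itself: each phase is predicted by its own mean instead of a PLS model, the names `segment_sse` and `best_split` are hypothetical, and the toy signal is invented for the example.

```python
from statistics import fmean


def segment_sse(y, start, stop):
    """Sum of squared errors when y[start:stop] is predicted by its own mean.

    Stand-in for a per-phase model; the thesis uses PLS models here.
    """
    seg = y[start:stop]
    m = fmean(seg)
    return sum((v - m) ** 2 for v in seg)


def best_split(y, min_len=2):
    """Return (k, mse): phase 1 is y[:k], phase 2 is y[k:].

    k is None when no split beats the single (global) model.
    """
    n = len(y)
    best_k, best_mse = None, segment_sse(y, 0, n) / n  # global-model baseline
    for k in range(min_len, n - min_len + 1):
        # Every interval counts equally, so the phase SSEs are pooled
        # and divided by the total number of intervals.
        mse = (segment_sse(y, 0, k) + segment_sse(y, k, n)) / n
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k, best_mse


# Invented toy signal with an obvious change after the 4th interval.
signal = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.8, 5.0, 5.1]
k, mse = best_split(signal)
print(k, round(mse, 4))
```

Because the pooled error is divided by the total number of intervals, a long phase contributes more to the objective than a short one, which matches the even weighting of time intervals described in this section.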

The amount of product variance explained in the first correlation phase is fairly low (38.1 %), but the MSEs calculated through cross validation (labelled “test” in figure 4.8) are lowest when one PC is included in the model. The algorithm would therefore not have added more PCs. However, more than 80 % of the variance could have been explained if 3 or more PCs were retained (figure 4.9). A model with 3 or more PCs would probably be more appropriate for this phase, as the slight MSE increase could be justified by the extra variance explained. Starting the algorithm with a_ini = 3 PCs returned a global model with no extra PCs added. This is a limitation of the MPPLS algorithm, as sometimes a compromise between variance explained and model accuracy must be made. Incorporating the amount of variance explained into the algorithm may be a way to improve model performance, although in practice this is difficult because hard limits or thresholds are hard to define for such a case.
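One way such a compromise could be expressed is sketched below. It is an assumption for illustration, not part of the MPPLS algorithm: the function `pick_n_components`, the variance floor `min_var` and the relative MSE tolerance `mse_tol` are all hypothetical, and the numbers are placeholders rather than values read from figures 4.8 or 4.9.

```python
def pick_n_components(cv_mse, cum_var_y, min_var=0.80, mse_tol=0.10):
    """Choose a number of PCs that trades prediction error against variance explained.

    cv_mse[a-1]    : cross-validated MSE with a PCs
    cum_var_y[a-1] : cumulative fraction of Y variance explained with a PCs
    Returns the smallest a whose variance explained reaches min_var while its
    MSE stays within mse_tol (relative) of the minimum; if no such a exists,
    falls back to the minimum-MSE choice.
    """
    best = min(range(len(cv_mse)), key=cv_mse.__getitem__)
    limit = cv_mse[best] * (1.0 + mse_tol)
    for a, (mse, var) in enumerate(zip(cv_mse, cum_var_y), start=1):
        if var >= min_var and mse <= limit:
            return a
    return best + 1


# Placeholder values only: MSE is lowest with 1 PC, variance passes 80 % at 3 PCs.
cv_mse = [0.50, 0.52, 0.53, 0.60]
cum_var_y = [0.38, 0.65, 0.82, 0.90]

print(pick_n_components(cv_mse, cum_var_y))               # compromise rule
print(pick_n_components(cv_mse, cum_var_y, mse_tol=0.0))  # pure minimum-MSE rule
```

The sketch also shows why this is hard to automate: the result depends entirely on `min_var` and `mse_tol`, which are exactly the hard limits that are difficult to justify in general.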

Figure 4.8: MP Algorithm MSEs – First Correlation Phase

Figure 4.9: Cumulative Variance Explained for First Correlation Phase of MP Algorithm – Case Study 1

The MSE in the second correlation phase (figure 4.10) declined as more components were retained. The improvement from retaining more than 2 PCs was not enough to exceed the specified improvement threshold (T = 0.01, table 3.2), which was appropriate considering the model was able to account for almost all of the variance in the product variable. The better fit in this phase compensated for the poor fit in the first phase, since this phase was much longer than the first and predictions at all time intervals were weighted evenly. Weighting the intervals would allow the algorithm to pay more attention to critical-to-quality parts of the process. This may need to be done through process analysis, as was done by Lu and Gao (2005) when defining critical-to-quality operational stages in their stage-based PLS modelling approach.
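The two mechanisms discussed in this paragraph, the improvement threshold and the weighting of intervals, can be sketched as follows. Several things here are assumptions: the threshold is taken to be a relative MSE reduction (the exact definition is in table 3.2, which is not reproduced in this section), the function names are hypothetical, and the MSE and squared-error values are placeholders. Only the phase lengths (67 and 368 intervals) come from table 4.5.

```python
def n_components_by_threshold(cv_mse, a_ini=1, threshold=0.01):
    """Start at a_ini PCs and add one PC at a time while the relative
    MSE reduction exceeds the threshold (assumed definition of T)."""
    a = a_ini
    while a < len(cv_mse):
        prev, nxt = cv_mse[a - 1], cv_mse[a]
        if prev <= 0 or (prev - nxt) / prev <= threshold:
            break
        a += 1
    return a


def weighted_mse(sq_errors, weights=None):
    """Weighted mean of per-interval squared errors; uniform if no weights given."""
    if weights is None:
        weights = [1.0] * len(sq_errors)
    return sum(w * e for w, e in zip(weights, sq_errors)) / sum(weights)


# Placeholder MSE curve: large gain from the 2nd PC, negligible gains after.
placeholder_mse = [1.00, 0.60, 0.597, 0.595]
n_pcs = n_components_by_threshold(placeholder_mse)

# Phase lengths from table 4.5; the squared errors (4.0 and 1.0) are invented.
sq_errors = [4.0] * 67 + [1.0] * 368
uniform = weighted_mse(sq_errors)
emphasised = weighted_mse(sq_errors, [3.0] * 67 + [1.0] * 368)

print(n_pcs, round(uniform, 3), round(emphasised, 3))
```

With uniform weights the first phase supplies only 67 of the 435 terms (about 15 %), so a poor fit there is easily outweighed by a good fit in the long second phase. Raising the weights on the first 67 intervals makes that phase count for more in the objective.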

Figure 4.10: MP Algorithm MSEs – Second Correlation Phase

A comparison between the MSEs of the training data for the three approaches with respect to time is presented in figure 4.11, along with the operational phase boundaries. The models generally showed the highest MSEs at the beginning of each phase, as most of the data there consisted of zeros imputed in place of missing data.

Figure 4.11: MSE Outcomes with respect to Time – Case Study 1

The MSE outcomes of the global and MPPLS models are similar in the second correlation phase identified by the MPPLS algorithm. However, the MPPLS algorithm was able to partition the data and reduce the MSE in the first correlation phase. An ANOVA on the resulting MSEs showed no statistically significant differences among the approaches at the 0.05 significance level. The improvements mentioned previously may be needed to enhance the accuracy of the predictions.
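For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as in the sketch below. It assumes that each modelling approach forms one group of MSE values; how the MSEs were actually grouped in this study is not stated here. The function name `one_way_anova_f` is hypothetical and the sample values are arbitrary. A p-value would additionally require the F distribution, for example via `scipy.stats.f_oneway`.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = fmean([v for g in groups for v in g])
    means = [fmean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Arbitrary sample values, one list per approach (not data from this study).
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(samples)
print(f_stat, df_b, df_w)
```

The null hypothesis of equal group means is rejected only when the p-value for this F statistic falls below the chosen significance level, so a finding of no significant difference corresponds to a p-value above that level.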

Although the amount of variance explained in the first correlation phase could be improved by retaining more PCs, doing so manually would undermine the automatic nature of the algorithm. In practice, post analysis of the MSEs could be used to improve the predictive power of the model(s). Changing the initial parameters of the MPPLS algorithm returned a global model, which did not need to be included because a global model had already been trained. The following section details the application of these models to the test cases for on-line monitoring and end-of-batch prediction.

4.5 Model Application for On-line Fault Detection and End-of-Batch
