3.10. Other multivariate statistical techniques

In addition to PCA, there are other multivariate statistical techniques that are closely related to it. These include multiway PCA (MPCA), partial least squares (PLS) and principal component regression (PCR). The following subsections give a brief explanation of each of these techniques.

3.10.1. Multiway Principal component analysis (MPCA)

Multiway PCA (Nomikos and MacGregor, 1995; Wise and Gallagher, 1996) is an extension of PCA in which the original data set X is not 2-dimensional but multidimensional, commonly 3-dimensional. This concept is illustrated in Figure 3.10. MPCA is most often used with batch processes, where measurements describing the current state of each batch are monitored at regular time increments.

[Figure 3.10: three-way data array X, with axes labelled samples or batches, variables or measurements, and time.]

Figure 3.10. A diagram of a 3-dimensional data matrix X with dimensions (I × J × K). There are I samples, J measured variables and K time increments.

A Practical Investigation into the use of Principal Component Analysis for the Modelling and Scale-up of High Performance Liquid Chromatography. Chapter 3

The PCA algorithms work in exactly the same way for MPCA as for ordinary PCA. First, however, the three-way block matrix must be unfolded so that it is effectively a 2-dimensional structure. This is illustrated in Figure 3.11.

Wise and Gallagher (1996) applied MPCA to the monitoring of a nuclear waste storage tank. The waste constituents were being converted into hydrogen, ammonia and nitrogen dioxide gases, which were periodically released by using a pump to agitate the waste slurry. The concentration of each gas was measured as a function of time using various analytical techniques. The procedure was carried out 57 times (I = 57), with 280 time increments each (K = 280). The variables were the concentrations of the 3 gases (J = 3). The original array X with dimensions (57 × 3 × 280) was unfolded into a matrix X with dimensions (57 × 840) before PCA was applied in the usual way.
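The unfold, scale, PCA and re-fold steps of MPCA can be sketched in Python with NumPy. This is a minimal sketch, not the procedure used by Wise and Gallagher or by the software in this study: it assumes batch-wise unfolding of an I × J × K array into an I × (K·J) matrix (K time blocks of J variables each), autoscaling of every unfolded column, and PCA computed by singular value decomposition rather than NIPALS. The function name `mpca` and the random placeholder data are hypothetical.

```python
import numpy as np

def mpca(X3, n_components):
    """Multiway PCA sketch for a three-way array X3 of shape (I, J, K).

    Assumptions: batch-wise unfolding, autoscaling, PCA by SVD (not NIPALS).
    """
    I, J, K = X3.shape
    # Unfold: one row per batch; columns are the J variables at time 1,
    # then the J variables at time 2, ..., giving an (I, K*J) matrix.
    X = X3.transpose(0, 2, 1).reshape(I, K * J)

    # Mean-centre each column and scale it to unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    Xs = (X - mean) / std

    # PCA by SVD: Xs = U S Vt, scores T = U S, loadings P = Vt^T.
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :n_components] * S[:n_components]
    P = Vt[:n_components].T                  # (K*J, n_components)
    E = Xs - T @ P.T                         # residual matrix, (I, K*J)

    # Re-fold loadings and residuals back into three-way form.
    P3 = P.T.reshape(n_components, K, J).transpose(0, 2, 1)  # (A, J, K)
    E3 = E.reshape(I, K, J).transpose(0, 2, 1)                # (I, J, K)
    return T, P3, E3

# Placeholder data with the same shape as the tank example (not real measurements).
rng = np.random.default_rng(0)
X3 = rng.normal(size=(57, 3, 280))
T, P3, E3 = mpca(X3, n_components=2)
print(T.shape, P3.shape, E3.shape)   # (57, 2) (2, 3, 280) (57, 3, 280)
```

Unfolding batch-wise keeps one row per batch, so each score row summarises a whole batch trajectory; other unfolding orders are possible and would change only the column ordering.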

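[Figure 3.11: MPCA flowchart. Unfold the three-way array to form the matrix X; mean-centre and scale the variables; perform PCA on X to obtain scores and loadings; re-fold the loadings p and residuals E to form the three-way arrays P and E.]

Figure 3.11. The algorithm for performing MPCA (adapted from Wise and Gallagher, 1996).

3.10.2. Partial Least Squares (PLS)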

Partial least squares uses the same techniques as PCA to extract scores and loadings, but is applied to two data sets, X and Y. As in PCA, X is an (m × n) matrix of process measurements; Y is an (m × q) matrix of process variable settings. The same data pre-treatment procedures as for PCA (mean centring and scaling to unit variance, sections 3.3-3.5) are usually applied. In a chromatographic context, X is the data matrix of m samples, each having n UV absorbance values, and Y holds the q process variable settings for each sample. These q variables may include settings such as temperature, flow rate, mass load and pH. PLS is usually used as a process control tool: when a run fails, the responsible process variable(s) can be identified rapidly (and thus corrected) with a high degree of accuracy. Difficulties often arise when attempting to predict many process variables from few process measurements; for this reason, n ≫ q is required for highly accurate models.

The goal of PLS is to predict a future y_i value (the process variable settings) from an x_i value (a chromatographic run). PLS is described in several publications; some of the best descriptions can be found in Wold et al., 1984; Geladi and Kowalski, 1986; Kresta et al., 1993; and Wise and Gallagher, 1996. The NIPALS algorithm (section 3.6.1) is used to develop PC models for X and Y, constructed so that the covariance between the two data sets is maximised. As well as the scores and loadings, an additional set of vectors, known as weights (W), is calculated; these weights are required to keep the scores orthogonal. A notable feature of PLS is that it can relate more than one predicted variable (in Y) to many predictor variables (in X) in a single model (Wise and Gallagher, 1996).
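To show where the weights W enter alongside the scores and loadings, here is a compact NIPALS-style PLS2 sketch in Python with NumPy. It is an illustration under stated assumptions, not the software used in this study: X and Y are assumed to be already mean-centred (and scaled if desired), the Y loadings are called C here, and the function names `pls_nipals` and `pls_coefficients`, the tolerance and the iteration limit are hypothetical choices.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """NIPALS PLS2 sketch on pre-treated X (m x n) and Y (m x q).

    Returns scores T, weights W, X loadings P and Y loadings C.
    Assumption: X and Y have already been mean-centred.
    """
    X = np.array(X, dtype=float)     # working copies, deflated in place
    Y = np.array(Y, dtype=float)
    m, n = X.shape
    q = Y.shape[1]
    T, W = np.zeros((m, n_components)), np.zeros((n, n_components))
    P, C = np.zeros((n, n_components)), np.zeros((q, n_components))
    for a in range(n_components):
        # Start u from the Y column with the largest remaining variance.
        u = Y[:, np.argmax(Y.var(axis=0))].copy()
        t = np.zeros(m)
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)                # X weights, unit length
            t_new = X @ w                         # X scores
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
            c = Y.T @ t / (t @ t)                 # Y loadings
            u = Y @ c / (c @ c)                   # Y scores
        p = X.T @ t / (t @ t)                     # X loadings
        c = Y.T @ t / (t @ t)                     # Y loadings for the final t
        X -= np.outer(t, p)                       # deflate X: later scores stay orthogonal
        Y -= np.outer(t, c)                       # deflate Y
        T[:, a], W[:, a], P[:, a], C[:, a] = t, w, p, c
    return T, W, P, C

def pls_coefficients(W, P, C):
    """Regression matrix B with Y_hat = X @ B for pre-treated X."""
    return W @ np.linalg.solve(P.T @ W, C.T)

# Synthetic check: Y is exactly linear in the centred X.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
X -= X.mean(axis=0)
Y = X @ rng.normal(size=(5, 2))
T, W, P, C = pls_nipals(X, Y, n_components=5)
B = pls_coefficients(W, P, C)
print(np.allclose(X @ B, Y))   # True
```

With as many components as X has columns, PLS reduces to ordinary least squares, which is why the synthetic Y is reproduced exactly; in practice far fewer components are retained.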

3.10.3. Principal component regression (PCR)

In PCR, instead of regressing the process variables (temperature, flow rate, etc.) onto the measured variables (UV absorbance), the process variables are regressed onto the PC scores of the measured variables, which are orthogonal and therefore well-conditioned (Wise and Gallagher, 1996).
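The PCR idea can be sketched in a few lines of Python with NumPy: compute the PC scores of the mean-centred X, then solve an ordinary least-squares problem of Y on those scores. This sketch assumes PCs are obtained by SVD rather than NIPALS and that X and Y are only mean-centred (scaling is omitted for brevity); the function names `pcr_fit` and `pcr_predict` and the synthetic data are hypothetical.

```python
import numpy as np

def pcr_fit(X, Y, n_pcs):
    """PCR sketch: regress Y (m x q) on the first n_pcs PC scores of X (m x n).

    Assumptions: SVD instead of NIPALS, mean-centring only (no scaling).
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pcs].T                   # loadings (n x n_pcs)
    T = Xc @ P                         # scores (m x n_pcs), mutually orthogonal
    # T has orthogonal columns, so T.T @ T is diagonal and this
    # least-squares problem is well conditioned.
    b = np.linalg.lstsq(T, Yc, rcond=None)[0]   # (n_pcs x q)
    B = P @ b                          # coefficients for the original variables
    return x_mean, y_mean, B

def pcr_predict(model, X_new):
    """Predict process variables for new samples with a fitted PCR model."""
    x_mean, y_mean, B = model
    return (X_new - x_mean) @ B + y_mean

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 6))             # stand-in for UV absorbance data
Y = X @ rng.normal(size=(6, 2)) + 3.0    # stand-in for two process variable settings
model = pcr_fit(X, Y, n_pcs=6)
print(np.allclose(pcr_predict(model, X), Y))   # True
```

Keeping all PCs makes PCR identical to least squares, so the synthetic Y is reproduced exactly; the practical benefit comes from keeping fewer PCs, chosen as described next.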

The purpose of the regression model is to predict the properties of interest for new samples. The number of PCs should therefore be chosen to optimise the predictive ability of the PCR model. This is done by cross-validation, which splits the data into training and test sets: the model is built on the training data, and the residual prediction error on the test samples is determined as a function of the number of PCs in the regression model. The procedure is usually repeated several times, with each sample in the original data being part of the test set at least once. The number of PCs that gives the minimum total prediction error over all test sets is then selected.
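The cross-validation procedure can be sketched as a k-fold loop in Python with NumPy, in which every sample appears in a test set exactly once and the total squared prediction error (PRESS) is accumulated for each number of PCs. The five folds, the random fold assignment, the synthetic two-factor data and the function names `pcr_fit_predict` and `choose_n_pcs` are illustrative assumptions; a small PCR fit is repeated inside the block so that it runs on its own.

```python
import numpy as np

def pcr_fit_predict(X_train, Y_train, X_test, n_pcs):
    """Fit PCR (SVD-based, mean-centring only) on training data; predict test data."""
    x_mean, y_mean = X_train.mean(axis=0), Y_train.mean(axis=0)
    Xc = X_train - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pcs].T
    b = np.linalg.lstsq(Xc @ P, Y_train - y_mean, rcond=None)[0]
    return (X_test - x_mean) @ P @ b + y_mean

def choose_n_pcs(X, Y, max_pcs, n_folds=5, seed=0):
    """Return (number of PCs with minimum PRESS, PRESS for 1..max_pcs).

    Assumption: k-fold splits with each sample in exactly one test set.
    """
    m = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(m), n_folds)
    press = np.zeros(max_pcs)          # total squared test error per model size
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(m), test_idx)
        for a in range(1, max_pcs + 1):
            Y_hat = pcr_fit_predict(X[train_idx], Y[train_idx], X[test_idx], a)
            press[a - 1] += np.sum((Y[test_idx] - Y_hat) ** 2)
    return int(np.argmin(press)) + 1, press

# Synthetic data driven by two hidden factors plus a little noise.
rng = np.random.default_rng(3)
scores = rng.normal(size=(50, 2))
X = scores @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(50, 8))
Y = scores @ rng.normal(size=(2, 1)) + 0.05 * rng.normal(size=(50, 1))
best, press = choose_n_pcs(X, Y, max_pcs=6)
print(best, press.round(3))
```

Leave-one-out cross-validation is the special case in which the number of folds equals the number of samples.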

3.11. Conclusion

This chapter has focused on the theory behind Principal Component Analysis and its application to chromatography data. The two most common methods of extracting the PCs were discussed: decomposition of the variance-covariance matrix, and the NIPALS algorithm, which was the method used by the software employed throughout this study. The following chapter details the application of PCA to an experimental Size Exclusion Chromatography system. The raw data used in Chapter 4 were taken from Chandwani's PhD thesis (1995), and an important pre-processing stage was required prior to analysis.


4. Pre-processing of chromatographic data for Principal Component
