
4, Pre-processing of chromatographic data for Principal Component Analysis

[Figure 5.3 plot: PC1 and PC2 loadings against time increment and time (min).]

Figure 5.3. The loadings profiles for the first two principal components of the complete data set (with retention time adjustment). PC1 represents the majority of the variance (66.53%) within the data set and is visually similar to a chromatogram. PC2 plays a more minor role, only accounting for 6.55% of the variance.

[Figure 5.4 plot: PC2 scores against PC1 scores, with clusters labelled by mass loaded (0.50 and 0.5625 mg, 0.75 mg, 1.00 mg, 1.125 mg, 1.50 mg, 2.00 mg, 2.25 mg).]

Figure 5.4. Scores plot for the first two PCs in the data set with RT adjustment. This plot shows a definite trend of the chromatograms with respect to the mass of material loaded onto the column, the larger loads having high scores in PC1 and the smaller loads scoring low in PC1. PC2 serves to isolate the chromatograms with respect to the other variables and to how the mass of material was loaded in terms of the combination of concentration and volume. The mean scores values for each setting are shown.

A Practical Investigation into the use of Principal Component Analysis for the Modelling and Scale-up of High Performance Liquid Chromatography, Chapter 5

5.4.2. Selection of subsets of chromatograms - the framework for scale-up predictions

For PCA to be a useful tool it must be able to predict the trends in the principal components from as small a group of chromatograms as possible. This is particularly important for scale-up, where any large-scale tests which are included in the analysis will be expensive in material and resource. To discover if this was possible, subsets of chromatograms were selected and analysed separately. The extracted principal components were then used to predict the features known to be present in the full data set.

Subset of chromatograms: centre cc cvTP c P cVtp C CC CVTP

Table 5.4. The critical subset of chromatograms from which scale-up predictions were made. 9 sets resulted from the adjusted data set (27 chromatograms), representing one for each sample load setting and a set of centre chromatograms.

In the first instance chromatograms were chosen which represented the different sample loads (Table 5.4) and which appeared in different "mass-loaded" clusters in the original analysis (Figure 5.4). (Table 5.3 shows the range of sample loads, both in terms of mass and volume, which were applied to each column.) The selection of chromatograms was also made so as to provide a good spread of retention times, thus ensuring that the timing of the eluted peaks could be predicted as well as the chromatographic separation itself. Together with the centre chromatogram a group of nine resulted, each being performed in triplicate (27 chromatograms from the original


5.4.3. PCA of subsets of chromatograms (adjusted for retention time effects).

The maximum possible variance from the PCA of the selected subset was seen to be around 85%, for the same reason as was explained in section 5.4.1. The first three principal components extracted from the selected subset of chromatograms accounted for 70.73%, 9.88%, and 2.57% of the variation (83.18% total). The scores values for the remaining 81 chromatograms, which had not been included, were calculated from the loadings of the subset of chromatograms using an iterative technique in Microsoft Excel. This calculates, for each principal component, the score which, when combined with the loading, minimises the difference from the actual chromatogram.
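The score calculation described above can be sketched in Python. This is only an illustration, not the Excel procedure used in the study: it swaps the iterative search for a direct least-squares solve, assumes the data are mean-centred before PCA, and runs on synthetic single-peak chromatograms. The names `fit_loadings` and `predict_scores` are hypothetical.

```python
import numpy as np

def fit_loadings(X_subset, n_components=3):
    """PCA loadings of the subset via SVD of the mean-centred data (assumed centring)."""
    mean = X_subset.mean(axis=0)
    _, _, vt = np.linalg.svd(X_subset - mean, full_matrices=False)
    return mean, vt[:n_components].T          # shape (n_times, n_components)

def predict_scores(X_new, mean, loadings):
    """Scores t minimising ||(x - mean) - loadings @ t||^2 for each chromatogram x."""
    scores, *_ = np.linalg.lstsq(loadings, (X_new - mean).T, rcond=None)
    return scores.T                            # shape (n_new, n_components)

# Synthetic stand-in data: 108 "chromatograms" made of one Gaussian peak
# scaled by a made-up sample load, plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 15.0, 200)
peak = np.exp(-0.5 * ((t - 7.5) / 0.5) ** 2)
loads = rng.uniform(0.5, 2.25, size=108)
X = np.outer(loads, peak) + 0.01 * rng.normal(size=(108, 200))

subset, rest = X[:27], X[27:]                  # 27 used for the model, 81 held out
mean, P = fit_loadings(subset, n_components=2)
T_rest = predict_scores(rest, mean, P)         # scores for the 81 excluded runs
```

Because the loadings from an SVD are orthonormal, the least-squares scores here coincide with the simple projection `(rest - mean) @ P`; an iterative minimiser would be expected to converge to the same values.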

Comparing these scores with the original values derived from the model using all 108 chromatograms reveals striking similarities. This is clearly evident in Figure 5.5, which shows a plot of the PC1 scores derived from an analysis of all 108 chromatograms (y-axis) against the scores derived from the analysis of the 27 chromatograms comprising its subset (x-axis).

[Figure 5.5 plot: PC1 scores from all 108 chromatograms (y-axis) against PC1 scores predicted from 27 chromatograms (x-axis); best fit line y = 1.040x - 0.025, coefficient of regression r = 0.995.]

Figure 5.5. A comparison between the PC1 scores generated using just 9 groups from 36 (27 out of 108 chromatograms) (x-axis) and the actual PC1 scores generated using all 108 chromatograms (y-axis). The scores show excellent agreement around the line y = x, indicating that the chromatographic separations of all 108 chromatograms are adequately described by a much smaller subset.


The correlation is an excellent fit to the line y = x (the actual best fit line has equation y = 1.040x - 0.025, coefficient of regression r = 0.995), which indicates that all of the 108 chromatograms in the full factorial data set can adequately be described using only a fraction of the original data. Similar results were obtained for PC2, except that there is more scatter about the line y = x (actual line y = 1.003x - 0.040, r = 0.989).
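A comparison of this kind can be reproduced with a straight-line fit and a correlation coefficient. The sketch below is illustrative only: the function name `compare_scores` is hypothetical, and the score vectors are synthetic, generated around the PC1 line quoted above rather than taken from the study.

```python
import numpy as np

def compare_scores(predicted, actual):
    """Best-fit line actual = slope * predicted + intercept, and Pearson r."""
    slope, intercept = np.polyfit(predicted, actual, deg=1)
    r = np.corrcoef(predicted, actual)[0, 1]
    return slope, intercept, r

# Synthetic example: "predicted" scores on the x-axis, "actual" scores on the
# y-axis, scattered about the line y = 1.040x - 0.025 (noise level is made up).
rng = np.random.default_rng(1)
predicted = np.linspace(-2.0, 3.0, 50)
actual = 1.040 * predicted - 0.025 + 0.05 * rng.normal(size=predicted.size)

slope, intercept, r = compare_scores(predicted, actual)
```

A slope near 1, an intercept near 0 and r close to 1 together indicate agreement with y = x. Note that the sign of a principal component is arbitrary, so scores from two separately fitted models may need one axis flipped before they are compared.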

It is concluded that accurate principal component models can be obtained using only a fraction of the original number of runs. It is naturally desirable to use as few runs as possible at a large scale of operation as the basis for predicting the shapes of chromatograms. However, if nine groups of chromatograms (27 in total) are representative of the original 36 groups (108 in total), what further reduction is possible without detracting from the analysis? To answer this, smaller subsets comprising seven, five and just three groups (9 chromatograms) were selected (Table 5.5) for separate analysis.

3 cluster model (9 chromatograms) | 5 cluster model (15 chromatograms) | 7 cluster model (21 chromatograms)

cvTP cvTP cvTP cent c c CVTP cent cent C cVtp CVTP C CC CVTP

Table 5.5. The clusters selected to comprise the three, five and seven cluster subsets.

The PCA of the smallest set generated PC1 correlations similar to those described in Figure 5.5. The loadings for PC1 derived from all 108 chromatograms and from the smaller subsets are very similar (Figure 5.6). This is most beneficial in minimising the number of large-scale experiments which the analysis requires. The major discrepancies appear on either side of the main peak. The fine structure becomes more clearly defined when larger numbers of chromatograms are used to derive the loading.


[Figure 5.6 plot: PC1 loading (9), PC1 loading (27) and PC1 loading (108) against time (min).]

Figure 5.6. The PC1 loadings profiles for the 25 cm column obtained using all 36, 9 and 3 groups (108, 27 and 9 chromatograms).

As before, the scores for the remaining 99 chromatograms in the original set of 108 were calculated from the loadings derived from the subset of 9. The same iterative technique in Excel was used. The PC1 scores so derived correlate well with the scores derived from the analysis of the full set of 108, as indicated by the y = x plot in Figure 5.7 (actual line y = 1.031x - 0.060, r = 0.999).

[Figure 5.7 plot: PC1 scores from all 108 chromatograms (y-axis) against PC1 scores predicted from 9 chromatograms (x-axis); line y = x shown; best fit line y = 1.031x - 0.060, coefficient of regression r = 0.999.]

Figure 5.7. A comparison between the PC1 scores generated using just 3 groups from 36 (9 out of 108 chromatograms) (x-axis) and the actual PC1 scores generated using all 108 chromatograms (y-axis).


5.4.4. Analysis of chromatograms derived from columns of different lengths

5.4.4.1. Retention times

The main focus of this study was the prediction of chromatograms expected from columns at a 25 cm scale from those obtained at 5 cm. A 15 cm column was also used to provide an intermediate scale.

The chromatograms from the columns of different lengths contain valuable information regarding the retention times of the separated species obtained under various operating conditions. This data is most useful in establishing the correlations between the retention times at the different scales of operation, and it is easily obtained before the data is pre-processed (section 5.3.3). These relationships for the main erythromycin A peak in the unadjusted chromatograms at the 5 cm scale and for the peaks at both the 15 and 25 cm scales are shown in Figure 5.8. From this plot it is possible to predict the expected retention time at the larger scales from the 5 cm data. The times were correlated using least squares linear approximations.
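A least squares linear correlation of this kind, and its use for prediction, might look as follows. This is a sketch under stated assumptions: the function names `fit_rt_line` and `predict_rt` are hypothetical, and every retention time in the example is invented for illustration, not read from Figure 5.8.

```python
import numpy as np

def fit_rt_line(rt_small, rt_large):
    """Least-squares line rt_large = slope * rt_small + intercept."""
    rt_small = np.asarray(rt_small, dtype=float)
    A = np.column_stack([rt_small, np.ones_like(rt_small)])   # design matrix [x, 1]
    (slope, intercept), *_ = np.linalg.lstsq(A, np.asarray(rt_large, dtype=float), rcond=None)
    return slope, intercept

def predict_rt(rt_small, slope, intercept):
    """Expected large-scale retention time for a given small-scale retention time."""
    return slope * np.asarray(rt_small, dtype=float) + intercept

# Invented retention times (min) for the same peak at two column lengths.
rt_5cm = np.array([1.0, 1.5, 2.0, 2.5])
rt_25cm = 5.0 * rt_5cm + 0.2          # made-up relationship, for illustration only

slope, intercept = fit_rt_line(rt_5cm, rt_25cm)
rt_25cm_expected = predict_rt(3.0, slope, intercept)
```

The same fit would be repeated for each larger scale (15 cm and 25 cm), giving one line per column length.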
