
Chapter 5 Drug identification using simple chemometrics

5.3.5 Comparison using second derivative spectra

Only second derivative spectra were used (as discussed in Part 1). The ratio of an inflection "peak" to a "true" peak is approximately 1:2. The resultant second derivative plot (Figure 51) has, however, lost the majority of the physical information that would have been present in the original spectral plot.

[Figure 51 plot: second derivative spectra of caffeine, omeprazole and papaverine; x-axis: Wavelength (nm)]

Figure 51 Comparison of second derivative spectra of papaverine, omeprazole and caffeine drug substances in the range 1116 nm to 2484 nm
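The derivative settings used to produce these spectra are not given in this section. As a minimal sketch only, assuming evenly spaced data at a 2 nm interval, a three-point central difference shows the property discussed above: an absorption maximum becomes a sharp negative minimum in the second derivative, while a linear baseline offset or slope is removed. Commercial NIR software usually smooths while differentiating (for example with a Savitzky-Golay filter), which this plain difference does not do. The function name `second_derivative` and the Gaussian test band are hypothetical.

```python
import math


def second_derivative(absorbance, spacing_nm=2.0):
    """Three-point central-difference second derivative of an evenly spaced spectrum.

    The first and last points lack a neighbour on one side, so the result is
    two points shorter than the input; result[k] belongs to input point k + 1.
    """
    h2 = spacing_nm ** 2
    return [
        (absorbance[i + 1] - 2.0 * absorbance[i] + absorbance[i - 1]) / h2
        for i in range(1, len(absorbance) - 1)
    ]


if __name__ == "__main__":
    # Hypothetical Gaussian absorption band centred at 2100 nm on a sloping
    # baseline, sampled every 2 nm (illustrative data, not a measured spectrum).
    wavelengths = [2000 + 2 * i for i in range(101)]
    spectrum = [0.001 * (w - 2000) + math.exp(-((w - 2100) / 20) ** 2) for w in wavelengths]
    d2 = second_derivative(spectrum)
    k = min(range(len(d2)), key=d2.__getitem__)
    # The band maximum appears as the most negative point of the second derivative.
    print(wavelengths[k + 1], round(d2[k], 5))
```

Because a straight line has zero second derivative, the sloping baseline in the example contributes nothing, which is one reason derivative spectra suppress physical (scatter and baseline) effects.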

The searches against the database of second derivative spectra were made initially using the full spectral range (1100 nm - 2500 nm). This range was then narrowed to determine whether using less spectral information would be advantageous. A selection of the search results is shown in Table 30.
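The search procedure can be sketched as follows. This is an illustration under stated assumptions, not the instrument software: the correlation spectral match is taken here to be the Pearson correlation coefficient between the test and reference second derivative spectra over the chosen window (section 1.10.2 describes the calculation actually used), all spectra are assumed to share one wavelength axis, and the helper names `pearson`, `window` and `search_database` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def window(wavelengths, values, start_nm, end_nm):
    """Keep only the points whose wavelength lies in [start_nm, end_nm]."""
    return [v for w, v in zip(wavelengths, values) if start_nm <= w <= end_nm]


def search_database(test, database, wavelengths, start_nm, end_nm,
                    top=3, min_report=0.1):
    """Rank reference spectra by correlation with the test spectrum.

    Only the `top` best matches are listed, and any below `min_report`
    are dropped, mirroring the 'N/A' notes under Tables 31 and 32.
    """
    t = window(wavelengths, test, start_nm, end_nm)
    scores = sorted(
        ((pearson(t, window(wavelengths, ref, start_nm, end_nm)), name)
         for name, ref in database.items()),
        reverse=True,
    )
    return [(round(r, 3), name) for r, name in scores[:top] if r >= min_report]


if __name__ == "__main__":
    # Hypothetical second derivative spectra on a 2 nm grid (not real data).
    wl = list(range(1100, 2502, 2))
    db = {
        "compound A": [math.sin(w / 37) for w in wl],
        "compound B": [math.cos(w / 53) for w in wl],
        "compound C": [math.sin(w / 11) for w in wl],
    }
    test = [2.0 * v + 0.5 for v in db["compound A"]]  # same shape, different intensity
    for start, end in [(1100, 2500), (2000, 2400), (2200, 2400)]:
        print(start, end, search_database(test, db, wl, start, end))
```

Changing only `start_nm` and `end_nm` reproduces the narrowing of the search range from 1100 - 2500 nm to 2000 - 2400 nm and 2200 - 2400 nm described in this section.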

The highest match values ranged from 0.988 to 1.000 throughout all the searches carried out. A further noteworthy point is the difference between the highest and next highest correlations in each search. The highest second-placed match reported was 0.614, for progesterone (Table 30). For each search there was now a clear distinction between the highest and next highest matches.

Pheniramine and paracetamol were not included in the spectral databases. A best match is nevertheless reported for each in Table 30 (ampicillin and phenytoin respectively), but the spectral match values are low (0.565 and 0.508 respectively). These are below the critical threshold value of 0.850 and would not be accepted as high enough correlations upon which to base a match. No other searches reported any misidentifications over this wavelength range for spectral matching by correlation or distance.

Because a number of overtones are present between 1100 and 2500 nm, carrying duplicate information (section 1.6.1), it was decided to run the identification matches across a single overtone region. The range selected was 2000 - 2400 nm, which lies towards the mid-infrared region. In theory, towards the lower energy (longer wavelength) end of the NIR region, adjacent to the mid-IR, there is a greater possibility of differentiating between substances by peak comparison, in the same way that peaks are identified visually in mid-IR spectra.

Table 30 Identification of pure drugs by correlation spectral matching from the database of second derivative plots of spectra. Spectroscopic range examined 1100 nm - 2500 nm (top three matches displayed; correlation spectral match value followed by matched compound)

| True identity | Best match | Second match | Third match |
|---|---|---|---|
| Desipramine | 1.000 Desipramine | 0.591 Imipramine | 0.582 Propranolol |
| Allopurinol | 1.000 Allopurinol | 0.557 Phenytoin | 0.410 Glutethimide |
| Testosterone | 1.000 Testosterone | 0.614 Progesterone | 0.323 Naproxen |
| Papaverine | 1.000 Papaverine | 0.541 Omeprazole | 0.397 Caffeine |
| Cefuroxime | 0.999 Cefuroxime | 0.238 Amylobarbitone | 0.193 Phenobarbitone |
| Benzyl penicillin | 1.000 Benzyl penicillin | 0.387 Ampicillin | 0.283 Phenobarbitone |
| Ampicillin | 0.988 Ampicillin | 0.522 Glutethimide | 0.441 Phenytoin |
| Thiopentone | 0.993 Thiopentone | 0.546 Progesterone | 0.535 Amylobarbitone |
| Pheniramine (drug not included in database) | 0.565 Ampicillin | 0.557 Benzyl penicillin | 0.509 Orciprenaline |
| Paracetamol (drug not included in database) | 0.508 Phenytoin | 0.496 Allopurinol | 0.484 Glutethimide |

In addition, by eliminating variation from other areas of the spectrum that is liable to decrease the correlation spectral match, the match value should be optimised for the correct compound. The counter-argument is that a fuller (wider) spectral range would contain more spectral information that would:

• increase the match value to the correct compound, as the match would be tighter

• decrease the matches of incorrect compounds, as there would be larger areas of variation

• avoid eliminating or ignoring data without justification

It is likely that a combination of both cases exists. However, it has already been seen that some data analysis modes are indeed wavelength specific, e.g. multivariate discriminant analysis by Mahalanobis distances. Arguments for and against have been raised between this and techniques employing the full wavelength range, such as soft independent modelling of class analogy (SIMCA).

The representative results displayed in Table 31, however, appear to indicate that the best-matched compounds are unchanged and that the match value for the next highest match decreases slightly, increasing the gap between the best and next best matches. The proposed match value for paracetamol (not included in the database) has also decreased in this case, from 0.508 to 0.314. It is also noticeable that no further selections were suggested for paracetamol, as no other match value reached 0.1 (Table 31). The deduction from this is clear: either the drug is not contained within the database or it has not been identified by this technique. The results from Table 30 and Table 31 and the database validations would suggest that the substances are not present in the database.

Visual inspection of the spectra in Figure 51 suggests that there may be substance-specific peaks between 2200 and 2400 nm. Test substances were therefore compared between 2200 and 2400 nm, with the expectation of enhanced separation between the substances and therefore higher and more selective match values.

Table 31 Identification of pure drugs by correlation spectral matching from the database of second derivative plots of spectra. Spectroscopic range examined 2000 nm - 2400 nm (top three matches displayed; correlation spectral match value followed by matched compound)

| True identity | Best match | Second match | Third match |
|---|---|---|---|
| Papaverine | 1.000 Papaverine | 0.538 Omeprazole | 0.401 Caffeine |
| Desipramine | 1.000 Desipramine | 0.590 Imipramine | 0.581 Propranolol |
| Cimetidine | 1.000 Cimetidine | 0.512 Procaine | 0.474 Ephedrine |
| Aspirin | 1.000 Aspirin | 0.699 Orciprenaline | 0.593 Isoprenaline |
| Allopurinol | 1.000 Allopurinol | 0.532 Chlorpromazine | 0.459 Isoniazid |
| Quinidine | 1.000 Quinidine | 0.660 Quinine | 0.316 Phenylephrine |
| Testosterone | 1.000 Testosterone | 0.606 Progesterone | 0.369 Naproxen |
| Pheniramine (drug not included in database) | 0.361 Benzyl penicillin | N/A No other selections | N/A No other selections |
| Paracetamol (drug not included in database) | 0.314 Chlorpropamide | N/A No other selections | N/A No other selections |

Note: ‘N/A’ denotes that there were no match values obtained greater than or equal to 0.1.

The results in Table 32 show very little change in the highest match value, which remains at the maximum of 1.000. There may have been a slight change in this value, but it is not visible because the match value is rounded to 1.000. However, the second- and third-best compounds have now taken on higher correlations. This may indicate that a 200 nm region, with about 100 data points, is insufficient for selective identification compared with either the 1100 - 2500 nm or the 2000 - 2400 nm range.
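The statement that a 200 nm region holds about 100 data points implies a sampling interval of roughly 2 nm. Assuming that interval (it is inferred here, not stated in this section), the approximate number of data points in each range examined is:

$$
N \approx \frac{\lambda_{\mathrm{end}} - \lambda_{\mathrm{start}}}{\Delta\lambda}, \qquad \Delta\lambda = 2\ \mathrm{nm}
$$

$$
\begin{aligned}
1100 \text{ to } 2500\ \mathrm{nm} &: \quad N \approx 1400/2 = 700 \\
2000 \text{ to } 2400\ \mathrm{nm} &: \quad N \approx 400/2 = 200 \\
2200 \text{ to } 2400\ \mathrm{nm} &: \quad N \approx 200/2 = 100
\end{aligned}
$$

On this assumption, the narrowest range uses about one seventh of the data points available over the full range.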

Table 32 Identification of pure drugs by correlation spectral matching from the database of second derivative plots of spectra. Spectroscopic range examined 2200 nm - 2400 nm (top three matches displayed; correlation spectral match value followed by matched compound)

| True identity | Best match | Second match | Third match |
|---|---|---|---|
| Testosterone | 1.000 Testosterone | 0.743 Progesterone | 0.732 Glutethimide |
| Desipramine | 1.000 Desipramine | 0.706 Aspirin | 0.613 Isoprenaline |
| Cimetidine | 1.000 Cimetidine | 0.706 Procaine | 0.575 Ephedrine |
| Acetylsalicylic acid | 1.000 Aspirin | 0.706 Desipramine | 0.663 Orciprenaline |
| Quinidine | 1.000 Quinidine | 0.618 Quinine | 0.512 Nicotinamide |
| Allopurinol | 1.000 Allopurinol | 0.581 Chlorpromazine | 0.538 Isoniazid |
| Pheniramine (drug not included in database) | N/A No suggestion | | |
| Paracetamol (drug not included in database) | N/A No suggestion | | |

Note: ‘N/A’ denotes that there were no match values obtained greater than or equal to 0.1.

Generally, the results for the wavelength distance match mirrored those obtained by correlation. Over the wider wavelength range (1100 - 2500 nm) all drug actives were identified correctly, with best-match wavelength distances of no more than 2.0 standard deviations (Table 33). These were clearly discernible from the next best matched compounds, the closest of which had a wavelength distance of 197.3. The method itself is highly discriminating as it is based upon the calculation of maximal standard deviations between the test sample spectra and the referenced database spectra at every other wavelength.
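A minimal sketch of a wavelength distance search, based on one reading of the description above: each database entry is assumed to hold a mean spectrum and a per-wavelength standard deviation spectrum from several batches, and the distance is the largest deviation of the test spectrum from the mean, in standard deviations, at any single wavelength. This interpretation and the names `wavelength_distance` and `rank_by_distance` are assumptions, not the instrument software. If "every other wavelength" means alternate data points, the inputs could be thinned with a slice such as `spectrum[::2]` first.

```python
def wavelength_distance(test, ref_mean, ref_sd):
    """Largest deviation of the test spectrum from a reference at any single
    wavelength, in units of that reference's standard deviation (sd > 0)."""
    return max(abs(t - m) / s for t, m, s in zip(test, ref_mean, ref_sd))


def rank_by_distance(test, library, top=3, max_report=650.0):
    """library maps name -> (mean spectrum, sd spectrum); smaller is better.

    Matches further away than max_report are dropped, mirroring the
    'N/A' note under Table 33.
    """
    scored = sorted(
        (wavelength_distance(test, mean, sd), name)
        for name, (mean, sd) in library.items()
    )
    return [(round(d, 1), name) for d, name in scored[:top] if d <= max_report]


if __name__ == "__main__":
    # Hypothetical three-point spectra with batch-to-batch SDs (illustrative only).
    library = {
        "X": ([1.0, 2.0, 3.0], [0.1, 0.1, 0.1]),
        "Y": ([0.0, 0.0, 0.0], [0.5, 0.5, 0.5]),
    }
    print(rank_by_distance([1.1, 2.0, 2.9], library))
```

Taking the maximum rather than an average means a single wavelength with a large discrepancy is enough to push a candidate far away, which is consistent with the large gaps between first and second matches in Table 33.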

Table 33 Identification of pure drugs by wavelength distance from the database of second derivative spectra (top three matches displayed; wavelength distance match value followed by matched compound). Spectral range 1100 - 2500 nm

| True identity | Best match | Second match | Third match |
|---|---|---|---|
| Desipramine | 1.2 Desipramine | 231.2 Procaine | 274.9 Carbamazepine |
| Cimetidine | 1.4 Cimetidine | 495.2 Procaine | 587.5 Phenylephrine |
| Allopurinol | 1.9 Allopurinol | 271.0 Procaine | 328.5 Oxprenolol |
| Papaverine | 1.2 Papaverine | 323.5 Ephedrine | 354.5 Procaine |
| Aspirin | 1.3 Aspirin | 197.3 Procaine | 409.1 Ephedrine |
| Ampicillin | 1.5 Ampicillin | 282.8 Procaine | 381.0 Carbamazepine |
| Imipramine | 1.2 Imipramine | 346.9 Procaine | 637.1 Ephedrine |
| Pheniramine (drug not included in database) | N/A No suggestion | | |
| Paracetamol (drug not included in database) | N/A No suggestion | | |

Note: ‘N/A’ denotes that there were no match values obtained less than or equal to 650 sd.

The difference between the two matching techniques is that the distance method, being based upon point distance differences at each wavelength considered, is more discriminating than the correlation spectral match, which is based upon a simple correlation (see section 1.10.2). The correlation spectral match is less sensitive to changes in peak intensity if the magnitude of absorption changes uniformly across the spectral range, so this might be expected to lead to misidentifications of extremely closely related compounds. However, this was not found to be the case in sections 5.3.1 and 5.3.2.

Indeed, both correlation spectral matching and comparison by distance were found to be equally effective when examined earlier in those sections.
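The contrast between the two techniques can be illustrated with a short sketch. It assumes, as in the earlier sketches, that the correlation match behaves like a Pearson correlation and that the wavelength distance is the maximum per-wavelength deviation in standard deviations; the reference spectrum and standard deviations are hypothetical. A uniform change in intensity leaves the correlation at 1.0 but moves the distance well past the proposed 5.0 sd pass level.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def max_sd_distance(test, mean, sd):
    """Maximum per-wavelength deviation, in standard deviations."""
    return max(abs(t - m) / s for t, m, s in zip(test, mean, sd))


# Hypothetical reference second derivative spectrum and batch-to-batch SD.
reference = [math.sin(i / 5.0) for i in range(200)]
sd = [0.02] * len(reference)

for scale in (1.0, 1.1, 1.3):
    test = [scale * v for v in reference]  # uniform change in intensity
    print(scale,
          round(pearson(test, reference), 3),          # unchanged by scaling
          round(max_sd_distance(test, reference, sd), 1))  # grows with scaling
```

Pearson correlation divides out both the mean and the overall magnitude of each spectrum, which is why it tolerates intensity differences between samples of the same substance; the distance measure keeps those differences, which makes it the stricter test.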

5.4 Conclusions

It is possible to identify drug substances by use of NIR. However, it is not possible to provide a ‘one method fits all’ approach to achieve this.

Criteria for identification by NIR should have the following objectives:

• Reduce the number of candidates to a small number

• Identify the candidates by chemometrics

• Identify by comparison to reference spectra

In order to achieve this, I recommend the following:

Firstly, use second derivative spectra. Secondly, standardise the parameters to be used. Finally, base the approach upon a hierarchy of methods.

The identification methods would be based upon second derivative spectra, which were found to be the best derivative order for reliable identification in this experiment. This is consistent with other identification techniques such as ultraviolet spectrophotometry and high performance liquid chromatography, and with the pioneers of modern derivative spectroscopy such as O’Haver and

The methods would be applied in the following order:

• Correlation matching - this proved to be the most effective method of identification, is probability based and accounts for variation in peak intensity from different samples of the same substance. It is the method suggested by the CPMP.

• Wavelength distance matching - more effective with substances that are closely related in structure because it is very sensitive to small changes in peak wavelength. The method is most effective when dealing with a small number of candidates.

• Wavelength selection - examine the spectra for wavelength ranges displaying differences in peak position and re-execute both the correlation and wavelength distance methods over this reduced wavelength range

• Visually examine the spectra - finally, visual comparison of the second derivative spectra of the remaining candidates with the test sample spectrum would now be possible due to the expected (low) number of candidates.
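The first two steps of this hierarchy lend themselves to automation. The sketch below is a hypothetical helper, not a validated procedure: it shortlists candidates whose correlation reaches 0.85 (the critical threshold used in this chapter), then ranks that shortlist by wavelength distance, using the same assumed definitions as the earlier sketches. Library entries are assumed to be (mean spectrum, SD spectrum) pairs; steps 3 and 4 stay with the analyst.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def wavelength_distance(test, mean, sd):
    """Maximum per-wavelength deviation in standard-deviation units."""
    return max(abs(t - m) / s for t, m, s in zip(test, mean, sd))


def hierarchical_identify(test, library, r_min=0.85):
    """Hypothetical helper covering the first two steps of the hierarchy.

    1. Correlation matching: keep only candidates with r >= r_min.
    2. Wavelength distance: rank that shortlist, closest first.
    Steps 3 and 4 (wavelength selection, visual comparison) are left
    to an analyst working from the returned shortlist.
    """
    correlations = {name: pearson(test, mean) for name, (mean, _sd) in library.items()}
    shortlist = [name for name, r in correlations.items() if r >= r_min]
    distances = {name: wavelength_distance(test, *library[name]) for name in shortlist}
    ranked = sorted(shortlist, key=distances.__getitem__)
    return [(name, round(correlations[name], 3), round(distances[name], 1)) for name in ranked]


if __name__ == "__main__":
    # Hypothetical four-point spectra: P is the true compound, Q a close
    # analogue and R an unrelated substance (illustrative data only).
    library = {
        "P": ([0.0, 1.0, 0.0, -1.0], [0.1] * 4),
        "Q": ([0.0, 1.0, 0.1, -1.0], [0.1] * 4),
        "R": ([1.0, 0.0, -1.0, 0.0], [0.1] * 4),
    }
    print(hierarchical_identify([0.0, 1.0, 0.0, -1.0], library))
```

In the example the close analogue Q survives the correlation step but is separated from P by the distance step, which reflects the division of labour between the first two methods described above.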

In an industrial scenario, it would be expected that the first and second methods (correlation and wavelength distance) could be run by relatively unskilled workers, such as in a warehouse, following standard operating procedures. The final two methods (wavelength selection and visual comparison) would need skilled workers and would be carried out in a laboratory, such as a quality control laboratory.

Based upon experience, I would recommend the following criteria for matches (critical match values):

• Correlation - a value of 0.95 or above indicates a pass, with values between 0.85 and 0.95 a potential pass requiring corroboration by one of the remaining three methods (above)

• Wavelength distance - a value of 5.0 standard deviations or less indicates a pass, with values between 5.0 and 10.0 standard deviations requiring corroboration by one of the remaining two methods (wavelength range selection or visual examination)
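The proposed criteria can be expressed as two small decision functions, for example in a standard operating procedure. This is a sketch only: the text does not say whether the boundary values 0.95, 0.85, 5.0 and 10.0 themselves pass or need corroboration, so the inclusive comparisons below are an assumption, as is treating anything outside both bands as a fail. The function names are hypothetical.

```python
def classify_correlation(r):
    """Proposed correlation criteria: >= 0.95 pass; 0.85 to 0.95 corroborate."""
    if r >= 0.95:
        return "pass"
    if r >= 0.85:
        return "corroborate"  # confirm by one of the remaining three methods
    return "fail"


def classify_distance(d_sd):
    """Proposed wavelength distance criteria (standard deviations):
    <= 5.0 pass; 5.0 to 10.0 corroborate."""
    if d_sd <= 5.0:
        return "pass"
    if d_sd <= 10.0:
        return "corroborate"  # confirm by wavelength selection or visual examination
    return "fail"


if __name__ == "__main__":
    # Values taken from Tables 30 and 33.
    print(classify_correlation(0.988))  # ampicillin, best match
    print(classify_correlation(0.565))  # pheniramine, not in database
    print(classify_distance(1.9))       # allopurinol, best match
    print(classify_distance(197.3))     # aspirin, next best match
```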

The proviso I would make for these values is that they are based upon the experience of a limited number of batches per sample and from a database of 301 substances. Many more batches per sample and substances overall would need to be investigated to provide a robust figure for the critical match value.

Chapter 6 Identification of drug substances by peak

In document Criteria for drug identification by thin layer chromatography and near infrared reflectance spectroscopy (Page 180-190)