Optimal data processing procedure for automatic bacterial identification by gas liquid chromatography of cellular fatty acids

(1)

0095-1137/88/091745-09$02.00/0

Optimal

Data

Processing

Procedure for

Automatic

Bacterial

Identification

by Gas-Liquid

Chromatography

of

Cellular

Fatty

Acids

ERKKI EEROLAl* AND OLLI-PEKKA LEHTONEN2

DepartmentofMedical Microbiology, Turku University,' and Central Laboratory, Clinical Microbiology,

Turku

University

Central

Hospital,2

SF-20520

Turku, Finland

Received 1March 1988/Accepted 17 May 1988

Gas-liquidchromatography of cellular fatty acidswasusedin automatic identification of clinical bacterial

isolates. The intraspecies variation in theoccurrenceof fatty acids and the variation inthe relative gas-liquid

chromatography peakareasof different fatty acidswereevaluated and compared withtherelativepeakareas

of these acids. A newchromatogram comparison method involving the use ofan exponential function was

developedtoadjusttodata variation optimally. This methodwascomparedwith several previously published

methods of correlationanalysis with data from representative clinical bacteriological isolates.Theefficacies of themethods in separating different bacterial species into distinct clusterswerecompared. Thenewexponential

functionmethodwassuperiortothe others both in its abilityto separatespecies into different clusters and in givinga greater degree of identityto strains withina proper cluster. The results indicatethat the gas-liquid

chromatography ofbacterial cellular fatty acidscanbe usedeffectively in the identification of clinically isolated bacteria.However, the usefulness of theanalysis dependsonthecomparison method used andonits abilityto

copewith data variations.

Oneof themaingoals of clinical bacteriologicalanalysis is

accurate identification of bacterial isolates. This serves two

purposes: taxonomic identification of species and compari-sons between new and previous isolates. In addition to

classicalidentification methods,based mainlyon

demonstra-tions of differences in the metabolic pathways of different species, methods have been developed which identify bac-teria according to chemical compoundspresent in the bac-terialcells (11). The main advantages of these methods are

increased speed of analysis, since there is no need to

cultivate the bacteria any further, and the possibility of

identifying bacteria which are no longer viable. Several groupsofcompoundsareused for identification. Among the

most often used are certain enzymes, DNA, membrane lipopolysaccharides, carbohydrates, proteins, and bacterial fattyacids(11).The usefulnessofthe variouscompoundsin this respectdepends bothonthetaxonomic effect of

differ-encesin theiroccurrenceandonthespeedandsimplicityof

the analytical methods. In this respect, fatty acid analysis

seems tobeoneofthe mostpromising methods (11, 12). Gas-liquid chromatography (GLC) of bacterial cellular fatty acids(5, 12)is aversatilemethod, allowing the identi-fication of thevastmajorityof isolates. However,

chromato-grams involve the problem ofcomparison of multivariate recordings.Tobeuseful inaclinicalmicrobiological labora-tory, data processing should be automatic. Although

com-puterizedtaxonomyhas been studiedextensively (3, 16), the

optimal way ofapplying a computerized comparative

anal-ysis of cellularfatty acid datato bacterialidentification has not yet been determined. This is especially true of the

identification of clinical isolates.

In the present study we collected isolates of clinically important strains and analyzed the occurrenceand relative

abundance of cellular fattyacids by GLC. We developed a new analytical GLC method which accounts for the

varia-* Correspondingauthor.

tions in the data and thus permits optimal distinction

be-tween species. The new exponential function method was

compared with several previously known chromatographic comparative methods.

MATERIALSANDMETHODS

Bacterial strains. The composition of bacterial fatty acids and itsreproducibilitywithinvariousspecieswereanalyzed

by using strains of 12 species. The species tested were Bacteroidesfragilis (66 strains),B. vulgatus (15 strains), B. thetaiotaomicron (17 strains), Staphylococcus aureus (17

strains), S. epidermidis (20 strains), S. hominis(13 strains), Clostridiumperfringens (19 strains),C. difficile (56 strains), Listeria monocytogenes (10 strains), Escherichia coli (13 strains), Yersinia enterocolitica serotype03(12 strains),and Pseudomonasaeruginosa (15 strains).

The strains were clinical isolates that were identified by standard clinical microbiological practices. These included growth and colony morphology on different media (mena-dione-cysteine-enriched brucella bloodagar with and

with-outneomycin,bacteroides bile esculinagar,cefoxitin cyclo-serine fructoseagar,andbloodagar);Gramstaining(insome casescombined with the KOHtest [6]);antibiotic suscepti-bilities;API20E, API20NE,API Staph,and API 20A tests

(Analytab Products, Plainview, N.Y.);andRapID-ANAtest

(Innovative Diagnostics Systems, Inc., Atlanta, Ga.). The

reverseCAMPtest(7)wasused to confirm the identification

of C.perfringens. Autofluorescencewasused to confirmthe

identification of C. difficile.

Thefollowingtype and reference strainswereincludedin

the analyzed bacteria: B. fragilis (ATCC 23745, ATCC

25285, and ATCC 25280), B. thetaiotaomicron (ATCC 29148), C. difficile (ATCC 17858, ATCC 17857, and ATCC

9689),S. aureus(ATCC 25923), S. epidermidis(ATCC 155, ATCC 35984, ATCC 31432, and ATCC 35983), S. hominis

(ATCC 35981 and ATCC 35982), P. aeruginosa (ATCC 9721), and E. coli (ATCC 25922).

1745

on April 11, 2020 by guest

http://jcm.asm.org/

(2)

1746 EEROLA AND LEHTONEN

GLC analysis. Bacterial cellular fatty acid GLC analysis

wasperformed asdescribed previously (L. Miller, Hewlett-Packardgas chromatography applicationnote 228-41; Hew-lett-Packard Co., Palo Alto, Calif.). For GLC, aerobic bacteriawereculturedon Trypticase soy agar(BBL Micro-biology Systems, Cockeysville, Md.)andobligateanaerobes were cultured on Schaedleragar (BBL). The bacteriawere collected from the plates, and the material was saponified,

methylated, and analyzed as described previously (Miller, Hewlett-Packard note). In brief, the collected material was incubated for30 minat 100°C in 15%(wt/vol)NaOH in 50%

aqueous methanol and then acidified to pH 2 with 6 N aqueous HCl in CH30H. The methylated fatty acids were further extracted with ethyl ether and hexane. The GLC analysis wasdoneasdescribedpreviously (Miller, Hewlett-Packard note) with an HP5890Agas chromatograph

(Hew-lett-Packard) and an Ultra 2, 004-11-09B fused-silica capil-larycolumn(0.2mmby25m;cross-linked5%phenylmethyl silicone; Hewlett-Packard). Ultra-high-purity helium was

used asthe carriergas. The GLC settings were asfollows: injection port temperature, 250°C; detector temperature, 300°C;initialcolumntemperature, 170°C, increasingat 5°C/

minupto270°Cat20min;totalanalysis time,25min; sample volume,1 1xl.Thepeakretention time andpeakareavalues wererecordedbyanHP3392Aintegrator(Hewlett-Packard).

We identified individual fatty acids by comparing their retention timeswith those ofabacterial fattyacid standard (bacterial acidmethyl ester mix CP, 4-7080; Supelco, Inc., Bellefonte, Pa.). Individual fatty acids were identified by usinga 0.5% retention time window when they were

com-paredwiththefattyacidstandard. Theanalytical reproduc-ibilityof themethodwasassessedby repeatedGLCanalyses

of thefattyacid standard.

Data analysis. GLC data were transferred from the

inte-gratorto the computerand storedas data files by using an

Olivetti M24 microcomputer with a 650-kilobyte memory and a 20-megabyte Winchester disk. Data transfer and all analysisprograms were donelocally (supplied by Scientific Expert Systems Ltd., Helsinki, Finland). The transferred

dataconsistedof retentiontime andpeak areavalues for all

peaks recorded by the integrator. Identified peaks

corre-sponding to the fatty acid standard and three frequently appearing unidentified peaks were used in the subsequent analyses.

The meanpeakareaand standarddeviation foreachfatty

acid were calculated for each species. The values were

calculatedaspercentages of the totalpeakareatoeliminate

theeffectofinoculumsize variation. Variationsinthe peak

areas of different fatty acids and their prevalence among

different species were analyzed. These values were

com-pared with the mean peak sizes to determine the effect of peakareas onthereproducibilityof theresults.

Correlationanalysis. Fattyacidprofileswerestoredonthe computer to be used for automatic identification of the

bacteria. Theidentificationwasbasedoncalculating

similar-itycoefficientsbetweenindividual bacterial strains.Thiswas

accomplished by comparing the fatty acid profile of the

unknown strain with those ofstandard strains to find the

reference strainsthatmost closelyresembled the bacterium beingtested.

A selection ofindividual strains were analyzed by

corre-lation andsubsequent clusteranalysistoevaluate the ability

ofthesemethods toseparatethestrains intodistinguishable species-related clusters. Several methods have been pub-lishedforcalculating thesimilaritycoefficients of GLCdata

(4). Thesemethodsare basedoncomparingeither the ranks

of peak areas or absolute peak area values. The methods used in the present

study

included the Pearson

product

moment test (4), correlation coefficient (2, 9, 10), the simi-larityindex based onangle separation vectors(8), theStack

method (17), the Spearman coefficient of rank correlation (4), and the

similarity

index basedon

overlap

of

fatty

acid profiles (1). Inaddition, derivatives of the

existing

methods

were produced to study how different factors affect the results. These methods included a variation ofthe Stack

method thatcompared peakareaswith

weighted

peak

sizes,

and a variation of the similarity index based only on the presence of the peaks,with orwithout weightingpeakareas. The methods used and their formulas are described in the

appendix, which also contains a new

exponential

function

method developed onthebasis oftheresults ofthe present study. Details ofthemethod are described inResults.

Theefficacy of various correlation methodswastested

by

using groups of12 species with 6 strains in each(total, 72 strains). The species were selected toinclude groupswhich

either showed clearqualitative and quantitative differences

in their fatty acid profiles ordiffered only in the amountsof

theirfattyacids.

The results ofthecorrelation analyseswere storedonthe computer as similarity index matrices. These were further

analyzed byusing the weightedpair-group clusteranalysis of

the arithmetic averages method (14, 15) and are presentedas dendrogramsfor comparison of the clusteringefficiencies of

thevarious methods. Thefollowingparameters(seeTable

1)

were calculated for each method to comparetheirefficacies:

the average clustering level within species (ACL), which is

the average value of the lowest clustering levels of each

species;theintrafamilial clusteringindex(CI) forthe Bacte-roides, Staphylococcus, Clostridium, and enterobacterial

species, which is equivalent to the ratio of the ACLofthe corresponding species and the lowest level of clustering

between the same species (this showed the ability ofthe methods to separate between the correspondingspecies); the number of strains placed outside their properclusters; and

thenumberofstrains allocatedintothe wrong clusters. RESULTS AND DISCUSSION

Bacterial fatty acid composition. Figure 1 shows the aver-age cellular fatty acid compositions and standarddeviations

within species for the tested strains. In general, both the occurrence of the various fatty acids and their amounts seemed constant within each species. Thus, the reproduc-ibility of the fatty acid data was high enough for bacterial identification. The observedvariationwas due to differences between different strains. If the same strain was cultured and analyzed repeatedly, the results were practically identical.

Figures 2 and 3 show the prevalence of individual fatty acids and their variation in peak area. The peak areas are presented as percentages of the total fatty acid peak areas. The prevalence of the fatty acids and their coefficients of variation [CVs, calculated as (standard deviation/mean) x

100] are plotted against the mean peak area of each individ-ual fatty acid in each species. No difference was found between different species or different fatty acids in their prevalence or CVs (data not shown). Thus, all individual fatty acids of all 12 tested species are plotted together. If the mean fatty acid peak area was .5% of the total area, the peak was present in almost 100% of the strains of the species. Below that level the prevalence began to

decrease,

sothat ifthe peak area was <1%, a prevalence of100% was never seen within a species for any fatty acid tested (Fig. 2).

J.CLIN. MICROBIOL.

on April 11, 2020 by guest

http://jcm.asm.org/

(3)

I. baglls Lieul

v- - ; o--.-re-oefo

8. thetaiotWem

Z-__

_-r!z

_ _ w

*.euIsatu

tz-Ilb h

_:

_r _

01A

... ... .... =

---.. --- --

---... ...4-.

.ffl --« rmmo

i qq je !1-,, .- .. m

m"em

ego" M- -C.pbtu

. "ea

.--.= 2b

7._

& -_:-- ._-.t E _

if '-4ii

_ _!

a

L. mmtugmss

...=

FIG. 1. Peakareasofvarious fatty acids within analyzed species. The bars show themeanpeakareasof fatty acids and standard deviation

inall strains (strains lacking the particular fatty acidarealsoincluded). The number of strains containing each fatty acid is given above each bar.

1747 s.

-N

J

a

le

S. dmddits

N

la

y. ateurlittea

t:

a aN à

a

T.- _ _=

f:m-S.ulis _

:m

F! =bL

N

la _t:

ii

'Il-à

À

a

at

..

3 aà

d ï

1N

Nd

JL i

1.1 1

m...

m

-1 ,JO ir= m

C.diffieile

1

on April 11, 2020 by guest

http://jcm.asm.org/

(4)

;

100-i

-I

i 50.

10j e

e

e e 0

0 0

0~~~~

*

e

elo

e00

0

e e

.001

-ui ---a JL _I_.s

1 2 3 4 56 7 8910 20 30 40506070 morn peksuz I%

FIG. 2. Effect of theamount offatty acidsontheir prevalence among analyzed strains. The prevalence of each fatty acid is presented in relation to themeanpeakareaof thatfattyacid in each species. Fatty acid peakareas arepresentedas apercentage of the totalpeakarea.

PeakareaCVs wereinversely correlated withmeanpeak

areas.If the meanpeakarea wasabove1%,CVswereabout

30%. When the mean peak area exceeded 10%, CVs

de-creasedgradually,

approaching

values of5 to10%. Atmean

peak sizes below 1%, CVs were almost constantly above

100% (Fig. 3). These findings formed the basis for the

required characteristics ofthe developed comparison

func-tion.

Development of the new comparison function. A crucial point for effective comparison of fatty acid

profiles

was

foundto be thepossibility ofadjustingthe methodaccording

tovariations intheprevalenceand amountoftheacids.The results showed that for strains of the same

species,

the

reproducibility of peak prevalence andarea was very

good

(prevalence,

100%;

CV,

<25%)

with

peaks

greater than 5%

ofthetotal areaand poor(prevalence, <50%, CV, >100%) withpeaks less than 1% of the totalarea(Fig. 1 to3).

Thefollowing initial procedurewasusedin the

analysis.

A

similarityvalue wasfirstcalculated for each

peak

pair

inthe two strains to be compared. We had to decide whether to

weightthecomparison by peakareas.Weightedor unweight-edpeakareascould be used forcomparison.Ifthepeakarea was not weighted, the

similarity

index was calculated

di-rectly as an averageofindividual peak similarityvalues(see

theequation below). Ifthepeakarea wasused as animpact factor fortheresult,the similarity indexwascalculated as a

weighted average of peak similarity values by using the

relative areasofthepeaks.

cv

60~~~~~~~

10~~~~~~~~~~

ol 1~~~~2I 3 4 6790 20 3450

12--3 4 6i;;;;O 20 -;O

monpee sixe1%h

FIG. 3. Effectof the peakareasof fatty acidsonthevariation of

peaksizes. PercentCVs foreachfatty acid withineach speciesare

presentedin relationtothemeanpeakareaof that particular fatty

acidwithineach species. All fatty acid peakareas arepresentedas

apercentageof the total peakarea.

100

oL

1 2 3 4

100

c

Ps

8>b

n.~~~~~``

1 2 3 4

FIG. 4. Shapes of various functionsshowingtheeffectsofpeak

arearatiosonsimilarityindices.Thehorizontal axis shows thepeak

area (always inverted to be 21). The vertical axis shows the

similarityindex.(A) Similarityindexcalculateddirectlyastheratio ofpeakareas.(B)Similarityindex calculatedby usingthedeveloped function, showingthe effectof theconstantk1 ontheshapeof the function

(k,

hasvalues from1to10, respectively,fromlineatoline b). (C)Similarity index calculatedby usingthedeveloped function, showingtheeffect of theconstantk2ontheslopeofthefunction(k2

hasvalues of0 and 0.25inlinesaandb,respectively). (D)Relation between similarity index and peak area ratio when using the

developedfunction. The shape ofthe function is as used in the

analyses.Valuesofkl,k2,andk3aregiveninthetext.Thevertical lineshows the location ofpeaksizeratio 1.3 in the function.

Figure 4A shows afunction reflecting the peak similarity

value when it iscalculated plainlyastheratio of peakareas

(XlkIXjk).

InFig. 4A the ratio is invertedsothat the value is always greater than or equal to 1. A characteristic of the

curve is that the drop of the similarity index is highest at

peak ratio values just higher than 1. The approximate CV

range, 0 to 30%, reflecting the observed variation of fatty acid peakareaswithina species, is in theareaof the curve

at ratio valuesbetween 1 and 1.3. (Two peaks with a 30%

difference intheir sizesgivearatio of 1.3.)Itis justthisarea

on the XÎkIXjk curve that shows the highest drop of the

function. Thus, the use of the plain peak area ratio is too

heavily influenced by variations in fattyacid peak areas.

To eliminate this effect, we developed an exponential

function which would show a minimal change in the above area and then drop rapidly, since variations greater than 30%, especially in large peaks (Fig. 3), always signifyalack

of identity between the species. Thus, the function was

developed so that it allowed a certainamountofpeakarea

variation with a minor impact on the similarity index.

1 2 3 4

J. CLIN. MICROBIOL.

ai

on April 11, 2020 by guest

http://jcm.asm.org/

(5)

Further weighting of individual peaks by their areas was used in the comparison.

Similarity indices were calculated betweenindividual fatty acid profiles by using the equation

n

Cf= E 0.5(Xlk +XJk)f(xlklxjk)

k= 1

where Cf is the similarity index between samples x1 and Xk, k is the peak number, Xlk and

xjk

are the relative areas of the kth peaks of strains

xi

andxj,

xikIxjk

is the peak size ratio

(xik/

Xjk was always inverted to be -1),

f(Xklxjk)

is the peak similarity value determining function, which is given by

lOOkll[[(xlk/xjk)

-

k2]k3

_ 1 +

kl],

where

kl,

k2, and k3 are

coefficients determining the effect of peak size variation on the similarity coefficient, and

(xII<xjk)

- k2 is determined such that if

[(xlk/xjk)

- k2] < 0, then

[(xI/xjk)

- k2] = 0.

When two samples,x,and

xj,

were compared, eachpeakin x1 was compared with the corresponding peak in

xj.

If the same peak was found in both samples, therelative amounts were compared by using the function

ftxlklxJk).

Ifthe values were equal, the value of

ftxlklxjk)

was 100. Ifthe peak was absent in curvexj,the value was0.Ifdifferences occurred in the amounts of the peaks, the value wasbetween 100 and 0. The constantsk1, k2, and k3 controlled the effect ofpeak area difference on the similarity coefficient. Figures 4B and C show the effects of coefficients k2 and k3 on the peak similarity value. The constant k2 moved the curve in a left-right direction, determining the length ofthe lag period before the increase in the peak area ratio began to decrease the similarity index. The constant k3affected theslope of the curve, determining howrapidly the function decreasedafter the lag period. With k1 values from0 to 10, the slope of the decreasing part of the function changed from an almost horizontal position (Fig. 4B, curve a) to an almost vertical one (Fig. 4B, curve b). Analyses of about 4,500 strains of normal clinical isolates gave the following optimal values, which were obtained empirically: k1 = 50, k2 = 0.25, k3 = 6.0. Figure 4D shows the shape of the function based on these selected values.

The

fxIIIxjk)

value was further multiplied by the relative peak areas,Xk and

Xjk,

to weight the effects ofmetabolites

according to their peak areas. Thus, the similarity coeffi-cient,

Cf,

was 100 ifthe twochromatograms were identical, irrespective of the numbers of peaks and their absolute

areas. If the formula was used without weighting the peaks according to theirsizes, it took the followingform:

n

Cf= f

f(xlklxjk)ln

k= 1

Comparison ofthe different data processing methods used for species identification. Figure 5 shows tree graphs that

presenttheresultsofthecorrelation and

subsequent

cluster

analysisofthe 72strainsofthe 12species by the 10methods.

Table 1 shows thecalculated index valuesfor each method.

The ACL, calculated as the mean of thelowest clustering

levels of each species, reflectedhow small the variation was within species, optimally 100. There was no marked

differ-ence in ACL among the methods. However, the Stack

method was poorest in this respect. The

species-specific

CI

values showed the ratiobetweenthe ACL(calculated sepa-rately for the threeBacteroides species, the three

Staphylo-coccus species, the two enterobacterial species Y. enteroco-litica and E. coli, and the two Clostridium species) and the mean clusteringlevel at which these specieswere

separated

from each other. Thus, values close to 1.0 showed a

com-plete inability to separate species. The further the values were below 1.0, the better was the ability to separate species. These intrafamilial index values must be compared with the numbers of species placed outside their own

clus-ters, which tended to increase as the species-specific CI values improved. On the other hand, the number of species misallocated increased as the CI values approached 1.0.

None of the analyzed strains was allocated to a wrong cluster by the developed comparison function (method A), which was capable of separating the three Bacteroides species, the three Staphylococcus species, and the two enterobacterialspecies into separate clusters (with CIvalues

within the range of 0.89 to 0.93). Almostequally goodresults were obtained with the overlapping method (method B). It misallocated only one strain, and the species separation was slightly lower than with the developed function (the CI

ranged between 0.84 and 0.95). The ACL values were, however, not as good (87.5 versus 92.4), showing higher variations within clusters. Another method with relatively good results was the Stack method, if it was modified by weighting the peaks accordingto peak areas (method C). It

misallocated six strains to wrong clusters. The other meth-odswere clearly poorer in their ability to separate species. The correlation coefficient method (method G), the angle

separation vector method (method H), and the method determining the peak presence (method F) gave similar results; they all had a marked tendency to combine several clusters.TheSpearman rank correlation method (method D) and the methods which compared the presence of peaks

(method E), peak size ratios directly (the Stack method [method

I]),

or peak sizes with the developed function (method J) but without weighting the peak areas allshowed similar results, with several strains placedoutside their own clusters. They also showed low ACL values.

The developed comparison function with weighted peak

areas (method A) and the overlapping method (method B) were clearly superior to the others. These were further

tested for their ability to separate numerousStaphylococcus

andBacteroides strains.Of75 S.aureus, S. epidermidis, and S. hominis strains, 60 and 54were allocatedcorrectly by the

exponential function method and the overlapping method,

respectively. Of113 B.fragilis, B. thetaiotaomicron, and B. vulgatus strains, 97 were allocated correctly by the

expo-nential function method. The overlapping method allocated 86 strainscorrectly. This difference was due to theinability

of the overlapping method to distinguish between B. theta-iotaomicron andB. vulgatus strains, whichwere distributed among several closely related subclusters (data not pre-sented).

Clearly, if GLC is used for identification, a computer program must be available for comparison of the fatty acid profiles. Several methods have been published for the cor-relation analysis ofchromatographic data (1). However, to ourknowledge, no studies have beenpublishedthatcompare

the efficacy of various methods in identifying clinical iso-lates. Theresults of the presentstudy show thatcomparison

of GLC data, without an optimal comparison procedure,

may not lead to adequate identification ofspecies. The new exponential comparison function wasdeveloped to

adjust

to the variation ofchromatographic data. In thepresent study

no false clustering and a high ACL were seen. The good

results in comparison with other methods evidently de-pended on selecting a shape of the discriminant function which filtered out the noise included in the data butnot the parts relevant to correct identification.

on April 11, 2020 by guest

http://jcm.asm.org/

(6)

~~~~~

m.~~~~~~~~~~~~~~~~a.B.~~~~~~~

.a a .8

""""ff@@@~* x sX EE eE ao o" - --- ''''''

-50

O-o- Z U ci ci ci ci w la (aJw w >>> V V o

»~~~~~~~~~~~~~

ArA Acm(A A(ArAr--rAr- rAruceloi A -Ja -J jj ci fiUUU(jw w- sa w> > >. > >>16A.à. &06

1~~~~~~~~~~~~L

1,

1

<

1

".

E

|

,

.~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~60

0- o~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The~~~

correatio

~

anlyi fnton method-j-

~~~

r ArAr

sp

r A(

ratledtce

OC

mceto

(meho

G) th

>nl

seaato

>method.

veto

-1j~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~-Thcorreation

functiontoethod

analysis

separate

Sepaatio

the M

0se

ho

(method

G),ethed-ngle

s a

method.

H

ve

whe

thesespecies depended mainly on differencesin, the amounts used for the analysis of a large number of strains, it was of the fatty acids and

nlot

in their presence. Previously found to be poorer at analyzing profiles with quantitative

published correlation

analysis

methodshave been compared variations only.

for their ability to analyze bacterial fatty acid data (1). AIl other methods clearly gave poorer results. This was

Hypothetical data and fatty acidprofiles from coryneform due to their inability to adjust to variation of data and

bacteria were analyzed by using the correlation coefficient inconsistent prevalence of small peaks. Their results cannot

J. CLIN.MICROBIOL.

on April 11, 2020 by guest

http://jcm.asm.org/

(7)

IDENTIFICATION OF BACTERIA BY AUTOMATIC GLC

- S cus.^ ^ @ gS gs gg vx vx ^ ,,s s, 9 E Z E. e 9 sv * 9 * v v os maaaaaa mmm o o ,5 , t1

0~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~S

"

G

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~o

*** ggg*r*5 ---|||s^ b3bb*999*U----XXXX*ID***u**t.u

.~~~~~~~~~~~~~~~~~~~~~~~~~~~~w

10 :|~~~~~~~~~~~~~~~~

_ E! uu

J.arethesameasssminutheaaàappendix.

be~~~~~~~~~~~~~v

imroe

byrisn the

deecio tresol

(A

to

(Aataè

lee at

elinA

Ar A..j-J- 3Q

in

(Oicrobiology

ica

.

Sinc all anlse

r

peakarèasis essentialfor optimal comparison idcmfcAinofaclinical isolates.Thprlea o

Th

slctdprnipeinte nlyi prcdr in identi~- auaye sri

is.

a

tasferre

on-lie frm

thcroa-w.-~~~~~~~~~~~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~~~~~~~~~~~~~~~~~~~~~.0 60.~~~~~~~~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ w

-b

0--0

bb mUA~ ~aA a aa gi=-

-fynth clnia islaes

was the

gorreatio

ànsusqet gaht h optran meitl oprdwt e

cluster.oeGdtT--&aa'oai'O .

0 0

0. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~-0

FIG. 5. Dendrograms showingthe resultsof thevariousanalysismethods of the GLCdata.Themethods,presentedaslettersfrom Ato

J, arethesame asin theappendix.

be

improved

by raising thedetection threshold to alevel at

advantages

in clinical

microbiology.

Since ail

analyses

are

which the presence of

peaks

would have been more con- done by

comparing

chromatograms ofpresently and

previ-stant. Wherever the threshold was, there would always be

ously analyzed

strains, the user can create reference files.

species

having peaks

with sizes

varying

about the selected The system canthus have different

applications, depending

threshold level. These

peaks

wouldappear

inconsistently

in on the type of

microorganisms

to be

analyzed

(13). We are

GLC data and cause false

clustering.

Thus,

weighting

of currently

using

the present analysis method in automatic

peak

areas isessential for

optimal comparison.

identification of clinical isolates. The

fatty

acid

profile

ofthe The selected

principle

in the

analysis

procedure

inidenti-

analyzed

strain is transferred on-line from the

chromato-fying

theclinicalisolateswasthecorrelation and

subsequent

graph

tothecomputer and

immediately

compared

with aset

cluster

analysis

of the GLC data. This method has clear ofreference strains. Theresults ofthe

analysis'

are

printed

as

VOL.26, 1988 1751

on April 11, 2020 by guest

http://jcm.asm.org/

(8)

TABLE 1. Efficacy of comparison methods in species identification ofanalyzed strains

IntrafamilialCI'of: No. of No. of

Method' ACL

~~~~~~~~~~~~~~~species

species

Method ACL Bacteroides Staphylococcus Enterobacterial Clostridium outsideown inwrong

strains strains strains strains clusters clusters

A. Exponential functionweightedbypeak size 92.4 0.89 0.93 0.86 0.13 0 0

B. Overlap 87.5 0.92 0.95 0.84 0.37 1 1

C. Stack methodweightedbypeak size 73.8 0.82 0.91 0.81 0.28 1 6

D. Spearman ranked 90.8 0.78 0.93 0.92 0.81 8 1

E. Presenceofpeaks 85.7 0.74 0.93 0.95 0.79 9 5

F. Presenceof peaks weightedby peaksize 97.2 0.94 1.0 1.0 0.89 0 30

G. Correlationcoefficient 97.1 0.99 0.99 0.99 0.64 0 36

H.Angle separationvector 97.5 0.94 0.99 0.98 0.80 3 26

I. Stack method 64.2 0.69 0.82 0.78 0.63 8 3

J. Exponential functionnotweightedbypeaksize 78.1 0.57 0.78 0.57 0.78 9 1

aMethodsandlettersarethesame asintheappendix.

bCI= 1meanscompleteinabilitytodistinguishbetweenspecies. CTotal number of analyzed strainswas72.

dTherange -1 to0.1isrescaledtoOto100forcomparabilitywith other results.

a list ofstrains with the closest resemblance to the tested strain. The tested strain is also placed in a dendrogram consisting of reference strains. Thepart of the dendrogram containingthe test strain and those with the closest resem-blance is printedwiththe results. The location of thetested strain inthe referencedendrogramshows the level of corre-lation between the test strain and reference strains, as well asthecorrelationamongthereferencesthemselves,

indicat-ing the confidencelevelof the identification. Further, if the test strain is not among the references, the dendrogram

showshowcloselyit is related to certain reference species.

The system hasbeen testedwithabout4,500strains and has

proved useful inrapid identification of clinical isolates.

Depending on the type ofmicroorganisms analyzed and

hence on data variation, the optimal comparison function maydiffer from thepresent one. The parameters

defining

the slope of the present discriminating function can be deter-mined for each collection of standard reference isolates

individually.

APPENDIX

A list ofthe correlation analysis methods used in the study is given below. Method Ainvolves anexponential comparison func-tionwith weighted peak areas.Itisexplainedinthetext.MethodB

involves a similarity index basedonoverlap offatty acid profiles: n

Co= 100- 2.S IX/kXjkI k= 1

Method C is the Stack method withweighted peakareas:

n

Csw= 2 O.S(xlk+ Xjk)(xIklxjk)!lOO k= 1

(Xl/kXjk)

is always inverted to be '1. Method D is the Spearman

coefficient of rankcorrelationmethod:

Cr16 2 d132 (n3-n)

k= 1

where

d.

is thedifferenceofranks of the peak pair. Method E is the similarity index method based on the presence of peaks:

n/

CP = 2

f(xk,xjk)

n

k = 1

where

J(XlkXjk)

= 100 if

XIk

andxjk are present in both curves and

J(X1kXJk)

= O if XkorXk is missing ineithercurve.Method F is the

similarity index method based on the presence of peaks with weightedpeakareas:

n/ Cpw = 2 O.S(XIk +Xjk)f(XikXjk) 10

k= 1

where

fJXikXJk)

= 100 ifxIk and

xjk

arepresent in both curvesand

ftXikXjk)

= O ifxlkorxjkismissing in eithercurve. MethodG is the

correlationcoefficient method:

C,= 50(1 +

rj),

where _n/

rij= 2 (Xlk-Xl)(Xjk -Xj)

k= 1

[2(Xk-

Xà)2I[

(Xi1k -Xj)I12

Method H is the similarity index method based on the angle of

separationvectors:

n

C=

X"112X

1/2

Ca XIkjk

k= 1

MethodI is the Stackmethod,notweightedby peakareas:

n

Cs = 2 (xÎkIxjk)n

k= 1

where

(Xik/Xjk)

is always inverted to be 21. Method J is the exponential comparisonfunction,notweighted bypeakareas.Itis explainedinthe text. For allofthe abovemethods,Xk andxjkare

percentageareasof the kth peaks oforganismsi andj, respectively;

xi

and xjare meanpercentagepeakareasoforganismsland j; andn

is thenumber ofdifferent peaks in two comparedorganisms. ACKNOWLEDGMENTS

We thank Kirsti Tuomela and Marja-Riitta Terasjarvi for their excellent technicalassistance and Eija Nordlund for help in prepa-ration of themanuscript.

LITERATURE CITED

1. Bousfield, I. J., G. L. Smith, T. R. Dando, and G. Hobbs. 1983.

Numericalanalysis of total fatty acid profiles in the identifica-tion ofcoryneform, nocardioform, and some other bacteria. J. Gen. Microbiol. 129:375-394.

2. Drucker,D. B.1974.Chemotaxonomic fatty acid fingerprints of

somestreptococci withsubsequentstatistical analysis. Can. J. Microbiol. 20:1723-1728.

3. Drucker, D. B. 1981. Analysis of structural components of

J. CLIN.MICROBIOL.

on April 11, 2020 by guest

http://jcm.asm.org/

(9)

microorganisms bygaschromatography, p. 166-291. In D.B.

Drucker(ed.), Microbiological applications ofgas

chromatogra-phy. CambridgeUniversity Press, Cambridge.

4. Drucker,D. B.1981.Computation ofgaschromatographic data,

p.400-423.InD. B.Drucker (ed.),Microbiologicalapplications

of gas chromatography. Cambridge University Press,

Cam-bridge.

5. Guerrant, G. O., M. A. Lambert, and C. W. Moss. 1982. Analysis ofshort-chain acids from anaerobic bacteriaby high-performanceliquid chromatography.J.Clin.Microbiol. 16:355-360.

6. Halebian,S.,B. Harris,S. M. Finegold, and R. D. Rolfe. 1981.

Rapid method that aids in distinguishing gram-positive from

gram-negative anaerobic bacteria. J. Clin. Microbiol. 13:444-448.

7. Hansen, M. V., and L. P. Elliott. 1980. New presumptive identification testfor Clostridium perfringens: reverse CAMP

test.J. Clin.Microbiol. 12:617-619.

8. Ikemoto,S.,H.Kuiraishi, K. Komagata, R. Azuma, T. Suto, and H. Murooka. 1978. Cellular fatty acid compositionin Pseudo-monasspecies. J.Gen.Apple.Microbiol. 24:199-213.

9. Janzen, E., K. Bryn, T. Bergan, and K. Bovre. 1974. Gas

chromatography of bacterial whole cell methanolysates. V.

Fatty acid composition of neisseriae and moraxellae. Acta

Pathol. Microbiol. Scand. Sect.B 82:769-779.

10. Janzen, E., K. Bryn, T. Bergan, and K. Bovre. 1974. Gas

chromatography of bacterial whole cell methanolysates. VI. Fattyacid composition of strainswithinMicrococcaceae. Acta Pathol.Microbiol. Scand. Sect. B 82:785-798.

11. Janzen, E. 1984. Analysis of cellular components in bacterial classification and diagnosis, p. 257-302. In G. Odhan, L. Lars-son, and P.-A. Màrdh (ed.), Gaschromatography mass spec-trometry applications in microbiology. Plenum Publishing Corp., NewYork.

12. Moss, C. W., and O. L. Nunez-Montiel. 1982. Analysis of short-chain acids from bacteria by gas-liquid chromatography with afused-silica capillary column. J. Clin. Microbiol. 15:308-311.

13. Rizzo, A. F.,H.Korkeala, andI. Mononen. 1987. Gas-chromato-graphicanalysis of cellular fatty acids and neutral monosaccha-ridesin the identification of lactobacilli. Appl. Environ. Micro-biol. 53:2883-2888.

14. Sneath, P. H. A., and R. R. Sokal. 1973. The estimation of taxonomicresemblance, p. 114-187. In D. Kennedy andR. D.

Park(ed.), Numerical taxonomy.W. H.FreemanandCo., San Francisco.

15. Sneath,P. H.A.,and R. R. Sokal.1973.Taxonomic structure, p. 288-308. In D. Kennedy and R. D. Park (ed.), Numerical taxonomy.W. H. FreemanandCo., San Francisco.

16. Sokal, R. R. 1985. The principles of numerical taxonomy: twenty-five years later, p. 1-20. In M. Goodfellow, D. Jones, andF.G. Priest(ed.),Computer assisted bacterialsystematics. Academic Press,Inc. (London), Ltd., London.

17. Stack,M. V., H. D. Donoghue, J. E. Tyler, and M. Marshall.

1976.Comparisonof oralstreptococcibypyrolysis GLC,p.

57-68. In C. E. R. Jones and C. A. Cramers (ed.), Analytical pyrolysis. Elsevier Biomedical Press, Amsterdam.