0095-1137/88/091745-09$02.00/0
Copyright© 1988, American Society for Microbiology
Optimal Data Processing Procedure for Automatic Bacterial Identification by Gas-Liquid Chromatography of Cellular Fatty Acids
ERKKI EEROLA1* AND OLLI-PEKKA LEHTONEN2
Department of Medical Microbiology, Turku University,1 and Central Laboratory, Clinical Microbiology, Turku University Central Hospital,2 SF-20520 Turku, Finland
Received 1 March 1988/Accepted 17 May 1988
Gas-liquid chromatography of cellular fatty acids was used in automatic identification of clinical bacterial isolates. The intraspecies variation in the occurrence of fatty acids and the variation in the relative gas-liquid chromatography peak areas of different fatty acids were evaluated and compared with the relative peak areas of these acids. A new chromatogram comparison method involving the use of an exponential function was developed to adjust to data variation optimally. This method was compared with several previously published methods of correlation analysis with data from representative clinical bacteriological isolates. The efficacies of the methods in separating different bacterial species into distinct clusters were compared. The new exponential function method was superior to the others both in its ability to separate species into different clusters and in giving a greater degree of identity to strains within a proper cluster. The results indicate that the gas-liquid chromatography of bacterial cellular fatty acids can be used effectively in the identification of clinically isolated bacteria. However, the usefulness of the analysis depends on the comparison method used and on its ability to cope with data variations.
One of the main goals of clinical bacteriological analysis is accurate identification of bacterial isolates. This serves two purposes: taxonomic identification of species and comparisons between new and previous isolates. In addition to classical identification methods, based mainly on demonstrations of differences in the metabolic pathways of different species, methods have been developed which identify bacteria according to chemical compounds present in the bacterial cells (11). The main advantages of these methods are increased speed of analysis, since there is no need to cultivate the bacteria any further, and the possibility of identifying bacteria which are no longer viable. Several groups of compounds are used for identification. Among the most often used are certain enzymes, DNA, membrane lipopolysaccharides, carbohydrates, proteins, and bacterial fatty acids (11). The usefulness of the various compounds in this respect depends both on the taxonomic effect of differences in their occurrence and on the speed and simplicity of the analytical methods. In this respect, fatty acid analysis seems to be one of the most promising methods (11, 12).
Gas-liquid chromatography (GLC) of bacterial cellular fatty acids (5, 12) is a versatile method, allowing the identification of the vast majority of isolates. However, chromatograms involve the problem of comparison of multivariate recordings. To be useful in a clinical microbiological laboratory, data processing should be automatic. Although computerized taxonomy has been studied extensively (3, 16), the optimal way of applying a computerized comparative analysis of cellular fatty acid data to bacterial identification has not yet been determined. This is especially true of the identification of clinical isolates.
In the present study we collected isolates of clinically important strains and analyzed the occurrence and relative abundance of cellular fatty acids by GLC. We developed a new analytical GLC method which accounts for the variations in the data and thus permits optimal distinction between species. The new exponential function method was compared with several previously known chromatographic comparative methods.

* Corresponding author.
MATERIALS AND METHODS
Bacterial strains. The composition of bacterial fatty acids and its reproducibility within various species were analyzed by using strains of 12 species. The species tested were Bacteroides fragilis (66 strains), B. vulgatus (15 strains), B. thetaiotaomicron (17 strains), Staphylococcus aureus (17 strains), S. epidermidis (20 strains), S. hominis (13 strains), Clostridium perfringens (19 strains), C. difficile (56 strains), Listeria monocytogenes (10 strains), Escherichia coli (13 strains), Yersinia enterocolitica serotype O3 (12 strains), and Pseudomonas aeruginosa (15 strains).
The strains were clinical isolates that were identified by standard clinical microbiological practices. These included growth and colony morphology on different media (menadione-cysteine-enriched brucella blood agar with and without neomycin, bacteroides bile esculin agar, cefoxitin cycloserine fructose agar, and blood agar); Gram staining (in some cases combined with the KOH test [6]); antibiotic susceptibilities; API 20E, API 20NE, API Staph, and API 20A tests (Analytab Products, Plainview, N.Y.); and RapID-ANA test (Innovative Diagnostics Systems, Inc., Atlanta, Ga.). The reverse CAMP test (7) was used to confirm the identification of C. perfringens. Autofluorescence was used to confirm the identification of C. difficile.
The following type and reference strains were included in the analyzed bacteria: B. fragilis (ATCC 23745, ATCC 25285, and ATCC 25280), B. thetaiotaomicron (ATCC 29148), C. difficile (ATCC 17858, ATCC 17857, and ATCC 9689), S. aureus (ATCC 25923), S. epidermidis (ATCC 155, ATCC 35984, ATCC 31432, and ATCC 35983), S. hominis (ATCC 35981 and ATCC 35982), P. aeruginosa (ATCC 9721), and E. coli (ATCC 25922).
on April 11, 2020 by guest
http://jcm.asm.org/
GLC analysis. Bacterial cellular fatty acid GLC analysis was performed as described previously (L. Miller, Hewlett-Packard gas chromatography application note 228-41; Hewlett-Packard Co., Palo Alto, Calif.). For GLC, aerobic bacteria were cultured on Trypticase soy agar (BBL Microbiology Systems, Cockeysville, Md.) and obligate anaerobes were cultured on Schaedler agar (BBL). The bacteria were collected from the plates, and the material was saponified, methylated, and analyzed as described previously (Miller, Hewlett-Packard note). In brief, the collected material was incubated for 30 min at 100°C in 15% (wt/vol) NaOH in 50% aqueous methanol and then acidified to pH 2 with 6 N aqueous HCl in CH3OH. The methylated fatty acids were further extracted with ethyl ether and hexane. The GLC analysis was done as described previously (Miller, Hewlett-Packard note) with an HP 5890A gas chromatograph (Hewlett-Packard) and an Ultra 2, 004-11-09B fused-silica capillary column (0.2 mm by 25 m; cross-linked 5% phenylmethyl silicone; Hewlett-Packard). Ultra-high-purity helium was used as the carrier gas. The GLC settings were as follows: injection port temperature, 250°C; detector temperature, 300°C; initial column temperature, 170°C, increasing at 5°C/min up to 270°C at 20 min; total analysis time, 25 min; sample volume, 1 µl. The peak retention time and peak area values were recorded by an HP 3392A integrator (Hewlett-Packard).
We identified individual fatty acids by comparing their retention times with those of a bacterial fatty acid standard (bacterial acid methyl ester mix CP, 4-7080; Supelco, Inc., Bellefonte, Pa.). Individual fatty acids were identified by using a 0.5% retention time window when they were compared with the fatty acid standard. The analytical reproducibility of the method was assessed by repeated GLC analyses of the fatty acid standard.
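The retention time window described above can be illustrated with a minimal Python sketch. It assumes that the 0.5% window is applied relative to the retention time of each standard peak and that a sample peak is assigned to the closest standard peak inside that window; the text does not state either detail. The function name `match_peaks` and the retention times in the usage note are hypothetical and are not taken from the study.

```python
def match_peaks(sample_peaks, standard, window=0.005):
    """Assign sample peaks to standard fatty acids by retention time.

    sample_peaks: list of (retention_time, peak_area) tuples from the integrator.
    standard:     dict mapping fatty acid name -> retention time of the standard.
    window:       relative tolerance (0.005 = 0.5%), assumed to be relative
                  to the standard's retention time.
    Returns a dict mapping fatty acid name -> summed peak area.
    """
    identified = {}
    for rt, area in sample_peaks:
        best_name, best_dev = None, window
        for name, std_rt in standard.items():
            dev = abs(rt - std_rt) / std_rt  # relative deviation from the standard
            if dev <= best_dev:
                best_name, best_dev = name, dev
        if best_name is not None:  # peaks outside every window stay unidentified
            identified[best_name] = identified.get(best_name, 0.0) + area
    return identified
```

With a made-up standard such as `{"16:0": 10.00, "18:0": 14.00}`, a sample peak at 10.03 min would be assigned to 16:0, whereas a peak at 12.00 min would be left unidentified.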
Data analysis. GLC data were transferred from the integrator to the computer and stored as data files by using an Olivetti M24 microcomputer with a 650-kilobyte memory and a 20-megabyte Winchester disk. Data transfer and all analysis programs were done locally (supplied by Scientific Expert Systems Ltd., Helsinki, Finland). The transferred data consisted of retention time and peak area values for all peaks recorded by the integrator. Identified peaks corresponding to the fatty acid standard and three frequently appearing unidentified peaks were used in the subsequent analyses.
The mean peak area and standard deviation for each fatty acid were calculated for each species. The values were calculated as percentages of the total peak area to eliminate the effect of inoculum size variation. Variations in the peak areas of different fatty acids and their prevalence among different species were analyzed. These values were compared with the mean peak sizes to determine the effect of peak areas on the reproducibility of the results.
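The normalization and per-species statistics described above might be computed along the following lines. This is a sketch under stated assumptions rather than the program used in the study: the sample standard deviation (n - 1 denominator) is assumed, strains lacking a peak are counted as 0%, and the function names `to_percent` and `species_summary` are hypothetical.

```python
from statistics import mean, stdev


def to_percent(profile):
    """Express raw peak areas as percentages of the total peak area."""
    total = sum(profile.values())
    return {name: 100.0 * area / total for name, area in profile.items()}


def species_summary(profiles):
    """Mean, SD, CV (%), and prevalence (%) of each fatty acid within one species.

    profiles: list of percentage profiles (dict name -> % of total area),
              one per strain. A peak missing from a strain counts as 0.
    """
    names = sorted({name for p in profiles for name in p})
    summary = {}
    for name in names:
        values = [p.get(name, 0.0) for p in profiles]
        m = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0  # assumption: sample SD
        cv = 100.0 * sd / m if m > 0 else float("nan")  # CV = (SD / mean) x 100
        prevalence = 100.0 * sum(v > 0 for v in values) / len(values)
        summary[name] = {"mean": m, "sd": sd, "cv": cv, "prevalence": prevalence}
    return summary
```

Working in percentages of the total area means that two cultures of the same strain with different inoculum sizes yield the same profile, which is the stated purpose of the normalization.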
Correlation analysis. Fatty acid profiles were stored on the computer to be used for automatic identification of the bacteria. The identification was based on calculating similarity coefficients between individual bacterial strains. This was accomplished by comparing the fatty acid profile of the unknown strain with those of standard strains to find the reference strains that most closely resembled the bacterium being tested.
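The lookup described above can be sketched as a ranking of reference profiles by similarity to the unknown profile. The example uses the overlap index of method B from the appendix, read here as 100 - 0.5 Σ|x_ik - x_jk| for percentage profiles; any of the other similarity coefficients could be passed in instead. The function names and the two reference profiles in the usage note are hypothetical.

```python
def overlap_index(a, b):
    """Overlap of two percentage profiles (method B as read from the appendix)."""
    names = set(a) | set(b)
    return 100.0 - 0.5 * sum(abs(a.get(n, 0.0) - b.get(n, 0.0)) for n in names)


def rank_references(unknown, references, similarity=overlap_index):
    """Return (similarity, label) pairs, most similar reference first.

    unknown:    percentage profile of the strain being tested.
    references: dict mapping reference label -> percentage profile.
    similarity: function taking two profiles and returning a 0-100 score.
    """
    scored = [(similarity(unknown, profile), label)
              for label, profile in references.items()]
    return sorted(scored, reverse=True)
```

For instance, an unknown profile `{"16:0": 55, "18:0": 45}` scores 95 against a reference `{"16:0": 60, "18:0": 40}` and 20 against a reference `{"16:0": 20, "18:1": 80}`, so the first reference is listed first.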
A selection of individual strains was analyzed by correlation and subsequent cluster analysis to evaluate the ability of these methods to separate the strains into distinguishable species-related clusters. Several methods have been published for calculating the similarity coefficients of GLC data (4). These methods are based on comparing either the ranks of peak areas or absolute peak area values. The methods used in the present study included the Pearson product moment test (4), correlation coefficient (2, 9, 10), the similarity index based on angle separation vectors (8), the Stack method (17), the Spearman coefficient of rank correlation (4), and the similarity index based on overlap of fatty acid profiles (1). In addition, derivatives of the existing methods were produced to study how different factors affect the results. These methods included a variation of the Stack method that compared peak areas with weighted peak sizes, and a variation of the similarity index based only on the presence of the peaks, with or without weighting peak areas. The methods used and their formulas are described in the appendix, which also contains a new exponential function method developed on the basis of the results of the present study. Details of the method are described in Results.
The efficacy of various correlation methods was tested by using groups of 12 species with 6 strains in each (total, 72 strains). The species were selected to include groups which either showed clear qualitative and quantitative differences in their fatty acid profiles or differed only in the amounts of their fatty acids.
The results of the correlation analyses were stored on the computer as similarity index matrices. These were further analyzed by using the weighted pair-group cluster analysis of the arithmetic averages method (14, 15) and are presented as dendrograms for comparison of the clustering efficiencies of the various methods. The following parameters (see Table 1) were calculated for each method to compare their efficacies: the average clustering level within species (ACL), which is the average value of the lowest clustering levels of each species; the intrafamilial clustering index (CI) for the Bacteroides, Staphylococcus, Clostridium, and enterobacterial species, which is equivalent to the ratio of the ACL of the corresponding species and the lowest level of clustering between the same species (this showed the ability of the methods to separate between the corresponding species); the number of strains placed outside their proper clusters; and the number of strains allocated into the wrong clusters.

RESULTS AND DISCUSSION
Bacterial fatty acid composition. Figure 1 shows the average cellular fatty acid compositions and standard deviations within species for the tested strains. In general, both the occurrence of the various fatty acids and their amounts seemed constant within each species. Thus, the reproducibility of the fatty acid data was high enough for bacterial identification. The observed variation was due to differences between different strains. If the same strain was cultured and analyzed repeatedly, the results were practically identical.
Figures 2 and 3 show the prevalence of individual fatty acids and their variation in peak area. The peak areas are presented as percentages of the total fatty acid peak areas. The prevalence of the fatty acids and their coefficients of variation [CVs, calculated as (standard deviation/mean) × 100] are plotted against the mean peak area of each individual fatty acid in each species. No difference was found between different species or different fatty acids in their prevalence or CVs (data not shown). Thus, all individual fatty acids of all 12 tested species are plotted together. If the mean fatty acid peak area was ≥5% of the total area, the peak was present in almost 100% of the strains of the species. Below that level the prevalence began to decrease, so that if the peak area was <1%, a prevalence of 100% was never seen within a species for any fatty acid tested (Fig. 2).
J.CLIN. MICROBIOL.
FIG. 1. Peak areas of various fatty acids within analyzed species. The bars show the mean peak areas of fatty acids and standard deviations in all strains (strains lacking the particular fatty acid are also included). The number of strains containing each fatty acid is given above each bar.
FIG. 2. Effect of the amount of fatty acids on their prevalence among analyzed strains. The prevalence of each fatty acid is presented in relation to the mean peak area of that fatty acid in each species. Fatty acid peak areas are presented as a percentage of the total peak area. (Horizontal axis: mean peak size, %.)
Peak area CVs were inversely correlated with mean peak areas. If the mean peak area was above 1%, CVs were about 30%. When the mean peak area exceeded 10%, CVs decreased gradually, approaching values of 5 to 10%. At mean peak sizes below 1%, CVs were almost constantly above 100% (Fig. 3). These findings formed the basis for the required characteristics of the developed comparison function.
Development of the new comparison function. A crucial point for effective comparison of fatty acid profiles was found to be the possibility of adjusting the method according to variations in the prevalence and amount of the acids. The results showed that for strains of the same species, the reproducibility of peak prevalence and area was very good (prevalence, 100%; CV, <25%) with peaks greater than 5% of the total area and poor (prevalence, <50%; CV, >100%) with peaks less than 1% of the total area (Fig. 1 to 3).
The following initial procedure was used in the analysis. A similarity value was first calculated for each peak pair in the two strains to be compared. We had to decide whether to weight the comparison by peak areas. Weighted or unweighted peak areas could be used for comparison. If the peak area was not weighted, the similarity index was calculated directly as an average of individual peak similarity values (see the equation below). If the peak area was used as an impact factor for the result, the similarity index was calculated as a weighted average of peak similarity values by using the relative areas of the peaks.
FIG. 3. Effect of the peak areas of fatty acids on the variation of peak sizes. Percent CVs for each fatty acid within each species are presented in relation to the mean peak area of that particular fatty acid within each species. All fatty acid peak areas are presented as a percentage of the total peak area. (Horizontal axis: mean peak size, %; vertical axis: CV.)

FIG. 4. Shapes of various functions showing the effects of peak area ratios on similarity indices. The horizontal axis shows the peak area ratio (always inverted to be ≥1). The vertical axis shows the similarity index. (A) Similarity index calculated directly as the ratio of peak areas. (B) Similarity index calculated by using the developed function, showing the effect of the constant $k_1$ on the shape of the function ($k_1$ has values from 1 to 10, respectively, from line a to line b). (C) Similarity index calculated by using the developed function, showing the effect of the constant $k_2$ on the slope of the function ($k_2$ has values of 0 and 0.25 in lines a and b, respectively). (D) Relation between similarity index and peak area ratio when using the developed function. The shape of the function is as used in the analyses. Values of $k_1$, $k_2$, and $k_3$ are given in the text. The vertical line shows the location of peak size ratio 1.3 in the function.
Figure 4A shows a function reflecting the peak similarity value when it is calculated plainly as the ratio of peak areas ($x_{ik}/x_{jk}$). In Fig. 4A the ratio is inverted so that the value is always greater than or equal to 1. A characteristic of the curve is that the drop of the similarity index is highest at peak ratio values just higher than 1. The approximate CV range, 0 to 30%, reflecting the observed variation of fatty acid peak areas within a species, is in the area of the curve at ratio values between 1 and 1.3. (Two peaks with a 30% difference in their sizes give a ratio of 1.3.) It is just this area on the $x_{ik}/x_{jk}$ curve that shows the highest drop of the function. Thus, the use of the plain peak area ratio is too heavily influenced by variations in fatty acid peak areas.
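A short numerical sketch makes the point concrete. It assumes that the plain-ratio similarity of Fig. 4A is 100 divided by the inverted ratio, which is one natural reading of "calculated plainly as the ratio of peak areas" but is not spelled out in the text; the function name is hypothetical.

```python
def plain_ratio_similarity(a, b):
    """Similarity taken directly from the peak area ratio (assumed 100 / ratio)."""
    ratio = max(a, b) / min(a, b)  # inverted so that the ratio is always >= 1
    return 100.0 / ratio


# Drop in similarity caused by the same 0.3 step in ratio,
# once within the normal intraspecies range and once far outside it.
drop_near_one = plain_ratio_similarity(1.0, 1.0) - plain_ratio_similarity(1.0, 1.3)
drop_far_out = plain_ratio_similarity(1.0, 3.0) - plain_ratio_similarity(1.0, 3.3)
print(round(drop_near_one, 1), round(drop_far_out, 1))
```

Under this reading, two peaks that differ by 30%, a difference still within the variation seen inside one species, already lose about 23 points of similarity, whereas the same step between ratios of 3.0 and 3.3 costs only about 3 points.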
To eliminate this effect, we developed an exponential function which would show a minimal change in the above area and then drop rapidly, since variations greater than 30%, especially in large peaks (Fig. 3), always signify a lack of identity between the species. Thus, the function was developed so that it allowed a certain amount of peak area variation with a minor impact on the similarity index.
Further weighting of individual peaks by their areas was used in the comparison.
Similarity indices were calculated between individual fatty acid profiles by using the equation

$$C_f = \sum_{k=1}^{n} 0.5\,(x_{ik} + x_{jk})\, f(x_{ik}/x_{jk})$$

where $C_f$ is the similarity index between samples $x_i$ and $x_j$, $k$ is the peak number, $x_{ik}$ and $x_{jk}$ are the relative areas of the $k$th peaks of strains $x_i$ and $x_j$, $x_{ik}/x_{jk}$ is the peak size ratio ($x_{ik}/x_{jk}$ was always inverted to be $\geq 1$), and $f(x_{ik}/x_{jk})$ is the peak similarity value determining function, which is given by

$$f(x_{ik}/x_{jk}) = \frac{100\,k_1}{[(x_{ik}/x_{jk}) - k_2]^{k_3} - 1 + k_1},$$

where $k_1$, $k_2$, and $k_3$ are coefficients determining the effect of peak size variation on the similarity coefficient, and $(x_{ik}/x_{jk}) - k_2$ is determined such that if $[(x_{ik}/x_{jk}) - k_2] < 0$, then $[(x_{ik}/x_{jk}) - k_2] = 0$.
When two samples, $x_i$ and $x_j$, were compared, each peak in $x_i$ was compared with the corresponding peak in $x_j$. If the same peak was found in both samples, the relative amounts were compared by using the function $f(x_{ik}/x_{jk})$. If the values were equal, the value of $f(x_{ik}/x_{jk})$ was 100. If the peak was absent in curve $x_j$, the value was 0. If differences occurred in the amounts of the peaks, the value was between 100 and 0. The constants $k_1$, $k_2$, and $k_3$ controlled the effect of peak area difference on the similarity coefficient. Figures 4B and C show the effects of coefficients $k_2$ and $k_3$ on the peak similarity value. The constant $k_2$ moved the curve in a left-right direction, determining the length of the lag period before the increase in the peak area ratio began to decrease the similarity index. The constant $k_3$ affected the slope of the curve, determining how rapidly the function decreased after the lag period. With $k_1$ values from 0 to 10, the slope of the decreasing part of the function changed from an almost horizontal position (Fig. 4B, curve a) to an almost vertical one (Fig. 4B, curve b). Analyses of about 4,500 strains of normal clinical isolates gave the following optimal values, which were obtained empirically: $k_1 = 50$, $k_2 = 0.25$, $k_3 = 6.0$. Figure 4D shows the shape of the function based on these selected values.
The $f(x_{ik}/x_{jk})$ value was further multiplied by the relative peak areas, $x_{ik}$ and $x_{jk}$, to weight the effects of metabolites according to their peak areas. Thus, the similarity coefficient, $C_f$, was 100 if the two chromatograms were identical, irrespective of the numbers of peaks and their absolute areas. If the formula was used without weighting the peaks according to their sizes, it took the following form:

$$C_f = \sum_{k=1}^{n} f(x_{ik}/x_{jk})/n$$
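The two forms of the similarity index can be sketched in Python as follows. Several points are assumptions rather than statements from the text: relative areas are taken as fractions of the total area (summing to 1), so that the weighted index falls on a 0 to 100 scale; a peak missing from either profile scores 0; and the lower bound applied to the bracketed term is exposed as a hypothetical parameter `floor`. With the bound of 0 as transcribed here and the published constants, equal peaks give about 101.7 rather than 100; with a bound of 1, equal peaks give exactly 100 and the function stays flat up to a ratio of 1 + $k_2$ = 1.25. Which reading matches the original typesetting cannot be settled from this text.

```python
def peak_similarity(a, b, k1=50.0, k2=0.25, k3=6.0, floor=0.0):
    """Peak similarity value f for one pair of relative peak areas.

    floor is the lower bound applied to (ratio - k2); 0.0 follows the text as
    transcribed, 1.0 is an alternative reading (an assumption, see lead-in).
    """
    if a <= 0.0 or b <= 0.0:
        return 0.0  # peak absent from one of the two profiles
    ratio = max(a, b) / min(a, b)  # always inverted to be >= 1
    base = max(ratio - k2, floor)
    return 100.0 * k1 / (base ** k3 - 1.0 + k1)


def similarity_index(x_i, x_j, weighted=True, **constants):
    """Similarity index C_f between two profiles (dict name -> fraction of area).

    weighted=True  : sum of 0.5 * (x_ik + x_jk) * f over all peaks.
    weighted=False : plain average of f over all peaks.
    """
    names = sorted(set(x_i) | set(x_j))
    total = 0.0
    for name in names:
        a, b = x_i.get(name, 0.0), x_j.get(name, 0.0)
        f = peak_similarity(a, b, **constants)
        total += 0.5 * (a + b) * f if weighted else f / len(names)
    return total
```

With the published constants, a twofold difference in one peak gives a peak similarity of roughly 64 and a threefold difference roughly 10, which reflects the intended behavior of tolerating small variation and then dropping rapidly.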
Comparison of the different data processing methods used for species identification. Figure 5 shows tree graphs that present the results of the correlation and subsequent cluster analysis of the 72 strains of the 12 species by the 10 methods. Table 1 shows the calculated index values for each method. The ACL, calculated as the mean of the lowest clustering levels of each species, reflected how small the variation was within species, optimally 100. There was no marked difference in ACL among the methods. However, the Stack method was poorest in this respect. The species-specific CI values showed the ratio between the ACL (calculated separately for the three Bacteroides species, the three Staphylococcus species, the two enterobacterial species Y. enterocolitica and E. coli, and the two Clostridium species) and the mean clustering level at which these species were separated from each other. Thus, values close to 1.0 showed a complete inability to separate species. The further the values were below 1.0, the better was the ability to separate species. These intrafamilial index values must be compared with the numbers of species placed outside their own clusters, which tended to increase as the species-specific CI values improved. On the other hand, the number of species misallocated increased as the CI values approached 1.0.
None of the analyzed strains was allocated to a wrong cluster by the developed comparison function (method A), which was capable of separating the three Bacteroides species, the three Staphylococcus species, and the two enterobacterial species into separate clusters (with CI values within the range of 0.89 to 0.93). Almost equally good results were obtained with the overlapping method (method B). It misallocated only one strain, and the species separation was slightly lower than with the developed function (the CI ranged between 0.84 and 0.95). The ACL values were, however, not as good (87.5 versus 92.4), showing higher variations within clusters. Another method with relatively good results was the Stack method, if it was modified by weighting the peaks according to peak areas (method C). It misallocated six strains to wrong clusters. The other methods were clearly poorer in their ability to separate species. The correlation coefficient method (method G), the angle separation vector method (method H), and the method determining the peak presence (method F) gave similar results; they all had a marked tendency to combine several clusters. The Spearman rank correlation method (method D) and the methods which compared the presence of peaks (method E), peak size ratios directly (the Stack method [method I]), or peak sizes with the developed function (method J) but without weighting the peak areas all showed similar results, with several strains placed outside their own clusters. They also showed low ACL values.
The developed comparison function with weighted peak areas (method A) and the overlapping method (method B) were clearly superior to the others. These were further tested for their ability to separate numerous Staphylococcus and Bacteroides strains. Of 75 S. aureus, S. epidermidis, and S. hominis strains, 60 and 54 were allocated correctly by the exponential function method and the overlapping method, respectively. Of 113 B. fragilis, B. thetaiotaomicron, and B. vulgatus strains, 97 were allocated correctly by the exponential function method. The overlapping method allocated 86 strains correctly. This difference was due to the inability of the overlapping method to distinguish between B. thetaiotaomicron and B. vulgatus strains, which were distributed among several closely related subclusters (data not presented).
Clearly, if GLC is used for identification, a computer program must be available for comparison of the fatty acid profiles. Several methods have been published for the correlation analysis of chromatographic data (1). However, to our knowledge, no studies have been published that compare the efficacy of various methods in identifying clinical isolates. The results of the present study show that comparison of GLC data, without an optimal comparison procedure, may not lead to adequate identification of species. The new exponential comparison function was developed to adjust to the variation of chromatographic data. In the present study no false clustering and a high ACL were seen. The good results in comparison with other methods evidently depended on selecting a shape of the discriminant function which filtered out the noise included in the data but not the parts relevant to correct identification.
VOL. 26, 1988

FIG. 5. Dendrograms showing the results of the various analysis methods of the GLC data. The methods, presented as letters from A to J, are the same as in the appendix.

[...] Separation of these species depended mainly on differences in the amounts of the fatty acids and not in their presence. Previously published correlation analysis methods have been compared for their ability to analyze bacterial fatty acid data (1). Hypothetical data and fatty acid profiles from coryneform bacteria were analyzed by using the correlation coefficient [...] (method G), the angle separation vector method (method H), [...] method. [...] used for the analysis of a large number of strains, it was found to be poorer at analyzing profiles with quantitative variations only.
All other methods clearly gave poorer results. This was due to their inability to adjust to variation of data and inconsistent prevalence of small peaks. Their results cannot be improved by raising the detection threshold to a level at which the presence of peaks would have been more constant. Wherever the threshold was, there would always be species having peaks with sizes varying about the selected threshold level. These peaks would appear inconsistently in GLC data and cause false clustering. Thus, weighting of peak areas is essential for optimal comparison.
The selected principle in the analysis procedure in identifying the clinical isolates was the correlation and subsequent cluster analysis of the GLC data. This method has clear advantages in clinical microbiology. Since all analyses are done by comparing chromatograms of presently and previously analyzed strains, the user can create reference files. The system can thus have different applications, depending on the type of microorganisms to be analyzed (13). We are currently using the present analysis method in automatic identification of clinical isolates. The fatty acid profile of the analyzed strain is transferred on-line from the chromatograph to the computer and immediately compared with a set of reference strains. The results of the analysis are printed as
a list of strains with the closest resemblance to the tested strain. The tested strain is also placed in a dendrogram consisting of reference strains. The part of the dendrogram containing the test strain and those with the closest resemblance is printed with the results. The location of the tested strain in the reference dendrogram shows the level of correlation between the test strain and reference strains, as well as the correlation among the references themselves, indicating the confidence level of the identification. Further, if the test strain is not among the references, the dendrogram shows how closely it is related to certain reference species. The system has been tested with about 4,500 strains and has proved useful in rapid identification of clinical isolates.
Depending on the type of microorganisms analyzed and hence on data variation, the optimal comparison function may differ from the present one. The parameters defining the slope of the present discriminating function can be determined for each collection of standard reference isolates individually.

TABLE 1. Efficacy of comparison methods in species identification of analyzed strains

                                                           Intrafamilial CI^b of:                                      No. of species
Method^a                                             ACL   Bacteroides  Staphylococcus  Enterobacterial  Clostridium  outside own  in wrong
                                                           strains      strains         strains          strains      clusters     clusters
A. Exponential function weighted by peak size        92.4  0.89         0.93            0.86             0.13         0            0
B. Overlap                                           87.5  0.92         0.95            0.84             0.37         1            1
C. Stack method weighted by peak size                73.8  0.82         0.91            0.81             0.28         1            6
D. Spearman ranked                                   90.8  0.78         0.93            0.92             0.81         8            1
E. Presence of peaks                                 85.7  0.74         0.93            0.95             0.79         9            5
F. Presence of peaks weighted by peak size           97.2  0.94         1.0             1.0              0.89         0            30
G. Correlation coefficient                           97.1  0.99         0.99            0.99             0.64         0            36
H. Angle separation vector                           97.5  0.94         0.99            0.98             0.80         3            26
I. Stack method                                      64.2  0.69         0.82            0.78             0.63         8            3
J. Exponential function not weighted by peak size    78.1  0.57         0.78            0.57             0.78         9            1

a Methods and letters are the same as in the appendix.
b CI = 1 means complete inability to distinguish between species.
c Total number of analyzed strains was 72.
d The range -1 to 1 is rescaled to 0 to 100 for comparability with other results.
APPENDIX
A list of the correlation analysis methods used in the study is given below.

Method A involves an exponential comparison function with weighted peak areas. It is explained in the text.

Method B involves a similarity index based on overlap of fatty acid profiles:

$$C_o = 100 - 0.5 \sum_{k=1}^{n} \lvert x_{ik} - x_{jk} \rvert$$

Method C is the Stack method with weighted peak areas:

$$C_{sw} = \sum_{k=1}^{n} 0.5\,(x_{ik} + x_{jk})\,(x_{ik}/x_{jk})/100$$

where $(x_{ik}/x_{jk})$ is always inverted to be $\le 1$.

Method D is the Spearman coefficient of rank correlation method:

$$C_r = 1 - \frac{6 \sum_{k=1}^{n} d_k^2}{n^3 - n}$$

where $d_k$ is the difference of ranks of the peak pair.

Method E is the similarity index method based on the presence of peaks:

$$C_p = \sum_{k=1}^{n} f(x_{ik}, x_{jk}) / n$$

where $f(x_{ik}, x_{jk}) = 100$ if $x_{ik}$ and $x_{jk}$ are present in both curves and $f(x_{ik}, x_{jk}) = 0$ if $x_{ik}$ or $x_{jk}$ is missing in either curve.

Method F is the similarity index method based on the presence of peaks with weighted peak areas:

$$C_{pw} = \sum_{k=1}^{n} 0.5\,(x_{ik} + x_{jk})\,f(x_{ik}, x_{jk})/100$$

where $f(x_{ik}, x_{jk}) = 100$ if $x_{ik}$ and $x_{jk}$ are present in both curves and $f(x_{ik}, x_{jk}) = 0$ if $x_{ik}$ or $x_{jk}$ is missing in either curve.

Method G is the correlation coefficient method:

$$C = 50\,(1 + r_{ij})$$

where

$$r_{ij} = \frac{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\left[\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)^2 \sum_{k=1}^{n} (x_{jk} - \bar{x}_j)^2\right]^{1/2}}$$

Method H is the similarity index method based on the angle of separation vectors:

$$C_a = \frac{\sum_{k=1}^{n} x_{ik}\,x_{jk}}{\left(\sum_{k=1}^{n} x_{ik}^2\right)^{1/2} \left(\sum_{k=1}^{n} x_{jk}^2\right)^{1/2}}$$
Method I is the Stack method, not weighted by peak areas:

$$C_s = \sum_{k=1}^{n} (x_{ik}/x_{jk}) / n$$

where $(x_{ik}/x_{jk})$ is always inverted to be $\le 1$.

Method J is the exponential comparison function, not weighted by peak areas. It is explained in the text.

For all of the above methods, $x_{ik}$ and $x_{jk}$ are percentage areas of the $k$th peaks of organisms $i$ and $j$, respectively; $\bar{x}_i$ and $\bar{x}_j$ are mean percentage peak areas of organisms $i$ and $j$; and $n$ is the number of different peaks in two compared organisms.

ACKNOWLEDGMENTS
We thank Kirsti Tuomela and Marja-Riitta Terasjarvi for their excellent technical assistance and Eija Nordlund for help in preparation of the manuscript.
LITERATURE CITED
1. Bousfield, I. J., G. L. Smith, T. R. Dando, and G. Hobbs. 1983. Numerical analysis of total fatty acid profiles in the identification of coryneform, nocardioform, and some other bacteria. J. Gen. Microbiol. 129:375-394.
2. Drucker, D. B. 1974. Chemotaxonomic fatty acid fingerprints of some streptococci with subsequent statistical analysis. Can. J. Microbiol. 20:1723-1728.
3. Drucker, D. B. 1981. Analysis of structural components of
microorganisms by gas chromatography, p. 166-291. In D. B. Drucker (ed.), Microbiological applications of gas chromatography. Cambridge University Press, Cambridge.
4. Drucker, D. B. 1981. Computation of gas chromatographic data, p. 400-423. In D. B. Drucker (ed.), Microbiological applications of gas chromatography. Cambridge University Press, Cambridge.
5. Guerrant, G. O., M. A. Lambert, and C. W. Moss. 1982. Analysis of short-chain acids from anaerobic bacteria by high-performance liquid chromatography. J. Clin. Microbiol. 16:355-360.
6. Halebian, S., B. Harris, S. M. Finegold, and R. D. Rolfe. 1981. Rapid method that aids in distinguishing gram-positive from gram-negative anaerobic bacteria. J. Clin. Microbiol. 13:444-448.
7. Hansen, M. V., and L. P. Elliott. 1980. New presumptive identification test for Clostridium perfringens: reverse CAMP test. J. Clin. Microbiol. 12:617-619.
8. Ikemoto, S., H. Kuiraishi, K. Komagata, R. Azuma, T. Suto, and H. Murooka. 1978. Cellular fatty acid composition in Pseudomonas species. J. Gen. Appl. Microbiol. 24:199-213.
9. Janzen, E., K. Bryn, T. Bergan, and K. Bovre. 1974. Gas
chromatography of bacterial whole cell methanolysates. V.
Fatty acid composition of neisseriae and moraxellae. Acta
Pathol. Microbiol. Scand. Sect. B 82:769-779.
10. Janzen, E., K. Bryn, T. Bergan, and K. Bovre. 1974. Gas
chromatography of bacterial whole cell methanolysates. VI. Fatty acid composition of strains within Micrococcaceae. Acta Pathol. Microbiol. Scand. Sect. B 82:785-798.
11. Janzen, E. 1984. Analysis of cellular components in bacterial classification and diagnosis, p. 257-302. In G. Odhan, L. Larsson, and P.-A. Mårdh (ed.), Gas chromatography mass spectrometry applications in microbiology. Plenum Publishing Corp., New York.
12. Moss, C. W., and O. L. Nunez-Montiel. 1982. Analysis of short-chain acids from bacteria by gas-liquid chromatography with a fused-silica capillary column. J. Clin. Microbiol. 15:308-311.
13. Rizzo, A. F., H. Korkeala, and I. Mononen. 1987. Gas-chromatographic analysis of cellular fatty acids and neutral monosaccharides in the identification of lactobacilli. Appl. Environ. Microbiol. 53:2883-2888.
14. Sneath, P. H. A., and R. R. Sokal. 1973. The estimation of taxonomic resemblance, p. 114-187. In D. Kennedy and R. D. Park (ed.), Numerical taxonomy. W. H. Freeman and Co., San Francisco.
15. Sneath, P. H. A., and R. R. Sokal. 1973. Taxonomic structure, p. 288-308. In D. Kennedy and R. D. Park (ed.), Numerical taxonomy. W. H. Freeman and Co., San Francisco.
16. Sokal, R. R. 1985. The principles of numerical taxonomy: twenty-five years later, p. 1-20. In M. Goodfellow, D. Jones, and F. G. Priest (ed.), Computer assisted bacterial systematics. Academic Press, Inc. (London), Ltd., London.
17. Stack, M. V., H. D. Donoghue, J. E. Tyler, and M. Marshall. 1976. Comparison of oral streptococci by pyrolysis GLC, p. 57-68. In C. E. R. Jones and C. A. Cramers (ed.), Analytical pyrolysis. Elsevier Biomedical Press, Amsterdam.