Improved Identification of Epidemiologically Related Strains of Salmonella enterica by use of a Fusion Algorithm Based on Pulsed Field Gel Electrophoresis and Multiple Locus Variable Number Tandem Repeat Analysis

(1)

J

OURNAL OF

C

LINICAL

M

ICROBIOLOGY

, Nov. 2010, p. 4072–4082

Vol. 48, No. 11

0095-1137/10/$12.00

doi:10.1128/JCM.00659-10

Copyright © 2010, American Society for Microbiology. All Rights Reserved.

Improved Identification of Epidemiologically Related Strains of

Salmonella enterica

by use of a Fusion Algorithm Based on

Pulsed-Field Gel Electrophoresis and Multiple-Locus

Variable-Number Tandem-Repeat Analysis

䌤

†

S. L. Broschat,

1,2

* D. R. Call,

1,2

M. A. Davis,

2

D. Meng,

1

S. Lockwood,

1

R. Ahmed,

3

and T. E. Besser

2

School of Electrical Engineering and Computer Science

1

_{and Department of Veterinary Microbiology and Pathology,}

2

_Washington

State University, Pullman, Washington 99164, and National Microbiology Laboratory, Canadian Science Centre for

Human and Animal Health, Winnipeg, MB, Canada

3

Received 29 March 2010/Returned for modification 18 June 2010/Accepted 17 August 2010

Pulsed-field gel electrophoresis (PFGE) and multiple-locus variable-number tandem-repeat analysis

(MLVA) are used to assess genetic similarity between bacterial strains. There are cases, however, when neither

of these methods quantifies genetic variation at a level of resolution that is well suited for studying the

molecular epidemiology of bacterial pathogens. To improve estimates based on these methods, we propose a

fusion algorithm that combines the information obtained from both PFGE and MLVA assays to assess

epidemiological relationships. This involves generating distance matrices for PFGE data (Dice coefficients)

and MLVA data (single-step stepwise-mutation model) and modifying the relative distances using the two

different data types. We applied the algorithm to a set of

Salmonella enterica

serovar Typhimurium isolates

collected from a wide range of sampling dates, locations, and host species. All three classification methods

(PFGE only, MLVA only, and fusion) produced a similar pattern of clustering relative to groupings of common

phage types, with the fusion results being slightly better. We then examined a group of serovar Newport isolates

collected over a limited geographic and temporal scale and showed that the fusion of PFGE and MLVA data

produced the best discrimination of isolates relative to a collection site (farm). Our analysis shows that the

fusion of PFGE and MLVA data provides an improved ability to discriminate epidemiologically related isolates

but provides only minor improvement in the discrimination of less related isolates.

Salmonellosis is one of the most common food-borne

dis-eases in the United States (5). Consequently, it is important to

understand how

Salmonella

strains disseminate within and

be-tween reservoirs and environments. Many molecular typing

tools have been used for this purpose (11). Of these methods,

pulsed-field gel electrophoresis (PFGE) is considered by many

to be the gold standard for strain typing, and variable-number

tandem-repeat (VNTR) assays are powerful alternative or

complementary typing tools (3, 22). Both methods offer a high

degree of genetic resolution for strain typing, depending on

several factors.

PFGE involves separating chromosomal DNA

macro-re-striction fragments by size, and strains are discriminated based

on the resulting band pattern observed after electrophoresis

has been completed. It is one of the most reproducible and

highly discriminatory typing techniques and has been widely

and successfully used for a variety of

Salmonella enterica

sero-vars (12, 15); for many situations, PFGE is capable of

discrim-inating between closely related strains. In addition, the use of

the assay to analyze different serovars does not require a great

deal of modification, as might be required with procedures that

are dependent on PCR. Difficulties arise when strains are very

closely related (i.e., poor discrimination [18, 27]) or when

bands comigrate in the gel or identically sized bands represent

completely different fragments of chromosomal DNA and

thereby produce spurious matches (6). These complications

are more pronounced when a large number of bands are

gen-erated by the restriction digest (4). In addition, while band

patterns convey a crude degree of genetic relatedness, a large

number of independent restriction digests would be needed to

infer an accurate phylogenetic relationship (6).

Multiple-locus

variable-number

tandem-repeat

analysis

(MLVA) is a PCR-based technique that relies on the

amplifi-cation of chromosomal or plasmid DNA that encompasses

short tandem repeats of a DNA sequence. The tandem repeats

are prone to higher-than-background mutation rates due to

DNA strand slippage during replication (23), and thus, the

amplified fragments will vary in length depending on the

num-ber of repeats harbored at a given locus. Different fragment

lengths are tallied either as the total length (base pairs) or the

estimated number of repeat units, and each discretely sized

fragment is considered a unique “allele” for the locus under

investigation. Because of the relatively high mutation rate,

strains can accumulate distinctive allele patterns within a

rel-atively short period of time (5). Furthermore, the technique

can be multiplexed and automated and is conducive to rapid

and relatively high-throughput strain typing. MLVA assays are

relatively robust (5, 15–17) and, while not perfect, they can

provide phylogenetic information even with a limited number

* Corresponding author. Mailing address: School of EECS,

Wash-ington State University, P.O. Box 642752, Pullman, WA 99164-2752.

Phone: (509) 335-5693. Fax: (509) 335-3818. E-mail: [email protected]

.edu.

† Supplemental material for this article may be found at http://jcm

.asm.org/.

䌤

_{Published ahead of print on 25 August 2010.}

4072

on May 16, 2020 by guest

http://jcm.asm.org/

(2)

of loci (13, 18). While access to a sequenced genome

dramat-ically speeds the ability to establish new assays (3), it is not a

requisite to assay development. The primary limitations of the

technique include the potential need for a new set of loci for

every species or serovar under investigation and the fact that

some loci are very “unstable” and can “disappear” from some

strains or lineages; this produces the equivalent of an

uninfor-mative “null” allele. Mutation rates can also vary between loci

(5, 24, 25); if ignored, this factor can introduce bias into

com-parative analyses.

Clearly, PFGE and MLVA offer different technical and

in-terpretive advantages and disadvantages, but it is important to

emphasize that the nature of the methodological and

interpre-tive errors is independent between the techniques. For

exam-ple, errors due to comigration of bands for PFGE are

inde-pendent of band size estimation errors for MLVA because

differences in MLVA band size are not detectable using PFGE

and macro-restriction fragments are generally independent of

tandem-repeat sequences. Provided that most of the

experi-mental variation from these two methods is uncorrelated (i.e.,

independent), it is possible to combine results from the two

methods to produce improved estimates (8), and this premise

underlies the current study.

Our objective was to determine whether combining the

in-formation obtained from both PFGE and MLVA assays

pro-duces more rigorous and discriminatory analyses of bacterial

pathogens, such as

Salmonella

. Two sets of

Salmonella enterica

isolates were used in this study; one set included serovar

Ty-phimurium isolates from a wide range of sampling dates,

lo-cations, and host species while the other set included a group

of serovar Newport isolates collected over a limited geographic

and temporal scale for a single host species. The results of the

different typing methods were assessed by comparison with

those of phage-typing assays (serovar Typhimurium) and with

known epidemiological relationships (serovar Newport). To

interpret the MLVA data, we employed a metric that

incor-porates a stepwise-mutation model, and to interpret the PFGE

data, we employed Dice coefficients to construct distance

ma-trices. Our analysis shows that the fusion of the two typing

methods provides an improved ability to discriminate between

isolates when PFGE and MLVA separately provide partial but

incomplete discrimination between strains with a high degree

of probable genetic similarity.

MATERIALS AND METHODS

Salmonellastrains.Two sets of isolates were used for this study. Set A included 37S. entericaserovar Typhimurium strains that were previously collected from different animal hosts and from different locations and time periods (see the figures for strain designations and descriptors). Because these isolates were epidemiologically unrelated, we assumed that they encompassed a high degree of

genetic variability. Set B included 63S. entericaserovar Newport isolates, mostly

collected from cattle in Washington State, and this set was assumed to represent less genetic diversity because of the restricted geographic representation, inclu-sion of multiple isolates from individual farms, and a limited temporal scale.

Salmonellaserovar Typhimurium strains were phage typed at the National Mi-crobiology Laboratory, Canadian Science Center for Human and Animal Health, Winnipeg, Manitoba. Serovar Newport isolates were tested for antibiotic resis-tance using a disc diffusion method (2) according to Clinical and Laboratory Standards Institute (NCCLS) guidelines (19, 20). Northwestern isolates from cattle were tested for susceptibility to a panel of antimicrobial drugs that

in-cluded ampicillin (10␮g), chloramphenicol (30␮g), gentamicin (10␮g),

kana-mycin (30␮g), streptomycin (10␮g), tetracycline (30␮g), triple-sulfa (a

combi-nation of sulfadiazine, sulfamethazine, and sulfamerazine) (250 ␮g),

trimethoprim-sulfamethoxazole (1.25 ␮g to 23.75␮g), ceftazidime (30 ␮g),

amoxicillin-clavulanic acid (20/10␮g), and nalidixic acid (30␮g) (BD

Diagnos-tics, Sparks, Maryland). Northeastern isolates were tested with the same panel

except that a sulfisoxazole disc (250␮g) was substituted for the triple-sulfa disc.

Pulsed-field gel electrophoresis.We followed a standard PFGE protocol for

Salmonella enterica, using an XbaI restriction digest (21). Briefly, genomic DNA was digested in agarose plugs with the restriction enzyme XbaI, and the resulting DNA fragments were gel separated using a CHEF-DR II (Bio-Rad, Hercules, CA) apparatus. The electrophoresis conditions included an initial pulse time of 2.2 s, final pulse time of 63.8 s, running temperature of 14°C, and run time of 18 to 20 h at 6 V/cm. PFGE gels were stained with ethidium bromide and visualized using a UV transilluminator. Gel images were analyzed using Bionumerics ver-sion 4.6 (Applied Maths, Sint-Martens-Latem, Belgium). Estimated band sizes were exported from Bionumerics for the current study.

Multiple-locus variable-number tandem-repeat analysis.ForS. enterica

sero-var Typhimurium isolates, four of five previously described VNTR loci were employed for MLVA (STTR5, STTR6, STTR9, and STTR10pl), following a published protocol (18) with the addition of a VNTR locus identified in-house (STTR11) (Forward, GATAAGCCGTACTGTTCAGG, and

STTR11-Reverse, TACTCCTTTGTGGTCTACGC). For theS. entericaserovar Newport

strains, two Typhimurium loci (STTR5 and STTR6) (18) and four published Newport-specific loci were employed (NewportA, NewportB, NewportM, and NewportL) (6). The PCRs for both serovars were conducted as previously de-scribed (6, 20). Capillary electrophoresis was carried out at the Washington State University Genomics Core using a 3730 DNA analyzer with Pop-7 polymer (Applied Biosystems). The resulting electropherograms were analyzed using GeneMarker software (Softgenetics LLC, State College, PA).

Data analysis.Dice similarity coefficients were calculated using Bionumerics

(Applied Maths, Austin, TX) from PFGE data to generate the distance matrix, and the unweighted-pair group method with arithmetic mean (UPGMA) algo-rithm was used to construct a dendrogram. For the MLVA data, the total length of the tandem repeats was divided by the estimated size of each repeat to obtain the number of tandem repeats for each locus and each strain. There were five

available loci with tandem repeats forS. entericaserovar Typhimurium, but data

from one locus were not used because it was from a plasmid locus and⬍73% of

37 bacterial isolates were positive for this locus. ForS. entericaserovar Newport,

there were six loci, all of which were used. Because passage experiments indicate that VNTR mutations are usually composed of a single step (5, 24), we modeled our data using a single-step stepwise-mutation model (SMM). Based on this

statistical model, we estimated the distanceS(the total number of single steps)

between two lineages (two different isolates) and their most recent common

ancestor (MRCA) using the number of tandem repeats,XL, of the two lineages.

The distances,S, for all lineage pairs were then used to construct the distance

matrix, and UPGMA was used to obtain a dendrogram.

If␮is the rate of stepwise mutations per generation and if we assume that the

gain or loss of a repeat is equally probable, then the following conditional

probabilities,P, characterize the single-step SMM:P(Xt⫹1⫽i⫹1兩Xt⫽i)⫽

P(Xt⫹1⫽i⫺1兩Xt⫽i)⫽ ␮/2,P(Xt⫹1⫽i兩Xt⫽i)⫽1⫺ ␮, andP(储Xt⫹1⫺Xt储ⱖ

2兩Xt⫽i)⫽0, whereidenotes the number of tandem repeats at distancetand

tis an integer number between zero and infinity. Based on these conditional

probabilities, the probability of the distancetis given by (26)P(t兩n0, …, nk)⫽

N(t)/D(t), whereN(t)⫽e⫺(␭ ⫹2␮n)t_⌸

j k

⫽0[Ij(2␮t)]j n

andD(t)⫽ 兰0⬁e⫺

(␭ ⫹2␮n)t

⌸jk⫽0[Ij(2␮t)]jndt. The equation forPassumes that the mutation rate␮is

constant for all loci, which is approximately true for ourS. entericaserovar

Typhimurium data;nmdenotes the locus number where the subscriptm⫽0, 1,

2,…,kis the difference between the number of tandem repeats for two lineages,

andm⫽kis the maximum number of differences;nis the number of loci used;

␭is a parameter associated with the distance to a MRCA (in this work,␭ ⫽

0.0002 was found to give satisfactory results [26]); andIjis thejth-order modified

Bessel function of type 1. The distanceSis the value oftwith the maximum

probability.

Fusion algorithm.Dendrograms constructed from PFGE or MVLA data can

differ substantially. Consequently, if we assume that both types of data contain useful information as well as error, it is possible that better results can be obtained by combining the data. In fact, it is known that (i) if two different algorithms used to describe a set of samples give different results, (ii) if the error for each result is less than the error associated with randomly generated results, and (iii) if the errors for both are uncorrelated, then a combination of these algorithms will give better results than either of the two algorithms alone (8). For our problem, we have different data (PFGE and MLVA) from the same sets of samples. We can safely assume that the error for both the MLVA and PFGE algorithms is less than the error for random clustering, because both PFGE and

on May 16, 2020 by guest

http://jcm.asm.org/

(3)

[image:3.585.84.512.92.671.2]

FIG. 1. Generalized tree for 37

S. enterica

serovar Typhimurium isolates generated from the fusion algorithm. Parameters used in this analysis

included

r

between 0 and 1 and the threshold parameter

thr

between 0.05 and 1. Information includes isolate designation, source, phage type,

collection date (month/day/year), and state where the isolate was collected. The scale bar represents a composite measure of genetic distance

determined using the averaged values of the normalized distance matrices (see Materials and Methods).

4074

BROSCHAT ET AL.

J. C

LIN

. M

ICROBIOL

.

on May 16, 2020 by guest

http://jcm.asm.org/

(4)

[image:4.585.82.512.96.669.2]

FIG. 2. Dendrogram for 37

S. enterica

serovar Typhimurium isolates constructed using UPGMA and a distance matrix obtained using a

single-step stepwise mutation model for VNTR data (MLVA only). The scale bar represents a measure of genetic distance.

on May 16, 2020 by guest

http://jcm.asm.org/

(5)

MLVA can recapitulate epidemiological relationships (6, 14). Furthermore, we can assume that the errors are uncorrelated because the PFGE and MLVA assays measure differences that arise from different genetic mechanisms. Con-sequently, because the two methods provide different results, it is probable that

combining the PFGE and MLVA data will provide a more comprehensive picture of the underlying population structure of these strains.

Several strategies can be used to combine different types of data. One strategy is to treat each data type independently and produce two independent

dendro-FIG. 3. Dendrogram for 37

S. enterica

serovar Typhimurium isolates constructed using UPGMA with Dice coefficients for PFGE data (PFGE

only). The scale bar represents a measure of genetic distance.

4076

BROSCHAT ET AL.

J. C

LIN

. M

ICROBIOL

.

on May 16, 2020 by guest

http://jcm.asm.org/

[image:5.585.83.519.71.646.2]

(6)

grams that are then combined to form a single dendrogram. While conceptually simple, this approach weights all sources of information equally, which may not be an optimal approach. Another strategy is to combine the data sets together before generating a dendrogram, but it may be difficult to combine the data if they differ in type (e.g., discrete and continuous), and even if this is accom-plished, a suitable approach for evaluating the combined data may not exist. An alternative approach is to process each type of data using an algorithm that is appropriate for that data type and then fuse the results at some midpoint in the process; this is the approach we employed.

We begin with two distance matrices, one for the PFGE data set and one for the MLVA data set, as described above. One distance matrix is used to construct a dendrogram, and a threshold is selected (see below) to define distinct clusters from this dendrogram. If two strains in the second distance matrix are grouped together in one of the clusters formed from the first distance matrix, the distance between them is reduced (see below); if these two strains are not grouped together, then the distance in the second matrix is left unchanged. After all pairwise distance values are adjusted based on the clusters from the first matrix, a final dendrogram is generated. Both sets of data can be used in alternating roles: PFGE data are used to create the clusters while MLVA data are used to create a second distance matrix to be modified, and MLVA data are used to create the clusters while PFGE data are used to create a second distance matrix to be modified.

For the fusion algorithm described above, values for two parameters must be

chosen. The first is the threshold value,thr, which divides the initial dendrogram

into distinct clusters. The second is the degree that each distance value should be

reduced in the modified distance matrix. If Dis the distance matrix to be

modified with elementsd(i,j) andD*is the modified distance matrix, the

ele-ments ofD*are given in terms ofd(i,j) by the equationd*(i,j)⫽d(i,j) if bacterial

samplesiandjare not in the same cluster and by the equationd*(i,j)⫽r䡠d(i,j)

if bacterial samplesiandjare in the same cluster, with 0⬍rⱕ1. Thus, the

weight parameterrdictates how much the distance value will be reduced. The

choice of these two parameters,thrandr, changes the results, and there is no

obvious way of knowing what values are the best to use. Our approach was to use

the entire range of values for the thresholdthr, i.e., 0.05 to 1, and several ranges

of values for the weight parameterr. For the former, this corresponds to a range

from having each strain form its own cluster (thr⫽0.05) to the opposite extreme

where all strains are grouped into a single cluster (thr⫽1). For each set of

ranges, we created a dendrogram using UPGMA with the modified distance matrix; from the resulting set of dendrograms, we constructed a “generalized tree” using Consense from the software package PHYLIP (version 3.68) with the default parameters (10).

The set of dendrograms used to construct the generalized tree is a combination of two sets of data, one created when PFGE clusters are used to modify the distance matrix and the other when MLVA clusters are used. When the sets are combined, a conflict will occur if they disagree completely on the relationship between two strains. For example, the PFGE data may indicate that strains A and B always occur as a pair, while the MLVA data may indicate that strains A and C always occur as a pair. If this happens, Consense (10) will construct a tree that depends on the order of the input. To prevent such an occurrence, we “break the tie” by multiplying the values of one set of data by 0.501 and the other set by 0.499.

One other issue arises when the two different data sets are combined; their distance measures are not the same. The manner in which we resolved this was to scale the dendrogram produced by Consense using an average-distance matrix obtained by averaging the values of the normalized PFGE and MLVA distance matrices. Normalization was achieved by dividing all matrix values by the max-imum distance to obtain values between 0 and 1. Then, the final distances were determined as follows. Find all leaves on a branch (e.g., branch 1 contains A and B; branch 2 contains C, D, and E; and F contains itself). Find all combinations of leaves on one branch and leaves on the other branch and obtain the distance for each combination from the average distance matrix. Use the maximum value as the distance between the two branches.

The fusion program can be downloaded at http://www.vetmed.wsu.edu /research_vmp/MicroArrayLab/.

RESULTS AND DISCUSSION

Comparison of genetically diverse strains of

S. enterica

sero-var Typhimurium.

To compare genetically diverse strains of

S. enterica

serovar Typhimurium, generalized trees were

con-structed using the fusion, MLVA, and PFGE algorithms

de-scribed above. For the fusion algorithm, we present results

from weight parameter

r

between 0 and 1. Potential ties were

broken by multiplying PFGE cluster data by a factor of 0.499

and MLVA cluster data by a factor of 0.501. This weights the

analysis in favor of MLVA under the assumption that there is

more phylogenetically relevant information available from

MLVA data than from PFGE (7).

Assessing the validity of our analysis is complicated by the

lack of a gold standard with which to compare our results.

Indeed, barring a complete genome sequence for each strain

and suitable algorithms for assessing genetic relationships, the

only potential gold standards available are multilocus sequence

typing (MLST) and phenotypic characteristics. Given the

prob-able lack of genetic variation for intraserovar MLST

compar-isons (9), we chose to compare the

S. enterica

serovar

Typhi-murium strains using susceptibility to a panel of lytic phages as

a measure of relatedness. The Centre for Infections of the

Health Protection Agency (Colindale, London, United

King-dom) provided 38 phages (1), and 5 additional phages were

developed at the National Microbiology Laboratory

(Win-nepeg, Canada). Our analysis assumes that strains with similar

phage susceptibilities are more closely related than strains with

dissimilar phage susceptibilities. All of the strains were

sub-jected to susceptibility testing using a panel of 31 phages.

Strains that were judged untypeable with this panel were

sub-jected to testing with an additional 12 phages. Only one strain

(8745) was considered untypeable using the combined panel of

43 lytic phages (see Tables S1a and b in the supplemental

material).

Each dendrogram was divided into six clusters, A to F, and

the strains within a cluster were compared according to their

phage susceptibilities with the expectation that, on average,

strains with greater genetic similarity will have fewer phage

susceptibility mismatches. Because lytic phage susceptibility is

unlikely to have a 1:1 correspondence with genetic similarity,

we arbitrarily selected a cutoff where isolates were considered

“more similar” if they had

ⱕ

7 differences in the lytic phage

panel or “less similar” if they had

⬎

7 differences in lytic phage

susceptibility. For this putatively diverse set of isolates, the

correspondence between lytic phage results, fusion (Fig. 1),

MLVA (Fig. 2), and PFGE (Fig. 3) was small but measurable

(Table 1). The fusion results included one or three fewer phage

susceptibility mismatches relative to the MLVA and PFGE

results, respectively. We also examined the ability of the three

approaches to distinctly classify unique phage types within the

same clusters (Fig. 1 to 3). With a discrimination index

calcu-TABLE 1. Number of unrelated strains in each of the clusters

delineated in the dendrograms of Fig. 1 to 3

a

Algorithm No. of strains in cluster: Total

no.

A B C D E F

Fusion

1

2

0

1

4 MLVA only

1

0

3

0

1

5 PFGE only

0

2

3

2

0

7

a

A phage susceptibility panel (see Tables S1a and b in the supplemental material) was used as a proxy for genetic similarity in which we arbitrarily

classified isolates as genetically similar if differing byⱕ7 lytic phage

susceptibil-ities and genetically distinct if differing by⬎7 lytic phage susceptibilities. Note

that the number of strains in each cluster is not the same for each algorithm.

on May 16, 2020 by guest

http://jcm.asm.org/

[image:6.585.301.542.90.153.2]

(7)

[image:7.585.110.483.54.649.2]

FIG. 4. Generalized tree for 63

S. enterica

serovar Newport isolates generated from 160 dendrograms. The dendrograms were obtained using

the fusion algorithm with the weight parameter,

r

, between 0 and 1 and the threshold parameter,

thr

, between 0.05 and 1. Information includes

isolate designation, collection date, county where the isolate was collected (Washington counties include Adams, Grant, King, Snohomish,

Whatcom, and Yakima; Idaho counties include Gooding, Jerome, and Twin Falls; Clay, Clinton, and Utah Counties are in Nebraska, New York,

and Utah, respectively), and antibiotic resistance phenotype (see Materials and Methods). Resistance profile abbreviations: A, ampicillin; C,

chloramphenicol; K, kanamycin; Sxt, trimethoprim-sulfamethoxazole; S, streptomycin; T, tetracycline; Amc, amoxicillin-clavulanic acid; Su,

triple-sulfa; Caz, ceftazidime; SUSCEPT, susceptible to all antimicrobials tested.

4078

on May 16, 2020 by guest

http://jcm.asm.org/

(8)

[image:8.585.107.490.71.694.2]

FIG. 5. Dendrogram for 63

S. enterica

serovar Newport isolates constructed using UPGMA and a distance matrix obtained using a single-step

stepwise mutation model for VNTR data (MLVA only). The scale bar represents a measure of genetic distance.

on May 16, 2020 by guest

http://jcm.asm.org/

(9)

[image:9.585.107.483.75.689.2]

FIG. 6. Dendrogram for 63

S. enterica

serovar Newport isolates constructed using UPGMA with Dice coefficients for PFGE data (PFGE only).

The scale bar represents a measure of genetic distance.

4080

BROSCHAT ET AL.

J. C

LIN

. M

ICROBIOL

.

on May 16, 2020 by guest

http://jcm.asm.org/

(10)

lated as one minus the average proportion of unique phage

types per cluster, the fusion algorithm had an improved but not

statistically significantly different discrimination index (0.34

⫾

0.13 [average

⫾

standard error of the mean]) compared with

those of PFGE (0.25

⫾

0.12) and MLVA (0.20

⫾

0.10). Thus,

for this relatively diverse set of isolates, we did not find a

compelling benefit to merging PFGE and MLVA data using

the fusion algorithm, although the fusion algorithm does

give better statistical performance than random allocation

of phage types to clusters (0.10, standard deviation

⫽

0.03,

n

⫽

1,000).

Comparison of genetically similar

S. enterica

serovar

New-port strains.

To compare genetically similar

S. enterica

serovar

Newport strains, generalized dendrograms were constructed

using the fusion algorithm, PFGE-only data, and MLVA-only

data with the same parameters discussed in the previous

sec-tion (Fig. 4, 5, and 6). As with the serovar Typhimurium

anal-ysis, we lacked a gold standard for assessing the performance

of our results relative to the true genetic relationships, but we

did have epidemiologically relevant information as inferred

from the location (farm) where the isolates were collected. We

assumed that isolates collected from the same farm were more

likely to be genetically similar relative to isolates collected at

different farms. Consequently, we examined all cases in the

dendrogram where serovar Newport isolates were

indistin-guishable and calculated a discrimination index as one minus

the average proportion of distinct farms within these

“mono-phyletic” groups. The number of monophyletic groups ranged

from 6 (fusion and PFGE) (Fig. 4 and 6) to 9 (MLVA) (Fig. 5),

with a significantly better discrimination index for the fusion

algorithm (0.76

⫾

0.05) than for MLVA (0.31

⫾

0.12) and

PFGE (0.39

⫾

0.09) (

P

⫽

0.004; analysis of variance). Based on

this assessment procedure, it is clear that combining the data

from PFGE and MLVA analyses provided a greater degree of

isolate discrimination as a function of the farm where the

isolates were collected. While the antimicrobial resistance

pro-files cannot be used quantitatively to validate the fusion results,

the clustering of the susceptible strains by the fusion algorithm

(Fig. 4) relative to the clustering by the MLVA (Fig. 5) and

PFGE (Fig. 6) algorithms provides further evidence that the

fusion algorithm provides more biologically consistent

classifi-cations.

Clearly, combining data from PFGE and MLVA can serve

to buffer the extremes in the level of genetic variation

mea-sured by these two methods alone, but the degree of benefit

may be a function of the genetic similarity among the

iso-lates under consideration. In cases in which isoiso-lates

repre-sent a relatively diverse and epidemiologically unrelated

group of bacteria, combining PFGE and MLVA data may

provide a minor benefit. In contrast, for closely related

isolates, as judged by their collection from identical farm

locations in this study, the combination of PFGE and

MLVA data are likely to provide a higher degree of

dis-crimination at the farm level. The advantage of increased

discrimination for closely related isolates should extend to

other cases of epidemiologically related strains, including

situations where there is a need to trace food-borne disease

outbreaks and infections in clinical settings.

ACKNOWLEDGMENTS

K. N. K. Baker provided invaluable technical assistance. Martin

Wiedmann, Cornell University, kindly provided northeastern

S

.

Typhi-murium isolates.

This project has been funded in part with federal funds from the

National Institute of Allergy and Infectious Diseases, National

Insti-tutes of Health, Department of Health and Human Services, under

contract no. NO1-AI-30055, and by the Agricultural Animal Health

Program, College of Veterinary Medicine, Washington State

Univer-sity, Pullman, WA. Scholarship support for D.M. was provided by the

Carl M. Hansen Foundation.

REFERENCES

1.Anderson, E. S., L. R. Ward, M. J. Saxe, and J. D. de Sa.1977.

Bacterio-phage-typing designations ofSalmonella typhimurium. J. Hyg. (Lond).78:

297–300.

2.Bauer, A. W., W. M. Kirby, J. C. Sherris, and M. Turck.1966. Antibiotic

susceptibility testing by a standardized single disk method. Am. J. Clin.

Pathol.45:493–496.

3.Benson, G.1999. Tandem repeats finder: a program to analyze DNA

se-quences. Nucleic Acids Res.27:573–580.

4.Call, D. R., J. G. Hallett, S. G. Mech, and M. Evans.1998. Considerations for measuring genetic variation and population structure with multilocus

finger-printing. Mol. Ecol.7:1337–1346.

5.Call, D. R., L. Orfe, M. A. Davis, S. Lafrentz, and M. S. Kang.2008. Impact of compounding error on strategies for subtyping pathogenic bacteria.

Food-borne Pathog. Dis.5:505–516.

6.Davis, M. A., K. N. K. Baker, D. R. Call, L. D. Warnick, Y. Soyer, M.

Wiedmann, Y. Gro¨hn, P. L. McDonough, D. D. Hancock, and T. E. Besser.

2009. Multilocus variable-number tandem-repeat method for typing

Salmo-nella entericaserovar Newport. J. Clin. Microbiol.47:1934–1938.

7.Davis, M. A., D. D. Hancock, T. E. Besser, and D. R. Call.2003. Evaluation

of pulsed-field gel electrophoresis as a tool for determining the degree of

genetic relatedness between strains of Escherichia coliO157:H7. J. Clin.

Microbiol.41:1843–1849.

8.Dietterich, T. G.2000. Ensemble methods in machine learning, p. 1–15.

InJ. Kittler and F. Roli (ed.), Proceedings of the First International

Workshop on Multiple Classifier Systems. Springer Verlag, London, United Kingdom.

9.Fakhr, M. K., L. K. Nolan, and C. M. Logue.2005. Multilocus sequence

typing lacks the discriminatory ability of pulsed-field gel electrophoresis for

typingSalmonella entericaserovar Typhimurium. J. Clin. Microbiol.43:2215–

2219.

10.Felsenstein, J.1989. PHYLIP: Phylogeny Inference Package (version 3.2).

Cladistics5:164–166.

11.Foley, S. L., S. Zhao, and R. D. Walker.2007. Comparison of molecular

typing methods for the differentiation ofSalmonellafoodborne pathogens.

Foodborne Pathog. Dis.4:253–276.

12.Gerner-Smidt, P., K. Hise, J. Kincaid, S. Hunter, S. Rolando, E.

Hyytia-Trees, E. M. Ribot, and B. Swaminathan.2006. PulseNet USA: a five-year

update. Foodborne Pathog. Dis.3:9–19.

13.Gulati, P., R. K. Varshney, and J. S. Virdi.2009. Multilocus variable

number tandem repeat analysis as a tool to discern genetic relationships

among strains of Yersinia enterocoliticabiovar 1A. J. Appl. Microbiol.

107:875–884.

14.Harbottle, H., D. G. White, P. F. McDermott, R. D. Walker, and S. Zhao.

2006. Comparison of multilocus sequence typing, pulsed-field gel elec-trophoresis, and antimicrobial susceptibility typing for characterization of

Salmonella enterica serotype Newport isolates. J. Clin. Microbiol.44:

2449–2457.

15.Hopkins, K. L., C. Maguire, E. Best, E. Liebana, and E. J. Threlfall.2007.

Stability of multiple-locus variable-number tandem repeats inSalmonella

entericaserovar Typhimurium. J. Clin. Microbiol.45:3058–3061.

16.Lindstedt, B. A., E. Heir, E. Gjernes, and G. Kapperud.2003. DNA

finger-printing ofSalmonella entericasubsp.entericaserovar Typhimurium with

emphasis on phage type DT104 based on variable number of tandem repeat

loci. J. Clin. Microbiol.41:1469–1479.

17.Lindstedt, B. A., M. Torpdahl, E. M. Nielsen, T. Vardund, L. Aas, and G.

Kapperud.2007 Harmonization of the multiple-locus variable-number

tan-dem repeat analysis method between Denmark and Norway for typing

Sal-monellaTyphimurium isolates and closer examination of the VNTR loci.

J. Appl. Microbiol102:728–735.

18.Lindstedt, B. A., T. Vardund, L. Aas, and G. Kapperud.2004. Multiple-locus

variable-number tandem-repeats analysis ofSalmonella entericasubsp.

en-tericaserovar Typhimurium using PCR multiplexing and multicolor capillary

electrophoresis. J. Microbiol. Methods59:163–172.

19.NCCLS.2003. Methods for dilution antimicrobial susceptibility tests for

bacteria that grow aerobically; approved standard M7-A6. National Com-mittee for Clinical Laboratory Standards, Wayne, PA.

20.NCCLS.2003. Performance standards for antimicrobial susceptibility testing,

on May 16, 2020 by guest

http://jcm.asm.org/

(11)

14th informational supplement, 13th ed. Approved standard M100-S13. Na-tional Committee for Clinical Laboratory Standards, Wayne, PA.

21.Ribot, E. M., M. A. Fair, R. Gautom, D. N. Cameron, S. B. Hunter, B.

Swaminathan, and T. J. Barrett.2006. Standardization of pulsed-field gel

electrophoresis protocols for the subtyping of Escherichia coliO157:H7,

Salmonella, andShigellafor PulseNet. Foodborne Pathog. Dis.3:59–67.

22.Torpdahl, M., G. Sorensen, B. A. Lindstedt, and E. M. Nielsen.2007.

Tan-dem repeat analysis for surveillance of human Salmonella Typhimurium

infections. Emerg. Infect. Dis.13:388–395.

23.van Belkum, A., S. Scherer, L. van Alphen, and H. Verbrugh.1998.

Short-sequence DNA repeats in prokaryotic genomes. Microbiol. Mol. Biol. Rev.

62:275–293.

24.Vogler, A. J., C. Keys, Y. Nemoto, R. E. Colman, Z. Jay, and P. Keim.2006.

Effect of repeat copy number on variable-number tandem repeat mutations inEscherichia coliO157:H7. J. Bacteriol.188:4253–4263.

25.Vogler, A. J., C. E. Keys, C. Allender, I. Bailey, J. Girard, T. Pearson, K. L.

Smith, D. M. Wagner, and P. Keim.2007. Mutations, mutation rates, and

evolution at the hypervariable VNTR loci ofYersinia pestis. Mutant. Res.

616:145–158.

26.Walsh, B.2001. Estimating the time to the most recent common ancestor for

the Y chromosome or mitochondrial DNA for a pair of individuals. Genetics

158:897–912.

27.Zheng, J., C. E. Keys, S. Zhao, J. Meng, and E. W. Brown.2007. Enhanced

subtyping scheme forSalmonella enteritidis. Emerg. Infect. Dis.13:1932–1935.

4082

BROSCHAT ET AL.

J. C

LIN

. M

ICROBIOL