GenomeTrakr: A Pathogen Database
Marc W. Allard, PhD
Senior Biomedical Research Services Officer
Division of Microbiology
FAO Expert workshop to develop case studies on the use of Whole Genome Sequencing (WGS) on food safety management. Nov. 12, 2015
Eric W. Brown, PhD
Director
Outline
• Technology shift
• GenomeTrakr: Reference database and
pathogen detection pipeline
• Benefits to industry, growers, and distributors.
Current method for pathogen
identification
1. Antigens are screened to identify the serovar.
2. PFGE: the genome is cut into pieces. The sizes of these pieces and the banding patterns they produce determine discrimination within a serovar.
PulseNet
• WGS is high resolution
3-5 million data points are collected for each isolate
• WGS analyses are statistically robust
Unlike PFGE patterns, WGS data can be analyzed in its
evolutionary context. Accurate and stable genetic changes within
pathogen genomes enable us to pinpoint specific common
sources of outbreak strains (farms, processing plants, food types,
and geographic regions).
Source Tracking is Key Application
[Figure: PFGE-identical isolates shown in red]
NGS distinguishes geographical structure among closely related Salmonella Bareilly strains
• Same PFGE but not part of the outbreak
• Outbreak isolates: 2-5 SNPs
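The idea that outbreak isolates sit within a few SNPs of each other, while isolates with the same PFGE pattern can be much further apart, can be illustrated with a minimal sketch. This is not the CFSAN SNP Pipeline; it assumes the sequences are already aligned to the same length, and the isolate names, toy sequences, and the 5-SNP cutoff used below are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count aligned positions where two sequences differ, skipping gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignored = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignored and b not in ignored
    )


def pairwise_snp_distances(alignment):
    """Return {(name_x, name_y): distance} for every pair of isolates."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


def isolates_within(distances, reference, max_snps):
    """List isolates whose distance to `reference` is at most `max_snps`."""
    close = set()
    for (x, y), d in distances.items():
        if d <= max_snps:
            if x == reference:
                close.add(y)
            elif y == reference:
                close.add(x)
    return sorted(close)


# Hypothetical toy alignment (20 bases); real isolates have millions of positions.
alignment = {
    "outbreak_1": "ACGTACGTACGTACGTACGT",
    "outbreak_2": "ACGTACGTACGTACGAACGA",
    "same_pfge_unrelated": "TCGAACCTACGGACGTTCGA",
}

distances = pairwise_snp_distances(alignment)
# Hypothetical cutoff of 5 SNPs, taken from the upper end of the range on the slide.
related = isolates_within(distances, "outbreak_1", max_snps=5)
```

In this toy data only `outbreak_2` falls within the cutoff of `outbreak_1`. A fixed cutoff is a simplification: in practice the SNP range would be read together with the phylogeny and the epidemiological evidence.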
DNA Sequencing
• Bases of DNA (ATGC) are sequentially identified from a DNA template strand
• Next Generation Sequencing (NGS) extends this process across millions of reactions in a massively parallel fashion
• NGS involves rapid sequencing of large DNA stretches spanning entire genomes
– Technology shift
– 3-5 million data points for each isolate
• Increasing availability and affordability of NGS is rapidly changing the face of microbiology
[Figure: cost per bacterial genome by year, 2007 to 2013; vertical axis from $0 to $3,500]
Annotations: desktop FDA 1st; $70/genome in 2014; $40/genome in 2015 with higher-throughput technology
GenomeTrakr Fast Facts
• First distributed network of labs to utilize WGS for pathogen identification
• The GenomeTrakr network has sequenced more than 40,000 isolates and closed more than 100 genomes through November 12, 2015.
• Currently sequencing more than 1,000 isolates a month
• The need for an increased number of well-characterized environmental (food, water, facility, etc.) sequences may outweigh the need for extensive clinical samples
GenomeTrakr Labs
• 14 federal labs
• 14 state and university labs
• 1 U.S. hospital lab
• 5 labs outside of the U.S.
• Collaborations with independent academic researchers
Total Number of Sequences in the GenomeTrakr Database
[Figure: number of sequences (as of the last day of the quarter), 2013 to 2015]
First sequences uploaded in Feb 2013
Average number of sequences added per month in 2013 = 184
Average number of sequences added per month in 2014 = 1,049
Public Health England uploads more than 8,000
Timeline for Foodborne Illness Investigation
Using Whole Genome Sequencing
Contaminated food enters commerce
FDA, CDC, FSIS, and States use WGS in real-time and in parallel on clinical, food,
and environmental samples
Source of contamination
identified early through WGS combined database queries
Averted illnesses
[Figure axes: number of cases vs. days]
MINIMAL PATHOGEN METADATA
(FOODBORNE OUTBREAKS)
• sample_name
• organism
• strain/isolate
• Category (attribute_package)
  1a) Clinical/Host-associated
    1a1) specific_host
    1a2) isolation_source
    1a3) host-disease
  OR
  1b) Environmental/Food/Other
    1b1) isolation_source
• collection_date
• Geographic location
  6a) geo_loc_name
  OR
  6b) lat_lon
• collected by
Food industry can hold confidential metadata linked to public records.
Where
When
Who
What
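A submitting lab could check a record against the minimal metadata list before upload. The sketch below is one possible check, not an NCBI or FDA tool: the dictionary keys follow the slide's field names with underscores added (an assumption), the package values "clinical" and "environmental" are hypothetical labels, and the example records are invented.

```python
# Fields every record needs, following the slide's minimal metadata list.
ALWAYS_REQUIRED = (
    "sample_name",
    "organism",
    "attribute_package",
    "isolation_source",
    "collection_date",
    "collected_by",
)
# Extra fields for the clinical/host-associated category only.
CLINICAL_ONLY = ("specific_host", "host_disease")


def _present(record, key):
    """True when the field exists and is not blank."""
    value = record.get(key)
    return value is not None and str(value).strip() != ""


def missing_metadata(record):
    """Return the names of minimal-metadata fields the record lacks."""
    missing = [k for k in ALWAYS_REQUIRED if not _present(record, k)]
    # The slide lists "strain/isolate": either one is accepted here.
    if not (_present(record, "strain") or _present(record, "isolate")):
        missing.append("strain or isolate")
    # Geographic location is geo_loc_name OR lat_lon.
    if not (_present(record, "geo_loc_name") or _present(record, "lat_lon")):
        missing.append("geo_loc_name or lat_lon")
    if record.get("attribute_package") == "clinical":
        missing.extend(k for k in CLINICAL_ONLY if not _present(record, k))
    return missing


# Invented example records, for illustration only.
food_record = {
    "sample_name": "example-001",
    "organism": "Salmonella enterica",
    "strain": "example-strain-1",
    "attribute_package": "environmental",
    "isolation_source": "food",
    "collection_date": "2015-11-12",
    "geo_loc_name": "USA",
    "collected_by": "Example Lab",
}
clinical_record = {
    "sample_name": "example-002",
    "organism": "Salmonella enterica",
    "strain": "example-strain-2",
    "attribute_package": "clinical",
    "isolation_source": "stool",
    "collection_date": "2015-11-12",
}

food_gaps = missing_metadata(food_record)
clinical_gaps = missing_metadata(clinical_record)
```

Here `food_gaps` is empty, while `clinical_gaps` names the collector, the geographic location, and the two host fields. Confidential details that industry holds back would simply stay out of the public record and be linked privately.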
Immediate impacts of WGS on industry, growers, distributors, countries, and states.
• Earlier intervention means:
1) Reduced amount of recalled product;
2) fewer sick patients which means fewer lawsuits;
3) less impact overall and minimal damage to
Impacts to industry, growers, and distributors (continued).
• Regular testing throughout network:
1) identifies specific suppliers that are introducing contaminants;
2) identifies whether contaminant is resident to a facility or transient;
3) knowledge of where contaminant is coming from allows industry to fix the problem based on scientific evidence.
– Shift costs to the supplier who has introduced the contaminant.
– How often is the root cause of the problem left unresolved to occur again at a later date?
Background: CFSAN SNP Pipeline
Documentation: http://snp-pipeline.rtfd.org
Source Code: https://github.com/CFSAN-Biostatistics/snp-pipeline
PyPI Distribution: https://pypi.python.org/pypi/snp-pipeline
Pettengill JB, Luo Y, Davis S, Chen Y, Gonzalez-Escalona N, Ottesen A, Rand H, Allard MW, Strain E. (2014) An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella. PeerJ 2:e620. http://dx.doi.org/10.7717/peerj.620
Molecular Epidemiology and Ecology of
Multi-drug Resistance (MDR) Salmonella
in Tanzania
• Julius Medardus
• Sokoine University of Agriculture
• Wondwossen A. Gebreyes
Salmonella lose or gain resistance depending on the ecosystem.
Environment: SSu (Gebreyes and Altier, 2002; Gebreyes et al., 2004, 2009)
GIT: ACSSuT (Briggs and Fratamico, 1999)
What triggers the recombination?
Interaction between bacterial factors and
chemical intervention in pig production
Important element in MDR
Common in the environment
QAC: quaternary ammonium compounds
qacE on integrons; quats are commonly used as disinfectants.
Co-selection: Heavy metal v. MDR
• Heavy metals in the ecosystem: Cu and Zn;
• Association between AMR type and heavy metal (HM) MIC;
• Co-selection with MDR;
• Association with invasive NTS strains?;
• Efflux pump genes: pcoA and czcD;
Association: Heavy Metal tolerance and MDR Salmonella
Odds ratio between copper tolerance (<20 mM) and MDR AmStTeKm was 4.6 (Chi-square = 17.9; P<0.05).
The odds of having a high Zn MIC (>8 mM) were 14.66 times higher in isolates with R-type AmClStSuTe than in those with R-type AmStTeKm (P<0.05).
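For readers unfamiliar with these statistics, the sketch below shows how an odds ratio and a Pearson chi-square are computed from a 2x2 table. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data, and the slide does not say whether a continuity correction was used (none is applied here).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    Raises ZeroDivisionError if b or c is zero.
    """
    return (a * d) / (b * c)


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts, NOT from the study:
# rows = metal-tolerant vs. not tolerant, columns = MDR vs. not MDR.
tolerant_mdr, tolerant_not_mdr = 20, 10
susceptible_mdr, susceptible_not_mdr = 10, 20

example_or = odds_ratio(
    tolerant_mdr, tolerant_not_mdr, susceptible_mdr, susceptible_not_mdr
)
example_chi2 = pearson_chi_square(
    tolerant_mdr, tolerant_not_mdr, susceptible_mdr, susceptible_not_mdr
)
# With 1 degree of freedom, chi-square above 3.841 corresponds to P < 0.05.
significant = example_chi2 > 3.841
```

With these made-up counts the odds ratio is 4.0 and the chi-square is about 6.67, so the association would be significant at P<0.05.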
FDA GenomeTrakr partnership
924 isolates submitted to FDA-CFSAN
• Brazil (4)
• Ethiopia (401)
• Kenya (86)
• Mexico (63)
• Tanzania (64)
• Thailand (60)
• U.S. – OSU (247)
Tanzania
• WGS: 45 food animal isolates completed
• All unknown STs
• Plasmid types: ColRNAI, IncI1, IncI2, IncFII, ColpV2 (total 10); others?
• Kentucky (16/45) and not conforming with any known type (n=8)
• Pending: HM and biocide tolerance genes/efflux system…
COMPARE is a large EU project with the intention to speed up the detection of and response to disease outbreaks among humans and animals worldwide through the use of new genome technology.
The above figure represents genomic information as the pathogen-independent language across locations, sectors and time.
http://www.compare-europe.eu/
Coordinator
Frank M. Aarestrup
Technical University of Denmark, National Food Institute
[email protected] Tel: +45 35 88 62 81
Co-Coordinator
Marion Koopmans
Erasmus Medical Centre, Department of Viroscience
[email protected] Tel: +31 10 70 44 066
Whole Genome Sequencing Program (WGS)
http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/default.htm#trakr
GenomeTrakr
• State and Federal laboratory network
collecting and sharing genomic data from foodborne pathogens
• Distributed sequencing-based network
• Partner with NIH
• Open-access genomic reference database
• http://www.ncbi.nlm.nih.gov/bioproject/183844
• Can be used to find the contamination
For more information:
• For information about joining the GenomeTrakr network as a sequencing lab, providing isolates to a current member lab for sequencing, or using the GenomeTrakr database as a research tool, please contact FDA at
FDA
Steven Musser Patrick McDermott Ruth Timme Marc Allard Peter Evans Eric Brown Justin Payne Charlie Wang Rebecca Bell Christine Keys Errol Strain Yan Luo
James Pettengill Hugh Rand Darcy Hanes Gopal Gopinathrao Chris Grim Palmer Orlandi David Melka Cary Pirone Davies Maria Hoffman Eric Stevens Andrea Ottesen Tim McGrath Don Burr Jie Zheng Cong Li George Kastanis Tim Muravunda Shaohua Zhao
National Institutes of Health (NCBI)
David Lipman Jim Ostell William Klimke Martin Shumway Richa Agarwala
State Health Labs
Bill Wolfgang (NY) Dave Boxrud (MN) Anita Wright (FL) Elizabeth Driebe (AZ) Angela Fritzinger (VA)
Ailyn Perez-Osorio (WA) More to come…….
USDA
David Goldman Kristin Holt
Illumina
Susan Knowles Omayma Al-Awar Kelly Hoon
With Additional Thanks….
ORA OCC OFS OC OAO OFVM/SRSC CFSAN CDER
CBER CDRH CVM NCTR FDA CHIEF SCIENTIST OIP OARSA
SCIENCE BOARD IAS FFC FERN JIFSAN ADVISORY COMMITTEE IFSH
MOFFETT CENTER CIO DAUPHIN ISLAND CFSAN-OCD CORE WESTERN CENTER
INTERNAL FDA STAKEHOLDERS
FDLI GMA VaFSTF CDC FBI PULSENET-LATIN AM. AM. ACAD MICROBIOL ASM FSIS ARS UNIV VERMONT MINN DOH AZ DOH UNIV FL VA DOH WA DOH TX DOH NY AG LAB IRISH FSA NOVA SE UNIV IGS BALTIMORE INFORM MEETING HONGKONG POLYT U NIST ITALIAN FSA EFSA
WHO-FOOD SAFETY DIR. WHO-GFN
CDC-EU
EMERGING INFECTIOUS DIS CONF DANISH TECH UNIV
NM STATE UNIV/ NM DOH CARLOS MALBRAN INST/ARG ST COULD UNIV/FOOD MICRO SENASICA GMI NY DOH/WADSWORTH CENT UNIV HAMBURG CHINA CDC NESTLE FERA-UK MD DOH IAFP APHL AFDO BELGIUM VaTech US ARMY US NAVY
MELBOURNE FSA (AUS) UNIV NEBRASKA
PUBLIC HEALTH ENGLAND DHS
DELMARVA TASKFORCE PENN STATE FOOD SCIENCE PROD MAN ASSOC
ILLUMINA
UNIV IRELAND/DUBLIN COLLEGE
NCBI/NIH
GSRS GLOBAL SUMMIT FAO/OIE
PUBLIC HEALTH CANADA CFIA
HEALTH CANADA INTL VTEC MEETING CPS-GA AOAC UNITED FRESH COLUMBIA HAWAII DOH CA DOH ALASKA DOH SOUTH DAK UNIV UNIV GA UNIV IOWA/DOH UNIV CHILE BRAZIL OSU VETNET TURKEY MEXICO IEH SILLAKER
NEW ENG BIOLAB PACIFIC BIO CLC-BIO/QIAGEN CON-AGRA DUPONT AGILENT UC-DAVIS HARVARD MED INFORM MEETING THAILAND
FDA\CFSAN Validation Efforts
1. Technical Performance
Accuracy: Salmonella LT2 and Agona SL483
2. Intralaboratory variation, sequencing platform
Salmonella Montevideo (180+ runs)
3. Interlaboratory variation
Salmonella Braenderup BAA-664 (PFGE control), ISO/CEN
4. Bioinformatics Pipeline
Software Validation
Salmonella Braenderup
Interlaboratory Study
Salmonella Braenderup
Environmental Samples from Florida
Contract Lab → FDA\CFSAN