Medicine and Big Data
A happy marriage?
David Madigan
Department of Statistics
Columbia University
& OMOP
How does the combination of Clinical Judgment and Evidence-Based
Should John have an angiogram?
John went to see a
48 years old
LDL = 70
university professor
no diabetes
HDL = 59
triglycerides = 106
calcium score in 2003 = 19
calcium score in 2008 = 42
father died of heart disease (47)
mother died of cancer (83)
stress test normal in 2007
lipitor
CRP normal
exercise
red wine
aspirin
arrhythmia in 2008
non-smoker
BMI = 21.6
normal heart ultrasound (2008)
Should John have an angiogram?
Clinical judgment?
Who are we kidding?
genotyping
EKG unusual in 2009
Data-Driven Medicine
• Multiple years of medical records for 200+ million people
• Largest collection of medical records in the world
• 32,430 patients just like John
Many Challenges
• Statistical/Epidemiological
• Computational
OMOP Research Experiment
OMOP Methods Library: Inception cohort, Case control, Logistic regression
Common Data Model
[Figure: matrix of drug-outcome test cases]
Drugs: ACE Inhibitors; Amphotericin B; Antibiotics (erythromycins, sulfonamides, tetracyclines); Antiepileptics (carbamazepine, phenytoin); Benzodiazepines; Beta blockers; Bisphosphonates (alendronate); Tricyclic antidepressants; Typical antipsychotics; Warfarin
Outcomes: Angioedema; Aplastic Anemia; Acute Liver Injury; Bleeding; Hip Fracture; Hospitalization; Myocardial Infarction; Mortality after MI; Renal Failure; GI Ulcer Hospitalization
Legend (totals): True positive benefit: 2; True positive risk: 9; Negative control: 44
• 10 data sources
• Claims and EHRs
• 200M+ lives
• 14 methods
• Epidemiology designs
• Statistical approaches adapted for longitudinal data
• Open-source
• Standards-based
Comparing methods by sensitivity and specificity at alpha = 0.05
[Figure axes: x = False positive rate (1 - Specificity); y = Sensitivity]
An ideal method would predict perfectly, with Sensitivity = 1 and False positive rate = 0
No single method is ‘best’; instead, methods reflect trade-offs between false positives and false negatives
All methods yield a false positive rate > 15% at the conventional level of significance
Performance is sensitive to the threshold criteria, which can be based both on magnitude of effect (RR) and on statistical significance (alpha)
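The comparison above can be sketched in a few lines. This is a minimal illustration, not the OMOP evaluation code: the signal rule (lower 95% bound of RR above 1), the function names, and the toy estimates are all assumptions made for the example, and the standard error is assumed to be on the log-RR scale.

```python
import math

def is_signal(rr, se_log_rr, z_crit=1.96):
    """Assumed rule: flag a pair when the lower 95% bound of RR exceeds 1."""
    lower = math.exp(math.log(rr) - z_crit * se_log_rr)
    return lower > 1.0

def sensitivity_and_fpr(estimates):
    """estimates: list of (rr, se_log_rr, is_positive_control) tuples."""
    tp = fp = fn = tn = 0
    for rr, se_log_rr, is_positive_control in estimates:
        flagged = is_signal(rr, se_log_rr)
        if is_positive_control and flagged:
            tp += 1
        elif is_positive_control:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return sensitivity, false_positive_rate

# Hypothetical estimates, invented for illustration only.
toy_estimates = [
    (2.00, 0.06, True),   # positive control, flagged
    (1.10, 0.20, True),   # positive control, missed
    (1.50, 0.10, False),  # negative control, falsely flagged
    (0.90, 0.10, False),  # negative control, not flagged
    (1.05, 0.30, False),  # negative control, not flagged
    (1.00, 0.05, False),  # negative control, not flagged
]

sensitivity, false_positive_rate = sensitivity_and_fpr(toy_estimates)
print(sensitivity, false_positive_rate)
```

Changing `z_crit`, or adding a minimum RR to the rule, moves a method along the trade-off between false positives and false negatives.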
Distribution of estimates across all drug-outcome pairs
[Figure legend: True -, False +, False -, True +]
Estimates are generally not consistent across methods…
ACE inhibitor-Angioedema is only 1 of 9 positive controls to produce a positive, statistically significant association across all methods
Warfarin-Angioedema is only 1 of 44 negative controls that consistently showed an insignificant positive association across all methods
Tricyclic Antidepressants and Aplastic Anemia
Range of estimates across high-dimensional propensity score inception cohort (HDPS) parameter settings
[Figure axis: Relative risk]
• Each row represents a drug-outcome pair.
• The horizontal span reflects the range of point estimates observed across the parameter settings.
• Ex. Benzodiazepine-Aplastic anemia: estimates vary across HDPS parameter settings from RR = 0.76 to RR = 2.70
[Figure legend: True -, False +, False -, True +]
Parameter settings explored in OMOP:
• Washout period (1): 180 days
• Surveillance window (3): 30 days from exposure start; exposure + 30 days; all time from exposure start
• Covariate eligibility window (3): 30 days prior to exposure; 180 days prior to exposure; all time pre-exposure
• Number of confounders (2): 100 or 500 covariates used to estimate the propensity score
• Propensity strata (2): 5 or 20 strata
• Analysis strategy (3): Mantel-Haenszel stratification (MH); propensity score adjusted (PS); propensity strata adjusted (PS2)
• Comparator cohort (2): drugs with the same indication, not in the same class; most prevalent drug with the same indication, not in the same class
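To see how quickly these choices multiply, the settings listed above can be enumerated as a grid. This is only a sketch: the dictionary keys and string labels are invented for illustration, and the slides do not state that every combination was actually run.

```python
from itertools import product

# Option counts follow the slide; key names and labels are hypothetical.
settings = {
    "washout_days": [180],
    "surveillance_window": [
        "30d from exposure start",
        "exposure + 30d",
        "all time from exposure start",
    ],
    "covariate_window": [
        "30d pre-exposure",
        "180d pre-exposure",
        "all time pre-exposure",
    ],
    "n_confounders": [100, 500],
    "n_strata": [5, 20],
    "analysis": ["MH", "PS", "PS2"],
    "comparator": [
        "same indication, different class",
        "most prevalent drug, same indication, different class",
    ],
}

# One dict per combination of settings (full cross product).
grid = [dict(zip(settings, combo)) for combo in product(*settings.values())]

print(len(grid))  # 1 * 3 * 3 * 2 * 2 * 3 * 2 = 216
```

Each of these configurations can yield a different point estimate for the same drug-outcome pair, which is the source of the horizontal spans in the range plot.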
Effect estimates of HDPS against CCAE (RR, SE)
Outcome columns: Angioedema #1; Aplastic Anemia #1; Acute Liver Failure #1; Bleeding #1; Acute Myocardial Infarction #1; Hip Fracture #1; Mortality after Myocardial Infarction #1; Acute Renal Failure #1; Upper GI Ulcer Hospitalization #1
OMOP ACE Inhibitor 1.80 (0.15) 0.40 (0.05) 0.91 (0.12) 0.87 (0.03)
OMOP Amphotericin B 3.30 (0.99) 1.05 (0.24) 4.01 (0.99)
OMOP Antibiotics 1.22 (0.08) 1.00 (0.01) 1.14 (0.01) 1.06 (0.03) 1.05 (0.09) 1.44 (0.06)
OMOP Antiepileptics 1.74 (0.38) 4.60 (0.80) 1.63 (0.21) 0.54 (0.05)
OMOP Benzodiazepines 0.13 (0.01) 1.10 (0.06) 0.98 (0.01) 1.11 (0.01) 1.18 (0.03) 1.41 (0.12) 1.06 (0.05)
OMOP Beta blockers 0.81 (0.07) 0.63 (0.06) 0.95 (0.02) 1.69 (0.19) 0.78 (0.04) 0.88 (0.03)
OMOP Bisphosphonates 0.27 (0.05) 0.85 (0.03) 0.82 (0.07) 0.40 (0.04) 0.90 (0.06)
OMOP Tricyclic antidepressants 0.63 (0.07) 1.02 (0.02) 0.96 (0.01) 0.80 (0.04) 0.82 (0.06)
OMOP Typical antipsychotics 0.96 (0.08) 1.97 (0.16) 3.46 (0.21)
OMOP Warfarin 0.53 (0.11) 0.47 (0.04) 2.13 (0.04) 1.2 (0.09) 0.49 (0.07) 0.76 (0.05)
“Data”:
Effect estimates from one method against one database across an array of drug-outcome pairs
Revising prior expectations in light of new evidence from a risk identification system
Prior: p = 0.9, p = 0.5, p = 0.1
If you observe RR = 2.0 (1.78 – 2.25), then your posterior probability depends on your prior expectations
With moderate variance (SE of log RR = 0.06), observing RR<2.0 is only modestly informative
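The interval quoted on the slide follows from the point estimate and the standard error on the log scale, and the dependence on the prior can be illustrated with Bayes' rule in odds form. The sketch below is an assumption-laden illustration: the function names are invented, and the likelihood ratio of 3 is a made-up value, not one taken from the OMOP results.

```python
import math

def rr_confidence_interval(rr, se_log_rr, z=1.96):
    """95% interval for RR, computed on the log scale and exponentiated."""
    log_rr = math.log(rr)
    return math.exp(log_rr - z * se_log_rr), math.exp(log_rr + z * se_log_rr)

def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Reproduces the slide's interval for RR = 2.0 with SE of log RR = 0.06.
lower, upper = rr_confidence_interval(2.0, 0.06)
print(round(lower, 2), round(upper, 2))

# Hypothetical likelihood ratio: how much more likely the observed estimate is
# under a true effect than under no effect. The value 3.0 is invented.
for prior in (0.9, 0.5, 0.1):
    print(prior, round(posterior_probability(prior, 3.0), 3))
```

With the same evidence, a skeptic starting at 0.1 and a believer starting at 0.9 end up with very different posterior probabilities, which is the point of the three prior curves.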
Conclusion
• Reliance on clinical judgment is scary
• Massive observational data can help
• Nontrivial challenges remain