Medicine and Big Data

(1)

Medicine and Big Data

A happy marriage?

David Madigan

Department of Statistics

Columbia University

& OMOP

(2)

(3)

(4)

(5)

How does the combination of Clinical Judgment and Evidence-Based

(6)

Should John have an angiogram?

John went to see a

(7)

48 years old

LDL = 70

university professor

no diabetes

HDL = 59

triglycerides = 106

calcium score in 2003 = 19

calcium score in 2008 = 42

father died of heart disease (47)

mother died of cancer (83)

stress test normal in 2007

lipitor

CRP normal

exercise

red wine

aspirin

arrhythmia in 2008

non-smoker

BMI = 21.6

normal heart ultrasound (2008)

Should John have an angiogram?

Clinical judgment?

Who are we kidding?

genotyping

EKG unusual in 2009

(8)

Data-Driven Medicine

•  Multiple years of medical records for 200+ million people

•  Largest collection of medical records in the world

•  32,430 patients just like John

(9)

Many Challenges

•  Statistical/Epidemiological

•  Computational

(10)

OMOP Research Experiment

OMOP Methods Library: Inception cohort, Case control, Logistic regression

Common Data Model

Drugs:

•  ACE Inhibitors
•  Amphotericin B
•  Antibiotics: erythromycins, sulfonamides, tetracyclines
•  Antiepileptics: carbamazepine, phenytoin
•  Benzodiazepines
•  Beta blockers
•  Bisphosphonates: alendronate
•  Tricyclic antidepressants
•  Typical antipsychotics
•  Warfarin

Outcomes:

•  Angioedema
•  Aplastic Anemia
•  Acute Liver Injury
•  Bleeding
•  Hip Fracture
•  Hospitalization
•  Myocardial Infarction
•  Mortality after MI
•  Renal Failure
•  GI Ulcer Hospitalization

Legend (totals): True positive benefit: 2; True positive risk: 9; Negative control: 44

•  10 data sources

•  Claims and EHRs

•  200M+ lives

•  14 methods

•  Epidemiology designs

•  Statistical approaches adapted for longitudinal data

•  Open-source

•  Standards-based

(11)

Comparing methods by sensitivity and specificity at alpha=0.05

[Figure axes: False positive rate (1-Specificity); Sensitivity]

Desired method would have perfect prediction with Sensitivity = 1 and False positive rate = 0

No single method is 'best', but instead methods reflect trade-offs between false positives and false negatives

All methods yield false positive rate > 15% at conventional level of significance

Performance is sensitive to threshold criteria, which can be based both on magnitude of effect (RR) and statistical significance (alpha)
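The threshold dependence can be sketched in a few lines. This is only an illustration, not the OMOP evaluation code: it assumes each method reports a relative risk with a standard error on the log scale, flags a drug-outcome pair when the lower bound of the two-sided 95% interval exceeds a chosen RR threshold, and scores the flags against known positive and negative controls. The function names and the `toy_results` values are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided alpha = 0.05


def is_flagged(rr, se_log_rr, rr_threshold=1.0):
    """Flag a pair when the lower 95% bound of RR exceeds rr_threshold."""
    lower = math.exp(math.log(rr) - Z_95 * se_log_rr)
    return lower > rr_threshold


def sensitivity_and_fpr(results, rr_threshold=1.0):
    """results: iterable of (rr, se_log_rr, is_positive_control) tuples."""
    tp = fn = fp = tn = 0
    for rr, se_log_rr, is_positive_control in results:
        flagged = is_flagged(rr, se_log_rr, rr_threshold)
        if is_positive_control:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical estimates, for illustration only (not OMOP results).
toy_results = [
    (2.0, 0.06, True),   # positive control, clearly elevated
    (1.1, 0.20, True),   # positive control, too noisy to flag
    (1.5, 0.10, False),  # negative control, spuriously elevated
    (0.9, 0.10, False),  # negative control, not elevated
]

print(sensitivity_and_fpr(toy_results))                    # threshold RR > 1
print(sensitivity_and_fpr(toy_results, rr_threshold=1.5))  # stricter threshold
```

Raising the RR threshold in this toy set removes the false positive without changing sensitivity; on real estimates the two rates generally trade off against each other.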

(12)


(13)


(14)

Distribution of estimates across all drug-outcome pairs

[Figure legend: True -, False +, False -, True +]

Estimates are generally not consistent across methods…

ACE inhibitor-Angioedema is the only one of 9 positive controls to produce a positive, statistically significant association across all methods

Warfarin-Angioedema is the only one of 44 negative controls that consistently showed an insignificant positive association across all methods

Tricyclic Antidepressants and Aplastic Anemia

(15)

Range of estimates across high-dimensional propensity score inception cohort (HDPS) parameter settings

[Figure axis: Relative risk; legend: True -, False +, False -, True +]

•  Each row represents a drug-outcome pair.

•  The horizontal span reflects the range of point estimates observed across the parameter settings.

•  Ex. Benzodiazepine-Aplastic anemia: HDPS estimates vary across parameter settings from RR = 0.76 to 2.70

Parameter settings explored in OMOP:

•  Washout period (1): 180d

•  Surveillance window (3): 30 days from exposure start; exposure + 30d; all time from exposure start

•  Covariate eligibility window (3): 30 days prior to exposure, 180, all-time pre-exposure

•  # of confounders (2): 100, 500 covariates used to estimate propensity score

•  Propensity strata (2): 5, 20 strata

•  Analysis strategy (3): Mantel-Haenszel stratification (MH), propensity score adjusted (PS), propensity strata adjusted (PS2)

•  Comparator cohort (2): drugs with same indication, not in same class; most prevalent drug with same indication, not in same class
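To get a sense of how many analyses this implies, the settings can be enumerated as a grid. The sketch below assumes the seven settings are fully crossed, which the slide does not state explicitly; the dictionary keys are hypothetical labels, and the bare "180" is kept as written on the slide.

```python
from itertools import product

# Option lists transcribed from the slide; key names are illustrative.
hdps_settings = {
    "washout_period": ["180d"],
    "surveillance_window": [
        "30 days from exposure start",
        "exposure + 30d",
        "all time from exposure start",
    ],
    "covariate_eligibility_window": [
        "30 days prior to exposure",
        "180",  # unit not given on the slide; presumably days
        "all-time pre-exposure",
    ],
    "n_confounders": [100, 500],
    "propensity_strata": [5, 20],
    "analysis_strategy": ["MH", "PS", "PS2"],
    "comparator_cohort": [
        "drugs with same indication, not in same class",
        "most prevalent drug with same indication, not in same class",
    ],
}

# Full cross of all options (an assumption, see the text above).
combinations = [
    dict(zip(hdps_settings, values))
    for values in product(*hdps_settings.values())
]

print(len(combinations))  # 1 * 3 * 3 * 2 * 2 * 3 * 2 = 216
```

Under that assumption, each drug-outcome pair gets up to 216 HDPS estimates, which is the spread that the horizontal spans summarize.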

(16)

Effect estimates of HDPS against CCAE (RR, SE)

Outcomes (columns): Angioedema #1; Aplastic Anemia #1; Acute Liver Failure #1; Bleeding #1; Acute Myocardial Infarction #1; Hip Fracture #1; Mortality after Myocardial Infarction #1; Acute Renal Failure #1; Upper GI Ulcer Hospitalization #1

OMOP ACE Inhibitor 1.80 (0.15) 0.40 (0.05) 0.91 (0.12) 0.87 (0.03)

OMOP Amphotericin B 3.30 (0.99) 1.05 (0.24) 4.01 (0.99)

OMOP Antibiotics 1.22 (0.08) 1.00 (0.01) 1.14 (0.01) 1.06 (0.03) 1.05 (0.09) 1.44 (0.06)

OMOP Antiepileptics 1.74 (0.38) 4.60 (0.80) 1.63 (0.21) 0.54 (0.05)

OMOP Benzodiazepines 0.13 (0.01) 1.10 (0.06) 0.98 (0.01) 1.11 (0.01) 1.18 (0.03) 1.41 (0.12) 1.06 (0.05)

OMOP Beta blockers 0.81 (0.07) 0.63 (0.06) 0.95 (0.02) 1.69 (0.19) 0.78 (0.04) 0.88 (0.03)

OMOP Bisphosphonates 0.27 (0.05) 0.85 (0.03) 0.82 (0.07) 0.40 (0.04) 0.90 (0.06)

OMOP Tricyclic antidepressants 0.63 (0.07) 1.02 (0.02) 0.96 (0.01) 0.80 (0.04) 0.82 (0.06)

OMOP Typical antipsychotics 0.96 (0.08) 1.97 (0.16) 3.46 (0.21)

OMOP Warfarin 0.53 (0.11) 0.47 (0.04) 2.13 (0.04) 1.2 (0.09) 0.49 (0.07) 0.76 (0.05)

“Data”:

Effect estimates from one method against one database across an array of drug-outcome pairs

(17)

Revising prior expectations in light of new evidence from a risk identification system

[Figure legend: Prior: p=0.9, p=0.5, p=0.1]

If you observe a RR = 2.0 (1.78 – 2.25), then your posterior probability depends on your prior expectations

With moderate variance (SElogRR = 0.06), observing RR<2.0 is only modestly informative
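A small sketch can make both numbers concrete. The first function checks that the quoted interval follows from RR = 2.0 and SElogRR = 0.06 under a normal approximation on the log scale. The second applies Bayes' rule in odds form to the three priors. The likelihood ratio of 3.0 is a hypothetical placeholder, not a value from the OMOP experiment; in practice it would have to be estimated from how the method behaves on positive and negative controls.

```python
import math


def confidence_interval(rr, se_log_rr, z=1.959964):
    """Two-sided 95% interval for RR, normal approximation on the log scale."""
    log_rr = math.log(rr)
    return math.exp(log_rr - z * se_log_rr), math.exp(log_rr + z * se_log_rr)


def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


lower, upper = confidence_interval(2.0, 0.06)
print(round(lower, 2), round(upper, 2))  # matches the interval on the slide

HYPOTHETICAL_LR = 3.0  # assumed for illustration only
for prior in (0.9, 0.5, 0.1):
    print(prior, round(posterior_probability(prior, HYPOTHETICAL_LR), 3))
```

With the same evidence, a prior of 0.1 moves to 0.25 while a prior of 0.9 moves to about 0.96, so a modest likelihood ratio leaves the conclusion largely driven by the prior.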

(18)

Conclusion

•  Reliance on clinical judgment is scary

•  Massive observational data can help

•  Nontrivial challenges remain
