(1)

Medicine and Big Data

A happy marriage?

David Madigan

Department of Statistics

Columbia University

& OMOP

(2)
(3)
(4)
(5)

How does the combination of Clinical Judgment and Evidence-Based

(6)

Should John have an angiogram?

John went to see a

(7)

48 years old

LDL = 70

university professor

no diabetes

HDL = 59

triglycerides = 106

calcium score in 2003 = 19

calcium score in 2008 = 42

father died of heart disease (47)

mother died of cancer (83)

stress test normal in 2007

Lipitor

CRP normal

exercise

red wine

aspirin

arrhythmia in 2008

non-smoker

BMI = 21.6

normal heart ultrasound (2008)

Should John have an angiogram?

Clinical judgment?

Who are we kidding?

genotyping

EKG unusual in 2009

(8)

Data-Driven Medicine

Multiple years of medical records for 200+ million people

Largest collection of medical records in the world

32,430 patients just like John

(9)

Many Challenges

Statistical/Epidemiological

Computational

(10)

OMOP Research Experiment

OMOP Methods Library: inception cohort, case control, logistic regression

Common Data Model

[Figure: drug-outcome test matrix]

Drugs: ACE inhibitors; Amphotericin B; Antibiotics (erythromycins, sulfonamides, tetracyclines); Antiepileptics (carbamazepine, phenytoin); Benzodiazepines; Beta blockers; Bisphosphonates (alendronate); Tricyclic antidepressants; Typical antipsychotics; Warfarin

Outcomes: Angioedema; Aplastic Anemia; Acute Liver Injury; Bleeding; Hip Fracture; Hospitalization; Myocardial Infarction; Mortality after MI; Renal Failure; GI Ulcer Hospitalization

Legend (totals): 'True positive' benefit: 2; 'True positive' risk: 9; 'Negative control': 44

• 10 data sources

• Claims and EHRs

• 200M+ lives

• 14 methods

• Epidemiology designs

• Statistical approaches adapted for longitudinal data

• Open-source

• Standards-based

(11)

Comparing methods by sensitivity and specificity at alpha = 0.05

[Figure: sensitivity (y-axis) against false positive rate, 1 - specificity (x-axis), by method]

The desired method would have perfect prediction, with sensitivity = 1 and false positive rate = 0

No single method is 'best'; instead, methods reflect trade-offs between false positives and false negatives

All methods yield a false positive rate > 15% at the conventional level of significance

Performance is sensitive to threshold criteria, which can be based on both magnitude of effect (RR) and statistical significance (alpha)
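The comparison above can be sketched in a few lines. This is a minimal illustration, not the OMOP implementation: it assumes a method "flags" a drug-outcome pair when the lower bound of the two-sided 95% confidence interval on RR exceeds 1 (one way to apply alpha = 0.05), and the `toy_results` values are hypothetical, chosen only to exercise the code.

```python
import math

def is_flagged(rr, se_log_rr, z=1.96):
    """Assumed decision rule: lower 95% confidence bound on RR is above 1."""
    lower = math.exp(math.log(rr) - z * se_log_rr)
    return lower > 1.0

def sensitivity_and_fpr(results):
    """results: iterable of (rr, se_log_rr, is_positive_control)."""
    tp = fp = fn = tn = 0
    for rr, se_log_rr, is_positive_control in results:
        flagged = is_flagged(rr, se_log_rr)
        if is_positive_control and flagged:
            tp += 1
        elif is_positive_control:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical estimates: (RR, SE of log RR, True if positive control).
toy_results = [
    (1.80, 0.15, True),   # flagged, true positive
    (1.10, 0.20, True),   # not flagged, false negative
    (2.00, 0.06, True),   # flagged, true positive
    (1.30, 0.05, False),  # flagged, false positive
    (0.95, 0.10, False),  # not flagged, true negative
    (1.05, 0.10, False),  # not flagged, true negative
]

sensitivity, fpr = sensitivity_and_fpr(toy_results)
print(f"sensitivity={sensitivity:.2f}, false positive rate={fpr:.2f}")
```

Tightening the rule (for example, also requiring RR above some magnitude) lowers the false positive rate at the cost of sensitivity, which is the trade-off the slide describes.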

(12)

(13)

(14)

Distribution of estimates across all drug-outcome pairs

[Figure legend: True -, False +, False -, True +]

Estimates are generally not consistent across methods…

ACE inhibitor-Angioedema is the only 1 of 9 positive controls to produce a positive, statistically significant association across all methods

Warfarin-Angioedema is the only 1 of 44 negative controls that consistently showed an insignificant positive association across all methods

Tricyclic Antidepressants and Aplastic Anemia

(15)

Range of estimates across high-dimensional propensity score inception cohort (HDPS) parameter settings

[Figure: x-axis: relative risk; legend: True -, False +, False -, True +]

• Each row represents a drug-outcome pair.

• The horizontal span reflects the range of point estimates observed across the parameter settings.

• Ex. Benzodiazepine-Aplastic anemia: estimates range from RR = 0.76 to RR = 2.70 across HDPS parameter settings

Parameter settings explored in OMOP:

Washout period (1): 180d

Surveillance window (3): 30 days from exposure start; exposure + 30d; all time from exposure start

Covariate eligibility window (3): 30 days prior to exposure; 180 days prior to exposure; all time pre-exposure

# of confounders (2): 100, 500 covariates used to estimate propensity score

Propensity strata (2): 5, 20 strata

Analysis strategy (3): Mantel-Haenszel stratification (MH), propensity score adjusted (PS), propensity strata adjusted (PS2)

Comparator cohort (2): drugs with same indication, not in same class; most prevalent drug with same indication, not in same class
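To give a sense of how many analyses these settings imply, the sketch below enumerates every combination. The dictionary keys and value labels are hypothetical names chosen for illustration, and treating the settings as a full factorial grid is an assumption; the slide lists the options but does not say every combination was run.

```python
from itertools import product

# Options transcribed from the slide; key names are illustrative only.
settings = {
    "washout_period": ["180d"],
    "surveillance_window": [
        "30d from exposure start",
        "exposure + 30d",
        "all time from exposure start",
    ],
    "covariate_eligibility_window": [
        "30d pre-exposure",
        "180d pre-exposure",
        "all time pre-exposure",
    ],
    "n_confounders": [100, 500],
    "propensity_strata": [5, 20],
    "analysis_strategy": ["MH", "PS", "PS2"],
    "comparator_cohort": [
        "same indication, not same class",
        "most prevalent drug, same indication, not same class",
    ],
}

# Full factorial grid (an assumption): one dict per parameter combination.
grid = [dict(zip(settings, combo)) for combo in product(*settings.values())]

print(len(grid))  # 1 * 3 * 3 * 2 * 2 * 3 * 2 combinations
print(grid[0])
```

Under that assumption, each drug-outcome pair yields up to 216 point estimates from one method on one database, which is why the horizontal spans in the figure can be wide.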

(16)

Effect estimates of HDPS against CCAE (RR, SE)

Outcome columns, left to right: Angioedema #1; Aplastic Anemia #1; Acute Liver Failure #1; Bleeding #1; Acute Myocardial Infarction #1; Hip Fracture #1; Mortality after Myocardial Infarction #1; Acute Renal Failure #1; Upper GI Ulcer Hospitalization #1

Estimates per drug, left to right, with empty cells omitted:

OMOP ACE Inhibitor: 1.80 (0.15), 0.40 (0.05), 0.91 (0.12), 0.87 (0.03)

OMOP Amphotericin B: 3.30 (0.99), 1.05 (0.24), 4.01 (0.99)

OMOP Antibiotics: 1.22 (0.08), 1.00 (0.01), 1.14 (0.01), 1.06 (0.03), 1.05 (0.09), 1.44 (0.06)

OMOP Antiepileptics: 1.74 (0.38), 4.60 (0.80), 1.63 (0.21), 0.54 (0.05)

OMOP Benzodiazepines: 0.13 (0.01), 1.10 (0.06), 0.98 (0.01), 1.11 (0.01), 1.18 (0.03), 1.41 (0.12), 1.06 (0.05)

OMOP Beta blockers: 0.81 (0.07), 0.63 (0.06), 0.95 (0.02), 1.69 (0.19), 0.78 (0.04), 0.88 (0.03)

OMOP Bisphosphonates: 0.27 (0.05), 0.85 (0.03), 0.82 (0.07), 0.40 (0.04), 0.90 (0.06)

OMOP Tricyclic antidepressants: 0.63 (0.07), 1.02 (0.02), 0.96 (0.01), 0.80 (0.04), 0.82 (0.06)

OMOP Typical antipsychotics: 0.96 (0.08), 1.97 (0.16), 3.46 (0.21)

OMOP Warfarin: 0.53 (0.11), 0.47 (0.04), 2.13 (0.04), 1.2 (0.09), 0.49 (0.07), 0.76 (0.05)

“Data”: Effect estimates from one method against one database across an array of drug-outcome pairs

(17)

Revising prior expectations in light of new evidence from a risk identification system

[Figure legend] Prior: p = 0.9; p = 0.5; p = 0.1

If you observe a RR = 2.0 (1.78 – 2.25), then your posterior probability depends on your prior expectations

With moderate variance (SE of log RR = 0.06), observing RR < 2.0 is only modestly informative
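The numbers on this slide can be reproduced with a short sketch. The confidence interval follows directly from RR = 2.0 and SE of log RR = 0.06. The posterior calculation is only an illustration of Bayes' rule: the slide does not give the model behind its curves, so the normal distributions assumed here for estimates under "no true effect" and "true effect" (`null_mean`, `null_sd`, `effect_mean`, `effect_sd`) are hypothetical placeholders.

```python
import math

def rr_confidence_interval(rr, se_log_rr, z=1.96):
    """Two-sided 95% interval for RR, computed on the log scale."""
    log_rr = math.log(rr)
    return math.exp(log_rr - z * se_log_rr), math.exp(log_rr + z * se_log_rr)

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_true_effect(prior, rr, se_log_rr,
                          null_mean=0.0, null_sd=0.5,
                          effect_mean=math.log(2.0), effect_sd=0.5):
    """Bayes' rule with ASSUMED normal distributions of log RR estimates.

    The spread parameters stand in for how much a system's estimates vary
    for pairs with and without a true effect; they are not from the slides.
    """
    x = math.log(rr)
    # Total spread combines the assumed system variability and sampling error.
    like_null = normal_pdf(x, null_mean, math.hypot(null_sd, se_log_rr))
    like_effect = normal_pdf(x, effect_mean, math.hypot(effect_sd, se_log_rr))
    return prior * like_effect / (prior * like_effect + (1 - prior) * like_null)

low, high = rr_confidence_interval(2.0, 0.06)
print(f"95% CI: {low:.2f} to {high:.2f}")

posteriors = {p: posterior_true_effect(p, 2.0, 0.06) for p in (0.9, 0.5, 0.1)}
for prior, post in posteriors.items():
    print(f"prior={prior:.1f} -> posterior={post:.2f}")
```

Under these assumed distributions the same observation moves a sceptical prior much less far than an optimistic one, which is the qualitative point of the slide; different assumed spreads would change the posterior values.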

(18)

Conclusion

• Reliance on clinical judgment is scary

• Massive observational data can help

• Nontrivial challenges remain
