Big Data
Medical Imaging
Brett Cowan
Centre for Advanced MRI
University of Auckland
Overview
1. Medical imaging and ‘Big Data’
Medical imaging is producing huge quantities of complex and high quality data – what do we do with it all?
2. ‘Big Data’ in action – the Cardiac Atlas Project
International collaboration, data ownership, data sharing and infrastructure
3. Data Analysis
New statistical analyses of shape, disease classifiers, and new and improved diagnostic accuracy
4. A new approach to clinical trials?
Can we get more for less out of international clinical trials?
Magnetic Resonance Imaging (MRI)
• in just 20 years, MRI has revolutionised medical imaging
• without radiation, without any known harmful effects and without even touching the patient, MRI produces diagnostic images of virtually photographic quality
• the Auckland District Health Board PACS system has 15 TB on-line (one year of imaging), 27 TB near-line (3 years of imaging) and 40 TB off-line
• ten years ago, one year of imaging was <1 TB, now it is 15 TB
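The growth quoted above implies a compound annual rate that is easy to estimate. The following is a minimal sketch, assuming steady exponential growth and taking the "<1 TB" figure as exactly 1 TB, so the result is a lower bound rather than a measured rate. The function names are illustrative only.

```python
import math


def annual_growth_factor(start_tb: float, end_tb: float, years: float) -> float:
    """Constant yearly multiplier that turns start_tb into end_tb over `years`."""
    return (end_tb / start_tb) ** (1.0 / years)


def doubling_time_years(growth_factor: float) -> float:
    """Years needed for the volume to double at a constant growth factor."""
    return math.log(2.0) / math.log(growth_factor)


# Assumption: 1 TB per year ten years ago (the slide says "<1 TB"), 15 TB now.
factor = annual_growth_factor(1.0, 15.0, 10.0)
doubling = doubling_time_years(factor)
print(f"annual growth: at least {100 * (factor - 1):.0f}% per year")
print(f"doubling time: at most {doubling:.1f} years")
```

Because the starting volume was below 1 TB, the true growth rate is somewhat higher and the true doubling time somewhat shorter than the printed values.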
Faster Image Acquisition
1995 FLASH (25 seconds)
2005 SSFP (14 seconds)
2010 Accelerated (5 seconds)
2013 Realtime (1 second)
Neurological MRI
T2 weighted
T1 weighted
Tractography and fMRI
Tractography fMRI
Data or Information or Knowledge?
• in medical imaging, the data are usually grey scale pixel values
• they are not demographics (52 years old), blood pressure (155/95) or a genetic sequence (ACAT); they are just numbers representing grey scale (or colour) in an image
• these data are not information in the sense of many other datasets
• we must align images into a common reference frame, segment features of interest, measure distances, thickness and volume, and define shapes or regions of interest
• this process is time consuming relative to scan acquisition time (the acquisition to analysis ratio)
Image Processing
Edge detection Non-rigid registration Feature tracking
• a wide range of image processing techniques is used, such as machine learning and finite element modeling, together with human interaction
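As an illustration of the first technique named above, edge detection can be sketched with a Sobel filter, which estimates the grey scale gradient at each pixel. This is a generic textbook operator, not necessarily the method used in the work presented here, and the synthetic image is an assumption made for the example.

```python
def sobel_edges(image):
    """Gradient magnitude for a 2D list of grey scale values.

    Border pixels are left at 0.0 because the 3x3 kernel does not fit there.
    """
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient
    rows, cols = len(image), len(image[0])
    edges = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += gx_kernel[dr][dc] * pixel
                    gy += gy_kernel[dr][dc] * pixel
            edges[r][c] = (gx * gx + gy * gy) ** 0.5
    return edges


# Synthetic 6x6 image: dark left half (0), bright right half (100).
image = [[0, 0, 0, 100, 100, 100] for _ in range(6)]
edges = sobel_edges(image)
for row in edges:
    print(" ".join(f"{value:5.0f}" for value in row))
```

The response is non-zero only in the two columns that straddle the intensity step, which is the boundary a segmentation step would then trace.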
The Cardiac Atlas Project (CAP)
Fonseca et al. Bioinformatics 27(16): 2288–2295; 2011
• collecting data is expensive – and it is reusable
• the Cardiac Atlas Project (CAP) is an international ‘Big Data’ project
• funded by the NIH, led from Auckland
• subcontracts were awarded to collaborators at Johns Hopkins, UCLA, and a Los Angeles supercomputing centre (Centre for Computational Biology)
• aim “ … to collate cardiac (image) data from large international clinical trials into a web accessible ‘big data’ database for reuse by any legitimate researcher …”
• other aims were “ … to create an infrastructure for managing approval for data use …. and to create advanced statistical analysis and display tools …”
• the first two contributing clinical trials were the MESA and DETERMINE trials
CAP Project Case Study
1. The overall rationale and strategy
2. “Ownership” of data
3. Project infrastructure
4. Data analysis and results
[Figure: Patient Diagnosis, Data Acquisition, Heart Modelling, Statistical Analysis and Software Development, spanning Radiology, Medicine, Bioengineering and Computer Science, with Big Data as the common element]
The “Big Data” Strategy
Ownership and Rights to ‘Big Data’
Data Ownership – Data Has Value
Who owns the rights to medical imaging trial data?
Who can use it and for what purpose?
• Participant – has rights, certainly they must provide informed consent, informed in that they fully understand the risks and benefits, and what the information will be used for
• ethics committees will not usually give permission for the data to be used ‘by anyone for anything’ in the future
• Researcher – has rights, often jealously guarded as a strategic advantage for publication and career progression
• Institution – has rights, but what if an individual leaves the University: do they have the right to take all of the data with them?
• Funder – has rights, especially when they are a commercial entity such as ‘Big Pharma’, or if there are valuable patents at stake
Ethical Approval (IRB)
Individual consent required
Application to IRB required
Investigator can make the decision
Not human subjects research
IRB requirements
HIPAA and Anonymisation of Metadata
• the convenience and power of electronic data is also its Achilles heel
• the DICOM standard allows for the inclusion of ‘private’ information,
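One part of metadata anonymisation can be sketched as follows. In DICOM, private data elements are stored under odd group numbers, so they can be recognised and dropped without knowing what the vendor put in them. The sketch below models a header as a plain dictionary keyed by (group, element) rather than using a DICOM toolkit, blanks only three identifying attributes, and uses a made-up private element; it is an illustration under those assumptions and not a complete HIPAA de-identification procedure.

```python
# Identifying standard attributes blanked in this sketch (a small illustrative subset).
IDENTIFYING_TAGS = {
    (0x0010, 0x0010),  # PatientName
    (0x0010, 0x0020),  # PatientID
    (0x0010, 0x0030),  # PatientBirthDate
}


def is_private(tag):
    """DICOM private data elements use odd group numbers."""
    group, _element = tag
    return group % 2 == 1


def anonymise(header):
    """Return a copy with private elements dropped and identifying values blanked."""
    cleaned = {}
    for tag, value in header.items():
        if is_private(tag):
            continue  # unknown vendor content: safest to remove it entirely
        cleaned[tag] = "" if tag in IDENTIFYING_TAGS else value
    return cleaned


# Hypothetical header; the values and the private element are invented for the example.
header = {
    (0x0008, 0x0060): "MR",                       # Modality
    (0x0010, 0x0010): "DOE^JANE",                 # PatientName
    (0x0010, 0x0020): "12345",                    # PatientID
    (0x0029, 0x1010): "vendor-specific payload",  # private element (odd group)
}
clean = anonymise(header)
print(clean)
```

Removing private elements wholesale is a conservative choice: it can discard acquisition parameters that an analysis needs, so a real pipeline would usually keep an explicit list of private elements known to be safe.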
Project Infrastructure
Database
Calculation of Volume and Mass
Complete mathematical representation of the left ventricle
Creation of a Mathematical Model
Using image processing (and operator input), the raw images are converted into a beating mathematical heart in the database. This allows any parameter to be determined without further analysis.
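For comparison, the conventional way to obtain volume and mass without a mathematical model is slice summation: contour areas on each short-axis slice are multiplied by the slice spacing and added up. The sketch below shows that simpler calculation, not the model-based method described above. The contour areas are invented, and the myocardial density of 1.05 g/mL is a commonly assumed value rather than one taken from the slides.

```python
MYOCARDIAL_DENSITY_G_PER_ML = 1.05  # commonly assumed value, not from the slides


def stack_volume_ml(areas_cm2, slice_spacing_cm):
    """Sum of slice area x slice spacing; 1 cm^3 equals 1 mL."""
    return sum(areas_cm2) * slice_spacing_cm


def lv_mass_g(epi_areas_cm2, endo_areas_cm2, slice_spacing_cm):
    """Myocardial volume (epicardial minus endocardial) times density."""
    epi_volume = stack_volume_ml(epi_areas_cm2, slice_spacing_cm)
    endo_volume = stack_volume_ml(endo_areas_cm2, slice_spacing_cm)
    return (epi_volume - endo_volume) * MYOCARDIAL_DENSITY_G_PER_ML


# Hypothetical contour areas (cm^2) from base to apex, 1 cm slice spacing.
endo_areas = [18.0, 20.0, 17.0, 12.0, 6.0]
epi_areas = [32.0, 35.0, 31.0, 24.0, 14.0]
spacing_cm = 1.0

cavity_ml = stack_volume_ml(endo_areas, spacing_cm)
mass_g = lv_mass_g(epi_areas, endo_areas, spacing_cm)
print(f"cavity volume: {cavity_ml:.1f} mL, LV mass: {mass_g:.2f} g")
```

Slice summation has to be repeated for every new measurement, whereas a stored model can be queried for any parameter after the fact.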
Reproducibility and Accuracy
Accuracy
12 animals (9 dogs, 3 pigs)
LVM determined by weight at autopsy
LVM difference 2.1 ± 4.3 g (~3%)
Data courtesy David Fieno and Paul Finn
Scan–rescan variability
25 patients with moderate to severe MR
Scanned twice at a six week interval
Coefficient of variation 3%
LVM difference -1.1 ± 5.7 g (~0.6%)
[Plots: Difference (g) against Postmortem (g); Difference (g) against Average LVM (g)]
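Figures of the form "difference = mean ± SD" and a coefficient of variation can be computed from paired measurements as sketched below. The paired values are invented for the example, and the coefficient of variation is defined here as the SD of the differences divided by the overall mean, which is one of several definitions in use and may not be the one behind the 3% quoted above.

```python
from statistics import mean, stdev


def agreement(first, second):
    """Return (bias, SD of differences, coefficient of variation in %)."""
    diffs = [a - b for a, b in zip(first, second)]
    averages = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)                   # systematic difference between the two scans
    sd = stdev(diffs)                    # spread of the differences (sample SD)
    cov = 100.0 * sd / mean(averages)    # assumed definition, see lead-in
    return bias, sd, cov


# Hypothetical LV mass (g) for four subjects, scanned twice.
scan_1 = [150.0, 162.0, 171.0, 145.0]
scan_2 = [152.0, 160.0, 170.0, 148.0]

bias, sd, cov = agreement(scan_1, scan_2)
print(f"LVM difference {bias:.1f} ± {sd:.1f} g, coefficient of variation {cov:.1f}%")
```

Plotting each difference against the pair average gives the kind of difference plot whose axes appear on this slide.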
Segmentation Challenge
(a) Basal slice
(b) Mid-ventricular slice
(c) Apical slice
Suinesiaputra et al. Medical Image Analysis In press 2013
Modal Analysis
Lewandowski et al. Circulation 2013;127:197-206
[Figure panels: Anterior, Lateral, Septal; orientation markers S, L, P, A]
Medrano-Gracia et al. JCMR 15:80 ; 2013
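Modal analysis of shape can be illustrated with principal component analysis: each heart is reduced to a vector of shape coordinates, the population mean is subtracted, and the dominant directions of variation (the modes) are extracted. The sketch below assumes PCA is the relevant technique, finds only the first mode, and uses power iteration in place of a library eigen-decomposition so that it runs with the standard library alone. The shape vectors are invented.

```python
import math


def first_shape_mode(shapes, iterations=100):
    """Return (mean_shape, first_mode, scores) for rows of shape coordinates."""
    n, d = len(shapes), len(shapes[0])
    mean_shape = [sum(s[j] for s in shapes) / n for j in range(d)]
    centred = [[s[j] - mean_shape[j] for j in range(d)] for s in shapes]

    # Power iteration on the covariance matrix, applied implicitly as X^T (X v).
    mode = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        projections = [sum(x[j] * mode[j] for j in range(d)) for x in centred]
        updated = [sum(projections[i] * centred[i][j] for i in range(n))
                   for j in range(d)]
        norm = math.sqrt(sum(c * c for c in updated))
        if norm == 0.0:
            break  # no variation along the current direction
        mode = [c / norm for c in updated]

    # Score of each case: its position along the first mode.
    scores = [sum(x[j] * mode[j] for j in range(d)) for x in centred]
    return mean_shape, mode, scores


# Hypothetical shape vectors: the first two coordinates vary together, the rest are fixed.
shapes = [[10.0 + t, 10.0 + t, 5.0, 5.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean_shape, mode, scores = first_shape_mode(shapes)
print("mean shape:", mean_shape)
print("first mode:", [round(c, 4) for c in mode])
print("scores:", [round(s, 4) for s in scores])
```

Each case is then summarised by a handful of mode scores instead of thousands of coordinates, which is what makes statistical comparison between groups tractable.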
Identification of Myocardial Infarction
Modeling of Stiffness, Stress and Strain
Vicky Wang, Martyn Nash, STACOM
Normal
Non-Ischaemic HF
Classification of Disease
Medrano-Gracia, PhD thesis, 2013
Does the patient have the disease?
Data Analysis
How bad is the disease?
Medrano-Gracia, PhD thesis, 2013
A Second Example – the Coronary Arteries
Clinical problem Database
Image data
Statistics
Catalonia
Cardiovascular Risk in Catalonia
If we wanted to determine the cardiovascular risk profile in Catalonia, how would we do this?
1. Recruit 5,000 normal Catalonians (preferably in 1948) and follow them for 50 years (similar to the definitive Framingham study)
2. Recruit 5,000 normal Catalonians and follow them for five years (an ‘abbreviated’ Framingham study)
3. Use the Framingham results and add local ‘correction factors’ from small studies where there are obvious discrepancies
4. Read the Framingham publications and speculate on how the data applies locally
[Scale: from specific and high cost (option 1) to generic and low cost (option 4)]
The VERIFICA Study
The Framingham function adapted to local population characteristics accurately and reliably predicted the 5-year CHD risk for patients aged 35–74 years, in contrast with the original function, which consistently overestimated the actual risk.
“ … about 60% was observed in the United Kingdom; however, this is far from the >260% overestimation observed in Spain …”
www.regicor.org
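The idea of adapting a risk function to a local population can be sketched with the usual Cox-type form, in which risk is 1 - S0 ** exp(sum(beta * (x - mean))). Recalibration keeps the coefficients but swaps in the local baseline survival S0 and local mean risk factor levels. Every number below is hypothetical and chosen only to show the mechanics; none is taken from Framingham, VERIFICA or REGICOR, and the real functions use more risk factors than the two shown.

```python
import math


def chd_risk(betas, values, means, baseline_survival):
    """Cox-type risk: 1 - S0 ** exp(sum(beta * (x - mean)))."""
    linear = sum(b * (x - m) for b, x, m in zip(betas, values, means))
    return 1.0 - baseline_survival ** math.exp(linear)


# All numbers are hypothetical and for illustration only.
betas = [0.05, 0.02]      # per year of age, per mmHg of systolic pressure
patient = [60.0, 150.0]   # age (years), systolic pressure (mmHg)

# Original function: source-population means and baseline survival.
original = chd_risk(betas, patient, means=[50.0, 130.0], baseline_survival=0.95)

# Recalibrated function: same coefficients, local means and local baseline survival.
local = chd_risk(betas, patient, means=[52.0, 132.0], baseline_survival=0.98)

print(f"original function: {100 * original:.1f}% risk")
print(f"recalibrated function: {100 * local:.1f}% risk")
print(f"ratio original / recalibrated: {original / local:.2f}")
```

When the local event rate is lower than in the source population, the unadjusted function overestimates risk for the same patient, which is the pattern the quotation above describes.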
The Catalonian Risk Table
[Figure: cost against benefit, ranging from a trial performed in New Zealand (NZ specific data, not fundable) to a trial performed overseas (generic data, fundable)]
Can we do better in this range?
Cost-Benefit Ratio of Clinical Trials
The MESA Trial
The Multi-Ethnic Study of Atherosclerosis (MESA) is a study of the characteristics of subclinical cardiovascular disease and the risk factors that predict progression to clinical disease. 6,814 asymptomatic men and women aged 45-84 have been recruited. (38% are white, 28% African-American, 22% Hispanic, and 12% Asian).
MESA Investigations
• extensive physical exam to determine coronary calcification
• ventricular mass and function by MRI
• flow-mediated endothelial vasodilation
• carotid intimal-medial wall thickness and presence of echogenic lucencies in the carotid artery
• lower extremity vascular insufficiency
• arterial wave forms
• electrocardiographic (ECG) measures
• standard coronary risk factors
• socio-demographic factors
• lifestyle factors, and psychosocial factors
• blood samples are being assayed for putative biochemical risk factors and stored for case-control studies
• DNA is being extracted and lymphocytes immortalized for study of candidate genes and possibly, genome-wide scanning
• participants are being followed for identification and characterization of cardiovascular disease events, including acute myocardial infarction and other forms of coronary heart disease (CHD), stroke, and congestive heart failure; for cardiovascular disease interventions; and for mortality
The Jackson Heart Study
The objective of the Jackson Heart Study is to investigate the causes of cardiovascular disease (CVD) in African Americans (n=5301) with an emphasis on manifestations related to hypertension (such as remodeling of the left ventricle of the heart, coronary artery disease, heart failure, stroke and renal vascular disease).
The MESA and Jackson Heart Studies are using software developed in Auckland to analyse all of their cardiac MRI image data. The data and results are fully compatible with all of the work already done here.
New Zealand “Fingerprinting”
• could we perform the baseline MESA investigations on a group of 100 participants in New Zealand?
• this group could be defined geographically, by age, gender, a specific risk factor, ….
• this would provide a mean and standard deviation for each investigation
• there would also be a ‘profile’ for each individual participant
• together these data would represent a ‘fingerprint’ for this group in New Zealand
• it is highly likely that some participants (and groups of participants) in MESA will have a similar group (and individual) fingerprint
• could we then follow them in the MESA trial?
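The fingerprinting and matching idea above can be sketched as follows: the local sample gives a mean and standard deviation for each investigation, each trial participant is expressed as z-scores against that fingerprint, and participants whose profile lies close to it are selected. The two investigations, the values, the participant IDs and the distance threshold are all invented for the example, and the root mean square z-score is only one possible similarity measure.

```python
from statistics import mean, stdev


def fingerprint(sample):
    """Per-investigation (mean, SD) over rows of measurements."""
    columns = list(zip(*sample))
    return [(mean(column), stdev(column)) for column in columns]


def distance(profile, group_fingerprint):
    """Root mean square z-score of one participant against the group fingerprint."""
    squared = [((x - m) / s) ** 2 for x, (m, s) in zip(profile, group_fingerprint)]
    return (sum(squared) / len(squared)) ** 0.5


def match(candidates, group_fingerprint, max_distance=1.0):
    """IDs of candidates whose profile lies within max_distance of the fingerprint."""
    return [pid for pid, profile in candidates.items()
            if distance(profile, group_fingerprint) <= max_distance]


# Hypothetical local sample: (LV mass in g, systolic pressure in mmHg).
nz_sample = [(140.0, 120.0), (150.0, 130.0), (160.0, 140.0)]
nz_fingerprint = fingerprint(nz_sample)

# Hypothetical baseline profiles from the international trial.
trial_baseline = {
    "A": (152.0, 128.0),
    "B": (190.0, 165.0),
    "C": (145.0, 135.0),
}
matched = match(trial_baseline, nz_fingerprint)
print("fingerprint:", nz_fingerprint)
print("matched participants:", matched)
```

This version matches trial participants to the group mean only. Matching at the individual level, as the slide also suggests, would compare each local profile with each trial profile and would need a rule for resolving ties and repeated matches.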
“Matching”
Fingerprinting (local, New Zealand): New Zealand sample (n=100) → “fingerprint”
Matching (global): statistical matching → NZ “cohort” identified in baseline data (n=1000) → NZ “cohort” followed in trial → 5, 10, 15 and 20 year outcomes
Outcomes for New Zealand:
• data which reflects NZ subgroups
• direct application of results
• amplification of sample size by 10 X
• trial cost met internationally
• prospective study design
• international collaboration and engagement
• development of “fingerprinting” and “matching” technologies
Future of Clinical Trials
“Overall”
Big Data
• it is large and expanding medical imaging databases, preferably shared internationally
• there are issues of data ownership, appropriate ethical consents, data anonymisation and access control
• computationally intensive image processing is required to convert data into useful information
• computational and statistical anatomy and pathology (rather than feature labeling, or calculation of simple distance or volumes)
• image data may be represented using mathematical models to recreate physiological and pathophysiological shape, features and motion
• statistical analyses of shape, disease classifiers, calculation of new parameters (like stress) become possible
• clinical trial focus – devices and pharmaceuticals
What is ‘big data’ for us in Medical Imaging?
Thank you to
Alistair Young Avan Michael Jae Do Randall
Lana Agustín Ben John
Yingmin Paul Finn
Carissa
Wenchao Pau