Biomedical Big Data for Clinical Research and
Patient Care: Role of Semantic Computing
Satya S. Sahoo
Division of Medical Informatics, Case Western Reserve University
Signal Big Data: Research and Patient Care
• Velocity: Rapid rate of signal data collection in epilepsy centers
o 24-hour recordings: 8-10 GB per patient
o Typical patient admissions span 5 days
o 100-150 patients per year
• Volume: 11 TB in 3 years and 18 TB by end of May 2014
• Variety: Data collected using different study protocols and equipment
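The velocity figures above imply a rough yearly data volume. A minimal back-of-the-envelope sketch, assuming every admission day yields a full 24-hour recording and 1 TB = 1000 GB (the function name is illustrative, not from the slides):

```python
def yearly_volume_gb(gb_per_day, days_per_admission, patients_per_year):
    """Raw signal volume generated in one year, in GB."""
    return gb_per_day * days_per_admission * patients_per_year

# Lower and upper bounds taken from the ranges on the slide.
low = yearly_volume_gb(8, 5, 100)     # 8 GB/day, 5 days, 100 patients
high = yearly_volume_gb(10, 5, 150)   # 10 GB/day, 5 days, 150 patients

print(f"{low / 1000:.1f} to {high / 1000:.1f} TB per year")
```

Under these assumptions the estimate is 4.0 to 7.5 TB per year; the reported 11 TB in 3 years (about 3.7 TB per year) sits close to the lower bound.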
[Figure: Growth in Electrophysiological Signal Data, by month from Jan '11 to May '13. (a) Size of data (MB): cumulative patient data in EMU vs. cumulative PRISM-specific patient data. (b) Cumulative number of patients: patients admitted to EMU vs. patients enrolled in PRISM.]
Background: Electrophysiological Signal Data
• Electrophysiological signal data
o Electroencephalogram (EEG): intracranial or scalp electrodes
o Electrocardiogram (ECG)
o Polysomnogram (PSG)
• Signal data plays a critical role in clinical research and patient care
o Pre-surgical evaluations to identify eloquent cortex
o Identify seizure onset zone in epilepsy
o Correlation between seizure events and other physiological
Background: Neurosciences Research
• Multi-center project to study Sudden Unexpected Death in Epilepsy (SUDEP)
• Low rate of reported incidents
o Multi-center collaboration for viable cohort size
• Expect to enroll 1200 patients from Epilepsy Monitoring Units
o Case Western-University Hospital, Cleveland
o University of California, Los Angeles
o Northwestern University, Chicago
o National Hospital for Neurology and Neurosurgery, London,
Computational Challenges: Signal “Big Data”
• Scalable storage for large volume of data
o Data partitioning and storage on distributed file systems
• High performance data processing pipeline to cope with rapid rate of data generation
o High level of parallelization for both speedup (to cope with velocity) and scale out (to cope with volume)
• Efficient query execution and data retrieval
o Optimal data partitioning for parallelizing data retrieval
o Optimal co-location for minimizing remote data transfer
• Interactive signal visualization
o Minimize network transfer latency
Cloudwave Architecture
• Cloudwave aims to support:
o Web-based interface for signal analysis and visualization
o Multi-center collaborative studies
o Efficient signal processing and analysis
• Three components:
o MapReduce data processing pipeline
o Data modeling and optimal data partitioning
o Ontology-driven query and
Data Processing: Results of Comparative Evaluation
• Performance evaluated over two variables
o Extracting data for increasing number of channels
o Extracting data for increasing number of patient studies
• An order of magnitude improvement in time performance
[Figure: A MapReduce program splits each EDF file (records rec 1 ... rec n, each holding channels Ch1 ... Ch k) into channel-specific files on a distributed file system. Charts: average EDF processing time (execution time in minutes) for an increasing number of signals (10 to 40) and for an increasing number of studies (5 to 25), Standalone (desktop computer) vs. Cloudwave (MapReduce).]
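The EDF-to-channel-file split can be illustrated with a small in-memory map/reduce sketch. This is only an illustration under stated assumptions: records are modeled as plain dictionaries rather than parsed EDF data, there is no Hadoop involved, and the function names are hypothetical rather than Cloudwave's own.

```python
from collections import defaultdict

def map_record(rec_index, record):
    """Map step: emit one (channel, (record index, samples)) pair per channel."""
    for channel, samples in record.items():
        yield channel, (rec_index, samples)

def reduce_channel(channel, values):
    """Reduce step: order the pieces by record index and concatenate them."""
    ordered = sorted(values, key=lambda pair: pair[0])
    return channel, [s for _, samples in ordered for s in samples]

def split_by_channel(records):
    """Run map, group by key (stand-in for the shuffle phase), then reduce."""
    grouped = defaultdict(list)
    for rec_index, record in enumerate(records):
        for channel, value in map_record(rec_index, record):
            grouped[channel].append(value)
    return dict(reduce_channel(ch, vals) for ch, vals in grouped.items())

# Two toy records, each interleaving samples from two channels.
records = [
    {"Ch1": [1, 2], "Ch2": [10, 20]},
    {"Ch1": [3, 4], "Ch2": [30, 40]},
]
channel_files = split_by_channel(records)
print(channel_files)
```

Because each record is mapped independently and each channel is reduced independently, both steps can be spread across workers, which is the property the comparison above relies on.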
Signal Modeling: Cloudwave Signal Format
[Figure: Cloudwave Signal Format (CSF) and the Epilepsy and Seizure Ontology. CSF components: Metadata: Study Patient Details; Metadata: Signal Collection Protocol; Segmented Signal Data.]
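The three CSF components (study and patient metadata, collection protocol metadata, segmented signal data) could be laid out roughly as follows. This is a hedged sketch only: the slides do not give the CSF schema, so the JSON encoding and every field name here are assumptions made for illustration.

```python
import json

def make_csf_segment(study, protocol, channels, start_sec, duration_sec):
    """Bundle metadata and one signal segment into a single document.

    All keys are hypothetical; the actual CSF schema is not shown in the slides.
    """
    return {
        "study": study,            # study and patient details
        "protocol": protocol,      # signal collection protocol
        "segment": {
            "start_sec": start_sec,
            "duration_sec": duration_sec,
            "channels": channels,  # channel name -> list of integer samples
        },
    }

doc = make_csf_segment(
    study={"patient_id": "P001", "study_id": "S001"},
    protocol={"sampling_rate_hz": 200},
    channels={"Ch1": [12, -7, 3], "Ch2": [0, 5, -2]},
    start_sec=0,
    duration_sec=30,
)
text = json.dumps(doc)       # what would travel over the network
restored = json.loads(text)  # what a web client would parse
```

Keeping the metadata next to each segment means a client can interpret a segment without a second request.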
Signal Query and Visualization
• Patient cohort queries: Using the VISAGE interface
• Electrophysiological signals for selected patient visualized in
Features of the Signal Query and Visualization Module
• Query using the Epilepsy and Seizure Ontology (EpSO)* classes
• Reconciling semantic heterogeneity and subsumption reasoning
* Sahoo et al. JAMIA 2013
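Subsumption reasoning means a query for a class also matches patients annotated with any of its subclasses. A minimal sketch, assuming a single-parent hierarchy; the class names are placeholders and are not taken from EpSO:

```python
# child -> parent; hypothetical class names, not actual EpSO classes
SUBCLASS_OF = {
    "SeizureTypeA": "Seizure",
    "SeizureTypeA1": "SeizureTypeA",
    "SeizureTypeB": "Seizure",
}

def subsumed_by(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def cohort(patients, query_class):
    """Patient ids whose annotation is subsumed by the query class."""
    return sorted(pid for pid, label in patients.items()
                  if subsumed_by(label, query_class))

patients = {"P1": "SeizureTypeA1", "P2": "SeizureTypeB", "P3": "SeizureTypeA"}
print(cohort(patients, "SeizureTypeA"))
print(cohort(patients, "Seizure"))
```

A query for the general class returns all three patients, while the narrower query returns only those annotated with that class or one of its subclasses.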
Data Partitioning: Efficient Network Transfer and Visualization
• Performance evaluated for transferring segments of CSF files corresponding to a “signal epoch” (e.g., 30 sec epoch)
• Consistently faster than naïve signal channel-based approach for six standard “signal montages”
[Figure: Load and render time (in milliseconds, 0 to 60,000) for channel-based vs. epoch-based transfer across montages M1 to M6. Series: CSF Epoch Load, CSF Epoch Render, Channel Data Segment Load, Channel Data Segment Render.]
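Epoch-based partitioning cuts all channels at the same time boundaries, so one viewer window maps to one segment containing every channel. A minimal sketch of the idea, assuming fixed-length epochs and in-memory sample lists (function names and the toy sampling rate are illustrative):

```python
def partition_into_epochs(channels, sampling_rate_hz, epoch_sec=30):
    """Split every channel at the same boundaries; return one dict per epoch."""
    samples_per_epoch = sampling_rate_hz * epoch_sec
    n_samples = max(len(s) for s in channels.values())
    epochs = []
    for start in range(0, n_samples, samples_per_epoch):
        end = start + samples_per_epoch
        epochs.append({name: s[start:end] for name, s in channels.items()})
    return epochs

def epoch_for_time(t_sec, epoch_sec=30):
    """Index of the epoch that contains time t_sec."""
    return int(t_sec // epoch_sec)

# Toy data: 75 seconds of two channels sampled at 2 Hz (150 samples each).
channels = {"Ch1": list(range(150)), "Ch2": list(range(1000, 1150))}
epochs = partition_into_epochs(channels, sampling_rate_hz=2)

# A viewer positioned at t = 75 s needs exactly one segment.
needed = epochs[epoch_for_time(75)]
```

With channel-based files, the same window would require one read per channel in the montage; with epoch segments it is a single read.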
• Performance evaluated for transferring CSF format files with signal data as array of integers
• Consistently faster than traditional binary signal data format for six standard “signal montages”
[Figure: Load and render time (in milliseconds, 0 to 60,000) for binary format vs. CSF across montages M1 to M6. Series: Binary Format Load, Binary Format Render, CSF Load, CSF Render.]
Semantics: Epilepsy and Seizure Ontology
• EpSO models the four-dimensional epilepsy and seizure
Signal Processing: ECG Data
• Use MapReduce algorithms to identify QRS complexes in ECG data
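The slides do not describe the QRS detection algorithm itself. As a rough illustration of how such a task fits the map/reduce pattern, the sketch below swaps in a very simple threshold-based R-peak detector (not the authors' method) and applies it to independent segments; the threshold, refractory period, and sample values are made up for the example.

```python
def detect_r_peaks(samples, threshold, refractory):
    """Indices of local maxima above threshold, at least `refractory` samples apart."""
    peaks = []
    for i in range(1, len(samples) - 1):
        is_peak = (samples[i] > threshold
                   and samples[i - 1] < samples[i] >= samples[i + 1])
        if is_peak and (not peaks or i - peaks[-1] >= refractory):
            peaks.append(i)
    return peaks

def map_segment(segment):
    """Map step: detect peaks in one segment, reported as global sample indices."""
    offset, samples = segment
    return [offset + i for i in detect_r_peaks(samples, threshold=50, refractory=3)]

def reduce_peaks(per_segment):
    """Reduce step: merge the per-segment results into one ordered list."""
    return sorted(i for peaks in per_segment for i in peaks)

# Toy ECG trace with three sharp peaks, split into three segments.
ecg = [0, 5, 100, 5, 0, 0, 4, 90, 6, 0, 0, 3, 95, 2, 0, 0]
segments = [(0, ecg[0:5]), (5, ecg[5:10]), (10, ecg[10:16])]
r_peaks = reduce_peaks(map(map_segment, segments))
print(r_peaks)
```

This sketch cannot see a peak that falls exactly on a segment boundary; a real pipeline would need overlapping segments or a boundary-merging step.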
Take Home Points
• Cloudwave represents a new approach for managing massive amounts of electrophysiological signal data
o Potential role in clinical research and patient care
o Brain studies
o Sleep medicine
• Cloudwave → Domain semantics (modeled in ontology) + Distributed storage + Parallel computation
• Domain ontology supports:
o Optimal data partitioning scheme
o Complex ad-hoc queries
Thank you!
• Funding: The PRISM (Prevention and Risk Identification of
SUDEP Mortality) Project (1-P20-NS076965-01)
• Acknowledgements: PRISM PI: Dr. Samden Lhatoo, Co-I Dr. GQ
Zhang, Catherine Jayapandian, Aman Dabir, Chien-Hung Chen, Licong Cui