Using Big Data in Healthcare


Speaker

First Plenary Session

THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD?

David R. Holmes III, PhD

Mayo Clinic College of Medicine

Rochester, MN, USA

©2014 MFMER | slide-2

Using Big Data in Healthcare

David R. Holmes III

ISPOR 19th Annual Meeting

June 2nd, 2014

Graph Databases and Graph Analytic Approaches


Teamwork

Special Purpose Processor Development Group

• Barry Gilbert, Ph.D.

• Robert Techentin

Center for Science of Healthcare Delivery

• Jeanne Huddleston, M.D.

• Nilay Shah, Ph.D.

Rochester Epidemiology Project

• Jennifer St. Sauver, Ph.D.

YarcData

• Steve Reinhardt

Biomedical Imaging Resource

Will and Charlie Mayo, The Mayo Brothers

Graph Analytics


What is a graph?

[Figure: example graphs built from Nodes 1, 2, and 3 and Edges A and B]

“Node 1 and Node 2 are related”

“Node 1 is forward related to Node 2”

“Node 1 is forward related to Node 2 and Node 3”

“Node 1 is forward related to Node 2 via Edge A. Node 1 is forward related to Node 3 via Edge B”

[Figure: nodes Smoking, Coffee Drinking, and Heart Attack linked by edges Correlates and Causes]

“Smoking is correlated with coffee drinking. Smoking may cause heart attacks. Smoking is a confounding variable.”
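The node and edge statements above can be written down as a small data structure. The following is a minimal sketch, assuming a plain adjacency list; the `LabeledGraph` class and its method names are hypothetical and are not part of the talk.

```python
from collections import defaultdict


class LabeledGraph:
    """Directed graph whose edges carry a label (hypothetical helper)."""

    def __init__(self):
        # Maps each source node to a list of (edge_label, target_node) pairs.
        self._out = defaultdict(list)

    def add_edge(self, source, label, target, directed=True):
        self._out[source].append((label, target))
        if not directed:
            # An undirected relation is stored as two directed edges.
            self._out[target].append((label, source))

    def neighbors(self, node, label=None):
        # Return targets reachable from node, optionally filtered by edge label.
        return [t for (l, t) in self._out.get(node, []) if label is None or l == label]


g = LabeledGraph()
g.add_edge("Smoking", "correlates", "Coffee Drinking", directed=False)
g.add_edge("Smoking", "causes", "Heart Attack")

print(g.neighbors("Smoking"))
print(g.neighbors("Coffee Drinking", label="correlates"))
```

Storing "correlates" in both directions and "causes" in one direction mirrors the difference between "related" and "forward related" in the figure.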


Semantic Graphs / Databases

• Node-typed, edge-typed, directed graph

Using the Resource Description Framework (RDF), we can describe each piece of information in the graph as a triple:

• <Subject> <Predicate> <Object>

<Smoking> <corr. with> <Coffee Drinking>

<Coffee Drinking> <corr. with> <Smoking>

<Smoking> <causes> <Heart Attacks>

• A semantic database is referred to as a triple-store (i.e. a collection of triples)

• Semantic Databases are queried using SPARQL (the semantic equivalent of SQL)

• Inferential rules and ontologies can be applied dynamically to the data to further enrich the dataset
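To make the triple-store idea concrete, here is a minimal sketch in plain Python rather than a real RDF engine. The `match` and `apply_symmetry` helpers are hypothetical names; `match` only imitates the wildcard style of a SPARQL pattern, and the symmetry rule is one simple example of an inferential rule, chosen for illustration.

```python
# Toy in-memory triple store holding two of the triples from the slide.
triples = {
    ("Smoking", "corr. with", "Coffee Drinking"),
    ("Smoking", "causes", "Heart Attacks"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard,
    loosely like an unbound variable in a SPARQL query."""
    return sorted(
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


def apply_symmetry(store, predicate):
    """Illustrative inference rule: (a, predicate, b) implies (b, predicate, a)."""
    return store | {(o, p, s) for (s, p, o) in store if p == predicate}


# The reverse correlation triple is inferred rather than stored by hand.
enriched = apply_symmetry(triples, "corr. with")

print(match(enriched, predicate="corr. with"))
print(match(enriched, subject="Smoking"))
```

Because every fact is just another triple, adding a new kind of variable would require no change to `match` or to the existing data.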

[Figure: nodes Smoking, Coffee Drinking, and Heart Attack linked by edges Correlates and Causes]


Origins of Semantic Databases in Healthcare

• Mishelevich, David J.

• "MEANINGEX: a computer-based semantic parse approach to the analysis of meaning." (1971)

• "Semantic analysis of medical records." (1972)

• Initial notion of an ontology and semantic (i.e. noun phrase) representation of medical data

• Schmid, Hans Albrecht, and J. Richard Swenson.

• "On the semantics of the relational data model." (1975)

• Formalizing the graph-like nature of semantic data models

• 1970s…

• 1980s…

• 1990s…

• 2000s...

• Lenz, Richard, Mario Beyer, and Klaus A. Kuhn.

• "Semantic integration in healthcare networks." (2007)


Benefits of Semantic Databases

Semantic databases center around the user's need to collect and interrogate heterogeneous data

Flexible Schema

• New variables can be added to the data model easily

• Data type agnostic

• New variables are added with indifference to variables already in the data model

Expressibility

• Ability to query the database in a flexible manner without regard for the specific data model

• Can dynamically apply inferential rules and ontologies

• Whole graph algorithms can be applied in order to find unique relationships between variables


Healthcare Semantification at Mayo

Rochester Epidemiology Project (Population-based)

• Goal: Leverage the stable population to track health over time

• 500K individuals, 40-year duration

• 2M healthcare records

Bedside Patient Rescue (In-hospital)

• Goal: Early Warning Systems (EWS) for patient events

• 115K patient encounters, 2-year duration

• 38M records (labs, nursing evals, etc.)


A diffusion algorithm can find hidden relationships by exploiting connections in the semantic graph

Initial values are attached to specific “seed” nodes

Values propagate over graph edges and accumulate in different parts of the graph

• Sometimes results are unexpected

With a functioning graph diffusion algorithm, many possible searches can be performed

For the REP, we can identify a representative example of cohort features and label the graph
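The seed-and-propagate idea can be sketched as follows. The talk does not give the update rule, so this is only one plausible form under stated assumptions: each node passes a decayed share of its value equally to its neighbors at every step, and received values accumulate. The function name `diffuse`, the `steps` and `decay` parameters, and the patient and feature nodes are all hypothetical.

```python
def diffuse(adjacency, seeds, steps=2, decay=0.5):
    """Spread seed values over edges and accumulate what each node receives.

    adjacency: dict mapping every node to a list of its neighbors
               (every neighbor must also appear as a key).
    seeds:     dict mapping seed nodes to their initial values.
    """
    total = dict.fromkeys(adjacency, 0.0)
    current = dict.fromkeys(adjacency, 0.0)
    for node, value in seeds.items():
        current[node] = value
        total[node] = value

    for _ in range(steps):
        incoming = dict.fromkeys(adjacency, 0.0)
        for node, value in current.items():
            neighbors = adjacency[node]
            if value == 0.0 or not neighbors:
                continue
            # Each neighbor receives an equal, decayed share of the node's value.
            share = decay * value / len(neighbors)
            for neighbor in neighbors:
                incoming[neighbor] += share
        for node, value in incoming.items():
            total[node] += value
        current = incoming
    return total


# Hypothetical bipartite graph of patients and recorded features.
adjacency = {
    "patient_a": ["diabetes", "hypertension"],
    "patient_b": ["diabetes"],
    "patient_c": ["hypertension", "asthma"],
    "diabetes": ["patient_a", "patient_b"],
    "hypertension": ["patient_a", "patient_c"],
    "asthma": ["patient_c"],
}

scores = diffuse(adjacency, seeds={"patient_a": 1.0})
print(scores)
```

Seeding a single representative patient gives non-zero scores to patients who share features with the seed, which is the sense in which diffusion can surface relationships that are not stored as direct edges.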


Just one algorithm? No

There are many whole graph algorithms which could be applied to healthcare data:

• PageRank – Google-developed algorithm that scores nodes by their incoming links to emphasize important nodes in a graph

• Peer-pressure clustering – Graph-based clustering algorithm to find groups based on both node and edge data

• Betweenness centrality – Algorithm to determine key nodes in a graph that lie on the most shortest paths between other nodes

• Clique detection – Methods to find fully connected sub-graphs in a graph
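As an illustration of one whole graph algorithm from the list, here is a minimal PageRank sketch using plain power iteration. It is a textbook formulation, not the implementation used in the talk; the damping factor of 0.85, the iteration count, and the node names are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score nodes by incoming links using power iteration.

    links: dict mapping every node to the list of nodes it points to
           (every target must also appear as a key).
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical directed graph: three nodes point at "c", and "c" points at "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

The scores sum to 1, and the node with the most incoming links ends up with the highest score.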


Why doesn’t everyone use Semantic Databases?

Migrating relational databases to semantic databases can be tricky

Graph databases suffer from missing data and noisy data – just like relational databases

Graph databases are large, and graph algorithms are complex


Migrating Relational Databases

Relational DBs, by definition, are an efficient tabular storage of information.

Care must be taken in developing a semantic model to ensure “semantic richness”

• Data must be promoted correctly to subjects/objects

• Predicates must be semantically meaningful

• Standard nomenclature must be used to be
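Promoting tabular data to subjects, predicates, and objects might look like the following minimal sketch. The table rows, the column-to-predicate mapping in `PREDICATES`, and the `Patient/<id>` naming scheme are all hypothetical; a real migration would draw predicate names from a standard vocabulary rather than invent them.

```python
# Hypothetical rows from a relational patient table.
rows = [
    {"patient_id": 1, "diagnosis": "Hypertension", "smoker": True},
    {"patient_id": 2, "diagnosis": "Diabetes", "smoker": None},
]

# Assumed mapping from column names to semantically meaningful predicates.
PREDICATES = {"diagnosis": "hasDiagnosis", "smoker": "isSmoker"}


def row_to_triples(row, key="patient_id"):
    """Promote the key column to a subject and every other cell to an object."""
    subject = f"Patient/{row[key]}"
    return [
        (subject, PREDICATES[column], value)
        for column, value in row.items()
        if column != key and value is not None  # empty cells produce no triple
    ]


for row in rows:
    print(row_to_triples(row))
```

Note that a missing cell simply yields no triple, so the absence of data is not recorded explicitly in the graph.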


Missing and Noisy Data

• Missing data is just that … missing.

• Graph algorithms need to be smarter about missing data. For example,

• Building latent variables into the data

• Using a priori models to address missing data

• Healthcare data is notoriously noisy

• Moreover, there is a lot of it

• Algorithms must be robust to noise and oversampling

• While pre-processing can address this, some useful information can be lost.

• Algorithms need to “intelligently” weight the data to draw meaningful conclusions.

[Figure: Connecting Two BPR Encounters]


Graph Data is Large and Complex

For decades, the community didn’t have the computational resources to deal with semantic data efficiently.

• Technology developers were unable to pack enough memory into a computer to hold the data

• Networks were too slow

• As a result, CPUs were “data starved”

New technologies address this issue specifically

• Hadoop clusters

• Graph computers


[Figure: Progressively complex queries using graph computer vs standard SQL database]


Final Thoughts

• Graph databases for healthcare were proposed in the 1970s.

• Over time, the conceptual model of graph databases / algorithms matured.

• Technology has finally caught up.

• The technical community is now prepared to accept massive amounts of healthcare data and store it semantically.

• Semantic graph databases change the way that we look at data.

• Graph analytics will yield new insights into existing and soon-to-be collected datasets.

• There are still challenges in data migration and data quality to be addressed.

• Harass your favorite computer scientist / informaticist to make progress in these areas.
