Information and Knowledge for Decision Making

(1)

Research Concept

Presentations

Section 2

An NSF I/UCRC Planning Grant Workshop

(2)

L.I.F.E. Form Access

• Please go to: http://iucrc.renci.org

• Select Planning Grant Workshop

• Then select L.I.F.E Form Evaluations

• PASSWORD: unc2015

• ID yourself as IAB

(3)

OBJECTIVES

APPROACH/TECHNIQUES

DELIVERABLES

BENEFITS TO INDUSTRY

Symptom Extraction from the EHR for Epidemiological Studies: The Hybrid NLP Workbench

Stephanie W. Haas

School of Information and Library Science

• The Atherosclerosis Risk in Communities (ARIC) study [1] focuses on identifying symptoms of worsening heart function such as shortness of breath, edema, and orthopnea. Currently, records are read by human experts.

• An NLP system that automatically extracts symptom mentions and presents the results for human review will improve the cost-effectiveness, timeliness, and accuracy of the process. Better data provision will support epidemiologic surveillance.

• Develop and evaluate performance of rule-based NLP, machine learning, and hybrid algorithms for identifying symptom mentions in the EHR. The need to tailor algorithms to specific symptoms, parts of the EHR, or hospitals will also be explored.

• Design a workbench that allows a human expert to review and confirm/deny proposed mentions, supporting expert–system interaction in a variety of ways.

• Rule-based, machine learning, or hybrid system that identifies symptom mentions in all parts of the EHR.

• Interaction design to facilitate human expert confirmation of mentions.

• Workbench-style interface for results review.

• Workbench design and interaction leverages strengths of automatic extraction technologies and expert judgment, with regard to usability requirements.

• Algorithms and workbench could be extended to other health conditions and to other domains where human expertise must be merged with automatic extraction processes to produce optimal results.
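As a rough illustration of the rule-based approach named above, the sketch below matches a few symptom patterns in free text and flags mentions that follow a negation cue, so that a reviewer could confirm or deny each one. The pattern list, the negation cues, and the `Mention` structure are assumptions made for this example and are not the project's actual rules.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration; a real rule set would be far larger.
SYMPTOM_PATTERNS = {
    "shortness of breath": r"\b(?:shortness of breath|short of breath|sob|dyspnea)\b",
    "edema": r"\b(?:edema|swelling)\b",
    "orthopnea": r"\borthopnea\b",
}

# A negation cue counts only if it precedes the mention within the same clause.
NEGATION = re.compile(r"\b(?:no|denies|without|negative for)\b[^.;]{0,30}$", re.IGNORECASE)


@dataclass
class Mention:
    symptom: str   # normalized symptom name
    text: str      # matched surface form
    start: int     # character offset in the note
    negated: bool  # True if a negation cue precedes the mention


def extract_mentions(note: str) -> list[Mention]:
    """Return proposed symptom mentions in document order."""
    mentions = []
    for symptom, pattern in SYMPTOM_PATTERNS.items():
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            window = note[max(0, match.start() - 40):match.start()]
            negated = bool(NEGATION.search(window))
            mentions.append(Mention(symptom, match.group(0), match.start(), negated))
    return sorted(mentions, key=lambda m: m.start)


note = "Patient reports worsening shortness of breath. Denies orthopnea. Mild ankle edema noted."
for mention in extract_mentions(note):
    print(mention)
```

Each proposed mention carries its offset, so a workbench could show the surrounding text as context for the expert's decision.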

(4)

[Figure: workflow diagram. Current process: record, read, identify symptom, symptom list. Proposed process: EHR, symptom mention extraction system, proposed symptom mentions, Workbench (confirm, deny, add, context), symptom list.]

(5)

Rule-based NLP system vs. gold standard (n = 112 records) [2]

ARIC HF Variable                           | Recall | Precision* | # additional patients identified by system
New onset or worsening shortness of breath | 100%   | 76% (91%)  | 13
New onset or worsening edema               | 98%    | 52% (66%)  | 11
Paroxysmal nocturnal dyspnea               | 100%   | 64% (73%)  | 1
Orthopnea                                  | 100%   | 81% (90%)  | 2

* Precision based on gold standard (based on post-extraction review)
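For readers less familiar with the two metrics reported above, here is a minimal sketch of how recall and precision are computed from counts of true positives, false positives, and false negatives. The counts in the example are made up for illustration and are not taken from the study.

```python
def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts: the system proposes 60 patients, 45 of whom are in the
# gold standard, and it misses none of the gold-standard patients.
p, r = precision_recall(true_positives=45, false_positives=15, false_negatives=0)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=75% recall=100%
```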

1. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators. Am J Epidemiol 1989;129(4):687-702.

2. Moore C, Shaffer K, Kucharska-Newton A, Haas S, Heiss G (2015) Using natural language processing to facilitate medical

(6)

Weighting symptom mentions

• symptom type

• frequency

• form of expression

• location in EHR

Relationship between mentions

• confirmation

• contradiction

• uncertainty

• change over time

Interaction

• workflow (e.g., group all mentions of a symptom)

• context of mentions (text, EHR location)

• include confidence rating

• default confirm or deny

Design & Deployment

• algorithm: rule-based, machine learning, hybrid

• tuning for variation across symptom, expert, hospital

• acceptance by experts, epidemiologists

• expansion into other conditions

(7)

OBJECTIVES

APPROACH/TECHNIQUES

DELIVERABLES

PRECISE CARE using MedSIFTER:

Depression & Memory Loss Case Studies

Javed Mostafa

School of Information and Library Science Biomedical Research Imaging Center

• AIM1: Personalization: Leverage a highly robust user modeling algorithm to learn and predict precise care information

• AIM2: Prediction: Develop online diagnosis and screening tools for precise status checks and monitoring

• ~35 million American adults struggle with depression at some point in their lives

• The number of Alzheimer's patients in the USA will rise from 5 million to 14 million by 2050

• Both conditions are grossly underdiagnosed and require ongoing monitoring and support

• 87% of US adults use the Internet and 72% sought health information

Personalization & Precision Care

High-volume text & image processing and personalization platform

• Unstructured content can be processed ONLINE to determine key themes & clusters automatically

• Content can be MAPPED to a “user profile” (i.e., user model)

• The model can PREDICT the likelihood of interest / user characteristics

• Can DETECT changing information and interests

• A highly efficient and effective system for data integration and analytics for difficult-to-diagnose and difficult-to-treat conditions

• A flexible “service”-oriented platform that can be leveraged for a wide variety of precision care settings

• Work with seasoned researchers in ML and HCI

• Access to realistic data and workflow
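The idea of mapping content to a user profile and predicting likely interest can be sketched as follows. This is a toy model with exponentially decayed category weights, chosen only to show how a profile can track changing interests; it is not the MedSIFTER algorithm, and the class name, decay value, and category labels are assumptions.

```python
from collections import defaultdict


class UserProfile:
    """Toy user model: exponentially decayed interest weight per category."""

    def __init__(self, decay: float = 0.9):
        self.decay = decay
        self.weights = defaultdict(float)

    def observe(self, category: str) -> None:
        # Age all existing interests, then reinforce the observed category,
        # so recent behavior counts more than older behavior.
        for key in self.weights:
            self.weights[key] *= self.decay
        self.weights[category] += 1.0

    def relevance(self) -> dict[str, float]:
        # Normalize weights into a probability-like relevance score.
        total = sum(self.weights.values())
        return {c: w / total for c, w in self.weights.items()} if total else {}


profile = UserProfile()
for category in ["medications", "medications", "diagnosis",
                 "treatment", "treatment", "treatment"]:
    profile.observe(category)

scores = profile.relevance()
print(max(scores, key=scores.get), scores)
```

Because older observations decay, a shift in what the user reads gradually changes the top-ranked category.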

(8)

[Figure: Patient Portal (Mobile App/Web) used by the Patient and Care Provider/s; User Model for Personalization; content areas: Medications, Diagnosis, Prognosis – Progression, Treatment Options.]

(9)

[Figure: Patient Reported Outcome (PRO) or Other Instruments (plus behavioral data on Mobile App/Web); Patient; User Model for Screening / Status-Checks; Severity Index with labels Alarming Condition, Mild, Slightly Degraded.]

(10)

[Figure: user profile/model acquired by using robust ML techniques. Data streams/sources (behavior or clinical data): Carolina DW, UNC EHR data, physiological real-time data. Outputs: top class and relevance of categories, e.g., probability that category 1 is relevant, probability that category 2 is the top-most relevant category.]

(11)

OBJECTIVES

APPROACH/TECHNIQUES

DELIVERABLES

Adapting information extraction as a tool to guide research exploration

Charles Schmitt

Renaissance Computing Institute

• Research, whether for science, business, or intelligence, is an exploratory process that involves seeking, processing, and structuring information from a variety of sources to form conclusions that must then be supported by evidence.

• This project seeks to improve the research process by extracting and structuring information that is processed during research activities into a research-focused knowledge base (RKB).

• The RKB provides the basis to improve subsequent information-seeking tasks, provide review of prior exploration, and provide provenance about conclusions.

• Information extraction techniques will be employed to extract key content from web-based information sources.

• Recent advances in statistical embedding will be employed to develop knowledge representations that reduce information dimensionality while providing generalization.

• Knowledge representations will form the basis for research-specific knowledge bases that drive subsequent applications.

• Development of new methods for calculating distance from new information sources to RKB

• A set of methods for developing RKB

• Software libraries for extracting research information from common web-based information sources and to serve as templates for additional extractors

• Software library and API to score new information sources for relevancy to RKBs, allowing users to rank potential new information sources

• Software library and API to allow for development of additional applications that leverage RKBs, such as tools to provide research summaries.

• New methods and tools to assist R&D programs that rely heavily on integration of knowledge from multiple sources

• Filter new information, capture provenance, support conclusions

• Especially relevant for biomedical fields, e.g., adjudication of clinically relevant genomic variants; research into side effects of specific therapeutics; understanding the biological impacts of natural products; reviewing literature to determine environmental impacts of materials.

(12)

[Figure: cycle diagram around the Research Specific Knowledge Base. Labels: What knowledge is needed next???; Extract & Organize; Improve exploration; Summarize R&D activities; Provide provenance.]

(13)

Current work: Don’t solve general AI, focus on usefulness

[Plot: k_dist and k_growth for the RKB, each axis ranging from 0 to 1]

k_dist = distance of a new information source from the RKB

k_growth = growth in the RKB induced by a new information source
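The slide does not say how the two quantities are computed. Purely as an assumption for illustration, the sketch below uses cosine distance between term-count vectors for k_dist and the share of previously unseen terms for k_growth; both land in the 0 to 1 range. The function names and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def k_dist(rkb_docs: list[str], new_doc: str) -> float:
    """Assumed definition: 1 - cosine similarity of term-count vectors."""
    rkb = Counter(t for doc in rkb_docs for t in tokenize(doc))
    new = Counter(tokenize(new_doc))
    dot = sum(rkb[t] * new[t] for t in new)
    norm = (math.sqrt(sum(v * v for v in rkb.values()))
            * math.sqrt(sum(v * v for v in new.values())))
    return 1.0 - dot / norm if norm else 1.0


def k_growth(rkb_docs: list[str], new_doc: str) -> float:
    """Assumed definition: fraction of the new source's distinct terms
    that the RKB does not contain yet."""
    known = {t for doc in rkb_docs for t in tokenize(doc)}
    new_terms = set(tokenize(new_doc))
    return len(new_terms - known) / len(new_terms) if new_terms else 0.0


rkb = ["statin therapy side effects", "statin muscle pain"]
for source in ["statin therapy side effects", "statin liver effects",
               "quarterly sales forecast"]:
    print(source, round(k_dist(rkb, source), 3), round(k_growth(rkb, source), 3))
```

Under these assumed definitions, a source that is moderately close yet adds some new terms would sit in the middle of both axes, which is one way to read "focus on usefulness".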

Core techniques are rapidly evolving

• Latent Semantic Analysis

• Word embeddings

  • King is to queen as man is to …

• Phrase embedding

Provide both:

- Structure for RKB

- Distance metric
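To show how embeddings can provide both structure and a distance metric, here is a toy version of the analogy test using vector offsets and cosine similarity. The two-dimensional vectors are hand-made for this example (real embeddings are learned from text and have hundreds of dimensions), and the vocabulary is an assumption.

```python
import math

# Hand-made 2-D vectors (royalty, gender) for illustration only.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "castle": (1.0, 0.0),
}


def cosine(a, b):
    """Cosine similarity; 1 - cosine(a, b) can serve as a distance."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(a: str, b: str, c: str) -> str:
    """Solve 'a is to b as c is to ?' with the vector offset b - a + c."""
    target = tuple(vb - va + vc
                   for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = (w for w in VECTORS if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


print(analogy("king", "queen", "man"))
```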

(14)

Project Objectives:

• Assess outside of current test environment

• Compare techniques for calculating k_dist, k_growth

• Compare unsupervised, semi-supervised, and supervised training

• Further develop solution

• User selection of relevant information and research project

• User feedback

• Explore statistical embeddings augmented with domain ontologies

• Explore use of RKB:

  • Summarizing exploration

  • Providing provenance

(15)

OBJECTIVES

APPROACH/TECHNIQUES

DELIVERABLES

Using Systems Science Methods to Improve Colorectal Cancer Screening in NC

Kristen Hassmiller Lich

Gillings School of Global Public Health, Dept of Health Policy & Mgmt

• Support federal, state, payer, and local community decision making about how to improve colorectal cancer screening rates overall, address disparities, and improve health among the population of North Carolina by simulating the determinants of current care as well as alternate strategies under consideration.

• Individual-based modeling (IBM) using AnyLogic software was used to integrate census data, multi-level statistical models developed using population-based claims and other data to explain colorectal cancer screening behaviors (compliance and modality), research on the natural history of colorectal cancer, and stakeholder-developed intervention scenarios.

• Simulation-informed policy recommendations were presented to national (Centers for Disease Control and Prevention) and local (NC Dept of Health) decision makers and others through research and policy presentations and peer-reviewed manuscripts.

• This replicable approach leverages existing (but often fragmented) data and technology to support comparative effectiveness analysis at the population level, and to support local capacity planning (i.e., colonoscopy).

• Technology could be extended to other populations,
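The general shape of an individual-based comparison between current behavior and an intervention scenario can be sketched in a few lines. This is not the AnyLogic model described above: the attributes, probabilities, and the outreach intervention are all hypothetical values chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    insured: bool
    draw: float  # uniform random number, reused across scenarios


def make_population(n: int, insured_share: float, seed: int = 0) -> list[Person]:
    rng = random.Random(seed)
    return [Person(rng.random() < insured_share, rng.random()) for _ in range(n)]


def screening_probability(person: Person, outreach_uplift: float = 0.0) -> float:
    # Hypothetical baseline compliance by insurance status.
    base = 0.65 if person.insured else 0.40
    # Hypothetical intervention: outreach aimed at uninsured individuals.
    if not person.insured:
        base += outreach_uplift
    return min(base, 1.0)


def screened_count(population: list[Person], outreach_uplift: float = 0.0) -> int:
    # Reusing each person's draw means scenarios differ only by the policy,
    # not by random noise (common random numbers).
    return sum(p.draw < screening_probability(p, outreach_uplift) for p in population)


population = make_population(5000, insured_share=0.8)
baseline = screened_count(population)
with_outreach = screened_count(population, outreach_uplift=0.15)
print(baseline, with_outreach, with_outreach - baseline)
```

Comparing scenarios on the same simulated individuals is what makes the difference interpretable as the effect of the intervention.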

(16)

The model integrated rich data and informed state and federal decision making about how to address gaps in colorectal cancer screening at the population level.

(17)

We simulate current screening behaviors in order to compare future intervention options (“counterfactuals”)…

(Cost-effectiveness efficiency frontier is shown above; and NC projections by county are shown at right)

(18)

OBJECTIVES

APPROACH/TECHNIQUES

DELIVERABLES

Data-driven decision making in emergency health-care operations

Nilay Tanik Argon

Statistics and Operations Research

• Support federal, state, and local emergency response planning

• within emergency departments and beyond hospitals

• during day-to-day emergencies as well as mass-casualty events

• by means of mathematical and statistical decision making tools

• Design and control

• Statistical analysis and machine learning tools

• Stochastic modeling – queueing theory, Markov decision processes, etc.

• Computer simulation – mainly discrete-event simulations (Arena, Simio, AnyLogic, etc.)

• Rules of thumb for the design of emergency response systems: number and location of trauma centers and transportation resources

• Dynamic policy recommendations and simple calculators for ambulance routing, surge capacity generation, triage, etc. during mass-casualty events.

• Dynamic policy recommendations, simulation, and calculation tools at emergency departments: patient flow, staffing, triage, diversion

• Analytics-based decision making tools that can be used in different hospitals and emergency response systems.

• Advanced modeling of health-care operations that could be expanded to other parts of health systems

• Core values: Quality of care; fairness; efficiency; cost effectiveness
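As one small example of the queueing-theory tools listed above, the sketch below evaluates the standard Erlang C formula for an M/M/c queue, which gives the probability that an arriving patient must wait and the mean wait in queue. Treating an emergency department as an M/M/c system, and the arrival and service rates used here, are simplifying assumptions for illustration only.

```python
import math


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> tuple[float, float]:
    """Return (probability of waiting, mean wait in queue) for an M/M/c queue.

    arrival_rate and service_rate are per unit time; service_rate is per server.
    """
    offered = arrival_rate / service_rate   # offered load in Erlangs
    rho = offered / servers                 # utilization per server
    if rho >= 1:
        raise ValueError("unstable system: utilization must be below 1")
    top = offered ** servers / math.factorial(servers)
    bottom = (1 - rho) * sum(offered ** k / math.factorial(k)
                             for k in range(servers)) + top
    p_wait = top / bottom
    mean_wait = p_wait / (servers * service_rate - arrival_rate)
    return p_wait, mean_wait


# Hypothetical rates: 10 arrivals per hour, each bed serves 3 patients per hour.
for beds in (4, 5, 6):
    p_wait, wait_hours = erlang_c(10, 3, beds)
    print(beds, round(p_wait, 3), round(wait_hours * 60, 1), "min")
```

A table like this is the kind of simple calculator that could support staffing or surge-capacity decisions.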

(19)

Ambulance dispatching during a disaster (with A. Mills and S. Ziya)

Question: Which casualties should be transported to which treatment facilities?

Factors:

1. Limited ambulances

2. Travel times

3. Hospital capabilities

4. Changing ED occupancy levels

Solution approach:

• Model as a queueing control problem

• Develop heuristic policies that are easy to implement

• Test policies by a realistic simulation model – data from national trauma data base
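To make "heuristic policies that are easy to implement" concrete, here is one possible greedy rule: send each casualty to the capable hospital with the smallest travel time plus estimated waiting time, then update that hospital's occupancy. This rule is an assumption for illustration and is not the policy developed in the project; the hospital data are made up.

```python
from dataclasses import dataclass


@dataclass
class Hospital:
    name: str
    travel_minutes: float
    is_trauma_center: bool
    patients_waiting: int
    minutes_per_patient: float


def estimated_delay(hospital: Hospital) -> float:
    # Travel time plus a crude estimate of the wait caused by current occupancy.
    return hospital.travel_minutes + hospital.patients_waiting * hospital.minutes_per_patient


def dispatch(hospitals: list[Hospital], needs_trauma_center: bool) -> Hospital:
    """Pick the capable hospital with the smallest estimated delay."""
    eligible = [h for h in hospitals if h.is_trauma_center or not needs_trauma_center]
    if not eligible:
        raise ValueError("no capable hospital available")
    choice = min(eligible, key=estimated_delay)
    choice.patients_waiting += 1  # occupancy changes as casualties are sent
    return choice


hospitals = [
    Hospital("A", travel_minutes=10, is_trauma_center=True, patients_waiting=6, minutes_per_patient=10),
    Hospital("B", travel_minutes=25, is_trauma_center=False, patients_waiting=1, minutes_per_patient=10),
    Hospital("C", travel_minutes=30, is_trauma_center=True, patients_waiting=2, minutes_per_patient=10),
]
print(dispatch(hospitals, needs_trauma_center=True).name)
print(dispatch(hospitals, needs_trauma_center=False).name)
```

A rule of this kind could then be compared against alternatives in a simulation before any recommendation is made.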

(20)

Patient Prioritization in Mass Casualty Incidents (with A. Mills and S. Ziya)

Question: At each casualty location, which patients should be given priority for transportation? Triage!

Solution approach:

Model as a fluid model and solve

Test policies by a discrete-event simulator

Decision support tool: Available via web (http://www.restarttriage.com)

(21)

Predictive and operational solutions for Emergency Departments (with A. Mehrotra, D. Travers, and S. Ziya)

• Predict operational characteristics of patients at triage:

  • Admit or not?

  • Complex or not?

• Develop statistical tools that could be embedded into an already existing electronic records system for prediction.

• Use these tools for more efficient operational design:

  • If a patient is predicted to have a high probability of admission, request a hospital bed earlier to shorten boarding time.

  • Based on the complexity of the patient, treat the patient at fast track or change his/her priority level.
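The link between a triage-time prediction and an operational action can be sketched with a logistic model and a decision threshold. The features, coefficients, and threshold below are hypothetical placeholders and do not come from the project; a real tool would be fitted to ED data.

```python
import math

# Hypothetical coefficients for illustration only.
COEFFICIENTS = {
    "intercept": -3.0,
    "age_over_65": 1.2,
    "arrived_by_ambulance": 1.0,
    "high_acuity": 1.5,
}


def admission_probability(features: dict[str, bool]) -> float:
    """Logistic model: probability that the patient will be admitted."""
    score = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] for name, present in features.items() if present
    )
    return 1.0 / (1.0 + math.exp(-score))


def should_request_bed(features: dict[str, bool], threshold: float = 0.6) -> bool:
    # Request an inpatient bed early when predicted admission risk is high.
    return admission_probability(features) >= threshold


patient = {"age_over_65": True, "arrived_by_ambulance": True, "high_acuity": True}
print(round(admission_probability(patient), 3), should_request_bed(patient))
```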
