Research Concept
Presentations
Section 2
Information and Knowledge for Decision Making
An NSF I/UCRC Planning Grant Workshop
L.I.F.E. Form Access
• Please go to: http://iucrc.renci.org
• Select Planning Grant Workshop
• Then select L.I.F.E. Form Evaluations
• PASSWORD: unc2015
• ID yourself as IAB
OBJECTIVES
APPROACH/TECHNIQUES
DELIVERABLES
BENEFITS TO INDUSTRY
Symptom Extraction from the EHR for
Epidemiological Studies: The Hybrid NLP
Workbench
Stephanie W. Haas
School of Information and Library Science
• The Atherosclerosis Risk in Communities (ARIC)1 study focuses on identifying symptoms of worsening heart function such as shortness of breath, edema and orthopnea. Currently, records are read by human experts.
• An NLP system that automatically extracts symptom mentions and presents the results for human review will improve cost-effectiveness, timeliness and accuracy of the process. Better data provision will support epidemiologic surveillance.
• Develop and evaluate the performance of rule-based NLP, machine learning, and hybrid algorithms for identifying symptom mentions in the EHR. The need to tailor algorithms to specific symptoms, parts of the EHR, or hospitals will also be explored.
• Design a workbench that allows a human expert to review and confirm/deny proposed mentions, supporting expert – system interaction in a variety of ways.
• Rule-based, machine learning, or hybrid system that identifies symptom mentions in all parts of the EHR.
• Interaction design to facilitate human expert confirmation of mentions.
• Workbench-style interface for results review.
• Workbench design and interaction leverage the strengths of automatic extraction technologies and expert judgment, with regard to usability requirements.
• Algorithms and workbench could be extended to other health conditions and to other domains where human expertise must be merged with automatic extraction processes to produce optimal results.
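The rule-based side of the approach above can be sketched as follows. This is a minimal illustration, not the project's system: the symptom patterns, the negation cues, and the names `SYMPTOM_PATTERNS`, `Mention`, and `propose_mentions` are all assumptions made for this example. It proposes mentions with surrounding context so that an expert could confirm or deny each one.

```python
import re
from dataclasses import dataclass

# Hypothetical symptom lexicon; these patterns are illustrative only.
SYMPTOM_PATTERNS = {
    "shortness of breath": re.compile(r"\b(shortness of breath|sob|dyspnea)\b", re.I),
    "edema": re.compile(r"\b(edema|swelling)\b", re.I),
    "orthopnea": re.compile(r"\borthopnea\b", re.I),
}
# A negation cue counts only if no sentence-ending period separates it from the mention.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.]*$", re.I)


@dataclass
class Mention:
    symptom: str
    text: str
    start: int
    negated: bool
    context: str  # surrounding text shown to the reviewer


def propose_mentions(note: str, window: int = 40) -> list[Mention]:
    """Return proposed symptom mentions in document order."""
    mentions = []
    for symptom, pattern in SYMPTOM_PATTERNS.items():
        for m in pattern.finditer(note):
            before = note[max(0, m.start() - window):m.start()]
            context = note[max(0, m.start() - window):m.end() + window]
            negated = bool(NEGATION.search(before))
            mentions.append(Mention(symptom, m.group(0), m.start(), negated, context))
    return sorted(mentions, key=lambda x: x.start)


note = "Patient reports worsening shortness of breath. Denies orthopnea. Mild edema in both legs."
for mention in propose_mentions(note):
    print(mention.symptom, "(negated)" if mention.negated else "(affirmed)")
```

A hybrid system would presumably replace or supplement the patterns with a learned classifier; the reviewer-facing `context` field is the part that connects extraction to the workbench.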
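[Figure: current vs. proposed workflow.
Current: record, symptom list; an expert reads the record and identifies symptom mentions.
Proposed: an extraction system draws proposed symptom mentions from the EHR; in the Workbench, the expert can confirm, deny, or view more context, and add to the symptom list.]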
Rule-based NLP system vs. gold standard (n = 112 records)2

ARIC HF Variable | Recall | Precision based on gold standard (based on post-extraction review) | # additional patients identified by system
New onset or worsening shortness of breath | 100% | 76% (91%) | 13
New onset or worsening edema | 98% | 52% (66%) | 11
Paroxysmal nocturnal dyspnea | 100% | 64% (73%) | 1
Orthopnea | 100% | 81% (90%) | 2
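For readers less familiar with the two measures reported above, the arithmetic can be sketched as follows. The counts are hypothetical and chosen only to show the calculation; they are not taken from the study. The second call illustrates, under that same assumption, why precision can rise after post-extraction review: a system-only mention that a reviewer confirms moves from false positive to true positive.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) computed from mention-level counts."""
    predicted = true_positives + false_positives   # everything the system flagged
    actual = true_positives + false_negatives      # everything that was really there
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, scored against the gold standard.
before_review = precision_recall(true_positives=8, false_positives=2, false_negatives=0)

# Hypothetical: on review, one of the two "false positives" turns out to be correct.
after_review = precision_recall(true_positives=9, false_positives=1, false_negatives=0)

print("precision, recall before review:", before_review)
print("precision, recall after review: ", after_review)
```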
1 The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators. Am J Epidemiol 1989; 129(4):687-702.
2 Moore C, Shaffer K, Kucharska-Newton A, Haas S, Heiss G (2015) Using natural language processing to facilitate medical
Weighting symptom mentions
• symptom type
• frequency
• form of expression
• location in EHR
Relationship between mentions
• confirmation
• contradiction
• uncertainty
• change over time
Interaction
• workflow (e.g., group all mentions of a symptom)
• context of mentions (text, EHR location)
• include confidence rating
• default confirm or deny
Design & Deployment
• algorithm: rule-based, machine learning, hybrid
• tuning for variation across symptom, expert, hospital
• acceptance by experts, epidemiologists
• expansion into other conditions
OBJECTIVES
APPROACH/TECHNIQUES
DELIVERABLES
BENEFITS TO INDUSTRY
PRECISE CARE using MedSIFTER:
Depression & Memory Loss Case Studies
Javed Mostafa
School of Information and Library Science; Biomedical Research Imaging Center
• AIM1: Personalization: Leverage a highly robust user modeling algorithm to learn and predict precise care information
• AIM2: Prediction: Develop online diagnosis and screening tools for precise status checks and monitoring
• ~35 million American adults struggle with depression at some point in their lives
• The number of Alzheimer's patients in the USA will rise from 5 million to 14 million by 2050
• Both conditions are grossly underdiagnosed and require ongoing monitoring and support
• 87% of US adults use the Internet and 72% sought health information
Personalization & Precision Care
High-volume text & image processing and personalization platform
• Unstructured content can be processed ONLINE to determine key themes & clusters automatically
• Content can be MAPPED to a “user profile” (i.e., user model)
• The model can PREDICT the likelihood of interest / user characteristics
• Can DETECT changing information and interests
• A highly efficient and effective system for data integration and analytics for difficult-to-diagnose and difficult-to-treat conditions
• A flexible “service” oriented platform that can be leveraged for a wide variety of precision care settings
• Work with seasoned researchers in ML and HCI
• Access to realistic data and workflow
[Figure: Patient Portal (Mobile App/Web) linking Patient and Care Provider/s through a User Model for Personalization. Topics shown: Medications, Diagnosis, Prognosis – Progression, Treatment Options.]
[Figure: Patient Reported Outcome (PRO) or Other Instruments (plus Behavioral Data on Mobile App/Web) from the Patient feed a User Model for Screening / Status-Checks. Labels shown: Alarming Condition, Severity Index, Mild, Slightly Degraded.]
[Figure: User profile/model acquired by using robust ML techniques. Three layers labeled c1 … cn (Categories: info topics / severity levels), u1 … un, and t1 … tn. Outputs: Top class (probability that category 2 is the top-most relevant category) and Relevance of categories (probability that category 1 is relevant). Data Streams/Sources (Behavior or Clinical Data): Carolina DW, UNC EHR data, Physiological Real-time Data.]
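The idea of a user profile that assigns relevance probabilities to categories can be sketched in a few lines. This is a toy stand-in, not MedSIFTER's algorithm: the class `UserProfile`, its smoothing parameter, and the category names are assumptions for illustration, and the probabilities are plain Laplace-smoothed frequencies of observed interactions.

```python
from collections import Counter


class UserProfile:
    """Toy user model: smoothed relevance probabilities over categories."""

    def __init__(self, categories, smoothing=1.0):
        self.categories = list(categories)
        self.smoothing = smoothing
        self.counts = Counter()

    def observe(self, category):
        """Record one interaction, e.g. an item the user marked as relevant."""
        if category not in self.categories:
            raise ValueError(f"unknown category: {category}")
        self.counts[category] += 1

    def relevance(self):
        """Return a probability for each category (values sum to 1)."""
        total = sum(self.counts.values()) + self.smoothing * len(self.categories)
        return {c: (self.counts[c] + self.smoothing) / total for c in self.categories}

    def top_category(self):
        """Return the category with the highest relevance probability."""
        probs = self.relevance()
        return max(probs, key=probs.get)


# Hypothetical categories and interactions.
profile = UserProfile(["medications", "diagnosis", "treatment options"])
for seen in ["diagnosis", "medications", "diagnosis"]:
    profile.observe(seen)
print(profile.relevance())
print(profile.top_category())
```

Because the profile is updated one observation at a time, a shift in what the user reads gradually shifts the probabilities, which is one simple way to picture detecting changing interests.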
OBJECTIVES
APPROACH/TECHNIQUES
DELIVERABLES
BENEFITS TO INDUSTRY
Adapting information extraction as a tool
to guide research exploration
Charles Schmitt
Renaissance Computing Institute
• Research, whether for science, business, or intelligence, is an exploratory process that involves seeking, processing, and structuring information from a variety of sources to form conclusions that must then be supported by evidence
• This project seeks to improve the research process by extracting and structuring information that is processed during research activities into a research-focused knowledge base (RKB).
• The RKB provides the basis to improve subsequent information seeking tasks, provide review of prior exploration, and provide provenance about conclusions.
• Information extraction techniques will be employed to extract key content from web-based information sources
• Recent advances in statistical embedding will be employed to develop knowledge representations that reduce information dimensionality while providing generalization
• Knowledge representations will form the basis for research-specific knowledge bases that drive subsequent applications
• Development of new methods for calculating distance from new information sources to RKB
• A set of methods for developing RKB
• Software libraries for extracting research information from common web-based information sources and to serve as templates for additional extractors
• Software library and API to score new information sources for relevancy to RKBs, allowing users to rank potential new information sources
• Software library and API to allow for development of additional applications that leverage RKBs, such as tools to provide research summaries.
• New methods and tools to assist R&D programs that rely heavily on integration of knowledge from multiple sources
• Filter new information, capture provenance, support conclusions
• Especially relevant for biomedical fields, e.g., adjudication of clinically relevant genomic variants; research into side effects of specific therapeutics; understanding the biological impacts of natural products; reviewing literature to determine environmental impacts of materials.
[Figure: cycle around the Research Specific Knowledge Base. Steps: Extract & Organize; Improve exploration (“What knowledge is needed next???”); Summarize R&D activities; Provide provenance.]
[Plot: k_dist vs. k_growth; both axes run from 0 to 1]
Current work: Don’t solve general AI, focus on usefulness
K_dist = distance of new information source from RKB
K_growth = growth in RKB induced by a new information source
Core techniques are rapidly evolving
• Latent Semantic Analysis
• Word embeddings
  • King is to queen as man is to …
• Phrase embedding
Provide both:
- Structure for RKB
- Distance metric
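One way to make the two quantities concrete is sketched below. The slide defines K_dist and K_growth only in words, so the formulas here are assumptions: `k_dist` is taken as the cosine distance between the centroid of the RKB's embedding vectors and the centroid of the new source's vectors, and `k_growth` as the share of the source's items that are not already close to any RKB item. The tiny two-dimensional vectors and the `novelty_threshold` value are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def k_dist(rkb, source):
    """Assumed definition: cosine distance between RKB and source centroids."""
    return cosine_distance(centroid(rkb), centroid(source))


def k_growth(rkb, source, novelty_threshold=0.3):
    """Assumed definition: fraction of source items far from every RKB item."""
    novel = [s for s in source
             if min(cosine_distance(s, r) for r in rkb) > novelty_threshold]
    return len(novel) / len(source)


# Hypothetical embeddings: the RKB covers one topic; the source adds a second.
rkb = [[1.0, 0.0], [0.9, 0.1]]
source = [[1.0, 0.05], [0.0, 1.0]]
print("k_dist:", round(k_dist(rkb, source), 3))
print("k_growth:", k_growth(rkb, source))
```

Under these assumed definitions, a source that merely repeats the RKB scores low on both measures, while a source from an unrelated topic scores high on both.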
Project Objectives:
• Assess outside of current test environment
• Compare techniques for calculating k_dist, k_growth
• Compare unsupervised, semi-supervised, and supervised training
• Further develop solution
• User selection of relevant information and research project
• User feedback
• Explore statistical embeddings augmented with domain ontologies
• Explore use of RKB:
  • Summarizing exploration
  • Providing provenance
OBJECTIVES
APPROACH/TECHNIQUES
DELIVERABLES
BENEFITS TO INDUSTRY
Using Systems Science Methods to Improve
Colorectal Cancer Screening in NC
Kristen Hassmiller Lich
Gillings School of Global Public Health, Dept of Health Policy & Mgmt
• Support federal, state, payer, and local community decision making about how to improve colorectal cancer screening rates overall, address disparities, and improve health among the population of North Carolina by simulating the determinants of current care as well as alternate strategies under consideration.
• Individual-based modeling (IBM) using AnyLogic software was used to integrate census data, multi-level statistical models developed using population-based claims and other data to explain colorectal cancer screening behaviors (compliance and modality), research on the natural history of colorectal cancer, and stakeholder-developed intervention scenarios.
• Simulation-informed policy recommendations were presented to national (Centers for Disease Control and Prevention) and local (NC Dept of Health) decision makers and others through research and policy presentations and peer-reviewed manuscripts.
• This replicable approach leverages existing (but often fragmented) data and technology to support comparative effectiveness analysis at the population level, and to support local capacity planning (i.e., colonoscopy).
• Technology could be extended to other populations,
The model integrated rich data, and informed state and federal decision making about how to address gaps in colorectal cancer screening at the population level.
We simulate current screening behaviors in order to compare future intervention options (“counterfactuals”)…
[Figures: cost-effectiveness efficiency frontier; NC projections by county]
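The counterfactual comparison can be illustrated with a deliberately tiny individual-based simulation. This sketch is far simpler than the AnyLogic model described above: it has no census data, statistical screening models, or disease natural history. The function name `simulate_screening` and the annual screening probabilities are hypothetical. Reusing the same random seed for both scenarios gives each simulated person the same random draws, so the difference between runs reflects the intervention rather than sampling noise.

```python
import random


def simulate_screening(n_people, p_screen, years, seed=0):
    """Return the fraction of simulated individuals screened at least once."""
    rng = random.Random(seed)
    screened = 0
    for _ in range(n_people):
        # One uniform draw per person-year; screened if any year's draw succeeds.
        draws = [rng.random() for _ in range(years)]
        if any(d < p_screen for d in draws):
            screened += 1
    return screened / n_people


# Hypothetical annual screening probabilities for two scenarios.
baseline = simulate_screening(n_people=10_000, p_screen=0.05, years=5, seed=1)
intervention = simulate_screening(n_people=10_000, p_screen=0.08, years=5, seed=1)
print(f"baseline: {baseline:.3f}  intervention: {intervention:.3f}")
```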
OBJECTIVES
APPROACH/TECHNIQUES
DELIVERABLES
BENEFITS TO INDUSTRY
Data-driven decision making in
emergency health-care operations
Nilay Tanik Argon
Statistics and Operations Research
• Support federal, state, and local emergency response planning
  • within emergency departments and beyond hospitals
  • during day-to-day emergencies as well as mass-casualty events
  • by means of mathematical and statistical decision making tools
• Design and control
• Statistical analysis and machine learning tools
• Stochastic modeling – queueing theory, Markov decision processes, etc.
• Computer simulation – mainly discrete-event simulations (Arena, Simio, AnyLogic, etc.)
• Rules of thumb for the design of emergency response systems: number and location of trauma centers and transportation resources
• Dynamic policy recommendations and simple calculators for ambulance routing, surge capacity generation, triage, etc. during mass-casualty events.
• Dynamic policy recommendations, simulation, and calculation tools at emergency departments: patient flow, staffing, triage, diversion
• Analytics-based decision making tools that can be used in different hospitals and emergency response systems.
• Advanced modeling of health-care operations that could be expanded to other parts of health systems
• Core values: Quality of care; fairness; efficiency; cost effectiveness
Ambulance dispatching during a disaster (with A. Mills and S. Ziya)
Question: Which casualties should be transported to which treatment facilities?
Factors:
1. Limited ambulances
2. Travel times
3. Hospital capabilities
4. Changing ED occupancy levels
Solution approach:
• Model as a queueing control problem
• Develop heuristic policies that are easy to implement
• Test policies by a realistic simulation model – data from national trauma database
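As an illustration of what an easy-to-implement heuristic might look like, the sketch below sends a casualty to the capable facility with the smallest sum of travel time and a crude wait estimate. This is not the authors' policy: the `Facility` fields, the wait formula, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    travel_min: float     # travel time from the casualty location, in minutes
    occupancy: int        # patients currently in the ED
    beds: int             # staffed treatment spaces
    treat_min: float      # mean treatment time per patient, in minutes
    trauma_capable: bool


def estimated_wait(f: Facility) -> float:
    """Crude estimate: patients beyond capacity, cleared at beds per treat_min."""
    excess = max(0, f.occupancy + 1 - f.beds)
    return excess * f.treat_min / f.beds


def choose_facility(facilities, needs_trauma: bool) -> Facility:
    """Pick the eligible facility minimizing travel time plus estimated wait."""
    eligible = [f for f in facilities if f.trauma_capable or not needs_trauma]
    if not eligible:
        raise ValueError("no facility can treat this casualty")
    return min(eligible, key=lambda f: f.travel_min + estimated_wait(f))


# Hypothetical facilities.
facilities = [
    Facility("A", travel_min=10, occupancy=30, beds=20, treat_min=60, trauma_capable=True),
    Facility("B", travel_min=25, occupancy=5, beds=10, treat_min=60, trauma_capable=True),
    Facility("C", travel_min=5, occupancy=2, beds=8, treat_min=60, trauma_capable=False),
]
print(choose_facility(facilities, needs_trauma=True).name)
print(choose_facility(facilities, needs_trauma=False).name)
```

A fuller version would increment the chosen facility's `occupancy` after each dispatch, so that later decisions reflect changing ED occupancy levels, and would account for the limited number of ambulances.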
Patient Prioritization in Mass Casualty Incidents (with A. Mills and S. Ziya)
Question: At each casualty location, which patients should be given priority for transportation? Triage!
Solution approach:
• Model as a fluid model and solve
• Test policies by a discrete-event simulator
Decision support tool: Available via web (http://www.restarttriage.com)
Predictive and operational solutions for Emergency Departments (with A. Mehrotra, D. Travers, and S. Ziya)
• Predict operational characteristics of patients at triage:
  • Admit or not?
  • Complex or not?
• Develop statistical tools that could be embedded in existing electronic records systems for prediction.
• Use these tools for more efficient operational design:
  • If a patient is predicted to have a high probability of admission, request a hospital bed earlier to shorten boarding time.
  • Based on the complexity of the patient, treat the patient in the fast track or change his/her priority level.
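The link between prediction and operational design can be sketched as a probability estimate followed by a threshold rule. Everything specific here is an assumption: the logistic form, the feature names, the coefficient values, and the 0.6 threshold are hypothetical and are not drawn from the project, where a real model would be fitted to ED data.

```python
import math

# Hypothetical coefficients for illustration only.
COEFFICIENTS = {
    "intercept": -3.0,
    "age_over_65": 1.2,
    "arrived_by_ambulance": 1.0,
    "high_acuity": 1.5,
}


def admission_probability(features: dict) -> float:
    """Logistic estimate of the probability of admission from triage features."""
    z = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * float(value) for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def should_request_bed_early(features: dict, threshold: float = 0.6) -> bool:
    """Request a hospital bed at triage if admission looks likely enough."""
    return admission_probability(features) >= threshold


high_risk = {"age_over_65": True, "arrived_by_ambulance": True, "high_acuity": True}
low_risk = {"age_over_65": False, "arrived_by_ambulance": False, "high_acuity": False}
print(round(admission_probability(high_risk), 3), should_request_bed_early(high_risk))
print(round(admission_probability(low_risk), 3), should_request_bed_early(low_risk))
```

The threshold trades off two costs: set too low, beds are requested for patients who go home; set too high, admitted patients board longer while waiting for a bed.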