
(1)

Big Data Analytics for Mitigating Insider Risks in Electronic Medical Records

Bradley Malin, Ph.D.
Associate Prof. & Vice Chair of Biomedical Informatics, School of Medicine
Associate Prof. of Computer Science, School of Engineering
Vanderbilt University
21/8/2015

(2)
(3)

January 1, 2015

Logged over 2,000,000 users’ interactions

(Figure: Alice’s Electronic Medical Record)

(4)

January 2, 2015

Logged over 2,000,000 users’ interactions

(5)

January 3, 2015

Logged over 2,000,000 users’ interactions

(6)

Auditing Requirements

Federal (US)

1. Access control
2. Track & audit employee accesses
3. Store logs for 6 years

(7)
(8)

How (Not) to Use Access Control

• Central Norway Health Region enabled break the glass

• 1/2 of 99,000 patients  broke glass

• 1/2 of 12,000 users  broke glass

• ~300K events in 1 month

Role               Users   Break Glass
Nurse              5633    36%
Doctor             2927    52%
Health Secretary   1876    52%
Physiotherapist    382     56%
Psychologist       194     58%

(Røstad & Øystein 2007)

(9)

Oct 2007

Palisades Medical Center

Dozens of Employees

(10)

$1 million fine

UCLA

HHS Investigation

July 8, 2011

(11)

The Model is Wrong

(12)

Learning Suspicious EMR Access Behavior

(Boxwala et al, JAMIA, 2011)

• Manually select 505 potential cases / controls based on previous breaches at Partners Healthcare
• LABEL: Human experts label cases as + / -
• BUILD: Build classifier from labeled events
• PREDICT: Calculate the prediction probabilities using classifier on all events
• SELECT: New unlabeled events from DB

Model

• Support Vector Machines
• Logistic Regression

(13)

Learning Suspicious EMR Access Behavior

(Boxwala et al, JAMIA, 2011)

Feature                        Coefficient   Odds Ratio
Works in the same department   3.16          23.5
Same street address            2.60          13.45
Same family name               2.34          10.38
Over 200 accesses in a day     1.30          3.70
VIP Patient                    1.18          3.23
…                              …             …
Same Zip Code                  -1.46         0.23
Is Provider                    -2.33         0.10
Care unit visit match          -3.40         0.03
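The odds ratios listed for the logistic regression model are consistent, up to rounding, with exponentiating each coefficient. A minimal sketch of that relationship follows. It uses the coefficients printed on the slide; the names `COEFFICIENTS` and `odds_ratio` are illustrative and do not come from the source.

```python
import math

# Coefficients as printed on the slide (Boxwala et al, JAMIA, 2011).
COEFFICIENTS = {
    "Works in the same department": 3.16,
    "Same street address": 2.60,
    "Same family name": 2.34,
    "Over 200 accesses in a day": 1.30,
    "VIP Patient": 1.18,
    "Same Zip Code": -1.46,
    "Is Provider": -2.33,
    "Care unit visit match": -3.40,
}


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logistic-regression coefficient: exp(beta)."""
    return math.exp(coefficient)


if __name__ == "__main__":
    for feature, beta in COEFFICIENTS.items():
        print(f"{feature:30s} {beta:6.2f} {odds_ratio(beta):7.2f}")
```

A ratio above 1 raises the odds that an access is labeled suspicious, and a ratio below 1 lowers them. A few recomputed values differ slightly from the printed ones, presumably because the printed coefficients are rounded.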

(14)

Role Refinement

• Northwestern Memorial Hospital

• 3 months of access logs to inpatient records

• 8K users, 16K patients, 1.1M accesses

– User ID
– Patient ID
– User position
– Date / Time
– Number of Orders Entered
– Patient Location in hospital
– Service patient is on

(Zhang, Gunter, Liebovitz, Tian, & Malin – AMIA 2011)

(15)

Predictability is Job Dependent

Rank      Most Predictable                     Accuracy   Users
1 (tie)   ED Assistant                         100%       26
1 (tie)   ED Physician – CPOE                  100%       43
1 (tie)   NMH Resident/Fellow ID Clinic-CPOE   100%       10

Rank      Least Predictable                    Accuracy   Users
140       Patient Care Staff Nurse             7.6%       1554
139       Rehab Occupational Therapist (OT)    14.3%      28
136       Patient Care Staff Nurse (Pilot)     22.1%      217
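The ranking above depends on a per-role accuracy figure. The slides do not say which classifier produced the predictions, so the sketch below covers only the scoring step. It assumes accuracy means the fraction of users in a role whose predicted role matches their actual role. The function name and the toy data are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_role(pairs):
    """Return {actual_role: fraction of users predicted correctly}.

    `pairs` is an iterable of (actual_role, predicted_role), one per user.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for actual, predicted in pairs:
        totals[actual] += 1
        if actual == predicted:
            correct[actual] += 1
    return {role: correct[role] / totals[role] for role in totals}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        ("ED Physician", "ED Physician"),
        ("ED Physician", "ED Physician"),
        ("Staff Nurse", "ED Physician"),
        ("Staff Nurse", "Staff Nurse"),
        ("Staff Nurse", "Rehab OT"),
        ("Staff Nurse", "Rehab OT"),
    ]
    # Rank roles from most to least predictable.
    ranked = sorted(accuracy_by_role(toy).items(), key=lambda kv: -kv[1])
    for role, acc in ranked:
        print(f"{role:15s} {acc:.1%}")
```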

(16)

Where are We Going Wrong?

Actual Role                      Predicted Role                   Probability
Rehab – Occupational Therapist   Rehab – Physical Therapist       85.7%
Rehab – Physical Therapist       Rehab – Occupational Therapist   60.0%

(17)
(18)

Suspicious or Anomalous?

(19)

Defining Access Control

(20)

January 1

EMR users linked if they accessed 1 patient in common

(Malin, Nyemba, Paulett 2011)
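The linking rule can be sketched as a small graph construction. This is a minimal sketch, not the authors' implementation. It assumes the access log reduces to (user, patient) pairs for one day and reads the rule as "at least one patient in common". The function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_user_network(accesses):
    """Return an undirected user graph as {user: set of linked users}.

    `accesses` is an iterable of (user_id, patient_id) pairs. Two users are
    linked when they accessed at least one patient in common.
    """
    users_by_patient = defaultdict(set)
    graph = defaultdict(set)
    for user, patient in accesses:
        users_by_patient[patient].add(user)
        graph[user]  # touch the key so isolated users still appear as nodes
    for users in users_by_patient.values():
        for u, v in combinations(sorted(users), 2):
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)


if __name__ == "__main__":
    # Toy log, invented for illustration only.
    log = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u2", "p3"), ("u4", "p3")]
    for user, linked in sorted(build_user_network(log).items()):
        print(user, sorted(linked))
```

Grouping users by patient first avoids comparing every pair of users directly, which matters when most pairs share no patient.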

(21)

Mining to Model the System

(Malin, Nyemba, Paulett 2011)

(Figure: 2nd Principal Component; labeled groups: Children’s Hospital, University Hospital)

(22)

Hypothesis!

• Collaborative systems are about social phenomena

• People should form communities

• We should be able to measure deviation from community structure

• Note: other social phenomena could be studied (temporal workflow*, function invoked* – if any, etc.)

(*Chen et al, IEEE TDSC 2012; Zhang et al. ACM SACMAT 2013; Zhang et al. ACM TMIS 2013)

(23)

Community‐Based Anomaly Detection (CADS)

(Figure: CADS pipeline. Labels: Access Logs; Social Relation Construction; Community Deviation; User Communities; Distance Measurement; Deviation Measurement; User-Specific Deviation Scores; Pattern Extraction; Anomaly Detection)

(24)

The average clustering coefficient for this network is 0.48, which is significantly larger than 0.001 for random networks

Users exhibit collaborative behavior in the health information system

Example 6-Nearest Neighbor Network (1 day of accesses)
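For readers unfamiliar with the statistic, the sketch below computes an average clustering coefficient on a toy graph. It assumes an undirected network stored as {node: set of neighbors} and uses the standard local definition (the fraction of a node's neighbor pairs that are themselves linked). The slide does not say how the 6-nearest-neighbor network was treated, so this is illustrative only.

```python
from itertools import combinations


def local_clustering(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = graph[node]
    if len(neighbors) < 2:
        return 0.0  # convention: no neighbor pairs means coefficient 0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


def average_clustering(graph):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


if __name__ == "__main__":
    # Toy graph: a triangle (a, b, c) plus one pendant node d attached to c.
    toy = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
    }
    print(f"{average_clustering(toy):.3f}")  # prints 0.583
```

A value near 1 means a user's collaborators also tend to collaborate with each other; a value near 0 is what a sparse random network would typically show.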

(25)

Auditing Strategies of the Past

• Principal Component Analysis (PCA)

– Graph-based anomaly detection (Shyu et al 2003)
(How similar am I to spectral clusters of users?)

• K-Nearest Neighbor (KNN)

– Nearest neighbor based anomaly detection (Liao et al 2002)
(How similar am I to my “friends”?)

• High-Volume Model

– (Gallagher et al 1998)
(Do I access way more people than my relations?)
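The "how similar am I to my friends?" question can be illustrated with a generic nearest-neighbor anomaly score. This is not the method of Liao et al. It is a minimal sketch that assumes each user is represented by the set of patients they accessed, measures dissimilarity with Jaccard distance, and scores a user by the mean distance to their k nearest other users. All names and the toy data are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_anomaly_scores(patients_by_user, k=2):
    """Score each user by the mean distance to their k nearest other users."""
    scores = {}
    for user, patients in patients_by_user.items():
        distances = sorted(
            jaccard_distance(patients, other_patients)
            for other, other_patients in patients_by_user.items()
            if other != user
        )
        nearest = distances[:k]
        scores[user] = sum(nearest) / len(nearest) if nearest else 0.0
    return scores


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = {
        "u1": {"p1", "p2", "p3"},
        "u2": {"p1", "p2"},
        "u3": {"p2", "p3"},
        "u4": {"p9"},  # shares no patients with anyone
    }
    for user, score in sorted(knn_anomaly_scores(toy).items()):
        print(user, round(score, 3))
```

Higher scores mean a user's access set overlaps little with even their closest peers, which makes that user a candidate for review rather than proof of misuse.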

(26)

Social Structure Wins the Day!

(Figure: True Positive Rate versus False Positive Rate)

(27)

Gripes & Future Musings

• Different providers within the same ward have different behavior!
• Different wards within the same healthcare institution have different behavior!
• Different healthcare organizations use different languages!
• Logic (i.e., access control) and AI (i.e., data mining) need to play nicely together

(28)

Questions?

Health Information Privacy Laboratory

http://www.hiplab.org/

(29)
(30)

High Confidence Rules

Rule                                                            Support    Confidence   Weeks
Center for Patient & Professional Advocacy → Hearing & Speech   0.000581   0.860        18
Practice – City A → Clinic - City A                             0.000193   0.673        21
Infectious Disease – Clinic → Infectious Disease                0.000206   0.637        21
NICU → Neonatology                                              0.000613   0.629        17
VMG - Family Practice → Clinic - City A                         0.00132    0.628        21
Vanderbilt Hearing School → Hearing & Speech                    0.00142    0.619        22
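The slides do not define the Support and Confidence columns. The sketch below uses the standard association-rule definitions: support is the fraction of transactions containing both sides of the rule, and confidence is the fraction of transactions containing the antecedent that also contain the consequent. Treating each patient record as a "transaction" holding the set of organizational areas that accessed it is an assumption, as are the function name and the toy data.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    `transactions` is a list of sets; here each set holds the organizational
    areas that accessed one patient's record (an assumed encoding).
    """
    n = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent in t)
    with_both = sum(
        1 for t in transactions if antecedent in t and consequent in t
    )
    support = with_both / n if n else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        {"NICU", "Neonatology"},
        {"NICU", "Neonatology"},
        {"NICU"},
        {"Anesthesiology"},
    ]
    support, confidence = rule_stats(toy, "NICU", "Neonatology")
    print(f"support={support:.3f} confidence={confidence:.3f}")
```

Under these definitions a rule can have very low support yet high confidence, which matches the pattern in the table: the pairs are rare overall but reliable when the antecedent area appears.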

(31)

Low Confidence Rules

(but occur in at least 3 weeks)

Rule                                                 Support     Confidence   Weeks
Anesthesiology Vanderbilt Hearing School             0.0000522   0.000581     6
Anesthesiology 4N Labor & Delivery                   0.0000526   0.000577     6
Anesthesiology Physician Liaison Program             0.0000565   0.000574     4
Emergency Medicine Nutrition Clinic                  0.0000454   0.000572     4
Anesthesiology Cardiac Cath Lab                      0.0000590   0.000565     3
Emergency Medicine Diabetes Ctr                      0.0000458   0.000558     4
Anesthesiology Center for Clinical/Research Ethics   0.0000459   0.000528     7
Anesthesiology Infectious Disease – Clinic           0.0000454   0.000527     4
Anesthesiology Pediatric Immunology                  0.0000458   0.000514     4

(32)

Big Data Audits Must Be Understandable to be Actionable

(33)

What Makes Sense?

• Dr. Smith’s access of Peggy Johnson’s medical record was strange

• Dr. Smith’s access was 10 standard deviations away from normal behavior in his hospital

• Dr. Smith’s access was strange because he is a neonatologist and he accessed the record of a 100 year-old woman who, for the past year, has only been treated by gerontologists

(34)

So… Do You Believe Inferred Patterns?

(35)

Hypothesis: Locally Knowledgeable of Class

Survey group                   Rule set       High   Medium   Low
Anesthesiologists              Ane. Rules     10     10       10
Psychiatrists                  Psych. Rules   10     10       10
Coding & Charge Entry          Code Rules     10     10       10
Medical Information Services   MIS Rules      10     10       10

(36)

Survey

• Employees were presented with questions and asked to report the likelihood of rules on a 5-point Likert scale

• All employees were asked the same set of 120 questions (four sets of 30)

“Someone from Anesthesiology accessed the record of patient John Doe. How likely is it that someone from the following organizational area accessed the same patient's record?”

Anesthesiology: Not at all / Slightly / Moderately / Very / Completely

Psychiatry: Not at all / Slightly / Moderately / Very / Completely

(37)

Hypothesis: Locally Knowledgeable of Class

• Employees can distinguish between high, med, and low for their own rules

• Anesthesiologists evaluated with anesthesiology rules

• Tested hypothesis with linear

(Figure: Anesthesiologists rating High, Medium, and Low Ane. Rules)

(38)

Hypothesis: Locally Knowledgeable of Class

• Confirmed for every organizational area at 95% confidence level!

Area   Strength   p-value
ANE    0.75       0.007
CODE   0.44       0.011
MIS    0.32       0.037
PSY    0.82       0.020

(46)

“Learning” Rules for Suspicious Access Detection

(Boxwala et al, JAMIA, 2011)

Feature                        Coefficient   Odds Ratio
Works in the same department   3.16          23.5
Same street address            2.60          13.45
Same family name               2.34          10.38
Over 200 accesses in a day     1.30          3.70
VIP Patient                    1.18          3.23
…                              …             …
Same Zip Code                  -1.46         0.23
Is Provider                    -2.33         0.10

(47)

Predictability is Job Dependent

(Figure: Prediction Accuracy, 0% to 100%, versus Number of Users in Role, 0 to 2000; labeled roles: Med Student - CPOE, NMH Resident / Fellow - CPOE, Patient Care Staff Nurse, Rehab OT)

(48)

Another Healthcare Environment

• Vanderbilt EMR Logs

• 6 months

• Arbitrary Week

– 2,500 users
– 35,000 patients
– 66,000 <user, patient> distinct accesses
