Big Data Analytics for Mitigating
Insider Risks in
Electronic Medical Records
Bradley Malin, Ph.D.
Associate Prof. & Vice Chair of Biomedical Informatics, School of Medicine
Associate Prof. of Computer Science, School of Engineering
Vanderbilt University
21/8/2015
January 1, 2015
Logged over 2,000,000 users’ interactions
Alice’s Electronic Medical Record
January 2, 2015
Logged over 2,000,000 users’ interactions
January 3, 2015
Logged over 2,000,000 users’ interactions
Auditing Requirements
Federal (US)
1. Access control
2. Track & audit employee accesses
3. Store logs for 6 years
How (Not) to Use Access Control
• Central Norway Health Region enabled break-the-glass access
• Glass was broken on the records of 1/2 of 99,000 patients
• 1/2 of 12,000 users broke the glass
• ~300K break-the-glass events in 1 month
Role               Users   Break Glass
Nurse              5633    36%
Doctor             2927    52%
Health Secretary   1876    52%
Physiotherapist    382     56%
Psychologist       194     58%
(Røstad & Øystein 2007)
Oct 2007
Palisades Medical Center
Dozens of Employees
$1 million fine
UCLA
HHS Investigation
July 8, 2011
The Model is Wrong
Learning Suspicious EMR Access Behavior
(Boxwala et al, JAMIA, 2011)
Manually select 505 potential cases / controls based on previous breaches at Partners Healthcare
• LABEL: Human experts label cases as + / -
• BUILD: Build classifier from labeled events
• PREDICT: Calculate the prediction probabilities using classifier on all events
• SELECT: New unlabeled events from DB
Model
• Support Vector Machines
• Logistic Regression
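The LABEL, BUILD, and PREDICT steps above can be sketched with a small logistic regression. This is a minimal illustration under stated assumptions, not the pipeline used by Boxwala et al: the three binary features, the six labeled events, and the names `build_classifier` and `predict` are all hypothetical, and plain batch gradient descent stands in for whatever solver the original study used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, event):
    """PREDICT: estimated probability that one access event is suspicious."""
    bias, coefs = weights[0], weights[1:]
    return sigmoid(bias + sum(w * x for w, x in zip(coefs, event)))

def build_classifier(events, labels, epochs=2000, learning_rate=0.5):
    """BUILD: fit logistic regression weights by batch gradient descent."""
    weights = [0.0] * (len(events[0]) + 1)  # bias, then one weight per feature
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for event, label in zip(events, labels):
            error = predict(weights, event) - label
            gradient[0] += error
            for j, x in enumerate(event):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / len(events)
                   for w, g in zip(weights, gradient)]
    return weights

# LABEL: hypothetical expert-labeled events (1 = suspicious, 0 = appropriate).
# Hypothetical features: [same_department, same_family_name, care_unit_visit_match]
LABELED_EVENTS = [[1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1], [0, 0, 0]]
LABELS = [1, 1, 1, 0, 0, 0]

weights = build_classifier(LABELED_EVENTS, LABELS)

# PREDICT on events that were never labeled, then rank them for human review.
UNLABELED_EVENTS = [[1, 1, 1], [0, 1, 0], [0, 0, 1]]
ranked = sorted(UNLABELED_EVENTS, key=lambda e: predict(weights, e), reverse=True)
```

Ranking by predicted probability is one possible reading of how scored events feed back into review; the slide itself only says that new unlabeled events are selected from the database.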
Learning Suspicious EMR Access Behavior
(Boxwala et al, JAMIA, 2011)
Feature                        Coefficient   Odds Ratio
Works in the same department   3.16          23.5
Same street address            2.60          13.45
Same family name               2.34          10.38
Over 200 accesses in a day     1.30          3.70
VIP Patient                    1.18          3.23
…                              …             …
Same Zip Code                  -1.46         0.23
Is Provider                    -2.33         0.10
Care unit visit match          -3.40         0.03
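In logistic regression, the odds ratio for a feature is the exponential of its coefficient, which is how the two numeric columns relate. The sketch below recomputes the odds ratios from the coefficients listed on the slide; the function names are illustrative, and the small differences from the slide's values are presumably due to rounding of the published coefficients.

```python
import math

# (feature, coefficient, odds ratio) as listed on the slide.
SLIDE_ROWS = [
    ("Works in the same department", 3.16, 23.5),
    ("Same street address", 2.60, 13.45),
    ("Same family name", 2.34, 10.38),
    ("Over 200 accesses in a day", 1.30, 3.70),
    ("VIP Patient", 1.18, 3.23),
    ("Same Zip Code", -1.46, 0.23),
    ("Is Provider", -2.33, 0.10),
    ("Care unit visit match", -3.40, 0.03),
]

def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)

def combined_odds_multiplier(coefficients):
    """Factor by which the odds change when several binary features are all present."""
    return math.exp(sum(coefficients))

for feature, coef, reported in SLIDE_ROWS:
    print(f"{feature:30s} exp({coef:+.2f}) = {odds_ratio(coef):6.2f}  (slide: {reported})")
```

A positive coefficient multiplies the odds that an access is labeled suspicious by more than 1; a negative one shrinks them. For instance, a same-department access with a matching care unit visit has its odds multiplied by `combined_odds_multiplier([3.16, -3.40])`, which is below 1.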
Role Refinement
• Northwestern Memorial Hospital
• 3 months of access logs to inpatient records
• 8K users, 16K patients, 1.1M accesses
– User ID
– Patient ID
– User position
– Date / Time
– Number of Orders Entered
– Patient Location in hospital
– Service patient is on
(Zhang, Gunter, Liebovitz, Tian, & Malin – AMIA 2011)
Predictability is Job Dependent
Rank      Most Predictable                       Accuracy   Users
1 (tie)   ED Assistant                           100%       26
1 (tie)   ED Physician – CPOE                    100%       43
1 (tie)   NMH Resident/Fellow ID Clinic-CPOE     100%       10

Rank      Least Predictable                      Accuracy   Users
140       Patient Care Staff Nurse               7.6%       1554
139       Rehab Occupational Therapist (OT)      14.3%      28
136       Patient Care Staff Nurse (Pilot)       22.1%      217
MOST PREDICTABLE
Where are We Going Wrong?
Actual Role                      Predicted Role                   Probability
Rehab – Occupational Therapist   Rehab – Physical Therapist       85.7%
Rehab – Physical Therapist       Rehab – Occupational Therapist   60.0%
Suspicious or Anomalous?
Defining Access Control
January 1
EMR users linked if they accessed 1 patient in common
(Malin, Nyemba, Paulett 2011)
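The linking rule above can be sketched as a small graph construction. This is a minimal illustration, not the authors' implementation: it assumes the access log is available as `(user_id, patient_id)` pairs (a hypothetical schema) and links two users when they accessed at least one patient in common.

```python
from collections import defaultdict
from itertools import combinations

def build_user_network(access_log):
    """Link two users if they accessed at least one patient in common.

    access_log: iterable of (user_id, patient_id) pairs (hypothetical schema).
    Returns a dict mapping each user to the set of users linked to them.
    """
    users_by_patient = defaultdict(set)
    neighbors = {}
    for user, patient in access_log:
        users_by_patient[patient].add(user)
        neighbors.setdefault(user, set())  # keep users with no links in the network

    for users in users_by_patient.values():
        for u, v in combinations(sorted(users), 2):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors

# Toy log for one day; all identifiers are made up.
log = [("u1", "alice"), ("u2", "alice"), ("u3", "bob"), ("u1", "bob"), ("u4", "carol")]
network = build_user_network(log)
```

In the toy log, `u1` ends up linked to both `u2` (via alice) and `u3` (via bob), while `u4` has no links.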
Mining to Model the System
(Malin, Nyemba, Paulett 2011)
[Figure: principal component plot (axis: 2nd Principal Component) with groups labeled Children’s Hospital and University Hospital]
Hypothesis!
• Collaborative systems are about social phenomena
• People should form communities
• We should be able to measure deviation from community structure
• Note: other social phenomena could be studied (temporal workflow*, function invoked* – if any, etc.)
(*Chen et al, IEEE TDSC 2012; Zhang et al. ACM SACMAT 2013; Zhang et al. ACM TMIS 2013)
Community-Based Anomaly Detection (CADS)
[Diagram labels: Access Logs; Social Relation Construction; Community Deviation; User Communities; Distance Measurement; Deviation Measurement; User-Specific Deviation Scores; Pattern Extraction; Anomaly Detection]
The average clustering coefficient for this network is 0.48, which is significantly larger than 0.001 for random networks
Users exhibit collaborative behavior in the health information system
Example 6‐Nearest Neighbor Network
(1 day of accesses)
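The clustering coefficient quoted for this network measures how often a user's neighbors are also linked to each other. The sketch below computes the average local clustering coefficient for an undirected network stored as a dict of neighbor sets; treating the nearest-neighbor network as undirected is an assumption here, and the function names are illustrative.

```python
def local_clustering(neighbors, node):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to close
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in neighbors[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(neighbors):
    """Mean of the local clustering coefficients over all nodes."""
    if not neighbors:
        return 0.0
    return sum(local_clustering(neighbors, n) for n in neighbors) / len(neighbors)

# Toy undirected network: a triangle (a, b, c) plus d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
avg = average_clustering(toy)
```

For the toy network the local values are 1/3 for `a`, 1 for `b` and `c`, and 0 for `d`, so the average is 7/12.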
Auditing Strategies of the Past
• Principal Components Analysis (PCA)
– Graph-based anomaly detection (Shyu et al 2003)
(How similar am I to spectral clusters of users?)
• K-Nearest Neighbor (KNN)
– Nearest neighbor based anomaly detection (Liao et al 2002)
(How similar am I to my “friends”?)
• High-Volume Model
– (Gallagher et al 1998)
(Do I access way more people than my relations?)
Social Structure Wins the Day!
[Figure: True Positive Rate vs. False Positive Rate (ROC) plot]
Gripes & Future Musings
• Different providers within the same ward have different behavior!
• Different wards within the same healthcare institution have different behavior!
• Different healthcare organizations use different languages!
• Logic (i.e., access control) and AI (i.e., data mining) need to play nicely together
High Confidence Rules
Rule                                                            Support    Confidence   Weeks
Center for Patient & Professional Advocacy → Hearing & Speech   0.000581   0.860        18
Practice – City A → Clinic - City A                             0.000193   0.673        21
Infectious Disease – Clinic → Infectious Disease                0.000206   0.637        21
NICU → Neonatology                                              0.000613   0.629        17
VMG - Family Practice → Clinic - City A                         0.00132    0.628        21
Vanderbilt Hearing School → Hearing & Speech                    0.00142    0.619        22
Low Confidence Rules
(but occur in at least 3 weeks)
Rule                                                    Support     Confidence   Weeks
Anesthesiology → Vanderbilt Hearing School              0.0000522   0.000581     6
Anesthesiology → 4N Labor & Delivery                    0.0000526   0.000577     6
Anesthesiology → Physician Liaison Program              0.0000565   0.000574     4
Emergency Medicine → Nutrition Clinic                   0.0000454   0.000572     4
Anesthesiology → Cardiac Cath Lab                       0.0000590   0.000565     3
Emergency Medicine → Diabetes Ctr                       0.0000458   0.000558     4
Anesthesiology → Center for Clinical/Research Ethics    0.0000459   0.000528     7
Anesthesiology → Infectious Disease – Clinic            0.0000454   0.000527     4
Anesthesiology → Pediatric Immunology                   0.0000458   0.000514     4
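The Support and Confidence columns in the rule tables follow the usual association-rule definitions. The sketch below shows how the two numbers could be computed for a rule between two organizational areas. The slides do not state the transaction unit, so this assumes one transaction per patient containing every area that accessed that patient's record; the `(area, patient_id)` log format and the function name `rule_stats` are hypothetical.

```python
from collections import defaultdict

def rule_stats(access_log, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    access_log: iterable of (area, patient_id) pairs (hypothetical schema).
    support    = share of patients accessed by both areas
    confidence = share of the antecedent's patients also accessed by the consequent
    """
    areas_by_patient = defaultdict(set)
    for area, patient in access_log:
        areas_by_patient[patient].add(area)

    total = len(areas_by_patient)
    if total == 0:
        return 0.0, 0.0
    with_antecedent = sum(1 for a in areas_by_patient.values() if antecedent in a)
    with_both = sum(1 for a in areas_by_patient.values()
                    if antecedent in a and consequent in a)
    support = with_both / total
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence

# Toy log; patient identifiers are made up.
toy_log = [
    ("NICU", "p1"), ("Neonatology", "p1"),
    ("NICU", "p2"), ("Neonatology", "p2"),
    ("NICU", "p3"),
    ("Anesthesiology", "p4"),
]
forward = rule_stats(toy_log, "NICU", "Neonatology")
reverse = rule_stats(toy_log, "Neonatology", "NICU")
```

Support is symmetric, but confidence is not: in the toy log both rules have support 0.5, while NICU → Neonatology has confidence 2/3 and the reverse rule has confidence 1.0. Under this reading, the Weeks column would count the weekly logs in which a rule appears.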
Big Data Audits Must Be Understandable to be Actionable
What Makes Sense?
• Dr. Smith’s access of Peggy Johnson’s medical record was strange
• Dr. Smith’s access was 10 standard deviations away from normal behavior in his hospital
• Dr. Smith’s access was strange because he is a neonatologist and he accessed the record of a 100-year-old woman who, for the past year, has only been treated by gerontologists
So… Do You Believe Inferred Patterns?
Hypothesis: Locally Knowledgeable of Class
[Diagram: four employee groups (Anesthesiologists, Psychiatrists, Coding & Charge Entry, Medical Information Services) and four rule sets (Ane. Rules, Psych. Rules, Code Rules, MIS Rules), each rule set containing High (10), Medium (10), and Low (10) rules]
Survey
• Employees were presented with questions and asked to report the likelihood of rules on a 5-point Likert scale
• All employees were asked the same set of 120 questions (four sets of 30)
“Someone from Anesthesiology accessed the record of patient John Doe. How likely is it that someone from the following organizational area accessed the same patient's record?”
Anesthesiology: Not at all / Slightly / Moderately / Very / Completely
Psychiatry: Not at all / Slightly / Moderately / Very / Completely
• Employees can distinguish between high, med, and low for their own rules
• Anesthesiologists evaluated with anesthesiology rules
• Tested hypothesis with linear
[Diagram labels: Anesthesiologists; High, Medium, Low; Ane. Rules]
Hypothesis: Locally Knowledgeable of Class
• Confirmed for every organizational area at 95% confidence level!
Area   Strength   p-value
ANE    0.75       0.007
CODE   0.44       0.011
MIS    0.32       0.037
PSY    0.82       0.020
Learning Suspicious EMR Access Behavior
(Boxwala et al, JAMIA, 2011)
“Learning” Rules for Suspicious Access Detection
(Boxwala et al, JAMIA, 2011)
Feature                        Coefficient   Odds Ratio
Works in the same department   3.16          23.5
Same street address            2.60          13.45
Same family name               2.34          10.38
Over 200 accesses in a day     1.30          3.70
VIP Patient                    1.18          3.23
…                              …             …
Same Zip Code                  -1.46         0.23
Is Provider                    -2.33         0.10