Global Sponsor:
Ensuring Compliance of Patient Data
with Big Data and BI
Denny Lee, Principal Program Manager at Microsoft
Ayad Shammout, Principal Business Intelligence Consultant at BIDMC
Agenda
A Quick Big Data Primer
• Big Data on the Microsoft Platform by Andrew Brust
• What is Big Data by Mark Whitehorn
Healthcare and Big Data
Compliance and Auditing
• SQL Compliance Project
Compliance and Auditing with Big Data and BI
• Big Data: Unstructured Volumes of DataWhat is Big Data?
Volume
Exceeds physical limits of vertical scalability
Velocity
Decision window small compared to data change rate
Variety
Many different formats makes integration expensive
Variability
Many options or variable
4
10x
increase every five years85%
from new data types Volume Velocity Variety Hadoop CloudBy 2015, organizations
that build a modern
information
management system
will outperform their
peers financially by 20
percent.
– Gartner, Mark Beyer “Information Management in the 21st Century”
7
HDInsight:
Visit HadoopOnAzure.com
Healthcare
Healthcare and Big Data
•
Often a laggard in technology
•
Yet, application of technology will be revolutionary to
understanding the human system
• Genomic sequencing brings the promise of understanding human biological systems
• Proteomic sequencing brings the promise of building the protein sequences to build customized drugs
Healthcare Big Data Example Scenarios
•
Clinical trials: not just examining existing drugs and
efficacy, but also potential deviations
• E.g. Originally Viagra was developed to lower blood pressure and treat Angina; now it also helps with newborn pulmonary hypertension and altitude sickness
•
Predicting healthcare incidences issues
•
Social media campaigns (e.g. advertising drugs)
•
Pharmaceutical campaign advertising analytics
• Modeling the consumer, trying to understand their user
behavior (why are they purchasing this medication, how do they feel about their ailment, related behaviors, etc.)
•
Patient Satisfaction Survey
Compliance
SQL Server Compliance:
http://www.microsoft.com/sqlserver/2008/en/us/compliance.aspx Reaching Compliance Whitepaper: http://www.microsoft.com/en-us/download/details.aspx?id=6808 13 IT Control S OX PCI H IP A A G L BA ID Management Separation of Duties Encryption Key Management Auditing Control Testing Policy Management
Auditing: BIDMC Scenario
Auditing is critical component HIPAA compliance and ensuring patient privacy
1 Billion rows+ of audit data
146 mission critical clinical applications
Comprehensive audits yield 300-500k transactions/day
HIPAA requires audit system with 20 years of data
Auditing Project
Available to community as part of Compliance SDK
Collaboration of Caregroup, MCS, SQLCAT
Quote:
Creating an enterprise tool for consolidated storage, reporting and alerting of all application audit data - that's cool!
John Halamka’s Cool Technology of the Week (Wellsphere Top Health Blogger, Health Impact Award)
Compliance: Auditing
Audit specific users Typically want to do sysadmin
But, many scenarios require auditing of more users because those users have insert, update access
Based on your policies
Audit specific tables
Audit all tables that can only be modified or deemed as sensitive
Audit Objects
Key and encryption access auditing (Audit action types: DATABASE_OBJECT_ACCESS_GROUP and
DATABASE_OBJECT_CHANGE_GROUP)
Audit everything approach
Can grow quite quickly (i.e. lots of data) so may want to limit data
Or have your audit reporting system filter out data you do not need
BIDMC Compliance Project
16 SSIS SSIS SSIS HDInsight Windows HDInsight Azure SQ L S erv er 2008/ 2012 Audit Logs ETL Logs to HDFS SSAS (tabular) Use Excel 2013PowerPivot and Power View.
Centralizing Audit Logs and Reporting
Centralizing Logs
Allows you to have one system process all audit logs from your servers
Easier manageability
Set files to 250MB in size (less files, but not too large to process)
Optimized for Hadoop General Rule of Thumb: 250MB-1GB file sizes
Can also centralize processing
… and centralize reporting
Compliance SDK contains the full project
Organized by Server, Database, DDL, and DML actions
Auditing: Interesting Observations
Backup a user database:
Need CREATE permissions on the master database to look at the backup media
The CREATE permission is a misnomer since you are not creating
Nevertheless required to do a backup hence the RESTORE LABELONLY statements in your audit
Server Principal Name is the user name
A lot of VIEW SERVER STATE calls but is part of important
server audit specification (may want to filter this out)
Audit Logs can generate A LOT of data
• 2 medium servers generated 250GB of files in 6 hours!
Auditing Sensitive Information
19
Querying Audit Information
Use PowerPivot / Power View / Analysis Services to Query the data.
Security Information
Policy Information
Process Audit Information
Use SSIS to process SQL2008 All-Actions Audit Information and other CG application audit log data; potentially can use Management Performance DW framework.
Caregroup Environment File Server SQL Audit Connect/Logic SSIS CG Application Data Intersystems Cache SQL2005 Oracle
SQL2008 All-Actions Audit Data
SQL 2008 / 2012 R2 SSRS 2008 / Power View Policy Analysis Policy Reports Policy Best Practices Security Analysis Security Reports Compliance Reports
Feedback Action Loop Update systems to keep them
Audit Logs
20
Storage Infrastructure
Transfer files to ASV via AzCopy, CloudExplorer, etc.
Storage Infrastructure
21
Hadoop on Azure
Compute Nodes (Medium VMs) Azure Storage Vault (ASV)
Azure Blob Storage
22
SSIS
Hadoop / Auditing: File sizes
Currently testing gz vs. raw,
• E.g. 12MB raw text file vs. 633Kb gz file (~20x compression)
20x smaller size, ~same query time
• Approx same map / reduce task utilization
File Size is 250MB-1GB
• SSIS package takes care of the size
Future testing: avro, protobuf
23
Query Duration (s)
select count(*) from sql_audit_asv_raw 56.066
Hadoop / Auditing: Formats
For ease of processing, replace carriage returns within
embedded SQL statements, e.g.
select col1, col2 from tableA
to
select col1, col2 from tableA
This allows you to create a Hive table using CR as row
delimiter (i.e. does not have things like SQL quoted
identifiers)
SQOOP, HiveODBC, Templeton, CSV, etc
Global Sponsor:
Questions?
Global Sponsor: