
Ensuring Compliance of Patient Data with Big Data and BI


(1)

Global Sponsor:

Ensuring Compliance of Patient Data

with Big Data and BI

Denny Lee, Principal Program Manager at Microsoft

Ayad Shammout, Principal Business Intelligence Consultant at BIDMC

(2)

Agenda

A Quick Big Data Primer

• Big Data on the Microsoft Platform by Andrew Brust

• What is Big Data by Mark Whitehorn

Healthcare and Big Data

Compliance and Auditing

• SQL Compliance Project

Compliance and Auditing with Big Data and BI

• Big Data: Unstructured Volumes of Data

(3)

What is Big Data?

Volume

Exceeds physical limits of vertical scalability

Velocity

Decision window small compared to data change rate

Variety

Many different formats make integration expensive

Variability

Many options or variable interpretations

(4)

10x increase every five years

85% from new data types

Volume, Velocity, Variety, Hadoop, Cloud

By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.

– Gartner, Mark Beyer, “Information Management in the 21st Century”

(5)
(6)
(7)


(8)
(9)

HDInsight:

Visit HadoopOnAzure.com

(10)

Healthcare

(11)

Healthcare and Big Data

Often a laggard in technology

Yet, application of technology will be revolutionary to understanding the human system

• Genomic sequencing brings the promise of understanding human biological systems

• Proteomic sequencing brings the promise of building the protein sequences to build customized drugs

(12)

Healthcare Big Data Example Scenarios

Clinical trials: not just examining existing drugs and efficacy, but also potential deviations

• E.g. Viagra was originally developed to lower blood pressure and treat angina; now it also helps with newborn pulmonary hypertension and altitude sickness

Predicting healthcare incidents and issues

Social media campaigns (e.g. advertising drugs)

Pharmaceutical campaign advertising analytics

• Modeling the consumer, trying to understand their behavior (why are they purchasing this medication, how do they feel about their ailment, related behaviors, etc.)

Patient Satisfaction Survey

(13)

Compliance

SQL Server Compliance: http://www.microsoft.com/sqlserver/2008/en/us/compliance.aspx

Reaching Compliance Whitepaper: http://www.microsoft.com/en-us/download/details.aspx?id=6808

Regulations: SOX, PCI, HIPAA, GLBA

IT Control: ID Management, Separation of Duties, Encryption, Key Management, Auditing, Control Testing, Policy Management

(14)

Auditing: BIDMC Scenario

Auditing is a critical component of HIPAA compliance and of ensuring patient privacy

• 1 billion+ rows of audit data

• 146 mission-critical clinical applications

• Comprehensive audits yield 300-500k transactions/day

• HIPAA requires an audit system with 20 years of data
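The transaction rate and retention period above imply a storage horizon in the billions of rows. A back-of-the-envelope sketch, assuming a constant daily rate and 365-day years (both are assumptions, not figures from the slides):

```python
# Rough retention sizing from the figures on this slide.
# Assumptions (not in the slides): constant daily rate, 365-day years.
DAYS_PER_YEAR = 365


def rows_retained(transactions_per_day: int, years: int) -> int:
    """Total audit rows accumulated over the retention period."""
    return transactions_per_day * DAYS_PER_YEAR * years


low = rows_retained(300_000, 20)
high = rows_retained(500_000, 20)
print(f"{low:,} to {high:,} rows over 20 years")
```

The helper name `rows_retained` is illustrative only; the point is that the low end of the range already exceeds two billion rows.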

Auditing Project

• Available to community as part of Compliance SDK

• Collaboration of Caregroup, MCS, SQLCAT

Quote:

Creating an enterprise tool for consolidated storage, reporting and alerting of all application audit data - that's cool!

• John Halamka’s Cool Technology of the Week (Wellsphere Top Health Blogger, Health Impact Award)

(15)

Compliance: Auditing

Audit specific users

• Typically you want to audit sysadmin users

• But many scenarios require auditing more users because those users have insert and update access

• Based on your policies

Audit specific tables

• Audit all tables that can be modified or are deemed sensitive

Audit objects

• Key and encryption access auditing (audit action types: DATABASE_OBJECT_ACCESS_GROUP and DATABASE_OBJECT_CHANGE_GROUP)

Audit everything approach

• Can grow quite quickly (i.e. lots of data), so you may want to limit the data

• Or have your audit reporting system filter out data you do not need

(16)

BIDMC Compliance Project

Diagram: SQL Server 2008/2012 audit logs; SSIS (ETL logs to HDFS); HDInsight on Windows; HDInsight on Azure; SSAS (tabular)

Use Excel 2013 PowerPivot and Power View.

(17)

Centralizing Audit Logs and Reporting

Centralizing Logs

• Allows you to have one system process all audit logs from your servers

• Easier manageability

• Set files to 250MB in size (fewer files, but not too large to process)

• Optimized for Hadoop; general rule of thumb: 250MB-1GB file sizes

• Can also centralize processing

• … and centralize reporting

Compliance SDK contains the full project

• Organized by Server, Database, DDL, and DML actions
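The file-size guidance above can be sketched as a simple chunking step. This is a minimal illustration in Python, not the SSIS package from the Compliance SDK; the function name `chunk_records` and the one-byte row delimiter are assumptions made for the example.

```python
from typing import Iterable, Iterator, List

TARGET_BYTES = 250 * 1024 * 1024  # 250MB, the file size named on the slide


def chunk_records(
    records: Iterable[str], max_bytes: int = TARGET_BYTES
) -> Iterator[List[str]]:
    """Group audit records into chunks of at most max_bytes (UTF-8 encoded),
    never splitting a record. A single oversized record gets its own chunk."""
    chunk: List[str] = []
    size = 0
    for record in records:
        record_bytes = len(record.encode("utf-8")) + 1  # +1 for the row delimiter
        if chunk and size + record_bytes > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(record)
        size += record_bytes
    if chunk:
        yield chunk


# Tiny limit so the grouping is visible without large inputs.
for group in chunk_records(["aaaa", "bbbb", "cc"], max_bytes=10):
    print(group)
```

Each yielded chunk would then be written out as one file, which keeps file counts low without producing files too large to process.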

(18)

Auditing: Interesting Observations

Backing up a user database:

• Need CREATE permissions on the master database to look at the backup media

• The CREATE permission is a misnomer since you are not creating anything

• Nevertheless, it is required to do a backup, hence the RESTORE LABELONLY statements in your audit

Server Principal Name is the user name

A lot of VIEW SERVER STATE calls, but these are part of an important server audit specification (may want to filter this out)

Audit Logs can generate A LOT of data

• 2 medium servers generated 250GB of files in 6 hours!
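Filtering noisy entries such as the VIEW SERVER STATE calls before reporting might look like the sketch below. The record layout (a dict with an `action` key) and the names `NOISY_ACTIONS` and `drop_noise` are hypothetical; real audit rows would need to be mapped to this shape first.

```python
from typing import Dict, Iterable, Iterator, Set

# Hypothetical record layout: each audit row is a dict with an "action" key.
NOISY_ACTIONS: Set[str] = {"VIEW SERVER STATE"}


def drop_noise(
    rows: Iterable[Dict[str, str]], noisy: Set[str] = NOISY_ACTIONS
) -> Iterator[Dict[str, str]]:
    """Yield only the rows whose action is not in the noisy set."""
    for row in rows:
        if row.get("action", "").strip().upper() not in noisy:
            yield row


# Made-up sample rows for illustration only.
sample = [
    {"server_principal_name": "app_user", "action": "SELECT"},
    {"server_principal_name": "monitor", "action": "VIEW SERVER STATE"},
    {"server_principal_name": "dba", "action": "RESTORE LABELONLY"},
]
kept = list(drop_noise(sample))
print(len(kept), "of", len(sample), "rows kept")
```

Whether to drop these rows at load time or only hide them in reports is a policy decision; filtering in the reporting layer keeps the raw audit trail complete.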

(19)

Auditing Sensitive Information


Querying Audit Information

Use PowerPivot / Power View / Analysis Services to Query the data.

Security Information

Policy Information

Process Audit Information

Use SSIS to process SQL2008 All-Actions Audit Information and other CG application audit log data; potentially can use Management Performance DW framework.

Caregroup Environment: File Server, SQL Audit, Connect/Logic, SSIS, CG Application Data, Intersystems Cache, SQL2005, Oracle

SQL2008 All-Actions Audit Data

SQL 2008 / 2012 R2, SSRS 2008 / Power View: Policy Analysis, Policy Reports, Policy Best Practices, Security Analysis, Security Reports, Compliance Reports

Feedback Action Loop: Update systems to keep them

(20)

Audit Logs


Storage Infrastructure

Transfer files to ASV via AzCopy, CloudExplorer, etc.

(21)

Storage Infrastructure


Hadoop on Azure

Compute Nodes (Medium VMs)

Azure Storage Vault (ASV)

Azure Blob Storage

(22)


SSIS

(23)

Hadoop / Auditing: File sizes

Currently testing gz vs. raw

• E.g. 12MB raw text file vs. 633KB gz file (~20x compression)

20x smaller size, ~same query time

• Approximately the same map / reduce task utilization

File size is 250MB-1GB

• SSIS package takes care of the size

Future testing: avro, protobuf

Query: select count(*) from sql_audit_asv_raw

Duration (s): 56.066
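The compression figure above can be checked with a few lines of Python. This sketch assumes binary units (1MB = 1024KB) for the slide's sizes, and the sample rows are made up for illustration; actual ratios depend on how repetitive the audit text is.

```python
import gzip


def compression_ratio(raw_bytes: int, compressed_bytes: int) -> float:
    """How many times smaller the compressed form is than the raw form."""
    return raw_bytes / compressed_bytes


# Figures from the slide: 12MB raw vs. 633KB gzipped (binary units assumed).
slide_ratio = compression_ratio(12 * 1024 * 1024, 633 * 1024)

# Made-up, highly repetitive sample rows; not real audit data.
row = "2012-11-06 12:00:00,app_user,SELECT,select col1, col2 from tableA\n"
sample = (row * 1000).encode("utf-8")
packed = gzip.compress(sample)
assert gzip.decompress(packed) == sample  # round trip is lossless

sample_ratio = compression_ratio(len(sample), len(packed))
print(f"slide: {slide_ratio:.1f}x, sample: {sample_ratio:.1f}x")
```

One design consideration: a gzip file is not splittable, so each file is read by a single map task, which is one reason to keep individual files within the size range given above.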

(24)

Hadoop / Auditing: Formats

For ease of processing, replace carriage returns within embedded SQL statements, e.g.

select col1, col2
from tableA

to

select col1, col2 from tableA

This allows you to create a Hive table using CR as the row delimiter (i.e. the data does not have things like SQL quoted identifiers)
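The replacement described above can be sketched in Python as follows. The function name `flatten_statement` is illustrative, and the approach is deliberately simple: it also collapses line breaks inside string literals, which may or may not be acceptable for a given audit policy.

```python
import re


def flatten_statement(sql: str) -> str:
    """Replace each run of embedded CR/LF characters (plus any whitespace
    around it) with a single space, so one statement fits on one row."""
    return re.sub(r"\s*[\r\n]+\s*", " ", sql).strip()


multi_line = "select col1, col2\r\nfrom tableA"
print(flatten_statement(multi_line))
```

With every statement on a single line, the line break can safely serve as the row delimiter when the file is loaded into a Hive table.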

(25)
(26)

SQOOP, HiveODBC, Templeton, CSV, etc

(27)
(28)
(29)
(30)


Questions?

(31)


Thank You for Attending
