A Multitier Fraud Analytics and Detection Approach
Jay Schindler, PhD MPH
Conflict of Interest Disclosure
Jay Schindler, PhD MPH
• Salary
• Stock Ownership
Learning Objectives
• Describe the major components of a fraud analytics
workstation
• Identify the major components of the CRISP-DM
model as used within a fraud analytics framework
• List 2 different data visualization approaches or
methods for high-dimensional data with 4 or 5
variables
Understanding “big data”
Big data:
• Velocity
• Variety
• Complexity
• Volume
Explosion of Data in Healthcare
• “Frost & Sullivan estimates that picture archiving and communication
system (PACS) storage requirements in U.S. hospitals grew at a rate of
more than 20 percent per year for the past five years and reached
27,000 Terabytes in 2011. As a result of the increased use of data in
the provision of care, data storage and access solutions are becoming
more strategic decisions and pressing issues for hospital
administrators to address.”
http://www.paymentaccuracy.gov/
2/22/2013
Analytics-Supported Decision Making
Health System*
• Management: Planning, Administration, Regulation, Legislation
• Economic Support: Private Insurance, Social Security, Governmental
• Resource Production: Workforce, Facilities, Commodities, Knowledge
• Service Delivery: Prevention; Primary, specialty; Secondary, Tertiary, Long-Term
• Organization of Programs: Public agencies, Private market, Voluntary agencies, Enterprises
Disparate Data
• Self-Ins., Private Ins., Registries, Interventions, Surveillance, Medicaid,
Medicare, Risk Factors, ED, Emerg. Srv., Mental Health, Long-Term, Inpatient,
Outpatient, Home Health, CHIP, Resources, Needs, Community Needs
Integrated Insights
• Cost of care outliers
• ACO performance
• Probable fraudulent claims
• Market-wide expenditures
• Regional health outcomes
• Resource allocation change impacts
• County health service utilization
• Projected Medicaid costs
• Practice performance
An integrated health analytics platform…
• Provides decision-makers with a platform to visualize and analyze
population health characteristics
– Characterize costs of care
– Analyze conditions and risk
– Identify improvement opportunities
– Estimate future costs
• Allows flexible and dynamic reporting capability
• Integrates disparate databases via data virtualization
• Serves as a foundational capability for health and human services
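The data virtualization bullet above can be illustrated with a minimal sketch: a virtual view answers a query by pulling rows from each registered source at request time instead of copying them into one store. The class name `VirtualView`, the source names, and the sample rows are all hypothetical and are not taken from the platform described in the slides.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


class VirtualView:
    """Hypothetical virtual data layer: sources are read on demand, never copied."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        # Store only the fetch function; no data is materialized here.
        self._sources[name] = fetch

    def query(self, predicate: Callable[[Row], bool]) -> Iterator[Row]:
        # Pull rows from every source lazily and tag each with its origin.
        for name, fetch in self._sources.items():
            for row in fetch():
                if predicate(row):
                    yield {**row, "source": name}


# Made-up stand-ins for two disparate databases.
def claims_source() -> Iterable[Row]:
    return [{"county": "A", "paid": 120.0}, {"county": "B", "paid": 80.0}]


def survey_source() -> Iterable[Row]:
    return [{"county": "A", "smoker_pct": 21.5}]


view = VirtualView()
view.register("claims", claims_source)
view.register("survey", survey_source)

# One query spans both sources without merging them into a warehouse first.
county_a = list(view.query(lambda row: row["county"] == "A"))
```

A real virtualization layer would also handle schema mapping, security, and query pushdown; this sketch shows only the on-demand access pattern.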
A Layered Framework for Health Analytics
Figure labels: Web Service; Data Virtualization; Data Sources; Data Governance; Data Standards; Health Analytics; Systems Analyst; Clinical Informatics; Public Health Surveillance; Data Cleansing; Encryption/Decryption; Data Warehouse; Ontology; Geospatial; Statistical; Predictive Modeling; Service cost comparisons, outliers; Estimations of
A Sample Scenario Architecture
Figure labels: Virtual Data Layer Services; MSIS; PF; BRFSS; HCUP; Sources; Analytic/Presentation Services; Web Services; Data Visualization; Cloud Integration/Delivery; Database / Data Management; Discovery; Data Mining / BI Tools; Mem; Temp; Persisted; WS
Development Process via CRISP-DM
• Business Understanding
• Data Understanding
• Platform & Data Prep
• Exploration
• Evaluation
• Production
Integrated Health Analytics Lifecycle*
*Adapted from: Cross Industry Standard Process for Data Mining (CRISP-DM), Visual Guide by Nichole Leaper
• Determine business objectives
• Identify desired insights
• Assess environments
• Form project plan
• Review data sources
• Verify data quality and completeness
• Form analytics plan
• Identify needed reference arch elements
• Construct tailored platform
• Access data sources
• Preprocess data
• Format and integrate data
• Apply analytics techniques
• Generate initial insights
• Describe findings
• Evaluate results
• Assess alignment with business objectives
• Plan for ongoing access
• Determine next steps
• Add new analytic views
• Sustain platform
• Monitor and maintain data source access
Anomaly Detection:
• Payment per Medicare beneficiary by hospital type of service code
• Identify services and individual cases with extreme values
Cluster Analysis:
• Clusters of high average costs vs. low average costs in Medicare patients
• Investigation of patient groups & procedures
Predictive Modeling:
• Predicting number of child Medicaid
User interface for the fraud analytics workstation (FAW), used to demonstrate
different fraud scenarios
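The cluster analysis item above (high versus low average costs) can be sketched with a one-dimensional two-cluster k-means. This is an illustrative stand-in, not the method used in the workstation; the patient IDs and cost figures are made up, and `two_means` is a hypothetical helper name.

```python
from statistics import mean
from typing import Dict, List, Tuple


def two_means(values: List[float], iterations: int = 50) -> Tuple[float, float]:
    """Lloyd's algorithm with k=2 in one dimension; returns (low, high) centroids."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to the nearer centroid (ties go to the low group).
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical: nothing to separate
            break
        new_low, new_high = mean(low_group), mean(high_group)
        if (new_low, new_high) == (low, high):  # converged
            break
        low, high = new_low, new_high
    return low, high


# Made-up average cost per Medicare patient.
avg_cost: Dict[str, float] = {
    "p1": 900.0, "p2": 1100.0, "p3": 1000.0, "p4": 9000.0, "p5": 11000.0,
}

low_centroid, high_centroid = two_means(list(avg_cost.values()))

# Patients nearer the high centroid form the group to investigate further.
high_cost_patients = sorted(
    pid for pid, cost in avg_cost.items()
    if abs(cost - high_centroid) < abs(cost - low_centroid)
)
```

In practice the clustering would run over several variables (procedures, diagnoses, providers) rather than cost alone; one dimension keeps the sketch readable.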
• The Outliers Detection tab connects to a SAS product for identifying anomalies.
• Uses a CMS public use file (PUF) sample of over 9.7 million rows of claims data from 2008.
• Claims are subset by ICD-9 coding for diabetics.
• Identifies the high-cost outliers for different type of service codes.
• Several kinds of charts can be output for the user.
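The slides rely on a SAS product for this step and do not say which rule it applies. As a rough stand-in, the sketch below flags high-cost outliers within each type of service code using Tukey's interquartile-range fence (payment above Q3 + 1.5 × IQR). The function name, the tuple layout, and the sample payments are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, List, Tuple

Claim = Tuple[str, float]  # (type of service code, payment amount)


def high_cost_outliers(claims: List[Claim], k: float = 1.5) -> List[Claim]:
    """Return claims whose payment exceeds Q3 + k * IQR within their service code."""
    by_code: Dict[str, List[float]] = defaultdict(list)
    for code, payment in claims:
        by_code[code].append(payment)

    fences: Dict[str, float] = {}
    for code, payments in by_code.items():
        if len(payments) < 4:
            continue  # too few claims to estimate quartiles sensibly
        q1, _, q3 = quantiles(payments, n=4)
        fences[code] = q3 + k * (q3 - q1)

    # Keep input order so flagged claims can be traced back to the source rows.
    return [(c, p) for c, p in claims if c in fences and p > fences[c]]


# Made-up payments for two type of service codes.
sample_claims: List[Claim] = (
    [("1", p) for p in (100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 2000.0)]
    + [("2", p) for p in (50.0, 55.0, 60.0, 65.0, 70.0)]
)

flagged = high_cost_outliers(sample_claims)
```

Computing the fence per service code matters because typical payment levels differ widely between codes; a single global threshold would mostly flag the expensive codes rather than the unusual claims.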
Key Point
High-cost outliers for specific types of service codes are
This user interface tab shows a Flash file of a bubble chart that displays the percent of Medicaid eligibles and percent of
Dynamic Cost Projections from Existing Data
• Use Case: Enabling dynamic “what if” scenarios to project future Medicaid costs
• Context: LA Medicaid Director adjusts various population parameters to project annual cost with
the new population
• Results:
– Ability to estimate future costs based on historical data and a growing understanding of the future population
– Rapidly gain insight into the main factors contributing to Medicaid cost expenditures
– Explore correlations among
Figure panels: Estimated Enrollment; Dynamic Cost Projection
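The "what if" use case can be sketched under a simple assumption: annual cost is the sum, over population groups, of enrollment times average cost per enrollee, and a scenario scales the enrollment of one or more groups. The slides do not describe the actual projection model, so the function name, group names, and figures below are hypothetical.

```python
from typing import Dict, Optional, Tuple

# group -> (current enrollees, historical average annual cost per enrollee)
Baseline = Dict[str, Tuple[float, float]]


def project_annual_cost(
    baseline: Baseline,
    enrollment_multiplier: Optional[Dict[str, float]] = None,
) -> float:
    """Project total annual cost after scaling enrollment for selected groups."""
    multipliers = enrollment_multiplier or {}
    return sum(
        enrollees * multipliers.get(group, 1.0) * cost_per_enrollee
        for group, (enrollees, cost_per_enrollee) in baseline.items()
    )


# Made-up baseline derived from "historical data".
baseline: Baseline = {
    "children": (1000.0, 2000.0),
    "adults": (500.0, 4000.0),
}

current_cost = project_annual_cost(baseline)
# Scenario: child enrollment grows by 25 percent, adults unchanged.
what_if_cost = project_annual_cost(baseline, {"children": 1.25})
```

Because each scenario is just a dictionary of multipliers, a user interface can recompute the projection immediately as a parameter is adjusted.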