Achilles — a platform for exploring
and visualizing clinical data summary
statistics
Mark Velez, MA
Ning "Sunny" Shang, PhD
Department of Biomedical Informatics,
Columbia University
NIH BD2K bioCADDIE webinar, August 13th, 2015
Outline
• OHDSI
• ACHILLES demo
What is OHDSI
• The Observational Health Data Sciences and
Informatics (OHDSI) program is a
multi-stakeholder, interdisciplinary collaborative
– To bring out the value of observational health data through large-scale analytics and evidence
generation
• Clinical characterization
• Population-level estimation
• Patient-level prediction
What is OHDSI
• A single observational data source is unlikely to
be sufficient for research analysis needs
– Analyze multiple data sources concurrently
• Using a common data model and the
foundational infrastructure to enable
observational research
– By 2014, 58 databases in CDM
– > 250 million patients covered
What is OHDSI
• Mission
– To transform medical decision making by creating reliable scientific evidence about disease natural history, healthcare delivery, and the effects of
medical interventions through large-scale analysis of observational health databases for population-level estimation and patient-level predictions
Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)
[Figure labels: Data Source 1, Data Source 2, Data Source 3; statistical analysis; analytic tools]
Data transformation into the CDM
• Extracting, transforming, and loading (ETL)
process
– WhiteRabbit: analyzes the structure and content of a database
– RabbitInAHat: connects and maps tables and
columns from the raw dataset to the CDM dataset
– ETL-CDMBuilder: transform raw data to CDM
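As a rough illustration of the mapping step, the sketch below converts rows from a hypothetical raw patient table into rows shaped like the CDM person table. The raw column names (pat_id, sex, dob) and the helper to_cdm_person are assumptions made for this example and are not part of WhiteRabbit, RabbitInAHat, or ETL-CDMBuilder. The gender concept ids 8507 (male) and 8532 (female) are the standard OMOP gender concepts; 0 is used when no mapping exists.

```python
from datetime import date

# Standard OMOP gender concept ids; 0 marks "no matching concept".
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def to_cdm_person(raw_row):
    """Map one hypothetical raw patient record to a CDM-style person row."""
    dob = date.fromisoformat(raw_row["dob"])
    return {
        "person_id": int(raw_row["pat_id"]),
        "gender_concept_id": GENDER_CONCEPTS.get(raw_row["sex"].upper(), 0),
        "year_of_birth": dob.year,
        "month_of_birth": dob.month,
        "day_of_birth": dob.day,
        "gender_source_value": raw_row["sex"],  # keep the raw value for traceability
    }


# Hypothetical raw records (illustrative data only).
raw_patients = [
    {"pat_id": "1", "sex": "f", "dob": "1970-03-15"},
    {"pat_id": "2", "sex": "U", "dob": "1985-11-02"},
]
person_rows = [to_cdm_person(r) for r in raw_patients]
```

Keeping the source value next to the mapped concept id makes unmapped records (concept id 0) easy to find later.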
ACHILLES
(Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems)
• An open source analytics framework
• Interactively explore population-level
summary statistics for the data stored in CDM
– Profile your CDM data
– Explore population-level summaries
– Review data quality assessment
ACHILLES implementation
• ACHILLES R package
• Oracle / SQL Server / Postgres / Redshift
• Summary statistics are exported to JSON to prepare
data for visualization
• Visualization by AchillesWeb (HTML5 /
JavaScript)
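The idea of computing summary statistics with SQL and exporting them as JSON can be sketched roughly as follows. This is not the ACHILLES R package: an in-memory SQLite database stands in for the CDM database, the person table is reduced to three columns, and the helper summarize_person and the JSON layout are assumptions for illustration, not the format AchillesWeb actually reads.

```python
import json
import sqlite3

# In-memory SQLite stands in for the CDM database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER, year_of_birth INTEGER)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 8532, 1970), (2, 8507, 1985), (3, 8532, 1985)],
)


def summarize_person(connection):
    """Compute a few population-level counts from the person table."""
    total = connection.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    by_gender = connection.execute(
        "SELECT gender_concept_id, COUNT(*) FROM person "
        "GROUP BY gender_concept_id ORDER BY gender_concept_id"
    ).fetchall()
    by_birth_year = connection.execute(
        "SELECT year_of_birth, COUNT(*) FROM person "
        "GROUP BY year_of_birth ORDER BY year_of_birth"
    ).fetchall()
    return {
        "person_count": total,
        "gender": [{"concept_id": c, "count": n} for c, n in by_gender],
        "birth_year": [{"year": y, "count": n} for y, n in by_birth_year],
    }


summary = summarize_person(conn)
summary_json = json.dumps(summary, indent=2)  # text a web front end could load
```

Only aggregated counts leave the database in this sketch, which is what lets summary statistics be shared without exposing patient-level rows.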
[Figure: ACHILLES pipeline stages: create strata tables; data quality queries (Heel); export to JSON; visualization (AchillesWeb)]
ACHILLES Summary Statistics 1
• Summary of data set / clinical database
– Size of the database
ACHILLES Summary Statistics 2
• Person demographic information and
demographic information for recorded deaths
ACHILLES Summary Statistics 3
• Metadata (e.g. observation periods, data
density)
– Observation periods document the time intervals during which health care information is captured
– Data density describes the quantity of records and concepts contained in each database
ACHILLES Summary Statistics 3 (continued)
• Prevalence of condition/condition era/
observation/drug exposure/drug
era/procedure/visit
– Treemap view
– Table view
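One simple way to read "prevalence" here is the share of persons with at least one record for a concept. The sketch below computes that for conditions and sorts the result as a table view might. The function condition_prevalence and the concept ids 1001 and 1002 are placeholders for illustration; the exact definitions and stratifications ACHILLES uses may differ.

```python
from collections import defaultdict


def condition_prevalence(condition_records, person_count):
    """Return {condition_concept_id: share of persons with >= 1 record}."""
    persons_by_concept = defaultdict(set)
    for person_id, concept_id in condition_records:
        persons_by_concept[concept_id].add(person_id)  # a set counts each person once
    return {
        concept_id: len(persons) / person_count
        for concept_id, persons in persons_by_concept.items()
    }


# (person_id, condition_concept_id) pairs; concept ids 1001 and 1002 are placeholders.
records = [(1, 1001), (1, 1001), (2, 1001), (3, 1002)]
prevalence = condition_prevalence(records, person_count=4)

# Table view: rows sorted from most to least prevalent.
table_rows = sorted(prevalence.items(), key=lambda item: item[1], reverse=True)
```

Counting distinct persons rather than records keeps a patient with many repeat entries from inflating the figure.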
ACHILLES Summary Statistics 4
• Achilles HEEL
ACHILLES Heel Error Types
Category        Error Type                                  Example
Clinical facts  Illogical change                            Monthly change of count of condition is more than 100%
Clinical facts  Invalid ids                                 Person has invalid provider_id
Clinical facts  Improper value based on norm                Year of birth is less than 1800; negative payment
Clinical facts  Improper value based on inter-relationship  A condition is recorded after the patient is dead
Terminology     Not standard vocabulary                     A concept is not a standard OMOP vocabulary concept
Terminology     Non-mapped concept                          Data with unmapped concepts
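Three of the error types above can be expressed as small rule functions, as in the sketch below. These are not the actual Achilles Heel queries (which run as SQL against the summary tables); the function names, the dictionary-based records, and the sample data are assumptions for illustration. The monthly check assumes the supplied months are consecutive and skips months that follow a zero count.

```python
from datetime import date


def check_year_of_birth(persons):
    """Improper value based on norm: year of birth before 1800."""
    return [p["person_id"] for p in persons if p["year_of_birth"] < 1800]


def check_condition_after_death(conditions, deaths):
    """Improper value based on inter-relationship: condition dated after death."""
    death_date = {d["person_id"]: d["death_date"] for d in deaths}
    return [
        c["person_id"]
        for c in conditions
        if c["person_id"] in death_date
        and c["condition_start_date"] > death_date[c["person_id"]]
    ]


def check_monthly_change(monthly_counts):
    """Illogical change: month-over-month count change of more than 100%."""
    flagged = []
    months = sorted(monthly_counts)  # "YYYY-MM" keys sort chronologically
    for prev, curr in zip(months, months[1:]):
        before, after = monthly_counts[prev], monthly_counts[curr]
        if before > 0 and abs(after - before) / before > 1.0:
            flagged.append(curr)
    return flagged


# Illustrative data only.
persons = [{"person_id": 1, "year_of_birth": 1750}, {"person_id": 2, "year_of_birth": 1980}]
deaths = [{"person_id": 2, "death_date": date(2014, 5, 1)}]
conditions = [
    {"person_id": 2, "condition_start_date": date(2014, 6, 1)},
    {"person_id": 1, "condition_start_date": date(2013, 1, 1)},
]
monthly_counts = {"2015-01": 10, "2015-02": 12, "2015-03": 30}

birth_errors = check_year_of_birth(persons)
death_errors = check_condition_after_death(conditions, deaths)
change_errors = check_monthly_change(monthly_counts)
```

Each function returns the offending identifiers rather than raising, so a report can list every violation in one pass.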
Applications of ACHILLES
• Explore summary statistics about the clinical
data
– Public domain (de-identified information)
• Integrate with clinical systems
• Integrate ACHILLES with other OHDSI tools
• Framework for other applications
ACHILLES collaborating with other
OHDSI tools
ACHILLES: Database profiling
CIRCE: Cohort definition
HERACLES: Cohort characterization
ACHILLES Framework for other
applications: bioCADDIE DDI
Suitability
• General definition
– the quality or state of being especially suitable or fitting [Merriam-Webster]
• In our project
– The extent to which a clinical dataset meets the research needs of observational studies
– Data suitability is how suitable the data are for a specific research purpose
Research methods
[Figure labels: Suitability conceptual framework; Web-based survey; Metrics with Columbia EHR; Hybrid Approach; Implementation by Customizing ACHILLES; EHR characteristics lit review; Measures Categories; Observational study-derived sub-measure; Desiderata study-derived sub-measure]
Suitability of Clinical Database for Observational Study
• Policy and Administration
– Data policy documentation
– Administrative platform
– Technical accessibility
• Relevance
– Healthcare organization description
– Data organization documentation
– Research data inventory
– Available and retrievable temporal information
• Quality
– Data quality control
– Database data quality
– Research sample data quality
• Usability
– Data representation
– Usefulness
– Cohort availability
– Database linkability
• Descriptive metadata and provenance documentation
– Data provenance
– Database content synopsis
[Figure labels: User (Researcher); Can I access?; What's inside? (content); Are data usable?; Intrinsic]
Suitability Survey
Important websites
• OHDSI
– http://www.ohdsi.org/
– Main GitHub Page: https://github.com/OHDSI/
– Forum: http://forums.ohdsi.org/
• ACHILLES
– http://www.ohdsi.org/analytic-tools/achilles-for-data-characterization/
– R Package for Generating Statistics for ACHILLES:
https://github.com/OHDSI/Achilles
– Web Application for Viewing ACHILLES Results:
https://github.com/OHDSI/AchillesWeb
– Demo
http://www.ohdsi.org/web/achilles/index.html#/OHDSI%20Sample%20Database/dashboard