Data Visualization: Scientific Principles, Design Choices and Implementation in LabKey
Catherine Richards, PhD, MPH, Staff Scientist, HICOR
Cory Nathe, Software Engineer, LabKey
Outline
o Scientific Principles and Design Choices
o Implementation in LabKey
Scientific Principles and Design Choices
o Why use data visualizations
o Choosing the best chart type and visual attributes
o Incorporating design best practices
Why use data visualizations?
o Leverage visual system to absorb large amounts of information very quickly
• Identify patterns or outliers
o Inspire new questions
Data viz shows patterns that tables do not
o Average X = 9
o Average Y = 7.5
o Y = 3 + 0.5X (same linear model)
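The summary statistics on this slide match Anscombe's quartet, which is presumably the dataset shown (an assumption, since the slide does not name it). A minimal sketch that recomputes the averages and the least-squares line for each of the four sets; the helper names `mean`, `fitLine` and `summaries` are illustrative only:

```javascript
// Anscombe's quartet: four small datasets with near-identical summary statistics.
// Assumption: this is the dataset behind the slide's numbers.
const sharedX = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const quartet = {
  I:   { x: sharedX, y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
  II:  { x: sharedX, y: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74] },
  III: { x: sharedX, y: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] },
  IV:  { x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
         y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89] }
};

// Arithmetic mean of an array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Ordinary least-squares fit of y = intercept + slope * x.
function fitLine(x, y) {
  const meanX = mean(x);
  const meanY = mean(y);
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) * (x[i] - meanX);
  }
  const slope = sxy / sxx;
  return { slope: slope, intercept: meanY - slope * meanX };
}

// One summary row per dataset.
const summaries = Object.keys(quartet).map(function (name) {
  const set = quartet[name];
  const line = fitLine(set.x, set.y);
  return {
    name: name,
    meanX: mean(set.x),
    meanY: mean(set.y),
    slope: line.slope,
    intercept: line.intercept
  };
});

summaries.forEach(function (s) {
  console.log(s.name, s.meanX.toFixed(2), s.meanY.toFixed(2),
              s.slope.toFixed(2), s.intercept.toFixed(2));
});
```

The four rows agree to about two decimal places, yet the scatter plots of the four sets look nothing alike, which is the point of the slide.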
Chart Types
Visual Attributes
o Data encoding: mapping data to visual attributes
o Process
• Choose data dimensions to graph
• Classify data types
Data Dimensions
o Unique information
o Most common
• Visualizations with 3 or 4 data dimensions
o Rare
• Visualizations with 6, 7, or more data dimensions
Data Types
o Nominal (labels)
• Fruits: apples, oranges, pears
o Ordinal
• Restaurant inspection grades: A, B, C
o Quantitative
• Interval (location of zero arbitrary): dates, location
• Ratio (zero fixed): physical measurements such as weight, height
Operations Permitted with Data Types
o Nominal (labels)
• Operations: =, ≠
o Ordinal
• Operations: =, ≠, <, >, ≤, ≥
o Interval (location of zero arbitrary)
• Operations: =, ≠, <, >, ≤, ≥, - (subtraction)
• Can measure distances or spans
o Ratio (zero fixed)
• Operations: =, ≠, <, >, ≤, ≥, -, / (division), * (multiplication)
• Can measure ratios or proportions
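The operations listed above nest: each data type permits everything the previous one does, plus a few more. A minimal sketch of that lookup, where `PERMITTED_OPERATIONS` and `isPermitted` are hypothetical names chosen for illustration:

```javascript
// Operations permitted per data type, built cumulatively as on the slide.
const NOMINAL = ["=", "≠"];
const ORDINAL = NOMINAL.concat(["<", ">", "≤", "≥"]);
const INTERVAL = ORDINAL.concat(["-"]);
const RATIO = INTERVAL.concat(["/", "*"]);

const PERMITTED_OPERATIONS = {
  nominal: NOMINAL,
  ordinal: ORDINAL,
  interval: INTERVAL,
  ratio: RATIO
};

// Returns true if the operation is meaningful for the given data type.
function isPermitted(dataType, operation) {
  const allowed = PERMITTED_OPERATIONS[dataType];
  if (!allowed) {
    throw new Error("Unknown data type: " + dataType);
  }
  return allowed.indexOf(operation) !== -1;
}

// Dates are interval data: subtracting two dates gives a span,
// but dividing one date by another has no meaning.
console.log(isPermitted("interval", "-"));
console.log(isPermitted("interval", "/"));
```

A check like this can guard a charting tool against, for example, computing a proportion on an interval-scaled field.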
Visual Attributes
Science of Data Viz
o Psychophysics
• Branch of psychology that deals with the relationship between physical stimuli and sensory response
Ranking of Elementary Perceptual Tasks
Length-Position Experiment
Cleveland & McGill. JASA. 1984. 79 (387): 531-554
[Figure: elementary perceptual tasks ranked by accuracy]
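The ranking figure itself did not survive in this text. The sketch below encodes the ordering as reported in the cited Cleveland & McGill paper, reproduced from the paper rather than from the slide, so treat the list as an assumption to verify against the original. The names `PERCEPTUAL_TASK_RANKING`, `rankOf` and `moreAccurate` are hypothetical.

```javascript
// Elementary perceptual tasks, most accurate first.
// Tasks in the same inner array share a rank.
// Assumption: ordering taken from Cleveland & McGill (1984), not from the slide.
const PERCEPTUAL_TASK_RANKING = [
  ["position along a common scale"],
  ["position along nonaligned scales"],
  ["length", "direction", "angle"],
  ["area"],
  ["volume", "curvature"],
  ["shading", "color saturation"]
];

// Returns the 1-based rank of a task (1 = judged most accurately).
function rankOf(task) {
  for (let i = 0; i < PERCEPTUAL_TASK_RANKING.length; i++) {
    if (PERCEPTUAL_TASK_RANKING[i].indexOf(task) !== -1) {
      return i + 1;
    }
  }
  throw new Error("Unknown perceptual task: " + task);
}

// Picks whichever of two encodings readers tend to decode more accurately.
function moreAccurate(taskA, taskB) {
  return rankOf(taskA) <= rankOf(taskB) ? taskA : taskB;
}

// A bar chart (position along a common scale) versus a pie chart (angle).
console.log(moreAccurate("angle", "position along a common scale"));
```

When choosing a visual attribute for the most important data dimension, the higher-ranked task is usually the safer choice.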
Chart Types
Incorporating Design Best Practices
o Graphic design
• Color theory
• Typography
o Tufte’s Rules
Tufte’s Rules
1. Reduce chartjunk and increase the data-ink ratio
2. Maximize contrast
3. Use readable labels
4. Don’t repeat yourself
5. Label data series (points) directly instead of using legends
6. Avoid smoothing and 3D
7. Sort for comprehension
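Rule 7 is the easiest to apply in code: order the categories by the plotted value rather than alphabetically. A minimal sketch, assuming rows shaped like `{ Group, Value }`; the clinic names and values are made up for illustration and `sortForComprehension` is a hypothetical helper:

```javascript
// Returns a copy of the rows sorted by the given numeric field, largest first.
// The input array is left untouched.
function sortForComprehension(rows, valueKey) {
  return rows.slice().sort(function (a, b) {
    return b[valueKey] - a[valueKey];
  });
}

// Hypothetical data: utilization rate per clinic.
const clinics = [
  { Group: "Clinic A", Value: 0.42 },
  { Group: "Clinic B", Value: 0.71 },
  { Group: "Clinic C", Value: 0.18 },
  { Group: "Clinic D", Value: 0.55 }
];

const sortedClinics = sortForComprehension(clinics, "Value");
console.log(sortedClinics.map(function (row) { return row.Group; }).join(", "));
```

Feeding the sorted rows to a bar chart lets the reader see the ranking at a glance instead of comparing bar heights across the plot.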
LabKey Built-in Reports
o For non-developers
• Plotting tools built into LabKey Data Regions
• Rendered using LabKey Visualization API (built on the D3.js library)
• Examples: box plot, scatter plot, time chart
o For developers
• JavaScript Views
• R Reports (Rserver/Knitr)
• Advanced View (invoke command line program)
• Module Reports (using LABKEY.Report.execute)
o Shown in Data Views Browser
• Customize grouping, label, thumbnail, etc.
• Control visibility (private vs. shared)
LabKey Data API Access
o Access data from study dataset, external schema, list, etc.
o LabKey Client APIs
• Examples: JavaScript, Java, Perl, Python, Rlabkey, SAS Macros, HTTP Interface
• Secure, auditable, programmatic access to data and services
• Exporting data grid as a Script
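As an illustration of the JavaScript client API, the sketch below wraps `LABKEY.Query.selectRows`. The wrapper `loadRows` is a hypothetical helper, and the `study` / `Demographics` schema and query names are placeholders, not taken from the slides. The `LABKEY` object is passed in as a parameter so the function can also be exercised with a stub outside a LabKey page.

```javascript
// Fetches all rows of a query and hands them to onRows.
// `labkey` is the global LABKEY object available on LabKey Server pages.
function loadRows(labkey, schemaName, queryName, onRows, onError) {
  labkey.Query.selectRows({
    schemaName: schemaName,
    queryName: queryName,
    // Called with the response object; its `rows` array holds one object per row.
    success: function (data) {
      onRows(data.rows);
    },
    // Called with error details if the request fails.
    failure: function (errorInfo) {
      if (onError) {
        onError(errorInfo);
      }
    }
  });
}

// Usage on a LabKey page (skipped where LABKEY is not defined).
// "study" and "Demographics" are placeholder names.
if (typeof LABKEY !== "undefined") {
  loadRows(LABKEY, "study", "Demographics", function (rows) {
    console.log("Loaded " + rows.length + " rows");
  });
}
```

Requests made this way run under the permissions of the logged-in user, in line with the "secure, auditable" access described above.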
LabKey JavaScript Visualization API
o Shapes / Geoms:
• Point / Bin
• Path
• ErrorBar
• BoxPlot / BarPlot
o Interactions:
• Callback function for point click
• Callback function for mouse over/out
• Brushing (1D, 2D)
o Plot Helpers
• PieChart
• LeveyJenningsPlot
• SurvivalCurvePlot
LabKey Visualization - Live Demo
JavaScript-based charts from LabKey Demo Study
• Data Region > Charts/Views menu
• Generic Chart (box/scatter plot)
• Time Chart
• JavaScript View
• Reports Webpart
Examples (1 of 3)
Panorama - Levey-Jennings report, Pareto plot
Examples (2 of 3)
Dataspace - scatter with gutter plots
Examples (3 of 3)
HIDRA Argos - pie chart, survival curve, bar plot, timeline report
Argos is an application developed in partnership with Fred Hutch. The Timeline report was created by the Oncoscape Core team and is maintained by Lisa McFerrin. Oncoscape is supported by Fred Hutch and STTR.
HICOR IQ - Overview
o Regional Oncology Informatics Platform
o GOAL: to provide patients, payers, providers and health systems with transparent information to support decision-making in cancer care
o The initial launch includes a limited set of reports based on ASCO 2012 Choosing Wisely Recommendations
o The initial functionality allows users to select metrics of interest, configure plots based on regional or clinic views, and generate reports categorized by sub-groups
HICOR IQ - Live Demo
o Data Views link directly to different metrics
o Configure report (apply filters, switch chart type)
o Bar plot, Scatter plot, Time plot
o Population size, filters, exclusions
HICOR IQ - Implementation
o Collaboration between HICOR and LabKey
• Iterative layout and user experience design
• D3 code creation for plot rendering
o Custom Java module
• New database schema and tables
• Use of OLAP cube for accessing measures and dimensions
• Plots generated with the dimple JavaScript D3 library
o Additional data security
• Data cannot be directly accessed from schema browser
HICOR IQ - Code Example
renderPlot: function () {
    ...
    // initialize the svg
    svg = dimple.newSvg("#" + this.renderId, fullWidth, fullHeight);

    // create the chart component and set margins
    chart = new dimple.chart(svg, data);
    chart.setBounds(margin.l, margin.t, plotWidth, plotHeight);

    // configure the x-axis
    x = chart.addCategoryAxis("x", "Group");
    x.floatingBarWidth = 20;

    // configure the y-axis
    y = chart.addMeasureAxis("y", "Value");
    y.showGridlines = false;
    y.ticks = 4;
    y.overrideMax = 1.0;
    y.tickFormat = "%";

    // add a bar series to the plot
    s = chart.addSeries(null, dimple.plot.bar);

    // sort the x-axis variable
    x.addOrderRule("Group");

    // render the chart as an svg and remove the dimple title
    chart.draw();
    x.titleShape.remove();

    // use D3 to update some content and add titles
    this.renderTitle(svg, fullWidth, 0);
    this.styleAxis(svg, x, y, margin);

    // define the content of the bar hover tooltip
    this.overrideTooltipText(s, data, function (row) {
        return [
            "Group: " + row.Group,
            "Utilization: " + row.Value
        ];
    });
}
HICOR IQ - Code Example
// y-axis built directly with D3 (v3 API)
var y = d3.scale.linear()
    .range([height, 0]);
y.domain([0, 1.00]);

// keep a reference to the axis so it can be passed to .call() below
var axis = d3.svg.axis()
    .scale(y)
    .orient("left")
    .tickValues([0, .25, .5, .75, 1])
    .tickFormat(function (d) { return d * 100 + "%"; });
...
svg.append("g")
    .attr("class", "y axis")
    .call(axis)
    .style("font-weight", "bold")
    .style("font-family", "Arial")
    .append("text")
    .attr("class", "ylabel")
    .attr("y", -20)
    .attr("x", -40)
    .attr("dy", ".71em")
    .text(label);
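The tick formatter in the D3 code multiplies by 100 and appends "%". That works for the tick values listed, but floating-point multiplication can leave stray decimals for other fractions (in JavaScript, `0.07 * 100` evaluates to `7.000000000000001`). A minimal sketch of a safer formatter, assuming whole-percent labels are wanted; `formatPercent` is a hypothetical helper, not part of the HICOR IQ code:

```javascript
// Formats a fraction between 0 and 1 as a whole-number percent label.
// Rounding hides floating-point noise such as 0.07 * 100 = 7.000000000000001.
function formatPercent(fraction) {
  return Math.round(fraction * 100) + "%";
}

// The tick values used on the y-axis above.
const tickValues = [0, 0.25, 0.5, 0.75, 1];
const tickLabels = tickValues.map(formatPercent);

console.log(tickLabels.join(" "));
```

The function can be passed as the `.tickFormat(...)` callback in place of the inline one.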
HICOR IQ - Future
o Allow new metric definition and data loading
o Split module for security (server) vs. plotting (client)
o Identification of “My Clinic” for comparison in scatter plot
o Include static reports
o Clinic / Payer dashboard report
o Better organization of sub-metrics
Thank You
Any questions?
Catherine Richards, PhD, MPH, Staff Scientist, HICOR
(soon to be Director, Scientific and User Engagement at Aetion)
Cory Nathe, Software Engineer, LabKey