COMP 388/441:
Human-Computer Interaction
Data Visualization:
methods, guidelines,
and software
April 10, 2013
Today's Topics
●
Overview of visualization techniques
● 1D charts, 2D plots, 3D+ techniques, maps
●
A few guidelines for scientific visualization
●
Survey of visualization tools
●
Note: What this lecture is NOT
● fully comprehensive, by any means
● strongly advocating any one tool for all uses
● a survey of creative uses of these tools
1D techniques
Pie Charts, Bar Charts
Simple 2D plotting
Line plots, Scatter plots
Ancient plotting techniques
●
The stem-and-leaf plot?
Better: histograms
Plots frequency of 1D data in bins
Equivalent in 2D: contour maps of scatter plot densities
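The binning step behind a histogram can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `histogram`, the equal-width binning rule, and the sample values are assumptions made for the example, not something taken from the slides.

```python
def histogram(values, n_bins, lo=None, hi=None):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # The top edge belongs to the last bin, so clamp the index.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges


# Made-up sample measurements, for illustration only.
sample = [1.2, 1.9, 2.1, 2.4, 2.5, 3.3, 3.8, 4.0, 4.7, 5.0]
counts, edges = histogram(sample, n_bins=4, lo=1.0, hi=5.0)

# Text rendering: one row per bin, one '#' per observation.
for count, left, right in zip(counts, edges, edges[1:]):
    print(f"{left:.1f}-{right:.1f} {'#' * count}")
```

The choice of bin count changes the picture considerably, so it is worth trying a few values before settling on one.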
Scatter-plot
matrix
Useful for viewing multiple relationships simultaneously
Q-Q plots
● comparing data across distributions
● used to quickly determine if a scaling relation exists between two distributions
● linear = scaling relation exists
● most common: normal Q-Q plots implicitly test normality
Global Maps
● 3D surface on 2D leads to distortions
● Mercator: scale increases near poles
● Gall-Peters: distorts shape horizontally for equal areas
● Mollweide: warps less dramatically at poles
● Goode's: is equal area, but sacrifices distances
● Robinson: a compromise, neither equal area nor conformal
● However, for smaller maps this is not an issue
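The Mercator distortion can be made quantitative. Assuming a spherical Earth model (a simplification chosen for this sketch), the projection stretches lengths at latitude φ by a factor of sec φ, and areas by the square of that factor. The function name `mercator_scale` is made up for the example.

```python
import math


def mercator_scale(lat_deg):
    """Local linear scale factor of the Mercator projection: sec(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))


for lat in (0, 30, 60, 80):
    k = mercator_scale(lat)
    # Areas are inflated by the square of the linear scale factor.
    print(f"latitude {lat:2d}: lengths x{k:.2f}, areas x{k * k:.2f}")
```

The factor stays close to 1 at low latitudes and grows without bound toward the poles, which is why the distortion matters for global maps but is negligible for a small region.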
Choropleth maps
Useful in indicating one dimension of information
as an overlay on a map
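The core of a choropleth is a mapping from one data value per region to a fill color. A minimal sketch, assuming equal-interval classes; the `classify` helper, the palette, and the region values are hypothetical placeholders, not data from the lecture.

```python
def classify(value, vmin, vmax, n_classes):
    """Return the 0-based equal-interval class index for value."""
    if vmax == vmin:
        return 0
    fraction = (value - vmin) / (vmax - vmin)
    # The maximum value belongs to the top class, so clamp the index.
    return min(int(fraction * n_classes), n_classes - 1)


# Hypothetical light-to-dark palette (hex colors), one entry per class.
PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]

# Placeholder region names and values, not real statistics.
rates = {"Region A": 2.0, "Region B": 5.5, "Region C": 7.9, "Region D": 10.0}

vmin, vmax = min(rates.values()), max(rates.values())
fills = {
    region: PALETTE[classify(v, vmin, vmax, len(PALETTE))]
    for region, v in rates.items()
}
for region, color in fills.items():
    print(region, color)
```

Other class schemes (quantiles, natural breaks) would change only `classify`; the choice affects which regions look alike, so it deserves the same care as axis ranges.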
Flow map
Graduated symbol maps
Capable of showing multiple dimensions of data graphically - here overall population AND % Hispanic for each state
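Two dimensions can be encoded on one symbol by using size for one and shade for the other. The sketch below assumes area-proportional circles (radius grows with the square root of the value) and a simple gray scale; `symbol_radius`, `shade`, and all numbers are illustrative assumptions, not census figures.

```python
import math


def symbol_radius(value, max_value, max_radius):
    """Radius such that the circle's AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


def shade(percent):
    """Map 0-100 percent to a gray level, 255 (light) down to 0 (dark)."""
    return round(255 * (1 - percent / 100))


# Placeholder numbers for illustration, not real census figures.
states = {"State A": (4_000_000, 20.0), "State B": (16_000_000, 40.0)}
max_pop = max(pop for pop, _ in states.values())

for name, (pop, pct) in states.items():
    r = symbol_radius(pop, max_pop, max_radius=20.0)
    print(f"{name}: radius {r:.1f}, gray level {shade(pct)}")
```

Scaling the radius directly with the value would exaggerate large values, because readers tend to judge circles by area.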
Cartogram of 2012 election
Which map doesn't help you see who won?
Warping areas to represent data
cartogram: counties
cartogram: states
Treemap - US Budget
Like cartogram, but when location doesn't matter
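The idea of area as a data dimension can be sketched with the simplest treemap layout: split one rectangle into strips whose areas are proportional to the values. This is a one-level "slice" layout chosen for brevity; production treemaps usually nest levels and use more balanced layouts. The function name and the budget categories are placeholders, not actual budget figures.

```python
def slice_layout(items, x, y, width, height):
    """Split a rectangle into vertical strips with areas proportional to values.

    items: list of (label, value) pairs; returns (label, x, y, w, h) tuples.
    """
    total = sum(value for _, value in items)
    rects = []
    for label, value in items:
        w = width * value / total
        rects.append((label, x, y, w, height))
        x += w  # the next strip starts where this one ends
    return rects


# Placeholder categories and amounts, not actual budget figures.
budget = [("Category A", 50), ("Category B", 30), ("Category C", 20)]

for label, rx, ry, rw, rh in slice_layout(budget, 0, 0, 100, 40):
    print(f"{label}: x={rx:.0f}, width={rw:.0f}, area={rw * rh:.0f}")
```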
Visualizations summary
●
1D
● Pie charts, bar charts
●
2D
● line plots, scatter plots, histograms
●
2D+
● Scatter-plot matrix, contour maps
●
Maps
● global projections, choropleth, graduated symbols
●
Using area as a dimension
● Cartogram, Treemap
Scientific Visualization Guidelines
●
Primary goals
● quick to understand - use simple, standard forms
● highlight the important aspect of the data
● avoid misrepresentation/biased interpretation
● IMPORTANT: For output formats that scale/print well...
● use vector graphics: SVG, EPS, PDF
– software: Inkscape (free), Illustrator (expensive)
● instead of raster graphics: GIF, JPG, PNG, TIFF...
– software: GIMP (free), Photoshop
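Vector formats scale well because the file stores shapes rather than pixels. A minimal sketch that writes an SVG bar chart as plain text, using only the Python standard library; the function name, sizes, and values are assumptions made for the example.

```python
def bar_chart_svg(values, bar_width=30, gap=10, height=100):
    """Return a minimal SVG bar chart as a string."""
    top = max(values)
    width = len(values) * (bar_width + gap) + gap
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}" viewBox="0 0 {width} {height}">'
    ]
    for i, v in enumerate(values):
        bar_height = height * v / top
        x = gap + i * (bar_width + gap)
        # SVG's y axis points down, so bars start at height - bar_height.
        parts.append(
            f'<rect x="{x}" y="{height - bar_height:.1f}" '
            f'width="{bar_width}" height="{bar_height:.1f}" fill="steelblue"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg_text = bar_chart_svg([3, 7, 5])  # made-up values
print(svg_text)
```

Saving `svg_text` to a `.svg` file gives an image that stays sharp at any zoom level, since each bar is a `rect` description that the viewer redraws at the target size.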
The following tips are a sample from: Kelleher, C., Wagener, T., Ten guidelines for effective data visualization in scientific publications, Environmental Modelling & Software (2011), doi:10.1016/j.envsoft.2010.12.006
Keep It Simple... (KISS)
Create the simplest graph that conveys the
information you want to convey
Select meaningful axis ranges
Axis ranges across plots
Keep axis ranges similar to compare across plots
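One way to keep ranges comparable is to compute a single range from all the data and apply it to every plot. A minimal sketch; `shared_range`, the padding fraction, and the sample values are assumptions for illustration.

```python
def shared_range(*series, pad=0.05):
    """Return one (low, high) axis range covering every series, with padding."""
    low = min(min(s) for s in series)
    high = max(max(s) for s in series)
    margin = (high - low) * pad
    return low - margin, high + margin


# Made-up measurements for two panels.
panel_a = [2.0, 4.0, 6.0]
panel_b = [10.0, 14.0, 22.0]

y_range = shared_range(panel_a, panel_b)
print(y_range)
```

In matplotlib, for example, the result could be passed to each panel with `ax.set_ylim(*y_range)`, or the panels could be created with `plt.subplots(sharey=True)`.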
Using lines
Use lines only to connect sequential data
Appears to not change in interval
Implies data is not known
a very brief survey of
Tools you can use for
data visualization
Tools for data visualization
●
Local data vs. stored on database
● desktop, server vs. client rendered
●
Browser compatibility
● static images vs javascript and SVG
●
Expertise necessary
● Novice: spreadsheets
● Intermediate: manipulating scripts, graphical
selections
● Advanced: Python/pylab, Weka, R, MATLAB
Excel, OpenOffice Spreadsheet, ...
● Good for...
– novice or one-time users
– creating static images
● Unacceptable for...
– automation
– interactive graphics
Google Charts
● available in spreadsheets
● online: can integrate with web forms
● resulting charts are interactive
● App Engine allows advanced programming interaction
Flot
● Uses jQuery - small, lightweight JavaScript library
● Relies on canvas - works across many browsers
● Can only plot line and bar charts, but can be interactive through callbacks
Raphaël
● JavaScript library that produces SVG and VML output.
● Graphics are crisp, but may load slowly.
● Many options make the learning curve a bit steeper
D3
●
D3 (Data-Driven Documents) is a JavaScript
library for interactive SVG rendering
●
Similar concerns to Raphaël. Advanced
graphics are possible, but require more effort
Mapping frameworks
● Leaflet
– a lightweight mapping framework, designed to work comfortably even on mobile devices
● Polymaps, OpenLayers
– feature-rich
– allows CSS-like customization
– aimed at data visualization
● Kartograph
– a powerful JavaScript or Python library for generating SVG-rendered maps
● CartoDB
– quick data tables --> maps
Processing
●
A popular cross-platform Java-like programming
language for creating visualizations
●
Desktop application for interactive visualizations
● There is also a Processing.js port for embedding in
Pro-tools
●
for automating analyses using high-level
statistical packages as needed
●
Commercial data analysis packages available
● MATLAB (and the free alternative, Octave) ● SPSS
● SAS
●
Problem: expensive, and locked-in
● but there are free alternatives...
R
● A free software environment for statistical computing
● The tool of choice for statisticians
Weka
●
A cross-platform collection of machine learning
algorithms for data mining tasks.
● tools for pre-processing, classification, regression,
clustering, association rules, and visualization.
●
Can be used directly, or called through Java
Python
●
Python with associated modules
● numpy/scipy, matplotlib, many others
● Available as combined packages
– Sage, Enthought, Python(x,y)
● RPy - to work with R and Python simultaneously
Today's Summary
●
Overview of visualization techniques
● 1D charts, 2D plots, 3D+ techniques, maps
●
Some guidelines for scientific visualization
● keep it simple, use appropriate data ranges...
●
A brief survey of visualization tools
● novice: Microsoft, OpenOffice, or Google Spreadsheets
● interactive: Flot, Raphaël, D3, Processing
● pro-tools: R, Weka, Python