Visualising Variables – Validly!
Damien Jolley
Monash Institute of Health Services Research, Monash University
AHMRC Posters
May 2006
Download slides from:
http://www.jolley.com.au
Average daily retail petrol price,
Melbourne, 15 April-14 May 2006
[Figure: daily price series, with individual days of the week (Mon, Wed, Thu) labelled]
Source: http://www.accc.gov.au, 29 May 2006
Average daily retail petrol price,
Melbourne, Oct-Nov 2002
[Figure: daily price series, with individual days of the week (Thu, Fri, Sat) labelled]
Source: http://www.accc.gov.au, 21 Nov 2002
Price & pattern has changed…
[Figure: petrol prices in Melbourne (cents per litre), in panels for 2004, 2005 and 2006]
[Figure: price by date, 01/05 to 22/05, graphs by city: Melbourne, Sydney, Adelaide, Brisbane, Perth]
Daily average petrol prices (c/litre) in selected Australian cities, May 2005
Source: http://www.accc.gov.au
Obvious fact #1:
Graphs can communicate data:
quickly
accurately
powerfully
efficiently
“Only 50% of American 17-year-olds can identify information in a graph”*
Source: Wainer H.
Understanding graphs and tables.
Educational Researcher 1992;
21:14-23
* US National Assessment of Educational Progress, June 1990
Whose fault?
Source: Wainer H.
Understanding graphs and tables.
Educational Researcher 1992;
21:14-23
“Like characterising someone’s ability to read by asking questions about a passage full of spelling and
grammatical errors. What are we really testing?”
Drawn using MS Excel ‘XY-chart’
Obvious fact #2:
Bad graphs can hinder communication
Less obvious facts #3, #4, #5:
What characterises a “good” graph?
What are the characteristics of a
“bad” graph?
What software to use? How to use it?
Howie’s Helpful Hints
for bad graph displays
Ten useful pointers to help you create uninformative, difficult-to-read scientific graphs
Adapted from:
Wainer H. (1997) Visual Revelations. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers
Steps for better graphs
1. Identify direction of effect
In almost all cases, the cause or predictor variable belongs on the horizontal (X) axis
The effect or outcome variable is best on the vertical (Y) axis
2. Identify the levels of measurement
Nominal, ordinal and quantitative variables are different!
3. Think of visual perception guides
Columns or dots? Lines or scatterplot?
4. Minimise guides and non-data ink
Grid lines, tick marks and legends are non-data
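Step 1 can be written down as a small rule. The sketch below is hypothetical and not from the slides: the `Variable` class, its `role` and `level` fields and the `assign_axes` function are invented for illustration. It puts the predictor on X and the outcome on Y. As an assumption modelled on the later relative-mortality slide, it also lets an unordered (nominal) predictor go on the vertical axis instead.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """Hypothetical description of a variable to be graphed."""
    name: str
    role: str   # "predictor" (cause) or "outcome" (effect)
    level: str  # "nominal", "ordinal" or "quantitative"


def assign_axes(a: Variable, b: Variable, allow_nominal_exception: bool = True) -> dict:
    """Return {'x': name, 'y': name}: cause on X, effect on Y (step 1)."""
    if {a.role, b.role} != {"predictor", "outcome"}:
        raise ValueError("expected exactly one predictor and one outcome")
    predictor, outcome = (a, b) if a.role == "predictor" else (b, a)
    # Assumed exception: an unordered predictor can be listed down the
    # vertical axis, leaving the horizontal axis for the quantitative outcome.
    if allow_nominal_exception and predictor.level == "nominal":
        return {"x": outcome.name, "y": predictor.name}
    return {"x": predictor.name, "y": outcome.name}


guns = Variable("% of households owning guns", "predictor", "quantitative")
homicide = Variable("Rate of homicide with a gun", "outcome", "quantitative")
# Argument order does not matter; roles decide the axes.
print(assign_axes(homicide, guns))
# {'x': '% of households owning guns', 'y': 'Rate of homicide with a gun'}
```

Deciding by declared role rather than by argument order is the point: the same pair of variables always lands the same way round.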
Cause (X) and effect (Y)
Figure 16
Standard deviation of batting averages for all full-time players by year for the first 100 years of professional baseball. Note the regular decline.*
[Figure: axes labelled Standard deviation and Time]
Source:
Gould, Stephen Jay. Full House: The Spread of Excellence from Plato to Darwin. Random House, 1997.
cited: http://www.math.yorku.ca/SCS/Gallery/, 24 Nov 2002
* My emphasis
[Figure: a second plot, axes labelled Standard deviation and Time]
Killias M. International correlations between gun ownership and rates of homicide and suicide. Can Med Assoc J 1993; 148: 1721-5
[Figure: scatterplot of % of households owning guns and rate of homicide with a gun (per million per year), points labelled by country: USA, Norway, Canada, France, Finland, Belgium, Australia, Spain, Switzerland, Netherlands, West Germany, Scotland, England & Wales]
Drawn using S-plus
Levels of Measurement
The right display for a variable depends on its level of measurement
For univariate graphs:
qualitative: barplot
ordinal: column chart
quantitative: boxplot or histogram
For bivariate graphs:
X ordinal, Y binary: connected percents
X and Y both quantitative: scatterplot
X categorical, Y quantitative: box plots
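The two lists above amount to a lookup table keyed on level of measurement. A minimal Python sketch, assuming the level names are passed as the same strings the slide uses; `recommend_display` is a hypothetical name, and combinations the slide does not cover return `None` rather than a guess.

```python
# Recommended displays, transcribed from the slide's two lists.
# Univariate keys are levels; bivariate keys are (X level, Y level).
UNIVARIATE = {
    "qualitative": "barplot",
    "ordinal": "column chart",
    "quantitative": "boxplot or histogram",
}

BIVARIATE = {
    ("ordinal", "binary"): "connected percents",
    ("quantitative", "quantitative"): "scatterplot",
    ("categorical", "quantitative"): "box plots",
}


def recommend_display(x_level, y_level=None):
    """Look up the slide's recommendation; None means the slide gives none."""
    if y_level is None:
        return UNIVARIATE.get(x_level)
    return BIVARIATE.get((x_level, y_level))


print(recommend_display("quantitative"))                 # boxplot or histogram
print(recommend_display("categorical", "quantitative"))  # box plots
```

Driving the choice from the variables' levels, rather than from a "Chart Type" menu, is the approach the slides recommend.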
Binary: eg gender, death, pregnant
Categorical
  Qualitative: eg race, political party, religion
  Diverging: eg change (-ve to +ve)
  Ordinal: eg rating scale, skin type, colour
Quantitative
  Interval: only differences matter, eg BP, IQ
  Ratio: absolute zero, ratios matter, eg weight, height, volume
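The practical consequence of the taxonomy is which comparisons make sense: categories can only be matched, ordinal values can also be ranked, interval values can also be subtracted, and only ratio values can be divided. A minimal sketch, assuming binary and qualitative variables are both treated as nominal and leaving the diverging type out; `Level`, `is_meaningful` and `ratio` are hypothetical names.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered by how much structure they carry.
    Binary and qualitative variables are both treated as NOMINAL here."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level at which each comparison is meaningful.
REQUIRED_LEVEL = {
    "same_category": Level.NOMINAL,  # a == b
    "order": Level.ORDINAL,          # a < b
    "difference": Level.INTERVAL,    # a - b
    "ratio": Level.RATIO,            # a / b
}


def is_meaningful(comparison: str, level: Level) -> bool:
    return level >= REQUIRED_LEVEL[comparison]


def ratio(a: float, b: float, level: Level) -> float:
    """Divide two measurements, refusing when ratios have no meaning."""
    if not is_meaningful("ratio", level):
        raise ValueError(f"ratios are not meaningful on a {level.name.lower()} scale")
    return a / b


print(ratio(80.0, 40.0, Level.RATIO))          # 2.0: 80 kg is twice 40 kg
print(is_meaningful("ratio", Level.INTERVAL))  # False: an IQ of 140 is not "twice" 70
```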
Source:
Lewis S, Mason C, Srna J. Carbon monoxide exposure in blast furnace workers.
Aust J Public Health. 1992 Sep;16(3):262-8.
Ordinal variable, but categories mixed
Outcome is COHb%, but drawn on X
An alternative display . . .
[Figure: bubble plot, predictor variable on X, outcome variable on Y; area of circles proportional to n]
Drawn using MS Excel ‘bubble plot’
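Making the area of each circle proportional to n means the radius has to scale with the square root of n, because area = πr². A minimal sketch that computes radii only, independent of any charting program; `bubble_radii` and `max_radius` are hypothetical names.

```python
import math


def bubble_radii(counts, max_radius=1.0):
    """Radii that make each circle's AREA proportional to its count n.

    Area = pi * r**2, so r must grow with sqrt(n); scaling r with n itself
    would exaggerate large groups (doubling n would quadruple the area).
    """
    if not counts:
        return []
    if any(n < 0 for n in counts):
        raise ValueError("counts must be non-negative")
    largest = max(counts)
    if largest == 0:
        return [0.0 for _ in counts]
    return [max_radius * math.sqrt(n / largest) for n in counts]


radii = bubble_radii([25, 100])
print(radii)  # [0.5, 1.0]
```

A group four times as large gets a circle with twice the radius and four times the area.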
Principles of visual perception
WS Cleveland
much work on the psychophysics of human visual understanding
Tells us:
there is a hierarchy of visual quantitative perception
patterns and shading can cause vibration
graphs can shrink with almost no loss of information
Source: Cleveland WS. The Elements of Graphing Data. Monterey: Wadsworth, 1985.
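The slide mentions the hierarchy without listing it. The ordering below is the ranking of elementary perceptual tasks published by Cleveland and McGill, most accurately judged first. Encoding it as data lets a script pick the best of several candidate encodings. The function names are hypothetical, and the string labels are simply how this sketch spells the tasks.

```python
# Cleveland and McGill's ranking of elementary perceptual tasks,
# from most to least accurately judged. Tasks in one tuple share a rank.
PERCEPTION_RANKING = [
    ("position along a common scale",),
    ("position along nonaligned scales",),
    ("length", "direction", "angle"),
    ("area",),
    ("volume", "curvature"),
    ("shading", "colour saturation"),
]


def perception_rank(task: str) -> int:
    """1 = most accurate; raises KeyError for an unknown task."""
    for rank, tasks in enumerate(PERCEPTION_RANKING, start=1):
        if task in tasks:
            return rank
    raise KeyError(task)


def most_accurate(tasks):
    """Pick the candidate encoding that readers judge most accurately."""
    return min(tasks, key=perception_rank)


# A dotchart encodes values by position; a pie chart by angle and area.
print(most_accurate(["angle", "area", "position along a common scale"]))
```

This is why the dotchart version of the column chart, later in the slides, places every value on a common scale.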
Ubiquitous column charts
Source: Jamrozik K, Spencer CA, et al. Does the Mediterranean paradox extend to abdominal aortic aneurysm? Int J Epidemiol 2001; 30(5): 1071
A dotchart version…
[Figure: dotchart of percent (axis 50 to 80) by group: Mediterranean, Netherlands, Other N Europe, Australia, Scotland, All other; panels for Full fat milk, Adds salt, Meat 3+ weekly, Fish 1+ weekly]
Drawn using S-plus “Trellis” graphics
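A dotchart needs very little machinery: one labelled row per category and a single plotting symbol positioned along a common scale, with a dotted line leading the eye from label to symbol. The sketch below renders one as plain text. `text_dotchart` and its parameters are hypothetical, sorting rows by value is a design choice, and the example percentages are invented, not the study's data.

```python
def text_dotchart(values, lo, hi, width=40):
    """Render {label: value} as a text dotchart on the scale lo..hi.

    Rows are sorted by value; dots lead from the label to an 'o' placed at
    the value's position, so all rows share one common scale.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    label_width = max(len(label) for label in values)
    rows = []
    for label, value in sorted(values.items(), key=lambda item: item[1]):
        frac = (value - lo) / (hi - lo)
        pos = min(max(round(frac * (width - 1)), 0), width - 1)  # clamp to the chart
        rows.append(f"{label:>{label_width}} " + "." * pos + "o")
    return "\n".join(rows)


# Invented percentages, for illustration only.
print(text_dotchart({"Group A": 72, "Group B": 55, "Group C": 64}, lo=50, hi=80))
```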
Moiré vibration is easy to produce with a computer!!!
Moiré vibration
Vibration is maximised with lines of equal separation
This is common in scientific column charts
cited in Tufte ER. The Visual Display of Quantitative Information.
Minimise non-data ink
Non-data ink includes tick marks, grid lines, background and legend
Explanations of error bars and P-values can go in the caption or the text
[Figure: relative mortality rate (all causes), axis 0.10 to 1.00, for Greeks in Australia, Swedes in Sweden, Japanese in Japan, Anglo-Celts in Australia and Greeks in Greece]
Note the exception to the X-Y orientation rule: the predictor is qualitative (unordered)
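Tufte's book, listed in the references, makes this rule quantitative with the data-ink ratio. Minimising non-data ink is the same as pushing this ratio towards 1:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the ink that can be erased without loss of data-information}
```

For example, if 40% of a chart's ink (grid lines, background, legend box) could be erased without losing any data, its data-ink ratio is 1 − 0.4 = 0.6.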
Software for scientific graphics
Dedicated programs – thousands!
DeltaGraph (SPSS)
Prism
ViSta
Business graphics
MS Excel
many other spreadsheet programs
Graphics in statistical packages
Stata
simple, powerful
S-Plus, R
powerful, difficult
SPSS interactive graphics
easy, expensive
Systat
good reputation
SAS GRAPH language
expensive, powerful
Advice: Avoid the “default” choices in all programs (they are almost always wrong).
Avoid programs built around “Chart Type” menus: it is the wrong approach.
Graph formats
Object-oriented
lines, shapes, etc. can be identified within the graph
each object has attributes (eg size, colour, font)
editable using selection and “grouping”
Common formats:
PostScript (ps, eps)
Windows metafile (wmf, emf)
Bit-mapped
image exists as a collection of pixels
each pixel is light, dark or coloured
only pixels can be edited, not objects
often “compressed” to save disk space and bandwidth
Common formats:
Graphics Interchange Format (gif)
Windows bitmap (bmp)
JPEG interchange (jpg)
Advice: Use WMF format where possible. Paste WMF into PowerPoint, “ungroup”, then edit the objects for publication quality.
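The difference between the two families of format shows up even in a toy model. In the sketch below, which is hypothetical and far simpler than any real WMF or GIF encoder, a `Line` is an object whose colour attribute can be edited after the fact. `rasterize` turns objects into a grid of pixel colours, after which nothing records which pixels belonged to which line.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Line:
    """An object in an object-oriented (vector) graphic: it keeps its attributes."""
    x0: int
    y0: int
    x1: int
    y1: int
    colour: str = "black"


def rasterize(shapes, width, height):
    """Draw horizontal or vertical lines into a bit-mapped grid of colours.

    The result is just pixels (None = blank); the Line objects are gone.
    """
    grid = [[None] * width for _ in range(height)]
    for s in shapes:
        if s.y0 == s.y1:  # horizontal line
            for x in range(min(s.x0, s.x1), max(s.x0, s.x1) + 1):
                grid[s.y0][x] = s.colour
        elif s.x0 == s.x1:  # vertical line
            for y in range(min(s.y0, s.y1), max(s.y0, s.y1) + 1):
                grid[y][s.x0] = s.colour
        else:
            raise ValueError("this sketch only draws horizontal or vertical lines")
    return grid


axis = Line(0, 4, 9, 4)
# Object-oriented editing: change one attribute of the whole line, then redraw.
grey_axis = replace(axis, colour="grey")
pixels = rasterize([axis, Line(0, 0, 0, 4)], width=10, height=5)
# Bit-mapped editing: only individual pixels can be changed.
pixels[4][3] = "grey"
```

Recolouring the axis in the object version is one attribute change; in the pixel version it means finding and repainting every pixel by hand.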
References, further reading
Tufte ER. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 2001. www.edwardtufte.com
Cleveland WS. Visualizing Data. Summit, NJ: Hobart Press, 1993.
Wainer H. Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers, 1997. www.erlbaum.com
Wilkinson L. The Grammar of Graphics. New York: Springer Verlag, 1999.
Summary
Howie’s Helpful Hints for bad graphs:
Don’t show the data
Show the data inaccurately
Obfuscate the data
Steps for better graphs:
Identify direction of cause & effect
Exploit levels of measurement
Accommodate visual perception principles
Minimise non-data ink
Don’t use Excel unless you have to