Information Visualization Course
InfoVis Lecture
Multivariate Data Sets
Frédéric Vernier
Lecturer (Maître de conférences), Univ. Paris Sud
Data Sets
- Data comes in many different forms
- Typically, not in the way you want it
- How is it stored (in raw form)?
- Heterogeneous data is often seen as multiple dimensions of elements, extracted according to patterns or needs.
Schema
- Cars
  - brand
  - model
  - year
  - cost
  - size
  - weight
  - miles per gallon
Data Tables
- Often, we take raw data and transform it into a form that is more workable
- Main idea:
  - Individual items are called cases
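As an illustration of turning raw data into a table of cases, here is a minimal sketch. It assumes a subset of the Cars schema from the earlier slide; the semicolon-separated raw format, the `Car` class, the `parse_cases` helper, and all sample values are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Car:
    """One case (row) of the data table; fields are a subset of the Cars schema."""
    brand: str
    model: str
    year: int
    cost: float
    mpg: float


# Hypothetical raw input: semicolon-separated text, values invented for illustration.
RAW = """Renault;Clio;2009;12500;47.0
Toyota;Prius;2010;24000;50.0
Ford;Mustang;2008;21000;19.5"""


def parse_cases(raw: str) -> list[Car]:
    """Transform raw text lines into typed cases, one per line."""
    cases = []
    for line in raw.splitlines():
        brand, model, year, cost, mpg = line.split(";")
        cases.append(Car(brand, model, int(year), float(cost), float(mpg)))
    return cases


cases = parse_cases(RAW)
```

Each `Car` instance is one case; each field is one variable of that case.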
Variable Types
- N - Nominal (equal or not equal to other values)
  - Example: gender, hair color (blond, brown, black, red)
- O - Ordinal (obeys < relation, ordered set)
  - Example: soccer leagues, rainbow colors
- Q - Quantitative (can do math on them)
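The three types differ in which operations make sense. The sketch below is one possible way to show this in code: equality only for nominal values, ordering for ordinal values, arithmetic for quantitative values. The class names `HairColor` and `League` and the sample weights are assumptions made for this example.

```python
from enum import Enum


class HairColor(Enum):
    """Nominal: only equality tests are meaningful."""
    BLOND = "blond"
    BROWN = "brown"
    BLACK = "black"
    RED = "red"


class League(Enum):
    """Ordinal: values can be ordered, but their differences mean nothing."""
    THIRD = 1
    SECOND = 2
    FIRST = 3

    def __lt__(self, other):
        if not isinstance(other, League):
            return NotImplemented
        return self.value < other.value


# Nominal: equal or not equal.
nominal_equal = HairColor.BLOND == HairColor.BLOND

# Ordinal: sorting relies on the < relation defined above.
ordinal_sorted = sorted([League.FIRST, League.THIRD, League.SECOND])

# Quantitative: arithmetic such as a mean is meaningful (sample values invented).
weights_kg = [1200.0, 1450.0, 980.0]
mean_weight = sum(weights_kg) / len(weights_kg)
```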
Variable Types
- Three main types of variables
  - N - Nominal
    - By class: data belong or not to classes (.org, .com, .fr)
    - Partially ordered: order on classes (engineering students)
  - O - Ordinal
  - Q - Quantitative
    - Quantitative + 0 (a clear zero point)
Example
Baseball statistics
Metadata
- Descriptive information about the data
- Might be something as simple as the type of a variable (e.g., INT), or could be more complex
- For times when the table itself just isn't enough
  - AtBats ≥ Hit ≥ HomeRuns
  - if "YearInMasterLeague" = 1 then AtBats = CareerAtBat
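The two constraints on this slide can be read as validation rules over one record of the baseball table. Here is a minimal sketch under that reading; the function name `check_player`, the dictionary layout, and the sample rows are assumptions, while the field names are taken from the slide.

```python
def check_player(row: dict) -> list[str]:
    """Return the list of metadata constraints that one player record violates."""
    problems = []
    # Constraint 1: AtBats >= Hit >= HomeRuns
    if not (row["AtBats"] >= row["Hit"] >= row["HomeRuns"]):
        problems.append("expected AtBats >= Hit >= HomeRuns")
    # Constraint 2: in a player's first year, season and career at-bats coincide
    if row["YearInMasterLeague"] == 1 and row["AtBats"] != row["CareerAtBat"]:
        problems.append("first-year player: AtBats must equal CareerAtBat")
    return problems


# Sample records with invented values.
ok_row = {"AtBats": 300, "Hit": 80, "HomeRuns": 12,
          "YearInMasterLeague": 1, "CareerAtBat": 300}
bad_row = {"AtBats": 50, "Hit": 80, "HomeRuns": 12,
           "YearInMasterLeague": 1, "CareerAtBat": 300}
```

Metadata of this kind cannot be stored in the table cells themselves, which is why it has to live alongside the table.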
1 M2R InfoVis Lecture. 2011. Univ. Paris Sud
How Many Variables?
- Data sets of dimensions 1, 2, 3 are common
- Number of variables per class
  - 1 - Univariate data (e.g., timeline)
  - 2 - Bivariate data (e.g., maps)
  - 3 - Trivariate data (e.g., volume)
  - >3 - Hypervariate data (???)
- Example: www.nationmaster.com
Univariate
- Representations
  - Dot plot
  - Bar chart (item vs. attribute)
  - Tukey box plot
  - Histogram
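Two of these representations rest on a small computation before anything is drawn. The sketch below computes the numbers behind a Tukey box plot (quartiles, whiskers at 1.5 times the interquartile range, outliers) and the bin counts behind a histogram. The function names, the choice of the "inclusive" quartile method, and the sample data are assumptions; other quartile conventions give slightly different boxes.

```python
import statistics


def tukey_summary(values, k=1.5):
    """Quartiles, whisker ends, and outliers for a Tukey box plot."""
    xs = sorted(values)
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": inside[0],      # smallest value within the fences
        "whisker_high": inside[-1],    # largest value within the fences
        "outliers": [x for x in xs if x < lo_fence or x > hi_fence],
    }


def histogram(values, bins):
    """Count values in `bins` equal-width intervals between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0    # avoid a zero width when all values are equal
    counts = [0] * bins
    for x in values:
        i = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return counts


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # invented sample with one extreme value
summary = tukey_summary(data)
counts = histogram(data, bins=3)
```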
Bivariate
- Scatterplot
Common
BUT
Trivariate
- 3D scatterplot, 2D plot + size
Hypervariate Data
- What about data sets with MANY variables?
- Often the interesting ones
- n-D
What does 10-D space look like?
Multiple Projections
Give each variable its own display
Item  A  B  C  D  E
1     4  1  8  3  5
2     6  3  4  2  1
3     5  7  2  4  3
4     2  6  3  1  5
[Figure: items 1 to 4, with one separate display per variable A to E]
Help me, InfoVis!
- smart layout
Scatterplot Matrix
Each pair of variables is shown in its own 2-D scatterplot
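To make the "every pair" idea concrete, here is a minimal sketch that builds the point set of each panel from the small A to E table on the Multiple Projections slide. The function name `scatterplot_matrix` and the column-wise dictionary layout are assumptions for this example; no drawing is done.

```python
from itertools import combinations

# Columns of the 4-item table from the Multiple Projections slide.
table = {
    "A": [4, 6, 5, 2],
    "B": [1, 3, 7, 6],
    "C": [8, 4, 2, 3],
    "D": [3, 2, 4, 1],
    "E": [5, 1, 3, 5],
}


def scatterplot_matrix(columns):
    """Map each unordered pair of variables to the (x, y) points of its panel."""
    return {
        (x, y): list(zip(columns[x], columns[y]))
        for x, y in combinations(columns, 2)
    }


panels = scatterplot_matrix(table)
```

With 5 variables this yields 10 distinct panels. A full matrix display usually also draws the mirrored panels, so the number of plots grows quadratically with the number of variables.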
Brushing (subset) & linking (sync.)
Label, dot plot, scale
Histogram > dot plot for distribution
Scale: row & column
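Brushing selects a subset of items in one panel; linking shows the same subset in every other panel. A minimal sketch of that logic, with drawing left out, could look as follows. The function names `brush` and `linked_highlight`, the rectangle-shaped brush, and the sample columns are assumptions for this example.

```python
def brush(points, x_range, y_range):
    """Return the indices of items whose (x, y) lies inside the brushed rectangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    return {i for i, (x, y) in enumerate(points)
            if x0 <= x <= x1 and y0 <= y <= y1}


def linked_highlight(columns, selected):
    """For every variable, flag which items are highlighted in its display."""
    n = len(next(iter(columns.values())))
    return {name: [i in selected for i in range(n)] for name in columns}


# Invented sample: three variables measured on four items.
columns = {"A": [4, 6, 5, 2], "B": [1, 3, 7, 6], "C": [8, 4, 2, 3]}

# Brush in the A-vs-B panel, then propagate the selection to all displays.
ab_points = list(zip(columns["A"], columns["B"]))
selected = brush(ab_points, x_range=(4, 6), y_range=(0, 4))
highlight = linked_highlight(columns, selected)
```

The selection is stored as item indices rather than coordinates, which is what lets every other view highlight the same items.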
Chernoff Faces
Simple
Example
[Spinelli and Zhou, 2004]
On steroids
Star Plots / Glyphs
[Figure: star plot with spokes Var 1 to Var 5; the length of each spoke shows the value]
Space out the n variables at equal angles around a circle
Each "spoke" encodes a variable's value
Examples: circular parallel coordinates
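The star plot construction (n spokes at equal angles, spoke length proportional to the value) can be sketched as a function that returns the polygon vertices of one glyph. The function name `star_plot_vertices` and the assumption that each variable is scaled from 0 to a known maximum are choices made for this example.

```python
import math


def star_plot_vertices(values, max_values, radius=1.0):
    """Vertices of one star glyph: spoke k points at angle 2*pi*k/n."""
    n = len(values)
    vertices = []
    for k, (v, vmax) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * k / n          # equal angular spacing
        r = radius * v / vmax                # spoke length encodes the value
        vertices.append((r * math.cos(angle), r * math.sin(angle)))
    return vertices


# One item with 5 variables, each scaled against an assumed maximum of 10.
glyph = star_plot_vertices([10, 5, 0, 5, 10], [10, 10, 10, 10, 10])
```

Connecting the vertices in order gives the outline of the glyph; drawing one glyph per item gives the usual grid of stars.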
On prednisone ...
Just 2 dims [Bertillon]: population x percent foreigners
Star Coordinates
E. Kandogan, “Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of Dimensions”, InfoVis 2000
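In star coordinates, each item becomes a single point: every axis contributes a vector along its direction, scaled by the item's normalized value on that axis, and the vectors are summed. The sketch below is a simplified version with unit-length axes; the function name `star_coordinates` and its parameters are assumptions for this example, not taken from the paper's software.

```python
import math


def star_coordinates(row, mins, maxs, angles=None):
    """Project one n-D item to 2-D as a sum of scaled axis vectors."""
    n = len(row)
    if angles is None:
        # Default layout: axes at equal angles around the origin.
        angles = [2 * math.pi * j / n for j in range(n)]
    x = y = 0.0
    for v, lo, hi, a in zip(row, mins, maxs, angles):
        t = (v - lo) / (hi - lo) if hi > lo else 0.0   # normalize to [0, 1]
        x += t * math.cos(a)
        y += t * math.sin(a)
    return x, y


# Invented 4-D example with every variable ranging from 0 to 10.
mins, maxs = [0, 0, 0, 0], [10, 10, 10, 10]
p_first_axis = star_coordinates([10, 0, 0, 0], mins, maxs)
p_all_max = star_coordinates([10, 10, 10, 10], mins, maxs)
```

Note that `p_all_max` lands back at the origin because opposite axes cancel: different items can project to the same point. Passing a different `angles` list corresponds to rotating axes, one way to pull such overlapping items apart.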
Demo - Interaction
- Activate / deactivate axis
- Color selection or axis
- Glyph coordinates
- Scale axis
- Rotate axis
- Dot size
- Brushing on axis
- Trail
- Inspector
- Panning
Parallel Coordinates
V1 V2 V3 V4 V5
By A. Inselberg
Encode variables along a horizontal row
Vertical line specifies values
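Under this layout, each item becomes a polyline that crosses axis j at a height given by its value on variable j. A minimal sketch of the coordinate computation follows; the function name `parallel_coords_polylines`, the per-variable min-max scaling, and the sample rows are assumptions for this example.

```python
def parallel_coords_polylines(rows, x_spacing=1.0):
    """Return one polyline per item: a list of (x, y) points, one per axis."""
    n_vars = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_vars)]
    maxs = [max(r[j] for r in rows) for j in range(n_vars)]
    polylines = []
    for r in rows:
        line = []
        for j, v in enumerate(r):
            span = maxs[j] - mins[j]
            # Height on axis j, scaled to [0, 1]; constant columns sit mid-axis.
            y = (v - mins[j]) / span if span else 0.5
            line.append((j * x_spacing, y))
        polylines.append(line)
    return polylines


# Invented sample: three items, two variables (axes V1 and V2).
lines = parallel_coords_polylines([[0, 10], [5, 20], [10, 30]])
```

Because only neighbouring axes are joined by line segments, relationships are easiest to read between adjacent variables, which makes the axis order matter.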
Parallel Coords Example
Basic
Grayscale
Color
From: Dean F. Jerding and John T. Stasko
VisDB
- Database of data items, each of n dimensions
- Issue a query that specifies a target value of the dimensions
- Often get back no exact matches
- Want to find near matches
- Relevance factor
Taken from:
Technique
- Calculate relevance of all data points
- Sort items based on relevance
- Use spiral technique to order the values
- Color items based on relevance
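The four steps above can be sketched end to end. The relevance formula used here (one minus the mean normalized distance to the target), the square spiral, and the grey level standing in for a colour scale are all assumptions made for illustration; they are not the exact definitions used by VisDB.

```python
def relevance(item, target, ranges):
    """1.0 for an exact match, decreasing with mean normalized distance."""
    dists = [abs(v - t) / r for v, t, r in zip(item, target, ranges)]
    return 1.0 - sum(dists) / len(dists)


def spiral_cells(count):
    """Return `count` grid cells along a square spiral starting at (0, 0)."""
    x = y = 0
    dx, dy = 1, 0
    step, cells = 1, []
    while len(cells) < count:
        for _leg in range(2):                 # two legs share each step length
            for _move in range(step):
                if len(cells) == count:
                    return cells
                cells.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx                  # turn 90 degrees
        step += 1
    return cells


def visdb_layout(items, target, ranges):
    """Sort by relevance, place along the spiral, and assign a grey level."""
    scored = sorted(((relevance(it, target, ranges), it) for it in items),
                    key=lambda pair: pair[0], reverse=True)
    cells = spiral_cells(len(scored))
    return [(cell, round(255 * max(rel, 0.0)), it)
            for cell, (rel, it) in zip(cells, scored)]


# Invented sample: 2-D items, query target (10, 10), value range 10 per dimension.
layout = visdb_layout([(10, 10), (0, 0), (5, 5)], target=(10, 10), ranges=(10, 10))
```

The most relevant item lands on the centre cell and less relevant items wind outward, so near matches stay close to the middle of the window even when no item matches exactly.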
Display Methodology
Total relevance
Dim 1
Dim 2
Spiral in each window
Items ordered by total relevance
Same item appears in same place in each window
Highest relevance value in center, decreasing values grow outward
Alternative
- Grouping arrangement => single window
- Create all the per-dimension relevance depictions for an item and group them
- Spiral out the different data items
Example
Multi-window
Grouping
8 dimensions
1000 items
Overview
More techniques?
- Combinations
- More integrated software
Highlighted Dynamic Table Viewer
Nada Golmie & Bill Kules
Eureka / TableLens
Rao & Card 94
EZChooser:
Comparisons
- ParCoord: <1000 items, <20 attrs
  - Relate between adjacent attr pairs
- StarCoord: <1,000,000 items, <20 attrs
  - Interaction intensive
- TableLens: similar to par-coords
  - More items with aggregation
  - Relate 1:m attrs (sorting), short learn time
- VisDB: 100,000 items with 10 attrs
Multivariate Visualization Tools
Paper presentations
- Hajar Falih
  - Multi-Dimensional Detective
- Thibaut Jacob
  - Rolling the Dice: Multidimensional Visual Exploration using Scatterplot Matrix Navigation
06/12/2011
90 min Lecture: Multi-dimensional Data Visualization