Chapter 3 - Multidimensional Information Visualization II

(1)

Chapter 3 - Multidimensional

Information Visualization II

Concepts for visualizing univariate to

hypervariate data

Vorlesung „Informationsvisualisierung”

Prof. Dr. Florian Alt, WS 2013/14

Konzept und Folien (4th revised edition):

(2)

Outline

• Reference model and data terminology

• Visualizing data with < 4 variables

• Visualizing multivariable data

– Geometric transformation

– Glyphs

– Pixel-based

– Downscaling of dimensions

• Case studies: support for exploring multidimensional data

– Rank-by-feature

– Dust & magnet

(3)

(4)

Rank-By-Feature

• Seo & Shneiderman 2004

• Part of the Hierarchical Clustering Explorer (HCE)

(http://www.cs.umd.edu/hcil/multi-cluster/)

• Tabs: histogram and scatterplot ordering

• Implements systematic approach for data exploration

– (1) study 1D, study 2D, then find features

– (2) ranking guides insight, statistics confirm

• T

ool provides low-dimensional projections as a histogram (1D) or

scatterplot (2D)

• Users can select a feature detection criterion (e.g. test for normal

distribution (1D), correlation coe

ﬃ

cient (2D)) to rank projections

• The ranking facility is particularly helpful when the number of

possible projections is too large to investigate: concentrate on

the interesting ones

(5)

Rank-By-Feature

• Users start with 1D projections (histogram ordering)

• Four coordinated views

– A: selection of ranking criterion

– B: overview of scores for all dimensions (color coding: the brighter the

color, the higher the score)

– C: numerical / statistical detail for each dimension (e.g. score, mean,

standard deviation)

– D: display of histogram + boxplot (minimum, first quartile, median,

third quartile, maximum)

(6)

Rank-By-Feature

• Some basic statistical terms

– Mean: Sum of all values divided by the number of values

– Median: Middle value of a distribution of values when ranked in order of

magnitude

– Mode: Single most common value

– Variance: average squared deviation between the mean and the values

– Standard deviation: square root of the variance (translates the variance

into the original units of measurement)

• Statistical tests supported by HCE for 1D ranking

– Normality of the distribution: distribution of items forms a symmetric,

bell-shaped curve

– Uniformity of the distribution: all of the values of a random variable occur

with equal probability (results in a flat histogram)

– Number of potential outliers

– Number of unique values

(7)

Mode

• The most frequent score

• Describes how most items behave

• Pros:

– Easy to calculate and understand

– Can be used with nominal data

• Cons:

– There can be more than one mode

– Mode can change dramatically by adding only one dataset

– Independent of all other data in the set

0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 Fr eq u en cy Score

Negatively Skewed Distribution

(8)

Median

• Middle score of the distribution

Example data:

1 7 3 9 6 9 2

• Sorted by magnitude:

9 9 7 6 3 2 1 median = 6

If #scores even average two middle scores

Example data:

1 7 3 9 4 6 9 2

• Sorted by magnitude:

9 9 7 6 4 3 2 1 median = 5

• Pros:

– Relatively una

ﬀ

ected by outliers (very low or high scores) and

skewed distributions

– Can be used with ordinal, interval and ratio data

• Cons:

– Does not consider all scores of the data set

– Not very stable

(9)

Mean

• Sum of all scores divided by #scores:

• Most often used if ‘average’ is mentioned

• Pros:

– Considers every score

– most accurate summary of the data

– Resistant to sampling variation: removing one sample changes

the mean far less than mode or median

• Cons:

– Heavily a

ﬀ

ected by extreme scores and skewed distributions

– Can only be used with interval and ratio data

∑

=

n = i i

X

n

m

1

(10)

Standard Deviation and Variance

• How do you measure the accuracy of the mean?

• Example data set 1:

5 5 5 5 5

mean = 5

• Example data set 2:

6 8 4 1 6

mean = 5

• Which of the data sets is better reflected by the mean?

• If x

1 , x

2 , … x

n

are the data in a sample with mean m

– Deviation = di

ﬀ

erence between mean and scores =

∑

(x

i

- m)

– Variance s

=

– Standard deviation (SD) s =

√

Var(X)

• Both variance and standard deviations

measure the

– accuracy of the data set

– variability of the data

∑

(x

i

– m) )

n

2

(11)

Quantile, Quartile, and Percentile

• Quantile

– ‘Cut points' that divide a sample of data into groups containing

(as far as possible) equal numbers of observations.

• Quartile (Quantile of 4)

– Values that divide a sample of data into 4 groups containing

(as far as possible) equal numbers of observations

• Percentile (Quantile of 100)

– Values that divide a sample of data into 100 groups containing

(as far as possible) equal numbers of observations

median

(12)

Rank-By-Feature

• Move on to 2D projections (scatterplot ordering)

• Identify pairwise relationships between dimensions

• B: prism provides overview of scores for dimension pairs; score is color coded

• D: scatterplot browser; multiple browsers are possible;

• Ranking criteria

– Correlation coe

ﬃ

cient: direction and strength of linear relationship

– Least square error for simple linear / curvilinear regression: how well does the

regression model fit

(13)

Dust & Magnet

• Yi et al. 2005

• Data cases are represented as particles of iron

dust

• Magnets represent the dimensions of the data set

• Users manipulate the magnets to move the dust

• Dust moves at diﬀerent speed depending on its

(14)

(15)

Clutter Reduction Techniques

• Clutter: crowded and disordered visual entities

that obscure the structure in visual displays

• Avoiding clutter is one of the main challenges in

Information Visualization

• Techniques to reduce clutter

– Dimension reordering

– Sampling

– Constant-density visualization

– Point displacement

– Aggregation / Clustering (one mark represents more than

one case, e.g. group day sales into months)

(16)

Dimension Reordering

• Peng et al. 2004

• Automatically identify the views with the least

amount of visual clutter

• Clutter definition and algorithms for diﬀerent

(17)

(18)

Sampling

• Reduce the density of visual representation by

displaying a random subset of the data

• Random sampling preserves the distribution of data

• Overall trends (e.g. correlation) can still be detected at a

reduced density

• Uniform sampling (Ellis & Dix 2004)

– Applying the same (manually or automatically defined)

sampling factor to the entire data space

– Problem: areas with low density may become empty

• Non-uniform sampling (Bertini & Santucci 2004)

– Preserving relative density

– Model to compute where, how, and how much to sample to

preserve image characteristics

(19)

(20)

Sampling Lens

• Ellis et al. 2005

• Magic Lens approach to apply random sampling to

user-defined regions while maintaining the original data

density in the context

(21)

Constant Information Density

• Woodru

ﬀ

et al. 1998

• Data objects in areas with high information density

are represented by small-sized low-detail glyphs

• Objects in sparse-density areas are represented by

larger, more detailed glyphs

• The number of information objects is kept constant

as the user scales and moves the view

(22)

(23)

Point Displacement

• Items that would overlap other items are moved to adjacent

free positions

• Position of the items and their distance should be preserved as

much as possible

• Three algorithms for displacing pixels (e.g. adding abstract

data to geographical maps) (Keim & Hermann 1998)

– Nearest-Neighbor

– Curve-based

– Gridfit

• Algorithms share the same

procedure

– All data points which have a unique

position are placed on the display

– New positions are determined for the

(24)

Point Displacement

• Nearest-Neighbor algorithm

– For all points which have not been set in the first

step, place them on the nearest unoccupied

position

– Fast to compute, but limited eﬀectiveness for very

dense displays (pixels may be placed very far from

their original positions)

• Curve-based algorithm

– For all points which have not been set in the first

step

• Compute the nearest unoccupied position on a given

screen-filling curve

• Shift all points between the occupied pixel and the

unoccupied position along the screen-filling curve

(25)

Chapter 4 - Interaction with

Visualizations

Dynamic linking, brushing and filtering in

Information Visualization displays

Vorlesung „Informationsvisualisierung”

Prof. Dr. Florian Alt, WS 2013/14

Konzept und Folien (4th revised edition):

(26)

Outline

• Definitions

• Direct Manipulation (DM)

– Dynamic Queries

– Movable Filters

• Common Interaction Techniques

– Select

– Explore

– Reconfigure

– Encode

– Abstract / Elaborate

– Filter

– Connect

(27)

(28)

InfoVis & Interaction

• Information Visualization research originally: focus on finding

novel visual representations

• Increasing interest in interaction design, HCI models and

evaluation as well as aesthetics in the InfoVis community

• HCI Interaction models help us to better understand the

complex concepts of human-machine communication

• Norman’s execution-evaluation cycle (Norman 1988)

– 1. Establishing the goal

– 2. Forming the intention

– 3. Specifying the action sequence

– 4. Executing the interaction

– 5. Perceiving the system state

– 6. Interpreting the system state

(29)

InfoVis & Interaction

• InfoVis systems are composed of two major components:

– Representation

• Roots lie in the field of Computer Graphics

• Mapping from data to geometrical objects and

• How these objects are rendered

– Interaction

• Roots lie in the field of Human-Computer Interaction (HCI)

• Dialog between user and the system

• User explores the data from di

ﬀ

erent perspectives to find

insights

(30)

Representation and Interaction

• Although discussed as two separate components,

representation and interaction clearly are not

mutually exclusive.

For example: Interaction with a system may

activate a change in representation.

• Representation component has received vast

majority of attention in InfoVis research.

• Scan of recent conference proceedings and journal

issues in InfoVis uncovers many articles about new

representations of datasets but interaction is often

relegated to a secondary role.

(31)

Representation and Interaction

• Interaction rarely is the main focus of research in InfoVis.

• Making it the “little brother” of InfoVis

• Interaction is overshadowed by the more noteworthy

representation aspects.

• Interaction is an essential part of InfoVis

• Without it, an InfoVis technique or system becomes a

static image or autonomously animated images.

• Static images clearly have analytic and expressive value

• But usefulness becomes more limited as the data set

they represent grows larger with more variables

(32)

Representation and Interaction

• By the way: Also with a static image such as a poster, a user

(or reader) will often perform interactions

– Rotating the poster

– Looking closer/further

– Jotting down notes on the poster

• Spence suggests the notion of “passive interaction” through

which the user’s mental model on the data set is changed

and enhanced.

• Finally, through interaction, some limits of a representation

can be overcome, and the cognition of a user can be further

amplified.

• Importance of interaction and the need for its further study

seem undisputed.

(33)

Definition

• Interaction: Before discussing the term “interaction”, we will

find a solid definition for it.

• But: Finding a solid definition for it is challenging!

• In the broader context of Human-Computer Interaction (HCI)

it is simply described as

“…the communication between user and the system.”

[Dix et al., 2004]

• Becker, Cleveland, and Wilks compactly define interaction

as…

“…direct manipulation and instantianeous change.”

(34)

Definition

• “…asymmetry in data rates.”

[Ware, 2000]

• The amount of data flowing from InfoVis systems to users is

far greater than from users to systems.

• Thus, interaction techniques in InfoVis seem more designed

for changing and adjusting visual representation than for

entering data into systems, which clearly is an important

aspect of interaction in HCI.

(35)

Definition

• We view interaction techniques in InfoVis as the

features that provide users with the ability to directly

or indirectly manipulate and interpret representation.

• According to this view, a static image or an

autonomously animated representation does not have

associated interaction techniques.

• However, a menu interface for changing from a scatter

plot to a parallel coordinates plot is an interaction

technique since it allows a user to manipulate a

representation even if it may be less interactive or

direct.

(36)

(37)

Direct Manipulation (DM)

• Shneiderman 1982

• DM features

– Visibility of all objects of interest

– Incremental actions with rapid feedback on all actions

– Reversibility of all actions, so that users are encouraged to explore

without penalties

– Syntactic correctness of all actions, so that every user action is a

legal operation

• DM does not only make interaction easier for novice

(38)

Example: Brushing

• Becker & Cleveland 1987

• A collection of dynamic methods for viewing

multidimensional data

• Brush is an interactive interface tool to select / mark

subsets of data in a single view, e.g. by sweeping a virtual

brush across items of interest

• Given linked views (e.g. scatterplot matrix) the brushing can

support the identification of correlations across multiple

dimensions (brushing & linking)

• Usually used to visually filter data (via highlighting)

• Additional manipulation / operations may be performed on

the subsets (masking, magnification, labeling etc.)

• Di

ﬀ

erent types of brushes (Hauser et al. 2002)

– Simple brush via sweeping

– Composite brush: composed multiple single-axis brushes by the use of

logical operators

– Angular brush

Composite scatterplot

brushes - Hauser et

al. 2002

(39)

Example: Brushing

• Brushing one dimension in parallel coordinates to

highlight car data objects with 4 cylinders

(40)

(41)

Angular Brush

• Angular brush: brushing by

specifying a slope range –

highlight correlation and outliers

between two dimensions

(42)

Smooth Brush

• Non-binary brushing

• Degree-of-interest

defined by distance

to brushed range

• Decreasing degree is

mapped to

decreasing drawing

intensity

Hauser et al.

(43)

Another Brushing Example

• Example for composite (AND) brush in Parallel

Coordinate Plot – find the cities with high wages, small

prices and many paid holiday days

• Demo InfoScope: http://www.macrofocus.com/public/

products/infoscope.html (free trial and applet)

(44)

Dynamic Queries

• Shneiderman 1994

• Explore and search databases

• SQL example: SELECT customer_id, customer_name,

COUNT(order_id) as total FROM customers INNER JOIN

orders ON customers.customer_id = orders.customer_id

GROUP BY customer_id, customer_name HAVING

COUNT(order_id) > 5 ORDER BY COUNT(order_id) DESC

• Problems

– Takes time to learn

– Takes time to formulate and reformulate

– User must know what she is looking for – only exact matches

– Lots of ways to fail

– SQL error messages helpful?

(45)

Dynamic Queries

• Based on Direct Manipulation (DM)

• DM principles with regard to Dynamic Queries

– Visual presentation of the query’s components

– Visual presentation of results

– Rapid, incremental, and reversible control of the query

– Selection by pointing, not typing

– Immediate, continuous feedback

• Implementation approach

– Graphical query formulation: Users formulate queries by

adjusting sliders, pressing buttons, bounding box selection…

– Search results displayed are continuously updated (< 100 ms)

(46)

Examples

• Visual representations of data to query?

• Some examples: geographic data, starfields,

tables etc.

(47)

HomeFinder

• One of the first DQ interfaces

(48)

FilmFinder

(49)

Dynamic Query Controls

• Check boxes and buttons (Nominal with low cardinality)

• Sliders and range slider (ordinal and quantitative data)

• Alphaslider (ordinal data) (Ahlberg & Shneiderman 1994)

– Small-sized widget to search sorted lists

– Online-text output

– Two-tiled slider thumb for dragging operations with di

ﬀ

erent

granularities

– Letter index visualizing the distribution of initial letters – jump to a

position in the slider

– Locating an item out of 10,000 items ~ 28s for novice users

– Pros and cons to text entry?

• Extend data sliders with data visualization

(Eick 1994)

(50)

Dynamic Queries Online

• Online examples:

– Apartment Search (http://immo.search.ch)

– Diamond search (http://www.bluenile.com)

(51)

DQ in current search interfaces

• DQ have become widespread with fast search

algorithms and increased computing capacity

– search happens while typing in search terms in google search

– new routes are calculated while point is dragged in google

(52)

Making Money with Dynamic Queries

• Starfield displays and

Dynamic Queries

provided the basis for

SpotFire

• Christopher Ahlberg

– 1991: Visiting student from

Sweden at the HCIL

University of Maryland

– 1996: Founder of SpotFire

– 2007: SpotFire was sold

for 195 Mio. $

(53)

Summary Dynamic Queries

• Users can rapidly, safely playfully explore a data space – no

false input possible

– Users can rapidly generate new queries based on incidental learning

– Visual representation of data supports data exploration

– Analysis by continuously developing and testing hypotheses (detect

clusters, outliers, trends in multivariate data)

– Provides straightforward undo and reversing of actions

• Potential problems with DQ as implemented in the FilmFinder?

– Limit of query complexity – filters are always conjunctive

– Performance is limited for very large data sets and client / server

applications

– Controls require valuable display space

– Information is pruned