
Chapter 3 - Multidimensional

Information Visualization II

Concepts for visualizing univariate to

hypervariate data

Lecture “Information Visualization”

Prof. Dr. Florian Alt, winter semester 2013/14

Concept and slides (4th revised edition):

(2)

Outline

Reference model and data terminology

Visualizing data with < 4 variables

Visualizing multivariable data

– Geometric transformation

– Glyphs

– Pixel-based

– Downscaling of dimensions

Case studies: support for exploring multidimensional data

– Rank-by-feature

– Dust & magnet

(3)
(4)

Rank-By-Feature

Seo & Shneiderman 2004

Part of the Hierarchical Clustering Explorer (HCE)

(http://www.cs.umd.edu/hcil/multi-cluster/)

Tabs: histogram and scatterplot ordering

Implements systematic approach for data exploration

– (1) study 1D, study 2D, then find features

– (2) ranking guides insight, statistics confirm

Tool provides low-dimensional projections as a histogram (1D) or

scatterplot (2D)

Users can select a feature detection criterion (e.g. test for normal

distribution (1D), correlation coefficient (2D)) to rank projections

The ranking facility is particularly helpful when the number of

possible projections is too large to investigate: concentrate on

the interesting ones

(5)

Rank-By-Feature

Users start with 1D projections (histogram ordering)

Four coordinated views

– A: selection of ranking criterion

– B: overview of scores for all dimensions (color coding: the brighter the

color, the higher the score)

– C: numerical / statistical detail for each dimension (e.g. score, mean,

standard deviation)

– D: display of histogram + boxplot (minimum, first quartile, median,

third quartile, maximum)

(6)

Rank-By-Feature

Some basic statistical terms

– Mean: Sum of all values divided by the number of values

– Median: Middle value of a distribution of values when ranked in order of

magnitude

– Mode: Single most common value

– Variance: average squared deviation between the mean and the values

– Standard deviation: square root of the variance (translates the variance

into the original units of measurement)

Statistical tests supported by HCE for 1D ranking

– Normality of the distribution: distribution of items forms a symmetric,

bell-shaped curve

– Uniformity of the distribution: all of the values of a random variable occur

with equal probability (results in a flat histogram)

– Number of potential outliers

– Number of unique values
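
The 1D ranking idea can be sketched in a few lines. This is only an illustration, not HCE's implementation: the table and its column names are made up, and the 1.5 × IQR rule used to count "potential outliers" is an assumption, since the slide does not say how HCE counts them.

```python
import statistics

def count_potential_outliers(values):
    """Count values outside the 1.5 * IQR fences (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < low or v > high)

def rank_dimensions(table, score):
    """Return the dimension names sorted by descending score."""
    return sorted(table, key=lambda name: score(table[name]), reverse=True)

# Hypothetical data set: each key is one dimension (column).
table = {
    "weight": [10, 11, 12, 11, 10, 12, 11, 40],
    "cylinders": [4, 4, 6, 8, 4, 6, 8, 4],
    "price": [1, 2, 3, 4, 5, 6, 7, 8],
}

by_outliers = rank_dimensions(table, count_potential_outliers)
by_unique = rank_dimensions(table, lambda values: len(set(values)))
print(by_outliers[0])  # dimension with the most potential outliers
print(by_unique[0])    # dimension with the most unique values
```

Swapping the scoring function changes the ranking criterion, which mirrors selecting a different criterion in view A.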

(7)

Mode

The most frequent score

Describes how most items behave

Pros:

– Easy to calculate and understand

– Can be used with nominal data

Cons:

– There can be more than one mode

– Mode can change dramatically by adding only one data point

– Independent of all other data in the set

[Figure: histogram of a negatively skewed distribution; x-axis: Score, y-axis: Frequency]

(8)

Median

Middle score of the distribution

Example data:

1 7 3 9 6 9 2

Sorted by magnitude:

9 9 7 6 3 2 1 median = 6

If the number of scores is even, average the two middle scores

Example data:

1 7 3 9 4 6 9 2

Sorted by magnitude:

9 9 7 6 4 3 2 1 median = 5

Pros:

– Relatively unaffected by outliers (very low or high scores) and

skewed distributions

– Can be used with ordinal, interval and ratio data

Cons:

– Does not consider all scores of the data set

– Not very stable
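
The two median examples on this slide can be reproduced with a short sketch. The function below is a plain illustration of the rule (middle score, or average of the two middle scores); the `with_outlier` list is an invented variation that shows why the median is relatively unaffected by extreme scores.

```python
import statistics

def median(scores):
    """Middle score; for an even count, the average of the two middle scores."""
    ordered = sorted(scores)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_scores = [1, 7, 3, 9, 6, 9, 2]
even_scores = [1, 7, 3, 9, 4, 6, 9, 2]
with_outlier = [1, 7, 3, 900, 6, 9, 2]  # one score replaced by an extreme value

print(median(odd_scores))    # 6
print(median(even_scores))   # 5.0
print(median(with_outlier))  # 6, unchanged by the outlier
print(statistics.mean(odd_scores), statistics.mean(with_outlier))  # the mean shifts strongly
```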

(9)

Mean

Sum of all scores divided by the number of scores:

$$m = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Most often used if ‘average’ is mentioned

Pros:

– Considers every score

– Most accurate summary of the data

– Resistant to sampling variation: removing one sample changes

the mean far less than mode or median

Cons:

– Heavily affected by extreme scores and skewed distributions

– Can only be used with interval and ratio data

(10)

Standard Deviation and Variance

How do you measure the accuracy of the mean?

Example data set 1:

5 5 5 5 5

mean = 5

Example data set 2:

6 8 4 1 6

mean = 5

Which of the data sets is better reflected by the mean?

If $x_1, x_2, \dots, x_n$ are the data in a sample with mean $m$

– Deviation = difference between mean and scores = $(x_i - m)$

– Variance $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - m)^2$

– Standard deviation (SD) $s = \sqrt{\mathrm{Var}(X)}$

Both variance and standard deviations measure the

– accuracy of the data set

– variability of the data
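
The question on this slide (which data set is better reflected by its mean?) can be answered numerically. The sketch below assumes the population form of the variance, that is, the average squared deviation with division by n, as in the definition given earlier in the lecture; the function names are illustrative.

```python
import math

def variance(xs):
    """Average squared deviation from the mean (division by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def standard_deviation(xs):
    """Square root of the variance, in the original units of measurement."""
    return math.sqrt(variance(xs))

set1 = [5, 5, 5, 5, 5]
set2 = [6, 8, 4, 1, 6]

print(variance(set1), standard_deviation(set1))  # 0.0 0.0
print(variance(set2))                            # 5.6
print(round(standard_deviation(set2), 2))        # 2.37
```

Both sets have mean 5, but only the first is described perfectly by it; the larger standard deviation of the second set signals that its scores spread widely around the mean.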

(11)

Quantile, Quartile, and Percentile

Quantile

– ‘Cut points’ that divide a sample of data into groups containing

(as far as possible) equal numbers of observations.

Quartile (Quantile of 4)

– Values that divide a sample of data into 4 groups containing

(as far as possible) equal numbers of observations

Percentile (Quantile of 100)

– Values that divide a sample of data into 100 groups containing

(as far as possible) equal numbers of observations
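
As a small illustration, Python's standard `statistics.quantiles` function returns the cut points for any number of groups. The scores are the example data from the median slide; note that several interpolation conventions exist, so the outer quartiles may differ slightly between tools, while the middle quartile always equals the median.

```python
import statistics

scores = [1, 7, 3, 9, 6, 9, 2]

quartiles = statistics.quantiles(scores, n=4)      # 3 cut points -> 4 groups
percentiles = statistics.quantiles(scores, n=100)  # 99 cut points -> 100 groups

print(quartiles)        # [first quartile, median, third quartile]
print(percentiles[49])  # the 50th percentile is the median again
```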

median

(12)

Rank-By-Feature

Move on to 2D projections (scatterplot ordering)

Identify pairwise relationships between dimensions

B: prism provides overview of scores for dimension pairs; score is color coded

D: scatterplot browser; multiple browsers are possible;

Ranking criteria

– Correlation coefficient: direction and strength of linear relationship

– Least square error for simple linear / curvilinear regression: how well does the

regression model fit
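
A minimal sketch of scatterplot ordering by correlation coefficient follows. It is not HCE's code: the table is invented, and ranking by the absolute value of Pearson's r is an assumption made here so that strong negative relationships rank as high as strong positive ones.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_pairs(table):
    """Score every pair of dimensions and sort by strength of correlation."""
    scored = [((a, b), pearson(table[a], table[b]))
              for a, b in combinations(table, 2)]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical car data: each key is one dimension.
table = {
    "horsepower": [60, 90, 120, 150, 200],
    "weight": [900, 1100, 1400, 1600, 2100],
    "mpg": [40, 33, 27, 22, 15],
    "doors": [4, 2, 4, 2, 4],
}

for (a, b), r in rank_pairs(table):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

With 4 dimensions there are only 6 pairs, but the number grows quadratically, which is why a ranked list helps the user concentrate on the interesting scatterplots.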

(13)

Dust & Magnet

Yi et al. 2005

Data cases are represented as particles of iron

dust

Magnets represent the dimensions of the data set

Users manipulate the magnets to move the dust

Dust moves at different speed depending on its

(14)
(15)

Clutter Reduction Techniques

Clutter: crowded and disordered visual entities

that obscure the structure in visual displays

Avoiding clutter is one of the main challenges in

Information Visualization

Techniques to reduce clutter

– Dimension reordering

– Sampling

– Constant-density visualization

– Point displacement

– Aggregation / Clustering (one mark represents more than

one case, e.g. group day sales into months)

(16)

Dimension Reordering

Peng et al. 2004

Automatically identify the views with the least

amount of visual clutter

Clutter definition and algorithms for different

(17)
(18)

Sampling

Reduce the density of visual representation by

displaying a random subset of the data

Random sampling preserves the distribution of data

Overall trends (e.g. correlation) can still be detected at a

reduced density

Uniform sampling (Ellis & Dix 2004)

– Applying the same (manually or automatically defined)

sampling factor to the entire data space

– Problem: areas with low density may become empty

Non-uniform sampling (Bertini & Santucci 2004)

– Preserving relative density

– Model to compute where, how, and how much to sample to

preserve image characteristics
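
Uniform sampling as described above can be sketched as follows. This is a simplified illustration rather than the method of Ellis & Dix: the function name, the `factor` parameter and the fixed seed are assumptions, and the non-uniform, density-preserving variant is not shown.

```python
import random

def uniform_sample(points, factor, seed=0):
    """Keep a random subset containing roughly `factor` (0..1) of all points."""
    rng = random.Random(seed)  # fixed seed so the display is reproducible
    k = round(len(points) * factor)
    return rng.sample(points, k)

# Hypothetical data: 1000 points to be drawn in a scatterplot.
points = [(i, i % 7) for i in range(1000)]
sample = uniform_sample(points, factor=0.1)
print(len(sample))  # 100
```

Because the same factor applies everywhere, a region that holds only a handful of points may lose all of them, which is the problem noted on the slide.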

(19)
(20)

Sampling Lens

Ellis et al. 2005

Magic Lens approach to apply random sampling to

user-defined regions while maintaining the original data

density in the context

(21)

Constant Information Density

Woodruff et al. 1998

Data objects in areas with high information density

are represented by small-sized low-detail glyphs

Objects in sparse-density areas are represented by

larger, more detailed glyphs

The number of information objects is kept constant

as the user scales and moves the view

(22)
(23)

Point Displacement

Items that would overlap other items are moved to adjacent

free positions

Position of the items and their distance should be preserved as

much as possible

Three algorithms for displacing pixels (e.g. adding abstract

data to geographical maps) (Keim & Hermann 1998)

– Nearest-Neighbor

– Curve-based

– Gridfit

Algorithms share the same

procedure

– All data points which have a unique

position are placed on the display

– New positions are determined for the

(24)

Point Displacement

Nearest-Neighbor algorithm

– For all points which have not been set in the first

step, place them on the nearest unoccupied

position

– Fast to compute, but limited effectiveness for very

dense displays (pixels may be placed very far from

their original positions)

Curve-based algorithm

– For all points which have not been set in the first

step

Compute the nearest unoccupied position on a given

screen-filling curve

Shift all points between the occupied pixel and the

unoccupied position along the screen-filling curve
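
The nearest-neighbor variant described above can be sketched on a small pixel grid. This is a simplified illustration, not the algorithm of Keim & Hermann as published: it assumes integer pixel coordinates inside the grid, lets the first point at each position keep it, and uses a brute-force search for the nearest free pixel. It also assumes the grid has at least as many pixels as points.

```python
def displace_nearest_neighbor(points, width, height):
    """Give every point its own pixel, moving colliding points as little as possible."""
    occupied = {}  # pixel -> index of the point placed there
    pending = []   # indices of points whose pixel was already taken
    for index, pos in enumerate(points):
        if pos in occupied:
            pending.append(index)
        else:
            occupied[pos] = index

    placed = {index: pos for pos, index in occupied.items()}
    free = {(x, y) for x in range(width) for y in range(height)} - set(occupied)

    for index in pending:
        px, py = points[index]
        # Nearest free pixel by squared distance; ties broken by coordinates.
        target = min(free, key=lambda c: ((c[0] - px) ** 2 + (c[1] - py) ** 2, c))
        free.remove(target)
        placed[index] = target
    return [placed[i] for i in range(len(points))]

points = [(1, 1), (1, 1), (1, 1), (0, 0)]
print(displace_nearest_neighbor(points, width=3, height=3))
# [(1, 1), (0, 1), (1, 0), (0, 0)]
```

On a nearly full display the nearest free pixel can be far away, which is the limitation mentioned for dense displays.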

(25)

Chapter 4 - Interaction with

Visualizations

Dynamic linking, brushing and filtering in

Information Visualization displays


(26)

Outline

Definitions

Direct Manipulation (DM)

– Dynamic Queries

– Movable Filters

Common Interaction Techniques

– Select

– Explore

– Reconfigure

– Encode

– Abstract / Elaborate

– Filter

– Connect

(27)
(28)

InfoVis & Interaction

Information Visualization research originally: focus on finding

novel visual representations

Increasing interest in interaction design, HCI models and

evaluation as well as aesthetics in the InfoVis community

HCI Interaction models help us to better understand the

complex concepts of human-machine communication

Norman’s execution-evaluation cycle (Norman 1988)

– 1. Establishing the goal

– 2. Forming the intention

– 3. Specifying the action sequence

– 4. Executing the interaction

– 5. Perceiving the system state

– 6. Interpreting the system state

– 7. Evaluating the system state with respect to the goal

(29)

InfoVis & Interaction

InfoVis systems are composed of two major components:

– Representation

Roots lie in the field of Computer Graphics

Mapping from data to geometrical objects and

How these objects are rendered

– Interaction

Roots lie in the field of Human-Computer Interaction (HCI)

Dialog between user and the system

User explores the data from different perspectives to find

insights

(30)

Representation and Interaction

Although discussed as two separate components,

representation and interaction clearly are not

mutually exclusive.

For example: Interaction with a system may

activate a change in representation.

Representation component has received vast

majority of attention in InfoVis research.

Scan of recent conference proceedings and journal

issues in InfoVis uncovers many articles about new

representations of datasets but interaction is often

relegated to a secondary role.

(31)

Representation and Interaction

Interaction is rarely the main focus of research in InfoVis,

which makes it the “little brother” of InfoVis.

Interaction is overshadowed by the more noteworthy

representation aspects.

Interaction is an essential part of InfoVis

Without it, an InfoVis technique or system becomes a

static image or autonomously animated images.

Static images clearly have analytic and expressive value

But usefulness becomes more limited as the data set

they represent grows larger with more variables

(32)

Representation and Interaction

By the way: Also with a static image such as a poster, a user

(or reader) will often perform interactions

– Rotating the poster

– Looking closer/further

– Jotting down notes on the poster

Spence suggests the notion of “passive interaction” through

which the user’s mental model of the data set is changed

and enhanced.

Finally, through interaction, some limits of a representation

can be overcome, and the cognition of a user can be further

amplified.

Importance of interaction and the need for its further study

seem undisputed.

(33)

Definition

Before discussing interaction in more detail, we need a solid

definition of the term.

But: finding one is challenging!

In the broader context of Human-Computer Interaction (HCI)

it is simply described as

“…the communication between user and the system.”

[Dix et al., 2004]

Becker, Cleveland, and Wilks compactly define interaction

as…

“…direct manipulation and instantaneous change.”

(34)

Definition

“…asymmetry in data rates.”

[Ware, 2000]

The amount of data flowing from InfoVis systems to users is

far greater than from users to systems.

Thus, interaction techniques in InfoVis seem more designed

for changing and adjusting visual representation than for

entering data into systems, which clearly is an important

aspect of interaction in HCI.

(35)

Definition

We view interaction techniques in InfoVis as the

features that provide users with the ability to directly

or indirectly manipulate and interpret representation.

According to this view, a static image or an

autonomously animated representation does not have

associated interaction techniques.

However, a menu interface for changing from a scatter

plot to a parallel coordinates plot is an interaction

technique since it allows a user to manipulate a

representation even if it may be less interactive or

direct.

(36)
(37)

Direct Manipulation (DM)

Shneiderman 1982

DM features

– Visibility of all objects of interest

– Incremental actions with rapid feedback on all actions

– Reversibility of all actions, so that users are encouraged to explore

without penalties

– Syntactic correctness of all actions, so that every user action is a

legal operation

DM does not only make interaction easier for novice

(38)

Example: Brushing

Becker & Cleveland 1987

A collection of dynamic methods for viewing

multidimensional data

Brush is an interactive interface tool to select / mark

subsets of data in a single view, e.g. by sweeping a virtual

brush across items of interest

Given linked views (e.g. scatterplot matrix) the brushing can

support the identification of correlations across multiple

dimensions (brushing & linking)

Usually used to visually filter data (via highlighting)

Additional manipulation / operations may be performed on

the subsets (masking, magnification, labeling etc.)

Different types of brushes (Hauser et al. 2002)

– Simple brush via sweeping

– Composite brush: multiple single-axis brushes combined by the use of

logical operators

– Angular brush

[Figure: composite scatterplot brushes (Hauser et al. 2002)]
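
A composite brush can be sketched as a logical combination of single-axis brushes. The code below is an illustration under stated assumptions, not the implementation of Hauser et al.: each brush is modeled as a predicate over a data item, and the car records and helper names are invented.

```python
def axis_brush(dimension, low, high):
    """Single-axis brush: selects items whose value lies in [low, high]."""
    return lambda item: low <= item[dimension] <= high

def brush_and(*brushes):
    """Composite brush: an item must be selected by every brush."""
    return lambda item: all(brush(item) for brush in brushes)

def brush_or(*brushes):
    """Composite brush: an item must be selected by at least one brush."""
    return lambda item: any(brush(item) for brush in brushes)

def highlighted(items, brush):
    """Names of the items to highlight in all linked views."""
    return [item["name"] for item in items if brush(item)]

# Hypothetical car data.
cars = [
    {"name": "A", "cylinders": 4, "mpg": 34, "weight": 950},
    {"name": "B", "cylinders": 4, "mpg": 22, "weight": 1400},
    {"name": "C", "cylinders": 6, "mpg": 24, "weight": 1500},
    {"name": "D", "cylinders": 8, "mpg": 14, "weight": 1900},
]

four_cylinders = axis_brush("cylinders", 4, 4)
efficient = axis_brush("mpg", 30, 50)
heavy = axis_brush("weight", 1800, 2500)

print(highlighted(cars, brush_and(four_cylinders, efficient)))  # ['A']
print(highlighted(cars, brush_or(efficient, heavy)))            # ['A', 'D']
```

Because the selection is stored per item rather than per view, the same highlighted set can be drawn in every linked view, which is the essence of brushing and linking.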

(39)

Example: Brushing

Brushing one dimension in parallel coordinates to

highlight car data objects with 4 cylinders

(40)
(41)

Angular Brush

Angular brush: brushing by

specifying a slope range –

highlight correlation and outliers

between two dimensions
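
The idea of brushing by slope can be sketched for two adjacent parallel-coordinate axes. This is a hedged illustration only: the min-max normalization of each axis, the unit gap between the axes and the function names are assumptions, and the data are invented.

```python
import math

def normalize(values):
    """Scale values to [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def angular_brush(left, right, min_deg, max_deg, axis_gap=1.0):
    """Indices of items whose line segment between the axes has a slope angle in range."""
    selected = []
    for i, (a, b) in enumerate(zip(normalize(left), normalize(right))):
        angle = math.degrees(math.atan2(b - a, axis_gap))
        if min_deg <= angle <= max_deg:
            selected.append(i)
    return selected

# Items 0..3 rise together on both axes; item 4 breaks the pattern.
left = [0, 1, 2, 3, 4]
right = [0, 10, 20, 30, 0]

print(angular_brush(left, right, -10, 20))   # [0, 1, 2, 3]: the correlated items
print(angular_brush(left, right, -90, -30))  # [4]: the outlier
```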

(42)

Smooth Brush

Non-binary brushing

Degree-of-interest

defined by distance

to brushed range

Decreasing degree is

mapped to

decreasing drawing

intensity

Hauser et al.
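
A smooth brush can be sketched as a degree-of-interest function. The linear falloff and the parameter names are assumptions made for illustration; the slide only states that the degree decreases with distance to the brushed range and is mapped to drawing intensity.

```python
def degree_of_interest(value, low, high, falloff):
    """1.0 inside the brushed range [low, high], fading linearly to 0.0
    at a distance of `falloff` outside it (assumed falloff shape)."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / falloff)

# Brushed range 4..6 with a falloff distance of 2; use the result as opacity.
for value in [3, 5, 7, 10]:
    print(value, degree_of_interest(value, low=4, high=6, falloff=2))
```

A binary brush is the special case in which the degree drops to 0 immediately outside the range.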

(43)

Another Brushing Example

Example of a composite (AND) brush in a parallel

coordinates plot: find the cities with high wages, low

prices and many paid holidays

Demo InfoScope: http://www.macrofocus.com/public/

products/infoscope.html (free trial and applet)

(44)

Dynamic Queries

Shneiderman 1994

Explore and search databases

SQL example:

SELECT customers.customer_id, customer_name, COUNT(order_id) AS total

FROM customers INNER JOIN orders ON customers.customer_id = orders.customer_id

GROUP BY customers.customer_id, customer_name

HAVING COUNT(order_id) > 5

ORDER BY COUNT(order_id) DESC

Problems

– Takes time to learn

– Takes time to formulate and reformulate

– User must know what she is looking for – only exact matches

– Lots of ways to fail

– SQL error messages helpful?

(45)

Dynamic Queries

Based on Direct Manipulation (DM)

DM principles with regard to Dynamic Queries

– Visual presentation of the query’s components

– Visual presentation of results

– Rapid, incremental, and reversible control of the query

– Selection by pointing, not typing

– Immediate, continuous feedback

Implementation approach

– Graphical query formulation: Users formulate queries by

adjusting sliders, pressing buttons, bounding box selection…

– Search results displayed are continuously updated (< 100 ms)
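
The implementation approach can be sketched as a query that is rebuilt from the current widget state and re-run on every change. This is a minimal model, not the HomeFinder code: the `RangeSlider` class, the `run_query` function and the housing records are hypothetical, and a real interface would call the query from the slider's change event and redraw the display.

```python
from dataclasses import dataclass

@dataclass
class RangeSlider:
    """State of one range slider: the attribute it controls and its bounds."""
    attribute: str
    low: float
    high: float

def run_query(records, sliders):
    """Conjunctive filter: a record must satisfy every slider's range."""
    return [r for r in records
            if all(s.low <= r[s.attribute] <= s.high for s in sliders)]

# Hypothetical housing data.
homes = [
    {"id": 1, "price": 180_000, "bedrooms": 2},
    {"id": 2, "price": 250_000, "bedrooms": 3},
    {"id": 3, "price": 320_000, "bedrooms": 4},
    {"id": 4, "price": 400_000, "bedrooms": 3},
]

price = RangeSlider("price", 150_000, 300_000)
bedrooms = RangeSlider("bedrooms", 3, 5)
print([h["id"] for h in run_query(homes, [price, bedrooms])])  # [2]

price.high = 450_000  # the user drags the upper price thumb
print([h["id"] for h in run_query(homes, [price, bedrooms])])  # [2, 3, 4]
```

Every slider position yields a valid query, so the user cannot produce a syntax error, and dragging the thumb back reverses the action.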

(46)

Examples

Visual representations of data to query?

Some examples: geographic data, starfields,

tables etc.

(47)

HomeFinder

One of the first DQ interfaces

(48)

FilmFinder

(49)

Dynamic Query Controls

Check boxes and buttons (Nominal with low cardinality)

Sliders and range slider (ordinal and quantitative data)

Alphaslider (ordinal data) (Ahlberg & Shneiderman 1994)

– Small-sized widget to search sorted lists

– One-line text output

– Two-tiled slider thumb for dragging operations with different

granularities

– Letter index visualizing the distribution of initial letters – jump to a

position in the slider

– Locating an item out of 10,000 items ~ 28s for novice users

– Pros and cons to text entry?

Extend data sliders with data visualization

(Eick 1994)

(50)

Dynamic Queries Online

Online examples:

– Apartment Search (http://immo.search.ch)

– Diamond search (http://www.bluenile.com)

(51)

DQ in current search interfaces

DQ have become widespread with fast search

algorithms and increased computing capacity

– Search happens while search terms are typed into Google Search

– New routes are calculated while a point is dragged in Google

(52)

Making Money with Dynamic Queries

Starfield displays and

Dynamic Queries

provided the basis for

SpotFire

Christopher Ahlberg

– 1991: Visiting student from

Sweden at the HCIL

University of Maryland

– 1996: Founder of SpotFire

– 2007: SpotFire was sold

for $195 million

(53)

Summary Dynamic Queries

Users can rapidly, safely and playfully explore a data space – no

invalid input possible

– Users can rapidly generate new queries based on incidental learning

– Visual representation of data supports data exploration

– Analysis by continuously developing and testing hypotheses (detect

clusters, outliers, trends in multivariate data)

– Provides straightforward undo and reversing of actions

Potential problems with DQ as implemented in the FilmFinder?

– Limit of query complexity – filters are always conjunctive

– Performance is limited for very large data sets and client / server

applications

– Controls require valuable display space

– Information is pruned
