TIES443. Lecture 9: Visualization. Lecture 9. Course webpage: November 17, 2006

UNIVERSITY OF JYVÄSKYLÄ DEPARTMENT OF MATHEMATICAL INFORMATION TECHNOLOGY

Lecture 9: Visualization

TIES443: Introduction to DM

Department of Mathematical Information Technology

University of Jyväskylä

Mykola Pechenizkiy

Course webpage:

http://www.cs.jyu.fi/~mpechen/TIES443

TIES443

November 17, 2006

Topics for today

• Purpose of visualization in DM/KDD

• Visualization of data

– 1D, 2D, 3D, 3+D

• Visualization of data mining results

– Model visualization, model performance visualization

• Visualization of data mining processes

• Application …

– Interactive data mining and visualization

– Time-series visualization

Sources

• Books:

– “Information Visualization in DM and KD”, Fayyad et al.

– “Information Visualization: Beyond the Horizon”, 2nd ed., C. Chen

– “The Visual Display of Quantitative Information”, 2nd ed., E. Tufte

• People: Edward Tufte, Ben Shneiderman, Daniel A. Keim, Marti Hearst, Tamara Munzner, and others

• Excellent handouts and recommended reading on information visualization by Tamara Munzner: http://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/

• Some papers:

– Visual Data Mining Framework for Convenient Identification of Useful Knowledge, http://www.cs.uic.edu/~kzhao/Papers/06_ICDM_Zhao_Visual.pdf

• Lots of other sources …

Visualization & related fields

• User Interface Studies

• Computer Vision

• Cognitive Science

• Computer-Aided Design

• Geometric Modelling

• Computer Graphics

• Signal Processing

• Approximation Theory

Purpose of Visualization in DM/KDD

• Data Visualization

– Qualitative overview of large and complex data sets

– Data summarization and (interactive) exploration

– Regions of interest identification

– Support humans in looking for structures, features, patterns, trends, and anomalies in data

– Perceptual capabilities of the human visual system

• Model (and its performance) Visualization

• Process Visualization

• Disadvantages?

– requires human eyes

– can be misleading

"The purpose of computing is insight, not numbers"

Richard Hamming

Visualization Techniques Categorization

• Based on task at hand

– To explore data

– To confirm a hypothesis

• Based on structure of underlying data set

• Based on the dimension of the display

– Stationary

– Animated

– Interactive

• Geometric vs. Symbolic

– Results of physical models, simulations, computations

– Nonnumeric data

• Networks, relations as graphs; icons, arrays

• 1D-2D-3D, stereoscopic, virtual reality vs. 3+D

• Static vs. Dynamic

Some Basic Visualization Terminology

• Visualization

• Interaction

• Graphical attributes

• Mapping

• Rendering

• Field

• Scalar, Vector, Tensor

• Isosurface

• Glyph

• …

Perception in Visualization

• What graphical primitives do humans detect quickly?

• What graphical attributes can be accurately measured?

• How many distinct values for a particular attribute can be used without confusion?

• How do we combine primitives to recognize complex phenomena?

Most of the following slides on data visualization were adapted from the presentation “Visualization and Data Mining” by G. Piatetsky-Shapiro (available from www.kdnuggets.com)

Data Visualization Outline

• Graphical excellence and lie factor

• Representing data in 1,2, and 3-D

• Representing data in 4+ dimensions

– Parallel coordinates

– Scatterplots

Napoleon Invasion of Russia, 1812

Snow’s Cholera Map, 1855

Asia at night

Bad Visualization: misleading Y-axis

Year   Sales
1999   2110
2000   2105
2001   2120
2002   2121
2003   2124

[Chart: Sales by Year, 1999 to 2003, with the Y-axis running from 2095 to 2130]

The Y-axis scale gives the WRONG impression of a big change

Better Visualization

Year   Sales
1999   2110
2000   2105
2001   2120
2002   2121
2003   2124

[Chart: Sales by Year, 1999 to 2003, with the Y-axis running from 0 to 3000]

A Y-axis that starts at 0 gives the correct impression of a small change
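The effect of the axis choice can be checked with a little arithmetic. The sketch below uses the sales figures from the table and assumes the values are drawn as bars rising from the bottom of the Y-axis, which is an assumption about the chart type. The helper name `apparent_change` is hypothetical. It compares the relative change in the data with the relative change in drawn height for an axis starting at 2095 and for an axis starting at 0.

```python
# Sales figures from the slide's table.
sales = {1999: 2110, 2000: 2105, 2001: 2120, 2002: 2121, 2003: 2124}


def apparent_change(first, last, baseline):
    """Relative change in drawn bar height when the Y-axis starts at `baseline`.

    Assumes each value is drawn as a bar rising from the axis baseline.
    """
    first_height = first - baseline
    last_height = last - baseline
    return (last_height - first_height) / first_height


first, last = sales[1999], sales[2003]

actual = (last - first) / first                  # change in the data itself
truncated = apparent_change(first, last, 2095)   # Y-axis starts at 2095
zero_based = apparent_change(first, last, 0)     # Y-axis starts at 0

print(f"actual change in data:  {actual:.2%}")
print(f"drawn, axis from 2095:  {truncated:.2%}")
print(f"drawn, axis from 0:     {zero_based:.2%}")
```

With the truncated axis the last bar is drawn almost twice as tall as the first, although the data changed by less than one percent. With a zero-based axis the drawn change equals the change in the data.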

Lie Factor = 14.8

(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)

Lie Factor

$$\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}} = \frac{(5.3 - 0.6)/0.6}{(27.5 - 18.0)/18.0} = \frac{7.833}{0.528} = 14.8$$

Tufte requirement: 0.95 < Lie Factor < 1.05

(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)
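The Lie Factor arithmetic can be reproduced in a few lines. This is a minimal sketch using the four numbers from the slide's worked example (sizes 0.6 and 5.3 in the graphic, values 18.0 and 27.5 in the data). The function names `relative_effect`, `lie_factor`, and `meets_tufte_requirement` are hypothetical and chosen only for illustration.

```python
def relative_effect(first, second):
    """Size of effect: relative change from the first value to the second."""
    return (second - first) / first


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return (relative_effect(shown_first, shown_second)
            / relative_effect(data_first, data_second))


def meets_tufte_requirement(factor):
    """Tufte requirement quoted on the slide: 0.95 < Lie Factor < 1.05."""
    return 0.95 < factor < 1.05


factor = lie_factor(0.6, 5.3, 18.0, 27.5)
print(round(factor, 1))                 # 14.8
print(meets_tufte_requirement(factor))  # False
```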

Tufte’s Principles of Graphical Excellence

• Give the viewer

– the greatest number of ideas

– in the shortest time

– with the least ink in the smallest space.

• Tell the truth about the data!

(E.R. Tufte, The Visual Display of Quantitative Information, 2nd edition)

1-D (Univariate) Data

• Representations

2-D (Bivariate) Data

• Scatter plot, …

[Figure: scatter plot with axes price and mileage]

3-D Data (projection)

3-D image (requires 3-D blue and red glasses)

Taken by Mars Rover Spirit, Jan 2004

Visualizing in 4+ Dimensions

• Scatterplots

• Parallel coordinates

• Chernoff faces

• Stick figures

• Star glyphs

• …

Multiple Views

Give each variable its own display

     A  B  C  D  E
1    4  1  8  3  5
2    6  3  4  2  1
3    5  7  2  4  3
4    2  6  3  1  5

[Figure: one display per variable A to E, each showing rows 1 to 4]

Problem: does not show correlations

Scatterplot Matrix

Represent each possible pair of variables in their own 2-D scatterplot (car data)

Q: Useful for what?

A: linear correlations (e.g. horsepower & weight)

Q: Misses what?

A: multivariate effects

For 13 dimensions (columns): Number of histograms = 13
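The slide gives only the histogram count for 13 columns. As a hedged illustration of how quickly a scatterplot matrix grows, the sketch below also counts the distinct pairs of columns, assuming one scatterplot per unordered pair. That pair count is derived here and is not stated on the slide, and `count_views` is a hypothetical helper name.

```python
from itertools import combinations


def count_views(n_columns):
    """Return (histograms, scatterplots) for a data set with n_columns columns.

    One histogram per column; one scatterplot per unordered pair of
    distinct columns (an assumption about how the matrix is counted).
    """
    histograms = n_columns
    scatterplots = len(list(combinations(range(n_columns), 2)))
    return histograms, scatterplots


histograms, scatterplots = count_views(13)
print(histograms, scatterplots)  # 13 78
```

A full square matrix that draws both (x, y) and (y, x) for every pair would show twice as many off-diagonal panels.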

Parallel Coordinates

• Encode variables along a horizontal row

• Vertical line specifies values

Dataset in Cartesian coordinates

Same dataset in parallel coordinates

Invented by Alfred Inselberg while at IBM, 1985

Example: Visualizing Iris Data

sepal length   sepal width   petal length   petal width
5.1            3.5           1.4            0.2
4.9            3             1.4            0.2
...            ...           ...            ...
5.9            3             5.1            1.8

Iris setosa

Iris versicolor

Iris virginica

Parallel Coordinates: 2 D

[Figure: parallel axes Sepal Length (5.1) and Sepal Width (3.5)]

sepal length   sepal width   petal length   petal width
5.1            3.5           1.4            0.2

Parallel Coordinates: 4 D

[Figure: parallel axes Sepal Length (5.1), Sepal Width (3.5), Petal Length (1.4), and Petal Width (0.2)]

sepal length   sepal width   petal length   petal width
5.1            3.5           1.4            0.2

Parallel Visualization of Iris data
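To make the construction concrete, here is a minimal sketch of how records can be turned into parallel-coordinates polylines. It uses only the three Iris rows shown on the earlier slide (the elided rows are not filled in), rescales each variable to the range 0 to 1 with min-max scaling, and places the axes at x = 0, 1, 2, 3. The scaling choice and the helper name `to_polylines` are assumptions for illustration; plotting tools may scale the axes differently.

```python
# The three Iris records shown on the slide (other rows are elided there).
names = ["sepal length", "sepal width", "petal length", "petal width"]
rows = [
    (5.1, 3.5, 1.4, 0.2),
    (4.9, 3.0, 1.4, 0.2),
    (5.9, 3.0, 5.1, 1.8),
]


def to_polylines(rows):
    """Map each record to a list of (axis_index, height) points.

    Each variable gets its own axis; heights are min-max scaled to [0, 1]
    per variable so that axes with different units are comparable.
    """
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    polylines = []
    for row in rows:
        points = []
        for axis, value in enumerate(row):
            span = highs[axis] - lows[axis]
            # A constant column has no spread; draw it at mid-height.
            height = (value - lows[axis]) / span if span else 0.5
            points.append((axis, height))
        polylines.append(points)
    return polylines


polylines = to_polylines(rows)
for row, line in zip(rows, polylines):
    print(row, [(axis, round(height, 2)) for axis, height in line])
```

Connecting the points of one record with straight segments gives one line in the plot; records with similar values produce lines that run close together.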

Parallel Coordinates: Correlation

Hyperdimensional Data Analysis Using Parallel Coordinates. Edward J. Wegman.

Journal of the American Statistical Association, 85(411), Sep 1990, p 664-675.

Hierarchical Parallel Coordinates

Hierarchical Parallel Coordinates for Visualizing Large Multivariate Data Sets. Fua,

Ward, and Rundensteiner, IEEE Visualization 99

Chernoff Faces

Encode different variables’ values in characteristics of a human face

http://www.cs.uchicago.edu/~wiseman/chernoff/

http://hesketh.com/schampeo/projects/Faces/chernoff.html

Chernoff faces, example

Stick Figures

• Two variables are mapped to X, Y axes

• Other variables are mapped to limb lengths and angles

• Texture patterns can show data characteristics

Stick figures, example

Census data showing age, income, sex, education, etc.

Closed figures correspond to women and we can see more of them on the left.

Note also a young woman with high income

Star Glyphs from Xmdv

• Two variables are mapped to X, Y axes

• Other variables are mapped to limb lengths and angles

• Texture patterns can show data characteristics

Model Visualization

DM Model Visualization: Why

• Understanding the model

– allow a user to discuss and explain the logic behind the model

• also with colleagues, customers, and other users

– more than just comprehension; it also involves business context

• financial indicators like profit and cost

– visualization of the DM output in a meaningful way

– allowing the user to interact with the visualization

• simple “what if” questions can be answered

– three pillars (“3 whales”): representation, interaction, and integration

• Trusting the model

– good quantitative measures of "trust"

– the overall goal of "visualizing trust" – understanding the limitations of the model

• Comparing Different Models using Visualization

– as Input-Output Mappings, as Algorithms, as Processes

Model Visualization: Performance Evaluation

Cost curves (Holte)

[Figure: cost curve; both axes run from 0 to 1; x-axis: PC(+), Probability Cost]

Visualization of DM/KDD Process

Applications

• Bioinformatics

• Visual genomics

• Social networks

• Process mining

Visualization of Clusters of Microarray Data

Visual Genomics

Pictures by Christoph W. Sensen

The Result of Process Mining

www.processmining.org

From presentation by Wil van der Aalst, TU/e

Focus+Context

Time Series Spirals

Spiral axis = serial attributes are encoded as line thickness

Radii = periodic attributes

Carlis & Konstan, UIST-98

Independently rediscovered by Weber, Alexa & Müller, InfoVis-01

But dates back to 1888!

[Figure: spiral of one year of power demand data, with spokes labeled Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday]
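A spiral layout of this kind can be sketched with basic trigonometry. The example below assumes an Archimedean spiral in which one full turn corresponds to one period (for example 7 days), so that observations at the same phase of the period fall on the same spoke. The function name `spiral_position` and the `spacing` parameter are hypothetical; the cited papers may use a different parameterization, and the observed value itself would be mapped to a visual attribute such as line thickness.

```python
import math


def spiral_position(t, period, spacing=1.0):
    """Place time step t on an Archimedean spiral with one turn per period.

    The angle encodes the position within the period (the spoke), and the
    radius grows steadily with elapsed time (the serial axis).
    """
    turns = t / period
    angle = 2 * math.pi * (t % period) / period
    radius = spacing * turns
    return radius * math.cos(angle), radius * math.sin(angle)


# Day 1 and day 8 share a weekday, so with period=7 they share a spoke.
x1, y1 = spiral_position(1, period=7)
x8, y8 = spiral_position(8, period=7)
print(round(math.atan2(y1, x1), 4), round(math.atan2(y8, x8), 4))
```

If the chosen period does not match the true period of the data, same-phase points drift across spokes and the pattern smears out.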

Power Consumption

van Wijk and van Selow, Cluster and Calendar based Visualization of Time Series Data, InfoVis99, http://www.win.tue.nl/~vanwijk/clv.pdf

Time Series Spirals

Chimpanzees Monthly Food Intake, 1980-1988

[Figure: spiral with spokes labeled January, April, July, October]

The spokes are months, and spiral guide lines are years

112 types of food

Simple and intuitive, but works only with periodic data, and if the period is known

(c) Eamonn Keogh

ThemeRiver

• Current width = strength of theme

• River width = global strength

• Color mapping (similar themes/same color family)

• Time axis

• External events can be linked

A company’s patent activity 1988 to 1998

Havre, Hetzler, Whitney & Nowell, InfoVis 2000
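The width encoding can be sketched as a stacking computation. The example below assumes that, at each time step, the river is centered on a horizontal line and that each theme's current is a band whose height equals its strength, so the total river width is the sum of the strengths. The centering choice, the helper name `theme_river_bands`, and the sample strengths are illustrative assumptions; the published ThemeRiver system also smooths the boundaries into continuous curves, which is not shown here.

```python
def theme_river_bands(strengths):
    """Compute (lower, upper) band boundaries per theme for each time step.

    `strengths` is a list of time steps; each time step is a list with one
    non-negative strength per theme. The river is centered around y = 0.
    """
    river = []
    for step in strengths:
        total = sum(step)      # river width = global strength
        lower = -total / 2     # center the river vertically
        bands = []
        for strength in step:  # current width = strength of theme
            bands.append((lower, lower + strength))
            lower += strength
        river.append(bands)
    return river


# Made-up strengths for three themes over two time steps.
river = theme_river_bands([[1, 2, 1], [2, 2, 4]])
for bands in river:
    print(bands)
```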

ThemeRiver

• Simple and intuitive

• Many extensions possible

• Scalability is still an issue

[Figure labels: oil; Castro confiscates American oil refineries]

Fidel Castro’s speeches 1960-1961

dot.com stocks 1999-2002

TimeSearcher

Hochheiser and Shneiderman

Comments

• Simple and intuitive

• Highly dynamic exploration

• Query power may be limited and simplistic

• Limited scalability

(c) Eamonn Keogh

10001000101001000101010100001

01010001010111011110101101001

01110100101010011101010101001

01001010101110101010010101010

11010101001011001011101111010

00111000010100001001110101000

11100001010101100101110101

01011001011110011010010000100

01010011011010111000010101011

10111110001101101101111110100

11001001000110100011110011011

01000101111000101101001101100

11010000001001100010011100000

11101001100101100001010010

VizTree

Which set of bit strings is generated by a human and which one is generated by a computer?

The frequencies of all triplets are encoded in the thickness of branches …

Humans usually try to fake randomness by alternating patterns
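The triplet frequencies behind the branch thickness can be counted directly. This sketch takes the first bit string from the slide and counts every overlapping run of three bits with a sliding window. The overlapping window and the helper name `triplet_frequencies` are assumptions for illustration; VizTree itself builds a tree from these counts, which is not reproduced here.

```python
from collections import Counter


def triplet_frequencies(bits, width=3):
    """Count every overlapping substring of `width` bits in the string."""
    return Counter(bits[i:i + width] for i in range(len(bits) - width + 1))


# First bit string from the slide.
bits = "10001000101001000101010100001"
frequencies = triplet_frequencies(bits)

for pattern, count in sorted(frequencies.items()):
    print(pattern, count)
```

A string in which patterns such as `111` or `000` are rare while alternating patterns dominate is a hint that someone tried to make it look random by hand.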

VizTree

“Overview, zoom & filter, details on demand” – main principles of visualization (Ben Shneiderman)

[Figure panels: Overview, Zoom in, Details 1, Details 2]

The “trick” on the previous slide only works for discrete data, but time series are real-valued.

We will discuss how we can SAX up a time series to make it discrete during the next tutorial on time series mining

(c) Eamonn Keogh
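As a preview of what discretizing a real-valued series can look like, here is a minimal sketch of the steps commonly described for SAX (Symbolic Aggregate approXimation): z-normalize the series, average it over equal-width segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split a standard normal distribution into equally probable regions. It is not taken from the tutorial mentioned above. The function name `sax`, the default alphabet, and the sample series are illustrative assumptions, and for simplicity the series length is assumed to be divisible by the number of segments.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, segments, alphabet="abc"):
    """Convert a real-valued series into a short word over `alphabet`."""
    # 1. Z-normalize so the values have mean 0 and standard deviation 1.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma else 0.0 for x in series]

    # 2. Piecewise aggregate approximation: mean of each equal-width segment.
    #    Trailing points are ignored if the length is not divisible.
    size = len(z) // segments
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(segments)]

    # 3. Breakpoints that cut N(0, 1) into equally probable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # 4. Each segment mean becomes the letter of the region it falls in.
    return "".join(alphabet[sum(value > b for b in breakpoints)] for value in paa)


word = sax([1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9], segments=3)
print(word)  # abc
```

Once a series is a string of symbols, subsequence counting of the kind used for the bit strings applies to it as well.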

Visualization Summary

• Purpose of visualization in DM/KDD

• Visualization of data: many methods

– 1D, 2D, 3D

– Visualization is possible in more than 3-D

• Visualization of data mining results

– Model visualization, model performance visualization

• Visualization of data mining processes

• Application …

– Interactive and integrative data mining and visualization

– Time-series, symbolic, networks visualization

What else did you get from this lecture?

One picture may be worth 1000 words!

Additional Slides

Visualization Software

Free and Open-source

• Ggobi

• Xmdv

• Some techniques are available in WEKA and YALE, as you have seen already

• Many more: see www.kdnuggets.com/software/visualization.html
