
Visualization Techniques in Data Mining


Academic year: 2021


(1)

Prof. Pier Luca Lanzi

Degree program in Computer Engineering (Laurea in Ingegneria Informatica)

Politecnico di Milano

Polo di Milano Leonardo

Machine Learning Techniques for

Data Mining Applications

Visualization Techniques

in Data Mining

(2)

Outline

• Goals of visualization

• Advantages

• Methodologies

• Techniques

• User interaction

• Problems

(3)

Goals of Data Visualization

Today there is the need to manage a huge

amount of data, and computer systems help

us in this task

Visual Data Mining helps to deal with this

flood of information, integrating the human

in the data analysis process

Visual Data Mining allows the user to gain

insight into the data, drawing conclusions

and directly interacting with the data

(4)

Advantages of visualization

techniques

The main advantages of the application of Visual

data mining techniques are:

Visual data exploration can easily deal with very large,

highly non-homogeneous, and noisy data sets

Visual data exploration requires no understanding of

complex mathematical or statistical algorithms

Visualization techniques provide a qualitative overview

useful for further quantitative analysis

(5)

Approach methodologies

Confirmative Analysis:

starting point: hypotheses about the data

result: visualization of the data allowing confirmation or rejection of the hypotheses

Presentation:

starting point: facts to be presented are fixed a priori

result: high-quality visualization of the data presenting the facts

Explorative Analysis:

starting point: data without hypotheses

(6)

Visualization techniques

• Geometric techniques: scatterplot matrices, HyperSlice,

parallel coordinates

• Pixel-oriented techniques: simple line-by-line, spiral and

circle segments

• Hierarchical techniques: Treemap, cone trees

• Graph-based techniques: 2D and 3D graphs

• Distortion techniques: hyperbolic tree, fisheye view,

perspective wall

(7)

Geometric techniques

Basic idea:

Visualization of geometric transformations and

projections of the data

Methods:

Scatterplot matrices

Hyperslice

(8)

Scatterplot matrices

A scatterplot matrix is composed of scatter plots of all possible pairs of variables in a dataset

For an N-dimensional dataset, there are $(N^2 - N)/2$ distinct pairs of variables, each shown as a two-dimensional plot
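As a small illustration of the pair count above, the following sketch enumerates the panels of a scatterplot matrix using only the standard library. The attribute names are hypothetical and are not taken from the slides.

```python
from itertools import combinations

def scatterplot_pairs(attributes):
    """Return every unordered pair of distinct attributes (one plot per pair)."""
    return list(combinations(attributes, 2))

# Hypothetical attribute names, chosen only for illustration.
attributes = ["age", "income", "height", "weight"]
pairs = scatterplot_pairs(attributes)

n = len(attributes)
expected_plots = (n * n - n) // 2  # the (N^2 - N) / 2 formula
print(len(pairs), expected_plots)
```

A full scatterplot matrix usually draws each pair twice (above and below the diagonal, with the axes swapped); the count here covers only the distinct pairs.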

(9)

Hyperslice

HyperSlice is an extension of the scatterplot matrix

It represents a multi-dimensional function as a matrix of orthogonal two-dimensional slices
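The idea of orthogonal two-dimensional slices can be sketched as follows. This is a minimal illustration, not the original HyperSlice implementation: the names `hyperslice`, `focus`, `half_width` and `steps` are assumptions, and only the off-diagonal pairs of dimensions are sampled.

```python
from itertools import combinations

def hyperslice(f, focus, half_width=1.0, steps=5):
    """For each pair of dimensions (i, j), sample f on a 2-D grid centred on
    `focus`, holding every other coordinate fixed at its focus value."""
    slices = {}
    for i, j in combinations(range(len(focus)), 2):
        grid = []
        for a in range(steps):
            row = []
            for b in range(steps):
                point = list(focus)
                # Move only along dimensions i and j; the rest stay at the focus.
                point[i] = focus[i] - half_width + 2 * half_width * a / (steps - 1)
                point[j] = focus[j] - half_width + 2 * half_width * b / (steps - 1)
                row.append(f(point))
            grid.append(row)
        slices[(i, j)] = grid
    return slices

def sphere(p):
    """Hypothetical test function: sum of squared coordinates."""
    return sum(x * x for x in p)

slices = hyperslice(sphere, focus=[0.0, 0.0, 0.0])
```

Each entry of `slices` is one panel of the matrix; moving the focus point and recomputing gives the interactive navigation the technique is known for.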

(10)

Parallel Coordinates

The axes are defined as parallel

vertical lines separated

A point in Cartesian coordinates corresponds to a polyline in

parallel coordinates

Able to visualize data that may be occluded in Cartesian coordinates
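The point-to-polyline mapping can be sketched as below. The sketch assumes equally spaced axes and rescales each attribute to [0, 1] along its own axis; the function name and the sample ranges are hypothetical.

```python
def to_polyline(point, minima, maxima):
    """Map an N-dimensional point to N polyline vertices (x, y).

    x is the index of the parallel axis (equal spacing is an assumption);
    y is the attribute value rescaled to [0, 1] along that axis.
    """
    vertices = []
    for axis, (value, lo, hi) in enumerate(zip(point, minima, maxima)):
        # A constant attribute has no range; place it mid-axis.
        y = (value - lo) / (hi - lo) if hi != lo else 0.5
        vertices.append((axis, y))
    return vertices

polyline = to_polyline(
    [5.0, 20.0, 0.0],
    minima=[0.0, 10.0, -1.0],
    maxima=[10.0, 30.0, 1.0],
)
```

Drawing one such polyline per record produces the parallel coordinates plot; records with similar values trace similar paths across the axes.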

(11)

Pixel-oriented techniques

Basic idea:

The basic idea of pixel-oriented techniques is to map each

data value to a colored pixel

Each attribute value is represented by a pixel with a color

tone proportional to a relevance factor in a separate

window

Methods:

Simple Arrangement Line-by-Line
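A minimal sketch of the line-by-line arrangement for one attribute window follows. It assumes the relevance factor is simply the value rescaled to the attribute's range and uses a gray level (0 to 255) in place of a color tone; both are simplifications made for illustration.

```python
def line_by_line(values, width):
    """Arrange one attribute's values row by row in a window `width` pixels wide.

    The k-th value lands at row k // width, column k % width. Each pixel holds
    a gray level proportional to the rescaled value (an assumed relevance factor).
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant attributes
    rows = []
    for start in range(0, len(values), width):
        chunk = values[start:start + width]
        rows.append([round(255 * (v - lo) / span) for v in chunk])
    return rows

window = line_by_line([0, 2, 4, 6, 8, 10], width=3)
```

One such window is built per attribute, with the same record occupying the same pixel position in every window so that patterns can be compared across attributes.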

(12)

Pixel-oriented techniques

(13)

Pixel-oriented techniques

• Spiral

(14)

Hierarchical techniques

Basic idea:

Visualization of the data using a hierarchical

partitioning into two- or three-dimensional

subspaces

Methods:

Treemap

(15)

Treemap

Visualization of hierarchical collections of quantitative data, such as files on a hard drive, financial analysis, bioinformatics, etc.

Divides a limited display area into a sequence of rectangles whose areas correspond to an attribute of the data set
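The subdivision of the display area can be sketched with the simple slice-and-dice layout, which alternates the split direction at each level of the hierarchy. The slides do not say which layout algorithm is meant, so this choice, the dictionary format and the sample tree are all assumptions; sizes are assumed to be positive.

```python
def weight(node):
    """Total size of a node: its own size for a leaf, the sum of its children otherwise."""
    children = node.get("children")
    if not children:
        return node["size"]
    return sum(weight(c) for c in children)

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, w, h) rectangles for the leaves of the hierarchy."""
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = weight(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        share = weight(child) / total
        if depth % 2 == 0:  # even levels: split along the x axis
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:               # odd levels: split along the y axis
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out

# Hypothetical hierarchy, e.g. files and a directory with sizes.
tree = {"name": "root", "children": [
    {"name": "a", "size": 2},
    {"name": "dir", "children": [{"name": "b", "size": 1}, {"name": "c", "size": 1}]},
]}
rects = slice_and_dice(tree, 0.0, 0.0, 8.0, 4.0)
```

Each leaf's rectangle area is proportional to its size, so the largest items dominate the display.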

(16)

Cone trees

A 3-dimensional extension of the more familiar

2-D hierarchical tree structures, intended to make

navigation and display of information more intuitive

(17)

Graph-based visualization

• Graphs (edges + nodes) with labels and

attributes

• Used where the emphasis is on relationships among data

(databases, telecom)

• Coordinates not always meaningful

• Useful for discovering patterns

(18)

Graph-based visualization

• Color and thickness code values

• Asymmetric relations:

(19)

Graph-based visualization

(20)

Graph-based visualization

• 3D graphs:

– more room for objects

– different points of view

(21)

Focus vs. context

• Too much data for too small a screen

• Solutions:

– dual views (detailed + global)

(22)

Distortion

• Hyperbolic tree

• Fisheye view
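To make the idea of distortion concrete, here is a one-dimensional fisheye sketch. The slides do not give a formula; the magnification function used here, g(t) = (d + 1) t / (d t + 1), is the one commonly attributed to Sarkar and Brown's graphical fisheye views, and the parameter names are assumptions.

```python
def fisheye(x, focus, d=3.0, lo=0.0, hi=1.0):
    """Distort position x in [lo, hi]: magnify near `focus`, compress near the edges.

    d is the distortion factor (d = 0 leaves positions unchanged).
    """
    if x == focus:
        return x
    boundary = hi if x > focus else lo
    d_max = boundary - focus          # signed distance from focus to the boundary
    if d_max == 0:
        return x
    t = (x - focus) / d_max           # normalized distance from the focus, in [0, 1]
    g = (d + 1) * t / (d * t + 1)     # assumed magnification function
    return focus + g * d_max

positions = [0.0, 0.4, 0.5, 0.6, 1.0]
distorted = [fisheye(x, focus=0.5) for x in positions]
```

Points close to the focus are pushed apart (more room for detail), while the boundaries stay fixed so the global context remains visible.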

(23)

User interaction

• Brushing: selecting points or regions

• Linking: multiple views work together
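Brushing and linking can be sketched without any graphics: a brush selects record indices in one view, and a second view of the same records flags them. The record fields and function names below are hypothetical.

```python
def brush(records, x_attr, y_attr, x_range, y_range):
    """Return the indices of records whose (x, y) falls inside the brushed rectangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    return {
        i for i, r in enumerate(records)
        if x0 <= r[x_attr] <= x1 and y0 <= r[y_attr] <= y1
    }

def linked_view(records, x_attr, y_attr, selected):
    """Describe another view of the same records, flagging the brushed ones."""
    return [(r[x_attr], r[y_attr], i in selected) for i, r in enumerate(records)]

# Hypothetical records shared by both views.
records = [
    {"age": 25, "income": 30, "height": 170},
    {"age": 40, "income": 55, "height": 180},
    {"age": 60, "income": 42, "height": 165},
]
selected = brush(records, "age", "income", (30, 65), (40, 60))
view = linked_view(records, "age", "height", selected)
```

The key design point is that the selection is stored as record indices rather than screen coordinates, so every view can highlight the same records regardless of which attributes it shows.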

(24)

User interaction

• Dynamic projections and rotations

– Interactively and continuously moving through

subspaces

• Dynamic queries

– Visual interface (buttons and sliders)

– Incremental behavior (undo)
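A dynamic query with incremental behavior might look like the sketch below, where each slider change is pushed onto a history stack so that it can be undone. The class name, its methods and the sample records are assumptions made for illustration.

```python
class DynamicQuery:
    """Range filters driven by sliders, with an undo history."""

    def __init__(self, records):
        self.records = records
        self.history = [{}]  # stack of slider settings; the last one is current

    def set_range(self, attr, lo, hi):
        """Simulate moving a slider: restrict `attr` to [lo, hi]."""
        settings = dict(self.history[-1])
        settings[attr] = (lo, hi)
        self.history.append(settings)
        return self.result()

    def undo(self):
        """Revert the most recent slider change, if any."""
        if len(self.history) > 1:
            self.history.pop()
        return self.result()

    def result(self):
        """Records that satisfy every active range filter."""
        current = self.history[-1]
        return [
            r for r in self.records
            if all(lo <= r[a] <= hi for a, (lo, hi) in current.items())
        ]

# Hypothetical records.
people = [
    {"age": 25, "income": 30},
    {"age": 40, "income": 55},
    {"age": 60, "income": 42},
]
query = DynamicQuery(people)
after_age = query.set_range("age", 30, 65)
after_income = query.set_range("income", 50, 60)
after_undo = query.undo()
```

In an interactive tool the result would be redrawn after every slider movement, which is what gives dynamic queries their immediate feedback.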

(25)

Problems

• Missing attributes

– Ignore

– Fill blanks with:

• a predefined constant

• a value extracted according to the inferred

distribution
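The three options for missing attributes can be sketched as follows. Missing values are represented as `None`, and "the inferred distribution" is approximated by resampling from the observed values; both are simplifying assumptions, and the function names are hypothetical.

```python
import random

def ignore_missing(values):
    """Option 1: drop the missing entries."""
    return [v for v in values if v is not None]

def fill_constant(values, constant):
    """Option 2a: replace each missing entry with a predefined constant."""
    return [constant if v is None else v for v in values]

def fill_from_distribution(values, rng):
    """Option 2b: replace each missing entry with a draw from the observed values
    (the empirical distribution stands in for the inferred one)."""
    observed = ignore_missing(values)
    return [rng.choice(observed) if v is None else v for v in values]

column = [3.0, None, 5.0, None, 4.0]
rng = random.Random(0)  # fixed seed so the example is repeatable

dropped = ignore_missing(column)
filled_const = fill_constant(column, 0.0)
filled_sampled = fill_from_distribution(column, rng)
```

Filling with a constant keeps every record visible but can create an artificial cluster at that value, whereas sampling preserves the overall shape of the attribute's distribution.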

(26)

Problems

• Large data sets

– Typical screens have one million pixels

– Subsampling

– Voxel/pixel bins

– Jittering

• Large number of attributes
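The remedies listed above for large data sets (subsampling, pixel bins, jittering) can be sketched with the standard library alone. Function names, the bin size and the sample points are assumptions.

```python
import random
from collections import Counter

def subsample(points, k, rng):
    """Keep a random subset of at most k points."""
    return rng.sample(points, min(k, len(points)))

def bin_counts(points, cell):
    """Count the points that fall in each square bin of side `cell`."""
    return Counter((int(x // cell), int(y // cell)) for x, y in points)

def jitter(points, amount, rng):
    """Add small uniform noise so that overlapping points become distinguishable."""
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]

# Hypothetical points; the last two coincide exactly.
points = [(0.2, 0.3), (0.4, 0.1), (1.5, 0.5), (1.5, 0.5)]
rng = random.Random(42)  # fixed seed so the example is repeatable

sample = subsample(points, 2, rng)
bins = bin_counts(points, cell=1.0)
moved = jitter(points, 0.05, rng)
```

Binning replaces individual points with a count per pixel or voxel, which can then be mapped to color; jittering keeps individual points but trades a little positional accuracy for visibility.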

(27)

Conclusions

• Human and computer skills can be integrated

with visual data mining

• Visualization may be useful for:

– understanding what is happening

– searching for novel patterns

(28)

References (I)

D. A. Keim. “Visual Techniques for Exploring Databases”. Int. Conference on Knowledge Discovery in Databases, 1997.

D. A. Keim. “Information visualization and visual data mining”. IEEE Trans. on Visualization and Computer Graphics, Jan. 2002, vol. 8, no. 1, pp. 1-8.

J. Van Wijk, R. Van Liere. “HyperSlice - Visualization of scalar functions of many variables”. IEEE Visualization, 1993, pp. 119-125.

P. C. Wong, A. H. Crabb, R. D. Bergeron. “Dual multiresolution HyperSlice for multivariate data visualization”. InfoVis 1996

D. A. Keim. “Pixel-oriented Database Visualizations”. SIGMOD RECORD, Special Issue on Information Visualization, 1996.

M. Ankerst, D. A. Keim, H.-P. Kriegel. “Circle Segments: A Technique for Visually Exploring Large Multidimensional Data Sets”. Visualization

(29)

References (II)

R. A. Becker, S. G. Eick, A. R. Wilks. “Visualizing Network Data”. IEEE Trans. on Visualization and Computer Graphics, Mar. 1995, vol. 1, no. 1, pp. 16-28.

R. J. Hendley, N. S. Drew, A. M. Wood, R. Beale. “Narcissus: visualising information”. InfoVis 1995, p. 90

T. A. Keahey, E. L. Robertson. “Techniques for non-linear magnification transformations”. InfoVis 1996

J. Lamping, R. Rao, P. Pirolli. “A focus+context technique based on hyperbolic geometry for visualizing large hierarchies”. CHI '95, pp. 401-408

J. D. Mackinlay, G. G. Robertson, S. K. Card. “The perspective wall: detail and context smoothly integrated”. CHI '91, pp. 173-176
