
Self-Organizing Maps (SOM) COMP61021 Modelling and Visualization of High Dimensional Data


Academic year: 2021


Self-Organizing Maps (SOM)

Ke Chen


Outline


• Introduction

• Biological Motivation

• Kohonen SOM

• Learning Algorithm

• Visualization Method

• Examples

• Relevant Issues

• Conclusions


Introduction

• Self-organizing maps (SOM)

– SOM is a biologically inspired unsupervised neural network that approximates a potentially unlimited number of input data points by a finite set of nodes arranged in a low-dimensional grid, where neighboring nodes correspond to more similar inputs.

– The model is produced by a learning algorithm that automatically orders the inputs on a one- or two-dimensional grid according to their mutual similarity.

– Useful for cluster analysis and data visualization.


Biological Motivation

• Mapping two-dimensional continuous inputs from sensory organs (eyes, ears, skin, etc.) to two-dimensional discrete outputs in the nervous system.

– Retinotopic map: from eye (retina) to the visual cortex.

– Tonotopic map: from the ear to the auditory cortex.

• These maps preserve the topographic order of their inputs.

• Biological evidence shows that the connections in these maps are not entirely “pre-programmed” or “pre-wired” at birth. Learning must occur after birth to create the connections needed for an appropriate topographic mapping.


Kohonen SOM


Competition
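In the standard Kohonen formulation, the competition step picks the best-matching unit (BMU). The BMU is the neuron whose weight vector lies closest to the input: i(x) = argmin_j ||x - w_j||. The sketch below assumes Euclidean distance and stores each weight vector as a plain Python list. The function names are illustrative, not taken from the slides.

```python
def squared_distance(x, w):
    """Squared Euclidean distance between input x and weight vector w."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def best_matching_unit(x, weights):
    """Index of the neuron whose weight vector is closest to x (the winner).

    Ties are resolved in favour of the lowest index.
    """
    return min(range(len(weights)), key=lambda j: squared_distance(x, weights[j]))


# Four neurons with 2-D weight vectors (hypothetical values).
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(best_matching_unit([0.9, 0.2], weights))  # 1
```

Comparing squared distances gives the same winner as comparing distances and avoids computing a square root.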


Cooperation
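In the cooperation step, the winner excites its neighbors on the lattice. This sketch assumes a common choice, a Gaussian neighborhood h = exp(-d² / (2σ²)). Here d is the distance between the winner and neuron j on the grid, not in weight space. σ is the neighborhood width, which shrinks during training. The helper names are hypothetical.

```python
import math


def lattice_distance_sq(pos_a, pos_b):
    """Squared Euclidean distance between two neurons' (row, col) grid coordinates."""
    return (pos_a[0] - pos_b[0]) ** 2 + (pos_a[1] - pos_b[1]) ** 2


def gaussian_neighborhood(winner_pos, neuron_pos, sigma):
    """h = exp(-d^2 / (2 sigma^2)): equals 1 at the winner, decays with lattice distance."""
    return math.exp(-lattice_distance_sq(winner_pos, neuron_pos) / (2.0 * sigma ** 2))


# Grid coordinates of a 3 x 3 lattice with the winner in the centre.
grid = [(r, c) for r in range(3) for c in range(3)]
winner = (1, 1)
for pos in grid:
    print(pos, round(gaussian_neighborhood(winner, pos, sigma=1.0), 3))
```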


Adaptation

(See the learning algorithm below for details.)
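In the adaptation step, each weight vector moves towards the input in proportion to the learning rate and its neighborhood value: w_j ← w_j + η·h_{j,i(x)}·(x - w_j). The sketch assumes the neighborhood values have already been computed, for example with a Gaussian. `adapt` is a hypothetical name.

```python
def adapt(weights, x, neighborhood, eta):
    """Move every weight vector towards x, scaled by eta and its neighborhood value.

    weights:      list of weight vectors (modified in place and returned)
    neighborhood: list holding h_{j,i(x)} for each neuron j
    eta:          learning rate
    """
    for j, w in enumerate(weights):
        step = eta * neighborhood[j]
        for k in range(len(w)):
            w[k] += step * (x[k] - w[k])
    return weights


weights = [[0.0, 0.0], [1.0, 1.0]]
adapt(weights, [1.0, 0.0], neighborhood=[1.0, 0.5], eta=0.5)
print(weights)  # [[0.5, 0.0], [1.0, 0.75]]
```

The winner (h = 1) moves halfway to x. Its neighbor (h = 0.5) moves only a quarter of the way.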


Learning Algorithm
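Combining competition, cooperation and adaptation gives a training loop. Each iteration samples an input, finds its BMU, and updates every neuron weighted by its neighborhood value, while the learning rate η(t) and the width σ(t) decay. This sketch assumes the exponential schedules η(t) = η0·exp(-t/τ2) and σ(t) = σ0·exp(-t/τ1), a common textbook choice. The algorithm on the original slide may use different parameters. `train_som`, `bmu_index` and their defaults are hypothetical.

```python
import math
import random


def bmu_index(x, weights):
    """Competition: index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[j])))


def train_som(data, rows, cols, n_iter=2000, eta0=0.1, sigma0=None, seed=0):
    """Train a rows x cols Kohonen SOM on data (a list of equal-length vectors).

    Assumed schedules (hypothetical defaults):
        eta(t)   = eta0   * exp(-t / tau2)
        sigma(t) = sigma0 * exp(-t / tau1)
    Returns the grid coordinates and the trained weight vectors.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise the weights with small random values.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in grid]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # start with a wide neighborhood
    # When sigma0 > 1, tau1 is chosen so that sigma shrinks to about 1 by the end.
    tau1 = n_iter / math.log(sigma0) if sigma0 > 1 else float(n_iter)
    tau2 = float(n_iter)

    for t in range(n_iter):
        x = rng.choice(data)  # sampling
        eta = eta0 * math.exp(-t / tau2)
        sigma = sigma0 * math.exp(-t / tau1)
        wr, wc = grid[bmu_index(x, weights)]
        for (r, c), w in zip(grid, weights):
            # Cooperation: Gaussian neighborhood measured on the lattice.
            h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2.0 * sigma ** 2))
            # Adaptation: move the weight vector towards x.
            for k in range(dim):
                w[k] += eta * h * (x[k] - w[k])
    return grid, weights


# Two well-separated clusters in 2-D.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
grid, weights = train_som(data, rows=3, cols=3)
print(grid[bmu_index([0.0, 0.0], weights)], grid[bmu_index([1.0, 1.0], weights)])
```

After training, the two clusters should be won by different neurons on the lattice.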



Visualization Method

• In 2-D/3-D space, neurons are visualized as points whose positions in the weight space change as learning takes place; each neuron is represented by its weight vector.

• Two neurons are connected by an edge if they are direct neighbors in the neural network lattice. For 2-D/3-D data, the lattice can therefore be drawn in the original data space using the neurons' weight vectors.

• For high-dimensional data, a unified distance matrix (U-matrix) is constructed to facilitate visualization:

– the distance between neighboring neurons approximates the distance between different parts of the underlying data

– when the U-matrix is depicted as an image, similar colors indicate closely spaced nodes and distinct colors indicate more distant nodes

– groups of similar colors can be regarded as clusters, and the contrasting parts as the boundary regions between them
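A simplified U-matrix can be computed by averaging, for each neuron, the weight-space distance to its direct (4-connected) lattice neighbors. Full U-matrix implementations often also store each neighbor-pair distance in an enlarged grid. The per-neuron average below is an assumed simplification, and `u_matrix` is a hypothetical name.

```python
import math


def u_matrix(weights, rows, cols):
    """Simplified U-matrix: mean distance from each neuron to its direct lattice neighbors.

    weights[r * cols + c] is the weight vector of the neuron at grid cell (r, c).
    High values suggest cluster boundaries; low values suggest cluster interiors.
    """
    def w(r, c):
        return weights[r * cols + c]

    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connected neighbors
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(w(r, c), w(nr, nc)))
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat


# A 1 x 4 map whose weights form two groups: {0.0, 0.1} and {0.9, 1.0}.
umat = u_matrix([[0.0], [0.1], [0.9], [1.0]], rows=1, cols=4)
print([round(v, 2) for v in umat[0]])
```

The two middle neurons get the largest values because they sit on the boundary between the two groups.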


• Example: U-Matrix


Examples

Example 1: 1-D self-organizing map


Example 2: 2-D self-organizing map


Example 3: self-organizing maps of synthetic data sets

After SOM learning converges, we obtain SOMs for different data distributions.


Example 4: Taxonomy of animals

Animal names and their attributes

                  Dove  Hen   Duck  Goose Owl   Hawk  Eagle Fox   Dog   Wolf  Cat   Tiger Lion  Horse Zebra Cow
is small          1     1     1     1     1     1     0     0     0     0     1     0     0     0     0     0
is medium         0     0     0     0     0     0     1     1     1     1     0     0     0     0     0     0
is big            0     0     0     0     0     0     0     0     0     0     0     1     1     1     1     1
has 2 legs        1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
has 4 legs        0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
has hair          0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
has hooves        0     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1
has mane          0     0     0     0     0     0     0     0     0     1     0     0     1     1     1     0
has feathers      1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
likes to hunt     0     0     0     0     1     1     1     1     0     1     1     1     1     0     0     0
likes to run      0     0     0     0     0     0     0     0     1     1     0     1     1     1     1     0
likes to fly      1     0     0     1     1     1     1     0     0     0     0     0     0     0     0     0
likes to swim     0     0     1     1     0     0     0     0     0     0     0     0     0     0     0     0

A grouping with SOM according to similarity has emerged.
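The attribute table above can be encoded as one 13-dimensional binary vector per animal and fed to a small SOM. This sketch assumes a 6 × 6 lattice, a Gaussian neighborhood, and geometric decay of η and σ. These settings are illustrative, not the ones used for the map on the slide, so the exact cell positions will differ. Animals with identical attribute vectors (Horse and Zebra, Owl and Hawk) necessarily share a BMU. `train` and `bmu` are hypothetical helpers.

```python
import math
import random

ANIMALS = ["Dove", "Hen", "Duck", "Goose", "Owl", "Hawk", "Eagle", "Fox",
           "Dog", "Wolf", "Cat", "Tiger", "Lion", "Horse", "Zebra", "Cow"]
# One row per attribute, one column per animal (same order as ANIMALS).
ATTRIBUTES = {
    "small":    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "medium":   [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "big":      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "2 legs":   [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "4 legs":   [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "hair":     [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "hooves":   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    "mane":     [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0],
    "feathers": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "hunt":     [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0],
    "run":      [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0],
    "fly":      [1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "swim":     [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
# Transpose the table: one binary attribute vector per animal.
VECTORS = {name: [float(row[i]) for row in ATTRIBUTES.values()]
           for i, name in enumerate(ANIMALS)}


def bmu(x, grid, weights):
    """Grid position of the neuron whose weight vector is closest to x."""
    j = min(range(len(grid)), key=lambda i: math.dist(x, weights[i]))
    return grid[j]


def train(data, rows=6, cols=6, n_iter=3000, eta0=0.5, seed=1):
    """Hypothetical SOM trainer with geometric decay of eta and sigma."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in data[0]] for _ in grid]
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        eta = eta0 * (0.02 / eta0) ** frac        # decays from eta0 towards 0.02
        sigma = sigma0 * (0.5 / sigma0) ** frac   # shrinks from sigma0 towards 0.5
        x = rng.choice(data)
        wr, wc = bmu(x, grid, weights)
        for (r, c), w in zip(grid, weights):
            h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2 * sigma ** 2))
            for k in range(len(w)):
                w[k] += eta * h * (x[k] - w[k])
    return grid, weights


grid, weights = train(list(VECTORS.values()))
for name, vec in VECTORS.items():
    print(f"{name:6s} -> cell {bmu(vec, grid, weights)}")
```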


Example 5: Macroeconomic data analysis

Factors: annual increase (%), infant mortality (‰), illiteracy ratio (%), school attendance (%), GIP, annual GIP increase (%) (1990)


Example 5: Macroeconomic data analysis (cont.)

Applying PCA and SOM to this data set yields different groupings.


Relevant Issues

• Training: order phase vs. convergence phase

– Order phase

• The topological ordering of the weight vectors takes place.


• It may take 1000 or more iterations of the SOM algorithm.

• The choice of the parameter values is important.

• With a proper initial setting of the parameters, the neighborhood of the winning neuron initially includes almost all neurons in the network and then shrinks slowly with time.

– Convergence phase

• Fine-tunes the weight vectors.

• The number of iterations must be at least 500 times the number of neurons in the network, i.e. thousands or tens of thousands of iterations.

• Choice of parameter values:

– η(t) should be maintained on the order of 0.01.

– The neighborhood function should be such that the neighborhood of the BMU contains only its nearest neighbors, eventually shrinking to one or zero neighboring neurons.
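The two training phases can be written as a parameter schedule. This sketch assumes a Gaussian neighborhood of width σ. During ordering, η decays from 0.1 and σ shrinks from the lattice radius to about 1 over 1000 iterations. During convergence, η stays at 0.01 and σ is fixed at 0.5, so that essentially only the nearest neighbors are updated, for 500 × (number of neurons) iterations. The exact decay laws and the value 0.5 are assumptions, and `two_phase_schedule` is a hypothetical name.

```python
import math


def two_phase_schedule(n_neurons, lattice_radius, eta0=0.1, order_iters=1000,
                       final_eta=0.01, final_sigma=0.5):
    """Yield (phase, t, eta, sigma) for an ordering phase followed by a convergence phase.

    Assumed schedule (illustrative; requires lattice_radius > 1):
      ordering:    eta(t)   = eta0 * exp(-t / order_iters)
                   sigma(t) = lattice_radius * exp(-t / tau1), with
                   tau1 = order_iters / ln(lattice_radius), so sigma ends near 1
      convergence: eta = final_eta and sigma = final_sigma (a Gaussian width at which
                   essentially only the nearest neighbors are updated),
                   for 500 * n_neurons iterations
    """
    tau1 = order_iters / math.log(lattice_radius)
    for t in range(order_iters):
        eta = eta0 * math.exp(-t / order_iters)
        sigma = lattice_radius * math.exp(-t / tau1)
        yield "ordering", t, eta, sigma
    for t in range(500 * n_neurons):
        yield "convergence", order_iters + t, final_eta, final_sigma


# A 10 x 10 lattice: 100 neurons, radius 5.
steps = list(two_phase_schedule(n_neurons=100, lattice_radius=5.0))
print(len(steps), steps[0], round(steps[999][3], 3), steps[-1])
```

Using a generator keeps the schedule separate from the training loop, so the same loop can be reused with a different schedule.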


• SOM extension

– PSOM: continuous projection by interpolating between centroid locations

– disSOM: SOM on dissimilarities between objects, which are more general than distances

– Hierarchical SOM: from a single layer to multiple layers for multi-scale data analysis

– Generative topographic map (GTM): a probabilistic counterpart of the SOM


Conclusions


• SOM is a biologically inspired neural network for high dimensional data clustering and visualization.

• Its most important property is topology preservation.

• Learning involves two phases: ordering and convergence.

• There is no guarantee that SOM learning always converges; hence parameter tuning is needed.

• There are several variants and extensions that aim to overcome the limitations of the SOM.

• There are a number of successful applications of SOM.
