Self-Organizing Maps (SOM)
Ke Chen
Outline
• Introduction
• Biological Motivation
• Kohonen SOM
• Learning Algorithm
• Visualization Method
• Examples
• Relevant Issues
• Conclusions
Introduction
• Self-organizing maps (SOM)
– SOM is a biologically inspired unsupervised neural network that approximates an unlimited number of input data points by a finite set of nodes arranged in a low-dimensional grid, where neighboring nodes correspond to more similar input data.
– The model is produced by a learning algorithm that automatically orders the inputs on a one- or two-dimensional grid according to their mutual similarity.
– Useful for clustering analysis and data visualization
Biological Motivation
• Mapping two-dimensional continuous inputs from sensory organs (eyes, ears, skin, etc.) to two-dimensional discrete outputs in the nervous system.
– Retinotopic map: from eye (retina) to the visual cortex.
– Tonotopic map: from the ear to the auditory cortex.
• These maps preserve the topographic order of the input.
• Biological evidence shows that the connections in these maps are not entirely “pre-programmed” or “pre-wired” at birth. Learning must occur after birth to create the necessary connections for appropriate topographic mapping.
Kohonen SOM
• Competition
• Cooperation
• Adaptation
(see the algorithm on the next slide for details)
Learning Algorithm
neurons i and k
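A minimal sketch of the three Kohonen SOM steps (competition, cooperation, adaptation) in plain Python. It assumes a Gaussian neighborhood over lattice distance and exponentially decaying η(t) and σ(t); the names `train_som`, `sq_dist`, `eta0`, `sigma0`, and the particular decay schedule are illustrative assumptions, not taken from the slides.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_som(data, rows, cols, n_iter=2000, eta0=0.1, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns (lattice coordinates, weight vectors)."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumed initialisation: random weights in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in nodes]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # start with a wide neighborhood
    # Assumed schedule: sigma shrinks to about 1 after n_iter steps.
    tau = n_iter / math.log(sigma0) if sigma0 > 1.0 else float(n_iter)

    for t in range(n_iter):
        x = rng.choice(data)
        # Competition: the best-matching unit (BMU) is the closest weight vector.
        bmu = min(range(len(nodes)), key=lambda j: sq_dist(x, weights[j]))
        br, bc = nodes[bmu]
        # Learning rate and neighborhood width decay with time;
        # the learning rate is kept at or above 0.01.
        eta = max(eta0 * math.exp(-t / n_iter), 0.01)
        sigma = sigma0 * math.exp(-t / tau)
        for (r, c), w in zip(nodes, weights):
            # Cooperation: Gaussian neighborhood of the BMU on the lattice.
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
            # Adaptation: move the weight vector towards the input.
            for k in range(dim):
                w[k] += eta * h * (x[k] - w[k])
    return nodes, weights
```

For example, `train_som(points, rows=10, cols=10)` with `points` a list of 2-D coordinates returns one weight vector per lattice node; plotting the weights and connecting lattice neighbors shows the map unfolding over the data.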
Visualization Method
• In 2-D/3-D space, neurons are visualized as changing positions in the weight space as learning takes place. Each neuron is described by its corresponding weight vector.
• Two neurons are connected by an edge if they are direct neighbors in the neural network lattice. For 2-D/3-D data, the lattice can be displayed in the original data space by placing each neuron at its weight vector.
• For high-dimensional data, a unified distance matrix (U-matrix) is constructed to facilitate the visualization:
– the distance between neighboring neurons approximates the distance between different parts of the underlying data
– when depicted as an image, similar colors indicate closely spaced nodes and distinct colors indicate more distant nodes
– groups of similar colors can be considered as clusters, and the contrasting parts as the boundary regions
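As a rough illustration of the U-matrix idea, the sketch below computes, for each lattice node, the mean Euclidean distance between its weight vector and those of its direct (4-connected) lattice neighbors. The function name `u_matrix`, the row-major storage of `weights`, and the choice of the mean over four neighbors are assumptions made for this example; other U-matrix variants exist.

```python
import math


def u_matrix(weights, rows, cols):
    """Mean distance from each node's weights to its lattice neighbors.

    weights[r * cols + c] is assumed to hold the weight vector of node (r, c).
    """
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            w = weights[r * cols + c]
            ds = []
            # Visit the up, down, left, and right neighbors that exist.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    ds.append(dist(w, weights[rr * cols + cc]))
            umat[r][c] = sum(ds) / len(ds) if ds else 0.0
    return umat
```

Rendering `umat` as a grayscale or color image gives the picture described above: low values mark nodes inside a cluster, high values mark boundary regions.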
• Example: U-Matrix
Examples
• Example 1: 1-D self-organizing map
• Example 2: 2-D self-organizing map
• Example 3: self-organizing map of synthetic data sets
After SOM learning converges, we obtain SOMs for different data distributions.
• Example 4: Taxonomy of animals
Animal names and their attributes

                    Dove  Hen   Duck  Goose Owl   Hawk  Eagle Fox   Dog   Wolf  Cat   Tiger Lion  Horse Zebra Cow
is        Small     1     1     1     1     1     1     0     0     0     0     1     0     0     0     0     0
          Medium    0     0     0     0     0     0     1     1     1     1     0     0     0     0     0     0
          Big       0     0     0     0     0     0     0     0     0     0     0     1     1     1     1     1
has       2 legs    1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
          4 legs    0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
          Hair      0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
          Hooves    0     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1
          Mane      0     0     0     0     0     0     0     0     0     1     0     0     1     1     1     0
          Feathers  1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
likes to  Hunt      0     0     0     0     1     1     1     1     0     1     1     1     1     0     0     0
          Run       0     0     0     0     0     0     0     0     1     1     0     1     1     1     1     0
          Fly       1     0     0     1     1     1     1     0     0     0     0     0     0     0     0     0
          Swim      0     0     1     1     0     0     0     0     0     0     0     0     0     0     0     0

A grouping with SOM according to similarity has emerged (map figure; one region is labeled "peaceful").
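One way to read the table is to treat each animal's column as a 13-dimensional binary input vector. The sketch below encodes the table that way and compares animals by Hamming distance, which for binary vectors equals the squared Euclidean distance. The names `ANIMALS`, `ROWS`, `vector`, and `hamming` are illustrative, and using the raw attribute columns as SOM inputs is an assumption of this example rather than a statement of how the map on the slide was trained.

```python
ANIMALS = ["Dove", "Hen", "Duck", "Goose", "Owl", "Hawk", "Eagle", "Fox",
           "Dog", "Wolf", "Cat", "Tiger", "Lion", "Horse", "Zebra", "Cow"]

# One string per attribute row; character i belongs to ANIMALS[i].
ROWS = {
    "Small":    "1111110000100000",
    "Medium":   "0000001111000000",
    "Big":      "0000000000011111",
    "2 legs":   "1111111000000000",
    "4 legs":   "0000000111111111",
    "Hair":     "0000000111111111",
    "Hooves":   "0000000000000111",
    "Mane":     "0000000001001110",
    "Feathers": "1111111000000000",
    "Hunt":     "0000111101111000",
    "Run":      "0000000011011110",
    "Fly":      "1001111000000000",
    "Swim":     "0011000000000000",
}


def vector(animal):
    """Binary attribute vector (one entry per row of the table)."""
    i = ANIMALS.index(animal)
    return [int(bits[i]) for bits in ROWS.values()]


def hamming(a, b):
    """Number of attributes on which two animals differ."""
    return sum(x != y for x, y in zip(vector(a), vector(b)))
```

Animals with small `hamming` values share most attributes, so an SOM trained on these vectors would be expected to place them on nearby nodes.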
• Example 5: Macroeconomic data analysis
Factors: annual increase (%), infant mortality (‰), illiteracy ratio (%), school attendance (%), GIP, annual GIP increase (%) (1990)
• Example 5: Macroeconomic data analysis (cont.)
Applying PCA and SOM to this data set, we obtain different groupings.
Relevant Issues
• Training: order phase vs. convergence phase
– Order phase
• The topological ordering of the weight vectors takes place.
• It may take 1000 or more iterations of the SOM algorithm.
• The choice of the parameter values is important.
• With a proper initial setting of the parameters, the neighborhood of the winning neuron includes almost all neurons in the network and then shrinks slowly with time.
– Convergence phase
• Fine-tune the weight vectors.
• The number of iterations must be at least 500 times the number of neurons in the network ⇒ thousands or tens of thousands of iterations.
• Choice of parameter values:
– η(t) maintained on the order of 0.01.
– Neighborhood function such that the neighbor of a BMU contains only the