Multimedia Data Mining:
An Overview of Image Processing
and
Machine Learning
Zaheer Ahmad
PhD Scholar
[email protected]
Department of Computer Science
University of Peshawar
2/16/2011 1
Agenda
• Multimedia Data Mining
• Image Data Mining and Image Processing
• Machine Learning
• Learning Techniques and Tools
• Neural Networks and Their Types
• Training (Learning) of Neural Networks
Multimedia Data Mining
• Multimedia Data Mining (MMM) is an interdisciplinary and multidisciplinary field used to intelligently retrieve and search multimedia content.
• A variety of techniques from machine learning, statistics, databases, knowledge acquisition, data visualization, image analysis, high-performance computing, and knowledge-based systems are used in MMM.
MACHINE LEARNING
Data for MMM
Is the data a database?
• No, mostly:
  – Web images, audio, video
  – Live streaming
  – Geo-sensor data
• But yes, sometimes:
  – Video databases
• The word multimedia refers to a combination of multiple media types.
• Multimedia data type
  – Any type of information medium that can be represented, processed, stored, and transmitted over a network in digital form
  – Multilingual text, numeric, image, video, audio, graphical, temporal, relational, and categorical data
Definition
• MMM is a subfield of data mining that deals with the extraction of implicit knowledge, multimedia data relationships, or other patterns not explicitly stored in multimedia databases.
  – Used in multimedia information systems for content-based retrieval of images, audio, and video, and to provide search and efficient storage organization
Media Types
• 0-dimensional data: regular alphanumeric data. A typical example is text data.
• 1-dimensional data: data with one dimension of space imposed on it. A typical example is audio data.
• 2-dimensional data: data with two dimensions of space imposed on it. Imagery data and graphics data are the two common examples.
• 3-dimensional data: data with three dimensions of space imposed on it. Video data and animation data are the two common examples.
Multimedia Data
• Spatial data
  – Generalize detailed geographic points into clustered regions, such as business, residential, industrial, or agricultural areas, according to land usage
• Image data
  – Size, color, shape, texture, orientation, and relative positions and structure of the contained objects or regions in the image
• Music data
  – Summarize its melody, based on the approximate patterns that repeatedly occur in the segment
  – Summarize its type, based on its tone, tempo, or the major musical instrument played
How a Multimedia Data Mining System Works
Similarity Search in Multimedia Data
• Description-based retrieval systems
  – Build indices and perform object retrieval based on image descriptions, such as keywords, captions, size, and time of creation
  – Labor-intensive if performed manually
  – Results are typically of poor quality if automated
• Content-based retrieval systems
  – Support retrieval based on the image content, such as color histogram, texture, shape, objects, and wavelet transforms
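Content-based retrieval by color histogram can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: images are small grayscale grids of 0-255 values, the histogram has four bins, and similarity is measured by histogram intersection. The function names are hypothetical.

```python
def gray_histogram(image, bins=4, max_value=255):
    """Normalized histogram of a grayscale image given as a list of rows."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            # Map the pixel value to a bin index in 0..bins-1.
            index = min(pixel * bins // (max_value + 1), bins - 1)
            counts[index] += 1
            total += 1
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(query, database):
    """Return the name of the stored image whose histogram best matches the query."""
    query_hist = gray_histogram(query)
    return max(
        database,
        key=lambda name: histogram_intersection(query_hist, gray_histogram(database[name])),
    )


query_image = [[10, 20], [200, 250]]
image_database = {
    "dark": [[0, 5], [12, 30]],
    "mixed": [[15, 25], [210, 240]],
}
best_match = most_similar(query_image, image_database)
```

No keywords or captions are involved: the match is decided purely from pixel content, which is the point of content-based retrieval.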
Multidimensional Analysis of Multimedia Data
• Multimedia data cube
  – Designed and constructed similarly to traditional data cubes built from relational data
  – Contains additional dimensions and measures for multimedia information, such as color, texture, and shape
• The database does not store images but their descriptors
  – Feature descriptor: a set of vectors for each visual characteristic
    • Color vector: contains the color histogram
    • MFC (Most Frequent Color) vector: five color centroids
    • MFO (Most Frequent Orientation) vector: five edge orientation centroids
  – Layout descriptor: contains a color layout vector and an edge layout vector
Typical Architecture of MMM
Image Data Mining
Image and Machine Learning
What is an image?
• An image is a two-dimensional function, f(x,y), where x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x,y) is called the intensity or grey level of the image at that point.
Image Processing Stages
• Image acquisition: analog-to-digital conversion
• Image processing: remove noise, improve contrast, …
• Image segmentation: find regions (objects) in the image
• Image analysis: take measurements of objects and their relationships
• Pattern recognition: match the description with similar descriptions of known objects (models)
Image Analysis
Input image → Regions, objects → Image analysis → Measurements
Measurements:
- Size
- Position
- Orientation
- Spatial relationship
- Gray scale or color intensity
Image Segmentation
The operation of distinguishing important objects from the background (or from unimportant objects) based on different features of the image.
Example: dark objects on a bright background
Image Segmentation
Input image → Segmentation → Regions, objects
- Classify pixels into groups having similar characteristics
- Two techniques:
  – Region segmentation (color/smoothness)
  – Edge detection
Region Detection
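The simplest way to classify pixels into groups is a global threshold, which fits the "dark objects, bright background" case. The sketch below assumes a grayscale image stored as a list of rows and a hand-picked threshold of 128; both are illustrative choices, not taken from the slides.

```python
def segment_by_threshold(image, threshold=128):
    """Label each pixel 1 (dark object) or 0 (bright background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


sample_image = [
    [250, 240, 30],
    [245, 20, 25],
    [255, 250, 248],
]
object_mask = segment_by_threshold(sample_image)
object_pixel_count = sum(sum(row) for row in object_mask)
```

Real region segmentation would also group connected object pixels into separate regions; this sketch stops at the pixel-classification step.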
Histogram
The data contained in a digital image can be displayed as a histogram, which is a plot of the pixel values ranging from black to white versus the number of pixels that have each particular value.
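A histogram as described here is just a count per pixel value. A minimal sketch, assuming an 8-bit grayscale image (values 0 = black to 255 = white) stored as a list of rows:

```python
def intensity_histogram(image, levels=256):
    """Return a list where entry v is the number of pixels with value v."""
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


tiny_image = [
    [0, 0, 255],
    [128, 255, 255],
]
tiny_histogram = intensity_histogram(tiny_image)
```

Plotting `tiny_histogram` against the index 0..255 gives the black-to-white plot the slide refers to.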
Edge through Gradient Information
[Figure: edge location and edge direction at pixel (x_i, y_i), derived from the neighborhood pixels; sharpness change / contrast change]
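One common way to obtain gradient information from neighborhood pixels is a central difference in x and y. The sketch below assumes that scheme (the slides do not say which operator is used) and a grayscale image indexed as image[y][x]; the gradient magnitude indicates how sharp the contrast change is, and the angle gives its direction.

```python
import math


def gradient_at(image, x, y):
    """Central-difference gradient at an interior pixel (x, y)."""
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0  # horizontal change
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0  # vertical change
    magnitude = math.hypot(gx, gy)   # strength of the contrast change
    direction = math.atan2(gy, gx)   # gradient direction in radians
    return magnitude, direction


# A vertical edge: dark on the left, bright on the right.
edge_image = [
    [0, 0, 0, 100, 100],
    [0, 0, 0, 100, 100],
    [0, 0, 0, 100, 100],
]
flat_magnitude, _ = gradient_at(edge_image, 1, 1)
edge_magnitude, edge_direction = gradient_at(edge_image, 2, 1)
```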
Pattern Recognition (PR)
Input: measurements and structural descriptions (feature vector, set of information data)
Output: class identifier
Content-Based Image Retrieval
Fingerprint Recognition System
• Enrollment: fingerprint sensor → feature extractor → template database
• Identification: fingerprint sensor → feature extractor → feature matcher (against the template database) → ID
Machine Learning
A computer program is said to learn from experience 'E' with respect to some class of tasks 'T' and performance measure 'P', if its performance at tasks in T, as measured by P, improves with experience E.
Mitchell (1997)
Machine Learning
Things learn when they change their behavior in a way that makes them perform better in the future.
From Witten and Frank (2000)
Machine Learning
• ML is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases.
• A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data.
• The difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data).
• Hence the learner must generalize from the given examples, so as to be able to produce a useful output in new cases.
Types of Learning
• Supervised learning: learning a mapping between an input x and a desired output y
• Unsupervised learning: understanding the relationships between data components
• Reinforcement learning: learning to act in the environment based on delayed rewards
Classes of Learning
Machine learning is not only about classification.
• Classification learning: learn to put instances into pre-defined classes (supervised learning). Example: a competitive network, which selects one unit in the output layer (the target class).
• Association learning: learn relationships between the attributes; a new response becomes associated with a particular stimulus. Example: a pattern associator, which recalls input patterns based on similarity.
• Clustering: discover classes of instances that belong together (unsupervised learning). Example: self-organizing maps (SOMs).
Learning Tools and Techniques in Short
Learning Rules
• if outlook = sunny and humidity = high then play = no
• if outlook = rainy and windy = true then play = no
• if outlook = overcast then play = yes
• if humidity = normal then play = yes
• if none of the above then play = yes
Best, but laborious: hard to code and hard to cover in large domains
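The rule list above reads naturally as an ordered sequence of tests in which the first matching rule wins. A minimal sketch, assuming that ordered interpretation and string-valued attributes (the function name is hypothetical):

```python
def play(outlook, humidity, windy):
    """Apply the weather rules in order; the first rule that matches decides."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # none of the above
```

Even for this toy domain every rule is written by hand, which illustrates why the approach becomes laborious as the number of attributes grows.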
Learning Decision Trees
• Example: XOR (familiar from connectionist networks).
Nodes represent decisions on attributes; leaves represent classifications.
Somewhat like learning rules
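The XOR example can be written out as a tree in which inner nodes test one attribute and leaves hold the class. The nested-dictionary layout and the names below are illustrative assumptions; the tree is hand-built here, not learned.

```python
# Inner nodes test an attribute; leaves are the class labels 0 or 1.
XOR_TREE = {
    "attribute": "x1",
    "branches": {
        0: {"attribute": "x2", "branches": {0: 0, 1: 1}},
        1: {"attribute": "x2", "branches": {0: 1, 1: 0}},
    },
}


def classify(tree, instance):
    """Walk from the root to a leaf, following the instance's attribute values."""
    while isinstance(tree, dict):
        tree = tree["branches"][instance[tree["attribute"]]]
    return tree
```

Each root-to-leaf path corresponds to one rule, for example "if x1 = 0 and x2 = 1 then class = 1", which is the sense in which trees resemble learning rules.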
Principal Component Analysis
• PCA is applied as a data reduction or structure detection method
• Example: combining two correlated variables into one factor
• PCA is defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on
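For two variables, PCA reduces to finding the eigenvalues and eigenvectors of a 2×2 covariance matrix, which has a closed form. The sketch below assumes the sample covariance (division by n − 1) and made-up data in which y is exactly 2x, so one factor captures all the variance.

```python
import math


def pca_2d(xs, ys):
    """Return ((var1, var2), first_component) for two variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    var1, var2 = half_trace + root, half_trace - root
    # Eigenvector belonging to the largest eigenvalue.
    if sxy != 0:
        vx, vy = sxy, var1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (var1, var2), (vx / norm, vy / norm)


variances, first_component = pca_2d([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Here the second variance is zero, so projecting onto `first_component` replaces the two correlated variables with a single factor without losing information.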
Support Vector Machine
• The Support Vector Machine is a classifier derived from statistical learning theory by Vladimir Vapnik and his co-workers
• Used for large data sets
• Good for text classification
• Works similarly to a multilayer perceptron
Hidden Markov Model
Genetic Algorithms
Neural Networks
NN: A Brain-Inspired Model
[Figure: inputs, outputs, and connections between cells]
Physical Structure of a Biological Neuron
• Nerve cells are the main processing elements in our central nervous system.
• Humans generally have about 100 billion nerve cells in the entire nervous system.
• The axon and the dendrites carry signals away from and toward the cell body, respectively.
• A synapse is the point at which the axon of one cell interconnects with a dendrite of another cell.
NN: A Brain-Inspired Model
• A neural network acquires knowledge through learning.
• A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights.
• The largest modern neural networks achieve a complexity comparable to the nervous system of a fly.
Historical Background
• 1943: McCulloch and Pitts proposed the first computational model of the neuron.
• 1949: Hebb proposed the first learning rule.
• 1958: Rosenblatt's work on perceptrons.
• 1969: Minsky and Papert exposed the limitations of the theory.
• 1970s: Decade of dormancy for neural networks.
• 1980s-90s: Neural networks return (self-organization, back-propagation algorithms, etc.)
NN Applications
• Process Modeling and Control - Creating a neural network model for a physical plant, then using that model to determine the best control settings for the plant.
• Machine Diagnosis - Detecting when a machine has failed so that the system can automatically shut down the machine when this occurs.
• Target Recognition - Military application that uses video and/or infrared image data to determine whether an enemy target is present.
• Medical Diagnosis - Assisting doctors with their diagnosis by analyzing the reported symptoms and/or image data such as MRIs or X-rays.
• Target Marketing - Finding the set of demographics that have the highest response rate for a particular marketing campaign.
• Voice Recognition - Transcribing spoken words into ASCII text.
• Financial Forecasting (stock prediction) - Using the historical data of a security to predict the future movement of that security.
• Quality Control - Attaching a camera or sensor to the end of a production process to automatically inspect for defects.
• Intelligent Search - An internet search engine that provides the most relevant content and banner ads based on the users' past behavior.
• Fraud Detection - Detecting fraudulent credit card transactions and automatically declining the charge.
How NN Work (Mathematically)
• Linear and nonlinear pattern classification
• Regression / function estimation
• Curve fitting
Why Use NN
• Parallel processing
• Fault tolerance
• Self-organization
• Generalization ability
• Continuous adaptivity
Artificial Neurons
• Neural networks are made up of nodes which have
  – Input edges, each with some weight
  – Output edges (with weights)
  – An activation level (a function of the inputs)
• Weights of edges can be positive or negative and may change over time (learning)
• The input function (net input) is the weighted sum of the activation levels of the inputs
• The activation level is a linear or non-linear transfer function "a" of the input
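The node described above can be sketched as a weighted sum passed through a transfer function. The names are illustrative, and the two transfer functions (identity and a hard threshold at 0) are assumptions chosen only to show a linear and a non-linear case.

```python
def neuron_output(inputs, weights, transfer):
    """Activation level: transfer function applied to the weighted input sum."""
    net_input = sum(w * x for w, x in zip(weights, inputs))
    return transfer(net_input)


def identity(z):
    # Linear transfer function.
    return z


def hard_threshold(z):
    # Non-linear transfer function: fire when the net input is non-negative.
    return 1 if z >= 0 else 0


linear_activation = neuron_output([1.0, 0.5], [0.4, -0.2], identity)
binary_activation = neuron_output([1.0, 0.5], [0.4, -0.2], hard_threshold)
```

Changing the weights changes the output for the same inputs, which is what learning adjusts.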
Artificial Neural Networks: Block Diagram
Artificial Neural Networks: Process
The Perceptron
Inputs $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$; bias input $x_{n+1} = -1$ with weight $w_{n+1}$
$a = \text{bias} + \sum_{i} w_i x_i$
$y = 1$ if $a \ge 0$, $y = 0$ if $a < 0$
$\theta = w_{n+1}$
• Bias: the extra weight connected to a constant input is called the bias of the element
• It enables the threshold to be set equal to zero, which helps in calculation
• It gives an extra dimension for representation. This means that every point in (n + 1)-dimensional weight space can be associated with a hyperplane in (n + 1)-dimensional extended input space.
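The bias trick can be sketched as follows: the threshold θ becomes one more weight attached to a constant input of −1, so the unit only has to compare the weighted sum with zero. The weights and threshold used in the example (1, 1 and 1.5, which realize a logical AND on 0/1 inputs) are illustrative assumptions.

```python
def perceptron(inputs, weights, theta):
    """Threshold unit with the threshold folded in as weight w_{n+1} on input -1."""
    extended_inputs = list(inputs) + [-1]      # x_{n+1} = -1
    extended_weights = list(weights) + [theta]  # w_{n+1} = theta
    a = sum(w * x for w, x in zip(extended_weights, extended_inputs))
    return 1 if a >= 0 else 0  # threshold is now zero


and_outputs = [perceptron((x1, x2), (1, 1), 1.5) for x1 in (0, 1) for x2 in (0, 1)]
```

Because the threshold is now an ordinary weight, a learning rule can adjust it in exactly the same way as the other weights.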
Logical Operations
[Figures: threshold units for the logical operations, each with threshold = 2]
The first layer performs the two AND NOT's and the second layer performs the OR. Both Z neurons and the Y neuron have a threshold of 2.
X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1)
Linear Separability Problem
• If two classes of patterns can be separated by a decision boundary, represented by the linear equation
  $b + \sum_{i=1}^{n} x_i w_i = 0$
  then they are said to be linearly separable. The simple network can correctly classify any such patterns.
• The decision boundary of linearly separable classes can be determined either by some learning procedure or by solving linear equation systems based on representative patterns of each class
• If such a decision boundary does not exist, then the two classes are said to be linearly inseparable.
• Linearly inseparable problems cannot be solved by the simple network; a more sophisticated architecture is needed.
Examples of linearly separable classes
- Logical AND function
  patterns (bipolar):
    x1   x2    y
    -1   -1   -1
    -1    1   -1
     1   -1   -1
     1    1    1
  decision boundary: w1 = 1, w2 = 1, b = -1, θ = 0, i.e. -1 + x1 + x2 = 0
- Logical OR function
  patterns (bipolar):
    x1   x2    y
    -1   -1   -1
    -1    1    1
     1   -1    1
     1    1    1
  decision boundary: w1 = 1, w2 = 1, b = 1, θ = 0, i.e. 1 + x1 + x2 = 0
x: class I (y = 1)
o: class II (y = -1)
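The weights listed for AND and OR can be verified directly: a pattern belongs to class I when b + w1·x1 + w2·x2 is on the positive side of the boundary. The function name and the pattern lists below are illustrative; the weights and biases are the ones given above.

```python
def classify(x1, x2, w1, w2, b):
    """Return 1 (class I) or -1 (class II) for a bipolar input pattern."""
    return 1 if b + w1 * x1 + w2 * x2 >= 0 else -1


AND_PATTERNS = [(-1, -1, -1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
OR_PATTERNS = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, 1)]

and_correct = all(classify(x1, x2, 1, 1, -1) == y for x1, x2, y in AND_PATTERNS)
or_correct = all(classify(x1, x2, 1, 1, 1) == y for x1, x2, y in OR_PATTERNS)
```

Only the bias differs between the two functions: it shifts the same line x1 + x2 = const so that either one or three of the four corners fall on the positive side.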
Examples of linearly inseparable classes
- Logical XOR (exclusive OR) function
  patterns (bipolar):
    x1   x2    y
    -1   -1   -1
    -1    1    1
     1   -1    1
     1    1   -1
  decision boundary: none (no single line separates the two classes)
x: class I (y = 1)
o: class II (y = -1)
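That no single linear boundary handles XOR can be illustrated by brute force. The sketch below tries every combination of w1, w2 and b on an arbitrary grid from −2 to 2 in steps of 0.5 (the grid is an assumption; the result holds for any weights) and counts how many classify all four bipolar patterns correctly.

```python
def separates(patterns, w1, w2, b):
    """True if the line b + w1*x1 + w2*x2 = 0 classifies every pattern correctly."""
    return all((1 if b + w1 * x1 + w2 * x2 >= 0 else -1) == y for x1, x2, y in patterns)


def count_solutions(patterns):
    grid = [i * 0.5 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    return sum(separates(patterns, w1, w2, b) for w1 in grid for w2 in grid for b in grid)


XOR_PATTERNS = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
AND_PATTERNS = [(-1, -1, -1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]

xor_solutions = count_solutions(XOR_PATTERNS)
and_solutions = count_solutions(AND_PATTERNS)
```

The search finds solutions for AND but none for XOR, which is why a multilayer architecture is needed.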
Multilayer NN
• Neural net for nonlinear classification
• Combination of perceptrons
• Back-propagation learning
What does each of the layers do?
• 1st layer draws linear boundaries
• 2nd layer combines the boundaries
• 3rd layer can generate arbitrarily complex boundaries
Multilayer FFNN
A NN with one or more hidden layers
Back-Propagation Algorithm
• Multiple outputs.
• Forward pass:
• Error calculation:
• Backward propagation:
• No guarantee of getting the best possible weights after correcting.
• Classifies inputs into multiple classes.
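The forward pass, error calculation and backward propagation steps can be sketched for a small network with one hidden layer. Everything specific below is an assumption made for illustration: sigmoid units, a squared-error measure, a learning rate of 0.1, the 2-2-1 layout, and the starting weights (the last entry of each weight row is the bias).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward pass: hidden activations, then output activations."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
    return h, y


def train_step(x, target, w_hidden, w_out, rate=0.1):
    """One back-propagation update; returns the error before the update."""
    h, y = forward(x, w_hidden, w_out)
    # Error calculation at the outputs (delta includes the sigmoid derivative).
    d_out = [(t - yk) * yk * (1 - yk) for t, yk in zip(target, y)]
    # Backward propagation of the deltas to the hidden layer.
    d_hid = [hj * (1 - hj) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             for j, hj in enumerate(h)]
    # Weight corrections, applied in place.
    for k, row in enumerate(w_out):
        for j, v in enumerate(h + [1.0]):
            row[j] += rate * d_out[k] * v
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] += rate * d_hid[j] * v
    return sum((t - yk) ** 2 for t, yk in zip(target, y)) / 2


hidden_weights = [[0.5, -0.4, 0.1], [0.3, 0.8, -0.2]]
output_weights = [[0.7, -0.6, 0.05]]
error_before = train_step([1.0, 0.0], [1.0], hidden_weights, output_weights)
error_after = train_step([1.0, 0.0], [1.0], hidden_weights, output_weights)
```

Each step moves the weights downhill on the error for the presented example, but as noted above there is no guarantee that repeated steps reach the best possible weights.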
NN Training Data
• Training set: this data set is used to adjust the weights of the neural network.
• Validation set: this data set is used to minimize overfitting.
  – The weights of the network are not adjusted with this data set.
  – It only verifies that any increase in accuracy over the training data set actually yields an increase in accuracy over a data set that has not been shown to the network before, or at least that the network hasn't trained on (i.e. the validation data set).
  – If the accuracy over the training data set increases, but the accuracy over the validation data set stays the same or decreases, then you are overfitting your neural network and you should stop training.
• Testing set: this data set is used only for testing the final solution, in order to confirm the actual predictive power of the network.
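The stopping criterion described for the validation set can be sketched as a check over per-epoch accuracies. The function name and the accuracy figures are made up for illustration; in practice the two lists would come from evaluating the network after each training epoch.

```python
def early_stopping_epoch(train_accuracy, validation_accuracy):
    """First epoch where training accuracy rises but validation accuracy does not."""
    for epoch in range(1, len(train_accuracy)):
        train_improved = train_accuracy[epoch] > train_accuracy[epoch - 1]
        validation_improved = validation_accuracy[epoch] > validation_accuracy[epoch - 1]
        if train_improved and not validation_improved:
            return epoch
    return None  # no sign of overfitting in the recorded epochs


train_history = [0.60, 0.70, 0.80, 0.88, 0.93]
validation_history = [0.58, 0.66, 0.71, 0.71, 0.69]
stop_epoch = early_stopping_epoch(train_history, validation_history)
```

The testing set plays no part in this decision; it is only used once, on the final network.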
Neuron and Activation Functions
Activation Functions
These functions can be defined as follows.
Step_t(x) = 1 if x >= t, else 0
Sign(x) = +1 if x >= 0, else -1
Sigmoid(x) = 1/(1 + e^(-x))
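The three activation functions translate directly into code. Step and Sign follow the definitions given here; for the sigmoid the standard logistic form 1/(1 + e^(-x)) is assumed.

```python
import math


def step(x, t=0.0):
    """Step_t(x): 1 if x >= t, else 0."""
    return 1 if x >= t else 0


def sign(x):
    """Sign(x): +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1


def sigmoid(x):
    """Logistic sigmoid: smooth, differentiable, output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))
```

Unlike Step and Sign, the sigmoid has a derivative everywhere, which is what gradient-based training such as back-propagation relies on.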
Selection of Nodes for Neural Network
• Input nodes: image/data size
• Output nodes: binary output
• Middle layer
  – Keep the middle layer smaller, to generalize and not memorize
Perceptron Learning Algorithm:
i. Initialise weights and threshold.
   Set $w_i(t)$, $(0 \le i \le n)$, to be the weight $i$ at time $t$, and $\theta$ to be the threshold value in the output node. Set $w_0$ to be $-\theta$, the bias, and $x_0$ to be always 1.
   Set $w_i(0)$ to small random values, thus initialising the weights and threshold.
ii. Present input and desired output.
   Present input $x_0, x_1, x_2, \ldots, x_n$ and desired output $d(t)$.
iii. Calculate the actual output.
   $y(t) = f_h[w_0(t)x_0(t) + w_1(t)x_1(t) + \cdots + w_n(t)x_n(t)]$
iv. Adapt weights.
   $w_i(t+1) = w_i(t) + \eta[d(t) - y(t)]x_i(t)$, where $0 \le \eta \le 1$ is a positive gain function that controls the adaption rate.
Steps iii. and iv. are repeated until the iteration error is less than a user-specified error threshold or a predetermined number of iterations have been completed.
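The four steps of this learning algorithm can be sketched as a short training loop. The assumptions here are illustrative: a hard limiter for f_h, a gain of 0.1, initial weights drawn from a seeded generator so the run is repeatable, and the logical AND function on 0/1 inputs as training data.

```python
import random


def hard_limiter(a):
    # f_h: 1 when the weighted sum is non-negative, else 0.
    return 1 if a >= 0 else 0


def predict(weights, inputs):
    x = [1] + list(inputs)  # x_0 is always 1, so weights[0] acts as the bias
    return hard_limiter(sum(w * xi for w, xi in zip(weights, x)))


def train_perceptron(samples, gain=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    # Step i: small random initial weights (including the bias weight w_0).
    weights = [rng.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(max_epochs):
        errors = 0
        for inputs, desired in samples:          # Step ii: present input and d(t)
            actual = predict(weights, inputs)    # Step iii: actual output y(t)
            if actual != desired:
                errors += 1
                x = [1] + list(inputs)
                for i in range(len(weights)):    # Step iv: adapt weights
                    weights[i] += gain * (desired - actual) * x[i]
        if errors == 0:                          # stop when an epoch has no errors
            break
    return weights


AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights = train_perceptron(AND_SAMPLES)
```

Training on XOR instead would never reach an error-free epoch, so the loop would run until `max_epochs`.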
Perceptron Learning Algorithm:
start: The weight vector $w_0$ is generated randomly, set $t := 0$
test: A vector $x \in P \cup N$ is selected randomly,
   if $x \in P$ and $w_t \cdot x > 0$ go to test,
   if $x \in P$ and $w_t \cdot x \le 0$ go to add,
   if $x \in N$ and $w_t \cdot x < 0$ go to test,
   if $x \in N$ and $w_t \cdot x \ge 0$ go to subtract.
add: set $w_{t+1} = w_t + x$ and $t := t + 1$, goto test
subtract: set $w_{t+1} = w_t - x$ and $t := t + 1$, goto test
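This second formulation, with a set P of positive and a set N of negative vectors, can be sketched as below. The algorithm as stated loops forever, so the sketch adds a stopping check and a step limit; those, the seeded random generator, and the example sets (logical AND with a constant 1 appended to each vector to carry the bias) are assumptions made for illustration.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def is_separated(w, positives, negatives):
    """True when every vector in P gives w.x > 0 and every vector in N gives w.x < 0."""
    return all(dot(w, x) > 0 for x in positives) and all(dot(w, x) < 0 for x in negatives)


def perceptron_learning(positives, negatives, max_steps=10000, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in positives[0]]        # start: random w_0
    labelled = [(x, True) for x in positives] + [(x, False) for x in negatives]
    for _ in range(max_steps):
        if is_separated(w, positives, negatives):          # added stopping check
            break
        x, in_p = rng.choice(labelled)                     # test: pick x at random
        if in_p and dot(w, x) <= 0:
            w = [wi + xi for wi, xi in zip(w, x)]          # add
        elif not in_p and dot(w, x) >= 0:
            w = [wi - xi for wi, xi in zip(w, x)]          # subtract
    return w


P = [(1, 1, 1)]
N = [(0, 0, 1), (0, 1, 1), (1, 0, 1)]
learned_w = perceptron_learning(P, N)
```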
Neural Networks – Training
Backpropagation training cycle
Urdu OCR Input Data Example
fed to FNN
Thank You