CSC479
Data Mining
Lecture # 17
Classification by
Backpropagation
• Brain
• A marvelous piece of architecture and design.
• In association with a nervous system, it controls the life patterns, communications, interactions, growth and development of
There are about $10^{10}$ to $10^{14}$ nerve cells (called neurons) in an adult human brain.
Neurons are highly connected with each
other. Each nerve cell is connected to
hundreds of thousands of other nerve cells.
Passage of information between neurons is
slow (in comparison to transistors in an IC). It
takes place in the form of electrochemical
signals between two neurons in milliseconds.
Energy consumption per neuron is low
They look more like blobs of ink, don't they?
A closer look reveals a large collection of different molecules working together coherently, in an organized manner.
[Figure: a biological neuron, labeled with cell body, nucleus, dendrites, axon, axons from other neurons, and synapses]
An artificial neural network is an information
processing system that has certain performance
characteristics in common with biological neural
networks.
An ANN can be characterized by:
1. Architecture: The pattern of connections between different neurons.
2. Training or Learning Algorithms: The method of determining weights on the connections.
There are two basic categories:
1. Feed-forward Neural Networks
These are the nets in which the signals flow from the input units to the output units, in a forward direction.
They are further classified as:
   1. Single Layer Nets
   2. Multi-layer Nets
2. Recurrent Neural Networks
These are the nets in which the signals can
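[Figure: network diagram with input units X_1 ... X_n, output units Y_1 ... Y_m, weighted connections (labels w_11 ... w_nm and v), and bias inputs fixed at 1]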
Supervised Training
Training is accomplished by presenting a sequence of
training vectors or patterns, each with an associated
target output vector.
The weights are then adjusted according to a learning
algorithm.
During training, the network develops an associative
memory. It can then recall a stored pattern when it is
given an input vector that is sufficiently similar to a
vector it has learned.
Unsupervised Training
A sequence of input vectors is provided, but no target
vectors are specified in this case.
The net modifies its weights and biases, so that the
Classification by Backpropagation
Backpropagation: A neural network learning algorithm
Started by psychologists and neurobiologists to develop and test computational analogues of neurons
A neural network: A set of connected input/output units where each connection has a weight associated with it
During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples
Also referred to as connectionist learning due to the
Neural Network as a Classifier
Weaknesses
Long training time
Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure"
Poor interpretability: Difficult to interpret the symbolic meaning behind the learned weights and of "hidden units" in the network
Strengths
High tolerance to noisy data
Ability to classify untrained patterns
Well-suited for continuous-valued inputs and outputs
Successful on a wide array of real-world data
Algorithms are inherently parallel
Techniques have recently been developed for the extraction of
A Neuron (= a perceptron)
The n-dimensional input vector x is mapped into variable y by
means of the scalar product and a nonlinear function mapping
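[Figure: a single neuron, labeled with input vector x (x_0, x_1, ..., x_n), weight vector w (w_0, w_1, ..., w_n), weighted sum Σ, threshold θ_j, activation function f, and output y]
For example:
$$y = \operatorname{sign}\left(\sum_{i=0}^{n} w_i x_i - \theta_j\right)$$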
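The mapping above can be sketched in a few lines of Python. This is a minimal illustration, assuming the threshold is subtracted from the weighted sum and that sign returns +1 for non-negative values; the function names and the AND-gate weights are hypothetical and chosen only for this sketch.

```python
def sign(value):
    """Return +1 for non-negative input and -1 otherwise (one common convention)."""
    return 1 if value >= 0 else -1


def perceptron_output(weights, inputs, threshold):
    """Compute y = sign(sum_i w_i * x_i - threshold) for a single unit."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sign(weighted_sum - threshold)


# Hypothetical weights and threshold that make the unit act like a logical AND.
and_weights = [1.0, 1.0]
and_threshold = 1.5
outputs = [perceptron_output(and_weights, [a, b], and_threshold)
           for a in (0, 1) for b in (0, 1)]
print(outputs)  # [-1, -1, -1, 1]
```

Only the input (1, 1) pushes the weighted sum above the threshold, so only that input yields +1.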
A Multi-Layer Feed-Forward Neural Network
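Given a unit j in a hidden or output layer, the net input, $I_j$, to unit j is
$$I_j = \sum_i w_{ij} O_i + \theta_j$$
where $w_{ij}$ is the weight of the connection from unit i in the previous layer to unit j, $O_i$ is the output of unit i, and $\theta_j$ is the bias of unit j.
Given the net input $I_j$ to unit j, then $O_j$, the output of unit j, is computed as
$$O_j = \frac{1}{1 + e^{-I_j}}$$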
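As a rough sketch, the two formulas for a single unit translate directly into Python. The helper names `net_input` and `sigmoid` and the sample numbers are assumptions made for illustration only.

```python
import math


def net_input(weights, prev_outputs, bias):
    """I_j = sum_i w_ij * O_i + theta_j for one unit j."""
    return sum(w * o for w, o in zip(weights, prev_outputs)) + bias


def sigmoid(value):
    """O_j = 1 / (1 + e^(-I_j)), the logistic activation."""
    return 1.0 / (1.0 + math.exp(-value))


# Hypothetical unit with two incoming connections.
I_j = net_input([0.5, -0.25], [1.0, 2.0], 0.0)  # 0.5 - 0.5 + 0.0
O_j = sigmoid(I_j)
print(I_j, O_j)  # 0.0 0.5
```

The logistic function squashes any net input into the range (0, 1), which is why a net input of 0 gives an output of exactly 0.5.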
Backpropagate the error: The error is propagated backward by updating the weights and biases to reflect the error of the network's prediction. For a unit j in the output layer, the error $Err_j$ is computed by
$$Err_j = O_j (1 - O_j)(T_j - O_j)$$
where $O_j$ is the actual output of unit j, and $T_j$ is the known target value of the given training tuple.
The error of a hidden layer unit j is
$$Err_j = O_j (1 - O_j) \sum_k Err_k \, w_{jk}$$
where $w_{jk}$ is the weight of the connection from unit j to a unit k in the next higher layer, and $Err_k$ is the error of unit k.
Weights are updated by the following equations, where $\Delta w_{ij}$ is the change in weight $w_{ij}$ and l is the learning rate
$$\Delta w_{ij} = (l) \, Err_j \, O_i$$
$$w_{ij} = w_{ij} + \Delta w_{ij}$$
Biases are updated by the following equations, where $\Delta \theta_j$ is the change in bias $\theta_j$
$$\Delta \theta_j = (l) \, Err_j$$
$$\theta_j = \theta_j + \Delta \theta_j$$
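The error and update rules can be sketched as four small Python functions. This is an illustrative sketch only: the function names and the sample numbers are assumptions, and the units are assumed to use the logistic activation.

```python
def output_error(o_j, t_j):
    """Err_j = O_j (1 - O_j) (T_j - O_j) for an output-layer unit."""
    return o_j * (1.0 - o_j) * (t_j - o_j)


def hidden_error(o_j, downstream):
    """Err_j = O_j (1 - O_j) * sum_k Err_k * w_jk for a hidden unit.

    `downstream` is a list of (Err_k, w_jk) pairs, one per unit k
    in the next higher layer.
    """
    return o_j * (1.0 - o_j) * sum(err_k * w_jk for err_k, w_jk in downstream)


def updated_weight(w_ij, rate, err_j, o_i):
    """w_ij + (l) * Err_j * O_i."""
    return w_ij + rate * err_j * o_i


def updated_bias(theta_j, rate, err_j):
    """theta_j + (l) * Err_j."""
    return theta_j + rate * err_j


# Hypothetical numbers: an output unit that emitted 0.5 when the target was 1.
err_out = output_error(0.5, 1.0)
err_hid = hidden_error(0.5, [(err_out, 0.4)])
print(err_out)  # 0.125
```

A positive error on the output unit means its output was too low, so weights on connections from active units are increased.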
How Does a Multi-Layer Neural Network Work?
The inputs to the network correspond to the attributes measured
for each training tuple
Inputs are fed simultaneously into the units making up the input
layer
They are then weighted and fed simultaneously to a hidden layer
The number of hidden layers is arbitrary, although usually only one is used
The weighted outputs of the last hidden layer are input to units
making up the output layer, which emits the network's prediction
The network is feed-forward in that none of the weights cycles
back to an input unit or to an output unit of a previous layer
From a statistical point of view, networks perform nonlinear
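The layer-by-layer flow described above can be sketched as a forward pass in plain Python. The network shape (2 inputs, 2 hidden units, 1 output), all weight values, and the function names are hypothetical, and the logistic activation is assumed for every non-input unit.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def layer_forward(inputs, weights, biases):
    """Feed `inputs` to one layer; weights[i][j] connects input i to unit j."""
    outputs = []
    for j, bias in enumerate(biases):
        net = sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + bias
        outputs.append(sigmoid(net))
    return outputs


def network_forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical 2-2-1 network: one hidden layer, one output unit.
hidden_layer = ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.1])
output_layer = ([[0.5], [-0.5]], [0.2])
prediction = network_forward([0.3, 0.7], [hidden_layer, output_layer])
print(prediction)
```

Because each layer only reads the outputs of the layer before it, no value ever cycles back, which is what makes the network feed-forward.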
Defining a Network Topology
First decide the network topology: # of units in the input layer, # of hidden layers (if > 1), # of units in each hidden layer, and # of units in the output layer
Normalize the input values for each attribute measured in the training tuples to the range [0.0, 1.0]
Input: one input unit per domain value, each initialized to 0
Output: for classification with more than two classes, one output unit per class is used
Once a network has been trained and its accuracy is
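The input and output encodings mentioned above might be sketched as follows. This assumes min-max scaling for the normalization and a one-unit-per-value (one-hot) encoding in which the unit matching the observed value is set to 1; the function names and sample data are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numeric attribute values to the range [0.0, 1.0]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant attribute carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot(value, domain):
    """One input unit per domain value: all 0 except the matching unit."""
    return [1 if value == candidate else 0 for candidate in domain]


def predicted_class(outputs, classes):
    """With one output unit per class, pick the class of the largest output."""
    return classes[outputs.index(max(outputs))]


print(min_max_normalize([20, 35, 50]))             # [0.0, 0.5, 1.0]
print(one_hot("green", ["red", "green", "blue"]))  # [0, 1, 0]
print(predicted_class([0.1, 0.7, 0.2], ["a", "b", "c"]))  # b
```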
Backpropagation
Iteratively process a set of training tuples & compare the network's
prediction with the actual known target value
For each training tuple, the weights are modified to minimize the
mean squared error between the network's prediction and the actual target value
Modifications are made in the “backwards” direction: from the output
layer, through each hidden layer down to the first hidden layer, hence “backpropagation”
Steps
Initialize weights (to small random #s) and biases in the network
Propagate the inputs forward (by applying activation function)
Backpropagate the error (by updating weights and biases)
Example 9.1
The class label of the training tuple is 1 and the learning rate l is 0.9
[Table: initial input, weight, and bias values]
$$I_j = \sum_i w_{ij} O_i + \theta_j$$
$$O_j = \frac{1}{1 + e^{-I_j}}$$
Example 9.1 (Cont…)
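$$Err_j = O_j (1 - O_j)(T_j - O_j)$$
$$Err_j = O_j (1 - O_j) \sum_k Err_k \, w_{jk}$$
$$\Delta w_{ij} = (l) \, Err_j \, O_i, \qquad w_{ij} = w_{ij} + \Delta w_{ij}$$
$$\Delta \theta_j = (l) \, Err_j, \qquad \theta_j = \theta_j + \Delta \theta_j$$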
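One full training step with these formulas can be traced in Python. The learning rate (0.9) and class label (1) come from the example; the network shape (input units 1 to 3, hidden units 4 and 5, output unit 6) and all input, weight, and bias values below are illustrative assumptions, since the table of initial values is not reproduced in this text.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


rate = 0.9    # learning rate l, from the example
target = 1.0  # class label of the training tuple, from the example

# Assumed illustrative values (not taken from the missing table).
x = {1: 1.0, 2: 0.0, 3: 1.0}
w = {(1, 4): 0.2, (1, 5): -0.3, (2, 4): 0.4, (2, 5): 0.1,
     (3, 4): -0.5, (3, 5): 0.2, (4, 6): -0.3, (5, 6): -0.2}
theta = {4: -0.4, 5: 0.2, 6: 0.1}

# Propagate the inputs forward: I_j = sum_i w_ij O_i + theta_j, O_j = sigmoid(I_j).
out = dict(x)
for j, sources in ((4, (1, 2, 3)), (5, (1, 2, 3)), (6, (4, 5))):
    net = sum(w[(i, j)] * out[i] for i in sources) + theta[j]
    out[j] = sigmoid(net)

# Backpropagate the error: output unit first, then the hidden units.
err = {6: out[6] * (1 - out[6]) * (target - out[6])}
for j in (4, 5):
    err[j] = out[j] * (1 - out[j]) * err[6] * w[(j, 6)]

# Update weights and biases.
new_w = {(i, j): w_ij + rate * err[j] * out[i] for (i, j), w_ij in w.items()}
new_theta = {j: t + rate * err[j] for j, t in theta.items()}

print(round(out[6], 3), round(err[6], 3))  # roughly 0.474 and 0.131
```

With these assumed values the output unit emits about 0.474 against a target of 1, so its error is positive and the weights into unit 6 move upward (for instance, the weight from unit 4 to unit 6 goes from -0.3 to about -0.261).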
Terminating condition: Training stops when
All $\Delta w_{ij}$ in the previous epoch are so small as to be below some specified threshold, or
The percentage of tuples misclassified in the previous epoch is below some threshold, or
A prespecified number of epochs has expired.
In practice, several hundreds of thousands of
epochs may be required before the weights
will converge.
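The three stopping criteria can be combined into one check that runs at the end of each epoch. This is a sketch under assumptions: the function name, its parameters, and the default threshold values are hypothetical and would be tuned per problem.

```python
def should_stop(weight_changes, misclassified, total, epoch,
                change_threshold=1e-4, error_threshold=0.05, max_epochs=1000):
    """Return True if any of the three terminating conditions holds.

    weight_changes: every delta w_ij applied during the previous epoch
    misclassified, total: tuple counts from the previous epoch
    epoch: number of epochs completed so far
    """
    all_small = all(abs(delta) < change_threshold for delta in weight_changes)
    low_error = (misclassified / total) < error_threshold
    out_of_epochs = epoch >= max_epochs
    return all_small or low_error or out_of_epochs


print(should_stop([1e-6, -2e-6], 30, 100, 5))  # True: weight changes are tiny
print(should_stop([0.1], 30, 100, 5))          # False: keep training
```

The epoch limit acts as a safety net for runs in which neither the weight changes nor the error rate ever drop below their thresholds.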
Efficiency of backpropagation: Each epoch (one iteration through the
training set) takes O(|D| * w) time, with |D| tuples and w weights, but the # of
epochs can be exponential in n, the number of inputs, in the worst case
Rule extraction from networks: network pruning
Simplify the network structure by removing weighted links that
have the least effect on the trained network
Then perform link, unit, or activation value clustering
The set of input and activation values are studied to derive rules
describing the relationship between the input and hidden unit layers
Sensitivity analysis: assess the impact that a given input variable has
on a network output. The knowledge gained from this analysis can be represented in rules
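Sensitivity analysis can be approximated numerically by nudging one input and watching the output. The sketch below assumes a central finite difference; `toy_model` is a hypothetical stand-in for a trained network, and all names and values are illustrative.

```python
import math


def sensitivity(model, inputs, index, delta=1e-3):
    """Estimate how much the output changes per unit change in inputs[index]."""
    raised = list(inputs)
    lowered = list(inputs)
    raised[index] += delta
    lowered[index] -= delta
    return (model(raised) - model(lowered)) / (2 * delta)


def toy_model(inputs):
    """Hypothetical trained unit: depends on inputs[0] but ignores inputs[1]."""
    net = 2.0 * inputs[0] + 0.0 * inputs[1] - 1.0
    return 1.0 / (1.0 + math.exp(-net))


point = [0.5, 0.5]
s0 = sensitivity(toy_model, point, 0)
s1 = sensitivity(toy_model, point, 1)
print(s0, s1)  # s0 is close to 0.5, s1 is 0.0
```

A result like this could be summarized as a rule of the form "the output rises as the first input increases, and does not depend on the second input".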