Lecture 17-Classification by Backpropagation-M

Academic year: 2020


(1)

CSC479 Data Mining

Lecture # 17

Classification by Backpropagation

(2)

Brain

A marvelous piece of architecture and design.

In association with a nervous system, it controls the life patterns, communications, interactions, growth and development of

(3)

There are about $10^{10}$ to $10^{14}$ nerve cells (called neurons) in an adult human brain.

Neurons are highly connected with each other. Each nerve cell is connected to hundreds of thousands of other nerve cells.

Passage of information between neurons is slow (in comparison to transistors in an IC). It takes place in the form of electrochemical signals between two neurons, in milliseconds.

Energy consumption per neuron is low

(4)

They look rather like blobs of ink, don't they?

A closer look reveals a large collection of different molecules working together coherently, in an organized manner.

(5)

[Figure: structure of a neuron, with labels for the cell body, axon, dendrites, nucleus, axons from other neurons, and synapses]

(6)
(7)
(8)

An artificial neural network is an information processing system that has certain performance characteristics in common with biological neural networks.

An ANN can be characterized by:

1. Architecture: The pattern of connections between different neurons.

2. Training or Learning Algorithms: The method of determining weights on the connections.

(9)

There are two basic categories:

1. Feed-forward Neural Networks
   These are the nets in which the signals flow from the input units to the output units, in a forward direction.
   They are further classified as:
   1. Single Layer Nets
   2. Multi-layer Nets

2. Recurrent Neural Networks
   These are the nets in which the signals can

(10)
(11)
(12)

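[Figure: network diagram with input units X1 ... Xn, output units Y1 ... Ym, bias units (constant input 1), and connection weights labeled w11, w1n, w1m, wnm, v1n, v1m]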

(13)

Supervised Training

Training is accomplished by presenting a sequence of training vectors or patterns, each with an associated target output vector.

The weights are then adjusted according to a learning algorithm.

During training, the network develops an associative memory. It can then recall a stored pattern when it is given an input vector that is sufficiently similar to a vector it has learned.

Unsupervised Training

A sequence of input vectors is provided, but no target vectors are specified in this case.

The net modifies its weights and biases, so that the

(14)

Classification by Backpropagation

Backpropagation: a neural network learning algorithm

Started by psychologists and neurobiologists to develop and test computational analogues of neurons

A neural network: a set of connected input/output units where each connection has a weight associated with it

During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples

Also referred to as connectionist learning due to the

(15)

Neural Network as a Classifier

Weaknesses

• Long training time

• Requires a number of parameters that are typically best determined empirically, e.g., the network topology or “structure”

• Poor interpretability: it is difficult to interpret the symbolic meaning behind the learned weights and of “hidden units” in the network

Strengths

• High tolerance to noisy data

• Ability to classify untrained patterns

• Well-suited for continuous-valued inputs and outputs

• Successful on a wide array of real-world data

• Algorithms are inherently parallel

• Techniques have recently been developed for the extraction of

(16)

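A Neuron (= a perceptron)

• The n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping

[Figure: a single neuron. The input vector x = (x_0, x_1, ..., x_n) and the weight vector w = (w_0, w_1, ..., w_n) are combined in a weighted sum, a bias term is subtracted, and the result is passed through the activation function f to give the output y.]

For example:

$$y = \operatorname{sign}\left(\sum_{i=0}^{n} w_i x_i - \mu_k\right)$$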
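The mapping described on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the convention that `sign` returns +1 at zero, and the example weights and threshold (chosen so the unit behaves like a logical AND) are assumptions, not values from the lecture.

```python
def sign(value):
    """Return +1 for non-negative input and -1 otherwise (one common convention)."""
    return 1 if value >= 0 else -1


def perceptron_output(x, w, threshold):
    """Scalar product of w and x, minus a threshold, passed through sign."""
    weighted_sum = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return sign(weighted_sum - threshold)


# Hypothetical weights and threshold: the unit fires only when both inputs are 1.
weights = [1.0, 1.0]
threshold = 1.5
outputs = [perceptron_output([a, b], weights, threshold)
           for a in (0, 1) for b in (0, 1)]
print(outputs)
```

With these assumed parameters only the input (1, 1) produces +1; the other three inputs produce -1.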

(17)

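A Multi-Layer Feed-Forward Neural Network

• Given a unit j in a hidden or output layer, the net input, $I_j$, to unit j is

$$I_j = \sum_i w_{ij} O_i + \theta_j$$

where $w_{ij}$ is the weight of the connection from unit i in the previous layer to unit j, $O_i$ is the output of unit i, and $\theta_j$ is the bias of unit j.

• Given the net input $I_j$ to unit j, then $O_j$, the output of unit j, is computed as

$$O_j = \frac{1}{1 + e^{-I_j}}$$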
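A minimal Python sketch of these two formulas for a single unit follows. The function names and the example weights, inputs, and bias are hypothetical values chosen for illustration.

```python
import math


def net_input(weights, inputs, bias):
    """I_j: weighted sum of the previous layer's outputs plus the unit's bias."""
    return sum(w * o for w, o in zip(weights, inputs)) + bias


def sigmoid(net):
    """O_j: logistic function applied to the net input."""
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical unit with three incoming connections.
I_j = net_input([0.2, 0.4, -0.5], [1, 0, 1], -0.4)
O_j = sigmoid(I_j)
print(round(I_j, 3), round(O_j, 3))
```

For these assumed values the net input is -0.7 and the output is about 0.332. The logistic function squashes any net input into the open interval (0, 1).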

(18)

• Backpropagate the error: The error is propagated backward by updating the weights and biases to reflect the error of the network’s prediction. For a unit j in the output layer, the error $Err_j$ is computed by

$$Err_j = O_j (1 - O_j)(T_j - O_j)$$

where $O_j$ is the actual output of unit j, and $T_j$ is the known target value of the given training tuple.

• The error of a hidden layer unit j is

$$Err_j = O_j (1 - O_j) \sum_k Err_k \, w_{jk}$$

where $w_{jk}$ is the weight of the connection from unit j to a unit k in the next higher layer, and $Err_k$ is the error of unit k.

• Weights are updated by the following equations, where $\Delta w_{ij}$ is the change in weight $w_{ij}$ and $l$ is the learning rate

$$\Delta w_{ij} = (l)\, Err_j \, O_i$$

$$w_{ij} = w_{ij} + \Delta w_{ij}$$

• Biases are updated by the following equations, where $\Delta \theta_j$ is the change in bias $\theta_j$

$$\Delta \theta_j = (l)\, Err_j$$

$$\theta_j = \theta_j + \Delta \theta_j$$
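The four formulas on this slide translate almost directly into code. The sketch below assumes sigmoid units; the function names and the numbers in the usage lines are hypothetical.

```python
def output_error(O_j, T_j):
    """Err_j for an output unit: O_j (1 - O_j) (T_j - O_j)."""
    return O_j * (1 - O_j) * (T_j - O_j)


def hidden_error(O_j, downstream):
    """Err_j for a hidden unit.

    downstream is a list of (Err_k, w_jk) pairs, one per unit k in the
    next higher layer that unit j connects to.
    """
    return O_j * (1 - O_j) * sum(err_k * w_jk for err_k, w_jk in downstream)


def updated_weight(w_ij, l, Err_j, O_i):
    """w_ij + (l) Err_j O_i, where l is the learning rate."""
    return w_ij + l * Err_j * O_i


def updated_bias(theta_j, l, Err_j):
    """theta_j + (l) Err_j."""
    return theta_j + l * Err_j


# Hypothetical numbers: one output unit fed by one hidden unit.
err_out = output_error(0.5, 1)                      # 0.125
err_hid = hidden_error(0.5, [(err_out, -0.2)])      # -0.00625
new_w = updated_weight(-0.2, 0.9, err_out, 0.5)     # -0.14375
new_b = updated_bias(0.1, 0.9, err_out)             # 0.2125
```

Note that the hidden-unit error uses the weights as they were before the update, so all errors are computed first and the weights are changed afterwards.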

(19)

How a Multi-Layer Neural Network Works

• The inputs to the network correspond to the attributes measured for each training tuple

• Inputs are fed simultaneously into the units making up the input layer

• They are then weighted and fed simultaneously to a hidden layer

• The number of hidden layers is arbitrary, although usually only one is used

• The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network's prediction

• The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer

• From a statistical point of view, networks perform nonlinear

(20)

Defining a Network Topology

First decide the network topology: # of units in the input layer, # of hidden layers (if > 1), # of units in each hidden layer, and # of units in the output layer

Normalize the input values for each attribute measured in the training tuples to the range [0.0, 1.0]

Input: one input unit per domain value, each initialized to 0

Output: for classification with more than two classes, one output unit per class is used

Once a network has been trained and its accuracy is
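The normalization and input encoding mentioned on this slide can be sketched as below. Two assumptions are made here: min-max scaling is used to map values to [0.0, 1.0], and "one input unit per domain value" is read as applying to a discrete-valued attribute. The helper names are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numeric attribute values linearly into [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, domain):
    """One input unit per domain value: all 0 except the unit for `value`."""
    return [1 if value == d else 0 for d in domain]


ages = min_max_normalize([20, 30, 40])
color_units = one_hot("green", ["red", "green", "blue"])
print(ages, color_units)
```

The same one-unit-per-value idea applies on the output side: with more than two classes, the target vector has a 1 in the position of the true class and 0 elsewhere.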

(21)

Backpropagation

• Iteratively process a set of training tuples and compare the network's prediction with the actual known target value

• For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value

• Modifications are made in the “backwards” direction: from the output layer, through each hidden layer, down to the first hidden layer, hence “backpropagation”

• Steps

  • Initialize weights (to small random numbers) and biases in the network

  • Propagate the inputs forward (by applying the activation function)

  • Backpropagate the error (by updating weights and biases)
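The three steps can be put together in a short Python sketch. It assumes one hidden layer, sigmoid units, an update after every training tuple, and initial values drawn from [-0.5, 0.5]. The function names, the 3-2-1 layout, and the training tuple are illustrative assumptions.

```python
import math
import random


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def init_network(n_in, n_hidden, n_out, rng):
    """Step 1: small random weights and biases."""
    def rand():
        return rng.uniform(-0.5, 0.5)
    return {
        "w_hidden": [[rand() for _ in range(n_in)] for _ in range(n_hidden)],
        "b_hidden": [rand() for _ in range(n_hidden)],
        "w_out": [[rand() for _ in range(n_hidden)] for _ in range(n_out)],
        "b_out": [rand() for _ in range(n_out)],
    }


def forward(net, x):
    """Step 2: propagate the inputs forward through both layers."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    out = [sigmoid(sum(w * h for w, h in zip(ws, hidden)) + b)
           for ws, b in zip(net["w_out"], net["b_out"])]
    return hidden, out


def train_on_tuple(net, x, target, l):
    """Step 3: compute all errors first, then update weights and biases."""
    hidden, out = forward(net, x)
    err_out = [o * (1 - o) * (t - o) for o, t in zip(out, target)]
    err_hidden = [h * (1 - h) * sum(err_out[k] * net["w_out"][k][j]
                                    for k in range(len(out)))
                  for j, h in enumerate(hidden)]
    for k, e in enumerate(err_out):
        for j, h in enumerate(hidden):
            net["w_out"][k][j] += l * e * h
        net["b_out"][k] += l * e
    for j, e in enumerate(err_hidden):
        for i, xi in enumerate(x):
            net["w_hidden"][j][i] += l * e * xi
        net["b_hidden"][j] += l * e


rng = random.Random(0)               # fixed seed so the run is repeatable
net = init_network(3, 2, 1, rng)
x, target = [1, 0, 1], [1]           # hypothetical training tuple, class label 1
before = forward(net, x)[1][0]
for _ in range(50):
    train_on_tuple(net, x, target, l=0.9)
after = forward(net, x)[1][0]
print(before, after)
```

Repeated updates on this single tuple push the network's output toward the target value of 1. A real training run would loop over every tuple in the training set, once per epoch.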

(22)

Example 9.1

• The class label of the training tuple is 1 and the learning rate l is 0.9

• Let the initial weights and bias values be

The net input and output of each unit are computed as

$$I_j = \sum_i w_{ij} O_i + \theta_j$$

$$O_j = \frac{1}{1 + e^{-I_j}}$$

(23)

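Example 9.1 (continued)

Error of an output unit j:

$$Err_j = O_j (1 - O_j)(T_j - O_j)$$

Error of a hidden unit j:

$$Err_j = O_j (1 - O_j) \sum_k Err_k \, w_{jk}$$

Weight update:

$$w_{ij} = w_{ij} + (l)\, Err_j \, O_i$$

Bias update:

$$\theta_j = \theta_j + (l)\, Err_j$$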
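One full forward and backward pass with these formulas can be traced in Python. The initial weights and biases are not listed in the text above, so the values below are assumed for illustration: a network with input units 1 to 3, hidden units 4 and 5, output unit 6, and the input tuple (1, 0, 1). Only the class label 1 and the learning rate 0.9 come from the example statement.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


l, T = 0.9, 1                      # learning rate and class label from the example
x = {1: 1, 2: 0, 3: 1}             # assumed input tuple
w = {(1, 4): 0.2, (1, 5): -0.3, (2, 4): 0.4, (2, 5): 0.1,   # assumed weights
     (3, 4): -0.5, (3, 5): 0.2, (4, 6): -0.3, (5, 6): -0.2}
theta = {4: -0.4, 5: 0.2, 6: 0.1}  # assumed biases

# Forward pass: input units output their input value unchanged.
O = dict(x)
for j in (4, 5):
    O[j] = sigmoid(sum(w[(i, j)] * O[i] for i in (1, 2, 3)) + theta[j])
O[6] = sigmoid(sum(w[(i, 6)] * O[i] for i in (4, 5)) + theta[6])

# Backward pass: output error first, then the hidden errors.
Err = {6: O[6] * (1 - O[6]) * (T - O[6])}
for j in (4, 5):
    Err[j] = O[j] * (1 - O[j]) * Err[6] * w[(j, 6)]

# Updates use the old weights and the errors computed above.
new_w = {(i, j): w_ij + l * Err[j] * O[i] for (i, j), w_ij in w.items()}
new_theta = {j: t + l * Err[j] for j, t in theta.items()}

print(round(O[6], 3), round(Err[6], 4))
print(round(new_w[(4, 6)], 3), round(new_theta[6], 3))
```

With these assumed values the network output is about 0.474, the output error about 0.131, the weight from unit 4 to unit 6 moves from -0.3 to about -0.261, and the bias of unit 6 moves from 0.1 to about 0.218.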

(24)

Terminating condition:

Training stops when

All $\Delta w_{ij}$ in the previous epoch are so small as to be below some specified threshold, or

The percentage of tuples misclassified in the previous epoch is below some threshold, or

A prespecified number of epochs has expired.

In practice, several hundreds of thousands of epochs may be required before the weights converge.
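The three stopping rules can be combined into one check that runs at the end of each epoch. This is a sketch only: the function name, the parameter names, and the default thresholds are assumptions, since the lecture does not give specific values.

```python
def should_stop(weight_changes, misclassified_fraction, epoch,
                delta_threshold=1e-4, error_threshold=0.05, max_epochs=1000):
    """Return True if any of the three terminating conditions holds.

    weight_changes: every weight change made in the previous epoch.
    misclassified_fraction: share of tuples misclassified in the previous epoch.
    epoch: number of epochs completed so far.
    """
    all_small = all(abs(d) < delta_threshold for d in weight_changes)
    few_errors = misclassified_fraction < error_threshold
    out_of_epochs = epoch >= max_epochs
    return all_small or few_errors or out_of_epochs


print(should_stop([0.02, -0.01], 0.30, epoch=10))    # still learning
print(should_stop([1e-6, -2e-6], 0.30, epoch=10))    # weight changes are tiny
print(should_stop([0.02, -0.01], 0.30, epoch=1000))  # epoch budget used up
```

The epoch limit acts as a safety net: it guarantees termination even if the weights never settle and the misclassification rate never falls below its threshold.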

(25)

• Efficiency of backpropagation: each epoch (one iteration through the training set) takes O(|D| × w) time, with |D| tuples and w weights, but the number of epochs can be exponential in n, the number of inputs, in the worst case

• Rule extraction from networks: network pruning

  • Simplify the network structure by removing weighted links that have the least effect on the trained network

  • Then perform link, unit, or activation value clustering

  • The sets of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers

• Sensitivity analysis: assess the impact that a given input variable has on a network output. The knowledge gained from this analysis can be represented in rules
