
3.3 Neural Networks

3.3.3 Neural network structures

3.3.3.1 Basic units

The basic processing element of NNs is often called a neuron (by analogy with neurophysiology), unit, node, or sometimes cell. In this book, we simply call this basic element a unit. Each unit has multiple inputs and a single output, as shown in Figure 3.2. Many of these basic processing elements can be considered to have two basic components, namely the summer and the activation function.

[Figure 3.2 schematic: inputs x_1, x_2, x_3, ..., x_q; summer Σ with output S; activation function f(S); unit output O.]

Figure 3.2: The basic elements of neural networks.

Summer

The summer (also known as the adder) sums up the input signals. Before being summed, each input signal is weighted by the respective link, or synapse, of the unit. In other words, the summer forms a linear combination of the input signals, with the synaptic weights as coefficients.
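The summer can be sketched as follows. For a unit with $q$ inputs $x_1, \ldots, x_q$ and weights $w_1, \ldots, w_q$, it computes $S = \sum_{i=1}^{q} w_i x_i$. The function name `summer` and the weight values are illustrative assumptions. No bias term is included because none appears in Figure 3.2.

```python
def summer(inputs, weights):
    """Weighted sum S = sum_i w_i * x_i of a unit's input signals (no bias assumed)."""
    if len(inputs) != len(weights):
        raise ValueError("each input signal needs exactly one weight")
    return sum(w * x for w, x in zip(weights, inputs))


# Three inputs x_1..x_3 with illustrative synaptic weights.
S = summer([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(S)  # 4.5
```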


Activation function

The activation function transforms the summer output into the unit output through a nonlinear function. This transformation squashes (or limits) the amplitude of the unit output to a finite range. There are a number of different activation functions, which can be classified as:

1. Differentiable/non-differentiable.
2. Pulse-like/step-like.
3. Positive/zero-mean.

Each of these classifications is briefly described below, with examples.

Classification 1 distinguishes smooth from sharp functions. Smooth functions are needed by some adaptation algorithms, such as backpropagation, whereas discontinuous functions are needed to give a true binary output. Examples of smooth functions are the sigmoid, the hyperbolic tangent and the radial basis function (RBF). An example of a sharp function is the hard-limiter, or threshold transfer function.

Classification 2 distinguishes functions that have a significant output value only for inputs near zero from functions that change significantly only around zero. For example, the RBF is pulse-like and the hard-limiter is step-like.

Classification 3 applies to step-like functions. Positive functions change from 0 at $-\infty$ to 1 at $+\infty$, while zero-mean functions change from $-1$ at $-\infty$ to 1 at $+\infty$. The sigmoid function is a positive function and the hyperbolic tangent function is zero-mean. A hard-limiter can be either positive or zero-mean, depending on how its limits are defined.
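The sketch below implements the activation functions named above, together with a unit that chains the summer and the activation function, $O = f(S)$. Three details are assumptions rather than definitions taken from the text: the Gaussian form of the RBF, its `width` parameter, and the convention that the hard-limiter outputs 1 at $S = 0$. Each docstring records where the function falls in the three classifications.

```python
import math

def sigmoid(s):
    """Smooth, step-like, positive: tends to 0 as s -> -inf and to 1 as s -> +inf."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)            # rearranged form avoids overflow for large negative s
    return z / (1.0 + z)

def hyperbolic_tangent(s):
    """Smooth, step-like, zero-mean: tends to -1 as s -> -inf and to 1 as s -> +inf."""
    return math.tanh(s)

def gaussian_rbf(s, width=1.0):
    """Smooth, pulse-like: significant output only for s near zero (Gaussian form assumed)."""
    return math.exp(-(s * s) / (2.0 * width * width))

def hard_limiter(s, zero_mean=False):
    """Non-differentiable, step-like; outputs 0/1 (positive) or -1/1 (zero-mean).
    Returning 1 at s == 0 is a convention chosen here."""
    if s >= 0.0:
        return 1.0
    return -1.0 if zero_mean else 0.0

def unit_output(inputs, weights, activation):
    """Output O = f(S) of one unit: the summer followed by the activation function."""
    S = sum(w * x for w, x in zip(weights, inputs))
    return activation(S)

for s in (-5.0, 0.0, 5.0):
    print(s, sigmoid(s), hyperbolic_tangent(s), gaussian_rbf(s),
          hard_limiter(s), hard_limiter(s, zero_mean=True))
```

The split form of `sigmoid` computes the same function as $1/(1+e^{-s})$. It only avoids an overflow in `math.exp` for large negative arguments.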

In some cases the network topology consists of rule units, each of which corresponds to a specific fuzzy rule. Taken together, the units then form a general fuzzy rule [218].

3.3.3.2 Network topology

There are two basic network topologies, namely the feedforward neural network (FNN) and the recurrent neural network (RNN). Figure 3.3 shows the basic structures of the FNN and RNN. Both can be either fully or partially connected, and either single-layer or multilayer. A single-layer neural network (SNN) has no hidden layer, so it has only an input layer and an output layer (it is sometimes also called a two-layer network). A multilayer neural network (MNN) has at least one hidden layer.


[Figure 3.3 panels: Feedforward Neural Network; Recurrent Neural Network. Labels: Inputs, Outputs; legend: neuron or unit.]

Figure 3.3: Difference between feedforward and recurrent neural networks.

Feedforward neural networks

An FNN is completely feedforward: no past state of the network is fed back to any of its units. An FNN is classified as fully connected if every unit in each layer is connected to every unit in the adjacent forward layer. If some of the communication links (sometimes called synaptic connections) are missing, the network is said to be partially connected.
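A feedforward pass can be sketched with a connection mask that separates the fully connected case from the partially connected one. A 0 in the mask marks a missing communication link. The layer sizes, the weights, the absence of bias terms and the helper names (`layer_forward`, `full_mask`, `fnn_forward`) are hypothetical choices for illustration.

```python
import math

def identity(s):
    return s

def layer_forward(inputs, weights, mask, activation):
    """One layer: unit j computes f(sum_i mask[j][i] * weights[j][i] * inputs[i]).
    mask[j][i] = 1 keeps the link from input i to unit j; 0 removes it."""
    return [activation(sum(m * w * x for m, w, x in zip(m_row, w_row, inputs)))
            for w_row, m_row in zip(weights, mask)]

def full_mask(n_units, n_inputs):
    """Mask of a fully connected layer: every link is present."""
    return [[1] * n_inputs for _ in range(n_units)]

def fnn_forward(x, layers):
    """Propagate x strictly forward through (weights, mask, activation) layers.
    No state is stored between calls, so the output depends on x alone."""
    for weights, mask, activation in layers:
        x = layer_forward(x, weights, mask, activation)
    return x

# 2 inputs -> 2 tanh hidden units -> 1 linear output (illustrative weights).
W1 = [[0.5, -0.3], [0.8, 0.2]]
W2 = [[1.0, -1.0]]
fully = [(W1, full_mask(2, 2), math.tanh), (W2, full_mask(1, 2), identity)]
partial = [(W1, [[1, 0], [0, 1]], math.tanh), (W2, full_mask(1, 2), identity)]
print(fnn_forward([1.0, 2.0], fully))
print(fnn_forward([1.0, 2.0], partial))
```

Masking keeps one weight matrix per layer for both cases. A sparse representation would be an equally valid design choice.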

An FNN is a static mapping, so in theory it cannot by itself identify or control dynamic systems. Extending this essentially steady-state mapping to the dynamic domain means adopting an approach similar to the linear theory of ARMA (autoregressive moving average) modelling. Here, a time series of past real plant input and output values is fed to the FNN through tapped delay lines (TDL). In discrete time, the nonlinear dynamic plant is described by

$$y_p(k) = f\big(y_p(k-1), \ldots, y_p(k-n_y), u(k-2), \ldots, u(k-n_u)\big) \qquad (3.1)$$

where $f(\cdot)$ is the unknown nonlinear function, $y_p$ is the plant output, $u$ is the control signal, and $n_y$ and $n_u$ are the numbers of past outputs and inputs of the plant, which depend on the plant order. To represent this plant, the NN model is fed with the past output and input values of the plant, giving the model output $y_m(k)$ (see Figure 3.4):

$$y_m(k) = \mathrm{NN}\big(y_p(k-1), \ldots, y_p(k-n_y), u(k-2), \ldots, u(k-n_u)\big) \qquad (3.2)$$


This assumes that the plant order is known and its states are measurable.
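The tapped delay lines of Eq. (3.2) amount to building a regressor vector from stored past samples at each time step $k$. The sketch below assumes the plant records are Python lists indexed by $k$. The helper names and the `first_input_lag` parameter are hypothetical. `first_input_lag` defaults to 2 to match the input terms as written in Eqs. (3.1) and (3.2). The FNN itself is omitted: any static network could be trained on the returned (regressor, target) pairs.

```python
def tdl_regressor(y_p, u, k, n_y, n_u, first_input_lag=2):
    """Regressor of Eq. (3.2): [y_p(k-1), ..., y_p(k-n_y), u(k-2), ..., u(k-n_u)].
    first_input_lag (hypothetical) is the lag of the first input term."""
    past_outputs = [y_p[k - i] for i in range(1, n_y + 1)]
    past_inputs = [u[k - j] for j in range(first_input_lag, n_u + 1)]
    return past_outputs + past_inputs

def tdl_dataset(y_p, u, n_y, n_u, first_input_lag=2):
    """(regressor, target y_p(k)) pairs for every k whose delay lines are full."""
    k0 = max(n_y, n_u)   # earliest k with y_p(k - n_y) and u(k - n_u) available
    return [(tdl_regressor(y_p, u, k, n_y, n_u, first_input_lag), y_p[k])
            for k in range(k0, len(y_p))]

# Toy records: y_p(k) = k and u(k) = 10 + k, so each entry reveals its own index.
data = tdl_dataset(list(range(6)), list(range(10, 16)), n_y=2, n_u=3)
for regressor, target in data:
    print(regressor, "->", target)
```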

According to Levin [140], to represent the dynamics of the system sufficiently, $l$ past measurements of the real plant output and input must be fed back to the FNN input, where $l \geq n_y + n_u + 1$.

The assumption that all plant outputs and inputs are measurable and available for feedback is sometimes unrealistic. Some plant outputs may not be accessible to sensors (such as the rotor currents in a squirrel-cage induction motor), or the required sensors may be too expensive or unreliable. For such a system, an FNN may not be suitable and an RNN may be preferred, as discussed in the next subsection.

Recurrent neural networks

An RNN differs from an FNN in that it has at least one feedback loop. It is claimed that the presence of feedback loops has a profound impact on the learning capability of the network and on its performance [188, 185, 240, 133]. From a control-theory standpoint, the feedback loops make an RNN a nonlinear dynamic system. The feedback loops commonly involve unit delays in discrete-time systems, or integrators in the continuous-time case. RNNs may be preferred to FNNs when:

The measured plant outputs are highly corrupted by noise.

The dynamics of the nonlinear process are complex and unknown.

Direct state feedback is impossible and only partial measurement is available from the plant.

There are many RNN structures, which differ in how the states are fed back to the units in each layer. The more general the structure, that is, the more feedback interconnections it has (up to a fully connected recurrent neural network, FRNN), the ‘richer’ its dynamic representation. Hence, FRNNs are sometimes said to have a global dynamic representation. Among the more general types are the Jordan, Elman and Williams-Zipser networks. In a Jordan network, the past output values of the network are fed back into the hidden units or the input layer. In an Elman network, the past values of the hidden units are fed back into the hidden units themselves [75, 134].
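The Jordan and Elman feedback paths can be sketched as single time-step updates. The sketch assumes tanh hidden units, linear outputs and no bias terms, all of which are illustrative choices. In both networks the fed-back vector passes through a unit delay, so it is the value from step $k-1$. The helper names and weights are hypothetical.

```python
import math

def matvec(W, v):
    """Matrix-vector product for lists of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def elman_step(x, h_prev, W_in, W_rec, W_out):
    """Elman: h(k) = tanh(W_in x(k) + W_rec h(k-1)), y(k) = W_out h(k).
    The past hidden state h(k-1) is fed back into the hidden units."""
    h = [math.tanh(a + r) for a, r in zip(matvec(W_in, x), matvec(W_rec, h_prev))]
    return matvec(W_out, h), h

def jordan_step(x, y_prev, W_in, W_fb, W_out):
    """Jordan: the past network output y(k-1), not the hidden state,
    is fed back into the hidden units."""
    h = [math.tanh(a + r) for a, r in zip(matvec(W_in, x), matvec(W_fb, y_prev))]
    return matvec(W_out, h), h

def run_elman(xs, W_in, W_rec, W_out):
    """Run an input sequence through the Elman network from a zero initial state."""
    h = [0.0] * len(W_rec)
    ys = []
    for x in xs:
        y, h = elman_step(x, h, W_in, W_rec, W_out)
        ys.append(y)
    return ys

# Impulse response: the output stays non-zero after the input returns to zero,
# which a static FNN fed only with x(k) could not reproduce.
W_in = [[1.0], [0.5]]               # 2 hidden units, 1 input
W_rec = [[0.2, 0.0], [0.0, 0.2]]    # hidden -> hidden (unit delay)
W_out = [[1.0, 1.0]]                # 1 linear output
ys = run_elman([[1.0], [0.0], [0.0]], W_in, W_rec, W_out)
print(ys)
```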

One of the problems in using FRNNs is stability [127, 222]. Here, the issue is addressed by examining the recurrent weights. FRNNs are normally trained with the backpropagation (BP) algorithm. To improve the stability of FRNNs, it is proposed here that the total dynamic feedback must be less than 1. Hence, for an Elman-type FRNN trained by a BP algorithm, the condition is

$$0 < \sum_{j=1}^{p} w_{r_{ij}}(k)\, f'_{h_j}\big(S_{h_j}(k-1)\big) < 1 \qquad (3.3)$$


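where $w_{r_{ij}}(k)$ is the recurrent weight at time step $k$, $p$ is the number of hidden units, $S_{h_j}(k-1)$ is the summer output of the $j$th hidden unit at time step $k-1$, and $f_{h_j}$ is the activation function of the $j$th hidden unit, usually the sigmoid or the hyperbolic tangent. Both functions have a positive derivative with $|f'_{h_j}| \leq 1$; the maximum slope is 1 for the hyperbolic tangent and 1/4 for the sigmoid, both at the origin. Each term of the sum in (3.3) is therefore positive and no larger than $w_{r_{ij}}$ whenever $w_{r_{ij}} > 0$, so the sum stays below $p \cdot (1/p) = 1$ if every one of the $p$ recurrent weights is below $1/p$. Hence $w_{r_{ij}}$ has to be bounded as follows to ensure FRNN stability: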

$$0 < w_{r_{ij}} < \frac{1}{p} \qquad (3.4)$$
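The check below evaluates the quantity bounded in (3.3) for one hidden unit $i$. It also shows one possible way of enforcing (3.4): clipping the recurrent weights into $(0, 1/p)$ after a training update. The clipping step, its margin `eps`, and all function names are hypothetical illustrations, not the procedure used in the original work.

```python
import math

def tanh_prime(s):
    """Derivative of tanh; its maximum value is 1, at s = 0."""
    return 1.0 - math.tanh(s) ** 2

def sigmoid_prime(s):
    """Derivative of the sigmoid; its maximum value is 1/4, at s = 0."""
    sig = 1.0 / (1.0 + math.exp(-s)) if s >= 0 else math.exp(s) / (1.0 + math.exp(s))
    return sig * (1.0 - sig)

def feedback_gain(w_r_i, S_h_prev, f_prime=tanh_prime):
    """sum_j w_rij(k) * f'_hj(S_hj(k-1)) for hidden unit i, the quantity in (3.3)."""
    return sum(w * f_prime(s) for w, s in zip(w_r_i, S_h_prev))

def satisfies_3_3(w_r_i, S_h_prev, f_prime=tanh_prime):
    return 0.0 < feedback_gain(w_r_i, S_h_prev, f_prime) < 1.0

def clip_to_3_4(w_r_i, eps=1e-6):
    """Force every recurrent weight into (0, 1/p), p = number of hidden units."""
    p = len(w_r_i)
    upper = 1.0 / p - eps
    return [min(max(w, eps), upper) for w in w_r_i]

w = [0.9, -0.3, 0.2, 0.5]       # p = 4, illustrative weights after a BP update
S_prev = [0.0] * 4              # worst case for tanh: f'(0) = 1
print(satisfies_3_3(w, S_prev))              # False (gain is about 1.3)
w_clipped = clip_to_3_4(w)
print(satisfies_3_3(w_clipped, S_prev))      # True
```

Because the sigmoid slope never exceeds 1/4, the bound $1/p$ is conservative for sigmoid hidden units. For those units, (3.3) still holds with weights up to $4/p$.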

Single layer and multilayer neural networks

NNs with feedforward or recurrent structures can be arranged as single-layer neural networks (SNNs) or multilayer neural networks (MNNs). The radial basis function network (RBFN), B-spline network, functional-link network (FLN), CMAC, lattice associative memory network, Adaline and perceptron are feedforward SNNs. The multilayer perceptron (MLP) is a feedforward MNN. The Boltzmann machine and bidirectional associative memories (BAM) can be classified as recurrent MNNs. Adaptive resonance theory (ART), Kohonen and Hopfield networks can be classified as recurrent SNNs.

Both MNNs and SNNs are universal approximators. An MNN with one hidden layer and sufficiently many hidden units can approximate any continuous nonlinear function on a compact domain to arbitrary accuracy [86, 62, 103, 190, 104, 87, 114]. An MNN with two hidden layers may give a better approximation for some specific problems [213]. However, DeVilliers and Barnard [68] have demonstrated that MNNs with two hidden layers are more prone to falling into local minima.

An SNN is sometimes categorized as locally generalizing [6]. The network is considered local because only a small subset of its adaptable parameters can potentially affect the network output in any given local region of the input space. The MNN, in contrast, is sometimes categorized as globally generalizing, because one or more of its adaptable parameters can potentially affect the network output at every point in the input space.
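The sketch below illustrates the local/global distinction with a scalar input. It compares a Gaussian RBF network (one of the feedforward SNNs listed above) with a one-hidden-layer MLP. All centres, widths and weights are arbitrary illustrative values. Perturbing one RBF output weight changes the output only near that unit's centre. Perturbing one MLP output weight changes the output across the whole sampled range, everywhere except where that hidden unit's argument $wx + b$ is zero.

```python
import math

def rbf_net(x, centers, weights, width=0.5):
    """Single-layer RBF network: y = sum_i w_i * exp(-(x - c_i)^2 / (2 width^2))."""
    return sum(w * math.exp(-((x - c) ** 2) / (2.0 * width ** 2))
               for c, w in zip(centers, weights))

def mlp_net(x, in_weights, biases, out_weights):
    """One-hidden-layer MLP with tanh hidden units and a linear output."""
    return sum(v * math.tanh(w * x + b)
               for w, b, v in zip(in_weights, biases, out_weights))

xs = [-5.0, -2.0, 0.0, 2.0, 5.0]

# Perturb one RBF output weight (the unit centred at 0) by +1.
centers, rbf_w, rbf_w_new = [-2.0, 0.0, 2.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]
rbf_change = [rbf_net(x, centers, rbf_w_new) - rbf_net(x, centers, rbf_w) for x in xs]

# Perturb one MLP output weight by +1.
w_in, b, v, v_new = [1.0, -0.7], [0.5, 0.3], [1.0, 1.0], [2.0, 1.0]
mlp_change = [mlp_net(x, w_in, b, v_new) - mlp_net(x, w_in, b, v) for x in xs]

for x, dr, dm in zip(xs, rbf_change, mlp_change):
    print(f"x={x:+.0f}  RBF change={dr:.2e}  MLP change={dm:+.3f}")
```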