Online LM Algorithm for Linear Neural Network

3.2 Estimation of Linear Neural Network using LM Algorithm

3.2.2 Online LM Algorithm for Linear Neural Network

The conventional LM algorithm is adapted here to work in a sliding-window batch mode for online parameter estimation. A suitable window size is first selected such that it covers half to one cycle of the lowest-frequency oscillatory mode. The window size must also be greater than the number of inputs fed to the neural network model. The LM algorithm applies the update (3.7)-(3.12) and trains the linear neural network multiple times within a single window. The flow chart describing this algorithm is given in Appendix C.1. The output of the neural network with a single hidden layer can be expressed as:

$$y(k+1) = V \, \Phi\left( W \, [\bar{u}, \bar{x}]^T \right) = V \, (W X) \qquad (3.4)$$
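As a minimal sketch of (3.4), assuming the activation Φ is the identity and that the regressor X stacks the m previous control inputs and the n previous measurements, the forward pass can be written as below. The function name, the random weights and the sizes (taken from Table 3.1) are illustrative assumptions only.

```python
import numpy as np

def linear_nn_output(V, W, X):
    """Output of (3.4) with a linear activation: V (W X).

    V: (N,) output weights, W: (N, m+n) input weights, X: (m+n,) regressor.
    """
    return float(V @ (W @ X))

# Illustrative sizes from Table 3.1: m = n = 4 inputs of each kind, N = 10 neurons.
rng = np.random.default_rng(0)
m, n, N = 4, 4, 10
W = rng.normal(size=(N, m + n))
V = rng.normal(size=N)
X = rng.normal(size=m + n)          # stands in for [u_bar, x_bar]^T

y_hat = linear_nn_output(V, W, X)

# With a linear activation the two layers collapse into a single row vector V W.
equivalent_gain = V @ W
```

The last line shows why the model is called linear: the product V W acts on X exactly like the two-layer network does.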

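The error vector $\bar{e}$ over a window containing $ws$ samples is given by:

$$\bar{e} = [\, y(k+1) - \hat{y}(k+1), \;\; \ldots, \;\; y(k+ws) - \hat{y}(k+ws) \,]^T \qquad (3.5)$$

where $y(\cdot)$ is the actual output, $\hat{y}(\cdot)$ is the estimated output, and $ws$ is the number of samples in a window.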
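The window-size rule stated at the start of this subsection (half to one cycle of the lowest-frequency mode, and more samples than network inputs) can be sketched as a small helper. The function name is hypothetical, and the 0.5 Hz mode frequency is an assumed example value that does not come from the text; Ts = 0.1 s and m + n = 8 are taken from Table 3.1.

```python
import math

def window_size_bounds(f_min_hz, ts, n_inputs):
    """Range of window sizes (in samples) that covers half to one cycle of the
    lowest-frequency mode while exceeding the number of network inputs."""
    period_samples = 1.0 / (f_min_hz * ts)
    lower = max(math.ceil(0.5 * period_samples), n_inputs + 1)
    upper = math.floor(period_samples)
    return lower, upper

# Assumed example: lowest mode at 0.5 Hz, sampled every 0.1 s, 8 network inputs.
lo, hi = window_size_bounds(f_min_hz=0.5, ts=0.1, n_inputs=8)
```

Under these assumed values, the window size ws = 15 listed in Table 3.1 falls inside the resulting range.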

To calculate the error derivatives over an entire window, the weights to be updated (the unknown parameters) are arranged in a vector $\bar{p}$ as follows:

$$\bar{p} = [\, W_{i1}^T \;\; W_{i2}^T \;\; \cdots \;\; W_{i(m+n)}^T \;\; V_1 \;\; \cdots \;\; V_N \,]^T, \quad i = 1, \ldots, N \qquad (3.6)$$

The size of $\bar{p}$ is $N_p = N(m + n + 1)$. The corresponding error derivative for the weight update equation can be written as:

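$$J = \frac{\partial \bar{e}}{\partial \bar{p}} \qquad (3.7)$$

Each row of (3.7) corresponds to one sample in the window. For the first row, i.e. for the error at instant $(k + 1)$, the gradient with respect to the input weights is arranged in matrix form as $J_W$. Similarly, the gradient with respect to the output weights, obtained from $\partial \hat{y}(k+1) / \partial V$, is denoted $J_V$. Finally, the gradient of the first row of (3.7) is deduced as:

$$J = [\, \mathrm{vec}\{J_W^T\} \;\; J_V^T \,]$$

where $\mathrm{vec}\{J_W^T\}$ converts the matrix into vector form.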
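For the linear network of (3.4), the derivatives of the estimated output follow directly from the product V (W X): the derivative with respect to W_ij is V_i X_j, and the derivative with respect to V_i is the i-th element of W X. The sketch below builds one Jacobian row from these expressions and checks it against central differences. It rests on assumptions not confirmed by the text: the rows are taken as derivatives of the estimated output (with e = y - ŷ the error derivative has the opposite sign), W is stored row by row in the parameter vector, and all function names are hypothetical.

```python
import numpy as np

def unpack(p, N, n_in):
    """Split p into W (N x n_in, stored row by row) and V (N,)."""
    return p[:N * n_in].reshape(N, n_in), p[N * n_in:]

def jacobian_row(p, X, N):
    """Analytical d y_hat / d p for one sample, with y_hat = V (W X)."""
    W, V = unpack(p, N, X.size)
    J_W = np.outer(V, X)             # entry (i, j) is V[i] * X[j]
    J_V = W @ X                      # entry i is (W X)[i]
    return np.concatenate([J_W.ravel(), J_V])

def numeric_row(p, X, N, h=1e-6):
    """Central-difference approximation of the same row, used as a check."""
    def y_hat(q):
        W, V = unpack(q, N, X.size)
        return V @ (W @ X)
    row = np.zeros(p.size)
    for k in range(p.size):
        d = np.zeros(p.size)
        d[k] = h
        row[k] = (y_hat(p + d) - y_hat(p - d)) / (2.0 * h)
    return row

rng = np.random.default_rng(1)
N, n_in = 10, 8                      # N = 10 and m + n = 8, as in Table 3.1
p = rng.normal(size=N * (n_in + 1))  # Np = N (m + n + 1) parameters
X = rng.normal(size=n_in)
max_gap = np.max(np.abs(jacobian_row(p, X, N) - numeric_row(p, X, N)))
```

Stacking one such row per sample in the window gives a matrix with ws rows and Np columns.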

In the same way, the gradients of all the remaining rows of (3.7) can be calculated. Once the derivatives have been evaluated for each sample with respect to the adjustable parameters, stacking the rows yields the full matrix of (3.7). The weight parameters are then updated according to (2.29):

$$\bar{p}_{new} = \bar{p}_{old} + [J^T J + \chi_k I]^{-1} J^T \bar{e} \qquad (3.12)$$

where $\chi_k$ is the learning rate and $\bar{e}$ is the error vector over a window. In this application, the neural network has a single output and (n + m) inputs.

The parameters are updated online for each moving window, as portrayed in Appendix C.

The weights of the neural network are stacked in a vector and initialized at the start of the first window. The window size (ws) is fixed and, for each sample within the window, the output of the neural network is calculated. At each epoch within a window, the squared error over the window is compared with the squared error of the previous epoch. If the squared error in the present iteration is less than the previous one, the value of $\chi_k$ is decreased and the weight update is accepted; otherwise, the value of $\chi_k$ is increased and the weights are not updated. The iteration continues for the window until the convergence criterion is satisfied.
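The procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the plant is a made-up second-order difference equation, the network is smaller than in Table 3.1 (N = 3, m = n = 2), the window slides one sample at a time, χ is scaled by a factor of 10 and clamped to [1e-8, 1e8], the stopping rule is a squared-error tolerance or an epoch limit, and the Jacobian rows are derivatives of the estimated output with e = y - ŷ. All function names are hypothetical.

```python
import numpy as np

def unpack(p, N, n_in):
    """Split p into W (N x n_in, stored row by row) and V (N,)."""
    return p[:N * n_in].reshape(N, n_in), p[N * n_in:]

def predict_and_jacobian(p, X, N):
    """Window predictions V (W x) and the stacked rows d y_hat / d p."""
    W, V = unpack(p, N, X.shape[1])
    H = X @ W.T                                   # hidden outputs, shape (ws, N)
    J_W = (V[None, :, None] * X[:, None, :]).reshape(X.shape[0], -1)
    return H @ V, np.hstack([J_W, H])

def lm_window(p, chi, X, y, N, max_epochs=20, tol=1e-10, factor=10.0):
    """Run LM epochs on one window; return updated p, chi and the final SSE."""
    y_hat, J = predict_and_jacobian(p, X, N)
    e = y - y_hat
    sse = float(e @ e)
    for _ in range(max_epochs):
        if sse < tol:                             # assumed convergence criterion
            break
        step = np.linalg.solve(J.T @ J + chi * np.eye(p.size), J.T @ e)
        y_try, J_try = predict_and_jacobian(p + step, X, N)
        e_try = y - y_try
        sse_try = float(e_try @ e_try)
        if sse_try < sse:                         # accept update, decrease chi
            p, e, J, sse = p + step, e_try, J_try, sse_try
            chi = max(chi / factor, 1e-8)
        else:                                     # keep weights, increase chi
            chi = min(chi * factor, 1e8)
    return p, chi, sse

# Made-up plant driven by a 0.2 Hz square wave sampled at Ts = 0.1 s (50 samples per period).
ws, N, m, n = 15, 3, 2, 2
k = np.arange(300)
u = np.where((k % 50) < 25, 1.0, -1.0)
y = np.zeros(k.size)
for i in range(1, k.size - 1):
    y[i + 1] = 1.5 * y[i] - 0.7 * y[i - 1] + 0.5 * u[i] + 0.2 * u[i - 1]

# Regressor rows [u(k), u(k-1), y(k), y(k-1)] with targets y(k+1).
X_all = np.column_stack([u[1:-1], u[:-2], y[1:-1], y[:-2]])
y_all = y[2:]

rng = np.random.default_rng(0)
p = 0.5 * rng.normal(size=N * (m + n + 1))        # weights initialized once
chi = 0.1                                         # starting value from Table 3.1
e0 = y_all[:ws] - predict_and_jacobian(p, X_all[:ws], N)[0]
sse_first = float(e0 @ e0)                        # error before any training
for start in range(y_all.size - ws + 1):          # moving window
    Xw, yw = X_all[start:start + ws], y_all[start:start + ws]
    p, chi, sse_last = lm_window(p, chi, Xw, yw, N)

W, V = unpack(p, N, m + n)
identified_gain = V @ W                           # compare with [0.5, 0.2, 1.5, -0.7]
```

Carrying both the weights and χ from one window to the next is a design choice of this sketch; it lets later windows start from an already adapted damping value.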

Convergence is slower during the first few windows and becomes faster once the weight parameters have stabilized. Close to the optimum solution, the LM algorithm behaves like the Gauss-Newton method, which provides faster convergence. Compared with the RLS-based linear approach, LM is more flexible in the choice of the initial guess and converges better. In addition, unlike RLS, LM can be used for nonlinear optimization.
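The link between LM and Gauss-Newton can be seen from the step in (3.12): as χ tends to zero the step approaches the Gauss-Newton (least-squares) step, while for large χ it approaches a gradient step scaled by 1/χ. The sketch below checks both limits numerically on random data; the matrix sizes and the function name are arbitrary assumptions for the demonstration.

```python
import numpy as np

def lm_step(J, e, chi):
    """Step of (3.12): (J^T J + chi I)^{-1} J^T e."""
    A = J.T @ J + chi * np.eye(J.shape[1])
    return np.linalg.solve(A, J.T @ e)

rng = np.random.default_rng(2)
J = rng.normal(size=(15, 4))        # 15 samples, 4 parameters (demo sizes)
e = rng.normal(size=15)

gauss_newton = np.linalg.lstsq(J, e, rcond=None)[0]
gradient = J.T @ e

small = lm_step(J, e, chi=1e-10)    # chi near zero: Gauss-Newton step
large = lm_step(J, e, chi=1e8)      # chi large: gradient step divided by chi
```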

Note that a square wave is used as the excitation signal for all test cases in this thesis because it is easy to generate and has a strictly limited amplitude range. The frequency of the square wave must be selected such that the system dynamics are adequately excited. A rule of thumb is that the square-wave frequency should be approximately 0.16 times the system bandwidth, which ensures that most of the square-wave power lies inside the system bandwidth [37]. In this work, estimation was carried out over a range of frequencies, and 0.2 Hz was found to be an appropriate frequency that yields correct estimates of the model parameters.
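The rule of thumb can be checked with a short calculation. For an ideal zero-mean square wave with 50% duty cycle, the odd harmonic of order k carries a fraction 8/(πk)² of the total power. The sketch below sums the harmonics that fall within the bandwidth; the idealized waveform, the sharp bandwidth cut-off and the function name are assumptions made for the illustration.

```python
import math

def square_wave_power_fraction(f0, bandwidth):
    """Fraction of an ideal square wave's power carried by the odd harmonics
    of fundamental f0 that lie at or below the given bandwidth."""
    fraction = 0.0
    k = 1
    while k * f0 <= bandwidth:
        fraction += 8.0 / (math.pi * k) ** 2
        k += 2                       # only odd harmonics are present
    return fraction

bandwidth = 1.0                      # normalized; only the ratio f0 / bandwidth matters
frac = square_wave_power_fraction(0.16 * bandwidth, bandwidth)
```

With the fundamental at 0.16 of the bandwidth, the first, third and fifth harmonics lie inside the band and together carry about 93% of the power.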

Test Case 3.2.1. To verify the convergence and accuracy of the proposed algorithm, the online LM is tested with the linear neural network structure. In this Chapter, the proposed methodology is illustrated on a 4-machine, 2-area power system. The excitation signal and the power flow through line 10-9 are used as the input-output data to identify the weight parameters of the linear NN through the online LM algorithm. Disturbances are created through a 3-phase fault at bus 8 followed by an outage of line 8-9.

In this example the linear NN model is of 4th order. The hidden layer has 10 neurons and the batch size is 15. A 0.2 Hz square wave is used as the excitation signal. The parameters used in this test case are listed in Table 3.1. The estimation results and the error between the actual and estimated outputs are shown in Fig. 3.2. The weights of the linear NN model converge within 2.5 s, as shown in Fig. 3.3.

[Figure 3.2, top panel: actual and estimated output, Pline 10-9 (MW) versus time (s). Bottom panel: error versus time (s).]

Figure 3.2: Identification of power system using neural network model with LM algorithm, 4-machine, 2-area system.

This example shows that the LM algorithm determines the parameters of the NN within the specified limit, i.e. the minimum time taken for prediction and convergence. On the basis of this identified model, a linear controller can be developed to attain the desired response of

[Figure 3.3, top panel: input weights versus time (s). Bottom panel: output weights versus time (s).]

Figure 3.3: Identified parameters of linear neural network model with LM algorithm.

Table 3.1: Parameters used in the evaluation of Test cases 3.2.1, 3.3.1, 3.4.1 and 3.5.1

Parameter   Description                        Value       Eqn.
m           no. of previous control inputs     4           (3.2)
n           no. of previous measurements       4           (3.2)
χ           learning rate                      0.1         (3.12)
α           pole shifting factor               0.9 - 1.0   (2.46)
ng          numerator order                    3           (2.43)
nf          denominator order                  3           (2.43)
N           no. of neurons in hidden layer     10          (3.1)
ws          no. of samples in a window         15          (3.5)
Ts          sampling time                      0.1 s

Note that the test case studies carried out in this Chapter correspond to a 4-machine, 2-area system (see Section 1.4.1 for the specification of this generic power system), and the parameter values used to obtain all subsequent results are given in Table 3.1.

In document Nonlinear self-tuning control for power oscillation damping (Page 59-64)