
6.4 Sparse Bayesian identification of polynomial NARX models

6.4.2 Algorithm properties

This section gives descriptions and examples that explain some of the properties of the algorithm. The examples refer to the identification of the system generated by the polynomial NARX model

$$y_k = 0.3y_{k-1} + 0.1u_{k-1} + 0.4y_{k-1}y_{k-2} + e_k, \tag{6.78}$$

where $e_k$ is a white noise sequence drawn from the normal distribution $\mathcal{N}(e_k|0, \sigma^2)$, with $\sigma^2 = 0.01$. This model has been chosen for its structural simplicity.
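As a minimal sketch, the test system in Equation (6.78) can be simulated as follows. The input signal is not specified in the text, so a uniform white input on [-1, 1] and zero initial conditions are assumed here; the function name `simulate_narx` is likewise only illustrative.

```python
import numpy as np


def simulate_narx(n_samples, noise_var=0.01, seed=0):
    """Simulate y_k = 0.3 y_{k-1} + 0.1 u_{k-1} + 0.4 y_{k-1} y_{k-2} + e_k."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: the input is not given in the text; uniform white noise is used.
    u = rng.uniform(-1.0, 1.0, size=n_samples)
    # e_k ~ N(0, sigma^2) with sigma^2 = noise_var
    e = rng.normal(0.0, np.sqrt(noise_var), size=n_samples)
    # ASSUMPTION: zero initial conditions for y_0 and y_1
    y = np.zeros(n_samples)
    for k in range(2, n_samples):
        y[k] = 0.3 * y[k - 1] + 0.1 * u[k - 1] + 0.4 * y[k - 1] * y[k - 2] + e[k]
    return u, y


u, y = simulate_narx(1000)
```

The returned pair `(u, y)` is the kind of input-output record on which structure detection would then be performed.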

Model selection by the variational lower bound

In the previous section it was stated that the optimal model choice is taken to be the model that maximises the variational lower bound after it has reached convergence. This is justified by considering the Bayesian model selection procedure discussed in Section 3.8.3.

In Equation (6.79) the conditional dependencies on the model $\mathcal{M}_s$ were neglected. Explicitly including the conditional dependencies, Equation (6.79) can be written

$$p(\Theta|y, x, \mathcal{M}_s) = \frac{p(y|x, \Theta, \mathcal{M}_s)\,p(\Theta|\mathcal{M}_s)}{p(y|\mathcal{M}_s)}, \tag{6.79}$$

which is in the same form as Equation (3.78). Following the discussion of model selection in Section 3.8.3, consider the posterior distribution over the models $\mathcal{M}_s$ conditional on the data. Applying Bayes' theorem, the posterior distribution over the model is given by

$$p(\mathcal{M}_s|y) = \frac{p(y|\mathcal{M}_s)\,p(\mathcal{M}_s)}{p(y)}. \tag{6.80}$$

The first term in the numerator on the right-hand side of Equation (6.80) is the same as the marginal likelihood in Equation (6.79). Setting equal prior distributions $p(\mathcal{M}_s)$ for each model, and noting that the denominator is constant for a given data set, the posterior is proportional to the marginal likelihood in Equation (6.79),

$$p(\mathcal{M}_s|y) \propto p(y|\mathcal{M}_s). \tag{6.81}$$

Algorithm 6.2 The SVB-NARX Algorithm

Initialise $T_{\mathcal{L}(Q)}$, $T_{ARD}$, $a_0$, $b_0$, $c_0$, $d_0$
Initialise model structure to all basis functions, $\mathcal{M}_0 = \{\Phi_m\}_{m=1}^{M}$
$s = 0$
start Procedure
    while $M > 1$
        $s = s + 1$
        $t = 1$
        Calculate variational posteriors
        while $\mathcal{L}(Q)_t - \mathcal{L}(Q)_{t-1} > T_{\mathcal{L}(Q)}$
            $t = t + 1$
            update parameter estimates for model $\mathcal{M}_{s-1}$ using Algorithm 6.1
            calculate $\mathcal{L}(Q)_t$ via Equation (6.67)
        end while
        Set $\mathcal{L}(Q)_s = \mathcal{L}(Q)_t$
        Calculate $\{ARD^s_j\}_{j=1}^{M}$ via Equation (6.75)
        Calculate $T^s_{ARD}$ via Equation (6.76)
        Perform pruning step
        Initialise pruning terms set, $\mathcal{M}^- = \emptyset$
        for $j = 1 : |\mathcal{M}_{s-1}|$
            if $ARD^s_j \leq T^s_{ARD}$
                collect terms to prune, $\mathcal{M}^- = \mathcal{M}^- \cup \Phi_j$
            end if
        end for
        Update model structure
        Set current model structure to $\mathcal{M}_s = \mathcal{M}_{s-1} \setminus \mathcal{M}^-$
        Set $M = |\mathcal{M}_s|$
    end while
    Select final model structure
    Set optimal model $\mathcal{M}^* = \mathcal{M}_{s^*}$, where $s^* = \arg\max_s \mathcal{L}(Q)_s$
end Procedure
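The control flow of Algorithm 6.2 can be sketched in Python as below. This is only an outline under stated assumptions: `fit_step`, `ard_values` and `ard_threshold` are hypothetical callbacks standing in for Algorithm 6.1, Equation (6.75) and Equation (6.76), none of which are reproduced in this section, and the toy callbacks at the bottom are invented purely so that the loop can be run. Each stored bound is paired with the structure it was computed for, and a safeguard (not part of the algorithm as stated) stops the loop if a pass prunes nothing.

```python
import numpy as np


def svb_prune(terms, fit_step, ard_values, ard_threshold, tol=1e-6, max_iter=500):
    """Outer pruning loop: fit, record the converged bound, prune low-ARD terms."""
    terms = list(terms)
    if len(terms) < 2:
        raise ValueError("at least two candidate terms are required")
    structures, bounds = [], []
    while len(terms) > 1:
        # Inner loop: iterate the variational updates until the bound stops improving.
        state, previous = None, -np.inf
        for _ in range(max_iter):
            state, bound = fit_step(terms, state)
            if bound - previous <= tol:
                break
            previous = bound
        structures.append(list(terms))
        bounds.append(bound)
        # Pruning step: drop every term whose ARD value is at or below the threshold.
        ard = np.asarray(ard_values(terms, state), dtype=float)
        threshold = ard_threshold(ard)
        kept = [term for term, a in zip(terms, ard) if a > threshold]
        if len(kept) == len(terms):
            break  # safeguard (assumption): nothing was pruned, so stop
        terms = kept
    best = int(np.argmax(bounds))  # structure with the largest converged bound
    return structures[best], bounds[best]


# --- Toy stand-ins (hypothetical, for illustration only) ---
TRUE_TERMS = {"y[k-1]", "u[k-1]", "y[k-1]*y[k-2]"}


def toy_fit_step(terms, state):
    """Fake 'bound' that converges geometrically and penalises wrong structures."""
    count = 0 if state is None else state + 1
    score = -len(set(terms) - TRUE_TERMS) - 10 * len(TRUE_TERMS - set(terms))
    return count, score - 0.5 ** count


def toy_ard_values(terms, state):
    """Fake ARD values: large for true terms, small for spurious ones."""
    return [10.0 if t in TRUE_TERMS else 1.0 + 0.1 * i for i, t in enumerate(terms)]


candidates = ["y[k-1]", "u[k-1]", "y[k-1]*y[k-2]", "u[k-2]", "y[k-2]", "u[k-1]*u[k-1]"]
best_terms, best_bound = svb_prune(candidates, toy_fit_step, toy_ard_values,
                                   ard_threshold=np.min)
```

Passing the fitting and ARD computations in as callbacks keeps the sketch independent of the update equations, which would have to be supplied from Algorithm 6.1 and Equations (6.67), (6.75) and (6.76).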

The VLB, $\mathcal{L}(Q)_s$, calculated for each model is an approximation of the marginal likelihood, $p(y|\mathcal{M}_s)$ (more precisely, a lower bound on its logarithm). Equation (6.81) therefore provides the justification for using the VLB as a criterion for selecting the final model structure.
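The argument above can be illustrated numerically. The sketch below treats each converged VLB as a stand-in for the log marginal likelihood and normalises over candidate models; with equal priors the most probable model is simply the one with the largest bound. The function name `model_posterior` and the numbers in `lower_bounds` are placeholders chosen for illustration, not results from the text.

```python
import numpy as np


def model_posterior(log_evidence, log_prior=None):
    """Normalised posterior over models from (approximate) log evidences."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    if log_prior is None:
        # Equal priors: a constant offset that cancels after normalisation
        log_prior = np.zeros_like(log_evidence)
    log_joint = log_evidence + log_prior
    log_joint = log_joint - log_joint.max()  # subtract the max for numerical stability
    weights = np.exp(log_joint)
    return weights / weights.sum()


# Placeholder VLB values for three candidate structures (illustrative only)
lower_bounds = [-120.0, -95.5, -97.0]
posterior = model_posterior(lower_bounds)
best = int(np.argmax(lower_bounds))
```

Because the normalisation is monotonic in the bound, `np.argmax(posterior)` and `np.argmax(lower_bounds)` pick the same model whenever the priors are equal.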

Tuning parameters

The single tuning parameter of the algorithm is the resolution, r, whose value is set in advance by the modeller. It is named resolution because it defines the region of ARD values that are pruned from the model via Equation (6.76). Increasing the value of r leads to a higher resolution, so fewer terms are pruned at each iteration, s, because a smaller portion of the range of ARD values is selected for pruning. Consequently, computation time increases. Conversely, reducing the value of r increases the number of terms pruned at each iteration, because a larger portion of the range of ARD values is selected for pruning.
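The effect of r on the number of pruned terms can be illustrated with a small sketch. Equation (6.76) is not reproduced in this section, so the threshold used here is an assumed stand-in: it sits a fraction 1/r of the way up the range of ARD values. The function name `prune_mask` and the synthetic ARD values are likewise illustrative only.

```python
import numpy as np


def prune_mask(ard, r):
    """Boolean mask of terms to prune for resolution r.

    ASSUMPTION: the threshold is min + (max - min) / r, a stand-in for
    Equation (6.76), which is not given in this section.
    """
    ard = np.asarray(ard, dtype=float)
    threshold = ard.min() + (ard.max() - ard.min()) / r
    return ard <= threshold


# Synthetic, evenly spaced ARD values 0, 1, ..., 100 (illustrative only)
ard = np.arange(101.0)
counts = {r: int(prune_mask(ard, r).sum()) for r in (25, 50, 75, 100)}
```

Under this assumed threshold, a larger r selects a narrower band at the bottom of the ARD range, so the number of pruned terms does not increase as r grows.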

It should be noted that if r is chosen too small, correct model terms may be pruned from the model. The effect of the tuning parameter is demonstrated by example through structure detection of the system generated by Equation (6.82). Structure detection is performed on the test system using the SVB-NARX algorithm initialised with $a_0 = c_0 = 10^{-2}$ and $b_0 = d_0 = 10^{-4}$, with the resolution set to $r = 25, 50, 75, 100$. The correct model structure is identified for $r = 50, 75, 100$, but the algorithm fails for $r = 25$ because extra terms are included in the final model; see Figure 6.4. Note that the peak variational lower bound value is lowest at $r = 25$, when the incorrect model structure is identified, and is constant for the other values of $r$.

For a given model and data set the variational lower bound is independent of the resolution that produced it. This allows for multiple algorithm runs with varying values of r that produce models which are directly comparable.

6.5 Results

In this section the SVB-NARX algorithm is demonstrated in order to assess its performance in joint structure detection and parameter estimation. The algorithm is then applied to the identification of a DEA system in order to validate its performance on real data. For both the synthetic and the real case the algorithm is benchmarked against the FRO and SEMP algorithms. The benchmark example is also compared against LASSO.

Figure 6.4: The resolution r of the algorithm controls how many terms are removed at each iteration and therefore the total number of iterations performed. Variational lower bound against iteration number for r = 25, 50, 75, 100. The dotted line indicates the maximum value of the variational lower bound for each value of r.

In document Bayesian Non-linear System Identification and Frequency Response Analysis with Application to Soft Smart Actuators (Page 155-158)