6.4 Sparse Bayesian identification of polynomial NARX models
6.4.2 Algorithm properties
This section uses descriptions and examples to explain some properties of the algorithm. The examples refer to the identification of the system generated by the polynomial NARX model

$$y_k = 0.3\,y_{k-1} + 0.1\,u_{k-1} + 0.4\,y_{k-1}y_{k-2} + e_k, \tag{6.78}$$

where $e_k$ is a white noise sequence drawn from the normal distribution $\mathcal{N}(e_k \mid 0, \sigma^2)$ with $\sigma^2 = 0.01$. This model was chosen for its structural simplicity.
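A minimal sketch of how data from the system in Equation (6.78) might be simulated. The input signal is not specified in this section, so uniform white noise on [-1, 1] is assumed here; the function name `simulate_narx`, the number of samples, the zero initial conditions and the seed are likewise illustrative choices rather than settings taken from the text.

```python
import numpy as np


def simulate_narx(n_samples=500, noise_var=0.01, seed=0):
    """Simulate y_k = 0.3 y_{k-1} + 0.1 u_{k-1} + 0.4 y_{k-1} y_{k-2} + e_k."""
    rng = np.random.default_rng(seed)
    # Assumption: the excitation is uniform white noise on [-1, 1].
    u = rng.uniform(-1.0, 1.0, size=n_samples)
    # e_k ~ N(0, sigma^2) with sigma^2 = noise_var (0.01 in the text).
    e = rng.normal(0.0, np.sqrt(noise_var), size=n_samples)
    # Assumption: zero initial conditions for the first two outputs.
    y = np.zeros(n_samples)
    for k in range(2, n_samples):
        y[k] = (0.3 * y[k - 1] + 0.1 * u[k - 1]
                + 0.4 * y[k - 1] * y[k - 2] + e[k])
    return u, y, e


u, y, e = simulate_narx()
```

The returned arrays `u` and `y` are the kind of input-output record a structure detection algorithm would be given; `e` is returned only so that the simulation can be checked.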
Model selection by the variational lower bound
In the previous section it was stated that the optimal model is taken to be the one that maximises the variational lower bound (VLB) after it has reached convergence. This is justified by considering the Bayesian model selection procedure discussed in Section 3.8.3.
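In Equation (6.79) the conditional dependencies on the model $\mathcal{M}_s$ were neglected. Explicitly including the conditional dependencies, Equation (6.79) can be written

$$p(\Theta \mid y, x, \mathcal{M}_s) = \frac{p(y \mid x, \Theta, \mathcal{M}_s)\, p(\Theta \mid \mathcal{M}_s)}{p(y \mid \mathcal{M}_s)}, \tag{6.79}$$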
which is in the same form as Equation (3.78). Following the discussion of model selection in Section 3.8.3, applying Bayes' theorem gives the posterior distribution over the models $\mathcal{M}_s$ conditional on the data as

$$p(\mathcal{M}_s \mid y) = \frac{p(y \mid \mathcal{M}_s)\, p(\mathcal{M}_s)}{p(y)}. \tag{6.80}$$
The first term in the numerator on the right-hand side of Equation (6.80) is the same as the marginal likelihood in Equation (6.79). With equal prior distributions $p(\mathcal{M}_s)$ for each model, and since the denominator is constant for a given data set, the posterior is proportional to the marginal likelihood in Equation (6.79),

$$p(\mathcal{M}_s \mid y) \propto p(y \mid \mathcal{M}_s). \tag{6.81}$$
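Algorithm 6.2 The SVB-NARX Algorithm
Initialise
    $T_{\mathcal{L}(Q)}$, $T_{\mathrm{ARD}}$, $a_0$, $b_0$, $c_0$, $d_0$
    Initialise model structure to all basis functions, $\mathcal{M}_0 = \{\Phi_m\}_{m=1}^{M}$
    $s = 0$
start Procedure
while $M > 1$
    $s = s + 1$
    $t = 1$
    Calculate variational posteriors
    while $\mathcal{L}(Q)_t - \mathcal{L}(Q)_{t-1} > T_{\mathcal{L}(Q)}$
        $t = t + 1$
        update parameter estimates for model $\mathcal{M}_{s-1}$ using Algorithm 6.1
        calculate $\mathcal{L}(Q)_t$ via Equation (6.67)
    end while
    Set $\mathcal{L}(Q)_s = \mathcal{L}(Q)_t$
    Calculate $\{\mathrm{ARD}^s_j\}_{j=1}^{M}$ via Equation (6.75)
    Calculate $T^s_{\mathrm{ARD}}$ via Equation (6.76)
    Perform pruning step
    Initialise pruning terms set, $\mathcal{M}^- = \emptyset$
    for $j = 1 : |\mathcal{M}_{s-1}|$
        if $\mathrm{ARD}^s_j \le T^s_{\mathrm{ARD}}$
            collect terms to prune, $\mathcal{M}^- = \mathcal{M}^- \cup \Phi_j$
        end if
    end for
    Update model structure
    Set current model structure to $\mathcal{M}_s = \mathcal{M}_{s-1} \setminus \mathcal{M}^-$
    Set $M = |\mathcal{M}_s|$
end while
Select final model structure
    Set optimal model $\mathcal{M}^* = \mathcal{M}_{s^*}$, where $s^* = \arg\max_s \mathcal{L}(Q)_s$
end Procedure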
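The outer loop of Algorithm 6.2 can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used here: the callables `fit` and `threshold` are hypothetical placeholders standing in for Algorithm 6.1 with Equations (6.67) and (6.75), and for Equation (6.76), none of which are reproduced in this section. The toy `toy_fit` and its scores are made up purely to exercise the loop.

```python
from typing import Callable, Dict, List, Sequence, Tuple

FitResult = Tuple[float, Dict[str, float]]  # (converged bound, ARD value per term)
History = List[Tuple[List[str], float]]


def svb_narx_search(
    terms: Sequence[str],
    fit: Callable[[List[str]], FitResult],
    threshold: Callable[[Dict[str, float]], float],
) -> Tuple[List[str], float, History]:
    """Backward-pruning structure search in the spirit of Algorithm 6.2."""
    if len(terms) < 2:
        raise ValueError("need at least two candidate terms")
    current = list(terms)
    history: History = []
    while len(current) > 1:
        # Inner convergence loop, hidden behind the hypothetical `fit`.
        bound, ard = fit(current)
        # Each bound is stored with the structure that was actually fitted.
        history.append((list(current), bound))
        # Pruning step: collect terms whose ARD value is at or below the threshold.
        limit = threshold(ard)
        pruned = {term for term in current if ard[term] <= limit}
        if not pruned:
            break  # guard added for this sketch; not part of Algorithm 6.2
        current = [term for term in current if term not in pruned]
    # Final selection: the structure with the largest converged bound.
    best_terms, best_bound = max(history, key=lambda item: item[1])
    return best_terms, best_bound, history


# Toy stand-ins (made up): a fixed score per term and a bound that
# penalises spurious terms lightly and missing true terms heavily.
TRUE_TERMS = {"y[k-1]", "u[k-1]", "y[k-1]*y[k-2]"}
SCORES = {"y[k-1]": 5.0, "u[k-1]": 4.0, "y[k-1]*y[k-2]": 3.0,
          "u[k-2]": 0.2, "y[k-2]": 0.1}


def toy_fit(structure: List[str]) -> FitResult:
    spurious = len(set(structure) - TRUE_TERMS)
    missing = len(TRUE_TERMS - set(structure))
    bound = -1.0 * spurious - 10.0 * missing
    return bound, {term: SCORES[term] for term in structure}


best_terms, best_bound, history = svb_narx_search(
    list(SCORES), toy_fit, lambda ard: min(ard.values())
)
```

With these toy stand-ins the search removes one term per iteration and the largest bound is attained by the three-term structure in `TRUE_TERMS`. Note that the sketch pairs each bound with the structure that produced it, which is one reading of the indexing in Algorithm 6.2.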
The VLB, $\mathcal{L}(Q)_s$, calculated for each model is an approximation of the marginal likelihood, $p(y \mid \mathcal{M}_s)$. Equation (6.81) therefore provides the justification for using the VLB as a criterion for selecting the final model structure.
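The sense in which the VLB approximates the marginal likelihood can be made explicit. Assuming the standard definition of the variational lower bound, and writing $Q(\Theta)$ for the variational posterior of model $\mathcal{M}_s$ (notation assumed here, with the conditioning on the input suppressed), the log marginal likelihood decomposes as follows.

```latex
\begin{aligned}
\ln p(y \mid \mathcal{M}_s)
  &= \mathcal{L}(Q)_s
   + \mathrm{KL}\!\left( Q(\Theta) \,\middle\|\, p(\Theta \mid y, \mathcal{M}_s) \right)
   \;\ge\; \mathcal{L}(Q)_s, \\
s^* &= \arg\max_s \, p(\mathcal{M}_s \mid y)
     = \arg\max_s \, \ln p(y \mid \mathcal{M}_s)
     \;\approx\; \arg\max_s \, \mathcal{L}(Q)_s .
\end{aligned}
```

The first line holds because the Kullback-Leibler divergence is non-negative. The second line uses the equal model priors, and the final approximation is reasonable only to the extent that the divergence term is small or similar across the candidate models.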
Tuning parameters
The single tuning parameter of the algorithm is the resolution, r, whose value is set in advance by the modeller. It is named resolution because it defines the region of ARD values that are pruned from the model via Equation (6.76). Increasing the value of r gives a higher resolution: a smaller portion of the range of ARD values is selected for pruning, so fewer terms are pruned at each iteration, s, and computation time increases. Conversely, reducing the value of r increases the number of terms pruned at each iteration because a larger portion of the range of ARD values is selected for pruning.
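The effect of the resolution on a single pruning step can be illustrated numerically. Equation (6.76) is not reproduced in this section, so the sketch below uses a hypothetical threshold, the minimum ARD value plus 1/r of the ARD range, chosen only because it matches the qualitative description above. The function names and the ARD values are made up for illustration.

```python
import numpy as np


def hypothetical_threshold(ard, r):
    """Stand-in for Equation (6.76): cut off the lowest 1/r of the ARD range."""
    ard = np.asarray(ard, dtype=float)
    return ard.min() + (ard.max() - ard.min()) / r


def count_pruned(ard, r):
    """Number of terms whose ARD value is at or below the threshold."""
    ard = np.asarray(ard, dtype=float)
    return int(np.sum(ard <= hypothetical_threshold(ard, r)))


# Made-up ARD values for ten candidate terms, for illustration only.
ard_values = np.array([0.1, 0.3, 0.6, 1.0, 1.5, 2.5, 4.0, 6.0, 8.0, 10.1])
pruned_per_r = {r: count_pruned(ard_values, r) for r in (2, 5, 25, 100)}
# pruned_per_r == {2: 7, 5: 5, 25: 2, 100: 1}
```

Under this assumed threshold the number of terms removed in one step never increases as r grows, and at least one term (the one with the smallest ARD value) is always removed, so a larger r means more iterations before the search terminates.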
Note that if r is chosen too small then correct model terms may be pruned from the model. The effect of the tuning parameter is demonstrated through the structure detection of the system generated by Equation (6.82). Structure detection is performed on the test system using the SVB-NARX algorithm initialised with a0 = c0 = 1−2 and b0 = d0 = 1−4, with the resolution set to r = 25, 50, 75, 100. The correct model structure is identified for r = 50, 75, 100, but the algorithm fails for r = 25 because extra terms are included in the final model; see Figure 6.4. The peak variational lower bound value is lowest at r = 25, when the incorrect model structure is identified, and is constant for the other values of r.
For a given model and data set the variational lower bound is independent of the resolution that produced it. This allows for multiple algorithm runs with varying values of r that produce models which are directly comparable.
6.5 Results
In this section the SVB-ARD algorithm is demonstrated in order to assess the performance for the purpose of joint structure detection and parameter estimation. The algorithm is then applied to the identification of a DEA system in order to validate the performance of the algorithm on real data. For both the synthetic and the real case the algorithm is benchmarked against the FRO and SEMP algorithms. The benchmark example is also compared against LASSO.
Figure 6.4: The resolution r of the algorithm controls how many terms are removed at each iteration and therefore the total number of iterations performed. Variational lower bound against iteration number for r = 25, 50, 75, 100. The dotted line indicates the maximum value of the variational lower bound for each value of r.