Selection of regressors - Polynomial models and Design of Experiments

2.2 Polynomial models and Design of Experiments

2.2.1 Selection of regressors

To find the best trade-off between bias and variance error, a selection strategy is applied. The selected regressors must be able to approximate the unknown function, while avoiding redundant regressors. Selecting the n significant regressors of eq. (2.32), the polynomial approximation can be written as

$$f(u) = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n \tag{2.38}$$

with

$$x_i \in M = \left\{1,\, u_1,\, u_2,\, \ldots,\, u_p,\, u_1^2,\, u_1 u_2,\, \ldots,\, u_p^o\right\}. \tag{2.39}$$

$M$ denotes the set of potential regressors derived from the Taylor series, from which the $n$ significant regressors need to be selected. It has the size $|M| = k$. The number of elements $k$ depends on the approximation order $o$ and the dimensionality of the input $p$. Since the regressors are motivated by a Taylor series, the summed order of each interaction term must be less than or equal to the order $o$. For example, the summed order of the interaction term $u_1^2 u_2$ is three, so such an interaction term is only included if $o \geq 3$.
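The construction of the candidate set can be illustrated with a short sketch. It enumerates all monomials in $p$ inputs whose summed order does not exceed $o$ and evaluates them on a set of input points. The function names `candidate_exponents` and `regressor_matrix` are hypothetical and only serve this illustration. The count $k = \binom{p+o}{o}$ used for comparison is the standard number of monomials of total order at most $o$ and is not stated in the text above.

```python
from itertools import combinations_with_replacement
from math import comb

import numpy as np


def candidate_exponents(p, o):
    """Exponent tuples of all monomials in p inputs with summed order <= o."""
    exponents = []
    for order in range(o + 1):
        # Every multiset of `order` input indices corresponds to one monomial.
        for combo in combinations_with_replacement(range(p), order):
            exps = [0] * p
            for idx in combo:
                exps[idx] += 1
            exponents.append(tuple(exps))
    return exponents


def regressor_matrix(U, exponents):
    """Evaluate every candidate regressor on the inputs U (shape N x p)."""
    U = np.asarray(U, dtype=float)
    columns = [np.prod(U ** np.array(e), axis=1) for e in exponents]
    return np.column_stack(columns)


# Hypothetical example: p = 2 inputs, order o = 3.
M = candidate_exponents(p=2, o=3)
print(len(M), comb(2 + 3, 3))  # both numbers give the size k of the set
print((2, 1) in M)             # u1^2 * u2 has summed order three
```

With `o=2` the tuple `(2, 1)` is absent from the list, which mirrors the statement that $u_1^2 u_2$ is only included for $o \geq 3$.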

The appropriate order $o$ depends on the degree of non-linearity of the process. If no a-priori information about the non-linearity of the process is given, odd degrees for $o$ are often superior to even degrees [50]. This is due to asymptotic boundary effects, since a process often behaves in opposite directions for large positive and negative values of an input. For example, a low air mass flow rate implies high soot emissions, while a high air mass flow rate implies low soot emissions. For the combustion models presented in Chap. 3, an order of $o = 3$ is applied.

Criterion of fit

The criterion of fit used to select the $n$ significant regressors is Mallows’ $C_p$-statistic. It is an estimate of the scaled sum of squared errors $J/\sigma^2$, see [97]. Under the assumption of i.i.d. measurement errors with zero mean, a general formulation is given as in [25] by

$$C_p = \frac{\sum_{i=1}^{N} \left(y(i) - \hat{y}(i)\right)^2}{\hat{\sigma}^2} - N + 2(n + 1). \tag{2.40}$$

$y(i)$ is the measured output at $u(i)$, $\hat{y}(i)$ the modelled output at $u(i)$, $N$ the size of the dataset and $\hat{\sigma}^2$ the estimated noise variance. The criterion consists of a term for the residual sum of squares, which is represented by the first term, and a penalty term, which is represented by the last term. The residual sum of squares decreases with additional model parameters and therefore accounts for the bias error. The penalty term, on the other hand, increases with the number of parameters and accounts for the variance error. The residual sum of squares is normalised by the noise variance $\hat{\sigma}^2$ to relate these errors to each other.

The noise variance is estimated from the sum of squared errors of the modelled output $\hat{y}_k(i)$ using all $k$ regressors, divided by the degrees of freedom,

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{N} \left(y(i) - \hat{y}_k(i)\right)^2}{N - k}. \tag{2.41}$$

If the order $o$ is chosen sufficiently large, the bias error of $\hat{y}_k(i)$ can be neglected. The sum of squared errors then consists only of a variance error, so that, under the assumption of i.i.d. Gaussian measurement errors, eq. (2.41) gives an unbiased estimate of the variance error [35, 50]. Since one degree of freedom is required for this variance estimation, one is added to the number of model parameters $n$ in eq. (2.40) [25].
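A minimal sketch of eqs. (2.40) and (2.41) is given below, assuming that the regressors are available as columns of a matrix and that the weights are found by ordinary least squares. The function names and the synthetic dataset are hypothetical and are not taken from the thesis.

```python
import numpy as np


def sse(X, y):
    """Sum of squared errors of the least-squares fit of y on the columns of X."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ w
    return float(residual @ residual)


def noise_variance(X_full, y):
    """Eq. (2.41): SSE of the model with all k regressors divided by N - k."""
    N, k = X_full.shape
    return sse(X_full, y) / (N - k)


def mallows_cp(X_full, y, selected):
    """Eq. (2.40) for the sub-model built from the columns in `selected`."""
    N = X_full.shape[0]
    n = len(selected)
    scaled_sse = sse(X_full[:, selected], y) / noise_variance(X_full, y)
    return scaled_sse - N + 2 * (n + 1)


# Hypothetical data: cubic process, candidates 1, u, u^2, u^3 (k = 4).
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 200)
X_full = np.column_stack([np.ones_like(u), u, u**2, u**3])
y = 1.0 + 2.0 * u - u**3 + 0.1 * rng.standard_normal(u.size)

print(mallows_cp(X_full, y, [0]))           # constant only: large bias term
print(mallows_cp(X_full, y, [0, 1, 3]))     # structure of the true process
print(mallows_cp(X_full, y, [0, 1, 2, 3]))  # full model
```

For the full model the scaled sum of squared errors equals $N - k$ by construction of eq. (2.41), so eq. (2.40) evaluates to $k + 2$ there. This gives a simple check of an implementation.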

Besides the formulation in eq. (2.40), an adjusted formulation of Mallows’ $C_p$-statistic is given in [40]:

$$\bar{C}_p = \frac{\sum_{i=1}^{N} \left(y(i) - \hat{y}(i)\right)^2}{\hat{\sigma}^2} - N + 2n + \frac{2(k - n + 1)}{N - k - 3}. \tag{2.42}$$

In this formulation, the variance estimation is not accounted for by an additional degree of freedom as in eq. (2.40). Instead, the criterion is adjusted for the case that the number of measurements $N$ is not much larger than the number of potential regressors $k$. Since the number of measurements $N$ is considerably larger than the number of potential regressors $k$ for the cases considered in this thesis, the definition in eq. (2.40) can be applied. In other cases, however, the definition in eq. (2.42) is recommended.

There are alternative criteria of fit to Mallows’ $C_p$-statistic, such as the information criteria AIC, AICc or BIC [4, 25, 161]. They also introduce a regularisation by adding a penalty term, which depends on the number of parameters $n$, to the sum of squared errors. They are, however, more difficult to interpret, as they describe a relative difference between probability density functions. Other useful criteria are cross-validation, the PRESS-statistic and ridge regression [50, 60, 112]. Cross-validation splits the training dataset into several subsets, leaves one subset out during model training and validates the identified model on the left-out subset. This is repeated until each subset has been used once for validation, so that the model error is given by the averaged validation error. The PRESS-statistic is a cross-validation in which each left-out subset consists of a single point. Although the formulation given in [112] eases the calculation of the PRESS-statistic, these criteria are computationally intensive. Ridge regression, on the other hand, relies on a heuristic choice of a tuning parameter, which needs to be determined a-priori for model identification. Due to these individual drawbacks, Mallows’ $C_p$-statistic is applied here. The programmed modelling algorithm can, however, easily be modified to use one of the alternative criteria.
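The relation between leave-one-out cross-validation and the PRESS-statistic can be sketched as follows. The shortcut uses the standard leverage identity for linear least squares, in which the leave-one-out residual equals the ordinary residual divided by $1 - h_{ii}$, with $h_{ii}$ the diagonal of the hat matrix. Whether this is the same formulation as the one in [112] is an assumption. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def press_shortcut(X, y):
    """PRESS from a single fit, assuming X has full column rank."""
    Q, _ = np.linalg.qr(X)            # thin QR: columns of Q span those of X
    leverage = np.sum(Q**2, axis=1)   # diagonal of the hat matrix
    residual = y - Q @ (Q.T @ y)      # ordinary least-squares residuals
    return float(np.sum((residual / (1.0 - leverage)) ** 2))


def press_brute_force(X, y):
    """PRESS by refitting the model N times, each time leaving one point out."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ w) ** 2)
    return total


# Hypothetical data for a quick comparison of both routes.
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, 50)
X = np.column_stack([np.ones_like(u), u, u**3])
y = 1.0 + 2.0 * u - u**3 + 0.1 * rng.standard_normal(u.size)

print(press_shortcut(X, y), press_brute_force(X, y))
```

The brute-force variant needs $N$ model fits per candidate regressor set, which illustrates why such criteria become expensive inside a selection loop.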

Selection algorithm

The criterion of fit in eq. (2.40) makes it possible to compare models with different numbers of regressors. To determine the model with the best regressor set, all possible models would need to be identified and compared. This is not feasible even for a moderate size of $M$, which is why a heuristic selection algorithm is applied. It is a combination of forward selection, backward selection and replacement of regressors, and is depicted in Fig. 2.9. The algorithm starts with the empty set, and the forward selection picks the regressors that contribute most to reducing the model error. Once several regressors have been picked, it is possible that a regressor picked early no longer contributes significantly to the model. Such a regressor is then eliminated by the backward selection. The replacement of regressors makes it possible to exchange a regressor from the set of picked regressors $A$ with a regressor from the set of potential regressors $M$ if the latter contributes more to the model. The individual steps are called several times during the selection algorithm. The frequency and order of the steps are adapted to the number of potential regressors, such that the computationally expensive steps, such as the replacement of regressors, are not called in every selection loop. The individual selection steps are discussed in more detail in [102].
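The interplay of the three steps can be sketched as a greedy search that minimises eq. (2.40). This is a simplified illustration and not the algorithm of Fig. 2.9: it evaluates forward, backward and replacement moves in every loop, whereas the thesis adapts the frequency and order of the steps. The function names, the stopping rule and the synthetic data are assumptions.

```python
import numpy as np


def sse(X, y):
    """Sum of squared errors of the least-squares fit (y'y for an empty model)."""
    if X.shape[1] == 0:
        return float(y @ y)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ w
    return float(residual @ residual)


def select_regressors(X_full, y, max_iter=100):
    """Greedy forward/backward/replacement search minimising eq. (2.40)."""
    N, k = X_full.shape
    sigma2 = sse(X_full, y) / (N - k)  # eq. (2.41)

    def cp(cols):
        idx = np.asarray(sorted(cols), dtype=int)
        return sse(X_full[:, idx], y) / sigma2 - N + 2 * (len(cols) + 1)

    A = frozenset()  # set of picked regressors, starts empty
    best = cp(A)
    for _ in range(max_iter):
        rest = set(range(k)) - A
        forward = [A | {j} for j in rest]
        backward = [A - {j} for j in A]
        replace = [(A - {i}) | {j} for i in A for j in rest]
        candidates = forward + backward + replace
        if not candidates:
            break
        trial = min(candidates, key=cp)
        if cp(trial) >= best:  # no single move improves the criterion
            break
        A, best = trial, cp(trial)
    return sorted(A), best


# Hypothetical data: cubic process, candidates u^0 ... u^5 (k = 6).
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 200)
X_full = np.column_stack([u**d for d in range(6)])
y = 1.0 + 2.0 * u - u**3 + 0.05 * rng.standard_normal(u.size)

selected, cp_value = select_regressors(X_full, y)
print(selected, cp_value)
```

Because every accepted move strictly lowers the criterion, the loop terminates, but it may stop in a local minimum. This is the price of avoiding the comparison of all possible regressor subsets.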

A commonly used alternative selection algorithm is orthogonal least squares [28, 29, 102]. It is fast to calculate, but since its results are identical to those of a simple forward selection, it is less flexible than the selection algorithm introduced in Fig. 2.9. Another approach that applies orthogonal regressors is the principal component analysis [68, 71]. There, orthogonal linear combinations of regressors are derived from a singular value decomposition, and a forward selection is performed with regard to these linear combinations of regressors, see [198]. The obtained model qualities are comparable to those of the selection algorithm of Fig. 2.9, but the principal component analysis is faster due to the orthogonal construction of the regressors. A drawback, however, is that after re-transformation of the linear combinations, all coefficients in eq. (2.32) are in general non-zero, such that no structural information is obtained. Hence, if the model quality is of interest rather than the model structure, the principal component analysis can be applied. Otherwise, the selection algorithm of Fig. 2.9 is recommended.
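The loss of structural information can be made visible with a small sketch of a regression on principal components. It assumes that the components are taken from a singular value decomposition of the regressor matrix and ranked by their individual reduction of the sum of squared errors. The exact procedure of [198] may differ, and the function name and the data are hypothetical.

```python
import numpy as np


def pcr_coefficients(X, y, n_components):
    """Fit y on the n_components most useful principal components of X
    and transform the result back to weights of the original regressors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Columns of U are orthonormal, so each component reduces the sum of
    # squared errors by (U_j' y)^2 independently of the other components.
    gains = (U.T @ y) ** 2
    keep = np.argsort(gains)[::-1][:n_components]
    return Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])


# Hypothetical data: only the regressors 1, u and u^3 are truly active.
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 200)
X = np.column_stack([np.ones_like(u), u, u**2, u**3])
y = 1.0 + 2.0 * u - u**3 + 0.05 * rng.standard_normal(u.size)

w_reduced = pcr_coefficients(X, y, n_components=3)
print(w_reduced)  # weights of 1, u, u^2, u^3 after re-transformation
```

Although only three components are used, the re-transformed weight vector has an entry for every original regressor, and none of them is forced to zero. With all components retained, the result coincides with the ordinary least-squares solution.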

In document Emission Modelling and Model-Based Optimisation of the Engine Control (Page 36-38)