6.1.2 Computing time
To compare the computational effort of a model simulation, the required floating point operations (FLOPs) are counted for each model structure. A distinction is made between multiplications '×', additions '+' and evaluations of an exponential function 'exp'.
In the following, the FLOPs required for an evaluation of one local model $\dot{m}_{\mathrm{nox},j}(u)$ are considered. Besides these FLOPs, additional FLOPs are necessary to determine the global model output $\dot{m}_{\mathrm{nox}}(z, u)$, see eq. (6.1) and eq. (3.18). To compare the four model structures, the FLOPs required for one simulation of the global model $\dot{m}_{\mathrm{nox}}(z, u)$ are then summarised and compared.
Look-up tables
The steps required to evaluate a two-dimensional look-up table are reviewed in the following. First, the four grid points of the lattice structure surrounding the input $u$ need to be determined. Depending on the applied grid point distribution, several comparisons might be necessary. Since there are efficient algorithms to find the surrounding grid points, see [145], these operations are not considered in the following.
Once the four surrounding grid points are determined, the bilinear interpolation is computed. To determine the weights $\Phi_i(u)$ for the bilinear interpolation, see eq. (2.1), the area interpolation is applied, see Fig. 2.1 b). To this end, eq. (2.3) is evaluated for all four surrounding grid points. One evaluation requires two additions and two multiplications per grid point, which makes eight additions and eight multiplications for one look-up table. The multiplication in the denominator of eq. (2.3b) is saved, since the area of a grid element is a constant. Given the weights for the grid points, the look-up table output is determined by multiplying the weights by the corresponding grid point heights and summing the products, see eq. (2.1). This requires another four multiplications and three additions. The required FLOPs for an evaluation of a two-dimensional look-up table are therefore
\[
\begin{aligned}
+ &: \; 8 + 3 \\
\times &: \; 8 + 4 \\
\exp &: \; \text{-}
\end{aligned}
\tag{6.11}
\]
Since the look-up table structure presented in Fig. 6.1 consists of three two-dimensional look-up tables, three of these evaluations and two more multiplications are required to calculate the output of one local model $\dot{m}_{\mathrm{nox},j}(u)$.
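The operation count for a single table can be traced in code. The following is a minimal Python sketch, assuming that the area interpolation of eq. (2.3) weights each grid point by the area of the rectangle between the input and the opposite corner, scaled by the constant reciprocal cell area. The function name `lookup_2d` and its arguments are hypothetical, and the search for the surrounding grid points is left out, as in the text.

```python
def lookup_2d(u1, u2, x1, x2, heights, inv_area):
    """Interpolate inside one grid cell and count the operations.

    x1 = (lower, upper) cell bounds in the first input, x2 likewise.
    heights[i][j] is the grid point height at (x1[i], x2[j]).
    inv_area is the precomputed reciprocal of the cell area.
    Returns (output, additions, multiplications).
    """
    adds = mults = 0
    terms = []
    for i in (0, 1):
        for j in (0, 1):
            # Distances to the opposite corner: two additions (subtractions).
            d1 = x1[1] - u1 if i == 0 else u1 - x1[0]
            d2 = x2[1] - u2 if j == 0 else u2 - x2[0]
            adds += 2
            # Area weight: two multiplications, no division at run time.
            weight = d1 * d2 * inv_area
            mults += 2
            # Weight times grid point height: one multiplication.
            terms.append(weight * heights[i][j])
            mults += 1
    # Summing four products: three additions.
    out = terms[0] + terms[1] + terms[2] + terms[3]
    adds += 3
    return out, adds, mults


# Cell [0, 1] x [0, 2] sampled from the affine surface 1 + 2*x1 + 3*x2.
heights = [[1.0, 7.0], [3.0, 9.0]]
out, adds, mults = lookup_2d(0.25, 0.5, (0.0, 1.0), (0.0, 2.0), heights, 0.5)
print(out, adds, mults)  # 3.0 11 12
```

The counters reproduce the 8 + 3 additions and 8 + 4 multiplications of eq. (6.11). The example value is exact because bilinear interpolation reproduces an affine surface.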
LOLIMOT
LOLIMOT adapts to the system's non-linearities by axis-orthogonal partitions of the input space. In each partition an affine model is identified, and the partitions are determined automatically by the tree construction algorithm, see Fig. 2.11. Depending on the non-linearity of the system, several partitions might be necessary for a satisfactory model quality. The number of model partitions directly influences the computing time.
The output of a local LOLIMOT model with $p$ inputs and $M_L$ partitions is given by eq. (6.6). An evaluation of eq. (6.6) requires, for each of the $M_L$ partitions, one multiplication by the weighting function, $p$ multiplications of the inputs by their parameters and $p$ additions inside the brackets. Furthermore, there are $M_L - 1$ additions due to the sum over the $M_L$ partitions. Thus, the required FLOPs for an evaluation of eq. (6.6) are
\[
\begin{aligned}
+ &: \; M_L (p + 1) - 1 \\
\times &: \; M_L (p + 1) \\
\exp &: \; \text{-}
\end{aligned}
\tag{6.12}
\]
Furthermore, additional FLOPs are necessary for the evaluation of the $M_L$ weighting functions $\Phi_{\mathrm{LOLIMOT},i}(u)$. For the Gaussian weighting function, see eq. (2.48), these additional FLOPs are
\[
\begin{aligned}
+ &: \; 2p\,M_L + 1 \\
\times &: \; (2p + 1)\,M_L + 1 \\
\exp &: \; M_L
\end{aligned}
\tag{6.13}
\]
The regarded NOx model has $p = 4$ local model inputs and on average $M_L = 8.38$ partitions per local model. For an implementation in the ECU, the evaluations of the Gaussian are especially critical. These evaluations can be avoided if alternative weighting functions, such as polynomial approximations of the Gaussian [119], are applied.
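To see how the two contributions compare, the counts of eq. (6.12) and eq. (6.13) can be evaluated for the stated averages. The following Python sketch only transcribes these two equations; the function name `lolimot_local_flops` and the dictionary keys are hypothetical.

```python
def lolimot_local_flops(p, m_l):
    """FLOPs of one local LOLIMOT model with p inputs and m_l partitions."""
    # Weighted sum of the affine partition models, eq. (6.12).
    model = {"add": m_l * (p + 1) - 1, "mul": m_l * (p + 1), "exp": 0}
    # Gaussian weighting functions, eq. (6.13).
    weights = {"add": 2 * p * m_l + 1, "mul": (2 * p + 1) * m_l + 1, "exp": m_l}
    total = {k: model[k] + weights[k] for k in model}
    return model, weights, total


# Averages stated in the text: p = 4 inputs, M_L = 8.38 partitions.
model, weights, total = lolimot_local_flops(p=4, m_l=8.38)

# Share of additions and multiplications spent on the weighting functions.
share = (weights["add"] + weights["mul"]) / (total["add"] + total["mul"])
print(round(share, 2))  # 0.64
```

Under these assumptions, nearly two thirds of the additions and multiplications of a local model are spent on the weighting functions, in addition to the exp evaluations themselves.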
LOPOMOT
Due to the increased local model complexity of LOPOMOT, no further partitions are necessary to model $\dot{m}_{\mathrm{nox},j}(u)$. Thus, the local model simplifies to a local polynomial, see eq. (6.7), and the computational effort depends only on the number of local regressors $n$. Fewer than $n$ multiplications are necessary to compute the regressors from the inputs. Linear regressors require no multiplication, regressors such as $u_1 u_2$ require one multiplication, and regressors such as $u_1^2 u_2$ are computed by multiplying $u_1 u_2$ by $u_1$. Given the regressors, another $n - 1$ multiplications are necessary to multiply the regressors by their coefficients (the offset needs no multiplication). Finally, the terms are summed up to the local model output, such that the required FLOPs are
\[
\begin{aligned}
+ &: \; n \\
\times &: \; n + n - 1 \\
\exp &: \; \text{-}
\end{aligned}
\tag{6.14}
\]
The number of FLOPs is given as an upper bound. Here, $n$ is the number of potential regressors, and in general fewer than $n$ regressors are selected for a local model, which further reduces the computing time. The number of potential regressors $n$ depends on the selected polynomial order and the dimension of the input $u$. For the regarded emission model, an order of three is applied and, due to the weak excitation of the temperature, higher-order regressors containing the intake temperature are removed manually. Thus, the set of potential regressors has $n = 26$ elements.
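The reuse of lower-order regressors can be illustrated with a short Python sketch. It assumes the full set of monomials up to the given order for a small two-input example, not the manually reduced 26-element set of the emission model; the function names `monomial_regressors` and `lopomot_local_output` are hypothetical.

```python
from itertools import combinations_with_replacement


def monomial_regressors(u, order):
    """Return ({index tuple: value}, multiplications used)."""
    values = {(): 1.0}  # offset regressor
    mults = 0
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(len(u)), degree):
            if degree == 1:
                # Linear regressor: the input itself, no multiplication.
                values[idx] = u[idx[0]]
            else:
                # Reuse the regressor of one degree lower: one multiplication.
                values[idx] = values[idx[:-1]] * u[idx[-1]]
                mults += 1
    return values, mults


def lopomot_local_output(u, coeffs, order=3):
    """Evaluate the local polynomial; returns (output, additions, multiplications)."""
    values, mults = monomial_regressors(u, order)
    keys = list(values)
    y = coeffs[0]  # the offset needs no multiplication
    adds = 0
    for key, c in zip(keys[1:], coeffs[1:]):
        y += c * values[key]
        mults += 1
        adds += 1
    return y, adds, mults


# Two inputs, order three: n = 10 regressors, all coefficients set to one.
y, adds, mults = lopomot_local_output([2.0, 3.0], [1.0] * 10)
print(y, adds, mults)  # 90.0 9 16
```

For this example, 7 multiplications build the regressors and 9 apply the coefficients, which stays within the bound $n + n - 1 = 19$ of eq. (6.14).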
Kernel model
Since the applied Kernel model consists of the entire training dataset, one kernel function needs to be evaluated for each training point. The applied kernel function is a Gaussian, as is also applied for LOLIMOT. Hence, the FLOPs required to evaluate the kernel functions in eq. (6.9) are identical to eq. (6.13), with $M_L$ being the number of data points. For a local model there are on average $M_L = 218$ data points in the training set. Besides the calculation of the kernel functions, there are $M_L$ multiplications by the measured output values and $M_L - 1$ additions to calculate eq. (6.9). Therefore, an evaluation of $\dot{m}_{\mathrm{nox},j}(u)$ with the Kernel model requires the following FLOPs,
\[
\begin{aligned}
+ &: \; (2p + 1)\,M_L \\
\times &: \; (2p + 2)\,M_L + 1 \\
\exp &: \; M_L
\end{aligned}
\tag{6.15}
\]
with $M_L = 218$ and the number of local model inputs $p = 4$.
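Why the effort grows with the size of the training set becomes apparent when the evaluation is written out. The following Python sketch assumes that eq. (6.9) is a sum of the measured outputs weighted by normalised Gaussian kernels, which is consistent with the description above but is not stated in this section. The function name `kernel_model_output` and the per-input bandwidths are hypothetical.

```python
import math


def kernel_model_output(u, train_inputs, train_outputs, bandwidths):
    """Normalised Gaussian kernel estimate at the input u (assumed form)."""
    kernels = []
    for x in train_inputs:
        # Per training point: scaled squared distance, then one exp.
        dist = sum(((ui - xi) / h) ** 2 for ui, xi, h in zip(u, x, bandwidths))
        kernels.append(math.exp(-0.5 * dist))
    # One reciprocal for the normalisation (fails if all kernels underflow to 0).
    inv_total = 1.0 / sum(kernels)
    # One multiplication by the measured output per training point.
    return sum(k * inv_total * y for k, y in zip(kernels, train_outputs))


# Two training points; the query lies midway, so both kernels are equal.
y_hat = kernel_model_output(
    [0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [2.0, 4.0], [1.0, 1.0]
)
print(y_hat)
```

The loop body runs once per training point, so every additional data point adds one exp evaluation and a fixed number of additions and multiplications.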
Comparison
For the global-local model structure, as in eq. (6.1), at most four of the 21 local models are valid and therefore need to be evaluated for a global model simulation. The number of FLOPs required for one evaluation of $\dot{m}_{\mathrm{nox}}(z, u)$ is therefore the number of FLOPs required to evaluate $\dot{m}_{\mathrm{nox},j}(u)$ multiplied by four, plus the evaluation of one look-up table, eq. (6.11), over the engine operation point $z^T = (n_{\mathrm{eng}}, u_{\mathrm{inj}})$,
\[
\mathrm{FLOPs}\left(\dot{m}_{\mathrm{nox}}(z, u)\right) = 4\,\mathrm{FLOPs}\left(\dot{m}_{\mathrm{nox},j}(u)\right) + \mathrm{FLOPs}\left(\Phi(z)\right).
\tag{6.16}
\]
The latter is necessary to determine the weights for the local models. The resulting FLOPs are summarised in Table 6.2 for the four model structures. It shall be noted that an ECU with floating-point arithmetic is assumed for these results. The results for an ECU with fixed-point arithmetic are, however, assumed to be similar.
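As a check of eq. (6.16), the following Python sketch combines the local counts with the look-up table over the operation point. It recomputes only the look-up table and LOPOMOT rows of Table 6.2, using the counts of eq. (6.11) and eq. (6.14); the names `LUT` and `global_flops` are hypothetical.

```python
# One two-dimensional look-up table, eq. (6.11).
LUT = {"add": 8 + 3, "mul": 8 + 4, "exp": 0}


def global_flops(local, n_valid=4):
    """Eq. (6.16): n_valid local models plus one look-up table for the weights."""
    return {k: n_valid * local[k] + LUT[k] for k in LUT}


# Local look-up table structure: three tables and two more multiplications.
lut_local = {"add": 3 * LUT["add"], "mul": 3 * LUT["mul"] + 2, "exp": 0}

# Local LOPOMOT model with n = 26 potential regressors, eq. (6.14).
n = 26
lopomot_local = {"add": n, "mul": n + n - 1, "exp": 0}

print(global_flops(lut_local))      # {'add': 143, 'mul': 164, 'exp': 0}
print(global_flops(lopomot_local))  # {'add': 115, 'mul': 216, 'exp': 0}
```

The sums of 307 and 331 operations agree with the corresponding rows of Table 6.2.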
The Kernel model is of limited suitability for an implementation in common ECUs, as it requires the highest computing time. Its computing time depends directly on the number of training values, but it might be reduced by choosing alternative kernel functions or alternative kernel methods such as support vector machines [179]. The look-up tables, on the other hand, require the lowest number of FLOPs. LOLIMOT needs about three times the computing time of the look-up tables and additionally requires evaluations of the exponential function. The adaptive polynomial approach LOPOMOT requires only slightly more FLOPs than the look-up table structure and is therefore also suited for an implementation in common ECUs.

Table 6.2: Required FLOPs for one simulation of the global-local NOx model, $\dot{m}_{\mathrm{nox}}(z, u)$. A distinction is made between multiplications '×', additions '+' and evaluations of an exponential function 'exp'. The four regarded model structures are each applied for the local models $\dot{m}_{\mathrm{nox},j}(u)$, see eq. (6.1).

| Model structure | + operator | × operator | exp evaluation | Sum of operations |
|-----------------|------------|------------|----------------|-------------------|
| look-up tables  | 143        | 164        | -              | 307               |
| LOLIMOT         | 455        | 485        | 34             | 974               |
| LOPOMOT         | 115        | 216        | -              | 331               |
| Kernel model    | 6991       | 7864       | 872            | 15727             |
The number of FLOPs is important for an implementation in the ECU. For other applications, the time required for model training is also of interest. This is highest for the Kernel model, since a non-linear optimisation algorithm is applied to determine the bandwidth parameters. The fastest training is given for the look-up table approach, provided that the search for a suitable look-up table structure and the determination of a regularisation parameter are not considered. These steps can increase the training time significantly, such that the LOLIMOT structure might be faster to identify. Its tree construction algorithm is a fast heuristic search for suitable model partitions, and the model parameters are determined by a least squares algorithm. LOPOMOT in general requires fewer partitions than LOLIMOT, but needs additional computing time for the selection of regressors. This additional time depends on the number of potential regressors and the applied selection algorithm. Model training for LOPOMOT is generally more extensive than for LOLIMOT.