2. Research framework and scope
2.4. Definition of the reference design spaces
are performed on the same computer and can therefore be used for comparison purposes. For a large number of points, as well as for a high-dimensional problem, calculation times might increase rapidly and might even become problematic when running the routines on ordinary computers, as done for the thesis. The computer specifications can be found in Appendix A.1.
The number of training points and the shape complexity both influence how well the method can describe the boundary around the points. If only a few points are available, or the distance between points labelled inside and points labelled outside is rather large, it is more difficult to determine an accurate boundary. For complex shapes, e.g., with sharp corners or non-convex forms, it is likewise more difficult to find the precise boundary contour.
The correctness of the test point assessment is the most important criterion. To see the practical implications, consider the effect of wrongly assessed points. If a test point that is actually inside the boundary is assessed as ‘out’ (false positive assessment), the point is not tested and information is missed. If a test point that is actually outside the boundary is assessed as ‘in’ (false negative assessment), an attempt is made to test the point, which can damage the engine. These cases are summarised in the confusion matrix in Table 2.1.
                                       ACTUAL value                 ACTUAL value
                                       INSIDE the boundary          OUTSIDE the boundary
Assessed value inside the boundary     True positive                False negative
                                       (correct)                    (possible engine damage)
Assessed value outside the boundary    False positive               True negative
                                       (information missing)        (correct)
Table 2.1.: Confusion matrix for new measurement point evaluation
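The four cases of Table 2.1 can be counted for a set of test points along the following lines. This is a minimal sketch, not the routine used in the thesis: the function names `classify` and `confusion_counts` and the example labels are assumptions made for illustration, and the category names simply follow Table 2.1.

```python
from collections import Counter


def classify(actual_inside: bool, assessed_inside: bool) -> str:
    """Return the Table 2.1 category for one test point."""
    if actual_inside and assessed_inside:
        return "true positive"    # correct
    if actual_inside and not assessed_inside:
        return "false positive"   # information missing
    if not actual_inside and assessed_inside:
        return "false negative"   # possible engine damage
    return "true negative"        # correct


def confusion_counts(actual, assessed) -> Counter:
    """Count how often each Table 2.1 category occurs."""
    return Counter(classify(a, b) for a, b in zip(actual, assessed))


# Illustrative labels only (True = inside the boundary).
actual = [True, True, False, False, True]
assessed = [True, False, True, False, True]
counts = confusion_counts(actual, assessed)
```

Since the two error types have very different consequences, keeping the counts separate, rather than merging them into a single error rate, preserves the information that matters for engine safety.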
2.4. Definition of the reference design spaces
Each method is compared based on a reference data set. For the hypothetical cases the data are generated by means of an algorithm. This approach has been chosen since the exact boundary can be defined by the user and is hence known in advance. The other test cases are performed with real engine data, in order to ensure applicability for future use. The research considers two-, three-, four- and seven-dimensional test cases for both hypothetical and engine problems.
Two dimensions have been chosen as the starting point since the results can easily be plotted and interpreted. Increasing the complexity one step at a time up to four dimensions should provide insight into the changes caused by the increase in complexity. The validation for seven dimensions mainly serves to show the applicability and restrictions for the currently most complex case used in special calibration tasks.
Figure 2.10.: Two-dimensional reference shapes in the x1-x2 plane: (a) convex shape, (b) non-convex shape
The reference design spaces are used for training the methods for the initial boundary description and also for the final evaluation of the methods. The points that are generated or taken from a reference design space are the training data for each method. The new data, coming from a newly designed test plan, are the test data for the method. A separate evaluation data set is not required, since the output for the training data is known and the training data set can therefore also be used for evaluation.
2.4.1. Hypothetical data sets
The initial comparison of the applicability of all methods has been performed on two-dimensional test cases. For the two-dimensional problem an in-house ‘shape and data generator’ algorithm was used. The tool generates a relation between a number of inputs and an output by means of a polynomial model. The tuning parameters of the tool are the number of input variables influencing the model output (e.g., the inlet and outlet valve timing), which sets the problem dimension (initially two), the polynomial order, and the maximum interaction level of the polynomial (only the first level of interaction is used). The number of model coefficients required follows from these parameters. The coefficients themselves are assigned random values. Because the coefficients are generated randomly, i.e., the value of each coefficient changes with each new run, the correlation and therefore the shapes can be varied. A limit defined for the model output creates contours and, projected onto the two-dimensional input plane, these contours have been used as the boundaries of the design space.
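The generator idea described above can be sketched as follows. This is not the in-house tool. The term-selection rule (`max_vars_per_term` as a stand-in for the interaction level), the coefficient range, the input range and the limit value are all assumptions chosen for illustration.

```python
import itertools
import random


def polynomial_terms(n_inputs, order, max_vars_per_term):
    """List exponent tuples of all terms up to the given total order.

    Assumption: the interaction level is modelled as the maximum number
    of input variables that may appear together in one term.
    """
    terms = []
    for exps in itertools.product(range(order + 1), repeat=n_inputs):
        n_vars = sum(e > 0 for e in exps)
        if sum(exps) <= order and n_vars <= max_vars_per_term:
            terms.append(exps)
    return terms


def make_model(n_inputs, order, max_vars_per_term, rng):
    """Build a polynomial with randomly assigned coefficients."""
    terms = polynomial_terms(n_inputs, order, max_vars_per_term)
    coeffs = [rng.uniform(-1.0, 1.0) for _ in terms]  # assumed range

    def model(x):
        total = 0.0
        for coeff, exps in zip(coeffs, terms):
            value = coeff
            for xi, e in zip(x, exps):
                value *= xi ** e
            total += value
        return total

    return model, len(terms)


rng = random.Random(0)  # a new seed gives new coefficients and a new shape
model, n_coeffs = make_model(n_inputs=2, order=3, max_vars_per_term=2, rng=rng)

limit = 0.0  # assumed output limit; its contour is the boundary
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
labels = [model(p) <= limit for p in points]  # True = inside
```

Because the boundary is the contour `model(x) == limit`, the true label of any point is available by evaluating the polynomial, which is what makes such hypothetical data sets convenient as a reference.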
Two hypothetical test reference shapes have been selected to be able to analyse the effects of different boundary contours. They are shown in Figure 2.10. The polynomial parameters of the two two-dimensional hypothetical data sets can be found in Section A.3.
To analyse the criteria presented in Section 2.3.2 the number of training points and the training point distribution have been varied for each boundary shape and each dimension. Both can affect the quality of the boundary description as well as the calculation time for training the boundary and testing a new set of points. The shapes themselves will also show effects on the performance
of the different methods. All tests, for each combination of dimension, boundary shape, training point distribution and number of training points, have been repeated 25 times and the average values have been used. This number of repetitions has been chosen as a balance between statistically reliable results and computational effort.
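The averaging over 25 repetitions could be organised as in the sketch below. Only the number of repetitions comes from the text. The helper `average_metrics`, the placeholder `dummy_run` and the metric names are hypothetical and stand in for one training and testing run of a boundary method.

```python
import random
import statistics

N_REPETITIONS = 25  # value stated in the text


def average_metrics(run_once, n_repetitions=N_REPETITIONS, seed=0):
    """Call run_once repeatedly and average every metric it returns."""
    rng = random.Random(seed)
    results = [run_once(rng) for _ in range(n_repetitions)]
    return {
        key: statistics.mean(result[key] for result in results)
        for key in results[0]
    }


def dummy_run(rng):
    """Hypothetical placeholder for one train-and-test run."""
    return {
        "misclassification_rate": rng.uniform(0.0, 0.1),
        "training_time_s": rng.uniform(0.5, 1.5),
    }


averaged = average_metrics(dummy_run)
```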
2.4.2. Engine data sets
The engine data used have been selected from two different calibration tasks. One calibration task was to vary the valve timing for both inlet and outlet valves, by means of a variable camshaft control, over a range of engine speed and load. The test is therefore a four-dimensional test and, since the engine speed and load are varied, it is also called a global test. The test results have been reduced to a three- and a two-dimensional case by using fixed values for one or two of the variables, respectively. The measured outputs used are the fuel consumption (BEFF) and the engine imbalance (SPMI). A limiting value for the outputs restricts the variation limits of the input variables. Using the modelling software ASCMO by ETAS, a model relating the inputs to the outputs has been established. The model is based on Gaussian process models. It could be exported to a Matlab file and thus be used to determine the output values corresponding to randomly generated data sets.
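The two steps described above, labelling a point by comparing model outputs with limiting values and reducing the dimension by fixing inputs, can be sketched as follows. The function `toy_model` is a made-up stand-in and not the exported Gaussian process model. Its formulas, the limit values and the assumption that both limits are upper limits are purely illustrative.

```python
def toy_model(speed, load, inlet_timing, outlet_timing):
    """Hypothetical stand-in for the exported engine model."""
    return {
        "BEFF": inlet_timing ** 2 + outlet_timing ** 2 + load / 100.0,
        "SPMI": abs(inlet_timing - outlet_timing) + speed / 10000.0,
    }


def is_inside(outputs, limits):
    """A point is inside if every output respects its (upper) limit."""
    return all(outputs[name] <= limit for name, limit in limits.items())


def reduce_dimension(model, fixed):
    """Fix some inputs so that only the remaining ones are varied."""
    def reduced(**free):
        return model(**fixed, **free)
    return reduced


limits = {"BEFF": 1.0, "SPMI": 0.5}  # assumed limiting values

# Four-dimensional case reduced to two dimensions by fixing speed and load.
model_2d = reduce_dimension(toy_model, {"speed": 2000.0, "load": 50.0})
inside_a = is_inside(model_2d(inlet_timing=0.1, outlet_timing=0.1), limits)
inside_b = is_inside(model_2d(inlet_timing=2.0, outlet_timing=0.0), limits)
```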
The seven-dimensional case with engine data comes from a calibration task where engine speed, engine load, exhaust gas recirculation (EGR) rate, 50 percent of mass fraction burned, amount of fuel for the first injection, amount of fuel for the second injection and the rail pressure have been varied. For the calibration task detailed measurements have been executed so that an engine min-max map for each of the last five variables, spanned over engine speed and load, could be established. The engine maps are depicted in Figure 2.11.
The engine maps have been used as the models that define the true boundary. As can be seen from the min-max engine maps, the overall design space that lies within all engine maps is rather small. To ensure that enough points will be inside for training the boundary in seven dimensions, the engine speed and load range considered has been reduced to about one third.
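Using min-max maps as the true boundary amounts to checking, for a given engine speed and load, that every mapped variable lies between its minimum and maximum. The sketch below assumes each map is available as a function of speed and load returning a (min, max) pair. The two example maps and all their numbers are invented for illustration and are not taken from Figure 2.11.

```python
def inside_min_max_maps(point, maps):
    """Return True if every mapped variable lies within its min-max range."""
    speed, load = point["speed"], point["load"]
    for name, map_fn in maps.items():
        lower, upper = map_fn(speed, load)
        if not (lower <= point[name] <= upper):
            return False
    return True


# Hypothetical maps: (speed, load) -> (min, max). Real maps would be
# interpolated from measured grid data.
maps = {
    "egr_rate": lambda speed, load: (0.0, 30.0 - 0.25 * load),
    "rail_pressure": lambda speed, load: (500.0 + 0.1 * speed, 1500.0),
}

point_a = {"speed": 2000.0, "load": 40.0, "egr_rate": 10.0, "rail_pressure": 1000.0}
point_b = {"speed": 2000.0, "load": 40.0, "egr_rate": 25.0, "rail_pressure": 1000.0}

inside_a = inside_min_max_maps(point_a, maps)
inside_b = inside_min_max_maps(point_b, maps)
```

Since a point must satisfy all maps at once, the feasible region is the intersection of the individual ranges, which is consistent with the observation that the common design space is rather small.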
[Figure 2.11, engine min-max maps plotted over engine speed [rpm] and engine load [%]: (a) EGR, vertical axis EGR rate [%]; (b) Mass fraction burned, vertical axis Fuel mass burned [%]]