6.2 System-wide power models
6.2.1 Predefined models
The calibration of predefined models is the most common approach to power modeling, giving the models some flexibility to adapt to new hardware architectures. This section compares the models described in Section 5.3.3 for the three learning data-set cases described in Section 5.3.2. The results presented in Figures 6.2 to 6.4 show each model’s average performance and standard deviation for the learning data-set and for every workload used for validation (see Section 5.2.1). In general, the learning error always decreases as the model increases in complexity. However, the same does not hold for the validation workloads.
Figure 6.2 shows the results for the ideal case, i.e. when only the generic workload is used for training. The average model, which considers the power consumption constant, presents very poor results, reaching up to 25 W of MAE. These results are evidence that this assumption cannot be used in most cases. The capacitive model also performs poorly for the workloads that are not close to the generic workload, such as stress, Gromacs (gmx), HPCC and NPB. More complex models provide better results. The capacitive with leakage power and the aggregated models have similar performance. Most of the workloads present validation errors near 2 W; however, higher errors are observed for the stress, HPCC_A and NPB_A workloads. In addition, a large difference between these two models can be noticed for the distributed version of NPB size C (npb_C8_dist).
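The behavior of the average model can be illustrated with a minimal sketch. It assumes that calibration simply takes the mean of the learning power samples and that the error metric is the mean absolute error (MAE) in watts. The function names and the power samples are hypothetical and chosen only for illustration; they are not measurements from this evaluation.

```python
from statistics import mean


def mean_absolute_error(measured, predicted):
    """Mean absolute error (MAE), in the same unit as the inputs (W here)."""
    if len(measured) != len(predicted):
        raise ValueError("series must have the same length")
    return mean(abs(m - p) for m, p in zip(measured, predicted))


def fit_average_model(learning_power):
    """Calibrate the constant model: the prediction is the mean learning power."""
    return mean(learning_power)


# Hypothetical power samples in watts (illustrative values, not measured data).
learning_power = [40.0, 50.0, 60.0]
validation_power = [70.0, 80.0, 90.0]

# The calibrated model is a single constant, whatever the workload does.
constant = fit_average_model(learning_power)
predictions = [constant] * len(validation_power)

average_model_mae = mean_absolute_error(validation_power, predictions)
print(f"constant = {constant} W, validation MAE = {average_model_mae} W")
```

In this sketch the validation workload draws more power than the learning workload, so the constant prediction is biased and the MAE grows with the distance between the two power levels.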
Figure 6.3 shows the results for case 2, where one workload of each kind is included in the learning set used to calibrate the model. In this case, the overall error of all models decreases. Once again the average model has the worst accuracy. Even though the capacitive model improves greatly on the previous case, its performance is still below that of the more complex models. The capacitive with leakage and aggregated models outperform the others for 9 of the workloads (generic, apache_perf, openssl, pybench, hpcc_B, hpcc_C, npb_B4, npb_B8_dist and npb_C4), have similar performance for 8 (apache, c-ray, gmx_dppc, hpcc_A_dist, hpcc_B_dist, hpcc_C_dist, npb_A4 and npb_A8_dist) and perform worse for the 5 remaining ones. For the npb_C8_dist workload, the capacitive with leakage and the aggregated models differ greatly, as already seen in the previous learning case.
The results when learning from one execution of each workload (case 3) are shown in Figure 6.4. This case allows the evaluation of each model’s best performance, since it considers all possible configurations of the workloads. For all models except the average, the results are quite close to those of case 2. This proximity indicates that case 2 is a good approach, i.e. using one workload configuration of each kind to create the model is suitable. It is important to notice that, even in this case, the error of the stress workload remains high. This is due to the dynamic behavior of the workload, which varies from high to low power usage in a short period of time. For this workload, the models that use the temperature as an input will not provide good results because heat dissipation presents inertia and is not instantaneous.
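The effect of thermal inertia on a rapidly alternating workload can be sketched as follows. The sketch assumes, purely for illustration, that temperature follows power as a first-order lag. The time constant `tau`, the `gain` in degrees per watt, the ambient temperature and the square-wave power trace are all hypothetical values, not parameters of the evaluated machine.

```python
def simulate_temperature(power, tau=30.0, dt=1.0, gain=0.5, ambient=25.0):
    """First-order lag: temperature relaxes toward ambient + gain * power.

    tau is the thermal time constant in seconds and dt the sampling period.
    All parameters are assumed values for illustration only.
    """
    temp = ambient + gain * power[0]  # start at the steady state of the first sample
    trace = []
    for p in power:
        target = ambient + gain * p          # steady-state temperature for this power
        temp += (dt / tau) * (target - temp)  # move a fraction dt/tau toward it
        trace.append(temp)
    return trace


# Hypothetical stress-like trace: 5 s at 20 W, then 5 s at 100 W, repeated.
power = ([20.0] * 5 + [100.0] * 5) * 6
temps = simulate_temperature(power)

power_swing = max(power) - min(power)
temp_swing = max(temps) - min(temps)
steady_state_swing = 0.5 * power_swing  # gain * power swing, reached only if power held

print(f"power swing: {power_swing} W")
print(f"temperature swing: {temp_swing:.1f} degrees "
      f"(steady state would be {steady_state_swing} degrees)")
```

Because the power level changes faster than the assumed time constant, the simulated temperature never reaches its steady-state values, so a model driven by temperature sees only a damped and delayed image of the power variation.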
Evaluation of Power Models
[Plot: two bar-chart panels of MAE (W) per workload for the Avg, Cap, Cap+Leak and Aggr models. Panel 1: learn, generic, apache, apache_perf, c-ray, openssl, pybench, stress, gmx_polyCH2, gmx_villin, gmx_dppc, gmx_lzm. Panel 2: learn, hpcc_A, hpcc_A_dist, hpcc_B, hpcc_B_dist, hpcc_C, hpcc_C_dist, npb_A4, npb_A8_dist, npb_B4, npb_B8_dist, npb_C4, npb_C8_dist.]
Figure 6.2: Predefined models’ performance after calibration using the learning workload for the ideal case (case 1).
[Plot: two bar-chart panels of MAE (W) per workload for the Avg, Cap, Cap+Leak and Aggr models. Panel 1: learn, generic, apache, apache_perf, c-ray, openssl, pybench, stress, gmx_polyCH2, gmx_villin, gmx_dppc, gmx_lzm. Panel 2: learn, hpcc_A, hpcc_A_dist, hpcc_B, hpcc_B_dist, hpcc_C, hpcc_C_dist, npb_A4, npb_A8_dist, npb_B4, npb_B8_dist, npb_C4, npb_C8_dist.]
Figure 6.3: Predefined models’ performance after calibration using one workload of each kind for learning (case 2).
[Plot: two bar-chart panels of MAE (W) per workload for the Avg, Cap, Cap+Leak and Aggr models. Panel 1: learn, generic, apache, apache_perf, c-ray, openssl, pybench, stress, gmx_polyCH2, gmx_villin, gmx_dppc, gmx_lzm. Panel 2: learn, hpcc_A, hpcc_A_dist, hpcc_B, hpcc_B_dist, hpcc_C, hpcc_C_dist, npb_A4, npb_A8_dist, npb_B4, npb_B8_dist, npb_C4, npb_C8_dist.]
Figure 6.4: Predefined models’ performance after calibration using all workloads for learning (case 3).
model: total number of cache misses; and network sent and received bytes. An analysis of the npb_C8_dist workload was carried out to understand why the results of learning cases 1 and 2 differ so much between these models. Figure 6.5 shows the means and standard deviations of each extra variable. The network communications of the validation data-sets are higher than those of any learning data-set. Since they are outside the range of the learning data, the models must extrapolate, and any variation above the training range may lead to unexpected behavior. The same does not happen for learning case 3, since the workload is included in the learning data-set.
[Plot: three panels showing Last Layer Cache Misses (10^6), Sent Bytes (10^6) and Received Bytes (10^6), each for L1, L2, L3, V1, V2, V3 and V4.]
Figure 6.5: Comparison between learning workloads for each case (L1, L2 and L3) and the validation runs for the npb_C8_dist workload (V1, V2, V3 and V4).
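The range comparison discussed for npb_C8_dist can be expressed as a simple check: for a given input variable, measure how many validation samples fall outside the interval covered by the learning data. The following is a minimal sketch under that assumption. The function name and the sent-bytes values are hypothetical and are not the measurements plotted in Figure 6.5.

```python
def out_of_range_fraction(learning, validation):
    """Fraction of validation samples outside the [min, max] of the learning data."""
    if not learning or not validation:
        raise ValueError("both series must be non-empty")
    low, high = min(learning), max(learning)
    outside = [v for v in validation if v < low or v > high]
    return len(outside) / len(validation)


# Hypothetical sent-bytes samples (illustrative values, not measured data).
learning_sent = [0.5e6, 2.0e6, 4.0e6]
validation_sent = [3.0e6, 12.0e6, 15.0e6, 20.0e6]

fraction = out_of_range_fraction(learning_sent, validation_sent)
print(f"{fraction:.0%} of validation samples require extrapolation")
```

A high fraction for a variable suggests that a model using it is being evaluated outside its calibration range, which is consistent with the behavior observed for learning cases 1 and 2.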