5.4 Combined confidence predictor based filtering and prioritization
5.6.3 Filtering experiments
In this subsection, we evaluate the performance of our confidence predictor based filtering technique. Before doing so, two aspects must be analyzed. First, we estimate the potential accuracy increment for each confidence predictor: if we filter low-confidence requests, which are expected to result in non-useful prefetches, the accuracy of the prefetching technique should improve. Second, we estimate the optimal warmup period for each region. Finally, we compare the state-of-the-art filtering technique with ours.
Estimated accuracy increment
To estimate the accuracy increment of each predictor, we calculated the accuracy without filtering any requests and the accuracy when filtering the requests that the predictor classifies as low confidence. This experiment was run without any hardware restriction, so the difference between the two values gives the theoretical accuracy gain that the filtering technique should obtain with that predictor. The predictors compared are: Last Phase (LP), the state-of-the-art filtering technique (FT), Code Region (CR), Stream Position (SP), and the combination of Code Region and Stream Position (COMB).
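The comparison described above can be sketched as follows. This is only an illustration under stated assumptions: accuracy is taken to be the fraction of issued prefetches that turn out to be useful, and the `PrefetchRequest` record, the confidence labels, and the toy trace are hypothetical names and values introduced here, not part of the evaluated simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefetchRequest:
    """Hypothetical record of one issued prefetch (not the simulator's type)."""
    useful: bool      # assumed meaning: the prefetched line was later referenced
    confidence: str   # assumed labels: "low", "medium" or "high"


def accuracy(requests):
    """Assumed definition: useful prefetches divided by issued prefetches."""
    if not requests:
        return 0.0
    return sum(r.useful for r in requests) / len(requests)


def estimated_accuracy_increment(requests):
    """Accuracy after dropping low-confidence requests minus unfiltered accuracy."""
    kept = [r for r in requests if r.confidence != "low"]
    return accuracy(kept) - accuracy(requests)


# Toy trace with made-up values: 4 useful and 4 non-useful prefetches.
trace = (
    [PrefetchRequest(True, "high")] * 3
    + [PrefetchRequest(True, "low")]
    + [PrefetchRequest(False, "low")] * 3
    + [PrefetchRequest(False, "medium")]
)

unfiltered = accuracy(trace)                      # 4 useful out of 8 issued
increment = estimated_accuracy_increment(trace)   # 3 useful out of 4 kept, minus the above
```

Running the same two functions once per predictor (each predictor assigning its own confidence labels to the same trace) would give one increment per predictor, which is the quantity the comparison is about.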
Figure 5.3 shows the estimated accuracy of each predictor after filtering the low-confidence requests from the RPT prefetcher. All the heuristics improve their accuracy compared with the baseline. Moreover, the accuracy predicted for the baseline heuristics is always below the accuracy predicted for the combined (COMB) technique, which is our proposal. Better accuracy should result in a more efficient technique (less cache pollution, less network traffic, and less power spent issuing and resolving non-useful requests). For this reason, we expect our technique to improve on the current state-of-the-art technique.
Stability analysis
To find the best warmup period for the filtering analysis, we ran the following experiment: for several warmup values, we counted the percentage of wrong predictions made after locking a region. We consider a prediction wrong if it differs from the value that would have been predicted without locking the region.
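A minimal sketch of this measurement is given below. It assumes that, for each warmup value, we already have two aligned sequences of confidence predictions: one obtained with the region locked after the warmup and one obtained without locking. The function names and the toy values are hypothetical and serve only to illustrate the counting.

```python
def wrong_prediction_pct(locked, unlocked):
    """Percentage of predictions that differ between the locked and unlocked runs."""
    if len(locked) != len(unlocked):
        raise ValueError("both runs must cover the same requests")
    if not locked:
        return 0.0
    wrong = sum(a != b for a, b in zip(locked, unlocked))
    return 100.0 * wrong / len(locked)


def sweep(predictions_by_warmup):
    """Map each warmup value to its percentage of wrong predictions."""
    return {
        warmup: wrong_prediction_pct(locked, unlocked)
        for warmup, (locked, unlocked) in predictions_by_warmup.items()
    }


def best_warmup(pct_by_warmup):
    """Warmup value with the fewest wrong predictions."""
    return min(pct_by_warmup, key=pct_by_warmup.get)


# Made-up predictions for two hypothetical warmup values (not measured data).
toy = {
    10: (["low", "low", "medium", "medium"], ["low", "medium", "medium", "low"]),
    20: (["low", "medium", "medium", "medium"], ["low", "medium", "medium", "low"]),
}
percentages = sweep(toy)
chosen = best_warmup(percentages)
```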
Figure 5.4 shows the results of this experiment. The longer the warmup period, the fewer wrong predictions are made. Nevertheless, for warmup periods longer than 50 iterations, the number of wrong predictions increases. This is because, during the warmup period, low-confidence requests are always predicted as medium-confidence requests. At some point, the region profiling becomes stable enough to make the right decisions, and from then on forcing low-confidence predictions to be treated as medium confidence increases the number of wrong predictions. As the figure shows, this point is around 50 iterations. For this reason, we have selected this value as the optimal parameter for our dynamic warmup technique.
5.6 Performance evaluation 113
Fig. 5.3 Accuracy increment estimation after filtering the low confidence requests.
Fig. 5.4 Percentage of regions that change their confidence after the warmup.
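The warmup behaviour discussed in the stability analysis can be sketched as a small decision function. This is a hedged illustration, not the hardware mechanism itself: it assumes that a per-region iteration counter is available, that the warmup ends once the counter reaches the threshold, and that only low-confidence requests are filtered. The function and parameter names are hypothetical.

```python
WARMUP_ITERATIONS = 50  # warmup value selected in the text


def effective_confidence(predicted, region_iterations, warmup=WARMUP_ITERATIONS):
    """During warmup, treat low-confidence predictions as medium confidence."""
    if region_iterations < warmup and predicted == "low":
        return "medium"
    return predicted


def should_issue(predicted, region_iterations, warmup=WARMUP_ITERATIONS):
    """Assumed filter policy: drop a request only if it is effectively low confidence."""
    return effective_confidence(predicted, region_iterations, warmup) != "low"


# A low-confidence request passes during warmup and is filtered afterwards.
during_warmup = should_issue("low", region_iterations=10)
after_warmup = should_issue("low", region_iterations=50)
```

With a warmup that is too long, the first branch keeps overriding predictions that the region profile could already classify correctly, which matches the increase in wrong predictions described for warmup periods above 50 iterations.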
Filtering performance evaluation
In this subsection, we compare the performance of our combined confidence predictor based filtering technique with the state-of-the-art filtering technique. Figure 5.5a shows the filtering results when simulating both mechanisms. To obtain these figures, we ran an initial execution without filtering and another with filtering activated. The bars represent the average percentage of prefetch requests issued with filtering activated relative to the average number of prefetch requests issued without filtering (100% indicates the same number in both runs). The line represents the IPC speedup of the filtered simulation over the non-filtered one (more than 100% indicates a higher IPC with filtering).
Figure 5.5a shows the results for the baseline mechanism. The baseline heuristic greatly reduces the number of prefetch requests: it eliminates about 87% of the non-useful requests on average. However, because of its aggressiveness, it also filters many useful requests. On average, this heuristic filters more than 56% of the useful requests, which has an important impact on the speedup: the IPC is reduced by around 18% on average.
Another important observation concerns the facesim benchmark. Its number of useful requests is about 167%, which means that the execution with filtering issues more useful prefetch requests than the non-filtered one. The reason lies in the behavior of the RPT: when the prefetcher works more accurately, it is able to issue more requests. If the requests issued by the RPT are not useful, it regulates itself and reduces the number of requests issued for a given static address. If these requests are useful, the reduction is not applied and they are issued. Moreover, the filter lets them pass, so we can obtain results like those of facesim, which translate directly into important speedup improvements. On the other hand, for benchmarks such as fluidanimate or streamcluster, nearly 99% of the requests are filtered, which is directly reflected in negative speedups.
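The two quantities plotted in these figures can be computed as in the sketch below. The counter values are made up for illustration and are not taken from the experiments; the function name is hypothetical.

```python
def relative_pct(filtered_run, unfiltered_run):
    """Value of the filtered run as a percentage of the unfiltered run."""
    if unfiltered_run <= 0:
        raise ValueError("the unfiltered run must report a positive value")
    return 100.0 * filtered_run / unfiltered_run


# Hypothetical counters from two runs of one benchmark (not measured data).
useful_pct = relative_pct(filtered_run=150, unfiltered_run=100)      # bar: useful requests
non_useful_pct = relative_pct(filtered_run=40, unfiltered_run=200)   # bar: non-useful requests
ipc_pct = relative_pct(filtered_run=1.2, unfiltered_run=1.5)         # line: IPC speedup

# Share of non-useful requests eliminated by the filter.
eliminated_pct = 100.0 - non_useful_pct
```

A bar above 100% for useful requests, as in this toy case, corresponds to the situation described for facesim, where the filtered run issues more useful prefetches than the unfiltered one. An IPC value below 100% corresponds to a slowdown.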
(a) Baseline (state of the art). (b) Combined confidence predictor based technique.
The results for the filtering technique using our combined heuristic methodology are shown in Figure 5.6. Our technique is much less aggressive than the baseline. Nevertheless, it reduces the non-useful requests by 30%, and on average the prefetcher is able to increase the useful ones by 10%. This effect translates directly into an average speedup of 3%. Our technique reduces the non-useful requests in all the benchmarks except dedup and allows the prefetcher to increase the useful requests in most of them. This large difference between the baseline and our proposal is due to the increased effectiveness of our combined confidence predictor and to the dynamic warmup methodology, which makes our technique more flexible and capable of adapting more efficiently to the program behaviour.