To apply our heuristic in the filtering technique, the filtering logic assigns a confidence to each request when it is generated. This confidence is obtained from the region profiler table according to the region of code that triggered the request and the position of the request in the stream of requests generated by the prefetcher. In line with our proposal, confidence can be high, medium or low. A low confidence means that, according to our confidence predictor, the request will most probably not be useful. Therefore, when our filtering technique is applied, the request is discarded and not sent to the prefetching queue. Because we use only the region of code and the stream position to decide whether a request will be useful, we do not even need to compute the address of the prefetch request: it can be discarded before it has actually been generated. Other prefetch requests in the same stream might have a higher confidence according to our predictor, and thus they will not be discarded.
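The filtering decision described above can be sketched as a table lookup keyed by the region of code and the stream position. This is only an illustrative sketch: the names `Confidence`, `should_issue` and `region_table`, the example region identifier, and the choice not to filter generation points that have no entry yet are assumptions, not part of the proposal.

```python
from enum import Enum


class Confidence(Enum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def should_issue(region_table, region, position):
    """Return True if the request may be sent to the prefetching queue.

    The decision uses only (region, position); the prefetch address is
    never needed, so a discarded request does not have to be generated.
    """
    # Assumption: a generation point without an entry is not filtered.
    confidence = region_table.get((region, position), Confidence.HIGH)
    return confidence is not Confidence.LOW


# Hypothetical table contents for one region of code (identifier 0x40).
region_table = {
    (0x40, 0): Confidence.HIGH,
    (0x40, 1): Confidence.MEDIUM,
    (0x40, 2): Confidence.LOW,
}

# Only the low-confidence position of the stream is discarded.
issued = [pos for pos in range(3) if should_issue(region_table, 0x40, pos)]
print(issued)
```

Note that positions 0 and 1 of the stream are still issued even though position 2 of the same stream is discarded.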
5.2.1 Warmup period
The filtering technique has one major drawback that is not addressed in previous filtering proposals. When the confidence predictor detects a useless request and this request is filtered, it is not issued any more. This means that its confidence prediction is no longer updated, so the mechanism cannot recover from an erroneous filtering decision. Should the behaviour of the code change for a few instructions, the memory structures become different, or any other change occur in the execution flow, some prefetch requests that were useful may become useless for long enough for the confidence of that generation point to drop and for the mechanism to decide that it has to be filtered. If this happens, then when the program returns to its usual behaviour the previously useful requests are filtered. In this situation we believe that it is better to launch a short useless stream and keep launching the other accurate requests with the same confidence. In fact, there is only one way to recover from this situation: the filtered entry must be evicted from the table. However, if it is a very active entry (corresponding to a hot region of code), it will remain in the filter table for a long time and will filter requests that are potentially useful.
A specific case of this problem is the instability of the confidence counters when the mechanism starts to profile a generation point. Imagine that, when a region is first executed, the first few prefetch requests associated with that region and stream position are useless but the following ones are useful. If the mechanism decides too soon that the generation point has a low level of confidence and starts to filter it, it will never generate enough requests to realize that the confidence of that generation point should actually be higher.
To solve this problem, we work with the hypothesis that a generation point, i.e. a tuple of <region of code, position in the stream of prefetch requests>, will have a similar accuracy for the rest of the execution once it has been observed for a sufficient number of iterations. Thus, when a new region is detected, the confidence predictor updates the information for all its stream positions during a certain period of time (warmup). When this period is over, we assume that the behaviour of that region of code has stabilized, or at least that the gathered profiling information is sufficiently representative of its long-term behaviour. Then, the confidence predictor assigns one of the three levels of confidence to each tuple and locks them. Therefore, the prediction for that region is not updated any more until it is evicted from the Region Profile Table. Moreover, and this is key, the filter does not discard any request during the warmup period.
Note that by locking the values of the Region Profile Table, we not only alleviate the problem presented at the beginning of this section, but we also save power and entries in the Prefetch Profiling Table. Once we have assigned confidence predictions to the stream positions of a region of code, it is no longer necessary to profile the requests generated for that region, and thus we do not need to use entries of the Prefetch Profiling Table for them.
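The warmup and lock behaviour of a single generation point can be illustrated with the following minimal sketch. It assumes a fixed warmup length counted in profiled requests and hypothetical accuracy thresholds (20% and 60%) for the three confidence levels; the names `GenerationPoint`, `classify`, `record` and `should_filter` are illustrative and do not come from the proposal.

```python
from dataclasses import dataclass
from typing import Optional

LOW, MEDIUM, HIGH = "low", "medium", "high"


def classify(accuracy, low_threshold=0.2, high_threshold=0.6):
    """Map an accuracy in [0, 1] to a confidence level (hypothetical thresholds)."""
    if accuracy < low_threshold:
        return LOW
    if accuracy < high_threshold:
        return MEDIUM
    return HIGH


@dataclass
class GenerationPoint:
    useful: int = 0
    issued: int = 0
    locked: bool = False
    confidence: Optional[str] = None

    def record(self, was_useful, warmup_requests):
        """Profile one issued request; lock the entry when warmup ends."""
        if self.locked:
            return  # locked entries are no longer profiled or updated
        self.issued += 1
        self.useful += int(was_useful)
        if self.issued >= warmup_requests:
            self.confidence = classify(self.useful / self.issued)
            self.locked = True

    def should_filter(self):
        # During warmup nothing is discarded, whatever the accuracy so far.
        return self.locked and self.confidence == LOW


# Two useless requests followed by six useful ones, warmup of 8 requests.
point = GenerationPoint()
filtered_during_warmup = []
for outcome in [False, False, True, True, True, True, True, True]:
    filtered_during_warmup.append(point.should_filter())
    point.record(outcome, warmup_requests=8)

print(point.confidence, point.locked, any(filtered_during_warmup))
```

In this example a filter that acted after the first two requests would see 0% accuracy and discard the generation point, whereas waiting for the end of the warmup yields an accuracy of 6/8 and the entry is locked with high confidence.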
5.2.2 Dynamic warmup strategy
For this mechanism to be effective, the length of the warmup period is a key parameter. For this reason, we have implemented a dynamic warmup strategy that improves the ability to detect when the value of a generation point in a region becomes stable. As the accuracy can vary slightly over short periods of time, we count the number of consecutive iterations in which a position in the stream of prefetches generated from a region keeps the same confidence level. We have chosen the confidence because its variability is lower than that of the accuracy: small changes in accuracy may not imply a change in confidence, whereas big changes in accuracy will change the confidence value.
For example, consider a configuration with a warmup of 5 iterations in which the threshold to change the confidence is 20% accuracy, and 7 accesses to a tuple of region and position in the stream of generated prefetches with accuracies of 30%, 25%, 19%, 16%, 19%, 18%, and 17%. The warmup would lock that entry at the last access. Table 5.1 shows the warmup process until the entry is locked. As we can see, at the last access, where the accuracy is 17%, the warmup is over because it reaches its limit, and the region is locked with Low confidence. Note that between accesses 1 and 2 (25% to 19%) the confidence level changes, and thus the warmup count is reset.
We have carried out several experiments to determine the best number of consecutive iterations with the same confidence required to finish the warmup. They are shown in Section 5.6. Note that, from now on, we use the term warmup iterations to refer to this dynamic number of iterations of the warmup.
Access  Accuracy  Confidence  Warmup iterations
0       30%       High        1
1       25%       High        2
2       19%       Low         1
3       16%       Low         2
4       19%       Low         3
5       18%       Low         4
6       17%       Low         5
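The worked example of Table 5.1 can be reproduced with a short sketch of the dynamic warmup counter. The sketch uses only the two confidence levels that appear in the example, and it assumes that an accuracy equal to the threshold counts as High; the function names `confidence_level` and `dynamic_warmup` are illustrative.

```python
def confidence_level(accuracy_pct, threshold_pct=20):
    """Two-level confidence as in the worked example.

    Assumption: an accuracy exactly at the threshold is treated as High.
    """
    return "High" if accuracy_pct >= threshold_pct else "Low"


def dynamic_warmup(accuracies, warmup_iterations=5, threshold_pct=20):
    """Return one (access, accuracy, confidence, count, locked) row per access.

    The count grows while the confidence level stays the same and restarts
    at 1 when it changes; the entry is locked once the count reaches
    warmup_iterations, and no further accesses are profiled.
    """
    rows = []
    previous = None
    count = 0
    for access, accuracy in enumerate(accuracies):
        level = confidence_level(accuracy, threshold_pct)
        count = count + 1 if level == previous else 1
        previous = level
        locked = count >= warmup_iterations
        rows.append((access, accuracy, level, count, locked))
        if locked:
            break
    return rows


rows = dynamic_warmup([30, 25, 19, 16, 19, 18, 17])
for access, accuracy, level, count, locked in rows:
    print(f"{access}  {accuracy}%  {level:<4}  {count}  {'locked' if locked else ''}")
```

With the accuracies of the example, the count restarts at access 2 and the entry is locked with Low confidence at access 6, matching the table.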