5.4 Combined confidence predictor based filtering and prioritization
5.5.1 Execution flow
In this section, we describe the implementation as an extension of the confidence predictor implementation. As in the previous chapter, we use a block diagram to explain how the hardware behaves for this technique. Figure 5.2 shows the block diagram, annotated with the numbers used in the following steps:
Fig. 5.2 Block diagram for the hardware implementation of the prioritization and filtering technique using the combined confidence predictor.
1. Memory request issue: The core requests data from the cache through a demand operation. The code region of the request is identified and sent to the memory hierarchy along with the request.
2. Cache access: The prefetcher analyses this demand operation and uses its heuristics to infer which requests the core will need next.
3. Prefetcher trigger: When the prefetcher decides to trigger a stream of requests, the Dynamic Management logic tags each request in the stream with a priority based on the confidence prediction. Recall that we have defined three confidence levels for prefetch requests: high, medium and low. At this point, low-confidence requests are therefore filtered instead of being inserted into the prefetch queue. If the warmup period has not finished for a generation point, or the Region Profile Table holds no information about that region, the requests are assigned medium confidence (low priority for the prioritization mechanism) and tagged for profiling.
4. Prefetch queue: Each prefetch request is inserted into the priority queue. Recall that this structure is composed of two small subqueues with different priorities. Each request is inserted into one of them according to its confidence.
5. Process the prefetch request: When the cache controller has no demand operation to process, it picks the request at the head of the high-priority subqueue and processes it.
6. Miss status holding register: If the requested memory block is not in the cache, the prefetch request is allocated in the Miss Status Holding Register (MSHR) and submitted to the corresponding input buffer of the priority network interface. If the request has been tagged for profiling, a bit is set in its MSHR entry so that the success of the prefetch can be profiled when the memory block arrives from the memory hierarchy.
7. Network interface: The arbiter of the priority network interface is responsible for applying the prioritization policy. Among the input buffers with pending requests, it selects the flit from the buffer with the highest priority for processing. Once the flit is injected into the network, the routers carry out the rest of the prioritization as specified in the previous section.
8. Prepare profiling: When the miss is resolved, the data is stored in the cache. It remains there until the processor requests it or another request evicts it. Moreover, if the prefetch request was tagged for profiling, the cache line is marked with an extra bit indicating that it was prefetched. The Prefetch Profiler Table then stores the profiling information for this prefetch request (address, region identifier, and generation point in the stream of requests).
9. Prefetching request evaluation: Each prefetch request is evaluated when its entry in the Prefetch Profiler Table is evicted. An eviction can happen for several reasons, and the evaluation depends on which one applies. (1) The prefetched memory line is evicted from the cache without being used: the prefetch request is classified as non-useful, and the profiled accuracy of the corresponding generation point decreases. (2) The prefetched memory line is used by a demand request: once the data is used, the line is marked as non-prefetched and the corresponding prefetch request is evaluated as useful. The profiled accuracy of the generation point that triggered this request therefore increases. (3) The Prefetch Profiler Table is a limited-size hardware table, so an entry may be evicted to make room for a new request. In this case, the evicted prefetch request is evaluated as non-useful, and the profiled accuracy for this request in the Region Profile Table decreases.
10. Region Profile Table update: Each entry in this table holds the confidence associated with a code region and a position in the stream of prefetch requests. The table is updated whenever an entry is evicted from the Prefetch Profiler Table. Depending on the evaluation of the corresponding prefetch request, the appropriate counters are updated to keep track of the profiled accuracy. However, once the warmup period finishes, the confidence of the entry is locked and is not updated again until the entry is evicted. Note that every newly allocated entry must go through a warmup period before being locked. This applies even if the same entry was previously held in the table and then evicted. Finally, the table uses an LRU replacement policy, and the recency of an entry is updated when the prefetcher accesses the table to obtain the priority for a given region.
11. Confidence prediction: The dynamic prefetching management technique may need a confidence prediction for a given prefetch request, given the memory region of the trigger and the position in the stream of the prefetch request to be generated. In that case, it accesses the Region Profile Table and reads the confidence of the corresponding entry. However, while the warmup period is not over, the confidence can only be high or medium. If low confidence were assigned during the warmup period, the requests would be filtered and their profiling information would not be updated properly. Therefore, if a request would be tagged with low confidence during the warmup period, its confidence is raised to medium.
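Steps 3 to 5 can be summarized in a small behavioral model. The following Python sketch is only illustrative and is not the hardware implementation. The `Confidence` enumeration, the `PrefetchQueue` class, the per-subqueue capacity, and dropping a request whose subqueue is full are all hypothetical. Serving the low-priority subqueue when the high-priority one is empty is also an assumption, since the text only mentions the high-priority head.

```python
from collections import deque
from enum import Enum


class Confidence(Enum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


class PrefetchQueue:
    """Priority prefetch queue built from two small subqueues (step 4)."""

    def __init__(self, capacity_per_subqueue=8):
        self.high = deque()  # high-confidence requests
        self.low = deque()   # medium-confidence requests
        self.capacity = capacity_per_subqueue  # hypothetical size

    def enqueue(self, address, confidence):
        """Insert a prefetch request; returns False if it is not queued."""
        # Step 3: low-confidence requests are filtered, never queued.
        if confidence is Confidence.LOW:
            return False
        target = self.high if confidence is Confidence.HIGH else self.low
        # Assumption: a request that finds its subqueue full is dropped.
        if len(target) >= self.capacity:
            return False
        target.append(address)
        return True

    def next_request(self, demand_pending):
        """Step 5: pick a prefetch only when no demand operation is waiting."""
        if demand_pending:
            return None
        if self.high:
            return self.high.popleft()
        # Assumption: the low-priority subqueue is served only when the
        # high-priority subqueue is empty.
        if self.low:
            return self.low.popleft()
        return None
```

Consider three requests: a medium-confidence one for 0x100, a high-confidence one for 0x140, and a low-confidence one for 0x180. The controller obtains nothing while a demand operation is pending. After that it obtains 0x140 and then 0x100, while 0x180 is never queued.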
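Step 6 only adds one bit of state per MSHR entry. The following Python sketch shows where that bit lives and how it is read back when the block arrives. The `MSHR` and `MSHREntry` names and their fields are hypothetical. Unlike a real MSHR, the sketch does not merge secondary misses to a block that already has an outstanding entry.

```python
from dataclasses import dataclass


@dataclass
class MSHREntry:
    """One outstanding miss; the field names are hypothetical."""
    address: int
    is_prefetch: bool
    profile: bool = False  # step 6: set when the prefetch is tagged for profiling


class MSHR:
    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.pending = {}  # address -> MSHREntry

    def allocate(self, address, is_prefetch, profile=False):
        """Record a new miss; returns False if it cannot be allocated.

        Allocation fails when the MSHR is full or the block already has an
        outstanding entry (secondary misses are not merged in this sketch).
        """
        if address in self.pending or len(self.pending) >= self.num_entries:
            return False
        self.pending[address] = MSHREntry(address, is_prefetch, profile)
        return True

    def complete(self, address):
        """Release the entry of an outstanding miss when its block arrives.

        Returns True when the block belongs to a prefetch tagged for
        profiling, i.e. when step 8 has to start profiling it.
        """
        entry = self.pending.pop(address)
        return entry.is_prefetch and entry.profile
```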
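The arbitration in step 7 is a strict-priority choice among the input buffers that have pending flits. The following Python sketch assumes that priority levels are integers where a larger value is more urgent. That encoding and the `arbitrate` function are hypothetical. The sketch also ignores packet boundaries and any fairness mechanism the real arbiter may implement.

```python
from collections import deque


def arbitrate(input_buffers):
    """Strict-priority arbiter for the network interface (step 7).

    input_buffers maps an integer priority level (larger is more urgent)
    to a deque of flits waiting in that input buffer. Returns a
    (priority, flit) pair taken from the highest-priority buffer with
    pending flits, or None when every buffer is empty.
    """
    for priority in sorted(input_buffers, reverse=True):
        buffer = input_buffers[priority]
        if buffer:
            return priority, buffer.popleft()
    return None


if __name__ == "__main__":
    buffers = {0: deque(["pf-low"]), 1: deque(["pf-high"]), 2: deque()}
    choice = arbitrate(buffers)
    while choice is not None:
        print(choice)  # (1, 'pf-high') first, then (0, 'pf-low')
        choice = arbitrate(buffers)
```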
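Steps 8 and 9 can be read as a small table whose evictions produce the evaluations. The following Python sketch models the three eviction causes of step 9. The `PrefetchProfilerTable` class, its method names, and the `on_evaluate` callback are hypothetical. The text does not state the replacement policy of this table, so evicting the oldest entry (FIFO) is an assumption. Membership in the table stands in for the extra "prefetched" bit on the cache line.

```python
from collections import OrderedDict


class PrefetchProfilerTable:
    """Limited-size table of profiled prefetches (steps 8 and 9).

    on_evaluate(region_id, gen_point, useful) is called once per evicted
    entry; in the full design it would update the Region Profile Table.
    """

    def __init__(self, capacity, on_evaluate):
        self.capacity = capacity
        self.on_evaluate = on_evaluate
        # address -> (region_id, gen_point), kept in insertion order
        self.entries = OrderedDict()

    def _evict(self, address, useful):
        region_id, gen_point = self.entries.pop(address)
        self.on_evaluate(region_id, gen_point, useful)

    def record(self, address, region_id, gen_point):
        """Step 8: a prefetch tagged for profiling has filled the cache."""
        if address not in self.entries and len(self.entries) >= self.capacity:
            # Case (3): make room by evicting the oldest entry (assumed
            # FIFO); it is evaluated as non-useful.
            oldest = next(iter(self.entries))
            self._evict(oldest, useful=False)
        self.entries[address] = (region_id, gen_point)

    def on_demand_hit(self, address):
        """Case (2): a demand request uses the prefetched line."""
        if address in self.entries:
            self._evict(address, useful=True)

    def on_cache_eviction(self, address):
        """Case (1): the line leaves the cache without being used."""
        if address in self.entries:
            self._evict(address, useful=False)
```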
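Steps 10 and 11 combine per-entry warmup, locking, LRU replacement, and the rule that low confidence is raised to medium during warmup. The following Python sketch illustrates these interactions under several assumptions. The section does not give the warmup length, the counter layout, or the accuracy thresholds, so `WARMUP_SAMPLES`, `HIGH_THRESHOLD`, `MEDIUM_THRESHOLD`, the useful/total counters, and the `RegionProfileTable` class are all hypothetical. Allocating an entry on a lookup miss is also an assumption, as is ignoring evaluations for entries that are no longer in the table. During warmup the sketch follows step 11: it returns the confidence computed so far, but never below medium.

```python
from collections import OrderedDict
from enum import Enum


class Confidence(Enum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Hypothetical parameters; the text does not specify them.
WARMUP_SAMPLES = 4       # evaluated prefetches before an entry is locked
HIGH_THRESHOLD = 0.75    # minimum profiled accuracy for high confidence
MEDIUM_THRESHOLD = 0.40  # minimum profiled accuracy for medium confidence


def classify(useful, total):
    """Map a profiled accuracy (useful / total, total > 0) to a confidence."""
    accuracy = useful / total
    if accuracy >= HIGH_THRESHOLD:
        return Confidence.HIGH
    if accuracy >= MEDIUM_THRESHOLD:
        return Confidence.MEDIUM
    return Confidence.LOW


class RegionProfileTable:
    """Entries keyed by (region_id, gen_point), with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry first

    def predict(self, region_id, gen_point):
        """Step 11: return (confidence, tag_for_profiling)."""
        key = (region_id, gen_point)
        entry = self.entries.get(key)
        if entry is None:
            # No information: allocate a new entry, which starts a fresh warmup.
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the LRU entry
            entry = {"useful": 0, "total": 0, "locked": None}
            self.entries[key] = entry
        else:
            self.entries.move_to_end(key)  # lookups refresh the LRU state
        if entry["locked"] is not None:
            return entry["locked"], False
        if entry["total"] == 0:
            return Confidence.MEDIUM, True
        # Warmup: never return LOW, so the requests are not filtered
        # and their outcome can still be profiled.
        conf = classify(entry["useful"], entry["total"])
        return max(conf, Confidence.MEDIUM, key=lambda c: c.value), True

    def update(self, region_id, gen_point, useful):
        """Step 10: apply one evaluation from the Prefetch Profiler Table."""
        entry = self.entries.get((region_id, gen_point))
        if entry is None or entry["locked"] is not None:
            return  # entry evicted or already locked: nothing to update
        entry["total"] += 1
        entry["useful"] += int(useful)
        if entry["total"] >= WARMUP_SAMPLES:
            entry["locked"] = classify(entry["useful"], entry["total"])
```

Updates do not refresh the LRU state in this sketch, because the text states that recency is updated only when the prefetcher reads the table. An entry that is evicted and later reallocated starts with zeroed counters, so it always goes through a new warmup period.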