Cache Partitioning - Restraining the cache behaviour

2.4 Restraining the cache behaviour

2.4.1 Cache Partitioning

Cache partitioning techniques aim at increasing cache predictability by partitioning the cache in a way that a cache segment can be reserved for a specific task or group thereof. Cache partitioning techniques can be implemented in hardware or in software.

Hardware Cache Partitioning. Kirk in [70] suggests a hardware-implemented partitioning scheme named SMART (Strategic Memory Allocation for Real-Time). In SMART, the cache is divided into several segments private to individual tasks, plus one shared partition. Each task is assigned one or more private partitions. Assigning private segments to tasks reduces extrinsic cache interference; because a shared segment also exists, however, a (possibly small) CRPD is still incurred at context switches. Implementing SMART requires a hardware flag to distinguish private from shared cache partitions, and hence a custom cache controller. The author also supplies an algorithm to choose the size and the assignment of partitions [71].


Muller et al. in [109] introduce a similar cache partitioning technique, also implemented in hardware, but oriented more towards the data cache (D-cache). Unlike Kirk’s approach, the cache is split into several partitions, each of which operates like a small direct-mapped cache. No shared segment is required. Load and Store instructions include a partition operand which selects the appropriate partition; the compiler must be aware of the cache architecture and should provide an appropriate allocation of the cache partitions. This approach also aims at reducing intrinsic cache interference, by mapping data structures into cache partitions (and into lines within partitions) according to a compiler analysis of data accesses.
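The behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the design in [109]: the class name, the 16-byte line size and the per-partition line counts are hypothetical, and the partition operand is modelled as an explicit argument of each access.

```python
class PartitionedCache:
    """Toy model: every partition behaves as a small direct-mapped cache."""

    def __init__(self, partition_lines, line_size=16):
        # partition_lines[p] = number of cache lines in partition p (assumed layout)
        self.line_size = line_size
        self.tags = [[None] * n for n in partition_lines]

    def access(self, partition, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        lines = self.tags[partition]          # selected by the partition operand
        block = address // self.line_size     # memory block holding the address
        index = block % len(lines)            # direct-mapped placement
        tag = block // len(lines)
        hit = lines[index] == tag
        lines[index] = tag
        return hit
```

In this model an access through one partition can never evict a line held in another partition, whereas two blocks that collide within the same partition still evict each other.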

Software Cache Partitioning. Since hardware partitioning is not usually supported in commercial processors, other approaches aim at implementing cache partitioning techniques in software. Software partitioning builds on optimising the memory mapping of code through specific compiler and linker support: instructions are placed in the address space so as to reduce or eliminate inter-task interference.

Wolfe in [164] suggests dividing the cache space into partitions, each of which is only used by certain tasks. A direct-mapped cache is partitioned by altering the address translation process at each cache access, which also requires some hardware tweaks. Bounding the memory references issued by tasks to a selected range of addresses effectively makes it possible to define logical (as opposed to physical) partitions in the cache.

Mueller in [107] takes Wolfe’s approach and focuses on the compiler and linker support required for cache partitioning. Mueller aims at defining how to assign tasks to addresses, and thus how to assemble the code so that extrinsic interference is eliminated. Cache lines are grouped into partitions, each assigned to a specific (real-time) task; a single partition is reserved for shared access. In order to map the memory accesses of a task to a specific cache partition, some code transformations are applied. Instructions and data are transformed to fit a certain range of memory addresses, producing a scattered memory mapping for each task. The scattered memory mapping is performed during compilation and linking to ensure that every task will only access its own cache partition, except in case of synchronisation, when the shared partition is accessed.
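To see why the mapping is scattered, consider a direct-mapped cache: only the addresses whose offset within each cache-sized window falls inside the partition map to that partition. The following sketch computes such address ranges; it is an illustration under assumed parameters (function names, cache geometry and a zero base address are hypothetical), not the transformation implemented in [107].

```python
def scattered_chunks(code_size, cache_size, line_size, first_line, n_lines):
    """Return (start, end) address ranges, end exclusive, that map only to
    cache lines first_line .. first_line + n_lines - 1 of a direct-mapped cache."""
    chunk = n_lines * line_size       # bytes of the partition per cache-sized window
    offset = first_line * line_size   # where the partition starts inside a window
    chunks = []
    remaining = code_size
    window = 0
    while remaining > 0:
        start = window * cache_size + offset
        length = min(chunk, remaining)
        chunks.append((start, start + length))
        remaining -= length
        window += 1
    return chunks


def cache_line(address, cache_size, line_size):
    """Cache line that an address maps to in a direct-mapped cache."""
    return (address % cache_size) // line_size
```

For example, 100 bytes of code placed in a 2-line partition starting at line 8 of a 1 KiB cache with 16-byte lines end up in four separate chunks, one per 1 KiB window of the address space.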

The compiler must restrict the code of a task to only those memory addresses that map into the cache partition. Mueller also suggests breaking the tight one-to-one mapping between tasks and partitions: in order to avoid extremely small partitions, it suffices to define one partition for each priority level, letting the tasks at the same priority level share the same partition (provided FIFO scheduling is adopted for tasks at the same priority level). Assigning multiple tasks to a single partition allows bigger partitions to be defined and consequently reduces intrinsic interference. On the other hand, sharing partitions within priorities introduces some extrinsic interference: when a suspended task executes again, it may find its cache state modified by some other task sharing the same partition.

Discussion. Cache partitioning is likely to reduce extrinsic cache interference by assigning separate cache segments to given tasks. Residual extrinsic interference may still be incurred owing to accesses to shared partitions (if any). The most important drawback of both hardware and software partitioning approaches is the reduction of cache space per task: since each task is granted a smaller amount of addressable cache, the number of capacity (and conflict) misses will tend to increase.
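The effect of a smaller partition on the miss count can be shown with a small simulation. This is a sketch under simplifying assumptions that are not taken from the cited works: a direct-mapped partition, and a task that repeatedly sweeps a contiguous working set; the function and parameter names are hypothetical.

```python
def count_misses(partition_lines, working_set_blocks, iterations):
    """Count misses of a task that sweeps its working set `iterations` times
    through a direct-mapped partition of `partition_lines` lines."""
    cached = [None] * partition_lines   # block currently held by each line
    misses = 0
    for _ in range(iterations):
        for block in range(working_set_blocks):
            index = block % partition_lines
            if cached[index] != block:
                misses += 1
                cached[index] = block
    return misses
```

With a working set of 8 blocks swept 10 times, a partition of 8 lines incurs only the 8 cold misses, a partition of 6 lines incurs 44 misses, and a partition of 4 lines misses on every one of the 80 accesses.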

Furthermore, most techniques of this kind leave one central problem unattended: they do not advise on how to define both the size and the assignment of partitions. This issue is critical, as it has a major impact on cache performance: the performance of cache partitioning depends strictly on the number, size and actual assignment of cache partitions. In particular, assigning partitions to tasks can prove quite a complex job: naively assigning partitions in accordance with task priority (i.e., by urgency) or with task rate is unlikely to be the best choice.

Some studies have addressed those open problems and tried to find an automated (i.e., algorithmic) way to solve them. Sasinowski and Strosnider in [140] define a dynamic programming algorithm for allocating cache segments to a set of periodic tasks. Their algorithm, which runs in polynomial time, is claimed to be optimal, where optimality consists in finding the allocation that yields the minimum processor utilisation for the given task set. As the utilisation of a single task depends on how large its partition is, the algorithm searches for a global partitioning and assignment that minimises the utilisation of the entire task set. The utilisation of each task for each possible partition size is determined through simulations or experiments.
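The general shape of such an allocation can be sketched with a textbook dynamic program. This is not the algorithm published in [140], only a minimal illustration of the idea: the function name and the input table `util`, where `util[i][s]` is the (measured or simulated) utilisation of task i when it is granted s segments, are assumptions made here.

```python
def allocate_segments(util, total_segments):
    """Return (minimum total utilisation, segments per task).
    util[i][s] = utilisation of task i with s segments, s = 0..total_segments."""
    inf = float("inf")
    # best[c] = minimum utilisation of the tasks processed so far using exactly c segments
    best = [0.0] + [inf] * total_segments
    choices = []
    for task_util in util:
        new_best = [inf] * (total_segments + 1)
        pick = [0] * (total_segments + 1)
        for c in range(total_segments + 1):
            for s in range(c + 1):            # give s segments to the current task
                if best[c - s] == inf:
                    continue
                candidate = best[c - s] + task_util[s]
                if candidate < new_best[c]:
                    new_best[c] = candidate
                    pick[c] = s
        best = new_best
        choices.append(pick)
    # Pick the best segment count, then walk back to recover each task's share.
    c = min(range(total_segments + 1), key=lambda k: best[k])
    total = best[c]
    allocation = [0] * len(util)
    for i in reversed(range(len(util))):
        allocation[i] = choices[i][c]
        c -= allocation[i]
    return total, allocation
```

For n tasks and S segments this sketch performs on the order of n·S² steps, hence polynomial time. With `util = [[0.5, 0.25, 0.25], [0.5, 0.5, 0.125]]` and two segments, it assigns both segments to the second task, for a total utilisation of 0.625.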

Tan and Mooney in [153] propose a different allocation scheme strictly related to task priorities; for this reason, that approach is often referred to as prioritised cache. A set-associative cache is partitioned at the granularity of sets, and each partition can assume a priority in the same range as task priorities. Each task is then only allowed to access cache partitions with equal or lower priority. The priorities of the cache sets are initially set to the lowest priority level; when a task uses a cache set, the set priority is raised to the task priority, and it is downgraded to the lowest priority level again after the task has completed its execution. The main drawback of this approach is that it guarantees high cache performance to higher-priority tasks at the cost of degrading lower-priority tasks, which will not necessarily yield an optimal overall performance.
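The priority rule can be captured in a few lines. The model below is a hypothetical sketch, not the hardware of [153]: it assumes that larger numbers denote higher priorities, and it tracks the owner of each set so that the priority can be downgraded when the owning task completes.

```python
LOWEST = 0  # assumed encoding: larger number = higher priority


class PrioritisedCache:
    """Toy model of the set-priority rule only; cache contents are not modelled."""

    def __init__(self, n_sets):
        self.set_priority = [LOWEST] * n_sets
        self.owner = [None] * n_sets

    def use(self, task, task_priority, set_index):
        """Claim a set if its priority is equal to or lower than the task's."""
        if self.set_priority[set_index] > task_priority:
            return False                      # set held at a higher priority
        self.set_priority[set_index] = task_priority
        self.owner[set_index] = task
        return True

    def complete(self, task):
        """Downgrade every set owned by a task that has finished executing."""
        for i, owner in enumerate(self.owner):
            if owner == task:
                self.set_priority[i] = LOWEST
                self.owner[i] = None
```

In this model a high-priority task can always take over a set used by a low-priority task, while the low-priority task is locked out of that set until the high-priority task completes, which is the asymmetry noted above.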

In document Cache-aware development of high integrity real-time systems (Page 44-47)