Related work - Efficient Adaptive Hard Real-time Multi-processor Systems

Interference analysis is far from a new concept: the first research efforts on resource usage in multi-processors appeared in the early ’70s [16, 10] or even earlier. Such efforts focus only on determining the worst-case interference that can occur on a particular architecture, without considering the application and/or the deployment. To the best of the author’s knowledge, there exists no other work that couples the interference analysis to the real-time deployment process. In fact, the closest works to the ones proposed are the interference-sensitive WCET (isWCET) estimation methods [77, 78, 76], where the term isWCET was first coined, and the works of the ongoing ARGO project [84, 41], where a similar idea of iteratively improving the WCET estimation according to the deployment decisions is advocated and evaluated [90].

The works of Nowotsch et al. on isWCET (first presented in [77, 78] and extended in [76]) focus on COTS platforms where the target architecture is a distributed homogeneous architecture, yet the approach should be extendable to centralised and mixed architectures. A wider range of shared resources is considered, including I/O devices, the PCI bus, etc. These works, as opposed to our approaches, enforce resource regulation on a per-task basis in order to ensure that tasks do not generate more interference than permitted. That is, the accesses of each task are monitored at runtime, and a task that exceeds its allotted resource usage is suspended. A first concern is that runtime monitoring of resource accesses can introduce non-negligible overhead to the actual execution, which is not addressed in these works. Another potentially problematic situation, albeit a corner case, arises because mixed-criticality systems are considered: a high-criticality task with tight resource demands in the low-criticality mode may be suspended or slowed down, while low-criticality tasks run normally, until the criticality mode changes. Still, the approach is significant and illustrates experimentally that a per-task resource-bounded WCET, i.e. isWCET, can increase performance significantly.
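
The per-task regulation described above can be illustrated with a minimal sketch. It is not the mechanism of [77, 78, 76]; the names `TaskBudget` and `record_access`, and the use of a plain access counter as the budget, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskBudget:
    """Hypothetical per-task budget of shared-resource accesses."""
    name: str
    max_accesses: int        # allotted number of accesses (assumed unit)
    used: int = 0
    suspended: bool = False


def record_access(task: TaskBudget, count: int = 1) -> bool:
    """Account for `count` accesses and suspend the task once it
    exceeds its budget. Returns True if the task may keep running."""
    if task.suspended:
        return False
    task.used += count
    if task.used > task.max_accesses:
        task.suspended = True
    return not task.suspended


if __name__ == "__main__":
    t = TaskBudget("sensor_fusion", max_accesses=3)
    print(record_access(t, 3))   # budget exactly consumed, still running
    print(record_access(t))      # budget exceeded, task is suspended
```

In a real system the counter would be fed by hardware performance monitors, and that bookkeeping is precisely the runtime overhead discussed above.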

Of particular interest are interference-free approaches [94, 21, 81]. These approaches require that execution occurs in short phases of local accesses and shared-resource accesses. This can be achieved by reordering/instrumenting the executable binary and properly aligning, in time, the phases of each core such that shared-resource phases do not overlap. This can minimise WCET estimations, as there is no interference, and lead to efficient deployments. Nevertheless, we consider that runtime adaptation to shorter task executions is difficult, as the strict phase-alignment requirement cannot be violated, and rescheduling at earlier times requires a significant reduction of a single phase. Therefore, such execution models, while efficient in deploying hard real-time applications, lack adaptability.
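
The phase-alignment requirement can be made concrete with a small check. This is a sketch under stated assumptions rather than the method of [94, 21, 81]: shared-resource phases are modelled as half-open time intervals `(start, end)` per core, and the function names are hypothetical.

```python
from itertools import combinations


def phases_overlap(a, b):
    """True if half-open intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]


def is_interference_free(schedule):
    """Check that no two cores have overlapping shared-resource phases.

    `schedule` maps a core identifier to its list of (start, end)
    shared-resource phases (an assumed representation).
    """
    for (_, phases_a), (_, phases_b) in combinations(schedule.items(), 2):
        for a in phases_a:
            for b in phases_b:
                if phases_overlap(a, b):
                    return False
    return True


if __name__ == "__main__":
    aligned = {0: [(0, 2), (6, 8)], 1: [(2, 4)], 2: [(4, 6)]}
    print(is_interference_free(aligned))
    # If core 1 were moved earlier, its phase would collide with core 0.
    shifted = {0: [(0, 2), (6, 8)], 1: [(1, 3)], 2: [(4, 6)]}
    print(is_interference_free(shifted))
```

The second schedule shows why adaptation is hard: moving a single phase earlier breaks the alignment unless every other core's phases are shifted consistently.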

In the following, we review some state-of-the-art approaches related to WCET analysis and scheduling.

3.4.1 WCET Analysis and Scheduling

We focus on the works that, similarly to us, consider that the WCET of a task is composed of its WCET in isolation, including data fetch and deposit time, with no interference from other tasks on shared resources, plus the time delays due to contention on these resources. Several works adopt this composition. Other papers focus mainly on the WCET estimation of tasks with data dependencies deployed on centralised [73, 23] and distributed [44] architectures. Works on HW/SW co-design, e.g. [42], can avoid interference by deciding the amount of resources in the platform, contrary to the commercial hardware that we consider.
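
The additive composition can be sketched as follows. The contention term used here is an assumption for illustration only (each shared access is delayed at most once by each interfering task, as under a round-robin arbiter); it is not the delay model of any of the cited works, and all names and parameters are hypothetical.

```python
def wcet_with_interference(wcet_isolation, accesses, interferers,
                           delay_per_access):
    """Additive bound: WCET in isolation plus contention delay.

    Assumed contention model: every shared-resource access can be
    delayed by at most `delay_per_access` time units per interferer.
    """
    values = (wcet_isolation, accesses, interferers, delay_per_access)
    if min(values) < 0:
        raise ValueError("all parameters must be non-negative")
    contention = accesses * interferers * delay_per_access
    return wcet_isolation + contention


if __name__ == "__main__":
    # 100 time units in isolation, 10 shared accesses, 3 interfering
    # tasks, 2 time units of delay per access and per interferer.
    print(wcet_with_interference(100, 10, 3, 2))  # 100 + 10*3*2 = 160
```

With zero interferers the bound reduces to the WCET in isolation, which is the case that interference-free execution models aim for.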

In [73], the authors propose an ILP formulation of the task scheduling and mapping problem for multi-core architectures with caches. They consider different communication times for data exchange between tasks mapped to the same core (e.g. when communication happens through caches) and to two different cores (e.g. with access to shared memory). In [23], the authors present an upper-bound estimation of the WCET, also for a memory-centric architecture similar to ours, by proposing a memory-aware execution to compute the delays due to memory contention. The access to the shared memory is assumed to be realised through Direct Memory Access (DMA) units, while the access delays are derived experimentally for different memory block sizes. These approaches are suitable for architectures with a small number of clusters and a simple inter-cluster interconnection network, e.g. TI Keystone II™ [108]. However, the WCET analysis for multi-processor architectures with multiple clusters interconnected by a NoC, such as the Kalray MPPA-256 [60], requires more detailed modelling and analysis. Several works have been dedicated to WCET analysis for tasks with data dependencies deployed onto a multi-processor architecture. In [44], the authors present an approach to compute the WCET of tasks running on the Kalray MPPA-256 [60] platform by assuming that, when accessing the shared memory within a cluster, the maximum number of interfering tasks is equal to the number of processing elements of this cluster, an assumption that we experimentally observed not to hold in the general case.

In [48], the authors present a comprehensive theory for mixed-criticality scheduling on cluster-based multi-processor architectures with shared resources, developed within the CERTAINTY project. To derive a feasible schedule, the authors estimate the tasks’ worst-case response time (WCRT), which we call WCET¹. The tasks are scheduled with the FTTS mixed-criticality scheduling policy, which repeats over a hyper-cycle divided into frames and sub-frames, the beginning of which is synchronised among all cores of a cluster. Each sub-frame contains only tasks of the same criticality level, which ensures that resource contention may happen only among tasks with the same criticality level. The WCRT for a given criticality level is composed of a WCET and the total delay due to contention on shared resources. Our model considers no criticality levels but has a detailed representation of the communication mechanism over the NoC. Also, in addition to the arbitration delays when accessing shared memory blocks accounted for in [48], we consider the arbitration delays occurring at shared buses. Moreover, we tighten the WCET by excluding non-interfering tasks, i.e. tasks that share a resource but do not overlap in time, which is highly complex for the models used in [48].

¹ The reason for this conflict of terminology is that often the term WCRT refers to a set of tasks
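
The exclusion of non-interfering tasks can be illustrated with a short sketch. It assumes that each task has a known execution window and a set of shared resources it uses; the `Task` type, its fields and the `interferers` function are hypothetical names introduced for this example and do not reproduce our analysis in full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task with a half-open window [start, end)."""
    name: str
    start: int
    end: int
    resources: frozenset   # identifiers of the shared resources used


def interferers(task, others):
    """Tasks that overlap `task` in time AND share a resource with it.

    Tasks that share a resource but never overlap in time are excluded,
    which is what tightens the contention term of the WCET.
    """
    return [
        o for o in others
        if o.name != task.name
        and o.start < task.end and task.start < o.end   # time overlap
        and task.resources & o.resources                # shared resource
    ]


if __name__ == "__main__":
    a = Task("a", 0, 10, frozenset({"mem"}))
    b = Task("b", 5, 15, frozenset({"mem"}))
    c = Task("c", 10, 20, frozenset({"mem"}))   # same resource, no overlap
    d = Task("d", 2, 8, frozenset({"bus"}))     # overlap, other resource
    print([t.name for t in interferers(a, [a, b, c, d])])
```

Only task `b` is counted against `a` here, whereas a bound that assumes one interferer per processing element would also charge `c` and `d`.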
