Chapter 8 Structural Robustness of Distributed Real-Time Systems Towards Uncer-
8.1 Structural Robustness
The robustness of a system is mainly affected by the degree of interactive complexity between tasks in the system. In order to measure the structural robustness of a system, we seek to quantify the degree of interactive complexity between tasks. As we are interested in ensuring that end-to-end timing constraints of tasks are not violated, we specifically study the complexity of temporal interactions within the system. In other words, we are interested in estimating the extent to which task execution times on individual stages affect the worst-case end-to-end delays of tasks.
We consider systems with tasks described as flow paths (a path may consist of just one resource). Each task requires execution at a sequence of resources along its path (each such execution is called a stage execution), and the end-to-end execution must complete within certain pre-specified time constraints. Resources may be complex and represent entire subsystems. Each stage execution of a task consists of the task's execution on one such resource. We assume that the worst-case extent to which the execution of task i on resource j affects the end-to-end delay of any task in the system is known, and we denote it by Xi,j. The worst-case delay Xi,j is a function of the worst-case execution time Ci,j of task i on resource j, and depends on how the resource is scheduled. For instance, in the simplest case, Xi,j may be equal to Ci,j. If the resource serves tasks based on a TDMA schedule, then Xi,j may be larger than Ci,j, as it may take several TDMA cycles before the execution of task i completes on resource j. If parallelism exists in the execution of tasks within the resource, Xi,j may be less than Ci,j. Further, these worst-case delay estimates may be violated at run time. Let the number of resources in the system be N, and the number of tasks be M. Let Qk denote the set of tuples (i, j) such that an infinitesimal increase in the worst-case execution time of task i on resource j would increase the worst-case end-to-end delay of task k (task i can be the same as task k).
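To make the TDMA case concrete, the following is a minimal sketch of one possible worst-case value of Xi,j. It rests on an assumed slot model that the text does not specify: task i owns a single slot of length `slot` in every TDMA cycle of length `cycle`, and in the worst case its stage becomes ready just after that slot has ended. The function name `tdma_worst_case_delay` is hypothetical.

```python
import math


def tdma_worst_case_delay(c: float, slot: float, cycle: float) -> float:
    """Hypothetical worst-case delay X of a stage with execution time c on a
    TDMA resource that grants the task one slot of length `slot` per cycle of
    length `cycle` (an assumed model used only for illustration)."""
    if not (0 < slot <= cycle):
        raise ValueError("need 0 < slot <= cycle")
    if c <= 0:
        return 0.0
    k = math.ceil(c / slot)          # number of slots the stage needs
    wait = cycle - slot              # ready just after its own slot has ended
    # (k - 1) full cycles pass, then the remaining work runs in the k-th slot
    return wait + (k - 1) * cycle + (c - (k - 1) * slot)


if __name__ == "__main__":
    # A stage needing 3 units, with a 1-unit slot every 4 units, finishes
    # 12 units after becoming ready, so X exceeds C in this model.
    print(tdma_worst_case_delay(3.0, 1.0, 4.0))
```

When the task owns the whole cycle (`slot == cycle`), the sketch reduces to X = C, which matches the simplest case mentioned above.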
In order to determine a single structural robustness metric for the entire system, we first estimate the extent of temporal interactions within the system. We do so by taking the effect Xi,j that a particular stage execution of a task i has on the worst-case end-to-end delay of a task k, weighting it by the importance I(k) of task k, and accumulating this product across all tasks i and k. Even if two tasks i and k both execute on a resource, they may not affect each other's worst-case end-to-end delays; for task k, only the stages (i, j) in Qk contribute. To obtain a value between 0 and 1, we normalize the accumulated value by the product of the sum of all Xi,j's and the sum of all I(k)'s. As a larger extent of temporal interaction reflects a lower level of robustness, we compute the structural robustness metric as one minus this normalized value. We formally define structural robustness as follows:
Definition: Given an importance vector I that denotes the relative importance of a task with respect to other tasks in the system, the structural robustness of a particular system’s task flow graph is defined as:
$$\omega = 1 - \frac{\sum_{k \le M} \sum_{(i,j) \in Q_k} X_{i,j}\, I(k)}{\left(\sum_{k \le M} I(k)\right) \left(\sum_{(i,j)} X_{i,j}\right)} \tag{8.1}$$
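Equation 8.1 can be evaluated directly once Xi,j, the sets Qk, and the importance vector I are known. The sketch below assumes these are supplied as plain dictionaries keyed by (task, resource) stages and by task; how Qk is derived from the schedule is outside its scope, and the function name `structural_robustness` is hypothetical.

```python
from typing import Hashable, Mapping, Set, Tuple

Stage = Tuple[Hashable, Hashable]  # (task i, resource j)


def structural_robustness(
    X: Mapping[Stage, float],
    Q: Mapping[Hashable, Set[Stage]],
    I: Mapping[Hashable, float],
) -> float:
    """omega of Equation 8.1.

    X: worst-case delay contribution X_{i,j} of every stage (i, j)
    Q: for each task k, the set Q_k of stages whose growth lengthens k's
       worst-case end-to-end delay (tasks missing from Q contribute nothing)
    I: importance I(k) of every task k
    """
    # Numerator: sum over k and over (i, j) in Q_k of X_{i,j} * I(k)
    numerator = sum(X[s] * I[k] for k, stages in Q.items() for s in stages)
    # Denominator: (sum of all importances) * (sum of all X_{i,j})
    denominator = sum(I.values()) * sum(X.values())
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Two tasks share resource "r"; A has higher priority, so A's stage
    # lengthens both delays while B's stage lengthens only B's own delay.
    X = {("A", "r"): 2.0, ("B", "r"): 3.0}
    Q = {"A": {("A", "r")}, "B": {("A", "r"), ("B", "r")}}
    I = {"A": 1.0, "B": 1.0}
    print(structural_robustness(X, Q, I))  # approximately 0.3
```

In the example, the numerator is 2 + (2 + 3) = 7 and the denominator is 2 × 5 = 10, giving ω = 0.3.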
Let us take a closer look at the structural robustness metric defined in Equation 8.1. Note that the definition of structural robustness is particularly concerned with task executions on individual stages that contribute to the worst-case end-to-end delays of other tasks. Individual stage executions that affect the worst-case end-to-end delays of a larger number of tasks, or that affect more important tasks, are weighted more heavily. For instance, a stage execution of a task A that affects the worst-case end-to-end delay of one other task contributes less towards reducing the structural robustness of the system than a stage execution of a task B that affects several other tasks. Similarly, a stage execution of a task A that affects the worst-case end-to-end delay of another task by X contributes less towards reducing the structural robustness of the system than a stage execution of a task B that affects another task by, say, 10X (in both instances, the temporal interaction due to task B exceeds that due to task A). This explains why the metric accumulates the effect that stage executions have on the end-to-end delays of other tasks.
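Exchanging the order of summation in the numerator of Equation 8.1 makes this weighting explicit, since each stage (i, j) is then counted once, scaled by the total importance of the tasks whose worst-case delays it lengthens:

$$\sum_{k \le M} \sum_{(i,j) \in Q_k} X_{i,j}\, I(k) \;=\; \sum_{(i,j)} X_{i,j} \sum_{k \le M \,:\, (i,j) \in Q_k} I(k)$$

A stage therefore lowers ω more when its Xi,j is larger, when it belongs to more of the sets Qk, or when those tasks are more important. Because each Qk is a subset of all stages, the inner sum never exceeds the total importance, so for non-negative Xi,j and I(k) the metric satisfies 0 ≤ ω ≤ 1.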
The importance vector is specified by the application and reflects whether missing a deadline for one task is more tolerable than missing a deadline for another task. For instance, for tasks that are homogeneous, a simple importance vector assigns equal importance to all tasks (a value of 1 for each entry of the vector). Alternatively, the importance of each task could be set inversely proportional to its deadline. In essence, the value associated with each task reflects its importance to the application's correctness and performance.
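The two importance vectors mentioned above can be sketched as follows. The function names are hypothetical, and the proportionality constant for the deadline-based vector is assumed to be 1; any positive constant cancels between the numerator and denominator of Equation 8.1, so the choice does not change ω.

```python
from typing import Dict, Hashable, Iterable, Mapping


def equal_importance(tasks: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Homogeneous tasks: every task gets importance 1."""
    return {t: 1.0 for t in tasks}


def inverse_deadline_importance(
    deadlines: Mapping[Hashable, float],
) -> Dict[Hashable, float]:
    """Importance inversely proportional to each task's end-to-end deadline
    (constant of proportionality assumed to be 1)."""
    return {t: 1.0 / d for t, d in deadlines.items()}


if __name__ == "__main__":
    print(equal_importance(["T1", "T2"]))
    # A task with a 10-unit deadline is 4x as important as one with 40 units.
    print(inverse_deadline_importance({"T1": 10.0, "T2": 40.0}))
```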
The problem we address in this chapter is to assign tasks to resources so as to maximize the above defined structural robustness metric. Such an optimized system would be less sensitive to unanticipated delays in particular stage executions of tasks and would minimize the number of deadline misses, as it reduces the extent of temporal interactions within the system. In Section 8.2, we define the particular system model we consider in this chapter. We envision that future work will enhance the scope of systems that are optimized for structural robustness to unanticipated delays in stage execution times.
8.2 System Model
We consider a distributed system comprising N different kinds of resources, R1, R2, . . . , RN. Each resource
Ri has ri ≥ 1 identical instances of the resource available within the system. A resource can be anything
that serves tasks in a fixed priority preemptive scheduling order (e.g., processor, communication link). Let Ntot denote the total number of all instances of resources present in the system, and let the instances be
arbitrarily named S1, S2, . . . , SNtot. The system serves M end-to-end soft real-time tasks, T1, T2, . . . , TM,
ordered by decreasing priority. Each task Ti requires execution on a pre-specified sequence of resources and
must complete execution on all resources before a pre-specified end-to-end deadline. The relative priority of each task remains the same across all the resources on which it executes. When multiple instances of a resource are available, any one of the instances can be assigned to serve a task requesting that resource. Each resource instance at which a task executes is referred to as a stage. For ease of exposition, we assume that the union of all task paths forms a Directed Acyclic Graph (DAG). Later, in Section 8.4.2, we show how our technique can be easily extended to handle cycles in the task paths (tasks can revisit resources multiple times). Tasks may be periodic or aperiodic.
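The DAG assumption can be checked mechanically. The following is a minimal sketch, assuming each task path is given as a list of node identifiers (resources or resource instances); it merges all paths into one graph and applies Kahn's topological sort, reporting a cycle whenever some node can never reach in-degree zero. The name `paths_form_dag` is hypothetical.

```python
from collections import defaultdict, deque
from typing import Hashable, Mapping, Sequence


def paths_form_dag(paths: Mapping[Hashable, Sequence[Hashable]]) -> bool:
    """True if the union of all task paths is acyclic (Kahn's algorithm)."""
    succ = defaultdict(set)   # node -> set of successor nodes
    indeg = defaultdict(int)  # node -> number of distinct predecessors
    nodes = set()
    for path in paths.values():
        nodes.update(path)
        for u, v in zip(path, path[1:]):
            if v not in succ[u]:      # count each distinct edge only once
                succ[u].add(v)
                indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Nodes left unvisited lie on (or behind) a cycle.
    return visited == len(nodes)


if __name__ == "__main__":
    print(paths_form_dag({"T1": ["S1", "S2", "S3"], "T2": ["S2", "S3"]}))
    print(paths_form_dag({"T1": ["S1", "S2"], "T2": ["S2", "S1"]}))
```

The first call returns True; the second returns False because T1 and T2 traverse S1 and S2 in opposite orders, which is the kind of cycle deferred to Section 8.4.2.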
Let Ci,j denote the estimated worst-case execution time of an invocation of Ti on a resource instance j; it is the same regardless of which instance of the resource is assigned to serve the task. Each resource group has only one resource, and hence the extent of the delay that the execution of a task Ti on resource j can cause another task, Xi,j, is the same as Ci,j. Although we have an estimate of the worst-case execution time, we allow for actual execution times to exceed it due to unanticipated delays.
Given such a system, the objective is to assign tasks to resource instances as per their resource requirements, so as to minimize the number of deadline misses within the system in the presence of unanticipated delays in execution times. The algorithm presented in this work pursues this objective by reducing the sensitivity of the end-to-end timing behavior of the system's task flow graph to specific execution times, thereby limiting the extent to which spikes in execution times propagate to the worst-case end-to-end delays.
An assignment of tasks to the instances of the resources they request is termed a configuration. The sequence of stages followed by a task Ti in a configuration C is denoted by Path_i^C. Let Ci,max denote the maximum computation time of Ti across all the stages on which it executes, and let Nodej,max denote the maximum computation time over all tasks that execute on a resource instance j.
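Both maxima follow from a single pass over a configuration. The sketch below assumes a configuration is represented as a dictionary mapping each task to its sequence of resource instances (its path in that configuration), with Ci,j stored in a dictionary keyed by (task, instance); the function name `config_stats` is hypothetical.

```python
from typing import Dict, Hashable, Mapping, Sequence, Tuple


def config_stats(
    config: Mapping[Hashable, Sequence[Hashable]],
    C: Mapping[Tuple[Hashable, Hashable], float],
) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """Return (c_max, node_max) for one configuration.

    c_max[i]    = maximum C_{i,j} over the stages j on task i's path
    node_max[j] = maximum C_{i,j} over the tasks i that execute on instance j
    """
    c_max: Dict[Hashable, float] = {}
    node_max: Dict[Hashable, float] = {}
    for i, path in config.items():
        for j in path:
            c = C[(i, j)]
            c_max[i] = max(c_max.get(i, c), c)
            node_max[j] = max(node_max.get(j, c), c)
    return c_max, node_max


if __name__ == "__main__":
    config = {"T1": ["S1", "S2"], "T2": ["S2", "S3"]}
    C = {("T1", "S1"): 2, ("T1", "S2"): 5, ("T2", "S2"): 3, ("T2", "S3"): 1}
    print(config_stats(config, C))
```

Here T1's largest stage is on S2 (5 units), and S2 also holds the largest computation time of any instance, since both T1 and T2 execute there.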
Note that a task Ti can delay Tk only along execution stages it shares with Tk. We define a task segment T_i^x (the segments are indexed by x) as Ti's execution on a sequence of consecutive resource instances along its path that is also traversed by Tk, either in the same order or in exactly the reverse order. Let C_{i,max}^x be the maximum computation time of Ti across all stages in segment T_i^x, and let resource instance j be the stage corresponding to this maximum computation time, also referred to as the max-stage of the segment. We ignore the precedence constraints between different segments of each higher-priority task Ti, and consider each segment as an independent task. As explained in Chapter 4, this does not decrease the computed worst-case end-to-end delay of Tk, as we only remove certain precedence constraints, thereby increasing the set of possible arrival patterns of tasks to stages. Thus, our delay bound estimate errs on the safe side.
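One possible reading of the segment definition is sketched below: walk Ti's path and grow a segment while each next instance is adjacent to the previous one on Tk's path, keeping the same direction (forward or reverse) throughout; a stage not shared with Tk, or a break in adjacency or direction, closes the segment. The sketch assumes Tk visits each instance at most once and stores Ci,j in a dictionary keyed by (task, instance); the names `shared_segments` and `segment_max_stages` are hypothetical.

```python
from typing import Hashable, List, Mapping, Sequence, Tuple


def shared_segments(
    path_i: Sequence[Hashable], path_k: Sequence[Hashable]
) -> List[List[Hashable]]:
    """Maximal runs of Ti's path traversed by Tk in the same or reverse order."""
    pos_k = {node: idx for idx, node in enumerate(path_k)}  # Tk visits each once
    segs: List[List[Hashable]] = []
    current: List[Hashable] = []
    direction = 0  # +1: same order as Tk, -1: reverse order, 0: not yet known
    for node in path_i:
        if node not in pos_k:          # stage not shared with Tk ends a segment
            if current:
                segs.append(current)
            current, direction = [], 0
            continue
        if current:
            step = pos_k[node] - pos_k[current[-1]]
            if step in (1, -1) and direction in (0, step):
                current.append(node)   # still adjacent on Tk's path, same direction
                direction = step
                continue
            segs.append(current)       # adjacency or direction broken
        current, direction = [node], 0
    if current:
        segs.append(current)
    return segs


def segment_max_stages(
    i: Hashable,
    path_i: Sequence[Hashable],
    path_k: Sequence[Hashable],
    C: Mapping[Tuple[Hashable, Hashable], float],
) -> List[Tuple[List[Hashable], Hashable, float]]:
    """For each segment T_i^x, return (segment, max-stage j, C_{i,max}^x)."""
    result = []
    for seg in shared_segments(path_i, path_k):
        j_max = max(seg, key=lambda j: C[(i, j)])  # first maximum wins on ties
        result.append((seg, j_max, C[(i, j_max)]))
    return result


if __name__ == "__main__":
    path_i = ["S1", "S2", "S3", "S5", "S4"]
    path_k = ["S0", "S1", "S2", "S3", "S4", "S5"]
    C = {("T1", "S1"): 2, ("T1", "S2"): 4, ("T1", "S3"): 1,
         ("T1", "S5"): 3, ("T1", "S4"): 3}
    print(segment_max_stages("T1", path_i, path_k, C))
```

In the example, T1 shares S1, S2, S3 with Tk in the same order and S5, S4 in reverse order, yielding two segments whose max-stages are S2 and S5 (the tie on the second segment is broken by position).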