4.3 Implementation
4.3.3 Run-time overhead
We used the feather-trace tool to measure the overheads, as in earlier LITMUS^RT-based studies (e.g., [23, 24]). Since the tool uses the timestamp counter to track the start and finish times of an event in cycles, we first validated that the timestamp counter on our board runs at a constant rate (necessary for precise conversion from cycles to nanoseconds). Because the timestamp counters on the individual cores of the board are not synchronized with one another, we also modified the tool to use the system-wide monotonically increasing timer (in nanoseconds) to trace the Inter-Processor Interrupt (IPI) delay. We randomly generated periodic tasksets with sizes ranging from 50 to 450 tasks, in steps of 50. We generated 10 tasksets per taskset size (i.e., 90 tasksets in total) under each scheduler. Under each scheduler, we traced each taskset for 30 seconds and measured all six types of overhead: release overhead, release latency, scheduling overhead, context switch overhead, IPI delay, and tick overhead (as defined in [5]). We removed the outliers using the method in [23] and computed the worst-case and average-case overheads.
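The cycle-to-nanosecond conversion and the final summary step can be sketched as follows. This is only an illustration under stated assumptions: `cycles_to_ns`, `summarize`, and `tsc_hz` are hypothetical names, the counter frequency is assumed to be constant (as validated above), and the outlier filter of [23] is not reproduced, so the samples are assumed to be already filtered.

```python
def cycles_to_ns(cycles, tsc_hz):
    """Convert a cycle count to nanoseconds, assuming a constant-rate counter."""
    return cycles * 1_000_000_000 / tsc_hz


def summarize(samples_ns):
    """Return (average, worst case) of already-filtered overhead samples."""
    if not samples_ns:
        raise ValueError("no samples")
    return sum(samples_ns) / len(samples_ns), max(samples_ns)


# Hypothetical event: start/finish timestamps in cycles on a 1.2 GHz counter.
start, finish = 10_000, 12_400
latency_ns = cycles_to_ns(finish - start, tsc_hz=1_200_000_000)

# Hypothetical sample set; real samples would come from the trace files.
avg_ns, worst_ns = summarize([latency_ns, 1500.0, 2500.0])
```

The conversion is only meaningful if the counter frequency does not change during the trace, which is why the constant-rate check precedes the measurements.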
                  Taskset size: 50              Taskset size: 450
Overhead     gEDF     gFPca    nFPca       gEDF     gFPca    nFPca
Release      5.72     5.86     4.74        7.73     23.92    5.45
Sched        8.64     7.75     7.57        11.88    20.07    15.25
CXS          4.23     138.72   142.46      7.31     159.84   162.93
IPI          4.06     3.64     4.12        3.92     3.84     4.03

Table 4.1: Average overhead (µs) under different schedulers with cache-read workload (Release: release overhead; Sched: scheduling overhead; CXS: context switch overhead; IPI: IPI delay).
Table 4.1 shows the average overheads for taskset sizes of 50 and 450 under the gFPca and nFPca schedulers, as well as under the existing gEDF scheduler in LITMUS^RT for comparison. The results show that the release, scheduling, and IPI delay overheads of the gFPca and nFPca schedulers are similar to those of gEDF. However, gFPca and nFPca have a larger context switch overhead than gEDF, which is expected because they may need to flush cache partitions during a context switch, as explained in the description of the implementation. The gFPca scheduler incurs higher worst-case overheads than the gEDF scheduler, which is not surprising because the gFPca scheduling algorithm has a higher complexity than gEDF. All measured overhead values can be found in [71].
In the coming sections, we present the schedulability analysis of gFPca, beginning with the overhead-free case. Because the analysis of the cache-related preemption and migration delay (CRPMD) overhead is the most challenging, we focus on the analysis of the CRPMD overhead in the main text and present the extension to the remaining types of overhead in Section 4.5.7. Note that our evaluation considered all of these overheads.
4.4 Overhead-free analysis
The overhead-free schedulability analysis of gFPca can be established using an idea similar to that used for nFPca [32]. As usual, the processor demand of a task $\tau_i$ in any interval $[a, b]$ is the amount of processing time required by $\tau_i$ in $[a, b]$ that has to complete at or before $b$. When task $\tau_i$ is scheduled under gFPca, $\tau_i$ has the maximum amount of computation in a period of another task $\tau_k$ when the first job of $\tau_i$ starts executing at the release time of $\tau_k$ and the following jobs of $\tau_i$ execute as early as possible, as illustrated in Fig. 4.2. Hence, the worst-case demand of $\tau_i$ in a period of $\tau_k$ is given by [20]:
\[
W_i^k = NJ_i^k \cdot e_i + \min\{\, d_k + d_i - e_i - NJ_i^k \cdot p_i,\; e_i \,\}, \tag{4.1}
\]
where $NJ_i^k = \left\lfloor \frac{d_k + d_i - e_i}{p_i} \right\rfloor$ is the maximum number of jobs of $\tau_i$ whose executions fall entirely within a period of $\tau_k$.
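As a quick numerical check of Eq. (4.1), the following minimal sketch evaluates the worst-case demand for one pair of tasks. The function name and the example parameters are hypothetical, and integer time units are assumed so that floor division matches the floor in $NJ_i^k$.

```python
def worst_case_demand(p_i, e_i, d_i, d_k):
    """Worst-case demand W_i^k of task i in a period of task k (Eq. 4.1)."""
    window = d_k + d_i - e_i
    nj = window // p_i                   # NJ_i^k: jobs lying entirely in the window
    carry = min(window - nj * p_i, e_i)  # contribution of one additional, partial job
    return nj * e_i + carry


# Hypothetical tasks: tau_i has p_i = 10, e_i = 3, d_i = 10; tau_k has d_k = 25.
w = worst_case_demand(p_i=10, e_i=3, d_i=10, d_k=25)
```

Here the window has length 32, so three complete jobs contribute 9 time units and the remaining 2 time units are bounded by $e_i = 3$, giving a demand of 11.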
Figure 4.2: Scenario for the worst-case demand of $\tau_i$ in a period of $\tau_k$.
The length of $\tau_k$'s busy interval, denoted by $B_k$, is the total length of all subintervals in a period of $\tau_k$ during which it cannot execute. The busy interval of $\tau_k$ can be divided into two categories: (1) the CPU-busy interval, during which all cores are busy executing other higher-priority tasks; and (2) the cache-busy interval, during which at least one core is available (i.e., idle or executing a lower-priority task) and at least $A - A_k + 1$ cache partitions are assigned to $\tau_k$'s higher-priority tasks.
Consequently, the workload of $\tau_i$ in a period of $\tau_k$ consists of two types: (1) the CPU-interference workload, $\alpha_i^k$, which is the workload of $\tau_i$ when it executes in the CPU-busy interval of $\tau_k$; and (2) the cache-interference workload, $\beta_i^k$, which is the workload of $\tau_i$ when it executes in the cache-busy interval of $\tau_k$. Since $\tau_k$ cannot execute when its higher-priority tasks collaboratively keep the CPU busy, and because the system has $M$ cores, the length of the CPU-busy interval of $\tau_k$ is bounded by $\frac{1}{M} \sum_{i<k} \alpha_i^k$. Because each higher-priority task executes $\beta_i^k$ time units with $A_i$ cache partitions occupied, and because higher-priority tasks only need to occupy $A - A_k + 1$ cache partitions to prevent $\tau_k$ from executing, the combined cache resources (i.e., the number of partitions occupied in an interval multiplied by the interval length) that need to be used by all other tasks to block $\tau_k$ from executing during $\tau_k$'s cache-busy interval are bounded above by $\sum_{i<k} \min\{A_i, A - A_k + 1\}\, \beta_i^k$. Therefore, the length of the cache-busy interval of $\tau_k$ is bounded above by $\sum_{i<k} \frac{\min\{A_i, A - A_k + 1\}}{A - A_k + 1}\, \beta_i^k$. Since the length of the busy interval of $\tau_k$ is no more than the sum of the length of the CPU-busy interval and the length of the cache-busy interval, it is bounded above by:
\[
\sum_{i<k} \left( \frac{1}{M}\, \alpha_i^k + \frac{\min\{A_i, A - A_k + 1\}}{A - A_k + 1}\, \beta_i^k \right).
\]
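To make the two terms of this bound concrete, the sketch below evaluates them for given interference workloads. It is an illustration only: the function and parameter names are hypothetical, the workloads and partition counts are made-up values, and exact fractions are used merely to avoid rounding.

```python
from fractions import Fraction


def busy_interval_terms(alpha, beta, A_hp, A_k, A_total, M):
    """Return (CPU-busy bound, cache-busy bound, total bound) for task k.

    alpha[i], beta[i]: CPU-/cache-interference workload of higher-priority task i
    A_hp[i]: number of cache partitions of higher-priority task i
    A_k, A_total: partitions required by task k and partitions in the system
    M: number of cores
    """
    cap = A_total - A_k + 1  # partitions that must be occupied to block task k
    cpu = sum(Fraction(a) for a in alpha) / M
    cache = sum(Fraction(min(A_i, cap), cap) * b for A_i, b in zip(A_hp, beta))
    return cpu, cache, cpu + cache


# Hypothetical values: 2 cores, 4 partitions, task k needs 3 partitions.
cpu, cache, total = busy_interval_terms(
    alpha=[2, 4], beta=[1, 3], A_hp=[1, 2], A_k=3, A_total=4, M=2
)
```

With these values the CPU-busy term is $(2 + 4)/2 = 3$, the cache-busy term is $\frac{1}{2} \cdot 1 + 1 \cdot 3 = 3.5$, and the total bound is 6.5.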
Further, in each period of $\tau_k$, the CPU/cache-interference workload of a higher-priority task $\tau_i$ must satisfy the following constraints: (1) the combination of the CPU-interference workload and the cache-interference workload of $\tau_i$ cannot exceed the workload of $\tau_i$, i.e., $\alpha_i^k + \beta_i^k \le W_i^k$; and (2) the CPU/cache-interference workload of each $\tau_i$ should be no more than the length of the CPU/cache-busy interval of $\tau_k$, i.e., $\alpha_i^k \le \sum_{i<k} \frac{1}{M}\, \alpha_i^k$ and $\beta_i^k \le \sum_{i<k} \frac{\min\{A_i, A - A_k + 1\}}{A - A_k + 1}\, \beta_i^k$.
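Based on the above discussion, we obtain the following:
Lemma 4.1. The length of the busy interval of $\tau_k$ satisfies $B_k \le \hat{B}_k$, where $\hat{B}_k$ is the optimal solution of the following Linear Programming (LP) problem:
\[
\begin{aligned}
\text{maximize} \quad & \sum_{i<k} \left( \frac{1}{M}\, \alpha_i^k + \frac{\min\{A_i, A - A_k + 1\}}{A - A_k + 1}\, \beta_i^k \right) \\
\text{subject to} \quad & \alpha_i^k + \beta_i^k \le W_i^k, \quad \forall i < k \\
& \alpha_i^k \le \sum_{i<k} \frac{1}{M}\, \alpha_i^k \\
& \beta_i^k \le \sum_{i<k} \frac{\min\{A_i, A - A_k + 1\}}{A - A_k + 1}\, \beta_i^k
\end{aligned}
\]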
Proof. The lemma holds by construction as discussed above.
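The LP above is small enough to solve directly. The following sketch is not the implementation used in this work: it substitutes a textbook tableau simplex with Bland's rule over exact fractions for a production LP solver, all names are hypothetical, and it assumes that the workloads are non-negative and that partition counts are integers.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming every b[r] >= 0.

    Dense tableau simplex over exact fractions; Bland's smallest-index
    rule is used for pivoting, which rules out cycling.
    """
    m, n = len(A), len(c)
    # Row r: [A[r] | slack identity | b[r]]; the last row holds reduced costs.
    T = [[Fraction(x) for x in A[r]]
         + [Fraction(int(r == s)) for s in range(m)]
         + [Fraction(b[r])] for r in range(m)]
    T.append([-Fraction(x) for x in c] + [Fraction(0)] * (m + 1))
    basis = list(range(n, n + m))  # the slack variables start in the basis
    while True:
        col = next((j for j in range(n + m) if T[m][j] < 0), None)
        if col is None:
            return T[m][-1]  # no improving column left: optimal value
        rows = [r for r in range(m) if T[r][col] > 0]
        if not rows:
            raise ValueError("LP is unbounded")
        row = min(rows, key=lambda r: (T[r][-1] / T[r][col], basis[r]))
        piv = T[row][col]
        T[row] = [x / piv for x in T[row]]
        for r in range(m + 1):
            if r != row and T[r][col] != 0:
                f = T[r][col]
                T[r] = [x - f * y for x, y in zip(T[r], T[row])]
        basis[row] = col


def busy_interval_bound(W, A_hp, A_k, A_total, M):
    """LP bound on the busy interval of task k from its higher-priority tasks."""
    n = len(W)
    cap = A_total - A_k + 1
    w = [Fraction(min(a, cap), cap) for a in A_hp]  # cache weights
    inv_m = Fraction(1, M)
    c = [inv_m] * n + w  # variable order: alpha_0..alpha_{n-1}, beta_0..beta_{n-1}
    rows, rhs = [], []
    for i in range(n):
        demand = [0] * (2 * n)          # alpha_i + beta_i <= W_i
        demand[i] = demand[n + i] = 1
        cpu = [-inv_m] * n + [0] * n    # alpha_i - (1/M) sum_j alpha_j <= 0
        cpu[i] += 1
        cache = [0] * n + [-x for x in w]  # beta_i - sum_j w_j beta_j <= 0
        cache[n + i] += 1
        rows += [demand, cpu, cache]
        rhs += [W[i], 0, 0]
    return simplex_max(c, rows, rhs)


# Hypothetical instance: 2 cores, 4 partitions, task k needs 3 of them.
b_hat = busy_interval_bound(W=[3, 5], A_hp=[2, 3], A_k=3, A_total=4, M=2)
```

In this instance each higher-priority task alone holds enough partitions to block $\tau_k$, so the optimum assigns their whole demand to cache interference and the bound equals $3 + 5 = 8$. With a single higher-priority task holding one partition on two cores, both self-referential constraints force the workloads to zero and the bound is 0.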
The next theorem follows as a result of Lemma 4.1.
Theorem 4.2. A taskset $\tau$ is schedulable under the gFPca algorithm if each task $\tau_k$ in $\tau$ satisfies $\hat{B}_k \le d_k - e_k$.
Proof. Suppose $\tau$ is unschedulable. Then there exists a task $\tau_k$ that is unschedulable, which implies that the length of its busy interval ($B_k$) is larger than the length of its slack interval, i.e., the maximum waiting or blocking time that $\tau_k$ can accommodate before missing its deadline, which is given by $d_k - e_k$. In addition, we can easily show that the CPU-interference workload $\alpha_i^k$ and the cache-interference workload $\beta_i^k$ of each higher-priority task $\tau_i$ within a period of $\tau_k$ satisfy the constraints of the above LP formulation; therefore, the maximum length of the busy interval computed by the LP (i.e., $\hat{B}_k$) is no less than $B_k$. In other words, $\hat{B}_k \ge B_k > d_k - e_k$. The theorem follows by contraposition.
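The test of Theorem 4.2 reduces to one comparison per task once the LP bounds are known. The sketch below assumes that the bounds have already been computed by some LP solver; the function name and the example values are hypothetical.

```python
def gfpca_schedulable(tasks, b_hat):
    """Sufficient test of Theorem 4.2.

    tasks: list of (d_k, e_k) pairs, one per task
    b_hat: list of LP bounds on the busy-interval length, in the same order
    """
    return all(b <= d - e for (d, e), b in zip(tasks, b_hat))


# Hypothetical taskset with deadlines and execution times (d_k, e_k).
tasks = [(10, 3), (25, 6)]
ok = gfpca_schedulable(tasks, b_hat=[0, 8])        # 0 <= 7 and 8 <= 19
not_ok = gfpca_schedulable(tasks, b_hat=[0, 20])   # 20 > 19
```

Because the condition is only sufficient, a negative result means that the taskset is not proven schedulable by this test, not that it necessarily misses a deadline.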
Theorem 4.3. Consider a taskset $\tilde{\tau} = \{\tilde{\tau}_1, \ldots, \tilde{\tau}_n\}$, where $\tilde{\tau}_i = (p_i, \tilde{e}_i, d_i, A_i)$ for all $1 \le i \le n$. Let $\tau = \{\tau_1, \ldots, \tau_n\}$ be any taskset with $\tau_i = (p_i, e_i, d_i, A_i)$ and $e_i \le \tilde{e}_i$ for all $1 \le i \le n$. Then $\tau$ is schedulable under the gFPca algorithm if $\tilde{\tau}$ satisfies the gFPca schedulability conditions given by Theorem 4.2.
Proof. We will show that if $\tau$ is unschedulable under gFPca, then $\tilde{\tau}$ will be deemed unschedulable by Theorem 4.2. Indeed, if $\tau$ is unschedulable under gFPca, then there exists a task $\tau_k \in \tau$ that misses its deadline. Let $B_k$ be the maximum length of the busy interval of $\tau_k$. Then $\hat{B}_k \ge B_k$ due to Lemma 4.1. Since $\tau_k$ misses its deadline, $B_k > d_k - e_k$. Combining this with $\tilde{e}_i \ge e_i$ and $\hat{B}_k \ge B_k$, we obtain $\hat{B}_k \ge B_k > d_k - e_k \ge d_k - \tilde{e}_k$, i.e., $\hat{B}_k > d_k - \tilde{e}_k$. Thus, the taskset $\tilde{\tau}$ is deemed unschedulable by Theorem 4.2.