Run-time overhead - Cache-Aware Real-Time Virtualization

4.3 Implementation

4.3.3 Run-time overhead

We used the feather-trace tool to measure the overheads, as in earlier LITMUS^RT-based studies (e.g., [23, 24]). Since the tool uses the timestamp counter to track the start and finish times of an event in cycles, we first validated that the timestamp counter on our board runs at a constant speed (necessary for precise conversion from cycles to nanoseconds). Since the timestamp counters on the different cores of the board are not synchronized, we also modified the tool to use the system-wide monotonically increasing timer (in nanoseconds) to trace the Inter-Processor Interrupt (IPI) delay.

We randomly generated periodic tasksets with sizes ranging from 50 to 450 tasks, in steps of 50. We generated 10 tasksets per taskset size (i.e., 90 tasksets in total) under each scheduler. Under each scheduler, we traced each taskset for 30 seconds and measured all six types of overhead: release overhead, release latency, scheduling overhead, context switch overhead, IPI delay, and tick overhead (as defined in [5]). We removed the outliers using the method in [23] and computed the worst-case and average-case overheads.
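The conversion and aggregation steps of this methodology can be sketched as follows. This is only a minimal illustration and not the feather-trace tooling itself: the function names, the sample values, and the 1 GHz counter frequency are hypothetical, and the outlier filtering of [23] is left out because its details are not given in this section.

```python
def cycles_to_us(cycles, counter_hz):
    """Convert a cycle count to microseconds (assumes a constant-rate counter)."""
    return cycles * 1_000_000 / counter_hz


def summarize_overhead(samples_cycles, counter_hz):
    """Return (average, worst-case) overhead in microseconds.

    samples_cycles holds per-event durations (finish - start) in cycles.
    Outlier removal (the method of [23]) is intentionally NOT applied here.
    """
    if not samples_cycles:
        raise ValueError("need at least one sample")
    samples_us = [cycles_to_us(c, counter_hz) for c in samples_cycles]
    return sum(samples_us) / len(samples_us), max(samples_us)


# Experiment grid from the text: sizes 50, 100, ..., 450 with 10 tasksets each.
taskset_sizes = list(range(50, 451, 50))
total_tasksets = len(taskset_sizes) * 10

# Hypothetical data: four overhead samples taken with a 1 GHz counter.
avg_us, worst_us = summarize_overhead([8000, 9000, 7000, 12000], 1_000_000_000)
```

A constant counter frequency is what makes the single division in `cycles_to_us` valid, which is why the counter speed had to be validated before measuring.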

                  Taskset size: 50             Taskset size: 450
Overhead          gEDF    gFPca    nFPca       gEDF     gFPca    nFPca
Release           5.72    5.86     4.74        7.73     23.92    5.45
Scheduling        8.64    7.75     7.57        11.88    20.07    15.25
Context switch    4.23    138.72   142.46      7.31     159.84   162.93
IPI delay         4.06    3.64     4.12        3.92     3.84     4.03

Table 4.1: Average overhead (µs) under different schedulers with cache-read workload.

Table 4.1 shows the average overheads for taskset sizes of 50 and 450 under the gFPca and nFPca schedulers, as well as under the existing gEDF scheduler in LITMUS^RT for comparison. The results show that the release, scheduling, and IPI delay overheads of the gFPca and nFPca schedulers are similar to those of gEDF. However, gFPca and nFPca have a larger context switch overhead than gEDF, which is expected because they may need to flush cache partitions during a context switch, as explained in the description of the implementation. The gFPca scheduler incurs higher worst-case overheads than the gEDF scheduler, which is not surprising because the gFPca scheduling algorithm has a higher complexity than gEDF. All measured overhead values can be found in [71].

In the coming sections, we present the schedulability analysis of gFPca, first without overhead (Section 4.4) and then with overhead. Since the analysis of the cache-related preemption and migration delay (CRPMD) overhead is the most challenging, we focus on the analysis of the CRPMD overhead in the main text and present the extension to the remaining types of overhead in Section 4.5.7. Note that our evaluation considered all these overheads.

4.4 Overhead-free analysis

The overhead-free schedulability analysis of gFPca can be established using a similar idea as that of nFPca [32]. As usual, the processor demand of a task $\tau_i$ in any interval $[a, b]$ is the amount of processing time required by $\tau_i$ in $[a, b]$ that has to complete at or before $b$. When task $\tau_i$ is scheduled under gFPca, $\tau_i$ has the maximum amount of computation in a period of another task $\tau_k$ when the first job of $\tau_i$ starts executing at the release time of $\tau_k$ and the following jobs of $\tau_i$ execute as early as possible, as illustrated in Fig. 4.2. Hence, the worst-case demand of $\tau_i$ in a period of $\tau_k$ is given by [20]:

$$W_i^k = NJ_i^k \cdot e_i + \min\{d_k + d_i - e_i - NJ_i^k \cdot p_i,\; e_i\}, \tag{4.1}$$

where $NJ_i^k = \left\lfloor \frac{d_k + d_i - e_i}{p_i} \right\rfloor$ is the maximum number of jobs of $\tau_i$ that have their entire executions falling within a period of $\tau_k$.
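As a quick check of Eq. (4.1), the bound can be evaluated directly. The sketch below is a minimal illustration that assumes integer time units (e.g., microseconds); the function names and the example parameters are hypothetical and do not come from the evaluation.

```python
def max_full_jobs(p_i, e_i, d_i, d_k):
    """NJ_i^k = floor((d_k + d_i - e_i) / p_i)."""
    return (d_k + d_i - e_i) // p_i


def worst_case_demand(p_i, e_i, d_i, d_k):
    """W_i^k from Eq. (4.1): full jobs plus one partially included job."""
    n_jobs = max_full_jobs(p_i, e_i, d_i, d_k)
    # The remaining window can hold at most e_i units of one more job.
    partial = min(d_k + d_i - e_i - n_jobs * p_i, e_i)
    return n_jobs * e_i + partial


# Hypothetical task tau_i with p_i = 10, e_i = 3, d_i = 10, and d_k = 25:
# NJ = floor(32 / 10) = 3 and the partial term is min(32 - 30, 3) = 2.
w = worst_case_demand(p_i=10, e_i=3, d_i=10, d_k=25)  # 3 * 3 + 2 = 11
```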

Figure 4.2: Worst-case demand scenario of $\tau_i$ in a period of $\tau_k$.

The length of $\tau_k$'s busy interval, denoted by $B_k$, is the total length of all subintervals in a period of $\tau_k$ during which it cannot execute. The busy interval of $\tau_k$ can be grouped into two categories: (1) CPU-busy interval, during which all cores are busy executing other higher-priority tasks; and (2) cache-busy interval, during which at least one core is available (i.e., idle or executing a lower-priority task) and at least $A - A_k + 1$ cache partitions are assigned to $\tau_k$'s higher-priority tasks.
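The two categories can be made concrete with a small classifier for a single time instant. This is a simplified sketch under our own assumptions: the function name, its parameters, and the string labels are hypothetical, and the instant is assumed to be one at which the task has a pending job.

```python
def classify_instant(hp_running, hp_partitions, M, A, A_k):
    """Classify one instant from the point of view of task tau_k.

    hp_running:    number of cores executing higher-priority tasks
    hp_partitions: cache partitions assigned to higher-priority tasks
    M, A:          number of cores and of cache partitions in the system
    A_k:           cache partitions required by tau_k
    """
    if hp_running >= M:
        return "cpu-busy"    # no core is available for tau_k
    if hp_partitions >= A - A_k + 1:
        return "cache-busy"  # fewer than A_k partitions remain for tau_k
    return "not-busy"


# Hypothetical platform: M = 4 cores, A = 16 partitions, A_k = 6,
# so the cache-busy threshold is A - A_k + 1 = 11 partitions.
labels = [
    classify_instant(4, 8, 4, 16, 6),
    classify_instant(2, 12, 4, 16, 6),
    classify_instant(2, 10, 4, 16, 6),
]
```

The threshold $A - A_k + 1$ is the smallest number of occupied partitions that leaves fewer than $A_k$ partitions free.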

Consequently, the workload of $\tau_i$ in a period of $\tau_k$ consists of two types: (1) CPU-interference workload, $\alpha_i^k$, which is the workload of $\tau_i$ when it executes in the CPU-busy interval of $\tau_k$; and (2) cache-interference workload, $\beta_i^k$, which is the workload of $\tau_i$ when it executes in the cache-busy interval of $\tau_k$. Since $\tau_k$ cannot execute when its higher-priority tasks collaboratively keep the CPU busy, and because the system has $M$ cores, the length of the CPU-busy interval of $\tau_k$ is bounded by $\frac{1}{M}\sum_{i<k} \alpha_i^k$. Because each higher-priority task executes $\beta_i^k$ time units with $A_i$ cache partitions occupied, and because higher-priority tasks only need to occupy $A - A_k + 1$ cache partitions to prevent $\tau_k$ from executing, the combined cache resources (i.e., the number of partitions occupied in an interval multiplied by the interval length) that need to be used by all other tasks to block $\tau_k$ from execution during $\tau_k$'s cache-busy interval are bounded above by $\sum_{i<k} \min\{A_i, A - A_k + 1\}\,\beta_i^k$. Therefore, the length of the cache-busy interval of $\tau_k$ is bounded above by $\sum_{i<k} \frac{\min\{A_i, A - A_k + 1\}}{A - A_k + 1}\,\beta_i^k$. Since the length of the busy interval of $\tau_k$ is no more than the sum of the length of the CPU-busy interval and the length of the cache-busy interval, it is bounded above by:

$$\sum_{i<k} \left( \frac{1}{M}\,\alpha_i^k + \frac{\min\{A_i, A - A_k + 1\}}{A - A_k + 1}\,\beta_i^k \right).$$
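For given interference workloads, the bound on the busy-interval length is a weighted sum that can be evaluated as below. The sketch is illustrative only: the function name, the tuple layout, and the numbers are hypothetical.

```python
def busy_interval_bound(M, A, A_k, hp_tasks):
    """Upper bound on the busy-interval length of tau_k.

    hp_tasks: iterable of (A_i, alpha_i, beta_i), one entry per
    higher-priority task, where alpha_i and beta_i are its CPU- and
    cache-interference workloads in a period of tau_k.
    """
    threshold = A - A_k + 1  # partitions needed to block tau_k
    total = 0.0
    for A_i, alpha_i, beta_i in hp_tasks:
        cpu_part = alpha_i / M
        cache_part = min(A_i, threshold) / threshold * beta_i
        total += cpu_part + cache_part
    return total


# Hypothetical example: M = 2 cores, A = 8 partitions, A_k = 3 (threshold 6).
# Task 1: 4/2 + (2/6)*3 = 3.0; task 2: 2/2 + (6/6)*1.5 = 2.5.
bound = busy_interval_bound(2, 8, 3, [(2, 4, 3), (8, 2, 1.5)])
```

A task that holds fewer than $A - A_k + 1$ partitions contributes only a fraction of its cache-interference workload, since it cannot block $\tau_k$ on its own.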

Further, in each period of $\tau_k$, the CPU/cache-interference workload of a higher-priority task $\tau_i$ must satisfy the following constraints: (1) the combination of the CPU-interference workload and cache-interference workload of $\tau_i$ cannot exceed the workload of $\tau_i$, i.e., $\alpha_i^k + \beta_i^k \le W_i^k$; and (2) the CPU/cache-interference workload of each $\tau_i$ should be no more than the length of the CPU/cache-busy interval of $\tau_k$, i.e., $\alpha_i^k \le \frac{1}{M}\sum_{j<k} \alpha_j^k$ and $\beta_i^k \le \sum_{j<k} \frac{\min\{A_j, A - A_k + 1\}}{A - A_k + 1}\,\beta_j^k$.

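Based on the above discussion, we obtain the following:

Lemma 4.1. The length of the busy interval of $\tau_k$ is bounded above by $\hat{B}_k$, i.e., $B_k \le \hat{B}_k$, where $\hat{B}_k$ is the optimal solution of the following Linear Programming (LP) problem:

$$
\begin{aligned}
\text{maximize}\quad & \sum_{i<k} \left( \frac{1}{M}\,\alpha_i^k + \frac{\min\{A_i, A - A_k + 1\}}{A - A_k + 1}\,\beta_i^k \right) \\
\text{subject to}\quad & \alpha_i^k + \beta_i^k \le W_i^k, && \forall i < k \\
& \alpha_i^k \le \frac{1}{M}\sum_{j<k} \alpha_j^k, && \forall i < k \\
& \beta_i^k \le \sum_{j<k} \frac{\min\{A_j, A - A_k + 1\}}{A - A_k + 1}\,\beta_j^k, && \forall i < k
\end{aligned}
$$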

Proof. The lemma holds by construction as discussed above.
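The LP of the lemma can be assembled mechanically from the task parameters. The following is a sketch under stated assumptions rather than the analysis tool used in this work: the function names and the example values are hypothetical, the interference workloads are assumed to be non-negative, and the solver step assumes that SciPy is installed (it is imported lazily so that the matrix construction runs without it).

```python
def build_busy_interval_lp(M, A, A_k, hp_tasks):
    """Return (c, A_ub, b_ub) for: maximize c.x s.t. A_ub.x <= b_ub, x >= 0.

    hp_tasks: list of (A_i, W_i), one entry per higher-priority task.
    Variable order: x = [alpha_1..alpha_n, beta_1..beta_n].
    """
    n = len(hp_tasks)
    threshold = A - A_k + 1
    w_cache = [min(A_i, threshold) / threshold for A_i, _ in hp_tasks]
    c = [1.0 / M] * n + w_cache
    A_ub, b_ub = [], []
    for i, (_, W_i) in enumerate(hp_tasks):
        # alpha_i + beta_i <= W_i
        row = [0.0] * (2 * n)
        row[i] = 1.0
        row[n + i] = 1.0
        A_ub.append(row)
        b_ub.append(float(W_i))
        # alpha_i - (1/M) * sum_j alpha_j <= 0
        row = [-1.0 / M] * n + [0.0] * n
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append(0.0)
        # beta_i - sum_j w_j * beta_j <= 0
        row = [0.0] * n + [-w for w in w_cache]
        row[n + i] += 1.0
        A_ub.append(row)
        b_ub.append(0.0)
    return c, A_ub, b_ub


def solve_busy_interval_lp(M, A, A_k, hp_tasks):
    """Return the LP optimum (the busy-interval bound); requires SciPy."""
    from scipy.optimize import linprog  # lazy import: optional dependency

    c, A_ub, b_ub = build_busy_interval_lp(M, A, A_k, hp_tasks)
    # linprog minimizes, so the objective is negated to maximize.
    res = linprog([-v for v in c], A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun


# Hypothetical instance: M = 2, A = 8, A_k = 3, two higher-priority tasks.
c, A_ub, b_ub = build_busy_interval_lp(2, 8, 3, [(2, 11), (8, 9)])
```

The problem is always feasible (all variables zero) and bounded (each workload is capped by $W_i^k$), so a solver failure would indicate malformed input rather than a property of the taskset.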

The next theorem follows as a result of Lemma 4.1.

Theorem 4.2. A taskset $\tau$ is schedulable under the gFPca algorithm if each task $\tau_k$ in $\tau$ satisfies $\hat{B}_k \le d_k - e_k$.

Proof. Suppose $\tau$ is unschedulable. Then, there exists a task $\tau_k$ that is unschedulable, which implies that the length of its busy interval ($B_k$) is larger than the length of its slack interval, i.e., the maximum waiting or blocking time that $\tau_k$ can accommodate before missing its deadline, which is given by $d_k - e_k$. In addition, we can easily show that the CPU-interference workload $\alpha_i^k$ and the cache-interference workload $\beta_i^k$ of each higher-priority task $\tau_i$ within a period of $\tau_k$ satisfy the constraints in the above LP formulation; therefore, the maximum length of the busy interval (i.e., $\hat{B}_k$), calculated by the LP, is no less than $B_k$. In other words, $\hat{B}_k \ge B_k > d_k - e_k$. The theorem follows by contraposition.
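Once the bounds of the LP are available, the test of Theorem 4.2 is a per-task comparison against the slack $d_k - e_k$. The sketch below assumes the bounds have already been computed; the function name and the sample values are hypothetical.

```python
def failing_tasks(tasks, busy_bounds):
    """Return the indices of tasks that violate B_hat_k <= d_k - e_k.

    tasks:       list of (d_k, e_k) pairs
    busy_bounds: list of B_hat_k values, in the same order as tasks
    """
    if len(tasks) != len(busy_bounds):
        raise ValueError("tasks and busy_bounds must have the same length")
    return [
        k
        for k, ((d_k, e_k), b_hat) in enumerate(zip(tasks, busy_bounds))
        if b_hat > d_k - e_k
    ]


def passes_test(tasks, busy_bounds):
    """True if every task satisfies the sufficient condition."""
    return not failing_tasks(tasks, busy_bounds)


# Hypothetical taskset with slacks 7 and 15.
tasks = [(10, 3), (20, 5)]
ok = passes_test(tasks, [0.0, 12.5])
bad = failing_tasks(tasks, [0.0, 16.0])
```

Because the condition is only sufficient, a failed check means the taskset is not guaranteed to be schedulable; it does not prove that a deadline is missed.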

Theorem 4.3. Consider a taskset $\tilde{\tau} = \{\tilde{\tau}_1, \ldots, \tilde{\tau}_n\}$, where $\tilde{\tau}_i = (p_i, \tilde{e}_i, d_i, A_i)$ for all $1 \le i \le n$. Let $\tau = \{\tau_1, \ldots, \tau_n\}$ be any taskset with $\tau_i = (p_i, e_i, d_i, A_i)$ and $e_i \le \tilde{e}_i$ for all $1 \le i \le n$. Then, $\tau$ is schedulable under the gFPca algorithm if $\tilde{\tau}$ satisfies the gFPca schedulability conditions given by Theorem 4.2.

Proof. We will show that if $\tau$ is unschedulable under gFPca, then $\tilde{\tau}$ will be deemed unschedulable by Theorem 4.2.

Indeed, if $\tau$ is unschedulable under gFPca, then there exists a task $\tau_k \in \tau$ that misses its deadline. Let $B_k$ be the maximum length of the busy interval of $\tau_k$. Then, $\hat{B}_k \ge B_k$ due to Lemma 4.1. Since $\tau_k$ misses its deadline, $B_k > d_k - e_k$. Combining this with $\tilde{e}_i \ge e_i$ (in particular, $d_k - e_k \ge d_k - \tilde{e}_k$) and $\hat{B}_k \ge B_k$, we obtain $\hat{B}_k > d_k - \tilde{e}_k$. Thus, the taskset $\tilde{\tau}$ is deemed unschedulable by Theorem 4.2.
