Spinlock Protocols - Multiprocessor Real-Time Locking Protocols

2.4 Real-Time Locking Protocols

2.4.4 Multiprocessor Real-Time Locking Protocols

2.4.4.1 Spinlock Protocols

In a spin-based protocol, a job $J_i$ that incurs acquisition delay busy-waits by executing a delay loop until its request is satisfied. Since $J_i$ remains scheduled while spinning, it does not incur pi-blocking at the time. Nonetheless, the spinning delays both $J_i$ and lower-priority jobs, since it increases $J_i$'s execution requirement, and must thus be accounted for. To avoid ambiguity, we refer to the delay that $J_i$ incurs due to its own busy-waiting as spin blocking (s-blocking). We let $s_i$ denote an upper bound on the maximum cumulative duration of s-blocking incurred by any $J_i$. In other words, $J_i$ incurs acquisition delay, and thus spins, for at most $s_i$ time units.

If jobs spin non-preemptively (which they do in all of the considered spin-based protocols), then a spinning job may cause other, newly released, higher-priority jobs to incur a priority inversion (this is similar to pi-blocking in the NCP). The bound $s_i$ does not account for such delays that affect other jobs. For example, if a higher-priority job $J_1$ cannot preempt $J_2$ because $J_2$ is spinning non-preemptively, then $J_1$ incurs pi-blocking (but not s-blocking), which is accounted for by $b_1$, whereas $J_2$ incurs s-blocking (but not pi-blocking), which is accounted for by $s_2$. This is illustrated in Figure 2.35.

[Figure 2.35 omitted: schedule of T1, T2, and T3 over time 0 to 20 with pi-blocking and s-blocking intervals marked; legend: release, deadline, completion, job scheduled (processor 1, 2), lock attempt, critical section, busy-waiting, locked, unlocked.]

Figure 2.35: Illustration of the difference between pi-blocking and s-blocking. The example shows three tasks under P-FP scheduling on $m = 2$ processors. The tasks $\tau_1 = \{T_1, T_2\}$ are assigned to processor 1; task $T_3$ is assigned to processor 2. $J_{2,1}$ spins non-preemptively while it waits for $J_{3,1}$ to release the shared resource. $J_{2,1}$ does not incur pi-blocking while it waits because it is scheduled; nonetheless, its response time increases because it is wasting processor cycles. $J_{1,1}$ incurs pi-blocking upon release until time 7, when $J_{2,1}$ becomes preemptable again.

Since busy-waiting consumes processor cycles, the execution requirement must be adjusted to reflect the increased processor demand. We denote $T_i$'s effective execution requirement as $e'_i = e_i + s_i$. When applying any schedulability test in the presence of spinlocks, the effective execution requirement $e'_i$ must be used in place of $e_i$. For example, the uniprocessor response-time bound (Theorem 2.12) is applied as follows:

$$R_i = e'_i + b_i + \sum_{h=1}^{i-1} \left\lceil \frac{R_i}{p_h} \right\rceil \cdot e'_h. \tag{2.7}$$

Since this is a straightforward substitution, we omit restating all relevant schedulability tests and henceforth assume that $e'_i$ is used instead of $e_i$.
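As an illustration of the substitution, the following minimal sketch evaluates the recurrence of Equation (2.7) by fixed-point iteration. It assumes integer parameters and tasks indexed by decreasing priority; the function name `response_time`, the `limit` cut-off, and the two-task parameter values are hypothetical and not taken from the text.

```python
def response_time(i, e, s, b, p, limit=10_000):
    """Iterate R = e'_i + b_i + sum_{h<i} ceil(R / p_h) * e'_h to a fixed point.

    Index 0 is the highest-priority task (an assumption of this sketch).
    Returns None if the iteration exceeds `limit` without converging.
    """
    e_eff = [ei + si for ei, si in zip(e, s)]  # e'_i = e_i + s_i
    r = e_eff[i] + b[i]
    while r <= limit:
        # -(-r // p_h) is an exact integer ceiling of r / p_h
        nxt = e_eff[i] + b[i] + sum(-(-r // p[h]) * e_eff[h] for h in range(i))
        if nxt == r:
            return r
        r = nxt
    return None


# Hypothetical parameters (arbitrary time units).
e = [2, 3]    # base execution requirements e_i
s = [1, 2]    # s-blocking bounds s_i
b = [4, 0]    # pi-blocking bounds b_i
p = [10, 20]  # periods p_i

print(response_time(0, e, s, b, p))  # 3 + 4 = 7
print(response_time(1, e, s, b, p))  # 5 + ceil(8/10) * 3 = 8
```

Note that s-blocking enters twice: through the analyzed task's own $e'_i$ and through the inflated demand $e'_h$ of each higher-priority task.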

Non-preemptive spinlocks. Gai et al. (2003) proposed the multiprocessor SRP (MSRP), an extension of Baker's SRP for partitioned scheduling (either P-EDF or P-FP). Somewhat confusingly, the name MSRP derives from the fact that the SRP is used on each processor to arbitrate access to local resources; the SRP does not apply to global resources. Priority ceilings are in fact irrelevant for global resources in the MSRP. Instead, the MSRP uses FIFO spinlocks to protect global resources. In a FIFO spinlock, waiting jobs form a spin queue, and jobs (atomically) append themselves to the end of the spin queue when an acquisition attempt fails. Under the MSRP, to request a global resource $\ell_q$, a job $J_i$ becomes non-preemptable and then enqueues itself onto the FIFO spin queue. Once $J_i$ has become the head of the queue, it holds $\ell_q$ and executes its critical section (and remains non-preemptable). $J_i$ re-enables preemptions when it releases $\ell_q$.

[Figure 2.36 omitted: schedule of T1, ..., T4 over time 0 to 15; legend: release, completion, deadline, job scheduled (on processor 1, 2), lock attempt, critical section, busy-waiting, locked, unlocked.]

Figure 2.36: Example MSRP schedule. There are four tasks $T_1, \ldots, T_4$ assigned to $m = 2$ processors sharing two resources $\ell_1, \ell_2$ under P-FP scheduling. The two resulting partitions are indicated as $\tau_1$ and $\tau_2$. The digit within each critical section indicates which resource was requested.

Example 2.17. An example MSRP schedule is shown in Figure 2.36. In the depicted scenario, $J_{3,1}$ acquires $\ell_1$ at time 1 on processor 1. This causes $J_{4,1}$ to spin non-preemptively on processor 2 during $[2, 5)$, which in turn causes $J_{2,1}$ to incur pi-blocking during $[3, 6)$. Similarly, $J_{1,1}$ incurs pi-blocking during $[2, 5)$ on processor 1 because $J_{3,1}$ is executing its critical section and thus is non-preemptable. $J_{3,1}$ and $J_{2,1}$ acquire $\ell_2$ later during their execution, but their requests happen to not overlap, and thus neither job incurs s-blocking in this particular case. ♦
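The FIFO behavior in Example 2.17 can be mimicked with a small model of a single spin queue. This is only a sketch: the function `fifo_spin_schedule` is hypothetical, requests are served strictly in order of issue time, and the critical-section lengths (4 for $J_{3,1}$, 1 for $J_{4,1}$) are inferred from the intervals stated in the example rather than given explicitly.

```python
def fifo_spin_schedule(requests):
    """Model one FIFO spinlock.

    requests: list of (job, issue_time, cs_length) for ONE global resource.
    Returns {job: (issue_time, acquire_time, release_time)}; the job is
    assumed to be non-preemptable during [issue_time, release_time) and to
    spin during [issue_time, acquire_time).
    """
    free_at = 0  # time at which the resource next becomes available
    result = {}
    # sorted() is stable, so ties in issue time keep their list order
    for job, issued, length in sorted(requests, key=lambda r: r[1]):
        acquired = max(issued, free_at)  # wait for all earlier requests
        free_at = acquired + length
        result[job] = (issued, acquired, free_at)
    return result


# Requests for resource l1, with lengths inferred from Example 2.17.
sched = fifo_spin_schedule([("J3,1", 1, 4), ("J4,1", 2, 1)])
for job, (issued, acquired, released) in sched.items():
    print(job, "spins during", (issued, acquired), "holds until", released)
```

Under these assumptions the model yields a spin interval of $[2, 5)$ for $J_{4,1}$ and a non-preemptable interval ending at time 6, which matches the pi-blocking of $J_{2,1}$ ending at time 6 in the example.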

Because jobs spin and execute requests non-preemptively, there can be at most $m$ concurrent requests for global resources. Together with the FIFO ordering, this implies that at most $(m - 1)$ jobs precede $J_i$ in the spin queue for $\ell_q$. This greatly simplifies bounding maximum s-blocking. Recall that $\tau_k$ denotes the set of tasks assigned to processor $k$, and that $P_i$ denotes the processor to which $T_i$ is assigned. A single request for resource $\ell_q$ then causes $J_i$ to spin for at most

$$\mathit{spin}(T_i, \ell_q) = \sum_{\substack{k=1 \\ k \neq P_i}}^{m} \max\{L_{j,q} \mid T_j \in \tau_k\}. \tag{2.8}$$

In other words, only the longest request issued by any job on each remote processor must be considered when bounding $J_i$'s maximum spin blocking due to a single request. Based on this per-request bound, Gai et al. (2003) derived the following bound on the maximum s-blocking incurred by any $J_i$ under the MSRP:

$$s_i = \sum_{q=1}^{n_r} N_{i,q} \cdot \mathit{spin}(T_i, \ell_q).$$

We derive a similar but more accurate bound for FIFO spinlocks that is less pessimistic if $N_{i,q} > 1$ in Chapter 5.
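The per-request bound of Equation (2.8) and the resulting bound on $s_i$ can be evaluated mechanically. The sketch below is illustrative only: the dictionary layout (`P` for processor assignment, `L` for maximum request lengths, `N` for request counts), the function names, and the task set are hypothetical, and a task that never requests a resource is assumed to contribute a request length of 0.

```python
def spin(i, q, m, P, L):
    """Per-request bound of Eq. (2.8) for task i and resource q."""
    total = 0
    for k in range(1, m + 1):
        if k == P[i]:
            continue  # Eq. (2.8) excludes J_i's own processor P_i
        # longest request for q by any task assigned to remote processor k
        total += max((L[j].get(q, 0) for j in P if P[j] == k), default=0)
    return total


def s_bound(i, m, P, L, N):
    """s_i: sum over all resources q of N[i][q] * spin(T_i, q)."""
    return sum(n * spin(i, q, m, P, L) for q, n in N[i].items())


# Hypothetical task set: 4 tasks on m = 3 processors, resources "l1" and "l2".
m = 3
P = {1: 1, 2: 1, 3: 2, 4: 3}  # task -> processor
L = {1: {"l1": 2}, 2: {"l1": 3, "l2": 1}, 3: {"l1": 4, "l2": 2}, 4: {"l1": 1}}
N = {1: {"l1": 2}, 2: {"l1": 1, "l2": 1}, 3: {"l1": 1, "l2": 3}, 4: {"l1": 1}}

print(spin(1, "l1", m, P, L))  # 4 (processor 2) + 1 (processor 3) = 5
print(s_bound(1, m, P, L, N))  # 2 * 5 = 10
print(s_bound(3, m, P, L, N))  # 1 * 4 + 3 * 1 = 7
```

The bound charges every one of the $N_{i,q}$ requests with the full per-request worst case, which is the source of the pessimism for $N_{i,q} > 1$ noted above.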

As mentioned above, a second concern is a priority inversion that might be caused by a non-preemptively spinning job at the time of $J_i$'s release. This is similar to a priority inversion caused by the NCP or the SRP's scheduling rule. In fact, since both the priority inversion due to the SRP (local resource sharing) and the priority inversion due to spinning (global resource sharing) have to occur at the time of $J_i$'s release, any single $J_i$ cannot incur both types of pi-blocking. Let $b^{SRP}_i$ denote the maximum pi-blocking due to local resource requests, and let $b^{NP}_i$ denote the maximum pi-blocking due to global resource requests. Then each $J_i$ incurs at most $b_i = \max(b^{NP}_i, b^{SRP}_i)$ pi-blocking. The local pi-blocking $b^{SRP}_i$ can be determined by uniprocessor analysis (since local and global resources cannot be nested). Under P-FP scheduling, the maximum duration of priority inversion due to a lower-priority job accessing a global resource is bounded by

$$b^{NP}_i = \max\{\mathit{spin}(T_k, \ell_q) + L_{k,q} \mid T_k \in \tau_{P_i} \wedge N_{k,q} > 0 \wedge k > i\}.$$

As in the uniprocessor case, “$k > i$” should be substituted with “$d_k > d_i$” under P-EDF scheduling. With both $b_i$ and $s_i$ bounded, schedulability under the MSRP can be established by applying either Theorem 2.12 or Theorem 2.13 to each processor, using effective execution requirements.
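The P-FP bound on $b^{NP}_i$ can be sketched in the same style. All names and values here are hypothetical: tasks are numbered by decreasing priority (so $k > i$ means lower priority), `b_srp` stands in for a local bound $b^{SRP}_i$ obtained elsewhere by uniprocessor analysis, and the bound is taken to be 0 when no lower-priority local task accesses a global resource.

```python
def spin(i, q, m, P, L):
    """Per-request bound of Eq. (2.8) for task i and resource q."""
    return sum(
        max((L[j].get(q, 0) for j in P if P[j] == k), default=0)
        for k in range(1, m + 1) if k != P[i]
    )


def b_np(i, m, P, L, N):
    """Longest non-preemptive interval (spin + critical section) of any
    lower-priority task on T_i's processor under P-FP."""
    candidates = [
        spin(k, q, m, P, L) + L[k][q]
        for k in P if P[k] == P[i] and k > i  # local, lower priority
        for q in N[k] if N[k][q] > 0          # resources T_k actually uses
    ]
    return max(candidates, default=0)


# Hypothetical task set: 4 tasks on m = 3 processors.
m = 3
P = {1: 1, 2: 1, 3: 2, 4: 3}
L = {1: {"l1": 2}, 2: {"l1": 3, "l2": 1}, 3: {"l1": 4, "l2": 2}, 4: {"l1": 1}}
N = {1: {"l1": 2}, 2: {"l1": 1, "l2": 1}, 3: {"l1": 1, "l2": 3}, 4: {"l1": 1}}
b_srp = {1: 2, 2: 0, 3: 0, 4: 0}  # assumed local pi-blocking bounds

b = {i: max(b_np(i, m, P, L, N), b_srp[i]) for i in P}
print(b_np(1, m, P, L, N))  # T_2 on l1: spin 5 + length 3 = 8
print(b[1])                 # max(8, 2) = 8
```

Taking the maximum rather than the sum reflects the argument above that a single job cannot incur both kinds of pi-blocking.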

Devi et al. (2006) analyzed global resource sharing with non-preemptive FIFO spinlocks under G-EDF. Their protocol works essentially the same as discussed above in the description of the MSRP: jobs first become non-preemptable, then enqueue themselves in a FIFO spin queue and busy-wait until their request is satisfied, and finally become preemptable again when they release the resource. Since jobs are not bound to processors, the maximum duration of spinning cannot be bounded on a per-processor basis as in Equation (2.8) above. However, the non-preemptive spinning still limits the maximum number of concurrent requests to $m$, and thus to $(m - 1)$ preceding jobs. Devi et al. (2006) derived the following simple bound:

$$\mathit{spin}(T_i, \ell_q) = (m - 1) \cdot \max_{\substack{1 \leq k \leq n \\ k \neq i}} \{L_{k,q}\}.$$

We derive a less-pessimistic bound that takes the identity of blocking jobs and the frequency at which they issue requests into account in Chapter 5.
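For comparison with the partitioned case, the simple global bound can be sketched as follows. The operand of the maximum is assumed here to be the maximum request length $L_{k,q}$ of every other task, by analogy with Equation (2.8); the function name `spin_global` and the example values are hypothetical.

```python
def spin_global(i, q, m, L):
    """(m - 1) times the longest request for q issued by any other task."""
    others = [L[k].get(q, 0) for k in L if k != i]
    return (m - 1) * max(others, default=0)


# Hypothetical maximum request lengths, m = 4 processors.
L = {1: {"l1": 2}, 2: {"l1": 3, "l2": 1}, 3: {"l1": 4, "l2": 2}, 4: {"l1": 1}}

print(spin_global(1, "l1", 4, L))  # 3 * max(3, 4, 1) = 12
print(spin_global(3, "l1", 4, L))  # 3 * max(2, 3, 1) = 9
```

Because the bound ignores which tasks can actually run concurrently and how often they issue requests, a single long request inflates the bound for every other task.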

A key contribution of Devi et al.'s work was to show that G-EDF ensures bounded tardiness even in the presence of non-preemptive sections (Devi et al., 2006; Devi, 2006; Devi and Anderson, 2008). However, their analysis assumes that each job suffers a priority inversion at most once, and only at the time of its release, as is the case on a uniprocessor (and thus also under partitioning). As we show in Chapter 3, a straightforward, “eager” implementation of non-preemptive sections in a global scheduler does not ensure that this is indeed the case. Instead, a “linking mechanism” is required to enact preemptions “lazily”; our solution is detailed in Section 3.3.3.

This concludes our review of spin-based locking protocols under event-driven schedulers. In Chapter 5, we generalize the use of non-preemptive FIFO spinlocks to RW constraints and to clustered JLFP scheduling, and derive a flexible blocking analysis framework that takes both response-time bounds and task periods into account to derive less-pessimistic bounds on s-blocking.

The protocols discussed so far do not apply to PD$^2$. Locking in pfair-scheduled systems is more challenging than under JLFP scheduling since jobs are preempted more frequently. Holman and Anderson studied this problem in detail and proposed several spin-based and suspension-based locking protocols for PD$^2$ (Holman, 2004; Holman and Anderson, 2006). However, in our overhead-aware evaluation of schedulers presented in Chapter 4, we found PD$^2$ to not perform well on our test platform even if tasks are independent. We therefore focus on locking under event-driven JLFP schedulers in this dissertation and review suspension-based protocols for such schedulers next.

In document Scheduling and locking in multiprocessor real-time operating systems (Page 145-149)