Chapter 3: Real-Time Nested Locking Protocol (RNLP)
6.1 Summary of Results
In the following, we review the key results of the work presented in this dissertation.
Fine-grained locking in multiprocessor real-time systems. In Chapter 3, we presented the RNLP, the first multiprocessor real-time locking protocol to support fine-grained locking. The problem of nested locking in multiprocessor real-time systems stood open for over twenty years. Prior to the RNLP, nested locking of resources shared among multiple processors in real-time systems was only supported through coarse-grained locking, in which resources are grouped and treated as a single lockable entity. Fine-grained locking can be realized through nested locking, or through a new fine-grained locking technique called dynamic group locking. In the latter case, a request atomically requests a set of resources, which may be a subset of a larger group of resources that would be treated as a single lockable entity under coarse-grained locking.
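The difference between the two approaches can be illustrated with a minimal sketch. It is not the RNLP itself: it ignores request ordering, progress mechanisms, and transitive blocking, and considers only mutex resources. The resource names, the `GROUP` set, and both helper functions are hypothetical and serve only to show when two requests conflict directly under each scheme.

```python
from typing import FrozenSet

# Hypothetical resource group. Under coarse-grained (group) locking, every
# resource in this set is protected by one lock.
GROUP: FrozenSet[str] = frozenset({"a", "b", "c", "d"})


def conflicts_coarse(req1: FrozenSet[str], req2: FrozenSet[str]) -> bool:
    """Coarse-grained: any two requests that touch the group conflict."""
    return bool(req1 & GROUP) and bool(req2 & GROUP)


def conflicts_fine(req1: FrozenSet[str], req2: FrozenSet[str]) -> bool:
    """Dynamic group locking (simplified): two requests conflict directly
    only if the resource sets they atomically request overlap."""
    return bool(req1 & req2)


# Three illustrative requests, each naming the subset of the group it needs.
r1 = frozenset({"a", "b"})
r2 = frozenset({"c"})
r3 = frozenset({"b", "c"})

print(conflicts_coarse(r1, r2), conflicts_fine(r1, r2))
print(conflicts_coarse(r1, r3), conflicts_fine(r1, r3))
```

In this sketch, `r1` and `r2` serialize under coarse-grained locking even though they share no resource, whereas under the fine-grained view they can proceed concurrently; `r1` and `r3` conflict in both cases because both need `b`.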
The RNLP has a modular architecture, composed of a k-exclusion token lock and an RSM. In Chapter 3, several RSMs that support mutex resources were presented, and in Chapter 4, RSMs that support reader/writer sharing and k-exclusion were presented. We showed that there exist pairings of token locks and RSMs that are optimal for all platform configurations in which optimal coarse-grained locking protocols are known. Interestingly, at the time the RNLP was originally presented (Ward and Anderson, 2012), there was no known optimal mutex locking protocol for clustered systems under s-aware analysis. Since then, RSB has been proposed (Brandenburg, 2014), and we incorporated RSB into a new RSM, presented herein, that is optimal in that case as well. This demonstrates the utility of the modular architecture of the RNLP.
In this dissertation, we presented the first fine-grained blocking analysis of the RNLP. (In previous publications (Ward and Anderson, 2012, 2014b), fine-grained blocking analysis was omitted due to space constraints.) This blocking analysis is tighter than previous pi-blocking bounds for the RNLP, at the expense of increased computational complexity. However, these tighter bounds result in improved platform utilization.
Independence-preserving k-exclusion locking protocol. In Chapter 5, we presented the R2DGLP, an asymptotically optimal, independence-preserving k-exclusion locking protocol for globally scheduled systems. The R2DGLP is especially useful for managing access to multiple GPUs, and indeed has been applied in such applications (Elliott et al., 2013; Elliott, 2015). In designing the R2DGLP, we also developed a new progress mechanism, RRPD, which is similar to priority donation (Brandenburg and Anderson, 2011), but is applied at the time of request issuance instead of job release. By shifting the time of donation to request issuance, it is possible to construct an independence-preserving locking protocol for globally scheduled systems. RRPD is similar to a priority-donation technique presented by Elliott and Anderson (2013) for the O-KGLP, but enables improved blocking bounds.
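As a reminder of what the k-exclusion constraint itself requires, the following is a minimal sketch using a counting semaphore. It is not the R2DGLP: it has no real-time queueing, no progress mechanism such as RRPD, and no blocking bounds. The value of `K`, the thread count, and the function names are hypothetical.

```python
import threading

K = 2  # hypothetical number of replicas (e.g., GPUs)

tokens = threading.BoundedSemaphore(K)  # admits at most K holders at once
count_lock = threading.Lock()           # protects the counters below
active = 0                              # current number of replica holders
max_active = 0                          # highest number observed so far


def use_replica() -> None:
    """Acquire one of the K replicas, record occupancy, then release it."""
    global active, max_active
    with tokens:
        with count_lock:
            active += 1
            max_active = max(max_active, active)
        # The critical section using the replica would go here.
        with count_lock:
            active -= 1


def run(n_threads: int = 8) -> int:
    """Start n_threads contending threads and return the peak occupancy."""
    threads = [threading.Thread(target=use_replica) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active


peak = run()
print(peak <= K)
```

The semaphore enforces only the sharing constraint, that no more than `K` requests hold a replica simultaneously; which waiting request is served next, and how resource-holder progress is ensured, is exactly what a real-time protocol must additionally specify.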
While the design of the R2DGLP was motivated by the need to manage access to GPUs, it is also useful as a token lock in the RNLP. When combined with the I-RSM on globally scheduled systems, the resulting RNLP variant is also independence preserving, and has a request-blocking bound no worse than the global OMLP (Brandenburg and Anderson, 2010a).
Synchronization algorithms for shared hardware resources. In Chapter 5, we also defined two new resource-sharing constraints motivated by the need to more predictably manage shared hardware resources such as caches and buses. The goal of such shared-hardware management is to reduce or eliminate the effects of timing interference caused by concurrently executing tasks, thereby improving timing predictability. In turn, the improved predictability may offset the cost of shared-hardware management, leading to improved schedulability.
Towards this goal, we presented preemptive mutual exclusion, which is motivated by the need to manage access to a shared communication bus such as the memory bus, and half-protected exclusion, which is motivated by the need to manage access to cache resources. We considered simple, straightforward algorithms that realize these sharing constraints, and provided analysis and experimental results showing the schedulability improvements they may enable.
Idleness analysis. Locking protocols cause blocking as tasks are forced to wait until resources are available, or to ensure resource-holder progress. This blocking must somehow be incorporated into schedulability analysis in real-time systems so as to ensure that jobs do not miss deadlines on account of blocking. Traditionally, blocking is analyzed via blocking analysis, the results of which are incorporated into schedulability analysis. In Chapter 5, we presented a new technique, called idleness analysis, that incorporates the effects of blocking into schedulability analysis without requiring any blocking analysis. In idleness analysis, the idleness induced by blocking is analyzed, instead of the duration of blocking. This “flips the analysis” from asking the question “how long can this request be blocked?” to instead asking “how much idleness can this request cause?”
Idleness analysis and blocking analysis are theoretically incomparable with respect to schedulability, i.e., neither dominates the other. In Section 5.4, we presented the results of schedulability studies that were conducted to investigate in which cases one is favorable to the other. Idleness analysis is often favorable in systems with smaller core counts, as fewer processors can be idled as a result of synchronization.