2.5 Real-Time Operating Systems
2.5.2 Real-Time Extensions of General-Purpose Operating Systems
2.5.2.2 Native Real-Time Linux
In a native real-time Linux design, the Linux kernel is the only kernel present and in full control of the hardware platform. Instead of working around Linux’s limitations as in a para-virtualized design, a native real-time Linux variant must directly modify the kernel to enhance its real-time capabilities. While para-virtualization may be the only feasible (Linux-based) design for applications with very stringent timing constraints (e.g., engine control software), a native design is generally preferable for the vast majority of applications if timing constraints can be met, i.e., if Linux’s limitations such as high interrupt latencies can be addressed.
Given the limitations of early versions of Linux, the first native designs had to introduce substantial infrastructure changes. In contrast, modern Linux (i.e., Linux versions 2.6+) is much better suited for use in real-time systems, so that current native designs mostly focus on scheduling and locking algorithmic changes. In the following, we review the major academic native real-time Linux variants in (roughly) chronological order. These can be classified into two groups: those initially developed in the late 1990s and early 2000s (i.e., before Linux 2.6), and those introduced in recent years.
Embedded HRT systems. Srinivasan et al. (1998) introduced Kansas University Real-Time Linux (KURT Linux), which pioneered many of the real-time infrastructure changes now found in standard Linux. In particular, it introduced high-resolution (software) timers based on hardware timers operating in one-shot mode (the so-called “UTIME” patch), a design that was later generalized and reimplemented in a POSIX-compliant way and merged into standard Linux under the name hrtimers (Gleixner and Niehaus, 2006). Additionally, KURT Linux added a number of scheduling features aimed at HRT systems such as table-driven static scheduling, the ability to upload static schedules into the kernel at runtime, and “focused mode,” where the kernel would schedule only real-time processes (and no background processes). KURT Linux’s high-resolution timers and scheduling support were used to conduct networking simulations requiring microsecond accuracy (House and Niehaus, 2000). The last public release of KURT Linux occurred in 2002 in the form of a patch against 2.4.18, which has since been removed from the web.
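To make the benefit of one-shot timers concrete, the following Python sketch compares when a software timer fires if expirations are only checked at a periodic tick with when it fires if the hardware timer is programmed for the exact expiry. It is an illustration only and does not reproduce the UTIME or hrtimers code; the function names, the 100 Hz tick, and the request values are assumptions chosen for the example.

```python
def periodic_tick_expiry(requested_us, tick_us):
    """Expiry time if timers are only checked at periodic tick boundaries."""
    ticks = -(-requested_us // tick_us)   # integer ceiling division
    return ticks * tick_us


def one_shot_expiry(requested_us, program_overhead_us=0):
    """Expiry time if the hardware timer is programmed for the exact request."""
    return requested_us + program_overhead_us


def next_one_shot(pending_us, now_us):
    """Earliest pending expiry that should be programmed next, or None."""
    future = [t for t in pending_us if t > now_us]
    return min(future) if future else None


requests = [250, 1_300, 10_000, 12_345]   # requested expiry times (microseconds)
TICK_US = 10_000                          # hypothetical 100 Hz periodic tick

for r in requests:
    late = periodic_tick_expiry(r, TICK_US) - r
    print(f"request {r:>6} us: periodic tick is {late:>5} us late")
```

Under the periodic model, the worst-case lateness approaches one full tick, whereas in one-shot mode it is bounded by the (assumed small) cost of reprogramming the timer.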
Linux started being used in robotics applications in the mid to late 1990s. An early real-time Linux version targeted at this domain is Advanced Real-Time Linux (ART Linux), which added the PIP and a system call interface in support of periodic tasks (Ishiwata and Matsui, 1998). ART Linux was developed by Japan’s National Institute of Advanced Industrial Science and Technology (AIST) and used in the Open Humanoid Robotics Platform (Kanehiro et al., 2004) and, among others, in the humanoid robot HRP-4C (Kaneko et al., 2009). ART Linux is only sparsely documented and not much publicized (e.g., it does not appear to have a current project homepage), and consequently has seen little use elsewhere.
Another early real-time Linux is Real-Time and Embedded Linux (RED Linux), which initially was also known as Real-Time Enhanced (RTE) Linux (Wang and Lin, 1998). RED Linux was developed at the University of California, Irvine to provide a flexible base for the study of uniprocessor real-time scheduling within a real OS (Wang and Lin, 1998, 1999). (LITMUS^RT serves the same purpose in a multiprocessor context.) With regard to infrastructure changes, RED Linux incorporated Kansas University’s UTIME patch and further changed the kernel to reduce the length of non-preemptive sections by adding preemption points. That is, the kernel still executes system calls and ISRs non-preemptively most of the time in RED Linux, but checks more frequently whether a preemption is required. RED Linux also introduced significant algorithmic changes. In particular, it added a flexible hierarchical scheduling framework, which included support for EDF and several notions of fair scheduling (Wang et al., 2002; Lin and Wang, 2003). It is not clear whether the code for RED Linux was made publicly available; it appears that the RED Linux project is now dormant and no longer available online.
Resource kernels. When extending a UNIX-like OS, the question arises of how to map sporadic tasks, which are analysis-time entities, to processes, which are runtime entities. In a “classic” RTOS such as QNX Neutrino or even standard Linux, the kernel (and thus the scheduler) is unaware of the task parameters that were used during analysis to establish schedulability of the system. As a result, such a kernel cannot reliably detect when processes deviate from the assumed behavior. This is undesirable from a reliability point of view, and in particular in open systems (i.e., if tasks are added and removed at runtime). The RT-Mach project incorporated the idea of resource reservations (Mercer and Rajkumar, 1995), which was later developed into the resource kernel design (Rajkumar et al., 1998; Oikawa and Rajkumar, 1999; Rajkumar et al., 2001). The resource kernel approach was implemented by Oikawa and Rajkumar (1998) in Linux under the name Linux/RK.
In a resource kernel, the sporadic task model is interpreted at runtime as an execution time budget that is replenished periodically or sporadically. That is, sporadic “tasks” are instantiated by the kernel as an accounting abstraction, and each “job” is a time slice with a deadline. By performing an admission test (i.e., schedulability test) before granting a reservation, the kernel can ensure at runtime that resources do not become overcommitted. Once a reservation has been granted, a process is attached to it. Instead of scheduling processes directly, the scheduler in a resource kernel first selects a pending “job” (i.e., a budget that has not yet been exhausted) to “execute” and then dispatches the attached process.
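The interplay of admission test, budget replenishment, and dispatching described above can be sketched as a small discrete-time simulation. This is a simplified illustration, not Linux/RK code: the class and field names are hypothetical, replenishment is strictly periodic, and the admission test is the uniprocessor EDF utilization bound (total utilization at most one), which is an assumption made for the example.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Reservation:
    name: str            # label of the attached process (hypothetical)
    budget: int          # execution budget per period, in time units
    period: int          # replenishment period, in time units
    remaining: int = 0   # budget left in the current period
    deadline: int = 0    # deadline of the current "job"


class ResourceKernel:
    def __init__(self):
        self.reservations = []

    def admit(self, res):
        """Grant the reservation only if total utilization stays <= 1."""
        total = sum(Fraction(r.budget, r.period) for r in self.reservations)
        if total + Fraction(res.budget, res.period) > 1:
            return False                      # would overcommit the processor
        self.reservations.append(res)
        return True

    def run(self, horizon):
        """Simulate unit-length slots; return the process dispatched in each."""
        trace = []
        for now in range(horizon):
            for r in self.reservations:
                if now % r.period == 0:       # periodic replenishment
                    r.remaining = r.budget
                    r.deadline = now + r.period
            pending = [r for r in self.reservations if r.remaining > 0]
            if pending:
                job = min(pending, key=lambda r: r.deadline)  # earliest deadline
                job.remaining -= 1            # charge one unit of budget
                trace.append(job.name)        # dispatch the attached process
            else:
                trace.append(None)            # all budgets exhausted: idle
        return trace


rk = ResourceKernel()
print(rk.admit(Reservation("video", 1, 4)))   # accepted
print(rk.admit(Reservation("audio", 2, 5)))   # accepted
print(rk.admit(Reservation("hog", 1, 2)))     # rejected by the admission test
print(rk.run(8))
```

Note that the scheduler never inspects the processes themselves; it selects among budgets and only then dispatches whichever process is attached to the selected reservation.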
Resource reservations have a number of advantages. Since a process is only scheduled when its reservation has remaining budget, it cannot consume more resources than assumed during schedulability analysis. This provides temporal isolation among processes, which makes the system amenable to a priori analysis. Further, development is simplified because misbehaving processes (i.e., those that routinely overrun their budget) are straightforward to identify. Another advantage of the resource reservation model is that it can be easily extended to support group scheduling (by attaching more than one process to a processor reservation) and hierarchical scheduling (by having multiple layers of reservations). The idea of resource reservations can further be applied to non-compute resources such as disk and network bandwidth or memory to isolate tasks from interference via those resources as well.
Resource reservations as described so far are called hard reservations in Linux/RK because processes are not eligible to execute after exhausting their current budget. Additionally, Linux/RK supports two relaxed reservation types: processes with firm reservations may be scheduled even if they currently have no budget, but only if the processor would idle otherwise, whereas processes with soft reservations are scheduled on a round-robin basis after exhausting their budget. The resource reservation model is a natural way to realize the sporadic task model in a UNIX-like OS, and we follow this approach in the implementation of LITMUS^RT as well.
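The three reservation types differ only in what happens once the budget is exhausted, which the following sketch expresses as dispatch tiers. The names and the tier ordering are our reading of the description above (depleted soft reservations compete with background work, depleted firm reservations run only when nothing else would), not Linux/RK’s implementation; the round-robin rotation within the soft tier is not modeled.

```python
from enum import Enum


class Kind(Enum):
    HARD = "hard"
    FIRM = "firm"
    SOFT = "soft"


def tier(kind, remaining_budget):
    """Return a dispatch tier (lower runs first), or None if ineligible."""
    if remaining_budget > 0:
        return 0       # backed by budget: handled by the reservation scheduler
    if kind is Kind.SOFT:
        return 1       # depleted soft: falls back to round-robin timesharing
    if kind is Kind.FIRM:
        return 2       # depleted firm: only if the processor would idle
    return None        # depleted hard: must wait for the next replenishment


def pick(processes):
    """processes: list of (name, kind, remaining_budget); returns a name or None."""
    eligible = [(tier(kind, budget), index, name)
                for index, (name, kind, budget) in enumerate(processes)
                if tier(kind, budget) is not None]
    return min(eligible)[2] if eligible else None


print(pick([("a", Kind.HARD, 0), ("b", Kind.FIRM, 0)]))  # only "b" is eligible
print(pick([("a", Kind.HARD, 0)]))                       # nothing eligible
```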
Besides adding resource reservations to Linux’s FP scheduler, Linux/RK also added high-resolution timers and shortened the duration of non-preemptive sections in the Linux kernel. The last public release of Linux/RK with support for processor, disk, and network reservations occurred in 2002 on the Linux/RK project homepage in the form of a patch against Linux 2.4.18 (CMU Real-Time and Multimedia Systems Lab, 2006). Although a limited “alpha version” based on Linux 2.6.18 with support for processor reservations (but not disk or network reservations) appeared in early 2007 (Lakshmanan, 2007), Linux/RK does not appear to be actively maintained any longer.
QoS scheduling. Two somewhat different native real-time Linux versions are QLinux (Sundaram et al., 2000) and Linux-SRT (Childs and Ingram, 2001). Instead of targeting HRT workloads common in embedded systems, they were aimed at supporting “soft” workloads with quality-of-service (QoS) requirements. Like Linux/RK, both variants were based on resource reservations at their core. QLinux emphasizes resource reservations for network and disk bandwidth (in addition to processor time) and implements the start-time fair queuing policy (Goyal et al., 1996). Linux-SRT supports several types of processor time and disk bandwidth reservations with semantics similar to Linux/RK’s hard, firm, and soft reservations. Notably, Linux-SRT provides an additional system call to let servers such as X11 (Linux’s graphical user interface) explicitly “bill” clients for resources used on their behalf.
Modern Linux. Common to all of the native real-time Linux variants mentioned so far is that they are no longer actively maintained,26 and that they do not explicitly consider multiprocessor issues (besides the multiprocessor support offered by the underlying Linux version). The latter is understandable because multiprocessor platforms for real-time and embedded systems were a rare occurrence at the time. One reason that has likely contributed to the cessation of project maintenance is that Linux 2.6 gained several improvements (over the course of several versions) that greatly improved its viability as an RTOS, namely high-resolution timers, priority inheritance, mostly preemptable kernel execution, much-shortened non-preemptive sections, and an improved lower-overhead FP scheduler. Since these improvements have become available in standard Linux, the need for alternate infrastructure-modifying patches has lessened. In fact, mainline Linux is now (virtually) POSIX-compliant and supports FP scheduling (SCHED_FIFO and SCHED_RR) with 100 distinct priorities, processor affinity masks, and the PIP. While not directly supported in the kernel, the PRIO_PROTECT policy is available as a user-space implementation in the pthreads library. The SRP is thus available, albeit at the cost of two additional system calls per resource request to raise and lower task priorities.
26 With the possible exception of ART Linux. Given the unavailability of (English) documentation, recent publications, or an up-to-date project homepage, we were unable to determine its current status.
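As a small illustration of the POSIX scheduling interface mentioned above, the following sketch queries the SCHED_FIFO priority range and the calling process’s processor affinity mask through Python’s standard os module. The helper names are our own, the calls are guarded because they are only available on some platforms (notably Linux), and the sketch deliberately does not change the scheduling policy, which would normally require elevated privileges.

```python
import os


def fifo_priority_range():
    """Return (min, max) SCHED_FIFO priority, or None where unsupported."""
    if not (hasattr(os, "SCHED_FIFO") and hasattr(os, "sched_get_priority_min")):
        return None
    return (os.sched_get_priority_min(os.SCHED_FIFO),
            os.sched_get_priority_max(os.SCHED_FIFO))


def allowed_cpus():
    """Return the set of CPUs the calling process may run on, or None."""
    if not hasattr(os, "sched_getaffinity"):
        return None
    return os.sched_getaffinity(0)    # pid 0 refers to the calling process


print("SCHED_FIFO priority range:", fifo_priority_range())
print("affinity mask:", allowed_cpus())
```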
However, while the Linux kernel and its associated runtime libraries are now compliant with the real-time POSIX specification (similar to a purpose-built Category II kernel), the Linux kernel itself still contains non-preemptive code paths that are long (in the context of real-time systems) and architectural design choices that were made with throughput in mind. For example, interrupts are, by default, not serviced using split interrupt handling; rather, ISRs are typically executed immediately when an interrupt is raised and are not subject to scheduling. Executing ISRs right away benefits network and disk bandwidth, but can also delay real-time tasks. Thus, while API-compatible, current mainline Linux is (understandably) not yet comparable to purpose-built RTOSs such as VxWorks or QNX in terms of predictability and interrupt latency.
PREEMPT_RT patch. Moving Linux further in the direction of being a true RTOS is the goal of the PREEMPT_RT patch, which is the only “semi-official” real-time Linux variant being maintained by several core Linux kernel developers (Molnar, 2004; McKenney, 2005; Gleixner, 2006; Zijlstra, 2008). The PREEMPT_RT patch changes the core Linux infrastructure significantly. To reduce the length of non-preemptive sections, it converts most spinlocks in the kernel to semaphores, and further enables the PIP by default for all semaphores in the kernel (in standard Linux, the PIP is not necessarily active, which can result in priority inversion). Another major change introduced by the PREEMPT_RT patch is to force split interrupt handling for all ISRs (except timers). However, in contrast to a microkernel such as Fiasco, ISR bottom halves are still executed in kernel mode without any isolation (and the ISR top halves are not necessarily as minimal as in Fiasco). Given that error rates in device drivers are notoriously high (Chou et al., 2001), this lack of fault containment poses a considerable reliability hazard.
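The effect of forced split interrupt handling on a real-time task can be illustrated with a deliberately simple interference model. The sketch below is not derived from the PREEMPT_RT code: it assumes a single processor, one real-time task released at time zero, interrupts that all arrive while the task is running, and hypothetical cost and priority values (larger numbers denote higher priority).

```python
def response_time(exec_time, task_prio, interrupts, split_handling):
    """Response time of one real-time task under a simple interference model.

    interrupts: list of (top_half_cost, bottom_half_cost, bottom_half_prio).
    Without split handling, the whole ISR runs immediately and is not subject
    to scheduling. With split handling, only the top half runs immediately;
    the bottom half delays the task only if it has a higher priority.
    """
    delay = 0
    for top, bottom, bottom_prio in interrupts:
        delay += top                              # top half always preempts
        if not split_handling or bottom_prio > task_prio:
            delay += bottom                       # bottom half preempts as well
    return exec_time + delay


# Hypothetical workload: (top half, bottom half, bottom-half priority).
irqs = [(2, 40, 50), (1, 25, 50), (2, 10, 90)]

print(response_time(100, 80, irqs, split_handling=False))  # ISRs run right away
print(response_time(100, 80, irqs, split_handling=True))   # bottom halves scheduled
```

In this model, only the third interrupt’s bottom half (priority 90) still delays the task (priority 80) under split handling; the other two are deferred until the task completes.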
The PREEMPT_RT patch, originally proposed by Molnar (2004), has been continuously maintained since 2004 and is still under active development, with the latest released version applying to Linux 2.6.33 at the time of writing. Besides serving as a staging ground for real-time features that are (intended to be) incorporated into mainline Linux at a later point, it is also widely used in practice. For example, both Novell and Red Hat corporations have developed commercial Linux distributions that incorporate the PREEMPT_RT patch.
The PREEMPT_RT patch does not add new scheduling algorithms or locking protocols; rather, its main appeal is a considerable reduction in interrupt latency (Arthur et al., 2007). However, strictly speaking, even with the PREEMPT_RT patch, Linux is not a “true” RTOS, and will for the foreseeable future contain a number of compromises between throughput and predictability concerns. Nonetheless, despite these compromises, or maybe in part even because of them, Linux today is “good enough” for many (soft) real-time applications. As noted by Paul McKenney in (Harris, 2005),
I believe that Linux is ready to handle applications requiring sub-millisecond process-scheduling and interrupt latencies with 99.99+ percent probabilities of success. No, that does not cover every imaginable real-time application, but it does cover a very large and important subset.
Many practitioners, it appears, are in agreement, given that industry surveys indicate a steadily growing Linux market share in the embedded sector (Linux Devices, 2007).
Recent developments. One area beyond the scope of the PREEMPT_RT project is (experimental) scheduling algorithm improvements such as resource reservations like those offered in Linux/RK, QLinux, or Linux-SRT. Around the time that the LITMUS^RT effort was started in 2006, two other Linux-based real-time projects were initiated, namely the Adaptive Quality of Service Architecture (AQuoSA) developed at the Scuola Superiore Sant’Anna (Palopoli et al., 2009) and Redline Linux developed at the University of Massachusetts Amherst (Yang et al., 2008). Both are native real-time Linux variants that are focused mainly on “soft” workloads, albeit in very different ways.
The Redline Linux project’s main goal is to ensure a highly responsive graphical user interface even in situations where the system is exposed to severe, potentially malicious overload (such as “fork bombs,” where the number of processes increases exponentially until memory is exhausted). To achieve this, Redline Linux features a modified variant of Linux’s now-standard completely fair scheduler (CFS),28 a timesharing policy inspired by proportional-share fair scheduling (Tijdeman, 1980; Stoica et al., 1996). However, CFS is not well suited for providing a priori guarantees, i.e., actual scheduling is not predictable. Redline Linux thus has little in common with LITMUS^RT. The Redline Linux patch has not been updated since Linux version 2.6.22.5, which was released in 2007.
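The proportional-share idea behind a CFS-like policy can be sketched in a few lines: each task accumulates virtual runtime at a rate inversely proportional to its weight, and the scheduler always runs the task with the least virtual runtime. The following is a toy model with hypothetical task names and weights, not the kernel’s implementation (which, among other things, uses variable-length slices and a red-black tree).

```python
from fractions import Fraction


def fair_share(weights, num_slices):
    """Allocate unit-length slices; return how many slices each task received.

    weights: {task_name: positive integer weight}.
    """
    vruntime = {name: Fraction(0) for name in weights}
    received = {name: 0 for name in weights}
    for _ in range(num_slices):
        # Least virtual runtime first; ties are broken by name for determinism.
        name = min(vruntime, key=lambda n: (vruntime[n], n))
        received[name] += 1
        vruntime[name] += Fraction(1, weights[name])  # heavier tasks age slower
    return received


print(fair_share({"gui": 3, "batch": 1}, 8))
```

Over eight slices, the task with weight 3 receives three times as many slices as the task with weight 1. The allocation is proportional in the long run, but the model gives no a priori bound on when a particular task runs, which is the predictability concern raised above.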
In contrast, AQuoSA is under active development (Cucinotta, 2011). Similar to Linux/RK, QLinux, and Linux-SRT, AQuoSA’s chief research goal is to provide soft QoS guarantees to time-sensitive applications (e.g., video conferencing) based on reservations. In addition, AQuoSA incorporates adaptive scheduling techniques to adjust reservation parameters dynamically (Cucinotta et al., 2004). AQuoSA further implements a bandwidth inheritance protocol, which applies the idea of priority inheritance to reservation-based scheduling (Lamastra et al., 2001). While AQuoSA was a uniprocessor effort during the first couple of years of the project, it recently gained multiprocessor support (Checconi et al., 2009).
The uniprocessor reservation scheduler in AQuoSA is based on EDF. In later work, Faggioli et al. (2009a,b) developed SCHED_DEADLINE, a standalone, multiprocessor-capable EDF implementation with the stated purpose of inclusion in mainline Linux, which is a major engineering undertaking. (In contrast, LITMUS^RT is not designed to be merged into Linux.) Together with Linux’s affinity mask support, this patch adds G-EDF, C-EDF, and P-EDF to Linux and thus would be a significant extension of Linux’s algorithmic real-time capabilities. While key kernel developers have voiced their support (Corbet, 2010), the SCHED_DEADLINE patch has not been accepted into mainline Linux to date. From a predictability point of view, there are some concerns with Linux’s implementation of affinity masks in the context of global and clustered scheduling; these also affect SCHED_DEADLINE and are discussed in more detail in Section 3.2.4.
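How a single EDF policy combined with processor restrictions yields global, clustered, and partitioned EDF can be sketched as follows. The function is an abstract model in which a cluster index stands in for an affinity mask; it is not based on the SCHED_DEADLINE code, and all names are hypothetical.

```python
def clustered_edf(jobs, num_cpus, cluster_size):
    """Select the jobs to run at one scheduling instant under clustered EDF.

    jobs: list of (name, absolute_deadline, cluster_index).
    Returns {cluster_index: [scheduled job names, earliest deadline first]}.
    cluster_size == num_cpus models G-EDF; cluster_size == 1 models P-EDF.
    """
    if num_cpus % cluster_size != 0:
        raise ValueError("cluster_size must divide num_cpus")
    chosen = {c: [] for c in range(num_cpus // cluster_size)}
    for name, _deadline, cluster in sorted(jobs, key=lambda j: (j[1], j[0])):
        if len(chosen[cluster]) < cluster_size:   # a CPU of this cluster is free
            chosen[cluster].append(name)
    return chosen


deadlines = {"a": 10, "b": 12, "c": 15, "d": 20, "e": 25}

# G-EDF: one cluster containing all four processors.
global_jobs = [(n, d, 0) for n, d in deadlines.items()]
print(clustered_edf(global_jobs, num_cpus=4, cluster_size=4))

# P-EDF: four clusters of one processor; "a" and "b" share processor 0.
placement = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 3}
part_jobs = [(n, d, placement[n]) for n, d in deadlines.items()]
print(clustered_edf(part_jobs, num_cpus=4, cluster_size=1))
```

In the partitioned case, job "b" is not scheduled even though its deadline is earlier than those of "c", "d", and "e", because it is confined to the processor that "a" occupies.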
Finally, in very recent work, Kato et al. (2010) presented the Advanced Interactive Real-Time Scheduler (AIRS) for Linux. AIRS is based on SCHED_DEADLINE but adds a semi-partitioned EDF variant called EDF-WM. In a semi-partitioned algorithm (Anderson et al., 2005), most tasks are statically assigned to a processor (as under partitioning) and only a few tasks migrate (as under global scheduling). Like AQuoSA, QLinux, Linux-SRT, and Linux/RK, AIRS is based on resource reservations and aims to enable high QoS levels for “soft” interactive workloads such as video playback on desktop systems (Kato et al., 2010). Besides processor scheduling, recent versions of AIRS also support memory reservations (Kato et al., 2011). On a related note, Bastoni (2011) recently presented an implementation and evaluation of EDF-WM and several other semi-partitioned schedulers in LITMUS^RT (Bastoni et al., 2011; Bastoni, 2011).
28 In contrast to most other published proportional-share fair schedulers, upper and lower bounds on lag under CFS are unknown; the name “completely fair” is thus not necessarily descriptive of its properties.
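The general idea of semi-partitioning, in which most tasks are fixed and only a few migrate, can be illustrated with a generic utilization-based assignment. The sketch below is explicitly not EDF-WM: it uses first-fit placement and splits a task only when it fits on no single processor, and it ignores the additional conditions a real algorithm must enforce (e.g., that the shares of a split task never execute in parallel). All names are hypothetical.

```python
from fractions import Fraction


def semi_partition(utilizations, num_cpus):
    """Assign tasks first-fit; split a task only if it fits on no single CPU.

    utilizations: {task_name: utilization}, given as Fraction-compatible values.
    Returns {task_name: [(cpu, share), ...]}; one entry means a fixed task.
    """
    spare = [Fraction(1)] * num_cpus
    placement = {}
    for task, value in utilizations.items():
        u = Fraction(value)
        home = next((c for c in range(num_cpus) if spare[c] >= u), None)
        if home is not None:                 # fixed task, as under partitioning
            spare[home] -= u
            placement[task] = [(home, u)]
            continue
        if sum(spare) < u:
            raise ValueError(f"insufficient capacity for task {task!r}")
        parts = []                           # migratory task: split its share
        for c in range(num_cpus):
            share = min(spare[c], u)
            if share > 0:
                parts.append((c, share))
                spare[c] -= share
                u -= share
        placement[task] = parts
    return placement


result = semi_partition({"t1": "3/5", "t2": "3/5", "t3": "3/5"}, num_cpus=2)
for task, parts in result.items():
    print(task, [(cpu, str(share)) for cpu, share in parts])
```

With three tasks of utilization 3/5 on two processors, pure partitioning fails because no processor can host two tasks, whereas the sketch fixes two tasks and splits the third across both processors.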
Summary. This concludes our review of prior work and illustrates why we developed LITMUS^RT. Global JLFP or JLDP schedulers such as G-EDF or PD^2 are not supported by any of the discussed RTOSs and Linux variants (with the exception of SCHED_DEADLINE, which has only recently become available). Further, global scheduling is only supported by means of affinity masks (if at all), which are ill-suited to implementing policies with frequent migrations such as PD^2 (as discussed in Section 3.3.4). None of the major RTOSs implement multiprocessor locking protocols for partitioned scheduling for which pi-blocking analysis has been derived (in the published literature). Additionally, in 2006, when the development of LITMUS^RT commenced, most prior academic real-time Linux extensions were no longer being maintained. To study multiprocessor scheduling and locking in a real OS, development of a new, extensible testbed was thus required.
In the next chapter, we discuss our native real-time Linux extension LITMUS^RT, how it implements the sporadic task model and schedulers described in this chapter, and how to incorporate runtime overheads that arise in LITMUS^RT into the idealized sporadic task model and associated schedulability analysis.
CHAPTER 3
THEORY, PRACTICE, AND OVERHEADS
In this chapter, we present our work aimed at reconciling practice and theory in multiprocessor real-time systems. In Section 3.1, we revisit the assumptions underlying the real-time theory reviewed in Chapter 2, discuss why some of them are problematic in practice, and argue in favor of a realistic compromise between pure analysis and practical limitations. In Section 3.2, we review the Linux foundation underlying LITMUS^RT. Thereafter, in Section 3.3, we present LITMUS^RT, which was designed to closely correspond to the sporadic task model, and discuss solutions to implementation problems that we faced when building LITMUS^RT. In Sections 3.4 and 3.5, we detail how to account for runtime overheads during schedulability analysis and finally present overhead-aware versions of the schedulability tests reviewed in Chapter 2.