TABLE 4.8: Cost analysis
            Identical   Sequential  Random
Average     0.1352      0.1353      0.1353
Min         0.1351      0.1352      0.1352
Max         0.1360      0.1361      0.1361
Median      0.1352      0.1353      0.1353
StdDev      0.000098    0.000088    0.000100

4.12 Conclusion
At the outset of this research, we identified a clear disconnect between the prolonged duration of SMI-based measurements from proposed SMM-RIMMs and the SMI latency guidelines. We believed that it might be possible to reduce the amount of time spent in SMM by decomposing SMM-RIMM measurements and processing them in fine-grained portions over a larger number of measurements. These smaller measurements could be scheduled at an
Chapter 4. Creation of Methodology for SMI Performance Measurement 80
increased frequency. However, it was unclear what impact this shorter but more frequent measurement approach would have on the system.
Moreover, there were no established methodologies to compare this alternate scheduling approach to the current state of the art. To resolve this challenge, we developed methodologies for generating SMIs of varying durations and frequencies and analyzing their resulting impact. Our methodology is built upon three requirements: Rquantify, Rcontrol, and Rvalidate. Once able to quantify the time spent in SMM (Rquantify), we demonstrated control over the amount of time spent in SMM (Rcontrol) using four SMI scheduling techniques. We then validated the time spent in SMM (Rvalidate) by comparing the quantified time in SMM against the expected degradation for two CPU-intensive workloads.
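The decomposition idea discussed in this conclusion can be illustrated with a minimal sketch. It assumes the measurement is a hash over a memory region and uses SHA-256 as a stand-in; the function names, the portion size, and the choice of hash are hypothetical and are not taken from the SMM-RIMMs discussed here. Each loop iteration stands in for one short SMI, and the sketch only shows that hashing in portions yields the same digest as one long pass.

```python
import hashlib
from typing import Tuple


def measure_in_one_pass(region: bytes) -> str:
    """Hash the whole region at once, as a single long measurement would."""
    return hashlib.sha256(region).hexdigest()


def measure_in_portions(region: bytes, portion_size: int) -> Tuple[str, int]:
    """Hash the region one portion at a time.

    Each loop iteration stands in for one short SMI; the hash state is
    carried over between iterations. Returns the digest and the number
    of portions processed.
    """
    if portion_size <= 0:
        raise ValueError("portion_size must be positive")
    state = hashlib.sha256()
    portions = 0
    for offset in range(0, len(region), portion_size):
        state.update(region[offset:offset + portion_size])
        portions += 1
    return state.hexdigest(), portions


if __name__ == "__main__":
    region = bytes(range(256)) * 4096  # 1 MiB of stand-in data
    digest, count = measure_in_portions(region, portion_size=4096)
    print(count, digest == measure_in_one_pass(region))
```

The sketch says nothing about system impact; it only shows that the measurement result itself is unaffected by the split, which is what makes the scheduling question worth studying.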
5
SMI Preemption Performance Study
In this chapter, we characterize the impacts of SMIs on the system by using our SMI measurement methodology. In our performance study, we varied the durations and frequencies of SMIs to show the resulting impacts from different SMI scheduling approaches. In Section 5.1, we examine the system impacts of this time spent in SMM and we cover the resulting impacts on applications in Section 5.2. We summarize the SMI latency study in Section 5.3.
5.1 System-level Effects
We begin in Section 5.1.1 by examining timing assumptions in the kernel and device drivers. We show the symptoms of spending an excessive amount of time in SMM in Section 5.1.2. We then examine the impacts of SMIs on timer interrupts and the impact on CPU power C-states in Section 5.1.3. We cover the impacts of SMIs on process accounting in Section 5.1.4. We summarize the system-level effects in Section 5.1.5.
5.1.1 Timing Expectations in Code
The Linux kernel source code contains assumptions about SMI durations in several places. For example, native_calibrate_tsc, the function that calibrates the CPU’s TSC during boot, uses the tsc_read_refs function, which has special handling for SMI disturbances. tsc_read_refs takes two closely spaced reads of the CPU’s timestamp counter and checks that the difference between them is less than the declared SMI_THRESHOLD=50000 (CPU clocks), which guards against an SMI occurring between the two reads. If the system cannot obtain two reads of the TSC less than SMI_THRESHOLD apart, it will try up to five times before returning [55]. Prolonged or inopportune SMIs could result in a situation where the TSC could not be used as the clocksource for timing due
Chapter 5. SMI Preemption Performance Study 82
to an inability to properly calibrate it. Other clocksource calibration sections of the Linux kernel reflect similar concerns about an SMI hitting during calibration, including the functions pit_calibrate_tsc and hpet_next_event.
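The two-read check described above can be sketched as follows. This is an illustrative model in Python, not the kernel's C code: the names read_refs, scripted, and MAX_TRIES are hypothetical, the counters are injected as plain callables so the logic can be exercised without hardware, and returning None on failure is a choice made for this sketch only.

```python
from typing import Callable, Iterable, Optional, Tuple

SMI_THRESHOLD = 50_000  # CPU clocks; the value quoted in the text
MAX_TRIES = 5           # "up to five times" in the text


def read_refs(read_tsc: Callable[[], int],
              read_ref: Callable[[], int]) -> Optional[Tuple[int, int]]:
    """Return (tsc, ref) from an undisturbed sample, or None if none was found.

    The reference clock is read between two TSC reads. If the two TSC reads
    are SMI_THRESHOLD clocks or more apart, something (such as an SMI)
    intervened, so the sample is discarded and the read is retried.
    """
    for _ in range(MAX_TRIES):
        t1 = read_tsc()
        ref = read_ref()
        t2 = read_tsc()
        if t2 - t1 < SMI_THRESHOLD:
            return t2, ref
    return None


def scripted(values: Iterable[int]) -> Callable[[], int]:
    """Hypothetical helper: a reader that returns the given values in order."""
    it = iter(values)
    return lambda: next(it)


if __name__ == "__main__":
    # The first pair of TSC reads is 80,000 clocks apart (as if an SMI hit),
    # so that sample is rejected and the second attempt succeeds.
    tsc = scripted([0, 80_000, 80_100, 80_300])
    ref = scripted([10, 11])
    print(read_refs(tsc, ref))
```

In this model, an SMI that lands inside every one of the five attempts leaves the caller without a usable sample, which corresponds to the calibration failure described above.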
USB audio relies upon careful timing to keep playback synchronized. To better establish an upper bound for SMI durations, we developed an experiment to study the impact of prolonged SMIs on a USB audio speaker device. For this experiment, we used a pair of Logitech S-150 USB speakers and a system running CentOS 6.0 with a Linux 3.7.1 kernel. We booted into the GUI and began playing a streaming audio file from YouTube. While playing the audio file, we generated progressively longer SMIs using our modified BIOS mechanism, checking the system log via the ‘dmesg’ command after each SMI completed.
We observed a number of warnings with SMI durations up to 1000 ms (1 second). At an SMI duration of 1000 ms, the audio stopped completely and the driver reported an error in the system log. Table 5.1 shows these results. The warnings we saw resulted from the snd_pcm_delay function, which defines the playback delay as "the overall latency from the write call to the final DAC [94]." The code issues a warning when the delay estimate is off by more than 2 ms. This measurement provides another example of software with built-in timing assumptions that could be significantly altered by longer SMM-RIMM measurements. We also note that the SMI durations described in the HyperSentry and HyperCheck papers fall in the range that causes warnings from the ALSA sound subsystem.
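The 2 ms rule mentioned above can be expressed as a small check. The sketch below is an assumption-laden illustration rather than the driver's code: it assumes the "estimated" and "actual" values reported in Table 5.1 are counts of audio frames and that the stream runs at 48 kHz, neither of which is stated in the text. The function names are hypothetical.

```python
ASSUMED_RATE_HZ = 48_000  # assumption: sample rate of the stream, not given in the text


def frames_to_ms(frames: int, rate_hz: int = ASSUMED_RATE_HZ) -> float:
    """Convert a number of audio frames to milliseconds at the given rate."""
    return frames * 1000 / rate_hz


def delay_estimate_is_off(estimated_frames: int, actual_frames: int,
                          rate_hz: int = ASSUMED_RATE_HZ,
                          tolerance_ms: int = 2) -> bool:
    """True when the two delay values differ by more than tolerance_ms.

    Integer arithmetic is used for the comparison so that values exactly
    at the tolerance are not affected by floating-point rounding.
    """
    return abs(estimated_frames - actual_frames) * 1000 > rate_hz * tolerance_ms


if __name__ == "__main__":
    for estimated in (144, 336, 384):  # "estimated" values from Table 5.1
        print(estimated, frames_to_ms(estimated), delay_estimate_is_off(estimated, 0))
```

Under these assumptions, an estimate of 144 frames against an actual value of 0 corresponds to a 3 ms discrepancy, which exceeds the 2 ms tolerance and would produce a warning.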
TABLE 5.1: USB Audio Sensitivity to Prolonged SMI Delays
SMM time (ms)   Warning
1.43            ALSA sound/usb/pcm.c:1213 delay: estimated 144, actual 0
5-999           ALSA sound/usb/pcm.c:1213 delay: estimated [336 to 384], actual 0
1000            ALSA sound/usb/endpoint.c:391 cannot submit urb (err = -27)

5.1.2 Symptoms of Excessive Time Spent in SMM
While building our SMM-RIMM development system, we encountered situations where we spent too long in a single SMI session. For example, with extensive serial output and a large hash operation, one of our preemptions inadvertently lasted 247 seconds. The Linux operating system appeared frozen for this duration; at the completion of the SMI, however, the system continued operating, albeit with several warnings and errors. These included warnings from the Read Copy Update mechanism, warnings about an unstable clocksource, a hardware interrupt timeout, and a disk I/O error. Clearly, this is far too long to spend in SMM, but it is indicative of the types of errors that can occur if excessive time is spent in a single SMI session. It also demonstrates that even if a user is not using the system, exhaustive checking within a single SMI can easily overwhelm timeout expectations in the kernel and device drivers.
INFO: rcu_sched self-detected stall on CPU { INFO: rcu_sched self-detected stall on CPU { 1} 0}
(t=61790 jiffies g=545203 c=545202 q=0) (t=61790 jiffies g=545203 c=545202 q=0)
. . .
Clocksource tsc unstable (delta = 243722605708 ns)
mmc2: Timeout waiting for hardware interrupt.
. . .
end_request: I/O error, dev mmcblk0, sector 90111920
EXT4-fs warning (device mmcblk0p2): ext4_end_bio:317: I/O error -5 writing to inode 3019625 (offset 0 size 0 starting block 11263991)
Buffer I/O error on device mmcblk0p2, logical block 11132662
Switched to clocksource acpi_pm
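The figures in the log output above can be cross-checked against the 247-second SMI with a short calculation. The clocksource delta is reported in nanoseconds and converts directly. The RCU stall time is reported in jiffies, so converting it requires the kernel tick rate, which the text does not state; the value of 250 Hz used below is an assumption, chosen because it is one of the common tick rates.

```python
ASSUMED_HZ = 250  # assumption: kernel tick rate; not stated in the text


def jiffies_to_seconds(jiffies: int, hz: int = ASSUMED_HZ) -> float:
    """Convert a jiffy count to seconds for a given tick rate."""
    return jiffies / hz


def nanoseconds_to_seconds(ns: int) -> float:
    """Convert nanoseconds to seconds."""
    return ns / 1_000_000_000


if __name__ == "__main__":
    stall = jiffies_to_seconds(61790)             # t= value in the rcu_sched line
    delta = nanoseconds_to_seconds(243722605708)  # delta in the clocksource line
    print(f"RCU stall: {stall:.2f} s, clocksource delta: {delta:.2f} s")
```

With the assumed tick rate, the stall works out to about 247.16 s and the clocksource delta to about 243.72 s, both on the order of the 247-second SMI. A different tick rate would change the first figure proportionally.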
5.1.3 Timer Interrupt Effects
We originally began our investigation of the system effects of SMIs by focusing on the Linux timer interrupt. Our rationale was that this interrupt effectively drove a wide variety of system tasks ranging from process accounting to setting timers. We were very interested to examine the effects of SMIs on this critical kernel functionality as SMIs would take precedence over the timer interrupts.
5.1.3.1 Timer Interrupt Background
Traditionally, many important scheduling and statistical operations in the Linux kernel happened on a regular timer tick interval, e.g. [100, 250, 300, 1000] times a second. To save power and reduce virtualization overhead, the "tickless kernel" option was added, allowing the kernel to remain idle longer by avoiding unnecessary wake-ups. If the next scheduled timer event would occur after the next periodic timer tick, the kernel reprograms "the per-CPU clock event device to this future event", allowing the CPU to remain idle longer [34]. In both traditional and tickless operation, our inspection of the Linux 3.1.4 kernel showed that once the kernel wakes, it runs several key functions in do_timer, which update the kernel’s internal clock count (jiffy) and wall clock time and calculate the load on the system. (Refer to Figure 5.1.) It then calls update_process_times, which charges time to executing processes, runs high resolution timers and raises SoftIRQs for local timers, checks whether the system is in a quiescent state for RCU callbacks, handles pending printk statements, runs IRQ work, calls scheduler_tick, and then runs timers that are due [72]. The scheduler_tick function performs several important tasks, including updating scheduler timestamp data, updating timestamps for processes on the run queue, updating CPU load statistics based on the run queue, invoking the scheduler, updating performance events for the Linux Performance Event subsystem, determining whether a CPU is idle at the clock tick, and load balancing tasks between CPU run queues. Intel technical documentation notes "All interrupts normally handled by the operating system are disabled upon entry into SMM [45]." This makes it possible for an SMI to perturb timer interrupts and, as a side effect, impact the important scheduling operations in scheduler_tick.
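To give a sense of scale for the interaction between SMI duration and the timer tick, the following sketch counts how many whole tick periods fit inside a single SMI at each of the tick rates listed above. It is a deliberately simple model: it assumes periodic (non-tickless) operation and that no timer interrupt is serviced for the full duration of the SMI, and it does not model how pending interrupts are delivered once the CPU leaves SMM. The function name is hypothetical.

```python
TICK_RATES_HZ = (100, 250, 300, 1000)  # tick rates listed in the text


def tick_periods_elapsed(smi_us: int, hz: int) -> int:
    """Whole timer-tick periods that fit inside an SMI lasting smi_us microseconds."""
    if smi_us < 0 or hz <= 0:
        raise ValueError("smi_us must be non-negative and hz positive")
    return smi_us * hz // 1_000_000


if __name__ == "__main__":
    for smi_ms in (1, 5, 1000):
        counts = {hz: tick_periods_elapsed(smi_ms * 1000, hz) for hz in TICK_RATES_HZ}
        print(f"{smi_ms} ms SMI: {counts}")
```

Under this model, a 5 ms SMI spans no full tick period at 100 Hz but five at 1000 Hz, so the same SMI duration can be nearly invisible to the tick on one kernel configuration and span several ticks on another.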