Cache-Aware Compositional Analysis of
Real-Time Multicore Virtualization Platforms
Meng Xu, Linh T.X. Phan, Insup Lee, Oleg Sokolsky, Sisu Xi, Chenyang Lu and Christopher D. Gill
Complex Systems on Multicore Platforms
• Embedded systems
– Become more and more complex
– Consist of multiple sub-systems
• Multicore platforms
– Number of cores keeps increasing
Sources: http://www.codeproject.com/Articles/16165/Robotics-Embedded-Systems-Part-I; International Technology Roadmap for Semiconductors
• The benefits of virtualization
– Consolidate legacy systems
– Integrate large, complex systems
Virtualization
[Figure: a virtual machine monitor hosting VM 0, VM 1 and VM 2, each with a guest OS and its VCPUs, running on four CPUs, each with its own cache]
Compositional Analysis for RT Guarantees
• Step 1: Abstract each component (VM) into an interface
• Step 2: Transform each interface into a set of VCPUs
• Step 3: Abstract the VCPUs of all VMs to the system’s interface
[Figure: VM 0, VM 1 and VM 2 on a virtual machine monitor over four CPUs with caches; Interface 0, Interface 1 and Interface 2 abstract the VMs, and the interface of the system abstracts the VCPUs of all VMs]
Limitations of Existing Multicore Compositional Analysis
• Existing multicore compositional analysis does not consider platform overhead
• In practice, platform overhead is not negligible
– Example: cache overhead
• Result: unsafe analysis!
– Reason: the analysis does not consider the effect of cache overhead in virtualization and under-estimates the resource requirement
– Examples: cache overhead due to task preemption,
Contributions
• Introduce overhead-free compositional analysis
– DMPR: improved MPR resource model
• Quantify events that cause cache overhead
– Task-preemption events, VCPU-preemption events, VCPU-completion events
• Propose cache-aware compositional analysis
– Hybrid analysis: combination of task-centric analysis and model-centric analysis
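Deterministic Multi-Processor Resource Model (DMPR)
• DMPR interface: $\mu = \langle \Pi, \Theta, m \rangle$
– $m$ full VCPUs (i.e., with bandwidth 1)
– One partial VCPU, with period $\Pi$ and budget $\Theta$
• Interface bandwidth $= \Theta / \Pi + m$
• Example: $\mu = \langle 5, 1, 2 \rangle$ has two full VCPUs and one partial VCPU
[Figure: worst-case resource supply of a DMPR over time t for $\mu = \langle 5, 1, 2 \rangle$, showing VP1, VP2 and VP3]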
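The bandwidth formula can be checked with a few lines of Python. This is only an illustrative sketch: the class name `DMPR` and its field names are assumptions made here, not identifiers from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DMPR:
    """DMPR interface mu = <Pi, Theta, m> (field names are illustrative)."""
    period: float  # Pi: period of the partial VCPU
    budget: float  # Theta: budget of the partial VCPU in each period
    m: int         # number of full VCPUs, each with bandwidth 1

    def bandwidth(self) -> float:
        # Interface bandwidth = Theta / Pi + m
        return self.budget / self.period + self.m


# The slide's example: mu = <5, 1, 2>
example = DMPR(period=5, budget=1, m=2)
print(example.bandwidth())
```

The partial VCPU contributes the fractional part of the bandwidth and each full VCPU contributes exactly 1.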
Assumptions
• Each core has a private cache; no shared cache
• Period of each component’s interface is given by designers
• Maximum cache overhead per task preemption or migration in the system is upper bounded by $\Delta^{crpmd}$
• Virtual machine monitor uses hybrid EDF (hEDF)
[Figure: hEDF scheduling of VCPUs on cpu1 to cpu4: VP1 and VP3 are pinned to cores; VP2, VP4 and VP5 are scheduled by gEDF]
Outline
• Introduction
• Events that cause cache overhead
• Cache-aware compositional analysis
• Evaluation
Event 1: Task Preemption Event
Definition: A task-preemption event happens when a task
preempts another task within the same VM.
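Example: task set $\tau = \{\tau_1, \tau_2, \tau_3\}$ with $\tau_1 = (12, 5)$, $\tau_2 = (8, 5)$ and $\tau_3 = (4, 3)$, scheduled by gEDF on cpu1 and cpu2.
[Figure: schedule of $\tau_1$, $\tau_2$ and $\tau_3$ on cpu1 and cpu2 over time t, with the task priority order and the cache overhead caused by a task-preemption event marked]
Event 2: VCPU-Preemption Event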
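Definition: A VCPU-preemption event occurs when a VCPU is preempted by another VCPU of another VM.
Example:
(a) VMs’ configuration
– $C_1$: tasks $\tau_1, \tau_2, \tau_3$ under gEDF; interface $\mu_1 = \langle 5, 3, 1 \rangle$; VCPUs VP1 (full) and VP2 (partial)
– $C_2$: tasks $\tau_4, \tau_5, \tau_6$ under gEDF; interface $\mu_2 = \langle 8, 3, 1 \rangle$; VCPUs VP3 (full) and VP4 (partial)
– $C_3$: tasks $\tau_7, \tau_8$ under gEDF; interface $\mu_3 = \langle 6, 4, 0 \rangle$; VCPU VP5 (partial)
(b) VCPUs’ configuration
– hEDF: the full VCPUs VP1 and VP3 are pinned to cores; the partial VCPUs VP2, VP4 and VP5 are scheduled by gEDF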
Event 2: VCPU-Preemption Event
(c) Scheduling of partial VCPUs
[Figure: gEDF schedule of the partial VCPUs VP2(5,3), VP5(6,4) and VP4(8,3) over t = 0 to 8]
(d) Cache overhead of tasks in component 2
[Figure: gEDF schedule of $\tau_5(6,2)$, $\tau_4(8,4)$ and $\tau_6(10,1.5)$ of $C_2$ on VP3 and VP4 over t = 0 to 8, marking the intervals where a VCPU is unavailable and the cache overhead caused by a VCPU-preemption event]
Event 3: VCPU-Completion Event
Definition: A VCPU-completion event of a VCPU happens when the VCPU exhausts its budget in a period and stops its execution.
Example: component C schedules $\tau_2(6,2)$, $\tau_1(8,4)$ and $\tau_3(10,1.5)$ under gEDF on two VCPUs, VP1 and VP2: one full VCPU and one partial VCPU (4,2).
[Figure: schedule of $\tau_1$, $\tau_2$ and $\tau_3$ on VP1 and VP2 over t = 0 to 8, marking the intervals where a VCPU is unavailable and the cache overhead caused by a VCPU-completion event]
Outline
• Introduction
• Events that cause cache overhead
• Cache-aware compositional analysis
• Evaluation
Task-Centric Analysis
• Task-preemption event
– Inflate higher priority task with one cache overhead
– $e'_k = e_k + \Delta^{crpmd}$
• VCPU-preemption/completion event
– Inflate each task with the cache overhead caused by the VCPU-preemption/completion events during the task’s period
– $e'_k = e_k + \Delta^{crpmd} (N^{2}_{VP_i,\tau_k} + N^{3}_{VP_i,\tau_k})$, where $N^{2}_{VP_i,\tau_k} + N^{3}_{VP_i,\tau_k}$ is the number of VCPU-preemption/completion events of $VP_i$ during a period of task $\tau_k$
[Figure: (a) task-preemption event overhead: $\tau_k$ and $\tau_j$, with one $\Delta^{crpmd}$ charged; (b) VCPU-preemption event overhead during a period of task $\tau_k$]
Task-Centric Analysis
• Inflated WCET of each task:
$e'_k = e_k + \Delta^{crpmd} + \Delta^{crpmd} (N^{2}_{VP_i,\tau_k} + N^{3}_{VP_i,\tau_k})$
• System is schedulable under cache overhead if the inflated workload is schedulable
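The inflation step is simple arithmetic: one overhead for the task-preemption event, plus one overhead per VCPU-preemption/completion event in the task's period. The sketch below illustrates this under stated assumptions: the function name, parameter names and sample numbers are hypothetical, and computing the event counts themselves is left to the paper.

```python
def inflated_wcet(e_k, delta_crpmd, n_vcpu_preemptions, n_vcpu_completions):
    """Task-centric inflation: e'_k = e_k + D + D * (N2 + N3).

    e_k                 -- original WCET of task k
    delta_crpmd         -- upper bound on one cache overhead (D)
    n_vcpu_preemptions  -- VCPU-preemption events during one period of task k (N2)
    n_vcpu_completions  -- VCPU-completion events during one period of task k (N3)
    """
    task_preemption_overhead = delta_crpmd
    vcpu_event_overhead = delta_crpmd * (n_vcpu_preemptions + n_vcpu_completions)
    return e_k + task_preemption_overhead + vcpu_event_overhead


# Hypothetical numbers, not taken from the slides:
print(inflated_wcet(e_k=5.0, delta_crpmd=0.5, n_vcpu_preemptions=2, n_vcpu_completions=1))
```

The inflated task set is then passed to the overhead-free schedulability test unchanged.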
Pessimistic When Number of Tasks Is Large
Cache overhead in VCPU-completion event
• Only two tasks have cache overhead in a VCPU-preemption/completion event
– But don’t know which two tasks have cache overhead
– To be safe: have to inflate all tasks’ WCET with one cache overhead per VCPU-preemption/completion event
[Figure: schedule of $\tau_2(6,2)$, $\tau_1(8,4)$ and $\tau_3(10,2.5)$ on VP1 and VP2 over t = 0 to 8, marking the intervals where a VCPU is unavailable and the resulting cache overhead. Only two tasks have cache overhead due to the event.]
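The source of this pessimism can be put in rough numbers. The sketch below is illustrative only: `per_event_overcharge` is a hypothetical helper, and it assumes, as stated on the slide, that only two tasks incur the overhead at one event while the task-centric inflation charges one overhead to every task. It ignores that tasks have different periods, so it is a rough count rather than part of the analysis.

```python
def per_event_overcharge(num_tasks, delta_crpmd, affected_tasks=2):
    """Overhead charged beyond what is incurred, for one VCPU event."""
    # Task-centric analysis inflates every task by one overhead for the event.
    charged = num_tasks * delta_crpmd
    # In practice only `affected_tasks` tasks pay the overhead.
    incurred = min(num_tasks, affected_tasks) * delta_crpmd
    return charged - incurred


# The gap grows linearly with the number of tasks in the component.
for n in (2, 5, 20):
    print(n, per_event_overcharge(n, delta_crpmd=1.9))
```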
Model-Centric Approach
• Subtract the overhead due to VCPU-preemption/completion events from the original resource supply of the interface to obtain its effective resource supply.
VCPU-preemption/ completion event overhead
Task-preemption event overhead
Effective SBF of DMPR Interface
Effective SBF of the interface = Effective SBF of the partial VCPU + Effective SBF of m full VCPUs
Reason: A DMPR interface provides resource with one partial VCPU and m full VCPUs
Worst Case Scenario of Effective Resource Supply of Partial VCPU
[Figure: worst-case effective resource supply of the partial VCPU $VP_i$ over a time interval of length t, with time points t1 to t8]
The worst case happens when:
(1) The partial VCPU has all VCPU-preemption/completion events
(2) The partial VCPU incurs the overhead as late as possible in the first period and as early as possible in the rest of periods
(3) The time interval t begins when the VCPU finishes supplying its effective resource in the first period.
Maximum number of VCPU-preemption/completion events during a partial VCPU’s period is computed in the paper
Effective Resource Supply of Partial VCPU
$$SBF^{stop}_{VP_i}(t) = \begin{cases} y\,\Theta^* + \max\{0,\; t - x - y\,\Pi - z\} & \text{if } \Theta^* \neq 0 \\ 0 & \text{if } \Theta^* = 0 \end{cases}$$
where $VP_i$ belongs to interface $\mu = \langle \Pi, \Theta, m \rangle$, and
$\Theta^* = \max\{0,\; \Theta - N^{stop}_{VP_i} \Delta^{crpmd}\}$, $x = \Pi - \Delta^{crpmd} - \Theta^*$, $y = \lfloor (t - x) / \Pi \rfloor$ and $z = \Pi - \Theta^*$
[Figure: worst-case effective resource supply of the partial VCPU $VP_i$, showing $\Theta^*$ and $x$ over time points t1 to t8]
Effective SBF of The Interface
Effective SBF of the interface = Effective SBF of the partial VCPU + Effective SBF of m full VCPUs
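The partial VCPU's term in this sum, $SBF^{stop}_{VP_i}(t)$ with $\Theta^* = \max\{0, \Theta - N^{stop}_{VP_i}\Delta^{crpmd}\}$, $x = \Pi - \Delta^{crpmd} - \Theta^*$, $y = \lfloor (t - x)/\Pi \rfloor$ and $z = \Pi - \Theta^*$, can be evaluated numerically. This is a minimal sketch under stated assumptions: function and parameter names are hypothetical, `n_stop` must be supplied by the caller (its bound is computed in the paper), and clamping `y` at 0 for intervals shorter than `x` is an assumption added here so the result is never negative. The effective SBF of the m full VCPUs is not covered.

```python
import math


def effective_budget(theta, n_stop, delta_crpmd):
    """Theta* = max(0, Theta - N_stop * D): budget left after cache overhead."""
    return max(0.0, theta - n_stop * delta_crpmd)


def sbf_stop(t, period, theta, n_stop, delta_crpmd):
    """Worst-case effective supply of a partial VCPU (period, theta) in any
    interval of length t, given n_stop VCPU-preemption/completion events per
    period and an overhead bound delta_crpmd."""
    theta_star = effective_budget(theta, n_stop, delta_crpmd)
    if theta_star == 0:
        return 0.0
    x = period - delta_crpmd - theta_star
    # Assumption: clamp y at 0 so that short intervals yield zero supply.
    y = max(0, math.floor((t - x) / period))
    z = period - theta_star
    return y * theta_star + max(0.0, t - x - y * period - z)


# Hypothetical partial VCPU: period 10, budget 5, one event per period, D = 1
for t in (0, 13, 15, 23):
    print(t, sbf_stop(t, period=10, theta=5, n_stop=1, delta_crpmd=1))
```

With these hypothetical numbers the effective budget is 4 per period, and no supply is guaranteed until the interval exceeds x + z = 11.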
Model-Centric Analysis
Example: component C schedules tasks $\tau_1 = \dots = \tau_5 = (20, 5)$ under gEDF, with interface $\mu = \langle 10, 8.5, 1 \rangle$
• Step 1: Consider task-preemption event overhead
• Step 2: Consider VCPU-preemption/completion event overhead
• Step 3: Check if
Pessimistic When Number of Full VCPUs Is Large
• Only one full VCPU is affected per VCPU-preemption/completion event in practice
• But all full VCPUs are marked unavailable at a VCPU-preemption/completion event when we compute the effective SBF of m full VCPUs
[Figure: timelines of VP1 to VP4 over time points t1 to t7, with CRPMD intervals marked]
Task-Centric vs. Model-Centric
• Neither of these two analyses dominates the other
Setup for both examples: system C (hEDF, period = 5) with components $C_1$ (gEDF, period = 20) and $C_2$ (gEDF, period = 50); $\Delta^{crpmd} = 2$
Example 1: $C_1$ has $\tau_1 = (100, 50)$; $C_2$ has $\tau_2 = \dots = \tau_5 = (100, 50)$
– Bandwidth of task-centric analysis: 4.94
– Bandwidth of model-centric analysis: 6.90
– Task-centric is better
Example 2: $C_1$ has $\tau_1 = (100, 25)$; $C_2$ has $\tau_2 = \dots = \tau_5 = (100, 25)$
– Bandwidth of task-centric analysis: 3.82
– Bandwidth of model-centric analysis: 2.86
– Model-centric is better
Hybrid Cache-Aware Analysis
[Figure: system C (hEDF, period = 5) with gEDF components $C_1$ (period = 20) and $C_2$ (period = 50); all tasks are (100, 25)]
Task-centric analysis: $\mu_1 = \langle 20, 9.8, 0 \rangle$, $\mu_2 = \langle 50, 36.1, 2 \rangle$, $\mu = \langle 5, 4.1, 3 \rangle$; bandwidth: 3.82
Model-centric analysis: $\mu_1 = \langle 20, 8.8, 0 \rangle$, $\mu_2 = \langle 50, 39.7, 1 \rangle$, $\mu = \langle 5, 4.3, 2 \rangle$; bandwidth: 2.86
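The two bandwidth figures follow from the system-level interfaces via bandwidth = Θ/Π + m. The sketch below only reproduces that arithmetic and compares the two results; the function and variable names are hypothetical, and it does not implement the hybrid analysis itself.

```python
def bandwidth(period, budget, m):
    """DMPR interface bandwidth: Theta / Pi + m."""
    return budget / period + m


# System-level interfaces <Pi, Theta, m> from the slide
task_centric_system = (5, 4.1, 3)
model_centric_system = (5, 4.3, 2)

bw_task = round(bandwidth(*task_centric_system), 2)
bw_model = round(bandwidth(*model_centric_system), 2)
print("task-centric:", bw_task)
print("model-centric:", bw_model)
print("difference:", round(bw_task - bw_model, 2))
```

In this example the model-centric interface needs one fewer full VCPU, which outweighs its slightly larger partial-VCPU budget.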
Outline
• Introduction
• Events that cause cache overhead
• Cache-aware compositional analysis
• Evaluation
Experimental Setup
• Hardware: Dell Optiplex-980 quad-core workstation (3 cores for guest VMs, 1 core for VM0)
• Platform: RT-Xen, LITMUS
• Domains $D_1$ to $D_4$, each scheduling its tasks $\tau_i^1, \dots, \tau_i^k$ under gEDF; VCPUs scheduled under hEDF
• Interface periods: $\Pi_1 = 256$, $\Pi_2 = 128$, $\Pi_3 = 64$, $\Pi_4 = 32$
• $\Delta^{crpmd} = 1.9$ (measured)
• WSS = 256KB
• Task set: utilization 1.8
• Task utilization distribution: uniformly in [0.001, 0.1]
Cache Overhead Is Not Negligible
Schedulable?               Theory   RT-Xen
MPR                        Yes      No
DMPR                       Yes      No
Cache-aware hybrid         No       No
Cache-aware task-centric   No       No
• Unsafe: a taskset claimed schedulable by the overhead-free analysis is not schedulable in practice
• Safe: the same taskset is claimed NOT schedulable by the cache-aware analysis
Simulation Setup
• Task's period: uniformly in [350ms, 850ms]
• Task's utilization:
– uniform: uniformly in [0.001, 0.1]
– light bimodal: 8/9 in [0.1, 0.4] and 1/9 in [0.5, 0.9]
– medium bimodal: 6/9 in [0.1, 0.4] and 3/9 in [0.5, 0.9]
– heavy bimodal: 4/9 in [0.1, 0.4] and 5/9 in [0.5, 0.9]
• Domains $D_1$ to $D_4$ (gEDF) under hEDF, with interface periods $\Pi_1 = 256$, $\Pi_2 = 128$, $\Pi_3 = 64$, $\Pi_4 = 32$
• $\Delta^{crpmd} = 0.9$ ms
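The task parameters above could be drawn as in the following sketch. It is not the authors' generator: the function names are hypothetical, and it assumes that periods are continuous and that utilizations are uniform within each bimodal range. How many tasks are drawn per taskset is not stated on the slide and is left to the caller.

```python
import random

# Probability of drawing from the light range [0.1, 0.4] for each bimodal mix.
LIGHT_RANGE_PROBABILITY = {"light": 8 / 9, "medium": 6 / 9, "heavy": 4 / 9}


def draw_utilization(dist, rng):
    """Draw one task utilization for dist in {uniform, light, medium, heavy}."""
    if dist == "uniform":
        return rng.uniform(0.001, 0.1)
    if rng.random() < LIGHT_RANGE_PROBABILITY[dist]:
        return rng.uniform(0.1, 0.4)
    return rng.uniform(0.5, 0.9)


def draw_task(dist, rng):
    """Return (period_ms, wcet_ms) for one task."""
    period = rng.uniform(350.0, 850.0)
    utilization = draw_utilization(dist, rng)
    return period, utilization * period


rng = random.Random(0)  # fixed seed for repeatability
tasks = [draw_task("medium", rng) for _ in range(5)]
for period, wcet in tasks:
    print(round(period, 1), round(wcet, 1))
```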
Hybrid Analysis Saves Bandwidth
Hybrid analysis saves bandwidth over task-centric analysis per taskset utilization
Hybrid approach saves bandwidth for 64% of the tasksets
$\Delta^{crpmd} / \text{average WCET} = 0.003$
Hybrid Analysis Saves Bandwidth
[Figure panels: a) bimodal-light distribution, b) bimodal-medium distribution, c) bimodal-heavy distribution; panel annotations: $\Delta^{crpmd} / \text{average WCET}$ = 0.0005, 0.0004 and 0.0003]
Hybrid analysis still saves bandwidth over task-centric analysis when the distribution of tasks’ utilization changes
Related Work
• Overhead-free compositional analysis
– S. Baruah and N. Fisher. Component-based design in multiprocessor real-time systems. In ICESS, 2009.
– A. Easwaran, I. Shin, and I. Lee. Optimal virtual cluster-based multiprocessor scheduling. Real-Time Systems, 43(1):25–59, 2009.
– H. Leontyev and J. H. Anderson. A hierarchical multiprocessor bandwidth reservation scheme with timing guarantees. In ECRTS, 2008.
– G. Lipari and E. Bini. A framework for hierarchical scheduling on multiprocessors: From application requirements to run-time allocation. In RTSS, 2010.
– E. Bini, M. Bertogna, and S. Baruah. Virtual multiprocessor platforms: Specification and use. In RTSS, 2009.
• Overhead-aware analysis in non-virtualized environments
– B. B. Brandenburg. Scheduling and Locking in Multiprocessor Real-Time Operating Systems. PhD thesis, The University of North Carolina at Chapel Hill, 2011.
• Methods of getting the cache overhead value
– A. Bastoni, B. B. Brandenburg, and J. H. Anderson. Cache-Related Preemption and Migration Delays: Empirical Approximation and Impact on Schedulability. In OSPERT, 2010.
– S. Altmeyer, R. I. Davis, and C. Maiza. Improved cache related preemption delay aware response time analysis for fixed priority preemptive systems. Real-Time Systems, 2012.
Conclusion
• Contribution
– Propose DMPR resource model
– Introduce overhead-free compositional analysis under DMPR
– Quantify events that cause cache overhead
– Propose cache-aware compositional analysis
• Future work
– Extend our method to multi-level cache hierarchy with shared cache
– Explore cache management methods to reduce the cache overhead