Cache-Aware Compositional Analysis of Real-Time Multicore Virtualization Platforms

(1)


Meng Xu, Linh T.X. Phan, Insup Lee, Oleg Sokolsky, Sisu Xi, Chenyang Lu and Christopher D. Gill

(2)

Complex Systems on Multicore Platforms

• Embedded systems

– Become more and more complex

– Consist of multiple sub-systems

• Multicore platforms

– Number of cores keeps increasing

Image sources: http://www.codeproject.com/Articles/16165/Robotics-Embedded-Systems-Part-I; International Technology Roadmap for Semiconductors

(3)

Virtualization

• The benefits of virtualization

– Consolidate legacy systems

– Integrate large, complex systems

[Figure: three VMs (VM 0, VM 1, VM 2), each running a guest OS on its own VCPUs; the virtual machine monitor schedules the VCPUs onto four CPUs, each with its own cache.]

(4)

Compositional Analysis for RT Guarantees

[Figure: the same platform (VM 0, VM 1, VM 2 on a virtual machine monitor over four CPUs with caches), with each VM abstracted as Interface 0, Interface 1, Interface 2 and the VCPUs of all VMs abstracted as the interface of the system.]

• Step 1: Abstract each component (VM) into an interface

• Step 2: Transform each interface into a set of VCPUs

• Step 3: Abstract the VCPUs of all VMs to the system’s interface

(5)

Limitations of Existing Multicore Compositional Analysis

• Existing multicore compositional analysis does not consider platform overhead

• In practice, platform overhead is not negligible

– Example: cache overhead

• Result: unsafe analysis!

– Reason: the analysis does not consider the effect of cache overhead in virtualization and underestimates the resources needed

– Examples: cache overhead due to task preemption,

(6)

Contributions

• Introduce overhead-free compositional analysis

– DMPR: improved MPR resource model

• Quantify events that cause cache overhead

– Task-preemption events, VCPU-preemption events, VCPU-completion events

• Propose cache-aware compositional analysis

– Hybrid analysis: combination of task-centric analysis and model-centric analysis

(7)

Deterministic Multi-Processor Resource Model (DMPR)

DMPR $\mu = \langle \Pi, \Theta, m \rangle$:

• $m$ full VCPUs (i.e., with bandwidth 1)

• One partial VCPU, with period $\Pi$ and budget $\Theta$

Interface bandwidth $= m + \Theta / \Pi$

Example: $\mu = \langle 5, 1, 2 \rangle$ consists of two full VCPUs and one partial VCPU.

[Figure: worst-case resource supply of a DMPR over an interval t, showing $VP_1$, $VP_2$ and $VP_3$.]
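A minimal sketch of the DMPR interface and its bandwidth, using the slide's example. The class and field names are hypothetical and chosen only for illustration; they are not taken from the authors' tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DMPR:
    """Illustrative container for a DMPR interface <Pi, Theta, m>."""
    period: float     # Pi: period of the partial VCPU
    budget: float     # Theta: budget of the partial VCPU per period
    full_vcpus: int   # m: number of full VCPUs (bandwidth 1 each)

    def bandwidth(self) -> float:
        # Interface bandwidth = m + Theta / Pi
        return self.full_vcpus + self.budget / self.period


# The slide's example: two full VCPUs plus a partial VCPU with period 5, budget 1.
example = DMPR(period=5, budget=1, full_vcpus=2)
print(example.bandwidth())
```

Here the two full VCPUs contribute 2 and the partial VCPU contributes 1/5 of a core.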

µ

(8)

Assumptions

• Each core has a private cache; no shared cache

• Period of each component’s interface is given by designers

• Maximum cache overhead per task preemption or migration in the system is upper bounded by $\Delta^{crpmd}$

• Virtual machine monitor uses hybrid EDF (hEDF)

[Figure: hEDF scheduling of VCPUs on cpu1 to cpu4. Full VCPUs ($VP_1$, $VP_3$) are pinned to cores; partial VCPUs ($VP_2$, $VP_4$, $VP_5$) are scheduled under gEDF.]

(9)

Outline

• Introduction

• Events that cause cache overhead

• Cache-aware compositional analysis

• Evaluation

(10)

Event 1: Task Preemption Event

Definition: A task-preemption event happens when a task preempts another task within the same VM.

Example: tasks $\tau = \{\tau_1, \tau_2, \tau_3\}$ with $\tau_1 = (12, 5)$, $\tau_2 = (8, 5)$, $\tau_3 = (4, 3)$, scheduled under gEDF on cpu1 and cpu2, with priority $\tau_3 > \tau_2 > \tau_1$.

[Figure: schedule on cpu1 and cpu2 over the interval 0 to 8, marking the cache overhead caused by the task-preemption event.]

(11)

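Event 2: VCPU-Preemption Event

Definition: A VCPU-preemption event occurs when a VCPU is preempted by another VCPU of another VM.

Example:

(a) VMs’ configuration

– $C_1$: tasks $\tau_1, \tau_2, \tau_3$ under gEDF, interface $\mu_1 = \langle 5, 3, 1 \rangle$, VCPUs $VP_1$ (full) and $VP_2$ (partial)

– $C_2$: tasks $\tau_4, \tau_5, \tau_6$ under gEDF, interface $\mu_2 = \langle 8, 3, 1 \rangle$, VCPUs $VP_3$ (full) and $VP_4$ (partial)

– $C_3$: tasks $\tau_7, \tau_8$ under gEDF, interface $\mu_3 = \langle 6, 4, 0 \rangle$, VCPU $VP_5$ (partial)

(b) VCPUs’ configuration: under hEDF, the full VCPUs $VP_1$ and $VP_3$ are pinned to cores and the partial VCPUs $VP_2$, $VP_4$, $VP_5$ are scheduled under gEDF.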

(12)

Event 2: VCPU-Preemption Event

(c) Scheduling of partial VCPUs

[Figure: gEDF schedule of the partial VCPUs $VP_2 = (5, 3)$, $VP_4 = (8, 3)$, $VP_5 = (6, 4)$ on CPU1 and CPU2 over the interval 0 to 8.]

(d) Cache overhead of tasks in component 2

[Figure: gEDF schedule of the tasks of $C_2$, $\tau_4 = (8, 4)$, $\tau_5 = (6, 2)$, $\tau_6 = (10, 1.5)$, on $VP_3$ and $VP_4$ over the interval 0 to 8, marking intervals where a VCPU is unavailable and the cache overhead caused by the VCPU-preemption event.]

(13)

Event 3: VCPU-Completion Event

Definition: A VCPU-completion event of a VCPU happens when the VCPU exhausts its budget in a period and stops its execution.

Example: component $C$ with tasks $\tau_1 = (8, 4)$, $\tau_2 = (6, 2)$, $\tau_3 = (10, 1.5)$ scheduled under gEDF on $VP_1$ (full) and $VP_2 = (4, 2)$.

[Figure: schedule of the tasks on $VP_1$ and $VP_2$ over the interval 0 to 8, marking the cache overhead caused by the VCPU-completion event.]

(14)

Outline

• Introduction

• Events that cause cache overhead

• Cache-aware compositional analysis

(15)

Task-Centric Analysis

• Task-preemption event

– Inflate higher priority task with one cache overhead:

$e'_k = e_k + \Delta^{crpmd}$ (1)

• VCPU-preemption/completion event

– Inflate task with the cache overhead caused by the VCPU-preemption/completion events during a task’s period:

$e'_k = e_k + \Delta^{crpmd} \left( N^{2}_{VP_i,\tau_k} + N^{3}_{VP_i,\tau_k} \right)$ (2)

where $N^{2}_{VP_i,\tau_k} + N^{3}_{VP_i,\tau_k}$ is the number of VCPU-preemption/completion events during a period of task $\tau_k$.

[Figure: (a) task-preemption event overhead $\Delta^{crpmd}$ for task $\tau_k$ preempted by $\tau_j$; (b) VCPU-preemption event overhead during a period of task $\tau_k$.]

(16)

Task-Centric Analysis

• Inflated WCET of each task

• System is schedulable under cache overhead if

the inflated workload is schedulable

$e'_k = e_k + \Delta^{crpmd} + \Delta^{crpmd} \left( N^{2}_{VP_i,\tau_k} + N^{3}_{VP_i,\tau_k} \right)$
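The WCET inflation on this slide can be sketched as follows. The function and parameter names are hypothetical, and the event counts are taken as inputs here; how the analysis bounds them is not shown on the slide.

```python
def inflated_wcet(wcet, delta_crpmd, n_vcpu_preemptions, n_vcpu_completions):
    """Inflate a task's WCET with cache overhead (task-centric analysis).

    wcet               -- original worst-case execution time e_k
    delta_crpmd        -- upper bound on the cache overhead of one event
    n_vcpu_preemptions -- VCPU-preemption events during one period of the task
    n_vcpu_completions -- VCPU-completion events during one period of the task
    """
    # One cache overhead for the task-preemption event.
    task_preemption_overhead = delta_crpmd
    # One cache overhead per VCPU-preemption/completion event in the period.
    vcpu_event_overhead = delta_crpmd * (n_vcpu_preemptions + n_vcpu_completions)
    return wcet + task_preemption_overhead + vcpu_event_overhead


# Illustrative numbers only: e_k = 5, overhead bound 2, one preemption and two completions.
print(inflated_wcet(5.0, 2.0, 1, 2))
```

The inflated task set is then passed to the overhead-free schedulability test unchanged, which is what makes the approach simple but potentially pessimistic.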

(17)

Pessimistic When Number of Tasks Is Large

Cache overhead in VCPU-completion event

• Only two tasks have cache overhead in a VCPU-preemption/completion event

– But don’t know which two tasks have cache overhead

– To be safe: have to inflate all tasks’ WCET with one cache overhead per VCPU-preemption/completion event

[Figure: schedule of $\tau_1 = (8, 4)$, $\tau_2 = (6, 2)$, $\tau_3 = (10, 2.5)$ on $VP_1$ and $VP_2$ over the interval 0 to 8. Only two tasks ($\tau_1$, $\tau_3$) have cache overhead due to the event.]

(18)

Model-Centric Approach

• Subtract the overhead due to VCPU-preemption/completion events from the original resource supply of the interface to obtain its effective resource supply.

[Figure labels: VCPU-preemption/completion event overhead; task-preemption event overhead.]

(19)

Effective SBF of DMPR Interface

Effective SBF of the interface = effective SBF of the partial VCPU + effective SBF of m full VCPUs

Reason: A DMPR interface provides resource with one partial VCPU and m full VCPUs

(20)

Worst Case Scenario of Effective Resource Supply of Partial VCPU

[Figure: worst-case effective resource supply of the partial VCPU $VP_i$ over an interval t, with time points $t_1$ to $t_8$ and the three conditions below marked.]

The worst case happens when:

(1) The partial VCPU has all VCPU-preemption/completion events

(2) The partial VCPU incurs the overhead as late as possible in the first period and as early as possible in the rest of periods

(3) The time interval t begins when the VCPU finishes supplying its effective resource in the first period.

Maximum number of VCPU-preemption/completion events during a partial VCPU’s period is computed in the paper

(21)

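Effective Resource Supply of Partial VCPU

Worst-case effective resource supply of the partial VCPU $VP_i$, where $VP_i$ belongs to interface $\mu = \langle \Pi, \Theta, m \rangle$:

$$SBF^{stop}_{VP_i}(t) = \begin{cases} y\Theta^* + \max\{0,\ t - x - y\Pi - z\} & \text{if } \Theta^* \neq 0 \\ 0 & \text{if } \Theta^* = 0 \end{cases}$$

where $\Theta^* = \max\{0,\ \Theta - N^{stop}_{VP_i} \Delta^{crpmd}\}$, $x = \Pi - \Delta^{crpmd} - \Theta^*$, $y = \lfloor (t - x) / \Pi \rfloor$ and $z = \Pi - \Theta^*$.

[Figure: worst-case effective resource supply of the partial VCPU $VP_i$ over time points $t_1$ to $t_8$, marking $x$, $z$ and $\Theta^*$.]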
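The effective supply bound of the partial VCPU can be evaluated numerically. The sketch below assumes the reading SBF(t) = yΘ* + max{0, t − x − yΠ − z} with Θ* = max{0, Θ − N·Δcrpmd}, x = Π − Δcrpmd − Θ*, y = ⌊(t − x)/Π⌋ and z = Π − Θ*. The function and parameter names are hypothetical, and the final clamp to zero for short intervals is an assumption added here (supply is never negative), not something stated on the slide.

```python
import math


def effective_sbf_partial(t, period, budget, n_stop, delta_crpmd):
    """Worst-case effective supply of a partial VCPU in any interval of length t.

    period      -- Pi, period of the partial VCPU
    budget      -- Theta, budget of the partial VCPU
    n_stop      -- bound on VCPU-preemption/completion events per period
    delta_crpmd -- upper bound on the cache overhead of one event
    """
    # Effective budget left after paying the cache overhead of all events.
    theta_star = max(0.0, budget - n_stop * delta_crpmd)
    if theta_star == 0:
        return 0.0
    x = period - delta_crpmd - theta_star
    y = math.floor((t - x) / period)
    z = period - theta_star
    supply = y * theta_star + max(0.0, t - x - y * period - z)
    # Assumption: clamp at zero for intervals shorter than x, where y is negative.
    return max(0.0, supply)


# Illustrative numbers only: Pi = 10, Theta = 5, one event per period, overhead bound 1.
for t in (0, 21, 23, 25):
    print(t, effective_sbf_partial(t, 10, 5, 1, 1))
```

With these inputs the effective budget is 4 per period, so the bound grows by 4 for every additional full period covered by the interval.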

(22)

Effective SBF of the Interface

Effective SBF of the interface = effective SBF of the partial VCPU + effective SBF of m full VCPUs

(23)

Model-Centric Analysis

Example: component $C$ with tasks $\tau_1 = \dots = \tau_5 = (20, 5)$ scheduled under gEDF, with interface $\mu = \langle 10, 8.5, 1 \rangle$

• Step 1: Consider task-preemption event overhead

• Step 2: Consider VCPU-preemption/completion event overhead

• Step 3: Check if

(24)

Pessimistic When Number of Full VCPUs Is Large

• Only one full VCPU is affected per VCPU-preemption/completion event in practice

• But all full VCPUs are marked unavailable at a VCPU-preemption/completion event when we compute the effective SBF of m full VCPUs

[Figure: $VP_1$ to $VP_4$ over time points $t_1$ to $t_7$, with the CRPMD intervals marked.]

(25)

Task-Centric vs. Model-Centric

• Neither of these two analyses dominates the other

Example 1 ($\Delta^{crpmd} = 2$): system $C$ (hEDF, period = 5) with components $C_1$ (gEDF, period = 20, task $\tau_1 = (100, 50)$) and $C_2$ (gEDF, period = 50, tasks $\tau_2 = \dots = \tau_5 = (100, 50)$).

– Bandwidth of task-centric analysis: 4.94

– Bandwidth of model-centric analysis: 6.90

– Task-centric is better

Example 2 ($\Delta^{crpmd} = 2$): same system, with $\tau_1 = (100, 25)$ and $\tau_2 = \dots = \tau_5 = (100, 25)$.

– Bandwidth of task-centric analysis: 3.82

– Bandwidth of model-centric analysis: 2.86

– Model-centric is better

(26)

Hybrid Cache-Aware Analysis

[Figure: system $C$ scheduled under hEDF (period = 5), with components $C_1$ (period = 20) and $C_2$ (period = 50), each scheduled under gEDF; the tasks shown have (period, WCET) = (100, 25).]

(27)

Hybrid Cache-Aware Analysis

[Figure: the example system ($C_1$ and $C_2$ under gEDF, system $C$ under hEDF, tasks (100, 25)) analyzed with both approaches.]

Task-centric analysis: $\mu_1 = \langle 20, 9.8, 0 \rangle$, $\mu_2 = \langle 50, 36.1, 2 \rangle$, $\mu = \langle 5, 4.1, 3 \rangle$; bandwidth: 3.82

Model-centric analysis: $\mu_1 = \langle 20, 8.8, 0 \rangle$, $\mu_2 = \langle 50, 39.7, 1 \rangle$, $\mu = \langle 5, 4.3, 2 \rangle$; bandwidth: 2.86
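As a quick check, and assuming the interface bandwidth $m + \Theta/\Pi$ from the DMPR slide, the two system-level interfaces $\langle 5, 4.1, 3 \rangle$ (task-centric) and $\langle 5, 4.3, 2 \rangle$ (model-centric) reproduce the reported bandwidths:

```latex
\begin{align*}
\text{Task-centric: }  & m + \frac{\Theta}{\Pi} = 3 + \frac{4.1}{5} = 3 + 0.82 = 3.82 \\
\text{Model-centric: } & m + \frac{\Theta}{\Pi} = 2 + \frac{4.3}{5} = 2 + 0.86 = 2.86
\end{align*}
```

The model-centric interface needs one full VCPU fewer, which outweighs its slightly larger partial-VCPU budget.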

(28)

Outline

• Introduction

• Events that cause cache overhead

• Cache-aware compositional analysis

• Evaluation

(29)

Experimental Setup

Hardware: Dell Optiplex-980 quad-core workstation (3 cores for guest VMs, 1 core for VM0)

RT-Xen, LITMUS

[Figure: four domains $D_1$ to $D_4$, each running tasks $\tau_i^1, \dots, \tau_i^k$ under gEDF, scheduled by hEDF; interface periods $\Pi_1 = 256$, $\Pi_2 = 128$, $\Pi_3 = 64$, $\Pi_4 = 32$.]

$\Delta^{crpmd} = 1.9$ (measured); WSS = 256KB

Task set: utilization 1.8

Task utilization distribution: uniformly in [0.001, 0.1]

(30)

Cache Overhead Is Not Negligible

Schedulable?

Analysis                    Theory   RT-Xen
MPR                         Yes      No
DMPR                        Yes      No
Cache-aware hybrid          No       No
Cache-aware task-centric    No       No

Unsafe: a taskset claimed schedulable by overhead-free analysis (MPR, DMPR) is not schedulable in practice

Safe: the same taskset is claimed NOT schedulable by cache-aware analysis

(31)

Simulation Setup

Task's period: uniformly in [350ms, 850ms]

Task's utilization:

– uniform: uniformly in [0.001, 0.1]

– light bimodal: 8/9 in [0.1, 0.4] and 1/9 in [0.5, 0.9]

– medium bimodal: 6/9 in [0.1, 0.4] and 3/9 in [0.5, 0.9]

– heavy bimodal: 4/9 in [0.1, 0.4] and 5/9 in [0.5, 0.9]

[Figure: four domains $D_1$ to $D_4$, each running tasks $\tau_i^1, \dots, \tau_i^k$ under gEDF, scheduled by hEDF; interface periods $\Pi_1 = 256$, $\Pi_2 = 128$, $\Pi_3 = 64$, $\Pi_4 = 32$; $\Delta^{crpmd} = 0.9$ ms.]
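A minimal sketch of how task sets with these distributions could be generated. Two points are assumptions made here and not stated on the slide: utilizations are drawn uniformly within each bimodal range, and tasks are added until a target total utilization is reached. All names are hypothetical.

```python
import random

# Share of tasks drawn from the heavy range [0.5, 0.9] for each bimodal distribution.
BIMODAL_HEAVY_SHARE = {"light": 1 / 9, "medium": 3 / 9, "heavy": 5 / 9}


def sample_utilization(rng, distribution):
    """Draw one task utilization for the named distribution."""
    if distribution == "uniform":
        return rng.uniform(0.001, 0.1)
    if rng.random() < BIMODAL_HEAVY_SHARE[distribution]:
        return rng.uniform(0.5, 0.9)   # heavy task
    return rng.uniform(0.1, 0.4)       # light task (assumed uniform within the range)


def generate_taskset(rng, distribution, target_utilization):
    """Return a list of (period_ms, wcet_ms) pairs.

    Assumption: tasks are added until the total utilization reaches the target.
    """
    tasks = []
    total = 0.0
    while total < target_utilization:
        utilization = sample_utilization(rng, distribution)
        period = rng.uniform(350.0, 850.0)   # period in ms
        tasks.append((period, utilization * period))
        total += utilization
    return tasks


rng = random.Random(0)   # fixed seed so the run is repeatable
taskset = generate_taskset(rng, "light", 1.8)
print(len(taskset), sum(wcet / period for period, wcet in taskset))
```

Passing the random generator in explicitly keeps each generated task set reproducible from its seed.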

(32)

Hybrid Analysis Saves Bandwidth

[Figure: bandwidth saved by hybrid analysis over task-centric analysis, per taskset utilization; $\Delta^{crpmd}$ / average WCET = 0.003.]

Hybrid approach saves bandwidth for 64% of the tasksets

(33)

Hybrid Analysis Saves Bandwidth

[Figure: a) bimodal-light distribution, $\Delta^{crpmd}$ / average WCET = 0.0005; b) bimodal-medium distribution, $\Delta^{crpmd}$ / average WCET = 0.0004; c) bimodal-heavy distribution, $\Delta^{crpmd}$ / average WCET = 0.0003.]

Hybrid analysis still saves bandwidth over task-centric analysis when the distribution of tasks’ utilization changes

(34)

Related Work

• Overhead-free compositional analysis

– S. Baruah and N. Fisher. Component-based design in multiprocessor real-time systems. In ICESS, 2009.

– A. Easwaran, I. Shin, and I. Lee. Optimal virtual cluster-based multiprocessor scheduling. Real-Time Systems, 43(1):25–59, 2009.

– H. Leontyev and J. H. Anderson. A hierarchical multiprocessor bandwidth reservation scheme with timing guarantees. In ECRTS, 2008.

– G. Lipari and E. Bini. A framework for hierarchical scheduling on multiprocessors: From application requirements to run-time allocation. In RTSS, 2010.

– E. Bini, M. Bertogna, and S. Baruah. Virtual multiprocessor platforms: Specification and use. In RTSS, 2009.

• Overhead-aware analysis in non-virtualized environments

– B. B. Brandenburg. Scheduling and Locking in Multiprocessor Real-Time Operating Systems. PhD thesis, The University of North Carolina at Chapel Hill, 2011.

• Methods of getting the cache overhead value

– A. Bastoni, B. B. Brandenburg, and J. H. Anderson. Cache-Related Preemption and Migration Delays: Empirical Approximation and Impact on Schedulability. In OSPERT, 2010.

– S. Altmeyer, R. I. Davis, and C. Maiza. Improved cache related preemption delay aware response time analysis for fixed priority preemptive systems. Real-Time Systems, 2012.

(35)

Conclusion

• Contribution

– Propose DMPR resource model

– Introduce overhead-free compositional analysis under DMPR

– Quantify events that cause cache overhead

– Propose cache-aware compositional analysis

• Future work

– Extend our method to multi-level cache hierarchy with shared cache

– Explore cache management methods to reduce the cache overhead