
Cache-Aware Compositional Analysis of Real-Time Multicore Virtualization Platforms


Academic year: 2021



Meng Xu, Linh T.X. Phan, Insup Lee, Oleg Sokolsky, Sisu Xi, Chenyang Lu and Christopher D. Gill

(2)

Complex Systems on Multicore Platforms

• Embedded systems

– Become more and more complex

– Consist of multiple sub-systems

• Multicore platforms

– Number of cores keeps increasing

Sources: http://www.codeproject.com/Articles/16165/Robotics-Embedded-Systems-Part-I; International Technology Roadmap for Semiconductors

(3)

Virtualization

• The benefits of virtualization

– Consolidate legacy systems

– Integrate large, complex systems

[Figure: a virtual machine monitor hosting VM 0, VM 1 and VM 2, each running a guest OS on several VCPUs, on top of four CPUs that each have their own cache]

(4)

Compositional Analysis for RT Guarantees

• Step 1: Abstract each component (VM) into an interface

• Step 2: Transform each interface into a set of VCPUs

• Step 3: Abstract the VCPUs of all VMs to the system’s interface

[Figure: the virtualization platform (virtual machine monitor; VM 0, VM 1 and VM 2 with their guest OSes and VCPUs; four CPUs with caches), annotated with Interface 0, Interface 1 and Interface 2 for the VMs and with the interface of the system]

(5)

Limitations of Existing Multicore Compositional Analysis

• Existing multicore compositional analysis does not consider platform overhead

• In practice, platform overhead is not negligible

– Example: cache overhead

• Result: unsafe analysis!

– Reason: the analysis ignores the effect of cache overhead in virtualization and therefore underestimates the resources required

– Examples: cache overhead due to task preemption,

(6)

Contributions

• Introduce overhead-free compositional analysis

– DMPR: improved MPR resource model

• Quantify events that cause cache overhead

– Task-preemption events, VCPU-preemption events, VCPU-completion events

• Propose cache-aware compositional analysis

– Hybrid analysis: combination of task-centric analysis and model-centric analysis

(7)

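Deterministic Multi-Processor Resource Model (DMPR)

DMPR interface: $\mu = \langle \Pi, \Theta, m \rangle$

• $m$ full VCPUs (i.e., with bandwidth 1)

• One partial VCPU, with period $\Pi$ and budget $\Theta$

Interface bandwidth $= \frac{\Theta}{\Pi} + m$

Example: $\mu = \langle 5, 1, 2 \rangle$ has two full VCPUs and one partial VCPU with period 5 and budget 1

[Figure: worst-case resource supply of a DMPR over time $t$, showing the partial VCPU and the two full VCPUs (VP1, VP2, VP3)]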
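The interface bandwidth of a DMPR, $\Theta/\Pi + m$, can be checked with a few lines of Python. This is only a sketch: the `DMPR` class and its field names are illustrative assumptions, not something defined on the slides. It evaluates the example interface with period 5, budget 1 and two full VCPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DMPR:
    """DMPR interface <period, budget, m>; class and field names are illustrative."""
    period: float     # Pi: period of the partial VCPU
    budget: float     # Theta: budget of the partial VCPU in each period
    full_vcpus: int   # m: number of full VCPUs, each with bandwidth 1

    def bandwidth(self) -> float:
        # The partial VCPU contributes budget/period; each full VCPU contributes 1.
        return self.budget / self.period + self.full_vcpus


example = DMPR(period=5, budget=1, full_vcpus=2)
print(example.bandwidth())  # 2.2
```

An interface with $m = 0$ consists of the partial VCPU alone, so its bandwidth reduces to $\Theta/\Pi$.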

(8)

Assumptions

• Each core has a private cache; no shared cache

• Period of each component’s interface is given by designers

• Maximum cache overhead per task preemption or migration in the system is upper bounded by $\Delta^{crpmd}$

• Virtual machine monitor uses hybrid EDF (hEDF)

[Figure: hEDF scheduling of VCPUs on cpu1 to cpu4: VP1 and VP3 are pinned to cores, while VP2, VP4 and VP5 are scheduled by gEDF]

(9)

Outline

• Introduction

• Events that cause cache overhead

• Cache-aware compositional analysis

• Evaluation

(10)

Event 1: Task-Preemption Event

Definition: A task-preemption event happens when a task preempts another task within the same VM.

Example: taskset $\tau = \{\tau_1, \tau_2, \tau_3\}$ with $\tau_1 = (12, 5)$, $\tau_2 = (8, 5)$ and $\tau_3 = (4, 3)$, scheduled by gEDF on cpu1 and cpu2

[Figure: schedule of the three tasks on cpu1 and cpu2 for t = 0 to 8, showing the task priorities and the cache overhead caused by the task-preemption event]
(11)

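Event 2: VCPU-Preemption Event

Definition: A VCPU-preemption event occurs when a VCPU is preempted by a VCPU of another VM.

Example:

(a) VMs’ configuration: three components, each scheduled by gEDF and composed under hEDF

– $C_1$: tasks $\tau_1, \tau_2, \tau_3$; interface $\mu_1 = \langle 5, 3, 1 \rangle$; VCPUs VP1 (full) and VP2 (partial)

– $C_2$: tasks $\tau_4, \tau_5, \tau_6$; interface $\mu_2 = \langle 8, 3, 1 \rangle$; VCPUs VP3 (full) and VP4 (partial)

– $C_3$: tasks $\tau_7, \tau_8$; interface $\mu_3 = \langle 6, 4, 0 \rangle$; VCPU VP5 (partial)

(b) VCPUs’ configuration: the full VCPUs VP1 and VP3 are pinned to cores, and the partial VCPUs VP2, VP4 and VP5 are scheduled by gEDF on the remaining cores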

(12)

Event 2: VCPU-Preemption Event

(c) Scheduling of partial VCPUs

[Figure: schedule of the partial VCPUs VP2 (5, 3), VP5 (6, 4) and VP4 (8, 3) for t = 0 to 8]

(d) Cache overhead of tasks in component 2

[Figure: gEDF schedule of $\tau_5 = (6, 2)$, $\tau_4 = (8, 4)$ and $\tau_6 = (10, 1.5)$ on VP3 and VP4 for t = 0 to 8, marking the intervals in which a VCPU is unavailable and the cache overhead caused by the VCPU-preemption event]
(13)

Event 3: VCPU-Completion Event

Definition: A VCPU-completion event of a VCPU happens when the VCPU exhausts its budget in a period and stops its execution.

Example: component $C$ with tasks $\tau_1 = (8, 4)$, $\tau_2 = (6, 2)$ and $\tau_3 = (10, 1.5)$, scheduled by gEDF on VP1 (full) and VP2 (4, 2)

[Figure: schedule of the three tasks on VP1 and VP2 for t = 0 to 8, marking the intervals in which a VCPU is unavailable and the cache overhead caused by the VCPU-completion event]
(14)

Outline

• Introduction

• Events that cause cache overhead

• Cache-aware compositional analysis

• Evaluation

(15)

Task-Centric Analysis

• Task-preemption event

– Inflate the higher-priority task with one cache overhead

$e'_k = e_k + \Delta^{crpmd}$  (1)

where $\Delta^{crpmd}$ is the task-preemption event cache overhead for task $k$

• VCPU-preemption/completion event

– Inflate each task with one cache overhead for every VCPU-preemption/completion event that occurs during the task’s period

$e'_k = e_k + \Delta^{crpmd} \left( N^{2}_{VP_i,\tau_k} + N^{3}_{VP_i,\tau_k} \right)$  (2)

where $N^{2}_{VP_i,\tau_k} + N^{3}_{VP_i,\tau_k}$ is the number of VCPU-preemption/completion events during a period of task $k$

[Figure: (a) task-preemption event overhead, with $\tau_j$, $\tau_k$ and one crpmd; (b) VCPU-preemption event overhead, with $\tau_k$ and one crpmd]

(16)

Task-Centric Analysis

• Inflated WCET of each task

$e'_k = e_k + \Delta^{crpmd} + \Delta^{crpmd} \left( N^{2}_{VP_i,\tau_k} + N^{3}_{VP_i,\tau_k} \right)$

• System is schedulable under cache overhead if the inflated workload is schedulable
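The WCET inflation used by the task-centric analysis (one cache overhead for the task-preemption event plus one per VCPU-preemption/completion event in the task's period) can be sketched as follows. The function name, parameter names and the event counts in the usage line are hypothetical; the slides do not show how the event counts are obtained, so they are simply passed in.

```python
def inflated_wcet(wcet, delta_crpmd, n_vcpu_preemptions, n_vcpu_completions):
    """Return e'_k = e_k + delta + delta * (N2 + N3).

    wcet               -- original worst-case execution time e_k
    delta_crpmd        -- upper bound on the cache overhead per event
    n_vcpu_preemptions -- VCPU-preemption events in one task period (assumed given)
    n_vcpu_completions -- VCPU-completion events in one task period (assumed given)
    """
    task_preemption_overhead = delta_crpmd
    vcpu_event_overhead = delta_crpmd * (n_vcpu_preemptions + n_vcpu_completions)
    return wcet + task_preemption_overhead + vcpu_event_overhead


# Hypothetical numbers: WCET 25, overhead 2, three preemptions and two completions.
print(inflated_wcet(25.0, 2.0, n_vcpu_preemptions=3, n_vcpu_completions=2))  # 37.0
```

The inflated values then replace the original WCETs before an overhead-free schedulability test is run on the taskset.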

(17)

Pessimistic When Number of Tasks Is Large

• Only two tasks have cache overhead in a VCPU-preemption/completion event

– But we do not know which two tasks have cache overhead

– To be safe, we have to inflate every task’s WCET with one cache overhead per VCPU-preemption/completion event

[Figure: cache overhead in a VCPU-completion event; schedule of $\tau_2 = (6, 2)$, $\tau_1 = (8, 4)$ and $\tau_3 = (10, 2.5)$ on VP1 and VP2 for t = 0 to 8, marking the intervals in which a VCPU is unavailable; only two tasks have cache overhead due to the event]

(18)

Model-Centric Approach

• Subtract the overhead due to VCPU-preemption/completion events from the original resource supply of the interface to obtain its effective resource supply.

[Figure labels: VCPU-preemption/completion event overhead; task-preemption event overhead]

(19)

Effective SBF of DMPR Interface

Effective SBF of the interface = effective SBF of the partial VCPU + effective SBF of the m full VCPUs

Reason: A DMPR interface provides resources through one partial VCPU and m full VCPUs

(20)

Worst-Case Scenario of Effective Resource Supply of Partial VCPU

The worst case happens when:

(1) The partial VCPU has all VCPU-preemption/completion events

(2) The partial VCPU incurs the overhead as late as possible in the first period and as early as possible in the remaining periods

(3) The time interval t begins when the VCPU finishes supplying its effective resource in the first period

The maximum number of VCPU-preemption/completion events during a partial VCPU’s period is computed in the paper

[Figure: worst-case effective resource supply of the partial VCPU $VP_i$ over $t_1, \dots, t_8$, with conditions (1) to (3) marked]

(21)

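Effective Resource Supply of Partial VCPU

$$SBF^{stop}_{VP_i}(t) = \begin{cases} y\,\Theta^* + \max\{0,\; t - x - y\,\Pi - z\} & \text{if } \Theta^* \neq 0 \\ 0 & \text{if } \Theta^* = 0 \end{cases}$$

where $VP_i$ belongs to interface $\mu = \langle \Pi, \Theta, m \rangle$, $\Theta^* = \max\{0,\; \Theta - N^{stop}_{VP_i} \Delta^{crpmd}\}$, $x = \Pi - \Delta^{crpmd} - \Theta^*$, $y = \left\lfloor \frac{t - x}{\Pi} \right\rfloor$ and $z = \Pi - \Theta^*$

[Figure: worst-case effective resource supply of the partial VCPU $VP_i$ over $t_1, \dots, t_8$, with the intervals $x$ and $\Theta^*$ marked]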
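The effective budget of the partial VCPU, $\Theta^* = \max\{0, \Theta - N^{stop}_{VP_i}\Delta^{crpmd}\}$, is the budget that remains after every VCPU-preemption/completion event in a period has cost one cache overhead. A minimal sketch, with illustrative function and parameter names and hypothetical input values:

```python
def effective_budget(budget, n_stop_events, delta_crpmd):
    """Return Theta* = max(0, Theta - N_stop * delta_crpmd).

    budget        -- Theta, budget of the partial VCPU per period
    n_stop_events -- N_stop, VCPU-preemption/completion events per period (assumed given)
    delta_crpmd   -- upper bound on the cache overhead per event
    """
    return max(0.0, budget - n_stop_events * delta_crpmd)


# Hypothetical values: one event leaves 2.5 units of a budget of 4.
print(effective_budget(4.0, n_stop_events=1, delta_crpmd=1.5))  # 2.5
# If the overhead exceeds the budget, no effective supply is left.
print(effective_budget(1.0, n_stop_events=2, delta_crpmd=1.0))  # 0.0
```

Clamping at zero matters because a partial VCPU with a small budget can lose all of it to overhead, in which case it supplies nothing.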
(22)

Effective SBF of The Interface

Effective SBF of the interface = effective SBF of the partial VCPU + effective SBF of the m full VCPUs

(23)

Model-Centric Analysis

Example: component $C$ with tasks $\tau_1, \dots, \tau_5 = (20, 5)$, scheduled by gEDF, with interface $\mu = \langle 10, 8.5, 1 \rangle$

• Step 1: Consider task-preemption event overhead

• Step 2: Consider VCPU-preemption/completion event overhead

• Step 3: Check if

(24)

Pessimistic When Number of Full VCPUs Is Large

• Only one full VCPU is affected per VCPU-preemption/completion event in practice

• But all full VCPUs are marked unavailable at a VCPU-preemption/completion event when we compute the effective SBF of the m full VCPUs

[Figure: timeline $t_1, \dots, t_7$ for VP1, VP2, VP3 and VP4, with the CRPMD overhead marked]

(25)

Task-Centric vs. Model-Centric

• Neither of these two analyses dominates the other

Setup for both examples: component $C_1$ (task $\tau_1$, gEDF, interface period 20) and component $C_2$ (tasks $\tau_2, \dots, \tau_5$, gEDF, interface period 50) are composed under hEDF into the system $C$ (interface period 5); $\Delta^{crpmd} = 2$

Example 1: $\tau_1 = (100, 50)$ and $\tau_2, \dots, \tau_5 = (100, 50)$

– Bandwidth of task-centric analysis: 4.94

– Bandwidth of model-centric analysis: 6.90

– Task-centric is better

Example 2: $\tau_1 = (100, 25)$ and $\tau_2, \dots, \tau_5 = (100, 25)$

– Bandwidth of task-centric analysis: 3.82

– Bandwidth of model-centric analysis: 2.86

– Model-centric is better

(26)

Hybrid Cache-Aware Analysis

[Figure: components $C_1$ and $C_2$, each scheduled by gEDF over tasks with parameters (100, 25), composed under hEDF into the system $C$; interface periods 20, 50 and 5]
(27)

Hybrid Cache-Aware Analysis

[Figure: the system $C$, composed under hEDF from $C_1$ and $C_2$ with tasks of parameters (100, 25), analyzed once with each approach]

Task-centric analysis: $\mu_1 = \langle 20, 9.8, 0 \rangle$, $\mu_2 = \langle 50, 36.1, 2 \rangle$, $\mu = \langle 5, 4.1, 3 \rangle$; bandwidth: 3.82

Model-centric analysis: $\mu_1 = \langle 20, 8.8, 0 \rangle$, $\mu_2 = \langle 50, 39.7, 1 \rangle$, $\mu = \langle 5, 4.3, 2 \rangle$; bandwidth: 2.86
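The two bandwidths on this slide follow from the system-level interfaces via bandwidth $= \Theta/\Pi + m$: 4.1/5 + 3 for the task-centric result and 4.3/5 + 2 for the model-centric result. The sketch below recomputes them and keeps the result with the smaller bandwidth. Selecting the minimum is an assumption made here for illustration; the slides only describe the hybrid analysis as a combination of the two approaches. Function and variable names are likewise illustrative.

```python
def bandwidth(period, budget, full_vcpus):
    """Bandwidth of a DMPR interface <period, budget, m>: budget/period + m."""
    return budget / period + full_vcpus


# System-level interfaces from the slide, written as (period, budget, m).
candidates = {
    "task-centric": (5, 4.1, 3),
    "model-centric": (5, 4.3, 2),
}

results = {name: bandwidth(*iface) for name, iface in candidates.items()}
for name, value in results.items():
    print(f"{name}: {value:.2f}")

# Assumed selection rule: keep the analysis that needs less bandwidth.
best = min(results, key=results.get)
print("smaller bandwidth:", best)
```

For this example the model-centric interface needs less bandwidth, which matches the comparison shown on the slide.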

(28)

Outline

• Introduction

• Events that cause cache overhead

• Cache-aware compositional analysis

• Evaluation

(29)

Experimental Setup

Hardware: Dell Optiplex-980 quad-core workstation (3 cores for guest VMs, 1 core for VM0)

Software: RT-Xen, LITMUS

Task set: utilization 1.8; WSS = 256 KB

Task utilization distribution: uniformly in [0.001, 0.1]

Components: $D_1, \dots, D_4$, where each $D_i$ schedules its tasks $\tau_i^1, \dots, \tau_i^k$ by gEDF and the components are composed under hEDF; interface periods $\Pi_1 = 256$, $\Pi_2 = 128$, $\Pi_3 = 64$, $\Pi_4 = 32$

Cache overhead: $\Delta^{crpmd} = 1.9$ (measured)

(30)

Cache Overhead Is Not Negligible

Schedulable?                Theory   RT-Xen
MPR                         Yes      No
DMPR                        Yes      No
Cache-aware hybrid          No       No
Cache-aware task-centric    No       No

Unsafe: a taskset claimed schedulable by overhead-free analysis (MPR, DMPR) is not schedulable in practice

Safe: the same taskset is claimed NOT schedulable by cache-aware analysis

(31)

Simulation Setup

Task's period: uniformly in [350 ms, 850 ms]

Task's utilization:

– uniform: uniformly in [0.001, 0.1]

– light bimodal: 8/9 in [0.1, 0.4] and 1/9 in [0.5, 0.9]

– medium bimodal: 6/9 in [0.1, 0.4] and 3/9 in [0.5, 0.9]

– heavy bimodal: 4/9 in [0.1, 0.4] and 5/9 in [0.5, 0.9]

Components: $D_1, \dots, D_4$, where each $D_i$ schedules its tasks $\tau_i^1, \dots, \tau_i^k$ by gEDF and the components are composed under hEDF; interface periods $\Pi_1 = 256$, $\Pi_2 = 128$, $\Pi_3 = 64$, $\Pi_4 = 32$; $\Delta^{crpmd} = 0.9$; times in ms
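A taskset generator for the period and utilization distributions listed above could look like the sketch below. The distribution parameters come from the slide; everything else is an assumption made for illustration, in particular the function names, the use of a seeded `random.Random`, and the stopping rule (add tasks until the target taskset utilization is reached), which the slides do not specify.

```python
import random

# (probability, (low, high)) pairs for each utilization distribution on the slide.
DISTRIBUTIONS = {
    "uniform": [(1.0, (0.001, 0.1))],
    "light": [(8 / 9, (0.1, 0.4)), (1 / 9, (0.5, 0.9))],
    "medium": [(6 / 9, (0.1, 0.4)), (3 / 9, (0.5, 0.9))],
    "heavy": [(4 / 9, (0.1, 0.4)), (5 / 9, (0.5, 0.9))],
}


def draw_utilization(rng, distribution):
    """Pick a range according to its probability, then draw uniformly within it."""
    entries = DISTRIBUTIONS[distribution]
    weights = [weight for weight, _ in entries]
    ranges = [bounds for _, bounds in entries]
    low, high = rng.choices(ranges, weights=weights, k=1)[0]
    return rng.uniform(low, high)


def generate_taskset(rng, distribution, target_utilization):
    """Return (period_ms, wcet_ms) pairs; the stopping rule is an assumption."""
    tasks = []
    total = 0.0
    while total < target_utilization:
        utilization = draw_utilization(rng, distribution)
        period = rng.uniform(350.0, 850.0)  # task period in ms
        tasks.append((period, utilization * period))
        total += utilization
    return tasks


rng = random.Random(0)  # fixed seed so runs are repeatable
taskset = generate_taskset(rng, "light", target_utilization=1.8)
print(len(taskset), "tasks generated")
```

Because the last task added may overshoot the target, the generated taskset utilization is at least, not exactly, the requested value.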

(32)

Hybrid Analysis Saves Bandwidth

[Figure: bandwidth saved by hybrid analysis over task-centric analysis, per taskset utilization; $\Delta^{crpmd}$ / average WCET = 0.003]

Hybrid approach saves bandwidth for 64% of the tasksets

(33)

Hybrid Analysis Saves Bandwidth

[Figure: three panels, (a) bimodal-light distribution, (b) bimodal-medium distribution and (c) bimodal-heavy distribution; the panels are labeled with $\Delta^{crpmd}$ / average WCET = 0.0005, 0.0004 and 0.0003]

Hybrid analysis still saves bandwidth over task-centric analysis when the distribution of tasks’ utilization changes

(34)

Related Work

• Overhead-free compositional analysis

– S. Baruah and N. Fisher. Component-based design in multiprocessor real-time systems. In ICESS, 2009.

– A. Easwaran, I. Shin, and I. Lee. Optimal virtual cluster-based multiprocessor scheduling. Real-Time Systems, 43(1):25–59, 2009.

– H. Leontyev and J. H. Anderson. A hierarchical multiprocessor bandwidth reservation scheme with timing guarantees. In ECRTS, 2008.

– G. Lipari and E. Bini. A framework for hierarchical scheduling on multiprocessors: From application requirements to run-time allocation. In RTSS, 2010.

– E. Bini, M. Bertogna, and S. Baruah. Virtual multiprocessor platforms: Specification and use. In RTSS, 2009.

• Overhead-aware analysis in non-virtualized environments

– B. B. Brandenburg. Scheduling and Locking in Multiprocessor Real-Time Operating Systems. PhD thesis, The University of North Carolina at Chapel Hill, 2011.

• Methods of getting the cache overhead value

– A. Bastoni, B. B. Brandenburg, and J. H. Anderson. Cache-Related Preemption and Migration Delays: Empirical Approximation and Impact on Schedulability. In OSPERT, 2010.

– S. Altmeyer, R. I. Davis, and C. Maiza. Improved cache related preemption delay aware response time analysis for fixed priority preemptive systems. Real-Time Systems, 2012.

(35)

Conclusion

• Contribution

– Propose DMPR resource model

– Introduce overhead-free compositional analysis under DMPR

– Quantify events that cause cache overhead

– Propose cache-aware compositional analysis

• Future work

– Extend our method to multi-level cache hierarchy with shared cache

– Explore cache management methods to reduce the cache overhead
