
Remus: High Availability via Asynchronous Virtual Machine Replication


(1)

Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley, Norm Hutchinson, and Andrew Warfield

Department of Computer Science, The University of British Columbia

NSDI ’08: 5th USENIX Symposium on Networked Systems Design and Implementation (2008)

Presented by

(2)

Outline

• Introduction
• Design and Implementation
• Evaluation
• Conclusion and future work

(3)

Introduction

• It’s hard and expensive to design a highly available system that survives hardware failure:
  – Using redundant components.
  – Customized hardware and software.
  – E.g. HP NonStop Server.
• Our goal is to provide a high-availability system that offers:
  – Generality and transparency
    • Without modification of the OS and applications.

(4)

Introduction

• Our approach
  – VM-based whole-system replication
    • Frequently checkpoint the whole virtual machine state.
    • The protected VM and the backup VM are located on different physical hosts.
  – Speculative execution
    • We buffer state to synchronize to the backup later, and continue execution ahead of the synchronization point.
  – Asynchronous replication

(5)

Introduction

(6)

Design and Implementation

• Our implementation is based on:
  – The Xen virtual machine monitor.
  – Xen’s support for live migration, used to provide fine-grained checkpoints.
  – Two host machines connected over redundant gigabit Ethernet connections.
• The virtual machine does not actually execute on the backup host until a failure occurs.

(7)

Design and Implementation

(8)

Design and Implementation

• Pipelined checkpoint (see Figure 1, p. 5)
  1. Pause the running VM and copy any changed state into a buffer (discussed in more detail later).
  2. With the state changes preserved in a buffer, the VM is unpaused and speculative execution resumes.
  3. The buffered state is transmitted to the backup host.
  4. When the complete state has been received, the backup acknowledges it to the primary.
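The four steps above can be sketched as a toy model. This is only an illustration, not the Remus code: `PrimaryVM`, `Backup`, and `checkpoint` are hypothetical names, VM state is reduced to a dictionary, and the transfer in step 3 runs synchronously here, whereas in the real system it overlaps with the VM's speculative execution.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup host: holds the last fully received checkpoint."""
    committed: dict = field(default_factory=dict)

    def receive(self, delta: dict) -> bool:
        # Apply the whole delta, then acknowledge to the primary (step 4).
        self.committed.update(delta)
        return True


@dataclass
class PrimaryVM:
    """Hypothetical protected VM: state is a dict, dirty tracks changed keys."""
    state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    paused: bool = False

    def write(self, key, value):
        assert not self.paused, "VM cannot run while paused"
        self.state[key] = value
        self.dirty.add(key)


def checkpoint(vm: PrimaryVM, backup: Backup) -> bool:
    # Step 1: pause the VM and copy changed state into a buffer.
    vm.paused = True
    buffer = {key: vm.state[key] for key in vm.dirty}
    vm.dirty.clear()
    # Step 2: unpause; the VM may now run ahead of the backup.
    vm.paused = False
    # Steps 3 and 4: transmit the buffer and wait for the acknowledgement.
    return backup.receive(buffer)
```

Writes made after a checkpoint stay invisible to the backup until the next call to `checkpoint`, which is what lets the primary run ahead of the synchronization point.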

(9)

Design and Implementation

• Next we discuss how to checkpoint CPU state, memory, disk, and network.
• CPU & memory state
  – Checkpointing is implemented on top of Xen’s existing machinery for performing live migration.
• What is live migration?
  – A technique by which a VM is relocated to another physical host with only a slight interruption.

(10)

Design and Implementation

How does live migration work?

1. Memory is copied to the new location while the VM continues to run at the old location.
2. During migration, writes to memory are intercepted, and dirty pages are copied to the new location in rounds.
3. After a specified number of intervals, the guest

(11)

Design and Implementation

• Page protection, enforced by the hardware MMU, is used to trap writes and track dirty pages.
• In effect, Remus implements checkpointing as repeated executions of the final round of live migration.
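As a rough illustration of dirty-page tracking, the sketch below records which pages were written and copies out only those pages at each checkpoint. Everything here is an assumption made for the example: `GuestMemory`, `final_round`, and the 4096-byte page size are hypothetical, and a Python set stands in for the write-protection faults that the MMU would deliver to the hypervisor.

```python
class GuestMemory:
    """Toy guest memory; the page size and layout are illustrative only."""
    PAGE_SIZE = 4096

    def __init__(self, num_pages: int):
        self.pages = [bytearray(self.PAGE_SIZE) for _ in range(num_pages)]
        self.dirty = set()  # stands in for the log of write-protection faults

    def write(self, addr: int, data: bytes):
        # A real hypervisor would trap the first write to a protected page;
        # here we simply record the page number of every byte written.
        for offset, byte in enumerate(data):
            page, index = divmod(addr + offset, self.PAGE_SIZE)
            self.pages[page][index] = byte
            self.dirty.add(page)

    def final_round(self) -> dict:
        """Copy out every dirty page and reset tracking (one checkpoint)."""
        snapshot = {page: bytes(self.pages[page]) for page in sorted(self.dirty)}
        self.dirty.clear()
        return snapshot
```

Calling `final_round` repeatedly yields one incremental checkpoint per call, each containing only the pages touched since the previous call.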

(12)

Design and Implementation

• Disk buffering
  – Requirement
    • All writes to disk in the VM are configured as write-through.
    • As a result, the disk remains crash-consistent even when the active and backup hosts both fail.
  – On-disk state doesn’t change until the entire checkpoint has been received (see Figure 4, p. 13).
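A minimal sketch of the backup side of disk buffering follows, assuming writes arrive tagged with a sector number. `BackupDisk` and its methods are hypothetical names, and the `discard` path is an assumption of this sketch about what happens to a partially received checkpoint; the slides do not describe it.

```python
class BackupDisk:
    """Hypothetical backup-side disk, modelled as a dict of sector -> bytes."""

    def __init__(self):
        self.sectors = {}   # on-disk state
        self.pending = []   # writes received for the in-flight checkpoint

    def receive_write(self, sector: int, data: bytes):
        # Buffer in memory only; on-disk state is left untouched.
        self.pending.append((sector, data))

    def commit(self):
        # The entire checkpoint has arrived: apply buffered writes in order.
        for sector, data in self.pending:
            self.sectors[sector] = data
        self.pending.clear()

    def discard(self):
        # Assumed behaviour: drop a partial write set if the checkpoint
        # never completes.
        self.pending.clear()
```

Keeping the pending writes separate from `sectors` is what guarantees that the backup disk always matches some complete checkpoint.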

(13)

Design and Implementation

(14)

Design and Implementation

• Network buffering
  – Most networks cannot be counted on for reliable data delivery.
  – Therefore, networked applications use a reliable protocol such as TCP to deal with packet loss or duplication.
    • This fact simplifies the network buffering problem considerably: transmitted packets do not require replication.

(15)

Design and Implementation

• In order to ensure that packet transmission is atomic and checkpoints are consistent:
  – Outbound packets generated since the previous checkpoint are queued.
  – The queued packets are held until that checkpoint has been acknowledged by the backup site, and are then released.
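The queue-and-release rule can be sketched as follows. `OutboundBuffer`, its method names, and the integer checkpoint ids are all hypothetical; the sketch only models the ordering constraint, not real packet I/O.

```python
from collections import deque


class OutboundBuffer:
    """Hypothetical queue between the VM's network output and the wire."""

    def __init__(self):
        self.current = deque()   # packets generated since the last checkpoint
        self.awaiting_ack = {}   # checkpoint id -> packets held for that ack
        self.released = []       # packets actually sent to clients

    def send(self, packet: bytes):
        # The VM believes the packet was sent; it is only queued.
        self.current.append(packet)

    def checkpoint_taken(self, checkpoint_id: int):
        # Seal the packets that belong to this checkpoint's epoch.
        self.awaiting_ack[checkpoint_id] = list(self.current)
        self.current.clear()

    def checkpoint_acked(self, checkpoint_id: int):
        # The backup now holds the matching state, so output may be exposed.
        self.released.extend(self.awaiting_ack.pop(checkpoint_id))
```

Because a packet is released only after the state that produced it is safe on the backup, an external client never observes output from a state that could be lost on failover.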

(16)

Design and Implementation

• Detecting failure
  – We use a simple failure detector that is directly integrated into the checkpointing stream:
    • A timeout of the backup responding to commit requests.
    • A timeout of new checkpoints being transmitted from the primary.
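Both timeouts can be modelled with the same small class, one instance on each host. This is a sketch under assumptions: `TimeoutDetector` is a hypothetical name, the timeout values are arbitrary, and time is passed in explicitly so the behaviour is deterministic.

```python
class TimeoutDetector:
    """Declares the peer failed if nothing arrives within `timeout` seconds."""

    def __init__(self, timeout: float, now: float = 0.0):
        self.timeout = timeout
        self.last_seen = now

    def heard_from_peer(self, now: float):
        # Called when a commit response (on the primary) or a new
        # checkpoint (on the backup) arrives.
        self.last_seen = now

    def peer_failed(self, now: float) -> bool:
        return now - self.last_seen > self.timeout
```

On the primary, `heard_from_peer` would be driven by commit responses from the backup; on the backup, by incoming checkpoints. No separate heartbeat channel is needed because the checkpoint stream itself carries the liveness signal.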

(17)

Evaluation

(18)

Evaluation

(19)

Conclusion and future work

• We develop a novel method that provides high availability in the face of hardware failure, with low cost and transparency.
• However, outbound packet latency lowers network throughput; addressing this is left for future work.
