Remus: : High Availability via Asynchronous Virtual Machine Replication

(1)

1 1

Remus

: High Availability via

_{: High Availability via}

Asynchronous Virtual Machine

Replication

Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike

Mike FeeleyFeeley, Norm Hutchinson, and Andrew Warfield, Norm Hutchinson, and Andrew Warfield

Department of Computer Science Department of Computer Science The University of British Columbia The University of British Columbia

NSDI

NSDI ’’08: 5th USENIX Symposium on Networked 08: 5th USENIX Symposium on Networked Systems Design and Implementation (2008)

Systems Design and Implementation (2008)

Presented by

(2)

Outline

Introduction

Design and Implementation

Evaluation

(3)

3 3

Introduction

It

’

s hard and expensive to design highly available

system to survive hardware failure

–

– Using redundant component.Using redundant component. –

– Customized hardware and software.Customized hardware and software. –

– E.g. HP Nonstop Server.E.g. HP Nonstop Server.

Our goal is to provide high availability system, and

it

it’

’

s :

–

– Generality and transparencyGenerality and transparency

Without modification of OS and App.Without modification of OS and App.

–

(4)

Introduction

Our approach

–

VM

-

based whole system replication

Frequently checkpoint whole Virtual Machine state.Frequently checkpoint whole Virtual Machine state.

Protected VM and Backup VM is located in different Protected VM and Backup VM is located in different Physical host.

Physical host.

–

Speculative execution

We buffer state to We buffer state to synsyn backup later, and continue backup later, and continue execution ahead of

execution ahead of synsyn point.point.

–

Asynchronous replication

(5)

5 5

Introduction

(6)

Design and Implementation

Our implementation is based on:

–

– XenXen virtual machine monitor.virtual machine monitor. –

– XenXen’’ss support for live migration to provide finesupport for live migration to provide fine--grained grained checkpoints.

checkpoints. –

– Two host machines is connected over redundant gigabit Two host machines is connected over redundant gigabit Ethernet connections.

Ethernet connections.

The virtual machine does not actually execute on

the backup host until a failure occurs.

(7)

7 7

Design and Implementation

(8)

Design and Implementation

Pipelined checkpoint (see figure 1,p

-

5)

1.Pause the running VM and copy any changed state into a 1.Pause the running VM and copy any changed state into a

buffer. (talk later more detail) buffer. (talk later more detail)

2.With state changes preserved in a buffer, VM is

2.With state changes preserved in a buffer, VM is unpausedunpaused and speculative execution resumes.

and speculative execution resumes.

3.Buffered state is transmitted to the backup host. 3.Buffered state is transmitted to the backup host. 4.When complete state has been received,

4.When complete state has been received, ackack to the to the primary.

(9)

9 9

Design and Implementation

Next we talk about how to checkpoint CPU state,

memory, disk , network.

CPU & memory state

–

– CheckpointingCheckpointing is implemented above is implemented above Xen’Xen’ss existing existing machinery for performing

machinery for performing live migrationlive migration..

What is live migration?

–

– A technique by which a VM is relocated to another A technique by which a VM is relocated to another physical host with slight interruption.

(10)

Design and Implementation

How live migration work?

1.Memory is copied to the new location

while

the

VM continues to run at the old location.

2.During migration, writes to memory are

intercepted, and dirty pages are copied to the

new location in rounds.

3.After a specified number of intervals, the guest

(11)

11 11

Design and Implementation

By hardware MMU, page protection is used

to trap dirty page.

Actually,

Remus

implements

checkpointing

as repeated executions of the final round of

live migration.

(12)

Design and Implementation

Disk buffering

–

Requirement

All writes to disk in VM is configured to write though.All writes to disk in VM is configured to write though.

As a result, we can achieve crashAs a result, we can achieve crash--consistent when consistent when active and backup host both fail.

active and backup host both fail.

–

On

-

disk state don

’

t change until the entire

checkpoint has been received, see figure4 in

p13.

(13)

13 13

Design and Implementation

(14)

Design and Implementation

Network buffering

–

Most networks cannot be counted on for reliable

data delivery.

–

Therefore, networked app use reliable protocol

as TCP to deal with packet loss or duplication.

This fact simplifies the network buffering problem This fact simplifies the network buffering problem considerably : transmitted packets do not require considerably : transmitted packets do not require replication

(15)

15 15

Design and Implementation

In older to ensure packet transition atomic and

checkpoint consistency:

–

– outbound packets generated since the previous outbound packets generated since the previous checkpoint are queued.

checkpoint are queued. –

– TilTil that checkpoint has been acknowledged by the that checkpoint has been acknowledged by the backup site, outbound packets is released.

(16)

Design and Implementation

Detecting Failure

–

We use a simple failure detector that is directly

integrated in the

checkpointing

stream:

a timeout of the backup responding to commit a timeout of the backup responding to commit requests.

requests.

a timeout of new checkpoints being transmitted from a timeout of new checkpoints being transmitted from the primary.

Remus: : High Availability via Asynchronous Virtual Machine Replication

Remus

Remus

: High Availability via

: High Availability via

Asynchronous Virtual Machine

Asynchronous Virtual Machine

Replication

Replication

Outline

Outline





Introduction

Introduction





Design and Implementation

Design and Implementation





Evaluation

Evaluation



Introduction

Introduction





It

It

’

’

s hard and expensive to design highly available

s hard and expensive to design highly available

system to survive hardware failure

system to survive hardware failure





Our goal is to provide high availability system, and

Our goal is to provide high availability system, and

it

it’

’

s :

s :

Introduction

Introduction





Our approach

Our approach

–

–

VM

VM

-

-

based whole system replication

based whole system replication

–

–

Speculative execution

Speculative execution

–

–

Asynchronous replication

Asynchronous replication

Introduction

Introduction

Design and Implementation

Design and Implementation





Our implementation is based on:

Our implementation is based on:





The virtual machine does not actually execute on

The virtual machine does not actually execute on

the backup host until a failure occurs.

_{: High Availability via}