State-Machine Replication

Academic year: 2021

(2)

The Problem

Clients Server

Solution: replicate server!

(8)

The Solution

1. Make server deterministic (state machine)
2. Replicate server
3. Ensure correct replicas step through the same sequence of state transitions
4. Vote on replica outputs for fault-tolerance

[Figure labels: Clients, Commands, State machines, Voter]
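
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `KeyValueStore`, `FaultyStore`, `vote`, and `replicated_apply` are hypothetical names, the replicas live in one process, and the voter takes a strict majority, which the slides do not prescribe.

```python
from collections import Counter


class KeyValueStore:
    """A deterministic state machine: outputs depend only on the command sequence."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, *rest = command
        if op == "put":
            self.state[key] = rest[0]
            return "ok"
        if op == "get":
            return self.state.get(key)
        raise ValueError(f"unknown op: {op}")


class FaultyStore(KeyValueStore):
    """Hypothetical faulty replica that corrupts the output of every read."""

    def apply(self, command):
        result = super().apply(command)
        return "garbage" if command[0] == "get" else result


def vote(outputs):
    """Return the output produced by a strict majority of replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among replica outputs")
    return value


def replicated_apply(replicas, command):
    # Step 3: every replica processes the same command at the same position.
    outputs = [replica.apply(command) for replica in replicas]
    # Step 4: the voter masks a minority of faulty outputs.
    return vote(outputs)


replicas = [KeyValueStore(), KeyValueStore(), FaultyStore()]
replicated_apply(replicas, ("put", "x", 1))
result = replicated_apply(replicas, ("get", "x"))
print(result)
```

With three replicas, one faulty output is outvoted by the two correct ones.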

(23)

A conundrum

. . .

A: voter and client share fate!

(35)

State Machines

Set of state variables + Sequence of commands

A command
• Reads its read set values (opt. environment)
• Writes to its write set values (opt. environment)

A deterministic command
• Produces deterministic write set values (wsvs) and outputs on given read set values (rsvs)

A deterministic state machine
• Reads a fixed sequence of deterministic commands

(36)

Semantic Characterization of a State Machine

Outputs of a state machine are completely determined by the sequence of commands it processes, independent of time and any other activity in a system
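
A small sketch of this characterization, assuming commands are modelled as Python functions over a state dictionary (the names `deterministic_increment`, `nondeterministic_stamp`, and `run` are illustrative only). Running the same sequence of deterministic commands twice yields the same state and outputs; a command that reads the clock offers no such guarantee.

```python
import time


def deterministic_increment(state):
    # Read set: {"counter"}; write set: {"counter"}.
    state["counter"] = state.get("counter", 0) + 1
    return state["counter"]


def nondeterministic_stamp(state):
    # Reads the environment (the clock), so two replicas may diverge.
    state["stamp"] = time.time_ns()
    return state["stamp"]


def run(commands):
    """Process a fixed sequence of commands from an empty initial state."""
    state, outputs = {}, []
    for command in commands:
        outputs.append(command(state))
    return state, outputs


commands = [deterministic_increment] * 3
first = run(commands)
second = run(commands)
print(first == second)
```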

(37)

Replica Coordination

Agreement: Every non-faulty state machine receives every command

Order: Every non-faulty state machine processes the commands it receives in the same order

All non-faulty state machines receive all commands in the same order
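
One common way to obtain Order, given Agreement, is to tag each command with a sequence number and hold back early arrivals. The slides do not prescribe a mechanism, so the sketch below is an assumption: the sequence numbers are taken to come from some hypothetical sequencer, and `OrderedReplica` is an illustrative name.

```python
class OrderedReplica:
    """Applies commands strictly in sequence-number order."""

    def __init__(self):
        self.log = []          # commands applied so far, in order
        self.next_seq = 0      # next sequence number to apply
        self.held_back = {}    # commands that arrived before their turn

    def receive(self, seq, command):
        self.held_back[seq] = command
        # Apply every command whose predecessors have all been applied.
        while self.next_seq in self.held_back:
            self.log.append(self.held_back.pop(self.next_seq))
            self.next_seq += 1


commands = {0: "put x 1", 1: "put x 2", 2: "get x"}

a, b = OrderedReplica(), OrderedReplica()
for seq in (0, 1, 2):          # replica a sees in-order delivery
    a.receive(seq, commands[seq])
for seq in (2, 0, 1):          # replica b sees reordered delivery
    b.receive(seq, commands[seq])

print(a.log == b.log)
```

Both replicas end with the same log even though the network delivered the commands in different orders.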

(38)

Where should RC be implemented?

In hardware
• sensitive to architecture changes

At the OS level
• state transitions hard to track and coordinate

At the application level
• requires sophisticated application programmers

(39)

Hypervisor-based Fault-tolerance

Implement RC at a virtual machine running on the same instruction-set as underlying hardware
• Undetectable by higher layers of software

One of the great come-backs in systems research!
• CP-67 for IBM System/360 Model 67 [1970]
• Xen [SOSP 2003], VMware

(40)

The Hypervisor as a State Machine

Two types of commands
• virtual-machine instructions
• virtual-machine interrupts (with DMA input)

State transition must be deterministic
• ...but some VM instructions are not (e.g. time-of-day)
• interrupts must be delivered at the same point in command sequence

(41)

The Architecture

Good-ol’ Primary-Backup
• Primary makes all non-deterministic choices

I/O Accessibility Assumption
• Primary and backup have access to same I/O operations

[Figure labels: Primary (HP 9000/720), Backup (HP 9000/720), I/O Device, Ethernet]

(42)

Ensuring identical command sequences

Ordinary (deterministic) instructions

Environment (nondeterministic) instructions
• Environment Instruction Assumption: Hypervisor captures all environment instructions, simulates them, and ensures they have the same effect at all state machines

VM interrupts must be delivered at same point in instruction sequence at all replicas
• Instruction Stream Interrupt Assumption: Hypervisor can be invoked at specific point in the instruction stream
• implemented via recovery register
• interrupts at backup are ignored
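
A rough model of why invoking the hypervisor at fixed points in the instruction stream helps: interrupts are buffered when they arrive and handed to the guest only when a countdown expires. The recovery register is modelled here as a plain per-instruction countdown, which is an assumption about its behaviour; `EPOCH_LENGTH` and `run_vm` are hypothetical names.

```python
EPOCH_LENGTH = 4  # instructions per epoch (hypothetical value)


def run_vm(num_instructions, arrivals):
    """Simulate a VM whose hypervisor delivers interrupts only at epoch ends.

    `arrivals` maps an instruction index to the interrupt that physically
    arrives while that instruction executes. Returns what the guest observes.
    """
    trace = []
    buffered = []
    recovery_register = EPOCH_LENGTH
    for pc in range(num_instructions):
        if pc in arrivals:
            buffered.append(arrivals[pc])     # buffer, do not deliver yet
        trace.append(f"instr {pc}")
        recovery_register -= 1
        if recovery_register == 0:            # hypervisor gets control here
            trace.extend(f"deliver {irq}" for irq in buffered)
            buffered.clear()
            recovery_register = EPOCH_LENGTH
    return trace


# The same interrupt arrives at different times within the first epoch.
early = run_vm(8, {0: "disk"})
late = run_vm(8, {3: "disk"})
print(early == late)
```

The guest sees the interrupt after instruction 3 in both runs, regardless of when it physically arrived within the epoch.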

(47)

The failure-free protocol

P0: On processing environment instruction i at pc, HV of primary p:
• sends [e_p, pc, Val_i] to backup b
• p waits for ack

P1: If p's HV receives Int from its VM:
• p buffers Int

P2: If epoch ends at p:
• p sends to b all Int buffered in e_p
• p waits for ack
• p delivers all VM Int in e_p
• e_p := e_p + 1
• p starts e_p

P3: If b's HV processes environment instruction i at pc:
• b waits for [e_b, pc, Val_i] from p
• returns Val_i

If b receives [E, pc, Val] from p:
• b sends ack to p
• b buffers Val for delivery at E, pc

P4: If b's HV receives Int from its VM:
• b ignores Int

P5: If epoch ends at b:
• b waits from p for interrupts for e_b
• b sends ack to p
• b delivers all VM Int buffered in e_b
• e_b := e_b + 1
• b starts e_b
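
The failure-free rules can be exercised with a toy, single-process model. This is a sketch under several assumptions: messages are replaced by direct method calls, "waits for" is implied by call ordering, the clock is a simple counter standing in for the environment, and the class and method names are hypothetical.

```python
import itertools


class Backup:
    def __init__(self):
        self.epoch = 0
        self.env_values = {}     # (epoch, pc) -> value sent by the primary
        self.interrupts = {}     # epoch -> interrupts sent by the primary
        self.delivered = []

    def receive_env(self, epoch, pc, value):
        # Buffer the value for delivery at (epoch, pc) and acknowledge.
        self.env_values[(epoch, pc)] = value
        return "ack"

    def receive_interrupts(self, epoch, interrupts):
        self.interrupts[epoch] = list(interrupts)
        return "ack"

    def environment_instruction(self, pc):
        # P3: use the primary's value rather than reading the environment.
        return self.env_values[(self.epoch, pc)]

    def local_interrupt(self, interrupt):
        # P4: interrupts raised locally at the backup are ignored.
        pass

    def end_epoch(self):
        # P5: deliver exactly the interrupts the primary reported.
        self.delivered.extend(self.interrupts.pop(self.epoch))
        self.epoch += 1


class Primary:
    def __init__(self, backup, clock):
        self.backup = backup
        self.clock = clock       # stand-in for the nondeterministic environment
        self.epoch = 0
        self.buffered = []
        self.delivered = []

    def environment_instruction(self, pc):
        # P0: simulate the instruction, forward the value, wait for the ack.
        value = next(self.clock)
        assert self.backup.receive_env(self.epoch, pc, value) == "ack"
        return value

    def local_interrupt(self, interrupt):
        # P1: buffer until the epoch ends.
        self.buffered.append(interrupt)

    def end_epoch(self):
        # P2: send buffered interrupts, wait for ack, deliver, start next epoch.
        assert self.backup.receive_interrupts(self.epoch, self.buffered) == "ack"
        self.delivered.extend(self.buffered)
        self.buffered = []
        self.epoch += 1


backup = Backup()
primary = Primary(backup, clock=itertools.count(1000))

primary_time = primary.environment_instruction(pc=7)
primary.local_interrupt("disk-done")
backup.local_interrupt("spurious")        # ignored by P4
primary.end_epoch()

backup_time = backup.environment_instruction(pc=7)
backup.end_epoch()
print(primary_time == backup_time, primary.delivered == backup.delivered)
```

Both virtual machines observe the same environment value at the same pc and the same interrupts at the same epoch boundary.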

(48)

If the primary fails…

P6: If b receives a failure notification instead of [e_b, pc, Val_i], b executes i

If in P5 b receives failure notification instead of Int:
• e_b := e_b + 1
• b starts e_b <--- failover epoch
• b is promoted primary for epoch e_b + 1

If p crashes before sending Int to b, Int is lost!
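
One possible reading of the failover rules, as a toy model: after a failure notification the backup executes environment instructions itself, runs one more epoch as the failover epoch, and acts as primary from the epoch after that. The class `FailoverBackup`, its fields, and the counter used as a local clock are all hypothetical; a real backup would block where this sketch raises.

```python
import itertools


class FailoverBackup:
    def __init__(self, local_clock):
        self.epoch = 0
        self.env_values = {}        # (epoch, pc) -> value received from the primary
        self.primary_failed = False
        self.failover_epoch = None  # set once the failure is seen at an epoch end
        self.is_primary = False
        self.local_clock = local_clock

    def notify_failure(self):
        self.primary_failed = True

    def environment_instruction(self, pc):
        key = (self.epoch, pc)
        if key in self.env_values:
            return self.env_values[key]
        if self.primary_failed:
            # P6: no value will arrive, so execute the instruction locally.
            return next(self.local_clock)
        raise RuntimeError("a real backup would block here waiting for the primary")

    def end_epoch(self):
        if self.failover_epoch == self.epoch:
            self.is_primary = True              # failover epoch finished: promote
        elif self.primary_failed and self.failover_epoch is None:
            self.failover_epoch = self.epoch + 1
        self.epoch += 1


backup = FailoverBackup(local_clock=itertools.count(5000))
backup.env_values[(0, 7)] = 1000     # value the primary sent before crashing
first = backup.environment_instruction(pc=7)
backup.notify_failure()
second = backup.environment_instruction(pc=9)
backup.end_epoch()                   # failure seen at the end of epoch 0
in_failover = (backup.epoch, backup.is_primary)
backup.end_epoch()                   # failover epoch 1 ends
after = (backup.epoch, backup.is_primary)
print(first, second, in_failover, after)
```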

(49)

SMR and the environment

On outputs, no exactly-once guarantee

On primary failure, avoid input inconsistencies
• time must increase monotonically
• at each epoch, primary informs backup of value of its clock
• interrupts must be delivered as a fault-free processor would
• but interrupts can be lost...
• weaken constraints on I/O interrupts

(50)

On I/O device drivers

IO1: If an I/O instruction is executed and the I/O operation performed, the processor issuing the instruction delivers a completion interrupt, unless it fails. Either way, the I/O device is unaffected.

IO2: An I/O device may cause an uncertain interrupt (indicating the operation has been terminated) to be delivered by the processor issuing the I/O instruction. The instruction could have been in progress, completed, or not even started.

On an uncertain interrupt, the device driver reissues the corresponding I/O instruction (not all devices, though, are idempotent or testable)
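
A minimal sketch of a driver that reissues an operation on an uncertain interrupt. It assumes an idempotent device, so repeating a write is harmless; `IdempotentDisk`, `driver_write`, and the retry limit are hypothetical and only illustrate the IO1/IO2 behaviour described above.

```python
class IdempotentDisk:
    """Hypothetical device: writing the same block twice is harmless."""

    def __init__(self, uncertain_first_n):
        self.blocks = {}
        self.remaining_uncertain = uncertain_first_n
        self.writes_performed = 0

    def write(self, block, data):
        self.blocks[block] = data
        self.writes_performed += 1
        if self.remaining_uncertain > 0:
            self.remaining_uncertain -= 1
            return "uncertain"   # IO2: the operation may or may not have happened
        return "completed"       # IO1: completion interrupt


def driver_write(device, block, data, max_attempts=5):
    """Reissue the I/O instruction until a completion interrupt arrives."""
    for attempt in range(1, max_attempts + 1):
        if device.write(block, data) == "completed":
            return attempt
    raise IOError("device kept reporting uncertain interrupts")


disk = IdempotentDisk(uncertain_first_n=2)
attempts = driver_write(disk, block=3, data=b"hello")
print(attempts, disk.blocks)
```

For a device that is neither idempotent nor testable, blind reissue like this would not be safe, which is the caveat the slide raises.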

(51)

Backup promotion and uncertain interrupts

P7: The backup’s VM generates an uncertain interrupt for each I/O operation that is outstanding right before the backup is promoted primary (at the end of the failover epoch)

(52)

The Hypervisor prototype

Supports only one VM to eliminate issues of address translation

Exploits unused privilege levels in HP’s PA-RISC architecture (HV runs at level 1)

To prevent software from detecting the HV, hacks one assembly instruction in the HP-UX boot code

(53)

RC in the Hypervisor

Nondeterministic ordinary instructions (Surprise!)
• TLB replacement policy non-deterministic
• TLB misses handled by software
• Primary and backup may execute a different number of instructions!
• HV takes over TLB replacement

Optimizations
• p sends Int immediately
• p blocks for acks only before output commit
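
The output-commit optimization can be sketched as bookkeeping on unacknowledged messages: the primary sends without waiting and only checks for outstanding acks when it is about to produce output. The class below is a hypothetical illustration; where a real primary would block, this sketch raises an error instead.

```python
class OutputCommitPrimary:
    def __init__(self):
        self.unacked = set()     # ids of messages sent to the backup, not yet acked
        self.next_id = 0
        self.outputs = []

    def send_to_backup(self, payload):
        # Optimization: send immediately and do not wait for the ack here.
        msg_id = self.next_id
        self.next_id += 1
        self.unacked.add(msg_id)
        return msg_id

    def ack(self, msg_id):
        self.unacked.discard(msg_id)

    def output(self, data):
        # Output commit: the backup must hold everything the output depends on.
        if self.unacked:
            raise RuntimeError("must wait for acks before output commit")
        self.outputs.append(data)


p = OutputCommitPrimary()
msg = p.send_to_backup("Int")
try:
    p.output("reply")
    blocked = False
except RuntimeError:
    blocked = True
p.ack(msg)
p.output("reply")
print(blocked, p.outputs)
```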

(56)

The JVM as a State Machine

Asynchronous commands
• interrupts

Non-deterministic commands
• read time-of-day

Non-deterministic read set values
• multi-threaded access to shared data

Output to the environment
• simulate a single, fault-tolerant state machine

(57)

Non-deterministic Commands

Only invoked through Java Native Interface (JNI)

• direct access to OS and other libraries

• implement windowing, I/O, read HW clock…

Executes outside the JVM:

can’t agree on inputs!

Force agreement on the wsvs

Not out of the woods:

• Non-deterministic output to the environment

• Non-deterministic method invocation
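
Forcing agreement on the write set values of a native call can be sketched as record and replay: the primary actually runs the native method and logs its result, and the backup adopts the logged result instead of running the method. This is written in Python for brevity rather than against the real JNI; `read_hw_clock`, `PrimaryNativeInterface`, and `BackupNativeInterface` are hypothetical stand-ins.

```python
import time


def read_hw_clock():
    """Stand-in for a native method whose result depends on the environment."""
    return time.time_ns()


class PrimaryNativeInterface:
    def __init__(self):
        self.log = []                 # results to ship to the backup

    def call(self, native_fn, *args):
        result = native_fn(*args)     # actually runs outside the "VM"
        self.log.append((native_fn.__name__, result))
        return result


class BackupNativeInterface:
    def __init__(self, log):
        self.log = iter(log)

    def call(self, native_fn, *args):
        name, result = next(self.log)
        if name != native_fn.__name__:
            raise RuntimeError("replicas diverged before this native call")
        return result                 # adopt the primary's value, skip the call


primary = PrimaryNativeInterface()
t_primary = primary.call(read_hw_clock)

backup = BackupNativeInterface(primary.log)
t_backup = backup.call(read_hw_clock)
print(t_primary == t_backup)
```

The sketch covers only values returned into the VM; output to the environment and non-deterministic method invocation, listed above, need separate handling.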
