How to Tolerate Half Less One Byzantine Nodes in Practical Distributed Systems


University of Lisboa Faculty of Sciences LASIGE – Navigators Group

http://www.navigators.di.fc.ul.pt/

Miguel Correia, Nuno F. Neves, Paulo Veríssimo

Navigators FCUL

Summary

• We present:

  – An intrusion-tolerant (Byzantine-tolerant) service
  – Based on the state machine approach
  – That tolerates malicious replicas

• Features:

  – Reduces, for the first time, the number of replicas needed for maximum resilience in practical distributed systems from 3f+1 to 2f+1
  – Circumvents FLP without synchrony assumptions on the asynchronous part of the system (all synchrony is in a kind of oracle)


Outline

1. Motivation

2. System Model and TTCB

3. 2f+1 Atomic Multicast

4. State Machine Replication

5. Conclusions and Future Work


Motivation (I)

• Intrusions are hard to prevent entirely => intrusion tolerance

• State machine approach is a generic solution for fault/intrusion-tolerant systems, but:
  – Distributed systems are typically modeled with the asynchronous model in this context
  – Maximum resilience for asynchronous systems is 3f+1 replicas to tolerate f faulty ones
  – Impossibility of consensus in asynchronous systems (FLP)

Motivation (II)

• Each replica has two costs:
  – Hardware and software
  – Design, since replicas have to fail independently
      – Cannot be attacked simultaneously (N-version programming)

• Using a kind of oracle we manage to reduce the number of replicas to 2f+1
  – A reduction of 25% to 33% in the number of replicas!
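The 25% to 33% figure can be checked with a little arithmetic. The sketch below assumes the classical requirement of 3f+1 replicas for asynchronous Byzantine fault tolerance and the 2f+1 replicas targeted in this talk; the function names are hypothetical and only serve the illustration.

```python
def replicas_needed(f: int, with_oracle: bool) -> int:
    """Replicas needed to tolerate f Byzantine replicas (assumed bounds)."""
    return 2 * f + 1 if with_oracle else 3 * f + 1


def reduction(f: int) -> float:
    """Fraction of replicas saved when going from 3f+1 to 2f+1."""
    before = replicas_needed(f, with_oracle=False)
    after = replicas_needed(f, with_oracle=True)
    return (before - after) / before  # equals f / (3f + 1)


for f in (1, 2, 10, 100):
    print(f, replicas_needed(f, False), replicas_needed(f, True), reduction(f))
```

For f = 1 the saving is 1 replica out of 4 (25%); as f grows, f / (3f + 1) approaches one third.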


System Model and TTCB

Architectural Hybridization

• Most of the system is asynchronous Byzantine

• There is a subsystem built in such a way that it is secure: the TTCB

[Figure: hosts 1 to n, each running processes on an OS with a local TTCB; the hosts are connected by the payload network and the local TTCBs by the TTCB control channel]


TTCB and Intrusion Tolerance

• The TTCB serves to support the execution of intrusion-tolerant protocols/applications
  – They run mostly in the payload system, which can be attacked
  – They use the TTCB to execute some critical steps securely

[Figure: same architecture diagram: hosts 1 to n with processes, OS and local TTCB, connected by the payload network and the TTCB control channel]

TTCB Design Principles

• Interposition: it must be interposed between vital resources and any attempt to interact with them

• Shielding: it must be protected from faults affecting security

• Validation: its implementation must be verifiable


TTCB Services

• The design principles require that the TTCB is simple => limited set of services

• Trusted Multicast Ordering Service (TMO)
  – The objective is to support the execution of an intrusion-tolerant atomic multicast protocol
  – It is not affected by malicious faults since it is a TTCB service

System Model

• The system is implemented by a set of processes:
  – They run in the payload system (outside the TTCB)
  – They communicate over the payload network (outside the TTCB) using reliable channels:
      – Messages are eventually delivered
      – Message integrity is guaranteed
  – They call the TTCB TMO service to perform some steps securely and efficiently


2f+1 Atomic Multicast

Atomic Multicast

• Simplified definition:
  – All correct processes deliver the same messages
  – If the sender is correct then the correct processes deliver the message
  – All messages are delivered in the same order by all correct processes


The Protocol

• Uses a cryptographic hash function H (e.g., SHA-1) to give the TTCB a unique identifier of the message
  – The output of H has fixed length
  – It is computationally infeasible to find two different inputs that hash to the same output
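As a small illustration of how a process could derive the identifier it hands to the TTCB, the sketch below uses Python's standard hashlib with SHA-1, the example named on the slide. The helper name message_id is hypothetical. SHA-1 collisions have since been demonstrated in practice, so a current deployment would likely pick a stronger function such as SHA-256.

```python
import hashlib


def message_id(message: bytes) -> bytes:
    """Return H(M): a fixed-length identifier for an arbitrary-length message."""
    return hashlib.sha1(message).digest()


short = message_id(b"M1")
long = message_id(b"M1" * 10_000)

# The digest length does not depend on the message length (20 bytes for SHA-1).
print(len(short), len(long))
print(short.hex())
```

Only the digest, not the message itself, needs to cross the TTCB interface, which keeps the trusted service small.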

Protocol execution

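[Figure: protocol execution with n=3, f=1 among processes P0, P1, P2 and the TTCB. The sender gives H(M1) to the TMO service (sent H(M1)); processes that receive M1 report it (received H(M1)); once f+1 processes have M1 the TMO assigns order = 1 and the processes deliver (H(M1), 1). One process is marked as failed (X).]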


TMO Service

• Is the core of the solution
  – Decides when a message can be delivered
      – If f+1 processes show that they have the message, then at least one correct process has the message
  – Defines a sequential order for the messages
  – Provides reliable information since the TTCB is secure

• Implementation of the service
  – It is a kind of atomic multicast protocol...
  – ...but executed in a benign environment: the TTCB
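The two decisions taken by the TMO service (when a message may be delivered, and in which order) can be sketched as a single-process simulation. This is only an illustration under stated assumptions: the class TmoSketch, its report method and its return convention are hypothetical and are not the TTCB's actual interface, and the real service runs distributed across the local TTCBs.

```python
from dataclasses import dataclass, field


@dataclass
class TmoSketch:
    """Toy, centralized stand-in for the TMO service (hypothetical API)."""
    f: int                                        # maximum number of faulty processes
    next_order: int = 1                           # next sequence number to hand out
    holders: dict = field(default_factory=dict)   # H(M) -> ids of processes holding M
    orders: dict = field(default_factory=dict)    # H(M) -> assigned order number

    def report(self, process_id: int, msg_hash: bytes):
        """Record that process_id holds the message with this hash.

        Returns the order number once f+1 distinct processes hold it, else None.
        """
        self.holders.setdefault(msg_hash, set()).add(process_id)
        enough = len(self.holders[msg_hash]) >= self.f + 1
        if enough and msg_hash not in self.orders:
            self.orders[msg_hash] = self.next_order
            self.next_order += 1
        return self.orders.get(msg_hash)


tmo = TmoSketch(f=1)
first = tmo.report(0, b"H(M1)")    # only one holder so far
second = tmo.report(1, b"H(M1)")   # f+1 = 2 holders: order assigned
repeat = tmo.report(1, b"H(M1)")   # repeated reports do not change the order
print(first, second, repeat)
```

With f+1 distinct reporters, at least one of them is correct, so the message is guaranteed to be held by a process that will help the others obtain it.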


System Architecture

[Figure: system architecture. Clients c1 to cm and servers s1 to sn run on hosts, each with an OS and a local TTCB; the hosts are connected by the payload network and the local TTCBs by the TTCB control channel]

State Machine Approach

• Servers are state machines:
  – State variables, commands

• All correct servers follow the same history of states iff:
  – Initial state: all servers start in the same state
  – Agreement: all servers execute the same commands
  – Total order: all servers execute the commands in the same order
  – Determinism: the same command executed in the same initial state generates the same final state
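A toy example makes the four conditions concrete. The sketch below assumes a hypothetical CounterReplica whose only state variable is an integer and whose commands are ("add", k) and ("mul", k); it is not taken from the presented system.

```python
class CounterReplica:
    """Deterministic state machine: one state variable, two commands."""

    def __init__(self) -> None:
        self.value = 0  # same initial state on every replica

    def execute(self, command) -> int:
        op, arg = command
        if op == "add":
            self.value += arg
        elif op == "mul":
            self.value *= arg
        else:
            raise ValueError(f"unknown command: {op}")
        return self.value


commands = [("add", 2), ("mul", 5), ("add", 1)]

# Agreement + total order: every replica executes the same commands in the same order.
replicas = [CounterReplica() for _ in range(3)]
for replica in replicas:
    for command in commands:
        replica.execute(command)
states = [replica.value for replica in replicas]

# Violating total order: same commands, different order, different final state.
out_of_order = CounterReplica()
for command in reversed(commands):
    out_of_order.execute(command)

print(states, out_of_order.value)
```

The three in-order replicas all end in state 11, while the replica that executed the same commands in reverse ends in state 7, which is why agreement alone is not enough.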


Clients

• Clients:
  – Any number of clients can fail
  – They have local (unreliable) clocks

• Protocol:
  – Send a REQUEST message to one server
  – Wait for f+1 identical REPLY messages from different servers
  – If the replies have not been received Tresend after the REQUEST message was sent, send REQUEST messages to f additional servers
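The client's acceptance rule can be sketched as a small vote count. The function accepted_reply and the (server_id, reply) pair format are assumptions made for the illustration; networking and the Tresend timer are left out.

```python
from collections import defaultdict


def accepted_reply(replies, f):
    """Return the first reply vouched for by f+1 distinct servers, else None.

    replies: iterable of (server_id, reply) pairs in arrival order;
    reply values must be hashable (e.g. bytes or str).
    """
    voters = defaultdict(set)  # reply -> ids of the servers that sent it
    for server_id, reply in replies:
        voters[reply].add(server_id)
        if len(voters[reply]) >= f + 1:
            return reply
    return None


# n = 3, f = 1: server 1 is malicious and answers differently.
result = accepted_reply([(0, "ok"), (1, "bad"), (2, "ok")], f=1)

# A single server repeating itself never reaches the threshold.
spam = accepted_reply([(1, "bad"), (1, "bad"), (1, "bad")], f=1)

print(result, spam)
```

Since at most f servers are faulty, any reply sent by f+1 different servers comes from at least one correct server.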

Servers

• At most f = ⌊(n−1)/2⌋ of the n servers can fail

• Simplified protocol:

  – When a server receives a REQUEST message, it atomically multicasts it to all the servers
  – When the atomic multicast protocol delivers a message, if the same request has not been previously delivered, then execute the command
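The duplicate check matters because a client that times out resends its REQUEST to additional servers, so the same request may be multicast and delivered more than once. A minimal sketch of the delivery handler follows; the class ServerSketch, the request_id field and the reply cache are assumptions for the illustration, not details given in the slides.

```python
class ServerSketch:
    """Executes each delivered request at most once (hypothetical structure)."""

    def __init__(self, execute):
        self.execute = execute  # deterministic command handler
        self.replies = {}       # request id -> reply already computed

    def on_deliver(self, request_id, command):
        """Called when the atomic multicast protocol delivers a request."""
        if request_id not in self.replies:
            # First delivery of this request: execute the command.
            self.replies[request_id] = self.execute(command)
        # Duplicate deliveries reuse the stored reply without re-executing.
        return self.replies[request_id]


executed = []


def run(command):
    executed.append(command)
    return f"done:{command}"


server = ServerSketch(run)
r1 = server.on_deliver("c1-req1", "deposit 10")
r2 = server.on_deliver("c1-req1", "deposit 10")  # duplicate delivery
print(r1, r2, executed)
```

Because every correct server sees the same delivery order, they all skip the same duplicates and stay in the same state.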


Handling Requests’ Corruption

• A malicious server might corrupt a request before atomically multicasting it

• Solution: the REQUEST message has a vector of MACs, one per server
  – Obtained with keys shared by the client with each of the servers
  – Each server can use one of the MACs to verify if the message was corrupted
  – If its MAC is wrong, a correct server does not give H(M) to the TTCB, i.e., it does not contribute to the f+1 threshold
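The MAC vector can be sketched with Python's standard hmac module. The slides do not name a MAC algorithm, so HMAC-SHA256 is an assumption here, as are the helper names mac_vector and server_accepts and the toy keys.

```python
import hashlib
import hmac


def mac_vector(request: bytes, keys: dict) -> dict:
    """Client side: one MAC per server, each under the key shared with that server."""
    return {sid: hmac.new(key, request, hashlib.sha256).digest()
            for sid, key in keys.items()}


def server_accepts(server_id, key: bytes, request: bytes, macs: dict) -> bool:
    """Server side: check only this server's own entry of the vector."""
    expected = hmac.new(key, request, hashlib.sha256).digest()
    return hmac.compare_digest(macs.get(server_id, b""), expected)


# Hypothetical keys shared between one client and servers 0..2.
keys = {0: b"key-0", 1: b"key-1", 2: b"key-2"}
request = b"REQUEST deposit 10"
macs = mac_vector(request, keys)

ok = server_accepts(1, keys[1], request, macs)
# A malicious server forwards a modified request with the original MAC vector.
tampered = server_accepts(1, keys[1], b"REQUEST deposit 1000", macs)
print(ok, tampered)
```

A correct server that sees a failed check simply withholds H(M) from the TTCB, so a corrupted request cannot gather f+1 reports unless a correct server has the genuine message.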

Performance

• Without batching

• Time complexity:

  – Two asynchronous rounds + one round of TMO executions

• Message complexity:


Conclusions and Future Work

Conclusions

• First solution for intrusion-tolerant state-machine replication in practical distributed systems with only 2f+1 replicas

• Interesting impact since each additional replica has a considerable cost

• Circumvents FLP without synchrony assumptions on the asynchronous part of the system
  – All synchrony is encompassed in the TTCB

• The performance is promising:


Future Work

• Is it possible to simplify the protocol?

• What is the minimum TTCB service that can be used to solve atomic multicast?

• Ongoing implementation of a new TTCB-like asynchronous component (Perseus/Fiasco)
  – Performance evaluation
  – Better understanding of the tradeoffs involved

• New TTCB-like components:
  – LIDS, hardware (PC/104 appliances), WANs