
Avoid a single point of failure by replicating the server; increase scalability by sharing the load among replicas


Academic year: 2021


3. Replication

Replication

 Goal:

Avoid a single point of failure by replicating the server

Increase scalability by sharing the load among replicas

 Problems:

Partial failures of replicas and messages

No global ordering of messages

Some replicas might execute operations that others did not know about

The state of replicas diverge


Replicated State Machine

 Known also as Active Replication

 The idea:

Every replica sees exactly the same set of messages in the same order and will execute them in that order

 Benefit:

Immediate fail-over

 Limitations:

Waste of resources, since all replicas are doing the same

Requires determinism, which is not trivial to ensure

 An important issue:

At what level?

 Option 1:

Machine instruction level

Virtual Machine Level
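The replicated state machine idea can be illustrated with a minimal sketch. The `CounterReplica` class and its `add`/`set` commands are hypothetical and exist only for this example. The sketch assumes the ordered command sequence is already given and shows why determinism and identical ordering matter.

```python
from dataclasses import dataclass, field


@dataclass
class CounterReplica:
    """A deterministic state machine: its state depends only on the ordered commands."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        op, key, amount = command
        if op == "add":
            self.state[key] = self.state.get(key, 0) + amount
        elif op == "set":
            self.state[key] = amount
        else:
            raise ValueError(f"unknown operation: {op}")


def run_replicas(commands, n_replicas=3):
    """Feed the same ordered command sequence to every replica."""
    replicas = [CounterReplica() for _ in range(n_replicas)]
    for command in commands:
        for replica in replicas:
            replica.apply(command)
    return replicas


log = [("set", "x", 5), ("add", "x", 2), ("set", "y", 1)]
replicas = run_replicas(log)
same_order_agrees = all(r.state == replicas[0].state for r in replicas)

# The same set of commands applied in a different order can diverge.
reordered = CounterReplica()
for command in [("add", "x", 2), ("set", "x", 5), ("set", "y", 1)]:
    reordered.apply(command)
diverged = reordered.state != replicas[0].state
```

Every replica that applies `log` in order ends in the same state, so any of them can take over immediately. The reordered replica sees the same set of messages but ends with a different value for `x`.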

Machine Instruction Level Replication

 Benefits:

Fast consistency resolution

Transparent

 Problems:

Requires special hardware

Usually behind technology curve

Resource wasteful

Requires physical proximity

Does not overcome software bugs / no multi-versioning


N-Modular Redundancy

[Figure: Node 1, Node 2, ..., Node N feed a Voter, which produces the Output; a Monitoring System is also shown]
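The voter's role can be sketched as a strict majority vote over the node outputs. This is an assumption about the voting rule, since the slide does not state one, and `vote` is a hypothetical helper name.

```python
from collections import Counter


def vote(outputs):
    """Return the output produced by a strict majority of nodes, or None if there is none."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None
```

For example, `vote([4, 4, 5])` masks the single faulty node and returns `4`, while `vote([1, 2, 3])` returns `None` because no majority exists. The outputs must be hashable for `Counter` to count them.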

Logical State Replication

 Idea:

The important thing is the logical/semantic state of the application

 Benefits:

Avoids the shortcomings of machine-level replication

 Problems:

Not transparent

Changes programming model

 Although one tries to minimize this

Slower consistency resolution

May result in lower throughput and higher latency


Total Ordering Based Replication

 Simple replication protocol

Clients can send requests to any replica

All replicas utilize a black-box total ordering mechanism to apply the updates in the same order

[Figure: Client1 and Client2 send Request1 and Request2 through the Total Order mechanism to the Replicated Servers, which return Reply1 and Reply2]

Total Ordering Based Replication – Fine Details

 How do clients find an alive server?

Name servers

Local directors

Virtual IP being migrated between alive servers

 What happens if the server fails?

Before servicing the request

Resubmit

After servicing the request

We need to avoid re-execution

Sequence numbers
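A minimal sketch of how sequence numbers can prevent re-execution after a resubmit. It assumes each client numbers its requests 1, 2, 3, ... and that every replica remembers the last executed request and its reply per client. The `Replica` class and the balance-style operation are hypothetical.

```python
class Replica:
    def __init__(self):
        self.balance = 0
        self.last = {}  # client_id -> (highest sequence number executed, its reply)

    def execute(self, client_id, seq, amount):
        last_seq, last_reply = self.last.get(client_id, (0, None))
        if seq == last_seq:
            return last_reply          # duplicate: answer from the cache, do not re-execute
        if seq < last_seq:
            return None                # stale request that the client has already moved past
        self.balance += amount
        self.last[client_id] = (seq, self.balance)
        return self.balance


a, b = Replica(), Replica()
# The total order delivers request 1 of client "c1" to both replicas.
for replica in (a, b):
    replica.execute("c1", 1, 10)
# Replica a fails before its reply reaches the client, so the client resubmits to b.
retry_reply = b.execute("c1", 1, 10)
```

The resubmitted request gets the original reply and `b.balance` stays at 10, so the update is applied once even though the client sent it twice.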


Primary-Backup Replication

 Cold backup

Only the primary is active

Periodically checkpoints its state to storage that is available to the backup

 Stable storage or shared storage (SCSI, SAN)

When the primary fails, the backup is initiated, loads the state from storage, and continues from there

Slow recovery

The backup needs to be started (run applications, obtain resources, etc.)

Either the backup needs to replay the last actions from a log file, or it may miss the last updates since the most recent checkpoint

+ Resource efficient

+ Invocations need not be deterministic

It is possible to have several backups to survive multiple failures

 Requires consistent failure detection, e.g., by a group membership service

It is possible to have several nodes, each primary for some services and backup for others
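The cold-backup cycle of checkpointing and recovery can be sketched as follows. The JSON file format, the `checkpoint` and `recover` helpers, and the use of a temporary directory as the "shared storage" are all assumptions made for illustration.

```python
import json
import os
import tempfile


def checkpoint(state, path):
    """Write the state atomically: write a temporary file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def recover(path):
    """Load the most recent checkpoint, as a newly started backup would."""
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as shared:
    path = os.path.join(shared, "checkpoint.json")
    primary_state = {"x": 1}
    checkpoint(primary_state, path)
    primary_state["x"] = 2          # update made after the last checkpoint
    # The primary fails here; the backup starts from the stored checkpoint.
    backup_state = recover(path)
```

The backup resumes with `x` equal to 1, which shows the lost-update problem mentioned above: without a log to replay, everything since the most recent checkpoint is missed.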

Primary-Backup Replication

 Warm backup

In this case, the backup is (at least) partially alive, so the recovery phase is faster

 But typically still requires some replaying of last transactions, or losing the last few updates

Typically, state updates to the backup are also more frequent

 Hot standby (leader/follower)

The backup is also up, and is constantly updated about the state of the primary

+ Fast and up-to-date recovery

 Special protocols are required to ensure true up-to-date recovery

 We will talk about such protocols later


Challenges in Primary/Backup

 How to consistently agree on the primary?

 How to detect that the primary has failed?

 How to ensure that if we suspect the primary, it is indeed no longer operating on behalf of the system?

 How to enable additional backups to join the system without manual intervention and reset?

 We will discuss these issues when we talk about Consensus, Failure Detection, and Membership

Quorum Replication

 Atomic Register

Intuitively, operations should appear as if executed on a single server

 This is a special case of linearizability

 Well suited for distributed storage and distributed shared memory

More specifically,

 A read always returns a value written either by the last write that terminated before the read started, or by a write that is concurrent with the read

 If several writes are concurrent, then a following read can return a value written by any one of them

 A read should not return a value that is “older” than the value returned by a previous read


Quorums

 A quorum system consists of a set of subsets such that the intersection of every two subsets is not empty

Each of these subsets is called a quorum

 Example

U = {1,2,3,4}, S = {{1,2,3}, {2,3,4}, {1,4}}

 The simplest generic quorum system is majority

Every majority subset intersects with every other majority subset

 Another common type of generic quorum system is “any row + any column” of a lattice
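The intersection property can be checked mechanically. The sketch below uses hypothetical helper names: it tests whether a collection of subsets forms a quorum system and generates the majority quorums of a universe.

```python
from itertools import combinations


def is_quorum_system(quorums):
    """True if every two quorums share at least one member."""
    sets = [frozenset(q) for q in quorums]
    return all(a & b for a, b in combinations(sets, 2))


def majority_quorums(universe):
    """All subsets of minimal majority size, floor(n/2) + 1."""
    members = sorted(universe)
    size = len(members) // 2 + 1
    return [frozenset(c) for c in combinations(members, size)]


example = [{1, 2, 3}, {2, 3, 4}, {1, 4}]
```

`is_quorum_system(example)` holds for the example above, and it also holds for `majority_quorums({1, 2, 3, 4, 5})`, while `[{1, 2}, {3, 4}]` fails the check because the two subsets are disjoint.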

Bi-Quorums

 A bi-quorum system over the universe U consists of two sets of subsets, A and B

Every subset from A must intersect every subset from B

 Clearly, each quorum system is also a bi-quorum system

Majority is a generic bi-quorum system

 Scalable generic bi-quorums

Idea

 Servers are arranged in a logical matrix

 A write quorum consists of any one row in the matrix

 A read quorum consists of any one column in the matrix

Drawback

Read quorums
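A minimal sketch of the matrix construction, assuming servers are numbered row by row starting from 0. The helper names are hypothetical.

```python
def grid_bi_quorums(rows, cols):
    """Return (write_quorums, read_quorums): each row is a write quorum, each column a read quorum."""
    def server(r, c):
        return r * cols + c

    write_quorums = [frozenset(server(r, c) for c in range(cols)) for r in range(rows)]
    read_quorums = [frozenset(server(r, c) for r in range(rows)) for c in range(cols)]
    return write_quorums, read_quorums


def is_bi_quorum_system(a_sets, b_sets):
    """True if every subset in A intersects every subset in B."""
    return all(a & b for a in a_sets for b in b_sets)


writes, reads = grid_bi_quorums(3, 4)
```

Every row meets every column in exactly one server, so the bi-quorum property holds with quorums far smaller than a majority. Two different rows do not intersect each other, which is why this is a bi-quorum system but not a quorum system.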


Implementing Read/Write Registers with Quorums

 Quorum replication

Clients read and write directly to quorums of servers

 Essentially adaptation of Attiya, Bar-Noy, Dolev protocol for emulating distributed shared memory robustly, but using any given bi-quorum rather than majority

 Tradeoff between availability of the system and its scalability (size of quorum)

Cannot implement Read-Modify-Write semantics

Implementing Quorum Replication

 Each server maintains a logical timestamp for each register

 Servers’ protocol:

Upon receiving a r-request(x) message do

o := values[x]

reply with a r-reply(x,o.val,<o.ts,o.id>) message

Upon receiving a w-request(x,v,<ts,id>) message do

o := values[x]

if <ts,id> is lexicographically larger than <o.ts,o.id> then
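The servers' protocol can be sketched in Python as below. The slide text breaks off after the `then`, so the body of the write handler (store the new value and timestamp, then acknowledge with `w-reply(x)` in every case) is an assumption based on the client steps, which wait for `w-reply(x)` messages. `Entry` and `RegisterServer` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    val: object = None
    ts: int = 0
    id: int = 0


class RegisterServer:
    def __init__(self):
        self.values = {}  # register name -> Entry

    def on_r_request(self, x):
        o = self.values.setdefault(x, Entry())
        return ("r-reply", x, o.val, (o.ts, o.id))

    def on_w_request(self, x, v, stamp):
        o = self.values.setdefault(x, Entry())
        # Python compares tuples lexicographically, matching <ts,id> ordering.
        if stamp > (o.ts, o.id):
            self.values[x] = Entry(v, *stamp)
        # Assumed: acknowledge even when the stored value is kept, so writers never block.
        return ("w-reply", x)
```

A write carrying an older timestamp is acknowledged but does not overwrite the stored value, so each server only ever moves forward in timestamp order.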


Implementing Quorum Replication – continued

 A read operation R(x)

1. Send a read request r-request(x) to all servers

2. Wait for r-reply(x,v,<ts,id>) messages from a READ quorum of servers

let v be the value with the largest logical timestamp <ts,id>

3. Send a write request w-request(x,v,<ts,id>) to all servers

4. Wait for w-reply(x) messages from a WRITE quorum of servers

5. Return (v) // the value that was selected in line 2

Why are lines 3&4 important?

It is possible to first send requests only to the corresponding quorums; only if there is no reply, send to more servers

Implementing Quorum Replication – continued

 A write operation W(x,v)

1. Send a read request r-request(x) to all servers

2. Wait for r-reply(x,-,<ts,id>) messages from a READ quorum of servers

let <ts,id> be the largest returned logical timestamp

3. Send a write request w-request(x,v,<ts+1,my_id>) to all servers

4. Wait for w-reply(x) messages from a WRITE quorum of servers

5. Return
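The read and write operations can be exercised in a small sequential simulation. This is a sketch under several assumptions: servers are plain dictionaries, "send to all and wait for a quorum" is modeled by contacting only the servers in the chosen quorum, majorities of three servers serve as both read and write quorums, and there is no real messaging or failure handling. All function names are hypothetical.

```python
def r_request(server, x):
    return server.get(x, (None, 0, 0))            # (val, ts, id)


def w_request(server, x, v, stamp):
    _, ts, id_ = server.get(x, (None, 0, 0))
    if stamp > (ts, id_):                         # lexicographic comparison of <ts,id>
        server[x] = (v, *stamp)


def read(x, read_quorum, write_quorum):
    replies = [r_request(s, x) for s in read_quorum]            # lines 1-2
    v, ts, id_ = max(replies, key=lambda r: (r[1], r[2]))
    for s in write_quorum:                                      # lines 3-4: write-back
        w_request(s, x, v, (ts, id_))
    return v                                                    # line 5


def write(x, v, my_id, read_quorum, write_quorum):
    replies = [r_request(s, x) for s in read_quorum]            # lines 1-2
    ts = max(r[1] for r in replies)
    for s in write_quorum:                                      # lines 3-4
        w_request(s, x, v, (ts + 1, my_id))


servers = [{}, {}, {}]
write("x", "old", 1, servers[:2], servers[:2])   # completes on the majority {0, 1}
w_request(servers[2], "x", "new", (2, 9))        # a concurrent write has reached only server 2
first = read("x", servers[1:], servers[1:])      # quorum {1, 2}: sees "new" and writes it back
second = read("x", servers[:2], servers[:2])     # quorum {0, 1}: sees "new" via server 1
```

This also suggests why lines 3 and 4 of the read matter: without the write-back, the second read would contact only servers that still hold "old" and would return a value older than the one returned by the previous read.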
