Database Definitions - On multicast primitives in large networks and partial replication protoc

A database D = {x1, .., xn} is a finite set of data items. Database sites have a partial copy of the database. For each site si (or process) inΠ, Items(si) ⊆ D is defined as the set of data items replicated on si. A transaction Ti is a sequence of read and write operations on data items followed by a commit or abort op- eration. A read and a write of transaction T_i on some data item x are respec- tively denoted as ri[x] and wi[x]; ci and ai respectively denote the commit and abort of Ti. For simplicity, we represent a transaction T as a tuple(id, rs, ws, up), where id is the unique identifier of T, rs is the readset of T, ws is the writeset of T and up contains the updates of T. More precisely, up is a set of tuples (x,

v), where, for each data item x in ws, v is the value written to x by T. For every transaction T, Items(T) is defined as the set of data items read or written by T. Two transactions T and T0 _{are said to be conflicting if there exists a data item}

x∈ Items(T) ∩ Items(T0) ∩ (T.ws ∪ T0.ws). We define Site(T) as the site on which

T is executed. Furthermore, we assume that for every data item x∈ D, there exists a correct site si which replicates x, i.e., x∈ Items(si). Finally, we define

Replicas(T) as the set of sites that replicate at least one data item written by T,

i.e., Replicas(T) = {si | si ∈ Π ∧ Items(si) ∩ T.ws 6= ;}.

Database replication protocols may ensure different data consistency crite- ria. In this thesis, we consider one-copy serializability (1-SR) [9]. Let H be a replicated data history consisting of committed transactions only. History H is 1-SR iff H is view-equivalent to some one-copy serial history 1H, where H and 1H are view-equivalent iff the following holds:

1. H and 1H are defined over the same set of transactions,

2. H and 1H have the same read-x-from relationships on data items: ∀Ti, T_j∈

H (and hence, T_i, T_j ∈ 1H): Tj read-x-from T_i in H iff T_j read-x-from T_i in 1H, and,

3. For each final write wi[x] in 1H, wi[xA] is also a final write in H for some copy x_Aof x.

FIFO and Causal Multicast

First things first, but not necessarily in that order.

Doctor Who

Multicast abstractions ensure a similar reliability guarantee—agreement on the set of messages delivered—but offer various message ordering properties. Two of these properties, FIFO and causal order, are of special interest: they ensure that a message m is not delivered at a process p that does not know

m’s context, where the notion of context is defined differently for each order property. With FIFO order, the context of m at p is the messages that were previously multicast by m’s sender and addressed to p. Causal order extends the notion of context to all messages that causally precede m, i.e., messages that are causally linked to m through a chain of multicast and delivery events. FIFO and causal order help the programming of distributed applications in various domains such as global snapshot construction[6], fair resource allocation [37], and replicated data management[57].

FIFO and causal broadcast protocols have been extensively studied in the literature. In this chapter, we propose FIFO and causal multicast protocols for systems composed of a set of disjoint groups (e.g., server racks or data centers), each containing several processes. In particular, we show that mechanisms de- vised for FIFO and causal broadcast protocols are not applicable to multicast protocols. As our main contribution, we propose FIFO and causal multicast algorithms that offer several desirable properties. To the best of our knowledge, these algorithms are the first to be simultaneously fast, scalable, flexible, and

highly resilient, in a precise sense, as we now explain.

16 3.1 Tolerating Quasi-reliable Networks

First, they are fast: messages can be delivered in two communication steps; and we further show that this is optimal. Second, these protocols are scalable: (i) to deliver a message m only the sender and the addressees of m participate in the protocol, a property referred to as genuineness; and (ii) if n and m denote, respectively, the number of processes and groups in the system, messages only carry m counters to ensure FIFO order, and n×m counters to ensure causal order. Since n is within a constant factor of m, this is optimal for causal multicast[36]. Third, the algorithms are flexible in the sense that a process p may multicast messages to groups p does not belong to, that is, groups are “open". Finally, our algorithms are highly resilient: they tolerate an arbitrary number of process failures and can cope with quasi-reliable links.

This is in contrast to several multicast protocols[42; 44; 51; 53; 36], which depend on reliable links—message delivery is guaranteed as long as the receiver is correct, regardless of the correctness of the sender. Reliable links are not a realistic assumption: to send a message m, the machine Mp hosting process p typically inserts m into one (or more) local buffer before m is sent over the wire. Hence, even though p thinks that m was successfully sent, m may still be lost in case M_p crashes before m hits the wire.

3.1 Tolerating Quasi-reliable Networks

Devising multicast protocols that tolerate quasi-reliable links introduces diffi- culties that were not discussed elsewhere. Figure 3.1 illustrates the problem. Consider some process p that multicasts a message m1 to some group g2. Later,

p multicasts a message m2 to groups g1 and g2 and crashes. Message m2 is re- ceived by processes in g1, and since m2 is the first message multicast from p to

g₁, m2 is delivered by processes in g1. On the contrary, all messages sent from

p to members of g2 are lost. Note that this can happen because p crashes and links are quasi-reliable.

... g1 p ... g2 multicast(m1) multicast(m2) multicast protocol deliver(m2)

From the reliability guarantees of multicast, correct processes in g2 must eventually deliver m2. However, if they do so, the ordering guarantees of FIFO and causal multicast will be violated: members of g2 cannot deliver m1 before

m2 since m1 was lost. If messages were broadcast, then m1 would also be ad- dressed to g1, and thus, g1 could help g2 by forwarding m1to g2. With multicast however, g1 does not even know about the existence of m1, since m1 was not addressed to g1. In this chapter, we propose a mechanism to cope with this problem despite an arbitrary number of process failures; the resulting FIFO and causal multicast algorithms are as latency-efficient as their broadcast counter- parts.

In document On multicast primitives in large networks and partial replication protocols (Page 33-37)