Minimal Obstructions for the Coordinated Attack
Problem and Beyond
Tristan Fevat
1and Emmanuel Godard
1,2 1Laboratoire d’Informatique Fondamentale CNRS (UMR 6166) – Aix-Marseille Universit´e2Pacific Institute for the Mathematical Sciences – CNRS (UMI 3069) (tristan.fevat|emmanuel.godard)@lif.univ-mrs.fr
Abstract—We consider the well known Coordinated Attack Problem, where two generals have to decide on a common attack, when their messengers can be captured by the enemy. Informally, this problem represents the difficulties to agree in the present of communication faults. We consider here only omission faults (loss of message), but contrary to previous studies, we do not to restrict the way messages can be lost,ie.we use no specific failure metric. Our contribution is threefold. First, we introduce the study of arbitrary patterns of failure (“omission schemes”), proposing notions and notations that revealed very convenient to handle. In the large subclass of omission schemes where the double simultaneous omission can never happen, we characterize which one are obstructions for the Coordinated Attack Problem.
We present then some interesting applications. We show for the first time that the well studied omission scheme (where at most one message can be lost at each round) is a kind of least worst case environment for the Coordinated Attack Problem. We also extend our study to networks of arbitrary size. In particular, we solve an open question of Santoro and Widmayer about the Consensus Problem in communication networks with omission faults.
Keywords:Distributed Systems, Fault-Tolerance, Message-Passing, Synchronous Systems, Coordinated Attack Problem, Two Generals Problem, Two Armies Problem, Consensus, Communication Schemes, Omission Faults
I. INTRODUCTION A. Motivations
The Coordinated Attack Problem (also known in the lit-erature as the two generals or the two armies problem) is a long time problem in the area of distributed computing. It is a fictitious situation where two armies have to agree to attack or not on a common enemy that is between them and might capture any of their messengers. Informally, it represents the difficulties to agree in the present of communication faults. The design of a solution is difficult, sometimes impossible, as it has to address possibly an infinity of lost mutual acknowl-edgments. It has important applications for the Distributed Databases commit for two processes, see [Gra78]. It was one of the first impossibility results in the area of fault tolerance and distributed computing [AEH75], [Gra78].
In the vocabulary of more recent years, this problem can now be stated as the Uniform Consensus Problem for 2 synchronous processes communicating by message passing in the presence of omission faults. It is then a simple instance of a problem that had been very widely studied [AT99], [CBGS00],
[MR98]. See for example [Ray02] for a recent survey about Consensus on synchronous systems with some emphasis on the omissions fault model.
Moreover, given that, if any message can be lost, the impossibility of reaching an agreement is obvious, one can wonder why it has such importance to get a name on its own, and maybe why it has been studied in the first place... We will discuss that later on the Related Works section. The idea is that one has usually to restrict the way the messages are lost in order to keep this problem relevant.
We call an arbitrary pattern of failure by loss of messages an
omission scheme. It will formally describe the fault environ-ment in which the system evolves. For example, the omission scheme where any message can be lost at any round except that all messages cannot be lost indefinitely is a special omission scheme (any possibility of failure except one scenario), for which it is still impossible to solve the Coordinated Attack Problem, but the proof might be less trivial. Given an omission scheme, a natural question is whether the Coordinated Attack Problem is solvable against this environment. More generally, the question that arises now is to describe what are exactly the omission schemes for which the Coordinated Attack Problem admit a solution, and for which ones is there no solution. These later omission schemes will be called obstructions.
On the way to a thorough characterization of these obstruc-tions, we address this problem for a particular subclass of failure patterns, namely the ones when no two messages can be lost at the same round. It is long known the Coordinated Attack Problem is unsolvable if one message might be lost at each round [CHLT00], [GKP03]. We also show how to apply the results to synchronous communication networks of arbitrary size and topology.
B. Related Works
The Coordinated Attack Problem is a kind of folklore problem for distributed systems. It seems to appear first in [AEH75] where it is a problem of gangsters plotting for a big job. It is usually attributed to [Gra78], where Jim Gray coined the name “Two Generals Paradox” and put the emphasis on the infinite recursive need for acknowledgments in the impossibility proof.
In textbooks it is often given as an example, however the drastic conditions under which this impossibility result yields
are never really discussed, even though for relevancy purpose they are often slightly modified. In [Lyn96], a different prob-lem of Consensus (with a weaker validity condition) is used. In [San06], such a possibility of eternal loss is explicitly ruled out as it would give a trivial impossibility proof otherwise.
This shows that the way the messages may be lost is an important part of the problem definition, hence it is interesting to characterize when the pattern of loss allows to solve the consensus problem or not, ie. whether the fault environment is an obstruction for the Coordinated Attack Problem or not. To our knowledge, this is the first time, this problem is investigated for arbitrary patterns of omission failures, even in the simple case of only two processes. Most notably, it has been addressed for an arbitrary number of processes and for special (quite regular) patterns in [CHLT00], [GKP03], [Ray02].
C. Scope of Application and Contributions
Impossibility results in distributed computing are the more interesting when they are tight, ie.they give an exact charac-terization of when a problem is solvable and when it is not. There are a lot of results regarding the Distributed Consensus problem when the underlying network is a complete graph (namely in the context of Shared Memory Systems), see for example the works in [HS99], [SZ00], [BG93] where an exact topologically-based characterizations is given.
There are also more recent results when the underlying network can be any arbitrary graph. The results given in [SW07] by Santoro and Widmayer are almost tight. What is worth to note is that the general theorems could not be directly used by this study for the very reason that the failure model for communication networks is not interestingly expressible in the fault model for systems with one to one communication. See Section II-C for a more detailed discussion.
On the way to find exact and general characterizations, we propose new insights in the very simple setting of two synchronous processes. Although this is indeed a complete graph, the extension to non-complete graphs will prove to be rather interesting, as will be seen in Section V where we show, in particular, that the Consensus Problem is solvable on a synchronous communication networkGwhere at mostflosses of messages per round can happen if and only if f < c(G), where c(G)is the connectivity of the graphG.
We underline that the omission failures we are studying here encompass networks withcrash failures, see Example II.10. It should also be clear that omission schemes can also be studied in the context of a problem that is not the Consensus Problem. Moreover, as we do not endorse any pattern of failures as being, say, more “realistic”, our technique can be applied for any new patterns of failures.
Our contribution is threefold. First, we introduce the study of arbitrarypatterns of failure (“omission schemes”), propos-ing notions and notations that revealed very convenient to handle, with a very broad scope of applications. In particular, “standard” failure patterns are very simply expressed in this
framework. In the large subclass of omission schemes where the double simultaneous omission can never happen, we characterize which one are obstructions for the Coordinated Attack Problem. Finally, giving tight results in our context means describing the obstructions that are minimal, the ones for which restricting only one possible scenario turns the problem from unsolvable to solvable. Thinking about our previous discussion about the trivial impossibility proofs, and therefore the need to address the problem for interesting
fault environments, intuitively, it implies that the minimal obstructions are the one for which the impossibility proof is the most challenging, hence they are the more relevant environments. Our result shows for the first time that the well studied omission scheme where at most one message can be lost at each round is merely a minimal obstruction for the Coordinated Attack Problem. And thirdly, we extend our study to arbitrary graphs deriving new results, like an answer to an open problem of [SW07].
The outline of the paper is the following. We describe Models and define our Problem in the Section II. We present numerous examples of application of our terminology and no-tation in Section II-E. We then address the characterization of omission schemes without simultaneous faults that are obstruc-tions for the Coordinated Attack Problem in Theorem III.8. We give the proofs for necessary condition (impossibility result) in III-C and for sufficient condition (explicit algorithm) in III-D. Applications are proposed on Section IV. An extension of both the notations and some results for arbitrary graphs are given in Section V.
II. MODELS ANDDEFINITIONS A. The Coordinated Attack Problem
1) A folklore problem: Two generals have gathered forces on top of two facing hills. In between, in the valley, their common enemy is entrenched. Every day each general sends a messenger to the other through the valley. However this is risky as the enemy may capture them. Now they need to get the last piece of information: are theybothready to attack?
This two army problem was originated by [AEH75] and then by Gray [Gra78] when modeling the distributed database commit. It corresponds to the binary consensus with two processes in the omission model. If their is no restriction on the fault environments, then any messenger may be captured. And if any messenger may be captured, then consensus is obviously impossible: the enemy can succeed in capturing all of them, and without communication, no distributed algorithm.
2) Possible Environments: Before trying to address what can be, in some sense, the most relevant environments, we will describe different environments in which the enemy cannot be so powerful as to be able to capture any messenger.
This is a list of possible environments, using the same military analogy. The generals name areWhiteandBlack.
1) no messenger is captured
2) messengers fromGeneral Whitemay be captured 3) messengers fromGeneral Blackmay be captured
4) messengers from one general are at risk, and if one of them is captured, all the following will also be captured (the enemy got the secret “Code of Operations” for this general from the first captured messenger)
5) messengers from one general are at risk (the enemy could manage to infiltrate a spy in one of the armies) 6) at most one messenger may be captured each day (the
enemy can’t closely watch both armies on the same day) 7) any messenger may be captured
Which ones are (trivial) obstructions, and which are not? Nor obstruction, nor trivial? What about more complicated environments?
B. The Binary Consensus Problem
A set of synchronous processes wish to agree about a binary value. This problem was first identified and formalized by Lamport, Shostak and Pease [PSL80]. Given a set of processes, a consensus protocol must satisfy the following properties for any combination of initial values [Lyn96]:
• Termination: every process decides some value.
• Validity: if all processes initially propose the same value
v, then every process decidesv.
• Agreement: if a process decides v, then every process decides v.
Consensus with such a termination and decision requirement for every process is more precisely referred to as theUniform Consensus, see [Ray02] for example for a discussion. Given a fault environment, the natural questions are : is the Consensus solvable, if it is solvable, what is the minimal complexity?
C. The Communication Model
In the first part of the paper, the system we consider is a setΠof only2processes namedwhiteandblack,Π ={◦,•}. The processes evolve in synchronized rounds. In each round
r, every process x executes the following steps: x sends a message to the other process, receives a messages M from the other process, and then updates its state according to the received message.
The messages sent are, or are not, delivered according to the environment in which the distributed computation takes place. This environment is described by a omission scheme. Such a communication scheme models exactly how some messages sent by a process to the other process may be lost at some rounds. We will consider arbitrary omission schemes, they will be represented as an arbitrary set of infinite sequences of combinations of messages being possibly transmitted or not.
In the following of [CBS09], we are not interested in the exact cause of a message not being sent. We only refer to the phenomenon: messages being transmitted or not. The fact that the schemes are arbitrary means that we do not endorse any metric to count the number of “failures” in the system. There are metrics that count the number of lost messages, that count the number of process that can lost messages (both in send and receive actions). Other metrics count the same parameters but only during a round of the system. The inconvenient of this metric-centric approach is that, even when restricted only
to omission faults, it can happen that some results obtained on complete networks are not usable on arbitrary networks, because in, say, a ring network, you are stating in some sense that every node is faulty. See eg [SW07] for a very similar discussion, where Santoro and Widmayer, trying to solve some generalization of agreement problems in general networks could not use directly the known results in complete networks.
As said in the introduction, the Coordinated Attack Problem is nowadays stated as the Uniform Consensus Problem for 2 synchronous processes communicating by message passing in the presence of omission faults. Nonetheless, we will use Consensus and Uniform Consensus interchangeably in this paper. We emphasize that, because we do not assign omission faults to any process (see previous discussion), there are only correct processes.
D. Omission Schemes
We introduce and present here our notation. In Section V-A this is naturally and, we shall say, interestingly extended to graphs of arbitrary size and topology.
Definition II.1. We denote by Σ the set of directed graphs with vertices inΠ.
Σ ={◦↔•,◦←•,◦→•,◦−−•} We denote byΓ the following subset ofΣ
Γ ={◦↔•,◦←•,◦→•}.
Those elements describe what can happen at a given round with the following obvious semantics:
• ◦↔•, no process looses messages
• ◦←•,the message of process◦, if any, is not transmitted • ◦→•,the message of process•, if any, is not transmitted • ◦−−•,both messages, if any, are not transmitted1.
Definition II.2. Anomission schemeoverΠis a set of infinite sequences of elements ofΣ.
We will use the standard following notations in order to describe more easily our communication schemes [PP04]. A (infinite) sequence is seen as a (infinite) word over the alphabet Σ. The empty word is denoted byε.
Definition II.3. Given Φ ⊂ Σ, Φ∗ is the set of all finite sequences of elements ofΦ,Φω is the set of all infinite ones
andΦ∞= Φ∗∪Φω.
A word in L ⊂ Σω is called a communication scenario (
or scenario for short) of omission scheme L. Given a word
w∈ Σ∗, it is called a partial scenario and|w| is the length of this word.
Intuitively, the letters at position r of the word describe whether there will be, or not, transmission of the message, if one is being sent at round r. A formal definition of an execution under a scenario will be given in Section II-F.
1In order to increase the readability, we note◦−−•instead of◦ •, the
We recall now the definition of the prefix of words and languages. A word u ∈ Σ∗ is a prefix for w ∈ Σ∗ (resp.
w0 ∈ Σω) if there exists v ∈ Σ∗ (resp. v0 ∈ Σω) such that
w=uv (resp.w0=uv0). Givenw∈Σω andr∈
N,w|r is the prefix of sizerofw. Definition II.4. Let w ∈ Γ∗, then P ref(w) = {u ∈ Γ∗|uis a prefix ofw}. Let L ⊂ Γ∗, then P ref(L) =
S
w∈L
P ref(w).
Remark. We do not restrict our study to regular sets, however all communication schemes we are aware of are regular, as could be seen in the following examples, where the rational expressions prove to be very convenient.
E. Examples
We show how standard fault environments are conveniently described in our framework.
Example II.5. Consider a system where, at each round, up to2messages can be lost. The associated omission scheme is Σω.
Example II.6. Consider a system where, at each round, only one message can be lost. The associated omission scheme is Γω.
Example II.7. Consider a system where, at each round, only one message can be lost. The associated omission scheme is {◦↔•,◦←•, lnoir}ω.
Example II.8. A communication system is fair if given an infinity of sent messages, an infinity is also received. The associated omission scheme is the following:
F = Σω\Σ∗({◦−−•,◦←•}ω∪ {◦−−•,◦→•}ω). Example II.9. Consider a system where at most one of the processes can loose messages. The associated scheme is the following:
S1={◦↔•,◦←•}ω∪ {◦↔•,◦→•}ω
Example II.10. Consider a system where at most one of the processes can crash, For the phenomena point of view, this is equivalent to the fact that at some point, no message from a particular process will be transmitted. The associated scheme is the following:
C1={◦↔•}ω∪ {◦↔•}∗({◦←•}ω∪ {◦→•}ω)
Example II.11. Finally the seven simple cases exposed in
Section II-A2 are described as follows:
S0 = {◦↔•}ω (1) T◦ = {◦↔•,◦←•}ω (2) T• = {◦↔•,◦→•}ω (3) C1 = {◦↔•}ω∪ {◦↔•}∗{◦←•}ω∪ {◦↔•}∗{◦→•}ω(4) S1 = {◦↔•,◦←•}ω∪ {◦↔•,◦→•}ω=T◦∪T• (5) R1 = {◦↔•,◦←•,◦→•}ω (6) S2 = Σω (7)
F. Execution of a Distributed Algorithm
Given an omission schemeL, we define what is a successful run of a given algorithmA.
An execution, or run, of an algorithm A under scenario
w∈Lis the following. At roundr∈N, messages are sent (or not) by the processes. The fact that the corresponding receive action will be successful depends ofa, ther-th letter ofw.
• if a = ◦↔•, then any eventual message is correctly delivered,
• if a = ◦←•, then the message of process ◦ is not transmitted (the receive call of •, if any at this round, returns null),
• if a = ◦←•, then the message of process • is not transmitted (the receive call of ◦, if any at this round, returns null),
• ifa=◦−−•,both eventual messages are not transmitted. Then, both processes updates their state according toAand the value received.
Given u ∈ P ref(w), we denote by sx(u) the state of
processxat the|u|-th round of the algorithmAunder scenario
w. This means in particular that sx(ε) represents the initial
state ofx.
An execution is a (possibly infinite) sequence of such messages exchanges and corresponding states. Finally and classically,
Definition II.12. A algorithm A solves the Coordinated At-tacked Problem for the omission schemeLif for any scenario
w∈L, there existsu∈P ref(w)such that the states of the two processes (s◦(u) ands•(u)) satisfy the three conditions of Section II-B.
Definition II.13. An omission schemeLis said to besolvable
if there exists an algorithm that solves the Coordinated At-tacked Problem forL. It is said to be anobstructionotherwise. An omission schemeLis a (inclusion) minimal obstruction if anyL0 L is solvable.
Remark. We emphasize that for an algorithm, “knowing” the omission scheme against which it runs is not tantamount to know whether a given message is actually received or not.
III. CHARACTERIZATION OFSOLVABLESCHEMES WITHOUTDOUBLEOMISSION
In this part, we consider the schemes without double omis-sion, that is the schemesL⊂Γω. We characterize which one
are solvable or not. Where the consensus is achievable, we also give an effective algorithm that will be proved to be optimal in some particular cases.
A. Index of a Scenario
We will use the following integer function of scenarios that, in some sense, encodes all the property the omission schemes. By induction, we define the following integer index givenw∈ Γ∗. First, we defineδ onΓ by
• δ(◦→•) =−1, • δ(◦↔•) = 0. • δ(◦←•) = 1,
Definition III.1. Letw∈Γ∗. We defineind(ε) = 0. If|w| ≥ 1, then we have w = ua whereu ∈ Γ∗ and a ∈ Γ. In this case, we define
ind(w) := 3ind(u) + (−1)ind(u)δ(a) + 1.
Lemma III.2. Letr∈N. The application indis a bijection
fromΓrto
J0,3
r−1
K.
Proof: The lemma is proved by a simple induction. If
r= 0, then the property holds.
Letr∈N∗. Suppose the property is satisfied forr−1. Given
w ∈Γr, we have w=ua withu∈Γr−1 anda∈ Γ. From
ind(w) = 3ind(u) + (−1)ind(u)δ(a) + 1 and the induction hypothesis, we get immediately that 0≤ind(w)≤3r−1.
Now, we need only to prove injectivity. Suppose there are
w, w0 ∈Γr such thatind(w) =ind(w0). So there areu, u0∈ Γr−1 anda, a0 ∈Γsuch that
3ind(u)+(−1)ind(u)δ(a)+1 = 3ind(u0)+(−1)ind(u0)δ(a0)+1.
Then
3(ind(u)−ind(u0)) = (−1)ind(u0)δ(a0)−(−1)ind(u)δ(a).
Remarking that the right hand side of this integer equality has an absolute value that can be at most 2, we finally get
ind(u) = ind(u0)
(−1)ind(u)δ(a) = (−1)ind(u0)δ(a0)
By induction hypothesis, we get that u=u0 and a =a0. Hence w=w0, andindis injective, therefore bijective ofΓr
ontoJ0,3r−1
K.
Two easy calculations give
Proposition III.3. Let r ∈ N, ind(◦←•r) = 0 and
ind(◦→•r) = 3r−1.
In Figure 1, the indexes for words of length at most 2 are given.
We now describe precisely what are the words whose indexes differs by only 1.
Lemma III.4. Let r ∈ N, and v, v0 ∈ Γr. Then ind(v0) =
ind(v)+1if and only if one of the following conditions holds: III.4.i ind(v)is even and
• either there existsu∈ Γr−1, and v =u◦↔•, v0 =
u◦→•, word of length1 ◦→• ◦↔• ◦←• index 0 1 2 word of length2 ◦→•◦→• ◦→•◦↔• ◦→•◦←• index 0 1 2 word of length2 ◦↔•◦→• ◦↔•◦↔• ◦↔•◦←• index 5 4 3 word of length2 ◦←•◦→• ◦←•◦↔• ◦←•◦←• index 6 7 8
Fig. 1. Indexes for some short words
• either there exists u, u0 ∈ Γr−1, and v = u◦←•,
v0=u0◦←•, and ind(u0) =ind(u) + 1.
III.4.ii ind(v)is odd and
• either there exists u∈Γr−1, and v =u◦←•,v0 =
u◦↔•,
• either there exists u, u0 ∈ Γr−1, and v = u◦→•,
v0=u0◦→•, and ind(u0) =ind(u) + 1.
Proof: The lemma is proved by a induction. If r = 0, then the property holds. Let r∈N∗. Suppose the property is satisfied forr−1.
Suppose there arew, w0∈Γrsuch thatind(w0) =ind(w)+ 1. So there areu, u0∈Γr−1 anda, a0 ∈Γsuch that
3ind(u)+(−1)ind(u)δ(a)+2 = 3ind(u0)+(−1)ind(u0)δ(a0)+1.
Then
3(ind(u)−ind(u0)) = (−1)ind(u0)δ(a0)−(−1)ind(u)δ(a)−1.
Remarking again that this is an integer equality, we then have
• either (ind(u) = ind(u0) and (−1)ind(u
0)
δ(a0) = (−1)ind(u)δ(a) + 1,
• either (ind(u0) = ind(u) + 1 and (−1)ind(u0)δ(a0) = (−1)ind(u)δ(a)−2.
This yields the following cases:
1) ind(u) =ind(u0)is even,a=◦←• anda0=◦↔•, 2) ind(u) =ind(u0)is odd,a=◦→• anda0 =◦↔•, 3) ind(u) =ind(u0)is even,a=◦↔• anda0=◦→•, 4) ind(u) =ind(u0)is odd,a=◦↔• anda0 =◦←•, 5) ind(u0) =ind(u) + 1,ind(u)is even,a=◦←•=a0, 6) ind(u0) =ind(u) + 1,ind(u)is odd,a=◦→•=a0, Getting all the pieces together, we get the results of the lemma.
Given an algorithmA, we have this fundamental corollary:
Corollary III.5. Letv, v0∈Γr such thatind(v0) =ind(v) + 1. Then,
III.5.i ifind(v)is even thens•(v) =s•(v0), III.5.ii if ind(v)is odd thens◦(v) =s◦(v0).
Proof: We prove the result using Lemma III.4 and re-marking that either a process receives a message from the other process being in the same state in the preceding configuration
u; either it receives no message when the state of the other process actually differ.
B. A Characterization
We prove that an omission scheme L⊂ Γω is solvable if and only if it does not contain a fair scenario or a special pair of unfair scenarios. We define the following set to help describe the special unfair pairs.
We recall the definition of unfair scenarios.
Definition III.6. A scenario w ∈ Σω is unfair if w ∈
Σ∗({◦−−•,◦←•}ω∪ {◦−−•,◦→•}ω). The set of fair scenarios
of Γω is denoted byF air(Γω).
Definition III.7. We define special pairs SP air(Γω) = {(w, w0)∈Γω×Γω|w6=w0,∀r∈
N|ind(w|r)−ind(w|0r)| ≤
1}.
Theorem III.8. Let L⊂Γω, then Consensus is solvable for omission scheme Lif and only if one of the following holds
III.8.i ∃f ∈F air(Γω), f /∈L,
III.8.ii ∃(u, u0)∈SP air(Γω), u, u0∈/L,
III.8.iii ◦→•ω∈/ L, III.8.iv ◦←•ω∈/L.
We present the proof of the Theorem III.8 in the following sections.
C. The Necessary Condition: an Impossibility Proof
We will use a standard bivalency proof technique. We sup-pose now on that there is an algorithmAto solve Consensus on L.
We proceed by contradiction. So we suppose that all the conditions of Theorem III.8 are not true, ie that
F air(Γω)∪ {◦→•ω,◦←•ω} ⊂L.
And that for all (w, w0)∈SP air(Γω), w /∈L=⇒w0∈L.
Definition III.9. Given an initial configuration, let v ∈
P ref(L) and i ∈ {0,1}. The partial scenario v is said to bei-valentif, for all scenariow∈Lsuch thatv∈P ref(w), Adecidesiat the end. Ifvis not0−valent nor1−valent, then it is saidbivalent.
By hypothesis, ◦←•ω,◦→•ω ∈L. Consider our algorithm A with input 0 on both processes running under scenario ◦←•ω. The algorithm terminates and has to output 0 by
Validity Property. Similarly, both processes output 1 under scenario◦→•ω with input1 on both processes.
From now on, we have this initial configuration I: 0 on process◦and1on process•. For◦there is no difference under scenario◦→•ω for this initial configurations and the previous
one. Hence, A will output 0 for scenario ◦→•ω. Similarly,
for • there is no difference under scenario ◦←•ω for both
considered initial configurations. Hence, A will output 1 for this other scenario. Henceεis bivalent for initial configuration
I. In the following, valency will always be implicitly defined with respect to the initial configurationI.
Definition III.10. Let v∈P ref(L),v isdecisive if III.10.i v is bivalent,
III.10.ii For anya∈Γsuch thatva∈P ref(L),vais not bivalent.
Lemma III.11. There exists a decisivev∈P ref(L).
Proof: We suppose this not true. Then, as ε is bivalent, it is possible to construct w ∈ Γω such that, P ref(w) ⊂
P ref(L)and for anyv ∈P ref(w),v is bivalent. Bivalency for a givenv means in particular that the algorithmAhas not stopped yet. Thereforew /∈L.
This means thatwis unfair. Then this means that, w.l.o.g we haveu∈Γ+, such that w=u◦←•ω andind(u)is even. We
denote byw0=ind−1(ind(u)−1)◦←•ω. The couple(w, w0) is a special pair. Therefore w0 belongs to L, so A halts at some roundr0 under scenariow0.
By Corollary III.5, s•(w|r0) = s•(w
0
|r0) This means that
w|r0 is not bivalent. As w|r0 ∈ P ref(L) this gives a
contradiction.
We can now end the impossibility proof.
Proof: Consider a decisivev ∈P ref(L). By definition, this means that there exists a, b ∈ Γ, witha 6= b, such that
va, vb∈P ref(L)and vaandvbare not bivalent and are of different valencies. Obviously, we can choose b to be ◦↔•. We terminate by a case by case analysis.
Suppose that a = ◦←•, and ind(v) is even. By Corol-lary III.5, this means • is in the same state after vaand vb. We consider the scenarios va◦←•ω andvb◦←•ω, they forms
a special pair. So one belongs to L by hypothesis, therefore both processes should halt at some point under this scenario. However, the state of • is always the same under the two scenarios so if it halts and decides some value, we get a contradiction with the different valencies.
Other cases are treated similarly.
D. A Consensus Algorithm
Given a word w inΓω, we define the following algorithm Aw (see Algorithm 1). It has messages always of the same
type. They have two components, the first one is the initial bit, named init. The second is an integer named ind. Given a message msg, we note msg.init (resp.msg.ind) the first (resp. the second) component of the message.
We prove that the index computed in the algorithm are al-most always equal to the index of the scenario.More precisely,
Proposition III.12. For any round r of an execution of Algorithm Aw under scenario v ∈ Γr, such that no process has already halted,
|ind•r−ind◦r|= 1,
sign(ind•r−indr◦) = (−1)ind(v),
ind(v) = min{ind◦r, ind•r}.
Proof:We prove the result by induction over r∈N. Forr= 0, the equalities are satisfied.
Suppose the property is true forr−1. We consider a round of the algorithm. Let u∈ Γr−1 anda ∈Γ, and consider an execution under environmentw=ua. There are exactly three cases to consider.
Algorithm 1: Consensus AlgorithmAw for Processx: Data:w∈Γω Input:init∈ {0,1} r=0; initother=null; if x=◦ then ind=0; else ind=1;
while |ind−ind(w|r)| ≤1 do
msg = (init,ind); send(msg); msg = receive();
if msg == nullthen// message was lost
ind= 3∗ind;
else
ind= 2∗msg.ind+ind; initother = msg.init; r=r+1;
if x=◦ then
if ind≤ind(w|r)then Output: init
else
Output: initother
else
if ind≥ind(w|r)then Output: init
else
Output: initother
Suppose a = ◦↔•. Then it means both messages are received andind◦r= 2ind•r−1+indr◦−1andind•r= 2ind◦r−1+
ind•r−1.Hence ind•r−indr◦=ind◦r−1−ind•r−1. Thus, using the recurrence property, we get |ind•r−ind◦r|= 1.
Moreover, by construction
ind(v) = ind(ua)
= 3ind(u) + (−1)ind(u)(δ(a)) + 1 = 3ind(u) + 1
The two indicesind(u)andind(v)are therefore of opposite parity. Hence, by induction property, sign(ind•r−ind◦r) = (−1)ind(v). And remarking that min{ind◦
r, ind•r}=
min{2ind•r−1+ind◦r−1,2ind◦r−1+ind•r−1} = ind•r−1+ind◦r−1+ min{ind◦r−1, ind•r−1} = 2 min{ind◦r−1, ind•r−1}+|ind•r−1−ind◦r−1|
+ min{ind◦r−1, ind•r−1} = 3 min{ind◦r−1, ind•r−1}+ 1
we get that the third equality is also verified in round r. Consider now the case a=◦→•. Then◦ gets no message from•and•gets a message from◦. So we have thatind◦r= 3ind◦r−1 and ind•r = 2ind◦r−1+ind•r−1. So we have that
ind•r−ind◦r=indr•−1−ind◦r−1. The first equality is satisfied.
We also have thatind(v) = 3ind(u) + (−1)ind(u)δ(◦→•) + 1 = 3ind(u) +α, withα= 0 if ind(u) is even and α= 2 otherwise. Hence, ind(u) andind(v) are of the same parity, and the second equality is also satisfied.
Finally, we have that min{ind◦r, ind•r} = {3ind◦r−1,2ind◦r−1 + ind•r−1}. If ind(u) is even, then
ind(u) = ind◦r−1 and the equality holds. If ind(u) is odd, thenind(u) =ind•r−1 and the equality also holds.
The case a = ◦←• is a symmetric case and is proved similarly.
E. Correctness of the Algorithm
Given an omission scheme L, we suppose that one of the following holds.
• ∃f ∈F air(Γω), f /∈L,
• ∃(u, u0)∈SP air(Γω), u, u0∈/L, • ◦→•ω∈/L,
• ◦←•ω∈/L.
In particular, L (Γω and we denote by w, a scenario in
Γω\L. Ifwis unfair, we assume it is either ◦→•ω or ◦←•ω,
or it belongs to a special pair that is not included in L. We consider the algorithmAwwith parameterwas defined
above.
Lemma III.13. Let v ∈ L. There exists r ∈ N such that |ind(v|r)−ind(w|r)| ≥2.
Proof: Given w /∈ L, at some round r, w|r 6= v|r.
Therefore|ind(v|r)−ind(w|r)| ≥1. From Lemma III.4 and
Definition III.7, it can be seen that the only way to remain indefinitely at a difference of one is exactly thatwandvform a special pair. Given the way we choosew, and that v ∈L, this is is impossible. So at some point, the difference will be greater than 2.
Now, we prove that under the omission scheme L, the algorithm is correct.
Proof:First we show Termination. This is a corollary of Lemma III.13 and Proposition III.12. Denote by pthe round when it is first terminated for one of the process. If the other process x has also terminated at round p then we are done. Otherwise, and according to Proposition III.12, this means that |indx
r−ind(w|r)|= 1.Then, in the following round, xwill
receive no message (the other process has halted) and|indx r−
ind(w|r)| ≥2 will hold.
The validity property is also obvious, as the output values are ones of the initial values. Since the only case where
initother• = null is when • has not received any mes-sages from ◦, ie. in the case where w = ◦←•p. But as
ind(◦←•p) = 3p−1is the maximal possible index for scenario
of length r, and ind(w|p) < ind•(◦←•p), it cannot have to
outputinitotherin this case. Similarly for◦, this proves that
nullcan never be output by any process.
We now prove the agreement property. Given that |ind◦r−
ind•r|= 1 by Proposition III.12, when the processes halt, the indexes are on the same side of ind(w|r). This means that
one of process outputs init, the other outputting initother. By construction, they output the same value.
F. Tightness of the Round Complexity
The time complexity (round complexity) of the algorithm is in some sense optimal. An Early Consensus Algorithm is a Consensus Algorithm that terminates early when there is less actual failures that it is possible [PRT06]. In some sense, this is what can be obtained here even if the notion of maximum of possible failures has no particular sense for us since we are not using a particular failure metric but rather considering any arbitrary failure pattern.
Due to lack of space, we will not present a full discussion, but we will rather show how to obtain the optimality in one particular case. It has to be noted that, givenL, thew∈Γω\L
that is chosen is not necessarily the one that gives an optimal algorithm. It might depend on the very structure of L.
We consider the particular case whereP ref(L) Γ∗. We denote by p the smallest integer such that Γp
* P ref(L). By hypothesis, this integer is always well defined.We denote by w0 a word in Γp and not inP ref(L).We prove that any algorithm solving Consensus onL needs at leastprounds.
Corollary III.14. Let A be an algorithm solving Consensus on Π under scenarios L ⊂ Γω. Then A runs at least in p rounds in the worst case.
Proof: Suppose that A run in q rounds with q < p
under any scenario in L. We have that P ref(L)∩Γq = Γq.
Consequently,Ais also a solution for Consensus on scenario Σω. This is a contradiction with Theorem III.8.
If we note w=w0◦↔•ω, then it proves that the bound is
tight since
Proposition III.15. Modifying the algorithm Aw by adding to thewhilecondition “orr=p”, we get an algorithm that halts in at most prounds under omission schemeL.
Proof: Considering the proof of the correctness of algo-rithm Aw, it shows that the only problem is when indx =
ind(w|r). But given that w0 is not a possibility, we can
correctly conclude at this round p. IV. APPLICATIONS A. Back to the Coordinated Attack Problem
We consider now our question on the seven examples of Example II.11.
The answer to possibility is obvious for the first and last cases. In the first three cases consensus can be reached in one day provided that the generals know in which scenario they are. The fourth and fifth cases are a bit more difficult but within reach of our Theorem III.8. We remark that the scenario ◦←•◦→•◦↔•ω is a fair scenario that does not belong to C
1, nor S1. Therefore, those are solvable cases also.
Note that the fourth and first case correspond respectively to the crash-prone model [Lyn96] and to the 1−resilient model [Ada03], [GKP03]. In the last one [Gra78], [Lyn96] consensus can’t be achieved, as said before.
Now, using Proposition III.15 for the five first of the seven cases of Example II.11 yields the following.
1) S0 is solvable in 1 round, 2) T◦ is solvable in 1 round, 3) T• is solvable in 1 round,
4) C1 is solvable in exactly2 rounds, 5) S1 is solvable in exactly2 rounds.
B. Application to Almost Fair Scenarios
Corollary IV.1. LetF◦= Γω\{◦→•ω}. The schemeF◦is not
an obstruction.
The corresponding algorithm A◦→•ω solves the Consensus
Problem by Theorem III.8. It is worth to note that it corre-sponds exactly to the following intuitive algorithm.
Process◦ sends its initial value to process• until it gets a message from•. Then it stops, outputting•’s initial value. On the other hand, process•sends its initial value until it receives no message from◦ and outputs its own initial value.
This shows, that even it is a general meta-algorithm,Aω is
exactly what you would propose intuitively for simple cases such asF◦.
C. About Minimal Obstructions
The Theorem III.8 shows that, even in the simpler subclass where no double omission are permitted, simple inclusion-minimal schemes may not exist. Indeed, there exists a se-quence of unfair scenarios (ui)i∈N) such that ∀i, j, (ui, uj)
is not a special pair. Therefore Ln = Γω\S0≤i≤nui define
an infinite decreasing sequence of obstructions for the Coor-dinated Attack Problem.
As the set SP air(Γω) define a bipartite graph over the unfair omissions, it is possible to prove that for any set U
of unfair scenario that is a cover for this graph, we have, by Theorem III.8, that the schemeΓω\Uis a minimal obstruction.
As a partial conclusion, we shall say that the well known schemeΓω, even not being formally a minimal obstruction, is
the nearest obstruction we have to asimpleminimal obstruc-tion...
V. NETWORKS OFARBITRARYSIZE
In this section, we investigate the case of synchronous communication networks of arbitrary (connected) topology with omission failures. Such a study has been done in [SW07], along with the study of numerous other fault models. In the part about omission failures, it is proved that if at mostf losses of messages per round can happen, the Consensus Problem2 is solvable on a network G if f is strictly less than c(G), the connectivity of G, ie the minimal number of edges to remove to disconnect the graphG. Iff is greater or equal to the minimum degreedeg(G)of G, the Consensus Problem is proved to be impossible to solve.
So for graphs with c(G)< deg(G), the question was still open whenc(G)≤f < deg(G). Here we prove the following theorem:
2In [SW07],k−Majority Problems are actually investigated. We focus here
on the results about the so-called “Unanimity Problem”, that is equivalent to the Consensus Problem.
Theorem V.1. The Consensus Problem is solvable on a synchronous communication networkGwhere at mostflosses of messages per round can happen if and only if f < c(G).
A. Definitions and Notations
Before presenting the proof, we show how to extend our notations from Section II-D. First, given an undirected graph
G, we note −→G the directed version, ie the directed graph (V,−→E), where
− →
E ={(u, v),(v, u)| {u, v} ∈E}.
A round with omission faults is then expressed as a sub-graph of −→G with the obvious semantic of Section II. Hence, we define the alphabetΣG by
ΣG={(V, −→
E0)|−E→0⊆−→E}.
The notation of Section II-D is naturally extended here to these (bigger) alphabets, and all possible scenarios on G are described by Σω
G.
With these notations, we can now describe the communica-tion scheme involved in the work of Santoro and Widmayer. We noteOf the following set,
Of ={(V, −→
E0)|−E→0⊂−→E ,Card(−→E\−E→0≤f}.
Solving the Consensus Problem with f omissions per rounds is exactly solving the Consensus Problem for the schemeOωf.
B. Proof
Now we present the proof of Theorem V.1.
Proof:The sufficient part being already proved in [SW07] (using a broadcast-based algorithm), we only have to prove the impossibility part.
Let G = (V, E) be a (undirected) connected graph, and let’s f =c(G). By definition of the connectivity, there exists a 3-partition A, B, C of E such that
1) Card(C) =f
2) the graphs (VA, A) and (VB, B) are connected, where
VX={v∈V | ∃e∈X, v∈e},
3) the graph (V, A∪B)is disconnected.
We assume that C={a1, b1, . . . , af, bf} withai∈Aand
bi∈B for alli. We define the following alphabetΓC:
ΓC={C↔, C→, C←} where C↔ = − → G , C→ = V,−→A∪−→B ∪ {(a1, b1), . . . ,(af, bf)} , C← = V,−→A∪−→B ∪ {(b1, a1), . . . ,(bf, af)} .
The set ΓC corresponds to the set of rounds where either
no message is lost or either all messages fromAtoBare lost, either all messages from B toA. This is an alphabet of size 3.
We will prove now that the omission scheme ΓωC is not solvable. SinceΓω
C⊆O ω
f, this will end the proof.
We prove the impossibility by reduction to the two processes case with omission schemeΓω.
Suppose now that we have an algorithm A that solves the Consensus Problem for the scheme Γω
C. We describe how to
derive an algorithmA0 that solves the consensus for Γω. The
idea is that ◦ will emulate A on sub-graph A, and • will emulate it on sub-graphB. See Algorithms 2 and 3
Algorithm 2: AlgorithmA0 for process◦
Input:init∈ {0,1}
H =A;
Label each node ofH byinit;
whileAis not finished on H do
EmulateAonH except for nodesa1, . . . , af;
Send messageM ={(i, mi), i= 1..f} wheremi is
defined by
• the message from ai tobi in this round for A,
• ⊥if there is not such a message Receive message N;
Apply Atoa1, . . . , af by extracting messages ni
fromN;
Output: value decided by AonH.
Algorithm 3: AlgorithmA0 for process•
Input:init∈ {0,1}
H =B;
Label each node ofH byinit;
while Ais not finished onH do
EmulateAonH except for nodesb1, . . . , bf;
Send messageM ={(i, mi), i= 1..f} wheremi is
defined by
• the message from bi toai in this round for A,
• ⊥if there is not such a message. Receive message N;
Apply Atob1, . . . , bf by extracting messages ni
fromN;
Output: value decided by AonH. We define now a bijectionρfromΓω
CtoΓ
ω. We first define
ρon the letters of the alphabet:
ρ(C↔) = ◦↔•
ρ(C→) = ◦→•
ρ(C←) = ◦←•
The functionρis then extended to infinite words ofΓωC. This is a bijection by construction.
It is then straightforward to check that any run ofA0 with scenario w from Γω yields a valid run of A with scenario
ρ−1(w)fromΓωC.
It is then straightforward to derive the three properties of Section II-B for A0 from the fact there are to be satisfied by
This is a contradiction with Theorem III.8.
C. About the Minimality of SchemeΓω C
The fact that the sub-graphs induced by A and B are actually connected can be used to prove thatΓω
C is also close
to a minimal obstruction. By extension of Definition III.7, we define the C-special pairs of Γω
C to be elements w, w
0 ∈Γω C
such that (ρ(w), ρ(w0))is a special pair ofΓω. Proposition V.2. A scheme L(Γω
C is an obstruction if and only if one of the following holds:
V.2.i L= Γω C, V.2.ii w∈Γω C\L⇒ ∃w 0 such that • (w, w0) is aC-special pair, • w0∈L.
Proof: The sufficient condition is obtained by using Theorem III.8 and the simulation technique of the previous proof.
We suppose now that L is such that ρ(L) is solvable. To prove the necessary condition, we will use the fact that the components A and B are connected to derive an algorithm AL. See Algorithm 4.
Algorithm 4: A Consensus AlgorithmAL for nodev. if v isa1 (resp.b1)then
Execute the algorithmAρ(w) with role◦ (resp.•), and neighbor b1 as •(resp.a1 as ◦);
Broadcast value returned by Aρ(w);
else
Broadcast received value.
Output: broadcasted value
Before the broadcast part, AL is simply Aρ(w) executed by a1 andb1. This part will terminate as the communication restricted to the link (a1, b1) is exactlyρ(u) for any wordu of L, andρ(u)∈Γω. The final broadcast succeed to send the
final value to any node of component A (resp. B) from a1 (resp.b1) because the component is connected and there is no omission in it for any scenarios of the schemeΓω
C, hence for
any scenarios in L. The validity and agreement properties for AL are derived easily from these properties forAρ(w).
VI. CONCLUSION ANDFUTUREWORKS
We have presented in this paper a formalism that help handle the question: for the Coordinated Attack Problem (that is, the Uniform Consensus Problem for two processes), which patterns of omission failures are obstructions. We also focus our study on Γω, the omission scheme where at most one
message can be lost at any given round. We showed, up to some special cases, any subfamily is solvable. We also gave a partial extension to communication networks of arbitrary size. Even if the work on two processes was restricted to omission schemes without double omission, we were able to derive from this study an exact bound about the solvability of
the Consensus Problem in arbitrary communication networks, tightening in the process the bounds given [SW07].
This partial characterizations have to be extended in two directions. First, a general characterization has to be provided, when messages can be lost at the same time. Moreover, this work should be fully extended for any given number of processes, not only for the case where the possible omissions are exactly the same at each round.
As the index function has a natural topological inter-pretation, one way to do this would probably be to better understand the present work in relation to the now stan-dard topological techniques that can be used in distributed computing [BG93], [HS99], [SZ00], [Ada03], [GKP03]. It is quite well understood, in the standard failure cases, how the impossibility proofs by way of bivalency are related to “connected components” of the global configurations space. We believe that the approach presented here can help to understand this topological relationships even in the presence of arbitrary patterns of failures.
REFERENCES
[Ada03] Giovanni Adagio. Using the topological characterization of synchronous models.Electr. Notes Theor. Comput. Sci., 81, 2003. [AEH75] E. A. Akkoyunlu, K. Ekanadham, and R. V. Huber. Some con-straints and tradeoffs in the design of network communications. InProceedings of the fifth ACM symposium on Operating systems principles, pages 67–74, Austin, Texas, United States, 1975. ACM. [AT99] Marcos Kawazoe Aguilera and Sam Toueg. A simple bivalency proof that -resilient consensus requires + 1 rounds.Inf. Process. Lett., 71(3-4):155–158, 1999.
[BG93] Elizabeth Borowsky and Eli Gafni. Generalized flp impossibility result for t-resilient asynchronous computations. InSTOC ’93: Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, pages 91–100, New York, NY, USA, 1993. ACM Press.
[CBGS00] Bernadette Charron-Bost, Rachid Guerraoui, and Andr´e Schiper. Synchronous system and perfect failure detector: Solvability and efficiency issue. InDSN, pages 523–532. IEEE Computer Society, 2000.
[CBS09] Bernadette Charron-Bost and Andr´e Schiper. The heard-of model: computing in distributed systems with benign faults. Distributed Computing, 22(1):49–71, 2009.
[CHLT00] Soma Chaudhuri, Maurice Herlihy, Nancy A. Lynch, and Mark R. Tuttle. Tight bounds for k-set agreement.J. ACM, 47(5):912–943, 2000.
[GKP03] Rachid Guerraoui, Petr Kouznetsov, and Bastian Pochon. A note on set agreement with omission failures. Electr. Notes Theor. Comput. Sci., 81, 2003.
[Gra78] Jim Gray. Notes on data base operating systems. In Operating Systems, An Advanced Course, pages 393–481, London, UK, 1978. Springer-Verlag.
[HS99] Maurice Herlihy and Nir Shavit. The topological structure of asynchronous computability. J. ACM, 46(6):858–923, 1999. [Lyn96] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann
Publishers Inc., San Francisco, CA, USA, 1996.
[MR98] Yoram Moses and Sergio Rajsbaum. The unified structure of consensus: Aayered analysis approach. InPODC, pages 123– 132, 1998.
[PP04] J.E. Pin and D. Perrin. Infinite Words, volume 141 ofPure and Applied Mathematics. Elsevier, 2004.
[PRT06] Philippe Parvdy, Michel Raynal, and Corentin Travers. Strongly terminating early-stopping k-set agreement in synchronous sys-tems with general omission failures. InProc. of Sirocco’06, 2006. [PSL80] L. Pease, R. Shostak, and L. Lamport. Reaching agreement in the
presence of faults. Journal of the ACM, 27(2):228–234, 1980. [Ray02] Michel Raynal. Consensus in synchronous systems:a concise
guided tour.Pacific Rim International Symposium on Dependable Computing, IEEE, 0:221, 2002.
[San06] N. Santoro.Design and Analysis of Distributed Algorithms. Wiley, 2006.
[SW07] Nicola Santoro and Peter Widmayer. Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci., 384(2-3):232–249, 2007.
[SZ00] M. Saks and F. Zaharoglou. ”wait-free k-set agreement is impos-sible: The topology of public knowledge.SIAM J. on Computing, 29:1449–1483, 2000.