Case Studies functional reliability analysis


5.4 Reliability Calculation of IBGP Networks

5.4.2 Case Studies functional reliability analysis

We perform a functional reliability analysis on eight IBGP networks that are overlaid on the same IP network. Functional reliability analysis means analyzing the reliability of an IBGP network in which the failure probabilities of all components (including IBGP sessions) are the same. Let r denote the probability of each failure scenario, and let q denote the conditional failure probability of the affected IBGP sessions in every network failure scenario. We consider only failures of a single IP link or a single router in the following analysis.

Table 5.2 shows eight IBGP reflection networks and their reliability metrics P, L_s and L_c. Solid lines represent IP links, dotted lines represent IBGP sessions, and the shaded nodes represent reflectors. We define the notation $\beta_k = 1 - r + r(1-q)^k$ to simplify the representation of the IFP P.

[Topology diagrams G, G_r for Cases (a) to (h), each over routers A, B, C, D and E.]

| Case | IFP: $P$ | ESL: $L_s$ | ECL: $L_c$ |
|---|---|---|---|
| (a) | $1-(1-r)^5\beta_3^3\beta_2^2\beta_1$ | $\frac{10+9q}{5}\,r$ | $\frac{r}{10}(20+18q)$ |
| (b) | $1-(1-r)^5\beta_1^4$ | $(2+q)\,r$ | $\frac{r}{10}(28+18q)$ |
| (c) | $1-(1-r)^5\beta_2\beta_1^3$ | $\frac{4+3q}{2}\,r$ | $\frac{r}{10}(26+23q-q^2)$ |
| (d) | $1-(1-r)^5\beta_2^3\beta_1^3$ | $\frac{14+11q}{7}\,r$ | $\frac{r}{10}(20+11q+q^2)$ |
| (e) | $1-(1-r)^5\beta_3^2\beta_2^2\beta_1^2$ | $\frac{18+15q}{9}\,r$ | $\frac{r}{10}(20+15q)$ |
| (f) | $1-(1-r)^5\beta_2\beta_1^4$ | $\frac{10+7q}{5}\,r$ | $\frac{r}{10}(25+19q-q^2)$ |
| (g) | $1-(1-r)^5\beta_3^3\beta_2^2\beta_1$ | $\frac{10+9q}{5}\,r$ | $\frac{r}{10}(20+7q+3q^2)$ |
| (h) | $1-(1-r)^5\beta_3^3\beta_2^2\beta_1$ | $\frac{10+9q}{5}\,r$ | $\frac{r}{10}(20+11q+q^3)$ |

Table 5.2: IFP, ESL and ECL comparisons of IBGP route reflection networks. Only single router or single link failures are considered. r = r_s, the probability of a failure scenario. q = q_s, the failure probability of IBGP sessions in failure scenarios (Equation 5.1). $\beta_k = 1 - r + r(1-q)^k$.

Using L_c as an example, we show how the reliability metrics are computed. In Case (c), if E fails, $L_c(s) = \frac{10}{10}$, because all routers are definitely isolated; if A fails, A is isolated and B loses contact with the other routers with probability q, so $L_c(s) = \frac{4+3q}{10}$; and so on. By combining $L_c(s)$ over all network failure scenarios, we obtain the connectivity loss of Case (c):

$$L_c = \frac{r}{10}\left(26 + 23q - q^2\right).$$

The calculations of P and L_s follow Equations 5.2 and 5.4.

[Figure 5.3 plot: expected connectivity loss L_c (×r, axis from 2 to 5) versus BGP session failure probability q (axis from 0 to 1), one curve for each of Cases (a) to (h).]

Figure 5.3: The comparison of expected connectivity loss for the case studies in 5.4.2.

Let us compare these eight IBGP reflection topologies in terms of the three reliability metrics. It is straightforward to see that $1 - r \le \beta_k \le 1$ and $\beta_k \ge \beta_{k+1}$. Thus we can establish the order of P for the eight cases as follows.

IFP P : (b) ≤ (c) ≤ (f) ≤ (d) ≤ (e) ≤ (a) = (g) = (h)
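The IFP ordering can be checked numerically. The following is a minimal Python sketch, assuming the IFP expressions as read from Table 5.2, i.e. P = 1 − (1 − r)^5 β_1^e1 β_2^e2 β_3^e3 with case-specific exponents. The names `beta`, `ifp`, `IFP_EXPONENTS` and `order_holds` are illustrative and do not come from the original text.

```python
from itertools import product


def beta(k, r, q):
    """beta_k = 1 - r + r(1 - q)^k, as defined for Table 5.2."""
    return 1 - r + r * (1 - q) ** k


# Exponents (e1, e2, e3) of beta_1, beta_2, beta_3 in each case's IFP,
# as read from Table 5.2 (an assumption of this sketch).
IFP_EXPONENTS = {
    "a": (1, 2, 3),
    "b": (4, 0, 0),
    "c": (3, 1, 0),
    "d": (3, 3, 0),
    "e": (2, 2, 2),
    "f": (4, 1, 0),
    "g": (1, 2, 3),
    "h": (1, 2, 3),
}

# The order claimed in the text, from smallest to largest P.
ORDER = ["b", "c", "f", "d", "e", "a", "g", "h"]


def ifp(case, r, q):
    """IBGP failure probability P of one case for given r and q."""
    e1, e2, e3 = IFP_EXPONENTS[case]
    survive = (1 - r) ** 5
    survive *= beta(1, r, q) ** e1 * beta(2, r, q) ** e2 * beta(3, r, q) ** e3
    return 1 - survive


def order_holds(r, q, tol=1e-12):
    """True if P is non-decreasing along ORDER (up to rounding)."""
    values = [ifp(case, r, q) for case in ORDER]
    return all(x <= y + tol for x, y in zip(values, values[1:]))


# Check the order on a coarse grid of r and q in [0, 1].
grid = [i / 10 for i in range(11)]
assert all(order_holds(r, q) for r, q in product(grid, grid))
```

The grid check is only a sanity test. The order itself follows from β_k ≥ β_{k+1} and β_k ≤ 1: each step along the chain either adds a β factor or replaces a factor by one with a larger index, so the survival product never grows.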

Similarly, we can obtain the order of L_s.

ESL L_s : (b) ≤ (f) ≤ (c) ≤ (d) ≤ (e) ≤ (a) = (g) = (h)
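To see why this order holds for every q, each ESL expression can be rewritten as 2 plus a multiple of q. The following worked step assumes the L_s values as read from Table 5.2. All eight expressions share the constant term 2, so the order is determined by the coefficient of q alone.

```latex
\begin{align*}
\text{(b)}:\; & L_s/r = 2 + q \\
\text{(f)}:\; & L_s/r = \tfrac{10+7q}{5} = 2 + \tfrac{7}{5}q \\
\text{(c)}:\; & L_s/r = \tfrac{4+3q}{2} = 2 + \tfrac{3}{2}q \\
\text{(d)}:\; & L_s/r = \tfrac{14+11q}{7} = 2 + \tfrac{11}{7}q \\
\text{(e)}:\; & L_s/r = \tfrac{18+15q}{9} = 2 + \tfrac{5}{3}q \\
\text{(a), (g), (h)}:\; & L_s/r = \tfrac{10+9q}{5} = 2 + \tfrac{9}{5}q
\end{align*}
% Since 1 < 7/5 < 3/2 < 11/7 < 5/3 < 9/5, the order is the same for all q in [0, 1].
```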

The above orders of P and L_s hold for any q. However, the order of L_c depends slightly on the specific value of q; it is shown in Fig. 5.3.
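The dependence of the ECL order on q can be illustrated with a short Python sketch. It assumes the L_c polynomials as read from Table 5.2, written as L_c = (r/10)(c0 + c1 q + c2 q^2 + c3 q^3). The names `ECL_COEFFS`, `ecl_over_r` and `ranking` are illustrative only.

```python
# Coefficients (c0, c1, c2, c3) of L_c = (r / 10) * (c0 + c1*q + c2*q^2 + c3*q^3),
# as read from Table 5.2 (an assumption of this sketch).
ECL_COEFFS = {
    "a": (20, 18, 0, 0),
    "b": (28, 18, 0, 0),
    "c": (26, 23, -1, 0),
    "d": (20, 11, 1, 0),
    "e": (20, 15, 0, 0),
    "f": (25, 19, -1, 0),
    "g": (20, 7, 3, 0),
    "h": (20, 11, 0, 1),
}


def ecl_over_r(case, q):
    """Return L_c / r for one case at session failure probability q."""
    c0, c1, c2, c3 = ECL_COEFFS[case]
    return (c0 + c1 * q + c2 * q**2 + c3 * q**3) / 10


def ranking(q):
    """Cases sorted from smallest to largest L_c (ties broken by label)."""
    return sorted(ECL_COEFFS, key=lambda case: (ecl_over_r(case, q), case))


for q in (0.2, 0.5, 0.9):
    print(q, ranking(q))
```

Under these assumed coefficients, the only pair whose relative order changes on (0, 1] is (b) and (c): setting 28 + 18q = 26 + 23q − q^2 gives q = (5 − √17)/2 ≈ 0.44, below which (c) has the smaller L_c and above which (b) does.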

With respect to IBGP failure probability, Case (b) is the best network, because it has the minimum number of IBGP sessions and covers the minimum number of IP links. By introducing as few unreliable components as possible into the IBGP network, Case (b) achieves the optimum IFP. Optimizing IFP thus requires using the smallest number of IBGP sessions and IP links, so redundant reflectors and sessions are not favored. In practice, however, this design strategy does not yield reliable IBGP networks in terms of IBGP function loss, because the impact of IBGP failures may be significant.

ESL and ECL take into account the impact of IBGP failures. This case study shows that ESL sometimes cannot measure the loss of IBGP function appropriately. The ESL order is roughly the same as the IFP order, except that Cases (c) and (f) are swapped, which is due to the redundant session in Case (f). Intuitively, cases with more redundant elements should yield more robust IBGP networks, but ESL does not reflect this. ECL, on the other hand, models the IBGP function loss in more detail and gives a satisfactory characterization: Cases (b), (c) and (f), which have few IBGP sessions and reflectors, are less robust than the other cases (see Fig. 5.3). The disadvantage of ECL is that it is computationally more complex than ESL. In the following, we analyze the reliability of these eight IBGP networks using ECL.

Case (a) is a traditional full-mesh IBGP network with 10 sessions and no route reflector. Cases (b) and (c) are route reflection networks with two clusters and one cluster, respectively. Both suffer from a single point of failure. For example, if E fails in Case (c), all routers are isolated. Thus, Cases (b) and (c) are less reliable than Case (a). There are two ways to increase IBGP network reliability: using redundant reflectors and adding redundant IBGP sessions between clients.

Case (d) uses two reflectors in one cluster. It is much more resilient than Case (c), due to the redundant reflectors and 3 additional sessions. For a small network, where the number of BGP sessions is not a big concern, this design is preferable. It is even more reliable than Case (a), which has the maximum number of sessions. The reason is that there is only one signaling path between any two routers in a fully meshed IBGP network, while multiple IBGP signaling paths may exist in Case (d). Thus, route reflection by the redundant reflectors can avoid some cases of router isolation. For example, if link (C, D) fails in Case (a), the probability that B and D are disconnected is $\Pr[B \overset{s}{\nleftrightarrow} D] = q$, because other routers do not reflect routes between B and D. In Case (d), however, the redundant reflectors C and E both reflect routes between B and D, i.e., there are two independent paths from B to D in graph $G^s_{BD}$. Therefore, the communication between B and D is not affected by the failure of (C, D).

However, using more redundant reflectors does not necessarily guarantee higher reliability. Case (e) uses one more reflector and two more sessions than Case (d), yet it performs worse. This is because a reflector cannot reflect routes between its redundant reflectors and their clients (due to CLUSTER_LIST loop detection), i.e., too many reflectors may make the IBGP signaling paths less redundant. In Case (e), there is only one path from A to D in graph $G^s_{AD}$, so if link (D, E) fails, $\Pr[A \overset{s}{\nleftrightarrow} D] = q$. In Case (d), by contrast, two independent paths exist, because both A and D are clients and can exchange routes via reflectors C and E. Therefore, if link (D, E) fails, $\Pr[A \overset{s}{\nleftrightarrow} D] = 0$.

Using redundant sessions between clients of the same cluster can also improve reliability. Starting from Case (c), Case (f) introduces one more session between node B and node C. This improves ECL slightly, because the number of independent signaling paths between B and C increases. Case (g) goes further and constructs a full mesh among all clients, and it is the most reliable of all these IBGP networks. In some scenarios, however, using many redundant sessions among clients cannot improve ECL significantly. For example, Case (h) obtains only a very slightly smaller ECL than Case (d), so its three additional sessions are not worthwhile.
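The pairwise ECL comparisons made in this discussion can be written out explicitly. The following worked differences assume the L_c expressions as read from Table 5.2; the superscript case labels are notation introduced here for illustration. Every difference is non-negative for q in [0, 1].

```latex
\begin{align*}
L_c^{(a)} - L_c^{(d)} &= \tfrac{r}{10}\left(18q - 11q - q^2\right) = \tfrac{r}{10}\,q(7 - q) \\
L_c^{(e)} - L_c^{(d)} &= \tfrac{r}{10}\left(15q - 11q - q^2\right) = \tfrac{r}{10}\,q(4 - q) \\
L_c^{(c)} - L_c^{(f)} &= \tfrac{r}{10}\left(26 + 23q - q^2 - 25 - 19q + q^2\right) = \tfrac{r}{10}\,(1 + 4q) \\
L_c^{(d)} - L_c^{(g)} &= \tfrac{r}{10}\left(11q + q^2 - 7q - 3q^2\right) = \tfrac{r}{10}\,2q(2 - q) \\
L_c^{(d)} - L_c^{(h)} &= \tfrac{r}{10}\left(q^2 - q^3\right) = \tfrac{r}{10}\,q^2(1 - q)
\end{align*}
% q^2(1 - q) is maximal at q = 2/3, where it equals 4/27, so
% L_c^{(d)} - L_c^{(h)} <= (r/10)(4/27) = 2r/135, roughly 0.015 r.
```

The last line quantifies the gain of Case (h) over Case (d): at most about 0.015 r, compared with L_c values between 2r and 3.2r for these two cases.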

Summary: This case study shows the pros and cons of the three proposed metrics in characterizing the reliability of IBGP networks. Furthermore, it gives some intuition about optimizing IBGP networks for reliability: (1) The traditional full-mesh IBGP network is not the most reliable solution; IBGP networks can be made more reliable by introducing redundant reflectors and sessions appropriately, without incurring much additional overhead. (2) Redundant reflectors can improve IBGP network reliability, but they have to be used appropriately, because too many redundant reflectors may decrease IBGP robustness.

In document Resilient Interdomain Routing with BGP - Protocols and Reliability Engineering (Page 110-115)