Chapter 4 BGP Session Lifetime Modeling in Congested Networks
5.4 Reliability Calculation of IBGP Networks
5.4.2 Case Studies: Functional Reliability Analysis
We perform a functional reliability analysis on eight IBGP networks that are overlaid on the same IP network. A functional reliability analysis examines the reliability of an IBGP network in which all components (including IBGP sessions) have the same failure probability. Let r denote the probability of occurrence of each failure scenario, and let q denote the conditional failure probability of the affected IBGP sessions in every network failure scenario. In the following analysis, we consider only failures of a single IP link or a single router.
Table 5.2 shows eight IBGP reflection networks and their reliability metrics P, L_s and L_c. In the topology diagrams, solid lines represent IP links, dotted lines represent IBGP sessions, and shaded nodes represent reflectors. We define the notation $\beta_k = 1 - r + r(1-q)^k$ to simplify the expressions for the IFP P.

[Topology diagrams G, G_r of cases (a) to (h), each over routers A, B, C, D, E.]

Case | IFP P | ESL L_s | ECL L_c
(a) | $1-(1-r)^5\beta_3^3\beta_2^2\beta_1$ | $\frac{10+9q}{5}r$ | $\frac{r}{10}(20+18q)$
(b) | $1-(1-r)^5\beta_1^4$ | $(2+q)r$ | $\frac{r}{10}(28+18q)$
(c) | $1-(1-r)^5\beta_2\beta_1^3$ | $\frac{4+3q}{2}r$ | $\frac{r}{10}(26+23q-q^2)$
(d) | $1-(1-r)^5\beta_2^3\beta_1^3$ | $\frac{14+11q}{7}r$ | $\frac{r}{10}(20+11q+q^2)$
(e) | $1-(1-r)^5\beta_3^2\beta_2^2\beta_1^2$ | $\frac{18+15q}{9}r$ | $\frac{r}{10}(20+15q)$
(f) | $1-(1-r)^5\beta_2\beta_1^4$ | $\frac{10+7q}{5}r$ | $\frac{r}{10}(25+19q-q^2)$
(g) | $1-(1-r)^5\beta_3^3\beta_2^2\beta_1$ | $\frac{10+9q}{5}r$ | $\frac{r}{10}(20+7q+3q^2)$
(h) | $1-(1-r)^5\beta_3^3\beta_2^2\beta_1$ | $\frac{10+9q}{5}r$ | $\frac{r}{10}(20+11q+q^3)$

Table 5.2: IFP, ESL and ECL comparisons of IBGP route reflection networks. Only single router or single link failures are considered. $r = r_s$, the probability of a failure scenario. $q = q_s$, the failure probability of IBGP sessions in failure scenarios (Equation 5.1). $\beta_k = 1 - r + r(1-q)^k$.
Using L_c as an example, we show how the reliability metrics are computed. In Case (c), if E fails, $L_c(s) = 10/10 = 1$, because every router is isolated with certainty. If A fails, A is isolated and B loses contact with the other routers with probability q, so $L_c(s) = (4+3q)/10$. The denominator 10 is the number of router pairs among the five routers. The remaining scenarios are handled in the same way. Combining $L_c(s)$ over all network failure scenarios, we obtain the connectivity loss of Case (c):
$$L_c = \frac{r}{10}\left(26 + 23q - q^2\right)$$
The calculations of P and L_s follow Equations 5.2 and 5.4.

[Figure 5.3: The comparison of expected connectivity loss for the case studies in 5.4.2. Expected connectivity loss L_c (in units of r, axis from 2 to 5) versus BGP session failure probability q (axis from 0 to 1) for cases (a) to (h).]
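The per-scenario bookkeeping for Case (c) can be checked mechanically. The Python below is a minimal sketch under a hypothetical IP topology. It is chosen only to reproduce the scenarios described in the text and is not taken from the original figure. E is the single reflector. The sessions from A, C and D to E each run over their own direct link (A, E), (C, E) and (D, E). B's session to E transits router A and the links (A, B) and (A, E). Links that carry no session contribute nothing and are omitted. A failed router kills its own sessions. Every other session whose IP path crosses the failed element fails independently with probability q. Summing the expected fraction of disconnected router pairs over all scenarios gives $L_c / r$.

```python
from fractions import Fraction
from itertools import combinations, product

ROUTERS = ["A", "B", "C", "D", "E"]
REFLECTOR = "E"  # the single route reflector of Case (c)

# Hypothetical IP paths of the client-to-reflector sessions (an assumption
# chosen to reproduce the scenarios in the text, not the original topology).
SESSION_PATH = {
    "A": {"links": [("A", "E")], "transit": []},
    "B": {"links": [("A", "B"), ("A", "E")], "transit": ["A"]},
    "C": {"links": [("C", "E")], "transit": []},
    "D": {"links": [("D", "E")], "transit": []},
}


def scenarios():
    """Single-router and single-link failure scenarios."""
    for router in ROUTERS:
        yield "router", router
    for link in sorted({l for p in SESSION_PATH.values() for l in p["links"]}):
        yield "link", link


def classify(kind, elem):
    """Split client sessions into certainly down and at risk (fail w.p. q)."""
    dead, at_risk = set(), []
    for client, path in SESSION_PATH.items():
        if kind == "router" and elem in (client, REFLECTOR):
            dead.add(client)  # an endpoint of the session failed
        elif elem in (path["transit"] if kind == "router" else path["links"]):
            at_risk.append(client)  # the failed element lies on the IP path
    return dead, at_risk


def pair_lost(u, v, failed_router, down):
    """A pair is lost if either router failed or a client's session is down."""
    if failed_router in (u, v):
        return True
    return any(x != REFLECTOR and x in down for x in (u, v))


def expected_loss(kind, elem, q):
    """L_c(s): expected fraction of the 10 router pairs that lose contact."""
    dead, at_risk = classify(kind, elem)
    failed_router = elem if kind == "router" else None
    pairs = list(combinations(ROUTERS, 2))
    total = Fraction(0)
    # Enumerate every fail/survive outcome of the at-risk sessions.
    for outcome in product([False, True], repeat=len(at_risk)):
        prob, down = Fraction(1), set(dead)
        for client, fails in zip(at_risk, outcome):
            prob *= q if fails else 1 - q
            if fails:
                down.add(client)
        lost = sum(pair_lost(u, v, failed_router, down) for u, v in pairs)
        total += prob * Fraction(lost, len(pairs))
    return total


def ecl_over_r(q):
    """L_c / r for Case (c): the sum of L_c(s) over all failure scenarios."""
    return sum(expected_loss(kind, elem, q) for kind, elem in scenarios())


q = Fraction(1, 2)
print(ecl_over_r(q), Fraction(1, 10) * (26 + 23 * q - q * q))  # 149/40 149/40
```

Under this assumed topology the enumeration matches the closed form for any q. A different IP topology would change the per-scenario terms.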
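Let us compare these eight IBGP reflection topologies in terms of the three reliability metrics. It is straightforward to see that $1 - r \le \beta_k \le 1$ and $\beta_k \ge \beta_{k+1}$. Every P in Table 5.2 has the form $1 - (1-r)^5 \prod \beta_k$. Adding a β factor, or raising the index of one, can therefore only increase P, and each case in the chain below is obtained from its predecessor in this way. Thus we can establish the order of P for the eight cases as follows.
IFP P: (b) ≤ (c) ≤ (f) ≤ (d) ≤ (e) ≤ (a) = (g) = (h)
Similarly, we can obtain the order of L_s.
ESL L_s: (b) ≤ (f) ≤ (c) ≤ (d) ≤ (e) ≤ (a) = (g) = (h)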
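Both chains can be spot-checked numerically. The Python below transcribes the IFP and ESL closed forms of Table 5.2. The β exponents are our reading of the table. Exact rational arithmetic is used, so the ties (a) = (g) = (h) compare exactly. A grid check like this is only a sanity test. The general argument is the β_k monotonicity reasoning above.

```python
from fractions import Fraction


def beta(k, r, q):
    """beta_k = 1 - r + r(1 - q)^k, as defined for Table 5.2."""
    return 1 - r + r * (1 - q) ** k


# Exponents n_k in P = 1 - (1 - r)^5 * prod_k beta_k^{n_k} (our reading of Table 5.2).
BETA_EXP = {
    "a": {3: 3, 2: 2, 1: 1}, "b": {1: 4},
    "c": {2: 1, 1: 3}, "d": {2: 3, 1: 3},
    "e": {3: 2, 2: 2, 1: 2}, "f": {2: 1, 1: 4},
    "g": {3: 3, 2: 2, 1: 1}, "h": {3: 3, 2: 2, 1: 1},
}

# ESL L_s / r from Table 5.2.
ESL_OVER_R = {
    "a": lambda q: (10 + 9 * q) / 5, "b": lambda q: 2 + q,
    "c": lambda q: (4 + 3 * q) / 2, "d": lambda q: (14 + 11 * q) / 7,
    "e": lambda q: (18 + 15 * q) / 9, "f": lambda q: (10 + 7 * q) / 5,
    "g": lambda q: (10 + 9 * q) / 5, "h": lambda q: (10 + 9 * q) / 5,
}

P_CHAIN = ["b", "c", "f", "d", "e", "a"]
LS_CHAIN = ["b", "f", "c", "d", "e", "a"]


def ifp(case, r, q):
    """IBGP failure probability P of one case."""
    prod = (1 - r) ** 5
    for k, n in BETA_EXP[case].items():
        prod *= beta(k, r, q) ** n
    return 1 - prod


def chain_holds(values, chain):
    """True if values[chain[0]] <= values[chain[1]] <= ... <= values[chain[-1]]."""
    vals = [values[c] for c in chain]
    return all(x <= y for x, y in zip(vals, vals[1:]))


def orders_hold(grid):
    """Check both orderings, including (a) = (g) = (h), on a grid of (r, q)."""
    for r in grid:
        for q in grid:
            p = {c: ifp(c, r, q) for c in BETA_EXP}
            ls = {c: f(q) for c, f in ESL_OVER_R.items()}
            for vals, chain in ((p, P_CHAIN), (ls, LS_CHAIN)):
                if not (chain_holds(vals, chain) and vals["a"] == vals["g"] == vals["h"]):
                    return False
    return True


GRID = [Fraction(0), Fraction(1, 10), Fraction(1, 2), Fraction(1)]
print(orders_hold(GRID))  # True
```

Since r multiplies every L_s, checking L_s / r is enough for any r ≥ 0.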
These orders of P and L_s hold for any q. The order of L_c, however, depends slightly on the specific value of q. It is shown in Fig. 5.3.
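The q-dependence of the L_c order can be seen directly from the ECL closed forms of Table 5.2. The Python sketch below stores them as $10 L_c / r$ and sorts the cases at two values of q. The relative order of (b) and (c) changes where $28 + 18q = 26 + 23q - q^2$, that is, where $q^2 - 5q + 2 = 0$.

```python
import math
from fractions import Fraction

# 10 * L_c / r for each case, transcribed from Table 5.2.
ECL_TIMES_10_OVER_R = {
    "a": lambda q: 20 + 18 * q,
    "b": lambda q: 28 + 18 * q,
    "c": lambda q: 26 + 23 * q - q ** 2,
    "d": lambda q: 20 + 11 * q + q ** 2,
    "e": lambda q: 20 + 15 * q,
    "f": lambda q: 25 + 19 * q - q ** 2,
    "g": lambda q: 20 + 7 * q + 3 * q ** 2,
    "h": lambda q: 20 + 11 * q + q ** 3,
}


def ecl_order(q):
    """Cases sorted from smallest to largest expected connectivity loss."""
    return "".join(sorted(ECL_TIMES_10_OVER_R, key=lambda c: ECL_TIMES_10_OVER_R[c](q)))


print(ecl_order(Fraction(1, 5)))  # ghdeafcb
print(ecl_order(Fraction(4, 5)))  # ghdeafbc

# (b) and (c) swap at the root of q^2 - 5q + 2 = 0 that lies in [0, 1].
q_star = (5 - math.sqrt(17)) / 2
print(round(q_star, 3))  # 0.438
```

At q = 0, cases (a), (d), (e), (g) and (h) all tie at $L_c = 2r$. The ties break as q grows.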
With respect to IBGP failure probability (IFP), Case (b) is the best network, because it has the fewest IBGP sessions and covers the fewest IP links. By introducing as few unreliable components as possible into the IBGP network, Case (b) achieves the optimum IFP. Optimizing IFP therefore calls for the smallest number of IBGP sessions and IP links, which disfavors redundant reflectors and sessions. In practice, however, this design strategy does not yield IBGP networks that are reliable in terms of IBGP function loss. IFP ignores the impact of IBGP failures, and that impact may be significant.
ESL and ECL take into account the impact of IBGP failures. This case study shows, however, that ESL sometimes cannot measure the loss of IBGP function appropriately. The ESL order is roughly the same as the IFP order, except that Cases (c) and (f) are swapped owing to the redundant session in Case (f). Intuitively, cases with more redundant elements should yield more robust IBGP networks, but ESL does not reflect this. For example, Cases (g) and (h), which add redundant sessions, tie with the full mesh (a) for the largest ESL. ECL, on the other hand, models the loss of IBGP function in more detail and gives a satisfactory characterization: Cases (b), (c) and (f), which have few IBGP sessions and reflectors, are less robust than the other cases (Fig. 5.3). The disadvantage of ECL is its higher computational complexity compared with ESL. In the following, we use ECL to analyze the reliability of these eight IBGP networks.
Case (a) is a traditional full-mesh IBGP network with 10 sessions and no route reflectors. Cases (b) and (c) are route reflection networks with two clusters and one cluster, respectively. Both suffer from the single-point-of-failure problem. For example, if E fails in Case (c), all routers are isolated. Thus, Cases (b) and (c) are less reliable than Case (a). There are two ways to increase IBGP network reliability: using redundant reflectors and adding redundant IBGP sessions between clients.
Case (d) uses two reflectors in one cluster. Owing to the redundant reflectors and three additional sessions, it is much more resilient than Case (c). For a small network, where the number of BGP sessions is not a major concern, this design is preferable. It is even more reliable than Case (a), which has the maximum number of sessions. A fully meshed IBGP network has only one signaling path between any two routers, whereas multiple IBGP signaling paths may exist in Case (d). Reflection by the redundant reflectors can therefore prevent some cases of router isolation. For example, suppose link (C, D) fails. In Case (a), $\Pr[B \overset{s}{\nleftrightarrow} D] = q$, that is, B and D lose contact in scenario s with probability q, because no other router reflects routes between B and D. In Case (d), by contrast, the redundant reflectors C and E both reflect routes between B and D. There are thus two independent paths from B to D in graph $G^s_{BD}$. Therefore, the communication between B and D is not affected by the failure of (C, D).
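The effect of redundant signaling paths in this example can be sketched as a small probability computation. The Python below is a hypothetical model, not the original algorithm. A pair of routers is disconnected in scenario s only if every signaling path between them contains a failed session. Each session whose IP path crosses the failed element fails independently with probability q. The session labels, and the choice of which sessions cross link (C, D), are assumptions consistent with the text.

```python
from fractions import Fraction
from itertools import product


def prob_disconnected(paths, at_risk, q):
    """Pr[two routers cannot exchange routes] in one failure scenario.

    paths   -- signaling paths, each a list of IBGP session labels (hypothetical)
    at_risk -- sessions whose IP path crosses the failed element; each fails
               independently with probability q, all others stay up
    """
    at_risk = sorted(at_risk)
    total = Fraction(0)
    for outcome in product([False, True], repeat=len(at_risk)):
        prob = Fraction(1)
        for fails in outcome:
            prob *= q if fails else 1 - q
        failed = {s for s, f in zip(at_risk, outcome) if f}
        # Disconnected only if every signaling path contains a failed session.
        if all(any(s in failed for s in path) for path in paths):
            total += prob
    return total


q = Fraction(1, 3)
# Case (a): the only signaling path is the direct session B-D, assumed to cross (C, D).
print(prob_disconnected([["B-D"]], {"B-D"}, q))  # 1/3
# Case (d): paths via reflectors C and E; assume only session C-D crosses (C, D).
print(prob_disconnected([["B-C", "C-D"], ["B-E", "E-D"]], {"C-D"}, q))  # 0
```

If both of two independent paths contained an at-risk session, the result would fall to $q^2$ rather than 0.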
However, more redundant reflectors do not necessarily guarantee higher reliability. Case (e) uses one more reflector and two more sessions than Case (d), yet it performs worse. A reflector cannot reflect routes between its redundant reflectors and their clients (due to CLUSTER_LIST loop detection), so too many reflectors may make the IBGP signaling paths less redundant. In Case (e), there is only one path from A to D in graph $G^s_{AD}$. If link (D, E) fails, $\Pr[A \overset{s}{\nleftrightarrow} D] = q$. In Case (d), two independent paths exist, because A and D are both clients and can exchange routes via reflectors C and E. Therefore, if link (D, E) fails in Case (d), $\Pr[A \overset{s}{\nleftrightarrow} D] = 0$.
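The loop-detection rule that removes these signaling paths can be sketched in a few lines. The Python below is a simplified model of route reflection as specified in RFC 4456. A reflector prepends its CLUSTER_ID to the CLUSTER_LIST when it reflects a route, and it ignores a route whose CLUSTER_LIST already contains its own CLUSTER_ID. The ORIGINATOR_ID check is omitted. The class names, the prefix and the assumption that the redundant reflectors C and E share CLUSTER_ID 1 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    originator: str  # client that injected the route
    cluster_list: List[int] = field(default_factory=list)


@dataclass
class Reflector:
    name: str
    cluster_id: int

    def reflect(self, route: Route) -> Optional[Route]:
        """Reflect a route, or return None if loop detection discards it."""
        # A route already carrying our CLUSTER_ID has passed through our
        # cluster, so it is ignored.
        if self.cluster_id in route.cluster_list:
            return None
        # Prepend the local CLUSTER_ID when reflecting.
        return Route(route.prefix, route.originator,
                     [self.cluster_id] + route.cluster_list)


# Hypothetical setup: C and E are redundant reflectors sharing CLUSTER_ID 1.
c, e = Reflector("C", 1), Reflector("E", 1)
route_from_a = Route("192.0.2.0/24", originator="A")

via_c = c.reflect(route_from_a)
print(via_c.cluster_list)  # [1]
print(e.reflect(via_c))    # None: E does not relay C's reflection to its clients
```

A reflector configured with a different CLUSTER_ID would accept the route and prepend its own ID. This is why the number and placement of reflectors change which signaling paths exist.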
Redundant sessions between clients of the same cluster can also improve reliability. Case (f) is Case (c) with one more session, between B and C. This improves ECL slightly, because it increases the number of independent signaling paths between B and C. Case (g) goes further and constructs a full mesh among all clients. It is the most reliable of all eight IBGP networks. In some scenarios, however, many redundant sessions among clients cannot improve ECL significantly. For example, Case (h) obtains only a very slightly smaller ECL than Case (d), so its three additional sessions are not worthwhile.
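The size of these gains follows directly from the ECL closed forms of Table 5.2. The bound on the second line uses the maximum of $q^2(1-q)$ on $[0, 1]$, which is attained at $q = 2/3$:

```latex
\begin{aligned}
L_c^{(c)} - L_c^{(f)} &= \frac{r}{10}\left[(26 + 23q - q^2) - (25 + 19q - q^2)\right] = \frac{r}{10}(1 + 4q), \\
L_c^{(d)} - L_c^{(h)} &= \frac{r}{10}\left[(20 + 11q + q^2) - (20 + 11q + q^3)\right]
  = \frac{r}{10}\,q^2(1 - q) \le \frac{r}{10}\cdot\frac{4}{27} = \frac{2r}{135}.
\end{aligned}
```

The single extra session of Case (f) therefore saves between r/10 and r/2. The three extra sessions of Case (h) save at most about 0.015r.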
Summary: This case study shows the pros and cons of the three proposed metrics in characterizing the reliability of IBGP networks. It also gives some intuition about optimizing IBGP networks for reliability:
(1) The traditional full-mesh IBGP network is not the most reliable solution. IBGP networks can be made more reliable by introducing redundant reflectors and sessions appropriately, without incurring much additional overhead.
(2) Redundant reflectors can improve IBGP network reliability, but they must be used carefully, because too many redundant reflectors may decrease IBGP robustness.