Data Center Design Case Studies
From DMVPN and WAN Edge to Server
Ivan Pepelnjak, CCIE#1354 Emeritus
Copyright © 2014 ipSpace.net AG
Fifth revision, November 2014
Added Replacing the Central Firewall chapter
Added Combine Physical and Virtual Appliances in a Private Cloud chapter
Added Scale-Out Private Cloud Infrastructure chapter
Added Simplify Workload Migration with Virtual Appliances chapter
Warning and Disclaimer
This book is designed to provide information about real-life data center design scenarios. Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied.
The information is provided on an “as is” basis. The authors, and ipSpace.net shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.
Content at a Glance
Foreword
Introduction
1 BGP Convergence Optimization
2 Integrating Internet VPN with MPLS/VPN WAN
3 BGP Routing in DMVPN Access Network
4 Redundant Data Center Internet Connectivity
5 External Routing with Layer-2 Data Center Interconnect
6 Designing a Private Cloud Network Infrastructure
7 Redundant Server-to-Network Connectivity
8 Replacing the Central Firewall
9 Combine Physical and Virtual Appliances in a Private Cloud
10 High-Speed Multi-Tenant Isolation
11 Scale-Out Private Cloud Infrastructure
Contents
Foreword
Introduction
1 BGP Convergence Optimization
  Brief Network Description
  Solution – Executive Overview
  Detailed Solution
  Conclusions
2 Integrating Internet VPN with MPLS/VPN WAN
  IP Routing Overview
  Design Requirements
  Solution Overview
  OSPF as the Internet VPN Routing Protocol
  BGP-Based WAN Network Design and Implementation Guidance
  Conclusions
3 BGP Routing in DMVPN Access Network
  Existing IP Routing Overview
  IBGP versus EBGP
  Using EBGP in a DMVPN Network
  Using IBGP in a DMVPN Network
  Design Recommendations
4 Redundant Data Center Internet Connectivity
  Simplified Topology
  IP Addressing and Routing
  Design Requirements
  Solution Overview
  Layer-2 WAN Backbone
  Layer-3 WAN Backbone
5 External Routing with Layer-2 Data Center Interconnect
  Design Requirements
  Solution Overview
  Detailed Solution – OSPF
  Detailed Solution – Internet Routing with BGP
  Conclusions
6 Designing a Private Cloud Network Infrastructure
  Collect the Requirements
  Private Cloud Planning and Design Process
  Designing the Network Infrastructure
  Conclusions
7 Redundant Server-to-Network Connectivity
  Design Requirements
  VLAN-Based Virtual Networks
  Overlay Virtual Networks
8 Replacing the Central Firewall
  From Packet Filters to Stateful Firewalls
  Design Elements
  Design Options
  Beyond the Technology Changes
9 Combine Physical and Virtual Appliances in a Private Cloud
  Existing Network Services Design
  Security Requirements
  Private Cloud Infrastructure
  Network Services Implementation Options
  The Reality Intervenes
10 High-Speed Multi-Tenant Isolation
  Interaction with the Provisioning System
  Communication Patterns
  Stateless or Stateful Traffic Filters?
  Packet Filters on x86-Based Appliances
  Conclusions
11 Scale-Out Private Cloud Infrastructure
  Cloud Infrastructure Failure Domains
  Workload Mobility Considerations
  Conclusions
12 Simplify Workload Migration with Virtual Appliances
  Existing Application Workloads
  Infrastructure Challenges
  Increase Workload Mobility with Virtual Appliances
  Building a Next-Generation Infrastructure
  Orchestration Challenges
Table of Figures
Figure 1-1: Network core and Internet edge
Figure 2-1: Existing MPLS VPN WAN network topology
Figure 2-2: Proposed new network topology
Figure 2-3: OSPF areas
Figure 2-4: OSPF-to-BGP route redistribution
Figure 2-5: Inter-site OSPF route advertisements
Figure 2-6: DMVPN topology
Figure 2-7: OSPF areas in the Internet VPN
Figure 2-8: OSPF external route origination
Figure 2-9: Multiple OSPF processes with two-way redistribution
Figure 2-10: BGP sessions in the WAN infrastructure
Figure 2-11: Single AS number used on all remote sites
Figure 2-12: BGP enabled on every layer-3 device between two BGP routers
Figure 2-13: BGP routing information redistributed into OSPF
Figure 2-14: Dedicated VLAN between BGP edge routers
Figure 2-15: Remote site logical network topology and routing
Figure 2-16: Central site logical network topology and BGP+OSPF routing
Figure 3-1: Planned DMVPN network
Figure 3-2: BGP routing in existing WAN backbone
Figure 4-1: Redundant data centers and their internet connectivity
Figure 4-2: Simplified topology with non-redundant internal components
Figure 4-3: BGP sessions between Internet edge routers and the ISPs
Figure 4-4: Outside WAN backbone in the redesigned network
Figure 4-5: Point-to-point Ethernet links implemented with EoMPLS on DCI routers
Figure 4-6: Single stretched VLAN implemented with VPLS across L3 DCI
Figure 4-7: Two non-redundant stretched VLANs provide sufficient end-to-end redundancy
Figure 4-8: Virtual topology using point-to-point links
Figure 4-9: Virtual topology using stretched VLANs
Figure 4-10: Full mesh of IBGP sessions between Internet edge routers
Figure 4-11: Virtual Device Contexts: dedicated management planes and physical interfaces
Figure 4-12: Virtual Routing and Forwarding tables: shared management, shared physical interfaces
Figure 4-13: BGP core in WAN backbone
Figure 4-14: MPLS core in WAN backbone
Figure 4-15: Default routing in WAN backbone
Figure 5-1: Redundant data centers and their internet connectivity
Figure 5-2: IP addressing and routing with external networks
Figure 5-3: Simplified topology with non-redundant components
Figure 5-4: Primary/backup external routing
Figure 5-5: OSPF routing used in enterprise WAN network
Figure 5-6: EBGP and IBGP sessions on data center edge routers
Figure 5-7: BGP local preference in prefix origination and propagation
Figure 5-8: BGP next hop processing
Figure 7-1: Redundant server-to-network connectivity
Figure 7-2: Layer-2 fabric with two spine nodes
Figure 7-3: Layer-2 leaf-and-spine fabric using layer-2 ECMP technology
Figure 7-4: VMs pinned to a hypervisor uplink
Figure 7-5: Server-to-network links bundled in a single LAG
Figure 7-6: VM-to-uplink pinning with two hypervisor hosts connected to the same pair of ToR switches
Figure 7-7: Suboptimal traffic flow with VM-to-uplink pinning
Figure 7-8: Traffic flow between orphan ports
Figure 7-9: LACP between a server and ToR switches
Figure 7-10: Optimal traffic flow with MLAG
Figure 7-11: Redundant server connectivity requires the same IP subnet on adjacent ToR switches
Figure 7-12: A single uplink is used without server-to-ToR LAG
Figure 7-13: All uplinks are used by a Linux host using balance-tlb bonding mode
Figure 7-14: All ToR switches advertise IP subnets with the same cost
Figure 7-15: IP routing with stackable switches
Figure 7-16: Layer-2 fabric between hypervisor hosts
Figure 7-17: Optimal flow of balance-tlb traffic across a layer-2 fabric
Figure 7-18: LAG between a server and adjacent ToR switches
Figure 8-19: Packet filters protecting individual servers
Figure 8-20: VM NIC firewalls
Figure 8-21: Per-application firewalls
Figure 8-22: High-performance WAN edge packet filters combined with a proxy server
Figure 9-1: Centralized network services implemented with physical appliances
Figure 9-2: Centralized network services implemented with physical appliances
Figure 9-3: Applications accessing external resources
Figure 9-4: Hybrid architecture combining physical and virtual appliances
Figure 10-1: Containers and data center backbone
Figure 10-2: Interaction with the provisioning/orchestration system
Figure 10-3: Traffic control appliances
Figure 10-4: Layer-3 traffic control devices
Figure 10-5: Bump-in-the-wire traffic control devices
Figure 10-6: Routing protocol adjacencies across traffic control appliances
Figure 11-1: Standard cloud infrastructure rack
Figure 11-2: Planned WAN connectivity
Figure 11-3: Cloud infrastructure components
Figure 11-4: Single orchestration system used to manage multiple racks
Figure 11-5: VLAN transport across IP infrastructure
Figure 12-1: Some applications use application-level load balancing solutions
Figure 12-2: Typical workload architecture with network services embedded in the application stack
Figure 12-3: Most applications use external services
Figure 12-4: Application tiers are connected through central physical appliances
Figure 12-5: Virtual appliance NIC connected to overlay virtual network
Foreword
Ivan Pepelnjak first came onto my radar in 2001, when I was tasked with migrating a large multinational network from IGRP to EIGRP. As a CCIE I was (over)confident in my EIGRP abilities. I had already deployed EIGRP for a smaller organization; how different could this new challenge be? A few months into the project, I realized that designing a large-scale EIGRP network was quite different from configuring a small one. Fortunately I stumbled across Ivan’s EIGRP Network Design Solutions book. So began a cycle which continues to this day – I take on a new project, look for a definitive resource to understand the technologies, and discover that Ivan is the authoritative source. MPLS, L3VPN, IS-IS… Ivan has covered it all!
Several years ago I was lucky enough to meet Ivan in person through my affiliation with Gestalt IT’s Tech Field Day program. We also ‘shared the mic’ via the Packet Pushers Podcast on several occasions. Through these opportunities I discovered Ivan to be a remarkably thoughtful collaborator. He has a knack for asking exactly the right question to direct your focus to the specific information you need. Some of my favorite interactions with Ivan center on his answering my ‘could I do this?’ inquiry with a ‘yes, it is possible, but you don’t want to do that because…’ response. For a great example of this, take a look at the “OSPF as the Internet VPN Routing Protocol” section in chapter 2 of this book.
I have found during my career as a network technology instructor that case studies are the best method for teaching network design. Presenting an actual network challenge and explaining the thought process (including rejected solutions) greatly assists students in building the skills required to create their own scalable designs. This book uses this structure to explain diverse Enterprise design challenges, from DMVPN to Data Centers to Internet routing. Over the next few hours of reading you will accompany Ivan on many real-world consulting assignments. You have the option of implementing the designs as presented (I can assure you they work ‘out of the box’!), or you can use the rich collection of footnotes and references to customize the solution to your exact needs. In either event, I am confident that you will find these case studies as useful as I have found them to be.
Jeremy Filliben
Network Architect / Trainer CCDE#20090003, CCIE# 3851
Introduction
I started the ExpertExpress experiment a few years ago and it was unexpectedly successful; I was amazed at how many people decided to ask me to help design or troubleshoot their network. Most of the engagements touched at least one data center element, be it server virtualization, data center network core, WAN edge, or connectivity between data centers and customer sites or public Internet. I also noticed the same challenges appearing over and over, and decided to document them in a series of ExpertExpress case studies, which eventually resulted in this book.
The book has three major parts: data center WAN edge and WAN connectivity, internal data center infrastructure, and scale-out architectures spanning multiple data centers.
In the first part, I’ll walk you through common data center WAN edge challenges:
Optimizing BGP routing on data center WAN edge routers to reduce the downtime and brownouts following link or node failures (chapter 1);
Integrating an MPLS/VPN network provided by one or more service providers with a DMVPN-over-Internet backup network (chapter 2);
Building a large-scale DMVPN network connecting one or more data centers with thousands of remote sites (chapter 3);
Implementing redundant data center connectivity and routing between active/active data centers and the outside world (chapter 4).
The data center infrastructure part of the book covers these topics:
Designing a private cloud network infrastructure (chapter 6);
Redundant server-to-network connectivity (chapter 7);
Replacing the central firewall with a scale-out architecture combining packet filters, virtual inter-subnet firewalls and VM NIC firewalls (chapter 8);
Combining physical and virtual appliances in a private cloud (chapter 9);
High-speed multi-tenant isolation (chapter 10).
The final part of the book covers scale-out architectures, multiple data centers and disaster recovery:
Scale-out private cloud infrastructure using standardized building blocks (chapter 11);
Simplified workload migration and disaster recovery with virtual appliances (chapter 12);
Active-active data centers and scale-out application architectures (chapter 13 – coming in late 2014).
I hope you’ll find the selected case studies useful. Should you have any follow-up questions, please feel free to send me an email (or use the contact form @ ipSpace.net/Contact); I’m also available for short online consulting engagements.
Happy reading!
Ivan Pepelnjak
September 2014
1 BGP Convergence Optimization
In this chapter:
Brief Network Description
Solution – Executive Overview
Detailed Solution
Enable BFD
Enable BGP Next Hop Tracking
Reduce the BGP Update Timers
Reduce the Number of BGP Prefixes
BGP Prefix Independent Convergence

A large multi-homed content provider has experienced a number of outages and brownouts in the Internet edge of their data center network. The brownouts were caused by high CPU load on the Internet edge routers, leading to unstable forwarding tables and packet loss after EBGP peering session loss.
This document describes the steps the customer could take to improve the BGP convergence and reduce the duration of Internet connectivity brownouts.
Brief Network Description
The customer’s data center has two Internet-facing edge routers, each of them connected to a different ISP through a 1GE uplink. Both routers are dual-attached to core switches (see Figure 1-1). ISP-A is the primary ISP; the connection to ISP-B is used only when the uplink to ISP-A fails. Edge routers (GW-A and GW-B) have EBGP sessions with the ISPs and receive full Internet routing (~450,000 BGP prefixes[1]). GW-A and GW-B exchange BGP routes over an IBGP session to ensure consistent forwarding behavior. GW-A has a higher default local preference; GW-B thus always prefers IBGP routes received from GW-A over EBGP routes.
Core routers (Core-1 and Core-2) don’t run BGP; they run OSPF with GW-A and GW-B and receive a default route from both Internet edge routers (the details of default route origination are out of scope).
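The local-preference behavior described above can be illustrated with a heavily simplified model of BGP best-path selection. This is a minimal Python sketch, not a BGP implementation: it assumes only two decision steps (highest LOCAL_PREF, then EBGP over IBGP) and skips everything in between. The local-preference values (200 for GW-A, 100 for everything else, 100 being the Cisco IOS default) and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpPath:
    source: str       # hypothetical label, e.g. "IBGP from GW-A"
    local_pref: int   # LOCAL_PREF attribute; higher is better
    is_ebgp: bool     # True for paths learned over an EBGP session


def best_path(paths):
    """Pick the best path: highest LOCAL_PREF first, then EBGP over IBGP.

    The real BGP decision process has more steps between these two
    (AS-path length, origin, MED, ...); they are deliberately omitted.
    """
    return max(paths, key=lambda p: (p.local_pref, p.is_ebgp))


# Two paths GW-B might hold for the same prefix (illustrative values only).
via_gw_a = BgpPath("IBGP from GW-A", local_pref=200, is_ebgp=False)
via_isp_b = BgpPath("EBGP from ISP-B", local_pref=100, is_ebgp=True)

print(best_path([via_gw_a, via_isp_b]).source)  # IBGP from GW-A
# After GW-A withdraws its routes, only the EBGP path remains.
print(best_path([via_isp_b]).source)            # EBGP from ISP-B
```

Because LOCAL_PREF is compared before the EBGP-over-IBGP tie-breaker, the IBGP path from GW-A wins even though GW-B has its own EBGP path.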
Solution – Executive Overview
The temporary blackout and prolonged brownouts following an Internet uplink loss are caused by BGP convergence issues. As with any other routing protocol, a router running BGP has to take the following steps to adapt its forwarding tables to a link or neighbor loss:
1. The BGP routing process detects a link or neighbor loss.
2. Invalid routes are removed from the local BGP, routing and forwarding tables. Alternate routes already present in the BGP table could be installed at this point.
3. Updates withdrawing the lost routes are sent to other BGP neighbors.
4. BGP neighbors process the withdrawal updates, select alternate BGP best routes, and install them in their routing and forwarding tables.
5. BGP neighbors advertise their new best routes.
6. The router processes incoming BGP updates, selects new best routes, and installs them in its routing and forwarding tables.

[1] BGP Routing Table Analysis Reports
Neighbor loss detection can be improved with Bidirectional Forwarding Detection (BFD)[2], fast neighbor failover[3] or BGP next-hop tracking. BGP update propagation can be fine-tuned with BGP update timers. The other elements of the BGP convergence process are harder to tune; they depend primarily on the processing power of the routers’ CPUs and the underlying packet forwarding hardware. Some router vendors offer functionality that can be used to pre-install backup paths in BGP tables (BGP best external paths) and forwarding tables (BGP Prefix Independent Convergence[4]). These features can be used to redirect the traffic to the backup Internet connection even before the BGP convergence process is complete.
Alternatively, you can significantly reduce the CPU load of the Internet edge routers, and improve the BGP convergence time, by reducing the number of BGP prefixes accepted from the upstream ISPs.
[2] Bidirectional Forwarding Detection, http://wiki.nil.com/Bidirectional_Forwarding_Detection_(BFD)
[3] Fast BGP Neighbor Loss Detection, http://wiki.nil.com/Fast_BGP_neighbor_loss_detection
[4] Prefix Independent Convergence – Fixing the FIB Bottleneck
Finally, you might need to replace your Internet edge routers with devices that have processing power matching today’s Internet routing table sizes.
Detailed Solution
The following design or configuration changes can be made to improve the BGP convergence process:
Enable BFD on EBGP sessions
Enable BFD on IBGP sessions
Enable BGP next-hop tracking
Reduce the BGP update timers
Reduce the number of EBGP prefixes
Enable BGP Prefix Independent Convergence (if available).
Design and configuration changes described in this document might be disruptive and might result in temporary or long-term outages. Always prepare a deployment and rollback plan, and change your network configuration during a maintenance window. You can use the ExpertExpress service for a design/deployment check, design review, or a second opinion.
Enable BFD
Bidirectional Forwarding Detection (BFD) has been available in major Cisco IOS and Junos software releases for several years. Service providers prefer BFD over BGP hold time adjustments because high-end routers process BFD on their line cards, whereas the BGP hold timer relies on the BGP process (running on the main CPU) sending keepalive packets over the BGP TCP session.
BFD has to be supported and configured on both ends of a BGP session; check with your ISP before configuring BFD on your Internet-facing routers.
To configure BFD with BGP, use the following configuration commands on Cisco IOS:

interface <uplink>
 bfd interval <timer> min_rx <timer> multiplier <n>
!
router bgp 65000
 neighbor <ip> remote-as <ISP-AS>
 neighbor <ip> fall-over bfd
Although you can configure BFD timers in the millisecond range, don’t set them too low. BFD should detect a BGP neighbor loss within a few seconds; you wouldn’t want a short-term link glitch to start the CPU-intensive BGP convergence process.
Cisco IOS and Junos support BFD on EBGP sessions. BFD on IBGP sessions has been available since Junos release 8.3. Multihop BFD is available in Cisco IOS, but there’s still no support for BFD on IBGP sessions.
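The gap between BFD and the BGP hold timer is easiest to see with a little arithmetic. This Python sketch assumes symmetric BFD configuration on both ends, so the detection time reduces to the effective interval multiplied by the detection multiplier; the 1000 ms interval with multiplier 3 is a hypothetical conservative setting in line with the "few seconds" advice above. The 180-second value is the Cisco IOS default BGP hold time (three 60-second keepalives).

```python
# Cisco IOS default BGP timers: 60-second keepalive, 180-second hold time.
BGP_DEFAULT_HOLD_TIME_MS = 180_000


def bfd_detection_time_ms(tx_interval_ms, min_rx_ms, multiplier):
    """Approximate BFD detection time, assuming symmetric configuration.

    A peer never sends faster than the other side is willing to receive,
    so the effective interval is the larger of the two values.
    """
    effective_interval_ms = max(tx_interval_ms, min_rx_ms)
    return effective_interval_ms * multiplier


# Hypothetical conservative setting: 1000 ms interval, multiplier 3.
bfd_ms = bfd_detection_time_ms(1000, 1000, 3)
print(bfd_ms)                               # 3000
print(BGP_DEFAULT_HOLD_TIME_MS // bfd_ms)   # 60
```

Even this deliberately relaxed BFD setting detects a dead neighbor about 60 times faster than the default hold timer, without turning every sub-second link glitch into a full BGP reconvergence.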
Enable BGP Next Hop Tracking
BGP next hop tracking removes routes from the BGP table (and subsequently from the IP routing table and forwarding table) a few seconds after the BGP next hop becomes unreachable.
BGP next hop tracking deployed on GW-B could trigger the BGP best path selection even before GW-B starts receiving BGP update messages from GW-A withdrawing the affected routes.
BGP next-hop tracking is enabled by default on Cisco IOS; you can adjust the tracking interval with the bgp nexthop trigger delay router configuration command.
In environments using default routing, you should limit the valid prefixes that can be used for BGP next hop tracking with the bgp nexthop route-map router configuration command.
If you want to use BGP next hop tracking in the primary/backup Internet access scenario described in this document:
Do not change the BGP next hop on IBGP updates with the neighbor next-hop-self router configuration command. Example: routes advertised from GW-A to GW-B must have the original next-hop from the ISP-A router.
Advertise IP subnets of ISP uplinks into IGP (example: OSPF) from GW-A and GW-B.
Use a route-map with BGP next hop tracking to prevent the default route advertised by GW-A and GW-B from being used as a valid path toward external BGP next hops.
When the link between GW-A and ISP-A fails, GW-A revokes the directly-connected IP subnet from its OSPF LSA, enabling GW-B to start BGP best path selection process before it receives BGP updates from GW-A.
BGP next-hop tracking detects link failures that result in loss of IP subnet. It cannot detect EBGP neighbor failure unless you combine it with BFD-based static routes.
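The interplay between next-hop tracking, the uplink subnet advertised in OSPF, and the route-map that excludes the default route can be sketched as a simple reachability check. This Python example is a simplified model, not router code: the IGP table of GW-B and all addresses are hypothetical (taken from the documentation ranges), and the `allow_default` flag stands in for the effect of the route-map described above.

```python
import ipaddress

# Hypothetical IGP table on GW-B (documentation prefixes, not real uplinks).
igp_routes = {
    ipaddress.ip_network("0.0.0.0/0"),        # default route from GW-A/GW-B
    ipaddress.ip_network("192.0.2.0/30"),     # GW-A <-> ISP-A uplink, via OSPF
    ipaddress.ip_network("198.51.100.0/30"),  # GW-B <-> ISP-B uplink, connected
}


def next_hop_reachable(next_hop, routes, allow_default=False):
    """Return True if an IGP route covers the BGP next hop.

    With allow_default=False the default route is ignored, mimicking a
    route-map that keeps 0.0.0.0/0 out of next-hop resolution.
    """
    addr = ipaddress.ip_address(next_hop)
    return any(
        addr in net and (allow_default or net.prefixlen > 0)
        for net in routes
    )


isp_a_next_hop = "192.0.2.1"
print(next_hop_reachable(isp_a_next_hop, igp_routes))   # True

# GW-A loses its ISP-A uplink and withdraws the subnet from its OSPF LSA.
igp_routes.discard(ipaddress.ip_network("192.0.2.0/30"))
print(next_hop_reachable(isp_a_next_hop, igp_routes))   # False
# Without the route-map, the default route would keep the next hop "valid".
print(next_hop_reachable(isp_a_next_hop, igp_routes, allow_default=True))  # True
```

The last line shows why the route-map matters: if the default route could resolve the ISP-A next hop, GW-B would never notice that the primary uplink is gone.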
Reduce the BGP Update Timers
BGP update timers (the interval between consecutive BGP updates) are configured for individual neighbors, peer groups, or peer templates. The default IBGP value used by Cisco IOS was 5 seconds (updates were sent to IBGP neighbors every 5 seconds). This value was reduced to zero (updates are sent immediately) in Cisco IOS releases 12.2SR, 12.4T and 15.0.
Adjusting the BGP update timers should be one of the last steps in the convergence tuning process; in most scenarios you’ll gain more by reducing the number of BGP prefixes accepted by the Internet edge routers.
Reduce the Number of BGP Prefixes
Global Internet routing tables contain almost 450,000 prefixes, most of them irrelevant to content providers with localized content. Reducing the number of prefixes in the BGP table can significantly reduce the CPU load after a link or neighbor loss, and thus drastically improve BGP convergence time.
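A toy model helps show why the prefix count dominates once failure detection is fast. This Python sketch assumes convergence time is failure detection plus a linear per-prefix processing cost; that linearity, the 0.1 ms per-prefix cost, the 3-second detection time and the 50,000-prefix filtered table are all hypothetical placeholders, not measurements. Real costs depend on the router CPU and forwarding hardware.

```python
def convergence_estimate_s(detect_s, prefixes, per_prefix_ms):
    """Detection time plus linear per-prefix processing (a simplification)."""
    return detect_s + prefixes * per_prefix_ms / 1000


FULL_TABLE = 450_000      # approximate full Internet table size from the text
FILTERED_TABLE = 50_000   # hypothetical size after inbound AS-path filtering
PER_PREFIX_MS = 0.1       # hypothetical processing cost per prefix
DETECTION_S = 3           # hypothetical BFD-based detection time

full = convergence_estimate_s(DETECTION_S, FULL_TABLE, PER_PREFIX_MS)
filtered = convergence_estimate_s(DETECTION_S, FILTERED_TABLE, PER_PREFIX_MS)
print(round(full), round(filtered))   # 48 8
```

Under these assumptions, shaving seconds off detection barely matters compared with removing most of the prefixes that have to be reprocessed.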
This solution would be ideal if one could guarantee that all upstream providers always have visibility of all Internet destinations. In case of a peering dispute, that might not be true, and your network might lose connectivity to some far-away destinations.
It’s impossible to document a generic BGP prefix filtering policy. You should always accept prefixes originated by upstream ISPs, their customers, and their peering partners. In most cases, filters based on AS-path lengths work well (example: accept prefixes that have no more than three distinct AS numbers in the AS path). Some ISPs attach BGP communities to BGP prefixes they advertise to their customers to help the customers implement well-tuned filters[5].
When building an AS-path filter, consider the impact of AS path prepending on your AS-path filter and use regular expressions that can match the same AS number multiple times[6].
Example: matching up to three AS numbers in the AS path might not be good enough, as another AS might use AS-path prepending to enforce primary/backup path selection[7].
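The prepending problem is easy to demonstrate with backreferences. This is a minimal sketch using Python regular expressions on a space-separated AS path string; it is NOT Cisco IOS or Junos AS-path regex syntax, and the AS numbers are hypothetical (taken from the documentation range). It counts runs of the same AS number, which for loop-free AS paths equals the number of distinct ASes.

```python
import re

# Up to three AS numbers, each of which may be repeated (prepended) any
# number of times. \1, \2 and \3 are backreferences to the same AS number.
UP_TO_THREE_ASES = re.compile(
    r"^(\d+)(?: \1)*(?: (\d+)(?: \2)*)?(?: (\d+)(?: \3)*)?$"
)

# Naive filter: at most three AS numbers in total, prepending not considered.
NAIVE_THREE_HOPS = re.compile(r"^\d+(?: \d+){0,2}$")


def accept(as_path, pattern):
    """Return True if the whole AS path string matches the filter."""
    return pattern.fullmatch(as_path) is not None


prepended = "64500 64501 64501 64501 64502"   # 64501 prepended its AS twice
print(accept(prepended, NAIVE_THREE_HOPS))     # False
print(accept(prepended, UP_TO_THREE_ASES))     # True
print(accept("64500 64501 64502 64503", UP_TO_THREE_ASES))  # False
```

The naive length check drops a nearby prefix just because a downstream AS prepended it, while the backreference version still accepts it and rejects genuinely distant paths.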
After deploying inbound BGP update filters, your autonomous system no longer belongs to the default-free zone[8] – your Internet edge routers need default routes from the upstream ISPs to reach destinations that are no longer present in their BGP tables.
BGP default routes could be advertised by upstream ISPs, requiring no further configuration on the Internet edge routers.
[5] BGP Community Guides, http://onesc.net/communities/
[6] Filter Excessively-Prepended AS paths, http://wiki.nil.com/Filter_excessively_prepended_BGP_paths
[7] BGP Essentials: AS Path Prepending, http://blog.ipspace.net/2008/02/bgp-essentials-as-path-prepending.html
[8] Default-free zone
If the upstream ISPs don’t advertise BGP default routes, or if you can’t trust the ISPs to perform responsible default route origination[9], use local static default routes pointing to far-away next hops. Root name servers are usually a suitable choice.
The default routes on the Internet edge routers should use far-away next hops to ensure the next hop reachability reflects the health of the upstream ISP’s network. The use of root DNS servers as next hops of static routes does not mean that the traffic will be sent to the root DNS servers, just toward them.
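The logic behind a static default route toward a far-away next hop can be sketched as recursive next-hop resolution: the default route stays usable only while some BGP-learned prefix still covers the next hop. This Python sketch is a conceptual model, not router behavior; the BGP tables are hypothetical, and 198.41.0.4 (the address of a.root-servers.net) is used purely as an example of a distant next hop. Ignoring the default route during resolution is an explicit assumption, so the static default cannot resolve through itself.

```python
import ipaddress

ROOT_SERVER_NEXT_HOP = ipaddress.ip_address("198.41.0.4")  # a.root-servers.net


def default_route_usable(bgp_prefixes, next_hop=ROOT_SERVER_NEXT_HOP):
    """Return True if a non-default BGP prefix covers the static next hop."""
    return any(
        next_hop in net
        for net in map(ipaddress.ip_network, bgp_prefixes)
        if net.prefixlen > 0   # never resolve via 0.0.0.0/0 itself
    )


healthy = ["198.41.0.0/24", "203.0.113.0/24"]   # hypothetical ISP-A table
broken = ["203.0.113.0/24"]                     # ISP-A lost upstream reachability
print(default_route_usable(healthy))   # True
print(default_route_usable(broken))    # False
```

If the upstream ISP loses its own connectivity toward the rest of the Internet, the covering prefix disappears from the BGP table and the static default route is withdrawn, which is exactly the health signal the far-away next hop is meant to provide.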
BGP Prefix Independent Convergence
BGP PIC is a feature that allows a router to pre-install alternate routes to BGP destinations in its forwarding table. The drastic changes caused by external link failure or EBGP session failure are thus easier to implement in the forwarding table. Furthermore, the forwarding tables can be changed even before the final route selection is performed in the BGP table.
BGP PIC is a recently-introduced feature that does not necessarily interoperate with all other BGP features one might want to use. Its deployment and applicability are left for further study.
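The "prefix independent" part of the name can be illustrated with a shared data structure: FIB entries point to a common path list instead of holding their own next hops, so a failover rewrites one object rather than every prefix. This Python sketch is a conceptual model only; the `PathList` class, the next-hop labels and the 10.x.0.0/16 prefixes are hypothetical stand-ins and do not describe any vendor's actual FIB layout.

```python
from dataclasses import dataclass


@dataclass
class PathList:
    primary: str             # e.g. next hop via ISP-A
    backup: str              # pre-installed alternate, e.g. via ISP-B
    primary_up: bool = True

    def active(self):
        """Return the next hop currently used for forwarding."""
        return self.primary if self.primary_up else self.backup


shared = PathList(primary="ISP-A", backup="ISP-B")

# Every FIB entry references the same PathList object (no per-prefix copy).
fib = {f"10.{i}.0.0/16": shared for i in range(256)}

print({pl.active() for pl in fib.values()})   # {'ISP-A'}
shared.primary_up = False                     # one update on uplink failure...
print({pl.active() for pl in fib.values()})   # {'ISP-B'}  ...moves all prefixes
```

The cost of the failover stays constant no matter how many prefixes share the path list, which is why the forwarding table can switch to the backup uplink before BGP finishes recomputing best paths.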
[9] Responsible Generation of BGP Default Route
Conclusions
BGP neighbor loss detection can be significantly improved by deploying Bidirectional Forwarding Detection (BFD).
Backup Internet edge router can use BGP next-hop tracking to detect primary uplink loss and adjust its forwarding tables before receiving BGP updates from the primary Internet edge router.
To reduce the CPU overload and slow convergence caused by massive changes in the BGP, routing and forwarding tables following a link or EBGP session failure:
Reduce the number of BGP prefixes accepted by the Internet edge routers.
2 Integrating Internet VPN with MPLS/VPN WAN
In this chapter:
IP Routing Overview
Design Requirements
Solution Overview
OSPF as the Internet VPN Routing Protocol
Benefits and Drawbacks of OSPF in Internet VPN
BGP as the Internet VPN Routing Protocol
IBGP or EBGP?
Autonomous System Numbers
Integration with Layer-3 Switches
BGP-Based WAN Network Design and Implementation Guidance
Remote Sites
Central Site
Internet VPN Routing Policy Adjustments

A large enterprise (the Customer) has a WAN backbone based on MPLS/VPN service offered by a regional Service Provider (SP). The service provider has deployed Customer Premises Equipment (CPE) routers at remote sites. Customer routers at the central site are connected directly to the SP Provider Edge (PE) routers with 10GE uplinks as shown in Figure 2-1.
Figure 2-1: Existing MPLS VPN WAN network topology
The traffic in the Customer’s WAN network has been increasing steadily, prompting the customer to increase the MPLS/VPN bandwidth or to deploy an alternate VPN solution. The Customer decided to trial IPsec VPN over the public Internet, initially as a backup, and potentially as the primary WAN connectivity solution.
The customer will deploy new central site routers to support the IPsec VPN service. These routers will terminate the IPsec VPN tunnels and provide whatever other services are needed (example: QoS, routing protocols) to the IPsec VPNs.
New low-end routers connected to the existing layer-3 switches will be deployed at the remote sites to run the IPsec VPN (Figure 2-2 shows the proposed new network topology).
IP Routing Overview
The customer is using OSPF as the sole routing protocol and would prefer using OSPF in the new IPsec VPN.
OSPF routes are exchanged between Customer’s core routers and SP’s PE routers, and between Customer’s layer-3 switches and SP’s CPE routers at remote sites. Customer’s central site is in OSPF area 0; all remote sites belong to OSPF area 51.
Figure 2-3: OSPF areas
The only external connectivity remote customer sites have is through the MPLS/VPN SP backbone – the OSPF area number used at those sites is thus irrelevant and the SP chose to use the same OSPF area on all sites to simplify the CPE router provisioning and
CPE routers deployed at Customer’s remote sites act as Customer Edge (CE) routers from the MPLS/VPN perspective. The Service Provider uses BGP as the routing protocol between its PE and CE routers, redistributing BGP routes into OSPF at the CPE routers for further propagation into Customer’s remote sites.
OSPF routes received from the customer equipment (central site routers and remote site layer-3 switches) are redistributed into BGP used by the SP’s MPLS/VPN service, as shown in Figure 2-4.
Figure 2-4: OSPF-to-BGP route redistribution
The CPE routers redistributing remote site OSPF routes into SP’s BGP are not PE routers. The OSPF routes that get redistributed into BGP thus do not have OSPF-specific extended BGP communities, lacking any indication that they came from an OSPF routing process. These routes are therefore redistributed as external OSPF routes into the central site’s OSPF routing process by the SP’s PE routers.
The OSPF routes advertised to the PE routers from the central site get the extended BGP communities when they’re redistributed into MP-BGP, but since the extended VPNv4 BGP communities don’t propagate to CE routers running BGP with the PE routers, the CPE routers don’t receive the extended communities indicating the central site routes originated as OSPF routes. The CPE routers thus redistribute routes received from other Customer’s sites as external OSPF routes into the OSPF protocol running at remote sites.
Summary: All customer routes appear as external OSPF routes at all other customer sites (see Figure 2-5 for details).
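The route-type outcome described above can be reduced to two rules, sketched below in Python. This is a deliberately simplified model: it assumes a PE re-advertises a VPNv4 route as an inter-area (type-3) route only when the route carries the OSPF route-type extended community (domain IDs and other RFC 4577 details are ignored), and that a CE/CPE router redistributing plain BGP routes into OSPF always creates external (type-5) routes. Function names are hypothetical.

```python
def pe_lsa_type(has_ospf_route_type_community):
    """LSA type a PE originates toward an OSPF customer site (simplified)."""
    if has_ospf_route_type_community:
        return "type-3 summary"    # route recognized as an OSPF route
    return "type-5 external"       # no OSPF information: plain redistribution


def cpe_lsa_type():
    """LSA type a CE/CPE router originates when redistributing BGP into OSPF."""
    return "type-5 external"       # redistribution on a CE always yields externals


# Remote site -> central site: the CPE redistributes OSPF into BGP without any
# OSPF extended community, so the central-site PE originates external routes.
print(pe_lsa_type(False))   # type-5 external
# Central site -> remote site: the PE attaches the community, but the CPE is a
# CE router that only receives plain BGP routes and redistributes them.
print(cpe_lsa_type())       # type-5 external
```

Both directions end in the same place, which is why every customer route appears as an external OSPF route at every other site.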
Design Requirements
The VPN-over-Internet solution must satisfy the following requirements:
Dynamic routing: the solution must support dynamic routing over the new VPN infrastructure to ensure fast failover on MPLS/VPN or Internet VPN failures;
Flexible primary/backup configuration: Internet VPN will be used as a backup path until it has been thoroughly tested. It might become the primary connectivity option in the future;
Optimal traffic flow: Traffic to/from sites reachable only over the Internet VPN (due to local MPLS/VPN failures) should not traverse the MPLS/VPN infrastructure. Traffic between an MPLS/VPN-only site and an Internet VPN-only site should traverse the central site;
Hub-and-spoke or peer-to-peer topology: Internet VPN will be used in a hub-and-spoke topology (hub = central site). The topology will be migrated to a peer-to-peer (any-to-any) overlay network when the Internet VPN becomes the primary WAN connectivity solution.
Minimal configuration changes: Deployment of Internet VPN connectivity should not require major configuration changes in the existing remote site equipment. Central site routers will probably have to be reconfigured to take advantage of the new infrastructure.
Minimal disruption: The introduction of Internet VPN connectivity must not disrupt the existing WAN network connectivity.
Minimal dependence on MPLS/VPN provider: After the Internet VPN infrastructure has been established and integrated with the existing MPLS/VPN infrastructure (which might require configuration changes on the SP-managed CPE routers), the changes in the traffic flow must not require any intervention on the SP-managed CPE routers.
Solution Overview
Internet VPN will be implemented with the DMVPN technology to meet the future requirements of peer-to-peer topology. Each central site router will be a hub router in its own DMVPN subnet (one hub router per DMVPN subnet), with the remote site routers having two DMVPN tunnels (one for each central site hub router) as shown in Figure 2-6.
Figure 2-6: DMVPN topology
Please refer to the DMVPN: From Basics to Scalable Networks and DMVPN Designs webinars for more DMVPN details. This case study focuses on the routing protocol design.
The new VPN infrastructure could use either OSPF or BGP as its routing protocol. The Customer would prefer to use OSPF, but the design requirements and the specifics of the existing MPLS/VPN WAN infrastructure make OSPF deployment exceedingly complex.
Using BGP as the Internet VPN routing protocol would introduce a new routing protocol in the Customer’s network. While the network designers and operations engineers would have to master a new technology (on top of DMVPN) before production deployment of the Internet VPN, the reduced complexity of BGP-only WAN design more than offsets that investment.
OSPF as the Internet VPN Routing Protocol
A network designer would encounter major challenges when trying to use OSPF as the Internet VPN routing protocol:
1. Routes received through MPLS/VPN infrastructure are inserted as external OSPF routes into the intra-site OSPF routing protocol. Routes received through Internet VPN infrastructure must be worse than the MPLS/VPN-derived OSPF routes, requiring them to be external routes as well.
2. MPLS/VPN- and Internet VPN routers must use the same OSPF external route type to enable easy migration of the Internet VPN from backup to primary connectivity solution. The only difference between the two sets of routes should be their OSPF metric.
3. Multiple sites must not be in the same area. The OSPF routing process would prefer intra-area routes (over Internet VPN infrastructure) to MPLS/VPN routes in a design with multiple sites in the same area.
4. Even though each site must be at least an independent OSPF area, every site must use the same OSPF area number to preserve the existing intra-site routing protocol configuration.
Challenges #3 and #4 significantly limit the OSPF area design options. Remote site OSPF areas cannot extend to the Internet VPN hub router – the hub router would automatically merge multiple remote sites into the same OSPF area. Every remote site router must therefore be an Area Border Router (ABR) or Autonomous System Border Router (ASBR). The only design left is an OSPF backbone area spanning the whole Internet VPN.
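The path-type comparison behind challenges #1 and #3 can be sketched in Python. This is a simplified model, assuming only the rule that OSPF compares the path type (intra-area, inter-area, external type 1, external type 2) before it looks at cost; the route labels and costs are hypothetical, and later OSPF tie-breakers are ignored.

```python
from dataclasses import dataclass

# OSPF compares the path type first; cost only breaks ties within one type.
PATH_TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external-1": 2, "external-2": 3}


@dataclass
class OspfRoute:
    via: str        # hypothetical label for the WAN path
    path_type: str  # one of the PATH_TYPE_RANK keys
    cost: int       # for external-2 routes this stands for the external metric


def best_route(routes):
    """Return the route OSPF would select (simplified: path type first, then cost)."""
    return min(routes, key=lambda r: (PATH_TYPE_RANK[r.path_type], r.cost))


# Two sites in the same area: the intra-area path over the Internet VPN wins,
# even though its cost is much higher than that of the external MPLS/VPN route.
same_area = [
    OspfRoute("MPLS/VPN", "external-2", 20),
    OspfRoute("Internet VPN", "intra-area", 5000),
]
print(best_route(same_area).via)  # Internet VPN

# Both paths advertised as external routes of the same type: only the metric
# decides, so the MPLS/VPN path can remain the primary one.
both_external = [
    OspfRoute("MPLS/VPN", "external-2", 20),
    OspfRoute("Internet VPN", "external-2", 10000),
]
print(best_route(both_external).via)  # MPLS/VPN
```

This is why Internet VPN routes must be external routes of the same type as the MPLS/VPN routes, and why sites cannot share an area.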
Figure 2-7: OSPF areas in the Internet VPN
The requirement to advertise site routes as external OSPF routes further limits the design options. While the requirements could be met by remote site and core site layer-3 switches advertising directly connected subnets (server and client subnets) as external OSPF routes (as shown in Figure 2-8), such a design requires configuration changes on the subnet-originating switch whenever you want to adjust the WAN traffic flow (which can only be triggered by changes in OSPF metrics).
Figure 2-8: OSPF external route origination
The only OSPF design that would meet the OSPF constraints listed above and the design requirements (particularly the minimal configuration changes and minimal disruption requirements) is a design displayed in Figure 2-9 where:
Every site runs an independent copy of the OSPF routing protocol;
Internet VPN WAN network runs a separate OSPF process;
Internet VPN edge routers perform two-way redistribution between intra-site OSPF process and Internet VPN OSPF process.
Figure 2-9: Multiple OSPF processes with two-way redistribution
Benefits and Drawbacks of OSPF in Internet VPN
There’s a single benefit of running OSPF over the Internet VPN: familiarity with an existing routing protocol and mastery of configuration and troubleshooting procedures.
The drawbacks are also exceedingly clear: the only design that meets all the requirements is complex as it requires multiple OSPF routing processes and parallel two-way redistribution (site-to-MPLS/VPN and site-to-Internet VPN) between multiple routing domains.
It's possible to implement such a design with safety measures that prevent redistribution (and traffic forwarding) loops, but it's definitely not an error-resilient design: minor configuration changes or omissions could result in network-wide failures.
BGP as the Internet VPN Routing Protocol
BGP-only WAN network design extends the existing BGP routing protocol running within the Service Provider's MPLS/VPN network and between the PE and CPE routers to all WAN routers. As shown in Figure 2-10, BGP sessions would be established between:
Remote site CPE routers and adjacent Internet VPN routers;
Central site WAN edge routers (MPLS/VPN CE routers and Internet VPN routers);
Central site CE routers and SP's PE routers;
Figure 2-10: BGP sessions in the WAN infrastructure
BGP local preference (within a single autonomous system) or Multi-Exit Discriminator (across autonomous systems) would be used to select the optimum paths, and BGP communities would be used to influence local preference between autonomous systems.
The BGP-only design seems exceedingly simple, but there are still a number of significant design choices to make:
IBGP or EBGP sessions: Which routers would belong to the same autonomous system (AS)? Would the network use one AS per site or would a single AS span multiple sites?
Autonomous system numbers: There are only 1024 private AS numbers. Would the design
Integration with CPE routers: Would the Internet VPN routers use the same AS number as the CPE routers on the same site?
Integration with layer-3 switches: Would the central site and remote site layer-3 switches participate in BGP or would they interact with the WAN edge routers through OSPF?
IBGP or EBGP?
There are numerous differences between EBGP and IBGP and their nuances sometimes make it hard to decide whether to use EBGP or IBGP in a specific scenario. However, the following guidelines usually result in simple and stable designs:
If you plan to use BGP as the sole routing protocol in (a part of) your network, use EBGP.
If you’re using BGP in combination with another routing protocol that will advertise reachability of BGP next hops, use IBGP. You can also use IBGP between routers residing in a single subnet.
It’s easier to implement routing policies with EBGP. Large IBGP deployments need route reflectors for scalability and some BGP implementations don’t apply BGP routing policies on reflected routes.
All routers in the same AS should have the same view of the network and the same routing policies.
EBGP should be used between routers in different administrative (or trust) domains.
Applying these guidelines to our WAN network gives the following results:
EBGP will be used across DMVPN network. A second routing protocol running over DMVPN would be needed to support IBGP across DMVPN, resulting in overly complex network design.
IBGP will be used between central site WAN edge routers. The existing central site routing protocol can be used to propagate BGP next hop information between WAN edge routers (or they could belong to the same layer-2 subnet).
EBGP will be used between central site MPLS/VPN CE routers and Service Provider's PE routers (incidentally, most MPLS/VPN implementations don't support IBGP as the PE-CE routing protocol).
EBGP or IBGP could be used between remote site Internet VPN routers and CPE routers. While IBGP between these routers reduces the overall number of autonomous systems needed, the MPLS/VPN service provider might insist on using EBGP.
Throughout the rest of this document we’ll assume the Service Provider agreed to use IBGP between CPE routers and Internet VPN routers on the same remote site.
Autonomous System Numbers
The decision to use IBGP between CPE routers and Internet VPN routers simplifies the AS number decision: remote sites will use the existing AS numbers assigned to CPE routers.
The Customer has to get an extra private AS number (coordinated with the MPLS/VPN SP) for the central site, or use a public AS number for that site.
In a scenario where the SP insists on using EBGP between CPE routers and Internet VPN routers, the Customer has three options:
Reuse the same AS number on all remote sites;
Use a set of private AS numbers that the MPLS/VPN provider isn't using on its CPE routers and assign a unique one to each remote site;
Use 4-octet AS numbers reserved for private use by RFC 6996.
Unless you're ready to deploy 4-octet AS numbers, the first option is the only viable option for networks with more than a few hundred remote sites (because there are only 1024 private AS numbers). The second option is feasible for smaller networks with a few hundred remote sites. The last option is clearly the best one, but requires router software with 4-octet AS number support (4-octet AS numbers are supported by all recent Cisco and Juniper routers).
Routers using 4-octet AS numbers (defined in RFC 4893) can interoperate with legacy routers that don’t support this BGP extension; Service Provider’s CPE routers thus don’t have to support 4-byte AS numbers (customer routers would appear to belong to AS 23456).
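The AS_TRANS substitution can be illustrated with a minimal Python sketch, assuming a hypothetical remote site AS number from the RFC 6996 private 4-octet range and a hypothetical central site AS 65000. It only models how the AS path looks to a legacy 2-octet speaker; the AS4_PATH attribute that carries the real path is mentioned in a comment but not modeled.

```python
AS_TRANS = 23456  # reserved 2-octet placeholder AS number


def two_octet_as_path(as_path):
    """AS_PATH as seen by a legacy BGP speaker that supports only 2-octet AS numbers.

    AS numbers that don't fit into 16 bits are replaced with AS_TRANS; the
    original path travels in the optional transitive AS4_PATH attribute,
    which the legacy router passes along without interpreting it.
    """
    return [asn if asn <= 0xFFFF else AS_TRANS for asn in as_path]


# Hypothetical remote site using a private 4-octet AS number,
# reached through a central site that uses a 2-octet AS number.
print(two_octet_as_path([65000, 4200000001]))  # [65000, 23456]
```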
Default loop prevention filters built into BGP reject EBGP updates with the local AS number in the AS path, making it impossible to pass routes between two remote sites when they use the same AS number. If you have to reuse the same AS number on multiple remote sites, disable the BGP loop prevention filters as shown in Figure 2-11 (using the neighbor allowas-in command on Cisco IOS). While you could use default routing from the central site to solve this problem, the default routing solution cannot be used when you have to implement the any-to-any traffic flow requirement.
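The loop check and its relaxation can be sketched in a few lines of Python. This is a simplified model, assuming hypothetical AS numbers (all remote sites in AS 65001, central site in AS 65000); `allowas_in` is an illustrative parameter that mirrors the idea behind the Cisco IOS command, not its exact implementation, and AS_SETs and confederations are ignored.

```python
def accept_ebgp_update(as_path, local_as, allowas_in=0):
    """Simplified inbound AS-path loop check.

    By default a route is rejected if the local AS appears anywhere in the AS path.
    allowas_in=N models relaxing the check so the local AS may appear up to N times
    (the idea behind the Cisco IOS 'neighbor allowas-in' command).
    """
    return as_path.count(local_as) <= allowas_in


# Remote site B (AS 65001) receives site A's prefix through the central site:
path_from_site_a = [65000, 65001]
print(accept_ebgp_update(path_from_site_a, local_as=65001))                # False
print(accept_ebgp_update(path_from_site_a, local_as=65001, allowas_in=1))  # True
```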
Figure 2-11: Single AS number used on all remote sites
Some BGP implementations might filter outbound BGP updates, omitting BGP prefixes with AS number of the BGP neighbor in the AS path from the updates sent to that neighbor. Cisco IOS does not contain outbound filters based on neighbor AS number; if you use routers from other vendors, check the documentation.
Integration with Layer-3 Switches
In a typical BGP-based network all core routers (and layer-3 switches) run BGP to get a consistent view of the forwarding information. At the very minimum, all layer-3 elements in every possible path between two BGP routers have to run BGP to be able to forward IP datagrams between the BGP routers as illustrated in Figure 2-12.
There are several workarounds you can use when dealing with non-BGP devices in the forwarding path:
Redistribute BGP routes into IGP (example: OSPF). Non-BGP devices in the forwarding path thus receive BGP information through their regular IGP (see Figure 2-13).
Figure 2-13: BGP routing information redistributed into OSPF
Enable MPLS forwarding. Ingress network edge devices running BGP label IP datagrams with MPLS labels assigned to BGP next hops to ensure the datagrams get delivered to the proper egress device; intermediate nodes perform label lookup, not IP lookup, and thus don’t need the full IP forwarding information.
Create a dedicated layer-2 subnet (VLAN) between BGP edge routers and advertise a default route to other layer-3 devices as shown in Figure 2-14. This design might result in suboptimal routing, as other layer-3 devices forward IP datagrams to the nearest BGP router, which might not be the optimal exit point.
Figure 2-14: Dedicated VLAN between BGP edge routers
We’ll extend BGP to core layer-3 switches on the central site (these switches will also act as BGP route reflectors) and use a VLAN between Service Provider’s CPE router and Internet VPN router on remote sites.
Summary of Design Choices
The following parameters will be used in the BGP-based WAN network design:
Each site is an independent autonomous system;
Each site uses a unique AS number assigned to it by the MPLS/VPN SP;
IBGP will be used between routers within the same site;
EBGP will be used between sites;
Remote site layer-3 switches will continue to use OSPF as the sole routing protocol;
Core central site layer-3 switches will participate in BGP routing and will become BGP route reflectors.
BGP-Based WAN Network Design and Implementation Guidance
The following sections describe individual components of BGP-based WAN network design.
Remote Sites
Internet VPN router will be added to each remote site. It will be in the same subnet as the existing CPE router.
The remote site layer-3 switch might have to be reconfigured if it used a layer-3 physical interface on the port to which the CPE router was connected. The layer-3 switch should use a VLAN interface (SVI) to connect to the new router subnet.
An IBGP session will be established between the CPE router and the adjacent Internet VPN router. This is the only modification that has to be performed on the CPE router.
Internet VPN router will redistribute internal OSPF routes received from the layer-3 switch into BGP. External OSPF routes will not be redistributed, preventing routing loops between BGP and OSPF. The OSPF-to-BGP route redistribution does not impact existing routing, as the CPE router already does it; it’s configured on the Internet VPN router solely to protect the site against CPE router failure.
Internet VPN router will redistribute EBGP routes into OSPF (redistribution of IBGP routes is disabled by default on most router platforms). OSPF external route metric will be used to influence the forwarding decision of the adjacent layer-3 switch.
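The two redistribution filters on the remote site Internet VPN router can be sketched in Python. This is a minimal model, assuming hypothetical prefixes and a simplified `source` label per route; it shows which routes cross each boundary, not how a router implements redistribution.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    source: str  # "ospf-intra", "ospf-inter", "ospf-external", "ebgp" or "ibgp"


def ospf_to_bgp(routes):
    """Only internal OSPF routes are redistributed into BGP. External OSPF routes
    (which include routes that came from BGP) are skipped to avoid loops."""
    return [r for r in routes if r.source in ("ospf-intra", "ospf-inter")]


def bgp_to_ospf(routes):
    """Only EBGP routes are redistributed into OSPF; IBGP routes from the CPE
    router are left out, matching the usual platform default."""
    return [r for r in routes if r.source == "ebgp"]


table = [
    Route("10.1.1.0/24", "ospf-intra"),     # local client subnet
    Route("10.2.0.0/16", "ospf-external"),  # remote prefix redistributed by the CPE router
    Route("10.3.0.0/16", "ebgp"),           # learned over DMVPN
    Route("10.4.0.0/16", "ibgp"),           # learned from the CPE router
]
print([r.prefix for r in ospf_to_bgp(table)])  # ['10.1.1.0/24']
print([r.prefix for r in bgp_to_ospf(table)])  # ['10.3.0.0/16']
```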
The OSPF metric of redistributed BGP routes could be hard-coded into the Internet VPN router configuration or based on BGP communities attached to EBGP routes. The BGP community-based approach is obviously more flexible and will be used in this design.
The following routing policies will be configured on the Internet VPN routers:
EBGP routes with BGP community 65000:1 (Backup route) will get local preference 50. These routes will be redistributed into OSPF as external type 2 routes with metric 10000.
EBGP routes with BGP community 65000:2 (Primary route) will get local preference 150. These routes will be redistributed into OSPF as external type 1 routes with metric 1.
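The two policies above can be expressed as a lookup table. This Python sketch is illustrative only: the function name and the evaluation order are hypothetical, and routes carrying neither community return `None` because the design doesn't specify how they are handled.

```python
BACKUP = "65000:1"
PRIMARY = "65000:2"

# community -> (local preference, OSPF external type, OSPF metric), as listed above
POLICY = {
    BACKUP: (50, 2, 10000),
    PRIMARY: (150, 1, 1),
}


def dmvpn_inbound_policy(communities):
    """Return (local_pref, ospf_metric_type, ospf_metric) for an EBGP route
    received over DMVPN, or None if neither policy community is attached."""
    for community in (BACKUP, PRIMARY):  # hypothetical evaluation order
        if community in communities:
            return POLICY[community]
    return None


print(dmvpn_inbound_policy({"65000:1"}))  # (50, 2, 10000)
print(dmvpn_inbound_policy({"65000:2"}))  # (150, 1, 1)
```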
Furthermore, the remote site Internet VPN router has to prevent potential route leakage between MPLS/VPN and Internet VPN WAN networks. Route leakage between the two WAN networks might turn one or more remote sites into transit sites forwarding traffic between the two WAN networks.
NO-EXPORT BGP community will be used on the Internet VPN router to prevent the route leakage:
NO-EXPORT community will be set on updates sent over the IBGP session to the CPE router, preventing the CPE router from advertising routes received from the Internet VPN router into the MPLS/VPN WAN network.
NO-EXPORT community will be set on updates received over the IBGP session from the CPE router, preventing leakage of these updates into the Internet VPN WAN network.
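The effect of the two NO-EXPORT rules can be sketched in Python. This is a minimal model, assuming that the only relevant question is whether a route may be sent to an EBGP peer; the helper names are hypothetical and the value shown is the well-known NO_EXPORT community from RFC 1997.

```python
NO_EXPORT = "65535:65281"  # well-known NO_EXPORT community (RFC 1997)


def may_advertise(communities, to_ebgp_peer):
    """A route tagged NO_EXPORT may be sent to IBGP peers but never to EBGP peers."""
    return not (to_ebgp_peer and NO_EXPORT in communities)


def tag_no_export(communities):
    """Return the community set with NO_EXPORT added."""
    return set(communities) | {NO_EXPORT}


# Route learned over DMVPN, sent to the CPE router over IBGP with NO_EXPORT set
dmvpn_route = tag_no_export({"65000:1"})
print(may_advertise(dmvpn_route, to_ebgp_peer=False))  # True: IBGP to the CPE router is fine
print(may_advertise(dmvpn_route, to_ebgp_peer=True))   # False: CPE won't pass it to the PE

# Route received from the CPE router, tagged NO_EXPORT on the way in
cpe_route = tag_no_export(set())
print(may_advertise(cpe_route, to_ebgp_peer=True))     # False: not leaked over DMVPN EBGP
```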
Figure 2-15: Remote site logical network topology and routing
Central Site
The following steps will be used to deploy BGP on the central site:
1. BGP will be configured on existing MPLS/VPN edge routers, on the new Internet VPN edge routers, and on the core layer-3 switches.
2. IBGP sessions will be established between all loopback interfaces of WAN edge routers and both core layer-3 switches10. Core layer-3 switches will be BGP route reflectors.
3. EBGP sessions will be established between MPLS/VPN edge routers and adjacent PE routers.
4. BGP community propagation11 will be configured on all IBGP and EBGP sessions.
After this step, the central site BGP infrastructure is ready for routing protocol migration.
5. Internal OSPF routes will be redistributed into BGP on both core layer-3 switches. No other central site router will perform route redistribution.
At this point, the PE routers start receiving central site routes through PE-CE EBGP sessions and prefer EBGP routes received from MPLS/VPN edge routers over OSPF routes received from the same routers.
6. A default route will be advertised from the core layer-3 switches into the OSPF routing protocol.
Access-layer switches at the core site will have two sets of external OSPF routes: specific routes originated by the PE routers and default route originated by core layer-3 switches. They will still prefer the specific routes originated by the PE routers.
7. OSPF will be disabled on PE-CE links.
10 BGP Essentials: Configuring Internal BGP Sessions
http://blog.ioshints.info/2008/01/bgp-essentials-configuring-internal-bgp.html
11 BGP Essentials: BGP Communities
At this point, the PE routers stop receiving OSPF routes from the CE routers. The only central site routing information they have are EBGP routes received over the PE-CE EBGP sessions.
Likewise, the core site access-layer switches stop receiving specific remote site prefixes that were redistributed into OSPF on PE routers and rely exclusively on the default route advertised by the core layer-3 switches.
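Why the access-layer switches keep using the specific routes until step 7, and then fall back to the default route, follows from longest-prefix matching. A minimal Python sketch, assuming a hypothetical remote site prefix 10.20.0.0/16 and illustrative next-hop labels:

```python
import ipaddress


def lookup(destination, table):
    """Longest-prefix match: the most specific matching prefix wins,
    regardless of which router originated it."""
    dst = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in table.items()
        if dst in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Before step 7: specific remote site prefix from the PE routers plus the default route
before = {"10.20.0.0/16": "PE-originated specific route", "0.0.0.0/0": "core default route"}
# After step 7: only the default route originated by the core layer-3 switches remains
after = {"0.0.0.0/0": "core default route"}

print(lookup("10.20.1.1", before))  # PE-originated specific route
print(lookup("10.20.1.1", after))   # core default route
```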
Figure 2-16 summarizes central site IP routing design.
Internet VPN
Two sets of EBGP sessions are established across DMVPN subnets. Each central site Internet VPN router (DMVPN hub router) has EBGP sessions with remote site Internet VPN routers in the same DMVPN subnet (DMVPN spoke routers). BGP community propagation will be configured on all EBGP sessions.
Routing Policy Adjustments
The following changes will be made on central site Internet VPN routers to adjust the WAN network routing policies:
VPN traffic flow through the central site: configure neighbor next-hop-self on DMVPN EBGP sessions. Central site Internet VPN routers start advertising their IP addresses as EBGP next hops for all EBGP prefixes, forcing the site-to-site traffic to flow through the central site.
Any-to-any VPN traffic flow: configure no neighbor next-hop-self on DMVPN EBGP sessions. Default EBGP next hop processing will ensure that the EBGP routes advertised through the central site routers retain the optimal BGP next hop – IP address of the remote site if the two remote sites connect to the same DMVPN subnet, or IP address of the central site router in any other case.
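The next hop a hub places on updates sent to a spoke under the two settings can be sketched in Python. This is a simplified model, assuming hypothetical DMVPN addressing (hub 192.168.100.1 in 192.168.100.0/24) and that the hub's session source address lies in the DMVPN subnet; the function name is illustrative.

```python
import ipaddress


def next_hop_sent_to_spoke(route_next_hop, hub_address, dmvpn_subnet, next_hop_self):
    """BGP next hop a DMVPN hub places on an EBGP update sent to a spoke (simplified).

    next_hop_self=True: always the hub's own DMVPN address, so spoke-to-spoke
    traffic flows through the central site.
    next_hop_self=False: default EBGP processing keeps the original next hop if it
    is in the DMVPN subnet (a spoke in the same DMVPN cloud), else the hub's address.
    """
    if next_hop_self:
        return hub_address
    if ipaddress.ip_address(route_next_hop) in ipaddress.ip_network(dmvpn_subnet):
        return route_next_hop
    return hub_address


HUB, SUBNET = "192.168.100.1", "192.168.100.0/24"  # hypothetical addressing
spoke_a = "192.168.100.11"  # spoke in the same DMVPN subnet
other = "192.168.200.12"    # spoke reachable only via the other DMVPN subnet

print(next_hop_sent_to_spoke(spoke_a, HUB, SUBNET, next_hop_self=True))   # 192.168.100.1
print(next_hop_sent_to_spoke(spoke_a, HUB, SUBNET, next_hop_self=False))  # 192.168.100.11
print(next_hop_sent_to_spoke(other, HUB, SUBNET, next_hop_self=False))    # 192.168.100.1
```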
Internet VPN as the backup connectivity: Set BGP community 65000:1 (Backup route) on all EBGP updates sent from the central site routers. Remote site Internet VPN routers will lower the local preference of routes received over DMVPN EBGP sessions and thus prefer IBGP routes received from CPE router (which got the routes over MPLS/VPN WAN network).
Internet VPN as the primary connectivity: Set BGP community 65000:2 (Primary route) on all EBGP updates sent from the central site routers. Remote site Internet VPN routers will increase the local preference of routes received over DMVPN EBGP session and thus prefer those routes to IBGP routes received from the CPE router.
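The effect of these two settings on a remote site Internet VPN router can be sketched in Python. This sketch assumes the CPE router leaves the Cisco IOS default local preference (100) on the routes it passes over IBGP (an assumption not stated above), and it models only the local preference step of BGP path selection.

```python
from dataclasses import dataclass


@dataclass
class BgpPath:
    via: str
    local_pref: int


CPE_LOCAL_PREF = 100  # assumption: default local preference on routes from the CPE router
DMVPN_LOCAL_PREF = {"65000:1": 50, "65000:2": 150}  # backup / primary, per the remote site policy


def best_path(dmvpn_community):
    """Pick between the MPLS/VPN path (IBGP from CPE) and the DMVPN path."""
    candidates = [
        BgpPath("MPLS/VPN (IBGP from CPE)", CPE_LOCAL_PREF),
        BgpPath("Internet VPN (EBGP over DMVPN)", DMVPN_LOCAL_PREF[dmvpn_community]),
    ]
    # Highest local preference wins; later best-path tie-breakers are not modeled
    return max(candidates, key=lambda p: p.local_pref).via


print(best_path("65000:1"))  # MPLS/VPN (IBGP from CPE)
print(best_path("65000:2"))  # Internet VPN (EBGP over DMVPN)
```

Switching the Internet VPN from backup to primary changes only the community set on the central site routers; the remote site configuration stays the same.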
Conclusions
A design with a single routing protocol running in one part of the network (example: WAN network or within a site) is usually less complex than a design that involves multiple routing protocols and route redistribution.
When you have to combine MPLS/VPN WAN connectivity with any other WAN connectivity, you're forced to incorporate BGP used within the MPLS/VPN network into your network design. Even though MPLS/VPN technology supports multiple PE-CE routing protocols, the service providers rarely implement IGP PE-CE routing protocols with all the features you might need for successful enterprise WAN integration. Provider-operated CE routers are even worse, as they cannot propagate MPLS/VPN-specific information (extended BGP communities) into enterprise IGP in which they participate.
WAN network based on BGP is thus the only logical choice, resulting in a single protocol (BGP) being used in the WAN network. Incidentally, BGP provides a rich set of routing policy features, making your WAN network more flexible than it could have been were you using OSPF or EIGRP.
3 BGP Routing in DMVPN Access Network
In this chapter:
Existing IP Routing Overview
IBGP versus EBGP
IBGP and EBGP Basics
Route Propagation
BGP Next Hop Processing
Using EBGP in a DMVPN Network
Spoke Sites Have Unique AS Numbers
Using EBGP with Phase 1 DMVPN Networks
Reducing the Size of the Spoke Routers' BGP Table
Using IBGP in a DMVPN Network
A large enterprise (the Customer) has an existing international WAN backbone using BGP as the routing protocol. They plan to replace a regional access network with a DMVPN-based solution and want to extend the existing BGP routing protocol into the access network to be able to scale the access network to several thousand sites.
The initial DMVPN access network should offer hub-and-spoke connectivity, with any-to-any traffic implemented at a later stage.
The Customer’s design team is trying to answer these questions:
Should they use Internal BGP (IBGP) or External BGP (EBGP) in the DMVPN access network?
What autonomous system (AS) numbers should they use on remote (spoke) sites if they decide to use EBGP in the DMVPN access network?
Existing IP Routing Overview
The existing WAN network is already using BGP routing protocol to improve the overall scalability of the network. The WAN backbone is implemented as a single autonomous system using the Customer's public AS number.
IBGP sessions within the WAN backbone are established between loopback interfaces and the Customer is using OSPF to exchange reachability information within the WAN backbone (non-backbone routes are transported in BGP).
The WAN backbone AS is using BGP route reflectors; new DMVPN hub routers will be added as route reflector clients to existing BGP topology.
IBGP versus EBGP
The following characteristics of IBGP and EBGP have to be considered when deciding whether to use single AS or multiple AS design12:
Route propagation in IBGP and EBGP;
BGP next hop processing;
Route reflector behavior and limitations (IBGP only);
Typical IBGP and EBGP use cases;
IBGP and EBGP Basics
An autonomous system is defined as a set of routers under a common administration and using common routing policies.
IBGP is used to exchange routing information between all BGP routers within an autonomous system. IBGP sessions are usually established between non-adjacent routers (commonly using loopback interfaces); routers rely on an IGP routing protocol (example: OSPF) to exchange intra-AS reachability information.
EBGP is used to exchange routing information between autonomous systems. EBGP sessions are usually established between directly connected IP addresses of adjacent routers. EBGP was designed to work without an IGP.
12 IBGP or EBGP in an Enterprise Network
Route Propagation
BGP loop prevention logic enforces an AS-level split horizon rule:
Routes received from an EBGP peer are further advertised to all other EBGP and IBGP peers (unless an inbound or outbound filter drops the route);
Routes received from an IBGP peer are advertised to EBGP peers but not to other IBGP peers.
BGP route reflectors (RR) use slightly modified IBGP route propagation rules:
Routes received from an RR client are advertised to all other IBGP and EBGP peers. RR-specific BGP attributes are added to the routes advertised to IBGP peers to detect IBGP loops.
Routes received from other IBGP peers are advertised to RR clients and EBGP peers.
The route propagation rules influence the setup of BGP sessions in a BGP network:
EBGP sessions are established based on physical network topology;
IBGP networks usually use a set of route reflectors (or a hierarchy of route reflectors); IBGP sessions are established between all BGP-speaking routers in the AS and the route reflectors.
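The propagation rules above can be condensed into a single decision function. This Python sketch is a simplified model: the peer categories are hypothetical labels, the peer a route was learned from is assumed to be excluded separately, and inbound or outbound filters are ignored.

```python
def should_advertise(learned_from, send_to, is_route_reflector=False):
    """AS-level split horizon plus route reflector rules (simplified).

    learned_from / send_to: "ebgp", "ibgp" (non-client) or "rr-client".
    """
    if learned_from == "ebgp" or send_to == "ebgp":
        return True                     # EBGP routes go everywhere; anything goes to EBGP
    if not is_route_reflector:
        return False                    # IBGP-learned routes are not sent to IBGP peers
    if learned_from == "rr-client":
        return True                     # reflected to clients and non-clients
    return send_to == "rr-client"       # from a non-client: reflected to clients only


print(should_advertise("ibgp", "ibgp"))                              # False
print(should_advertise("ibgp", "rr-client", is_route_reflector=True))  # True
```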
BGP Next Hop Processing
The BGP next hop processing rules13 heavily influence the BGP network design and dictate the need for an IGP in IBGP networks:
A BGP router advertising a BGP route without a NEXT HOP attribute (a locally originated BGP route) sets the BGP next hop to the source IP address of the BGP session over which the BGP route is advertised;
A BGP router advertising a BGP route to an IBGP peer does not change the value of the BGP NEXT HOP attribute;
A BGP router advertising a BGP route to an EBGP peer sets the value of the BGP NEXT HOP attribute to the source IP address of the EBGP session unless the existing BGP NEXT HOP value belongs to the same IP subnet as the source IP address of the EBGP session.
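The three default rules can be sketched as one Python function. This is a simplified model, assuming hypothetical addresses and that the EBGP session subnet is known; it ignores next-hop-self, route maps, and route reflector behavior.

```python
import ipaddress


def advertised_next_hop(current_next_hop, session_source, session_subnet, peer_is_ebgp):
    """BGP NEXT HOP placed on an outgoing update, per the default rules above.

    current_next_hop is None for a locally originated route.
    """
    if current_next_hop is None:
        return session_source                  # rule 1: locally originated route
    if not peer_is_ebgp:
        return current_next_hop                # rule 2: IBGP leaves the next hop unchanged
    same_subnet = ipaddress.ip_address(current_next_hop) in ipaddress.ip_network(session_subnet)
    return current_next_hop if same_subnet else session_source  # rule 3: EBGP


print(advertised_next_hop("172.16.1.1", "10.0.0.1", "10.0.0.0/30", peer_is_ebgp=False))  # 172.16.1.1
print(advertised_next_hop("172.16.1.1", "10.0.0.1", "10.0.0.0/30", peer_is_ebgp=True))   # 10.0.0.1
```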
You can modify the default BGP next hop processing rules with the following Cisco IOS configuration options:
neighbor next-hop-self router configuration command sets the BGP NEXT HOP attribute to the source IP address of the BGP session regardless of the default BGP next hop processing rules.
13 BGP Next Hop Processing
A BGP route reflector cannot change the BGP attributes of reflected routes14. The neighbor next-hop-self command is thus not effective on routes reflected by a route reflector.
Recent Cisco IOS releases support an extension to the neighbor next-hop-self command:
neighbor address next-hop-self all configuration command causes a route reflector to change the BGP next hops on all IBGP and EBGP routes (including reflected routes) sent to the specified neighbor.
Inbound or outbound route maps can set the BGP NEXT HOP to any value with the set ip next-hop command (outbound route maps are not applied to reflected routes). The most useful variant of this command is set ip next-hop peer-address used in an inbound route map.
set ip next-hop peer-address sets the BGP next hop to the IP address of the BGP neighbor when used in an inbound route map, or to the source IP address of the BGP session when used in an outbound route map.
14 BGP Route Reflectors