VMware® and Brocade® Network Virtualization
Reference Whitepaper
Table of Contents
EXECUTIVE SUMMARY
VMWARE NSX WITH BROCADE VCS: SEAMLESS TRANSITION TO SDDC
VMWARE'S NSX NETWORK VIRTUALIZATION PLATFORM
    Overview
    Components of the VMware NSX
    Data Plane
    Control Plane
    Management Plane
    Consumption Platform
    Functional Services of NSX for vSphere
WHY DEPLOY BROCADE NETWORK FABRIC WITH VMWARE NSX
DESIGN CONSIDERATIONS FOR VMWARE NSX AND BROCADE NETWORK FABRIC
    Design Considerations for Brocade Network Fabric
    Brocade VCS Fabric and VDX Switches
    Scalable Brocade VCS Fabrics
    Flexible Brocade VCS Fabric Building Blocks for Easy Migration
    Brocade VDX Switches Discussed in This Guide
    Mixed Switch Fabric Design
    Multi-fabric Designs
    Deploying the Brocade VDX 8770 and VCS Fabrics at the Classic Aggregation Layer
    VCS Fabric Building Blocks
VMWARE NSX NETWORK DESIGN CONSIDERATIONS
DESIGNING FOR SCALE AND FUTURE GROWTH
COMPUTE RACKS
EDGE RACKS
INFRASTRUCTURE RACKS
LOGICAL SWITCHING
    Transport Zone
vSphere hypervisor environment based on the joint solution from VMware NSX and Brocade “Virtual Cluster Switching” (VCS) technology.
VMware’s Software Defined Data Center (SDDC) vision leverages core data center virtualization technologies to transform data center economics and business agility through automation and non-disruptive deployment that embraces and extends existing compute, network, and storage infrastructure investments. VMware NSX is the component that provides the network virtualization pillar of this vision. With NSX, customers can build an agile “overlay” infrastructure for public and private cloud environments, leveraging Brocade’s robust and resilient “Virtual Cluster Switching” (VCS) for the physical “underlay” network. Together, Brocade and VMware help customers leverage the promise of VMware’s SDDC vision to enable the power, intelligence, and analytics of networks with a flexible, end-to-end solution.
VMware NSX with Brocade VCS: Seamless Transition to SDDC
New technologies and applications are driving constant change in organizations both large and small, and nowhere are the effects felt more keenly than in the network. Large-scale server virtualization is generating unpredictable bandwidth requirements driven by virtual machine (VM) mobility. The move toward cloud computing is demanding a high-performance network interconnect that can be driven by servers and VMs that number in the tens of thousands. Modern virtualized multi-tiered applications are generating massive levels of east/west inter-server traffic. Unfortunately, traditional network topologies and solutions were not designed to support these highly virtualized environments with mobile VMs and demanding modern workloads.
VMware NSX has emerged as an attractive solution to these challenges, bringing dramatic improvements over the inefficiencies, rigidity, fragility, and management challenges of classic hierarchical Ethernet networks. For optimal performance, NSX should run on a resilient physical network or fabric underlay that provides robust network connectivity. Brocade’s VCS® Fabric technology is ideal for this scenario, enabling organizations to migrate to a highly available and automated fabric at their own pace, without disrupting their existing data center network architecture. Here are some typical situations in which customers may choose to transition to a Brocade network fabric as part of their evolution to an NSX SDDC architecture:
• Transitioning from Gigabit Ethernet (GbE) to 10 GbE - Many organizations are consolidating multiple workloads onto fewer, more powerful servers, creating a demand for greater network bandwidth.
• Scaling the network - The elasticity, manageability, flexibility, and scalability of Ethernet fabrics make them ideal for new virtualization and cloud computing environments.
• Adding storage - Organizations adopting storage virtualization or developing Ethernet Storage Area Networks (SANs) require a true lossless fabric.
• Adopting Network Virtualization - Network virtualization introduces additional parameters to set up and manage, and typically requires new skill sets as well. Ethernet fabrics provide a simpler, highly resilient, low-latency foundation on which to virtualize the network on the way to SDDC.
This combined solution by Brocade and VMware NSX delivers the required IT agility through automated, zero-touch VM discovery, configuration, and mobility, which is demanded by today’s constantly evolving workloads.
VMware's NSX Network Virtualization Platform
Overview
IT organizations have gained significant benefits as a direct result of server virtualization. Server consolidation, reduced physical complexity, increased operational efficiency, and the ability to dynamically re-purpose underlying resources to quickly and optimally meet the needs of increasingly dynamic business applications are just a handful of the gains that have already been realized.
Now, VMware’s Software Defined Data Center (SDDC) architecture is extending virtualization technologies across the entire physical data center infrastructure. VMware NSX, the network virtualization platform, is a key product in the SDDC architecture. With NSX, virtualization now delivers for networking the same value and advantages it has provided for compute and storage. In much the same way that server virtualization programmatically creates, snapshots, deletes, and restores software-based virtual machines (VMs), NSX network virtualization programmatically creates, snapshots, deletes, and restores software-based virtual networks. The result is a completely transformative approach to networking that not only enables data center managers to achieve orders of magnitude better agility and economics, but also allows for a vastly simplified operational model for the underlying physical network. Because it can be deployed on any IP network, including both existing traditional networking models and next-generation fabric architectures from any vendor, NSX is a completely non-disruptive solution.
Figure 1 Server and Network Virtualization Analogy
Figure 1 draws an analogy between compute and network virtualization. With server virtualization, a software abstraction layer (server hypervisor) reproduces the familiar attributes of an x86 physical server (e.g., CPU, RAM, Disk, NIC) in software, allowing them to be programmatically assembled in any arbitrary combination to produce a unique virtual machine (VM) in a matter of seconds.
With network virtualization, the functional equivalent of a “network hypervisor” reproduces the complete set of Layer 2 to Layer 7 networking services (e.g., switching, routing, access control, firewalling, QoS, and load balancing) in software. As a result, these services can be programmatically assembled in any arbitrary combination, to produce unique, isolated virtual networks in a matter of seconds.
Not surprisingly, similar benefits are also derived. For example, just as VMs are independent of the underlying x86 platforms and allow IT to treat physical hosts as a pool of compute capacity, virtual networks are independent of the underlying IP network hardware and allow IT to treat the physical network as a pool of transport capacity that can be consumed and repurposed on demand. Unlike legacy architectures, virtual networks can be provisioned, changed, stored, deleted and restored programmatically without reconfiguring the underlying physical hardware or topology. By matching the capabilities and benefits derived from familiar server and storage
Figure 2 NSX Components
Data Plane
The NSX data plane consists of the NSX vSwitch. The vSwitch in NSX for vSphere is based on the vSphere Distributed Switch (VDS) with additional components to enable rich services. The add-on NSX components include kernel modules (VIBs) that run within the hypervisor kernel and provide services such as distributed routing, distributed firewall, and VXLAN bridging. The NSX VDS vSwitch abstracts the physical network and provides access-level switching in the hypervisor. It is central to network virtualization because it enables logical networks that are independent of physical constructs such as VLANs. Some of the benefits of the vSwitch are:
• Support for overlay networking with protocols such as VXLAN and centralized network configuration. Overlay networking enables the following capabilities:
o Creation of a flexible logical layer 2 (L2) overlay over existing IP networks on existing physical infrastructure without the need to re-architect any of the data center networks
o Provision of communication (east–west and north–south) while maintaining isolation between tenants
o Application workloads and virtual machines that are agnostic of the overlay network and operate as if they were connected to a physical L2 network
• NSX vSwitch facilitates massive scalability of hypervisors and their attached workloads.
Control Plane
The NSX control plane runs in the NSX controller and does not have any data plane traffic passing through it. The NSX controller nodes are deployed in a cluster with an odd number of members in order to enable high availability and scale. A failure of the controller nodes does not impact data plane traffic.
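The text does not spell out why the cluster size should be odd. A common reason in clustered control planes is majority-based quorum, and the short Python sketch below works through that arithmetic. It assumes, as an illustration only, that the controller cluster needs a strict majority of nodes to stay operational; the function names are hypothetical and are not part of any NSX interface.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can fail while a strict majority still remains."""
    return cluster_size - majority(cluster_size)


# Assumption: the cluster stays operational only while a majority is reachable.
for size in (2, 3, 4, 5):
    print(f"{size} nodes: majority={majority(size)}, "
          f"tolerated failures={tolerated_failures(size)}")
```

Under this assumption, a fourth node adds no failure tolerance over three nodes, which is why odd cluster sizes are the usual choice.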
Management Plane
The NSX management plane is built upon the NSX manager. The NSX manager provides the single point of configuration and is the entry point for the REST API in a vSphere NSX environment.
Consumption Platform
The consumption of NSX can be driven directly via the NSX manager UI, which is available within the vSphere Web UI itself. Typically, end users tie network virtualization into their cloud management platform (CMP) for deploying applications. NSX provides a rich set of integrations into virtually any CMP via the REST API. Out-of-the-box integration is also available through VMware vRealize Automation (vRA), previously known as vCloud Automation Center.
Functional Services of NSX for vSphere
In this design guide we will discuss how all of the components described above give us the following functional services:
• Logical Layer 2 – Enabling extension of a L2 segment / IP Subnet anywhere in the fabric irrespective of the physical network design
• Distributed L3 Routing – Routing between IP subnets can be done in a logical space without traffic going out to the physical router. This routing is performed in the hypervisor kernel with a minimal CPU / memory overhead. This functionality provides an optimal data path for routing traffic within the virtual infrastructure. Similarly, the NSX Edge provides a mechanism for full dynamic route peering with the physical network using OSPF and BGP, enabling seamless integration.
• Distributed Firewall – Security enforcement is done at the kernel and vNIC level. This enables firewall rule enforcement in a highly scalable manner without creating bottlenecks on physical appliances. The firewall is distributed in the kernel and hence has minimal CPU overhead and can perform at line rate.
• Logical Load-balancing – Support for L4-L7 load balancing with the ability to do SSL termination.
• SSL VPN services to enable L2 VPN services.
Why Deploy Brocade Network Fabric with VMware NSX
Open and Reliable Infrastructure:
Brocade, as a leader in the networking and data center space, has over a decade of experience building high-performance and reliable networks for the most demanding workloads and some of the world’s largest data centers. Brocade VDX switches support both open standards and more elegant deployment options for cloud-based architectures and the SDDC. For example, Brocade supports standard link aggregation for connection with legacy networking equipment but also offers Brocade Trunking for more efficient utilization of links and higher performance. Equal Cost Multipath (ECMP) is also supported to provide predictable performance and resiliency across the network as a whole. By supporting industry standards, Brocade provides interoperability and consistency for customers while still being able to provide higher-level functionality for particularly intensive SDDC environments that other network vendors don’t offer. … where potential hot spots are forming.
Design Considerations for VMware NSX and Brocade Network Fabric
VMware NSX network virtualization can be deployed over existing data center networks. In this section, we discuss how the logical overlay networks using VXLAN encapsulation can be deployed over common data center network topologies. We first address requirements for the physical network and then look at the network designs that are optimal for network virtualization. Finally, the logical networks and related services and scale considerations are explained.
Design Considerations for Brocade Network Fabric
Brocade VCS Fabric and VDX Switches
Brocade VCS fabrics provide advanced Ethernet fabric technology, eliminating many of the drawbacks of classic Ethernet networks in the data center. In addition to standard Ethernet fabric benefits, such as logically flat networks without the need for Spanning Tree Protocol (STP), Brocade VCS Fabric technology also brings advanced automation with logically centralized management. Brocade VCS Fabric technology includes unique services that are ideal for simplifying traffic in a cloud data center, such as scalable network multi-tenancy capabilities, automated VM connectivity, and highly efficient multipathing at Layer 1, Layer 2, and Layer 3 with multiple Layer 3 gateways.
The VCS architecture conforms to the Brocade strategy of “revolution through evolution.” Brocade VDX switches with Brocade VCS Fabric technology therefore connect seamlessly with existing data center Ethernet products, whether offered by Brocade or other vendors. At the same time, the VCS architecture allows newer data center solutions to be integrated quickly. For example, Brocade VDX switches are hardware-enabled to support emerging SDN protocols, such as Virtual Extensible LAN (VXLAN). Logical Chassis technology and northbound Application Programming Interfaces (APIs) can provide operationally scalable management and access to emerging management frameworks such as VMware vRealize Automation (vRA), previously known as vCloud Automation Center (vCAC).
Scalable Brocade VCS Fabrics
Brocade VCS fabrics offer dramatic improvements over the inefficiencies, inherent limitations, and management challenges of classic hierarchical Ethernet networks. Implemented on Brocade VDX switches, Brocade VCS fabrics drastically simplify the deployment and management of scale-out architectures.
• Brocade VCS fabrics are elastic, self-forming, and self-healing, allowing administrators to focus on service delivery instead of basic network operations and administration. All-active connections and load balancing throughout Layers 1–3 provide resilience that is not artificially hampered by arbitrary limitations at any network layer. The distributed control plane ensures that all nodes are aware of the health and state of their peers and that they forward traffic accordingly across the shortest path in the topology. Nodes can be added and removed non-disruptively, automatically inheriting predefined configurations and forming new links upon entry or removal of a node.
• Brocade VCS fabrics are easy to manage, with a shared control plane and unified management plane that allow the fabric nodes to function and to be managed as a single entity, regardless of fabric size. Open APIs and OpenStack support facilitate orchestration of VCS fabrics within The On-Demand Data Center™.
Brocade VCS fabrics offer considerable scale and capacity, as shown in Table 1.
Criteria                            Brocade Switches
Number of switches in a cluster     Up to 32
Number of ports in a cluster        8,000+
Switching fabric capacity           10.7+ Tbps
Data forwarding capacity            7.7 Tbps
MAC addresses                       384,000
Maximum ports per switch            384 x 10GbE or 216 x 40GbE or 48 x 100GbE
Table 1 Brocade VCS Fabric Scalability
Flexible Brocade VCS Fabric Building Blocks for Easy Migration
Brocade VCS fabrics can be deployed as a large single domain, or multiple smaller fabric domains can be configured to suit either application needs or administrative boundaries (see Figure 3). A single larger domain affords a simple, highly efficient configuration that avoids STP while smoothly supporting significant east-west traffic common to modern applications. Data Center Bridging (DCB) is supported on all nodes, allowing for unified storage access over Ethernet. Multiple Brocade VCS domains can be configured to easily scale out the data center, while offering multiple active Layer 3 gateways, contained failure domains, and MAC address scalability, all while avoiding STP.
Figure 3 Brocade VCS fabrics easily accommodate a wide range of configurations, from a single large VCS domain to multiple smaller domains.
options, and the ability to connect over 8,000 server ports in a single switching domain.
As shown in Figure 4, organizations can easily deploy Brocade VCS fabrics at the access layer, incrementally expanding the fabric over time. As the Brocade VCS fabric expands, existing network infrastructure can remain in place, if desired. Eventually, the advantages of VCS fabrics can extend to the aggregation layer, delivering the benefits of Brocade VCS Fabric technology to the entire enterprise, while allowing legacy aggregation switches to be redeployed elsewhere. Alternatively, a VCS fabric can be implemented initially in the aggregation tier, leaving existing access tier switches in place.
Figure 4 Incremental deployment of Brocade VCS Fabric in a brown-field environment
Mixed Switch Fabric Design
For dense server deployments and highly virtualized environments, multiple Brocade VDX switch types can be combined to form one single VCS Fabric and leverage administrative simplicity through a single logical chassis. For instance, a small and cost-effective Brocade VCS fabric can be piloted using the family of Brocade VDX 6700 products alone and eventually scaled out, using the Brocade VDX 8770, as the fabric grows and the organization moves toward deploying larger and larger virtualized environments and cloud services.
The configuration shown in Figure 5 is a typical “leaf-spine” fabric. Here the “leaf” or access layer uses Brocade VDX 6740 switches to provide redundant Gigabit Ethernet access to servers, along with redundant 10 Gigabit Ethernet links to the “spine” layer, which uses Brocade VDX 8770s. The “core” layer uses Brocade MLX series switches with MCT technology (For more information on Brocade Multi-chassis Trunking please see
VDX 8770 switch greatly expands the VCS fabric, providing high-volume connectivity for large numbers of servers. When used at the spine, the Brocade VDX 8770 can provide Layer 3 routing capabilities. Deploying Layer 3 routing at the spine layer shields the core switches from unnecessary routing traffic, thus enabling additional network scale and enhancing application performance. Multiple active Layer 3 gateways at the spine layer provide high availability through an architecturally hierarchical, but logically flat, network.
Figure 5 The Brocade VDX 8770 switch can be added to existing small-to-medium-scale VCS fabrics at both the leaf and spine for additional scale while containing Layer 3 traffic.
For very data-intensive applications with very low-latency requirements, the Brocade VDX 8770 switch can be paired with Brocade VDX 6730 switches for connecting to FC arrays, as shown in Figure 6. This highly redundant dual-fabric configuration offers the benefits of both Brocade VCS fabrics and FC fabrics.
Figure 6 Highly redundant, dual-fabric design for an HPC environment can consolidate multiple data stores into a single managed service.
Multi-fabric Designs
The Brocade VDX 8770 can be used to accomplish phased data center deployments of VCS fabrics, or to accomplish truly massive scalability through multi-fabric Brocade VCS Fabric deployments. By deploying the Brocade VDX 8770 switch as a spine switch, multiple fabrics can be interconnected to provide additional scale and Layer 3 flexibility. Figure 7 illustrates separate fabrics built from Brocade VDX 6740 and 8770 switches. As shown, Virtual LAGs (vLAGs) connect the separate fabric domains using both 40 GbE connections and 10 GbE DCB connections for storage access.
Note: Link aggregation allows you to bundle multiple physical Ethernet links to form a single logical trunk providing enhanced performance and redundancy. The aggregated trunk is referred to as a Link Aggregation Group (LAG). Virtual LAG (vLAG) is a feature of Brocade VCS Fabric technology that extends the concept of LAG to include edge ports on multiple VCS switches.
Figure 7 Separate fabrics built from Brocade VDX 6700 and 8770 switches.
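To illustrate how a LAG can bundle several links into one logical trunk while keeping each conversation on a single link, the Python sketch below hashes a flow's addressing fields to choose a member link. This is a generic illustration of hash-based distribution as commonly used with standard link aggregation, not a description of Brocade's vLAG or Brocade Trunking algorithms. CRC32 is only a stand-in for a hardware hash, and the port names and addresses are hypothetical.

```python
import zlib
from typing import Sequence, Tuple

# Source IP, destination IP, source port, destination port, protocol.
Flow = Tuple[str, str, int, int, str]


def pick_member(flow: Flow, members: Sequence[str]) -> str:
    """Map a flow to one member link; the same flow always maps to the same link."""
    if not members:
        raise ValueError("the LAG has no active member links")
    key = "|".join(str(field) for field in flow).encode()
    return members[zlib.crc32(key) % len(members)]


lag = ["te1/0/1", "te1/0/2", "te2/0/1", "te2/0/2"]  # hypothetical port names
flow = ("10.0.1.10", "10.0.2.20", 49152, 443, "tcp")  # hypothetical flow
print(pick_member(flow, lag))

# If a member link fails, the flow is re-hashed over the surviving links.
survivors = [link for link in lag if link != "te1/0/1"]
print(pick_member(flow, survivors))
```

Hashing per flow keeps packets of one conversation in order, at the cost of uneven utilization when a few flows dominate the traffic.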
Deploying the Brocade VDX 8770 and VCS Fabrics at the Classic Aggregation Layer
Many medium-to-large data centers are looking for opportunities to move toward cloud computing solutions, while realizing the benefits of Ethernet fabrics. Often these organizations need to improve the performance of their existing networks, but they also want to protect investments in existing networking technology. Even in traditional hierarchical deployment scenarios, the combination of the Brocade VDX 8770 switch and Brocade VCS Fabric technology can offer significant benefits in terms of future-proofing the network, advancing network convergence, and offering a migration path to 40 GbE and, eventually, 100 GbE technologies.
The Brocade VDX 8770 switch can provide many advantages, especially for those organizations that are tied to a tiered network architecture for now, but want to deploy a hybrid architecture for investment protection.
Deploying a Brocade VCS fabric at the traditional aggregation layer can dramatically improve the performance of the existing network, while protecting both investments in existing infrastructure and new investments in Brocade VCS technology.
Advantages of deploying Brocade VCS Fabric technology at the traditional aggregation layer include:
• Multiple Layer 3 gateways for redundancy and optimal load balancing
• Standard Layer 2 and Layer 3 functionality
• Wire-speed performance
• High-density 10GbE, 40GbE, and 100GbE
• ~4 µsec latency within the VCS fabric
• Resiliency through high availability
• Reduced demand on core switches for east-west traffic
Figure 8 Dual Brocade VDX 8770 switches configured as a VCS fabric at the aggregation/distribution layer convey many benefits to traditional tiered networks.
VCS Fabric Building Blocks
Multiple data center templates can be defined, tested, and deployed out of a common set of building blocks. This promotes reusability of building blocks (and technologies), reduces testing, and simplifies support.
VCS Fabric flattens the network using Brocade’s VCS Fabric technology. Within a single fabric, both layer 2 and layer 3 switching are available on any or all switches in the fabric. A VCS Fabric of ToR switches can be configured to create a layer 2 fabric with layer 2 links to an aggregation block. In this set of building blocks, the aggregation and access switching are combined into a single VCS Fabric of VDX switches. A single fabric is a single logical management domain, which simplifies configuration of the network.
VCS Fabric Topologies
Figure 9 Leaf-Spine VCS Fabric Topology with L3 at Spine
Each leaf switch at the bottom is connected to all spine switches at the top. The connections are Brocade ISL Trunks, which provide resiliency and can contain up to 16 links per trunk. All servers can connect with each other with two switch hops in between. As shown, all leaf switches are at layer 2 and spine switches create the L2/L3 boundary. However, the L2/L3 boundary can be at the leaf switch as well, as shown in Figure 10.
Figure 10 Leaf-Spine VCS Fabric Topology with L3 at Leaf
In this option, VLAN traffic is routed across the spine and each leaf switch includes layer 3 routing services. Brocade ISL Trunks continue to provide consistent latency and large cross-sectional bandwidth with link resiliency. However, ECMP at layer 3 provides multipath forwarding rather than ECMP at layer 2.
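The leaf-spine wiring rule described for Figures 9 and 10 (every leaf connects to every spine) can be checked with a small Python sketch. The sketch is a minimal model under stated assumptions: the fabric size of four leaves and two spines and the switch names are hypothetical, and each modeled link stands for one logical trunk regardless of how many physical links it bundles.

```python
from collections import deque
from itertools import product


def build_leaf_spine(num_leaves, num_spines):
    """Return (leaves, spines, adjacency) for a full-mesh leaf-spine fabric."""
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    adjacency = {node: set() for node in leaves + spines}
    # Every leaf connects to every spine; leaves never connect to each other.
    for leaf, spine in product(leaves, spines):
        adjacency[leaf].add(spine)
        adjacency[spine].add(leaf)
    return leaves, spines, adjacency


def switch_hops(adjacency, src, dst):
    """Breadth-first search: number of inter-switch links on a shortest path."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    raise ValueError(f"{dst} is not reachable from {src}")


leaves, spines, adjacency = build_leaf_spine(num_leaves=4, num_spines=2)
fabric_links = sum(len(adjacency[leaf]) for leaf in leaves)
# Each spine shared by two leaves gives one equal-cost path between them.
equal_cost_paths = len(adjacency["leaf1"] & adjacency["leaf4"])
print(fabric_links, switch_hops(adjacency, "leaf1", "leaf4"), equal_cost_paths)
```

In this model, any two leaves are two hops apart and the number of equal-cost paths between them equals the number of spines, which is what gives the topology its consistent latency and multipath bandwidth.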
Figure 11 Collapsed Spine VCS Fabric Topology
The VDX 8770 is a modular chassis switch with a high density of 10 GbE and/or 40 GbE ports. A collapsed spine topology can be an efficient building block for server virtualization with NAS storage pools. Multiple racks of virtualized servers and NAS servers are connected to a middle of row (MoR) or end of row (EoR) cluster of VDX 8770 switches. The collapsed spine topology lends itself to data center scale-out that relies on pods of compute, storage, and networking connected to a common data center routing core. For cloud computing environments, pod-based scale-out architectures are attractive.
The following sections describe several VCS Fabric building blocks.
VCS Fabric Leaf-Spine Topology
A VCS Fabric leaf-spine topology can be used to create a scalable fabric with consistent latency, high-bandwidth multipath switch links, and automatic link resiliency. This block forms the spine, with each spine switch connecting to all leaf switches. Fabric connections in red are Brocade ISL Trunks with up to 16 links per auto-forming trunk. Layer 2 traffic moves across the fabric, while layer 3 traffic exits the fabric on ports configured for a routing protocol. As shown by the black arrows, uplinks to the core router would be routed, for example using OSPF. Connections to an IP Services block would also use layer 3 ports on spine switches.
The blue links show layer 2 ports that can be used to attach NAS storage to the spine switches. This option creates a topology for NAS storage that is similar to best practices for SAN storage fabrics based on a core/edge topology. For most applications, the storage IOPS and bandwidth needed per server are less than a NAS port can service. An economical use of NAS ports, particularly when using 10 GbE ports, is to fan out multiple servers to each NAS port. Attaching NAS storage nodes to the spine switches facilitates this architecture.
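The fan-out reasoning above comes down to simple arithmetic, sketched here in Python. The per-server storage bandwidth figures are hypothetical placeholders chosen for illustration; real sizing would use measured workload data and should also account for IOPS, which this sketch ignores.

```python
def servers_per_nas_port(nas_port_gbps: float, per_server_gbps: float) -> int:
    """Number of servers whose combined storage bandwidth fits in one NAS port."""
    if nas_port_gbps <= 0 or per_server_gbps <= 0:
        raise ValueError("bandwidth values must be positive")
    return int(nas_port_gbps // per_server_gbps)


# Hypothetical example: a 10 GbE NAS port and servers that each
# average 0.5 Gbps of storage traffic.
print(servers_per_nas_port(nas_port_gbps=10, per_server_gbps=0.5))
```

With these assumed numbers, one 10 GbE NAS port could serve 20 servers, which is why attaching NAS nodes at the spine, where all leaves can reach them, makes economical use of NAS ports.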
Collapsed Spine
This is a collapsed spine with a two-switch VCS Fabric. Typically, high port count modular switches such as the VDX 8770 series would be used. This block works efficiently for data centers that scale out by replicating a pod of compute, storage, and networking. Each pod is connected via layer 3 routing to the data center core routers. Local traffic within the pod does not transit the core routers, but inter-pod traffic does. The collapsed spine uses VRRP/VRRP-E for IP gateway resiliency with the VCS Fabric providing layer-2 resiliency. As shown, the collapsed spine can be used effectively when connecting a large number of compute nodes to NAS storage, as is commonly found in cloud computing environments and data analytics configurations such as a Hadoop cluster. The blue arrows represent 10 GbE links that use vLAG for link resiliency within the VCS Fabric and NIC Teaming for NAS server and compute server resiliency. As shown, IP Services blocks can be attached to the spine switches, providing good scalability for load balancing and IDS/IPS services.
Figure 13 VCS Fabric Spine: Collapsed Spine Topology
These VDX switches can be used as leaf nodes with the VCS Fabric Spine for Leaf-Spine Topology. They can also be used to convert the VCS Fabric Spine for Collapsed Spine into a Leaf-Spine topology.
Designing for Scale and Future Growth
When designing a new environment, it is essential to choose an architecture that allows for future growth. The approach presented is intended for deployments that begin small with the expectation of growth to a larger scale while retaining the same overall architecture.
This network virtualization solution does not require spanning of VLANs beyond a single rack. Although this appears to be a simple change, eliminating this requirement has a widespread impact on how the physical switching infrastructure can be built and on how it scales.
Note the following three types of racks within the infrastructure:
• Compute
• Edge
• Infrastructure
Figure 14 Data Center Design - Layer 3 in Access Layer
Compute Racks
Compute racks are the section of the infrastructure where tenant virtual machines are hosted. Central design characteristics include:
• Interoperability with an existing network
• Repeatable rack design
• Connectivity for virtual machines without use of VLANs
• No requirement for VLANs to extend beyond a compute rack
A hypervisor typically sources three or more types of traffic. This example consists of VXLAN, management, vSphere vMotion, and storage traffic. VXLAN traffic is a new traffic type that carries all virtual machine communication, encapsulating it in UDP. The following sections discuss how the hypervisors connect to the external network and how these different traffic types are commonly configured.
Connecting Hypervisors
The servers in the rack are connected to the access layer switch via a number of Gigabit Ethernet (1GbE) interfaces or 10GbE interfaces. Physical server NICs are connected to the virtual switch on the other end. For best practices on how to connect the NICs to the virtual and physical switches, refer to the VMware vSphere Distributed Switch Best Practices technical white paper.
http://www.vmware.com/files/pdf/techpaper/vsphere-distributed-switch-best-practices.pdf
The connections between each server in the rack and the leaf switch are usually configured as 802.1Q trunks. A significant benefit of deploying VMware NSX network virtualization is the drastic reduction of the number of VLANs carried on those trunk connections.
Figure 15. Example - Host and Leaf Switch Configuration in a Rack
machines connected to one of the VXLAN-based logical layer-2 networks use this traffic type to communicate. The traffic from the virtual machine is encapsulated and sent out as VXLAN traffic. The external physical fabric never detects the virtual machine IP or MAC address. The virtual tunnel endpoint (VTEP) IP address is used to transport the frame across the fabric. In the case of VXLAN, the tunnels are initiated and terminated by a VTEP. Traffic that flows between virtual machines in the same data center is typically referred to as east–west traffic. For this type of traffic, both the source and destination VTEP are situated in hypervisors located in compute racks. Traffic leaving the data center will flow between a tenant virtual machine and an NSX Edge, and is referred to as north–south traffic.
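The encapsulation step described above can be made concrete with a short Python sketch of the 8-byte VXLAN header as laid out in RFC 7348: one flags byte with the "VNI valid" bit set, three reserved bytes, a 24-bit VXLAN Network Identifier (VNI), and one reserved byte. This is an illustration of the header format only, not NSX code. The VNI value 5001 and the placeholder inner frame are hypothetical, and the UDP port shown is the IANA-assigned one; the port used in a given NSX deployment is configurable and should be checked.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned port (RFC 7348); deployments may differ
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid

# Outer headers added by the VTEP, assuming IPv4 and no outer VLAN tag:
# Ethernet (14) + IPv4 (20) + UDP (8) + VXLAN (8).
ENCAP_OVERHEAD_BYTES = 14 + 20 + 8 + 8


def vxlan_header(vni: int) -> bytes:
    """Pack flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the 'VNI valid' flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


inner_frame = b"\x00" * 64  # placeholder for the VM's original Ethernet frame
payload = vxlan_header(5001) + inner_frame  # carried as the outer UDP payload
print(parse_vni(payload), len(payload), ENCAP_OVERHEAD_BYTES)
```

Because only the outer headers are visible to the fabric, the physical switches forward on VTEP addresses alone, and the underlay MTU has to leave room for the added overhead.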
VXLAN configuration requires an NSX VDS vSwitch. One requirement of a single-VDS-based design is that the same VLAN ID is defined for each hypervisor to source VXLAN-encapsulated traffic (VLAN ID 88 in the example in Figure 15). Because a VDS can span hundreds of hypervisors, it can reach beyond a single leaf switch. Note that the use of the same VLAN ID does not mean that the VTEPs on different hypervisors are necessarily in the same broadcast domain (i.e., VLAN). It simply means they tag their encapsulated traffic with the same VLAN ID. The host VTEPs, even if they are on the same VDS, can use IP addresses in different subnets, which makes it possible to leverage an end-to-end L3 fabric.
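A small Python sketch can show what "same VLAN ID, different subnets" looks like as an address plan. The VLAN ID 88 comes from the Figure 15 example; the supernet 10.88.0.0/22, the /26 per-rack prefix, the rack names, and the choice of the first host address as the leaf gateway are all hypothetical values picked for illustration.

```python
import ipaddress

VTEP_VLAN_ID = 88  # same VLAN ID on every leaf, as in the Figure 15 example


def vtep_plan(supernet: str, racks, prefix: int = 26) -> dict:
    """Give each rack its own VTEP subnet carved out of one supernet."""
    subnets = ipaddress.ip_network(supernet).subnets(new_prefix=prefix)
    plan = {}
    for rack, subnet in zip(racks, subnets):
        gateway = next(subnet.hosts())  # assumption: first host is the leaf gateway
        plan[rack] = {"vlan": VTEP_VLAN_ID, "subnet": subnet, "gateway": gateway}
    return plan


plan = vtep_plan("10.88.0.0/22", ["rack1", "rack2", "rack3"])
for rack, entry in plan.items():
    print(rack, entry["vlan"], entry["subnet"], entry["gateway"])
```

Each rack ends up with the same VLAN ID but a distinct, non-overlapping subnet, so the VLAN stays local to its leaf switch and traffic between racks is routed.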
Management Traffic
Management traffic can be categorized into two types: one is sourced and terminated by the management VMkernel interface on the host, and the other is the communication between the various NSX components. The traffic that is carried over the management VMkernel interface of a host includes the communication between vCenter Server and hosts as well as communication with other management tools such as NSX Manager. The communication between the NSX components involves the heartbeat between active and standby edge appliances.
Management traffic stays inside the data center. A single VDS can span multiple hypervisors that are deployed beyond a single leaf switch. The management interfaces of hypervisors participating in a common VDS and connected to separate leaf switches could reside in the same or in separate subnets.
vSphere vMotion Traffic
During the vSphere vMotion migration process, the running state of a virtual machine is transferred over the network to another host. The vSphere vMotion VMkernel interface on each host is used to move this virtual machine state. Each vSphere vMotion VMkernel interface on the host is assigned an IP address. The number of simultaneous vMotion migrations that can be performed is limited by the speed of the physical NIC. On a 10GbE NIC, eight simultaneous vSphere vMotion migrations are allowed.
Note: VMware has previously recommended deploying all the VMkernel interfaces used for vMotion as part of a common IP subnet. This is not possible when designing a network for network virtualization using layer-3 at the access layer, where it is mandatory to select different subnets in different racks for those VMkernel interfaces. Until VMware officially relaxes this restriction, it is recommended that customers requiring vMotion over NSX go through VMware's RPQ (“Request for Product Qualification”) process so that the customer's design can be validated on a case-by-case basis.
Storage Traffic
leaf switch) is part of the same subnet. This subnet cannot span beyond this leaf switch, therefore the storage VMkernel interface IP of a host in a different rack is in a different subnet. For an example of the IP address for these VMkernel interfaces, refer to the “VLAN Provisioning” section.
Edge Racks
Tighter interaction with the physical infrastructure occurs while bridging between the overlay world and the physical infrastructure. The main functions provided by an edge rack include:
• Providing on-ramp and off-ramp connectivity to physical networks
• Connecting with VLANs in the physical world
• Hosting centralized physical services
Tenant-specific addressing is exposed to the physical infrastructure where traffic is not encapsulated in VXLAN (e.g., NAT not used at the edge). In the case of a layer-3 edge, the IP addresses within the overlay are exposed to the physical fabric. The guiding principle in these cases is to separate VXLAN (overlay) traffic from the un-encapsulated (native) traffic. As shown in Figure 16, VXLAN traffic hits the data center internal Ethernet switching infrastructure. Native traffic traverses a dedicated switching and routing infrastructure facing the WAN or Internet and is completely decoupled from the data center internal network.
Figure 16 : VXLAN Traffic and the Data Center Internal Ethernet Switching Infrastructure
To maintain the separation, NSX Edge virtual machines can be placed in NSX Edge racks, assuming the NSX Edge has at least one native interface. For routing and high availability, the two interface types (overlay and native) must be examined individually. The failover mechanism is based on the active/standby model, where the standby Edge takes over after detecting the failure of the active Edge.
Layer-3 NSX Edge Deployment Considerations
Figure 17 : Reference Topology for NSX Edge HA models
The next three sections illustrate briefly the HA models mentioned above.
Stateful Active/Standby HA Model
This is the redundancy model where a pair of NSX Edge Services Gateways is deployed for each tenant; one Edge functions in Active mode (i.e. actively forwards traffic and provides the other logical network services), whereas the second unit is in Standby state, waiting to take over should the active Edge fail. Health and state information for the various logical network services is exchanged between the active and standby NSX Edges leveraging an internal communication protocol. The first vNIC interface of type “Internal” deployed on the Edge is used by default to establish this communication, but the user is also given the possibility of explicitly specifying the Edge internal interface to be used.
Note: It is mandatory to have at least one Internal interface configured on the NSX Edge to be able to exchange keepalives between the Active and Standby units. Deleting the last Internal interface would break this HA model.
Figure 18 below highlights how the Active NSX Edge is active from both a control and a data plane perspective.
Figure 18 : NSX Edge Active Standby HA Model (left) and Traffic Recovery (right)
Standalone HA Model (NSX 6.0.x Releases)
The standalone HA model inserts two independent NSX Edge appliances between the DLR and the physical network and it is supported when running NSX 6.0.x SW releases.
Figure 20 : Traffic Flows with Standalone HA model
ECMP HA Model (NSX 6.1 Release Onward)
NSX software release 6.1 introduces support for a new Active/Active ECMP HA model, which can be considered the improved and evolved version of the previously described Standalone one.
In the ECMP model, the DLR and the NSX Edge functionalities have been improved to support up to 8 equal cost paths in their forwarding table. Focusing for the moment on the ECMP capabilities of the DLR, this means that up to 8 active NSX Edges can be deployed at the same time and all the available control and data planes will be fully utilized, as shown in Figure 21.
This HA model provides two main advantages:
1. An increased available bandwidth for north-south communication (up to 80 Gbps per tenant).
2. A reduced traffic outage (in terms of % of affected flows) for NSX Edge failure scenarios.
Notice from the diagram in Figure 21 that traffic flows are very likely to follow an asymmetric path, where the north-to-south and south-to-north legs of the same communications are handled by different NSX Edge Gateways. The DLR distributes south-to-north traffic flows across the various equal cost paths based on hashing of the source and destination IP addresses of the original packet sourced by the workload in logical space. The way the physical router distributes north-to-south flows depends instead on the specific HW capabilities of that device.
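The flow distribution described above can be illustrated with a short Python sketch. It pins each flow to one of up to 8 Edges by hashing the source and destination IP addresses. The hash function used here (SHA-256 followed by a modulo) is an illustrative stand-in, since the text does not specify the DLR's actual hash; the Edge names are hypothetical, and the 10 Gbps per-Edge figure is an assumption inferred from the stated 80 Gbps across 8 Edges.

```python
import hashlib
import ipaddress
from collections import Counter

MAX_ECMP_PATHS = 8  # upper bound on equal cost paths stated in the text


def pick_edge(src_ip: str, dst_ip: str, edges: list[str]) -> str:
    """Map a flow to one Edge by hashing its source and destination IPs."""
    if not 1 <= len(edges) <= MAX_ECMP_PATHS:
        raise ValueError("between 1 and 8 equal cost next hops are supported")
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()  # illustrative hash, not the DLR's
    return edges[int.from_bytes(digest[:4], "big") % len(edges)]


def flow_distribution(flows: list[tuple[str, str]], edges: list[str]) -> Counter:
    """Count how many of the given (src, dst) flows land on each Edge."""
    return Counter(pick_edge(src, dst, edges) for src, dst in flows)


def aggregate_bandwidth_gbps(edge_count: int, per_edge_gbps: int = 10) -> int:
    """North-south capacity, assuming each Edge contributes per_edge_gbps."""
    return edge_count * per_edge_gbps


if __name__ == "__main__":
    edges = [f"edge-{i}" for i in range(1, 9)]  # hypothetical Edge names
    flows = [(f"172.16.10.{h}", "192.0.2.10") for h in range(1, 201)]
    print(flow_distribution(flows, edges))
    print(aggregate_bandwidth_gbps(len(edges)), "Gbps")
```

Because every packet of a flow hashes to the same Edge, packets stay in order within a flow, and with N active Edges only the flows pinned to a failed Edge (roughly 1/N of them) are interrupted at the moment of failure.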
and storage traffic. Provisioning of IP addresses to the VMkernel NICs of each traffic type is automated using vSphere host profiles. The host profile feature enables creation of a reference host with properties that are shared across the deployment. After this host has been identified and the required sample configuration performed, a host profile can be created and applied across the deployment. This allows quick configuration of a large number of hosts.
As shown in Figure 22, the same set of four VLANs (storage, vSphere vMotion, VXLAN, and management) is provided in each rack.
Figure 22 : Host Infrastructure Traffic Types and IP address Assignment
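The per-rack addressing scheme can be sketched in Python as follows: the same four traffic types exist in every rack, but each rack draws a different subnet for each of them. The supernets, the /24 prefix length, and the host offset below are hypothetical values chosen for illustration; they are not the addresses used in this guide's “VLAN Provisioning” section.

```python
import ipaddress

# Hypothetical supernets, one per traffic type (illustration only).
SUPERNETS = {
    "storage": "10.66.0.0/16",
    "vmotion": "10.77.0.0/16",
    "vxlan": "10.88.0.0/16",
    "management": "10.99.0.0/16",
}
FIRST_HOST_OFFSET = 10  # assumption: low addresses reserved for gateways


def rack_subnets(rack_index: int, prefixlen: int = 24) -> dict:
    """Return one subnet per traffic type for the given rack (0-based)."""
    plan = {}
    for traffic, supernet in SUPERNETS.items():
        subnets = list(ipaddress.ip_network(supernet).subnets(new_prefix=prefixlen))
        plan[traffic] = subnets[rack_index]  # rack N uses the Nth subnet
    return plan


def vmkernel_ip(rack_index: int, host_index: int, traffic: str):
    """Return the VMkernel address of a host (0-based) for one traffic type."""
    subnet = rack_subnets(rack_index)[traffic]
    address = subnet.network_address + FIRST_HOST_OFFSET + host_index
    if address not in subnet or address == subnet.broadcast_address:
        raise ValueError("host index does not fit in the rack subnet")
    return address


if __name__ == "__main__":
    for rack in range(2):
        for traffic, subnet in rack_subnets(rack).items():
            print(f"rack {rack} {traffic:<10} {subnet} "
                  f"first host {vmkernel_ip(rack, 0, traffic)}")
```

A plan of this shape keeps each subnet confined to a single leaf switch, so a host in a different rack always receives a VMkernel address from a different subnet.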
Multi-tier Edges and Multi-tier Application Design Considerations
Classical multi-tier compute architectures have functions that are logically separated, where each function has different requirements for resource access, data segregation, and security. Classical three-tier compute architecture typically comprises a presentation tier, an application or data access tier, and a database tier. Communication between the application tier and the database tier should be allowed, while an external user has access to only the presentation tier, which is typically a web-based service.
The recommended solution to comply with data access policies is to deploy a two-tier edge design. The inner edge enables VXLAN-to-VXLAN east–west traffic among the presentation, database, and application tiers, represented by different logical networks. The outer edge connects the presentation tier with the outer world for on-ramp and off-ramp traffic. Communication within a specific logical network enables virtual machines to span across multiple racks to achieve optimal utilization of the compute rack infrastructure.
Note: At the current time, a logical network can span only a single vCenter domain. Figure 23 shows the placement of the logical elements of this architecture.
Figure 23 : Two Options for Logical Element Placement in a Multitier Application
It is preferable that the outer edges be physically placed in the edge racks. Inner edges can be centralized in the Edge racks or distributed across the compute racks, where web and application compute resources are located.
Logical Switching
Figure 24 : Logical Switching Components
NSX Manager
The NSX Manager is the management plane component responsible for configuring logical switches and connecting virtual machines. It also provides an API, which allows a cloud management platform to automate the deployment and management of these switches.
Controller Cluster
Transport Zone
As part of the host preparation process, the hypervisor modules are deployed and configured through the NSX Manager. After logical switching components are installed and configured, the next step is to define the span of logical switches by creating a transport zone. The transport zone consists of a set of clusters. For example, if there are 10 clusters in the data center, a transport zone can include some or all of those 10 clusters. If it includes all of them, a logical switch can span the whole data center. Figure 25 below shows a deployment after the NSX components are installed to provide logical switching. The Edge Services Router in the edge rack provides the logical switches access to the WAN and other network services.
Figure 25 : Logical Switching components in the racks
Logical Switch Replication Modes
When two VMs connected to different ESXi hosts need to communicate directly, unicast VXLAN encapsulated traffic is exchanged between the VTEP IP addresses associated to the two hypervisors. Traffic originated by a VM may need to be sent to all the other VMs belonging to the same logical switch, specifically for three types of layer-2 traffic:
• Broadcast
• Unknown Unicast
• Multicast
Note: These multi-destination traffic types may be referred to using the acronym BUM (Broadcast, Unknown unicast, Multicast). In an NSX deployment with vSphere, there should never be a need to flood unknown unicast traffic on a given logical network since the NSX controller is made aware of the MAC addresses of any actively connected VM.
For these three scenarios, traffic originated by a given ESXi host must be replicated to multiple remote hosts (hosting other VMs that are part of the same logical network). NSX supports three different replication modes to enable multi-destination communication on VXLAN-backed logical switches: unicast, hybrid, and multicast. By default a logical switch inherits its replication mode from the transport zone; however, this can be overridden.
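The relationship between a transport zone, the logical switches it contains, and the replication mode can be modeled with a small Python sketch. The class and attribute names are hypothetical and do not correspond to the NSX API; the sketch only captures two rules stated in the text: a logical switch spans the clusters of its transport zone, and it inherits the zone's replication mode unless an override is set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReplicationMode(Enum):
    UNICAST = "unicast"
    HYBRID = "hybrid"
    MULTICAST = "multicast"


@dataclass(frozen=True)
class TransportZone:
    """A set of clusters plus the default replication mode (hypothetical model)."""
    name: str
    clusters: frozenset
    replication_mode: ReplicationMode


@dataclass
class LogicalSwitch:
    """A logical switch whose span is defined by its transport zone."""
    name: str
    transport_zone: TransportZone
    replication_override: Optional[ReplicationMode] = None

    @property
    def replication_mode(self) -> ReplicationMode:
        # Inherit from the transport zone unless explicitly overridden.
        if self.replication_override is not None:
            return self.replication_override
        return self.transport_zone.replication_mode

    def spans(self, cluster: str) -> bool:
        return cluster in self.transport_zone.clusters


if __name__ == "__main__":
    zone = TransportZone(
        name="tz-datacenter",
        clusters=frozenset(f"cluster-{i}" for i in range(1, 11)),
        replication_mode=ReplicationMode.HYBRID,
    )
    web = LogicalSwitch("web-tier", zone)
    db = LogicalSwitch("db-tier", zone, replication_override=ReplicationMode.UNICAST)
    print(web.replication_mode.value, db.replication_mode.value, web.spans("cluster-3"))
```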
segments (only one remote segment is shown in this example).
Figure 26 : Unicast Mode Logical Switch