Scenario 2: Deploy VM2 with affinity to Site B
3. Based on the selected storage reservation policy, Federation Enterprise Hybrid Cloud workflows programmatically determine that Site B is the preferred location, and therefore locate the virtual machine DRS affinity group corresponding with Site B, namely SiteB_VMs.
4. The expected result is:
a. VM2 is deployed into SiteB_VMs, meaning it resides on hosts CL1-H3 or CL1-H4.
b. VM2 is deployed onto a datastore from the SiteB_Preferred_CA_Enabled storage reservation policy. For example: VPLEX_Distributed_LUN_SiteB_Preferred_01 or VPLEX_Distributed_LUN_SiteB_Preferred_02
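The placement decision in this scenario can be pictured as a lookup from the storage reservation policy name to the DRS virtual machine group of the preferred site. The following Python sketch is illustrative only: the function name and the assumption that policy names always end in _Preferred_CA_Enabled are hypothetical, inferred from the single example above, and do not represent the actual workflow code.

```python
# Illustrative sketch only; not the actual Federation Enterprise Hybrid Cloud workflow.
# Assumed naming pattern, inferred from the example policy SiteB_Preferred_CA_Enabled.
SRP_SUFFIX = "_Preferred_CA_Enabled"


def drs_group_for_policy(srp_name: str) -> str:
    """Map a storage reservation policy name to the DRS VM group of its preferred site."""
    if not srp_name.endswith(SRP_SUFFIX) or len(srp_name) == len(SRP_SUFFIX):
        raise ValueError(f"not a site-preferred CA policy: {srp_name!r}")
    site = srp_name[: -len(SRP_SUFFIX)]  # "SiteB_Preferred_CA_Enabled" -> "SiteB"
    return f"{site}_VMs"                 # DRS VM group name, as in SiteB_VMs


if __name__ == "__main__":
    print(drs_group_for_policy("SiteB_Preferred_CA_Enabled"))
```

With the policy from this scenario, the sketch returns SiteB_VMs, the group whose hosts are CL1-H3 and CL1-H4.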
There must be at least one virtual array for each site. By configuring the virtual arrays in this way, ViPR can discover the VPLEX and storage topology. You should carefully plan and perform this step because it is not possible to change the configuration after resources have been provisioned, without first disruptively removing the provisioned volumes.
ViPR virtual pools for block storage offer two options under High Availability: VPLEX local and
VPLEX distributed. When you specify local high availability for a virtual pool, the ViPR storage provisioning services create VPLEX local virtual volumes. If you specify VPLEX distributed high availability for a virtual pool, the ViPR storage provisioning services create VPLEX distributed virtual volumes.
To configure a VPLEX distributed virtual storage pool through ViPR:
1. Ensure a virtual array exists for both sites, with the relevant physical arrays associated with those virtual arrays. Each VPLEX cluster must be a member of the virtual array at its own site only.
2. Before creating a VPLEX high-availability virtual pool at the primary site, create a local pool at the secondary site. This is used as the target virtual pool when creating VPLEX distributed virtual volumes.
3. When creating the VPLEX high-availability virtual pool on the source site, select the source storage pool from the primary site, the remote virtual array, and the remote pool created in Step 2. This pool is used to create the remote mirror volume that makes up the remote leg of the VPLEX virtual volume.
Note: This pool is considered remote when creating the high availability pool because it belongs to VPLEX cluster 2 and we are creating the high availability pool from VPLEX cluster 1.
Figure 20 shows this configuration, where VPLEX High Availability Virtual Pool represents the VPLEX high-availability pool being created.
ViPR virtual arrays
Figure 20. Interactions between local and VPLEX distributed pools
As described in Site affinity for virtual machines, Federation Enterprise Hybrid Cloud workflows leverage the ‘winning’ site in a VPLEX configuration to determine which site to map VMs to. To enable active/active clusters, it is therefore necessary to create two sets of datastores: one set that will win on Site A and another set that will win on Site B. To enable this, you need to configure an environment similar to Figure 20 for Site A, and the inverse of it for Site B (where the local pool is on Site A, and the high availability pool is configured from Site B).
VPLEX uses consistency groups to maintain common settings on multiple LUNs. To create a VPLEX consistency group using ViPR, a ViPR consistency group must be specified when creating a new volume. ViPR consistency groups are used to control multi-LUN consistent snapshots and have a number of important rules associated with them when creating VPLEX distributed devices:
All volumes in any given ViPR consistency group must contain only LUNs from the same physical array. As a result of these considerations, the Federation Enterprise Hybrid Cloud STaaS workflows create a new consistency group per physical array, per vSphere cluster per site.
All VPLEX distributed devices in a given ViPR consistency group must have source and target backing LUNS from the same pair of arrays.
As a result of these two rules, it is a requirement of the Federation Enterprise Hybrid Cloud that an individual ViPR virtual pool is created for every physical array that provides physical pools for use in a VPLEX distributed configuration.
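The second rule above can be expressed as a simple check over a set of volumes. The following Python sketch is a hedged illustration, not part of the solution: the DistributedVolume type, its field names, and the function name are hypothetical, and no ViPR API is called.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class DistributedVolume(NamedTuple):
    """Hypothetical record describing one VPLEX distributed device."""
    name: str
    consistency_group: str
    source_array: str   # array backing the local leg
    target_array: str   # array backing the remote leg


def groups_violating_array_pair_rule(volumes: Iterable[DistributedVolume]) -> List[str]:
    """Return consistency groups whose volumes use more than one (source, target) array pair."""
    pairs_by_group = defaultdict(set)
    for vol in volumes:
        pairs_by_group[vol.consistency_group].add((vol.source_array, vol.target_array))
    return sorted(cg for cg, pairs in pairs_by_group.items() if len(pairs) > 1)


if __name__ == "__main__":
    volumes = [
        DistributedVolume("vol1", "cg1", "Array1", "Array3"),
        DistributedVolume("vol2", "cg1", "Array1", "Array3"),
        DistributedVolume("vol3", "cg2", "Array1", "Array3"),
        DistributedVolume("vol4", "cg2", "Array2", "Array3"),
    ]
    print(groups_violating_array_pair_rule(volumes))
```

In this example cg2 mixes two source arrays, which is the situation that the one-virtual-pool-per-physical-array requirement is intended to prevent.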
Federation Enterprise Hybrid Cloud STaaS workflows use the name of the ViPR virtual pool chosen as part of the naming for the vRealize Storage Reservation Policy (SRP) that the new datastore is added to. The Virtual Pool Collapser (VPC) function of Federation Enterprise Hybrid Cloud collapses the LUNs from multiple virtual pools into a single SRP.
The VPC function can be used in the scenario where multiple physical arrays provide physical storage pools of the same configuration or service level to VPLEX, but through different virtual pools, and where required to ensure that all LUNs provisioned across those physical pools are collapsed into the same SRP.
VPC can be enabled or disabled at a global Federation Enterprise Hybrid Cloud level. When enabled, the Federation Enterprise Hybrid Cloud STaaS workflows examine the naming convention of the virtual pool selected to determine which SRP it should add the datastore to. If the virtual pool has the string ‘_VPC-’ in it, then Federation Enterprise Hybrid Cloud knows that it should invoke VPC logic.
Virtual Pool Collapser example
Figure 21 shows an example of VPC in use. In this scenario, the administrator has enabled the VPC function and created two ViPR virtual pools:
GOLD_VPC-000001, which has physical pools from Array 1
GOLD_VPC-000002, which has physical pools from Array 2
When determining how to construct the SRP name to be used, the VPC function uses only the part of the virtual pool name that precedes ‘_VPC-’. In this example, that results in the term ‘GOLD’, which then contributes to the common SRP name of SITEA_GOLD_CA_Enabled. This makes it possible to conform to the rules of ViPR consistency groups while also providing a single SRP for all datastores of the same type, which maintains abstraction and balanced datastore usage at the vRealize layer.
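The name derivation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, the SRP pattern of site, pool term, and CA_Enabled suffix is inferred from the single example SITEA_GOLD_CA_Enabled, and the behavior when VPC is disabled (using the full virtual pool name) is assumed rather than documented.

```python
VPC_MARKER = "_VPC-"  # string that triggers VPC logic, as described in the text


def srp_name(site: str, virtual_pool: str, vpc_enabled: bool = True) -> str:
    """Build an SRP name from a site label and a ViPR virtual pool name (assumed pattern)."""
    base = virtual_pool
    if vpc_enabled and VPC_MARKER in virtual_pool:
        # Keep only the part of the pool name before the marker, e.g. "GOLD".
        base = virtual_pool.split(VPC_MARKER, 1)[0]
    # Assumed pattern "<site>_<base>_CA_Enabled", inferred from SITEA_GOLD_CA_Enabled.
    return f"{site}_{base}_CA_Enabled"


if __name__ == "__main__":
    for pool in ("GOLD_VPC-000001", "GOLD_VPC-000002"):
        print(pool, "->", srp_name("SITEA", pool))
```

Both pools from the example resolve to SITEA_GOLD_CA_Enabled, so datastores created from either array land in the same SRP.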
Figure 21. Virtual Pool Collapser example
In the example shown in Figure 21, all storage is configured to win on a single site (Site A). To enable true active/active vSphere Metro Storage clusters, additional pools should be configured in the opposite direction, as mentioned in Continuous availability storage considerations.
Note: For the DA release, you should only create a single virtual pool of type VPLEX distributed, and this pool should only contain physical pools from a single array. The GA release supports multiple VPLEX distributed pools, and provides a function that allows all LUNs provisioned from these pools to be consolidated into a single vRealize Automation storage reservation policy, if required.
VPLEX distributed storage is provisioned to the Workload vSphere clusters in the
environment using the Federation Enterprise Hybrid Cloud catalog item named Provision Cloud Storage.
As shown in Figure 20, these VPLEX volumes can be backed by VMAX, VNX, or XtremIO arrays.
Storage provisioning
Note: The Federation recommends that you follow the best practice guidelines when deploying any of the supported platform technologies. The Federation Enterprise Hybrid Cloud does not require any variation from these best practices.
The workflow interacts with both ViPR and vRealize Automation to create the storage, presents it to the chosen vSphere cluster, and adds the new volume to the relevant vRealize storage reservation policy.
As with the single-site topology, vSphere clusters are made eligible for storage provisioning by ‘tagging’ them with vRealize Automation custom properties. However, in this case they are defined as CA Enabled clusters, that is, they are part of a vMSC that spans both sites in the environment. This tagging is done during the installation and preparation of vSphere clusters for use by the Federation Enterprise Hybrid Cloud using the CA Cluster Onboarding
workflow provided as part of the Federation Enterprise Hybrid Cloud self-service catalog. As local-only vSphere clusters can also be present in CA topology, the Provision Cloud Storage
catalog item will automatically present only ViPR VPLEX distributed virtual storage pools to provision from when you attempt to provision to a CA-enabled vSphere cluster.
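The filtering behavior described above can be illustrated with a short Python sketch. The VirtualPool type, the high_availability labels, and the function name are hypothetical and chosen for illustration; the sketch covers only the CA-enabled case that the text describes and makes no claim about what is presented for local-only clusters.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class VirtualPool:
    """Hypothetical view of a ViPR virtual pool."""
    name: str
    high_availability: str  # assumed labels: "vplex_distributed", "vplex_local", "none"


def pools_for_ca_cluster(pools: Iterable[VirtualPool]) -> List[VirtualPool]:
    """Return only the VPLEX distributed pools, as offered for a CA-enabled cluster."""
    return [p for p in pools if p.high_availability == "vplex_distributed"]


if __name__ == "__main__":
    pools = [
        VirtualPool("GOLD_VPC-000001", "vplex_distributed"),
        VirtualPool("LOCAL_SILVER", "vplex_local"),
        VirtualPool("BRONZE", "none"),
    ]
    print([p.name for p in pools_for_ca_cluster(pools)])
```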
This model provides no resilience/recovery for the cloud management platform. To enable this you should use the CA dual-site/single vCenter variant.
As all of the management pods reside on vMSC, management components are recovered through vSphere HA mechanisms. Assuming the VPLEX Witness has been deployed in a third fault domain, this should happen automatically.
The primary option for backup in a dual-site/single vCenter topology is the Redundant Avamar/single vCenter configuration, although the Standard Avamar configuration may also be used if backup is only required on one of the two sites. Both options are described in Chapter 7.
This chapter presents the following topics:
Overview
Standard dual-site/dual vCenter topology
Disaster recovery dual-site/dual vCenter topology
Disaster recovery network considerations
vCenter Site Recovery Manager considerations
vRealize Automation considerations
Disaster recovery storage considerations
Recovery of cloud management platform
Best practices
Backup in dual-site/dual vCenter topology
This chapter describes networking and storage considerations for a dual-site/dual vCenter topology in the Federation Enterprise Hybrid Cloud solution.
The dual-site/dual vCenter Federation Enterprise Hybrid Cloud topology may be used in either of the following scenarios:
Two sites are present that require management via independent vCenter instances and a single Federation Enterprise Hybrid Cloud management stack/portal.
This model has no additional storage or network considerations above the single- site/single vCenter model because each site has totally independent infrastructure resources with independent vCenters, but is managed by the same Federation Enterprise Hybrid Cloud management platform/portal.
Note: In this case, the scope of the term site is at the users’ discretion. This can be separate individual geographical locations, or independent islands of infrastructure in the same geographical location, such as independent Vblock platforms.
DR is required. This topology also requires that EMC RecoverPoint is available. Note: Typically this model is used when the latency between the two physical data center locations exceeds 10 ms.
The standard dual-site/dual vCenter Federation Enterprise Hybrid Cloud architecture controls two sites, each with independent islands of infrastructure, each using its own vCenter instance but controlled by a single Federation Enterprise Hybrid Cloud management platform/portal.
This architecture provides a mechanism to extend an existing Federation Enterprise Hybrid Cloud by adding independent infrastructure resources to it, when resilience of the management platform itself is not required, but where the resources being added either already belong to an existing vCenter or it is desirable for them to do so. Figure 22 shows the architecture used for this topology option.
When to use the dual-site/dual vCenter topology
Figure 22. Federation Enterprise Hybrid Cloud standard dual-site/dual vCenter architecture
The DR dual-site/dual vCenter topology for the Federation Enterprise Hybrid Cloud solution provides protection and restart capability for workloads deployed to the cloud. Management and workload virtual machines are placed on storage protected by RecoverPoint and are managed from VMware vCenter Site Recovery Manager™.
This topology allows for multi-site resilience across two sites with DR protection for both the management platform and workload virtual machines on the surviving site. Figure 23 shows the overall architecture of the solution.
Figure 23. Federation Enterprise Hybrid Cloud DR dual-site/dual vCenter architecture
The Federation Enterprise Hybrid Cloud solution deploys a highly resilient and fault-tolerant network architecture for intra-site network, compute, and storage networking. To achieve this, it uses features such as redundant hardware components, multiple link aggregation technologies, dynamic routing protocols, and high availability deployment of logical networking components. The DR dual-site/dual vCenter topology of the Federation
Enterprise Hybrid Cloud solution requires network connectivity across two sites using WAN technologies. It maintains the resiliency of the Federation Enterprise Hybrid Cloud by implementing a similarly high-availability and fault tolerant network design with redundant links and dynamic routing protocols. The high-availability features of the solution, which can minimize downtime and service interruption, address any component-level failure within the site.
Throughput and latency requirements are other important aspects of physical network design. To determine these requirements, consider carefully both the size of the workload and data that must be replicated between sites and the requisite RPOs and RTOs for your applications. Traffic engineering and QoS capabilities can be used to guarantee the throughput and latency requirements of data replication.
The DR dual-site/dual vCenter topology is supported on both the distributed management model and the collapsed management model. In the collapsed management model, the Automation Pod components must be on a different physical network to the Core and NEI Pod components so that they can be failed over using VMware Site Recovery Manager, and the Automation network re-converged without affecting the Core and NEI Pod components on the source site.
In a DR dual-site/dual vCenter topology, all NSX Controller components are duplicated, with one full set residing on each site's corresponding NEI Pod. NSX best practice suggests that each controller be placed on a separate physical host. When NSX is the chosen networking technology, this solution uses a minimum of three NSX controllers.
When using the Federation Enterprise Hybrid Cloud Sizing tool, appropriate consideration should be given to the choice of server specification for the NEI Pod to ensure efficient use of hardware resources, given that a three-server minimum will be enforced.
VMware Anti-Affinity Rules should be used to ensure that the NSX controllers reside on different hosts in optimum conditions.
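The placement guidance above (at least three controllers, each on a different host) lends itself to a simple validation. The following Python sketch is a hypothetical check over an inventory mapping; the function name and the controller and host names are illustrative, and no NSX or vSphere API is used.

```python
from collections import Counter
from typing import Dict, List


def check_controller_placement(placement: Dict[str, str], minimum: int = 3) -> List[str]:
    """Return a list of problems for a controller-name -> host-name mapping."""
    problems = []
    if len(placement) < minimum:
        problems.append(f"found {len(placement)} controllers, expected at least {minimum}")
    # Anti-affinity: no host should run more than one controller.
    for host, count in sorted(Counter(placement.values()).items()):
        if count > 1:
            problems.append(f"{count} controllers share host {host}")
    return problems


if __name__ == "__main__":
    site_a = {"controller-1": "nei-host-1", "controller-2": "nei-host-1", "controller-3": "nei-host-2"}
    for problem in check_controller_placement(site_a):
        print(problem)
```

An empty result means the placement satisfies both conditions; in a DR dual-site/dual vCenter topology the same check would be run once per site, because each NEI Pod hosts its own full set of controllers.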
Figure 24 shows how the various NSX components are deployed independently on both sites within the topology.
Figure 24. NEI Pods from the cloud vCenter Server instances on Site A and Site B
NSX Manager
The Core Pods on Site A and Site B host distinct instances of NSX Manager. On each site, NSX Manager integrates with the vCenter Server instance on that site to provide network and security management for the site’s logical networking and security requirements.
NSX controllers
NSX controllers are deployed in a high-availability architecture on the NEI Pod at each site and are managed by the local NSX Manager. NSX controllers provide the learning and forwarding of data packets to support virtual machine communication. The deployment of NSX controllers helps to simplify the physical infrastructure and eliminates the need for multicast support in the physical network infrastructure to enable intra-VXLAN communication.
Perimeter NSX Edge
The Federation Enterprise Hybrid Cloud solution provides multitier security support and security policy enforcement by deploying NSX Edges as perimeter firewalls. An NSX Edge can be deployed at different tiers to support tiered security policy control. Each site's NSX Manager deploys corresponding NSX Edge Services Gateways (ESGs) configured for services such as firewall, DHCP, NAT, VPN, and SSL-VPN.
Logical switches
NSX provides logical networking support through logical switches corresponding to VXLAN segments. These logical switches support the extension of Layer 2 connections between various virtual machines and other networking components such as NSX Edges and logical routers. The use of VXLAN also increases the scalability of the solution.
For the DR topology of the Federation Enterprise Hybrid Cloud solution, transit logical switches are required on both sites to provide connections between the DLRs and NSX Edges, as shown in Figure 25 and Figure 26. Duplicate logical switches are also needed on both sites for use by the workload virtual machines.
Figure 26. Logical switches on Site B
Distributed logical router
Three-tier applications are the most commonly deployed model in enterprises and can be used to demonstrate the network and security provisioning capabilities of NSX when integrated with vRealize Automation.
The web tier is external facing and load balanced, serving web pages to users. Each web server needs to communicate with the application server; the application server in turn writes to and retrieves data from the database server.
Where vRealize Automation provisions multimachine workloads to networks not created by vRealize Automation (that is, to a pre-provisioned deployment), the networks and router must be created before vRealize Automation can provision the virtual machines. The network adapters of the deployed virtual machines are connected to their respective DLR and an IP address is assigned using either Dynamic Host Configuration Protocol (DHCP) or, as in this solution, a static IP address.
The DLR provides the default gateway services for the virtual machines connected to the pre-provisioned application networks. The use of DLR optimizes the traffic flow and throughput for communication between the virtual machines of the multitier applications. Using the transit logical switch segment, the DLR provides a routed path to the ESG and thereby to the physical core for north/south traffic.
The DLR control virtual machine is deployed on the NEI Pod in high-availability mode. In this mode, two virtual machines are deployed on separate hosts as an active/passive pair. The active/passive pair maintains state tables and verifies each other's availability through heartbeats. When a failure of the active DLR is detected, the passive DLR immediately takes over and maintains the connection state and workload availability.
A DLR kernel module is deployed to each NSX-enabled Workload Pod host to provide east/west traffic capability.
To provide default gateway services on both sites, a corresponding DLR must be deployed on both sites, as shown in Figure 27.
Figure 27. DLR interfaces on Site A and Site B
The Federation Enterprise Hybrid Cloud solution supports migration of virtual machines to the recovery site without the need to change the IP addresses of the virtual machines. Default gateways on each site are created using DLRs. By configuring the DLRs on both sites identically, the same IP addresses and IP subnets are assigned to their corresponding network interfaces, as shown in Figure 27. In this way, there is no need to reconfigure the workloads' default gateway settings in a recovery scenario.
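Because this approach depends on the DLR interfaces being identical on both sites, a comparison of the two configurations is a useful sanity check. The following Python sketch assumes the interface addresses have already been exported into plain dictionaries; the function name, interface names, and addresses are hypothetical examples, and only the standard library ipaddress module is used.

```python
import ipaddress
from typing import Dict, List


def dlr_interface_mismatches(site_a: Dict[str, str], site_b: Dict[str, str]) -> List[str]:
    """Return interface names whose address/prefix differs or is missing on one site."""
    mismatched = []
    for name in sorted(set(site_a) | set(site_b)):
        addr_a = site_a.get(name)
        addr_b = site_b.get(name)
        if addr_a is None or addr_b is None:
            mismatched.append(name)  # interface defined on one site only
        elif ipaddress.ip_interface(addr_a) != ipaddress.ip_interface(addr_b):
            mismatched.append(name)  # gateway address or subnet differs
    return mismatched


if __name__ == "__main__":
    dlr_site_a = {"web": "192.168.10.1/24", "app": "192.168.20.1/24", "db": "192.168.30.1/24"}
    dlr_site_b = {"web": "192.168.10.1/24", "app": "192.168.20.254/24"}
    print(dlr_interface_mismatches(dlr_site_a, dlr_site_b))
```

An empty list indicates that workloads recovered on the other site would find the same default gateway address on every application network.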
A dynamic routing protocol is configured for the logical networking and is integrated with the physical networking to support dynamic network convergence and IP mobility for the
networks (subnets) supported for DR. This approach simplifies the solution and eliminates the need to deploy additional services to support IP address changes.
A prefix configured on the DLR specifies the subnets of directly connected public networks. The DLR can also support private networks; these networks are reachable only within the DLR, with access prohibited from outside the DLR networks.