VMware vSphere Validated Reference Architecture for ThinkServer



Lenovo rack servers, Extreme Networks switches, and Dot Hill storage

Lenovo Enterprise Product Group

Version 1.0 May 2013


LENOVO PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. This information could include technical inaccuracies or typographical errors. Changes may be made to the information herein; these changes will be incorporated in new editions of the publication. Lenovo may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

The following terms are trademarks of Lenovo in the United States, other countries, or both: Lenovo, ThinkServer.

Intel and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. VMware, vMotion, vFabric, VMware vCloud, vCloud Director, vCenter, vSphere, Site Recovery Manager, and ESXi are registered trademarks or trademarks of VMware, Inc. in the United States and other jurisdictions.

AssuredSAN, AssuredCopy, AssuredSnap, AssuredRemote, and RAIDar are trademarks of Dot Hill Systems Corp. in the United States and/or other countries.

ExtremeXOS®, Summit® X670, Summit® X670V, and Summit® X440 are trademarks of Extreme Networks, Inc. in the United States and/or other countries.


List of Figures

Figure 1 – The Cloud Solution Stack

Figure 2 – Two-Tiered Network Architectures

Figure 3 – Three-Tiered Network Architectures

Figure 4 – Integrated Cloud Stack

Figure 5 – Features of VMware vSphere Platform with vCenter Operations Management

Figure 6 – Logical Architecture

Figure 7 – Network Topology

Figure 8 – RD630 Port Layout

Figure 9 – RD330 Port Layout

Figure 10 – AssuredSAN Port Layout

Figure 11 – ISC Ports

Figure 12 – VLAN Configuration Commands

Figure 13 – Jumbo Frames and MLAG Configuration Commands

Figure 14 – QoS Configuration Commands

Figure 15 – Storage Layout

Figure 16 – Distributed Virtual Switch Topology

Figure 17 – Resource Pools

Figure 18 – vCenter Operations Manager Dashboard

Figure 19 – IT Operations Manager Dashboard

Figure 20 – AssuredSAN RAIDar Storage Management Utility Console

List of Tables

Table 1 – Server Configurations

Table 2 – RD630 Connections

Table 3 – RD330 Connections

Table 4 – AssuredSAN Network Connections

Table 5 – Network Traffic Segmentation

Table 6 – Cluster1 Properties

Table 7 – Distributed Virtual Switch Virtual Port Groups


1.0 Introduction

1.1 Solution Overview

Server virtualization is a mature, well-understood technology that can lead to a more efficient and cost-effective data center, and it is an organization’s first step on the journey to a private cloud. An organization may begin by using virtualization to support server consolidation, which increases hardware resource utilization and helps to reduce capital expenditure (CAPEX) and related facilities costs (OPEX).

Virtualization provides many benefits and can make it faster and easier to deliver IT capabilities to users. However, it can also create new complexities, which puts more pressure (and opportunity) on IT organizations to use it efficiently. Large virtualized environments can introduce a number of business and technology issues, such as the proliferation of virtual machines, the resulting management burden, and the effects on networks.

Business needs will also drive the IT infrastructure to become more agile, able to respond to changes in demand for resources, more-rapid provisioning, greater scalability, and assured availability of those resources. As a result, IT organizations will always be pressured to enable greater business agility, while becoming more efficient.

The use of virtualization necessarily changes how IT is acquired, managed, and used. Many of the technical and business challenges that come from large-scale virtualization projects can be met with cloud technologies.

Cloud computing is the on-demand use of a shared pool of scalable and elastic computing resources (hardware and software) delivered as a service over a network. These resources are expected to be rapidly provisioned and released on demand, perhaps tied to a particular service model (e.g., PaaS, IaaS, or SaaS), with minimal IT management effort or interaction. Cloud technologies provide the automation, management, and orchestration for virtualized environments, and make provision for new business models for their use. Embracing the cloud is a logical but significant step in data center evolution, even for the most mature IT organizations. Many of the technologies used in cloud deployments are still evolving and can often require custom tools. The journey to a more capable, flexible, agile data center can be made through deliberate, incremental steps that make progress towards a longer-term strategy of a private cloud. A roadmap must exist to ensure that investments made along the way build upon each other and are not wasted.

A cloud, however, always starts with a well-managed, virtual infrastructure. The best designed and delivered virtualization platform today is also the cloud ready data center of the future and


What we seek is a virtualization solution that provides a best-of-breed, shared virtual infrastructure, automates the creation and provisioning of the virtual environment, and manages that environment with as little human effort or involvement as possible. This enables an organization to take full advantage of the benefits of virtualization while also providing the cloud-ready foundation for the cloud management and operations capabilities that can be added later when the business is ready. The solution must also protect cloud-ready investments already made, and leverage them as an organization’s journey to the cloud progresses.

Figure 1 – The Cloud Solution Stack

Lenovo has partnered with VMware to leverage the virtualization and management capabilities of VMware’s vCloud Suite of products to develop a reference architecture that demonstrates the solution objectives, shows how the solution can integrate into existing infrastructures, and scales to meet expansion needs. VMware's vision for the data center is synchronized with the market trends and directions for private cloud computing. VMware is ideally suited when a proven solution is important, uptime is critical, large-scale automation is valued, and cloud computing is desired in the future.

Specifically, this reference architecture delivers on the elements required for the Cloud Ready Infrastructure as shown in Figure 1, and provides:

• A virtualized, shared environment

• Automation and management to reduce complexity and simplify operations and maintenance

• A solution with dramatically lower costs, enabling investment to be redirected into other value-add opportunities

• A building block approach that enables flexible, agile IT service delivery to meet and anticipate needs of the business

The following sections describe the solution in detail.

1.2 Document Scope and Purpose

This document provides a reference architecture and guidance for deploying and configuring VMware virtualization solutions using Lenovo third generation ThinkServer servers, Extreme Networks switches, and Dot Hill storage products. The purpose is to demonstrate the functionality, flexibility, and scalability of a virtual infrastructure enabled by VMware vSphere. The document describes best practices and design considerations for the solution based on the components selected, and can be used to plan, evaluate, or procure the components needed to replicate and extend the solution in a customer location.

Our solution recommendations are based on knowledge of data center technology trends, as well as issues that confront businesses operating their own data centers, but are not intended to be a comprehensive guide to every aspect of the solution.

1.3 Audience

This document is prepared for the benefit of IT administrators, IT managers, and channel partners who plan to evaluate, plan, or deploy Lenovo virtualization solutions.

2.0 Solution Objectives

2.1 Principles

To fulfill the vision for the cloud ready data center of the future, the solution should have the following characteristics:

Optimized Virtual Resources – All compute, storage, and networking resources must together be made available as a common pool of resources shared by many applications. Effective use of these pooled resources dictates that they can be dynamically configured, or reallocated to match any changing performance, throughput, or capacity needs of the individual applications. Resource optimization drives efficiencies and infrastructure cost savings, but also provides the foundation that enables IT to be more flexible when responding to business requests, ultimately improving business speed and agility.

Resilient – A key principle of the data center is to provide highly available services. Using new technologies and methods, however, high availability can be achieved at a lower cost through resiliency rather than component redundancy. High availability is traditionally achieved by


prevention. It is expected and accepted that components will fail, and instead, minimizes the impact of a failure by rapidly restoring the application when a failure occurs. Virtualization technologies, real-time fault detection, and automated responses to issues, move workloads off the failing components, often without any perceived impact. Resiliency moves the responsibility for high-availability from hardware to software. This reduces the cost for physical redundancy for hardware and facilities, and increases application availability by reducing the impact of component and system failures.

Automated – The normal operations of the data center should have as little human involvement as possible. First, resiliency cannot be achieved without automatic detection of, and response to, failure conditions. Operational management tasks must happen without intervention to maintain high availability. In addition, requests for more IT resources or a new application can be automated so that provisioning and configuration are handled based on the needs of the application. Automation can also elastically add or reduce capacity as workloads require. Automation increases the productivity of IT administrators, and simplifies management by performing tasks that, in legacy environments, may have been done by multiple administrators each operating on their individual piece of the solution stack. Finally, automation is a foundational element critical to cloud infrastructures, where even more details of the environment will necessarily be handled automatically (e.g., billing, metering, and chargeback for consumption).

Open – An infrastructure based on open, industry standard technologies ensures that previous investments made in existing IT infrastructure can be protected. Industry standards enable the incremental adoption of new technologies, and the addition of new components for growth, only as business needs dictate. Openness also enables integration into heterogeneous environments without fear of vendor lock-in. The products used in this reference architecture are built on open standards so that solutions developed with them will work with existing infrastructures and provide investment protection for the future.

Ease of Deployment – Hardware and software elements of the solution must be easy to install and configure. Management tools and best practices simplify and automate many of the tasks involved with deployment, and reduce the incidence of human error, which is a significant cause of downtime in a data center.

Scalable – The solution should scale to increase performance and capacity simply by adding components. New components should be discovered and provisioned automatically, and added to the shared resource pools. In addition, scale should be achieved with a “pay as you grow” approach that ensures investments are only made as needed, avoiding large capital expenditures in advance for capabilities that cannot yet be used.


3.0 Architecture

3.1 Architecture Overview

This section describes some of the key architectural decisions made, and how the architecture meets the business challenges and design goals described earlier.

Resiliency

Resiliency relies on VMware technologies and management policies that provide high availability and fault tolerance for the compute, storage, and network resources in the environment. If a potential mission-critical point of failure cannot be recovered in software, then redundancy is used to ensure there is no single point of failure.

Network Architecture

The reference architecture leverages a converged network topology in which server data and storage data share a common infrastructure. A network of this design offers simplified management, scalability, and significant cost savings, with essentially the same, if not better, performance as physically separate data and storage networks.

In addition, virtualization changes data traffic patterns where data no longer moves primarily between servers inside, and clients outside of the data center, but instead moves horizontally between virtual machines within a server, between servers and storage within a rack, and across racks within the data center network. To support low latency, high bandwidth applications in virtual environments with converged data and storage networks, the reference architecture uses a flat, layer 2 network with layer 3 routing in the core network. This network architecture creates a scalable fabric, and provides the greatest flexibility and support for VMware vMotion.


This architecture consists of two kinds of switches: “access” or “Top of Rack” (ToR) switches that connect servers and storage, and “core” switches that connect the ToR switches, creating a non-blocking, low-latency fabric (see Figure 2).

Figure 2 – Two-Tiered Network Architectures

In comparison, a 3-tier architecture adds an additional layer of switches. It includes ToR switches that connect servers and storage, just as in a 2-tier design, but these connect upstream to distribution switches. In the 3-tier architecture, the distribution switches are in turn connected to core switches that forward traffic from servers to the intranet and internet, and between core and distribution switches (see Figure 3).


A 2-tiered network architecture reduces complexity and simplifies design, improves performance, and reduces latency in virtualized environments. It also requires fewer switches, providing an associated savings in capital cost and operational expense.

These network architecture decisions require close attention to Ethernet bandwidth, switch performance, and network routing. The reference architecture employs 10 GbE with jumbo frames to carry both storage and data traffic. 10 GbE also ensures that adequate bandwidth is available for burst traffic, vMotion, Storage vMotion, and the logging traffic between primary and secondary VMs when using VMware Fault Tolerance (a VMware best practice). The Ethernet fabric also requires lossless Ethernet to assure the integrity of storage transport between the servers and storage.
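The effect of jumbo frames on a 10 GbE link can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes untagged Ethernet frames, IPv4 and TCP headers without options, and a jumbo MTU of 9000 bytes, which is a commonly used value and not one stated in this document. It ignores iSCSI PDU headers.

```python
# Per-frame overhead on the wire, in bytes (standard untagged Ethernet values).
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, MAC header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 and TCP headers, no options (assumption)


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire bandwidth that carries TCP payload for a given MTU."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(mtu: int, link_gbps: float = 10.0) -> float:
    """Full-size frames per second needed to saturate a link of link_gbps."""
    return link_gbps * 1e9 / 8 / (mtu + ETH_OVERHEAD)


# 1500 is the standard Ethernet MTU; 9000 is an assumed jumbo MTU.
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
          f"{frames_per_second(mtu):,.0f} frames/s at 10 Gb/s")
```

Under these assumptions the gain in payload efficiency is a few percent, while the number of frames the adapters and switches must process per second falls by roughly a factor of six, which is where most of the benefit for storage traffic tends to come from.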

Storage Architecture

Several shared storage technologies are available for use in data center solutions, such as Network Attached Storage (NAS), iSCSI SAN, or Fibre Channel SAN. This reference architecture employs iSCSI, an Internet Protocol (IP)-based storage area networking (SAN) standard. iSCSI provides the required functionality and performance at a lower cost than most alternatives.

IP based storage, iSCSI in particular, avoids the cost, additional complexity, and compatibility issues associated with Fibre Channel SANs. iSCSI is also routable, and can be run over long distances using existing network infrastructure, which improves flexibility and simplifies scale-out issues.

10 Gb/s Ethernet fabric makes iSCSI an even more viable storage solution for converged networks, with performance comparable to a Fibre Channel SAN operating at 8 Gb/s. On the server side, high performance adapter cards can fully offload the protocol management from the server CPU to an iSCSI HBA or Network Interface Card. iSCSI also supports centralized boot management, which is useful when deploying large numbers of servers into an IT environment. iSCSI operation does not require, but can be enhanced by the use of the standards-based extensions to Ethernet called Data Center Bridging (DCB). DCB provides QoS and bandwidth management capabilities that help enable the convergence of data and storage traffic onto a single unified fabric.

iSCSI storage is a great choice for greenfield deployments where it can be employed as a dedicated storage resource, or it can be integrated into existing infrastructures using Ethernet networks already in place.


Scale

The reference architecture is built using modular components that are based on open standards. This approach allows capacity to be added easily with additional compute, network, or storage components.

A simple configuration could be built with no shared storage and few compute servers, but the reference architecture begins with sufficient resources to support the resiliency requirements.

Integration into Existing Environments

While the reference architecture could be deployed as a greenfield installation, existing infrastructure is likely already in place. This solution can be integrated into existing 2-tier or 3-tier Ethernet networks. Existing management tools can be used and are compatible with the standards based hardware used.

Other Assumptions

The architecture assumes that the core network and routing are in place. Infrastructure services such as Directory Services, Domain Name Services (DNS), etc., are in place and available in the core network.

3.2 Solution Components

3.2.1 Compute – ThinkServers

Servers for virtualization require choosing the right combination of features to support performance and power efficiency, at the right price.

The Lenovo ThinkServer RD630 and RD330 mainstream rack servers provide powerful new choices to meet demanding enterprise needs, while offering outstanding value.

ThinkServer RD630

The Lenovo ThinkServer RD630 rack server is chosen for the virtualization hosts for its high performance for demanding workloads, advanced I/O capabilities, storage and networking options, systems and power management capabilities, reliability, and power efficiency.

The Lenovo ThinkServer RD630 is a 2U server that supports two Intel Xeon E5-2600 family processors with up to eight cores each, and 20 memory DIMMs to provide maximum performance and scaling for enterprise virtualization, databases, and compute intensive applications.


Additional storage, I/O and networking options are available to configure the server.

ThinkServer RD330

The Lenovo ThinkServer RD330 rack server is used to host the solution’s management components. It provides high performance, scalable I/O and storage options, systems and power management capabilities, reliability, and power efficiency.

The Lenovo ThinkServer RD330 provides an outstanding value in a 1U server, with a rich set of features delivered at an attractive price. Supporting two of the latest generation Intel Xeon E5-2400 processors with up to 8 cores each, and 12 memory DIMMs, the ThinkServer RD330 provides enterprise-grade performance for virtualization.

Flexible storage and networking configurations enable the server to be deployed in many environments, and scale to demand as needed.

Common to all ThinkServers are thoughtful product designs that deliver the world-class power efficiency and reliability demanded in virtualized infrastructures.

The power-efficient ThinkServers include Lenovo ThinkServer Smart Grid Technology, based on Intel Power Node Manager® technology. Smart Grid provides sophisticated, policy-based, dynamic power capping tools that help monitor power utilization and intelligently adapt it to changing workloads, for a single server or a group of servers, via a central console. This power management technology helps to reduce energy consumption and infrastructure costs. Additionally, the ThinkServer RD630 is Energy Star 1.1 certified and Climate Savers certified. A diagnostic panel on the front of each server displays system health and status at a glance, and specifically identifies any failing components.

Finally, ThinkServers are backed by a suite of lifecycle tools to ease and simplify deployment, monitoring, maintenance, and power optimization. Systems management based on open standards and protocols means easy integration into any existing environment.

3.2.2 Server Specifications

Table 1 provides quantities and specific configurations for the servers used in the reference architecture. A minimum of three servers is required by VMware for VMware High Availability, and this provides a minimum of n+1+1 redundancy (one additional host is available in the cluster to maintain HA whenever operational maintenance is performed, effectively removing the host


Modification of the solution to meet specific needs can be accomplished by adjusting the server configurations, or by adding more servers as needed. The factors most likely to be modified to scale the solution include:

• Scale out CPU and memory capacity by adding VMware ESXi hosts

• Support more, or larger, VMs by increasing the amount of memory in each VMware ESXi host

• Increase processing capacity for VMs by raising the performance and power rating of the processors in each VMware ESXi host

• Increase network throughput by increasing the number of NIC ports, or the bandwidth of the ports, in each VMware ESXi host

Table 1 – Server Configurations

Component | Management Server | ESXi Host Servers
Platform | ThinkServer RD330 | ThinkServer RD630
Quantity | 1 | 3
Operating System | ESXi 5.1 | ESXi 5.1
CPU | 2x Xeon E5-2420 | 2x Xeon E5-2665
Memory | 48 GB 1600 MHz DDR3 | 64 GB 1600 MHz DDR3
RAID Controller | RAID 700 | RAID 700
HDD / RAID Level | 2x 600GB SAS RAID 1 | 2x 500GB SATA RAID 1
Network Controllers | 2x Intel I350 onboard, Intel 32574L onboard | 2x Intel X540T2, 2x Intel I350 onboard, Intel 32574L onboard
Platform Management | TMM Premium | TMM Premium
Power Supplies | 2x 550W PSUs | 2x 800W PSUs
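The n+1+1 sizing described for the ESXi hosts can be checked with a short calculation. The following is a minimal sketch, not a VMware sizing tool: the host count, memory, and core count come from Table 1 (the Xeon E5-2665 has eight cores per socket), while the `Cluster` type and `usable` function are hypothetical names introduced here, and hypervisor overhead is ignored.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    hosts: int             # ESXi hosts in the cluster
    mem_gb_per_host: int   # physical memory per host
    cores_per_host: int    # physical cores per host


def usable(cluster: Cluster, failures: int = 1, maintenance: int = 1) -> dict:
    """Capacity left with `failures` hosts down and `maintenance` hosts drained."""
    active = cluster.hosts - failures - maintenance
    if active < 1:
        raise ValueError("no hosts left to run workloads")
    return {
        "active_hosts": active,
        "mem_gb": active * cluster.mem_gb_per_host,
        "cores": active * cluster.cores_per_host,
    }


# Table 1 values: 3 hosts, 64 GB each, 2x Xeon E5-2665 (2 x 8 cores).
reference = Cluster(hosts=3, mem_gb_per_host=64, cores_per_host=16)
print(usable(reference))
```

With the three hosts in Table 1, workloads sized to remain available under n+1+1 must fit within a single host's resources. Each added host raises that guaranteed capacity by one host's worth.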

3.2.3 Network – Extreme Networks Switches

In the converged, 2-tier network architecture, Ethernet switch performance is of extreme importance. Switches must provide low latency, high throughput, and high density while minimizing power consumption. To optimize for converged data and storage, support for QoS, and Data Center Bridging (DCB) protocols including Priority-based Flow Control (PFC or IEEE 802.1Qbb) should also be available.

Network resiliency should be as efficient as possible, avoiding bandwidth limitations by routing network traffic around bottlenecks, reducing the risks of a single point of failure, and allowing load balancing across multiple switches.

The network solution must be interoperable with existing 2-tier or 3-tier architectures without requiring a different operating methodology, or requiring a “forklift upgrade.”


Additionally, as the data center evolves with new technologies or cloud infrastructures, the network components should support these trends. Switches that support 10 Gb/s, and 40 Gb/s protect investments made today.

The network should also allow for the adoption of emerging protocols being developed for cloud infrastructures, such as TRILL and SPB-M, and of technologies such as Software Defined Networking (SDN), which makes the network programmable and enables a high degree of automation in provisioning network services.

There are two types of switches used in the reference architecture. The Summit X670 is a 48-port 10 GbE switch used for connecting compute, storage, and upstream connectivity to the core network. The Summit X440 is a 48-port 1 GbE switch used for management of the virtualized environment.

Summit X440

The Summit X440 series switches provide high bandwidth, non-blocking Layer 2 to Layer 4 functionality on 8, 24, or 48 Ethernet ports delivering high-density, low latency Gigabit Ethernet connectivity using fixed 10/100/1000BASE-T ports. Additional port types are also supported on various models. The X440 provides redundant power supplies, and comprehensive security features, in a cost effective, small access switch. All Summit X440 models can be stacked with up to eight units in a stack.

Summit X670

The Summit X670 series switches are purpose-built top-of-rack switches designed to support 10 Gigabit Ethernet networks with optional, future-proofing 40 GbE uplinks. The Summit X670 series provides high density Layer 2/3 switching with low latency cut-through switching, and IPv4 and IPv6 unicast and multicast routing to enable enterprise aggregation and core backbone deployment.

The Summit X670 series is available in two models – Summit X670V and Summit X670. Summit X670V, used in the reference architecture, supports up to 64 ports in one system and 448 ports in a stacked system using high-speed SummitStack-V160, which provides 160 Gb/s throughput and distributed forwarding.

The switch can optionally support four additional 40 GbE QSFP+ ports. These can serve as four dedicated 40 GbE uplinks, or each port can be independently configured as one 40 GbE link or four 10 GbE links.

Summit X670 series switches also have specific features to support virtualized data centers, including support for M-LAG, Data Center Bridging features including PFC, Enhanced


Common to all Extreme switches is the ExtremeXOS modular operating system. The high availability ExtremeXOS operating system provides simplicity and ease of operation through the use of one OS everywhere in the network.

The switches are also energy efficient and include highly efficient power supplies to reduce power consumption and heat in the data center.

3.2.4 Network Specifications

Data Network

Two Summit X670V-48t switches with 40 Gb/s VIM4 expansion modules installed provide the converged network connectivity. Each switch provides forty-four 10GBASE-T copper ports for server connectivity, and four SFP+ ports with 10GBASE-SR SFP+ optical modules for storage connectivity. Each switch also has four 40 Gb/s QSFP ports, two of which are aggregated to form an 80 Gb/s Inter-Switch Connection (ISC) for redundancy. The remaining two 40 Gb/s ports are available either to increase the ISC’s capacity or to provide external connectivity to the core network.

In this configuration, each switch pair accommodates up to 22 servers and 2 storage appliances. This configuration provides 2n redundancy for the server and storage traffic.
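The server and storage counts above follow from the per-switch port counts. The sketch below reproduces that arithmetic; the port counts are taken from the text, but the cabling is an assumption made for illustration: each host's four 10 GbE ports (two dual-port adapters in Table 1) and each array's four 10 Gb/s ports are taken to be split evenly across the two switches. The function name is hypothetical.

```python
def switch_pair_budget(
    copper_ports: int = 44,             # 10GBASE-T ports per switch (from the text)
    sfp_ports: int = 4,                 # SFP+ ports per switch (from the text)
    qsfp_ports: int = 4,                # 40 Gb/s QSFP ports per switch (from the text)
    isc_links: int = 2,                 # QSFP ports aggregated into the ISC (from the text)
    server_links_per_switch: int = 2,   # assumption: 4 host ports split over 2 switches
    storage_links_per_switch: int = 2,  # assumption: 4 array ports split over 2 switches
) -> dict:
    """Devices a redundant switch pair can attach, and what remains for uplinks."""
    return {
        "max_servers": copper_ports // server_links_per_switch,
        "max_storage_arrays": sfp_ports // storage_links_per_switch,
        "isc_gbps": isc_links * 40,
        "spare_qsfp_ports": qsfp_ports - isc_links,
    }


print(switch_pair_budget())
```

Changing `server_links_per_switch` shows how the server count per switch pair shifts if hosts are cabled with more or fewer 10 GbE links.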

Scaling the solution to accommodate more compute servers or storage devices is accomplished by simply adding more Ethernet ports (additional switches) when required. Switches must be added in pairs in order to provide the required redundancy. The two “unused” 40 GbE ports can be used to interconnect the pairs or aggregate to another pair.

Management Network

A separate management network is provided by a Summit X440 switch with 48 10/100/1000BASE-T ports. Each server and storage appliance connects to one port on the switch. This is a less expensive alternative where performance requirements are not as great. In addition, the management network is not mission critical, so redundancy is not required, but a redundant switch could be added if desired.

3.2.5 Storage – Dot Hill Storage

Storage is a critical component of the virtualization infrastructure. High levels of performance and availability are required to maintain the SLAs of applications running in the virtual machines. The storage platform should be able to deliver high performance while using its capacity efficiently and scaling easily.

Storage must also be modular and scalable to enable purchasing only what is needed, when it is needed. It should be possible to add drives to increase capacity or performance without disrupting operation.


The storage subsystem must also expose no single point of failure, and failed components must be replaceable without interruption of service.

The Dot Hill AssuredSAN family of products is designed for high availability computing environments, with redundant, active/active processing components, RAID protected storage, multiple data paths, and built-in snapshot and replication features. In addition, AssuredSAN systems offer an assortment of drive and interface options, providing custom solutions to a wide variety of applications. Finally, the AssuredSAN storage system is architected for speed. Every system is optimized for high performance delivery of data regardless of the workload.

The AssuredSAN 3420 supports 24 small form-factor drives (SSD, SAS and SATA), and up to seven expansion units per system. The RAID controllers support RAID 0, 1, 3, 5, 6, 10, and 50. Redundant hot-swap components (power supplies, controller modules, and hard disk drives) allow failed components to be replaced without interruption of service.

Network connectivity for management and monitoring is provided through a 100 Mb/s Ethernet interface on each controller, thereby providing a redundant connection.

The AssuredSAN is ALUA compliant and compatible with standard VMware multi-pathing drivers. Advanced data protection features include array-to-array replication and snapshot solutions.

3.2.6 Storage Specifications

The reference architecture uses a Dot Hill AssuredSAN model 3420 populated with twenty-four 600 GB 10K RPM drives.
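Raw and usable capacity for this drive population can be estimated with the standard RAID capacity rules. The sketch below is a rough planning aid under stated assumptions: the drive count and size come from the text, capacities are in decimal gigabytes, and the two disk group layouts shown are hypothetical examples rather than the layout used in the reference architecture. The `data_drives` helper is a name introduced here.

```python
def data_drives(level: str, drives: int, groups: int = 1) -> int:
    """Number of drives' worth of usable capacity in a RAID set of equal drives."""
    if level == "0":
        return drives
    if level in ("1", "10"):
        return drives // 2        # mirrored pairs
    if level in ("3", "5"):
        return drives - 1         # one drive's worth of parity
    if level == "6":
        return drives - 2         # two drives' worth of parity
    if level == "50":
        return drives - groups    # one parity drive per RAID 5 sub-group
    raise ValueError(f"unsupported RAID level: {level}")


DRIVE_GB = 600   # from the text
DRIVES = 24      # from the text

raw_gb = DRIVES * DRIVE_GB
# Hypothetical layout: two 12-drive RAID 6 groups, no hot spares.
raid6_gb = 2 * data_drives("6", 12) * DRIVE_GB
# Hypothetical alternative: one 24-drive RAID 10 group.
raid10_gb = data_drives("10", 24) * DRIVE_GB

print(f"raw: {raw_gb} GB, 2x RAID 6: {raid6_gb} GB, RAID 10: {raid10_gb} GB")
```

Reserving hot spares or formatting overhead would reduce these figures further; neither is modeled here.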

Four 10 Gb/s Ethernet ports are provided for connection to the network. These are divided between the two fully redundant iSCSI controller modules that operate in active-active mode, and provide continuous data service in the event of a controller failure, or the loss of any data path.

The array can be expanded to enhance performance or increase capacity by connecting one or more expansion chassis to the main chassis. Expansion chassis can be attached without service disruption. This expansion does not require additional connections to the iSCSI storage network, and the new resources are presented to VMware through the existing iSCSI targets. Up to seven expansion chassis can be attached to a single RAID chassis.

This kind of expansion works well when the performance of the storage subsystem is adequate (i.e., adequate IOPS, sufficiently low command response times). However, if the load is approaching the limitation of the pair of RAID controller modules, performance can be increased by attaching a new RAID chassis to the storage network. Its 10 Gb/s ports will provide additional


3.2.7 VMware Virtualization Software

VMware vSphere is the industry-leading virtualization platform for building cloud infrastructures. vSphere enables IT organizations to meet SLAs for the most demanding business-critical applications, at the lowest TCO.

VMware now offers vSphere with vCenter Operations Management, combining the virtualization platform with VMware’s award winning management capabilities. This new solution enables vSphere customers to gain operational insight for improved availability and performance while also optimizing capacity.

The vSphere solution is the key enabler for cloud computing architectures. As an enterprise transitions to cloud business models for operating the data center, additional VMware components can be installed from the vCloud Suite. vCloud Suite is a new offering comprised of several components that form the complete, integrated cloud infrastructure. As shown in Figure 4, the vCloud Suite is based on vSphere Enterprise Plus, and adds the following components:

• vCloud Director & vCloud Connector for Infrastructure-as-a-Service and Hybrid Cloud Connectivity
• vCloud Networking & Security for Cloud networking and security
• vCenter Operations Management Suite for Automated Operations Management
• vFabric Application Director for Cloud-enabled Application Provisioning


The reference architecture uses vSphere Enterprise Plus with vCenter Operations Manager to provide the capabilities required of the solution, and to provide the foundation for cloud ready infrastructure. Figure 5 depicts the features available with this software, and a brief description of key features exploited in the architecture is provided below.

Figure 5 – Features of VMware vSphere Platform with vCenter Operations Management

VMware ESXi is the virtualization software that runs on the physical servers and abstracts processor, memory, networking, storage, and other compute resources into multiple virtual machines. ESXi is installed on the server’s local hard drive, or boots from remote storage.

VMware vCenter Server is the central point for configuring, provisioning, and managing the virtualized IT environment. The reference architecture uses the Enterprise Plus license level.

vCenter Operations Manager – vCenter Operations Manager is the key component of the vCenter Operations Management Suite, and provides comprehensive visibility and insights into the performance, capacity, efficiency, and health of your virtualized infrastructure.


enables virtual machines to be dynamically reallocated without application downtime when performing planned server maintenance.

Migration with Storage vMotion allows a running virtual machine's files or storage to be moved from one datastore to another without any interruption in the availability of the virtual machine. This allows administrators, for example, to off-load virtual machines from one storage array to another to optimize performance, perform maintenance, reconfigure LUNs, resolve out-of-space issues, and upgrade VMFS volumes.

VMware Distributed Resource Scheduler (DRS) – DRS helps manage a cluster of physical hosts as a single compute resource, and automatically allocates and load balances computing capacity across the cluster for virtual machines. When a virtual machine is assigned to a cluster, DRS finds an appropriate host on which to run the virtual machine, and ensures that the load across the cluster is balanced. As cluster conditions change (for example, load and available resources), DRS migrates (using vMotion) virtual machines to other hosts as necessary.

Distributed Power Management (DPM) – DRS includes DPM, which enables a datacenter to significantly reduce its power consumption. If the resource demands of the running virtual machines can be met by a subset of hosts in the cluster, Distributed Resource Scheduler migrates the virtual machines to this subset and powers down the hosts that are not needed. This dynamic cluster right-sizing reduces the power consumption of the cluster without sacrificing virtual machine performance or availability.
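As a rough illustration of the consolidation idea behind DPM, the Python sketch below packs virtual machine demands onto as few hosts as possible using a first-fit-decreasing heuristic. This is not VMware's algorithm; the demand figures, the host capacity, and the function name `pack_vms` are assumptions chosen only to show how a subset of hosts can be identified.

```python
def pack_vms(vm_demands_ghz, host_capacity_ghz):
    """Place VM demands on hosts with first-fit-decreasing packing.

    Returns a list of hosts, each a dict with the remaining free
    capacity and the demands placed on it.
    """
    hosts = []
    for demand in sorted(vm_demands_ghz, reverse=True):
        if demand > host_capacity_ghz:
            raise ValueError("a VM demand exceeds the capacity of one host")
        for host in hosts:
            if host["free"] >= demand:
                host["free"] -= demand
                host["vms"].append(demand)
                break
        else:
            # No powered-on host has room, so another host is needed.
            hosts.append({"free": host_capacity_ghz - demand, "vms": [demand]})
    return hosts


# Hypothetical CPU demands (GHz) and per-host capacity; not from the source.
demands = [4.0, 3.0, 2.0, 2.0, 1.0]
placement = pack_vms(demands, host_capacity_ghz=8.0)
hosts_required = len(placement)
print(f"{hosts_required} hosts are enough for {sum(demands)} GHz of demand")
```

In a three-host cluster such as the one in this architecture, any host beyond `hosts_required` would be a candidate for standby in this simplified model. A real deployment would also have to honor memory demand and the HA admission control reserve.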

Storage Distributed Resource Scheduler – Storage DRS provides smart virtual machine placement and load balancing mechanisms based on I/O latency and storage capacity.

VMware High Availability (HA) – VMware HA is a feature that enables quick, automated restart of virtual machines on a different physical server within a cluster if a host server fails. An HA agent on each physical host in the cluster maintains a heartbeat with the other hosts in the cluster. Loss of a heartbeat indicates a host has failed, and the process of restarting all affected virtual machines on other hosts is initiated. HA is configured centrally through vCenter Server, but after it is configured, it operates continuously and in a distributed manner on every ESXi host without needing vCenter Server. Even if vCenter Server fails, HA can still successfully restart virtual machines.

VMware Fault Tolerance – When Fault Tolerance is enabled for a virtual machine, a secondary copy of the original (or primary) virtual machine is created. All actions completed on the primary virtual machine are also applied to the secondary virtual machine. If the primary virtual machine becomes unavailable, the secondary machine becomes active, taking over execution without service interruption or loss of data, and providing continuous availability for the application.


vNetwork Distributed Switch (vDS) – A distributed virtual switch (vDS) enables a significant reduction of on-going network maintenance activities while increasing network capacity. vDS allows virtual machines to maintain a consistent network configuration even as they migrate across multiple host servers. This simplifies network provisioning, administration, and monitoring by treating the entire network as an aggregated resource that spans the entire data center.

Storage I/O Control – Storage I/O Control enables storage I/O prioritization to be set up by allocating available I/O resources to virtual machines according to business needs. Administrators can set congestion thresholds for I/O shares, and VMware will continuously monitor the I/O load of a storage volume and dynamically allocate available I/O resources accordingly.

Network I/O Control (NIOC) – Network I/O Control enables the prioritization of network access by continuously monitoring I/O load over the network, and dynamically allocating available I/O resources according to established business rules. NIOC can be used to ensure that latency-sensitive and critical traffic flows can access the network bandwidth they need, particularly in converged networks where VM and storage data traverse the same network. NIOC creates network resource pools and provides controls to ensure predictable network performance when multiple traffic types contend for the same physical network resources. NIOC is only supported with the vSphere Distributed Switch (VDS).


3.3 Logical View

The reference architecture begins with a shared cluster of compute servers that can be expanded by adding additional server nodes to the cluster. These ESXi hosts connect to the iSCSI storage array on the converged production network. Storage can be expanded by adding additional drives (expansion chassis) to the array, or by adding additional arrays to the network. Finally, the management server is connected to a physically separate network. In the reference architecture, the management server is not in a failover cluster, nor is it connected to redundant switches, but this could be added if desired. Infrastructure services are assumed to exist in the Core Network.


3.4 Physical View

Figure 7 depicts the physical network connections and the LAN topology.

[Diagram: X440-48T-10G and X670v-48T switches, RD630 and RD330 servers, and the AssuredSAN 3420, joined by links labeled 40Gb/s Uplink, 10Gb/s, 10Gb/s Fiber, Management, and ISC]

Figure 7 – Network Topology

4.0 Deployment Guide

4.1 Physical (Rack & Cabling)

4.1.1 ESXi Host Servers


Figure 8 – RD630 Port Layout

Each RD630 is connected to the X670V and X440 switches according to Table 2.

Table 2 – RD630 Connections

Server    Port   Switch    Port
RD630-1   0      X670v-1   44
RD630-1   1      X670v-2   31
RD630-1   2      X440      15
RD630-1   3      X440      31
RD630-1   4      X670-1    31
RD630-1   5      X670-2    44
RD630-1   TMM    X440      9
RD630-2   0      X670v-1   43
RD630-2   1      X670v-2   30
RD630-2   2      X440      14
RD630-2   3      X440      30
RD630-2   4      X670-1    30
RD630-2   5      X670-2    43
RD630-2   TMM    X440      8
RD630-3   0      X670v-1   42
RD630-3   1      X670v-2   29
RD630-3   2      X440      13
RD630-3   3      X440      29
RD630-3   4      X670-1    29
RD630-3   5      X670-2    42
RD630-3   TMM    X440      7

4.1.2 Management Server


The RD330 is connected to the X440 switch according to Table 3.

Table 3 – RD330 Connections

Host Port Switch Port

RD330 0 X440 19

RD330 2 X440 13

4.1.3 iSCSI Storage

The AssuredSAN 3420 has two controllers, A and B. Each controller has a management port connected to the X440 switch and two host ports (a0 and a1 on controller A, b0 and b1 on controller B), as shown in Figure 10.

Figure 10 – AssuredSAN Port Layout

The storage connects to the switches according to Table 4.

Table 4 – AssuredSAN Network Connections

Storage Port   Switch   Port
A-Manage       X440     6
B-Manage       X440     5
A0             X670-1   47 (Fiber)
B0             X670-1   46 (Fiber)
A1             X670-2   47 (Fiber)
B1             X670-2   46 (Fiber)

4.1.4 ISC Ports

On the back of the X670-1, the ports S1 and S2 are connected to the corresponding port on X670-2 with QSFP Active Fiber cables supporting 40Gb/s for the Inter-Switch Connection (see Figure 11).


Figure 11 – ISC Ports

4.2 Switching

The network supports several types of traffic, which are segregated by VLANs. The VLANs and the traffic they support are summarized in Table 5.

Table 5 – Network Traffic Segmentation

Network Traffic             VLAN          Tag        Subnet            Description
Virtualization Management   Management    Untagged   172.31.240.0/20   Traffic between VMware Clients and Hosts, and vCenter and hosts. Includes out-of-band platform management
Fault Tolerance Logging     FTLogging     4          172.30.20.0/24    Traffic to synchronize Fault Tolerant primary and secondary servers
vMotion Migration           vMotion       3          172.30.10.0/24    Supports traffic for moving Virtual Machines between Hosts
Virtual Machines            VM-Traffic1   100        172.20.10.0/24    Traffic to and from the Virtual Machines performing their business functions
                            VM-Traffic2   200        172.20.20.0/24
                            VM-Traffic3   300        172.20.30.0/24
iSCSI                       Storage       4090       172.30.0.0/24     iSCSI traffic
ISC                         ISC           4091       1.1.1.0/30        Traffic between the Summit X670-48T switches

To support this architecture, the switches are configured as outlined below. VM and vMotion VLANs will be provisioned on the same physical ports and can be configured with QoS characteristics that will guarantee traffic based on requirements.
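The addressing plan in Table 5 can be sanity-checked before any switch is configured. The following Python sketch, using only the standard `ipaddress` module, confirms that no two subnets overlap and reports which VLAN a given address belongs to. The subnet values are copied from Table 5; the helper names `find_overlaps` and `vlan_for` are illustrative.

```python
import ipaddress

# Subnets copied from Table 5.
SUBNETS = {
    "Management": "172.31.240.0/20",
    "FTLogging": "172.30.20.0/24",
    "vMotion": "172.30.10.0/24",
    "VM-Traffic1": "172.20.10.0/24",
    "VM-Traffic2": "172.20.20.0/24",
    "VM-Traffic3": "172.20.30.0/24",
    "Storage": "172.30.0.0/24",
    "ISC": "1.1.1.0/30",
}


def find_overlaps(subnets):
    """Return every pair of VLAN names whose subnets overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


def vlan_for(address, subnets):
    """Return the VLAN whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, cidr in subnets.items():
        if ip in ipaddress.ip_network(cidr):
            return name
    return None


print("Overlapping subnets:", find_overlaps(SUBNETS))
print("172.31.241.1 belongs to:", vlan_for("172.31.241.1", SUBNETS))
```

The same `vlan_for` helper can be pointed at the VMkernel addresses assigned later in the deployment to confirm each one sits in the intended subnet.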

4.2.1 Management Switch

An Extreme Networks X440 switch is deployed to provide the management network. There is one VLAN “Manage” configured for ports 1-48, with an IP Address 172.31.241.1/20. Commands to configure the VLAN are below.

create vlan "manage"

configure vlan manage add ports 1-48 untagged


4.2.2 Production Network

Two Extreme Networks X670V switches are deployed and connected per section 4.1. Configure VLANs per Table 5.

Example commands for configuring the VLANs, including applying the IP Addresses and enabling Jumbo Frames are provided below. Commands to configure MLAG peers, MLAG Ports, and QOS settings for the storage network are also defined.

The configuration commands below are for switch 1; the configuration commands for switch 2 are identical with the exception of the IP addresses and the MLAG Peer.

create vlan "FTLogging"
configure vlan FTLogging tag 4
create vlan "isc"
configure vlan isc tag 4091
create vlan "storage"
configure vlan storage tag 4090
create vlan "vm-traffic1"
configure vlan vm-traffic1 tag 100
create vlan "vm-traffic2"
configure vlan vm-traffic2 tag 200
create vlan "vm-traffic3"
configure vlan vm-traffic3 tag 300
create vlan "vmotion"
configure vlan vmotion tag 3

configure vlan FTLogging add ports 17-32,49 tagged
configure vlan isc add ports 49 untagged
configure vlan storage add ports 33-44 tagged
configure vlan storage add ports 45-48 untagged
configure vlan vm-traffic1 add ports 17-32,49 tagged
configure vlan vm-traffic2 add ports 17-32,49 tagged
configure vlan vm-traffic3 add ports 17-32,49 tagged
configure vlan vmotion add ports 17-32,49 tagged

configure vlan Mgmt ipaddress 172.31.240.3 255.255.240.0
configure vlan isc ipaddress 1.1.1.1 255.255.255.252
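Because the VLAN commands follow a fixed pattern, they can be generated from a small table rather than typed by hand for each switch. The Python sketch below is one possible way to do that. It emits only the three command forms that appear in the listing above (create, tag, add ports); the function name `vlan_commands` and the data layout are assumptions. The generated order groups commands per VLAN, which differs from the listing.

```python
def vlan_commands(name, tag, tagged=None, untagged=None):
    """Build the ExtremeXOS commands for one VLAN, in the style used above."""
    commands = [f'create vlan "{name}"', f"configure vlan {name} tag {tag}"]
    if tagged:
        commands.append(f"configure vlan {name} add ports {tagged} tagged")
    if untagged:
        commands.append(f"configure vlan {name} add ports {untagged} untagged")
    return commands


# (name, tag, tagged ports, untagged ports), transcribed from the listing.
VLANS = [
    ("FTLogging", 4, "17-32,49", None),
    ("isc", 4091, None, "49"),
    ("storage", 4090, "33-44", "45-48"),
    ("vm-traffic1", 100, "17-32,49", None),
    ("vm-traffic2", 200, "17-32,49", None),
    ("vm-traffic3", 300, "17-32,49", None),
    ("vmotion", 3, "17-32,49", None),
]

script = [cmd for vlan in VLANS for cmd in vlan_commands(*vlan)]
print("\n".join(script))
```

Keeping the port ranges in one table makes it easier to review the differences between switch 1 and switch 2 before the commands are pasted into the CLI.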


configure jumbo-frame-size 9216
enable jumbo-frame ports 17-49
create mlag peer "x670-2"
configure mlag peer "x670-2" ipaddress 1.1.1.2 vr VR-Default
enable mlag port 29 peer "x670-2" id 29
enable mlag port 31 peer "x670-2" id 31
enable mlag port 32 peer "x670-2" id 32

Figure 13 – Jumbo Frames and MLAG Configuration Commands

configure qosscheduler weighted-round-robin
configure qosprofile QP8 weight 0
create qosprofile qp2
configure qosprofile qp2 weight 2
configure vlan storage qosprofile qp2

Figure 14 – QOS Configuration Commands

4.3 Storage

The Dot Hill AssuredSAN 3420 Storage array is configured by using a web browser or SSH to connect to the default IP address of 10.0.0.2 on controller A.

Volumes for virtual machine storage are created as follows (see Figure 15). The twenty-four physical drives are organized into two virtual disks (vdisks), one composed of 12 drives, the second of 10 drives. The remaining two drives are configured as global hot spares that, in the event of a drive failure, would be used to rebuild the data from the failed drive. The vdisks are configured in a RAID 10 (striped and mirrored) geometry. This provides optimum performance as well as enhanced protection against drive failures. In this implementation, the AssuredSAN provides 6.6 TB of usable storage capacity presented to the ESXi server hosts as two datastores.
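The 6.6 TB figure follows from the drive layout: RAID 10 mirrors every drive, so each vdisk contributes half of its raw capacity, and hot spares contribute none. The short Python sketch below reproduces the arithmetic using the 600 GB drive size and the 12-drive and 10-drive vdisks described above. It uses decimal units (1 TB = 1000 GB) and ignores formatting overhead, which is an assumption.

```python
DRIVE_GB = 600          # drive size from section 3.2.6
VDISK_DRIVES = [12, 10] # drives per RAID 10 vdisk
HOT_SPARES = 2          # global hot spares, no usable capacity


def raid10_usable_gb(drive_count, drive_gb=DRIVE_GB):
    """Usable capacity of a RAID 10 set: half the drives hold mirror copies."""
    if drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return (drive_count // 2) * drive_gb


total_drives = sum(VDISK_DRIVES) + HOT_SPARES
usable_tb = sum(raid10_usable_gb(n) for n in VDISK_DRIVES) / 1000

print(f"{total_drives} drives, {usable_tb} TB usable")
```

The two vdisks provide 3.6 TB and 3.0 TB respectively, which is why the datastores presented to ESXi are not the same size.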


For each volume, the default mapping specifies all host ports. This is required to maintain failover capability.

If desired, access lists can be created to control access to the volumes by adding the IQN and CHAP configuration of all ESXi hosts that will use the volumes.

Network connections for data traffic are configured under System Settings->Host interfaces. This is also where the Jumbo Frames option is enabled.

The management network interfaces, SNMP Trap Settings, and SMI-S are configured to facilitate management of the AssuredSAN.

4.4 Virtualization Management

Several features of the included components have complex configurations. The configurations for VMware vSphere vCenter Server and VMware Auto Deploy are covered in more depth in sections 4.4.1 and 4.4.2.

4.4.1 VMware vSphere vCenter Server

The foundation of the deployment is VMware vSphere vCenter Server. The vCenter server is configured as follows:

Cluster Cluster1 is configured with the following properties:

Table 6 – Cluster1 Properties

Option Configuration Setting

vSphere HA On

vSphere DRS On

VM Monitoring VM Monitoring Only

Datastore Heartbeating Select any of the cluster datastores

vSphere DRS Fully Automated, middle threshold

Power Management Automatic, middle threshold

Admission Control Enabled

Admission Control Policy 1 host failure tolerated

VM Restart priority Medium

Host isolation response Leave powered on

EVC Enable EVC for Intel Hosts

VMware EVC Mode Intel “Sandy Bridge” Generation


A Distributed Virtual Switch is configured with four uplinks to support the four 10GBASE-T interfaces on the RD630 servers, and seven Distributed Virtual Port Groups, as shown in Figure 16. The Distributed Virtual Port Groups are aligned with the VLANs outlined in section 4.2, with the exception of the ISC and management VLANs, which travel on separate physical networks. Detailed configuration is shown in Table 7.


Table 7 – Distributed Virtual Switch Virtual Port Groups

Port Group              Configuration/VLAN   Teaming and Failover
dvSwitch-DVUplinks-96   4 Uplinks
FTLogging               VLAN 4               Load Balancing: IP Hash; Active: dvUplink2, dvUplink3; Unused: dvUplink1, dvUplink4
iSCSI1                  VLAN 4090            Active: dvUplink1; Unused: dvUplink2, dvUplink3, dvUplink4
iSCSI2                  VLAN 4090            Active: dvUplink4; Unused: dvUplink1, dvUplink2, dvUplink3
vMotion                 VLAN 3               Load Balancing: IP Hash; Active: dvUplink3; Standby: dvUplink2; Unused: dvUplink1, dvUplink4
VM-Traffic1             VLAN 100             Load Balancing: IP Hash
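The teaming layout in Table 7 can be checked mechanically. The Python sketch below transcribes the uplink roles listed in the table and verifies two things: that each port group assigns every uplink exactly one role, and that the iSCSI port groups each have a single active uplink with no standby, which is the arrangement VMware expects for iSCSI port binding. VM-Traffic1 is omitted because its uplink roles are not listed. The function name `check_teaming` is illustrative.

```python
UPLINKS = ["dvUplink1", "dvUplink2", "dvUplink3", "dvUplink4"]

# Uplink roles transcribed from Table 7.
PORT_GROUPS = {
    "FTLogging": {"active": ["dvUplink2", "dvUplink3"], "standby": [],
                  "unused": ["dvUplink1", "dvUplink4"]},
    "iSCSI1": {"active": ["dvUplink1"], "standby": [],
               "unused": ["dvUplink2", "dvUplink3", "dvUplink4"]},
    "iSCSI2": {"active": ["dvUplink4"], "standby": [],
               "unused": ["dvUplink1", "dvUplink2", "dvUplink3"]},
    "vMotion": {"active": ["dvUplink3"], "standby": ["dvUplink2"],
                "unused": ["dvUplink1", "dvUplink4"]},
}


def check_teaming(groups, uplinks):
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    for name, roles in groups.items():
        assigned = roles["active"] + roles["standby"] + roles["unused"]
        if sorted(assigned) != sorted(uplinks):
            problems.append(f"{name}: each uplink must have exactly one role")
        if name.startswith("iSCSI"):
            if len(roles["active"]) != 1 or roles["standby"]:
                problems.append(f"{name}: needs one active uplink, no standby")
    # The two iSCSI paths should leave the host on different uplinks.
    iscsi_active = [g["active"][0] for n, g in groups.items()
                    if n.startswith("iSCSI") and g["active"]]
    if len(set(iscsi_active)) != len(iscsi_active):
        problems.append("iSCSI port groups share an active uplink")
    return problems


print(check_teaming(PORT_GROUPS, UPLINKS) or "Teaming layout is consistent")
```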

Storage I/O Control (SIOC) is enabled for the cluster.

Network I/O Control (NIOC) is enabled for the Distributed Virtual Switch, and each network resource pool can be assigned a shares value to prioritize the appropriate traffic as shown in Figure 17.
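Shares are relative weights, so the bandwidth a resource pool receives under contention is its shares divided by the total shares of the pools competing for the uplink. The Python sketch below works through that arithmetic for a single 10 Gb/s uplink. The share values are hypothetical and are not the settings of this reference architecture; shares only take effect when the link is saturated.

```python
def bandwidth_by_shares(shares, link_gbps=10.0):
    """Split a saturated link among resource pools in proportion to shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one pool needs a positive shares value")
    return {pool: link_gbps * value / total for pool, value in shares.items()}


# Hypothetical shares values for pools contending on one 10 Gb/s uplink.
example_shares = {
    "iSCSI": 100,
    "Virtual Machine": 50,
    "vMotion": 25,
    "Fault Tolerance": 25,
}

allocation = bandwidth_by_shares(example_shares)
for pool, gbps in allocation.items():
    print(f"{pool}: {gbps:.2f} Gb/s")
```

If a pool is idle, it drops out of the calculation and the remaining pools divide the link among themselves, so a shares value is a floor under contention rather than a cap.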


4.4.2 VMware Auto Deploy

The steps for successfully enabling VMware Auto Deploy include:

• Install the Auto Deploy feature on the vCenter server.
• Configure a Reference Host and Host Profile for application to Auto Deployed ESXi Hosts.
• Set the correct options on the DHCP server for PXE boot and IP reservations.
• Upload the boot files to the TFTP server.
• Configure the Auto Deploy Software Depot.
• Enable rules for the correct assignment of Auto Deployed ESXi Servers.

Additional details are given below.

Reference Host – A VMware ESXi host was installed using the installation media and joined the vSphere vCenter server manually. The host interfaces are connected to the Virtual Distributed Switch according to Table 8.

Table 8 – Virtual Distributed Port Groups

Interface   Uplink or Port Group   IP Address
vmnic0      dvUplink1
vmnic1      dvUplink2
vmnic4      dvUplink3
vmnic5      dvUplink4
vmk2        FTLogging              172.30.20.14
vmk3        iSCSI1                 172.30.0.28
vmk4        iSCSI2                 172.30.0.29
vmk5        vMotion                172.30.10.14
vmk6        vMotion2               172.30.10.24

An NTP server and time zone are configured, as well as the IPMI/iLO interface.

An iSCSI software adapter is configured with Port Groups iSCSI1 and iSCSI2 for network connections (see Table 7). The Dynamic Discovery Ports were configured with IP Addresses 172.30.0.6, 172.30.0.7, 172.30.0.8, and 172.30.0.9, each with port 3260.

A Host Profile named Lenovo-profile-1 has been created from the reference host. The following changes were made to the Host Profile.

• Software FCoE Configuration Activation profiles are removed as unnecessary with the current configuration
• Time zone confirmed to be Fixed with server IP 172.31.241.71, time zone US/Eastern
• Firewall opened for NTP client
• Coredump configured for fixed network coredump, interface vmk0, IP 172.31.241.91, port 6500
• Administrator password configured as fixed, password entered and confirmed
• Syslog configured with the following options
  o Advanced configuration option->Syslog.global.loghost
  o Name of the option…: Syslog.global.loghost
  o Value of the option: udp://v3rats-mgt-01.v3rats.com:514
• Stateless Cache enabled, first disk arguments: esx, local, overwrite VMFS volumes selected.

The profile is assigned to cluster Cluster1, and will be automatically applied to Auto Deployed ESXi servers.

DHCP – The DHCP server was configured with reservations for network interface 2 of each ESXi host, with DHCP option 66 (Boot Server Name) set to 172.31.241.72 (the TFTP server) and option 67 (Bootfile Name) set to undionly.kpxe.vmw-hardwired.

DNS – DNS is configured to allow for forward and reverse name resolution for the ESXi Hosts.

TFTP – The DeployTFTPDeployment package files have been extracted and copied to the TFTP root directory.

PowerCLI – The following PowerCLI commands are used to configure the Auto Deploy server so that the ESXi Servers are added to the correct vCenter server and cluster.

Connect-VIServer 172.31.241.91
Add-EsxSoftwareDepot C:\VMWare-Depot\VMware-ESXi-5.1.0-799733-depot.zip
New-DeployRule -name "prod" -item "ESXi-5.1.0-799733-standard" -Pattern "ipv4=172.31.241.240-172.31.241.254"
New-DeployRule -name clusterrule -item cluster1 -pattern "ipv4=172.31.241.240-172.31.241.255"
Add-DeployRule prod
Add-DeployRule clusterrule
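The two deploy rules select hosts by IPv4 range, and their upper bounds differ by one address. The Python sketch below models the `ipv4=start-end` pattern as an inclusive range, which is an assumption about how the pattern is evaluated, and shows which rules a given host address would satisfy. The helper names are illustrative and are not part of PowerCLI.

```python
import ipaddress

# Patterns copied from the two New-DeployRule commands.
PROD_PATTERN = "ipv4=172.31.241.240-172.31.241.254"
CLUSTER_PATTERN = "ipv4=172.31.241.240-172.31.241.255"


def parse_ipv4_pattern(pattern):
    """Split 'ipv4=start-end' into a pair of IPv4Address objects."""
    key, _, value = pattern.partition("=")
    if key != "ipv4":
        raise ValueError(f"unsupported pattern key: {key!r}")
    start, _, end = value.partition("-")
    return ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)


def matches(pattern, host_ip):
    """True if host_ip lies inside the pattern's range (assumed inclusive)."""
    start, end = parse_ipv4_pattern(pattern)
    return start <= ipaddress.IPv4Address(host_ip) <= end


for ip in ("172.31.241.240", "172.31.241.254", "172.31.241.255"):
    print(ip, "prod:", matches(PROD_PATTERN, ip),
          "clusterrule:", matches(CLUSTER_PATTERN, ip))
```

Under this model a host at 172.31.241.255 would match the cluster rule but not the image rule, so DHCP reservations for Auto Deployed hosts are best kept within the narrower range.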

The configuration above specifies Stateless Caching, which ensures that if the host cannot contact the PXE server or the vCenter server upon a reboot, it will boot to the last configuration according to rules specified in VMware documentation.


Configuration – The RD630 is configured in BIOS to boot from the network.

First Boot Configuration – When a host is booted with Auto Deploy the first time, the host profile must be applied. Applying the profile will start a wizard to prompt for the server specific information required. The resultant answer file will be stored with the Host Profile for future reboots.

Advanced implementations can apply VMware tools or scripting for additional automation.

5.0 Management

This section provides an overview of tools available to perform the various management tasks on the infrastructure.

Virtualization Management

VMware vCenter Server provides centralized management of ESXi host servers and virtual machines from a single console. VMware vCenter Server gives administrators deep visibility into the configuration of all the critical components of a virtual infrastructure, and enables the control of the advanced features provided, including HA and Fault Tolerance.

VMware vCenter Operations Manager provides a new and much simplified approach to operations management of vSphere, physical and virtual infrastructure. Using patented, self-learning analytics, vCenter Operations Manager provides operations dashboards to gain deep insights and visibility into health, risk and efficiency of the infrastructure, performance management and capacity optimization capabilities. vCenter Operations Manager enables administrators to:

• Gain comprehensive visibility into the health, risk and efficiency of your infrastructure and applications
• Proactively manage the health of vSphere, virtual machines and applications
• Spot potential performance bottlenecks early on and remediate before end users notice
• Right-size and reclaim overprovisioned capacity to increase consolidation ratios
• Manage thousands of virtual machines, physical servers and applications across multiple datacenters from a single console
• Automatically correlate and analyze monitoring data across infrastructure and applications silos to gain a holistic view of root cause and effect

Figure 18 – vCenter Operations Manager Dashboard

Performance Monitoring

VMware vCenter provides basic tools for performance monitoring and troubleshooting at the cluster and individual ESXi host server levels.

Infrastructure Health and Status Monitoring

Hitachi IT Operations Analyzer provides multiple dashboard views of all devices (servers, storage, and networks) in the environment, and provides status on health and performance. An innovative Root Cause Analysis feature automatically displays the snapshot view of the root cause node and detailed information on the failure mechanism whenever a system fault or performance issue occurs.


Network Management

The Summit switches support comprehensive network management through a command line interface (CLI), SNMP v1, v2c, v3, and the ExtremeXOS ScreenPlay™ embedded XML-based Web user interface.

Extreme Networks also offers a network element manager called Ridgeline that simplifies managing the network. Ridgeline is a scalable, full-featured network and service management tool that simplifies configuration, provisioning, troubleshooting and status monitoring of IP-based networks. Ridgeline is also virtualization aware: it provides complete visibility of the virtual resources and reconfigures the network automatically to maintain QoS requirements for the VM as it is moved from server to server. Although Ridgeline is not part of the reference architecture, it can be integrated seamlessly and easily as the environment grows.

Storage Management

Storage management is performed using the RAIDar Storage Management Utility web-based GUI in the AssuredSAN Storage array. This utility facilitates creating drive pools, RAID configurations, and volumes. A command line interface is also available for scripting a series of configuration commands and for machine-to-machine communication.

Storage performance and health status can be monitored by IT Operations Analyzer.

Figure 20 – AssuredSAN RAIDar Storage Management Utility Console

Out-of-Band Platform Management

Lenovo ThinkServers provide basic out-of-band management through the ThinkServer


• Remote systems management and monitoring
• Advanced features are enabled with the TMM Premium:
  o Remote keyboard, video and mouse (KVM) for redirecting the server console over a LAN
  o Remote virtual media that redirects local media devices to servers
• Platform events and alerts that indicate warnings or errors detected in the system hardware sent via Simple Network Management Protocol (SNMP) traps, or by email
• A remote web interface for configuring or monitoring the TMM
• Remote power control functionality
• Access to system event logs

Hypervisor Maintenance

VMware vCenter Update Manager can be used to automate the patch management for ESXi hosts. vCenter Update Manager compares versions of installed software on the ESXi hosts as well as select guest operating systems, and automatically applies updates and patches.
