EMC VSPEX PRIVATE CLOUD:


Microsoft Hyper-V and EMC ScaleIO

EMC VSPEX

Abstract

This document describes the EMC® VSPEX® Proven Infrastructure solution for private cloud deployments with Microsoft Hyper-V and EMC ScaleIO® technology.

June 2015


Published June 2015

EMC believes the information in this publication is accurate as of its publication date.

The information is subject to change without notice.

The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

EMC VSPEX Private Cloud: Microsoft Hyper-V and EMC ScaleIO Proven Infrastructure Guide

Part Number H14028

Contents

Chapter 1 Executive Summary
Introduction
Target audience
Document purpose
Business needs

Chapter 2 Solution Architecture Overview
Overview
Solution architecture
High-level architecture
Logical architecture
Key components
Virtualization layer
Overview
Configuration guidelines
High-availability and failover
Compute layer
Overview
Network layer
Overview
Storage layer
Overview

Chapter 3 Sizing the Environment
Overview
Reference workload
Scalability
VSPEX building blocks
Building block approach
Validated building block
Customizing the building block
Configuration sizing guidelines
Introduction to the Customer configuration worksheet
Using the customer sizing worksheet
Calculating the building block requirement
Fine-tuning hardware resources
Summary

Chapter 4 VSPEX Solution Implementation
Overview
Network implementation
Preparing the network switches
Configuring the infrastructure network
Configuring the VLANs
Completing the network cabling
Installing and configuring the Microsoft Hyper-V hosts
Installing and configuring Microsoft SQL Server databases
Overview
Deploying the System Center Virtual Machine Manager server
Overview
Preparing and configuring the storage
Prepare the ScaleIO nodes
Preparing the installation worksheet
Installing the ScaleIO components
Creating and mapping volumes
Installing the GUI
Provisioning a virtual machine
Summary

Chapter 5 Verifying the Solution
Overview
Post-install checklist
Deploying and testing a single virtual server
Verifying the redundancy of the solution components

Chapter 6 System Monitoring
Overview
Key areas to monitor
Performance baseline
Servers
Networking
ScaleIO layer

Appendix A Reference Documentation
EMC documentation

……….

High-level architecture

[Figure 2 diagram: virtualization components, network components, and storage components (SDS/SDC nodes) joined by the storage network and network connections, alongside the supporting infrastructure.]

Figure 2. VSPEX private cloud components

The solution uses ScaleIO software and Hyper-V to provide the storage and virtualization platforms for an environment of Microsoft Windows Server 2012 virtual machines provisioned by the Hyper-V platform.

To provide predictable performance for end-user computing solutions, the storage system must be able to handle the peak I/O load from the clients while keeping response time to a minimum. In this solution, we used ScaleIO software to pool the servers’ local disks into a storage system with high performance and scalability.

Logical architecture

Figure 3 shows the logical architecture of this solution.

[Figure 3 diagram: a Windows Server 2012 R2 Hyper-V cluster with EMC ScaleIO hosts Microsoft Windows 2012 R2 Hyper-V virtual servers 1 through n. A 10 GbE IP network and a storage network connect the cluster to the shared infrastructure: SCVMM, SQL Server, DNS server, and Active Directory server.]

Figure 3. Logical architecture for the solution

Table 1 lists the solution configuration components.

Table 1. Solution architecture configuration

Component | Solution configuration

Microsoft Hyper-V | Hyper-V provides a common virtualization layer to host the server environment. Hyper-V provides a highly available infrastructure through features such as Live Migration, Failover Clustering, and High Availability (HA).

Microsoft System Center Virtual Machine Manager (SCVMM) | SCVMM is not required for this solution. However, if deployed, SCVMM (or its corresponding functionality in Microsoft System Center Essentials) simplifies provisioning, management, and monitoring of the Hyper-V environment.

EMC ScaleIO | ScaleIO software provides a storage layer to host and store virtual machines.

Microsoft SQL Server | SCVMM requires a SQL Server database instance to store configuration and monitoring details.

Active Directory server | Active Directory services are required for the various solution components to function properly. We used the Microsoft Active Directory Service running on a Windows Server 2012 R2 server for this purpose.

DHCP server | The Dynamic Host Configuration Protocol (DHCP) server centrally manages the IP address scheme for the virtual machines. This service is hosted on the same virtual machine as the domain controller and domain name server (DNS). The Microsoft DHCP Service running on a Windows Server 2012 R2 server is used for this purpose.

DNS server | DNS services are required for the various solution components to perform name resolution. The Microsoft DNS Service running on a Windows Server 2012 R2 server is used for this purpose.

IP networks | A standard Ethernet network with redundant cabling and switching carries all network traffic. A shared network carries user and management traffic, while a private, non-routable subnet carries virtual SAN (vSAN) storage traffic.

Key components

The key components of this solution include:

• Virtualization layer: Decouples the physical implementation of resources from the applications that use the resources, so that the application view of the available resources is no longer directly tied to the hardware. This enables many key features required by the private cloud.

• Compute layer: Provides memory and processing resources for the virtualization layer software and for the applications running in the private cloud. The VSPEX program defines the minimum amount of required compute layer resources, and enables you to implement the solution by using any server hardware that meets these requirements.

• Network layer: Connects users of the private cloud to the resources in the cloud, and connects the storage layer to the compute layer. The VSPEX program defines the minimum number of required network ports, provides general guidance on network architecture, and enables you to implement the solution by using any network hardware that meets these requirements.

• Storage layer: Provides storage to implement the private cloud. ScaleIO implements a pure block storage layout with converged nodes to support compute and storage. With multiple hosts accessing shared data through ScaleIO components, ScaleIO provides high-performance data storage while maintaining high availability.

Virtualization layer

Overview

Hyper-V performs the hypervisor-based virtualization role for Microsoft Windows Server and provides the virtualization platform for this solution.

• Hyper-V live migration and live storage migration enable seamless movement of virtual machines or virtual machine files between Hyper-V servers or storage systems, transparently and with minimal performance impact.

• Hyper-V works with Windows Server 2012 Failover Clustering and Cluster Shared Volumes (CSVs) to provide high availability in a virtualized infrastructure, significantly increasing the availability of virtual machines during planned and unplanned downtime. Configure Failover Clustering on the Hyper-V host to monitor virtual machine health and to migrate virtual machines between cluster nodes.

• Hyper-V Replica provides asynchronous replication of virtual machines between two Hyper-V hosts at separate sites. Hyper-V Replica protects business applications in the Hyper-V environment from downtime associated with an outage at a single site.

• Hyper-V snapshots provide consistent point-in-time views of a virtual machine and enable users to revert the virtual machine to a previous point in time if necessary. Snapshots function as the source for backups, test and development activities, and other use cases.

Microsoft System Center Virtual Machine Manager

Microsoft System Center Virtual Machine Manager (SCVMM) is a centralized management platform that enables datacenter administrators to configure and manage virtualized host, networking, and storage resources, and to create and deploy virtual machines and services to private clouds. SCVMM simplifies provisioning, management, and monitoring in the Hyper-V environment.

Windows Server Cluster-Aware Updating

Windows Cluster-Aware Updating (CAU) enables updating of cluster nodes with little or no loss of availability. CAU is integrated with Windows Server Update Services (WSUS) and can be automated using PowerShell.

Configuration guidelines

Hyper-V has several advanced features that help maximize performance and overall resource utilization. The most important features relate to memory management. This section describes some of these features, and the items to consider when using these features in a VSPEX environment.

Dynamic Memory and Smart Paging

Dynamic Memory increases physical memory efficiency by treating memory as a shared resource, dynamically allocating it to virtual machines, and reclaiming unused memory from idle virtual machines. Administrators can dynamically adjust the amount of memory used by each virtual machine at any time.

With Dynamic Memory, Hyper-V allows more virtual machines than the available physical memory can support. This introduces the risk that there might not be sufficient physical memory available to restart a virtual machine if required. Smart Paging is a memory management technique that uses disk resources as a temporary memory replacement when more memory is required to restart a virtual machine.

Non-Uniform Memory Access

Non-Uniform Memory Access (NUMA) is a multinode technology that enables a CPU to access remote-node memory. Because this type of memory access degrades performance, Windows Server 2012 uses processor affinity, which pins threads to a single CPU, to avoid remote-node memory access. This feature is available to the host and to the virtual machines, where it provides improved performance in symmetrical multiprocessor (SMP) environments.

Hyper-V memory overhead

Virtualized memory has some associated overhead, including the memory consumed by the Hyper-V parent partition and additional overhead for each virtual machine.

Leave at least 2 GB memory for the Hyper-V parent partition in this solution.

Virtual machine memory

Each virtual machine in this solution is assigned 2 GB memory in fixed mode.
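The memory figures above lend themselves to a quick capacity check. The following Python sketch is illustrative only: it assumes that the 2 GB parent partition reserve and the 2 GB fixed allocation are the only memory consumers, and the per_vm_overhead_gb parameter is a hypothetical placeholder, because this guide mentions per-virtual-machine overhead without quantifying it.

```python
def max_fixed_memory_vms(host_ram_gb, parent_reserve_gb=2.0,
                         vm_ram_gb=2.0, per_vm_overhead_gb=0.0):
    """Estimate how many fixed-memory virtual machines fit on one host.

    parent_reserve_gb and vm_ram_gb default to the values used in this
    solution. per_vm_overhead_gb is a hypothetical placeholder for the
    per-VM overhead, which the guide does not quantify.
    """
    usable_gb = host_ram_gb - parent_reserve_gb
    if usable_gb <= 0:
        return 0
    # Each VM consumes its fixed allocation plus any assumed overhead.
    return int(usable_gb // (vm_ram_gb + per_vm_overhead_gb))


if __name__ == "__main__":
    # A host with 128 GB RAM, used here only as an example value.
    print(max_fixed_memory_vms(128))
    print(max_fixed_memory_vms(128, per_vm_overhead_gb=0.25))
```

Treat the result as an upper bound for planning; the building blocks described in Chapter 3 remain the reference for validated sizing.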

High-availability and failover

Configure high availability in the virtualization layer, and enable the hypervisor to restart failed virtual machines automatically. Figure 4 illustrates the hypervisor layer responding to a failure in the compute layer.

Figure 4. High availability at the virtualization layer

Implementing high availability at the virtualization layer ensures that, even in the event of a hardware failure, the infrastructure will attempt to keep as many services running as possible.

Compute layer

Overview

The choice of a server platform for a VSPEX infrastructure is not only based on the technical requirements of the environment, but on the supportability of the platform, existing relationships with the server provider, advanced performance and management features, and many other factors. For these reasons, VSPEX solutions are designed to run on a wide variety of server platforms. Instead of requiring a specific number of servers with a specific set of requirements, VSPEX defines the minimum requirements for the number of processor cores and the amount of RAM.

ScaleIO components are designed to work with a minimum of three server nodes. The physical server node, running Hyper-V, can host workloads other than the ScaleIO virtual machine.

When designing and ordering the compute layer of this VSPEX solution, several factors can affect the final purchase. If you understand the system workload well, then you can use virtualization features such as Dynamic Memory to reduce the aggregate memory requirement.

You can reduce the number of virtual CPUs (vCPUs) if the virtual machine pool does not have a high level of peak or concurrent usage. Conversely, if the deployed applications are highly computational, you might need to increase the number of CPUs and the amount of memory.

Apply the following best practices in the compute layer:

• Use identical, or at least compatible, servers. VSPEX implements hypervisor-level high-availability technologies that may require similar instruction sets on the underlying physical hardware. By implementing VSPEX on identical server units, you can minimize compatibility problems in this area.

• When implementing high availability at the hypervisor layer, the largest virtual machine you can create is constrained by the smallest physical server in the environment.

Note: To enable high availability for the compute layer, each customer needs one additional server to ensure that the system has enough capacity to maintain business operations when a server fails.

• Implement the available high-availability features in the virtualization layer, and ensure that the compute layer has sufficient resources to accommodate at least single server failures. This enables the implementation of minimal-downtime upgrades and tolerance for single unit failures.

Within the boundaries of these recommendations and best practices, the VSPEX compute layer can be flexible to meet your specific needs. Ensure that there are sufficient processor cores and RAM per core to meet the needs of the target environment.
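As a rough illustration of these practices, the following Python sketch estimates a server count from processor cores and RAM, adds the one additional server called for in the note above, and enforces the three-node ScaleIO minimum. The function name and the vcpus_per_core consolidation ratio are assumptions made for this example; they are not taken from this guide, and the reference workload in Chapter 3 governs actual sizing.

```python
import math


def servers_required(total_vcpus, total_vm_ram_gb, cores_per_server,
                     ram_per_server_gb, vcpus_per_core=4,
                     parent_reserve_gb=2, spare_servers=1, min_nodes=3):
    """Estimate the number of identical servers for the compute layer.

    vcpus_per_core is a hypothetical consolidation ratio. The
    parent_reserve_gb default matches the 2 GB parent partition reserve,
    spare_servers the one additional server for high availability, and
    min_nodes the three-node ScaleIO minimum.
    """
    if cores_per_server <= 0 or ram_per_server_gb <= parent_reserve_gb:
        raise ValueError("server has no usable cores or RAM")
    # Servers needed to satisfy the processor requirement.
    for_cpu = math.ceil(total_vcpus / (cores_per_server * vcpus_per_core))
    # Servers needed to satisfy the memory requirement.
    for_ram = math.ceil(total_vm_ram_gb / (ram_per_server_gb - parent_reserve_gb))
    # The tighter constraint wins; then add the spare for a server failure.
    return max(max(for_cpu, for_ram) + spare_servers, min_nodes)


if __name__ == "__main__":
    # Example values only: 400 vCPUs and 800 GB of VM memory on
    # servers with 16 cores and 128 GB RAM each.
    print(servers_required(400, 800, 16, 128))
```

Because the calculation assumes identical servers, it also reflects the first best practice in the list above.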

While the choice of servers to implement in the compute layer is flexible, use enterprise-class servers designed for the datacenter. This type of server has redundant power supplies, as shown in Figure 5. Connect these servers to separate power distribution units (PDUs) in accordance with your server vendor’s best practices.

Figure 5. Redundant power supplies

To configure high availability in the virtualization layer, configure the compute layer with enough resources to meet the needs of the environment, even with a server failure, as shown in Figure 4.

Network layer

Overview

The infrastructure network requires redundant network links for each Hyper-V host. This configuration provides both redundancy and additional network bandwidth. It is required regardless of whether the network infrastructure for the solution already exists or you are deploying it alongside other components of the solution.

This section provides guidelines for setting up a redundant, highly available network configuration. The guidelines consider virtual LANs (VLANs) and the ScaleIO network layer.

ScaleIO network

The ScaleIO network creates a Redundant Array of Independent Nodes (RAIN) topology between the server nodes, distributing data so that the loss of a single node does not affect data availability. This topology requires ScaleIO nodes to send data to other nodes to maintain consistency.

A high-speed, low-latency IP network is required for this to work correctly. We¹ created the test environment with redundant 10 Gb Ethernet networks. The network was not heavily used during testing at small scale points. For that reason, at small scale points, you can implement the solution using 1 Gb networks. However, EMC recommends a 10 GbE IP network designed for high availability, as shown in Table 2.

¹ In this guide, “we” refers to the EMC Solutions engineering team that validated the solution.

Table 2. Recommended 10 Gb switched Ethernet network layer

Nodes | 10 Gb switched Ethernet | 1 Gb switched Ethernet

3 | Recommended | Possible

4 | Recommended | Possible

5 | Recommended | Possible

6 | Recommended | Possible

7 | Recommended | Not recommended

VLANs

Isolate network traffic so that management traffic, traffic between hosts and storage, and traffic between hosts and clients all move over isolated networks. Physical isolation might be required in some cases for regulatory or policy compliance reasons. Logical isolation with VLANs is sufficient in many cases.

EMC recommends separating the network into two types for security and increased efficiency:

• A management network, used to connect and manage the ScaleIO environment. This network is generally connected to the client management network. Because this network has less I/O traffic, EMC recommends a 1 GbE network.

• An internal data network, used for communication between the ScaleIO components. This is generally a 10 GbE network.

In this solution, we used one VLAN for client access and one VLAN for management.

Figure 6 depicts the VLANs and the network connectivity requirements for a ScaleIO environment.

[Figure 6 diagram: the servers connect to the client access network, the management network, and the storage network.]

Figure 6. Required networks for ScaleIO

You can use the client access network to communicate with the ScaleIO infrastructure. The storage network provides communication between the ScaleIO nodes.

Administrators use the management network as a dedicated way to access the management connections on the ScaleIO software components, network switches, and hosts.

Note: Some best practices need additional network isolation for cluster traffic, virtualization layer communication, and other features. Implement these additional networks if necessary.

Each Windows host has multiple connections to the user and storage Ethernet networks to guard against link failures, as shown in Figure 7. Spread these connections across multiple Ethernet switches to guard against component failure in the network.

[Figure 7 diagram: each server connects to multiple switches (two Cisco Nexus 5020 switches are shown), and the switches connect to each other.]

Figure 7. Network layer high availability

Storage layer

Overview

ScaleIO is a software-only solution that uses hosts’ existing local disks and LAN to realize a vSAN that has all the benefits of external storage at a fraction of the cost and complexity. ScaleIO turns local internal storage into shared block storage that is comparable to or better than the more expensive external shared block storage. The lightweight ScaleIO software components are installed in the application hosts and communicate using a standard LAN to handle the application I/O requests sent to ScaleIO block volumes. An extremely efficient decentralized block I/O flow, combined with a distributed, sliced volume layout, results in a massively parallel I/O system that can scale to hundreds or thousands of nodes.

ScaleIO is designed and implemented with enterprise-grade resilience as an essential attribute. Furthermore, the software features efficient distributed auto-healing processes that overcome media and node failures without requiring administrator involvement. Dynamic and elastic, ScaleIO enables administrators to add or remove nodes and capacity “on the fly.” The software immediately responds to the changes, rebalancing the storage distribution and achieving a layout that optimally suits the new configuration.

Architecture

Software components

The ScaleIO Data Client (SDC) is a lightweight device driver situated in each host whose applications or file system requires access to the ScaleIO virtual SAN block devices. The SDC exposes block devices representing the ScaleIO volumes that are currently mapped to that host.

The ScaleIO Data Server (SDS) is a lightweight software component within each host that contributes local storage to the central ScaleIO vSAN.

Convergence of storage and compute

The ScaleIO software components, which have a negligible impact on the applications running in the hosts, are carefully designed and implemented to consume the minimum computing resources required for operation.

ScaleIO converges the storage and application layers. The hosts that run applications can also be used to realize shared storage, yielding a wall-to-wall, single layer of hosts. Because the same hosts run applications and provide storage for the vSAN, an SDC and SDS are typically both installed in each of the participating hosts.

Pure block storage implementation

ScaleIO implements a pure block storage layout. Its entire architecture and data path are optimized for block storage access needs. For example, when an application submits a read I/O request to its SDC, the SDC instantly deduces which SDS is responsible for the specified volume address and then interacts directly with the relevant SDS. The SDS reads the data (by issuing a single read I/O request to its local storage or by just fetching the data from the cache in a cache-hit scenario), and returns the result to the SDC. The SDC provides the read data to the application.

This flow is simple, consuming as few resources as necessary. The data moves over the network exactly once, and a single I/O request is sent to the SDS storage. The write I/O flow is similarly simple and efficient. Unlike some block storage systems that run on top of a file system or object storage that runs on top of a local file system, ScaleIO offers optimal I/O efficiency.
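The read flow described above can be sketched in Python. This is a toy model, not the ScaleIO implementation: the class names, the 1 MiB chunk size, and the hash-based placement are assumptions made for illustration, because this guide does not describe how the SDC actually maps a volume address to an SDS. The sketch also omits mirroring. It shows only that the client computes the responsible server locally, sends a single request, and that the server performs at most one local read.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # hypothetical chunk size in bytes; the guide does not state one


class DataServer:
    """Stand-in for an SDS: serves chunk reads from a RAM cache or local storage."""

    def __init__(self, name):
        self.name = name
        self.storage = {}    # (volume_id, chunk_index) -> bytes on "local disk"
        self.cache = {}      # chunks that were read recently
        self.disk_reads = 0  # number of reads that reached local storage

    def write(self, key, data):
        self.storage[key] = data
        self.cache.pop(key, None)  # drop any stale cached copy

    def read(self, key):
        if key in self.cache:      # cache hit: no local storage I/O
            return self.cache[key]
        self.disk_reads += 1       # cache miss: exactly one local read
        data = self.storage[key]
        self.cache[key] = data
        return data


class DataClient:
    """Stand-in for an SDC: finds the responsible SDS without a central lookup."""

    def __init__(self, servers):
        self.servers = list(servers)

    def locate(self, volume_id, offset):
        # Deterministic mapping (assumed): hash the volume and chunk index.
        chunk_index = offset // CHUNK_SIZE
        digest = hashlib.sha256(f"{volume_id}:{chunk_index}".encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index], (volume_id, chunk_index)

    def write(self, volume_id, offset, data):
        server, key = self.locate(volume_id, offset)
        server.write(key, data)

    def read(self, volume_id, offset):
        server, key = self.locate(volume_id, offset)
        return server.read(key)    # one request to one SDS


if __name__ == "__main__":
    servers = [DataServer(f"sds-{i}") for i in range(3)]
    client = DataClient(servers)
    client.write("vol1", 0, b"example data")
    print(client.read("vol1", 0))
    print(client.locate("vol1", 0)[0].name)
```

Because every client derives the same answer from the same inputs, no metadata server sits in the data path, which is the property the paragraph above attributes to the ScaleIO I/O flow.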

Massively parallel, scale-out I/O architecture

ScaleIO can scale to a large number of nodes, thus breaking the traditional scalability barrier of block storage. Because the SDCs propagate the I/O requests directly to the pertinent SDSs, there is no central point through which the requests move, and thus a potential bottleneck is avoided. This decentralized data flow is crucial to the linearly scalable performance of ScaleIO. Therefore, a large ScaleIO configuration results in a massively parallel system. The more servers or disks the system has, the greater the number of parallel channels that will be available for I/O traffic and the higher the aggregated I/O bandwidth and IOPS will be.

Mix-and-match nodes

The vast majority of traditional scale-out systems are based on a “symmetric brick” architecture. Unfortunately, datacenters cannot be standardized on exactly the same bricks for a prolonged period, because hardware configurations and capabilities change over time. Therefore, such symmetric scale-out architectures are bound to run in small islands. ScaleIO was designed from the ground up to support a mix of new and old nodes with dissimilar configurations.

Hardware agnostic

ScaleIO is platform agnostic and works with existing underlying hardware resources. Besides its compatibility with various types of disks, networks, and hosts, it can take advantage of the write buffer of existing local RAID controller cards, and it can also run in servers that do not have a local RAID controller card.

For the local storage of an SDS, you can use internal disks, directly attached external disks, virtual disks exposed by an internal RAID controller, partitions within such disks, and more. Partitions can be useful to combine system boot partitions with ScaleIO capacity on the same raw disks. If the system already has a large, mostly unused partition, ScaleIO does not require repartitioning of the disk, as the SDS can actually use a file within that partition as its storage space.

Volume mapping and volume sharing

The volumes that ScaleIO exposes to the application clients can be mapped to one or more clients running in different hosts. Mapping can be changed dynamically if necessary. In other words, ScaleIO volumes can be used by applications that expect shared-everything block access and by applications that expect shared-nothing or shared-nothing-with-failover access.

Clustered, striped volume layout

A ScaleIO volume is a block device that is exposed to one or more hosts. It is the equivalent of a logical unit in the SCSI world. ScaleIO breaks each volume into a large number of data chunks, which are scattered across the SDS cluster’s nodes and disks in a fully balanced manner. This layout practically eliminates hot spots across the cluster and allows for the scaling of the overall I/O performance of the system through the addition of nodes or disks. Furthermore, this layout enables a single application that is accessing a single volume to use the full IOPS of all the cluster’s disks. This flexible, dynamic allocation of shared performance resources is one of the major advantages of converged scale-out storage.
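The effect of a balanced layout can be illustrated with a short Python sketch. It assumes a simple round-robin placement across every disk in the cluster, which is a stand-in chosen for clarity. This guide does not describe the actual ScaleIO distribution algorithm, and the function names and node names are hypothetical.

```python
from collections import Counter


def stripe_volume(num_chunks, nodes):
    """Spread the chunks of one volume over every disk in the cluster.

    nodes maps a node name to its number of disks. Round-robin placement
    is an assumption made for illustration only.
    """
    devices = [(node, disk) for node, disks in nodes.items()
               for disk in range(disks)]
    if not devices:
        raise ValueError("the cluster has no disks")
    return [devices[i % len(devices)] for i in range(num_chunks)]


def chunks_per_device(layout):
    """Count how many chunks landed on each (node, disk) pair."""
    return Counter(layout)


if __name__ == "__main__":
    three_nodes = {"node1": 4, "node2": 4, "node3": 4}
    four_nodes = dict(three_nodes, node4=4)
    before = chunks_per_device(stripe_volume(1000, three_nodes))
    after = chunks_per_device(stripe_volume(1000, four_nodes))
    # Every disk holds a near-equal share, and adding a node lowers the
    # share (and therefore the I/O load) of each existing disk.
    print(max(before.values()), min(before.values()))
    print(max(after.values()), min(after.values()))
```

With every disk holding a near-equal share of a single volume, one application reading that volume draws on all disks at once, which is the behavior the paragraph above describes.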

Software-only—but as resilient as a hardware array

Traditional storage systems typically combine system software with commodity hardware (which is comparable to application servers’ hardware) to provide enterprise-grade resilience. With its contemporary architecture, ScaleIO provides similar enterprise-grade, no-compromise resilience by running the storage software directly on the application servers. Designed for extensive fault tolerance and high availability, ScaleIO handles all types of failures, including failures of media, connectivity, and nodes, software interruptions, and more. No single point of failure can interrupt the ScaleIO I/O service. In many cases, ScaleIO can overcome multiple points of failure as well.

Managing clusters of nodes

Many storage cluster designs use tightly coupled techniques that might be adequate for a small number of nodes but begin to break when the cluster is larger than a few dozen nodes. The loosely coupled clustering management schemes of ScaleIO provide exceptionally reliable—yet lightweight—failure and failover handling in both small and large clusters.

Most clustering environments assume exclusive ownership of the cluster nodes and might even physically fence or shut down malfunctioning nodes. ScaleIO, in contrast, runs on application hosts. The ScaleIO clustering algorithms are designed to work efficiently and reliably without interfering with the applications with which ScaleIO coexists. ScaleIO will never disconnect or invoke Intelligent Platform Management Interface (IPMI) shutdowns of malfunctioning nodes, because they might still be running healthy applications.

Protection domains

As shown in Figure 8, you can divide a large ScaleIO storage pool into multiple protection domains, each of which contains a set of SDSs. ScaleIO volumes are assigned to specific protection domains. Protection domains are useful for mitigating the risk of a dual point of failure in a two-copy scheme or a triple point of failure in a three-copy scheme.

Figure 8. Protection domains

For example, if two SDSs that are in different protection domains fail simultaneously, no data will become unavailable. Just as incumbent storage systems can overcome a large number of simultaneous disk failures as long as they do not occur within the same shelf, ScaleIO can overcome a large number of simultaneous disk or node failures as long as they do not occur within the same protection domain.
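The reasoning behind this example can be checked with a small Python model. It assumes a two-copy scheme in which both copies of a chunk stay inside one protection domain on different SDSs. The round-robin placement, the function names, and the node names are hypothetical and serve only to make the failure cases concrete.

```python
def place_mirrored_chunks(num_chunks, domain_nodes):
    """Place two copies of each chunk on different SDSs of one protection domain.

    Round-robin placement is an assumption for illustration; the guide does
    not describe the actual placement algorithm.
    """
    if len(domain_nodes) < 2:
        raise ValueError("a two-copy scheme needs at least two SDSs")
    placement = {}
    for chunk in range(num_chunks):
        primary = domain_nodes[chunk % len(domain_nodes)]
        mirror = domain_nodes[(chunk + 1) % len(domain_nodes)]
        placement[chunk] = {primary, mirror}
    return placement


def all_data_available(placements, failed_nodes):
    """Return True if every chunk still has at least one surviving copy."""
    failed = set(failed_nodes)
    return all(copies - failed
               for placement in placements
               for copies in placement.values())


if __name__ == "__main__":
    domain_a = place_mirrored_chunks(6, ["a1", "a2", "a3"])
    domain_b = place_mirrored_chunks(6, ["b1", "b2", "b3"])
    # One failure in each domain: every chunk keeps a copy.
    print(all_data_available([domain_a, domain_b], {"a1", "b1"}))
    # Two failures in the same domain: some chunk loses both copies.
    print(all_data_available([domain_a, domain_b], {"a1", "a2"}))
```

The second case shows why protection domains bound the exposure: only simultaneous failures inside the same domain can remove both copies of a chunk.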

Management and monitoring

ScaleIO provides several tools to manage and monitor the system, including a command line interface (CLI), an active GUI, and representational state transfer (REST) management application program interface (API) commands. The CLI enables administrators to have direct platform access to perform backend configuration actions and obtain monitoring information.

The active GUI, shown in Figure 9, provides system dashboards for capacity, throughput, and bandwidth statistics, access to system alerts, and the ability to provision backend devices. The REST management API allows users to execute the same management and monitoring commands available with the CLI using a next-generation, cloud-based interface.

Figure 9. ScaleIO active GUI

Interoperability

ScaleIO is integrated with Hyper-V and OpenStack to provide customers with greater flexibility in deploying ScaleIO with existing environments. The OpenStack integration (“Cinder” support) allows customers to use commodity hardware with ScaleIO, providing a software-defined block volume solution in an OpenStack environment.

Additionally, ScaleIO software can be packaged with EMC ViPR® for management and orchestration functions and with EMC ViPR SRM for additional monitoring and reporting capabilities.

Enterprise features

Whether you are a service provider delivering hosted infrastructure as a service or your IT department delivers infrastructure as a service to functional units within your organization, ScaleIO offers a set of features that gives you complete control over performance, capacity, and data location. For both private cloud datacenters and service providers, these features enhance system control and manageability, ensuring that quality of service is met. With ScaleIO, you can limit the amount of performance (IOPS or bandwidth) that selected customers can consume. The limiter allows you to impose and regulate resource distribution to prevent application “hogging” scenarios. You can apply data masking to provide added security for sensitive customer data. ScaleIO offers instantaneous, writeable snapshots for data backups.
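To make the idea of an IOPS limit concrete, the following Python sketch uses a token bucket, a common rate-limiting technique. This guide does not describe how the ScaleIO limiter works internally, so the technique, the class name, and the parameters are all assumptions chosen for illustration.

```python
class IopsLimiter:
    """Token-bucket sketch of a per-customer IOPS limit (illustrative only)."""

    def __init__(self, iops_limit, burst=None, now=0.0):
        self.rate = float(iops_limit)  # tokens added per second
        self.capacity = float(burst if burst is not None else iops_limit)
        self.tokens = self.capacity    # start with a full bucket
        self.last = now                # time of the last refill

    def allow(self, now):
        """Return True if one I/O may proceed at time `now` (in seconds)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0         # each admitted I/O spends one token
            return True
        return False                   # over the limit: reject or queue


if __name__ == "__main__":
    limiter = IopsLimiter(iops_limit=100)
    # 150 requests arrive in the same instant; only the limit is admitted.
    print(sum(limiter.allow(0.0) for _ in range(150)))
    # Half a second later, half of the budget has been replenished.
    print(sum(limiter.allow(0.5) for _ in range(80)))
```

Passing the clock value in as an argument keeps the sketch deterministic; a real limiter would read a monotonic clock instead.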

Dynamic random-access memory (DRAM) caching improves read performance by using SDS server RAM. Fault sets (groups of SDSs that are likely to go down together) can be defined to ensure that data mirroring occurs outside the group, improving business continuity. You can create volumes with thin provisioning, providing on-demand storage as well as faster setup and startup times.

Finally, tight integrations with other EMC products are available. You can use ScaleIO in conjunction with EMC XtremCache™ for flash cache auto tiering to further accelerate application performance.

Figure 10 shows the ScaleIO enterprise features.

Figure 10. ScaleIO enterprise features

ScaleIO 1.32

ScaleIO 1.32 includes the following new features and functionality:

• Release of the ScaleIO ‘Free and Frictionless’ download, a free download of ScaleIO for non-production environments with no time, function, or capacity limits

• Support for VMware ESX 6.0 (VMware certified)

• Support for SUSE Linux Enterprise Server (SLES) 12

• Support for IBM Spectrum Scale™ (General Parallel File System (GPFS™)) over ScaleIO for Linux environments (Red Hat Enterprise Linux (RHEL) and SLES)

• Additional flexibility during the configuration process

This section provides guidelines for setting up the storage layer of the solution to provide high availability and the expected level of performance.

Microsoft Hyper-V supports more than one method of storage when hosting virtual machines. The ScaleIO solution is based on block protocols, and the ScaleIO layer described in this section uses all current best practices. A customer or architect with the necessary training and background can make modifications based on their understanding of the system’s usage and load if required. However, the building blocks described in Chapter 3 ensure acceptable performance.

Hyper-V storage virtualization

Windows Server 2012 Hyper-V and Failover Clustering use Cluster Shared Volumes v2 and VHDX features to virtualize storage presented from an external shared storage system to the host virtual machines. In Figure 11, the ScaleIO volumes present block-based LUNs (as CSVs) to the Windows hosts to host the virtual machines.

Figure 11. Hyper-V virtual disk types

CSV

A CSV is a shared disk containing a New Technology File System (NTFS) volume that is accessible to all nodes of a Windows Failover Cluster. The CSV can be deployed over any SCSI-based local or network storage.

Pass-through disks

Windows Server 2012 also supports pass-through disks, which enable a virtual machine to access a physical disk mapped to a host that does not have a volume configured on it.

VHDX

Hyper-V in Windows Server 2012 contains an update to the virtual hard disk (VHD) format called VHDX, which has much greater capacity and built-in resiliency. The main features of the VHDX format are:

 Support for virtual hard disk storage capacity of up to 64 TB

 Additional protection against data corruption during power failures by logging updates to the VHDX metadata structures


 Optimal structure alignment of the virtual hard disk format to suit large sector disks

The VHDX format also has the following features:

 Larger block size for dynamic and differential disks, which enables the disks to better meet the needs of the workload

 A 4 KB logical-sector virtual disk that enables increased performance when used by applications and workloads that are designed for 4 KB sectors

 The ability to store custom file metadata that the user might want to record, such as the operating system version or applied updates

 Space reclamation features that can result in smaller file sizes and enable the underlying physical storage device to reclaim unused space (for example, TRIM requires direct-attached storage or SCSI disks and TRIM-compatible hardware)

Redundancy scheme and rebuild process

ScaleIO uses a mirroring scheme to protect data against disk and node failures. The ScaleIO architecture supports a distributed two-copy scheme. If an SDS node or SDS disk fails, applications can continue to access ScaleIO volumes; their data is still available through the remaining mirrors. ScaleIO immediately starts a seamless rebuild process to create another mirror for the data chunks that were lost in the failure. During the rebuild process, ScaleIO copies those data chunks to free areas across the SDS cluster, so it is not necessary to add any capacity to the system.

The surviving SDS cluster nodes carry out the rebuild process by using the aggregated disk and network bandwidth of the cluster. The process is fast and minimizes both exposure time and application performance degradation. After the rebuild, all the data is fully mirrored and healthy again.

If a failed node rejoins the cluster before the rebuild process is completed, ScaleIO dynamically uses data from the rejoined node to further minimize the exposure time and the use of resources. This capability is important for overcoming short outages efficiently.
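The two-copy scheme and rebuild described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the function names, the round-robin placement, and the least-loaded target selection are hypothetical and are not ScaleIO's actual placement or rebuild algorithm. It shows only the invariant the text describes: after a node fails, every chunk regains a second copy on a surviving node, using free space on the existing cluster.

```python
def place_chunks(num_chunks, nodes):
    """Give each chunk two copies on two different nodes (toy round-robin)."""
    n = len(nodes)
    return {c: {nodes[c % n], nodes[(c + 1) % n]} for c in range(num_chunks)}


def rebuild_after_failure(placement, nodes, failed):
    """Re-mirror every chunk that lost a copy; mutates and returns placement.

    Assumes at least two nodes survive, so a second copy can always be placed.
    """
    survivors = [node for node in nodes if node != failed]
    load = {node: 0 for node in survivors}
    # Drop the failed node's copies and count what each survivor still holds.
    for holders in placement.values():
        holders.discard(failed)
        for node in holders:
            load[node] += 1
    # Copy each under-protected chunk to the least-loaded eligible survivor.
    for holders in placement.values():
        if len(holders) < 2:
            candidates = [node for node in survivors if node not in holders]
            target = min(candidates, key=lambda node: load[node])
            holders.add(target)
            load[target] += 1
    return placement


nodes = ["sds1", "sds2", "sds3", "sds4"]
placement = place_chunks(12, nodes)
rebuild_after_failure(placement, nodes, "sds2")
```

Because the rebuild targets are spread over all survivors, the work is shared across the cluster rather than concentrated on a single spare, which mirrors the aggregated-bandwidth behavior described in the text.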

Elasticity and rebalancing

Unlike many other systems, a ScaleIO cluster is extremely elastic. Administrators can add and remove capacity and nodes on the fly during I/O operations.

When a cluster is expanded with new capacity (such as new SDSs or new disks added to existing SDSs), ScaleIO immediately rebalances the storage by seamlessly migrating data chunks from the existing SDSs to the new SDSs or disks. This migration does not affect the applications, which continue to access the data stored in the migrating chunks. By the end of the rebalancing process, all the ScaleIO volumes are spread across all the SDSs and disks, including the newly added ones, in an optimally balanced manner, as shown in Figure 12. Thus, adding SDSs or disks not only increases the available capacity but also increases the performance of the applications as they access their volumes.
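As a rough illustration of rebalancing after an expansion, the following sketch spreads chunk counts evenly across old and new devices and moves only the surplus from existing devices. The function name and the data model (a chunk count per device) are hypothetical, and the sketch ignores real constraints such as keeping the two copies of a chunk on different SDSs. It is not ScaleIO's algorithm.

```python
def rebalance(counts, new_devices):
    """Return (balanced_counts, moves) after adding empty devices.

    counts: dict mapping device name to the number of chunks it stores.
    moves:  list of (source, destination, chunks) transfers.
    """
    current = dict(counts)
    current.update({device: 0 for device in new_devices})
    base, extra = divmod(sum(current.values()), len(current))
    # Fullest devices keep the leftover chunks so that less data has to move.
    order = sorted(current, key=lambda device: -current[device])
    target = {d: base + (1 if i < extra else 0) for i, d in enumerate(order)}
    surplus = {d: current[d] - target[d] for d in order if current[d] > target[d]}
    deficit = {d: target[d] - current[d] for d in order if current[d] < target[d]}
    moves = []
    for dst, need in deficit.items():
        for src in surplus:
            if need == 0:
                break
            n = min(need, surplus[src])
            if n:
                moves.append((src, dst, n))
                surplus[src] -= n
                need -= n
    return target, moves


balanced, moves = rebalance({"sds1": 40, "sds2": 40, "sds3": 40}, ["sds4"])
```

In this example each existing SDS gives up 10 chunks to the new SDS, so only 30 of the 120 chunks move and every device ends with 30.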


Figure 12. Automatic rebalancing when disks are added

When an administrator decreases capacity (for example, by removing SDSs or removing disks from SDSs), ScaleIO performs a seamless migration that rebalances the data across the remaining SDSs and disks in the cluster, as shown in Figure 13.

Figure 13. Automatic rebalancing when disks or nodes are removed

Notes:

 In all types of rebalancing, ScaleIO migrates the least amount of data possible.

 ScaleIO is sufficiently flexible to accept new requests to add or remove capacity while still rebalancing previous capacity additions and removals.

 To maintain data availability, remove only one node at a time.


Chapter 3 Sizing the Environment

Overview

Reference workload

Scalability

VSPEX building blocks

Configuration sizing guidelines


Overview

This chapter presents the following information:

 How to design and size the VSPEX Private Cloud for Microsoft Hyper-V with EMC ScaleIO solution to meet the customer’s needs

 How to design the nodes for the ScaleIO environment and specify the number of nodes

 Results from the solution testing and validation as to how variations in node size and number affect the maximum number of supported servers. The virtual machines used in the sizing calculations correspond to the definition of the reference workload (reference virtual machine) for the VSPEX Private Cloud.

Reference workload

When you move an existing server to a virtual infrastructure, you can gain efficiency by rightsizing the virtual hardware resources assigned to that system.

Each VSPEX Proven Infrastructure balances the storage, network, and compute resources needed for a set number of virtual machines, as validated by EMC. In practice, each virtual machine has its own requirements that rarely fit a pre-defined specification.

To simplify sizing the solution, VSPEX defines a reference workload, which represents a unit of measure for quantifying the resources in the solution reference architecture.

By comparing the customer’s actual usage to this reference workload, you can determine how to size the solution.

For VSPEX Private Cloud solutions, the reference workload is defined as a single virtual machine with the characteristics shown in Table 3.

Table 3. VSPEX Private Cloud workload

Parameter                                   Value
Virtual machine OS                          Windows Server 2012 R2
Virtual CPUs                                1
Virtual CPUs per physical core (maximum)    4
Memory per virtual machine                  2 GB
IOPS per virtual machine                    25
I/O pattern                                 Fully random skew = 0.5
I/O read percentage                         67%
Virtual machine storage capacity            100 GB

This solution uses the VSPEX Private Cloud reference virtual machine for sizing the customer environment in the same way that the reference virtual machine is used in VSPEX Private Cloud solutions for the EMC VNX platform. For further information, refer to EMC VSPEX Private Cloud: Microsoft Windows Server 2012 R2 with Hyper-V for up to 1000 Virtual Machines Proven Infrastructure Guide.
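One way to compare a customer workload with the reference workload is to express each server as a number of reference virtual machines. The sketch below uses the Table 3 values and takes the largest of the per-resource ratios, rounded up. That "largest ratio" rule is an assumption made here for illustration and is not stated in this section; the function and parameter names are also hypothetical.

```python
import math

# Reference virtual machine characteristics from Table 3.
REFERENCE_VM = {"vcpus": 1, "memory_gb": 2, "iops": 25, "capacity_gb": 100}


def reference_vm_equivalents(vcpus, memory_gb, iops, capacity_gb):
    """Express one server as a whole number of reference virtual machines.

    Assumption: the most demanding resource determines the equivalent count.
    """
    ratios = (
        vcpus / REFERENCE_VM["vcpus"],
        memory_gb / REFERENCE_VM["memory_gb"],
        iops / REFERENCE_VM["iops"],
        capacity_gb / REFERENCE_VM["capacity_gb"],
    )
    return max(math.ceil(ratio) for ratio in ratios)


# Hypothetical server: 4 vCPUs, 16 GB of memory, 200 IOPS, 200 GB of storage.
equivalents = reference_vm_equivalents(4, 16, 200, 200)
```

For the hypothetical server shown, memory and IOPS each amount to eight reference virtual machines, so the server counts as eight under this rule.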

Scalability

ScaleIO is designed to scale from three to thousands of nodes. Unlike most traditional storage systems, as the number of servers grows, so do capacity, throughput, and IOPS. Performance scales linearly with the growth of the deployment.

Whenever additional storage and compute resources (such as servers and drives) are needed, you can add them modularly. Storage and compute resources grow together so that the balance between them is maintained.

VSPEX building blocks

Sizing the system to meet the virtual server application requirements is a complicated process. When applications generate I/O, several components serve that I/O—for example, server CPU, server dynamic random access memory (DRAM) cache, and disks. Customers must consider various factors when planning and scaling their storage system to balance capacity, performance, and cost for their applications.

VSPEX uses a building block approach to reduce complexity. A building block consists of one server node that is configured and validated to support a certain number of virtual servers in the VSPEX architecture. Each building block node combines several local disk spindles to contribute a shared ScaleIO volume to support the needs of the private cloud environment. The SDS and the SDC are both installed on each building block node to contribute the local disk to the ScaleIO storage pool and expose ScaleIO shared block volumes to run the virtual machines.

The configuration of the validated reference building block includes the memory size and the number of physical CPU cores and disk spindles shown in Table 4. This configuration provides a flexible solution for VSPEX sizing.

Table 4. Building block node configuration

Physical CPU cores    Memory (GB)    SAS drives (10k rpm)    SAS capacity (GB)
6                     64             6                       600

The building block configuration contains six SAS disks per node. The validated solution models these drives at 600 GB each. Solution testing revealed that drive capacity, rather than drive performance, limits the node configuration for a VSPEX Private Cloud and the number of reference virtual machines that a building block can support. The reference building block memory can support 31 reference virtual machines, but the reference building block disk capacity can support only 12 virtual machines, as shown in Table 5.
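The limiting-resource reasoning above can be sketched as follows. The memory and capacity limits (31 and 12) are taken directly from the text; the CPU limit is derived here from Table 3 and Table 4 (6 cores at up to 4 virtual CPUs per core, 1 virtual CPU per reference virtual machine). The three-node minimum is an assumption based on the statement that ScaleIO scales from three nodes, and the function names are hypothetical. The sketch does not model any additional headroom, so the sizing tables in this guide take precedence.

```python
import math

CORES_PER_NODE = 6     # Table 4
VCPUS_PER_CORE = 4     # Table 3 (maximum)
VMS_BY_MEMORY = 31     # stated for the reference building block
VMS_BY_CAPACITY = 12   # stated for the reference building block
MIN_NODES = 3          # assumption: smallest ScaleIO deployment


def vms_per_building_block():
    """Reference VMs one node supports: the scarcest resource sets the limit."""
    vms_by_cpu = CORES_PER_NODE * VCPUS_PER_CORE  # 1 vCPU per reference VM
    return min(vms_by_cpu, VMS_BY_MEMORY, VMS_BY_CAPACITY)


def nodes_required(reference_vms):
    """Building block nodes needed for a given number of reference VMs."""
    return max(MIN_NODES, math.ceil(reference_vms / vms_per_building_block()))


nodes_for_100_vms = nodes_required(100)
```

With the reference building block, capacity is the scarcest resource, so each node counts for 12 reference virtual machines and 100 reference virtual machines need 9 nodes under these assumptions.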

The section "Customizing the building block" provides information about how to customize the building block configuration.

Building block approach

Validated building block