EMC VSPEX PRIVATE CLOUD:

VMware vSphere 5.5 and EMC ScaleIO

EMC VSPEX

Abstract

This document describes the EMC® VSPEX® Proven Infrastructure solution for private cloud deployments with VMware vSphere 5.5 and EMC ScaleIO® technology.

June 2015

Published June 2015

EMC believes the information in this publication is accurate as of its publication date.

The information is subject to change without notice.

The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

EMC VSPEX Private Cloud: VMware vSphere 5.5 and EMC ScaleIO Proven Infrastructure Guide

Part Number H14207

Introduction

EMC® VSPEX® Proven Infrastructures are optimized for virtualizing business-critical applications. VSPEX provides modular solutions built with technologies that enable faster deployment, more simplicity, wider choice, greater efficiency, and lower risk.

Figure 1 shows the modular, virtualized infrastructures validated by EMC and delivered by EMC VSPEX partners. Partners can choose the virtualization, server, and network technologies that best fit a customer’s environment, while the servers’ local disks, pooled by elastic EMC ScaleIO® software, provide the storage.

Figure 1. VSPEX Proven Infrastructures

This document is a comprehensive guide to the technical aspects of the VSPEX Private Cloud for VMware vSphere with EMC ScaleIO solution. Server capacity is provided in generic terms for required minimums of CPU, memory, and network interfaces; the customer is free to select the server and networking hardware that meets or exceeds the stated minimums.

Target audience

Readers of this document must have the necessary training and background to install and configure VMware vSphere 5.5, ScaleIO, and associated infrastructure as required by this implementation. External references are provided where applicable, and readers should be familiar with these documents.

Readers should also be familiar with the infrastructure and database security policies of the customer installation.

Individuals selling and sizing a VMware Private Cloud infrastructure should focus on the first five chapters of this guide. After purchase, implementers of the solution should focus on the configuration guidelines in Chapter 4, the solution validation in Chapter 5, and the appropriate references and appendices in Chapter 6.

Document purpose

This document includes an initial introduction to the VSPEX architecture, an explanation of how to modify the architecture for specific implementations, and instructions on how to effectively deploy and monitor the system.

The VSPEX Private Cloud architecture provides customers with a modern system capable of hosting many virtual machines at a consistent performance level. This solution runs on the vSphere virtualization layer, and the ScaleIO software runs on top of the vSphere hypervisor. The compute and network components, which are defined by the VSPEX partners, are designed to be redundant and sufficiently powerful to handle the processing and data needs of the virtual machine environment.

The solution described in this document is based on the capacity of the cluster server and on a defined reference workload. Because not every virtual machine has the same requirements, this document contains methods and guidance to adjust your system to be cost-effective when deployed.

A private cloud architecture is a complex system offering. This guide facilitates setup by providing prerequisite software and hardware material lists, step-by-step sizing guidance and worksheets, and verified deployment steps. After the last component is installed, validation tests and monitoring instructions ensure that your system is running properly.

Business needs

VSPEX solutions are built with proven technologies to create complete virtualization solutions that allow you to make an informed decision in the hypervisor, server, and networking layers.

Business applications are moving into consolidated compute, network, and storage environments. This solution reduces the complexity of configuring every component of a traditional deployment model, and simplifies integration management while maintaining the application design and implementation options. It also provides unified administration while enabling adequate control and monitoring of process separation. The business benefits of the architecture include:

• An end-to-end virtualization solution to effectively use the capabilities of the unified infrastructure components

• Efficient virtualization of virtual machines for varied customer use cases

• A reliable, flexible, and scalable reference design

Chapter 2 Solution Architecture Overview

Overview

Solution architecture

Key components

Virtualization layer

Compute layer

Network layer

Storage layer

Security layer

Overview

This chapter provides a comprehensive guide to the major aspects of this solution.

Server capacity is presented generically for required minimums of CPU, memory, and network resources. You can select server and networking hardware that meets or exceeds the stated minimums. EMC validated the specified ScaleIO architecture, together with a system that meets the server and network requirements, to provide high levels of performance while delivering a highly available architecture for your private cloud deployment.

Solution architecture

EMC designed and proved this solution to provide the server virtualization, server, network, and storage resources that customers need to deploy a small-scale architecture and scale it as their business requires.

High-level architecture

Figure 2 shows the high-level architecture of the validated solution.

[Figure: virtual servers running on the hypervisor and compute components, linked by network connections to the network components and the supporting infrastructure; the storage components are SDS/SDC instances joined by a storage network.]

Figure 2. Architecture of the validated solution

The solution uses ScaleIO software and vSphere to provide the storage and virtualization platforms for an environment of Microsoft Windows Server 2012 virtual machines provisioned by the vSphere platform.

To provide predictable performance for end-user computing solutions, the storage system must be able to handle the peak I/O load from the clients while keeping response time to a minimum. In this solution, we used ScaleIO software to pool the servers’ local disks into a storage system with high performance and scalability.

Logical architecture

Figure 3 shows the logical architecture of this solution.

[Figure: a VMware ESXi cluster running EMC ScaleIO hosts virtual servers 1 through n; a 10 GbE IP network and a storage network connect the cluster to the shared infrastructure: vCenter Server, SQL Server, DNS server, and Active Directory server.]

Figure 3. Logical architecture for the solution

Table 1 summarizes the configuration of the various components of the solution architecture. The Key components section provides detailed overviews of the key technologies.

Table 1. Solution architecture configuration

Component | Solution configuration
--- | ---
VMware vSphere 5.5 | This solution uses VMware vSphere to provide a common virtualization layer to host the server environment. We configured high availability in the virtualization layer with vSphere features such as VMware High Availability (HA) clusters and VMware vMotion.
VMware vCenter Server 5.5 | In the solution, all vSphere hosts and their virtual machines are managed through a vCenter Server Appliance.
EMC ScaleIO | ScaleIO software provides a storage layer to host and store virtual machines.
Microsoft SQL Server | VMware vCenter Server requires a database service to store configuration and monitoring details. This solution uses a Microsoft SQL Server 2012 database.
Active Directory server | Active Directory services are required for the various solution components to function properly. We used the Microsoft Active Directory Service running on a Windows Server 2012 R2 server for this purpose.
DHCP server | The Dynamic Host Configuration Protocol (DHCP) server centrally manages the IP address scheme for the virtual machines. This service is hosted on the same virtual machine as the domain controller and Domain Name System (DNS) server. The Microsoft DHCP Service running on a Windows Server 2012 R2 server is used for this purpose.
DNS server | DNS services are required for the various solution components to perform name resolution. The Microsoft DNS Service running on a Windows Server 2012 R2 server is used for this purpose.
IP networks | All network traffic is carried by a standard Ethernet network with redundant cabling and switching. User and management traffic is carried over a shared network, while virtual SAN (vSAN) storage traffic is carried over a private, non-routable subnet.

Key components

This section describes the key components of this solution:

• Virtualization layer: Decouples the physical implementation of resources from the applications that use the resources so that the application view of the available resources is no longer directly tied to the hardware. This enables many key features required by the private cloud.

• Compute layer: Provides memory and processing resources for the virtualization layer software and for the applications running in the private cloud. The VSPEX program defines the minimum amount of required compute layer resources, and enables you to implement the solution by using any server hardware that meets these requirements.

• Network layer: Connects users of the private cloud to the resources in the cloud, and connects the storage layer to the compute layer. The VSPEX program defines the minimum number of required network ports, provides general guidance on network architecture, and enables you to implement the solution by using any network hardware that meets these requirements.

• Storage layer: Provides storage to implement the private cloud. ScaleIO implements a pure block storage layout with converged nodes to support compute and storage. With multiple hosts accessing shared data through ScaleIO components, ScaleIO provides high-performance data storage while maintaining high availability.

• Security layer: An optional solution component that provides consumers with additional options to control access to the environment and ensure that only authorized users are permitted to use the system.

Virtualization layer

Overview

vSphere is the leading virtualization platform in the industry. For years, it has provided flexibility and cost savings to end users by enabling the consolidation of large, inefficient server farms into nimble, reliable cloud infrastructures. The core vSphere components are the vSphere hypervisor and the vCenter Server for system management.

The VMware hypervisor runs on a dedicated server and allows multiple operating systems to run on the system at one time as virtual machines. These hypervisor systems can be connected to operate in a clustered configuration. These clustered configurations are then managed as a larger resource pool through vCenter, and allow for dynamic allocation of CPU, memory, and storage across the cluster.

Features such as VMware vMotion, which allows a virtual machine to move between different servers with no disruption to the operating system, and Distributed Resource Scheduler (DRS), which performs vMotion automatically to balance the load, make vSphere a solid business choice. With vSphere 5.5, a VMware-virtualized environment can host virtual machines with up to 64 virtual CPUs and 1 TB of virtual random access memory (RAM).

Configuration guidelines

Memory is a critical component of any virtual system, and the mapping between physical memory present in a server and virtual memory presented to a guest virtual machine is a major component of the design of the target service. This section outlines some of the relevant considerations.

Virtual machine memory management

vSphere has a number of advanced features that help optimize performance and overall use of resources. This section describes the key features for memory management and considerations for using them with your solution.

• Memory over-commitment

Memory over-commitment occurs when more memory is allocated to virtual machines than is physically present in a vSphere host. Using techniques such as ballooning and transparent page sharing, vSphere can often handle memory over-commitment without noticeable performance degradation. However, if more memory is being actively used than is present on the server, vSphere might resort to swapping portions of a virtual machine’s memory.

Note: EMC VSPEX Private Cloud solutions do not account for memory over-commitment in sizing examples because the performance risks associated with that configuration depend heavily on the customer environment.

• Transparent page sharing

Virtual machines running similar operating systems and applications typically have identical sets of memory content. Page sharing allows the hypervisor to reclaim the redundant copies and return them to the host’s free memory pool for reuse. However, VMware recommends disabling this option for security reasons.

• Memory compression

vSphere uses memory compression to store pages that would otherwise be swapped out to disk through host swapping in a compression cache located in main memory.

• Memory ballooning

Memory ballooning relieves host resource exhaustion by reclaiming free pages from the virtual machine and returning them to the host for reuse, with little to no impact on the application’s performance.

• Hypervisor swapping

Hypervisor swapping causes the host to force arbitrary virtual machine pages out to disk.

For more information, refer to Understanding Memory Resource Management in VMware vSphere 5.5.
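The difference between allocated memory and actively used memory can be illustrated with a short Python sketch. It is not a vSphere API: the function name, the three labels, and the sample figures are hypothetical, and the check ignores virtualization overhead.

```python
def memory_state(host_gb, configured_gb, active_gb):
    """Classify a host from per-VM configured and active memory (GB).

    Hypothetical helper for illustration only; vSphere exposes no such call.
    """
    total_configured = sum(configured_gb)
    total_active = sum(active_gb)
    if total_active > host_gb:
        # More memory is actively used than is physically present,
        # so the hypervisor may have to swap VM pages to disk.
        return "swap risk"
    if total_configured > host_gb:
        # More memory is allocated than exists; the host relies on
        # reclamation techniques such as ballooning.
        return "over-committed"
    return "not over-committed"


# Three VMs configured with 160 GB in total on a 128 GB host,
# but only 60 GB is actively used.
print(memory_state(128, [64, 64, 32], [20, 30, 10]))
```

A host in the "over-committed" state can still perform well; the "swap risk" state corresponds to the case in which vSphere might resort to swapping.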

Memory configuration guidelines

Proper sizing and configuration of the solution requires care. This section provides guidelines for allocating memory to virtual machines.

vSphere memory overhead

There is some memory space overhead associated with virtualizing memory resources. This overhead has two components:

• System overhead for the VMkernel

• Additional overhead for each virtual machine

The overhead for the VMkernel is fixed, whereas the amount of additional memory for each virtual machine depends on the number of virtual CPUs (vCPUs) and the amount of memory configured for the guest OS.
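As a minimal sketch of how the two overhead components add up, the following Python function totals the physical memory a host would need without over-commitment. All overhead values in the example are placeholders chosen for illustration; they are not VMware figures, and real per-VM overhead depends on the vCPU count and configured memory.

```python
def host_memory_required_mb(vm_configured_mb, vm_overhead_mb, vmkernel_overhead_mb):
    """Sum the fixed VMkernel overhead, each VM's configured memory,
    and each VM's additional overhead (all in MB).

    The overhead inputs are supplied by the caller; this sketch does not
    compute them.
    """
    if len(vm_configured_mb) != len(vm_overhead_mb):
        raise ValueError("one overhead value is required per virtual machine")
    return vmkernel_overhead_mb + sum(vm_configured_mb) + sum(vm_overhead_mb)


# Hypothetical inputs: three VMs (4 GB, 4 GB, 8 GB), placeholder per-VM
# overheads, and a placeholder 2 GB VMkernel overhead.
total = host_memory_required_mb([4096, 4096, 8192], [100, 100, 150], 2048)
print(total)  # 18782
```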

Virtual machine memory settings

Figure 4 shows the memory settings parameters for a virtual machine, including:

• Configured memory: Physical memory allocated to the virtual machine at the time of creation.

• Reserved memory: Memory that is guaranteed to the virtual machine.

• Touched memory: Memory that is active or in use by the virtual machine.

• Swappable: Memory that can be reclaimed from the virtual machine through ballooning, compression, or swapping if the host is under memory pressure from other virtual machines.

Figure 4. Virtual machine memory settings

EMC recommends that you follow these best practices for virtual machine memory settings:

• Do not disable the default memory reclamation techniques. These lightweight processes provide flexibility with minimal impact to workloads.

• Size memory allocations for virtual machines carefully.

Over-allocation wastes resources, while under-allocation causes performance impacts that can affect other virtual machines sharing resources. Over-committing can lead to resource exhaustion if the hypervisor cannot procure memory resources. In severe cases, when hypervisor swapping occurs, virtual machine performance might be adversely affected.

Having performance baselines of your virtual machine workloads assists in this process.

Allocating memory to virtual machines

Many factors determine the proper sizing for virtual machine memory in VSPEX architectures. With the number of application services and use cases available, determining a suitable configuration for an environment requires creating a baseline configuration, testing the configuration, and making adjustments for optimal results.

High availability and failover

Configure high availability in the virtualization layer, and enable the hypervisor to automatically restart failed virtual machines. Figure 5 illustrates the hypervisor layer responding to a failure in the compute layer.

[Figure: a VMware vSphere cluster with VMHA configured, shown before and after a host failure.]

Figure 5. High availability at the virtualization layer

Implementing high availability at the virtualization layer means that, even after a hardware failure, the infrastructure attempts to keep as many services running as possible.

Compute layer

Overview

The choice of a server platform for an EMC VSPEX infrastructure is not only based on the technical requirements of the environment, but also on how well the platform is supported. Other important factors include the customer’s relationship with the server provider and the performance and management of the platform. For this reason, EMC VSPEX solutions are designed to run on a wide variety of server platforms. Rather than presenting a specific number of servers with a specific set of requirements, VSPEX documents present the minimum requirements needed for the number of processor cores and the amount of RAM.

ScaleIO components are designed to work with a minimum of three server nodes. The physical server node, running vSphere, can host other workloads beyond the ScaleIO virtual machine. In this VSPEX document, we use at least three compute nodes to implement the solution.

When designing and ordering the compute/server layer of this VSPEX solution, several factors may impact the final purchase. From a virtualization perspective, if a system workload is well understood, features such as memory ballooning and transparent page sharing can reduce the aggregate memory requirement.

If the virtual machine pool does not have a high level of peak or concurrent usage, reduce the number of vCPUs. Conversely, if the applications being deployed are highly computational in nature, increase the number of CPUs and memory purchased.

Use the following best practices in the compute layer:

• Use several identical, or at least compatible, servers. VSPEX implements hypervisor-level high-availability technologies that may require similar instruction sets on the underlying physical hardware. By implementing VSPEX on identical server units, you can minimize compatibility problems in this area.

• If you implement high availability at the hypervisor layer, the largest virtual machine you can create is constrained by the smallest physical server in the environment.

Note: To enable high availability for the compute layer, each customer needs one additional server to ensure that the system has enough capacity to maintain business operations when a server fails.

• Implement the high availability features in the virtualization layer, and ensure that the compute layer has sufficient resources to accommodate at least a single server failure. This enables minimal-downtime upgrades and tolerance for single unit failures.

Within the boundaries of these recommendations and best practices, the compute layer for VSPEX can be flexible to meet your specific needs. Ensure that there are sufficient processor cores and RAM per core to meet the needs of the target environment.
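The guidance above (a minimum of three nodes, plus one additional server for high availability) can be turned into a rough server count. The Python sketch below is an illustration under stated assumptions: the function and parameter names are hypothetical, per-server capacities are inputs you supply, and the three-node floor is applied to the total after the spare server is added, which is one possible reading of the guidance.

```python
import math


def servers_required(total_vcpus, total_ram_gb,
                     vcpus_per_server, ram_gb_per_server,
                     spare_servers=1, min_nodes=3):
    """Estimate the server count for a workload.

    Sizes by whichever resource (vCPU or RAM) needs more servers, adds
    spare capacity for a server failure, and enforces the node minimum.
    """
    by_cpu = math.ceil(total_vcpus / vcpus_per_server)
    by_ram = math.ceil(total_ram_gb / ram_gb_per_server)
    needed = max(by_cpu, by_ram)
    return max(needed + spare_servers, min_nodes)


# Hypothetical workload: 100 vCPUs and 200 GB of RAM on servers that can
# each host 32 vCPUs and 128 GB of RAM.
print(servers_required(100, 200, 32, 128))  # 5
```

In this example the CPU requirement drives the count (four servers), and the fifth server provides the capacity to ride through a single server failure.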

High availability and failover

While the choice of servers to implement in the compute layer is flexible, we recommend using enterprise-class servers designed for the datacenter. This type of server has redundant power supplies, as shown in Figure 6. Connect these servers to separate power distribution units (PDUs) following your server vendor’s best practices.

Figure 6. Redundant power supplies

To configure high availability in the virtualization layer, configure the compute layer with enough resources to meet the needs of the environment, even with a server failure, as demonstrated in Figure 5.

Network layer

Overview

The infrastructure network requires redundant network links for each vSphere host. This configuration provides both redundancy and additional network bandwidth. It is required regardless of whether the network infrastructure for the solution already exists or you are deploying it with other components of the solution.

This section provides guidelines for setting up a redundant, highly available network configuration. The guidelines consider virtual LANs (VLANs), the Link Aggregation Control Protocol (LACP), the ESXi servers, and the ScaleIO layer.

ScaleIO network

ScaleIO creates a Redundant Array of Independent Nodes (RAIN) topology between the server nodes. In practice, this means that the system distributes data so that the loss of a single node will not impact data availability. This, in turn, requires that the ScaleIO nodes send data to other nodes to maintain consistency. A high-speed, low-latency IP network is required for this to work correctly. We recommend a 10 GbE IP network designed for high availability, as shown in Table 2. We created the test environment with redundant 10 Gb Ethernet networks. During testing, at small scale points, the network was not heavily used.

Note: In this guide, “we” refers to the EMC Solutions engineering team that validated the solution.

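Table 2. Recommended 10 Gb switched Ethernet network layer

Nodes | 10 Gb switched Ethernet | 1 Gb switched Ethernet
--- | --- | ---
3 | Recommended | Possible
4 | Recommended | Possible
5 | Recommended | Possible
6 | Recommended | Possible
7+ | Recommended | Not recommended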
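Table 2 can be expressed as a small lookup. The Python sketch below assumes one reading of the table: 10 Gb switched Ethernet is recommended at every node count, 1 Gb switched Ethernet is possible for three to six nodes, and 1 Gb is not recommended for seven or more nodes. The function name and return format are hypothetical.

```python
def recommended_network(nodes):
    """Return the Table 2 guidance for a ScaleIO node count.

    Assumes the reading of Table 2 described in the text above.
    """
    if nodes < 3:
        # The solution is designed for a minimum of three server nodes.
        raise ValueError("at least three nodes are required")
    one_gb = "Possible" if nodes <= 6 else "Not recommended"
    return {"10 Gb": "Recommended", "1 Gb": one_gb}


print(recommended_network(4))
print(recommended_network(8))
```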

VLANs

Isolate network traffic so that the traffic between hosts and storage, hosts and clients, and management traffic all move over isolated networks. In some cases, physical isolation may be required for regulatory or policy compliance reasons, but in many cases, logical isolation with VLANs is sufficient.

We recommend separating the networks for security and increased efficiency. There are two types of networks:

• A management network, used to connect and manage the ScaleIO virtual machines, is normally connected to the client management network. Because this network has less I/O traffic, we recommend a 1 GbE network.

• A data network is internal, enabling communication between the ScaleIO components, and is generally a 10 GbE network.

In this solution, we used one VLAN for client access and one VLAN for management.

Figure 7 depicts the VLANs and the network connectivity requirements for a ScaleIO environment.

[Figure: servers connected to the client access network, the management network, and the storage network.]

Figure 7. Required networks for ScaleIO

You can use the client access network to communicate with the ScaleIO infrastructure. The storage network provides communication between the ScaleIO nodes.

Administrators use the management network as a dedicated way to access the management connections on the ScaleIO software component, network switches, and hosts.

Note: Some best practices need additional network isolation for cluster traffic, virtualization layer communication, and other features. Implement these additional networks if necessary.

High availability and failover

Each vSphere host has multiple connections to the user and storage Ethernet networks to guard against link failures, as shown in Figure 8. Spread these connections across multiple Ethernet switches to guard against component failure in the network.

[Figure: each server connects to multiple switches (two Cisco Nexus 5020 switches are shown), and the switches connect to each other.]

Figure 8. Network layer high availability

Storage layer

Overview

ScaleIO is a software-only solution that uses hosts’ existing local disks and LAN to realize a vSAN that has all the benefits of external storage, but at a fraction of the cost and the complexity. ScaleIO turns local internal storage into shared block storage that is comparable to or better than the more expensive external shared block storage. The lightweight ScaleIO software components are installed in the application hosts and inter-communicate using a standard LAN to handle the application I/O requests sent to ScaleIO block volumes. An extremely efficient decentralized block I/O flow, combined with a distributed, sliced volume layout, results in a massively parallel I/O system that can scale to hundreds or even thousands of nodes.

ScaleIO is designed and implemented with enterprise-grade resilience as an essential attribute. Furthermore, the software features efficient distributed auto-healing processes that overcome media and node failures without requiring administrator involvement. Dynamic and elastic, ScaleIO enables administrators to add or remove nodes and capacity “on the fly.” The software immediately responds to the changes, rebalancing the storage distribution and achieving a layout that optimally suits the new configuration.

Architecture

Software components

The ScaleIO Data Client (SDC) is a lightweight device driver situated in each host whose applications or file system requires access to the ScaleIO vSAN block devices. The SDC exposes block devices representing the ScaleIO volumes that are currently mapped to that host.

The ScaleIO Data Server (SDS) is a lightweight software component within each host that contributes local storage to the central ScaleIO vSAN.

Convergence of storage and compute

The ScaleIO software components, which have a negligible impact on the applications running in the hosts, are carefully designed and implemented to consume the minimum computing resources required for operation.

ScaleIO converges the storage and application layers. The hosts that run applications can also be used to realize shared storage, yielding a wall-to-wall, single layer of hosts. Because the same hosts run applications and provide storage for the vSAN, an SDC and SDS are typically both installed in each of the participating hosts.

Pure block storage implementation

ScaleIO implements a pure block storage layout. Its entire architecture and data path are optimized for block storage access needs. For example, when an application submits a read I/O request to its SDC, the SDC instantly deduces which SDS is responsible for the specified volume address and then interacts directly with the relevant SDS. The SDS reads the data (by issuing a single read I/O request to its local storage or by just fetching the data from the cache in a cache-hit scenario), and returns the result to the SDC. The SDC provides the read data to the application.

This flow is simple, consuming as few resources as necessary. The data moves over the network exactly once, and a single I/O request is sent to the SDS storage. The write I/O flow is similarly simple and efficient. Unlike some block storage systems that run on top of a file system, or on object storage that itself runs on top of a local file system, ScaleIO avoids those extra layers in the I/O path.
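The read flow described above can be modeled in a few lines of Python. This is a toy model, not ScaleIO code: the class names, the fixed chunk size, and the dictionary used as the address-to-SDS map are all assumptions made for illustration, because this guide does not specify how the SDC locates the responsible SDS.

```python
CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size in bytes


class SDS:
    """Toy data server that holds chunks on its 'local storage'."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}   # (volume, chunk index) -> bytes
        self.requests = 0  # number of I/O requests received

    def read(self, volume, chunk_index):
        self.requests += 1
        return self.chunks[(volume, chunk_index)]


class SDC:
    """Toy data client that resolves the owning SDS locally."""

    def __init__(self, chunk_map):
        self.chunk_map = chunk_map  # (volume, chunk index) -> SDS

    def read(self, volume, offset):
        chunk_index = offset // CHUNK_SIZE
        sds = self.chunk_map[(volume, chunk_index)]  # no central broker
        return sds.read(volume, chunk_index)         # one request, one SDS


sds_a, sds_b = SDS("sds-a"), SDS("sds-b")
sds_a.chunks[("vol1", 0)] = b"first chunk"
sds_b.chunks[("vol1", 1)] = b"second chunk"
sdc = SDC({("vol1", 0): sds_a, ("vol1", 1): sds_b})

print(sdc.read("vol1", CHUNK_SIZE + 10))  # b'second chunk'
print(sds_a.requests, sds_b.requests)     # 0 1
```

The request counters show the point of the design: a read touches only the SDS that owns the address, and no other component sits in the data path.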

Massively parallel, scale-out I/O architecture

ScaleIO can scale to a large number of nodes, thus breaking the traditional scalability barrier of block storage. Because the SDCs propagate the I/O requests directly to the pertinent SDSs, there is no central point through which the requests move—and thus a potential bottleneck is avoided. This decentralized data flow is crucial to the linearly scalable performance of ScaleIO. Therefore, a large ScaleIO configuration results in a massively parallel system. The more servers or disks the system has, the greater the number of parallel channels that will be available for I/O traffic and the higher the aggregated I/O bandwidth and IOPS will be.

Mix-and-match nodes

The vast majority of traditional scale-out systems are based on a “symmetric brick” architecture. Unfortunately, datacenters cannot be standardized on exactly the same bricks for a prolonged period, because hardware configurations and capabilities change over time. Therefore, such symmetric scale-out architectures are bound to run in small islands. ScaleIO was designed from the ground up to support a mix of new and old nodes with dissimilar configurations.

Hardware agnostic

ScaleIO is platform agnostic and works with existing underlying hardware resources. Besides its compatibility with various types of disks, networks, and hosts, it can take advantage of the write buffer of existing local RAID controller cards, and can also run in servers that do not have a local RAID controller card.

For the local storage of an SDS, you can use internal disks, directly-attached external disks, virtual disks exposed by an internal RAID controller, partitions within such disks, and more. Partitions can be useful to combine system boot partitions with ScaleIO capacity on the same raw disks. If the system already has a large, mostly unused partition, ScaleIO does not require repartitioning of the disk, as the SDS can actually use a file within that partition as its storage space.

Volume mapping and volume sharing

The volumes that ScaleIO exposes to the application clients can be mapped to one or more clients running in different hosts. Mapping can be changed dynamically if necessary. In other words, ScaleIO volumes can be used by applications that expect shared-everything block access and by applications that expect shared-nothing or shared-nothing-with-failover access.

Clustered, striped volume layout

A ScaleIO volume is a block device that is exposed to one or more hosts. It is the equivalent of a logical unit in the SCSI world. ScaleIO breaks each volume into a large number of data chunks, which are scattered across the SDS cluster’s nodes and disks in a fully balanced manner. This layout practically eliminates hot spots across the cluster and allows for the scaling of the overall I/O performance of the system through the addition of nodes or disks. Furthermore, this layout enables a single application that is accessing a single volume to use the full IOPS of all the cluster’s disks. This flexible, dynamic allocation of shared performance resources is one of the major advantages of converged scale-out storage.
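To see why a balanced chunk layout spreads load and scales with added nodes, consider the following Python sketch. It uses simple round-robin placement as a stand-in; the actual ScaleIO placement algorithm is not described in this guide, and the function names are hypothetical.

```python
from collections import Counter


def place_chunks(num_chunks, nodes):
    """Assign chunk i to a node by round-robin (illustrative stand-in)."""
    return [nodes[i % len(nodes)] for i in range(num_chunks)]


def chunks_per_node(placement):
    """Count how many chunks each node holds."""
    return Counter(placement)


three_nodes = chunks_per_node(place_chunks(12, ["n1", "n2", "n3"]))
four_nodes = chunks_per_node(place_chunks(12, ["n1", "n2", "n3", "n4"]))

print(dict(three_nodes))  # {'n1': 4, 'n2': 4, 'n3': 4}
print(dict(four_nodes))   # {'n1': 3, 'n2': 3, 'n3': 3, 'n4': 3}
```

Every node holds an equal share of the volume, so no node becomes a hot spot, and adding a fourth node lowers each node's share from four chunks to three.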

Software-only—but as resilient as a hardware array

Traditional storage systems typically combine system software with commodity hardware—which is comparable to application servers’ hardware—to provide enterprise-grade resilience. With its contemporary architecture, ScaleIO provides similar enterprise-grade, no-compromise resilience by running the storage software directly on the application servers. Designed for extensive fault tolerance and high availability, ScaleIO handles all types of failures, including failures of media, connectivity, and nodes, software interruptions, and more. No single point of failure can interrupt the ScaleIO I/O service. In many cases, ScaleIO can overcome multiple points of failure as well.

Managing clusters of nodes

Many storage cluster designs use tightly coupled techniques that might be adequate for a small number of nodes but begin to break when the cluster is larger than a few dozen nodes. The loosely coupled clustering management schemes of ScaleIO provide exceptionally reliable—yet lightweight—failure and failover handling in both small and large clusters.

Most clustering environments assume exclusive ownership of the cluster nodes and might even physically fence or shut down malfunctioning nodes. ScaleIO, by contrast, runs on the application hosts themselves. The ScaleIO clustering algorithms are designed to work efficiently and reliably without interfering with the applications with which ScaleIO coexists.

ScaleIO will never disconnect or invoke Intelligent Platform Management Interface shutdowns of malfunctioning nodes, because they might still be running healthy applications.

Protection domains

As shown in Figure 9, a large ScaleIO storage pool can be divided into multiple protection domains, each of which contains a set of SDSs. ScaleIO volumes are assigned to specific protection domains. Protection domains are useful for mitigating the risk of a dual point of failure in a two-copy scheme or a triple point of failure in a three-copy scheme.

Figure 9. Protection domains

For example, if two SDSs that are in different protection domains fail simultaneously, no data will become unavailable. Just as incumbent storage systems can overcome a large number of simultaneous disk failures as long as they do not occur within the same shelf, ScaleIO can overcome a large number of simultaneous disk or node failures as long as they do not occur within the same protection domain.
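The failure rule above can be expressed as a small worst-case check. This is a sketch under simplifying assumptions: it only counts simultaneous SDS failures per protection domain, it ignores rebuilds that may already have completed, and the domain and SDS names are hypothetical.

```python
def domains_at_risk(protection_domains, failed_sds, copies=2):
    """Return the protection domains in which `copies` or more SDSs have failed.

    In a two-copy scheme, data can only become unavailable when both copies of
    a chunk are lost, which requires at least two failures in the same domain.
    """
    failed = set(failed_sds)
    at_risk = []
    for name, members in protection_domains.items():
        if len(failed & set(members)) >= copies:
            at_risk.append(name)
    return at_risk

# Hypothetical layout: two protection domains with three SDSs each.
domains = {
    "pd-1": ["sds-1", "sds-2", "sds-3"],
    "pd-2": ["sds-4", "sds-5", "sds-6"],
}
print(domains_at_risk(domains, ["sds-1", "sds-4"]))  # failures in different domains
print(domains_at_risk(domains, ["sds-1", "sds-2"]))  # failures in the same domain
```

The first call reports no domain at risk, because each domain lost only one SDS. The second call flags pd-1, where a dual failure could affect both copies of some chunks.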

Management and monitoring

ScaleIO provides several tools to manage and monitor the system, including a command line interface (CLI), an active GUI, and representational state transfer (REST) management application program interface (API) commands. The CLI enables administrators to have direct platform access to perform backend configuration actions and obtain monitoring information.

The active GUI, shown in Figure 10, provides system dashboards for capacity, throughput, and bandwidth statistics, access to system alerts, and the ability to provision backend devices. The REST management API allows users to execute the same management and monitoring commands available with the CLI using a next-generation, cloud-based interface.

Figure 10. ScaleIO active GUI

Interoperability

ScaleIO is integrated with vSphere and OpenStack to provide customers with greater flexibility in deploying ScaleIO with existing environments. The vSphere plug-in facilitates the provisioning of a ScaleIO system in ESX and runs from within the vSphere web interface. Additionally, ScaleIO software can be packaged with EMC ViPR® for management and orchestration functions and with EMC ViPR SRM for additional monitoring and reporting capabilities.

The OpenStack integration (“Cinder” support) allows customers to use commodity hardware with ScaleIO, providing a software-defined block volume solution in an OpenStack environment.

Additionally, ScaleIO software can be packaged with EMC ViPR^® to provide block data services for commodity and EMC ECS™ hardware platforms.

Enterprise Features

Whether you are a service provider delivering hosted infrastructure as a service or a business whose IT department delivers infrastructure as a service to functional units within your organization, ScaleIO offers a set of features that give you complete control over performance, capacity, and data location. For both private cloud data centers and service providers, these features enhance system control and manageability, ensuring that quality of service is met. With ScaleIO, you can limit the amount of performance (IOPS or bandwidth) that selected customers can consume. The limiter allows you to impose and regulate resource distribution to prevent “application hogging” scenarios. You can apply data masking to provide added security for sensitive customer data. ScaleIO offers instantaneous, writable snapshots for data backups.

Dynamic random-access memory (DRAM) caching improves read performance by using SDS server RAM. Fault sets, which are groups of SDSs that are likely to go down together, can be defined to ensure that data mirroring occurs outside the group, improving business continuity. You can create volumes with thin provisioning, providing on-demand storage as well as faster setup and startup times.

Finally, tight integrations with other EMC products are available. You can use ScaleIO in conjunction with EMC XtremCache™ for flash cache auto tiering to further accelerate application performance.

Figure 11 shows the ScaleIO enterprise features.

Figure 11. ScaleIO enterprise features

ScaleIO 1.32

ScaleIO 1.32 includes the following new features and functionality:

• Release of the ScaleIO ‘Free and Frictionless’ download, a free download of ScaleIO for non-production environments with no time / function / capacity limits

• Support for VMware ESX 6.0 (VMware certified)

• Support for SUSE Linux Enterprise Server (SLES) 12

• Support for IBM Spectrum Scale™ (General Parallel File System (GPFS)™) over ScaleIO for Linux environments (Red Hat Enterprise Linux (RHEL) / SLES)

• Additional flexibility during the configuration process

• Enhanced background scanning / remediation of data

This section provides guidelines for setting up the storage layer of the solution to provide high availability and the expected level of performance.

vSphere 5.5 allows more than one method of storage when hosting virtual machines. The tested solution uses block protocols, and the ScaleIO layer described in this section uses all current best practices. A customer or architect with the necessary training and background can make modifications based on their understanding of the system usage and load if required. However, the building blocks described in this document ensure acceptable performance. Chapter 5 lists specific recommendations for customization.

VMware vSphere storage virtualization for VSPEX

vSphere provides host-level storage virtualization, virtualizes the physical storage, and presents the virtualized storage to the virtual machines.

A virtual machine stores its operating system and all the other files related to the virtual machine activities in a virtual disk. The virtual disk itself consists of one or more files. VMware uses a virtual SCSI controller to present virtual disks to a guest operating system running inside the virtual machines.

Virtual disks, as shown in Figure 12, reside on a datastore. Depending on the protocol used, a datastore can be a VMware Virtual Machine File System (VMFS) datastore.

Another option, raw device mapping (RDM), allows the virtual infrastructure to connect a physical device directly to a virtual machine. In our ScaleIO solution, we use either a VMFS datastore or an RDM device to provide disk capacity.

Figure 12. VMware virtual disk types (diagram labels: virtual machine, disk for RDM, disk for VMFS, VMDK, VMFS, RDM, hypervisor, ScaleIO volume)

VMFS

VMFS is a cluster file system that provides storage virtualization optimized for virtual machines. It can be deployed over any SCSI-based local or network storage.

Raw Device Mapping (RDM)

VMware also provides RDM, which allows a virtual machine to directly access a volume on the physical storage.

Note: We recommend using RDM mapping in the vSphere environment. The device is created on ScaleIO virtual machines that point to the physical disk on the vSphere server.

High-availability and failover

Redundancy scheme and rebuild process

ScaleIO uses a mirroring scheme to protect data against disk and node failures. The ScaleIO architecture supports a distributed two-copy redundancy scheme. When an SDS node or SDS disk fails, applications can continue to access ScaleIO volumes; their data is still available through the remaining mirrors. ScaleIO immediately starts a seamless rebuild process whose goal is to create another mirror for the data chunks that were lost in the failure. In the rebuild process, those data chunks are copied to free areas across the SDS cluster, so it is not necessary to add any capacity to the system. All the surviving SDS cluster nodes together carry out the rebuild process by using the aggregated disk and network bandwidth of the cluster. As a result, the process is dramatically faster, resulting in a shorter exposure time and less application-performance degradation. On the completion of the rebuild, all the data is fully mirrored and healthy again. If a failed node rejoins the cluster before the rebuild process has been completed, ScaleIO dynamically uses the rejoined node’s data to further minimize the exposure time and the use of resources. This capability is particularly important for overcoming short outages efficiently.

Elasticity and rebalancing

Unlike many other systems, a ScaleIO cluster is extremely elastic. Administrators can add and remove capacity and nodes “on the fly” during I/O operations. When a cluster is expanded with new capacity (for example, when new SDSs or new disks are added to existing SDSs), ScaleIO immediately responds to the event and rebalances the storage by seamlessly migrating data chunks from the existing SDSs to the new SDSs or disks. Such a migration does not affect the applications, which continue to access the data stored in the migrating chunks. As shown in Figure 13, by the end of the rebalancing process all the ScaleIO volumes have been spread across all the SDSs and disks, including the newly added ones, in an optimally balanced manner. Thus, adding SDSs or disks not only increases the available capacity, but also increases the performance of the applications as they access their volumes.

Figure 13. Automatic rebalancing when disks are added

When an administrator decreases capacity (for example, by removing SDSs or removing disks from SDSs), ScaleIO performs a seamless migration that rebalances the data across the remaining SDSs and disks in the cluster, as shown in Figure 14.

Figure 14. Automatic rebalancing when disks are removed

Note that in all types of rebalancing, ScaleIO migrates the least amount of data possible. Furthermore, ScaleIO is flexible enough to accept new requests to add or remove capacity while still rebalancing previous capacity additions and removals.
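To make "the least amount of data possible" concrete, the following sketch computes the smallest number of chunks that must move when empty nodes join an already balanced cluster. It is an illustrative calculation only, not the ScaleIO rebalancing algorithm; the function name and the node names are hypothetical.

```python
def rebalance_on_add(chunks_per_node, new_nodes):
    """Return balanced per-node targets and the minimum number of chunks to move."""
    counts = dict(chunks_per_node)
    for name in new_nodes:
        counts[name] = 0  # new nodes start empty
    total = sum(counts.values())
    base, extra = divmod(total, len(counts))
    # Let the fullest nodes keep any leftover chunks so that fewer chunks move.
    names = sorted(counts, key=counts.get, reverse=True)
    targets = {name: base + (1 if i < extra else 0) for i, name in enumerate(names)}
    # Only the surplus above each node's target has to migrate.
    moved = sum(max(0, counts[name] - targets[name]) for name in counts)
    return targets, moved

targets, moved = rebalance_on_add({"sds-1": 400, "sds-2": 400, "sds-3": 400}, ["sds-4"])
print(targets, moved)
```

In this example only a quarter of the chunks (300 of 1,200) migrate to the new node; the other chunks stay where they are.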

Security layer

Overview

The ability to secure data and ensure the identity of devices and users is critical in today’s enterprise IT environment. This is particularly true for regulated sectors such as healthcare, finance, and government. VSPEX solutions can offer many different hardened computing platforms, most commonly by implementing a public-key infrastructure (PKI).

The VSPEX solutions can be engineered with a PKI solution designed to meet the security criteria of your organization. The solution can be implemented with a modular process, where layers of security can be added as needed. The general process implements PKI by replacing generic self-certified certificates with trusted certificates from a third-party certificate authority. Services that support PKI can then be enabled using the trusted certificates to ensure a high degree of authentication and encryption where supported.

Depending on the scope of PKI services needed, it can be necessary to implement a PKI service dedicated to those needs. There are many third-party tools that offer PKI services. End-to-end solutions from RSA can be deployed within a VSPEX environment. For additional information, visit the RSA website.

Chapter 3 Sizing the Solution

Overview
Reference workload
Scalability
VSPEX building blocks
Configuration guidelines

Overview

This chapter provides definitions of the reference workload used to size and implement the VSPEX architectures. Sizing the environment includes designing the nodes that will be used for the ScaleIO environment and specifying the number of those nodes. This section provides findings from the EMC Solutions group on how variations in node size and number impact the maximum number of supported servers. The virtual machines used in this section correspond to the VSPEX definitions of those workloads.

Reference workload

When you move an existing server to a virtual infrastructure, you can gain efficiency by right-sizing the virtual hardware resources assigned to that system.

Each VSPEX Proven Infrastructure balances the storage, network, and compute resources needed for a set number of virtual machines, as validated by EMC. In practice, each virtual machine has its own requirements that rarely fit a predefined idea of a virtual machine. In any discussion about virtual infrastructures, you need to first define a reference workload. Not all servers perform the same tasks, and it is impractical to build a reference that considers every possible combination of workload characteristics.

To simplify sizing the solution, this section presents a representative customer reference workload. By comparing the actual customer usage to this reference workload, you can determine how to size the solution.

VSPEX Private Cloud solutions define a reference virtual machine (RVM) workload, which represents a common point of comparison. This workload is described in Table 3.

Table 3. VSPEX Private Cloud workload

Parameter | Value
Virtual machine OS | Windows Server 2012 R2
Virtual CPUs | 1
Virtual CPUs per physical core (maximum) | 4
Memory per virtual machine | 2 GB
IOPS per virtual machine | 25
I/O pattern | Fully random, skew = 0.5
I/O read percentage | 67%
Virtual machine storage capacity | 100 GB

This specification for a virtual machine is not intended to represent any specific application. Rather, it represents a single common point of reference against which other virtual machines can be measured.
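One way to measure another virtual machine against this reference is to express each of its resources as a multiple of the Table 3 values and take the largest multiple. The sketch below assumes that rule, which this section does not spell out; the sample workload figures are hypothetical.

```python
import math

# Reference virtual machine from Table 3.
REFERENCE_VM = {"vcpus": 1, "memory_gb": 2, "iops": 25, "capacity_gb": 100}

def equivalent_reference_vms(workload):
    """Return the reference-VM equivalent of a workload and the per-resource ratios.

    Assumption: the most demanding resource decides the equivalent count.
    """
    ratios = {key: math.ceil(workload[key] / REFERENCE_VM[key]) for key in REFERENCE_VM}
    return max(ratios.values()), ratios

# Hypothetical application server: 4 vCPUs, 16 GB RAM, 200 IOPS, 500 GB of storage.
count, ratios = equivalent_reference_vms(
    {"vcpus": 4, "memory_gb": 16, "iops": 200, "capacity_gb": 500}
)
print(count, ratios)
```

For this hypothetical server, memory and IOPS each equal eight reference virtual machines, so under the stated assumption the server counts as eight reference virtual machines for sizing purposes.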

Scalability

ScaleIO is designed to scale from three to a large number of nodes. Unlike most traditional storage systems, as the number of servers grows, so do capacity, throughput, and IOPS. Performance scales linearly with the size of the deployment. Whenever additional storage and compute resources (such as servers and drives) are needed, you can add them modularly. Storage and compute resources grow together so that the balance between them is maintained.

VSPEX building blocks

Sizing the system to meet the virtual server application requirement is a complicated process. When applications generate I/O, server components, such as server CPU, server DRAM cache, and disks, will serve that I/O. Customers must consider various factors when planning and scaling their storage system to balance capacity, performance, and cost for their applications.

VSPEX uses a building block approach to reduce complexity. A building block is one specific server node that can support a certain number of virtual servers in the VSPEX architecture. Each building block combines several local disk spindles to contribute a shared ScaleIO volume that supports the needs of the private cloud environment.

Both SDS and SDC are installed on each building block node to contribute the server local disk to a ScaleIO storage pool and then expose ScaleIO shared block volumes to run the virtual machines.

The configuration of a reference building block includes the physical CPU core number, memory size, and disk spindle number for a server.

Table 4 shows one specific validated node that provides a flexible solution for VSPEX sizing.

Table 4. Building block node configuration

Node parameter | Target value | Notes
CPU | 6 cores | The Customize the building block section provides more information on how to create building block configurations.
Memory | 64 GB | According to VSPEX configuration guidelines, this configuration can support up to a maximum of 30 virtual machines.
Disks | 6 x 600 GB 10 k RPM SAS | Disk capacity, rather than performance, limits the configuration for a VSPEX Private Cloud.

This configuration contains six SAS disks per node. The validated solution modeled these drives at 600 GB each. For the private cloud workload definition, we were limited more by drive capacity than by drive IOPS. With this configuration, up to 12 virtual machines can be supported by one building block.

Building blocks approach

Validated building blocks

Reference building blocks are a starting point to plan a virtual infrastructure. In this section, we will discuss customizing building block nodes to meet specific customer needs.

The node configuration shown in Table 4 defines the CPU, memory, and disk configuration for one server. However, ScaleIO is infrastructure-agnostic and can run on any server. This solution also provides more options for the building block node configuration. You can redefine a building block with different configurations, but after the building block configuration is redefined, the number of virtual machines that the building block can support also changes.

To calculate the number of virtual machines that the new building block can support, we must consider the following components:

• CPU capability

For VSPEX systems, we recommend a maximum of 4 vCPUs for each physical core in a virtual machine environment. For example, a server node with 16 physical cores can support up to 64 virtual machines.

• Memory capability

When sizing the memory for a server node, the ScaleIO virtual machine and hypervisor must be considered. In our testing, the ScaleIO virtual machine consumed 3 GB of RAM, and we reserved 2 GB of RAM for the hypervisor. We do not recommend using memory overcommit in this environment.

Note: ScaleIO 1.3 introduces a new RAM cache feature that uses the SDS server RAM. By default, the RAM size of the ScaleIO virtual machine is set to 3 GB, of which 128 MB is used for the SDS RAM cache. If you use a larger RAM cache, add the additional cache size to the 3 GB of the ScaleIO virtual machine.

• Disk capacity

ScaleIO uses a RAIN topology to ensure data availability. In general, the capacity available is a function of the capacity per node (formatted capacity) and the number of nodes available.

Assuming N nodes and C TB of capacity per server, the storage available, S, is:

$$S = \frac{(N - 1) \times C}{2}$$

This formula accounts for two copies of data and the ability to survive a single node failure. The values in Table 5 assume sufficient CPU and memory resources for each node.

Customize the building block

Table 5. Maximum number of virtual machines per node in three-node cluster environment, limited by disk capacity

Disk capacity (GB) | Disks per node: 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
600 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20
900 | 9 | 12 | 15 | 18 | 21 | 24 | 27 | 30
1200 | 12 | 16 | 20 | 24 | 28 | 32 | 36 | 40
1500 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50
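The Table 5 values can be cross-checked against the capacity formula. The sketch below assumes that the usable cluster capacity is divided by the 100 GB reference virtual machine capacity from Table 3 and then spread evenly over the nodes; that derivation is an inference that reproduces the table, not a method stated in this guide, and the function names are illustrative.

```python
def usable_capacity_gb(nodes, capacity_per_node_gb):
    """S = (N - 1) * C / 2: two copies of the data, one node's capacity held in reserve."""
    return (nodes - 1) * capacity_per_node_gb / 2

def vms_per_node_by_capacity(disks_per_node, disk_gb, nodes=3, vm_capacity_gb=100):
    """Capacity-limited virtual machines per node (assumed derivation of Table 5)."""
    usable = usable_capacity_gb(nodes, disks_per_node * disk_gb)
    return int(usable // vm_capacity_gb // nodes)

# Print one row per disk size, with one column per disk count from 3 to 10.
for disk_gb in (600, 900, 1200, 1500):
    print(disk_gb, [vms_per_node_by_capacity(disks, disk_gb) for disks in range(3, 11)])
```

For example, six 600 GB disks per node give C = 3,600 GB, so S = (3 - 1) x 3,600 / 2 = 3,600 GB, which is 36 reference virtual machines for the cluster, or 12 per node.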

• IOPS

The primary method for adding IOPS capability to a node without considering cache technologies is to increase the number of disk units or increase the speed of those units. Table 6 shows the number of virtual machines supported with 4, 6, 8, or 10 SAS drives per node, limited by disk performance.

Table 6. Maximum number of virtual machines per node, limited by disk performance

10 K SAS drives | Number of virtual machines
4 | 20
6 | 30
8 | 40
10 | 50

Note: The values in Table 6 assume that the CPU and memory resource of each node are sufficient.

Determine the maximum number of virtual machines on the building block node

With the entire configuration defined for the building block node, we calculate the number of virtual machines that each component can support to find out the number of virtual machines that the building block node can support.

For example, consider the redefined building block configuration in Table 7.

Table 7. Redefined building block node configuration example

Physical CPU cores | Memory (GB) | 10 K SAS drive capacity
16 | 128 | 10 * 1500 GB

As a result, the calculations in Table 8 are applied, giving a new supported virtual machine count for this node.
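The per-component limits for the Table 7 configuration can be sketched as follows. The CPU and memory rules come from the guidelines above and the drive-count limits are copied from Table 6; the capacity step assumes a three-node cluster and 100 GB per virtual machine, as in Table 5. All names in the code are illustrative.

```python
VCPUS_PER_CORE = 4            # maximum vCPUs per physical core (1 vCPU per reference VM)
VM_MEMORY_GB = 2              # memory per reference virtual machine
HYPERVISOR_RESERVED_GB = 2    # RAM reserved for the hypervisor
SCALEIO_VM_GB = 3             # RAM consumed by the ScaleIO virtual machine
VMS_BY_DRIVE_COUNT = {4: 20, 6: 30, 8: 40, 10: 50}  # Table 6, 10 K SAS drives

def node_vm_limits(cores, memory_gb, drives, drive_gb, nodes=3, vm_capacity_gb=100):
    """Return the number of virtual machines each component of a node can support."""
    cpu = cores * VCPUS_PER_CORE
    memory = (memory_gb - HYPERVISOR_RESERVED_GB - SCALEIO_VM_GB) // VM_MEMORY_GB
    usable_gb = (nodes - 1) * drives * drive_gb / 2   # S = (N - 1) * C / 2
    capacity = int(usable_gb // vm_capacity_gb // nodes)
    iops = VMS_BY_DRIVE_COUNT[drives]
    return {"cpu": cpu, "memory": memory, "capacity": capacity, "iops": iops}

limits = node_vm_limits(cores=16, memory_gb=128, drives=10, drive_gb=1500)
supported = min(limits.values())   # the scarcest resource decides
print(limits, supported)
```

For the 16-core, 128 GB, 10 x 1500 GB node, the limits are 64 (CPU), 61 (memory), 50 (capacity), and 50 (IOPS), so the node is bounded by its disks at 50 virtual machines.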

Table 8. Node sizing example

Physical attribute | VMs supported | Calculation
CPU cores: 16 | 64 | 16 cores * 4 VMs per core = 64 VMs
RAM: 128 GB | 61 | (128 GB total RAM - 2 GB (hypervisor reserved) - 3 GB (ScaleIO VM)) / 2 = 61.5, rounded down to 61
Storage capacity: 1500 GB | 50 | See Table 5.
Storage performance | 50 | See Table 6.

The final number of virtual machines that this building block node can support is 50, which is the minimum of the numbers supported by the CPU, memory, disk capacity, and disk performance according to the calculation results.

Figure 15 shows how to determine the maximum number of virtual machines that a customer redefined building block configuration can support.

Figure 15. Determine the maximum number of virtual machines that a building block configuration can support (diagram: 64 virtual machines supported by CPU, 61 by memory, 50 by disk IOPS, and 50 by disk capacity, giving 50 virtual machines supported by the node)

Configuration guidelines

To choose the appropriate reference architecture for a customer environment, determine the resource requirements of the environment and then translate these requirements to an equivalent number of reference virtual machines that have the characteristics defined in Table 3. This section describes how to use the worksheet to simplify the sizing calculations and describes additional factors you should take into consideration when deciding which architecture to deploy.

The Customer configuration worksheet helps you to assess the customer environment and calculate the sizing requirements of the environment.

Table 9 shows a completed worksheet for a sample customer environment. Appendix B provides a blank worksheet that you can print and use to help size the solution.

Introduction to the Customer configuration worksheet

Use the Customer configuration worksheet
