EMC VSPEX PRIVATE CLOUD:
Microsoft Hyper-V and EMC ScaleIO
EMC VSPEX
Abstract
This document describes the EMC® VSPEX® Proven Infrastructure solution for private cloud deployments with Microsoft Hyper-V and EMC ScaleIO® technology.
June 2015
Copyright © 2015 EMC Corporation. All rights reserved. Published in the USA.
Published June 2015
EMC believes the information in this publication is accurate as of its publication date.
The information is subject to change without notice.
The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
EMC VSPEX Private Cloud: Microsoft Hyper-V and EMC ScaleIO Proven Infrastructure Guide
Part Number H14028
Contents
Chapter 1 Executive Summary
Introduction
Target audience
Document purpose
Business needs
Chapter 2 Solution Architecture Overview
Overview
Solution architecture
High-level architecture
Logical architecture
Key components
Virtualization layer
Overview
Configuration guidelines
High-availability and failover
Compute layer
Overview
Configuration guidelines
High-availability and failover
Network layer
Overview
Configuration guidelines
High-availability and failover
Storage layer
Overview
Configuration guidelines
High-availability and failover
Chapter 3 Sizing the Environment
Overview
Reference workload
Scalability
VSPEX building blocks
Building block approach
Validated building block
Customizing the building block
Configuration sizing guidelines
Introduction to the Customer configuration worksheet
Using the customer sizing worksheet
Calculating the building block requirement
Fine-tuning hardware resources
Summary
Chapter 4 VSPEX Solution Implementation
Overview
Network implementation
Preparing the network switches
Configuring the infrastructure network
Configuring the VLANs
Completing the network cabling
Installing and configuring the Microsoft Hyper-V hosts
Installing and configuring Microsoft SQL Server databases
Overview
Deploying the System Center Virtual Machine Manager server
Overview
Preparing and configuring the storage
Prepare the ScaleIO nodes
Preparing the installation worksheet
Installing the ScaleIO components
Creating and mapping volumes
Installing the GUI
Provisioning a virtual machine
Summary
Chapter 5 Verifying the Solution
Overview
Post-install checklist
Deploying and testing a single virtual server
Verifying the redundancy of the solution components
Chapter 6 System Monitoring
Overview
Key areas to monitor
Performance baseline
Servers
Networking
ScaleIO layer
Appendix A Reference Documentation
EMC documentation
Other documentation
Appendix B Customer Configuration Worksheet
Customer configuration worksheet
Printing the worksheet
Appendix C Customer Sizing Worksheet
Customer sizing worksheet for Private Cloud
Figures
Figure 1. VSPEX Proven Infrastructures
Figure 2. VSPEX private cloud components
Figure 3. Logical architecture for the solution
Figure 4. High availability at the virtualization layer
Figure 5. Redundant power supplies
Figure 6. Required networks for ScaleIO
Figure 7. Network layer high availability
Figure 8. Protection domains
Figure 9. ScaleIO active GUI
Figure 10. ScaleIO enterprise features
Figure 11. Hyper-V virtual disk types
Figure 12. Automatic rebalancing when disks are added
Figure 13. Automatic rebalancing when disks or nodes are removed
Figure 14. Determine the maximum number of virtual machines that a building block can support
Figure 15. Required resource from the reference virtual machine pool
Figure 16. Disk format partition option
Figure 17. Installation Manager Home page
Figure 18. Manage installation packages
Figure 19. Upload installation packages
Figure 20. Upload CSV file
Figure 21. Installation configuration
Figure 22. Monitor page
Figure 23. Completed Install Operation
Tables
Table 1. Solution architecture configuration
Table 2. Recommended 10 Gb switched Ethernet network layer
Table 3. VSPEX Private Cloud workload
Table 4. Building block node configuration
Table 5. Maximum number of virtual machines per node, limited by disk capacity
Table 6. Maximum number of virtual machines per node, limited by disk performance
Table 7. Redefined building block node configuration example
Table 8. Node sizing example
Table 9. Customer sizing worksheet example
Table 10. Reference virtual machine resources
Table 11. Example worksheet row
Table 12. Node scaling example
Table 13. Server resource component totals
Table 14. Deployment process overview
Table 15. Tasks for switch and network configuration
Table 16. Tasks for server installation
Table 17. Tasks for SQL Server database setup
Table 18. Tasks for SCVMM configuration
Table 19. Set up and configure a ScaleIO environment
Table 20. CSV installation spreadsheet
Table 21. add_volume command parameters
Table 22. map_volume_to_sdc command parameters
Table 23. Tasks for testing the installation
Table 24. Common server information
Table 25. Hyper-V server information
Table 26. ScaleIO information
Table 27. Network infrastructure information
Table 28. VLAN information
Table 29. Service accounts
Table 30. Customer sizing worksheet
Chapter 1 Executive Summary
This chapter presents the following topics:
Introduction
Target audience
Document purpose
Business needs
Introduction
EMC® VSPEX® Proven Infrastructures are optimized for virtualizing business-critical applications. VSPEX provides modular solutions built with technologies that enable faster deployment, greater simplicity, greater choice, higher efficiency, and lower risk.
Figure 1 shows the modular, virtualized infrastructures validated by EMC and delivered by EMC VSPEX partners. Partners can choose the virtualization, server, and network technologies that best fit a customer’s environment, while the server’s local disks with elastic EMC ScaleIO® software provide the storage.
Figure 1. VSPEX Proven Infrastructures
This guide covers the technical aspects of the VSPEX Private Cloud for Microsoft Hyper-V with EMC ScaleIO solution. It describes the solution architecture and key components, and explains how to design, size, and deploy the solution to meet the customer’s needs.
Target audience
Readers of this guide must have the necessary training and background to install and configure a VSPEX solution based on the Hyper-V hypervisor, ScaleIO, and associated infrastructure, as required by this implementation. External references are provided where applicable, and readers should be familiar with these documents.
Readers should also be familiar with the infrastructure and database security policies of the customer installation.
Partners selling and sizing a VSPEX Private Cloud with ScaleIO infrastructure should focus on the first five chapters of this guide. After purchase, implementers of the solution should focus on the implementation guidelines in Chapter 4, the solution validation in Chapter 5, and the monitoring guidelines in Chapter 6.
Document purpose
This guide includes an initial introduction to the VSPEX architecture, an explanation of how to modify the architecture for specific engagements, and instructions on how to effectively deploy and monitor the system.
The VSPEX Private Cloud architecture provides customers with a modern system capable of hosting many virtual machines at a consistent performance level. This solution runs on a Microsoft Hyper-V virtualization layer. EMC ScaleIO software runs on top of the Hyper-V hypervisor. The compute and network components, which are defined by the VSPEX partners, are designed to be redundant and sufficiently powerful to handle the processing and data needs of the virtual machine environment. This guide details server capacity minimums for CPU, memory, and network interfaces. The customer can select any server and networking hardware that meets or exceeds the stated minimums.
The solution described in this guide is based on the capacity of the cluster server and on a defined reference workload. Because not every virtual machine has the same requirements, this guide includes methods and guidance to adjust the system to be cost effective as deployed.
A private cloud architecture is a complex system offering. This guide provides prerequisite software and hardware material lists, step-by-step sizing guidance and worksheets, and verified deployment steps. After you install the last component, the validation tests and monitoring instructions ensure that your system is running properly.
Business needs
EMC builds VSPEX solutions with proven technologies to create complete virtualization solutions that allow you to make informed decisions for the hypervisor, server, and networking layers.
Business applications are moving into consolidated compute, network, and storage environments. This solution reduces the complexity of configuring every component of a traditional deployment model. The solution simplifies integration management while maintaining application design and implementation options. It also provides unified administration while still enabling adequate control and monitoring of process separation.
The business benefits of the architectures include:
An end-to-end virtualization solution to effectively use the capabilities of the unified infrastructure components
Efficient virtualization of virtual machines for varied customer use cases
A reliable, flexible, and scalable reference design
Chapter 2 Solution Architecture Overview
This chapter presents the following topics:
Overview
Solution architecture
Key components
Virtualization layer
Compute layer
Network layer
Storage layer
Overview
This chapter describes the major aspects of this solution. It presents, in generic terms, the minimum server capacity required for CPU, memory, and network resources. You can select server and networking hardware that meets or exceeds the stated minimums. EMC has validated the specified ScaleIO architecture, and the fulfillment of server and network requirements, to provide high levels of performance while delivering a highly available architecture for your private cloud deployment.
Solution architecture
High-level architecture
EMC has designed and proven this solution to provide virtualization, server, network, and storage resources that enable customers to deploy a small-scale architecture and scale as their business needs change.
Figure 2 shows the high-level architecture of the validated solution.
[Figure: virtual servers running on the hypervisor (compute and virtualization components), network components and connections, supporting infrastructure, and storage components (SDS/SDC nodes on the storage network)]
Figure 2. VSPEX private cloud components
The solution uses ScaleIO software and Hyper-V to provide the storage and virtualization platforms for an environment of Microsoft Windows Server 2012 virtual machines provisioned by the Hyper-V platform.
To provide predictable performance for end-user computing solutions, the storage system must be able to handle the peak I/O load from the clients while keeping response time to a minimum. In this solution, we used ScaleIO software to pool the servers’ local disks into a storage system with high performance and scalability.
Logical architecture
Figure 3 shows the logical architecture of this solution.
[Figure: Windows Server 2012 R2 Hyper-V cluster running EMC ScaleIO and hosting Microsoft Windows 2012 R2 Hyper-V virtual servers 1 to n, connected through a 10 GbE IP network and a storage network to shared infrastructure (SCVMM, SQL Server, DNS server, Active Directory server)]
Figure 3. Logical architecture for the solution
Table 1 lists the solution configuration components.
Table 1. Solution architecture configuration
Component | Solution configuration
Microsoft Hyper-V | Hyper-V provides a common virtualization layer to host the server environment. Hyper-V provides a highly available infrastructure through features such as Live Migration, Failover Clustering, and High Availability (HA).
Microsoft System Center Virtual Machine Manager (SCVMM) | SCVMM is not required for this solution. However, if deployed, SCVMM (or its corresponding functionality in Microsoft System Center Essentials) simplifies provisioning, management, and monitoring of the Hyper-V environment.
EMC ScaleIO | ScaleIO software provides a storage layer to host and store virtual machines.
Microsoft SQL Server | SCVMM requires an SQL Server database instance to store configuration and monitoring details.
Active Directory server | Active Directory services are required for the various solution components to function properly. We used the Microsoft Active Directory Service running on a Windows Server 2012 R2 server for this purpose.
DHCP server | The dynamic host configuration protocol (DHCP) server centrally manages the IP address scheme for the virtual machines. This service is hosted on the same virtual machine as the domain controller and domain name server (DNS). The Microsoft DHCP Service running on a Windows 2012 R2 server is used for this purpose.
DNS server | DNS services are required for the various solution components to perform name resolution. The Microsoft DNS Service running on a Windows 2012 R2 server is used for this purpose.
IP networks | A standard Ethernet network with redundant cabling and switching carries all network traffic. A shared network carries user and management traffic, while a private, non-routable subnet carries virtual SAN (vSAN) storage traffic.
Key components
The key components of this solution include:
Virtualization layer: Decouples the physical implementation of resources from the applications that use the resources, so that the application view of the available resources is no longer directly tied to the hardware. This enables many key features required by the private cloud.
Compute layer: Provides memory and processing resources for the virtualization layer software and for the applications running in the private cloud. The VSPEX program defines the minimum amount of required compute layer resources, and enables you to implement the solution by using any server hardware that meets these requirements.
Network layer: Connects users of the private cloud to the resources in the cloud, and connects the storage layer to the compute layer. The VSPEX program defines the minimum number of required network ports, provides general guidance on network architecture, and enables you to implement the solution by using any network hardware that meets these requirements.
Storage layer: Provides storage to implement the private cloud. ScaleIO implements a pure block storage layout with converged nodes to support compute and storage. With multiple hosts accessing shared data through ScaleIO components, ScaleIO provides high-performance data storage while maintaining high availability.
Virtualization layer
Overview
Hyper-V performs the hypervisor-based virtualization role for Microsoft Windows Server and provides the virtualization platform for this solution.
Hyper-V live migration and live storage migration enable seamless movement of virtual machines or virtual machine files between Hyper-V servers or storage systems, transparently and with minimal performance impact.
Hyper-V works with Windows Server 2012 Failover Clustering and Cluster Shared Volumes (CSVs) to provide high availability in a virtualized infrastructure, significantly increasing the availability of virtual machines during planned and unplanned downtime. Configure Failover Clustering on the Hyper-V host to monitor virtual machine health and to migrate virtual machines between cluster nodes.
Hyper-V Replica provides asynchronous replication of virtual machines between two Hyper-V hosts at separate sites. Hyper-V Replica protects business applications in the Hyper-V environment from downtime associated with an outage at a single site.
Hyper-V snapshots provide consistent point-in-time views of a virtual machine and enable users to revert the virtual machine to a previous point in time if necessary. Snapshots function as the source for backups, test and development activities, and other use cases.
Microsoft System Center Virtual Machine Manager
Microsoft System Center Virtual Machine Manager (SCVMM) is a centralized management platform that enables datacenter administrators to configure and manage virtualized host, networking, and storage resources, and to create and deploy virtual machines and services to private clouds. SCVMM simplifies provisioning, management, and monitoring in the Hyper-V environment.
Windows Server Cluster-Aware Updating
Windows Cluster-Aware Updating (CAU) enables updating of cluster nodes with little or no loss of availability. CAU is integrated with Windows Server Update Services (WSUS) and can be automated using PowerShell.
Configuration guidelines
Hyper-V has several advanced features that help maximize performance and overall resource utilization. The most important features relate to memory management. This section describes some of these features, and the items to consider when using these features in a VSPEX environment.
Dynamic Memory and Smart Paging
Dynamic Memory increases physical memory efficiency by treating memory as a shared resource, dynamically allocating it to virtual machines, and reclaiming unused memory from idle virtual machines. Administrators can dynamically adjust the amount of memory used by each virtual machine at any time.
With Dynamic Memory, Hyper-V allows more virtual machines than the available physical memory can support. This introduces the risk that there might not be sufficient physical memory available to restart a virtual machine if required. Smart Paging is a memory management technique that uses disk resources as a temporary memory replacement when more memory is required to restart a virtual machine.
Non-Uniform Memory Access
Non-Uniform Memory Access (NUMA) is a multinode technology that enables a CPU to access remote-node memory. Because this type of memory access degrades performance, Windows Server 2012 uses processor affinity, which pins threads to a single CPU, to avoid remote-node memory access. This feature is available to the host and to the virtual machines, where it provides improved performance in symmetrical multiprocessor (SMP) environments.
Hyper-V memory overhead
Virtualized memory has some associated overhead, including the memory consumed by the Hyper-V parent partition and additional overhead for each virtual machine.
Leave at least 2 GB of memory for the Hyper-V parent partition in this solution.
Virtual machine memory
Each virtual machine in this solution is assigned 2 GB of memory in fixed mode.
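The two memory figures above can be combined into a rough per-host estimate. The following Python sketch is illustrative only: the function name and the optional per-virtual-machine overhead parameter are assumptions introduced here, and the estimate ignores CPU, disk, and high-availability headroom.

```python
PARENT_PARTITION_GB = 2  # memory reserved for the Hyper-V parent partition (from this guide)
VM_MEMORY_GB = 2         # fixed memory assigned to each virtual machine (from this guide)


def max_vms_by_memory(host_memory_gb, per_vm_overhead_gb=0.0):
    """Estimate how many fixed-memory virtual machines fit in one host's RAM.

    per_vm_overhead_gb is a hypothetical allowance for the per-VM
    virtualization overhead; this guide does not state a value for it.
    """
    usable_gb = host_memory_gb - PARENT_PARTITION_GB
    if usable_gb <= 0:
        return 0
    return int(usable_gb // (VM_MEMORY_GB + per_vm_overhead_gb))


if __name__ == "__main__":
    for ram_gb in (64, 128, 256):
        print(f"{ram_gb} GB host: up to {max_vms_by_memory(ram_gb)} virtual machines")
```

Memory is only one of the limits on virtual machine density, so treat the result as an upper bound.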
High-availability and failover
Configure high availability in the virtualization layer, and enable the hypervisor to restart failed virtual machines automatically. Figure 4 illustrates the hypervisor layer responding to a failure in the compute layer.
Figure 4. High availability at the virtualization layer
Implementing high availability at the virtualization layer ensures that, even in the event of a hardware failure, the infrastructure will attempt to keep as many services running as possible.
Compute layer
Overview
The choice of a server platform for a VSPEX infrastructure is based not only on the technical requirements of the environment, but also on the supportability of the platform, existing relationships with the server provider, advanced performance and management features, and many other factors. For these reasons, VSPEX solutions are designed to run on a wide variety of server platforms. Instead of requiring a specific number of servers with a specific set of requirements, VSPEX defines the minimum requirements for the number of processor cores and the amount of RAM.
ScaleIO components are designed to work with a minimum of three server nodes. The physical server node, running Hyper-V, can host workloads other than the ScaleIO virtual machine.
Configuration guidelines
When designing and ordering the compute layer of this VSPEX solution, several factors can affect the final purchase. If you understand the system workload well, then you can use virtualization features such as Dynamic Memory to reduce the aggregate memory requirement.
You can reduce the number of virtual CPUs (vCPUs) if the virtual machine pool does not have a high level of peak or concurrent usage. Conversely, if the deployed applications are highly computational, you might need to increase the number of CPUs and the amount of memory.
Apply the following best practices in the compute layer:
Use identical, or at least compatible, servers. VSPEX implements hypervisor-level high-availability technologies that may require similar instruction sets on the underlying physical hardware. By implementing VSPEX on identical server units, you can minimize compatibility problems in this area.
When implementing high availability at the hypervisor layer, the largest virtual machine you can create is constrained by the smallest physical server in the environment.
Note: To enable high availability for the compute layer, each customer needs one additional server to ensure that the system has enough capacity to maintain business operations when a server fails.
Implement the available high-availability features in the virtualization layer, and ensure that the compute layer has sufficient resources to accommodate at least a single server failure. This enables the implementation of minimal-downtime upgrades and tolerance for single unit failures.
Within the boundaries of these recommendations and best practices, the VSPEX compute layer can be flexible to meet your specific needs. Ensure that there are sufficient processor cores and RAM per core to meet the needs of the target environment.
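The guidance above (size for cores and RAM, add one server for high availability, and respect the three-node ScaleIO minimum) can be expressed as a small calculation. This Python sketch is a hedged illustration: the function name, its parameters, and in particular the vCPU-to-core ratio are assumptions supplied by the caller, not values taken from this chapter.

```python
import math


def servers_required(total_vcpus, total_vm_ram_gb, cores_per_server,
                     ram_per_server_gb, vcpus_per_core,
                     spare_servers=1, min_nodes=3, parent_partition_gb=2):
    """Estimate the server count for a workload, including an HA spare.

    vcpus_per_core is a caller-supplied consolidation ratio (an assumption
    for this sketch). min_nodes reflects the three-node ScaleIO minimum and
    parent_partition_gb the 2 GB parent partition reservation in this guide.
    """
    usable_ram_gb = ram_per_server_gb - parent_partition_gb
    if usable_ram_gb <= 0 or cores_per_server <= 0 or vcpus_per_core <= 0:
        raise ValueError("server resources must be positive after reservations")
    by_cpu = math.ceil(total_vcpus / (cores_per_server * vcpus_per_core))
    by_ram = math.ceil(total_vm_ram_gb / usable_ram_gb)
    # The larger of the two constraints sets the base count; the spare
    # server keeps the workload running if one server fails.
    return max(max(by_cpu, by_ram) + spare_servers, min_nodes)


if __name__ == "__main__":
    print(servers_required(total_vcpus=400, total_vm_ram_gb=800,
                           cores_per_server=16, ram_per_server_gb=128,
                           vcpus_per_core=4))
```

Because the largest virtual machine is constrained by the smallest physical server, this estimate assumes identical servers.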
High-availability and failover
While the choice of servers to implement in the compute layer is flexible, use enterprise-class servers designed for the datacenter. This type of server has redundant power supplies, as shown in Figure 5. Connect these servers to separate power distribution units (PDUs) in accordance with your server vendor’s best practices.
Figure 5. Redundant power supplies
To configure high availability in the virtualization layer, configure the compute layer with enough resources to meet the needs of the environment, even with a server failure, as shown in Figure 4.
Network layer
Overview
The infrastructure network requires redundant network links for each Hyper-V host. This configuration provides both redundancy and additional network bandwidth. This is a required configuration regardless of whether the network infrastructure for the solution already exists, or you are deploying it alongside other components of the solution.
Configuration guidelines
This section provides guidelines for setting up a redundant, highly available network configuration. The guidelines consider virtual LANs (VLANs) and the ScaleIO network layer.
ScaleIO network
The ScaleIO network creates a Redundant Array of Independent Nodes (RAIN) topology between the server nodes, distributing data so that the loss of a single node does not affect data availability. This topology requires ScaleIO nodes to send data to other nodes to maintain consistency.
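To make the single-node-loss property concrete, the following Python sketch places two copies of each data chunk on different nodes and checks which failures the layout survives. It is a simplified model under stated assumptions: the two-copy scheme, the placement rule, and all function names are chosen for illustration and do not describe the actual ScaleIO placement algorithm.

```python
def place_copies(num_chunks, nodes):
    """Assign each chunk a primary and a secondary copy on two different nodes.

    Illustrative placement rule only (an assumption, not the ScaleIO algorithm).
    """
    if len(nodes) < 2:
        raise ValueError("at least two nodes are needed to hold two copies")
    n = len(nodes)
    placement = {}
    for chunk in range(num_chunks):
        primary = nodes[chunk % n]
        # An offset between 1 and n-1 guarantees the secondary copy lands
        # on a different node than the primary copy.
        offset = 1 + (chunk // n) % (n - 1)
        secondary = nodes[(chunk + offset) % n]
        placement[chunk] = (primary, secondary)
    return placement


def survives(placement, failed_nodes):
    """Return True if every chunk still has a copy on a healthy node."""
    return all(any(node not in failed_nodes for node in copies)
               for copies in placement.values())


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    layout = place_copies(12, nodes)
    for node in nodes:
        print(f"loss of {node}: data available = {survives(layout, {node})}")
```

Keeping the copies consistent is what generates the node-to-node traffic that the storage network must carry.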
A high-speed, low-latency IP network is required for this to work correctly. We¹ created the test environment with redundant 10 Gb Ethernet networks. Network utilization was low during testing at small scale points, so you can implement small configurations using 1 Gb networks. However, EMC recommends a 10 GbE IP network designed for high availability, as shown in Table 2.
¹ In this guide, “we” refers to the EMC Solutions engineering team that validated the solution.
Table 2. Recommended 10 Gb switched Ethernet network layer
Nodes | 10 Gb switched Ethernet | 1 Gb switched Ethernet
3 | Recommended | Possible
4 | Recommended | Possible
5 | Recommended | Possible
6 | Recommended | Possible
7 | Recommended | Not recommended
VLANs
Isolate network traffic so that management traffic, traffic between hosts and storage, and traffic between hosts and clients all move over isolated networks. Physical isolation might be required in some cases for regulatory or policy compliance reasons. Logical isolation with VLANs is sufficient in many cases.
EMC recommends separating the network into two types for security and increased efficiency:
A management network, used to connect and manage the ScaleIO environment.
This network is generally connected to the client management network.
Because this network has less I/O traffic, EMC recommends a 1 GbE network.
An internal data network, used for communication between the ScaleIO components. This is generally a 10 GbE network.
In this solution, we used one VLAN for client access and one VLAN for management.
Figure 6 depicts the VLANs and the network connectivity requirements for a ScaleIO environment.
[Figure: servers connected to the client access network, the management network, and the storage network]
Figure 6. Required networks for ScaleIO
You can use the client access network to communicate with the ScaleIO infrastructure. The storage network provides communication between the ScaleIO nodes.
Administrators use the management network as a dedicated way to access the management connections on the ScaleIO software components, network switches, and hosts.
Note: Some best practices need additional network isolation for cluster traffic, virtualization layer communication, and other features. Implement these additional networks if necessary.
High-availability and failover
Each Windows host has multiple connections to user and Ethernet networks to guard against link failures, as shown in Figure 7. Spread these connections across multiple Ethernet switches to guard against component failure in the network.
[Figure: each server connects to multiple switches (two Cisco Nexus 5020 switches are shown); the switches connect to each other and to the network]
Figure 7. Network layer high availability
Storage layer
Overview
ScaleIO is a software-only solution that uses hosts’ existing local disks and LAN to realize a vSAN that has all the benefits of external storage at a fraction of the cost and complexity. ScaleIO turns local internal storage into shared block storage that is comparable to or better than the more expensive external shared block storage. The lightweight ScaleIO software components are installed in the application hosts and communicate using a standard LAN to handle the application I/O requests sent to ScaleIO block volumes. An extremely efficient decentralized block I/O flow, combined with a distributed, sliced volume layout, results in a massively parallel I/O system that can scale to hundreds or thousands of nodes.
ScaleIO is designed and implemented with enterprise-grade resilience as an essential attribute. Furthermore, the software features efficient distributed auto-healing processes that overcome media and node failures without requiring administrator involvement. Dynamic and elastic, ScaleIO enables administrators to add or remove nodes and capacity “on the fly.” The software immediately responds to the changes, rebalancing the storage distribution and achieving a layout that optimally suits the new configuration.
Architecture
Software components
The ScaleIO Data Client (SDC) is a lightweight device driver situated in each host whose applications or file system require access to the ScaleIO virtual SAN block devices. The SDC exposes block devices representing the ScaleIO volumes that are currently mapped to that host.
The ScaleIO Data Server (SDS) is a lightweight software component within each host that contributes local storage to the central ScaleIO vSAN.
Convergence of storage and compute
The ScaleIO software components, which have a negligible impact on the applications running in the hosts, are carefully designed and implemented to consume the minimum computing resources required for operation.
ScaleIO converges the storage and application layers. The hosts that run applications can also be used to realize shared storage, yielding a wall-to-wall, single layer of hosts. Because the same hosts run applications and provide storage for the vSAN, an SDC and SDS are typically both installed in each of the participating hosts.
Pure block storage implementation
ScaleIO implements a pure block storage layout. Its entire architecture and data path are optimized for block storage access needs. For example, when an application submits a read I/O request to its SDC, the SDC instantly deduces which SDS is responsible for the specified volume address and then interacts directly with the relevant SDS. The SDS reads the data (by issuing a single read I/O request to its local storage or by just fetching the data from the cache in a cache-hit scenario), and returns the result to the SDC. The SDC provides the read data to the application.
This flow is simple, consuming as few resources as necessary. The data moves over the network exactly once, and a single I/O request is sent to the SDS storage. The write I/O flow is similarly simple and efficient. Unlike some block storage systems that run on top of a file system or object storage that runs on top of a local file system, ScaleIO offers optimal I/O efficiency.
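The read flow described above can be modeled in a few lines. The following Python sketch is a toy model, not ScaleIO code: the class names mirror the SDC and SDS roles, but the chunk size, the in-memory chunk map, and every method name are hypothetical choices made for illustration.

```python
CHUNK_SIZE = 16  # bytes; a deliberately tiny, hypothetical chunk size for the demo


class SDS:
    """Toy data server that holds chunks in local storage."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}    # (volume, chunk_index) -> bytes
        self.requests = 0   # number of read requests served

    def read(self, volume, chunk_index, offset, length):
        self.requests += 1
        return self.chunks[(volume, chunk_index)][offset:offset + length]


class SDC:
    """Toy data client that resolves the responsible SDS locally."""

    def __init__(self, chunk_map):
        self.chunk_map = chunk_map  # (volume, chunk_index) -> SDS

    def read(self, volume, address, length):
        chunk_index, offset = divmod(address, CHUNK_SIZE)
        if offset + length > CHUNK_SIZE:
            raise ValueError("this sketch only handles reads within one chunk")
        # Local lookup: no central component sits in the data path.
        sds = self.chunk_map[(volume, chunk_index)]
        # One request to one SDS; the data crosses the network once.
        return sds.read(volume, chunk_index, offset, length)


def build_demo():
    """Create a two-SDS setup holding one two-chunk volume named vol1."""
    sds_a, sds_b = SDS("sds-a"), SDS("sds-b")
    sds_a.chunks[("vol1", 0)] = b"A" * CHUNK_SIZE
    sds_b.chunks[("vol1", 1)] = b"B" * CHUNK_SIZE
    chunk_map = {("vol1", 0): sds_a, ("vol1", 1): sds_b}
    return SDC(chunk_map), sds_a, sds_b


if __name__ == "__main__":
    sdc, sds_a, sds_b = build_demo()
    print(sdc.read("vol1", 20, 4), sds_a.requests, sds_b.requests)
```

In this model a read at address 20 resolves to the second chunk, so only the second data server receives a request.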
Massively parallel, scale-out I/O architecture
ScaleIO can scale to a large number of nodes, thus breaking the traditional scalability barrier of block storage. Because the SDCs propagate the I/O requests directly to the pertinent SDSs, there is no central point through which the requests move, so a potential bottleneck is avoided. This decentralized data flow is crucial to the linearly scalable performance of ScaleIO. Therefore, a large ScaleIO configuration results in a massively parallel system. The more servers or disks the system has, the greater the number of parallel channels that will be available for I/O traffic and the higher the aggregated I/O bandwidth and IOPS will be.
Mix-and-match nodes
The vast majority of traditional scale-out systems are based on a “symmetric brick” architecture. Unfortunately, datacenters cannot be standardized on exactly the same bricks for a prolonged period, because hardware configurations and capabilities change over time. Therefore, such symmetric scale-out architectures are bound to run in small islands. ScaleIO was designed from the ground up to support a mix of new and old nodes with dissimilar configurations.
Hardware agnostic
ScaleIO is platform agnostic and works with existing underlying hardware resources.
Besides its compatibility with various types of disks, networks, and hosts, it can take advantage of the write buffer of existing local RAID controller cards—and can also run in servers that do not have a local RAID controller card.
For the local storage of an SDS, you can use internal disks, directly attached external disks, virtual disks exposed by an internal RAID controller, partitions within such disks, and more. Partitions can be useful to combine system boot partitions with ScaleIO capacity on the same raw disks. If the system already has a large, mostly unused partition, ScaleIO does not require repartitioning of the disk, as the SDS can actually use a file within that partition as its storage space.
Volume mapping and volume sharing
The volumes that ScaleIO exposes to the application clients can be mapped to one or more clients running in different hosts. Mapping can be changed dynamically if necessary. In other words, ScaleIO volumes can be used by applications that expect shared-everything block access and by applications that expect shared-nothing or shared-nothing-with-failover access.
Clustered, striped volume layout
A ScaleIO volume is a block device that is exposed to one or more hosts. It is the equivalent of a logical unit in the SCSI world. ScaleIO breaks each volume into a large number of data chunks, which are scattered across the SDS cluster’s nodes and disks in a fully balanced manner. This layout practically eliminates hot spots across the cluster and allows for the scaling of the overall I/O performance of the system through the addition of nodes or disks. Furthermore, this layout enables a single application that is accessing a single volume to use the full IOPS of all the cluster’s disks. This flexible, dynamic allocation of shared performance resources is one of the major advantages of converged scale-out storage.
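The striped layout described above can be illustrated with a minimal Python sketch. It assumes a fixed chunk size and a simple round-robin placement rule. Both are hypothetical stand-ins chosen for illustration, because this guide does not specify the actual ScaleIO placement algorithm. The names CHUNK_SIZE, sds_for_offset, chunk_distribution, and the SDS labels are likewise invented for the example.

```python
from collections import Counter

# Hypothetical chunk size; the real ScaleIO chunk size is not stated in this guide.
CHUNK_SIZE = 1024 * 1024  # 1 MiB


def sds_for_offset(offset, sds_nodes):
    """Return the SDS that holds the chunk containing a volume byte offset.

    Round-robin placement is an assumption made for illustration only.
    """
    chunk_index = offset // CHUNK_SIZE
    return sds_nodes[chunk_index % len(sds_nodes)]


def chunk_distribution(volume_size, sds_nodes):
    """Count how many chunks of a volume land on each SDS."""
    chunk_count = -(-volume_size // CHUNK_SIZE)  # ceiling division
    return Counter(
        sds_for_offset(i * CHUNK_SIZE, sds_nodes) for i in range(chunk_count)
    )


sds_nodes = ["sds-1", "sds-2", "sds-3", "sds-4"]
distribution = chunk_distribution(100 * CHUNK_SIZE, sds_nodes)
print(distribution)
```

In this sketch the owner of a chunk is computed from the volume address alone, so a client needs no central lookup, and the chunk counts per SDS differ by at most one.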
Software-only—but as resilient as a hardware array
Traditional storage systems typically combine system software with commodity hardware (comparable to the hardware of application servers) to provide enterprise-grade resilience. With its contemporary architecture, ScaleIO provides similar enterprise-grade, no-compromise resilience by running the storage software directly on the application servers. Designed for extensive fault tolerance and high availability, ScaleIO handles all types of failures, including media, connectivity, and node failures, software interruptions, and more. No single point of failure can interrupt the ScaleIO I/O service. In many cases, ScaleIO can overcome multiple points of failure as well.
Managing clusters of nodes
Many storage cluster designs use tightly coupled techniques that might be adequate for a small number of nodes but begin to break when the cluster is larger than a few dozen nodes. The loosely coupled clustering management schemes of ScaleIO provide exceptionally reliable—yet lightweight—failure and failover handling in both small and large clusters.
Most clustering environments assume exclusive ownership of the cluster nodes and might even physically fence or shut down malfunctioning nodes. ScaleIO, in contrast, runs on application hosts that it shares with other workloads. The ScaleIO clustering algorithms are designed to work efficiently and reliably without interfering with the applications with which ScaleIO coexists.
ScaleIO will never disconnect or invoke Intelligent Platform Management Interface (IPMI) shutdowns of malfunctioning nodes, because they might still be running healthy applications.
Protection domains
As shown in Figure 8, you can divide a large ScaleIO storage pool into multiple protection domains, each of which contains a set of SDSs. ScaleIO volumes are assigned to specific protection domains. Protection domains are useful for mitigating the risk of a dual point of failure in a two-copy scheme or a triple point of failure in a three-copy scheme.
Figure 8. Protection domains
For example, if two SDSs that are in different protection domains fail simultaneously, no data will become unavailable. Just as incumbent storage systems can overcome a large number of simultaneous disk failures as long as they do not occur within the same shelf, ScaleIO can overcome a large number of simultaneous disk or node failures as long as they do not occur within the same protection domain.
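The failure rule in the example above can be expressed as a short Python sketch. It makes a deliberately conservative assumption: under a two-copy scheme, any two simultaneous SDS failures inside the same protection domain might hold both copies of some chunk. The function name, the domain labels, and the SDS labels are hypothetical.

```python
from collections import Counter


def data_stays_available(failed_sds, domain_of, copies=2):
    """Return True if no protection domain has lost `copies` or more SDSs.

    Conservative assumption: any `copies` simultaneous SDS failures inside
    one protection domain may hold every copy of some chunk, so availability
    can no longer be guaranteed in that case.
    """
    failures_per_domain = Counter(domain_of[sds] for sds in failed_sds)
    return all(count < copies for count in failures_per_domain.values())


# Hypothetical layout: two protection domains with three SDSs each.
domain_of = {
    "sds-1": "pd-A", "sds-2": "pd-A", "sds-3": "pd-A",
    "sds-4": "pd-B", "sds-5": "pd-B", "sds-6": "pd-B",
}

print(data_stays_available({"sds-1", "sds-4"}, domain_of))  # failures in different domains
print(data_stays_available({"sds-1", "sds-2"}, domain_of))  # failures in the same domain
```

A result of False does not mean that data is certainly lost; it means only that the sketch can no longer rule it out.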
Management and monitoring
ScaleIO provides several tools to manage and monitor the system, including a command line interface (CLI), an active GUI, and representational state transfer (REST) management application program interface (API) commands. The CLI enables administrators to have direct platform access to perform backend configuration actions and obtain monitoring information.
The active GUI, shown in Figure 9, provides system dashboards for capacity, throughput, and bandwidth statistics; access to system alerts; and the ability to provision backend devices. The REST management API allows users to execute the same management and monitoring commands available with the CLI using a next-generation, cloud-based interface.
Figure 9. ScaleIO active GUI
Interoperability
ScaleIO is integrated with Hyper-V and OpenStack to provide customers with greater flexibility in deploying ScaleIO with existing environments. The OpenStack integration (“Cinder” support) allows customers to use commodity hardware with ScaleIO, providing a software-defined block volume solution in an OpenStack environment.
Additionally, ScaleIO software can be packaged with EMC ViPR® for management and orchestration functions and with EMC ViPR SRM for additional monitoring and reporting capabilities.
Enterprise Features
Whether you are a service provider delivering hosted infrastructure as a service or your IT department delivers infrastructure as a service to functional units within your organization, ScaleIO offers a set of features that gives you complete control over performance, capacity, and data location. For both private cloud datacenters and service providers, these features enhance system control and manageability, ensuring that quality of service is met. With ScaleIO, you can limit the amount of performance (IOPS or bandwidth) that selected customers can consume. The limiter allows you to impose and regulate resource distribution to prevent application “hogging” scenarios. You can apply data masking to provide added security for sensitive customer data. ScaleIO offers instantaneous, writeable snapshots for data backups.
Dynamic random-access memory (DRAM) caching improves read performance by using SDS server RAM. Fault sets, which are groups of SDSs that are likely to go down together, can be defined to ensure that data mirroring occurs outside the group, improving business continuity. You can create volumes with thin provisioning, providing on-demand storage as well as faster setup and startup times.
Finally, tight integrations with other EMC products are available. You can use ScaleIO in conjunction with EMC XtremCache™ for flash cache auto tiering to further accelerate application performance.
Figure 10 shows the ScaleIO enterprise features.
Figure 10. ScaleIO enterprise features
ScaleIO 1.32
ScaleIO 1.32 includes the following new features and functionality:
Release of the ScaleIO ‘Free and Frictionless’ download, a free download of ScaleIO for non-production environments with no time / function / capacity limits
Support for VMware ESX 6.0 (VMware certified)
Support for SUSE Linux Enterprise Server (SLES) 12
Support for IBM Spectrum Scale™ (General Parallel File System (GPFS)™) over ScaleIO for Linux environments (Red Hat Enterprise Linux (RHEL) / SLES)
Additional flexibility during the configuration process
Configuration guidelines
This section provides guidelines for setting up the storage layer of the solution to provide high availability and the expected level of performance.
Microsoft Hyper-V supports more than one method of storage when hosting virtual machines. The ScaleIO solution is based on block protocols, and the ScaleIO layer described in this section uses all current best practices. A customer or architect with the necessary training and background can make modifications based on their understanding of the system’s usage and load if required. However, the building blocks described in Chapter 3 ensure acceptable performance.
Hyper-V storage virtualization
Windows Server 2012 Hyper-V and Failover Clustering use Cluster Shared Volumes v2 and VHDX features to virtualize storage presented from an external shared storage system to the host virtual machines. In Figure 11, the ScaleIO volumes present block-based LUNs (as CSVs) to the Windows hosts to host the virtual machines.
Figure 11. Hyper-V virtual disk types
CSV
A CSV is a shared disk containing a New Technology File System (NTFS) volume that is accessible to all nodes of a Windows Failover Cluster. The CSV can be deployed over any SCSI-based local or network storage.
Pass-through disks
Windows Server 2012 also supports pass-through disks, which enable a virtual machine to access a physical disk mapped to a host that does not have a volume configured on it.
VHDX
Hyper-V in Windows Server 2012 contains an update to the virtual hard disk (VHD) format called VHDX, which has much greater capacity and built-in resiliency. The main features of the VHDX format are:
Support for virtual hard disk storage capacity of up to 64 TB
Additional protection against data corruption during power failures by logging updates to the VHDX metadata structures
Optimal structure alignment of the virtual hard disk format to suit large sector disks
The VHDX format also has the following features:
Larger block size for dynamic and differential disks, which enables the disks to better meet the needs of the workload
A 4 KB logical-sector virtual disk that enables increased performance when used by applications and workloads that are designed for 4 KB sectors
The ability to store custom file metadata that the user might want to record, such as the operating system version or applied updates
Space reclamation features that can result in smaller file sizes and enable the underlying physical storage device to reclaim unused space (for example, TRIM requires direct-attached storage or SCSI disks and TRIM-compatible hardware)
High-availability and failover
Redundancy scheme and rebuild process
ScaleIO uses a mirroring scheme to protect data against disk and node failures. The ScaleIO architecture supports a distributed two-copy scheme. If an SDS node or SDS disk fails, applications can continue to access ScaleIO volumes; their data is still available through the remaining mirrors. ScaleIO immediately starts a seamless rebuild process to create another mirror for the data chunks that were lost in the failure. During the rebuild process, ScaleIO copies those data chunks to free areas across the SDS cluster, so it is not necessary to add any capacity to the system.
The surviving SDS cluster nodes carry out the rebuild process by using the aggregated disk and network bandwidth of the cluster. The process is fast and minimizes both exposure time and application performance degradation. After the rebuild, all the data is fully mirrored and healthy again.
If a failed node rejoins the cluster before the rebuild process is completed, ScaleIO dynamically uses data from the rejoined node to further minimize the exposure time and the use of resources. This capability is important for overcoming short outages efficiently.
Elasticity and rebalancing
Unlike many other systems, a ScaleIO cluster is extremely elastic. Administrators can add and remove capacity and nodes on the fly during I/O operations.
When a cluster is expanded with new capacity (such as new SDSs or new disks added to existing SDSs), ScaleIO immediately rebalances the storage by seamlessly migrating data chunks from the existing SDSs to the new SDSs or disks. This migration does not affect the applications, which continue to access the data stored in the migrating chunks. By the end of the rebalancing process, all the ScaleIO volumes are spread across all the SDSs and disks, including the newly added ones, in an optimally balanced manner, as shown in Figure 12. Thus, adding SDSs or disks not only increases the available capacity but also increases the performance of the applications as they access their volumes.
Figure 12. Automatic rebalancing when disks are added
When an administrator decreases capacity (for example, by removing SDSs or removing disks from SDSs), ScaleIO performs a seamless migration that rebalances the data across the remaining SDSs and disks in the cluster, as shown in Figure 13.
Figure 13. Automatic rebalancing when disks or nodes are removed
Notes:
In all types of rebalancing, ScaleIO migrates the least amount of data possible.
ScaleIO is sufficiently flexible to accept new requests to add or remove capacity while still rebalancing previous capacity additions and removals.
To maintain data availability, remove only one node at a time.
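The idea that rebalancing moves the least amount of data possible can be sketched in Python. The planner below is a generic illustration, not the ScaleIO algorithm, which this guide does not describe. It assumes equally sized chunks and equally sized SDSs, and it moves only the surplus above each SDS's fair share. The function name and the SDS labels are hypothetical.

```python
def plan_rebalance(chunks_per_sds):
    """Plan chunk moves that even out the load by moving only surplus chunks.

    `chunks_per_sds` maps an SDS name to its current chunk count; a newly
    added SDS is listed with 0 chunks. Returns (source, target, count) tuples.
    """
    names = sorted(chunks_per_sds)
    total = sum(chunks_per_sds.values())
    base, extra = divmod(total, len(names))
    # Give the leftover chunks to the fullest SDSs so that they move less.
    by_load = sorted(names, key=lambda name: chunks_per_sds[name], reverse=True)
    target = {name: base + (1 if i < extra else 0) for i, name in enumerate(by_load)}

    surplus = [[name, chunks_per_sds[name] - target[name]]
               for name in names if chunks_per_sds[name] > target[name]]
    deficit = [[name, target[name] - chunks_per_sds[name]]
               for name in names if chunks_per_sds[name] < target[name]]

    moves = []
    while surplus and deficit:
        src, dst = surplus[0], deficit[0]
        count = min(src[1], dst[1])
        moves.append((src[0], dst[0], count))
        src[1] -= count
        dst[1] -= count
        if src[1] == 0:
            surplus.pop(0)
        if dst[1] == 0:
            deficit.pop(0)
    return moves


# Three SDSs holding 100 chunks each, plus one newly added, empty SDS.
moves = plan_rebalance({"sds-1": 100, "sds-2": 100, "sds-3": 100, "sds-4": 0})
print(moves)
```

In this example only 75 of the 300 chunks move, all of them to the new SDS, and every SDS ends with 75 chunks.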
Chapter 3 Sizing the Environment
This chapter presents the following topics:
Overview
Reference workload
Scalability
VSPEX building blocks
Configuration sizing guidelines
Overview
This chapter presents the following information:
How to design and size the VSPEX Private Cloud for Microsoft Hyper-V with EMC ScaleIO solution to meet the customer’s needs
How to design the nodes for the ScaleIO environment and specify the number of nodes
Results from the solution testing and validation as to how variations in node size and number affect the maximum number of supported servers. The virtual machines used in the sizing calculations correspond to the definition of the reference workload (reference virtual machine) for the VSPEX Private Cloud.
Reference workload
When you move an existing server to a virtual infrastructure, you can gain efficiency by rightsizing the virtual hardware resources assigned to that system.
Each VSPEX Proven Infrastructure balances the storage, network, and compute resources needed for a set number of virtual machines, as validated by EMC. In practice, each virtual machine has its own requirements that rarely fit a pre-defined specification.
To simplify sizing the solution, VSPEX defines a reference workload, which represents a unit of measure for quantifying the resources in the solution reference architecture.
By comparing the customer’s actual usage to this reference workload, you can determine how to size the solution.
For VSPEX Private Cloud solutions, the reference workload is defined as a single virtual machine with the characteristics shown in Table 3.
Table 3. VSPEX Private Cloud workload
Parameter Value
Virtual machine OS Windows Server 2012 R2
Virtual CPUs 1
Virtual CPUs per physical core (maximum) 4
Memory per virtual machine 2 GB
IOPS per virtual machine 25
I/O pattern Fully random skew = 0.5
I/O read percentage 67%
Virtual machine storage capacity 100 GB
This solution uses the VSPEX Private Cloud reference virtual machine for sizing the customer environment in the same way that the reference virtual machine is used in VSPEX Private Cloud solutions for the EMC VNX platform. For further information, refer to EMC VSPEX Private Cloud: Microsoft Windows Server 2012 R2 with Hyper-V for up to 1000 Virtual Machines Proven Infrastructure Guide.
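As a rough illustration of comparing a customer's actual usage to the reference workload, the following Python sketch expresses one existing server as a number of reference virtual machines. The reference values come from Table 3. The rule that the most demanding resource sets the count, with rounding up, is an assumption made for this sketch, and the function name and sample server are hypothetical.

```python
import math

# Reference virtual machine characteristics taken from Table 3.
REFERENCE_VM = {"vcpus": 1, "memory_gb": 2, "iops": 25, "capacity_gb": 100}


def equivalent_reference_vms(vcpus, memory_gb, iops, capacity_gb):
    """Express one customer server as a whole number of reference VMs.

    Assumption: the most demanding resource sets the count, and partial
    reference VMs are rounded up.
    """
    demand = {"vcpus": vcpus, "memory_gb": memory_gb,
              "iops": iops, "capacity_gb": capacity_gb}
    return max(math.ceil(demand[key] / REFERENCE_VM[key]) for key in REFERENCE_VM)


# Hypothetical customer server: 4 vCPUs, 16 GB RAM, 200 IOPS, 200 GB of storage.
print(equivalent_reference_vms(4, 16, 200, 200))  # 8
```

Here memory and IOPS each require eight reference virtual machines, which exceeds the four required for CPU and the two required for capacity.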
Scalability
ScaleIO is designed to scale from three to thousands of nodes. Unlike most traditional storage systems, as the number of servers grows, so do capacity, throughput, and IOPS. Performance scales linearly with the growth of the deployment.
Whenever additional storage and compute resources (such as servers and drives) are needed, you can add them modularly. Storage and compute resources grow together so that the balance between them is maintained.
VSPEX building blocks
Building block approach
Sizing the system to meet the virtual server application requirements is a complicated process. When applications generate I/O, several components serve that I/O, such as the server CPU, server dynamic random access memory (DRAM) cache, and disks. Customers must consider various factors when planning and scaling their storage system to balance capacity, performance, and cost for their applications.
VSPEX uses a building block approach to reduce complexity. A building block consists of one server node that is configured and validated to support a certain number of virtual servers in the VSPEX architecture. Each building block node combines several local disk spindles to contribute a shared ScaleIO volume to support the needs of the private cloud environment. The SDS and the SDC are both installed on each building block node to contribute the local disk to the ScaleIO storage pool and expose ScaleIO shared block volumes to run the virtual machines.
Validated building block
The configuration of the validated reference building block includes the memory size and the number of physical CPU cores and disk spindles shown in Table 4. This configuration provides a flexible solution for VSPEX sizing.
Table 4. Building block node configuration
Parameter Value
Physical CPU cores 6
Memory (GB) 64
SAS drives (10k rpm) 6
SAS capacity per drive (GB) 600
The building block configuration contains six SAS disks per node. The validated solution models these drives at 600 GB each. Solution testing revealed that drive capacity, rather than drive performance, limits the node configuration for a VSPEX Private Cloud and the number of reference virtual machines that a building block can support. The reference building block memory can support 31 reference virtual machines, but the reference building block disk capacity can support only 12 virtual machines, as shown in Table 5.
Customizing the building block provides information about how to customize the building block configuration.
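The building block figures above lend themselves to a small worked calculation. The Python sketch below estimates how many validated building block nodes a given number of reference virtual machines would need. It uses the per-node limits quoted above (31 by memory, 12 by disk capacity) and the three-node minimum mentioned under Scalability. It is a simplification that ignores any headroom for node failures or maintenance, and the function and constant names are hypothetical.

```python
import math

VMS_PER_NODE_BY_MEMORY = 31    # limit set by building block memory
VMS_PER_NODE_BY_CAPACITY = 12  # limit set by building block disk capacity
MIN_NODES = 3                  # ScaleIO scales from three nodes upward


def building_blocks_needed(reference_vms):
    """Estimate the node count for a number of reference virtual machines.

    The tighter of the two per-node limits applies; no spare capacity for
    failures is modeled in this sketch.
    """
    per_node = min(VMS_PER_NODE_BY_MEMORY, VMS_PER_NODE_BY_CAPACITY)
    return max(MIN_NODES, math.ceil(reference_vms / per_node))


print(building_blocks_needed(100))  # 9
print(building_blocks_needed(20))   # 3
```

Because disk capacity is the tighter limit, 100 reference virtual machines need nine nodes in this estimate, and a request for 20 still lands on the three-node minimum.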