Server Consolidation and Workload Optimization

Contributions to this vendor-neutral technology paper have been provided by Blade.org members, including IBM, Emulex, Double-Take Software, NetApp and APC by Schneider Electric.

Executive Summary

Throughout 2008, many data center managers looked to reduce costs and improve efficiency in any way possible. 2009 will be a continuation of that trend, and much of it has to do with consolidating older, lower-performing servers, storage and workstations onto smaller, more efficient systems. IT managers want to become more efficient in power and cooling, scalability, lifecycle management and backup, and to reduce the overall data center footprint. To get there, they will continue to adopt technologies that enable them to do more with less. These papers discuss a few of the top trends and solutions for consolidating servers, storage and workstations. The goal is to reduce resource consumption and management overhead while increasing processing capacity and performance.

Highlights

• Server Consolidation – provides several benefits: reduced power and cooling, a smaller data center footprint, site consolidation, scalability, efficiency and overall ease of management.

• Technologies used for Consolidation – smaller, more efficient blade servers, virtualization for consolidating workloads and various software products that help migrate workloads to more efficient hardware, all while minimizing impact to production operations.

• Deployment best practices – methods for provisioning new servers and migrating existing systems while minimizing production interruption, as well as centralizing an appropriate backup solution for the consolidated infrastructure.

• Summary – With any type of IT deployment there are a few things that need to be planned: location, power and cooling, floor space/footprint, networking, and keeping the production system uninterrupted during the deployment. The result is a more efficient and flexible infrastructure that reduces overall cost of ownership.

Server Consolidation

There are several objectives when considering a server consolidation project. First and foremost is to reduce the overall footprint by moving to more efficient servers and storage, and to reduce power and cooling requirements. A resulting benefit is increased processing and storage capacity, along with more efficient and flexible management. There are several ways to accomplish this. This paper reviews the latest industry technologies for assisting with this consolidation effort, as well as best practices for deploying new systems without interrupting production operations.

Benefits

Power and Cooling

Power and cooling concerns shouldn’t be a barrier to adopting new technologies such as virtualization. But the truth is, in many cases existing power and cooling infrastructure may not be able to handle the higher density of a virtualized environment as intense heat loads stress equipment. Power and cooling demands fluctuate as virtual loads constantly move around and perimeter cooling systems must overcool the entire room to handle localized hot spots.

The issue then becomes how can you get your IT environment to where it can handle the high-density challenges of virtualization—quickly, easily, and without ripping out and rebuilding?

The answer is a High-Density-Ready power and cooling architecture that controls when, where, and how the data center is turned into a high-density environment. An architecture that can be deployed either as a stand-alone system or as a dedicated high-density zone within your existing data center will help solve this challenge. The flexibility comes from modular, standardized components compatible with all major IT software, hardware and physical infrastructure brands. This helps ensure that the power and cooling systems integrate seamlessly with current IT equipment and existing enterprise and building management systems.

Scalability

Server consolidation using blade technology provides enhanced scalability through two different but effective mechanisms. First, the blade architecture allows compute blades to be added in single-blade increments while leveraging the shared power, cooling, and networking infrastructure of the chassis.

Secondly, through hypervisor virtualization, customers may run multiple virtual servers on each physical blade. Over-provisioning memory and I/O resources further increases the ratio of virtual servers to physical blades. As a result, the blade architecture combined with hypervisor virtualization provides a very high degree of scalability.
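
Over-provisioning raises the number of virtual servers each blade can host. The rough, memory-only estimate below shows the effect. It is a sketch under assumed numbers: the 32GB blade, 2GB guests, 2GB hypervisor reserve and 1.5x overcommit factor are all hypothetical. Real sizing must also account for CPU, I/O and the hypervisor vendor's guidance.

```python
import math


def vms_per_blade(blade_mem_gb: float, vm_mem_gb: float,
                  overcommit: float = 1.0,
                  hypervisor_reserve_gb: float = 2.0) -> int:
    """Memory-only estimate of how many guests fit on one blade.

    overcommit > 1.0 models over-provisioning: the memory promised to guests
    may exceed the RAM left after the hypervisor reserve by this factor.
    All defaults are illustrative assumptions, not vendor guidance.
    """
    if vm_mem_gb <= 0 or overcommit <= 0:
        raise ValueError("vm_mem_gb and overcommit must be positive")
    usable_gb = max(blade_mem_gb - hypervisor_reserve_gb, 0.0) * overcommit
    return math.floor(usable_gb / vm_mem_gb)


if __name__ == "__main__":
    # Hypothetical 32GB blade hosting 2GB guests.
    print(vms_per_blade(32, 2))                  # 15 without overcommit
    print(vms_per_blade(32, 2, overcommit=1.5))  # 22 with 1.5x overcommit
```

The overcommit factor is applied to memory only. That keeps the example easy to follow, but the I/O over-provisioning mentioned above would need its own limit in a fuller model.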

Consolidated Backup

One benefit of consolidation is the opportunity to re-evaluate existing backup procedures and manage them centrally once smaller remote office sites have been folded into the primary data center. However, consolidating multiple sites and/or combining many virtual servers onto a single physical server increases the risk of interruption if that virtual host server fails. A proper backup and recovery plan is therefore needed, whether for disaster recovery or as a high availability solution that moves the virtual machines in the event of a failure.

Having a standby virtual host or a few additional blade servers will also help with the availability of business-critical workloads and improve day-to-day operational maintenance. Workloads stay highly available, with minimal interruption to business operations, if you can roll over a cluster node or move virtual machines between virtual hosts in real time, using something like VMware® VMotion or Hyper-V live migration. This allows more time to resolve the failure. Once the original host is repaired, the workload can be failed back, or it can simply remain on the new host as the production server. Backup servers no longer have to be dedicated to a single purpose. They can run low-intensity application workloads and still serve as backup and recovery servers for either physical or virtual workloads, further reducing operational costs. Some storage solutions are integrated with the server virtualization platform to accommodate server-less backup. Storage that is aware of the virtual environment can facilitate backups and snapshots without interrupting existing operations or consuming additional network bandwidth.

Technology used for Consolidation

Reducing floor space/footprint

Sizing the room correctly for the number of servers required helps with power and cooling. A data center room that is too large may require more cooling than necessary, while too little space can cause the air conditioning to run more often to maintain the optimum temperature. Smaller, more efficient blade servers help with all of the above. More servers fit in less space, and they can use significantly less power and cooling with a more localized, dense power and cooling solution. Adding virtualization further consolidates the servers needed, running the same number of workloads on less hardware even more efficiently.

Blade Servers

Blade technology can further reduce the overall footprint and floor space required. Blade servers pack a large amount of processing power into a small, condensed area, and they are easily provisioned and managed. In combination with virtualization, a data center containing one hundred physical servers could potentially be reduced to as few as ten, with improved processing power. When blade servers were first announced, there was some hesitation to adopt them because of limitations in NICs, I/O, processing and/or HBAs for storage connections. Many of these concerns matter less when blades are combined with virtualization and the increased demand for iSCSI storage devices. Many storage vendors are moving to provide an iSCSI line with connections via standard Ethernet cards, which are not limited by the Fibre Channel connections or HBAs otherwise required to connect the servers to the storage.

Virtualization

Virtualizing physical servers enables IT managers to run multiple workloads on a single physical server that would normally be limited to just one function. There are several virtualization software vendors and products, including Virtual Iron®, Xen®, VMware® and Microsoft Hyper-V. In addition, the Linux-based KVM hypervisor will be supported this year by enterprise Linux distributions. All of them provide the ability to run multiple virtual guests on a single host server. The technical specifications of the physical server (processors, memory and disk speed) determine how many virtual machines it can support. It is common to have six to ten virtual guests running on a single virtual host.
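
Given the six-to-ten-guest range above, the number of hosts needed for a set of workloads is a ceiling division. The sketch below makes two assumptions: each existing server becomes one guest, and one spare host is kept for failover by default. The spare-host policy is hypothetical and echoes the standby host discussed under Consolidated Backup.

```python
import math


def hosts_needed(workloads: int, guests_per_host: int, spare_hosts: int = 1) -> int:
    """Physical hosts required to run `workloads` guests, plus standby hosts."""
    if guests_per_host < 1:
        raise ValueError("guests_per_host must be at least 1")
    return math.ceil(workloads / guests_per_host) + spare_hosts


if __name__ == "__main__":
    # 100 physical servers, each becoming one virtual guest.
    for per_host in (6, 8, 10):
        print(per_host, "guests/host ->", hosts_needed(100, per_host), "hosts")
```

At ten guests per host and no spare, this reproduces the one-hundred-to-ten reduction mentioned under Blade Servers.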

The single most important step is to verify that the hardware supports the selected virtualization technology. Refer to the virtualization vendor’s website for a hardware compatibility list or, at the very least, the minimum operating requirements. Here are a few other things to consider when planning a virtual deployment:

o Disk Space – Plan for the virtual host server to have at least 100GB available for virtual configuration files, virtual server images and/or any other virtual management activity. Disk space is cheap; don’t limit the ability to scale workloads when it is needed most.

o Memory – There is never too much. Max out the server’s capacity so the system won’t need to be taken offline later to add more. Don’t virtualize all the physical servers only to realize they can’t perform optimally; planning in advance will help avoid unwanted consequences in the future.

o Network Connectivity – Most virtual infrastructure products require multiple physical NICs in the host to carry traffic for the hosted virtual machines. Also plan for dedicated cards for communication, management and moving virtual images between hosts.

o Disk Partitions – Take time to plan appropriately when carving out disk partitions. It’s the old carpenter’s adage: "measure twice, cut once." The more planning performed up front for virtual image storage, configuration files, kernel and file systems, the better the ability to scale and avoid future changes.

o Additional helpful tools:

• Tunnelier (http://www.bitvise.com/tunnelier.html): a utility for secure SSH and telnet connections, useful for transferring files from a workstation to the virtual server.

• Sysprep: a Microsoft utility useful for preparing Windows virtual machine images for cloning.

• NewSID (http://www.sysinternals.com/Utilities/NewSid.html): a freeware utility to change the SID and computer name of a Windows server.

o P2V (Physical to Virtual) tools:

• Check with virtualization vendors
• Backup and recovery products
• Other specialized applications
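
The 100GB disk-space guideline can be checked before deployment with Python's standard library. This is a minimal sketch. The path checked is a placeholder for wherever the host stores its virtual machine files, and the script only looks at free space, not at the hardware compatibility list or memory.

```python
import shutil

MIN_FREE_GB = 100  # planning figure from the Disk Space guideline


def free_gb(path: str) -> float:
    """Free space on the filesystem that holds `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1_000_000_000


def check_host_disk(path: str, min_free_gb: float = MIN_FREE_GB) -> tuple[bool, float]:
    """Return (meets_guideline, free_gb) for the given storage path."""
    free = free_gb(path)
    return free >= min_free_gb, free


if __name__ == "__main__":
    # "." is a placeholder; point this at the host's VM storage location.
    ok, free = check_host_disk(".")
    status = "OK" if ok else "below the 100GB guideline"
    print(f"{free:.0f} GB free: {status}")
```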

Site Consolidation Deployment best practices

Once the hardware is configured in the data center, the next step is to provision the machines and/or migrate the production workloads with minimal interruption to operations. There are two methods for consolidating infrastructure: provisioning new servers and migrating existing servers.

Provisioning can be considerably easier when using bootable server images, but migrating from older physical servers will still be necessary to transfer the current production workloads. The challenge with a physical-to-virtual (P2V) migration is really no different from any other migration: time. Often there is a very limited window, usually over a weekend, to complete a migration. If it isn’t complete and operational by business hours Monday morning, the original server is brought back online and the next attempt is scheduled. Depending on the amount of data to be migrated (terabytes, for example), there may not be enough time in a weekend to finish. It comes down to a simple formula: the amount of data to be migrated divided by the speed at which it can be written. If bandwidth isn’t a limitation, the transfer rate may be limited by the write speed of the target disk. A conservative figure for disk write speed is 20GB per hour.
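
The window calculation above can be expressed as a small helper. The 60-hour weekend window (Friday 6 p.m. to Monday 6 a.m.) is an assumed example. The 20GB per hour default is the conservative disk write figure from the text.

```python
def migration_hours(data_gb: float, write_gb_per_hour: float = 20.0) -> float:
    """Hours to copy data_gb at a sustained write rate (default: 20GB/hour)."""
    if write_gb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return data_gb / write_gb_per_hour


def fits_in_window(data_gb: float, window_hours: float,
                   write_gb_per_hour: float = 20.0) -> bool:
    """True if the copy completes inside the maintenance window."""
    return migration_hours(data_gb, write_gb_per_hour) <= window_hours


# Assumed window: Friday 6 p.m. to Monday 6 a.m.
WEEKEND_WINDOW_HOURS = 60

if __name__ == "__main__":
    for size_gb in (500, 1000, 2000):
        hours = migration_hours(size_gb)
        verdict = "fits" if fits_in_window(size_gb, WEEKEND_WINDOW_HOURS) else "does not fit"
        print(f"{size_gb} GB: about {hours:.0f} hours, {verdict} in the weekend window")
```

At 20GB per hour, roughly 1.2TB is the most a 60-hour window can hold. That is why terabyte-scale servers often need one of the online migration approaches described later in this section.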

Networking & Storage Connectivity

Bandwidth is not nearly as big an issue as it was in previous years. It is relatively cheap and plentiful, but you still need to plan the proper connectivity between all your servers. If using a Fibre Channel SAN, make sure there are enough HBAs for all the servers to be connected. If using an iSCSI storage solution, consider the number of Ethernet cards required for each server. Usually a minimum of two Ethernet cards is necessary: one for the storage connection and another for the LAN/WAN connection. If clustering is used, a third may be needed to keep the UDP heartbeats and internal node communication separate from the rest of the network.
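
The port-count guideline can be turned into a planning helper. This is a sketch, not a sizing tool. The role names are hypothetical labels, and the helper assumes one HBA per Fibre Channel-attached server. It does not add ports for redundancy or management, which many deployments would also plan for.

```python
from dataclasses import dataclass


@dataclass
class ServerPlan:
    name: str
    storage: str            # "iscsi" or "fc" (hypothetical labels)
    clustered: bool = False


def ethernet_roles(plan: ServerPlan) -> list[str]:
    """Ethernet ports implied by the guideline above for one server."""
    if plan.storage not in ("iscsi", "fc"):
        raise ValueError(f"unknown storage type: {plan.storage!r}")
    roles = ["lan-wan"]                    # every server needs a LAN/WAN port
    if plan.storage == "iscsi":
        roles.append("iscsi-storage")      # storage traffic on its own card
    if plan.clustered:
        roles.append("cluster-heartbeat")  # keeps UDP heartbeats off the LAN
    return roles


def connectivity_totals(plans: list[ServerPlan]) -> tuple[int, int]:
    """Total (Ethernet ports, HBAs), assuming one HBA per Fibre Channel server."""
    nics = sum(len(ethernet_roles(p)) for p in plans)
    hbas = sum(1 for p in plans if p.storage == "fc")
    return nics, hbas


if __name__ == "__main__":
    plans = [
        ServerPlan("web01", "iscsi"),
        ServerPlan("sql01", "iscsi", clustered=True),
        ServerPlan("erp01", "fc", clustered=True),
    ]
    for p in plans:
        print(p.name, ethernet_roles(p))
    print("totals (NICs, HBAs):", connectivity_totals(plans))
```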

The implementation of virtual servers is highly correlated with the adoption of storage area networks (SANs), which provide the shared data access that shared server environments require. One of the new technologies available in VMware ESX Server 3.5 is support for N_Port ID Virtualization (NPIV). NPIV is a T11 ANSI standard, developed by Emulex and IBM, that enables a fabric switch to register several WWPNs on the same physical HBA port. With NPIV, each virtual machine (VM) on a VMware ESX Server can have a unique Fibre Channel Worldwide Port Name (WWPN), providing a monitored data path to the SAN through the virtual HBA port. With a unique virtual HBA port, storage administrators can implement SAN best practices such as LUN masking and zoning for individual virtual machines. Administrators can also take advantage of powerful SAN management tools and simplified migration of virtual machines and their storage resources.

The advent of server virtualization provides users the benefit of multiple virtual machines replacing physical hosts, but loses the direct port-to-host relationship. That relationship can be restored and represented as a virtual port using NPIV capabilities.

Below is a list of the advantages of using NPIV with VMware ESX Server 3.5.

o I/O throughput, storage traffic and utilization can be tracked to the virtual machine level via the WWPN, allowing for application or user-level chargeback. As each NPIV entity is seen uniquely on the SAN, it is possible to track the individual SAN usage of a virtual server. Prior to NPIV, the SAN and ESX Server could only see the aggregate usage of the physical Fibre Channel Port by all of the virtual machines running on that server, except for some vendor-specific LUN-based tools.

o Virtual machines can be associated with devices mapped as Raw Device Mappings (RDMs) to allow for LUN tracking and customization to the application’s needs. SAN tools tracking WWPNs can report performance or diagnostic data specific to each virtual machine. As each NPIV entity is seen uniquely on the SAN, switch-side and array-side reporting tools can report diagnostic and performance-related data on a per-virtual-machine basis.

o Bi-directional association of storage with virtual machines gives SAN administrators an enhanced ability both to trace from a virtual machine to an RDM (available today) and to trace back from an RDM to a VM (significantly enhanced with NPIV support).

o Storage provisioning for virtual machines hosted on ESX Server can use the same methods, tools and expertise already in place for physical servers. Because the virtual machine is once again uniquely related to a WWPN, traditional zoning and LUN masking can continue to be used, enabling unified administration of virtualized and non-virtualized servers. Fabric zones can restrict target visibility to selected applications hosted by virtual machines. Configurations that required unique physical adapters for an application can now be remapped onto unique NPIV instances on the ESX Server.

o Storage administrators can configure IVR (Inter Virtual SAN Routing) in ESX Server environments down to the individual virtual machine. This enables large end users to reconfigure their fabrics: aggregating islands of storage, fragmenting massive SANs into smaller, more manageable ones and assigning resources on a logical basis.

o Virtual machine migration preserves the virtual port ID when the VM is moved to a new ESX Server. This improves the tracking of RDMs to VMs, and storage access can be restricted to a group of ESX Servers (a VMware cluster). If the virtual machine is moved to a new ESX Server, no SAN configuration changes are required to adjust for the use of different physical Fibre Channel ports. Provided the zones and LUN masking are set up correctly, the virtual port name stays with the VM as it is moved to a new ESX Server.

o HBA upgrades, expansion and replacement are now seamless, because SAN zoning and LUN masking are no longer based on the physical HBA WWPNs. As a result, the physical adapters can be replaced or upgraded without any change to the SAN configuration.
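
With NPIV, zoning survives VM migration and HBA replacement because zone membership is keyed on the virtual WWPN. The physical port the VM happens to log in through plays no part. The toy model below illustrates that idea. The WWPN values, port names and the `Fabric` class are hypothetical and do not represent any switch vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Fabric:
    """Toy fabric: zones list WWPNs; logins record which physical port a WWPN uses."""
    zones: dict[str, set[str]] = field(default_factory=dict)
    logins: dict[str, str] = field(default_factory=dict)

    def login(self, wwpn: str, physical_port: str) -> None:
        """Record that `wwpn` (virtual or physical) is logged in via `physical_port`."""
        self.logins[wwpn] = physical_port

    def visible_targets(self, initiator: str) -> set[str]:
        """Targets zoned with `initiator`; the physical port plays no part."""
        targets: set[str] = set()
        for members in self.zones.values():
            if initiator in members:
                targets |= members - {initiator}
        return targets


# Hypothetical WWPNs: one NPIV virtual port for a VM, one array target port.
VM_WWPN = "c0:50:76:00:00:00:00:01"
ARRAY_WWPN = "50:0a:09:81:00:00:00:01"

fabric = Fabric(zones={"zone_vm_sql01": {VM_WWPN, ARRAY_WWPN}})
fabric.login(VM_WWPN, "esx1-hba0")
before = fabric.visible_targets(VM_WWPN)

# The VM migrates to another ESX Server and logs in through a different HBA.
fabric.login(VM_WWPN, "esx2-hba1")
after = fabric.visible_targets(VM_WWPN)

print(fabric.logins[VM_WWPN], before == after)  # esx2-hba1 True
```

The zone never mentions `esx1-hba0` or `esx2-hba1`. Replacing either HBA would likewise leave the VM's storage visibility unchanged.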

IP storage networks over Ethernet offer two alternatives to a Fibre Channel storage network: NFS and iSCSI. Some virtual server platforms, such as VMware, manage their VM resources in a file environment. They use either VMFS, the VMware proprietary file system, or NFS, a standard network file system. VMFS requires either Fibre Channel or iSCSI for shared storage applications. With either NFS or iSCSI, dedicating resources to storage traffic is recommended. With IP storage networks, this can be achieved with separate physical switches, or logically by implementing VLAN segments for storage I/O on a shared, switched IP infrastructure.

10 Gb ETHERNET

Consolidating your storage environment onto Ethernet, such as 10 Gb Ethernet, may offer advantages over traditional storage networks. Support for 10 Gb Ethernet was introduced with VMware ESX 3 and ESXi 3. One advantage of 10 GbE is the ability to reduce the number of network ports in the infrastructure, especially, but not only, in blade servers. 10 Gb Ethernet provides the additional bandwidth to share network hardware more effectively across multiple applications, for example through VLANs. To verify support for your hardware and its use for storage I/O, see the ESX I/O compatibility guide.

VLAN IDs

When segmenting network traffic with VLANs, interfaces can either be dedicated to a single VLAN or they can support multiple VLANs with VLAN tagging.

For systems that have fewer NICs, such as blade servers, VLANs can be very useful. Teaming two NICs together provides an ESX Server with physical link redundancy. By adding multiple VLANs, common IP traffic can be grouped onto separate VLANs for optimal performance. It is recommended to group Service Console access with the virtual machine network on one VLAN, and to place the VMkernel activities of IP storage and VMotion on a second VLAN.

VLANs and VLAN tagging also play a simple but important role in securing an IP storage network. NFS exports can be restricted to a range of IP addresses that are available only on the IP storage VLAN. These simple configuration settings have an enormous effect on the security and availability of IP-based datastores. If you are using multiple VLANs over the same interface, make sure that sufficient throughput can be provided for all traffic.
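
The VLAN grouping and NFS export restriction described above can be modeled as data plus a membership test, using Python's `ipaddress` module. The VLAN IDs and subnets are hypothetical. The function only reproduces the logic of an IP-range export rule; it does not configure a switch or an NFS server.

```python
import ipaddress

# Hypothetical VLAN plan following the grouping recommended above.
VLAN_PLAN = {
    10: {"name": "console-and-vm",
         "subnet": ipaddress.ip_network("10.0.10.0/24"),
         "traffic": {"service-console", "vm-network"}},
    20: {"name": "ip-storage-and-vmotion",
         "subnet": ipaddress.ip_network("10.0.20.0/24"),
         "traffic": {"ip-storage", "vmotion"}},
}
STORAGE_VLAN = 20


def vlan_for(traffic_type: str) -> int:
    """Return the VLAN ID that carries the given traffic type."""
    for vlan_id, vlan in VLAN_PLAN.items():
        if traffic_type in vlan["traffic"]:
            return vlan_id
    raise KeyError(f"no VLAN carries {traffic_type!r}")


def nfs_export_allows(client_ip: str) -> bool:
    """Mimic an NFS export restricted to the IP storage VLAN's address range."""
    return ipaddress.ip_address(client_ip) in VLAN_PLAN[STORAGE_VLAN]["subnet"]


if __name__ == "__main__":
    print(vlan_for("vmotion"))              # 20
    print(nfs_export_allows("10.0.20.15"))  # True: VMkernel port on storage VLAN
    print(nfs_export_allows("10.0.10.15"))  # False: VM network address
```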

VIRTUAL INTERFACES

A virtual network interface (VIF) is a mechanism that supports aggregation of network interfaces into one logical interface unit. Once created, a VIF is indistinguishable from a physical network interface. VIFs are used to provide fault tolerance of the network connection and in some cases higher throughput to the storage device.

Multimode VIFs are compliant with IEEE 802.3ad. In a multimode VIF, all of the physical connections in the VIF are simultaneously active and can carry traffic. This mode requires that all of the interfaces be connected to a switch that supports trunking or aggregation over multiple port connections. The switch must be configured to understand that all the port connections share a common MAC address and are part of a single logical interface.

In a single-mode VIF, only one of the physical connections is active at a time. If the storage controller detects a fault in the active connection, a standby connection is activated. No configuration is necessary on the switch to use a single-mode VIF, and the physical interfaces that make up the VIF do not have to connect to the same switch. Note that IP load balancing is not supported on single-mode VIFs.

It is also possible to create second-level single-mode or multimode VIFs. Second-level VIFs combine the link aggregation features of a multimode VIF with the failover capability of a single-mode VIF. In this configuration, two multimode VIFs are created, each connected to a different switch. A single-mode VIF is then created from the two multimode VIFs. In normal operation, traffic flows over only one of the multimode VIFs. In the event of an interface or switch failure, the storage controller moves the network traffic to the other multimode VIF.
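
The interaction between the two VIF levels can be sketched as a small state model. The interface names (e0a through e0d) and switch assignments are hypothetical. The classes are illustrative only and are not a storage controller's configuration interface. A single link failure only degrades the active multimode VIF, while losing every link on one switch triggers failover to the other.

```python
from dataclasses import dataclass


@dataclass
class MultimodeVif:
    """802.3ad-style aggregate: every healthy member link carries traffic."""
    name: str
    links: dict[str, bool]  # link name -> link is up

    def active_links(self) -> list[str]:
        return [link for link, up in self.links.items() if up]

    def is_up(self) -> bool:
        return bool(self.active_links())


@dataclass
class SingleModeVif:
    """Only one member carries traffic; a standby member takes over on failure."""
    members: list[MultimodeVif]
    active: int = 0

    def carrying_traffic(self) -> MultimodeVif:
        if not self.members[self.active].is_up():
            for i, member in enumerate(self.members):
                if member.is_up():
                    self.active = i  # fail over to the first healthy member
                    break
            else:
                raise RuntimeError("all member VIFs are down")
        return self.members[self.active]


# Hypothetical second-level VIF: one multimode VIF per switch.
vif_a = MultimodeVif("vif_a", {"e0a": True, "e0b": True})  # switch A
vif_b = MultimodeVif("vif_b", {"e0c": True, "e0d": True})  # switch B
top = SingleModeVif([vif_a, vif_b])

vif_a.links["e0a"] = False  # one link fails: vif_a keeps running on e0b
print(top.carrying_traffic().name, top.carrying_traffic().active_links())

vif_a.links["e0b"] = False  # switch A fails: traffic moves to vif_b
print(top.carrying_traffic().name, top.carrying_traffic().active_links())
```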

Host Based Replication

One method for consolidating physical or virtual servers, or even entire sites, is a host-based replication solution. Host-based replication captures changes as they occur on a production server and replicates them in real time to a designated target server until the chosen failover time. These products are installed on the operating system of the physical or virtual machine and replicate all changes to the workload, as they occur, to the new blade or virtual target server. This isn’t bare-metal recovery, because it requires a base operating system to be installed on the new machine. However, it is not nearly as complicated, and it significantly reduces downtime compared with bare-metal recovery or tape restoration. The data is kept synchronized while the production server remains online and users are connected. When the entire workload (application, system state and associated data) has been completely replicated to the new target server, the server is ready to fail over. Failover from a production server to a target server usually takes no longer than a few minutes. Because host-based replication is hardware and application agnostic, it provides maximum flexibility for workload migration and portability, not only for server consolidation but also for minimizing downtime during the process.
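
The capture-and-replay cycle behind host-based replication can be illustrated with a toy model that treats each server's data as a key/value store. This is a sketch under simplifying assumptions. It ignores deletes, open-file handling, write ordering across volumes and network failure, all of which real replication products must handle.

```python
from collections import deque


class HostReplicator:
    """Toy host-based replication between two key/value 'volumes'."""

    def __init__(self, source: dict, target: dict):
        self.source = source
        self.target = target
        # Initial mirror: queue the existing data set, then stream changes.
        self.pending = deque(source.items())

    def write(self, key, value) -> None:
        """Production write: apply locally and capture the change in order."""
        self.source[key] = value
        self.pending.append((key, value))

    def replicate(self, max_changes=None) -> int:
        """Ship queued changes to the target in order; return how many were sent."""
        sent = 0
        while self.pending and (max_changes is None or sent < max_changes):
            key, value = self.pending.popleft()
            self.target[key] = value
            sent += 1
        return sent

    def in_sync(self) -> bool:
        return not self.pending and self.source == self.target

    def failover(self) -> dict:
        """Drain the remaining changes, then hand production to the target."""
        self.replicate()
        if not self.in_sync():
            raise RuntimeError("target diverged from source")
        return self.target


if __name__ == "__main__":
    repl = HostReplicator({"db/orders": "v1"}, {})
    repl.replicate()                   # initial synchronization
    repl.write("db/orders", "v2")      # users keep working during the copy
    repl.write("mail/inbox", "msg-1")
    print(repl.in_sync())              # False: two changes still queued
    print(repl.failover())             # {'db/orders': 'v2', 'mail/inbox': 'msg-1'}
```

Changes are replayed in the order they were captured, so a later write to the same key always wins on the target, as it did on the source.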

Provisioning

There are a few options for provisioning blade servers or virtual machines as a part of a consolidation effort.

o Base install

Installing the operating systems and applications from scratch is an option, but it is time-consuming and not the most efficient process if it must be repeated dozens of times. There is also still the question of how to associate the data with the newly installed server. Tape is an option, but it is an entirely manual process, and there are more efficient methods and better tools for the job.

o Server Image Recovery

Using a server image to provision new machines is a much faster process than performing a base install, especially when it needs to be repeated several times to roll out the server consolidation. Several image recovery software products on the market recover not only the applications but the system state as well. Many of them are hardware independent, so the process remains the same whether you are recovering to a blade server or a virtual machine. Although faster than a base install, the recovery can take anywhere from 15 minutes to a couple of hours. The time depends on the size of the image, which in turn depends on the volume of data on the imaged production server. Some products incorporate asynchronous replication so the data remains consistent while the production server stays online. This eliminates the need for a differential restoration after the initial deployment. The absence of such replication is a drawback of some image recovery products: they are great for bare-metal recovery of base images, but less so for migrating from one server to another.

o Rapid Provisioning

Adding virtual servers to an existing virtual environment is very simple. Provisioning storage can also be simple. In larger data centers, however, the storage and server administrators can be in different departments. As a result, coordinating the deployment of a new virtual server with a storage volume may take more time than the actual provisioning of resources. Storage vendors that offer integration of management tools within the virtual server tools deliver added efficiency by enabling rapid provisioning of both server and storage from a single set of tools. Obviously, such processes have to be managed appropriately within a given IT organization. But, the added integration can increase data center efficiency and if IT staff is limited, can greatly reduce costs.

Migration

Migrating servers, workloads and storage is a major component for consolidating infrastructure and there are several applications for server data migration:

• Moving workloads from a dedicated server to a virtual server platform
• Migrating virtual workloads between virtual host servers to balance load
• Consolidating sites to a central data facility

When moving virtual machine workloads, a shared storage environment is desirable since it eliminates the need to move data. Data movement may still be required, such as when moving from a physical server with DAS to a virtual host with shared storage. In that case, P2V software and conversion tools are available to reduce the risks associated with server interruption. For site consolidations, a first step in migrating is to determine bandwidth throughput (WAN, LAN, maximum throughput speed and data latency). Before beginning the consolidation over a WAN, verify the throughput and call the provider if necessary. To check IP latency, ping another server across the wire and verify the response time. Use the following formula to calculate the transfer rate for a given bandwidth:

Example for a T1 (1.5Mbit/s):

Transfer rate (bytes per hour) = bandwidth (bits per second) ÷ 8 (bits per byte) × 3,600 (seconds per hour)

1.5Mbit/s ÷ 8 ≈ 187KB per second; 187KB per second × 3,600 seconds ≈ 673MB per hour

For planning purposes, subtract 10-15% for protocol overhead and latency. The result is roughly 572-606MB per hour for a T1 (1.5Mbit/s). If bandwidth is not a concern (such as on a 100Mbit or Gigabit LAN), the transfer rate will likely be limited not by bandwidth but by disk write speed. A typical 10K RPM disk will not be able to write blocks any faster than 17-20GB per hour. Even that depends on the disk configuration: RAID configuration, controller cards, connections and disk types all affect overall performance.
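
The T1 arithmetic and the disk write ceiling can be combined into one estimate: the effective rate is whichever is slower, the link after overhead or the disk. This sketch uses decimal units (1MB = 1,000,000 bytes). It does not round the intermediate 187.5KB per second, so it reports 675MB per hour for a T1 rather than 673MB. The 15% overhead and 20GB per hour disk defaults are taken from the planning figures above.

```python
def link_mb_per_hour(bandwidth_mbit: float, overhead: float = 0.0) -> float:
    """Usable MB per hour for a link of `bandwidth_mbit` megabits per second.

    bits/s / 8 -> bytes/s; x 3600 -> bytes/hour; then subtract an overhead
    fraction for planning. Decimal units: 1MB = 1,000,000 bytes.
    """
    bytes_per_second = bandwidth_mbit * 1_000_000 / 8
    return bytes_per_second * 3600 / 1_000_000 * (1 - overhead)


def effective_mb_per_hour(bandwidth_mbit: float, overhead: float = 0.15,
                          disk_gb_per_hour: float = 20.0) -> float:
    """The slower of the link (after overhead) and the target disk."""
    return min(link_mb_per_hour(bandwidth_mbit, overhead), disk_gb_per_hour * 1000)


def transfer_hours(data_gb: float, bandwidth_mbit: float, **kwargs) -> float:
    """Hours to move `data_gb` at the effective rate."""
    return data_gb * 1000 / effective_mb_per_hour(bandwidth_mbit, **kwargs)


if __name__ == "__main__":
    print(link_mb_per_hour(1.5))               # 675.0 MB/hour for a T1
    print(round(link_mb_per_hour(1.5, 0.15)))  # ~574 after 15% overhead
    print(transfer_hours(100, 1.5))            # 100GB over a T1: link-bound
    print(transfer_hours(100, 1000))           # 100GB over Gigabit: disk-bound, 5.0
```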

Keeping Data Consistent During the Consolidation

Once the transfer time from one system to another has been determined, the next step is to choose the best process for keeping the systems online and current while the migration is in progress. This raises the two questions of the typical migration paradigm:

• How to minimize downtime during the migration?
• How to keep the migrated system current?

Most P2V conversion products allow you to convert the physical server while it remains online. The more challenging task is keeping the data current once the conversion has taken place. Unless you use a real-time replication engine that can transmit changes to open files, options can be somewhat limited. These limitations are typically resolved with a tape restoration. That requires downtime, because users cannot keep changing the data while it is backed up if the copy is to stay current. A differential backup then captures the changes made between the start and the completion of the migration.
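
One simple way to build the differential set is to select files whose modification time is newer than the moment the migration started. This is an illustrative sketch, not how any particular backup product works. It relies on file timestamps, does not detect deleted files, and cannot capture changes to files that are still open for writing.

```python
import os
import tempfile
import time
from pathlib import Path


def changed_since(root: str, start_time: float) -> list[str]:
    """Relative paths of files under `root` modified after `start_time`.

    start_time is seconds since the epoch, e.g. time.time() captured when
    the bulk migration copy began.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > start_time:
                    changed.append(os.path.relpath(path, root))
            except FileNotFoundError:
                continue  # file removed between listing and stat
    return sorted(changed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        start = time.time()
        old = Path(root, "old.txt")
        old.write_text("copied during the bulk pass")
        os.utime(old, (start - 3600, start - 3600))  # last changed before start
        new = Path(root, "new.txt")
        new.write_text("written while the migration ran")
        os.utime(new, (start + 60, start + 60))      # changed after start
        print(changed_since(root, start))            # ['new.txt']
```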

The same methodology applies when moving from older servers to new blade servers or from slower storage to faster arrays. It also applies when creating new co-location data centers, whether physical or virtual. Planning will still be required, but the process can minimize interruption to the production systems being consolidated.

Bootable Images

Network Boot Image – Network booting a server image (netbooting) is a quick and efficient process for provisioning new servers as well as workstations. It is the kind of easy, repeatable process that consolidation projects require. A network boot image usually includes the operating system and any required applications. The image is stored on the network, for example on an iSCSI device, and servers or workstations boot from it. A centralized management console is used to point each server or desktop to a specific image. This provides the ability to rapidly provision a new application server, whether it is Exchange, SQL, Linux or just a base virtual machine. Netbooting drastically reduces the time that normal operating system and application installs would take, and it makes provisioning a consistent, repeatable process.

Choosing the right storage system can further enhance the benefits of booting from a network device. Features such as de-duplication and rapid cloning can reduce the physical data footprint associated with each boot image. Writable clones allow state information to be captured for each virtual or physical server while still retaining the advantages of sharing common data blocks. Another benefit of netbooting is that the boot images can be stored on a shared boot drive and assigned to multiple servers or workstations, so they can all boot the disk image at once.
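
The space savings from writable clones can be shown with a copy-on-write model. Every clone reads unchanged blocks from the shared base image and stores only the blocks it writes. The block counts in the example are hypothetical. The model does not include de-duplication of identical blocks across clones.

```python
class BaseImage:
    """Golden boot image whose blocks are shared read-only by every clone."""

    def __init__(self, blocks: dict[int, bytes]):
        self.blocks = dict(blocks)


class WritableClone:
    """Copy-on-write clone: reads fall through to the base, writes stay private."""

    def __init__(self, base: BaseImage):
        self.base = base
        self.delta: dict[int, bytes] = {}

    def read(self, block: int) -> bytes:
        if block in self.delta:
            return self.delta[block]
        return self.base.blocks[block]

    def write(self, block: int, data: bytes) -> None:
        self.delta[block] = data  # only changed blocks consume new space


def physical_blocks(base: BaseImage, clones: list[WritableClone]) -> int:
    """Blocks actually stored: the shared base plus each clone's private changes."""
    return len(base.blocks) + sum(len(c.delta) for c in clones)


if __name__ == "__main__":
    base = BaseImage({n: b"\x00" * 4096 for n in range(1000)})  # hypothetical 1,000-block image
    clones = [WritableClone(base) for _ in range(20)]           # 20 netbooted servers
    for i, clone in enumerate(clones):
        for n in range(5):                                      # per-server state, e.g. hostname
            clone.write(n, f"server-{i}".encode())
    print(physical_blocks(base, clones), "blocks vs", 20 * len(base.blocks), "for full copies")
```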

Backup, Recovery and Availability

Another task in the planning stages is to evaluate the existing backup procedures to see whether they will still meet the requirements of the consolidated infrastructure. Some tier two or tier three servers may be fine with existing tape backups, but more critical workloads may require a faster recovery time. There are several options to consider for backup and recovery. The method selected will depend on the recovery point objective (RPO) and/or recovery time objective (RTO) requirements.

o Tape Backup – Typically performed every 24 hours on a per-server basis, with a 24-hour RPO and an RTO of potentially 24 hours or more.

o Snapshots – Snapshots are usually configured for every 6 hours. A volume can have upwards of 512 snapshots, depending on the hardware vendor and the available disk space. The RPO is the interval at which the snapshots are taken, and the RTO is less than 1 hour.

o System State or Server Image Recovery – These systems provide images of a particular server, including the system state, applications and data, and can be kept up to date with real-time replication. The RPO is near zero. Because a recovery process is involved, the RTO could be anywhere from 2 to 4 hours, depending on the amount of data to be restored. This is usually a good solution for tier 2 and 3 servers where immediate availability isn’t necessary but the RTO of a tape solution isn’t fast enough.

o Continuous Data Protection (CDP) – Provides any-point-in-time recovery for restoring previous revisions of specific files. CDP is sometimes required for industry compliance regulations such as Sarbanes-Oxley, HIPAA or SEC rules, where document version control and retention are required. CDP is a very granular recovery process: entire servers can be recovered, or just a single e-mail.

o High Availability – Host-based (asynchronous) or disk-based (synchronous) replication keeps a real-time copy of the data available. Alternatively, another virtual host server can be used. Either approach gives a near-zero RPO and an RTO of as little as a few minutes.

o Server-less Backup – most advanced storage systems offer a variety of backup or disaster recovery features. These features offload the backup duties from dedicated servers and also reduce network congestion. Shared storage can be backed up to a secondary site, a virtual tape library, or to tape independent of the servers, allowing the servers to focus on their tasks.
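
The RPO and RTO figures listed above can be turned into a simple filter for shortlisting options per tier. This is a sketch with several assumptions. It uses the worst-case figures from the list and models "near zero" RPO as 0 and "a few minutes" RTO as 0.1 hours. It takes 24 hours as the tape RTO planning figure even though the actual RTO may be higher. CDP and server-less backup are omitted because no RPO/RTO figures are given for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    rpo_hours: float  # worst-case data loss window
    rto_hours: float  # worst-case time to restore service


# Figures from the list above; 0 stands for "near zero" and 0.1 hours for
# "a few minutes" (assumptions). Tape RTO may exceed 24 hours in practice.
OPTIONS = [
    RecoveryOption("Tape backup", 24, 24),
    RecoveryOption("Snapshots (6-hour interval)", 6, 1),
    RecoveryOption("System state / server image recovery", 0, 4),
    RecoveryOption("High availability", 0, 0.1),
]


def options_meeting(rpo_hours: float, rto_hours: float) -> list[str]:
    """Names of options whose worst-case RPO and RTO fit the requirement."""
    return [o.name for o in OPTIONS
            if o.rpo_hours <= rpo_hours and o.rto_hours <= rto_hours]


if __name__ == "__main__":
    print(options_meeting(24, 24))   # e.g. tier 3: every option qualifies
    print(options_meeting(1, 8))     # e.g. tier 2: image recovery or HA
    print(options_meeting(0, 0.25))  # e.g. critical: HA only
```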

Summary

Server consolidation will be a necessity for reducing data center costs and for providing better workload management, availability, provisioning and portability. It also brings reduced power and cooling, a smaller footprint, multi-site consolidation, enhanced scalability and overall ease of management. Technologies such as blade servers, virtualization and high-density power and cooling help migrate and consolidate workloads more efficiently, all while minimizing impact to production operations. The consolidation process offers several options to reduce the downtime associated with typical migrations and to facilitate rapid deployment of more efficient workloads. Many of these methods for provisioning new servers or migrating existing systems during production hours minimize interruption. They also provide a centralized workload optimization solution for a dynamic infrastructure. Any type of IT deployment requires planning. Considering the technologies and deployment best practices discussed here will result in a more efficient and flexible infrastructure that reduces overall cost of ownership.