A White Paper Prepared for BMC Software February 2007
Table of Contents
Executive Summary
Introduction
Managing Service Delivery in Virtual Environments
A Short Background of Virtualization
Management Difficulties Introduced by Virtualization
Service, Performance, and Availability Monitoring
Service Level Management
Capacity Planning
Managing Cost Accounting and License Compliance
Incident Resolution
Vendor Support
Solving Virtualization Management Problems with ITIL Service Delivery
Misconceptions of Virtualization Management
The Basics of ITIL Service Delivery
Availability Management
Capacity Management
Financial Management for IT Services
Service Continuity Management
Service Level Management
Managing Virtual Environments with BMC Software
EMA’s Perspective
Executive Summary
Virtualization, a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources, is changing the way IT delivers business benefits. In particular, server virtualization (the ability to run multiple independent operating environments on a single server or system) has been a mature technology for many years in mainframe and midrange systems, and is now a viable, production-ready technology for commodity x86-based PC and server platforms as well.
EMA’s research shows that most organizations are deploying virtualization, much of it in production systems. Virtualization is capable of delivering significant business benefits such as improved business continuity, faster software development, increased business agility, less downtime, and significant cost savings.
However, EMA’s research also highlights several new management difficulties that may inhibit the benefits of virtualization. For example:
• Application performance monitoring and availability management become much more complex, as virtualization makes it harder to map virtual resources to physical systems and to provide performance metrics that account for the differences and overlaps between the environments.
• It is harder to accurately detect, measure, and plan capacity, because virtual services are so rapidly deployed, consumed, and destroyed, and because resource sharing adds potential for resource constraints. It is therefore harder to ensure that business service is not disrupted by capacity constraints and performance bottlenecks.
• Cost accounting and financial management become much more intricate and time-consuming, as tools must measure rapidly changing virtual environments. License management and compliance are much harder to measure and enforce as software is rapidly deployed into virtual environments, potentially without procurement or authorization.
• Ensuring service levels becomes harder, because the new virtualization layer increases the potential for errors and chokepoints and reduces visibility into the application stack. This makes root cause analysis more complex and delays diagnosis and recovery when business service is disrupted.
The solution to these problems is to apply mature, proven, industry-standard best practices to virtual systems in the same way that they are applied to physical systems, using specific tools that are built to accommodate the unique difficulties exposed by virtualization, integrated with traditional physical management systems.
One such best practice is the IT Infrastructure Library (ITIL), which has proven its effectiveness in assuring IT Service Delivery over many years, in mainframe, midrange, and x86 server environments. It provides proven processes and procedures that easily translate to virtual environments, including Service Level Management, Capacity Management, Service Continuity Management, Availability Management, and Financial Management for IT Services. ITIL is certainly a best practice discipline for managing any environment, physical or virtual. Unfortunately, while many software vendors that help to deliver ITIL-based management solutions claim to support virtualization, much of that support is confined solely to marketing materials, with no specific enhancements to their software. Such solutions will not deliver the holistic management required to overcome the management difficulties imposed by virtualization.
BMC Software is one of the few exceptions to this situation. BMC Software has significant experience and maturity in delivering solutions based on ITIL best practices. BMC Software’s solutions have been proven in mainframe and midrange virtualization environments for many years. BMC Software has now enhanced these solutions with specific capabilities for the latest x86 virtualization environments. Solutions such as BMC Performance Manager, BMC Performance Assurance, BMC Discovery, and BMC Atrium CMDB have been extended with specific capabilities designed to manage virtual environments, based on proven technology built for physical environments, integrated into holistic, enterprise-wide solutions for best practice operations management of system virtualization.
Best Practice Operations Management
for System Virtualization
©2007 Enterprise Management Associates, Inc. All Rights Reserved.
With these solutions, EMA believes that BMC Software is helping enterprises apply mature technology and best practice discipline to virtual and physical environments, to achieve both IT and business benefits across the enterprise. This is a goal and a strategy that EMA endorses for any organization.
Introduction
Virtualization, which Enterprise Management Associates (EMA) defines as a technique for hiding or abstracting the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources, is being rapidly adopted across IT. Examples of virtualization include making a single physical resource (such as a server, an operating system, an application, or network or storage devices) appear to function as multiple logical resources, or making multiple physical resources (such as storage devices or servers) appear as a single logical resource.
Because virtualization fundamentally changes the architecture and processing structure of information technology, it also brings with it some significant management difficulties. There is no way to ignore these problems, and it is inefficient and potentially harmful to deal with virtualization separately from existing infrastructure. This poses a fundamental question that CIOs, IT Directors, Operations Managers, System Architects, and Capacity Planners need to address: How should I manage these new virtual systems as a part of my complete infrastructure, to make sure I can deliver and maintain high value IT service to my business users?
This EMA white paper addresses this question with key analysis and actionable recommendations, focusing on established and proven processes to help IT organizations achieve best practice operations management for system virtualization. It explores some of the specific architectural and operational differences between virtual and physical environments, highlights some of the new and specific management difficulties these differences create, and explains how IT organizations should apply the same rigor and discipline they require in their physical environments.
Finally, it describes how software solutions from BMC Software are addressing these problems, integrating the management of physical and virtual environments, and providing best practice solutions to these difficult management issues.
Managing Service Delivery in Virtual Environments
A Short Background of Virtualization
Virtualization has long been a core capability of both mainframes (since the 1960s) and midrange platforms (since 1999); however, it is now changing the face of the x86 platform, as vendors like VMware, Microsoft, and SWsoft have brought virtualization to production quality in commodity x86-based PCs and servers. The most significant types of virtualization in the x86 space are Operating System virtualization (where multiple “guest” operating systems run on top of a fully functioning “host” operating system) and Server Virtualization (where multiple guests run in one system, without a host operating system).
These environments add significant architectural complexity with virtual machine managers (VMMs) and dedicated low-level interface software (hypervisors). Virtual systems allocate and use resources differently: more dynamically and less transparently. They have different operational and performance profiles, and much more complex configurations. Monitoring must operate at multiple virtual and physical levels, and without physical boundaries, management discipline becomes much harder to enforce.
Nevertheless, EMA’s research shows that the adoption of virtualization is growing at around 25% per year, and it is being adopted by over 95% of enterprises, increasingly in production applications, because it can deliver benefits including:
• High availability, business continuity, and disaster recovery advantages, through faster and easier relocation and recovery
• Faster software development from decoupling IT processes like test and development from physical hardware
• Improved business agility and flexibility by allowing much faster IT response to rapid changes in business demand
• Reduced downtime caused by failures and malicious penetration (such as a virus infection or other malware) by improving response and restore capabilities
• Significantly reduced costs, through improved hardware utilization, consolidation, power reduction, and space savings
Management Difficulties Introduced by Virtualization
However, virtualization also brings additional management challenges, especially from an operational perspective, which traditional tools simply do not address.
Service, Performance, and Availability Monitoring
Service, performance, and availability monitoring becomes much more complex because the virtual environment introduces a more complex architecture, with many more and different components to be measured, monitored, and managed.
IT must continue to monitor the physical environment (network traffic, CPU performance, memory usage, I/O rates, etc.) to ensure consistent performance and availability. In addition, IT must monitor the same characteristics in the virtual environment, plus the VMM, hypervisor, and other virtualization-specific components. Virtual environments are also highly dynamic, so measurement must happen in real time, and must adapt to real-time architecture changes.
Traditional physical-based monitoring tools cannot adequately monitor these virtualized environments. If IT is using the same old tools, problems can go undetected, and business users can suffer from service outages, slow performance, and critical downtime.
Service Level Management
Similarly, measuring network traffic, transaction activity, and response times is very different in a virtual world. Achieving service levels for response time and availability of a physical system does not ensure IT is achieving the service levels of dependent virtual systems.
For example, a physical server and host OS may be running fine, while in the virtual guest, a critical problem may have affected the operating system or the application. Network traffic may be flowing freely from the physical network interface, while inside the virtualization layer the virtual network interface may have completely failed. Alternatively, the host OS may be operating to service standards, yet the VMM is failing to allocate resources properly to one or more guests, causing severe bottlenecks and slowdowns. Standard physical service level measurements will not consider these major differences, and will probably yield entirely spurious measurements for service level reporting.
Capacity Planning
Capacity measurement and capacity planning are also more difficult. It is insufficient to measure CPU, memory, bandwidth, etc. separately in virtual and physical environments. IT must have a holistic view of capacity to ensure:
• Physical systems can support the required virtual images (including the additional resource demands of the VMM or hypervisors)
• Virtual images have sufficient capacity, headroom, and workload balancing to support application demands and user load
IT needs integrated analytics for both physical and virtual environments, with access to detailed data at both levels. This will allow proper capacity measurement and planning, to avoid both under-provisioning (and the associated response problems) and over-provisioning (which increases infrastructure costs and negates many virtualization benefits).
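As a rough illustration of the holistic check described above, the following Python sketch tests whether a physical host can carry a set of virtual images once hypervisor overhead and a headroom reserve are taken into account. The `Resources` type, the overhead and headroom fractions, and all figures are hypothetical values chosen for the example; they are not drawn from EMA's research or from any BMC product.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """CPU and memory, either offered by a host or demanded by a guest."""
    cpu_ghz: float
    memory_gb: float


def host_can_support(host, guests, overhead_fraction=0.10, headroom_fraction=0.20):
    """Return True if the host covers total guest demand.

    overhead_fraction: share of host capacity assumed to be consumed by
        the VMM or hypervisor (hypothetical default).
    headroom_fraction: share of host capacity held back for demand spikes
        (hypothetical default).
    """
    usable = 1.0 - overhead_fraction - headroom_fraction
    cpu_demand = sum(g.cpu_ghz for g in guests)
    memory_demand = sum(g.memory_gb for g in guests)
    return (cpu_demand <= host.cpu_ghz * usable
            and memory_demand <= host.memory_gb * usable)


host = Resources(cpu_ghz=24.0, memory_gb=64.0)
guests = [Resources(4.0, 8.0), Resources(6.0, 16.0), Resources(5.0, 12.0)]

# Three guests fit within the usable 70% of the host.
fits = host_can_support(host, guests)

# A fourth guest pushes CPU demand past the usable capacity.
fits_with_extra = host_can_support(host, guests + [Resources(4.0, 8.0)])
```

A real capacity tool would work from measured utilization over time rather than static allocations; the point here is only that physical capacity, virtualization overhead, and guest demand have to be evaluated together.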
Managing Cost Accounting and License Compliance
Lack of visibility and fast-changing resource allocation make IT cost management much harder in virtual environments. IT needs to be able to see and measure physical assets, as well as highly dynamic virtual systems, storage, and resource allocations. It becomes much more difficult to allocate real costs (hardware, power, storage, space, etc.) to virtual applications. Proliferation of virtual images can lead to unauthorized software deployment, risking severe financial penalties, additional license costs, or both. Underutilization of physical and virtual systems can lead to unnecessary hardware and software costs. IT needs to manage costs in the virtual environment with the same level of discipline as in the physical environment, with tools that can work with both physical and virtual resources, to provide visibility and measurement of rapidly changing environments.
Incident Resolution
Virtualization can significantly damage service levels, because problem diagnosis and recovery become more complex when IT must deal with another layer in the stack of possible problem causes. Tools and processes must track problems not just to a physical system, but also to virtual systems and applications. IT must trace problems with multi-tier transactions not just through the physical infrastructure, but also through virtual systems that may be subject to rapidly changing configurations. Business users may not be aware that they are running a virtual application, so IT must be able to detect and record these complex relationships to rapidly diagnose and resolve application problems. Existing root cause, discovery, inventory, and application mapping tools are not able to handle the added complexity and speed of change that is specific to virtual environments.
Vendor Support
Many management software vendors claim to manage virtual environments, but many are simply exploiting a hot topic with marketing statements. Unless vendors have engineered specific technical enhancements or add-ons to support virtualization, they will not be able to handle the architectural differences properly. Similarly, virtualization platform vendors may provide specific and comprehensive management tools, but these will rarely manage the physical environment as well, and therefore cannot manage the whole environment.
It is critical that IT can manage not just the physical, and not just the virtual, but can manage equally across both environments. This is the only way to provide the holistic oversight necessary to manage these new virtual systems as a part of the complete infrastructure, to make sure IT can deliver and maintain high value IT service to its business users.
Solving Virtualization Management Problems with ITIL Service Delivery
Misconceptions of Virtualization Management
EMA’s research into virtualization reveals some significant misconceptions about management of virtual environments. Most enterprises expect management of virtual environments to be easier than, or at worst the same as, management of physical environments in many important categories. However, most enterprises also acknowledge that they do not have the skills or tools for adequate management, and in-depth interviews uncovered many management problems in real enterprise deployments. EMA therefore recommends that enterprises deploy automated policy-based management tools and best practice disciplines as soon as possible, preferably alongside deployment of the virtualization systems themselves.
The Basics of ITIL Service Delivery
The IT Infrastructure Library (ITIL) is a set of IT best practices that outlines a standards-based approach to address these and many other problems. Its focus is ensuring IT provides valuable and consistent service to business users through two major areas: Service Delivery and Service Support.
ITIL Service Delivery specifically focuses on operational best practices including:
• Availability Management – ensuring that IT services are available to business users through requirement gathering, reliability assessment, contingency planning, change management, etc.
• Capacity Management – measuring, optimizing, and planning for performance, utilization, and changes in demand for resources including CPU, memory, bandwidth, storage, etc.
• Financial Management for IT Services – measuring and managing IT assets and services to maximize service value and minimize costs
• Service Continuity Management – planning for continued IT service through analysis, risk mitigation, disaster recovery planning, failover, and high availability
Availability Management
In a virtual environment, availability management requires a much greater understanding of the status and interrelationship of both virtual and physical environments. Determining and ensuring a physical system is available will not ensure availability of the virtual systems it supports; planning downtime for a virtual system without taking into account the availability requirements of the physical environment will similarly damage availability. However, assigning availability requirements to, and subsequently monitoring and planning for, both physical and virtual systems will achieve service availability goals. Change planning, measurement of availability, reliability assessment, and contingency planning should holistically accommodate both physical and virtual environments.
Capacity Management
It is important to apply both real-time capacity management and predictive planning discipline to both physical and virtual systems. IT should use management tools that concurrently measure both physical and virtual resource utilization, with specific visibility into individual application workloads; that can optimize real-time allocation of resources across both virtual systems and parallel physical applications; and that can perform analysis and enhance planning for performance, utilization, and changes in demand for both virtual and physical resources.
Financial Management for IT Services
IT must provide accurate cost accounting for its entire physical and virtual environment, whether or not it is engaged in real chargeback, as this provides the main way to measure the cost of IT and ensure it provides value to business users. IT must apply standards, tools, and processes to manage both physical and virtual IT assets, including assigning nominal costs to, and measuring usage of, virtual resources just as it would for physical resources. This also involves accurate tracking of virtual environments for unauthorized change such as software deployment, to accurately measure and manage license compliance and usage costs.
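One simple way to assign nominal costs to virtual resources is to split a physical host's cost across its virtual machines in proportion to measured usage. The sketch below assumes CPU-hours as the usage metric and uses hypothetical VM names and figures; a real chargeback model would usually weigh several resources (memory, storage, power, space).

```python
def allocate_host_cost(monthly_host_cost, usage_by_vm):
    """Split a host's monthly cost across VMs in proportion to measured usage.

    usage_by_vm maps a VM name to its usage for the month (here, CPU-hours).
    If nothing was used, every VM is assigned a cost of zero.
    """
    total_usage = sum(usage_by_vm.values())
    if total_usage == 0:
        return {vm: 0.0 for vm in usage_by_vm}
    return {vm: monthly_host_cost * used / total_usage
            for vm, used in usage_by_vm.items()}


# Hypothetical month of CPU-hours on one physical host costing 1000 per month.
usage = {"crm-vm": 300.0, "web-vm": 150.0, "test-vm": 50.0}
costs = allocate_host_cost(1000.0, usage)
```

With these figures the three virtual machines are charged 600, 300, and 100 respectively, so the full cost of the host is accounted for.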
Service Continuity Management
IT needs to have documented processes to manage continuity for both physical and virtual environments. This should include careful planning, testing, and documentation of recovery procedures, and allowances for reprovisioning in case of a partial failure. Procedures should account for all permutations, such as:
• Restoring or reprovisioning one or more failing virtual systems
• Recovering multiple virtual guests from a failing physical host
• Co-locating previously separate environments as virtual guests in a single physical host
• Shifting an overloaded application into a new virtual machine to improve performance
These plans should account specifically for the available hardware, required application load, and potential impact to business service of any break in service continuity.
Service Level Management
IT must determine and measure response times, downtime, availability, etc. for both physical and virtual systems, ideally based on new SLAs that accommodate the unique differences of virtualization. Metrics should apply to both physical and virtual resources, and take into account potential bottlenecks such as disk I/O, network bandwidth, and processor and memory availability, to deliver realistic business goals with a different physical/virtual resource mix. SLAs should also clearly prioritize between different virtual applications on a single physical machine, to enable IT to resolve potential resource conflicts, and feed into policy-based measurement and management of resources. SLA reporting must map physical and virtual infrastructure to business services to reflect the true availability to business users.
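To see why SLA reporting has to cover every layer, consider a minimal Python sketch in which a business service counts as available only when the physical host, the VMM, and the guest are all up in the same monitoring sample. The layer names and the sample data are hypothetical.

```python
def availability(samples):
    """Fraction of monitoring samples in which every layer was up."""
    if not samples:
        raise ValueError("at least one sample is required")
    fully_up = sum(1 for sample in samples if all(sample.values()))
    return fully_up / len(samples)


# Four hypothetical monitoring samples for one virtualized service.
samples = [
    {"host": True, "vmm": True, "guest": True},
    {"host": True, "vmm": True, "guest": False},   # guest OS or application fault
    {"host": True, "vmm": False, "guest": True},   # virtualization layer fault
    {"host": True, "vmm": True, "guest": True},
]

# Availability as a physical-only measurement would report it.
host_only = sum(sample["host"] for sample in samples) / len(samples)

# Availability as the business user experiences it.
service = availability(samples)
```

Here the physical host is up in every sample, yet the service is fully available in only half of them, which is the kind of gap a purely physical service level measurement would miss.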
Managing Virtual Environments with BMC Software
BMC Software has a long history of providing IT management solutions based on these ITIL best practices in both mainframe and midrange environments, with special focus on the key best practices around ITIL’s Service Support and Service Delivery recommendations. BMC Software has a long-standing maturity in managing large enterprise systems, including mainframe and large midrange environments. The best practices that underpin the ITIL disciplines come substantially from codifying the best practices that have existed in the mainframe world for 20 years or more. Disciplines around change management, capacity planning, chargeback and cost accounting, service level management, and more have been core to mainframe operations for many years, and it is this discipline that BMC Software has provided in these mission-critical platforms for over 25 years.
BMC Software also has substantial experience managing enterprise-class virtualization environments, with a very mature toolset for capacity planning, performance measurement, service level management, cost accounting, and continuity planning for virtual environments including z/VM, z/OS, Solaris, HP-UX, AIX, Linux, and OS/400. BMC Software has now extended these solutions to provide integrated best practice management solutions for the latest wave of x86 virtualization environments.
For example, BMC Software provides BMC Atrium CMDB, a Configuration Management Database (CMDB) that is among the leaders in this technology. This is the centerpiece of an ITIL implementation, as it provides a central location and a “single source of truth” for information about how IT systems, applications, software, and services across the enterprise are deployed, connected, and configured.
BMC Software can also accurately and automatically populate and maintain the contents of the CMDB through BMC Discovery. This will scan the IT environment and populate the CMDB with updated information about hardware and software deployment throughout the enterprise, including both physical and virtual environments. An automated discovery solution such as BMC Discovery is essential to maintain the ongoing accuracy of the CMDB.
The CMDB provides the documentation, awareness, and visibility into the environment that is critical for ITIL Service Delivery. For example, it allows IT to maintain an accurate and up-to-date picture of what applications are running in any given virtual environment, and what virtual environments are running on any given physical system. This is critically important to the ITIL discipline of Financial Management for IT Services, for example, as it provides accurate information for asset inventories, software usage, license reporting, system accounting, and chargeback, and allows IT to efficiently reuse, redeploy, and/or virtualize underutilized hardware and software.
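The relationships described above (applications running in virtual environments, virtual environments running on physical systems) can be pictured as two simple mappings. The following Python sketch is purely illustrative: the dictionaries and names are hypothetical and do not represent the BMC Atrium CMDB data model or API.

```python
# Hypothetical configuration records: which VM runs on which physical host,
# and which application runs in which VM.
vm_to_host = {"vm-a": "host-1", "vm-b": "host-1", "vm-c": "host-2"}
app_to_vm = {"billing": "vm-a", "crm": "vm-b", "intranet": "vm-c"}


def apps_on_host(host):
    """List the applications that depend on a given physical host."""
    return sorted(app for app, vm in app_to_vm.items()
                  if vm_to_host.get(vm) == host)


def host_for_app(app):
    """Trace an application through its VM to the physical host beneath it."""
    return vm_to_host[app_to_vm[app]]


affected = apps_on_host("host-1")
billing_host = host_for_app("billing")
```

Lookups like these are what allow an outage on one physical system to be translated into the list of business applications it affects, and a license or cost report to be traced back to real hardware.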
ITIL also defines a Capacity Management Database (or CDB) that stores historical performance, service, utilization, and capacity information. This serves as a foundation for ITIL processes such as capacity planning and availability measurement. The CDB exists as part of a federated CMDB ecosystem, and because the two share a common index of configuration items, an accurate CMDB is important to ensure usability of the CDB. Otherwise, the CDB may end up wastefully storing utilization data for configuration items that no longer exist, or not making capacity data available for configuration items that do exist. An accurate CMDB with a federated CDB helps to avoid these problems, by ensuring information in the CDB can be used in the context of other management tasks, such as Availability or Service Level Management. This in turn allows IT to ensure resource availability is sufficient to achieve service goals, thereby delivering consistent service to business users, a key objective of Service Level Management.
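The two failure modes mentioned above can be detected by comparing the configuration item identifiers held in each store. This is a minimal sketch assuming both stores expose a plain list of identifiers; the function name, keys, and identifiers are hypothetical.

```python
def reconcile(cmdb_ids, cdb_ids):
    """Compare configuration item identifiers held in the CMDB and the CDB.

    Returns the CDB entries with no matching configuration item (stale
    utilization data) and the configuration items with no capacity data.
    """
    cmdb, cdb = set(cmdb_ids), set(cdb_ids)
    return {
        "orphaned_in_cdb": sorted(cdb - cmdb),
        "missing_capacity_data": sorted(cmdb - cdb),
    }


# vm-old was destroyed but its history remains; vm-new has no data yet.
report = reconcile(
    cmdb_ids=["host-1", "vm-a", "vm-new"],
    cdb_ids=["host-1", "vm-a", "vm-old"],
)
```

In a fast-changing virtual environment, a check of this kind would need to run frequently, since virtual machines can be created and destroyed between discovery scans.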
Capacity Management involves more than just configuration information. It also requires in-depth monitoring, analysis, and prediction to understand how current and future workload will fit into existing capacity, and what new capacity may be needed to meet service objectives. This is where a solution like BMC Performance Assurance provides additional capabilities to help achieve ITIL objectives. BMC Performance Assurance is a suite of solutions (including specific support for both physical and virtual servers, across mainframe, midrange, and x86 platforms) that provides analytics and predictive modeling of system and application performance. IT can analyze current performance, detect potential conflict and resource contention before it happens, and adjust capacity requirements, including predicting and migrating load to and from virtual servers. This ability to measure, analyze, and report on both physical and virtual resource utilization, and use that information to predict and plan for resource allocation across these environments, is an important tool to enable IT to meet Capacity Management objectives.
The ability to deliver Service Level Management and Availability Management is built on a foundation of Capacity Management, but it is consistently maintained through strong real-time monitoring and control. The BMC Performance Manager suite of products is one of the most mature solutions available for real-time monitoring and management of system and application availability and performance. The BMC Transaction Management family of products takes care of tracking and controlling transaction performance, Quality of Service (QoS), response times, and availability, and provides abilities for rapid problem isolation and remediation. Finally, BMC Service Level Management collects and analyzes key performance data as it relates to defined service level agreements, providing insight into (and allowing IT to prevent) potential service level breaches. All three product families are tightly integrated to provide a holistic Service Level Management capability.
Again, BMC Software’s expertise in mainframe, midrange, and physical x86 servers is now extended into virtualization management, with specific capabilities within its products for managing virtual environments. It is able to map virtualized resources to physical resources, and provide a business context for the virtual environments that delivers key advantages for Availability Management. Perhaps more importantly, it is able to integrate management of physical and virtual environments to deliver a holistic management platform for Service Level and Availability Management across the entire enterprise.
Through solutions such as BMC Performance Manager, BMC Performance Assurance, BMC Discovery, and BMC Atrium, BMC Software is bringing its mature mainframe, midrange, and x86 server management processes, based in many years of ITIL discipline, and experience managing the most mature virtualization environments, to the management of the latest x86 virtualization platforms such as VMware. More importantly, it is providing specific toolsets that are designed to manage virtual environments, based on proven technology built for physical environments, and integrating them into holistic, enterprise-wide solutions for best practice operations management for system virtualization.
EMA’s Perspective
Enterprise Management Associates (EMA) believes that the disciplines and best practices outlined in the IT Infrastructure Library help IT to deliver significant business benefits, including competitive advantages, improved agility in fast-moving markets, better customer service, reduced cost, and improved productivity, not only of IT, but also for business services in general. ITIL best practices have proven their effectiveness across the enterprise for many years, in many environments, and because ITIL describes generally applicable processes and procedures, these best practices are logically adapted to new environments. EMA believes that applying ITIL disciplines to virtualization will help IT to deliver significant business benefits.
However, few management software or virtualization platform vendors have delivered more than lip service to the unique requirements of implementing ITIL in a virtual environment. They lack integrated management for both physical and virtual environments, and are not providing the enterprise-wide capabilities required for real ITIL discipline.
... environments to enable ITIL-based best practice systems management across the enterprise, including mainframe, midrange, and x86 systems, whether physical or virtual. With products like BMC Performance Manager, BMC Performance Assurance, BMC Discovery, and BMC Atrium, enterprises are able to apply proven technology and best practice ITIL discipline to virtual and physical environments across the enterprise.
EMA believes that BMC Software will therefore help IT organizations involved in a virtualization strategy to deliver and maintain high value IT service to their business users, by providing the holistic oversight necessary to manage both virtual and physical systems as a part of the complete infrastructure.
About BMC Software
Enterprise Management Associates, Inc. All opinions and estimates herein constitute our judgement as of this date and are subject to change without notice. Product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
©2007 Enterprise Management Associates, Inc. All Rights Reserved. Corporate Headquarters:
2585 Central Avenue, Suite 100 Boulder, CO 80301