ITIL Capacity Management:
Is it really ‘Best Practice’
or is there room for improvement?
Andy Bolton
Capacitas Ltd.
Agenda
• Defining ‘Best Practice’
• ITIL Overview & ITIL Capacity Management
• ITIL Capacity Management sub-processes
• ITIL Capacity Management activities
• ITIL Capacity Management process interfaces
• How does it fit together?
• What is good, what is missing and what could be done better?
• Conclusions
‘Best Practice’: A definition
To assess whether ITIL Capacity Management is really ‘best practice’ we need a working definition.
For this presentation we will adopt the following as a general definition of ‘best practice’, based on the relevant Wikipedia entry:
‘Best Practice’ determines the most broadly effective and efficient means for organising a system or performing a function.
Note: The downside to this, or any, definition of ‘best practice’ is that it assumes there is only one way to organise a system or perform a function that is broadly effective and efficient in all circumstances.
Application Lifecycle & Capacity Management
Lifecycle phases: ‘Concept’, ‘Development’, ‘Live’, ‘Phase-out’
Capacity management involvement at each lifecycle stage:
• Feasibility: Need to produce an approximate cost of system to meet specified performance
• Requirements: Review if requirements are achievable within budget
• Design: Review design for performance problems, costs and scalability
• Coding: Provide design guidance to avoid performance anti-patterns
• Testing: Review performance testing results for any problems
• Roll-out: Provide capacity assurance assessment that transient capacity requirements can be met
• Live: Business-as-usual capacity management, ensuring systems can meet demands upon them based on current and future work
• Changes: Assess all software release changes to the application to ensure they will not affect the system
• End-of-life: Plan for de-commissioning requirements including any transient capacity required for migration to new platform
A Brief ITIL Overview
“ITIL is the only consistent and comprehensive documentation of best practice for IT Service Management. Used by many hundreds of organisations around the world, a whole ITIL philosophy has grown up around the guidance contained within the ITIL books and the supporting professional qualification scheme.
ITIL consists of a series of books giving guidance on the provision of quality IT services, and on the accommodation and environmental facilities needed to support IT. ITIL has been developed in recognition of organisations' growing dependency on IT and embodies best practices for IT Service Management.”
What comprises ‘ITIL Capacity Management’?
ITIL 1: The original CCTA publication of ITIL Capacity Management in 1991 (Brian Johnson) consisted of a 92-page book. The original publication set is now known colloquially as ‘ITIL 1’.
ITIL 2: Around 2000 the OGC (a successor to CCTA) released a new set of ITIL books (known as ‘ITIL 2’) with the ten service management processes divided into ‘Service Support’ and ‘Service Delivery’. Capacity Management is one of the five subjects within the 300-page Service Delivery book, but itself consists of only 39 pages plus 5 pages of appendixes.
ITIL 3: Currently in progress involving the OGC, itSMF and other industry bodies.
Please note that this presentation is based only on ITIL 2, the current OGC release, and will only focus on the Capacity Management process.
What is ‘ITIL Capacity Management’?
Broken down into three separate tiered sub-processes:
• Business Capacity Management (IT Capacity Management assisting business decision-making)
• Service Capacity Management (focus on the end-to-end capacity requirements of each service)
• Resource Capacity Management (focus on individual systems’ capacity requirements)
This provides a sensible method for partitioning different activities by their primary goals, customers and deliverables.
Contains the following discrete components, most of which are termed activities:
• Iterative activities
• Storage of Capacity Management data
• Demand Management
ITIL Capacity Management Overview
Sub-processes: Business Capacity Management (BCM), Service Capacity Management (SCM), Resource Capacity Management (RCM)
Activities, covering all aspects of BCM, SCM and RCM:
• Production of the Capacity Plan
• Iterative Activities
• Demand Management
• Modelling
• Application Sizing
• Storage of Capacity Management Data (CDB)
Business Capacity Management
“A prime objective of the Business Capacity Management sub-process is to ensure that the future business requirements for IT Services are considered and understood, and that sufficient Capacity to support the services is planned and implemented in an appropriate timescale”
This is the most confused and least well-described of the tiers of capacity management; sadly a great opportunity missed!
ITIL appears to be at a loss, or at least confused, as to what Business Capacity Management (BCM) is. It appears to replicate details about Service Level Requirements (SLRs) and Service Level Agreements (SLAs) that are already within Service Capacity Management (SCM). I believe these service-focussed activities should remain wholly within SCM. Business Capacity Management should be about planning at the business level, driven by business volumetrics, rather than at any service or resource level. This will be covered later on in this presentation.
Business Capacity Management
[Figure (Crown Copyright 2001): the Business Capacity Management flow, showing the following steps:
• New requirements
• Identify and agree SLRs
• Design, procure, amend configuration
• Agree Budget
• Update CMDB / …
• Negotiation and verify SLA
• Sign SLA
• Implement under Change Management
• Operational system complies with SLA
• Resolve Capacity related Incidents & Problems]
Service Capacity Management
“A prime objective of the Service Capacity Management sub-process is to identify and understand the IT Services, their use of resource, working patterns, peaks and troughs, and to ensure that the services can and do meet their SLA targets, i.e. to ensure that the IT Services perform as required. In this sub-process, the focus is on managing service performance, as determined by the targets contained in the SLAs or SLRs”
Service Capacity Management (SCM) is focussed on the IT services provided and used, irrespective of the underlying platforms they use, and so is interested only in the performance and capacity aspects of each service. However, ITIL suggests that SCM only comes into play “once the service becomes operational”. This stems from its confusion over what Business Capacity Management should really be about: it places the pre-live aspects of Service Capacity Management in that sub-process instead. I believe this is wrong. Service Capacity Management should cover all aspects of the ‘IT Service’ throughout its lifecycle, including pre-live.
Resource Capacity Management
“A prime objective of Resource Capacity Management is to identify and understand the Capacity and utilisation of each of the component parts in the IT Infrastructure. This ensures the optimum use of the current hardware and software resources in order to achieve and maintain the agreed service levels. All hardware components and many software components have a finite capacity, which, when exceeded, has the potential to cause performance problems.”
Resource Capacity Management is focussed on reviewing individual components of the IT infrastructure, usually at a platform level, such as Solaris, Windows, z/OS, etc. It concerns resources such as processors, memory, disk and network and so recognises the need to collect resource utilisation information on a regular (‘iterative’) basis. It recommends that “monitors should be installed on the individual hardware and software components … configured to collect the necessary data”. As this has traditionally been the most common form of Capacity Management, it is surprising to find it covered in four paragraphs. On the positive side, RCM does then go on to also cover:
• the necessity for capacity managers to understand and recommend the benefits of new technology
Iterative Activities
ITIL groups many of the business-as-usual activities together as they “need to be carried out iteratively and form a natural cycle”; it calls these the “iterative activities” as shown in the diagram on the following page.
The Monitoring activity focuses on monitoring the utilisation of resources and services; typical data includes CPU utilisation, transactions per second, transaction response time and queue lengths.
The Analysis activity should “identify trends from which the normal utilisation and service level, or baseline, can be established”.
The Tuning activity is where areas of the configuration identified in the Analysis activity “could be tuned to better utilise the system resource or improve the performance of a particular service”.
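As an illustration only, the Monitoring and Analysis steps can be sketched in a few lines of Python. The sample values, the 80% threshold and the function names are assumptions made for this sketch; ITIL does not prescribe any of them.

```python
from statistics import mean

# Hypothetical CPU utilisation samples (percent) for one server, oldest first.
samples = [42.0, 45.0, 44.0, 47.0, 83.0, 46.0, 91.0, 48.0]

# Assumed resource utilisation threshold; a real value would be agreed per component.
CPU_THRESHOLD = 80.0


def baseline(values, threshold):
    """Analysis: average of the samples at or below the threshold, as a simple 'normal' level."""
    normal = [v for v in values if v <= threshold]
    return mean(normal) if normal else None


def exceptions(values, threshold):
    """Monitoring: return (index, value) pairs where the threshold is exceeded."""
    return [(i, v) for i, v in enumerate(values) if v > threshold]


cpu_baseline = baseline(samples, CPU_THRESHOLD)
cpu_exceptions = exceptions(samples, CPU_THRESHOLD)

print(f"baseline: {cpu_baseline:.1f}%")
for index, value in cpu_exceptions:
    print(f"exception: sample {index} at {value:.1f}% exceeds {CPU_THRESHOLD:.0f}%")
```

The exception list is the raw material for a resource utilisation exception report; the baseline gives the Tuning activity a reference point against which to judge any change.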
ITIL Capacity Management:– ‘iterative activities’
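[Figure: the iterative activities form a cycle of Monitoring, Analysis, Tuning and Implementation around the Capacity Management Database (CDB). SLM thresholds and resource utilisation thresholds feed the cycle, which produces SLM exception reports and resource utilisation exception reports.]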
Capacity Management Database (CDB)
“The Capacity Management Database (CDB) is the cornerstone of a successful Capacity Management process. Data in the CDB is stored and used by all the sub-processes of Capacity Management because it is a repository that that [sic] holds a number of different types of data viz. business, service, technical, financial and utilisation data. However the CDB is unlikely to be a single database and probably exists in several physical locations.”
The CDB is the central repository for all capacity management reporting and as such should contain (for all platforms, services and businesses):
• Business data
• Service data
• Technical data
• Financial data
• Utilisation data
Capacity Management Database (CDB)
ITIL Capacity Management specifies the following as inputs of the CDB:
• For Business Data this includes:
  • Number of accounts and products supported
  • Seasonal variations of anticipated workloads
• For Service Data this includes:
  • Response times
  • SLM thresholds
• For Technical Data this includes:
  • Resource utilisation limitations, e.g. 40% utilisation for a shared Ethernet segment
• For Financial Data this includes:
  • Financial plans
  • IT budgets
• For Utilisation Data this includes:
  • CPU utilisation for servers
Capacity Management Database (CDB)
ITIL Capacity Management specifies the following as outputs of the CDB:
• Service and Component Based Reports:
  “reports must be produced to illustrate how the service and its constituent components are performing and how much of its maximum Capacity is being used.”
• Exception Reporting:
  “Reports that show … when the Capacity and performance of a particular component or service becomes unacceptable are also a required output …”
• Capacity Forecasts:
  “… the Capacity Management process must predict future growth. To do this, future component and service Capacity must be forecast. A simple example of a Capacity forecast is a correlation between a business driver and a component utilisation, e.g. CPU utilisation against the number of accounts supported by the company.”
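The “simple example” in the quote, CPU utilisation against the number of accounts, can be sketched as a straight-line fit. The history figures and function names below are invented for illustration, and a linear relationship is itself an assumption that would need checking against real data.

```python
# Hypothetical history: (number of accounts, peak CPU utilisation in percent).
history = [(10_000, 22.0), (20_000, 34.0), (30_000, 46.0), (40_000, 58.0)]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(history)


def forecast_cpu(accounts):
    """Predicted CPU utilisation (percent) for a given number of accounts."""
    return intercept + slope * accounts


def accounts_at(cpu_limit):
    """Number of accounts at which the fitted line reaches cpu_limit percent."""
    return (cpu_limit - intercept) / slope


print(f"forecast at 50,000 accounts: {forecast_cpu(50_000):.1f}% CPU")
print(f"80% CPU reached at about {accounts_at(80.0):,.0f} accounts")
```

The second function is the more useful one for planning: it turns a resource threshold back into the business driver the forecast is expressed in.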
Demand Management
“The prime objective of Demand Management is to influence the demand for computing resource and the use of that resource.”
This is initially a really strong inclusion in ITIL Capacity Management, as too many capacity professionals only concentrate on controlling supply, forgetting that demand is the other side of the equation.
ITIL does recognise the difficulty in operating Demand Management as it could cause “damage to the business Customers or to the reputation of the IT organisation”, but does not seem to acknowledge the necessity for workload characterisation to undertake it accurately. It covers this important topic in only seven paragraphs, covering less than one page!
Modelling
“A prime objective of Capacity Management is to predict the behaviour of IT Services under a given volume and variety of work. Modelling is an activity that can be used to beneficial effect in any of the sub-processes of Capacity Management.”
Modelling, according to ITIL Capacity Management, only offers the following options:
• Trend Analysis
• Analytical Modelling
• Simulation Modelling
• Baseline Models
However, ITIL barely distinguishes where each of these techniques should be used; it appears to simply offer them as a ‘toolkit’ of available modelling methods. Although recognised as an underlying support activity to the overall process it is documented in only ten paragraphs.
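To show how one of these options differs from simple trending, here is a minimal analytical-modelling sketch. It uses the textbook single-server (M/M/1) queueing formula, which is a choice made for this example and not something ITIL specifies; the service time and arrival rates are hypothetical.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue: R = S / (1 - U), where U = arrival_rate * S."""
    utilisation = arrival_rate * service_time
    if utilisation >= 1.0:
        raise ValueError("utilisation must be below 100% for a stable queue")
    return service_time / (1.0 - utilisation)


# Assumed mean service time of 50 ms per transaction on a single resource.
SERVICE_TIME = 0.05

for rate in (5, 10, 15, 18):  # transactions per second
    utilisation = rate * SERVICE_TIME
    response = mm1_response_time(rate, SERVICE_TIME)
    print(f"{rate:>2} tps: utilisation {utilisation:.0%}, response time {response * 1000:.0f} ms")
```

Unlike a straight-line trend, the model shows response time rising sharply as utilisation approaches 100%, which is the kind of guidance on where each technique fits that the ITIL text leaves out.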
Application Sizing
“The primary objective of Application Sizing is to estimate the resource requirements to support a proposed application Change or new application, to ensure it meets its required service levels. To achieve this application sizing has to be an integral part of the application lifecycle.”
Importantly ITIL recognises that “it is much easier and less expensive to achieve the required service levels if the application design considers the required service levels at the very beginning of the application lifecycle, rather than at some later stage”; however it does not explicitly state the role Capacity Management has in performance assurance or vice versa.
This is probably the most important recommendation in ITIL Capacity Management, so it deserves stronger emphasis than a mere seven short paragraphs. Unfortunately this recognition of the importance of Capacity Management within the development lifecycle is not a mandatory requirement; nor does it translate well into BS 15000, the closely related British Standard, which only states the capacity management process “should provide support to the development of new and changed services”.
Capacity Plan
“The prime objective is to produce a plan that documents the current levels of resource utilisation and service performance, and after consideration of the business strategy and plans, forecasts the future requirements for resources to support the IT Services that underpin the business activities. The plan should indicate clearly any assumptions made. It should also include any recommendations quantified in terms of resource required, cost, benefits, impact etc.”
ITIL refers to this as Production of the Capacity Plan. The Capacity Plan is the fundamental output that any capacity management function must deliver, yet ITIL accords it only four sentences in addition to the above objective paragraph (plus a template Capacity Plan in an annex).
ITIL recommends capacity plans “be published annually, in line with the business or budget lifecycles”, and updated quarterly thereafter. This recommendation does not recognise that a Capacity Plan should really be produced in line with the rate of change on the platform or service under scrutiny. For example a government department may only require an
Activity Frequency
ITIL describes when various activities should be undertaken as:
On-going:
• Iterative activities
• Demand Management
• Storage of Capacity Management Data
Ad-hoc:
• Modelling
• Application Sizing
Regularly:
• Production of the Capacity Plan
It also states that “any one of the sub-processes of Capacity Management may carry out any of the activities, with the data that is generated being stored in the CDB”.
Process Interfaces
Capacity Management interfaces with the other Service Management processes as follows:
Service Support:
• Incident Management: Information and resolutions on Capacity-related Incidents
• Problem Management: Provide assistance and resolutions on Capacity-related Problems
• Change Management: Assess Changes for Capacity impact
• Configuration Management: Provision of Configuration Item information
• Release: Assistance with developing the …
Service Delivery:
• Availability Management: Close alignment as capacity issues result in service unavailability
• IT Service Continuity Management: Determination of Capacity requirements for all recovery options
• Financial Management: Provision of cost summaries and Charging mechanisms
• Service Level Management: Ensuring that performance and Capacity targets can be achieved in SLAs
How does it all fit together?
[Figure: the application lifecycle stages (Feasibility, Requirements, Design, Coding, Testing, Roll-out, Live, Changes, End-of-life) across the ‘Concept’, ‘Development’, ‘Live’ and ‘Phase-out’ phases, overlaid with the ITIL Capacity Management components: Application Sizing, Modelling, Capacity Plan, Demand Management, Performance Monitoring, Iterative Activities and Capacity Database. Legend: Business-focussed activities, Supporting activity, Tools.]
What is good in ITIL Capacity Management?
• Coverage of Response Time Monitoring
• ITIL Framework for Service Management
• ITIL Capacity Management: although basic it has a good breadth
• Recognition of potentially high cost of Capacity Management, especially tools
• Recognition of potentially valuable benefits of Capacity Management, including:
  • Increased effectiveness and cost savings
  • Reduced risk
  • More confidence in forecasts
  • Value to application lifecycle
• Interfaces to other Service Management processes
• Capacity Plan template
• Close relationship between Capacity Management and Availability Management
• Capacity Management Database overview
• Activity frequency timetable
What is good in ITIL Capacity Management?
• Recognises shortcoming of ‘pay for upgrades as required’ approach to capacity management
• Recognises the complexity of distributed capacity management compared to the ‘good old days’ of the mainframe
• Recognises the dependence of other service management processes on an effective capacity management process
• “Good Capacity Management ensures NO SURPRISES”
• Recognition that capacity management is about meeting current and future business requirements cost-effectively
• Capacity Management process’s goal is ‘to ensure that cost justifiable IT Capacity always exists and that it is matched to the current and future identified needs of the business’
• Scope of the Capacity Management process – it should be the focal point for all IT performance and capacity issues
• Capacity Management has a close, two-way relationship with the business strategy and planning process
What is good in ITIL Capacity Management?
• Recognition that the Capacity Management process requires accurate information on the business and IT strategy and plans to function effectively
• Capacity Management needs to assess all changes for their impact on capacity of the infrastructure
• Recognition that Capacity Management process activities are categorised into proactive and reactive activities
• “The more successful the proactive activities of Capacity Management, the less need there will be for the reactive activities of Capacity Management”
• “Capacity Management should not be a last minute ‘tick in the box’ just prior to Operations Acceptance and Customer Acceptance”
• Recognition that SLAs should be verified by Capacity Management process using modelling
• Recognition that Capacity Management should identify new technology opportunities
• Capacity Management is a key enabler for business success
What is missing from ITIL Capacity Management?
1. A recognition of the need for performance assurance / performance engineering within Capacity Management:
  • Application Sizing and Modelling appear to be simply used to reactively size target platforms rather than assist with optimising the application design during the development lifecycle
  • Without some level of performance assurance / performance engineering SLRs may not be met on any size platform
  • Performance risk analysis at Change stage involving Capacity Management
2. Application co-existence modelling (within Modelling)
3. Workload characterisation, profiling, modelling & management
4. Demand Forecasting provided in business units (# accounts, etc.)
5. Organisational structures for large organisations
What is missing from ITIL Capacity Management?
[Figure: the application lifecycle stages (Feasibility, Requirements, Design, Coding, Testing, Roll-out, Live, Changes, End-of-life) across the ‘Concept’, ‘Development’, ‘Live’ and ‘Phase-out’ phases, showing the existing ITIL activities (Application Sizing, Modelling, Capacity Plan, Demand Management, Performance Monitoring, Iterative Activities) alongside the missing ones: Performance Assurance, Workload Characterisation and Demand Forecasting.]
What could be done better?
Firstly, Business Capacity Management: If you consider this to be aligned and interfacing with the Business Management tier then this is a major lost opportunity. The business converses in metrics which are not easily useful or even identifiable to the IT user. An example in a financial services company would be:
• Business Manager: Talks in number of customer accounts, funds under management, etc.
• Service Manager: Talks in number of IT services, SLAs, response times, etc.
• Resource Manager: Talks in number of servers, OS, hardware specification & configuration, etc.
These are not the same language and the translation between them is often non-trivial. A Business Manager will be using the units or metrics that he understands, cares about and relates to his bonus! So, for example, he could be interested in the number of customer accounts that the company has and expects to obtain in the future. Customer accounts may be useful for some simple capacity metrics but generally do not map 1:1 to any resource or service. More detail along these lines would be extremely useful.
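One way to picture the translation between the three ‘languages’ is as a chain of conversion factors, from customer accounts to transactions per second to servers. Every figure and name in the sketch below is hypothetical; in practice each factor would have to come from workload characterisation and measurement, and it is exactly these factors that make the translation non-trivial.

```python
import math

# All figures below are hypothetical and for illustration only.
PEAK_TXNS_PER_ACCOUNT_PER_HOUR = 0.36  # business-to-service factor (assumed)
CPU_SECONDS_PER_TXN = 0.08             # service-to-resource factor (assumed)
CORES_PER_SERVER = 8
TARGET_UTILISATION = 0.60              # planning ceiling per server (assumed)


def peak_tps(accounts):
    """Service language: peak transactions per second for a number of customer accounts."""
    return accounts * PEAK_TXNS_PER_ACCOUNT_PER_HOUR / 3600


def servers_required(accounts):
    """Resource language: servers needed to carry the peak load at the target utilisation."""
    busy_cores = peak_tps(accounts) * CPU_SECONDS_PER_TXN
    usable_cores_per_server = CORES_PER_SERVER * TARGET_UTILISATION
    return math.ceil(busy_cores / usable_cores_per_server)


for accounts in (1_000_000, 2_000_000, 4_000_000):
    print(f"{accounts:,} accounts: {peak_tps(accounts):.0f} tps, "
          f"{servers_required(accounts)} server(s)")
```

Because servers come in whole units and the factors differ per service, doubling the number of accounts does not simply double the server count, which is the point about accounts not mapping 1:1 to any resource.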
What could be done better?
Other areas that could do with improvement in ITIL Capacity Management:
• Application Sizing and Modelling activities sections should be re-written to explain these complex subjects more effectively
• While there is a recognition of the Time-to-Market pressures on the Capacity Management process, ITIL provides no advice or recommendations to help
• The relationships between each of the activities need more detailed explanation
• Increased complexity of Distributed Capacity Management is mentioned briefly but not elaborated on
• Covers many important details of the activities and sub-processes only in the Implementation section
• Need to recognise that Capacity Management recommendations will often be ignored until too late (a lack of pragmatism)
• Recognition of the IT organisation’s reliance on Capacity Management to produce a consolidated budget forecast
• The dependence on Capacity Management to provide transient capacity in Service Continuity situation is not well explained
Conclusions
Q. Is ITIL Capacity Management really ‘best practice’ under our working definition?
A. No. It is, on balance, a very good starting point, but lacks the consistent and coherent philosophy that should be evident within a ‘best practice’ document. It could arguably be called good practice though.
Q. Why do I think it isn’t ‘best practice’?
A. In summary, for the following reasons:
• It defines a broad set of activities that should be undertaken by everyone to achieve appropriate Capacity Management, not recognising differing circumstances across organisations.
• It poorly describes key activities such as Modelling, Application Sizing and Production of the Capacity Plan
• It poorly describes the key sub-process of Business Capacity Management
• It is missing key activities including Performance Assurance, Workload Characterisation and Demand Forecasting
Any Questions?
Andy Bolton
Capacitas Ltd.
Bibliography
IT Infrastructure Library:
• Service Delivery, TSO Books, 2001
• Application Management, TSO Books, 2002
[Please note that ITIL® and IT Infrastructure Library® are Registered Trade Marks of OGC.]
British Standards:
• BS 15000-1:2002, IT service management, Part 1: Specification for service management
• BS 15000-2:2003, IT service management, Part 2: Code of practice for service management
Other Service Management & IT Governance Frameworks:
• Microsoft Operations Framework, Microsoft Corporation, www.microsoft.com
• COBIT (Control Objectives for Information and related Technology), IT Governance Institute (ITGI), www.itgi.org