Service Design
7.9 Capacity Management
7.9.4 Capacity Management Activities
Some of the activities of Capacity Management are defined in the context of three sub-processes consisting of Business, Service and Component Capacity Management.
Beyond these, this section also covers the operational activities required, as well as the techniques that the three sub-processes use in various forms.
7.9.4.1 Business Capacity Management
Business Capacity Management is the sub-process that covers the activities responsible for ensuring that the future business requirements for IT services are considered and evaluated in terms of their potential impact on capacity and performance. As business plans, operations and processes continually change, this will consequently affect the service provider's ability to satisfy business and customer requirements, including those already documented in SLAs.
Some of the primary inputs into Business Capacity Management that trigger the activities here come from:
Project and Program Management;
Change Management;
Service Portfolio Management (as investments are evaluated and authorized);
Patterns of Business Activity (from Demand Management); and
New Service Level Packages and Service Level Requirements (from Service Level Management).
The activities that are recommended to be performed within the context of Business Capacity Management are:
Assisting with the development of Service Level Requirements;
Designing, procuring or amending service configuration;
Advising on appropriate or revised SLA targets;
Supporting SLA negotiations; and
Assisting in the evaluation and control of proposed Change Requests.
When Capacity Management is provided with early opportunities to be involved with these processes, the planning and design of IT capacity and performance can be closely aligned with business requirements and provisioned in an optimal and cost-effective manner.
7.9.4.2 Service Capacity Management
The primary focus of the Service Capacity Management sub-process is to monitor and analyze the use and performance of IT services and ensure that they meet their agreed SLA targets. The regular monitoring of service capacity and performance, and comparison against normal service levels will identify trends, breaches or any near misses that might occur.
As a result, this process will need to work closely with Service Level Management to understand the agreed levels of service and to report back on targets achieved and breached, along with any concerns or advice regarding capacity and performance issues. There will also need to be process integration with Incident and Problem Management so that there is early detection of disruptions where the root cause may be insufficient capacity, as well as appropriate action taken to resolve the disruption so that agreed business targets are still met.
7.9.4.3 Component Capacity Management
The objective of Component Capacity Management is to ensure that the implementation and management of each of the individual components that support IT services is performed effectively to deliver optimum capacity and performance to meet business needs. This includes the constant monitoring and analysis of each component to understand the performance, capacity and utilization characteristics that exist. This sub-process is a vital component to the overall quality of Capacity Management, as each component does have a finite capacity that, when reached, will begin to impact on the service levels being delivered and business operations being supported.
While much of the actual monitoring will be performed within Service Operation, the direct feedback should be provided to Component Capacity Management to interpret this data and take corrective action where necessary. There is a proactive side to the sub-process as well, and where possible there should be forecasting of any issues or events that might occur so that proper planning and preventative maintenance can be performed.
Other than the previously mentioned elements, there are three typical activities that occur within the Component Capacity Management sub-process:
1. Exploitation of new technology
As new technologies emerge, the IT service provider should seek to evaluate whether they might be able to deliver enhanced capacity and performance levels in a more cost-effective manner than those already used. Recent examples of technologies that have been used successfully in this manner include virtualization, cloud computing and blade server implementations.
2. Designing resilience
In conjunction with Availability Management, there should be analysis as to where it is cost-effective to build resilience into the infrastructure, by assisting in techniques such as a Component Failure Impact Analysis (CFIA) and other risk assessment and management activities. Depending on the availability levels that have been agreed, Capacity Management will evaluate what level of spare capacity of infrastructure components is required to meet these targets, and strive to ensure these requirements are considered early in the design stage of new or modified services.
3. Threshold management and control
Within Service Operation, there should be an ongoing set of monitoring and control activities that assist in providing assurance that agreed service levels are being delivered and protected. In the context of Capacity Management, there should be thresholds set for various components and services that raise warnings and alarms when approached or breached. Event Management will be primarily involved to support this capability and ensure that an appropriate level of capacity events are monitored and escalated to avoid staff being flooded with alerts.
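As an illustration only, the threshold idea above can be sketched in a few lines of Python. The class and function names, and the 70% and 85% levels, are hypothetical and not taken from any ITIL publication or monitoring product. The sketch raises an event only when the state changes, which is one simple way to avoid flooding staff with repeated alerts for a sustained breach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and alarm levels for one monitored metric (percent utilization)."""
    warning: float
    alarm: float


def classify(value: float, threshold: Threshold) -> str:
    """Return 'ok', 'warning' or 'alarm' for a single reading."""
    if value >= threshold.alarm:
        return "alarm"
    if value >= threshold.warning:
        return "warning"
    return "ok"


def events_to_raise(readings, threshold):
    """Return (sample index, new state) pairs, one per state change.

    A sustained breach therefore produces a single event rather than
    one event per sample.
    """
    events = []
    previous = "ok"
    for index, value in enumerate(readings):
        state = classify(value, threshold)
        if state != previous:
            events.append((index, state))
            previous = state
    return events


# Hypothetical CPU threshold and a short series of utilization readings.
cpu = Threshold(warning=70.0, alarm=85.0)
print(events_to_raise([55, 72, 74, 88, 90, 60], cpu))
# → [(1, 'warning'), (3, 'alarm'), (5, 'ok')]
```

In practice the escalation rules would be agreed with Event Management; this sketch shows only the comparison against thresholds.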
7.9.4.4 Common Capacity Management Activities
So that each of the three sub-processes of Capacity Management operates effectively, there are some common activities that should be employed at each level (when necessary). The two most important activities in this regard are:
1. Modeling and trending
One of the major benefits that Capacity Management provides to the Service Design phase is the capability to predict the behavior of IT services under certain conditions. These conditions may be a given volume of utilization by users, a particular type of use or a combined variety of work being performed.
There are many different types of modeling techniques that rely heavily on simulation and mathematical calculations, so depending on the size and complexity of the new or modified service offering, there may be very little or quite comprehensive modeling performed.
The main techniques utilized for modeling include:
Baselining – where a baseline of current performance and capacity levels is identified and documented;
Trend analysis – where services and components are monitored over time for their utilization to assist in the identification of trends and the potential forecasting of future utilization and performance levels;
Analytical modeling – where mathematical techniques are used to predict the performance levels that might be achieved under certain conditions or after making modifications to the infrastructure. Analytical modeling is typically quicker and cheaper to perform than Simulation Modeling, but also typically provides less accurate results; and
Simulation modeling – where a set of discrete events is modeled and compared against a defined hardware configuration. This will often involve simulating transactions across the service and infrastructure, and as a result will typically yield more accurate results.
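To make trend analysis concrete, the following is a minimal sketch that fits a straight line to past utilization figures and estimates how many periods remain before a threshold is reached. It assumes a linear trend, which real workloads often do not follow, and the function names and sample figures are hypothetical.

```python
def fit_line(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values, threshold):
    """Periods after the last observation until the fitted trend reaches
    the threshold; None if the trend is flat or falling."""
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical monthly disk utilization (percent), growing 4 points a month.
monthly_disk_utilization = [40.0, 44.0, 48.0, 52.0, 56.0, 60.0]
print(periods_until(monthly_disk_utilization, 80.0))
# → 5.0
```

The first value in the series acts as the baseline, and the result suggests roughly how much lead time is available for planning additional capacity.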
2. Application Sizing
Application sizing is an activity that begins during the early design of a new or modified service and ends when the service has been accepted into the production environment.
The sizing activities relate to all elements and components required for the service, including estimation of the required capacity levels of hardware, data, environments and applications that are involved.
The main objective is to accurately estimate the resource requirements needed to support a proposed change and ensure that it meets its required service levels. This includes consideration as to the resilience measures that might be required to deliver a set level of capacity, performance and availability. This will be an iterative process, including constant negotiation with Service Level Management to define a cost-effective approach that satisfies the business objectives.
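A first-pass sizing estimate is often simple arithmetic. The sketch below is one common back-of-envelope approach, not a method prescribed by the source: CPU demand is taken as the peak transaction rate multiplied by the CPU time per transaction, and each server is assumed to run at a target utilization with spare servers added for resilience. All names and figures are hypothetical.

```python
import math


def servers_required(peak_tps, cpu_seconds_per_txn, cores_per_server,
                     target_utilization=0.6, spare_servers=1):
    """Estimate the number of servers needed for a peak load.

    Demand in busy CPU cores = transactions/second * CPU-seconds per transaction.
    Each server contributes cores_per_server * target_utilization usable cores.
    spare_servers adds resilience headroom on top of the capacity requirement.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    demand_cores = peak_tps * cpu_seconds_per_txn
    usable_cores_per_server = cores_per_server * target_utilization
    return math.ceil(demand_cores / usable_cores_per_server) + spare_servers


# Hypothetical service: 400 transactions/second at peak, 50 ms of CPU each,
# running on 8-core servers.
print(servers_required(peak_tps=400, cpu_seconds_per_txn=0.05, cores_per_server=8))
# → 6
```

Because sizing is iterative, an estimate like this would be revisited as measurements from testing replace the assumed figures.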
While some aspects of quality may be improved after implementation (including adding additional hardware and other components), in most cases quality must be built in from the start, otherwise much higher costs are incurred trying to fix issues once the service is in production.
7.9.4.5 Operational Activities of Capacity Management
Whereas the previously mentioned activities of application sizing and modeling are those primarily executed in the design stages of a service, the following activities are the common operational activities that are performed across the three sub-processes. The major difference between the sub-processes and their use of these activities comes down to the data being collected and the perspective from which it is analyzed. For example, Component Capacity Management is concerned with the performance of individual components, whereas Service Capacity Management is concerned with the performance of the entire service, monitoring transaction throughput rates and response times.
Figure 6.10: The operational activities of Capacity Management
© Crown Copyright 2007 Reproduced under license from OGC
1. Utilization monitoring
The monitoring applied should be specific to a particular CI, whether it is an IT service, an operating system, a hardware configuration or application. It is important that the monitors can collect all the data required by Capacity Management for each of the three sub-processes.
Some of the typical monitored data collected include:
Processor utilization;
Memory utilization;
Percentage of processor used per transaction type;
Input/output rates;
Queue lengths;
Disk utilization;
Transaction rates;
Response times;
Database usage;
Index usage;
Hit rates;
Concurrent user numbers; and
Network traffic rates.
When collecting data intended for use by the Service Capacity Management sub-process, the transaction response time for services may be monitored and measured by:
Incorporating specific code within client and server applications software;
Using "robotic scripted systems" with terminal emulation software;
Using distributed agent monitoring software; and
Using specific passive monitoring systems.
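The first option above, incorporating specific code within the application, might look something like the following sketch. The `timed` helper, the `response_times` store and the `place_order` function are hypothetical stand-ins; a real service would normally forward the measurements to its monitoring tooling rather than keep them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Response times in seconds, keyed by transaction type.
response_times = defaultdict(list)


@contextmanager
def timed(transaction_type):
    """Record the wall-clock duration of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the transaction raises, so failures are measured too.
        response_times[transaction_type].append(time.perf_counter() - start)


def place_order():
    """Stand-in for real application work."""
    time.sleep(0.01)


for _ in range(3):
    with timed("place_order"):
        place_order()

samples = response_times["place_order"]
print(f"{len(samples)} samples, slowest {max(samples):.3f}s")
```

Keeping the transaction type with each sample allows the figures to be analyzed per transaction type later.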
2. Analysis
The data collected by the various monitoring activities and mechanisms will then be used to identify trends, baselines, issues and conformance or breaches to agreed service levels.
There may be other issues identified such as:
Bottlenecks within the infrastructure;
Inappropriate distribution of workload across the implemented resources;
Inefficiencies in application design;
Unexpected increases in workloads and input transactions; and
Scheduled services that need to be reallocated.
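As a small example of checking conformance to an agreed service level, the sketch below classifies a window of response-time samples as conformant, a near miss or a breach. The use of the 95th percentile, the nearest-rank method and the 90% near-miss margin are assumptions chosen for illustration; the actual measure would come from the relevant SLA.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct%
    of all values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def conformance(samples, target_seconds, pct=95, near_miss_ratio=0.9):
    """Classify a measurement window against a response-time target."""
    observed = percentile(samples, pct)
    if observed > target_seconds:
        return "breach"
    if observed >= near_miss_ratio * target_seconds:
        return "near miss"
    return "conformant"


# Hypothetical response times (seconds) for one service over one window.
window = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.9, 2.4]
print(conformance(window, target_seconds=2.0))  # → breach
print(conformance(window, target_seconds=3.0))  # → conformant
```

Reporting near misses as well as breaches gives Service Level Management earlier warning that a target is at risk.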
3. Tuning
After analysis of collected data has occurred, there may be some corrective action that is required in order to better utilize the infrastructure and resources to improve the performance of a particular service. Examples of the types of tuning techniques that might be used include:
Balancing workloads – transactions may arrive at the host or server at a particular gateway, depending where the transaction was initiated; balancing the ratio of initiation points to gateways can provide tuning benefits;
Balancing disk traffic – storing data on disk efficiently and strategically, e.g. striping data across many spindles may reduce data contention;
Definition of an acceptable locking strategy that specifies when locks are necessary and the appropriate level, e.g. database, page, file, record and row; delaying the lock until an update is necessary may provide benefits; and
Efficient use of memory – may include looking to utilize more or less memory depending upon the circumstances.
Before implementing any of the recommendations arising from the tuning techniques, it may be appropriate to consider using one of the on-going activities to test the validity of the recommendation.
4. Implementation
The objective of implementation is to control the introduction of any changes identified into the production environment. Depending on the changes required, this may be implemented via a normal change model (using all the normal steps of Change Management) or a standard change where there is already change approval and an established procedure for the work required.