INTRODUCTION AND SCOPE
Most organisations’ dependency on their IT systems is such that the loss of key applications or infrastructure could cause the company to fail within days if not earlier. Because of this, organisations need to plan how they will recover their key systems within an appropriate timescale in the event of a failure. This is the focus of the IT service continuity management (ITSCM) process.
Organisations can of course suffer from the loss of systems other than IT systems and should therefore have a general business continuity plan that protects against any eventuality that could threaten their vital business functions (VBFs). ITSCM should therefore support and align with the organisation’s business continuity management (BCM) process where this exists.
PURPOSE AND OBJECTIVES
The purpose of ITSCM is to support business continuity management by ensuring that the IT resources, systems and services can be reinstated within agreed timescales in the event of a major incident. This is achieved by creating and maintaining the necessary facilities and recovery capabilities.
The objectives of the process are:
• to create and maintain the IT service continuity plans and recovery plans;
• to carry out regular business impact analysis (BIA) exercises to ensure that the plans remain aligned with changing business requirements;
• to carry out regular risk analysis and management exercises to determine the potential for failure and identify and implement appropriate responses that meet agreed business continuity targets;
• to assess the impact of changes and take appropriate action to continue to provide the required level of protection;
• to ensure that the appropriate third-party contracts and agreements are in place and kept up to date to maintain the continuity and recovery plans;
• to proactively enhance recovery capabilities where it is cost-effective to do so.
KEY ACTIVITIES
The service continuity management lifecycle
Establishing and maintaining ITSCM is a cyclical process that ensures continued alignment with business continuity plans and business priorities. This process is shown in Figure 17.1.
The first two steps, initiation and then requirements and strategy, mainly relate to BCM. ITSCM begins with producing an ITSCM strategy to underpin the BCM strategy. The ITSCM strategy must ensure that cost-effective plans exist to recover IT services and any required IT infrastructure necessary to maintain VBFs.
Figure 17.1 ITSCM process (Source: The Cabinet Office ITIL Service Design ISBN 978-0-113313-05-1)
[Figure: the ITSCM lifecycle shown alongside business continuity management (BCM). Initiation: policy setting; scope; initiate a project. Requirements and strategy: business impact analysis; risk assessment; IT service continuity strategy. Implementation: develop IT service continuity plans; develop IT plans, recovery plans and procedures; organisation planning; testing strategy. Ongoing operation: education, awareness and training; review and audit; testing; change management. BCM elements shown: business continuity strategy; business continuity plans; invocation.]
The situation is more complex where some or all of the IT services are outsourced to another organisation. In this case, the ITSCM manager must ensure that the outsourcer’s continuity and recovery plans meet the objectives and timescales of the business.
Business impact analysis
Business impact analysis (BIA) is the activity performed by ITSCM, often together with availability management, that works with the business to understand the impact on the organisation of suffering degraded service or losing an IT service or component. The analysis will identify business functions that are critical to the success of the organisation (VBFs) and it is these functions that ITSCM must protect from the impact of an IT failure. The business will define the recovery requirement for these functions that ITSCM must address through its IT continuity plans. Over time, the importance of business functions can change and new ones appear, so ITSCM must undertake regular BIA exercises and feed the results back into the continuity plans to ensure they remain appropriate and up to date.
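As an illustration only, the comparison between the recovery requirement defined by the business and what the continuity plans can currently deliver could be sketched in Python as follows. The class name, the field names, the use of hours as the unit and the sample figures are all assumptions made for this sketch; they are not taken from ITIL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VitalBusinessFunction:
    name: str
    # Recovery requirement agreed with the business during the BIA (assumed unit: hours).
    required_recovery_hours: float
    # What the current continuity plan is expected to deliver; None means no plan exists.
    planned_recovery_hours: Optional[float]


def find_gaps(functions):
    """Return (name, reason) pairs for VBFs whose plan is missing or too slow."""
    gaps = []
    for vbf in functions:
        if vbf.planned_recovery_hours is None:
            gaps.append((vbf.name, "no continuity plan"))
        elif vbf.planned_recovery_hours > vbf.required_recovery_hours:
            gaps.append((vbf.name, "plan slower than requirement"))
    return gaps


# Hypothetical figures, for illustration only.
vbfs = [
    VitalBusinessFunction("order processing", 4, 2),
    VitalBusinessFunction("payroll", 72, 96),
    VitalBusinessFunction("customer support", 24, None),
]

for name, reason in find_gaps(vbfs):
    print(f"{name}: {reason}")
```

Re-running a check of this kind after each BIA exercise is one way to see where the plans have fallen out of step with changed business requirements.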
Risk analysis and management
The first step in protecting VBFs is to understand their dependency on the IT services and infrastructure. This information can be discovered from the configuration management system. Next, ITSCM must consider a number of factors:
• What could cause a service or component to fail? Examples can include fire, flood and security breaches in addition to simple mechanical or electrical failure.
• What is the likelihood of this happening? In other words, what are the chances that each of the events defined above could occur?
• What is the impact of such an occurrence? If one of the events did occur, what effect would this have on the business? This might be expressed in terms of the impact on its reputation, its customers, its finances or its legal or compliance requirements, for example.
The outcome of these considerations will determine the appropriate actions ITSCM has to take to mitigate the risks adequately and cost-effectively. Typically, the greater the likelihood of failure and the greater the impact, the greater the level of protection needed and the greater the justification for the necessary expense. The above underlines the importance of risk analysis and management to ITSCM.
RISK
A possible event that could cause harm or loss, or affect the ability to achieve objectives. A risk is measured by the probability of a threat, the vulnerability of the asset to that threat, and the impact it would have if it occurred.
The first stage of risk analysis and management is to identify potential threats to an asset or service, estimate the probability that each threat might materialise, assess how vulnerable the asset or service is to these threats and assess the impact should the threat materialise. For example, as identified above, flood is a threat that might be relevant to an asset such as a data centre. We would determine the probability that the centre might be flooded, assess the vulnerability of the data centre to flooding and assess the impact on the organisation if it did flood. Putting all these together would give us a measure of risk.
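The idea of combining probability, vulnerability and impact into a single measure of risk can be illustrated with a short Python sketch. Multiplying the three factors together is just one simple convention assumed here; the scales, names and figures are hypothetical, and an established risk framework may prescribe a different method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    threat: str
    asset: str
    probability: float    # assumed scale: chance of the threat materialising, 0.0 to 1.0
    vulnerability: float  # assumed scale: exposure of the asset to the threat, 0.0 to 1.0
    impact: float         # assumed: estimated cost to the organisation if it happens

    def score(self) -> float:
        # Assumed convention: the measure of risk is the product of the three factors.
        return self.probability * self.vulnerability * self.impact


def rank_risks(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score(), reverse=True)


# Hypothetical figures, for illustration only.
risks = [
    Risk("flood", "data centre", 0.10, 0.5, 2_000_000),
    Risk("fire", "data centre", 0.02, 0.5, 3_000_000),
    Risk("security breach", "customer database", 0.25, 0.2, 1_000_000),
]

for risk in rank_risks(risks):
    print(f"{risk.threat} / {risk.asset}: {risk.score():,.0f}")
```

A ranking like this only helps to prioritise; deciding what to do about each risk remains a business judgement.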
The second part of risk management is doing something about the risks identified. Generally, we can do a number of things about risks:
• Some risks can just be accepted and provision made in case the worst happens. If we cannot insure our data centre because it sits in a flood plain, we may decide to hold a contingency fund in case it does flood.
• We can avoid or eliminate the risk; for example, we can eliminate the risk to our data centre by deciding to go back to manual processing. This is not always a practical solution.
• We can transfer the risk to somebody else, for example by taking out insurance or by outsourcing the data centre and disaster recovery.
• We can reduce the risk by reducing the probability of the threat or by reducing the severity if the risk materialises. For our data centre we might move it to the top of a hill to reduce the probability of a flood or reduce the impact of a flood by replacing under-floor cables with fibre optics.
In many cases, the response to risk will be a combination of all or some of these options, with a balance being established between the business’s tolerance of risk and the cost of countermeasures.
A key issue for IT service management, and ITSCM in particular, is to have some way of analysing and managing risk, and the best and safest approach is to use a tried and tested framework that covers all aspects of risk identification and management. Management of Risk (M_o_R®), a part of the Best Practice Guidance portfolio published by The Cabinet Office, is a recommended framework.
RELATIONSHIPS WITH OTHER SERVICE MANAGEMENT PROCESSES
Availability management
There is clearly an overlap between the ITSCM process and the availability management process. The distinction is that availability management is primarily concerned with maintaining the availability of VBFs, whereas ITSCM provides contingency in the event of a failure that either availability management could not prevent or from which IT could not quickly recover.
Change management
Changes need to be assessed for their impact on continuity plans and consequent changes incorporated into the change planning. The continuity plan itself is subject to change control.
Service level management
Service level management will provide advice on the definition of VBFs and the expectations of the business with regard to the permissible time delays in the restoration of services.
Capacity management
Capacity management helps to ensure adequate resources are available to accommodate services after the continuity plan is invoked and that agreed service levels can be maintained in this situation.
Asset and configuration management
Configuration management maintains records of recovery CIs, their status and specification.
Information security management
The potential for a security breach to cause a major incident means that information security management contributes to the BIA and risk analysis activities.
METRICS
The following metrics can be used to measure the effectiveness of the ITSCM service and process and the preparedness of the organisation:
• The number of services not covered by the continuity and recovery plans (that should be covered).
• The number of issues identified in the last continuity test that remain to be addressed.
• The number of errors found in an audit of the information in lists of key people, their responsibilities and contact details.
ROLES
The IT service continuity manager is responsible for ensuring that the objectives of the process are met. Their activities therefore include:
• undertaking BIA and risk management exercises for both existing and new services;
• implementing and maintaining the ITSCM process and continuity strategy and maintaining the alignment with business continuity planning;
• preparing and maintaining the continuity and recovery plans and ensuring that these continue to support the organisation’s business continuity strategy and plans;
• regularly testing the plans for effectiveness, reviewing the results and taking action to overcome any identified deficiencies;
• ensuring that any personnel who have a role in transitioning from one location to another are fully trained and aware of their responsibilities;
• managing third-party suppliers of recovery equipment and facilities to maintain the integrity of the continuity and recovery plans;
• attending change advisory board (CAB) meetings as required and assessing changes for their impact on the plans and updating the plans accordingly;
• managing the continuity plan during invocation and restoring the service back to the primary or other designated facility.
TEST QUESTIONS FOR CHAPTER 17
SD 14 A 14