17 IT SERVICE CONTINUITY MANAGEMENT (SD 4.6)

INTRODUCTION AND SCOPE

Most organisations’ dependency on their IT systems is such that the loss of key applications or infrastructure could cause the company to fail within days if not earlier. Because of this, organisations need to plan how they will recover their key systems within an appropriate timescale in the event of a failure. This is the focus of the IT service continuity management (ITSCM) process.

Organisations can of course suffer from the loss of systems other than IT systems and should therefore have a general business continuity plan that protects against any eventuality that could threaten their vital business functions (VBFs). ITSCM should therefore support and align with the organisation’s business continuity management (BCM) process where this exists.

PURPOSE AND OBJECTIVES

The purpose of ITSCM is to support business continuity management by ensuring that the IT resources, systems and services can be reinstated within agreed timescales in the event of a major incident. This is achieved by creating and maintaining the necessary facilities and recovery capabilities.

The objectives of the process are:

• to create and maintain the IT service continuity plans and recovery plans;
• to carry out regular business impact analysis (BIA) exercises to ensure that the plans remain aligned with changing business requirements;
• to carry out regular risk analysis and management exercises to determine the potential for failure and identify and implement appropriate responses that meet agreed business continuity targets;
• to assess the impact of changes and take appropriate action to continue to provide the required level of protection;
• to ensure that the appropriate third-party contracts and agreements are in place and kept up to date to maintain the continuity and recovery plans;
• to proactively enhance recovery capabilities where it is cost-effective to do so.

IT SERVICE MANAGEMENT

KEY ACTIVITIES

The service continuity management lifecycle

Establishing and maintaining ITSCM is a cyclical process that ensures continued alignment with business continuity plans and business priorities. This process is shown in Figure 17.1.

The first two steps, initiation and then requirements and strategy, mainly relate to BCM. ITSCM begins with producing an ITSCM strategy to underpin the BCM strategy. The ITSCM strategy must ensure that cost-effective plans exist to recover IT services and any required IT infrastructure necessary to maintain VBFs.

Figure 17.1 ITSCM process (Source: The Cabinet Office ITIL Service Design ISBN 978-0-113313-05-1)

[Figure: the ITSCM lifecycle stages and their key activities, shown alongside the business continuity management (BCM) outputs they relate to (business continuity strategy and business continuity plans). Initiation: policy setting, scope, initiate a project. Requirements and strategy: business impact analysis, risk assessment, IT service continuity strategy. Implementation: develop IT service continuity plans; develop IT plans, recovery plans and procedures; organisation planning; testing strategy. Ongoing operation: education, awareness and training; review and audit; testing; change management. Invocation.]

The situation is more complex where some or all of the IT services are outsourced to another organisation. In this case, the ITSCM manager must ensure that the outsourcer’s continuity and recovery plans meet the objectives and timescales of the business.

Business impact analysis

Business impact analysis (BIA) is the activity performed by ITSCM, often together with availability management, that works with the business to understand the impact on the organisation of suffering degraded service or losing an IT service or component. The analysis will identify business functions that are critical to the success of the organisation (VBFs) and it is these functions that ITSCM must protect from the impact of an IT failure. The business will define the recovery requirement for these functions that ITSCM must address through its IT continuity plans. Over time, the importance of business functions can change and new ones appear, so ITSCM must undertake regular BIA exercises and feed the results back into the continuity plans to ensure they remain appropriate and up to date.
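As an illustration of how BIA results can be fed back into the continuity plans, the following Python sketch compares the recovery requirement the business has set for each VBF with what the current plan is expected to achieve. The class, field names and figures are hypothetical and are not taken from ITIL; they only show the kind of check a regular BIA exercise might prompt.

```python
from dataclasses import dataclass


@dataclass
class VitalBusinessFunction:
    name: str
    required_recovery_hours: float  # recovery requirement defined by the business (assumed unit: hours)
    planned_recovery_hours: float   # what the current IT continuity plan is expected to achieve


def find_recovery_gaps(functions):
    """Return the VBFs whose planned recovery time exceeds the business requirement."""
    return [f for f in functions if f.planned_recovery_hours > f.required_recovery_hours]


# Illustrative figures only.
vbfs = [
    VitalBusinessFunction("Order processing", required_recovery_hours=4, planned_recovery_hours=8),
    VitalBusinessFunction("Payroll", required_recovery_hours=72, planned_recovery_hours=24),
]

gaps = find_recovery_gaps(vbfs)
for f in gaps:
    print(f"{f.name}: plan recovers in {f.planned_recovery_hours}h, "
          f"business requires {f.required_recovery_hours}h")
```

Any function reported here is one where the plan no longer meets the business requirement and needs to be revisited.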

Risk analysis and management

The first step in protecting VBFs is to understand their dependency on the IT services and infrastructure. This information can be discovered from the configuration management system. Next, ITSCM must consider a number of factors:

• What could cause a service or component to fail? Examples can include fire, flood and security breaches in addition to simple mechanical or electrical failure.
• What is the likelihood of this happening? In other words, what are the chances that each of the events defined above could occur?
• What is the impact of such an occurrence? If one of the events did occur, what effect would this have on the business? This might be expressed in terms of the impact on its reputation, its customers, its finances or its legal or compliance requirements, for example.

The outcome of these considerations will determine the appropriate actions ITSCM has to take to mitigate the risks adequately and cost-effectively. Typically, the greater the likelihood of failure and the greater the impact, the greater the level of protection needed and the greater the justification for the necessary expense. The above underlines the importance of risk analysis and management to ITSCM.
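The first step described above, understanding what a VBF depends on, can be sketched in Python as a walk over dependency records. The dictionary below is a hypothetical extract of the kind of relationships a configuration management system might hold; the item names and the data structure are assumptions made for illustration, not a real CMS interface.

```python
# Hypothetical extract from a configuration management system:
# each item maps to the items it directly depends on.
DEPENDS_ON = {
    "Order processing (VBF)": ["Web shop service", "Payment service"],
    "Web shop service": ["App server 1", "Database cluster"],
    "Payment service": ["Database cluster", "Payment gateway link"],
    "App server 1": ["Data centre A"],
    "Database cluster": ["Data centre A"],
    "Payment gateway link": [],
    "Data centre A": [],
}


def all_dependencies(item, depends_on):
    """Collect every service and component that `item` relies on, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(item, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        stack.extend(depends_on.get(current, []))
    return seen


deps = all_dependencies("Order processing (VBF)", DEPENDS_ON)
print(sorted(deps))
```

Each item in the result is a candidate for the three questions above: what could make it fail, how likely that is, and what the impact would be.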

RISK

A possible event that could cause harm or loss, or affect the ability to achieve objectives. A risk is measured by the probability of a threat, the vulnerability of the asset to that threat, and the impact it would have if it occurred.

The first stage of risk analysis and management is to identify potential threats to an asset or service, estimate the probability that each threat might materialise, assess how vulnerable the asset or service is to these threats and assess the impact should a threat materialise. As identified above, flood is one example of a threat that might be relevant to an asset such as a data centre. We would determine the probability that the centre might be flooded, assess the vulnerability of the data centre to flooding and the impact on the organisation if it did flood. Putting all these together would give us a measure of risk.
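One simple way of putting these three factors together is sketched below in Python. The text does not prescribe a formula, so multiplying probability, vulnerability and impact is an assumption made here for illustration, as are the class name and all of the figures; a framework such as M_o_R may define the measure differently.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    asset: str
    threat: str
    probability: float    # estimated chance the threat materialises, 0.0 to 1.0
    vulnerability: float  # estimated exposure of the asset to that threat, 0.0 to 1.0
    impact: float         # estimated cost to the organisation if it happens (illustrative)

    def score(self) -> float:
        # Assumption: the three factors are simply multiplied together.
        return self.probability * self.vulnerability * self.impact


# Illustrative estimates only.
risks = [
    Risk("Data centre", "Fire", probability=0.01, vulnerability=0.25, impact=2_000_000),
    Risk("Data centre", "Flood", probability=0.02, vulnerability=0.5, impact=2_000_000),
]

# Rank the risks so the largest exposure is considered first.
ranked = sorted(risks, key=lambda r: r.score(), reverse=True)
for r in ranked:
    print(f"{r.asset} / {r.threat}: {r.score():,.0f}")
```

Ranking the scores gives a rough ordering of where protection is most justified; the absolute numbers are only as good as the estimates behind them.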

The second part of risk management is doing something about the risks identified. Generally, we can do a number of things about risks:

• Some risks can just be accepted and provision made in case the worst happens. If we cannot insure our data centre because it sits in a flood plain, we may decide to hold a contingency fund in case it does flood.
• We can avoid or eliminate the risk; for example, we can eliminate the risk to our data centre by deciding to go back to manual processing. This is not always a practical solution.
• We can transfer the risk to somebody else, for example by taking out insurance or by outsourcing the data centre and disaster recovery.
• We can reduce the risk by reducing the probability of the threat or by reducing the severity if the risk materialises. For our data centre we might move it to the top of a hill to reduce the probability of a flood or reduce the impact of a flood by replacing underfloor cables with fibre optics.

In many cases, the response to risk will be a combination of all or some of these options, with a balance being established between the business’s tolerance of risk and the cost of countermeasures.

A key issue for IT service management, and ITSCM in particular, is to have some way of analysing and managing risk. The best and safest approach is to use a tried and tested framework that covers all aspects of risk identification and management. Management of Risk (M_o_R®), part of the Best Practice Guidance portfolio published by the Cabinet Office, is a recommended framework.

RELATIONSHIPS WITH OTHER SERVICE MANAGEMENT PROCESSES

Availability management

There is clearly an overlap between the ITSCM process and the availability management process. The distinction is that availability management is primarily concerned with maintaining the availability of VBFs, whereas ITSCM provides contingency in the event of a failure that either availability management could not prevent or from which IT could not quickly recover.

Change management

Changes need to be assessed for their impact on the continuity plans, and any consequent updates to the plans incorporated into the change planning. The continuity plan itself is subject to change control.

Service level management

Service level management will provide advice on the definition of VBFs and the expectations of the business with regard to the permissible time delays in the restoration of services.

Capacity management

Capacity management helps to ensure adequate resources are available to accommodate services after the continuity plan is invoked and that agreed service levels can be maintained in this situation.

Asset and configuration management

Configuration management maintains records of recovery CIs, their status and specification.

Information security management

The potential for a security breach to cause a major incident means that information security management contributes to the BIA and risk analysis activities.

METRICS

Metrics that can be used to measure the performance of the ITSCM service and process in respect of the effectiveness and preparedness of the organisation are as follows:

• The number of services not covered by the continuity and recovery plans (that should be covered).
• The number of issues identified in the last continuity test that remain to be addressed.
• The number of errors found in an audit of the information in lists of key people, their responsibilities and contact details.
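The three metrics above are simple counts, and the following Python sketch shows how they might be derived. Every record in it is hypothetical; in practice the inputs would come from the service catalogue, the last test report and the audit of the contact lists.

```python
# All records below are hypothetical and only illustrate the calculations.
services_requiring_cover = {"Email", "Web shop", "Payroll", "Warehouse system"}
services_in_plans = {"Email", "Web shop"}

# Issues raised by the last continuity test, flagged as resolved or not.
test_issues = [
    {"id": 1, "resolved": True},
    {"id": 2, "resolved": False},
    {"id": 3, "resolved": False},
]

# Audit of the key-people list: True means the entry was found to be wrong or out of date.
contact_audit = {"A. Manager": False, "B. Engineer": True, "C. Supplier": False}

uncovered_services = services_requiring_cover - services_in_plans
open_test_issues = [issue for issue in test_issues if not issue["resolved"]]
contact_errors = sum(contact_audit.values())  # True counts as 1

print("Services not covered by the plans:", len(uncovered_services))
print("Open issues from the last continuity test:", len(open_test_issues))
print("Errors in the key-people list:", contact_errors)
```

Tracked over successive reviews, a fall in each count suggests improving preparedness.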

ROLES

The IT service continuity manager is responsible for ensuring that the objectives of the process are met. Their activities therefore include:

• undertaking BIA and risk management exercises for both existing and new services;
• implementing and maintaining the ITSCM process and continuity strategy and maintaining the alignment with business continuity planning;
• preparing and maintaining the continuity and recovery plans and ensuring that these continue to support the organisation’s business continuity strategy and plans;
• regularly testing the plans for effectiveness, reviewing the results and taking action to overcome any identified deficiencies;
• ensuring that any personnel who have a role in transitioning from one location to another are fully trained and aware of their responsibilities;
• managing third-party suppliers of recovery equipment and facilities to maintain the integrity of the continuity and recovery plans;
• attending change advisory board (CAB) meetings as required and assessing changes for their impact on the plans and updating the plans accordingly;
• managing the continuity plan during invocation and restoring the service back to the primary or other designated facility.

TEST QUESTIONS FOR CHAPTER 17

SD 14 A 14
