• No results found

Contingency planning. DAU Marts 2013

N/A
N/A
Protected

Academic year: 2021

Share "Contingency planning. DAU Marts 2013"

Copied!
29
0
0

Loading.... (view fulltext now)

Full text

(1)

Contingency planning

(2)

Agenda

Introduction

Process definition

Activation and notification

Recovery

Reconstruction

Evaluation

Examples

Do and Don’t

(3)

DAU Slide no 3 07 March 2013

Contingency plan

Why bother?

Information provided by information technology systems must be based on

reliable, relevant and accessible data, but before this data can add any

value, the data must be transformed into knowledge based decisions and

actions.

That means if data to bee seen as an valuable asset then data must be

protected and taking care of, analogue to any other asset management

disciplines.

One instrument for data asset management is to recover IT systems quickly

and effectively after an disaster has occur.

By other word if your IT systems are vital for running the business then you

need to develop and implement some kind of IT contingency plan.

(4)

DAU Slide no 4 07 March 2013

Contingency plan

Scope

Incident

Problem

Emergency

Disaster

Error

handling

Problem

management

Continuity

plan

A Problem is the unknown

underlying cause of one or more

Incidents

Problem

A Incident with a high impact or

potentially high impact, witch

requires a responses that is

above a normal operation

Emergency

An occurrence causing

widespread destruction and

disruption of the overall

business processes.

Disaster

Perform error handling according current procedures

Expert to analyze and solve the problem

Initiate continuity plan and disaster team to manage the disaster

(5)

Framework •Define scope and approach Business impact •Identify and categorize the business impact of critical system components Risk assessment •Identify, analyze and evaluate risk items Prepare •Develop a detailed contingency activity plan Implement •Perform training and test the planed contingency activities Evaluate •Review and update the contingency plan based on lesson learnt 5

Define, develop, implement and evaluate an effective contingency plan

based on a phase divided process.

Contingency plan

(6)

Slide no 6

Activation and Notification

Recovery

Reconstruction

Activation of the contingency plan occurs after

disruption or outage. When a disaster is detected the disaster team is established and an recovery

approach is decided.

The detailed recovery activity and resource plan is execute. Current procedures and instructions are performed by skilled persons that can recover the system without intimate system knowledge.

In the reconstruction phase, temporary recovery solutions are terminated and the system is transfer back to fully normal operation mode.

Contingency plan

Content

DAU 07 March 2013

Process definition

A contingency plan enables the organization to respond quickly and structured when an disaster occurs. Recovery time decrease by having the right tools, documentation and resources in place.

Evaluation

Evaluation of how durable the contingency plan is to support high recovery performance based on test and review activities.

(7)

Process definition

Slide no 7

Activate and notify

Recovery

Reconstruction

Scope Process overview Responsibilities Risk assessment DAU 07 March 2013

Contingency plan

Definition

Business impact Introduction

Evaluate

(8)

DAU Slide no 8 07 March 2013

Contingency plans

Roles and responsebilities

Disaster team

System owner

System manager

System experts

Process expetrs

Service providers

Planning

System recovery

Business continuity

Communication

Business managers

System users

Extern parties

Recover activities

Toolbox

Establish Infrastructure

Install and configure server

Install and configure clients

Test and operate

(9)

DAU Slide no 9 07 March 2013

Contingency plan

B

usiness impact

Process

Impact

MTD

Forecast

Missing demand plan

5 days

Schedule

No order scheduled

3 days

Shipment Goods not issued

1 day

Release

Batch is not released

2 days

Review

Batch is not reviewed

3 days

Recipe

Recipe issues

2 days

Execute

Production shortage

1 day

System

RTO

RPO

SAP

2 days

24 Hours

LIMS

1 days

8 Hours

BO

5 days

48 Hours

MES

1 days

8 Hours

PCS

8 hours

2 Hours

Maximum Tolerable Downtime

Recovery Time Objective

(10)

DAU Slide no 10 07 March 2013

Contingency plan

Risk Assessment

(11)

DAU Slide no 11 07 March 2013

Contingency plan

Risk assessment

Im p a c t Likelihood Unlikely Moderate Critical Major

Possible Likely Very Likely Minor

No Disaster Consequents Basic control Mitigations Recovery strategy

1 Fire outbreak Server is inaccessible Fire protection inspection Fire extinguisher Warm system swop 2 Power supply Uncontrolled server shot down Unbreakable power supply Redundant power supply Warm system swop 3 Virus attack System malfunction Virus protection

Operation system patching

Firewalls

Separated network

Isolate network area and operate manual until virus is removed 4 Network failure Data loss Updated documentation Redundant network Hot system swop

5 Room condition don't work

Low system performance Preventive maintenance Room surveillance Service agreement

Contact vendor and wait until the room temperature is normal 6 Break down Control system is damage Updated baseline

Spare part on stock

System surveillance Incident process in place

Exchange equipment and restore application 5 3 2 5 2 1 3 4 4 Net risk Gross risk Basic risk

Difference between Gross , basic and Net risk

1 6 6 1 2 5 3 4 6

(12)

Slide no 12

Activate and notify

Recover

Reconstruct

Gather information Priorities actives

Establish team Communicate

DAU 07 March 2013

Process definition

Contingency plans

Activate and notify

(13)

Assess

H10

H9

H8

H7

H6

H5

H2

H2

H1

H4

Plan Inform Reconstruct Operate Recover Qualify

Activate and Notify

Reconstruction

Contingency plan

Disaster recovery plan

Recovery

Evaluate Notify DAU 07 March 2013 Slide no 13 Verify

(14)

Gather information and establish a status overview of the disaster System manager

Notify the disaster team and initiate the first planning meeting System owner

Based on the disaster impact a prioritized activity plan is created Disaster team

Identify effected key stakeholders and inform about the disaster situation and the planed activities

System owner

Reestablish faulty network components, exchange damaged equipment, install/config software modules and recover data

System manager

Verify through a test plan system installation, operation and performance is correct

System manager

Reestablish system and all service at primary location System manager

Qualify through a test plan system installation, operation and performance is correct

System manager

Start the system operation and control that system operate satisfactorily and can be used as intended

System manager

When all the disaster activity is successfully executed the disaster process performance is evaluated and documented

System owner

Slide no 14

Contingency plan

Disaster recovery plan

DAU 07 March 2013 Access Notify Plan Recover Qualify Operation Evaluate Reconstruct Inform Verify

(15)

Slide no 15

Activate and notify

Recovery

Reconstruction

DAU 07 March 2013

Process description

Contingency plan

Recovery

Exchange damage equipment Establish

infrastructure Install/config platform

Install/config application

Install/config

database Verify installation

(16)

DAU Slide no 16 07 March 2013

Contingency Plan

Documentation in the recovery box

System documentation

Network tropology

Configuration item list

Installation manuals

License files

Software installation files

User documentation

User manuals

Exception guidance

Business continuity plan

Service documentation

Known error database

IT continuity plan

(17)

DAU Slide no 17 07 March 2013

Contingency plan

Restore strategies

Service agreement Hot system Cold system Warm system Resilience

(18)

DAU Slide no 18 07 March 2013

Contingency plan

Data recover strategies

Zero Backup Archive Data replication Backup

(19)

Slide no 19

Activate and notify

Recovery

Reconstruction

DAU 07 March 2013

Process description

Contingency plan

Reconstruction

Verify operation Verify performance Hypecare

(20)

DAU Slide no 20 07 March 2013

Contingency plan

Disaster scenarios

Cable Upgrade Virus

(21)

DAU Slide no 21 07 March 2013

Situation

A virus found on a central application server was not identified by the

virus scanner

Issue

The virus was polling the network to find possible other computers to

attack

Consequence

Performance on many process computers was low and this has

impact on the product deliveries

Action

Isolate process net

Close down process computers and remove virus manually

Install new windows path

Develop and install a new virus cure

Evaluation

Install data surveillance between administrative and process domain

Contingency plan

Virus attack

(22)

DAU Slide no 22 07 March 2013

Situation

After system upgrade the system performance was very slow

Issue

The system parameter with handle the amount a services was not updated

Consequence

Information exchange with process equipment was very slow with effect the

production output

Action

Close down some lines to keep the process area running

Manually material handling

By analyzing the program a system parameter fault was found

Evaluation

The system parameter was added as a critical item to the configuration item list

Contingency plan

Upgrade

(23)

DAU Slide no 23 07 March 2013

Situation

After construction work the fiber between the server room and

process net was broken

Issue

No information could be exchanged between the central server

and the process clients

Consequence

Order information was not downloaded and process performance

information was not uploaded

Action

Order parameter has to be typed in manually

Performance information has to be log manually

Information has to reviewed by another before use

Temporary cable repair was conducted

Evaluation

Establish redundant server room with separated fiber and switch

Contingency plan

Cable

(24)

Slide no 24

Activate and notify

Recovery

Reconstruction

DAU 07 March 2013

Process description

Contingency plan

Evaluate

Evaluate

(25)

Plan Review

Does the plan account for all current critical business processes

Is the contact details accurate

Verify the completeness of the recovery plan

Mature disaster team

Sufficient skilled and trained restore individuals

Updated system documentation and backup procedure

Simulation

Coordination between disaster team internally and externally

Quality of documentation, instructions and backup media

Key personnel are proper trained and skilled to manage a disaster recovery

Evaluate

What have done right ?

What could have been done differently ?

Did we perform any not value adding activity ?

What shall we improve ?

Slide no 25

Contingency plan

Evaluation

DAU 07 March 2013

(26)

Requirement

Operational backup/restore procedure

Qualified resources available

Updated system documentation

Clarify roles and responsibilities

Mature change management process

Do

A formal document with can support the disaster process recovery in effective and

operational way.

Don’t

“So ein ding must wir auch haben” which means that the document are only been to be

written on a computer and never going to be tested or evaluated.

Slide no 26

Contingency plan

D

o and don’t

DAU 07 March 2013

(27)

DAU Slide no 27 07 March 2013

Appendix

Definitions and Abbreviations

Reference

(28)

DAU Slide no 28 07 March 2013

Contingency Plan

Definition and Abbreviations

Abbreviation

Definition

Contingency plan System-specific plan developed recovering an IT system in case of Disaster Disaster An occurrence causing widespread destruction and disruption of the overall

business processes (e.g. fire at the global server centre) System recovery The process of bringing the system back to operational status

Business continuity The business area’s ability to operate its vital operations without the normal use of IT

Hot system A fully operational redundant equipped system

Warm system A partly equipped system with require some addition work to be fully operational

Cold system Backup equipment with may need to be installed, configured and tested before the system is fully operational

IT service agreement A agreement with specify the service provided to a customer by an IT Vendor

Resilience The ability to quickly adapt and recover from any known/unknown change

MTD Maximum Tolerable Downtime is amount of time a critical process can be

disrupted without cause server harm to the business

RPO Recovery Point Objective is the maximum tolerated time data can be lost

without huge impact on the business

RTO Recovery Time Objective is the overall length of time before a breakdown

(29)

DAU Slide no 29 07 March 2013

Contingency Plan

Reference

IT disaster recovery planning, Dummies

Contingency planning Guide, NIST

Backup and recovery, DELL

Your Backup is not an Archive, Symantec

Forøg virksomhedens informationssikkerhed, ITEK

References

Related documents

This project has been funded over the years by the Swanland Village Association, the Parish Council, the Swanland Festival, Swanland Nurseries and donations from

In the context of the famous Internet file size data of Crovella and some very recent data sets from a wireless mobility network, we examine the new class of LogPH

We have already done it in the case of the interior Dirichlet problem for the Stokes equations in [13], where we presented a suitable representation of the velocity part of

The data analysis is used to evaluate what is the moderating effects of green IS implementation and the verification against standards on the relationship between the

These growth forecasts are constructed by combining the demographic projections of United Nations for the working-age population (15-64) with a productivity trend based on a

We conclude the general plan cannot identify substantial problems that will emerge with its state highway system, further report that no known funding sources are

Our results suggest that when the assumed model doesn’t include certain types of terms (e.g. non- linear or interactions between the predictors) then wrong predictors will be

With eighteen 4-Gbps Fibre Channel ports, four Gigabit Ethernet IP Storage Services ports, a four port FCIP software license and an open modular expansion slot that can be used