How To Build A Disaster Recovery Testing Program


(1)

Presented by

Steve Carroll

Building a Disaster Recovery Testing Program

Email: [email protected] Phone: 717-256-1865

(2)

About Our Speaker

Steve Carroll is a Senior Consultant with Abound Resources. With more than 25 years’ experience as a community financial institution executive, Steve has worked in a variety of capacities in financial institutions, including consultant and CEO.

Since 1996, Steve has worked as a lead consultant on more than 100 financial institution consulting engagements across the country. His areas of expertise include business continuity planning, risk management, strategic business planning, and strategic technology planning.

Steve has developed software applications to assist financial institutions in improving their risk management positions, including Abound Resources’ bPLAN Web-based Business Continuity Planning system.

Steve has completed Institute of Financial Education courses at the University of Texas at Austin, the University of Georgia, and the University of Connecticut.

(3)

Who We Are

• Management consulting firm for the Community Financial Institution (CFI) industry
• We empower CFIs to achieve their goals. “Goals achieved. Guaranteed.”™
• Based in Austin, TX; clients in 40+ states
• Founded in 1997 by industry execs and Big 5 consultants
• 500+ software evaluations
• Vendor Neutral
• Advisors average 25+ years in CFI management; lending, cash management, compliance, operations and IT

(4)

What We Do

Sales & Marketing

(5)

Presentation Highlights

• Regulatory Issues & Terminology

• Building a Testing Program

• Conducting Tests – Examples

• High Availability Environments

• A Simple Pandemic Exercise

(6)

Regulatory Background

• FFIEC Guidance – March, 2008
  – The Board must approve the Testing Program & review test results
  – IT is responsible for DR Testing
  – The Crisis Management Team should be involved in the testing process
  – Those responsible for Facilities should be involved in the process
  – Test results must be subjected to an

(7)

Common Regulatory and Audit Findings

• There is no Comprehensive Testing Program in place
• Testing activities show an over-reliance on a single testing methodology (Table-top)
• Test activities do not involve departments/users in a meaningful way
• Test documentation is inadequate
• No “step-by-step” restoration procedures
• No “order of restoration” defined

(8)

Terminology

• Testing Program – A schedule of test events spanning a complete testing cycle.
  – What? When? Who? How? Where?
• DR Test – An event that demonstrates that a given resource can be restored to a production state within a target time frame using a documented restoration method.
• BCP Test – An event that demonstrates that a Business Function can be completed using a

(9)

Budgeting Your BCP Effort

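[Pie chart: share of the BCP effort by activity. Categories: Business Impact Analysis, Risk Assessment, Documentation, Emergency Response, Testing. Slice values as extracted: 36%, 5%, 18%, 9%, 32%; the pairing of values to categories is not recoverable from the text.]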

(10)

Testing Methodologies

Methodology             Type of Test/Administrator
Tabletop Exercise       BCP/BCP Coordinator
Due Diligence           BCP/BCP Coordinator
Vendor Service Levels   BCP/BCP Coordinator
Independent Review      BCP/BCP Coordinator
Incident Tracking       BCP/BCP Coordinator
Compatibility Testing   Disaster Recovery/IT
Simulation              Disaster Recovery/IT

(11)

Getting started

• Testing Team
  – Technical representation
  – Operations
  – Department Staff
• Inventory of Resources
  – Software Applications (Core and Network)
  – Critical Services (Data Communications, Internet)
  – Outsourced Applications & Services
• List of Business Functions
  – By Department

(12)

Build a Database

Resource Name*        Critical Level  Test?  RTO (hours)  RPO (hours)  MAD (hours)  Control Group
Core System           1               Yes    8            2            72           Core
Data Communications   1               Yes    1            --           24           Network
Loan Prospector       3               Yes    48           72           96           Loans
Fedline               1               Yes    4            24           48           Fed
Network Files         3               Yes    24           72           72           Network
Branch Capture        1               Yes    4            8            48           Item Proc
Internet Access       1               Yes    1            --           24           Internet
Acrobat Reader        5               No     96           --           120          --
EMail                 1               Yes    12           24           48           Email
Internet Banking      1               Yes    4            8            24           Core
Laser Pro             3               Yes    48           72           96           Loans

(13)

Assign Criticality Levels

• Criticality is assigned to both Resources and Business Functions
  – Sometimes called “Mission Critical” or “Business Critical”
• Better to use multiple criticality “Levels” for flexibility
  – Three or five levels, matched to a time frame
    • Example: Level 1 = 1 to 24 hours, Level 2 = 24 to 48 hours, etc.
• Test Flag
  – Will we test this? (yes or no)
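The level-to-time-frame mapping can be written down as a small function. This is a sketch under one assumption: the slide gives only Levels 1 and 2, and the code extends the same 24-hour band width to higher levels. The function name `level_window` is hypothetical.

```python
from typing import Tuple


def level_window(level: int, band_hours: int = 24) -> Tuple[int, int]:
    """Return the (start, end) time frame in hours for a criticality level.

    Level 1 = 1 to 24 hours and Level 2 = 24 to 48 hours come from the slide;
    continuing in equal 24-hour bands for higher levels is an assumption.
    """
    if level < 1:
        raise ValueError("criticality levels start at 1")
    start = 1 if level == 1 else (level - 1) * band_hours
    return start, level * band_hours
```

With five levels this yields windows ending at 24, 48, 72, 96, and 120 hours; an institution using three levels would likely choose wider bands.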

(14)

Assign Target Timeframes

• Recovery Time Objective (RTO)
  – Target time frame for resource restoration
• Recovery Point Objective (RPO)
  – The maximum tolerable data loss for a given information system, measured in time.
  – Can be assigned to any application, but should be applied at a minimum to Transaction Interfaces.
  – RPOs should be supplemented with a description of how lost data could be reconstructed.
• Maximum Allowable Downtime (MAD)
  – Estimated maximum downtime for a given
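To make the two targets concrete, the sketch below checks a test result against an RTO and an RPO. The function names and the timestamps are hypothetical; the 8-hour RTO and 2-hour RPO match the Core System row in the resource table.

```python
from datetime import datetime, timedelta


def met_rto(outage_start: datetime, restored_at: datetime, rto_hours: float) -> bool:
    """True if the resource was back in a production state within its RTO."""
    return restored_at - outage_start <= timedelta(hours=rto_hours)


def met_rpo(last_good_backup: datetime, outage_start: datetime, rpo_hours: float) -> bool:
    """True if the data lost (time since the last usable backup) is within the RPO."""
    return outage_start - last_good_backup <= timedelta(hours=rpo_hours)


# Hypothetical test: outage declared at 09:00, restored at 15:30,
# last usable backup taken at 08:00.
outage = datetime(2021, 3, 15, 9, 0)
restored = datetime(2021, 3, 15, 15, 30)
backup = datetime(2021, 3, 15, 8, 0)

rto_ok = met_rto(outage, restored, rto_hours=8)   # 6.5 hours elapsed
rpo_ok = met_rpo(backup, outage, rpo_hours=2)     # 1 hour of data at risk
```

RTO measures forward from the outage to restoration, while RPO measures backward from the outage to the last recoverable data, which is why the two functions take different pairs of timestamps.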

(15)

Assign Control Groups

• Control Groups
  – Create a Control Group for resources that should be tested together.
• Examples:
  – Core System
  – Loan Systems
  – Internet (Web sites)
• Examples
  – By Server
  – By Criticality Level
  – By Application type
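Once each resource carries a Control Group label, the groups themselves fall out of a simple grouping step. The sketch below uses resource and group names from the "Build a Database" table; the function name `by_control_group` is an assumption.

```python
from collections import defaultdict

# (resource name, control group) pairs taken from the resource table.
RESOURCES = [
    ("Core System", "Core"),
    ("Internet Banking", "Core"),
    ("Loan Prospector", "Loans"),
    ("Laser Pro", "Loans"),
    ("Data Communications", "Network"),
    ("Network Files", "Network"),
]


def by_control_group(resources):
    """Map each control group to the list of resources tested together."""
    groups = defaultdict(list)
    for name, group in resources:
        groups[group].append(name)
    return dict(groups)
```

Each key of the result is then a candidate Test Event covering every resource in that group.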

(16)

Create Test Events

• Build a Control Document for each Test Event:
  – Statement of Objective – be clear and concise
    • Example: “Show that the [software application] can be restored onto new hardware from backup media. Users will log in and verify that the system has been returned to a production state.”
  – Description of Test Environment
    • How will hardware be replaced?
    • Preinstalled software
    • External connections needed
  – Most likely test methodology
  – Test Date
  – Who is responsible?
  – Who will be present?
  – What documentation (evidence) will be retained?
• Write a Test Script
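The Control Document items can be treated as a checklist that must be complete before test day. The sketch below is one possible representation; the class name, field names, and example values are assumptions that mirror the bullets on this slide.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ControlDocument:
    objective: str = ""
    test_environment: str = ""
    methodology: str = ""
    test_date: str = ""                       # e.g. "2021-03-15"
    responsible: str = ""
    present: List[str] = field(default_factory=list)
    evidence_retained: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Names of checklist items that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical draft with only two items filled in so far.
draft = ControlDocument(
    objective="Show that the application can be restored onto new hardware from backup media.",
    test_date="2021-03-15",
)
```

Calling `draft.missing_items()` lists what still needs an answer, which is a simple way to keep an incomplete Control Document from reaching test day.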

(17)

Example Test Script

Step #  Start Time  Activity                                                            Expected Results  Actual Results
1                   Set up server hardware, workstation & test LAN
2                   Install O/S & backup/restore utility
3                   Install application from original media (d/l from vendor Web site)
4                   Locate backup image for application data & restore to server
5                   Install client onto workstation
6                   Have user login and verify that work can resume
7                   User runs samples of typical transactions
8                   Print screens and reports – retain for documentation
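The script columns map naturally onto a record per step, filled in as the test proceeds. The following is a minimal sketch: the `Step` class and the `record` and `incomplete` helpers are assumptions, and only the first three activities from the example script are loaded.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Step:
    number: int
    activity: str
    expected: str = ""
    start_time: Optional[datetime] = None
    actual: str = ""


def record(step: Step, actual: str, started: Optional[datetime] = None) -> Step:
    """Fill in the start time and actual result for one step."""
    step.start_time = started or datetime.now()
    step.actual = actual
    return step


def incomplete(script: List[Step]) -> List[int]:
    """Step numbers that have no recorded start time or result yet."""
    return [s.number for s in script if s.start_time is None or not s.actual]


SCRIPT = [
    Step(1, "Set up server hardware, workstation & test LAN"),
    Step(2, "Install O/S & backup/restore utility"),
    Step(3, "Install application from original media"),
]
```

During the test, each completed step gets a call such as `record(SCRIPT[0], "Hardware and LAN ready")`, and `incomplete(SCRIPT)` shows what remains.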

(18)

Build a Testing Timeline

• Test Cycle – 12 months
• Assign a target test date to each Test Event/Control Group
• Strategically space test events across the entire Test Cycle
  – “Easy” tests can happen more quickly
  – Allow more time for complex tests
  – Consider likely unplanned outages (Incident
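As a starting point for the timeline, target dates can be spread evenly across the cycle and then adjusted by hand. The sketch below assumes a 365-day cycle and even spacing; the function name and the event names are illustrative.

```python
from datetime import date, timedelta


def spaced_dates(cycle_start: date, event_names, cycle_days: int = 365):
    """Assign evenly spaced target dates across the test cycle.

    Even spacing is only a first draft: complex tests may need a wider gap
    before them, and easy tests can be pulled closer together.
    """
    if not event_names:
        return {}
    gap = cycle_days // len(event_names)
    return {
        name: cycle_start + timedelta(days=gap * i)
        for i, name in enumerate(event_names)
    }


# Hypothetical cycle starting 1 January 2021 with four Control Groups.
schedule = spaced_dates(date(2021, 1, 1), ["Core", "Loans", "Network", "Internet"])
```

With four events the gap is 91 days, so the targets land roughly one per quarter.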

(19)

Resource Restoration Methods

• Applications & Data
  – Restore from backup
  – Reinstall from original media
  – Installed in multiple locations (redundant)
  – “High Availability” System – failover
• Hardware
  – Backup Equipment
  – Replace from available market sources (add time to RTO!)
• Services
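The "add time to RTO" note for replacement hardware is simple arithmetic, sketched below. The function names and the hour figures in the usage note are hypothetical.

```python
def effective_recovery_hours(restore_hours: float, procurement_hours: float = 0.0) -> float:
    """Total time to a production state: obtain the hardware, then restore onto it."""
    return procurement_hours + restore_hours


def fits_rto(restore_hours: float, rto_hours: float, procurement_hours: float = 0.0) -> bool:
    """True if procurement plus restoration still fits inside the RTO."""
    return effective_recovery_hours(restore_hours, procurement_hours) <= rto_hours
```

For example, a 6-hour restore fits an 8-hour RTO when backup equipment is on hand (`fits_rto(6, 8)`), but not if a replacement server takes 48 hours to arrive (`fits_rto(6, 8, procurement_hours=48)`).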

(20)

Test Day

• Print the appropriate Test Control Documents and Scripts, or open the documents on a laptop
• Line up the Test Participants
  – Test Administrator
  – Technical support
  – Observers/documenter
  – Department users when appropriate
• Execute test script – note start time and results for each step. Complete the document as you go.
• Testing – 80% preparation & documentation, 20%

(21)

Reporting

• Electronic files are better than physical
  – Create a folder structure on your network – folders and test events with the same name
  – Scan completed Test Scripts and Control Documents; attach to Test Event
• Keep a schedule of all Test Events, past and future
  – Be able to sort by Date and Status (Pending, Complete)
• When you’re ready to distribute, copy or Zip the folder structure for emailing or copy to media
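The folder-per-event layout, the sortable schedule, and the Zip step can all be scripted with the Python standard library. This is a sketch only: the function names and the tuple layout of a schedule entry are assumptions.

```python
import shutil
from pathlib import Path


def create_event_folder(root: Path, event_name: str) -> Path:
    """Create a folder named after the Test Event and return its path."""
    folder = root / event_name
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def zip_for_distribution(root: Path, archive_base: Path) -> Path:
    """Zip the whole folder structure under root; returns the path of the .zip file.

    archive_base should sit outside root so the archive does not include itself.
    """
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=str(root)))


def sort_schedule(events):
    """Sort (event name, ISO date, status) tuples by date, then by status."""
    return sorted(events, key=lambda e: (e[1], e[2]))
```

Storing dates in ISO form (`YYYY-MM-DD`) means a plain text sort also orders the schedule chronologically.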

(22)

High Availability Environments

• Virtual servers
  – Pro: can immediately cut RTOs in half
  – Con
    • Testing challenges
    • Bandwidth
    • Licensing
    • Staffing
• Core Synchronization
  – Synch how often?
  – Is more always better?

(23)

A Simple Pandemic Exercise

• Preparation – BCP Coordinator
  – Create an Excel Spreadsheet with 2 columns
    • Column A = Employee Name
    • Column B = Department
  – Use a random selection formula to select 40% of the employee records
    • Make sure you can reference which departments become impacted (use a “count record” formula or a pivot table)
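The slide builds the simulator in Excel; the same selection and count can be sketched in Python for anyone who prefers a script. The function name, the `seed` parameter, and the staff list below are all hypothetical.

```python
import random
from collections import Counter


def simulate_absences(employees, absent_share=0.40, seed=None):
    """Randomly mark a share of employees absent and count absences by department.

    employees: list of (employee name, department) tuples, matching
    Column A and Column B of the spreadsheet.
    """
    rng = random.Random(seed)
    k = round(len(employees) * absent_share)
    absent = rng.sample(employees, k)
    return absent, Counter(dept for _, dept in absent)


# Hypothetical staff list for illustration only.
STAFF = [
    ("Employee 1", "Lending"), ("Employee 2", "Lending"), ("Employee 3", "Lending"),
    ("Employee 4", "Operations"), ("Employee 5", "Operations"), ("Employee 6", "Operations"),
    ("Employee 7", "IT"), ("Employee 8", "IT"),
    ("Employee 9", "Compliance"), ("Employee 10", "Compliance"),
]
```

After `absent, counts = simulate_absences(STAFF)`, the call `counts.most_common(1)` names the department with the most absences, playing the role of the "count record" formula or pivot table.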

(24)

A Simple Pandemic Exercise

• Pull your team together for the exercise meeting
• Use the Pandemic Simulator (Excel spreadsheet) to determine which employees are absent with the flu
• Determine which Department has the highest level of absenteeism and is most affected
• Review the Business Functions for that Department and develop a strategy for dealing with the incoming work

(25)

Top Five Testing Mistakes

5. Procrastination/Cramming
4. Hiding Failed Tests
3. Reliance on a single methodology
2. Failure to leverage “real life”

(26)
(27)

Steve Carroll

Abound Resources, Inc.
Senior Consultant

Cell: 717-256-1865
Email: [email protected]
Twitter: @bankbcp
