Document Details
Title: 247Time Backup & Disaster Recovery Plan
Author:
TABLE OF CONTENTS
1 INTRODUCTION
1.1 OVERVIEW
1.2 DEFINED REQUIREMENT
2 DISASTER OVERVIEW
2.1 DISASTER DEFINITION
2.2 EXAMPLE OF DISASTER SCENARIO
2.3 SCOPE OF DISASTER RECOVERY PROCEDURES
2.4 ALTERNATIVE LOCATION
2.5 SERVICE DESCRIPTION
3 PROCESS OVERVIEW
3.1 OWNERSHIP
3.2 ROLES AND RESPONSIBILITIES
3.3 INVOKING DISASTER RECOVERY
3.4 MEETING ARRANGEMENTS
3.5 CONVENING THE TEAM
4 RECOVERY PROCESS
4.1 REPLICATION/SYNCHRONISATION
4.2 CUTOVER TO THE ALTERNATIVE SYSTEM
5 PROCESS TESTING
247Time Disaster Recovery Plan
1 Introduction
1.1 Overview
The purpose of this document is to define the disaster recovery service to be provided by 247Time and the outsourcing teams of the server business and the customer (The Team).
The process by which disaster recovery will be invoked is defined, as are the criteria that must be met. This version of the document includes the means by which transfer of the system in recovery to the alternative site will take place and how The Team will establish their connection to it.
This document is written from a practical point of view and does not reflect contractual responsibilities. The definitions of responsibilities are in the agreements between The Team and its hosting partner. This document provides a clear description of who will actually take what action in the event of a disaster and serves no other purpose.
1.2 Defined Requirement
The definition of disaster recovery requirement in the contract between The Team and its hosting partner is:
“The Supplier shall provide appropriate disaster recovery plans to include alternative hosting location (different city to main hosting site) and guarantee the Supplier System will be up and available within 1 hour of the disaster recovery process being invoked.
The Team shall have the right to test the Supplier’s disaster recovery plans on an annual basis.”
The further definition of the requirement is as described below:
The Hosting Partner has an existing disaster recovery site hosting systems in place. This site meets the requirement of being in a different city to the main hosting location and shares no service connections with the main site.
Only those Servers protected by Double-Take Availability Software will be available at the (remote) DR location.
2 Disaster Overview
2.1 Disaster Definition
Disaster recovery to the alternative location will be invoked immediately in the event of an incident that materially damages the main location such that repairs cannot be guaranteed to allow recovery of the system within 4 hours.
Only The Team or its hosting partner may invoke disaster recovery to the alternative location.
In the event of service failure due to material damage to the main location or hardware failure such that repairs cannot be achieved within 4 hours of the failure first being reported, a decision shall be taken jointly between The Team and its hosting partner, based on overall expected recovery time.
2.2 Example of Disaster Scenario
The most serious event would involve complete loss of the main site such that all equipment and data at that site were lost and no personnel from that site were available to recover the system. An event such as a large explosion adjacent to the building that caused a structural collapse and prevented access would fall into this category.
Other events envisaged are those that would cause part of the main suite to become unusable, leaving personnel available to recover the system but the system itself beyond use. An extremely serious flood would be an example.
An event where the situation is less clear would be a serious hardware fault within the Hosting Partner's infrastructure such that repairs would take a significant amount of time. An example of this type of event would be multiple board failure within the core network that made a large portion of the network unavailable.
2.3 Scope of Disaster Recovery Procedures
In Scope: Alternative Location
The process to invoke disaster recovery to the alternative location.
Transfer to the alternative location where the restoration of services from the main location cannot be guaranteed within 1 hour of the disaster recovery process being invoked.
Access to the service at the alternative location from The Team.
Business continuity for a period of less than 4 hours.
Out of Scope:
Recovery of any lost data or re-work to bring the recovered system up to date from the point of the latest available system backup. The (remote) DR location does not have provision to restore tape-based Backups.
Service failure due to software or data issues where the production environment may be repaired or reconstituted in situ.
1st line response to service failure using spare equipment and normal maintenance arrangements.
Routine hardware maintenance and repair procedures.
Specific responsibilities in regard of this process are:
Responsibility | Owner
Defined requirement to be met by this process | Facility Services Manager, The Team
Overall process design and maintenance to meet the defined requirement | Facility Services Manager, The Team
Technical design of alternative system | Director of Technical Services, Hosting Partner
Detailed process for bringing alternative hardware and operating environment online | Data Centre Support Manager, Hosting Partner
Process for connecting The Team Users to the alternative location | Facility Services Manager, The Team
2.4 Alternative Location
The building is constructed and configured to industry standards.
The building measures approximately 40,000 sq ft over ground and two upper floors. The specification includes:
4 MW power supply from a 6.6 kV primary ring main
2,000 kVA standby generator
Fully air-conditioned
Fire detection and suppression including VESDA
State-of-the-art security systems
Multiple ducting to site boundary
Items of plant monitored are as follows:
24*7*365 Facility Monitoring
Facility power supply characteristics including voltage, frequency, current and harmonics
Online individual rack power consumption
Humidity and temperature monitoring/control
Leak detection
Critical infrastructure monitoring
Network monitoring
CCTV
CCTV monitoring is in operation and is visible at the facility. CCTV starts at the car park and extends right the way through the data centre and facility to individual rack level. CCTV records are kept for 30 days.
Movement Sensors
Power Backup Systems
The local electrical utility company provides a high-capacity HV ring from their primary substation. Because the supply is a ring, the site is protected against cable damage from road works and the like. The facilities have several independent N+1 UPS systems and multiple diesel generators within onsite energy centres.
Each hall benefits from an independent power supply, backed by UPS and generator. If the power goes down totally, the UPS systems take over, providing uninterrupted power whilst the generators start up. Once activated, the generators have a week's worth of fuel on-site.
HVAC
Temperature is stabilised between 18°C and 22°C, within the recommended guidelines for IT equipment established by ASHRAE.
Fire Detection & Suppression
Very Early Smoke Detection Apparatus (VESDA) equipment is installed. VESDA systems constantly analyse air composition by passing particles in front of a laser and trigger warning systems if particles of combustion are found.
FM200 and Argonite gas are used to suppress any potential fires before they take hold. Both gases are harmless to systems, and all fires are extinguished within 10 seconds, minimising toxins and the presence of soot.
Relocation Process
The process by which the system is recovered to an alternative location is designed to be as simple as possible and to rely only on replication-based technology that synchronises a copy of specified Production system Virtual Machines to the alternative location. Alternative hardware will be maintained to the same specification as the Production environment.
2.5 Service Description
The alternative system shall:
Be based upon a real-time replicated / synchronised copy of the specified Production Environment.
Be created from the latest available synchronised copy of the specified Production Environment.
Provide data from the latest available synchronised copy of the specified Production Environment.
Not have any particular steps taken to accelerate recovery in the event of a system failure that causes the delay of any critical process.
Use hardware pre-racked and commissioned at the alternative site dedicated for the creation of the alternative system and that has been tested for use in this manner.
Allow connection from The Team using the same connection methods as the Production Environment (Client-based VPN). The Team shall be responsible for establishing their connection to the alternative site. These responsibilities are for initial setup, any maintenance required and invocation in the event of a disaster.
Be supported using the reasonable endeavours of both The Team and its Hosting Partner. Normal service level terms cannot apply as the circumstances of any event requiring the use of the alternative location cannot be predicted.
Be created by a process documented to the degree that it may be established by Hosting Partner personnel who are unfamiliar with the particulars of the system, in the event of the experienced staff being unavailable.
3 Process Overview
3.1 Ownership
Element of Service | Owner
Invocation of Disaster Recovery | Joint Ownership: Data Centre Support Manager, Hosting Partner; Payroll Manager, The Team
Relocation to the alternative site | Data Centre Support, Hosting Partner
Detailed process for bringing alternative hardware and operating environment online | Data Centre Support Manager, Hosting Partner
The Team connection to the alternative site | Facility Services Manager, The Team
Business continuity for the first 24 hours | Payroll Manager, The Team
3.2 Roles and Responsibilities
Individual | Role / Activities | Escalation Point
Payroll Manager, Server Provider | Joint responsibility for invocation of disaster recovery. Responsible for communications with Safe. The Team connection to the alternative site. Business Continuity for the first 24 hours. | The Team
System Administrator, Safe Outsourcing | The Team connection to the alternative site. | Facility Services Manager, Server Provider
Hosting Partner Data Centre Support Manager | Joint responsibility for invocation of disaster recovery. Communications with Safe Outsourcing. | Director of Technical Services, Hosting Partner
Hosting Partner Implementation Manager | |
3.3 Invoking Disaster Recovery
[Flowchart: System Failure → Obvious Disaster? If Yes, Invoke Disaster Recovery. If No, continue repairs for 4 hours, after which The Team and Hosting Partner decide whether to invoke disaster recovery (Yes: Invoke Disaster Recovery).]
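The invocation criteria in section 2.1 and the flow above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not part of the agreed procedure: the names `Incident`, `Action` and `next_action` are hypothetical, and the handling of a fault that has already been repaired is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Repair window from section 2.1, in hours.
REPAIR_WINDOW_HOURS = 4.0


class Action(Enum):
    INVOKE_DR = "invoke disaster recovery"
    CONTINUE_REPAIRS = "continue repairs"
    JOINT_DECISION = "The Team and Hosting Partner to decide"


@dataclass
class Incident:
    obvious_disaster: bool          # e.g. complete loss of the main site
    hours_since_reported: float     # time since the failure was first reported
    service_restored: bool = False  # assumption: repairs may succeed in the window


def next_action(incident: Incident) -> Optional[Action]:
    """Return the next step in the flow, or None if service is already restored."""
    if incident.service_restored:
        return None
    if incident.obvious_disaster:
        return Action.INVOKE_DR
    if incident.hours_since_reported < REPAIR_WINDOW_HOURS:
        return Action.CONTINUE_REPAIRS
    # Repairs have not succeeded within 4 hours: the decision is taken jointly.
    return Action.JOINT_DECISION


print(next_action(Incident(obvious_disaster=False, hours_since_reported=1.5)))
```

The function deliberately stops at "joint decision" for the unclear cases, because the plan leaves that judgement to the conference call rather than to a fixed rule.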
3.4 Meeting Arrangements
No physical meetings shall take place. The timescale of the recovery process is such that all meetings shall take place as conference calls. The only scheduled meeting shall be a three-way meeting between the nominated individuals at The Team and its Hosting Partner to decide whether or not a situation warrants invoking disaster recovery.
All further communications are two-way between The Team and its Hosting Partner. These communications are scheduled in the process and will be telephone calls backed up by email.
3.5 Convening the team
The conference call to discuss the invocation of disaster shall take place between the Data Centre
4 Recovery Process
4.1 Replication/Synchronisation
The Production system is subject to a real-time replication and synchronisation process utilising Double-Take Availability Virtual Host Edition Software. The Software runs on the Operating System hosting The Team Services and replicates all changes and updates to the alternative location in real time. The Software is configured to use 1 GB of RAM caching, 10 GB of disk caching and up to 8 Mb/s of network bandwidth per Server for replication.
This provides the means to restore the alternative system to the latest available replication point.
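As a rough worked example of what the bandwidth cap implies, the sketch below estimates how much changed data one Server can replicate per hour and how long a full disk cache would take to send. It assumes that "Mb/s" means megabits per second and that "GB" is a decimal gigabyte, and it ignores compression and protocol overhead, none of which are stated in this plan. The function names are hypothetical.

```python
# Figures quoted in section 4.1 (per Server).
BANDWIDTH_MEGABITS_PER_S = 8   # "up to 8 Mb/s", read here as megabits per second
DISK_CACHE_GB = 10             # "10 GB of disk caching", read here as decimal GB


def max_replicated_gb_per_hour(bandwidth_mbps: float) -> float:
    """Upper bound on data sent per hour at the given bandwidth cap."""
    megabytes_per_s = bandwidth_mbps / 8      # 8 bits per byte
    return megabytes_per_s * 3600 / 1000      # MB per hour converted to GB per hour


def hours_to_drain_cache(cache_gb: float, bandwidth_mbps: float) -> float:
    """Time to send a full cache, assuming no new changes arrive meanwhile."""
    return cache_gb / max_replicated_gb_per_hour(bandwidth_mbps)


print(max_replicated_gb_per_hour(BANDWIDTH_MEGABITS_PER_S))                     # 3.6
print(round(hours_to_drain_cache(DISK_CACHE_GB, BANDWIDTH_MEGABITS_PER_S), 2))  # 2.78
```

Under these assumptions, a Server that changes more than about 3.6 GB of data per hour would fall behind, and the latest available replication point would then be older than the moment of failure.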
4.2 Cutover to the alternative system
ID Task Owner
1 Disaster recovery invoked.
Disaster Recovery will either have been initiated automatically or invoked during a conference call between The Team and its Hosting Partner. This call may have been initiated by any party. The decision and the time it was made shall be confirmed during the call and be the subject of an email to all parties.
The Team to assume that the system will be restored no longer than 24 hours after disaster recovery has been invoked and follow their business continuity plan.
The Team and Hosting Partner
2 Hosting Partner is to follow the steps detailed in the document The Team Disaster Recovery Procedure which describes the exact method to cut over to the alternative system. The tasks below are taken directly from that document.
Hosting Partner
2.1 The Data Centre Service Desk will take initial receipt of the alerting mechanism and record the incident.
Hosting Partner Data Centre Service Desk
2.2 The Data Centre Service Desk will notify the Data Centre support manager should the incident fall outside the normal recovery procedure.
Hosting Partner Data Centre Service Desk
2.3 The Data Centre Support Manager will determine whether the pre-requisite disaster recovery invocation conditions have been met and the service to the hosted Safe computing cannot be materially recovered before invoking the disaster recovery procedure specific to Safe Computing.
Hosting Partner Data Centre Support Manager
2.4 Ensure that all the incident details have been correctly recorded before carrying out the disaster recovery procedure actions.
Hosting Partner Data Centre Support Manager
2.5 The Data Centre Support Manager will contact the Implementation Manager who will confirm/deny that any further action is possible before the Disaster recovery procedure is invoked for The Team.
Hosting Partner Data Centre Support Manager / Hosting
2.6 Arrange for the necessary support personnel to be available.
Hosting Partner Data Centre Support Manager
2.7 All Hosting Partner management and monitoring systems will be updated in accordance with the recovery operation.
Hosting Partner Data Centre Service Desk
2.8 The Data Centre Support Manager will notify the nominated The Team personnel of estimated timescales for the recovery operation. This task is dependent upon input from the Implementation Manager/team.
Hosting Partner Data Centre Support Manager
2.9 Hosting Partner Implementation Manager to initiate the alternate location recovery procedure for The Team.
Hosting Partner Implementation Manager
2.10 Failover XXX-XXX-XXX-01 onto DR Platform
Connect to DR location Host Server
Launch Double-Take Console
Failover XXX-XXX-XXX-01
Launch Console
Start XXX-XXX-XXX-01
Login to XXX-XXX-XXX-01
Assign XXX-XXX-XXX-01 DR Network Address
Connect XXX-XXX-XXX-01 to DR Network Virtual Switch
Restart XXX-XXX-XXX-01
Verify Network connectivity
Hosting Partner Implementation Team
2.11 Failover XXX-XXX-XXX-02 onto DR Platform
Connect to DR location Host Server
Launch Double-Take Console
Failover XXX-XXX-XXX-02
Launch Console
Start XXX-XXX-XXX-02
Login to XXX-XXX-XXX-02
Assign XXX-XXX-XXX-02 DR Network Address
Connect XXX-XXX-XXX-02 to DR Network Virtual Switch
Restart XXX-XXX-XXX-02
Verify Network connectivity
Hosting Partner Implementation Team
2.12 Failover XXX-XXX-XXX-03 onto DR Platform
Connect to DR location Host Server
Launch Double-Take Console
Failover XXX-XXX-XXX-03
Launch Console
Start XXX-XXX-XXX-03
Login to XXX-XXX-XXX-03
Assign XXX-XXX-XXX-03 DR Network Address
Connect XXX-XXX-XXX-03 to DR Network Virtual Switch
Restart XXX-XXX-XXX-03
Verify Network connectivity
Hosting Partner Implementation Team
2.13 Failover XXX-XXX-XXX-04 onto DR Platform
Connect to DR location Host Server
Launch Double-Take Console
Failover XXX-XXX-XXX-04
Launch Console
Start XXX-XXX-XXX-04
Login to XXX-XXX-XXX-04
Assign XXX-XXX-XXX-04 DR Network Address
Connect XXX-XXX-XXX-04 to DR Network Virtual Switch
Restart XXX-XXX-XXX-04
Verify Network connectivity
2.14 Hosting Partner Implementation Manager will notify the Hosting Partner Data Centre Support Manager that the Disaster Recovery process has been completed. Hosting Partner Implementation Manager will also inform the Hosting Partner account manager that the Disaster Recovery procedure has been completed so that commercial requirements can be met.
Hosting Partner Implementation Manager
2.15 Hosting Partner Data Centre Support Manager will ensure that all Hosting Partner management and monitoring systems are updated.
Hosting Partner Data Centre Support Manager
2.16 Hosting Partner Data Centre Support Manager will notify The Team that the Disaster Recovery process has been completed.
Hosting Partner Data Centre Support Manager
3 Following task 2.16 Hosting Partner shall inform Safe Outsourcing of the date and time of the replication point used to create the alternative system.
Hosting Partner
4 The Team shall then follow the business continuity plans to prepare re-work or recover additional data from dated input documents.
The Team
5 The Team connection to the alternative site is pre-configured so the switchover mechanism is simply to start using the DR shortcuts issued to Users.
The Team
6 Confirm System available and accepted.
The Team
7 Carry out re-work as necessary and restore/reproduce non-database data using dated input documents.
The Team
8 Convene a meeting to plan restore of system to a main system at the original or an alternative permanent location.
Hosting Partner and The Team
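Tasks 2.10 to 2.13 repeat the same ten steps for each Server, differing only in the server name. A minimal sketch of generating that checklist from a single template is shown below, which may help keep the per-server steps consistent when the list of protected Servers changes. The step wording is copied from the tasks above; the function name `build_checklist` and the tick-box layout are hypothetical, and the redacted server names are kept as placeholders.

```python
# Per-server steps copied from tasks 2.10 to 2.13; "{server}" marks where the
# server name appears in the plan.
FAILOVER_STEPS = [
    "Connect to DR location Host Server",
    "Launch Double-Take Console",
    "Failover {server}",
    "Launch Console",
    "Start {server}",
    "Login to {server}",
    "Assign {server} DR Network Address",
    "Connect {server} to DR Network Virtual Switch",
    "Restart {server}",
    "Verify Network connectivity",
]

# Server names are redacted in the plan, so the placeholders are kept as-is.
SERVERS = [f"XXX-XXX-XXX-{n:02d}" for n in range(1, 5)]


def build_checklist(servers, first_task=10):
    """Return (task_id, server, steps) tuples, numbered 2.10, 2.11, and so on."""
    checklist = []
    for offset, server in enumerate(servers):
        steps = [step.format(server=server) for step in FAILOVER_STEPS]
        checklist.append((f"2.{first_task + offset}", server, steps))
    return checklist


for task_id, server, steps in build_checklist(SERVERS):
    print(f"{task_id} Failover {server} onto DR Platform")
    for step in steps:
        print(f"  [ ] {step}")
```

The sketch only produces a printable checklist for the Implementation Team; it does not drive the Double-Take Console or any other system.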
5 Process Testing
The process of disaster recovery to the alternative site shall be tested once every twelve months. Staff of all parties shall be made aware that a test is planned so that the performance of the test may be monitored and the effect on other customers of The Team and Hosting Partner is minimised. Initiation of a test shall be the responsibility of The Team and shall be planned with the Facility Services Manager and the Data Centre Support Manager of Hosting Partner. Following the test, a joint report shall be produced listing all activities completed and stating which were successful and which were not. The process shall then be amended to deal with the issues found and a confirmation report produced.
Additional costs, dependent on the level of resource and duration, will apply for a DR trial or a DR invocation. Resource will be required for the invocation of DR and for reconfiguration of the System after restoration of the Production Environment.
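One possible way to record the joint test report and track the twelve-month test interval is sketched below. The class names, fields and sample activities are hypothetical and the dates are invented for illustration; the plan itself does not prescribe a report format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Activity:
    name: str
    successful: bool
    note: str = ""


@dataclass
class DrTestReport:
    test_date: date
    activities: List[Activity] = field(default_factory=list)

    def failed(self) -> List[Activity]:
        """Activities that were not successful and need a process amendment."""
        return [a for a in self.activities if not a.successful]

    def next_test_due(self) -> date:
        """Latest date for the next test: the same day one year later."""
        next_year = self.test_date.year + 1
        try:
            return self.test_date.replace(year=next_year)
        except ValueError:
            # 29 February has no counterpart in a non-leap year; use 28 February.
            return self.test_date.replace(year=next_year, day=28)


# Hypothetical example data, not results from a real test.
report = DrTestReport(
    test_date=date(2024, 3, 1),
    activities=[
        Activity("Failover XXX-XXX-XXX-01 onto DR Platform", True),
        Activity("The Team connection to the alternative site", False, "example note"),
    ],
)
print([a.name for a in report.failed()])
print(report.next_test_due())
```

Listing the unsuccessful activities separately mirrors the requirement that the process be amended for the issues found before the confirmation report is produced.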