State Data Centre Disaster Recovery Handbook
Table of Contents
Chapter 1: Introduction
1.1 Overview
1.2 Disaster Recovery Exclusion List
1.3 Important References
Chapter 2: Business Impact Analysis & Risk Assessment
2.1 Objective
2.2 BIA Summary of Applications for DR
2.3 Risk Assessment
Chapter 3: Disaster Recovery Planning (DRP) Task Force & Event Handling Procedures
3.1 Team Composition
3.2 Disaster Recovery Planning Coordinator
3.3 Crisis Management Team
3.4 Damage Assessment Team (DAT)
3.5 Operations Recovery Team (ORT)
3.6 Help Desk
3.7 Disaster Recovery Process Flowchart <team coordinator>
3.8 Criteria for Disaster Declaration and the Recovery Strategies
3.9 Procedures for Emergency Response, Recovery & Restoration
3.10 Time bound Disaster Recovery Directives
3.11 Application level Classification for Disaster Recovery
Chapter 4: Disaster Recovery Plan Testing & Updation
4.1 Plan Maintenance
4.2 Test Plan
4.3 Drill Plan
Annexure – I: Damage Assessment Report
Annexure – II: Team Details
Annexure – III: DR Site Activation Checklist
Chapter 1: Introduction
As per the National eGovernance Plan (NeGP), State Data Centres (SDCs) have been coming up in all States / UTs of India to support various State Departments in running their applications catering to G2G (Government to Government), G2C (Government to Citizen), and G2B (Government to Business) services. As SDCs mature and States recognize the criticality of their applications and data, it has become imperative to provide a mechanism to secure the States’ critical data. The Disaster Recovery (DR) Handbook is a step in this regard. The DR Handbook is a template DR Plan that the State is expected to populate with relevant entries, including extracts from the DR Strategy Document, as desired.
It has been identified that the 4 NDCs of NIC located at Delhi, Pune, Hyderabad, and Bhubaneswar shall act as DR sites for the SDCs.
This Handbook has been formulated keeping in mind the overall common State Data Centre Applications, Operations and Infrastructure. Individual Application level disasters are to be handled by the User Department. The SDC will support the User Departments in this activity.
1.1 Overview
Every SDC, to protect itself from potential disruptions caused by the occurrence of a disaster, needs to be in a state of DR readiness so as to be in the best position to perform critical operational recovery activities in a time-efficient manner in the event of an actual disaster. An effective SDC DR Plan, thus, requires thorough coverage of all aspects of a Disaster at the SDC, in order to ensure a smooth recovery of operations from the DR Site. The following aspects are essential to cover in this regard:
What - Actions to be taken
How - Procedures to be followed
Who - Stakeholders expected to perform specific tasks
When - Timelines to be adhered to,
before, during, and after a disaster has been declared.
A number of activities need to be performed to reach a state of DR readiness. These activities have been mentioned briefly as follows:
i. Selection of Critical Applications to be considered for Disaster Recovery
After due analysis to be conducted by the Composite Team along with the individual User Departments, critical applications shall be taken for Disaster Recovery. A Business Impact Analysis (BIA) needs to be conducted for the selection of critical applications. The same has been discussed in detail in Chapter 2.
ii. Identification of Risks
Various threats, whether regular or event-triggered, may pose a risk to the SDC Infrastructure and make it more vulnerable to external / internal disruptions. A Risk Assessment needs to be carried out at the SDC to identify which threats have the potential to cause the most damage to the SDC and accordingly adopt Prevention Strategies to mitigate them.
iii. Identification of DR Procedures, mapped with appropriate Timelines
iv. Forming Action Teams for the Disaster Recovery Lifecycle and identifying their Roles and Responsibilities
The succeeding chapters describe in detail all the above mentioned points.
1.2 Disaster Recovery Exclusion List
While the complete DR Plan discusses in detail the various Disaster causing threats, certain scenarios may disrupt SDC operations but are not valid scenarios for the declaration of a Disaster. These are as follows:
Known Data Center equipment malfunctioning, where procedures and guidelines are already known to SDC on the recovery of the same.
Network spikes caused owing to momentary high traffic flows and not due to any SDC common networking equipment / Software issues.
Resignation / Unplanned extended Leave of any Data Center employee, however critical he / she may be to the daily Data Center operations.
Virus / Spamming attacks on a single Server causing an isolated application outage.
Any non-critical application shutting down, irrespective of the down time duration.
Planned Individual critical DR application shut down for a period less than the Defined DR RTO.
Natural Calamity in the neighboring areas not bound to affect Data Center premises / operations.
1.3 Important References
While all activities required for formulating an efficient SDC DR Plan are required to be done independently, there are specific parameters of some activities that overlap with other activities expected to be undertaken at the SDC as part of specific functions at the SDC. Following is a list of reference documents that may assist the State in specific DR Strategy planning activities:
Document Name | Original Purpose of Document | Reference for DR Strategy
SDC DR Strategy Document | Disaster Recovery Strategy Planning | Overall SDC DR Planning
SDC Information Security Management System (ISMS) Risk Assessment Report | ISMS set up at the SDC for ISO 27001 Certification | Risk Assessment and Threat identification for DR Planning
Chapter 2: Business Impact Analysis & Risk Assessment
2.1 Objective
The objective of BIA is to understand the impact that could be caused to the organization if the business processes under consideration are disrupted and the concerned departments are unable to continue with their core processes. It is carried out to develop an understanding of the processes, the resources required to carry them out, and the recovery time frames for the same. The analysis includes gathering information regarding User Department processes and prioritizing them based on the following impacts:
Financial Services
Target Citizen base
Legal and Regulatory
Note: The State may append its own parameters with the ones mentioned above, as per its own individual priorities.
A Draft Template for BIA is placed in the Annexure.
The Composite Team shall undertake the task of conducting the BIA of User Department Applications along with the User Departments.
The approach adopted to achieve the objective of the BIA exercise shall be as follows:
Identify business critical applications through interactions with representatives of the User Departments:
Understand Applications and their relevance as per the above mentioned impacts
Understand operational and functional interdependence between applications
Conduct Business Impact Analysis sessions through personal interactions with a BIA Questionnaire
Identify time-sensitive criticality ratings for critical applications
Compile Recovery Objectives after thorough analysis of all of the above
Prioritize recovery schedules of critical applications
A detailed BIA result of all User Department applications shall be documented in the SDC BIA Report. As and when new Departments come on board, the BIA Report shall reflect the amended Applications analysis data.
The BIA Report shall include the following Application specific data:
1. Recovery Point Objective (RPO): The point beyond which data loss is not permissible. It will act as the basis for the development of appropriate backup strategies.
2. Recovery Time Objective (RTO): The time within which the Systems/ Applications/ Functions must be recovered after an outage. It will act as a basis for the development of suitable recovery strategies.
3. Criticality Ranking of Applications
The objectives above are set keeping in mind the threats and impact on the operations, coupled with the minimum recovery time required for the restoration of services. RTO planned shall take into consideration all the threats, including natural calamities. However, acceptable RTO and RPO for individual applications have been distinctly defined as per the Business Impact Analysis (BIA) findings.
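As an illustration of how the recovery objectives defined above could be recorded and checked, the following minimal Python sketch models one application entry of the BIA Report. The class name, field names, and sample figures are hypothetical and are not taken from any SDC; the two checks simply compare a measured data-loss window and a measured outage duration against the stated RPO and RTO.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical BIA entry for one application (illustrative fields only)."""
    application: str
    rpo: timedelta  # maximum tolerable data loss, expressed as a time window
    rto: timedelta  # maximum tolerable time to restore the service

    def rpo_met(self, last_good_copy: datetime, outage_start: datetime) -> bool:
        # Data written after the last usable copy is lost; that window must fit in the RPO.
        return outage_start - last_good_copy <= self.rpo

    def rto_met(self, outage_start: datetime, service_restored: datetime) -> bool:
        # The outage duration must not exceed the RTO.
        return service_restored - outage_start <= self.rto


# Sample values are invented for illustration.
entry = RecoveryObjectives("Sample Application", rpo=timedelta(hours=2), rto=timedelta(hours=4))
outage = datetime(2024, 1, 1, 10, 0)
print(entry.rpo_met(datetime(2024, 1, 1, 9, 0), outage))   # data-loss window of 1 hour
print(entry.rto_met(outage, datetime(2024, 1, 1, 15, 0)))  # outage of 5 hours
```

Expressing both objectives as durations keeps the comparison independent of how the backup or replication strategy is eventually implemented.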
2.2 BIA Summary of Applications for DR
Following is the Table describing individual Recovery objectives for Applications and Data. The same shall be a compiled summary of the detailed BIA Report:-
S No | Applications | RTO | RPO
1 | | |
2 | | |
3 | | |
4 | | |
5 | | |
6 | | |
7 | | |
8 | | |
9 | | |
10 | | |
11 | | |
12 | | |
13 | | |
14 | | |
2.3 Risk Assessment
The objective of a Risk Assessment is to set priorities for the inherent threats to the SDC and highlight exposures in the SDC environment. For the purpose of assessing the potential risks, a relevant mix of IT related and generic threats that may compromise the resources available at the SDC is chosen.
The approach towards risk assessment shall be taken to ascertain certain parameter ratings for each threat. These parameters are:–
Vulnerability - Indicating exposure of SDC to threats. This is a function of the specific weaknesses existing in spite of the mitigation, which exposes SDC to the respective threats.
Probability - Indicating the probability of a threat occurring. This is a function of the inherent vulnerabilities in the environment and the existing mitigation for the threats.
Impact - Indicating impact of a threat on SDC. This is a function of the technology enablers or facilities resources that may be affected due to occurrence of the threats.
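The Handbook does not prescribe how the three ratings are combined. Purely as an assumption for illustration, the sketch below rates each parameter on a 1 to 5 scale and ranks threats by the product of the three ratings. The scale, the formula, and the sample ratings are hypothetical and should be replaced by the method adopted in the State's own Risk Assessment; the threat names are borrowed from the indicative classification later in this section.

```python
from dataclasses import dataclass


@dataclass
class ThreatRating:
    """Ratings for one threat; the 1-5 scale is an assumption, not from the Handbook."""
    threat: str
    vulnerability: int
    probability: int
    impact: int

    def score(self) -> int:
        # Assumed composite score: simple product of the three ratings.
        return self.vulnerability * self.probability * self.impact


def rank_threats(ratings):
    """Return threats ordered from highest to lowest assumed score."""
    return sorted(ratings, key=lambda r: r.score(), reverse=True)


# Invented sample ratings, for illustration only.
sample = [
    ThreatRating("Fire", vulnerability=2, probability=2, impact=5),
    ThreatRating("Power Outage", vulnerability=3, probability=4, impact=3),
    ThreatRating("Weak Data Back-up", vulnerability=4, probability=3, impact=4),
]
for r in rank_threats(sample):
    print(r.threat, r.score())
```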
The assessment shall be conducted by the Composite Team, and shall be based on:-
i. Discussions with SIA, DCO, and Application owners
ii. Physical visits and observations at the Data Center site
iii. Past history of disasters
iv. Known relevant intelligence available in reliable public domain like Government websites.
The Risk Assessment sheet prepared for the Information Security Management System implementation for SDC at the State may be referred to for the above activity.
Following is an indicative classification of the threats identified:-
Physical and Environmental Threats
- Fire
- Earthquake
- Cyclones
- Power Outage
- Physical Location Insecurities
- Physical Security
IT Services Threats
- Weak Data Back-up
- Inefficient Storage Management
- Weak Server Management
- Vulnerable Operating Systems / Software
Chapter 3: Disaster Recovery Planning (DRP) Task Force & Event Handling Procedures
In order to facilitate the efficient recovery and restoration of critical business functions, key SDC staff members have been assigned to different teams. Any DRP event would be handled by four teams:
Crisis Management Team (CMT), Damage Assessment Team (DAT), Operations Recovery Team (ORT), and Help Desk
This section covers the composition, indicative roles & responsibilities, and the actionable steps to be followed by each of these teams. The functions of the above teams would vary with the extent and impact of the different disasters that could hinder SDC operations.
The above mentioned teams shall be led by the Disaster Recovery Planning Coordinator, who shall be the responsible authority for timely recovery in the event of a disaster.
3.1 Team Composition
Each team must have a designated Team Leader (Team coordinator) to drive the planning process as well as the team’s response in the event of disaster. The first person listed on the team list is the Team Coordinator. Each coordinator is responsible for ensuring that the tasks and procedures detailed in the plan accurately reflect actions that will be taken during an actual disaster. Team listings must contain the names, phone number(s), and addresses of all team members. Because of the uncertainty of staff availability, team leaders are equipped to assign individual roles to team members at the time of Plan activation.
3.2 Disaster Recovery Planning Coordinator
The key to success in developing and maintaining an effective and efficient Disaster Recovery capability is the leadership provided by the Disaster Recovery Planning Coordinator, who works closely with SDC stakeholders in ensuring absolute readiness in the wake of a disaster.
Following are the key responsibilities of the DRP coordinator:-
Provide overall guidance during the emergency response and recovery efforts
Review damage assessment reports
Initiate recall procedures
Keep senior management and the concerned Department officials advised of recovery status, and
Provide overall coordination support and assistance
Note: In the absence of a DRP Coordinator, the Crisis Management Team Coordinator shall act as the DRP Coordinator.
3.3 Crisis Management Team
The Crisis Management Team (CMT) comprises senior staff who command the resources needed to recover SDC’s operations in the event of a Disaster. The CMT members shall be listed in Annexure II. Members of CMT as well as other teams have been annexed so that various DR Stakeholders may keep isolated teams’ information without having to keep the entire SDC DR Strategy document.
The Crisis Management Team, headed by the CMT Coordinator, can operate from any location, provided its members are available for communication.
Note: The SDC Project Coordinator from the State Implementing Agency shall be the Crisis Management Team Coordinator in the absence of the nominated CMT Coordinator.
3.3.1 Roles & Responsibilities of Crisis Management Team
Coordinator
The CMT Coordinator shall have the overall responsibility for all response and recovery actions taken. However, he/she may delegate the team management and co-ordination responsibilities to other members of the Team. This will entail the complete delegation of decision-making power and authority for taking quick decisions as and when necessary.
The CMT coordinator has a number of other responsibilities, such as liaising with other departments in the SDC Group for any recovery support. He/She would also be responsible for coordinating with all critical vendors for relevant support during the resumption of services.
Responsibilities include:
Disaster Declaration
Overall responsibility for response & recovery actions
Assisting in decision making, and data processing on impacts
Authorizing crucial action steps
Making arrangements for immediate relief to next of kin of any deceased staff
Briefing staff of overall situation & giving overall guidance
Vetting sensitive communications
Assisting in crucial negotiations (financial & legal)
Keeping the SIA informed of the status of the situation
Crisis Management Team Members
The CMT’s responsibility is to manage and co-ordinate the response to, and recovery from, a crisis. This role will continue throughout the restoration until the situation returns to normal. That is, until SDC can cope with the situation without additional senior management supervision.
The CMT carries out project management and decision-making, overseeing a senior State Data Centre management team that has the experience and expertise to provide necessary support in driving the recovery.
The Crisis Management Team does not perform any recovery tasks, focusing rather on the co-ordination and management roles.
Its role involves gathering relevant information and options from the various Operations Recovery and Damage Assessment teams to enable accurate decision-making, and delegating and following up on tasks to ensure ground-level implementation.
The individual recovery teams need to focus on their specific roles and responsibilities. However, it is important for them to understand the overall recovery strategy and appreciate the functions of other teams. The Crisis Management Team is responsible for communicating this information on a regular basis in order to prevent information isolation.
The role requires absolute control over all aspects of recovery. The only way to achieve this is for all decisions to funnel through the Crisis Management Team. This will help reduce problems caused by individuals taking initiatives that upset the overall recovery progress. In short, the recovery teams must do what they are instructed to do, and all decisions must be referred to the Crisis Management Team.
Responsibilities include:
Formalizing operational requirements
Damage Assessment
Coordinating and managing recovery of Facility, Operations and IT infrastructure
Coordinating recovery of critical processes in different departments
Deciding on teams to be invoked as deemed necessary for DR
Liaising with vendors for emergency / recovery support
Monitoring Staff Welfare
Proposing legal action, if required
3.3.2 Crisis Management Team Recovery Actions
Based on the initial information about the disaster, CMT would identify whether enablers / facilities have been affected or whether there is a risk of damage to premises or danger to employees. They would authorize the relevant parts of the CMT recovery actions as per the type and intensity of the disaster.
The recovery actions of the Crisis Management Team can be classified into the following different categories:
I. Emergency Actions
II. Situation Assessment
III. Plan Activation
IV. Status Monitoring
V. Recovery Support
I. Emergency Actions
1. Notify critical emergency contacts (internal and external)
2. Decide on location where the CMT will operate from, including exploring realistic possibilities of prolonged video conferencing support
3. Inform all CMT members of the selected location and time of the initial CMT meeting
4. Contact the DAT Coordinator to verify:
Evacuation of employees undertaken, if deemed necessary
Emergency security at the primary site
Resumption of entry to premises
5. Receive Initial Assessments from the Damage Assessment Team Coordinator including list of missing persons / casualties, if any.
6. Decide Operations Recovery Team Coordinators to be mobilized and provide immediate instructions to the same
7. Establish the readiness of the DR Site through coordination with the NIC members at the mapped NDC, before the initial CMT meeting.
II. Situation Assessment
1. Hold the initial CMT meeting
2. Complete an interim impact assessment. Consider the following:-
Loss of life/ casualties, if any
Extent of damage to premises
Loss of IT Hardware
Loss of applications
Loss of communication links
Loss of critical data
Loss of other assets
3. Refine the Recovery Strategy according to the situation. Decide which DR Strategy is to be invoked and brought into action. The SDC DR Strategy Document may be referred to in this regard.
4. Contact the personnel at the designated DR Site to:-
Verify the level of resources and materials required. Facilitate the same.
Plan occupation of alternate sites by Operations Recovery Teams
Verify retrieval of emergency resources from off-site storage
Verify timeframes for the availability of critical servers
Verify voice line redirection, message content and call routing / handling to DR site
5. Establish timelines for the facilities and equipment available at the DR site to be operational, keeping individual RPOs and RTOs in concurrence.
6. Contact Damage Assessment Team (DAT) and establish whether access has been allowed to the damaged premises, and if so:
What has been salvaged and its condition
What has been irretrievably lost or destroyed
What is intact, but inaccessible
Infrastructure damage and access availability
Expected rebuild timeframes (including possibilities of alternate Data Center site development)
7. Hold the Operations Recovery Team Coordinators’ briefing, which will include:
Internal press release, résumé of events and status
Damage and impact assessment
Salvage status
Recovery strategy and critical milestones
Roles and responsibilities
Operation recovery targets
Staff transport arrangements to the alternate site(s)
Timeframes for critical resource recovery (Systems etc)
Funding and emergency purchase limits
Team reporting and problem escalation guidelines
Voice and fax communications availability and usage
Progress reporting
8. The message to Team Coordinators must:
Provide the minimum data to initiate the response and explain the current situation
Verify the Team’s individual Emergency Response Tasks
Identify any business-critical activity demanding priority
Confirm CMT and Team Coordinator’s immediate contact details
Give notice of the CMT and Team Coordinator’s briefing time and place
III. Plan Activation
1. Determine if assistance from third parties is required.
2. Confirm with the Operations Recovery Team coordinator at DR Site on the following:
The recovery status of critical applications affected by the disaster event and being recovered at the DR sites.
Obtain status of the following:
Redirection of data communications
Retrieval of back-up media
Access to critical servers
Establishing of IT Help Desk
Intimation of IT Emergency procedures
3. Intimate User Departments for critical issues needing their involvement for data gathering & analysis
IV. Status Monitoring
1. Contact Operations Recovery Team coordinator at DR Site for:-
Progress against Critical Timeframes
Assessment of availability and performance of systems and IT equipment
Identification of current and anticipated resource needs.
Assessment of current and anticipated problem areas in terms of technology and resource availability.
Establishment of overall recovery progress.
Review and adjustments in the Recovery Strategy.
2. Determine the extent of backlogs and their impact on recovery timeframes.
3. Continue contact with the Damage Assessment Team to review status of damage at the affected premises.
4. If the affected premises cannot be recovered or it will not become habitable within an acceptable timeline, make arrangements for a long-term recovery operation. Otherwise, initiate request to State for reconstruction and refit of affected premises.
5. Liaise with the Operations Recovery Team coordinator to begin to develop a long-term recovery plan. Convene a Crisis Management Team meeting to confirm and communicate updates to the recovery objectives and strategies.
6. Consolidate the detailed damage assessment and salvage report from the DAT.
7. Assess recovery expenditure outlay to date.
8. Finalize Recovery action.
9. Update the DRP document for lessons learnt with respect to the DR process, if any.
10. Update internal operating and emergency procedures.
V. Recovery Support:
1. Maintain contact with other Teams’ coordinators:
Respond promptly to requests for information
Inform of notable occurrences, which may affect priorities
2. Perform the following activities at each milestone:
Receive Recovery Team reports of recovery progress against target time scales
Review and update operational requirements
Update the timeframe schedule
Prepare updates for all Team coordinators
Assess well-being of staff and identify need for Administration support
Determine the need for third party assistance, and communicate the same
Provide approved statements for use by the Recovery Team
3. Control all expenditure decisions and maintain regular contact with Finance Department
3.4 Damage Assessment Team (DAT)
Perhaps the most important issue to be resolved immediately following a disaster is the status of affected SDC's resources, such as:-
Information Technology
Telecommunications equipment
People
The primary responsibility of the Damage Assessment Team is to assess the damage caused by the disaster and obtain key information concerning the level of serviceability of the facility and its resources.
A Damage Assessment Team coordinator heads the Team, which includes members who are knowledgeable in the following areas:
IT security, IT Infrastructure, and primary vendors
Physical security for the damaged site
Strong abilities to recover and salvage computer equipment and data/voice communications networks
Knowhow of all relevant vendors and suppliers to determine equipment recovery requirements.
Clarity of vision to drive the coordination channel between the various teams.
The Damage Assessment Team makes an initial estimate of the time necessary to repair and/or replace the infrastructure needed for the resumption of operations. A Damage Assessment Report is prepared and submitted to the Crisis Management Team. The Damage Assessment Report shall follow the template given in Annexure I.
3.5 Operations Recovery Team (ORT)
The Operations Recovery Team would comprise the head of the primary Data Center facility, or a senior official nominated by him/her, leading the team as the Team coordinator, assisted by key personnel from the designated DR site. The objective of this team is to ensure that the IT Infrastructure is properly handled during the recovery process and the required resources are available on time. The focus of this Team is to recover the IT enablers supporting SDC’s critical business processes, so that they are up and running in concurrence with the identified Recovery objectives (RTO and RPO).
The Team must be familiar with the Disaster Recovery Planning document. The other Teams must be able to assist the Operations Recovery Team, led by the coordinator, with the implementation of the plan. A detailed recovery checklist is given in Annexure-III.
A detailed list of the Operations Recovery Team composition and contact numbers is given in Annexure-II.
3.6 Help Desk
The Information Help Desk would play a crucial role in providing information proactively to various SDC Stakeholders. The information help desk should be manned by employees with good communication skills. The Help Desk shall assist the DR teams in communication setups like TeleConference, Video Conferencing, etc. Travel arrangements shall also be done by the Help Desk members.
The Help Desk team details are available in Annexure.
All employees should be advised to call the INFORMATION HELP DESK rather than operational staff or various Disaster Teams’ members for information. This will enable the operational staff members to focus on recovery procedures, rather than just providing information.
The State shall consider the Roles & Responsibilities of all above mentioned teams and compile a Run Book of the events to take place post Declaration of Disaster, for DR and running of operations from the DR Site.
The existing Help Desk at the SDC shall perform the emergency Help Desk operations described in detail in the SDC DR Handbook Document. Relevant training and drills shall be conducted for the Help Desk for coordination among various DR teams so as to assist in seamless recovery of operations from the DR Site in event of a Disaster.
3.7 Disaster Recovery Process Flowchart <team coordinator>
3.7.1 First Contact
If any employee suspects a disruption of services falling under the potential Disaster Recovery criteria, the concerned employee should immediately inform his/her team lead. The lead should then communicate with the network / application owners in SDC for operational status / initial failure analysis. In case of a physical security incident, the Help Desk shall inform the Fire Brigade, Police and other Emergency Services, as the case may be. The Crisis Management Team (CMT) coordinator should be contacted depending on the initial assessment carried out by Security / IT Department / Administrator or Facility in-charge.
3.7.2 DR Execution Verification
Damage Assessment Team coordinator shall inform the Crisis Management Team Coordinator. The steps to be taken are:
1. Report a possible Disaster with a copy of the Initial Damage Assessment checklist.
2. Meet the individual who gave the first alert and the personnel of the civic emergency services (fire station, police etc.) to assess the extent of damage to the business.
3. Enter the premises only with the consent of civic service personnel and evaluate the extent of damage.
Based on the intensity of an event, the initial assessment of the Damage Assessment Team (DAT), and the expected time to recover normal operations, the CMT shall suggest a DR event to the CMT Coordinator, who in turn will declare a Disaster. In the absence of the CMT Coordinator, the SDC Project Manager shall declare the disaster.
3.8 Criteria for Disaster Declaration and the Recovery Strategies
The Disaster Recovery (DR) site for the Data Center has been established at < Designated DR Site>.
3.8.1 Decision for shifting to Disaster Recovery Site
Approval for shifting operations to the DR site shall be obtained from the DRP coordinator. The procedure for transmission of data from the primary site to the DR Site, for restoration of database at the DR Site, as well as for switching over the system (with network changes) to DR Site shall be clearly documented.
The SDC DR Strategy Document describes the various DR Strategies that may be adopted for different Disaster scenarios.
The service recovery has to be initiated from the DR Site only after Disaster Declaration.
3.9 Procedures for Emergency Response, Recovery & Restoration
Key administrators at the Data Center are identified in the following areas:-
System
Network & Security
Database
Storage & Backup
Infrastructure
Logistics / Administration
All of the above are required to first assess the disruption in their respective identified DR services, and subsequently follow the problem resolution steps and appropriate recovery procedures for various threats.
The Crisis Management Team should be notified if the disruption is major enough that services cannot be restored within the minimum RTO.
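The notification rule above can be read as a simple comparison. The sketch below assumes that the administrator can estimate a restoration time and that "minimum RTO" means the smallest RTO among the affected applications; the function name and sample figures are hypothetical.

```python
from datetime import timedelta


def should_notify_cmt(estimated_restoration: timedelta, affected_rtos: list) -> bool:
    """Return True when restoration is not expected within the smallest RTO
    of the affected applications (an assumed reading of 'minimum RTO')."""
    if not affected_rtos:
        return False  # nothing covered by DR is affected
    return estimated_restoration > min(affected_rtos)


# Invented figures: two affected applications with RTOs of 4 and 8 hours.
rtos = [timedelta(hours=4), timedelta(hours=8)]
print(should_notify_cmt(timedelta(hours=2), rtos))
print(should_notify_cmt(timedelta(hours=6), rtos))
```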
3.10 Time bound Disaster Recovery Directives
Clearly defined timelines are needed for the steps to be taken from Outage identification to Disaster Declaration. The State shall come up with a timeline, as below, mapping the activities to the periods within which they must conclude.
Timeline | Activities to be performed | Ownership
0-2 (minutes) | Problem / outage noticed | Monitoring Team Members
0-2 (minutes) | Respective employee informs lead and / or physical security head | Employee
2-10 (minutes) | Lead performs initial analysis and communicates to administrators / administration In-charge / physical security head | Team Lead
2-10 (minutes) | Initial analysis mail sent to all DAT and CMT members. | Team Lead
10-15 (minutes) | DAT meeting for damage analysis, control, and recovery timelines | Damage Assessment Team coordinator
10-15 (minutes) | Communications to third party and vendors for recovery assistance | Damage Assessment Team coordinator
10-15 (minutes) | Status mail sent to all DAT and CMT members, copying DRP coordinator. | Damage Assessment Team coordinator
10-15 (minutes) | CMT plans to meet, coordinating members' presence for call | Crisis Management Team first contact
15-20 (minutes) | DAT continues analysis, sends initial report to CMT | Damage Assessment Team coordinator
15-20 (minutes) | CMT meeting with DAT coordinator, with DRP coordinator as passive participant | Crisis Management Team
15-20 (minutes) | CMT discusses with DRP coordinator whether to proceed for Disaster declaration | Crisis Management Team
20-30 (minutes) | Disaster Declared. Communication through mail / notice board announcement / verbal announcement sent to all SDC employees | Crisis Management Team Coordinator
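One way to keep such a timeline actionable is to hold it as data, so that the activities due at any point after the outage is noticed can be listed. The sketch below encodes the indicative windows from the timeline above. The structure and function names are hypothetical, the activity text is abbreviated, and a State would substitute its own timeline.

```python
# Indicative windows from the timeline above, as (start_minute, end_minute, activity, owner).
TIMELINE = [
    (0, 2, "Problem / outage noticed", "Monitoring Team Members"),
    (0, 2, "Employee informs lead and / or physical security head", "Employee"),
    (2, 10, "Initial analysis and communication to administrators", "Team Lead"),
    (2, 10, "Initial analysis mail to DAT and CMT members", "Team Lead"),
    (10, 15, "DAT meeting for damage analysis, control, and recovery timelines", "DAT coordinator"),
    (10, 15, "Communications to third party and vendors", "DAT coordinator"),
    (10, 15, "Status mail to DAT and CMT members, copying DRP coordinator", "DAT coordinator"),
    (10, 15, "CMT plans to meet", "CMT first contact"),
    (15, 20, "DAT sends initial report to CMT", "DAT coordinator"),
    (15, 20, "CMT meeting with DAT coordinator", "CMT"),
    (15, 20, "CMT discusses Disaster declaration with DRP coordinator", "CMT"),
    (20, 30, "Disaster declared and communicated to all SDC employees", "CMT Coordinator"),
]


def activities_due(minutes_elapsed: float):
    """Return (activity, owner) pairs whose window contains the elapsed time.

    Windows are treated as start-inclusive and end-exclusive, which is an
    assumption; the timeline does not say how boundaries are handled.
    """
    return [(a, o) for start, end, a, o in TIMELINE if start <= minutes_elapsed < end]


def overdue(minutes_elapsed: float):
    """Return (activity, owner) pairs whose window has already closed."""
    return [(a, o) for _, end, a, o in TIMELINE if minutes_elapsed >= end]


for activity, owner in activities_due(12):
    print(f"{owner}: {activity}")
```

Keeping the windows in a single list makes it straightforward to adjust them after a drill without touching the lookup logic.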
3.11 Application level Classification for Disaster Recovery
Criticality of all Applications shall be categorized into 3 classes:
Class I - Highly Critical
Class II - Critical
Class III - Not Critical
All User Department applications covered under Class I and II shall be undertaken for Disaster Recovery, or as per individual decisions of the State. Also, all applications needn’t necessarily be in the same recovery timeline bracket, i.e., some applications can be recovered earlier than the others. Thus, all applications have been classified under 3 priority scales. These scales represent the time brackets to be considered while recovering the critical applications at the respective DR sites.
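The selection and ordering described above might be sketched as follows. The class labels come from this section; the priority scale numbering (1 recovered first), the application names, and the function name are hypothetical, since the Handbook leaves the actual time brackets to the State.

```python
from dataclasses import dataclass

# Classes I and II are taken up for DR by default, per the section above.
DR_CLASSES = {"I", "II"}


@dataclass
class Application:
    name: str
    criticality_class: str  # "I" (Highly Critical), "II" (Critical), "III" (Not Critical)
    priority_scale: int     # assumed: 1 is recovered first, 3 last


def recovery_order(apps):
    """Keep Class I and II applications and order them by priority scale, then class."""
    selected = [a for a in apps if a.criticality_class in DR_CLASSES]
    return sorted(selected, key=lambda a: (a.priority_scale, a.criticality_class))


# Invented sample inventory.
apps = [
    Application("App C", "III", 3),
    Application("App B", "II", 2),
    Application("App A", "I", 1),
    Application("App D", "II", 1),
]
print([a.name for a in recovery_order(apps)])
```

A State that decides to include a Class III application would only need to extend the set of classes selected for DR.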
Chapter 4: Disaster Recovery Plan Testing & Updation
An effective DRP is an evolving document that accommodates all changes within the SDC in terms of people, process, and technology. To keep Disaster Recovery free from real-time operational obstacles, the concerned stakeholders need to undertake regular or need-based revisions.
The ownership of DRP revisions shall lie with the Disaster Recovery Planning coordinator.
Plan updates will be the result of monitoring and testing of DR related activities. It is the responsibility of the DRP coordinator to revise the Disaster Recovery Plan appropriately once changes have been identified. Changes to the plan need to be discussed with the relevant personnel.
4.1 Plan Maintenance
Plan maintenance includes time-driven activities aimed at periodic revision of the Disaster Recovery Plan, to keep it up to date with the ever-changing dynamics of the organization.
The frequency and type of reviews that need to be performed to maintain a Disaster Recovery Plan can be decided by the State as it finds feasible.
4.2 Test Plan
Assumptions
The test plan shall be formulated based on the DR strategy selected by the SDC and on the following assumptions:
Adequate steps are taken so as to not affect the production environment
All components of DR selected as part of DR strategy are implemented
Dedicated resources are available during the DR testing
Required test environment and tools are available
Tests are being carried out in a low customer traffic time frame
4.3 Drill Plan
Initially, the SDC shall have a bi-annual (twice a year) cycle of disaster mock drills between the production and DR sites. Over time, as the process matures through multiple drills, this can become an annual activity. At least one participant from each team identified for disaster recovery activity should be available for the drill. Drill team members may be identified and informed prior to the drill activity.
For any drill, there shall be two nominated positions: a DR Planner and a Recorder. It is the Disaster Recovery Planning coordinator's responsibility to nominate himself or herself, or others, as DR Planner, and a Crisis Management Team member as Recorder.
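The drill preconditions described in this section (at least one participant from each team, plus a nominated DR Planner and Recorder) lend themselves to a simple pre-drill check. The sketch below is an illustration in Python under assumed data shapes; the function name drill_gaps and its parameters are hypothetical, and it does not verify that the Recorder is a Crisis Management Team member.

```python
def drill_gaps(teams, attendees, planner, recorder):
    """List reasons a drill is not ready; an empty list means no gaps found.

    teams:     dict mapping team name -> set of member names
    attendees: set of names confirmed for the drill
    planner, recorder: nominated names, or None if not yet nominated
    """
    gaps = []
    # Every team must have at least one confirmed participant.
    for team, members in sorted(teams.items()):
        if not (members & attendees):
            gaps.append(f"no participant from {team}")
    # Both nominated positions must be filled.
    if planner is None:
        gaps.append("DR Planner not nominated")
    if recorder is None:
        gaps.append("Recorder not nominated")
    return gaps
```

The DRP coordinator could run such a check when team members are informed ahead of the drill, and resolve any reported gaps before the drill date.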
Annexure – I : Damage Assessment Report
Damage Assessment Report
Event Date
Damage Assessment Team Member
Event Reported by
SDC Area Affected
Initial Description of Event as given by the user
Event Impact Description
Damaged Area | Damage Description
Loss of life / casualties, if any
Extent of damage to premises
Loss of IT Hardware
Loss of applications
Loss of communication links
Loss of critical data
Loss of other assets
Root cause of the event as per Initial Assessment
Expected Recovery Time
Recommendation of DAT Team
DAT Team Member Name:
Signature:
Annexure – II: Team Details
An indicative list of team members has been provided in the SDC DR Strategy Document for the State's reference. However, the State may choose to nominate members as per its own convenience.
A. Crisis Management Team (CMT)
Coordinator
Team Member Contact Details
Name | Designation | Office | Mobile
B. Damage Assessment Team (DAT)
Team Coordinator
Team Member Contact Details
Name | Designation | Office | Mobile
C. Operations Recovery Team (ORT)
Team Coordinator
Team Member Contact Details
Name | Designation | Office | Mobile
Team Coordinator
Team Member Contact Details
Annexure - III: DR Site Activation Checklist
All DR Teams must be aware of the Disaster Recovery Planning document. Other teams must be able to assist the Operations Recovery Team, led by its coordinator, with the implementation of the plan.
Activity | Responsibility | Activity Status
Meeting with CMT | Coordinator, Operations Recovery Team |
Summarization of Root cause analysis as given by DAT Team | Coordinator, Operations Recovery Team |
Refer to Recovery steps to adopt | Coordinator, Operations Recovery Team |
Perform Necessary configuration changes for Recovery | Operations Recovery Team |
Update CMT representative and Handover | Operations Recovery Team |
Recovery testing | Operations Recovery Team |
Update CMT representative | Coordinator, Operations Recovery Team |
Notify users about Service resumption | |
Annexure IV: Business Impact Analysis
Business Impact Analysis (BIA) Template
Application
User Department
Critical Processes Supported
Application Developer
Hosting Model (Co-Located / Shared)
X <State> SDC Address
Head of Application
Maintenance activity
Interfaces with other applications
Server Name
OS Details
Database Server and details
Brief on X's functionality
Business Impact on non-availability of business application system
Type of Impact | Description of impact
Services
Citizen base
Legal and regulatory
Others
Please indicate the rating of losses in terms of Low, Medium or High
Impact | Up to 1 hour | 1-3 hours | 3-8 hours | 8-24 hours | 1-3 days | 3 days - 1 week | 1-3 weeks | More than 3 weeks
Financial
Services
Citizen base
Legal & regulatory
Maximum Acceptable Downtime for the business application (RTO):
Maximum Acceptable Timeframe for data loss (RPO):
Overall Criticality Rating :
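A State that wishes to keep BIA responses in machine-readable form could capture the template along the following lines. This is a minimal Python sketch under stated assumptions: the class BiaRecord and its methods are hypothetical, RTO and RPO are assumed to be recorded in hours, and only a few of the template's fields are modelled. The handbook does not define how the overall criticality rating is derived, so the sketch does not compute one.

```python
from dataclasses import dataclass, field

RATINGS = ("Low", "Medium", "High")
# Time brackets from the loss-rating grid of the template.
BRACKETS = (
    "Up to 1 hour", "1-3 hours", "3-8 hours", "8-24 hours",
    "1-3 days", "3 days - 1 week", "1-3 weeks", "More than 3 weeks",
)


@dataclass
class BiaRecord:
    application: str
    user_department: str
    rto_hours: float   # Maximum Acceptable Downtime (RTO); hours is an assumption
    rpo_hours: float   # Maximum Acceptable Timeframe for data loss (RPO)
    # impact type -> {bracket -> rating}; partially filled grids are allowed
    losses: dict = field(default_factory=dict)

    def rate(self, impact, bracket, rating):
        """Record one cell of the loss-rating grid after validating it."""
        if bracket not in BRACKETS:
            raise ValueError(f"unknown bracket: {bracket}")
        if rating not in RATINGS:
            raise ValueError(f"rating must be one of {RATINGS}")
        self.losses.setdefault(impact, {})[bracket] = rating

    def first_high(self, impact):
        """Earliest bracket rated High for this impact type, or None."""
        cells = self.losses.get(impact, {})
        for bracket in BRACKETS:
            if cells.get(bracket) == "High":
                return bracket
        return None
```

The first_high helper illustrates one way the grid might inform discussion of the RTO: the earliest bracket at which an impact becomes High is a natural point of comparison with the stated maximum acceptable downtime.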