Sound Transit Internal Audit Report - No. 2014-6

Maturity Assessment: Information Technology Division

Disaster Recovery Planning

Report Date: June 5, 2015

Table of Contents

Executive Summary

Background

Audit Approach and Methodology

Maturity Assessment

Management Response

Audit Timeline

Audit Entrance Meeting: 02/13/15

Exit Meeting: 06/04/15

First draft report issued: 06/05/15

Final Management responses received: 08/12/15

Final report issued: 08/19/15

Presented to Audit & Reporting Committee: 10/15/15

Executive Summary

The Information Technology Division’s Disaster Recovery Plan was included on the Internal Audit Division’s Work Plan in 2012, 2013 and 2014. The audit was deferred in both 2012 and 2013 because the IT Division was in the process of updating their plan. When we were advised that the IT Disaster Recovery Plan was to be updated again in 2014, we again considered deferring the audit, but instead determined to conduct a maturity assessment of the current state of disaster recovery planning in the IT Division.

To perform the assessment, we used two well-established industry standards. First, we used “COBIT 5” as the framework for establishing disaster recovery requirements. Second, we used the capability rating standard established by the International Organization for Standardization (ISO).

According to IT Division management, their 2014 effort was focused on their data center as an expedient method to develop a disaster recovery plan for the most critical agency applications. The agency built two new data centers in 2013 and 2014, which provide “fail-over” redundancy. Our maturity assessment found that the 2014 Data Center Disaster Recovery Plan did not score well in terms of COBIT 5 requirements, primarily for three reasons. First, it was developed utilizing a “top-down” approach that assumed all agency business needs would be captured within the data center, when in fact they were not. Second, the plan assumed recovery time objectives,¹ rather than analyzing business practices to determine how long agency personnel could operate while awaiting service restoration following a disruption. Third, it assumed all applications within the data center were of equal criticality, thus it did not provide guidance regarding the priority to restore service to each application.

Because of these limitations, the current IT Division disaster recovery planning effort scored low in this assessment. Please refer to the detailed reporting within. Note that IT Division management is aware of this and has contracted with a consulting firm to create a new Disaster Recovery Plan. We will plan to review this effort and report a revised maturity assessment in future years.

This audit only pertains to information technology under the control of the IT Division, which includes information technology infrastructure located in the data center and certain transit systems located throughout the region (TVM, CCTV). This audit does not include the Supervisory Control and Data Acquisition or Positive Train Control, because disaster recovery planning for these systems is the responsibility of the Operations Department and is considered outside the scope of this maturity assessment.

¹ Recovery Time Objective: the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.

Background

IT Division management is currently in the process of aligning business processes with the COBIT 5 framework and also with division strategic planning and performance monitoring. COBIT 5 provides a manageable and logical structure for internal controls. The COBIT 5 business process that aligns with the IT Division Disaster Recovery Program at Sound Transit is DSS04, which is titled “Manage Continuity”.

The main focus of the IT Division Disaster Recovery Program is to continue agency operations effectively and efficiently after a disaster or unexpected business interruption. Prior to the 2014 Data Center disaster recovery planning effort, the program was last updated in 2007. A plan to update the program was presented to the Technology Governance Team (TGT) in June 2014. The plan included a three-year, three-phase process that focused on IT infrastructure in year one, IT business applications in year two and Supervisory Control and Data Acquisition (SCADA) in year three.

In 2014, the IT Division worked with an IT disaster recovery consultant to complete the Data Center Disaster Recovery Plan. In 2015, the IT Division is working with a new IT disaster recovery planning consultant to complete the IT business applications and SCADA disaster recovery plans. The consultant will:

1. Define and document the IT Continuity/Disaster Recovery Program policy, objectives and scope.

2. Maintain a continuity strategy.

3. Develop and implement a business continuity response.

4. Manage backup arrangements.

Audit Approach & Methodology

Internal Audit approached the audit by gaining an understanding of the current plans and processes described above. We reviewed COBIT 5 guidance and other resources to gain an understanding of industry best practices in disaster recovery planning. We met with management to discuss audit scope, objectives, timing and to obtain general knowledge of current practices. Based on analysis of the data gathered and discussion with ST management, the following objective was developed:

1. Perform a COBIT 5 maturity assessment regarding IT Division disaster recovery planning and management.

During the fieldwork phase of the audit, all collected information was examined, including the IT Data Center Disaster Recovery Plan, TGT presentations and IT Division procedure documents. All information collected was used to formulate conclusions and recommendations.

The final phase was reporting. All information was summarized and organized. Preliminary results were communicated with management, findings were clarified, and conclusions and recommendations were presented. The report was provided for appropriate Sound Transit personnel for review and comment. The report was revised to include the required management responses.

We conducted this performance audit in accordance with Generally Accepted Government Auditing Standards and the International Standards for the Professional Practice of Internal Auditing. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objectives. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objectives.

IT Division Disaster Recovery Planning Maturity Assessment

This audit evaluated the maturity of eight management activities within the COBIT 5 “Manage Continuity” process (see Table below). According to COBIT 5 standards and ISO rating methodology,² seven management activities are rated Level 1 (Partial) and one is rated Level 0.

The management activities with maturity levels rated Level 1 are qualified as “partially achieved” because controls within the processes are not adequate to ensure predictable outcomes. The key to achieving Levels 2 and 3 is improved documentation and development of self-audit processes that evaluate the effectiveness of control processes.

The following table describes the current state of the eight defined management activities applicable to the COBIT 5, “Manage Continuity” process, which is described as “Establish and maintain a plan to enable the business and IT to respond to incidents and disruptions in order to continue operation of critical business processes and required IT services and maintain availability of information at a level acceptable to the enterprise.”

Management Activity 1: Define the business continuity policy, objectives and scope.

Description of Current State: The Data Center Disaster Recovery Plan is not aligned with the agency-wide Emergency Management Plan because:

• It was not developed based on analysis of agency services/business processes that are critical to the agency.

• All business processes and systems are not included.

• Performance metrics to track progress of the Data Center Disaster Recovery Plan have not been developed.

Capability Level: 1 - Partially Achieved

Management Activity 2: Maintain a continuity strategy.

Description of Current State: The Data Center Disaster Recovery Plan does not assess the likelihood of disasters, is not based on business impact analyses or recovery time objectives for critical agency services and business processes.

Capability Level: 1 - Partially Achieved

Management Activity 3: Develop and implement a business continuity response.

Description of Current State: The Data Center Disaster Recovery Plan does not include agency operational continuity plans, key suppliers or outsourced partner’s plans or backup requirements.

Capability Level: 1 - Partially Achieved

Management Activity 4: Exercise, test & review the Business Continuity Plan.

Description of Current State: Annual continuity testing plan has not been developed, documentation of existing testing should be improved, and performance metrics should be used to determine whether test results were addressed adequately and timely.

Capability Level: 1 - Partially Achieved

Management Activity 5: Review, maintain & improve the continuity plan.

Description of Current State: The Data Center Disaster Recovery Plan has not been reviewed and approved and needs further improvement.

Capability Level: 1 - Partially Achieved

Management Activity 6: Conduct continuity plan training.

Description of Current State: A continuity training program has not been developed. Staff competencies required for continuity training and testing have not been defined or training plans documented.

Capability Level: 0

Management Activity 7: Manage backup arrangements.

Description of Current State: The IT Division procedure document addressing backup and retention requirements was last reviewed in 2012.

Capability Level: 1 - Partially Achieved

Management Activity 8: Conduct post-resumption review.

Description of Current State: The Data Center Disaster Recovery Plan includes steps to conduct post-resumption review, however it does not address all applicable systems, and the procedures have not been tested.

Capability Level: 1 - Partially Achieved
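As a quick cross-check of the ratings in the table above, the following Python sketch tallies the capability levels. The activity names and levels are taken from the table; the dictionary layout and variable names are illustrative assumptions and are not part of the audit workpapers.

```python
from collections import Counter

# Capability levels as reported in the table above
# (COBIT 5 DSS04, "Manage Continuity").
ratings = {
    "Define the business continuity policy, objectives and scope": 1,
    "Maintain a continuity strategy": 1,
    "Develop and implement a business continuity response": 1,
    "Exercise, test and review the Business Continuity Plan": 1,
    "Review, maintain and improve the continuity plan": 1,
    "Conduct continuity plan training": 0,
    "Manage backup arrangements": 1,
    "Conduct post-resumption review": 1,
}

# Count how many activities fall at each capability level.
level_counts = Counter(ratings.values())

# Identify the activity with the lowest rating.
lowest = min(ratings, key=ratings.get)

print(f"Activities rated Level 1: {level_counts[1]}")
print(f"Activities rated Level 0: {level_counts[0]}")
print(f"Lowest-rated activity: {lowest}")
```

The tally agrees with the summary statement that seven activities are rated Level 1 and one is rated Level 0.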

Recommendations:

Based on interviews with management and analysis of the two consultant agreements for Disaster Recovery Planning (the 2014 effort and the current contract) it appears that IT Division management understands the current plan needs improvement. As noted previously in this report, the current plan is deficient primarily because it was based upon three incorrect assumptions. First, it was developed utilizing a “top-down” approach, assuming that all agency business needs would be captured. Second, the plan assumed recovery time objectives,³ rather than analyzing business practices to determine how long agency personnel could operate while awaiting service restoration following a disruption. Third, it assumed all applications within the data center were of equal criticality, thus it did not provide guidance regarding the priority to restore service to each application.
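To illustrate the third point, restoration priority can be derived once each application has a recovery time objective established through business impact analysis. The Python sketch below orders applications by RTO. The application names, RTO values and the function name `restoration_order` are hypothetical and are not Sound Transit figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    rto_hours: float  # recovery time objective from a business impact analysis


def restoration_order(apps: list[Application]) -> list[Application]:
    """Return applications sorted so the shortest RTO is restored first."""
    # Ties on RTO are broken alphabetically so the order is deterministic.
    return sorted(apps, key=lambda app: (app.rto_hours, app.name))


# Hypothetical applications and RTOs, for illustration only.
apps = [
    Application("document archive", 72),
    Application("fare collection", 4),
    Application("email", 24),
]

for priority, app in enumerate(restoration_order(apps), start=1):
    print(f"{priority}. {app.name} (RTO {app.rto_hours} h)")
```

A plan that treats every application as equally critical gives no such ordering, which is the gap the assessment describes.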

We recommend the Information Technology Division consider the following:

Planning

1. In order to better align the IT Business Continuity and Disaster Recovery Plan with the ST Emergency Management Plan, identify IT responsibilities and document the policies and procedures required to continue business operations after a disaster.

2. Assess the likelihood of business disruption for each incident type. This can help focus training and preparation based on incident type.

3. Develop a backup and restore test plan that includes periodic testing of on-site and off-site data for critical systems.

4. Develop an annual business continuity and disaster recovery testing plan. A testing plan should include a schedule of types of testing, systems to test and a full test of the Disaster Recovery Plan.

Monitoring

5. Identify all internal IT services and business processes that are critical to the agency. Creating and maintaining this list will help IT focus its resources.

6. The IT Business Continuity and Disaster Recovery plan development should be reported to the TGT and executive management annually, to improve management controls and agency involvement in the process. The three-year, three-phase process to develop the plan was last presented to the TGT in June 2014.

7. Review the IT Data Center Disaster Recovery Plan on a regular basis against major changes to: agency organization, business processes, outsourcing arrangements, technologies, infrastructure, operating systems and application systems. Track and report the frequency of updates to the risk profile.

8. Ensure business impact assessments are revised when changes to agency business practices are identified.

9. Ensure that all changes in policy, plans, procedures, infrastructure, and roles and responsibilities are approved by agency management and communicated to appropriate agency staff.

10. Develop, track and report performance metrics for the IT Business Continuity and Disaster Recovery plan development project. The COBIT 5 framework for managing continuity recommends many performance metrics, which we have provided for consideration in Appendix II.

11. Periodically assess adherence to the documented Disaster Recovery Plan.

12. Determine the effectiveness of the plan, continuity capabilities, roles and responsibilities, skills and competencies, resilience to incidents, technical infrastructure, and organizational structures and relationships.

13. Identify weaknesses or omissions in the plan and capabilities and make recommendations for improvement. Track and report percent of agreed-on improvements to the plan that have been reflected in the plan and percent of issues identified that have been subsequently addressed in the plan.

³ Recovery Time Objective: the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.

Documenting

14. Determine whether agency divisions have developed operational business continuity plans for critical business processes and/or temporary processing arrangements, including links to plans of outsourced service providers. Track and report percent of agency divisions satisfied that IT service delivery required in their continuity plans meet agreed-upon service levels.

15. Develop adequate Business Impact Assessments and Recovery Time Objectives (RTO). The assessments should include input from relevant stakeholders of each IT system and business function and should be documented and approved by stakeholders. The business impact assessment and RTO are listed as required outputs by COBIT 5.

16. Ensure that key suppliers and outsource partners have effective continuity plans in place. Track and report the percent of critical key suppliers and outsource partners who do not have effective continuity plans in place.

17. Review and update the Information Technology – Procedure No. 9, “Backup and Retention Requirements: Production Environment.”

18. Include a reference to system backup requirements in policy and procedures required to support the IT Data Center Disaster Recovery Plan.

19. Include systems, applications, data and documentation maintained or processed by third parties in the ST Procedure No. 9, “Backup and Retention Requirements: Production Environment.”

20. Improve the documentation of test results by:

a. Recording the date of the test

b. Recording the roles and responsibilities of test participants

c. Labeling recommendations identified from the post-test debriefing and analysis

d. Including review and approval signature from IT management

21. Document post-resumption review following the successful resumption of business processes after service interruption.

22. Obtain management approval of the post-resumption review documentation.

Training

23. Create an IT business continuity and disaster recovery training program. Track and report percent of issues identified that have been subsequently addressed in the training materials.

24. Define training requirements for agency staff performing continuity planning, impact assessments, risk assessments, media communication and incident response. Ensure that the training plans consider frequency of training and training delivery mechanism. Track and report the percent of internal and external stakeholders that have received training.

25. Document agency staff competencies in business continuity and disaster recovery based on completed trainings and participation in business continuity tests.

Management Response

Recommendation - Planning

1. In order to better align the IT Business Continuity and Disaster Recovery Plan with the ST Emergency Management Plan, identify IT responsibilities and document the policies and procedures required to continue business operations after a disaster.

Partly Agree: Disaster Recovery/Business Continuity should be an Agency-level policy. Procedures for IT recovery will be in the form of updated Runbooks, as part of 2016-2017 deliverables of the DR Program.

2. Assess the likelihood of business disruption for each incident type. This can help focus training and preparation based on incident type.

Partly Agree: Incident types have been described in the IT Vulnerability Assessment Report in 2015. This assessment will influence how much investment we make in the DR program going forward. It is not likely to be used for classifying incident response processes or training. This is not intended to be a repetitive task in the DR program.

3. Develop a backup and restore test plan that includes periodic testing of on-site and off-site data for critical systems.

Partly Agree: Procedure #9 will be updated to include Backup and Recovery and will be completed by the end of July 2015.

4. Develop an annual business continuity and disaster recovery testing plan. A testing plan should include a schedule of types of testing, systems to test and a full test of the Disaster Recovery Plan.

Not Agree: IT will conduct a planned DR test annually, beginning Jan. 2016. IT does not have the resources to fully test the DR plan annually.

Recommendation - Monitoring

5. Identify all internal IT services and business processes that are critical to the agency. Creating and maintaining this list will help IT focus its resources.

Partly Agree: A Business Impact Analysis was completed in the 2015 deliverables of the DR Program. This is not intended to be a repetitive task in the DR Program.

6. The IT Business Continuity and Disaster Recovery plan development should be reported to the TGT and executive management annually, to improve management controls and agency involvement in the process. The three-year, three-phase process to develop the plan was last presented to the TGT in June 2014.

Partly Agree: The DR plan is scheduled to be presented to the TGT by 3Q2016.

7. Review the IT Data Center Disaster Recovery Plan on a regular basis against major changes to: agency organization, business processes, outsourcing arrangements, technologies, infrastructure, operating systems and application systems. Track and report the frequency of updates to the risk profile.

Partly Agree: The DR Plan will be reviewed at a minimum every 3 years, beginning 2019.

8. Ensure business impact assessments are revised when changes to agency business practices are identified.

Partly Agree: Business impacts will be identified as part of new application/system rollouts. The BIAs are not intended to be a repetitive task.

9. Ensure that all changes in policy, plans, procedures, infrastructure, and roles and responsibilities are approved by agency management and communicated to appropriate agency staff.

Partly Agree: Changes in DR/BC policy will be communicated as part of the Agency policy process. Communication internal to IT may not use a formal communication channel. The DR program will not be requesting approval other than from IT management, and from the policy committee for policy changes.

10. Develop, track and report performance metrics for the IT Business Continuity and Disaster Recovery plan development project. The COBIT 5 framework for managing continuity recommends many performance metrics, which we have provided for consideration in Appendix II.

Partly Agree: Given IT resources, this will not occur other than as a result of the annual IT DR exercise which will provide Pass/Fail on the exercise and lessons learned.

11. Periodically assess adherence to the documented Disaster Recovery Plan.

Agree: This will be completed with annual DR exercises.

12. Determine the effectiveness of the plan, continuity capabilities, roles and responsibilities, skills and competencies, resilience to incidents, technical infrastructure, and organizational structures and relationships.

Agree: The effectiveness of the plan will be determined by the success or failure of annual DR exercises.

13. Identify weaknesses or omissions in the plan and capabilities and make recommendations for improvement. Track and report percent of agreed-on improvements to the plan that have been reflected in the plan and percent of issues identified that have been subsequently addressed in the plan.

Not Agree: Tracking and reporting on a percentage basis will be overly cumbersome for the available resources and Agency commitment to the DR Program and therefore will not be developed.

Recommendation - Documenting

14. Determine whether agency divisions have developed operational business continuity plans for critical business processes and/or temporary processing arrangements, including links to plans of outsourced service providers. Track and report percent of agency divisions satisfied that IT service delivery required in their continuity plans meet agreed-upon service levels.

Partially Agree: During the Business Impact Analysis developed in the DR Program, IT has documented whether plans for business process continuity exist. It will be the Agency COOP who provides continued oversight with the business to keep critical business processes updated.

15. Develop adequate Business Impact Assessments and Recovery Time Objectives (RTO). The assessments should include input from relevant stakeholders of each IT system and business function and should be documented and approved by stakeholders. The business impact assessment and RTO are listed as required outputs by COBIT 5.

Completed: This task was completed during Phase 1 of the DR Program. Stakeholders filled out and went through a thorough review of the Business Processes and desired RTOs.

16. Ensure that key suppliers and outsource partners have effective continuity plans in place. Track and report the percent of critical key suppliers and outsource partners who do not have effective continuity plans in place.

Agree: This task is being planned as part of 2017 deliverables of the DR Program development.

17. Review and update the Information Technology – Procedure No. 9, “Backup and Retention Requirements: Production Environment.”

Agree: Updated as part of 2015 DR Program

18. Include a reference to system backup requirements in policy and procedures required to support the IT Data Center Disaster Recovery Plan.

Reference item 1

19. Include systems, applications, data and documentation maintained or processed by third parties in the ST Procedure No. 9, “Backup and Retention Requirements: Production Environment.”

deliverables.

20. Improve the documentation of test results by:

a. Recording the date of the test

b. Recording the roles and responsibilities of test participants

c. Labeling recommendations identified from the post-test debriefing and analysis

d. Including review and approval signature from IT management

Agree: This will be included as part of the annual IT DR exercises.

21. Document post-resumption review following the successful resumption of business processes after service interruption.

Agree: This will be included as part of annual IT DR exercises and any significant actual incidents.

22. Obtain management approval of the post-resumption review documentation.

Agree: This will be included as part of the annual IT DR exercise and any significant incident(s).

Recommendation - Training

23. Create an IT business continuity and disaster recovery training program. Track and report percent of issues identified that have been subsequently addressed in the training materials.

Not Agree: Developing a training program, tracking and reporting on a percentage basis will be overly cumbersome for the available resources and Agency commitment to the DR Program.

24. Define training requirements for agency staff performing continuity planning, impact assessments, risk assessments, media communication and incident response. Ensure that the training plans consider frequency of training and training delivery mechanism. Track and report the percent of internal and external stakeholders that have received training.

Partly Agree: IT will develop and maintain a DR training program for appropriate staff. Developing a training program, tracking and reporting on a percentage basis will be overly cumbersome for the available resources and Agency commitment to the DR Program.

25. Document agency staff competencies in business continuity and disaster recovery based on completed trainings and participation in business continuity tests.

Partly Agree: IT will maintain a training inventory for staff to ensure key DR personnel have received proper training. However, developing a training program, tracking and reporting on a percentage basis will be overly cumbersome for the available resources and Agency commitment to the DR Program.

Appendix I: Description of COBIT 5 and ISO Ratings

In COBIT 5, an ISO/IEC 15504 compliant process capability assessment system is used to assess whether process goals have been achieved. The capability level of a process is determined on the basis of the achievement of specific process attributes.

Table—COBIT 5 Process Capability Ratings (based on ISO/IEC 15504)

Capability Level - Level Name: Description

5 - Optimizing process: The level 4 predictable process is continuously improved to meet relevant current and projected business goals.

4 - Predictable process: The level 3 established process now operates within defined limits to achieve its process outcomes.

3 - Established process: The level 2 managed process is now implemented using a defined process that is capable of achieving its process outcomes.

2 - Managed process: The level 1 performed process is now implemented in a managed fashion (planned, monitored and adjusted) and its work products are appropriately established, controlled and maintained.

1 - Performed process: The implemented process achieves its process purpose.

0 - Incomplete process: The process is not implemented or fails to achieve its purpose.

Attributes may also be rated with a standard qualification scale that is also defined in the ISO/IEC 15504 standard:

Rating Symbol - Rating: Description

N - Not achieved: There is little or no evidence of achievement of the defined attribute in the assessed process.

P - Partially achieved: There is some evidence of an approach to, and some achievement of, the defined attribute in the assessed process. Some aspects of achievement of the attribute may be unpredictable.

L - Largely achieved: There is evidence of a systematic approach to, and significant achievement of, the defined attribute in the assessed process. Some weakness related to this attribute may exist in the assessed process.

F - Fully achieved: There is evidence of a complete and systematic approach to, and full achievement of, the defined attribute in the assessed process. No significant weaknesses related to this attribute exist in the assessed process.
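ISO/IEC 15504 also associates each rating symbol with a band of achievement: N for 0 to 15 percent, P for over 15 to 50 percent, L for over 50 to 85 percent and F for over 85 to 100 percent. A minimal Python sketch of that mapping follows. The function name `rate_attribute` is hypothetical, and the audit itself did not report percentage scores.

```python
def rate_attribute(percent_achieved: float) -> str:
    """Map a percentage of attribute achievement to an ISO/IEC 15504 rating symbol."""
    if not 0 <= percent_achieved <= 100:
        raise ValueError("percent_achieved must be between 0 and 100")
    if percent_achieved <= 15:
        return "N"  # Not achieved
    if percent_achieved <= 50:
        return "P"  # Partially achieved
    if percent_achieved <= 85:
        return "L"  # Largely achieved
    return "F"  # Fully achieved


# Hypothetical scores, for illustration only.
for score in (10, 40, 70, 95):
    print(f"{score}% -> {rate_attribute(score)}")
```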

Appendix II:

The COBIT 5 framework for managing continuity recommends many performance metrics, including the following:

a. Percent of critical business processes, IT services and IT-enabled business programs covered by risk assessment.

b. Number of significant IT-related incidents that were not identified in risk assessment.

c. Frequency of update of risk profile.

d. Number of business disruptions due to IT service incidents.

e. Percent of business stakeholders satisfied that IT service delivery meets agreed-on service levels.

f. Percent of users satisfied with the quality of IT service delivery.

g. Level of business user satisfaction with quality and timeliness (or availability) of management information.

h. Number of business process incidents caused by non-availability of information.

i. Ratio and extent of erroneous business decisions where erroneous or unavailable information was a key factor.

j. Percent of IT services meeting uptime requirements.

k. Percent of successful and timely restoration from backup or alternate media copies.

l. Percent of backup media transferred and stored securely.

m. Number of critical business systems not covered by the plan.

n. Number of exercises and tests that have achieved recovery objectives.

o. Frequency of tests.

p. Percent of agreed-on improvements to the plan that have been reflected in the plan.

q. Percent of issues identified that have been subsequently addressed in the plan.

r. Percent of internal and external stakeholders that have received training.
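Several of the metrics above are simple counts or ratios. The Python sketch below shows how two of them (items k and m) could be computed. The record layout, the helper `percent` and all sample values are hypothetical and are not drawn from Sound Transit data.

```python
from dataclasses import dataclass


@dataclass
class RestoreTest:
    system: str
    succeeded: bool  # data was restored correctly
    on_time: bool    # restore finished within its target time


def percent(numerator: int, denominator: int) -> float:
    """Return a percentage, or 0.0 when there is nothing to measure."""
    return 100.0 * numerator / denominator if denominator else 0.0


# Hypothetical sample data, for illustration only.
restore_tests = [
    RestoreTest("payroll", succeeded=True, on_time=True),
    RestoreTest("email", succeeded=True, on_time=False),
    RestoreTest("ticketing", succeeded=False, on_time=False),
    RestoreTest("file shares", succeeded=True, on_time=True),
]
critical_systems = {"payroll", "email", "ticketing", "file shares", "dispatch"}
systems_in_plan = {"payroll", "email", "ticketing"}

# Metric k: percent of successful and timely restorations.
timely = sum(1 for t in restore_tests if t.succeeded and t.on_time)
metric_k = percent(timely, len(restore_tests))

# Metric m: number of critical business systems not covered by the plan.
metric_m = len(critical_systems - systems_in_plan)

print(f"k. Successful and timely restorations: {metric_k:.0f}%")
print(f"m. Critical systems not covered by the plan: {metric_m}")
```

Keeping the raw records rather than only the final percentages makes it possible to recompute the metrics when a test is rerun or a system is added to the plan.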
