
Root Cause Analysis Report


RCA Name: IT Customer Complaints
RCA Report Number: 2012.67
Report Date: 4/2/2012
RCA Owner: Problem Manager


Problem Statement

Focal Point: Customer Complaints

When
Start Date: 3/5/2012
End Date: 3/5/2012
Start Time: 8:44am
End Time: 2:31pm
Unique Timing: While database admin was on vacation

Where
Location: Company website
System: Company IT infrastructure

Actual Impact (Cost)
Revenue: Estimated $1,500,000.00
Customer Service: Negative impact
Other: Negative publicity
Actual Impact Total: $1,500,000.00

Frequency: 2 times overall

Potential Impact
Revenue: Much more
Customer Service: Higher negative impact
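As a rough illustration of the figures in the problem statement, the outage window and the estimated impact can be turned into a duration and an approximate hourly cost. This is only a sketch: the variable names are illustrative, and the hourly figure assumes the cost accrued evenly across the outage, which the report does not state.

```python
from datetime import datetime

# Values copied from the When and Actual Impact fields of the report.
start = datetime.strptime("3/5/2012 8:44am", "%m/%d/%Y %I:%M%p")
end = datetime.strptime("3/5/2012 2:31pm", "%m/%d/%Y %I:%M%p")
estimated_cost = 1_500_000.00

# Length of the outage in whole minutes, then split into hours and minutes.
outage_minutes = int((end - start).total_seconds() // 60)
hours, minutes = divmod(outage_minutes, 60)

# Assumption (not from the report): the cost accrued evenly over the outage.
cost_per_hour = estimated_cost / (outage_minutes / 60)

print(f"Outage duration: {hours} h {minutes} min")
print(f"Approximate cost per hour: ${cost_per_hour:,.2f}")
```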

Report Summaries

Cause and Effect Summary

On March 5, 2012, we received numerous complaints from customers that our website was down while they were attempting to use it. The website was down from approximately 8:44am to 2:31pm EST. Customers were unable to access the website because they were receiving "500" errors from our web server, and "500" errors prevent users from accessing the website. The server was returning "500" errors because the application server, which processes requests, was timing out, and we had an unusually high volume of web traffic. The application server was timing out because it was receiving requests and the associated database was not working. The database was not working because the SQL database cluster was not processing queries. The SQL cluster could not process new queries because the transaction log had stopped growing. The log could not grow because the T:Drive was full and we were using only one database cluster. Only one database cluster was in use because the other cluster was being used for UAT testing. The drive was full because it has fixed capacity, the log file storage grew, and the logs were not truncated; truncating the logs is what reduces their storage needs. The logs were not truncated because the database administrator (DBA), who truncates them manually, was on vacation. The backup DBA was not aware that the logs needed truncating because there was no process in place to inform the backup DBA of critical tasks.
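The chain of "because" statements in the summary can be read as a small directed graph, with each effect pointing to its stated causes. The sketch below is one simplified reading of that paragraph, not the report's own tooling: the node labels are paraphrased, and the `CAUSES` mapping and `root_causes` helper are illustrative names. Walking the graph from the focal point lists the causes that have no further cause recorded.

```python
# Each effect maps to the causes the summary gives for it (labels paraphrased).
CAUSES = {
    "Customer complaints": ["Customers not able to access website"],
    "Customers not able to access website": ['Web server returned "500" errors'],
    'Web server returned "500" errors': [
        "Application server was timing out",
        "Unusually high web traffic",
    ],
    "Application server was timing out": [
        "Requests made of application server",
        "Database not working",
    ],
    "Database not working": ["SQL cluster not processing queries"],
    "SQL cluster not processing queries": ["Transaction log was unable to grow"],
    "Transaction log was unable to grow": [
        "T:Drive full",
        "Only one database cluster in use",
    ],
    "Only one database cluster in use": ["Other cluster being used for UAT testing"],
    "T:Drive full": ["Fixed capacity", "Log file storage grew", "Logs were not truncated"],
    "Logs were not truncated": [
        "DBA manually truncates logs",
        "DBA was on vacation",
        "Backup DBA not aware logs needed truncating",
    ],
    "Backup DBA not aware logs needed truncating": [
        "No process to inform backup DBA of critical tasks",
    ],
}


def root_causes(effect, graph=CAUSES):
    """Return the causes reachable from `effect` that have no recorded cause."""
    found, seen, stack = [], set(), [effect]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        children = graph.get(node, [])
        if not children and node != effect:
            found.append(node)  # end of a causal path
        stack.extend(children)
    return sorted(found)


for cause in root_causes("Customer complaints"):
    print(cause)
```

Ends of causal paths are not all equally actionable; the report's chart records why each path was terminated and which ones received a solution.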

Solutions

ID 1
Solution: Implement process to notify backup DBA of critical tasks when taking over duties.
Cause: No process in place to inform backup DBA
Note: Barrier = 3 months until
Assigned: Jennifer Elderberry
Criteria: Fail
Status: Selected
Term: Choose
Cost: $0.00

ID 2
Solution: Create document highlighting DBA duties in case of turnover or emergency backup DBA appointed
Assigned: Jennifer Elderberry
Criteria: Pass
Status: Selected
Term: Choose
Cost: $0.00

ID 3
Solution: Explore automating log truncation
Cause: Logs are manually truncated by DBA
Assigned: Dave Flynn
Criteria: Fail
Status: Selected
Term: Choose
Cost: $0.00

ID 5
Solution: Use separate RD SQL clusters for UAT testing
Cause: Other SQL cluster being used for UAT testing
Assigned: Ted Dezember
Criteria: Pass
Status: Identified
Term: Choose
Cost: $0.00

ID 6
Solution: Use multiple databases for application servers
Cause: Only one database cluster in use
Assigned: Jennifer Elderberry
Criteria: Pass
Status: Selected
Term: Choose
Cost: $0.00

ID 7
Solution: Increase space on T:Drives
Cause: T:Drive at zero bytes free
Assigned: Ted Dezember
Criteria: Pass
Status: Selected
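Solutions 3 and 7 both concern the log drive running out of space. As a hedged illustration only, a scheduled check along the following lines could warn before a drive reaches zero bytes free. The function names and the 20% threshold are assumptions made for this sketch, not part of the report, and the code does not truncate any logs itself, since that step depends on the database product in use.

```python
import shutil


def is_low_on_space(total_bytes, free_bytes, min_free_fraction=0.20):
    """Return True when the free fraction is below the threshold (assumed 20%)."""
    if total_bytes <= 0:
        return True  # an unreadable or zero-size volume is treated as a problem
    return (free_bytes / total_bytes) < min_free_fraction


def free_space_alert(path, min_free_fraction=0.20):
    """Return a warning string for `path` if it is low on space, else None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if is_low_on_space(usage.total, usage.free, min_free_fraction):
        return f"WARNING: {path} has {usage.free} of {usage.total} bytes free"
    return None


if __name__ == "__main__":
    # "." is a stand-in; in the report's setting the path would be the T: drive.
    message = free_space_alert(".")
    print(message or "Free space is above the threshold")
```

Run from a scheduler, a check like this would not depend on any one person being available, which is the gap the backup DBA process in solution 1 also addresses.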

Team

ID 1
First Name: Cory
Last Name: Boisoneau
Phone (1): 425-225-5885
Role: Facilitator
Email: [email protected]

ID 2
First Name: Jennifer
Last Name: Elderberry
Phone (1): 206-985-4845
Role: Database Admin
Email: [email protected]

ID 3
First Name: Dave
Last Name: Flynn
Phone (1): 206-254-8890
Role: Backup DBA
Email: [email protected]

ID 4
First Name: Ted
Last Name: Dezember
Phone (1): 206-795-4353
Role: IT Infrastructure Manager
Email: [email protected]

ID 5
First Name: Hannah
Last Name: Zweikwinden
Phone (1): 206-658-0098
Role: IT Analyst

Evidence

ID 1
Evidence: Log file
Cause(s): Requests made of application server; The application server was timing out; Transaction log was unable to grow; T:Drive at zero bytes free; Site was live; People visiting website; Logs were not truncated; Customers attempted to access site; Database not working
Contributor: Jennifer Elderberry
Type: Document

ID 2
Evidence: Statement from DBA
Cause(s): Application server processes requests; Only one database cluster in use; Only one application server exists; We only have two SQL clusters; SQL trans. log needs to grow to process queries; Storage required for log to grow; Storage file size fixed; Database Admin (DBA) was on vacation; Logs are manually truncated by DBA; No process in place to inform backup DBA; Transaction log located on T:Drive; Company only has one DBA
Contributor: Jennifer Elderberry
Type: Direct Statement

ID 3
Evidence: Statement from backup DBA
Cause(s): SQL server was not processing queries; Backup DBA not aware logs needed truncating; Application server relies on working database
Contributor: Dave Flynn
Type: Direct Statement

ID 4
Evidence: Statement from IT Infrastructure Manager
Cause(s): Other SQL cluster being used for UAT testing
Contributor: Ted Dezember
Type: Direct Statement

ID 5
Evidence: Statement from IT Analyst
Cause(s): "500" errors prevent access to website; Time outs result in "500" errors; Functional database relies on working SQL server
Contributor: Hannah Zweikwinden
Type: Direct Statement

ID 6
Evidence: Client complaint log
Cause(s): Customers not able to access our web site; Web server returned error ("500"-type); Chose to contact web support with complaint
Contributor: Jennifer Elderberry
Type: Document

ID 7
Evidence: Customer statement
Cause(s): Customers need/want to access site
Contributor: Jennifer Elderberry
Type: Direct Statement

Chart

Legend: Transitory; Non-transitory; Omission - Transitory; Omission - Non-transitory; Focal Point; Solution Implemented

Focal Point: Customer Complaints

Cause: Customers not able to access our web site
Evidence: Client complaint log

Cause: Web server returned error ("500"-type)
Evidence: Client complaint log

Cause: The application server was timing out
Evidence: Log file

Cause: Requests made of application server
Evidence: Log file

Cause: People visiting website
Evidence: Log file

Cause: Site was live
Evidence: Log file
Terminated because: Desired state

Cause: Application server processes requests
Evidence: Statement from DBA
Terminated because: Desired state

Cause: Application server relies on working database
Evidence: Statement from backup DBA
Terminated because: Desired state

Cause: Database not working
Evidence: Log file

Cause: SQL server was not processing queries
Evidence: Statement from backup DBA

Cause: SQL trans. log needs to grow to process queries
Evidence: Statement from DBA
Terminated because: Other causal paths more productive

Cause: Transaction log was unable to grow
Evidence: Log file

Cause: Transaction log located on T:Drive
Evidence: Statement from DBA
Terminated because: Other causal paths more productive

Cause: Only one database cluster in use
Evidence: Statement from DBA
Solution: Use multiple databases for application servers (Criteria: Pass; Status: Selected)

Cause: We only have two SQL clusters
Evidence: Statement from DBA
Terminated because: Other causal paths more productive

Cause: Other SQL cluster being used for UAT testing
Evidence: Statement from IT Infrastructure Manager
Solution: Use separate RD SQL clusters for UAT testing (Criteria: Pass; Status: Identified)
Terminated because: Other causal paths more productive

Cause: Storage required for log to grow
Evidence: Statement from DBA
Terminated because: Other causal paths more productive

Cause: T:Drive at zero bytes free
Alternative (OR): T:Drive damaged
Evidence: Log file
Solution: Increase space on T:Drives (Criteria: Pass; Status: Selected)

Cause: Storage file size fixed
Evidence: Statement from DBA
Terminated because: Other causal paths more productive

Cause: Logs were not truncated
Evidence: Log file

Cause: Database Admin (DBA) was on vacation
Evidence: Statement from DBA
Terminated because: Other causal paths more productive

Cause: Logs are manually truncated by DBA
Evidence: Statement from DBA
Solution: Explore automating log truncation (Criteria: Fail; Status: Selected)
Terminated because: ?

Cause: Company only has one DBA
Evidence: Statement from DBA
Terminated because: Desired state

Cause: Backup DBA not aware logs needed truncating
Evidence: Statement from backup DBA

Cause: No process in place to inform backup DBA
Evidence: Statement from DBA
Solution: Implement process to notify backup DBA of critical tasks when taking over duties. (Criteria: Fail; Status: Selected; Note: Barrier = 3 months until)
Terminated because: Other causal paths more productive

Cause: Functional database relies on working SQL server
Evidence: Statement from IT Analyst
Terminated because: Desired state

Cause: Time outs result in "500" errors
Evidence: Statement from IT Analyst
Terminated because: Other causal paths more productive

Cause: Only one application server exists
Evidence: Statement from DBA
Terminated because: ?

Cause: "500" errors prevent access to website
Evidence: Statement from IT Analyst
Terminated because: Desired state

Cause: Customers attempted to access site
Evidence: Log file
Terminated because: Desired state

Cause: Customers need/want to access site
Evidence: Customer statement
Terminated because: Desired state

Cause: Chose to contact web support with complaint
Evidence: Client complaint log
Terminated because: Other causal paths more productive
