Closed Loop Incident Process


(1)

©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Andreas Gutzwiller

Presales Consultant, Hewlett-Packard (Schweiz)

Closed Loop Incident Process

From fault detection to closure

HP Software and Solutions

(2)

Closed Loop Incident Process Solution

The CLIP solution:

–  Is a highly automated fault-detection-to-recovery solution

–  Focuses on end-to-end service availability and performance

–  Reduces mean time to recovery (MTTR) and improves mean time between system failures (MTBF)
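
The two metrics named on this slide can be made concrete with a small calculation. The sketch below assumes the common definitions (MTTR as total downtime divided by the number of failures, MTBF as total uptime divided by the number of failures). The outage records and the 30-day window are made-up illustration data, not figures from this presentation.

```python
from datetime import datetime, timedelta

# Hypothetical outage records (start, end) for one service; illustration only.
outages = [
    (datetime(2010, 1, 4, 9, 0), datetime(2010, 1, 4, 11, 0)),     # 2 h
    (datetime(2010, 1, 15, 14, 0), datetime(2010, 1, 15, 15, 0)),  # 1 h
    (datetime(2010, 1, 28, 2, 0), datetime(2010, 1, 28, 5, 0)),    # 3 h
]
window = timedelta(days=30)  # assumed observation period


def total_downtime(outages):
    """Sum of all outage durations."""
    return sum((end - start for start, end in outages), timedelta())


def mttr_hours(outages):
    """Mean time to recovery: total downtime / number of failures."""
    return total_downtime(outages).total_seconds() / 3600 / len(outages)


def mtbf_hours(outages, window):
    """Mean time between failures: total uptime / number of failures."""
    uptime = window - total_downtime(outages)
    return uptime.total_seconds() / 3600 / len(outages)


print(mttr_hours(outages), mtbf_hours(outages, window))
```

Faster recovery lowers the first number; fewer failures in the same window raises the second.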

(3)

Agenda

1.  Event and Incident Processes

2.  Closing the Loop

3.  Architecture

4.  Why CLIP

(5)

ITILv3 Linkage of Event & Incident Management

Neither process can stand alone in today’s IT environments

–  Event: a change of state or alert that has significance for the management of a Configuration Item (CI) or IT Service.

–  Incident: an unplanned interruption, or reduction of quality, of an IT service.

–  IT Service: a deliverable of people, processes & technology that supports a customer’s business processes.

–  Event Management

•  Responsible for managing events throughout their lifecycle. Main activity of IT Operations.

•  Event -> Filtered/Correlated -> Resolve or forward to Incident -> Close

–  Incident Management

•  Includes any event which disrupts, or could disrupt, a service. Raised by users or IT staff.

•  Incident -> Categorize/Prioritize -> Diagnose -> Resolve -> Close
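
The two lifecycles above, and the hand-off between them, can be sketched as a pair of small functions. This is a minimal illustration under stated assumptions: the `Event` and `Incident` classes, their fields, and the fixed `"high"` priority are hypothetical and do not come from ITIL or from any HP product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    ci: str               # Configuration Item the event refers to
    message: str
    significant: bool     # False = informational noise
    auto_fixable: bool    # True = operations can resolve it directly
    trail: List[str] = field(default_factory=list)


@dataclass
class Incident:
    source: Event
    priority: str
    trail: List[str] = field(default_factory=list)


def handle_event(event: Event) -> Optional[Incident]:
    """Event Management: filter/correlate, then resolve or forward."""
    event.trail.append("filtered/correlated")
    if not event.significant:
        event.trail.append("closed")
        return None
    if event.auto_fixable:
        event.trail += ["resolved", "closed"]
        return None
    event.trail.append("forwarded to incident")
    return Incident(source=event, priority="high")  # priority is an assumption


def handle_incident(incident: Incident) -> None:
    """Incident Management: categorize/prioritize, diagnose, resolve, close."""
    incident.trail += ["categorized/prioritized", "diagnosed", "resolved", "closed"]
    incident.source.trail.append("closed")  # closing the incident closes its event
```

The last line of `handle_incident` is the point where the two processes meet again: the event stays open until the incident it produced is closed.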

(6)

ITIL Areas Involved in CLIP

–  Operations Bridge (aka NOC)

•  Central coordination point

•  Manages various classes of events

•  Detects incidents

•  Manages routine operational activities

•  Reports on the status and performance

•  May provide first-level support for those events which generate an incident

“The Service Desk is not typically involved in Event Management … unless the Service Desk and Operations Bridge have been combined”

–  Service Desk

•  Single central point of contact for all users of IT

•  Logs and manages all incidents, service requests and access requests

•  Provides interface to all other Service Operation processes and activities

(7)

Traditional Incident Management

From diagnosis to resolution

Multiple un-integrated systems and data stores, manually coordinated hand-offs → inconsistent troubleshooting, high MTTR

Process steps:

1.  Identify service performance degradation

2.  Troubleshoot problem to isolate root cause

3.  Identify actionable condition / changes to be implemented

4.  Create TT/RFC to implement change

5.  Implement and automate change to close RFC

6.  Update CMS (Federated CMDB)

Diagram labels: End User, Help Desk, “Fire Storms”, CMDB

1.  Service performance notification

2.  Gather data to assign SME

3.  Bouncing the incident

4.  Ticket is finally assigned to the correct SME

5.  Impact analysis and change management

6.  Update CMDB - timely & correctly?

SME: Subject Matter Expert

(9)

Closed Loop Incident Process solution for ITIL Event and Incident Management

From Fault Detection To Recovery & Closure

ITIL Process: Event Management, Incident Management

1.  Event Generation & Detection

2.  Event Correlation & Business Impact

3.  Incident Submission

4.  Investigation & Diagnosis

5.  Resolution

6.  Recovery & Closure

(10)

Closed Loop Incident Process solution for ITIL Event and Incident Management

Event Generation & Detection

Operations bridge console collects events & alerts from servers, networks, apps & 3rd party

Challenge

–  Bottom-up alert and event overload

–  Lack of qualitative cross-domain “actionable” and causal event data

Solution

–  All events come to one place, correlated and enriched against an auto-updated service model

User Example – Events to single console

–  End user experience slow

–  SQL slow query performance alert

–  J2EE DB connection pool issue
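
The idea of one console whose events are enriched against a service model can be sketched in a few lines. Everything named here is an assumption for illustration: the `SERVICE_MODEL` mapping, the CI names, the `funds-transfer` service name, and the `ConsoleEvent` class are hypothetical and are not the data model of any HP product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical service model: CI name -> business service it supports.
SERVICE_MODEL: Dict[str, str] = {
    "web-frontend": "funds-transfer",
    "j2ee-app01": "funds-transfer",
    "oracle-db01": "funds-transfer",
}


@dataclass
class ConsoleEvent:
    source: str                    # monitoring domain that raised the event
    ci: str
    message: str
    service: Optional[str] = None  # filled in by enrichment


def collect(*feeds: List[ConsoleEvent]) -> List[ConsoleEvent]:
    """Merge per-domain feeds into one console list and enrich each event."""
    console = [event for feed in feeds for event in feed]
    for event in console:
        event.service = SERVICE_MODEL.get(event.ci)  # None if the CI is unknown
    return console


console = collect(
    [ConsoleEvent("end-user monitor", "web-frontend", "End user experience slow")],
    [ConsoleEvent("database monitor", "oracle-db01", "SQL slow query performance alert")],
    [ConsoleEvent("application monitor", "j2ee-app01", "J2EE DB connection pool issue")],
)
```

After enrichment all three events carry the same service name, which is what lets a later step treat them as one problem rather than three.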

(11)

Closed Loop Incident Process solution for ITIL Event and Incident Management

Event Correlation & Business Impact

Business services, business impact relationship, and SLAs determined

Challenge

–  Struggle to link causal events to top-down end-user experience and business impact

Solution

–  Proactive end-user experience monitoring, linked to business process and business transaction flow, to identify impact on high-revenue-generating services

User Example - Cause from symptoms and impact

–  Oracle database is the cause (topology-based correlation)

–  Critical funds transfer business service impacted

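Topology-based correlation, as in the user example on this slide, can be sketched as a walk over a dependency map. This is a deliberately simplified illustration: the `DEPENDS_ON` topology and CI names are hypothetical, and the rule used (an alerting CI is a probable cause when none of its direct dependencies is alerting) is one simple heuristic, not the algorithm of any specific product.

```python
from typing import Dict, List, Set

# Hypothetical topology: each CI maps to the CIs it directly depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "funds-transfer": ["web-frontend"],
    "web-frontend": ["j2ee-app01"],
    "j2ee-app01": ["oracle-db01"],
    "oracle-db01": [],
}


def probable_causes(alerting: Set[str]) -> Set[str]:
    """Alerting CIs none of whose direct dependencies are also alerting."""
    return {
        ci for ci in alerting
        if not any(dep in alerting for dep in DEPENDS_ON.get(ci, []))
    }


def impacted(cause: str) -> Set[str]:
    """Everything that depends, directly or transitively, on the cause CI."""
    hit: Set[str] = set()
    frontier = [cause]
    while frontier:
        current = frontier.pop()
        for ci, deps in DEPENDS_ON.items():
            if current in deps and ci not in hit:
                hit.add(ci)
                frontier.append(ci)
    return hit


alerting = {"web-frontend", "j2ee-app01", "oracle-db01"}
print(probable_causes(alerting))   # the database, the only CI with no alerting dependency
print(impacted("oracle-db01"))     # includes the funds-transfer business service
```

`probable_causes` separates the cause from its symptoms, and `impacted` walks upward from the cause to the business service that is affected.
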
(12)

Closed Loop Incident Process solution for ITIL Event and Incident Management

Incident Submission

Automatic submission to service desk with annotations and cause area

Challenge

–  Quality and enrichment of data

–  Siloed, broken service lifecycle

–  Duplication of effort wasting time

Solution

–  Better collaboration

–  Automation and integration of the event-to-incident process lifecycle

User Example - Automatic incident ticket creation

–  Ticket visible to ops bridge

–  Assignment to subject matter expert
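
Automatic submission without duplicated effort can be sketched as follows. The `ServiceDesk` class is an in-memory stand-in written for this illustration, not the API of a real service desk tool; the routing table, group names, and ticket fields are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Ticket:
    ticket_id: int
    cause_ci: str          # the "cause area" handed to the service desk
    assigned_group: str
    annotations: List[str] = field(default_factory=list)


class ServiceDesk:
    """In-memory stand-in for a service desk; not a real product API."""

    def __init__(self, routing: Dict[str, str]):
        self.routing = routing                      # cause CI -> expert group
        self.open_by_cause: Dict[str, Ticket] = {}

    def submit(self, cause_ci: str, annotation: str) -> Ticket:
        ticket = self.open_by_cause.get(cause_ci)
        if ticket is None:                          # first event for this cause
            ticket = Ticket(
                ticket_id=len(self.open_by_cause) + 1,
                cause_ci=cause_ci,
                assigned_group=self.routing.get(cause_ci, "service-desk"),
            )
            self.open_by_cause[cause_ci] = ticket
        ticket.annotations.append(annotation)       # later events only annotate
        return ticket
```

Submitting two events with the same cause CI yields one ticket with two annotations, routed once to the group that owns that CI.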

(13)

Closed Loop Incident Process solution for ITIL Event and Incident Management

Investigation & Diagnosis

Problem isolation, SME tools, and KM used to determine root cause

Challenge

–  Significant problem resolution time spent on pinpointing the problem in a dynamic, heterogeneous IT universe

–  Incident assigned and reassigned to multiple silos

Solution

–  Cross-domain data visualization and analysis

User Example - Diving deeper to find root cause

–  Expert sees corrupt DB tables

–  Finds runbook automation fix in knowledgebase

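The knowledgebase lookup in the user example can be sketched as a simple mapping from a diagnosed condition to a runbook name. The `KNOWLEDGE_BASE` entries and runbook names below are hypothetical placeholders, not content from a real knowledge management system.

```python
from typing import Dict, List, Optional

# Hypothetical knowledge base: diagnosed condition -> name of a runbook.
KNOWLEDGE_BASE: Dict[str, str] = {
    "corrupt db tables": "reindex-database-tables",
    "connection pool exhausted": "restart-connection-pool",
}


def find_runbook(findings: List[str]) -> Optional[str]:
    """Return the runbook for the first finding the knowledge base covers."""
    for finding in findings:
        runbook = KNOWLEDGE_BASE.get(finding.strip().lower())
        if runbook is not None:
            return runbook
    return None  # nothing known: the expert has to diagnose manually


print(find_runbook(["Corrupt DB tables"]))
```

A `None` result is the case the slide's challenge describes: without a known fix, the incident depends on manual diagnosis and may bounce between teams.
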
(14)

Closed Loop Incident Process solution for ITIL Event and Incident Management

Resolution

Change request with attached runbook automation to repair CIs

Challenge

–  Little or no automation leads to increased manual effort, impacting quality and efficiency

Solution

–  Expert-created/authorized runbook automation to empower lower-level teams

–  Manage change, configuration, and release process

User Example - Processing the change

–  Get change request approval

–  Use runbook to reindex database tables
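
The two steps of the user example, approval first and runbook second, can be sketched as a change request that refuses to run its attached automation until it has been approved. The `ChangeRequest` class and the `reindex_tables` placeholder are assumptions made for illustration; the placeholder performs no real repair.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChangeRequest:
    summary: str
    runbook: Callable[[], str]   # attached automation; returns a result note
    approved: bool = False
    log: List[str] = field(default_factory=list)

    def approve(self, approver: str) -> None:
        self.approved = True
        self.log.append(f"approved by {approver}")

    def implement(self) -> None:
        """Run the attached runbook, but only after approval."""
        if not self.approved:
            raise PermissionError("change request has not been approved")
        self.log.append(self.runbook())


def reindex_tables() -> str:
    # Placeholder for the real repair step; it changes nothing.
    return "database tables reindexed"


rfc = ChangeRequest("Reindex corrupt tables", reindex_tables)
rfc.approve("change-manager")
rfc.implement()
```

Because the expert-authored runbook is attached to the request, a lower-level team can carry out the fix once the approval is recorded.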

(15)

Closed Loop Incident Process solution for ITIL Event and Incident Management

Recovery & Closure

Automatically close incident & related incidents, acknowledging related events

Challenge

–  Struggle to improve speed of restoration, recovery and closure of the incident, and to verify SLA/OLA compliance afterwards

Solution

–  Automate all notifications & updates, continuously monitor SLA/OLA compliance

User Example – Verify the change worked

–  User, DB and connection pool OK

–  Ticket and events closed

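Closing the loop, as in the user example, means the ticket and its related events are closed only after the fix has been verified. The sketch below assumes tickets and events are plain dictionaries and that each health check is a callable returning `True` or `False`; these shapes are illustrative choices, not a real product interface.

```python
from typing import Callable, Dict, List


def close_loop(
    checks: Dict[str, Callable[[], bool]],
    ticket: Dict[str, str],
    related_events: List[Dict[str, str]],
) -> bool:
    """Close the ticket and its events only if every health check passes."""
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        ticket["status"] = "open"
        ticket["note"] = "verification failed: " + ", ".join(failed)
        return False
    ticket["status"] = "closed"
    for event in related_events:
        event["status"] = "closed"
    return True


ticket = {"status": "resolved"}
events = [{"status": "open"}, {"status": "open"}]
checks = {
    "user experience": lambda: True,   # stand-ins for real monitoring probes
    "database": lambda: True,
    "connection pool": lambda: True,
}
close_loop(checks, ticket, events)
```

If any check fails, the ticket is reopened with a note naming the failed check and the related events stay open.
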
(17)

Closed Loop Incident Process Integration Points

Integrated ITIL event and incident management process optimizing MTTR and MTBF

Diagram components: Service Desk, Integrated CMDB, Automation, Monitoring

Numbered integration points in the diagram:

1.  Sharing CIs, topology and state information

2.  For creating and updating incidents

3.  For updating events

4.  Incident, Problem and Change Management

5.  Runbook automation to remediate

(18)

HP’s Closed Loop Incident Process Solution

Integrated ITIL event and incident management process optimizing MTTR and MTBF

Diagram labels: Service Manager; UCMDB; Operations Orchestration; SA, CA, NA, SE, Other; BSM; CIs, Topo, Events, Status; Net, Ops, App, Other

Numbered integration points in the diagram:

1.  CIs, topology, events, status measurements flowing into BSM

2.  Sharing events and topology

3.  For creating and updating incidents

4.  To access Business Impact View for a CI

5.  Runbook automation to enrich, diagnose and remediate

6.  Sharing CIs and state information

7.  Runbook automation to remediate

(20)

Closed-Loop Incident Mgmt Process

Incident management from diagnosis to automated resolution

•  Key processes (incident, change and configuration) need to be tightly linked

•  Seamless process linkage requires tools to be consistently service-oriented

Diagram areas: IT service management, Business service automation, Business service management, Configuration Management System (Federated CMDB)

Process steps:

1.  Identify service performance degradation

2.  Troubleshoot problem to isolate root cause

3.  Identify changes to be implemented

4.  Create TT/RFC to implement change

5.  Implement and automate change to close RFC

6.  Update CMS (Federated CMDB)

Closed-loop flow between the diagram areas:

1.  Identify service performance issue

2.  Gather data to identify root cause

3.  Create RFC to make change

4a. Initiate change

4b. Review, assess, plan and govern change

5a. Implement change

5b. Close change request?

6.  Update Configuration Management System

(21)

Closed Loop Incident Process Key Benefits

Drive innovation value of IT

–  Cost: 72% lower maintenance cost

•  Drive efficiency through automation

•  Optimize service lifecycle process efficiency

–  Quality: 2.5x increased availability and performance

•  Eliminate error-prone manual tasks

•  Predict and prevent negative business impact

–  Transparency: 99.5% availability via integrated delivery

•  The cost/value ratio of delivered services is understood by the business

•  Any service from everywhere

–  Agility: 30% faster time to market for new apps

•  Saved labor can be spent on innovation

•  Measure and optimize time to develop and successfully deploy new services

–  Business risk: 70% fewer bad changes

•  Reduce risk of failure when deploying changes

•  Enable compliance
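
To put the 99.5% availability figure in perspective, an availability percentage can be converted into the downtime it still allows. The sketch below assumes a 365-day year of continuous operation; the function name is chosen for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Yearly downtime implied by an availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


print(round(allowed_downtime_hours(99.5), 1))
```

At 99.5% availability the service can still be down for about 43.8 hours per year, which is why recovery time matters alongside the availability figure.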
