©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Andreas Gutzwiller
Presales Consultant, Hewlett-Packard (Schweiz)
Closed Loop Incident Process
From fault detection to closure
HP Software and Solutions
Closed Loop Incident Process Solution
The CLIP solution:
– Is a highly automated fault-detection-to-recovery solution
– Focuses on end-to-end service availability and performance
– Reduces mean time to recovery (MTTR) and improves mean time between system failures (MTBF)
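As an illustration of the two metrics named above, here is a minimal sketch of how MTTR and MTBF could be computed from an outage log. The outage timestamps are made up, and MTBF is assumed here to mean the average uptime between the end of one outage and the start of the next; other conventions exist.

```python
from datetime import datetime, timedelta

# Hypothetical outage log: (failure detected, service restored).
outages = [
    (datetime(2010, 3, 1, 9, 0), datetime(2010, 3, 1, 10, 0)),
    (datetime(2010, 3, 11, 9, 0), datetime(2010, 3, 11, 9, 30)),
    (datetime(2010, 3, 21, 9, 0), datetime(2010, 3, 21, 11, 0)),
]

def mean_time_to_recovery(outages):
    """Average duration from failure to restoration."""
    total = sum((end - start for start, end in outages), timedelta())
    return total / len(outages)

def mean_time_between_failures(outages):
    """Average uptime between the end of one outage and the start of the next."""
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(outages, outages[1:])
    ]
    return sum(gaps, timedelta()) / len(gaps)

print("MTTR:", mean_time_to_recovery(outages))
print("MTBF:", mean_time_between_failures(outages))
```

A shorter MTTR and a longer MTBF both raise availability, which is why the solution targets the two together.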
Agenda
1. Event and Incident Processes
2. Closing the Loop
3. Architecture
4. Why CLIP
Neither process can stand alone in today’s IT environments
ITILv3 Linkage of Event & Incident Management
Event – A change of state or alert that has significance for the management of a Configuration Item (CI) or IT Service.
Incident – An unplanned interruption to, or reduction in the quality of, an IT service.
IT Service – A deliverable of people, processes & technology that supports a customer’s business processes.
Event Management
• Responsible for managing events throughout their lifecycle. Main activity of IT Operations.
• Event -> Filtered/Correlated -> Resolve or forward to Incident -> Close
Incident Management
• Includes any event which disrupts, or could disrupt, a service. Reported by users or IT staff.
• Incident -> Categorize/Prioritize -> Diagnose -> Resolve -> Close
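The two lifecycles on this slide can be read as small state machines. The sketch below is one possible encoding, not an ITIL or HP data model: the state names and the `advance` helper are assumptions made for illustration.

```python
# Allowed transitions, transcribed from the two flows on this slide.
EVENT_FLOW = {
    "event": ["filtered/correlated"],
    "filtered/correlated": ["resolved", "forwarded to incident"],
    "resolved": ["closed"],
    "forwarded to incident": ["closed"],
    "closed": [],
}

INCIDENT_FLOW = {
    "incident": ["categorized/prioritized"],
    "categorized/prioritized": ["diagnosed"],
    "diagnosed": ["resolved"],
    "resolved": ["closed"],
    "closed": [],
}

def advance(flow, state, target):
    """Move to `target` if the flow allows it from `state`, else raise."""
    if target not in flow[state]:
        raise ValueError(f"illegal transition: {state} -> {target}")
    return target

# An event that cannot be resolved in operations is handed to Incident Management.
state = advance(EVENT_FLOW, "event", "filtered/correlated")
state = advance(EVENT_FLOW, state, "forwarded to incident")
```

The hand-off state "forwarded to incident" is where the two processes link, which is the point of the slide: neither flow is complete without the other.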
ITIL Areas Involved in CLIP
– Operations Bridge (aka NOC)
• Central coordination point
• Manages various classes of events
• Detects incidents
• Manages routine operational activities
• Reports on the status and performance
• May provide first-level support for those events which generate an incident
“The Service Desk is not typically involved in Event Management … unless the Service Desk and Operations Bridge have been combined”
– Service Desk
• Single central point of contact for all users of IT
• Logs and manages all incidents, service requests and access requests
• Provides interface to all other Service Operation processes and activities
Traditional Incident Management
From diagnosis to resolution
Multiple un-integrated systems and data stores, manually coordinated hand-offs → inconsistent troubleshooting, high MTTR
Process steps:
1. Identify service performance degradation
2. Troubleshoot problem to isolate root cause
3. Identify actionable condition / changes to be implemented
4. Create TT/RFC to implement change
5. Implement and automate change to close RFC
6. Update CMS (Federated CMDB)
Diagram labels: End User, Help Desk, “Fire Storms”, CMDB
Callouts:
1. Service performance notification
2. Gather data to assign SME
3. Bouncing the incident
4. Ticket is finally assigned to the correct SME
5. Impact analysis and change management
6. Update CMDB - timely & correctly?
SME: Subject Matter Expert
Closed Loop Incident Process solution for ITIL Event and Incident Management
From Fault Detection To Recovery & Closure
ITIL Process: Event Management, Incident Management
Event Generation & Detection -> Event Correlation & Business Impact -> Incident Submission -> Investigation & Diagnosis -> Resolution -> Recovery & Closure
Event Generation & Detection
Operations bridge console collects events & alerts from servers, networks, apps & 3rd party
Challenge
Bottom-up alert and event overload
Lack of quality cross-domain “actionable” and causal event data
Solution
All events come to one place, correlated and enriched against an auto-updated service model
User Example – Events to single console
End user experience slow
SQL slow query performance alert
J2EE DB connection pool issue
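To make "all events come to one place, enriched against a service model" concrete, here is a minimal sketch. The `Event` fields, the CI names and the `SERVICE_MODEL` mapping are hypothetical and do not reflect the schema of any HP product.

```python
from dataclasses import dataclass

@dataclass
class Event:
    source: str                # monitoring domain that raised the event
    ci: str                    # configuration item the event refers to
    message: str
    service: str = "unknown"   # filled in by enrichment

# Hypothetical service model: CI -> business service it supports.
SERVICE_MODEL = {
    "web-frontend": "funds transfer",
    "app-server-1": "funds transfer",
    "oracle-db-1": "funds transfer",
}

def consolidate(*feeds):
    """Merge per-domain event feeds into one list and tag each event with its service."""
    console = []
    for feed in feeds:
        for event in feed:
            event.service = SERVICE_MODEL.get(event.ci, "unknown")
            console.append(event)
    return console

end_user = [Event("end-user monitoring", "web-frontend", "End user experience slow")]
database = [Event("database monitoring", "oracle-db-1", "SQL slow query performance alert")]
app = [Event("application monitoring", "app-server-1", "J2EE DB connection pool issue")]

console = consolidate(end_user, database, app)
for event in console:
    print(f"[{event.service}] {event.ci}: {event.message}")
```

Once every event carries the same service tag, three alerts from three domains can be recognised as one situation rather than three unrelated problems.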
Event Correlation & Business Impact
Business services, business impact relationship, and SLAs determined
Challenge
Struggle to link causal events to top-down end-user experience and business impact
Solution
Proactive end-user experience monitoring linked to business process and business transaction flow, to identify impact on high-revenue services
User Example - Cause from symptoms and impact
Oracle database is the cause (topology-based correlation)
Critical funds transfer business service impacted
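Topology-based correlation can be sketched as a walk over a dependency graph: an alerting CI is a probable cause if none of the CIs it depends on is also alerting. The topology and CI names below are hypothetical, and this is a simplified stand-in for the idea, not HP's correlation engine.

```python
# Hypothetical topology: each CI maps to the CIs it depends on.
DEPENDS_ON = {
    "funds-transfer-service": ["web-frontend"],
    "web-frontend": ["app-server-1"],
    "app-server-1": ["oracle-db-1"],
    "oracle-db-1": [],
}

def downstream(ci):
    """All CIs that `ci` depends on, directly or transitively."""
    seen, stack = set(), list(DEPENDS_ON.get(ci, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(DEPENDS_ON.get(node, []))
    return seen

def probable_causes(alerting):
    """Alerting CIs whose own dependencies are all quiet."""
    alerting = set(alerting)
    return sorted(ci for ci in alerting if not (downstream(ci) & alerting))

def impacted(cause):
    """CIs that depend on `cause`, i.e. where symptoms would surface."""
    return sorted(ci for ci in DEPENDS_ON if cause in downstream(ci))

alerts = ["web-frontend", "app-server-1", "oracle-db-1"]
print("cause:", probable_causes(alerts))
print("impacted:", impacted("oracle-db-1"))
```

The same graph answers both questions on the slide: walking down finds the cause, walking up finds the impacted business service.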
Incident Submission
Automatic submission to service desk with annotations and cause area
Challenge
Quality and enrichment of data
Siloed, broken service lifecycle
Duplication of effort wasting time
Solution
Better collaboration
Automation and integration of the event-to-incident process lifecycle
User Example - Automatic incident ticket creation
Ticket visible to ops bridge
Assignment to subject matter expert
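Automatic submission could look roughly like the sketch below: one incident is opened for the cause event, the symptom events are attached as annotations, and the cause area drives assignment. The `Incident` fields, the event dictionaries and the `ASSIGNMENT` routing table are assumptions for illustration, not a service desk API.

```python
from dataclasses import dataclass, field
from itertools import count

_ticket_ids = count(1)

# Hypothetical routing table: cause area -> assignment group.
ASSIGNMENT = {"database": "DBA team", "network": "Network team"}

@dataclass
class Incident:
    id: int
    title: str
    cause_area: str
    assigned_to: str
    annotations: list = field(default_factory=list)
    event_ids: list = field(default_factory=list)

def submit_incident(cause_event, symptom_events):
    """Create one incident for a cause event and attach its symptoms as annotations."""
    area = cause_event["area"]
    return Incident(
        id=next(_ticket_ids),
        title=cause_event["message"],
        cause_area=area,
        assigned_to=ASSIGNMENT.get(area, "Service desk"),
        annotations=[e["message"] for e in symptom_events],
        event_ids=[cause_event["id"]] + [e["id"] for e in symptom_events],
    )

cause = {"id": "ev-2", "area": "database", "message": "SQL slow query performance alert"}
symptoms = [
    {"id": "ev-1", "area": "end-user", "message": "End user experience slow"},
    {"id": "ev-3", "area": "application", "message": "J2EE DB connection pool issue"},
]
ticket = submit_incident(cause, symptoms)
print(ticket)
```

Keeping the event IDs on the ticket is what lets the operations bridge see the ticket and, later, lets closure flow back to the events.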
Investigation & Diagnosis
Problem isolation, SME tools, and KM used to determine root cause
Challenge
Significant problem resolution time spent on pinpointing the problem in a dynamic, heterogeneous IT universe
Incident assigned and reassigned to multiple silos
Solution
Cross-domain data visualization and analysis
User Example - Diving deeper to find root cause
Expert sees corrupt DB tables
Finds runbook automation fix in knowledge base
Resolution
Change request with attached runbook automation to repair CIs
Challenge
Little or no automation leads to increased manual effort, impacting quality and efficiency
Solution
Expert-created and expert-authorized runbook automation to empower lower-level teams
Manage change, configuration, and release process
User Example - Processing the change
Get change request approval
Use runbook to reindex database tables
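The pattern in the user example, a runbook that only runs once its change request is approved, can be sketched as follows. `ChangeRequest`, `reindex_tables` and `execute` are hypothetical names; the runbook body is a placeholder, not a real database operation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ChangeRequest:
    ci: str
    description: str
    runbook: Callable[[str], str]   # automation attached to the change
    approved: bool = False

def reindex_tables(ci):
    # Placeholder step: a real runbook would invoke database tooling here.
    return f"reindexed tables on {ci}"

def execute(change):
    """Run the attached runbook only once the change has been approved."""
    if not change.approved:
        raise PermissionError("change request not approved")
    return change.runbook(change.ci)

change = ChangeRequest("oracle-db-1", "Reindex corrupt tables", reindex_tables)
change.approved = True   # stands in for the approval step
result = execute(change)
print(result)
```

Because the expert authors the runbook once and approval gates its execution, a lower-level team can carry out the fix without improvising it.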
Recovery & Closure
Automatically close the incident and related incidents, acknowledging related events
Challenge
Struggle to improve speed of restoration, recovery and closure of incidents, and to verify SLA/OLA compliance afterwards
Solution
Automate all notifications & updates, continuously monitor SLA/OLA compliance
User Example – Verify the change worked
User, DB and connection pool OK
Ticket and events closed
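Closing the loop can be sketched as: run the health checks first, and only if all pass, close the incident and its related incidents and acknowledge the originating events. The dictionary shapes and the `close_loop` helper are assumptions for illustration.

```python
def close_loop(incident, related_incidents, events, checks):
    """Close tickets and acknowledge events only if every health check passes."""
    if not all(check() for check in checks):
        return False
    for ticket in [incident, *related_incidents]:
        ticket["status"] = "closed"
    for event in events:
        event["acknowledged"] = True
    return True

# Hypothetical post-change checks matching the user example.
checks = [
    lambda: True,   # end-user response time back to normal
    lambda: True,   # database healthy
    lambda: True,   # connection pool healthy
]

incident = {"id": 1, "status": "resolved"}
related = [{"id": 2, "status": "open"}]
events = [{"id": "ev-1", "acknowledged": False}, {"id": "ev-2", "acknowledged": False}]

closed = close_loop(incident, related, events, checks)
print(closed, incident, related, events)
```

If any check fails, nothing is closed, so the incident stays visible until the fix is verified.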
Integrated ITIL event and incident management process optimizing MTTR and MTBF
Closed Loop Incident Process Integration Points
Components: Service Desk, Integrated CMDB, Automation, Monitoring
Integration points:
1. Sharing CIs, topology and state information
2. For creating and updating incidents
3. For updating events
4. Incident, Problem and Change Management
5. Runbook automation to remediate
HP’s Closed Loop Incident Process Solution
Components: Service Manager, UCMDB, Operations Orchestration, BSM, SA, CA, NA, SE, Other
Sources feeding BSM (CIs, topology, events, status): Net, Ops, App, Other
Integration points:
1. CIs, topology, events, status measurements flowing into BSM
2. Sharing events and topology
3. For creating and updating incidents
4. To access Business Impact View for a CI
5. Runbook automation to enrich, diagnose and remediate
6. Sharing CIs and state information
7. Runbook automation to remediate
Closed-Loop Incident Mgmt Process
Incident management from diagnosis to automated resolution
• Key processes—incident, change and configuration—need to be tightly linked
• Seamless process linkage requires tools to be consistently service-oriented
Tool domains: IT service management, Business service automation, Business service management, Configuration Management System (Federated CMDB)
Process steps:
1. Identify service performance degradation
2. Troubleshoot problem to isolate root cause
3. Identify changes to be implemented
4. Create TT/RFC to implement change
5. Implement and automate change to close RFC
6. Update CMS (Federated CMDB)
Callouts:
1. Identify service performance issue
2. Gather data to identify root cause
3. Create RFC to make change
4a. Initiate change
4b. Review, assess, plan and govern change
5a. Implement change
5b. Close change request?
6. Update Configuration Management System
Drive innovation value of IT
Closed Loop Incident Process Key Benefits
Cost
• Drive efficiency through automation
• Optimize service lifecycle process efficiency
72% lower maintenance cost
Quality
• Eliminate error-prone manual tasks
• Predict and prevent negative business impact
2.5x increased availability and performance
Transparency
• The cost/value ratio of delivered services is understood by the business
• Any service from anywhere
99.5% availability via integrated delivery
Agility
• Saved labor can be spent on innovation
• Measure and optimize time to develop and successfully deploy new services
30% faster time to market for new apps
Business risk
• Reduce risk of failure when deploying changes
• Enable compliance
70% fewer bad changes