5. Appendix II – Guideline for IT Processes
5.3. Problem Management Process
5.3.1. Introduction to the Process
Apart from incidents caused by users' lack of training or knowledge (which the Service Desk Function should identify through trend analysis and report to management), there are incidents that stem from a common cause or error in the infrastructure.
The Problem Management Process is concerned with identifying and resolving the root cause of these incidents, which affect the work activities of a particular group or sometimes the entire organization.
In addressing this situation, the Problem Management Process relies on two aspects: a reactive aspect and a proactive aspect. The reactive aspect focuses on major incidents that are escalated because they could not be solved through the Incident Management Process. Major incidents typically affect a number of users or have an adverse impact on work activities beyond a single user. Proactive Problem Management works to control the occurrence of incidents before they impact the organization and its work activities.
5.3.2. Benefits of Problem Management Process Implementation
Adopting a Problem Management Process gives the organization a level of maturity that the Incident Management Process alone would have great difficulty reaching. As with the Incident Management Process, the benefits can be seen on both sides of the organization: the users of the IT services and the providers of the IT services (the IT Department). Major benefits include:
§ Better perception of IT as a service provider, owing to its ability to control incident volumes through both reactive and proactive activities
§ Reduced pressure on the Service Desk, which can then focus more on the image of IT rather than being consumed by fire-fighting and solving incidents
§ Permanent solutions (not work-arounds) provided to other processes, so that incidents stop recurring
§ Better learning for IT staff through building knowledge and providing access to tools such as the Known Error and Problem Databases and Knowledge Bases
5.3.3. General Process Activities
The Problem Management Process has three sets of activities within its scope, classified by reactive and proactive mode. The reactive mode comprises two of these sets, and the proactive mode comprises one.
In reactive mode, when incidents are escalated to the Problem Management Process, the process first determines whether the reported problem is a known error. If it is not, the Problem Control activities take over: the process analyzes the problem and identifies its root cause, at which point it becomes a known error. A known error is a problem that can be solved through either a work-around or a permanent solution, and it is handled by the Error Control activities. Reactive mode therefore consists of Problem Control and Error Control.
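The reactive-mode routing described above can be sketched in a few lines. This is only an illustration and not part of the guideline: the `Incident` class, the `route_escalated_incident` function, and the idea of matching on an exact symptom string are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical minimal incident: an identifier and a symptom description.
    incident_id: str
    symptom: str


def route_escalated_incident(incident, known_error_symptoms):
    """Return the reactive activity set that takes over an escalated incident.

    A symptom already recorded as a known error goes to Error Control;
    anything else goes to Problem Control for root-cause analysis.
    """
    if incident.symptom in known_error_symptoms:
        return "Error Control"
    return "Problem Control"


known = {"mail server disk full"}
print(route_escalated_incident(Incident("INC-1", "mail server disk full"), known))  # Error Control
print(route_escalated_incident(Incident("INC-2", "VPN drops every hour"), known))   # Problem Control
```

In practice the match would rarely be an exact string comparison; the sketch only shows where the known-error check sits in the flow.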
Proactive Problem Management activities form a single set that focuses on addressing problems before incidents occur.
The tables below provide a general description of the Problem Control, Error Control, and Proactive Problem Management activities.
Table 11 – Problem Control Activities
Activity Name | Main Activity Tasks / Actions
Problem Identification and Recording (in addition to the Problem Management Process, this step can be performed by processes such as Capacity Management, Availability Management, and Incident Management) | Analyze the escalated incident:
§ Query the Known-Error database for a match for the reported problem; if none is found,
§ Query the Problem database for a match for the reported problem; if none is found,
§ Open a new problem record and record the details based on the incident description and the related support activities conducted so far
§ Allocate the problem to the problem management team
Problem Classification | Similar steps to incident classification, covering aspects and information related to:
§ Categorization
§ Prioritization (identifying impact and urgency)
Problem Investigation and Diagnosis | Similar steps to incident investigation, covering the assessment of the detail and accuracy of the information captured during problem identification, recording, and classification
This stage is complete when the root cause of the problem has been identified and either a work-around or a permanent solution is ready to be implemented.
Figure 13 – Problem Control Activities 9
9 CHAPTER 6: PROBLEM MANAGEMENT, Book for Service Support (Published 2000), page 101 © TSO 2005, All rights reserved
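The identification, recording, and classification steps in Table 11 can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionaries standing in for the Known-Error and Problem databases, the `ProblemRecord` fields, the `PRB-n` identifier scheme, and the 3x3 `PRIORITY` matrix are all hypothetical. The guideline only says that priority is derived from impact and urgency, and does not prescribe a matrix. Linking a further incident to an existing problem record is likewise an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical matrix: (impact, urgency) -> priority, where 1 is the highest.
PRIORITY = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}


@dataclass
class ProblemRecord:
    problem_id: str
    symptom: str
    category: str
    priority: int
    assigned_to: str = "problem management team"
    related_incidents: list = field(default_factory=list)


def identify_and_record(incident, known_errors, problems):
    """Look up the known-error DB, then the problem DB, then open a new record."""
    symptom = incident["symptom"]
    if symptom in known_errors:
        return "known_error", known_errors[symptom]
    if symptom in problems:
        # Assumption: a matching problem simply gains another related incident.
        problems[symptom].related_incidents.append(incident["id"])
        return "existing_problem", problems[symptom]
    record = ProblemRecord(
        problem_id=f"PRB-{len(problems) + 1}",
        symptom=symptom,
        category=incident["category"],
        priority=PRIORITY[(incident["impact"], incident["urgency"])],
        related_incidents=[incident["id"]],
    )
    problems[symptom] = record
    return "new_problem", record


problems = {}
incident = {"id": "INC-7", "symptom": "intermittent database timeouts",
            "category": "database", "impact": "high", "urgency": "medium"}
status, record = identify_and_record(incident, {}, problems)
print(status, record.problem_id, record.priority)  # new_problem PRB-1 2
```

A second incident with the same symptom would return `existing_problem` and be appended to the record's related incidents rather than opening a duplicate.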
Table 12 – Error Control Activities
Activity Name | Main Activity Tasks / Actions
Error Identification and Recording | § Confirm Known Error status for the problem once the infrastructure component at its root cause has been identified
Error Assessment | § Conduct an initial assessment of how to resolve the known error
§ Conduct an Error Resolution Impact Analysis with the help of specialist groups
§ Identify whether the error resolution requires a Request for Change (RFC) to be raised
Error Resolution Recording | § Record error resolution activities in the Known-Error database
Error Closure | Upon implementation of the error resolution activities, the following must be updated:
§ Known-error record in the Known-Error database
§ Problem record in the Problem database
§ Related incidents in the Incident Management tool
Figure 14 – Error Control Activities 10
10 CHAPTER 6: PROBLEM MANAGEMENT, Book for Service Support (Published 2000), page 106 © TSO 2005, All rights reserved
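The Error Closure step in Table 12 touches three stores: the known-error record, the problem record, and the related incidents. A minimal sketch of that cascade is shown below. The dictionary layout, the `close_error` function, and the status values `"closed"` and `"resolved"` are hypothetical; the guideline states only that the three records must be updated.

```python
def close_error(known_error_id, known_error_db, problem_db, incident_tool):
    """Update the known-error record, its problem record, and related incidents.

    Returns the list of incident IDs that were updated.
    """
    known_error = known_error_db[known_error_id]
    known_error["status"] = "closed"

    problem = problem_db[known_error["problem_id"]]
    problem["status"] = "closed"

    updated = []
    for incident_id in problem["incident_ids"]:
        incident_tool[incident_id]["status"] = "resolved"
        updated.append(incident_id)
    return updated


# Hypothetical in-memory stand-ins for the three stores.
known_error_db = {"KE-1": {"problem_id": "PRB-1", "status": "open"}}
problem_db = {"PRB-1": {"incident_ids": ["INC-7", "INC-8"], "status": "open"}}
incident_tool = {"INC-7": {"status": "open"}, "INC-8": {"status": "open"}}

print(close_error("KE-1", known_error_db, problem_db, incident_tool))  # ['INC-7', 'INC-8']
```

Updating all three together in one routine is a design choice for the sketch; it makes it harder for the records to drift out of step after a resolution is implemented.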
Table 13 – Proactive Problem Management Activities
Activity Name | Main Activity Tasks / Actions
Trend Analysis | Procedures to conduct continuous activities in relation to:
§ Analysis of incident and problem records
§ Analysis of incident and problem categorization
§ Analysis of reports generated by Systems / Network Management tools
§ Review of vendor literature
§ Attendance at conferences
§ Analysis of customer surveys
§ Internet research, including input from user groups
Targeting Preventive Action | § Introduce the necessary metrics and Key Performance Indicators to measure the volume of incidents, their impact on work activities, the time required to solve them, and the involvement of vendors or third parties in solving them
§ Define reporting procedures and roles to address the outcomes of analyzing the metrics
§ Produce weekly reports to capture these metrics
§ Take corrective action based on the report outcomes
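The metrics named under Targeting Preventive Action (incident volume, impact, time to solve, vendor involvement) could be gathered into a weekly summary along these lines. This is a sketch only: the record fields (`category`, `users_affected`, `hours_to_resolve`, `vendor_involved`), the use of affected users as a proxy for impact on work activities, and the sample data are all assumptions, not figures from the guideline.

```python
from collections import Counter
from statistics import mean


def weekly_metrics(incidents):
    """Summarize one week's incidents into the KPIs named in Table 13."""
    if not incidents:
        return {"incident_volume": 0, "volume_by_category": {},
                "users_affected": 0, "mean_hours_to_resolve": 0.0,
                "vendor_involved_share": 0.0}
    return {
        "incident_volume": len(incidents),
        "volume_by_category": dict(Counter(i["category"] for i in incidents)),
        # Assumption: affected users stand in for impact on work activities.
        "users_affected": sum(i["users_affected"] for i in incidents),
        "mean_hours_to_resolve": mean(i["hours_to_resolve"] for i in incidents),
        "vendor_involved_share":
            sum(1 for i in incidents if i["vendor_involved"]) / len(incidents),
    }


# Illustrative data, invented for the example.
sample_incidents = [
    {"category": "network", "users_affected": 40, "hours_to_resolve": 6.0, "vendor_involved": True},
    {"category": "network", "users_affected": 25, "hours_to_resolve": 4.0, "vendor_involved": False},
    {"category": "email", "users_affected": 5, "hours_to_resolve": 1.0, "vendor_involved": False},
    {"category": "network", "users_affected": 60, "hours_to_resolve": 9.0, "vendor_involved": True},
]

report = weekly_metrics(sample_incidents)
print(report["volume_by_category"])     # {'network': 3, 'email': 1}
print(report["mean_hours_to_resolve"])  # 5.0
```

A category that dominates the weekly volume, as `network` does in this invented sample, is the kind of signal that trend analysis would pass on for preventive action.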
5.3.4. Relationship with Other Processes
The Problem Management Process interacts mainly with the following ITIL processes:
§ Incident Management Process
§ Change Management Process
§ Configuration Management Process
§ Service Level Management Process (Including Capacity and Availability)
The table below summarizes the required interactions and relationships:
Table 14 – Problem Management Relationship with Other Processes
Process Name | Relationship with Problem Management Process
Incident Management | As described in Table 10 above
Change Management | § Solutions identified for problems or known errors must follow the Change Management process: an RFC is raised and CAB approval is obtained
§ Change Management should coordinate with Problem Management on raising and assessing RFCs related to the work-arounds and permanent solutions identified
§ Change Management should provide full support to Problem Management in resolving Major Incidents
§ Change Management should involve Problem Management in the CAB for RFCs related to problems, and must include this information in the FSC
§ For changes based on work-arounds and permanent solutions identified by Problem Management, the two processes need to agree on the rollback required if the changes are not successful
§ Change Management must advise Problem Management staff and specialists on the suitable approach and procedure for handling changes related to previously identified problems
Configuration Management | § Problem Management needs to report to Configuration Management the trend analysis results for the CIs reviewed
§ If trend analysis reveals an unauthorized change, Problem Management needs to report any inconsistency in CI state to Change Management and Configuration Management
§ For certain CIs (based on classification), Configuration Management should provide Problem Management with reports on the number of changes due to faults over given periods. This enables Problem Management to carry out trend analysis
Service Level Management | § Problem Management should inform SLM of breaches of OLAs with Support Groups
§ Problem Management should inform SLM of any need to improve OLAs
§ Problem Management should provide SLM with historical trends and statistical data to help it analyze the quality of all IT services provided to customers
§ Problem Management should inform SLM of Major Incidents so that the impact on the business can be properly assessed
§ SLM needs to ensure that service requirements, and updates to them, are communicated to Problem Management so that the required priority is reflected
§ SLM needs to ensure that Problem Management's inputs and recommendations on vendor services are communicated to the vendors for adherence and implementation. This not only helps improve their services but also prevents further incidents and problems
§ SLM should ensure that business priority is correctly reflected in the prioritization and categorization used by Problem Management
§ SLM should provide input from its own perspective during problem review meetings to help Problem Management improve its process
§ Based on the Service Level targets defined in SLAs, the Availability and Capacity Management Processes should be proactively involved in identifying problems and advising the Problem Management Process accordingly
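The Configuration Management row in Table 14 mentions reports on the number of changes due to faults per CI, which feed trend analysis. One way such a report might be reduced to a shortlist of CIs worth investigating is sketched below. The change-log layout, the `"fault"` reason value, the `fault_change_counts` function, and the threshold are all hypothetical.

```python
from collections import Counter


def fault_change_counts(change_log, threshold):
    """Count fault-driven changes per CI and keep those at or above threshold."""
    counts = Counter(c["ci"] for c in change_log if c["reason"] == "fault")
    return {ci: n for ci, n in counts.items() if n >= threshold}


# Illustrative change log for one reporting period, invented for the example.
change_log = [
    {"ci": "SRV-01", "reason": "fault"},
    {"ci": "SRV-01", "reason": "fault"},
    {"ci": "SRV-01", "reason": "upgrade"},
    {"ci": "SW-02", "reason": "fault"},
    {"ci": "SRV-01", "reason": "fault"},
]

print(fault_change_counts(change_log, threshold=3))  # {'SRV-01': 3}
```

The threshold would need to be agreed per CI classification; the value used here is arbitrary.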