Problem management, done well, can be highly beneficial to IT departments and the employees and businesses they serve. It can increase first-time fix rates, reduce incident volumes and related costs, and improve the service experience, employee experience, and/or customer experience (CX), depending on the business stakeholders involved.
But what happened to problem management when ITIL 4 was released? In particular, what changed in the detailed guidance of the Problem Management ITIL 4 Practice Guide? If you want to know, then this blog is for you.
The problem management basics and ITIL 4
ITIL 4 defines the key purpose of problem management as being “to reduce the likelihood and impact of incidents by identifying actual and potential causes of incidents and managing workarounds and known errors.”
It’s not a massive change from the previous ITIL iteration, but it’s important to note that the updated definition now includes a focus on identifying actual and potential causes as well as on managing workarounds. Neither was given enough attention in previous versions of ITIL, so emphasizing them now signals a real shift in priorities.
Calling out the difference between incidents and problems
Something that has caused confusion in the IT service management (ITSM) world since the dawn of time (OK, since the first version of ITIL) is the difference between incidents and problems. Let's be real here. All IT issues are problems to the end user. After all, how often do your end users call the IT service desk and use the words “I’d like to log an incident about X”? But incident management and problem management are, and need to be, different things in ITIL terms.
The ITIL 4 version of problem management attempts to address this confusion by calling out the differences at the beginning of the chapter in the ITIL 4 Foundation Edition publication. It explains that:
- Incidents are break-fix issues that have a negative impact on our people and so need to be resolved so that normal work can continue.
- Problems cause incidents. They need to be analyzed and investigated so that workarounds and resolutions can be identified, which will in turn reduce the number and impact of future incidents.
Key change #1: ITIL 4 brings back control
Problem and error control, that is. One of the key changes in the new version of ITIL is the return of problem and error control. Both problem and error control were in earlier versions of ITIL but were not included in ITIL v3/2011.
What is problem control, we hear you ask? Put simply, problem control is the set of activities around problem analysis and documenting workarounds and known errors (more on these later). In the previous version of ITIL, it was possible to get lost in problem management. Lots of people had the mindset of “OK, we’ve logged the problem record and stuck it in a report for visibility – now what?”
Problem control now provides that next step and gives structure to the analysis process, including:
- How to prioritize problems, so that if there’s a backlog of work, the issues causing the most pain can be identified and worked on first.
- Understanding complexity. We know that incidents can have more than one cause, and problem control encourages people to consider all factors, including those that affected incident severity and duration as well as those that caused the incidents in the first place.
- The importance of workarounds. Not every problem can be fixed permanently. Maybe the cost is prohibitive, or the benefits don’t justify the time and effort needed. A workaround is something that can get the customer working again, reducing the pain and adverse business impact.
Error control is the part of problem management that deals with known errors. Error control activities include:
- Making sure that any problems are flagged as known errors once they’ve been analyzed and the root cause or faulty component has been identified.
- Evaluating the effectiveness of workarounds and identifying opportunities for improvement.
- Suggesting permanent resolutions and supporting the decision on whether to proceed, based on tangible costs, benefits, and risks.
Key change #2: Effective modeling
In previous ITIL problem management processes, investigating problems tended to focus on two things: hardware and software. The IT landscape is changing, and the problem management practice takes this into account with problem modeling.
As well as the hardware and software, ITIL 4 suggests researching documentation, third-party components, standard data, highly sensitive data, consumer resources, and highly regulated services and systems. By prompting people to look at these other aspects of the service, modeling gives them the space to manage the issue more appropriately. For example, if you think that a problem is caused by a hardware fault, your first instinct might be to swap out that component. But without certainty, that action could not only fail to fix the problem but also make things worse.
Leaving room for the possibility that the fault could be caused by ways of working (outdated working practices or poor documentation), technical debt, or even a compatibility issue with a supplier system keeps people focused on the overall problem rather than pulled into a silo, which makes it more likely that we’ll fix it properly.
So, that’s our take on two of the key changes in the new ITIL 4 problem management practice. What would you add to these? Please let us know in the comments.