Welcome back to part two of this problem management tips blog. You can catch up on the first five tips – related to scope, form design, environment knowledge, absence of a problem management or IT service management (ITSM) toolset, and workarounds – here.
Please read on for our next five problem management tips.
Focus on Delivering Permanent Resolutions
You hopefully know that many problems are repeat incidents: the repeat offenders that pop up time and time again. Perhaps the network is always slow on a Monday morning. Or overnight processing always overruns at month end. Or there are persistent email-performance issues.
In our last blog, we started with the use of workarounds to garner some quick wins, but now the hard part begins – building out your problem management process to concentrate on finding permanent resolutions.
This approach won’t be quick, but it will deliver more value over time – because you’ll reduce the number of incidents and the amount of time spent firefighting, plus the business impact of repeat incidents will be minimized.
But remember – while permanent resolutions are great, there’ll still be times when workarounds will be the best, and perhaps only, solutions.
Proactive problem management is a capability that looks at problems that might otherwise be missed. It’s the analysis of incident records, plus the use of data collected by other ITSM processes, to identify trends or significant issues.
This can be done by:
- Trend analysis – reviewing previous incidents and looking for common or recurring themes
- Working with support teams and service delivery managers – asking them what keeps them awake at night
- Working with your customers – getting a heads-up on any high-volume or business-critical times so you can plan accordingly, be it by working with capacity management to look at performance, availability management to look at uptime, or change management to limit change volumes
- Raising changes to prevent incidents from recurring or, better still, occurring in the first place; for example, maintenance reboots, monthly patching, capacity alerts, and network monitoring
- Preventing problems from affecting other areas and systems (proactively minimizing the effect of the related incident)
Balance is key though. If you spend too much time on reactive problem management, you’ll be constantly firefighting, stuck in a break-fix mindset and missing opportunities for continual service improvement (CSI). But if you focus too much on being proactive, the business-as-usual (BAU) issues might spiral out of control.
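To make the trend analysis idea a little more concrete, here is a minimal Python sketch that counts incidents by service and weekday and flags the repeat offenders. The incident records, the field names (`opened`, `service`), and the `min_count` threshold are all assumptions made up for illustration; your ITSM tool's export will look different.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical incident records; field names are assumptions, not a real export format.
incidents = [
    {"opened": date(2024, 1, 8), "service": "network"},
    {"opened": date(2024, 1, 15), "service": "network"},
    {"opened": date(2024, 1, 22), "service": "network"},
    {"opened": date(2024, 1, 10), "service": "email"},
    {"opened": date(2024, 1, 18), "service": "email"},
]

def recurring_themes(records, min_count=2):
    """Count incidents per (service, weekday) and keep pairs seen at least min_count times."""
    counts = Counter(
        (r["service"], WEEKDAYS[r["opened"].weekday()]) for r in records
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]

for (service, weekday), n in recurring_themes(incidents):
    print(f"{service}: {n} incidents on a {weekday}")
```

With this sample data, the only theme that clears the threshold is the network on Mondays, which is exactly the kind of pattern worth raising as a problem record.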
Ring in the Changes
Get more closely involved with change management. For a start, if you’re not attending change advisory boards (CABs), then you should be.
As a problem manager you’ll be best placed to identify any trends or issues associated with potential change activity. And, on the other side of the coin, you may be involved with raising changes to resolve particular issues – so the change management process will be key in terms of delivering that change effectively and safely.
Still not convinced (about the need for a close relationship with change management)? Sometimes a change has to go live despite there being known issues with it. Sometimes it’s a business-critical product change. Sometimes it’s just too expensive to back out and it’s more cost-effective to deal with issues as and when they occur.
If you’re faced with such situations, insist on raising a “known error” and sharing any workarounds or fix information with the IT service desk and onward support teams.
What’s a known error? We’re glad you asked – because so many people get confused by all the ITIL terminology. A known error is a type of problem where we’ve figured out the root cause and either have a workaround in place or a permanent fix is being planned. By documenting any known errors, especially with regard to planned changes, you give your service desk a head start in planning for any issues.
Create a KEDB
Speaking of known errors, it’s good practice to collate them in, and share them via, a known error database (KEDB). This database is created and maintained by problem management personnel and is used by both incident and problem management processes.
When documenting known errors, it’s important to capture the following details to help your service desk colleagues as much as possible:
- Nature of the issue
- Service(s) affected
- Common symptoms
- Most impacted business units
- What to avoid doing.
Having a list of known errors and workarounds in one central location will not only avoid duplication and rework; it can also be used as a tool to upskill your service desk and will act as a springboard for knowledge management. For instance, through using a KEDB you can increase efficiency by working with the service desk to develop helpful scripts for handling common calls, which will in turn help call handling and interim resolution times (until the problems are resolved).
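If you don’t have a toolset that provides a KEDB, even a simple structured record is a start. The Python sketch below shows one possible shape for a known error entry covering the details listed above, plus a naive symptom lookup a service desk analyst might use. The class name, field names, `workaround` field, and sample entry are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class KnownError:
    """One KEDB entry; the fields mirror the details listed above (names are assumptions)."""
    issue: str                 # nature of the issue
    services: List[str]        # service(s) affected
    symptoms: List[str]        # common symptoms
    impacted_units: List[str]  # most impacted business units
    avoid: List[str]           # what to avoid doing
    workaround: str = ""       # assumed extra field for any agreed workaround

# Hypothetical sample entry for illustration only.
kedb = [
    KnownError(
        issue="Email client slow after the latest change",
        services=["email"],
        symptoms=["email is slow", "mailbox not loading"],
        impacted_units=["Finance"],
        avoid=["Rebuilding the mail profile"],
        workaround="Use webmail until the fix is released",
    ),
]

def find_known_errors(entries, description):
    """Return entries with at least one symptom appearing in the caller's description."""
    text = description.lower()
    return [ke for ke in entries if any(s.lower() in text for s in ke.symptoms)]

for match in find_known_errors(kedb, "User reports that email is slow this morning"):
    print(match.issue, "->", match.workaround)
```

Plain substring matching is deliberately crude; the point is that a consistent record shape makes it easy for the service desk to search and reuse what problem management already knows.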
Look at How Far You’ve Come
AKA bring on the metrics!
As with any business process, there’s a need to use the right set of metrics to understand operational performance and business results.
When creating a reporting pack/dashboard, it’s useful to start with the basics and then to build up more detail over time. Why? Because it’s all too easy to get carried away with reporting when the reality is: if you create pages upon pages of reports, you’ll generate significant work for the team that may not actually be needed.
So, when creating a report pack for problem management, start in a limited way, perhaps using the following as a starting point:
- Management summary – are there any key trends? Are ticket volumes up on previous months? What services were affected?
- Number of problems opened and closed – to give you an idea of volumes
- Number of problems by customer – to understand which customers or business units are most adversely affected
- Number of problems by service – to get a handle on which applications and business services are being most affected
- Number of problems linked to known errors – preferably with a proven workaround
- Number of problems on hold with third-party suppliers – to understand the external reliance, whether they are being progressed in a timely manner, and whether you need support from supplier management to escalate or get updates.
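As a rough illustration of how small a starter report pack can be, the Python sketch below derives the counts above from a list of problem records. The record layout, the status values (`open`, `closed`, `on_hold_supplier`), and the sample data are assumptions for the example, not the schema of any particular ITSM tool.

```python
from collections import Counter

# Hypothetical problem records raised in the reporting period; field names are assumptions.
problems = [
    {"id": "P1", "status": "closed", "customer": "Finance", "service": "email", "known_error": True},
    {"id": "P2", "status": "open", "customer": "Finance", "service": "email", "known_error": True},
    {"id": "P3", "status": "on_hold_supplier", "customer": "Sales", "service": "network", "known_error": False},
    {"id": "P4", "status": "open", "customer": "HR", "service": "email", "known_error": False},
]

def report_pack(records):
    """Build the basic problem management counts from a list of problem records."""
    return {
        "opened": len(records),
        "closed": sum(r["status"] == "closed" for r in records),
        "by_customer": Counter(r["customer"] for r in records),
        "by_service": Counter(r["service"] for r in records),
        "linked_to_known_error": sum(r["known_error"] for r in records),
        "on_hold_with_supplier": sum(r["status"] == "on_hold_supplier" for r in records),
    }

pack = report_pack(problems)
for name, value in pack.items():
    print(name, value)
```

The management summary still needs a human to write it, but the numbers behind it can come from a handful of counts like these before you invest in anything more elaborate.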
So, that’s the second five of our ten tips for problem management. What would you add? Please let us know in the comments.