Incident management is the IT service management (ITSM) process, or set of activities, that ensures all IT issues (termed “incidents” by ITIL, the ITSM best practice framework) are logged and progressed effectively and consistently through to resolution, all while ensuring that nothing is lost, ignored, or forgotten about.
ITIL defines incident management as the process responsible for managing the lifecycle of all incidents to ensure that normal service operation is restored as quickly as possible and that business impact is minimized. In other words, incident managers are the superheroes of the ITSM world, swooping in to save the day to get the business back up and running again when things go wrong. In this blog, we’ll look at how incident management can add value to your organization, along with some tips on how to make it work effectively in the real world.
The Incident Management Process
In a traditional corporate ITIL adoption (people tend not to call it an “implementation” due to ITIL being about people as well as process and technology), the incident management process contains the following steps:
- Identification
- Logging
- Categorization
- Prioritization
- Initial diagnosis (incident matching)
- Investigation and diagnosis
- Resolution and recovery
- Ownership, monitoring, tracking and communications
Identification is the part of the process where we figure out that something’s wrong or isn’t performing as it should be. The ITIL textbook definition of an incident is the following:
- An unplanned interruption to an IT service or reduction in the quality of an IT service
- Failure of a configuration item (CI) that has not yet impacted service
When an incident is identified by IT personnel, the onus is on that person to ensure that an incident ticket is raised and to work with service desk colleagues, and other support teams, to manage it through to resolution.
In the case of an end user or customer identifying an incident, the priority is for the service desk agent to get a ticket raised and prioritized, with the incident corrected as quickly as possible. In some instances, service desks will resolve the issue and then log the details in order to help the end user more quickly.
Key tip: separate incidents from service requests to allow for both better task prioritization and reporting on operational performance.
Logging is the part of the process where the incident is captured in an incident record or service desk ticket. One of the biggest issues that we see with incident logging is overly complicated forms within the service desk or ITSM tool. One of the golden rules of following any best practice methodology is that you should always make it easy for people to do things right. With that in mind, the incident form should be short and to the point, with easy-to-use drop-down menus and free text fields.
You can always add more detail as your incident management process matures, but when starting out, I recommend focusing on the most critical questions so that the fix effort can get under way as soon as possible. Some example questions include:
- What’s happening?
- What impact is this causing?
- Is anyone else affected?
- When did it start?
- Has anything changed on your device?
Key tip: remember that while logging incidents is important for operational management, knowledge management, and service improvement purposes – helping the end user get back to work quickly is the priority.
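To make the "short form" idea concrete, here is a minimal sketch of an incident record with one field per starter question above. The class name, field names, and the choice of which answers are optional are all assumptions for illustration; they are not taken from ITIL or from any particular ITSM tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class IncidentRecord:
    """Hypothetical minimal incident form: one field per starter question."""
    what_is_happening: str                  # "What's happening?"
    impact: str                             # "What impact is this causing?"
    others_affected: bool                   # "Is anyone else affected?"
    started_at: Optional[datetime] = None   # "When did it start?" (may be unknown)
    recent_changes: str = ""                # "Has anything changed on your device?"
    logged_at: datetime = field(default_factory=datetime.now)

    def missing_answers(self) -> List[str]:
        """Return the optional questions still unanswered, for later follow-up."""
        missing = []
        if self.started_at is None:
            missing.append("When did it start?")
        if not self.recent_changes.strip():
            missing.append("Has anything changed on your device?")
        return missing


# The ticket can be raised with only the three essentials filled in.
ticket = IncidentRecord(
    what_is_happening="Email client will not open",
    impact="Cannot send customer quotes",
    others_affected=False,
)
print(ticket.missing_answers())
```

Keeping most fields optional means the analyst can log the ticket straight away and fill in the rest while the fix is already under way.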
Categorization and Prioritization
Categorization and prioritization are the steps needed to ensure that the resolving team has the best chance of resolving the incident at the first point of contact. The first level of categorization should be really simple to make it as easy as possible for end users to log incidents, especially in a self-service environment. As an example, the first level of categorization could be something like the following:
You can always add more complexity to further levels, but by keeping the initial level simple, it’ll make it easier for end users and service desk analysts alike to log incidents with the correct category and assign them to the right resolution team.
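As a rough sketch of how a simple first level of categorization can drive assignment, the snippet below routes a ticket to a resolution team using only its top-level category. The category names and team names are invented placeholders, not the article's own list, so swap in whatever suits your organization.

```python
from typing import Optional

# Hypothetical first-level categories and owning teams (placeholders only).
FIRST_LEVEL_ROUTING = {
    "hardware": "desktop-support",
    "software": "application-support",
    "network": "network-operations",
    "access": "identity-team",
}

# Anything that does not match a known category stays with the service desk.
DEFAULT_TEAM = "service-desk"


def route_incident(category: str, subcategory: Optional[str] = None) -> str:
    """Return the resolution team for a ticket based on its first-level category.

    The subcategory is accepted so that deeper levels can be added later,
    but only the first level decides routing in this sketch.
    """
    return FIRST_LEVEL_ROUTING.get(category.strip().lower(), DEFAULT_TEAM)


print(route_incident("Network"))
print(route_incident("Something else"))
```

Falling back to the service desk for unknown categories means a mis-categorized ticket is still owned by someone rather than lost.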
Prioritization is the part of the process that helps the resolving team to manage their workload. When establishing incident priorities, best practice suggests that you look at:
- Impact – the degree to which the provision of services is disrupted within the organization, and the effect the interruption has on other areas of the infrastructure.
- Urgency – the speed with which the incident must be resolved.
- Expected effort – the anticipated amount of energy, time, and cost required to be able to begin restoring services after the occurrence of an incident.
Effective incident prioritization is key to making sure that the right incidents get seen to and resolved first. If your ITSM tool has an inbuilt priority matrix, use it. If not, have a set of standard questions or build your own matrix so that you can assign a sensible priority to each incident rather than going down the “if in doubt just tick the middle option” route.
Key tip: avoid using high/medium/low terminology at the point of end-user engagement because, to the individual logging it, their incident will almost always seem highly urgent. All you’ll end up with is a queue of high-priority incidents and a support team not knowing what to fix first.
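If you do need to build your own matrix, the sketch below shows one possible shape. It derives impact and urgency from factual answers instead of asking the end user to pick high, medium, or low, and then combines them into a priority from 1 (highest) to 5 (lowest). The questions, the thresholds (50 and 5 users), and the function names are assumptions for illustration, and expected effort is left out to keep the example short.

```python
from enum import IntEnum


class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def impact_from_answers(users_affected: int, service_down: bool) -> Level:
    """Derive impact from factual answers (thresholds are hypothetical)."""
    if service_down and users_affected > 50:
        return Level.HIGH
    if service_down or users_affected > 5:
        return Level.MEDIUM
    return Level.LOW


def urgency_from_answers(work_blocked: bool, workaround_exists: bool) -> Level:
    """Derive urgency from facts rather than asking the user to rate it."""
    if work_blocked and not workaround_exists:
        return Level.HIGH
    if work_blocked:
        return Level.MEDIUM
    return Level.LOW


def priority(impact: Level, urgency: Level) -> int:
    """Combine impact and urgency into a priority from 1 (highest) to 5 (lowest)."""
    return int(impact) + int(urgency) - 1


# A site-wide outage with no workaround lands at the top of the queue.
outage = priority(impact_from_answers(200, True), urgency_from_answers(True, False))
# A single user with a workaround lands near the bottom.
single_user = priority(impact_from_answers(1, False), urgency_from_answers(True, True))
print(outage, single_user)
```

The end user only ever answers plain questions; the priority number is worked out behind the scenes, which avoids the "everything is high" queue.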
Initial Diagnosis (Incident Matching)
Initial diagnosis is the part of the call where you decide whether the incident can be fixed at the first line or needs to be escalated to other support personnel or teams. Initial diagnosis is like the triage stage in a hospital – if you’ve ever been unfortunate enough to go to the emergency department, the first person you see after booking in at the front desk is the triage nurse who assesses whether you can be patched up there and then or if additional treatment is needed.
In a service desk environment, the first analyst assesses the call to determine if they can fix it straight away over the phone or if they need to escalate it to second line support. Scripts, known error databases (KEDBs), and knowledge bases can all help to improve first-time fix rates. One easy way of improving fix rates at the service desk stage is to invite other support teams to your weekly service desk meeting to give their top tips for troubleshooting specific issues over the phone – bring the fix closer to the end user. The advantages of this approach are twofold – the service desk analysts are upskilled, empowered, and more engaged; and, if more incidents are fixed on the front line, second and third line support teams are free to focus on the harder stuff.
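Incident matching can be sketched as a lookup against known errors: if the caller's description matches an entry with a workaround, fix it at first line; otherwise escalate. The example below uses naive keyword overlap purely for illustration. The `KnownError` structure, the sample entry, and the matching rule are assumptions, and a real ITSM tool will have its own search.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class KnownError:
    """Hypothetical known error entry with a documented workaround."""
    title: str
    keywords: FrozenSet[str]
    workaround: str


def match_known_error(description: str, known_errors: List[KnownError]) -> Optional[KnownError]:
    """Return the known error sharing the most keywords with the description, if any."""
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    best, best_score = None, 0
    for error in known_errors:
        score = len(error.keywords & words)
        if score > best_score:
            best, best_score = error, score
    return best


def triage(description: str, known_errors: List[KnownError]) -> str:
    """Decide between a first line fix and escalation to second line."""
    match = match_known_error(description, known_errors)
    if match is not None:
        return f"Fix at first line: {match.workaround}"
    return "Escalate to second line support"


kedb = [
    KnownError(
        title="VPN drops after client update",
        keywords=frozenset({"vpn", "drops", "disconnects"}),
        workaround="Reinstall the VPN profile",
    ),
]
print(triage("My VPN drops every hour", kedb))
print(triage("The printer is jammed", kedb))
```

Every workaround that second and third line teams share at the weekly meeting can become another entry in that list, which is how the fix moves closer to the end user.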
Key tip: having a high first contact resolution level is considered industry best practice, but be careful – metrics drive behaviors. Look out for the metric driving the wrong behavior, where achieving a first-time fix is seen as more important than not wasting the end user’s time (as they wait for a suitable fix to be found).
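One way to keep an eye on this is to report the first contact resolution rate alongside how long those first contact fixes actually took, so a rising rate cannot hide end users being kept waiting. The sketch below assumes each closed ticket records whether it was fixed at first contact and the minutes to resolve; both field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class ClosedTicket:
    resolved_at_first_contact: bool
    minutes_to_resolve: float


def first_contact_summary(tickets: List[ClosedTicket]) -> Dict[str, Optional[float]]:
    """Report the first contact resolution rate with the median time those fixes took."""
    if not tickets:
        return {"fcr_rate": 0.0, "median_fcr_minutes": None}
    first_contact = [t.minutes_to_resolve for t in tickets if t.resolved_at_first_contact]
    return {
        "fcr_rate": len(first_contact) / len(tickets),
        "median_fcr_minutes": median(first_contact) if first_contact else None,
    }


closed = [
    ClosedTicket(True, 10),
    ClosedTicket(True, 50),
    ClosedTicket(False, 120),
    ClosedTicket(True, 30),
]
print(first_contact_summary(closed))
```

If the rate goes up while the median time also climbs, that is a hint that analysts are holding on to calls they should have escalated.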
So, we've covered incident identification, logging, categorization, prioritization, and the initial diagnosis (incident matching) here. Come back soon for part two of this blog post where we’ll look at escalations, recovery, and closure. In the meantime, feel free to explore our incident management capabilities.
What are your top tips for incident management? Please let us know in the comments.