Major incidents are, by their very nature, stressful and intense. The ITIL 4 definition of a major incident is:
“An incident with significant business impact, requiring an immediate coordinated resolution.”
High-stress situations can cause conflict that, left unchecked, could delay the fix effort. Since we already have a definitive guide on incident management, this blog post will focus specifically on the major incident management process.
3 musts for an effective major incident management process
1. Get it right first time
Start as you mean to go on. So much conflict can be avoided by being upfront with your team about what is happening, so it’s important to have the facts. Every major incident is different but here are some sensible things to ask, so that when you’re updating colleagues, customers, and senior stakeholders you can give them solid information:
- How is the issue presenting? What are end-users telling us and when was it first noticed?
- Are there any immediate health and safety concerns?
- Has a major incident been raised? What is the reference number?
- What service is affected?
- What is the business impact?
- What user base is affected? Is it a specific team or location, or is it everyone?
- Do we need to invoke disaster recovery?
- What support team is looking at it? Do we have the right people engaged?
- Do we need to make other support teams aware? What about third-party support?
- When did this start happening? Was there any change activity around the time?
- Is there a workaround?
- Do we know what’s causing the issue? If so, how long will it take to fix?
- Do we need to notify any onward customers?
- Are there any security, legal, or compliance issues that we need to raise?
- Is the service desk able to cope with the current volumes of related calls?
- What is a realistic time for promising an update?
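If your team tracks these answers in a tool rather than on paper, the checklist above can be sketched as a simple fact sheet that shows which questions are still open before you send an update. This is only a minimal illustration in Python: the class name, the field names, and the sample reference number are all hypothetical, and the fields cover only a subset of the questions above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MajorIncidentFacts:
    """Hypothetical fact sheet; field names are illustrative only."""
    reference: Optional[str] = None         # major incident reference number
    affected_service: Optional[str] = None  # what service is affected?
    business_impact: Optional[str] = None   # what is the business impact?
    affected_users: Optional[str] = None    # a team, a location, or everyone?
    first_noticed: Optional[str] = None     # when did this start happening?
    support_team: Optional[str] = None      # which team is looking at it?
    workaround: Optional[str] = None        # is there a workaround?
    next_update_due: Optional[str] = None   # realistic time for the next update


def missing_facts(facts: MajorIncidentFacts) -> List[str]:
    """Return the names of fields that are still unanswered (None or blank)."""
    return [
        f.name
        for f in fields(facts)
        if not (getattr(facts, f.name) or "").strip()
    ]


# Example: only three questions have been answered so far.
facts = MajorIncidentFacts(
    reference="MI-0001",  # made-up reference number
    affected_service="Email",
    affected_users="All staff",
)
print(missing_facts(facts))
```

Anything the function returns is a question to chase before promising stakeholders more detail.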
2. Communicate effectively
Conflict is often caused by people not knowing all the facts or not being notified on time. For the major incident management process, great communication skills are a must. If you’re not sure you’ve engaged the correct support teams, take a few minutes to check that the correct people are working on the task.
Prime senior managers and service delivery managers, so if they have to deal with escalations and complaints, they have all the facts first. There’s nothing worse than being blindsided by an angry customer, so make sure anyone dealing with escalations knows what’s going on, the current status, and any plans in place to fix the issue.
3. Work the problem
A major source of conflict during major incidents can be the attempt to investigate and fix the issue. Everyone’s stressed, there are potentially multiple teams involved, and people are pushing for updates. Therefore, it’s really important to coordinate the fix effort in the right way.
What typically works well is having everyone on the same conference call, chaired by a major incident manager. The chairing or running of the call is important because it means that someone is in charge of keeping the fix effort on track, and that everyone is treated fairly and is listened to. The tone to set is brisk, pacey, and kind.
As the call progresses, a number of things could happen:
- The team has confirmed the root cause and has a fix. At this stage it’s important to test the fix and engage with the change process so that it can be implemented quickly and safely.
- We have no idea what’s gone wrong and are feeling a bit panicky. At this point it’s important to calm things down by reminding everyone that it’ll be ok and we’ll figure it out. In this case, we’d look at who else needs to be involved and also check in with the business to give them an update.
- Someone is ranting and looking for blood. Thank them for their feedback and steer the call back to the fix effort.
Once the issue has been resolved, continue to engage stakeholders effectively by ensuring the correct change paperwork is raised, the service desk is updated, and customers are advised that service has been restored.
What to say during conflict management?
It’s often helpful to have a script to help you de-escalate a situation. Here are some things that we’ve seen work in the past:
| Situation | What To Say |
| --- | --- |
| The service desk and support teams are feeling overwhelmed. The reality is that service desks will get flooded with calls during a big issue. The important thing to do is to listen to their concerns, and try and find ways to support them/reduce the pressure. It's also worth seeing if any other support team can take calls, ordering in lunch, and promising to review things regularly so that you’re looking after your people. | “I know this is really, really hard. Here is what we are trying to do to make things better. Just keep doing the best you can and we’ll revisit the situation in the next hour.” |
| The root cause hasn’t been identified and colleagues are concerned that things will get worse. | “This is going to be absolutely fine. Let’s walk through this step-by-step, so we can get a better handle on things. Is there anyone else that we need to loop in so we have everything covered?” |
| Senior management is pressing for more details and you don’t yet have all the facts. | “The situation is under control. We’re just pulling together a timeline and you’ll have it in your inbox in X minutes.” |
| A support team isn’t responding or trying to blame another area of the business. | “We need everyone onboard with this. Let’s focus on X so we can get back on track.” |
| A senior stakeholder is being curt or putting too much pressure on your support teams. | “Thank you for your feedback but we need to focus on the fix effort. We’ll pick X up once the immediate issue has been resolved.” |
Closing the loop
A final source of conflict comes after the incident has been resolved. All too often, major incident review meetings turn into exercises in blame avoidance as people scramble to keep the finger from being pointed at them.
Instill a culture where major incident reviews are a safe space and not a witch hunt. All you really want is to understand what happened, what the root cause was, how it was fixed, and how to prevent it from happening again. Set expectations early and be prepared to jump in if people start apportioning blame.