Major Incident Management: Process, Roles, and a Practical Runbook

Major Incident Management (MIM) focuses on incidents that have a business-wide impact and demand an immediate, coordinated response. These incidents put critical services at risk, create urgent pressure to restore operations, and can quickly affect revenue, compliance, or reputation if they drag on.

Not every high-priority ticket qualifies as a major incident. A high-priority incident might be urgent for one team or user. A major incident goes further: it disrupts core services, affects many users or customers, escalates quickly, and requires leadership visibility and cross-team coordination to regain control.

In this article, we’ll explain how Major Incident Management works in practice, when to trigger it, and how to use ITSM tools to help teams respond faster when the impact is too big to handle as business as usual.

What makes an incident “major”?

A major incident is defined by impact and urgency, not just priority. The moment an incident threatens core business operations, it stops being handled as routine work and requires a different level of response.

Use the following checklist to decide when to move from regular Incident Management and trigger Major Incident Management.

An incident is considered major when one or more of these conditions apply:

  • Wide impact. The issue affects a large number of users, customers, or locations at the same time, rather than a single team or individual.
  • Critical services involved. Core systems such as email, authentication, ERP, customer-facing platforms, or payment services are unavailable or severely degraded.
  • High urgency to restore service. Delays quickly escalate business risk. Workarounds are limited or nonexistent, and normal response times are not acceptable.
  • Business or financial risk. The incident blocks revenue-generating activities, interrupts operations, or exposes the organization to contractual or regulatory issues.
  • Reputational impact. Customers, partners, or the public are aware of the disruption, or the issue is likely to reach them if not resolved fast.
  • Cross-team dependency. Resolution requires coordination across multiple teams, vendors, or support tiers, often under time pressure.

A single condition can be enough when its impact is severe. If several are met, treat the incident as major even if the root cause is still unclear.
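
The checklist above can be encoded as a small decision helper so that classification is applied the same way every time. The following is a minimal sketch, not a standard: the field names and the `min_matches` threshold are illustrative assumptions that each organization would tune to its own policy.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IncidentAssessment:
    """One flag per checklist criterion (field names are illustrative)."""
    wide_impact: bool = False
    critical_service: bool = False
    high_urgency: bool = False
    business_risk: bool = False
    reputational_impact: bool = False
    cross_team_dependency: bool = False


def matched_criteria(assessment: IncidentAssessment) -> list[str]:
    """Return the names of the criteria that apply, in checklist order."""
    return [f.name for f in fields(assessment) if getattr(assessment, f.name)]


def is_major(assessment: IncidentAssessment, min_matches: int = 1) -> bool:
    """Decide from impact signals only; root cause is deliberately not an input.

    min_matches is an assumed, tunable threshold, not a standard value.
    """
    return len(matched_criteria(assessment)) >= min_matches


# Example: a company-wide email outage with no workaround.
email_outage = IncidentAssessment(
    wide_impact=True, critical_service=True, high_urgency=True
)
print(matched_criteria(email_outage))
print(is_major(email_outage, min_matches=2))
```

Returning the matched criteria alongside the decision makes it easier to explain to stakeholders why an incident was declared major.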

Typical examples of major incidents

  • A company-wide email or identity service outage preventing employees from working.
  • A production system failure affecting customers during business hours.
  • A network outage impacting multiple sites or regions.
  • A security incident forcing critical systems offline.
  • A failed deployment that breaks a core business application.

The key signal is simple: when the impact spreads beyond a single team and time becomes a business risk, you are no longer dealing with a standard high-priority incident.

Roles and responsibilities during a major incident

A successful Major Incident Management process depends on clearly defined roles. Everyone involved needs to know what’s expected from them — especially when time is critical and the pressure is on.

Here are the main roles and responsibilities typically involved in IT Major Incident Management:

  • Major incident manager – Leads the response effort, coordinates teams, and acts as the central point of contact.
  • IT support teams – Work on diagnosing and resolving the issue, based on their area of expertise (infrastructure, networking, applications, etc.).
  • Service desk – Logs the incident, communicates with end users, and escalates as needed.
  • Communications lead – Ensures consistent, timely updates to all stakeholders, including business leaders, customers, and internal teams.
  • Change manager (when applicable) – Coordinates any emergency changes that need to be deployed to resolve the issue.
  • Business stakeholders – Provide business context, assess impact, and help prioritize efforts if there are competing risks.

A Major Incident Management process you can follow

A solid Major Incident Management process needs to be fast, structured, and clear. In high-pressure situations, improvising is not an option — everyone needs to know exactly what to do and when. Here are the five essential steps.

Step 1: Detect and classify the incident

Detection usually comes from monitoring tools, alerts, or user reports. Classification is the real decision point.

At this stage, teams evaluate:

  • Services affected.
  • Number of users or customers impacted.
  • Urgency and business exposure.
  • Potential reputational or compliance risk.

The objective is to decide whether the incident meets major incident criteria, not to confirm the root cause.

Early communication is part of this first step. When users lack information, they open duplicate tickets, escalate through informal channels, or try risky workarounds, all of which slow down recovery. Even partial updates help set expectations by confirming that an incident is in progress, clarifying which services are affected or under investigation, and stating that teams are actively working on containment or restoration.

Step 2: Coordinate

Once classified, the incident must be escalated to the right teams — including technical experts, business stakeholders, and the service desk. According to the ITIL framework, this step should follow a predefined escalation path.

Key elements of this step include:

  • Assigning a major incident manager.
  • Bringing in technical teams and business stakeholders.
  • Opening live communication channels.
  • Establishing a clear decision authority.

The major incident manager coordinates work and communication. They do not troubleshoot directly, but keep teams aligned and focused on shared priorities.

Step 3: Respond and contain the impact

The goal here is to stabilize the situation and limit further damage, not to fix the underlying issue.

Typical containment actions include:

  • Isolating affected systems or components.
  • Disabling failing integrations or features.
  • Rolling back recent changes.
  • Switching to backups or failover environments.

These actions may be temporary. Their purpose is to stabilize services and prevent the incident from escalating while the investigation continues.

Clear updates during this phase help reduce tension and keep teams aligned on the immediate goal: stopping further impact.

Step 4: Resolve and recover

With the incident contained, teams can work toward a permanent resolution.

This phase usually involves:

  • Identifying and fixing the root cause.
  • Restoring services to normal operation.
  • Validating performance, access, and dependencies.
  • Confirming recovery with affected stakeholders.

Documentation happens here as well, capturing timelines, actions, and decisions while details are still fresh.
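
One lightweight way to capture timelines, actions, and decisions as they happen is an append-only log that can be rendered for the post-incident review. The sketch below is only an illustration: the class names, the `kind` categories, and the `MI-1`-style identifier used in the usage note are assumptions, not part of any specific tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TimelineEntry:
    timestamp: datetime
    author: str
    kind: str  # assumed categories: "action" or "decision"
    note: str


@dataclass
class IncidentTimeline:
    incident_id: str
    entries: list[TimelineEntry] = field(default_factory=list)

    def record(self, author: str, kind: str, note: str,
               timestamp: Optional[datetime] = None) -> TimelineEntry:
        """Append an entry; defaults to the current UTC time."""
        entry = TimelineEntry(
            timestamp or datetime.now(timezone.utc), author, kind, note
        )
        self.entries.append(entry)
        return entry

    def render(self) -> str:
        """Return the entries in chronological order as plain text."""
        lines = [f"Timeline for {self.incident_id}"]
        for e in sorted(self.entries, key=lambda e: e.timestamp):
            lines.append(
                f"{e.timestamp:%Y-%m-%d %H:%M} UTC [{e.kind}] {e.author}: {e.note}"
            )
        return "\n".join(lines)
```

During an incident, the major incident manager (or a designated scribe) would call something like `timeline.record("alice", "decision", "Roll back release")` after each significant step, then attach `timeline.render()` to the review.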

Step 5: Review and improve

Once everything is up and running, teams conduct a post-incident review. The goal is to analyze what went wrong, what went right, and what can be done better next time.

Make it a safe space. Reviews shouldn't be about blame — they should focus on facts, root causes, and improvement opportunities. Use them to refine Major Incident Management roles and responsibilities, playbooks, and communication protocols.

Communication templates you can reuse

Clear, consistent communication reduces uncertainty and keeps users aligned with the response. These templates are meant to be brief, factual, and easy to adapt during a major incident.

  • Initial update: Use this message as soon as the incident is classified as major.

We’re currently investigating an incident affecting [service/system].
Some users may experience [brief impact].
Our teams are actively working to contain the issue.
We’ll share another update by [time] or sooner if there’s a change.

  • Ongoing update: Use this while the incident is still active and under investigation.

The incident affecting [service/system] is still in progress.
Impact remains limited to [users/areas], and no additional services are affected at this time.
Teams continue working on containment and recovery.
The next update will be shared by [time].

  • Resolution notice: Send this once services are fully restored and validated.

The incident affecting [service/system] has been resolved.
Services were restored at [time], and normal operation has resumed.
We’re reviewing the incident to identify follow-up actions and prevent recurrence.
Thanks for your patience.
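
If updates are sent from a script or chat integration, the bracketed placeholders can be filled programmatically. Here is a minimal sketch using Python's standard `string.Template`; the placeholder names (`service`, `impact`, `next_update`) and the `render` helper are assumptions chosen for illustration.

```python
from string import Template

# The initial update from above, with [placeholders] turned into $names.
INITIAL_UPDATE = Template(
    "We're currently investigating an incident affecting $service.\n"
    "Some users may experience $impact.\n"
    "Our teams are actively working to contain the issue.\n"
    "We'll share another update by $next_update or sooner if there's a change."
)


def render(template: Template, **values: str) -> str:
    """Fill a template; substitute() raises KeyError if a placeholder is missing."""
    return template.substitute(**values)


message = render(
    INITIAL_UPDATE,
    service="email",
    impact="delayed delivery",
    next_update="14:30 UTC",
)
print(message)
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice here: a message with an unfilled placeholder fails loudly instead of reaching users.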

How to manage major incidents with InvGate Service Management

InvGate Service Management supports Major Incident Management by giving teams structure without slowing them down. The idea is to guide response through workflow automation, maintain visibility while the incident is active, and capture everything needed for review afterward with analytics and reporting.

Here’s how it helps your team stay in control when it matters most:

1- Creating a major incident in InvGate Service Management

In InvGate Service Management, a major incident is not an escalation of an existing incident. It is a separate request type, created explicitly to manage high-impact, widespread disruptions.

Only technicians (agents, managers, and administrators) can create major incidents. Access can be further restricted to a specific group of agents using visibility rules. End users are excluded because they don’t have the visibility or authority to determine whether a disruption qualifies as a major incident.

Keep in mind:

  • A standard incident cannot be converted into a major incident.
  • A major incident must be created as a new request from the start.
  • Existing incidents can later be related to the major incident.

Major incidents act as a coordination layer for mass events. Once created, you can link multiple incident requests to the major incident, centralizing communication, tracking, and resolution.

This approach allows teams to:

  • Manage the broader disruption in one place.
  • Keep individual incidents visible for affected users.
  • Avoid repeating the same updates or resolution steps.

When the major incident is resolved, the solution can be propagated automatically to all related incidents, applying the same resolution comment and moving them to customer confirmation.
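
To make the linking and propagation behavior concrete, the sketch below models it with plain Python objects. This is a conceptual illustration only, not InvGate's API or data model: every class, field, and status name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Incident:
    incident_id: str
    status: str = "open"
    resolution_comment: Optional[str] = None


@dataclass
class MajorIncident:
    major_id: str
    status: str = "open"
    related: list[Incident] = field(default_factory=list)

    def link(self, incident: Incident) -> None:
        """Relate an existing incident; linking the same ID twice is a no-op."""
        if all(i.incident_id != incident.incident_id for i in self.related):
            self.related.append(incident)

    def resolve(self, comment: str) -> None:
        """Resolve the major incident and propagate the outcome to related ones."""
        self.status = "resolved"
        for incident in self.related:
            incident.resolution_comment = comment
            # Hypothetical status name for the customer confirmation stage.
            incident.status = "awaiting_customer_confirmation"


outage = MajorIncident("MI-1")
first, second = Incident("INC-101"), Incident("INC-102")
outage.link(first)
outage.link(second)
outage.resolve("Mail gateway restarted; delivery restored.")
print(first.status, second.resolution_comment)
```

The point of the model is that agents write the resolution once, on the major incident, and each affected user still sees it on their own ticket.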

2- Building a major incident workflow

A dedicated workflow ensures that major incidents follow a controlled path instead of being handled like standard tickets.

Using the no-code workflow builder, you can define how major incidents move across stages, roles, and actions once they are created.

For a typical major incident workflow, you can include:

  • A structured start form capturing impact, affected services, and urgency.
  • A conditional step to validate risk and impact.
  • Optional approval for escalation confirmation.
  • Mandatory tasks for coordination, containment, and resolution.
  • Automated actions such as announcements, notifications, or reassignments.
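
Conceptually, such a workflow is a small state machine: each stage allows only certain next stages. The sketch below illustrates that idea in Python. The stage names and transitions are assumptions based on the list above, and the code does not represent how the no-code builder is implemented or configured.

```python
from enum import Enum


class Stage(Enum):
    START_FORM = "start_form"
    VALIDATE_IMPACT = "validate_impact"
    APPROVAL = "approval"  # optional step
    COORDINATION = "coordination"
    CONTAINMENT = "containment"
    RESOLUTION = "resolution"
    CLOSED = "closed"


# Allowed next stages for each stage (assumed for illustration).
TRANSITIONS = {
    Stage.START_FORM: {Stage.VALIDATE_IMPACT},
    # Validation may require approval, skip it, or reject the request.
    Stage.VALIDATE_IMPACT: {Stage.APPROVAL, Stage.COORDINATION, Stage.CLOSED},
    Stage.APPROVAL: {Stage.COORDINATION, Stage.CLOSED},
    Stage.COORDINATION: {Stage.CONTAINMENT},
    Stage.CONTAINMENT: {Stage.RESOLUTION},
    Stage.RESOLUTION: {Stage.CLOSED},
    Stage.CLOSED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to the target stage, or raise ValueError if the path is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


stage = Stage.START_FORM
for nxt in (Stage.VALIDATE_IMPACT, Stage.COORDINATION,
            Stage.CONTAINMENT, Stage.RESOLUTION, Stage.CLOSED):
    stage = advance(stage, nxt)
print(stage.value)
```

Making coordination, containment, and resolution mandatory corresponds to giving each of those stages exactly one way forward.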

3- AI features for Major Incident Management

InvGate Service Management applies artificial intelligence to help teams detect major incidents earlier and communicate more effectively during critical events.

AI-powered major incident detection

Major incidents often emerge from multiple related reports. AI continuously analyzes incoming incidents to identify patterns that suggest a broader issue.

When a potential major incident is detected:

  • Help desk managers receive a system notification and email.
  • The suggested major incident includes AI-provided reasoning.
  • Managers can create the major incident with prefilled data and linked requests.

To enable this functionality, go to Settings → AI Hub → Proactive detection and activate Major Incident detection.
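
To picture what "patterns that suggest a broader issue" can mean, consider a deliberately naive heuristic: flag a service when several reports about it arrive within a short window. This is not how InvGate's AI works, which the article does not describe in detail. The function name, the 15-minute window, and the threshold of three reports are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def suggest_major_incidents(reports, window=timedelta(minutes=15), threshold=3):
    """Group reports by service and flag bursts of related incidents.

    reports: iterable of (incident_id, service, opened_at) tuples.
    Returns {service: [incident_ids]} holding, for each flagged service,
    the largest group of reports opened within one `window`-long span.
    """
    by_service = defaultdict(list)
    for incident_id, service, opened_at in reports:
        by_service[service].append((opened_at, incident_id))

    suggestions = {}
    for service, items in by_service.items():
        items.sort()
        best, start = [], 0
        for end in range(len(items)):
            # Slide the left edge forward until the span fits in the window.
            while items[end][0] - items[start][0] > window:
                start += 1
            cluster = items[start:end + 1]
            if len(cluster) >= threshold and len(cluster) > len(best):
                best = cluster
        if best:
            suggestions[service] = [incident_id for _, incident_id in best]
    return suggestions


t0 = datetime(2024, 1, 1, 9, 0)
reports = [
    ("INC-1", "email", t0),
    ("INC-2", "email", t0 + timedelta(minutes=5)),
    ("INC-3", "vpn", t0 + timedelta(minutes=6)),
    ("INC-4", "email", t0 + timedelta(minutes=12)),
    ("INC-5", "email", t0 + timedelta(minutes=60)),
]
print(suggest_major_incidents(reports))
```

The returned incident IDs play the role of the "linked requests" that a manager would review before deciding whether to create the major incident.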

Predictive risk and impact analysis

AI also supports classification by suggesting risk and impact levels based on historical data from similar cases. This helps teams assess business exposure faster and apply consistent criteria during escalation.

AI-generated announcement suggestions

Communication is another common failure point during major incidents. InvGate Service Management addresses this with automatic announcement suggestions.

When a major incident is created or updated, the system drafts a suggested announcement. Agents and administrators can review, edit, and publish it immediately to keep users informed and prevent a flood of duplicate tickets.

To enable this feature, go to Settings → AI Hub → Agent assistance and activate Suggestions for announcements associated with major incidents.

4- Post-incident review and continuous improvement

After resolving a major incident, the focus shifts to learning and preventing future disruptions. InvGate Service Management provides tools to make post-incident activities structured and actionable.

  • Analytics and reporting: Use built-in dashboards and reports to analyze timelines, escalation patterns, affected services, and team performance. These insights help identify bottlenecks and measure response effectiveness.
  • Problem Management: Link the major incident to problem records to investigate root causes, track recurring issues, and implement long-term fixes. This ensures that the same disruption doesn’t repeat.
  • Document post-incident learnings: Capture key decisions, communication effectiveness, and lessons learned in a structured format. Store this documentation for audits, future reference, and continuous process improvement.

Major incidents are easier to manage when your platform centralizes detection, escalation, and communication in one place. Start a free trial of InvGate Service Management today and see how your team can respond faster, reduce noise, and keep users informed during critical disruptions.

 
