Many IT professionals are familiar with the popular metrics and measures of IT operational success. Such metrics as Customer Satisfaction, Average Handle Time and First Contact Resolution are typically memorized by service desk managers and stored for quick reference during planning and other types of meetings.
But how do we measure the effectiveness of the processes that support those teams? Sure, we can easily tell how many incidents are being resolved at the first contact, but how do we measure the incident process's impact on our ability to achieve this?
Before we get started, keep in mind that KPIs only indicate your organization's ability to support each person in being their best. They are not a reflection of the individual. If a metric is slipping, it's because we haven't supported them with enough training, talent development or organizational understanding, or maybe they are just in the wrong role. Metrics help us change to meet the employee's needs as much as the organization's, if it's a healthy and supportive culture.
Also consider what decisions you will make with these measurements. This will help you understand why you are measuring them in the first place and identify which KPIs you will value.
Metrics for process managers
Process managers are the talented people tasked with process ownership, continual improvement and stewardship of the work that passes through the process.
I'm a bit radical, so I recommend surveying process participants for a simple 1-5 or 1-10 PSAT (Process Satisfaction) score and using low scores as indicators of where user research, usability and process improvement opportunities exist.
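As a rough illustration of the PSAT idea, here is a minimal Python sketch that averages survey scores per process and flags the low ones. The sample responses, the process names and the cut-off of 3.0 are all assumptions made up for the example, so adjust them to your own survey scale.

```python
from statistics import mean

# Hypothetical survey responses: (process name, score on a 1-5 scale).
responses = [
    ("incident", 4), ("incident", 5), ("incident", 3),
    ("problem", 2), ("problem", 3), ("problem", 2),
    ("change", 4), ("change", 4),
]

LOW_SCORE_THRESHOLD = 3.0  # assumed cut-off; tune it to your own scale


def psat_by_process(rows):
    """Return the mean satisfaction score for each process."""
    scores = {}
    for process, score in rows:
        scores.setdefault(process, []).append(score)
    return {process: mean(values) for process, values in scores.items()}


def needs_attention(psat, threshold=LOW_SCORE_THRESHOLD):
    """List the processes whose mean score falls below the threshold."""
    return sorted(p for p, score in psat.items() if score < threshold)


psat = psat_by_process(responses)
flagged = needs_attention(psat)
print(psat)
print(flagged)
```

A flagged process is only a pointer to where to start asking questions, not a verdict on the process or the people in it.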
I would also recommend working with your process managers to brainstorm metrics. It's pretty common to have annual reviews and annual goals set for employees - do you do these? Do the metrics align to those goals?
First, assess your goal alignment.
What are the goals of your process team altogether? Do they align to the organization's goals? It's likely that these processes are meant to make regular impacts on Continual Service Improvement (CSI). Using Service Levels (SLAs) and CSI impact (usually measured financially or operationally) as a basis for measuring their contribution can shed some light on the health, flexibility and impact of these processes.
Assess the teams these processes serve.
For instance, your Service Desk team might be measured on Resolution Time, First Contact Resolution, Escalation Volume, Response Time, etc. Collecting and reviewing these metrics will be very insightful, as they relate directly to process efficiency. The problem execution team (not process oversight) probably has fewer goals to meet.
Some examples of problem metrics include known error record volume, time to known workaround and number of resolved problems. Sometimes a weight is added to problems with more incidents - or, if you're super-mature, this is tied to financial impact on the organization.
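To make a few of these problem metrics concrete, here is a small Python sketch that counts resolved problems, counts open problems without a workaround, and orders open problems by the number of linked incidents. The `Problem` record, its field names and the sample data are hypothetical, and weighting by a raw incident count is just one simple option.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical record layout; map it to your own tool's fields.
    problem_id: str
    linked_incidents: int
    has_workaround: bool
    resolved: bool


problems = [
    Problem("PRB-1", linked_incidents=12, has_workaround=True, resolved=False),
    Problem("PRB-2", linked_incidents=3, has_workaround=False, resolved=False),
    Problem("PRB-3", linked_incidents=7, has_workaround=True, resolved=True),
    Problem("PRB-4", linked_incidents=20, has_workaround=False, resolved=False),
]


def summarize(records):
    """Compute a few simple problem metrics from a list of Problem records."""
    open_problems = [p for p in records if not p.resolved]
    # Weight open problems by how many incidents they are linked to.
    by_weight = sorted(open_problems, key=lambda p: p.linked_incidents, reverse=True)
    return {
        "resolved": sum(p.resolved for p in records),
        "open_without_workaround": sum(not p.has_workaround for p in open_problems),
        "priority_order": [p.problem_id for p in by_weight],
    }


summary = summarize(problems)
print(summary)
```

If your organization tracks cost per incident, the same sort could use an estimated financial impact instead of the raw incident count.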
The processes that support these metrics are directly related to them.
Your governance team's metrics will logically be one step above these goals. Some quantitative examples might include volume of process reviews, volume of process objective reviews, process changes per month (count of strategic initiatives), and each process's contribution to the department (and organization) goals.
Some other qualitative metrics that are more common for Incident management or Problem management include the process's service improvement (strategic initiative impact), the number of new services supported, process documentation ratings, documentation update volume, training volume and process codification (how it is implemented/supported). For these two processes you'll certainly want to measure the number of significant events and drive this number down. That last one is basically process effectiveness.
Measure the number of transactions that "fall through the cracks".
These would include things like incidents that miss their SLA. This is an indication that the process doesn't support the timely resolution of incidents, or that the SLAs need to be reviewed for this type of incident.
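One way to look at SLA misses is per incident type, which helps separate a process problem from an unrealistic SLA. The Python sketch below assumes each incident carries a category, a resolution time and an SLA target in hours. The categories and numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical incidents: (category, hours to resolve, SLA target in hours).
incidents = [
    ("password reset", 1.0, 4.0),
    ("password reset", 6.0, 4.0),
    ("network outage", 3.0, 2.0),
    ("network outage", 5.0, 2.0),
    ("hardware", 20.0, 24.0),
]


def sla_miss_rate(rows):
    """Return the share of incidents that missed their SLA, per category."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for category, hours, target in rows:
        totals[category] += 1
        if hours > target:
            misses[category] += 1
    return {category: misses[category] / totals[category] for category in totals}


rates = sla_miss_rate(incidents)
print(rates)
```

A category that misses nearly every time may point to an SLA that needs review, while scattered misses across categories hint more at the process itself.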
On the problem side of operations, things that slip through the cracks are problems without progress. Measure the lack of diagnostic effort and the number of problems without a workaround or documented known error, and work to drive down the ratio of incidents affected by problems. This last one is an overall indication of the long-term impact of problem management.
Between the two processes, problem management can be measured by the number of incidents per known error. And vice versa: a good incident process will have a known error associated with most incidents that are older than a week.
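Both of these cross-process measures can be sketched in a few lines of Python. The example assumes each incident records the date it was opened and an optional linked known error. The IDs, the dates and the fixed "today" are hypothetical, and "older than a week" is read here as more than seven days since the incident was opened.

```python
from datetime import date, timedelta

# Hypothetical incidents: (incident id, date opened, linked known error or None).
incidents = [
    ("INC-1", date(2024, 1, 2), "KE-1"),
    ("INC-2", date(2024, 1, 5), "KE-1"),
    ("INC-3", date(2024, 1, 10), None),
    ("INC-4", date(2024, 1, 12), "KE-2"),
    ("INC-5", date(2024, 1, 29), None),
]
today = date(2024, 1, 31)  # fixed so the example is repeatable


def incidents_per_known_error(rows):
    """Average number of linked incidents per distinct known error."""
    linked = [ke for _, _, ke in rows if ke is not None]
    return len(linked) / len(set(linked)) if linked else 0.0


def aged_known_error_coverage(rows, as_of, min_age=timedelta(days=7)):
    """Share of incidents older than min_age that have a known error linked."""
    aged = [ke for _, opened, ke in rows if as_of - opened > min_age]
    if not aged:
        return 0.0
    return sum(ke is not None for ke in aged) / len(aged)


per_known_error = incidents_per_known_error(incidents)
coverage = aged_known_error_coverage(incidents, today)
print(per_known_error, coverage)
```

Tracking both numbers over time is likely more useful than any single reading, since the trend shows whether the two processes are feeding each other.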
This is just a starting point. Once you get into measuring these items, you'll start to see how some work better than others, or find new ways to measure them more accurately.