The mean time to repair (MTTR) is the average time required to repair a failed computer system. MTTR is a fundamental measure of the maintainability of an organization’s computer and network infrastructure. Generally, an increase in MTTR means more time is required to diagnose and remedy network system issues.
An increase in MTTR can have several causes, so it is advisable to treat a rising MTTR as a troubleshooting problem in its own right: form hypotheses about the real cause and test them against the evidence. For example, some report that 85% of MTTR is spent diagnosing problems, while others have found that 36% of their daily effort goes to reacting to troubleshooting tickets. These statistics suggest that some underlying issues could be fixed or changed to make troubleshooting more efficient.
What is MTTR?
When calculating MTTR, not every issue is comparable. Group similar issues into categories and calculate MTTR for each category to obtain more accurate measurements. For example, calculate MTTR for small tickets such as network connectivity issues separately from MTTR for computer system setup issues.
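The per-category calculation can be sketched as follows. This is a minimal illustration, not a prescribed method: the ticket records, the category names, the timestamp format, and the function name `mttr_by_category` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical ticket records: (category, repair started, repair finished).
tickets = [
    ("network connectivity", "2024-03-01 09:00", "2024-03-01 09:30"),
    ("network connectivity", "2024-03-02 14:00", "2024-03-02 15:00"),
    ("system setup", "2024-03-03 10:00", "2024-03-03 14:00"),
    ("system setup", "2024-03-04 08:00", "2024-03-04 10:00"),
]


def mttr_by_category(records, fmt="%Y-%m-%d %H:%M"):
    """Return the mean repair time in hours for each ticket category."""
    durations = defaultdict(list)
    for category, started, finished in records:
        elapsed = datetime.strptime(finished, fmt) - datetime.strptime(started, fmt)
        durations[category].append(elapsed.total_seconds() / 3600)
    # Average within each category instead of across all tickets.
    return {cat: sum(hours) / len(hours) for cat, hours in durations.items()}


for category, hours in mttr_by_category(tickets).items():
    print(f"{category}: {hours:.2f} h")
```

With this sample data, a single average over all four tickets would be 1.875 hours, which describes neither the quick connectivity fixes nor the longer setup jobs. Averaging per category keeps the two kinds of work from masking each other.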
Increasing MTTR can also signify deficiencies in an IT department's ability to address issues. Is there enough staff to respond adequately to the troubleshooting load? Do team members have the capacity to solve the issues that continue to arise? Is the system sophisticated enough to assist troubleshooting efforts?
Common Troubleshooting Failure Metrics
Alongside MTTR, several other failure metrics are useful for evaluating troubleshooting efforts.
- Mean time between failures (MTBF): The mean operational time between successive device failures. It can be calculated by recording the elapsed time between component failures during normal operations. MTBF can be used to predict the reliability of systems and components.
- Mean time to failure (MTTF): The mean time a device is expected to function before it fails. It is typically applied to components that are replaced rather than repaired, such as hard drives. By contrast, MTBF is applied to components that are repaired and returned to service.
- Mean time to detect (MTTD): The mean time between the onset of a problem and its detection. This is the time that elapses before IT receives a troubleshooting ticket, at which point the MTTR clock starts.
- Mean time to investigate (MTTI): The mean time between the detection of a problem and the point at which an investigation actually begins, that is, the gap between the end of the MTTD period and the start of the MTTR period.
- Mean time to restore service (MTRS): The mean elapsed time from the detection of a problem until the system is available again. Unlike MTTR, MTRS keeps the clock running after the component has been repaired, until it is actually restored to use.
- Mean time between system incidents (MTBSI): The mean elapsed time between the detection of two consecutive incidents. It is calculated as MTBSI = MTBF + MTRS.
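A minimal sketch of how MTRS, MTBF, and MTBSI relate to one another, under stated assumptions: the incident log and the function name `failure_metrics` are hypothetical, times are plain hours since an arbitrary start, and the moment of failure is approximated by the moment of detection. Averages are taken over complete cycles only (one detection to the next), which is what lets the sum of downtime and uptime equal the interval between incidents.

```python
# Hypothetical incident log: (hour detected, hour service restored), in order.
incidents = [(0.0, 2.0), (50.0, 51.0), (120.0, 123.0), (210.0, 211.0)]


def failure_metrics(log):
    """Compute MTRS, MTBF and MTBSI (in hours) over complete incident cycles."""
    # A complete cycle runs from one detection to the next, so the last
    # incident (which has no following detection) only closes the final cycle.
    cycles = list(zip(log, log[1:]))
    if not cycles:
        raise ValueError("need at least two incidents")
    n = len(cycles)
    # Downtime: detection until service is restored.
    mtrs = sum(cur[1] - cur[0] for cur, _ in cycles) / n
    # Uptime: restoration until the next incident is detected.
    mtbf = sum(nxt[0] - cur[1] for cur, nxt in cycles) / n
    # Full cycle: detection to next detection.
    mtbsi = sum(nxt[0] - cur[0] for cur, nxt in cycles) / n
    return {"MTRS": mtrs, "MTBF": mtbf, "MTBSI": mtbsi}


metrics = failure_metrics(incidents)
print(metrics)
```

For this sample log the three complete cycles give MTRS = 2.0, MTBF = 68.0, and MTBSI = 70.0 hours, so the incident interval is the sum of the mean downtime and the mean uptime.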
Network troubleshooting is the systematic process of searching for, diagnosing, and correcting network issues. Most critical to troubleshooting efforts is the adherence to a rigorous and repeatable process that relies on using standard and measurable testing methods so that changes to the network can be systematically understood.