MTTR for Network Troubleshooting

    The mean time to repair, or MTTR, is the average time required to repair a failed computer system. MTTR is a fundamental measure of the maintainability of an organization’s computer and network infrastructure. Generally, an increase in MTTR means more time is required to diagnose and remedy network system issues.

    Several causes for such an increase are possible. It is advisable to treat a rising MTTR as a troubleshooting problem in its own right: form hypotheses about the real cause and test them against evidence. For example, some organizations report that 85% of MTTR is spent diagnosing problems, while others have found that 36% of their daily effort goes to reacting to troubleshooting tickets. These statistics suggest that some underlying issues could be fixed or changed to make troubleshooting more efficient.

    What is MTTR?

    When calculating MTTR, not every issue is comparable. Group similar issues and calculate MTTR for each group separately to obtain more accurate measurements. For example, measure the time to resolve small tickets such as network connectivity issues separately from larger ones such as computer system setup issues.
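
    As a minimal sketch of this grouping, assume each ticket records a category and a repair duration in minutes. The record layout, category names, and sample values below are illustrative assumptions, not data from any real ticketing system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tickets: (category, minutes from start of repair to fix).
tickets = [
    ("network connectivity", 30),
    ("network connectivity", 50),
    ("system setup", 240),
    ("system setup", 360),
]


def mttr_by_category(tickets):
    """Return the mean repair time in minutes for each ticket category."""
    durations = defaultdict(list)
    for category, minutes in tickets:
        durations[category].append(minutes)
    return {category: mean(values) for category, values in durations.items()}


# Blended MTTR across all tickets, for comparison with the per-category values.
overall_mttr = mean(minutes for _, minutes in tickets)
per_category = mttr_by_category(tickets)
```

    With this sample data the blended MTTR is 170 minutes, which describes neither kind of ticket well: connectivity tickets average 40 minutes and setup tickets average 300.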

    An increasing MTTR can also signal that an IT department is not equipped to address the issues it faces. Is there enough staff to respond adequately to the troubleshooting load? Do team members have the capacity to solve the issues that continue to arise? Is the system sophisticated enough to assist troubleshooting efforts?

    Common Troubleshooting Failure Metrics

    Alongside MTTR, other failure metrics are useful for understanding how time is spent in troubleshooting efforts.

    • Mean time between failures (MTBF): The mean operational time between successive device failures. It can be calculated by recording the elapsed time between component failures during normal operation. MTBF can be used to predict the reliability of systems and components.
    • Mean time to failure (MTTF): The mean time a device is expected to function before it fails. It is typically applied to components that are replaced rather than repaired, such as hard drives. In contrast, MTBF is applied to repairable components.
    • Mean time to detect (MTTD): The mean time between the onset of a problem and its detection. This is the time that elapses before IT receives a troubleshooting ticket, after which the MTTR clock can start.
    • Mean time to investigate (MTTI): The mean time between the detection of a problem and the actual start of an investigation. It covers the gap between the end of the MTTD interval and the start of the MTTR interval.
    • Mean time to restore service (MTRS): The mean time elapsed from the detection of a problem until the system is available again. Unlike MTTR, MTRS keeps the clock running after the component has been repaired, until service is actually restored.
    • Mean time between system incidents (MTBSI): The mean time elapsed between the detection of two consecutive incidents. It is calculated as MTBSI = MTBF + MTRS.
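
    The detection-to-restoration metrics in the list above can be sketched from incident timestamps. This is a minimal illustration, assuming each incident records five points in time; the `Incident` fields and the sample log are hypothetical. It also assumes the MTTR clock starts when the investigation begins, which follows the definitions above but varies between organizations.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    # All timestamps are hours since the start of the observation window.
    onset: float          # problem begins
    detected: float       # ticket is raised or an alert fires
    investigating: float  # someone starts working on the problem
    repaired: float       # faulty component is fixed
    restored: float       # service is available again


# Hypothetical incident log, ordered by time.
incidents = [
    Incident(10.0, 10.5, 11.0, 13.0, 13.5),
    Incident(50.0, 51.0, 51.5, 52.5, 53.0),
    Incident(100.0, 101.5, 102.0, 105.0, 105.5),
]

mttd = mean(i.detected - i.onset for i in incidents)
mtti = mean(i.investigating - i.detected for i in incidents)
# Assumption: repair time is measured from the start of the investigation.
mttr = mean(i.repaired - i.investigating for i in incidents)
mtrs = mean(i.restored - i.detected for i in incidents)
# MTBSI: gap between the detection times of consecutive incidents.
mtbsi = mean(b.detected - a.detected for a, b in zip(incidents, incidents[1:]))
```

    For this sample log the results are MTTD = 1.0, MTTI = 0.5, MTTR = 2.0, MTRS = 3.0, and MTBSI = 45.5 hours. MTRS exceeds MTTR because it also counts the wait before the investigation starts and the time between repair and restoration.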

    Related Terms

    Network Troubleshooting

    Network troubleshooting is the systematic process of searching for, diagnosing, and correcting network issues. Most critical to troubleshooting is adherence to a rigorous, repeatable process that relies on standard, measurable testing methods, so that the effect of each change to the network can be systematically understood.

