Network monitoring gives IT teams insight into network health and performance so they can take corrective steps to improve the functionality of their network.
Network troubleshooting is the systematic process of searching for, diagnosing, and correcting network issues. What matters most is adherence to a rigorous, repeatable process built on standard, measurable testing methods, so that the effect of each change to the network can be understood. By contrast, attacking a network issue by randomly changing settings and configurations is often counterproductive, can be damaging in large networks, and is ill-advised. Troubleshooting efforts also benefit from documenting remedies, so that if an adjustment produces unforeseen consequences the network can be rolled back to a previous state or analyzed for a deeper understanding of network dynamics and trouble areas.
In general, professionals and enthusiasts alike troubleshoot most efficiently by starting with the remedies that are simplest, cost the least, are most probable, and have the fewest potential negative consequences. For instance, when troubleshooting a computer that has connectivity issues, a network manager may work through a set of progressively more involved steps:
- Check the hardware connectivity. Are all physical connections in good working order?
- Use the command line and ipconfig to renew the computer’s IP address (on Windows, ipconfig /release followed by ipconfig /renew).
- Use the command line, ping, and traceroute (tracert on Windows) to test the connection route.
- Use the command line and perform a DNS check using nslookup to test whether the name of the requested server resolves correctly.
- Contact the ISP to determine if there are outages in the area.
- Scan the computer for viruses and malware that may be hijacking network resources.
- If the problem persists, a deeper investigation may be necessary: review network logs, database logs, etc. for abnormalities.
Notice that each troubleshooting step increases in scope and becomes more involved, while the later steps address less probable causes. It is more likely that a connectivity issue is local than that it is caused by an ISP-wide outage. Even though calling the ISP might be easier than working through the previous four troubleshooting steps, it is advisable to start with the simplest, most likely cause.
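The escalation ladder above can be sketched in Python as an ordered list of checks that stops at the first failure. This is a minimal sketch under stated assumptions, not a definitive diagnostic tool: the names `check_dns`, `check_tcp`, and `run_ladder` are hypothetical, and the two checks use only the standard `socket` module as stand-ins for the ipconfig, ping, traceroute, and nslookup commands named in the list.

```python
import socket
from typing import Callable, List, Optional, Tuple

# A check is a label plus a zero-argument function that returns True on success.
Check = Tuple[str, Callable[[], bool]]


def check_dns(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_ladder(checks: List[Check]) -> Optional[str]:
    """Run checks in order; return the first failing label, or None if all pass."""
    for name, check in checks:
        if not check():
            return name  # stop here: later, costlier checks are not run
    return None
```

A ladder might be built as `[("DNS resolution", lambda: check_dns("example.com")), ("TCP reachability", lambda: check_tcp("example.com", 443))]`, where the host and port are placeholders. Because `run_ladder` stops at the first failure, the more expensive checks at the end of the list run only when the cheaper ones have passed.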
Network Troubleshooting Steps
The principle behind troubleshooting is rooted in practicality: rule out likely causes while ruling in causes that are characteristic of the observed symptoms. The general approach is therefore the scientific method: collect information about the issue, develop a hypothesis about the cause, test it, analyze the results, and then determine the next steps. The following six-step troubleshooting methodology recommended by CompTIA can be applied to any troubleshooting issue regardless of scope. (It is recommended to back up the system before making changes.)
1. Identify the Problem
Begin by cataloging the symptoms of the problem, but be aware that the symptoms are not the problem itself. Investigate further by interviewing the user who witnessed the issue and asking them to recreate it if possible. Ask: What has changed? What issues are characterized by these symptoms?
2. Establish a Theory of Probable Cause
At this point, the symptoms have been identified and the probable causes of those symptoms listed. Prioritize these potential causes from the simplest and most likely to the most involved and least likely, as in the connectivity example in the opening section.
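One way to make this prioritization explicit is to record a rough likelihood and testing effort for each candidate cause and sort on both. The sketch below is illustrative only: the `Cause` class, the `prioritize` function, and every number in `candidates` are assumptions invented for the example, not measured values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cause:
    description: str
    likelihood: float  # rough estimate between 0 and 1 (assumed, not measured)
    effort: int        # relative cost to test: 1 = trivial, 5 = involved


def prioritize(causes: List[Cause]) -> List[Cause]:
    """Order causes so the most likely come first; ties go to the cheapest test."""
    return sorted(causes, key=lambda c: (-c.likelihood, c.effort))


# Hypothetical candidates for a single workstation that has lost connectivity.
candidates = [
    Cause("ISP outage", likelihood=0.05, effort=3),
    Cause("Loose or unplugged cable", likelihood=0.40, effort=1),
    Cause("Stale IP lease", likelihood=0.30, effort=1),
    Cause("Malware consuming bandwidth", likelihood=0.10, effort=4),
]

for cause in prioritize(candidates):
    print(f"{cause.likelihood:.2f}  effort={cause.effort}  {cause.description}")
```

Sorting on likelihood first and effort second is only one possible rule. A team that prefers to clear all trivial checks before anything else could swap the two keys.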
3. Test Probable Cause Theory to Determine Actual Cause
Now test each probable cause in turn, progressing from the simplest and most likely to the least likely. This may seem an unnecessary step given that the next step is making an action plan. That impression arises because many common problems are simple, such as an unplugged cable, so identifying the cause and planning the fix appear to be a single step. In more complex cases, such as WAN outages, several issues can be involved, and a rigorous testing phase is required.
4. Establish an Action Plan and Execute the Plan
A plan for fixing the network issue, supported by diagnostic testing, must then be formulated and carried out. The more complex the problem, the more in-depth the plan. Sometimes the problem extends beyond a network manager’s domain and needs to be escalated to a higher level.
5. Verify Full System Functionality
Verification is an essential step because it creates the feedback needed to eliminate possible causes. A baseline can be used to check whether the network is truly functioning normally. If the system is still not functional, return to the next probable cause on the list.
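A baseline comparison of this kind can be sketched as a function that flags any metric that has drifted too far from its recorded normal value. The function name `deviations_from_baseline`, the metric names, the sample values, and the 20% tolerance are all assumptions for illustration. The sketch also assumes that every metric is one where a lower value is better, such as latency or packet loss.

```python
from typing import Dict, List


def deviations_from_baseline(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.20,
) -> List[str]:
    """Return the metrics whose current value exceeds baseline by more than tolerance."""
    flagged = []
    for name, expected in baseline.items():
        observed = current.get(name)
        if observed is None:
            flagged.append(name)  # a missing measurement is treated as a failure
            continue
        if observed > expected * (1 + tolerance):
            flagged.append(name)
    return flagged


# Hypothetical measurements taken before the incident and after the fix.
baseline = {"latency_ms": 20.0, "packet_loss_pct": 0.5}
after_fix = {"latency_ms": 22.0, "packet_loss_pct": 2.0}

print(deviations_from_baseline(baseline, after_fix))  # ['packet_loss_pct']
```

An empty list suggests the network is back within its normal range. A non-empty list indicates that the fix did not fully restore normal behavior and that the next probable cause should be tested.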
6. Document the Process
Documentation can be done during the testing and troubleshooting process or afterward. In either case, documenting findings, actions taken, and outcomes creates a history that can be consulted if further issues arise. Analyzing solutions to issues over time may also reveal patterns in the network setup and configuration that preventative measures can eliminate.
The approach itself is simple. However, depending on the network area in trouble and the nature of the issue, specialized technical knowledge is a prerequisite for determining specifically what could have gone wrong and how.
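A lightweight way to keep such a history is to append one structured record per troubleshooting action to a log. The sketch below writes each record as a line of JSON. The `TroubleshootingRecord` fields and the function names are assumptions chosen for the example, and an in-memory `io.StringIO` buffer stands in for a real log file opened in append mode.

```python
import io
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, TextIO


@dataclass
class TroubleshootingRecord:
    symptom: str
    suspected_cause: str
    action_taken: str
    outcome: str
    # Recorded automatically in UTC so that entries can be ordered later.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_record(log: TextIO, record: TroubleshootingRecord) -> None:
    """Write one record as a single JSON line."""
    log.write(json.dumps(asdict(record)) + "\n")


def load_records(log: TextIO) -> List[dict]:
    """Read every JSON line back into a list of dictionaries."""
    return [json.loads(line) for line in log if line.strip()]


log = io.StringIO()  # stand-in for a file opened in append mode
append_record(
    log,
    TroubleshootingRecord(
        symptom="Workstation cannot reach the internet",
        suspected_cause="Stale IP lease",
        action_taken="Renewed the IP address",
        outcome="Connectivity restored",
    ),
)
log.seek(0)
history = load_records(log)
print(history[0]["action_taken"])
```

Because every entry shares the same fields, the history can later be filtered or counted by suspected cause to look for recurring trouble areas.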
Common Network Troubleshooting Problems
Networks small and large exhibit general types of problems, some basic and some very complex. The basic categories of network issues are listed below.
- Cable Issues: Cable issues can manifest in several ways, from a simple disconnected cable to frayed or damaged lines. Checking connections is usually the first step in troubleshooting before moving on to more involved causes.
- Connectivity Issues: Connectivity issues can arise if network devices are misconfigured, damaged, or faulty. Testing the network interface and port on the local device is a good first step before moving on to more complex solutions.
- IP and Configuration Issues: Incorrect network configuration settings, which can extend beyond the local computer to areas such as routing, can prevent connections or reduce network performance. This problem must be diagnosed thoroughly, because changing network configuration settings can have tremendous repercussions throughout the network.
- Software Issues: Network performance may suffer after software updates, and incompatible versions can be a source of network interruptions.
- Traffic Overload Issues: Traffic overload is a real-time network issue that occurs when bandwidth is overused and network devices cannot keep traffic flowing. Unlike discovering a disconnected wire and easily reconnecting it, managing traffic over a congested network requires a deeper technical understanding to troubleshoot properly.
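For the IP and configuration category in the list above, one quick sanity check is whether a host's default gateway actually lies inside the subnet implied by the host's address and mask. The sketch below uses Python's standard `ipaddress` module. The function name `gateway_on_local_subnet` and the addresses are illustrative assumptions, and this is only one of many possible configuration checks.

```python
import ipaddress


def gateway_on_local_subnet(host_cidr: str, gateway: str) -> bool:
    """Return True if the gateway address lies inside the host's configured subnet."""
    interface = ipaddress.ip_interface(host_cidr)
    return ipaddress.ip_address(gateway) in interface.network


# Correct configuration: host and gateway share 192.168.1.0/24.
print(gateway_on_local_subnet("192.168.1.25/24", "192.168.1.1"))    # True

# Gateway typed into the wrong subnet.
print(gateway_on_local_subnet("192.168.1.25/24", "192.168.2.1"))    # False

# Mask too narrow: a /25 covers only .0 to .127, so .200 is unreachable on-link.
print(gateway_on_local_subnet("192.168.1.25/25", "192.168.1.200"))  # False
```

A check like this only reads configuration values, so it can be run without the risk of side effects that comes with changing settings on a live network.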
NetFlow is functionality built into many network devices that gathers flow measurements and exports them to another system for analysis. Analysis of this flow data tells network managers how the network is performing and reveals other usage details. For instance, flow analysis can support troubleshooting efforts by tracking IPs and highlighting anomalies such as excessive traffic use.
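The kind of anomaly detection described here can be sketched by totaling bytes per source IP and flagging any source that accounts for a disproportionate share of traffic. The example is deliberately simplified: the three-field `FlowRecord` shape, the function names, the sample data, and the 50% threshold are assumptions, and real NetFlow exports carry many more fields than shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Simplified, hypothetical record: (source IP, destination IP, bytes transferred).
FlowRecord = Tuple[str, str, int]


def bytes_by_source(flows: Iterable[FlowRecord]) -> Counter:
    """Total the bytes sent by each source IP."""
    totals: Counter = Counter()
    for src, _dst, nbytes in flows:
        totals[src] += nbytes
    return totals


def heavy_talkers(flows: Iterable[FlowRecord], share_threshold: float = 0.5) -> List[str]:
    """Return source IPs responsible for more than share_threshold of all bytes."""
    totals = bytes_by_source(flows)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [ip for ip, n in totals.most_common() if n / grand_total > share_threshold]


# Made-up sample data for illustration.
flows = [
    ("10.0.0.5", "10.0.0.1", 1_200),
    ("10.0.0.7", "10.0.0.1", 90_000),
    ("10.0.0.5", "10.0.0.9", 800),
    ("10.0.0.8", "10.0.0.1", 3_000),
]

print(heavy_talkers(flows))  # ['10.0.0.7']
```

In this sample, 10.0.0.7 sends 90,000 of the 95,000 total bytes, so it is the only source above the threshold. A fixed share is a crude rule, and a production analyzer would more likely compare each host against its own historical baseline.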
Network forensics is the process of capturing, recording, and analyzing network packets to determine the source of network security attacks. It involves identifying an issue, collecting and analyzing data, deciding on the best troubleshooting response, and implementing it.