How To Identify Network Issues Hop-By-Hop
First, let’s define the problem. As networks get bigger, they get more complicated. That is a pretty straightforward statement, but why is it true? Well, let’s use the road analogy. Let’s say you have a road through town, and like many small towns, right in the middle of it is an intersection. No problem, add a stoplight. But then the town grows, more roads are built, and another intersection is put in, with another stoplight. And then another, and another, and so on. Road systems are very rarely planned for growth, and even when there is a plan, it usually has to change.
Every time a new intersection is put in with a stoplight, the city’s roads get more complicated and harder to navigate and manage. And when accidents happen and the streets slow down, it is really hard to know why, let alone fix it. This is exactly what happens with networks, except instead of a stoplight at each intersection, there is a router or a switch. Every time a new segment of the network is added, a new switch or router is added, and the network gets more complicated. And every router or switch a packet passes through adds some processing and queuing delay, so the more devices there are on the path, the longer the packet takes to arrive.
A Brief History of Network Devices
But it gets worse. Back in the day (well, actually just a few years ago), the functions of network devices like switches, routers, and firewalls were pretty well-defined. This is because they were built using ASICs and FPGAs, which themselves behave in pretty well-defined ways. But these devices have come a long way, and instead of being well-defined, they are nowadays more and more software-defined. As a result, the algorithms in the firmware that decide how, when, why, and where to route packets hop-by-hop can be completely changed without having to change the hardware. This makes it harder, and less accurate, to draw a clean line between latency or other problems caused by “the network” and those caused by “the application,” because in this day and age, with software-defined networking (aka SDN), the whole network is an application.
But that’s the bad news. The good news is that unlike the stoplights at every city intersection, every network intersection with a router or a switch is an opportunity to identify network issues hop-by-hop. But what is a hop? First of all, it should not be confused with an IHOP, even though there is an IHOP at most major intersections, and the latency through an IHOP can be very high. In the world of networks, a hop is the act of a packet passing through a router or switch, and the number of hops is the number of routers and switches that the packet passes through on the way to its destination. The path can be extremely dynamic for many reasons (remember SDN), including the fact that routers and switches can send packets on different routes depending on QoS settings, which can either increase or decrease the performance of the network for particular protocols or groups of people.
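To make the idea of a hop concrete, here is a minimal sketch of how hops can be counted. It assumes every device on the path routes at layer 3 and decrements the packet's IP time-to-live (TTL) by one, discarding the packet when TTL reaches zero. The device names and both function names are hypothetical, and the "network" is just a Python list.

```python
def probe(path, ttl):
    """Return the device where a packet with this starting TTL is discarded.

    Each device on the path decrements TTL by one. If TTL hits zero the
    packet stops there; otherwise it reaches the destination.
    """
    for device in path:
        ttl -= 1
        if ttl == 0:
            return device
    return "destination"


def count_hops(path):
    """Count hops by raising the starting TTL until a probe gets through."""
    ttl = 1
    while probe(path, ttl) != "destination":
        ttl += 1
    return ttl - 1


# Hypothetical three-device path between a client and a server.
path = ["edge-router", "core-router", "isp-router"]
hops = count_hops(path)
```

Sending probes with a starting TTL of 1, 2, 3, and so on reveals the devices one at a time, which is the basic trick path-tracing tools rely on.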
How to Identify Network Issues
But there are ways to identify network issues hop-by-hop. The most basic is from a command line, using traceroute (tracert on Windows) on an IP address or domain name. Traceroute will show you the path through the routers and switches, listing each hop that the packets take along the way and the round-trip time measured to each one. This is an incredibly useful network troubleshooting tool, allowing the NetOps team to understand the hop-by-hop behavior of the network. But it has severe limitations. First, it is not the real traffic. It is something you are doing alongside, so the traceroute packets might not go where the real application packets would have gone, or be prioritized in the same way, and thus the results might be different. Also, for security reasons, and good ones, traceroute responses are not always enabled on switches and routers, or allowed through firewalls.
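As a rough illustration of reading that output hop-by-hop, the sketch below parses text in the common Linux traceroute layout and reports where the average round-trip time rises the most. The sample output is made up (the addresses come from documentation ranges), and `parse_traceroute` and `largest_jump` are hypothetical helper names, not part of any tool.

```python
import re

# Hypothetical sample in the Linux traceroute layout; timings are invented.
SAMPLE = """\
traceroute to example.com (203.0.113.10), 30 hops max, 60 byte packets
 1  192.0.2.1  1.102 ms  0.981 ms  1.034 ms
 2  198.51.100.1  9.874 ms  10.120 ms  9.951 ms
 3  * * *
 4  203.0.113.10  48.310 ms  47.905 ms  48.122 ms
"""

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
RTT = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def parse_traceroute(text):
    """Return a list of (hop_number, average_rtt_ms or None) tuples."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or blank line
        rtts = [float(value) for value in RTT.findall(match.group(2))]
        average = sum(rtts) / len(rtts) if rtts else None
        hops.append((int(match.group(1)), average))
    return hops


def largest_jump(hops):
    """Return (hop_number, increase_ms) for the biggest rise in average RTT."""
    best = None
    previous = 0.0
    for number, average in hops:
        if average is None:
            continue  # hop did not answer, so there is nothing to compare
        increase = average - previous
        if best is None or increase > best[1]:
            best = (number, increase)
        previous = average
    return best


hops = parse_traceroute(SAMPLE)
worst = largest_jump(hops)
```

A big jump is only a hint. Since hop 3 did not answer in this sample, the rise reported at hop 4 could come from either hop, and some devices answer probes slowly without slowing real traffic at all.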
Another (ahem, better) way to identify network issues hop-by-hop is through the use of NetFlow and IPFIX, which are available on many routers and switches. NetFlow is a Cisco-defined protocol that provides details about the flows of packets through a router or switch. IPFIX is the IETF standard based on NetFlow version 9. More details about NetFlow and IPFIX are beyond the scope of this writing, but they are riveting topics, and should be better understood in order to get the most out of them. But I digress, as the main point here is that there are solutions available that can collect NetFlow and IPFIX from large numbers of routers and switches across huge networks. These solutions visually display the hop-by-hop routes that data packets are taking.
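To give a feel for how a collector might turn flow records from many devices into a hop-by-hop route, here is a simplified sketch. It assumes each exporter reports the same flow under the same 5-tuple and that exporter clocks are synchronized, so ordering by first-seen time approximates the path. The `FlowRecord` fields are hypothetical simplifications, not the actual NetFlow or IPFIX field names, and real products typically also use interface and topology data.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    exporter: str      # device that exported the record (hypothetical name)
    src: str
    dst: str
    src_port: int
    dst_port: int
    protocol: int      # IP protocol number, e.g. 6 for TCP
    first_seen: float  # seconds; clocks assumed to be in sync
    octets: int


def paths_by_flow(records):
    """Group records by 5-tuple and order the exporters by first-seen time."""
    seen = defaultdict(list)
    for r in records:
        key = (r.src, r.dst, r.src_port, r.dst_port, r.protocol)
        seen[key].append((r.first_seen, r.exporter))
    return {key: [name for _, name in sorted(obs)] for key, obs in seen.items()}


# Hypothetical records for one TCP flow seen by three devices.
records = [
    FlowRecord("core-1", "10.0.0.5", "10.9.0.7", 51000, 443, 6, 100.002, 8200),
    FlowRecord("edge-1", "10.0.0.5", "10.9.0.7", 51000, 443, 6, 100.000, 8200),
    FlowRecord("edge-9", "10.0.0.5", "10.9.0.7", 51000, 443, 6, 100.005, 8200),
]
paths = paths_by_flow(records)
```

Comparing the reconstructed path for the same application at different times is one simple way to spot that traffic has started taking a different route.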
Network Monitoring and Proactive Troubleshooting Solutions
More advanced solutions can show the hop-by-hop paths for different applications over time. And even more advanced solutions can proactively identify hop-by-hop issues and generate alerts. This is where the game changes from troubleshooting a reported problem to monitoring the network before a problem is reported.
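One simple way to think about proactive alerting is a per-hop baseline: keep a history of latency for each hop and flag any hop whose latest measurement sits well above its normal range. The sketch below uses a mean plus `k` standard deviations threshold. That rule, the function name `hops_to_alert`, and the sample numbers are all assumptions chosen for illustration, not how any particular product works.

```python
from statistics import mean, stdev


def hops_to_alert(history, latest, k=3.0):
    """Return hops whose latest latency exceeds mean + k * stdev of history.

    history: {hop_name: [latency_ms, ...]} past measurements per hop
    latest:  {hop_name: latency_ms} most recent measurement per hop
    """
    alerts = []
    for hop, samples in history.items():
        if hop not in latest or len(samples) < 2:
            continue  # not enough data to form a baseline
        threshold = mean(samples) + k * stdev(samples)
        if latest[hop] > threshold:
            alerts.append(hop)
    return alerts


# Hypothetical measurements in milliseconds.
history = {
    "core-1": [10.0, 11.0, 9.0, 10.0],
    "edge-2": [5.0, 5.5, 4.5, 5.0],
}
latest = {"core-1": 10.5, "edge-2": 20.0}
alerts = hops_to_alert(history, latest)
```

A fixed statistical threshold like this is easy to reason about but can be noisy on links with daily traffic cycles, so the baseline window and the value of `k` would need tuning for a real network.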
Lastly, for those segments of the network that do not have NetFlow and/or IPFIX capabilities, there are solutions that tap into the network, or receive packets through a SPAN port, and generate the IPFIX records themselves before sending them to the IPFIX collector. These advanced packet capture and IPFIX analysis solutions can also extend IPFIX with quality metrics that most routers and switches do not provide. And more than that, because these solutions capture the packets, they can be used to drill down to the packets and perform hop-by-hop analysis in much more detail.
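The core of turning captured packets into flow records is aggregation by 5-tuple. The sketch below shows that step only, assuming each packet has already been decoded into a simple tuple. The tuple layout and the `build_flows` name are hypothetical, and a real probe would also expire flows on timeouts and encode the result as IPFIX messages, both of which are left out here.

```python
def build_flows(packets):
    """Aggregate decoded packets into per-flow counters keyed by 5-tuple.

    Each packet is (timestamp, src, dst, src_port, dst_port, protocol, length).
    """
    flows = {}
    for ts, src, dst, sport, dport, proto, length in packets:
        key = (src, dst, sport, dport, proto)
        flow = flows.get(key)
        if flow is None:
            # First packet of this flow: start its counters.
            flows[key] = {"packets": 1, "octets": length, "first": ts, "last": ts}
        else:
            flow["packets"] += 1
            flow["octets"] += length
            flow["first"] = min(flow["first"], ts)
            flow["last"] = max(flow["last"], ts)
    return flows


# Hypothetical decoded packets: two for one TCP flow, one DNS query.
packets = [
    (0.000, "10.0.0.5", "10.9.0.7", 51000, 443, 6, 1500),
    (0.010, "10.0.0.5", "10.9.0.7", 51000, 443, 6, 900),
    (0.020, "10.0.0.8", "10.9.0.53", 40000, 53, 17, 80),
]
flows = build_flows(packets)
```

Because the probe sees every packet, it can attach extra per-flow measurements to these counters, which is where the additional quality metrics mentioned above come from.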
But wait, there’s more. NetFlow and IPFIX from routers and switches can also be used in combination with packet capture and analysis appliances to drill down from the flows generated by a router or switch to the packets captured on a different appliance. This way, one packet capture appliance can be used for many different routers and switches.
These increasingly advanced ways to perform hop-by-hop analysis can be very helpful in monitoring and troubleshooting hop-by-hop issues, as well as many other network behaviors. Of course, the price goes up as the chosen method becomes more advanced and automated. Even if you choose an open-source solution, it has to be deployed, managed, maintained, updated, etc. And each of the methods has its place and time of need, which is why it is important to understand how to use all of them when necessary.
Take the First Step
Clearly, there is a range of options for identifying network issues hop-by-hop, and choosing the right solution depends on many factors like architecture, scale, budget, expertise, timing, and others. It is a journey, and the first step in finding the right solution for your organization is to talk with an experienced team of engineers who have been through this many times before. At LiveAction, our team of engineers are experts at helping you navigate these options. We look forward to talking with you.
By: Chris Bloom, Lead Technical Engineer