Analyzing Linux Networking Issues

Note

This section is recommended only for users experiencing intermittent packet drops or packet reordering. Before proceeding, double-check the udp_dest settings, as the information provided here is not useful if no data is being received at all.

Users who are receiving no data and are unable to resolve the issue should contact our Field Application Team.

This section captures tools and procedures to troubleshoot networking issues for a system consisting of a PC/Workstation, an L2 switch, and one or more Ouster sensors. Though the examples use the Linux operating system as a model, the material is equally relevant to debugging issues in the Windows environment. Where possible, Windows command-line and UI analogs will be discussed in passing.

Debugging the Workstation Data Path

The workstation maintains a set of statistics associated with each layer in the network stack that can be used to diagnose packet loss. The correct way to approach a network stack problem is to start with the lowest layer in the stack first, examine the statistics for errors, and work your way up to the highest layer. The reason that we start with the lowest layer is that issues in the lowest layer can cause issues in other parts of the data-path.

IP Statistics

After the link layer the next layer up is IP. IP errors can be identified with the netstat tool:

netstat -s

This tool will output a lot of information, but in this document we will focus on only the IP section.
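Because the full report is long, it can help to isolate the IP section. The following is a minimal sketch, assuming the usual net-tools layout in which each section starts with an unindented header such as "Ip:" followed by indented counter lines; ip_section is a hypothetical helper name, not part of netstat.

```shell
# Print only the "Ip:" section of `netstat -s` output read from stdin.
# Assumes section headers are unindented and counters are indented.
ip_section() {
    awk '
        /^Ip:/          { show = 1; print; next }  # section header found
        /^[^[:space:]]/ { show = 0 }               # next unindented header ends it
        show                                       # print lines while inside the section
    '
}

# Usage:
#   netstat -s | ip_section
```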

The report contains several different error categories. They are scattered through the text, so it has to be read carefully to find them.

Let’s look at each class of error and consider its implications:

  • Packets received with invalid address means that they were sent to our MAC address, but with a destination IP address that is not valid for this host.

  • Packets dropped because of missing route indicates that the packet was sent to the correct IP address but no client program was listening on the destination port.

  • Fragments dropped after timeout means that we received some data but subsequent data didn’t arrive in time to be processed.

  • Fragments reassemblies failed means that some data was missing due to an Ethernet frame being aborted by the stack or being lost in transit and the IP layer was not able to reassemble a complete datagram.
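To avoid scanning the whole report by hand, the four error classes above can be filtered out of the netstat output. This is a rough sketch, assuming the counter wording described above; the exact text varies between net-tools versions, so the patterns are deliberately loose. ip_errors is a hypothetical helper name.

```shell
# Keep only the lines for the four IP error classes discussed above.
# Reads `netstat -s` output on stdin; matching is case-insensitive.
ip_errors() {
    grep -E -i 'invalid address|missing route|dropped after timeout|reassembl[a-z]* failed'
}

# Usage:
#   netstat -s | ip_errors
# Running it twice, a few seconds apart, shows which counters are still increasing.
```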

Debugging a Layer 3 Issue

The best way to debug issues in the IP layer is to find them in the link layer, because generally speaking layer-3 issues are caused by layer-2 bugs, but this is not always the case.

For instance, packets received with invalid address are probably indicative of stale ARP table entries or some other external network bug or transient state that will most likely clear up on its own. This sort of problem is probably not worth debugging unless it’s persistent. Packets dropped because of missing route is more indicative of an issue at the application layer (the client or server simply wasn’t listening when the packets arrived).

If a problem is detectable at L3 but not at L2, then it’s most likely a problem in the NIC itself, and the NIC isn’t providing a FIFO or DMA statistic that explains it. One possibility is packet reordering by the NIC. This can be detected by modifying:

/proc/sys/net/ipv4/ipfrag_max_dist

This kernel attribute determines the system’s tolerance to receiving out-of-order IPv4 fragments. Nominally, L2 networks do not reorder packets, so you should be able to configure a value of 1 and not observe a change in behavior. However, if setting a low threshold exacerbates the issue, or setting a high value makes the problem less severe, then the NIC is most likely to blame.
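The experiment described above can be sketched as a pair of small helpers. This is an illustration only: get_ipfrag_max_dist and set_ipfrag_max_dist are hypothetical names, writing the attribute requires root, and a value written this way lasts only until the next reboot.

```shell
# Path of the kernel attribute discussed above.
IPFRAG_MAX_DIST=${IPFRAG_MAX_DIST:-/proc/sys/net/ipv4/ipfrag_max_dist}

# Print the current value.
get_ipfrag_max_dist() { cat "$IPFRAG_MAX_DIST"; }

# Write a new value. Must be run as root.
set_ipfrag_max_dist() { printf '%s\n' "$1" > "$IPFRAG_MAX_DIST"; }

# Usage (as root):
#   old=$(get_ipfrag_max_dist)    # remember the current value
#   set_ipfrag_max_dist 1         # strict setting: little tolerance for reordering
#   ... reproduce the issue and compare the IP reassembly counters ...
#   set_ipfrag_max_dist "$old"    # restore the original value
```

If the reassembly counters grow faster with the strict setting than with the original value, that is consistent with reordering somewhere below the IP layer.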

Useful network debugging tools

iPerf

iPerf is a useful tool when debugging the performance of a network. It can be used to quickly validate whether or not a system can handle a given throughput. It can be configured to output a stream of data in a variety of formats to mimic the expected load on the system during use. For more information refer to iPerf documentation.

How to use iPerf to debug sensor network issues

iPerf can be used to rule out sensor failures, and quickly reproduce errors that occur when the network is under a high-traffic load. iPerf must be used from two machines:

  • Server (receiving data)

  • Client (sending data)

Both the server and client will measure the number of packets sent/received, and report a percentage of packets lost.

Example usage of iPerf to test that a sender can send 300 Mbps of UDP packets of 20 KB each to a receiver:
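A sketch of what the two invocations might look like, assembled from the arguments described in this section. The iperf3 binary name is an assumption (the long option names used here match iPerf version 3), and run_receiver and run_sender are hypothetical wrapper names used only to keep the two commands apart.

```shell
# On the receiving machine: listen for test traffic on port 5300.
run_receiver() {
    iperf3 --server --port 5300
}

# On the sending machine: send 300 Mbps of UDP traffic in 20 KB datagrams
# to the receiver at 192.168.88.248, port 5300.
run_sender() {
    iperf3 --client 192.168.88.248 --port 5300 --udp --bitrate 300M --length 20K
}

# Usage:
#   run_receiver     # start this first, on the receiver
#   run_sender       # then start this on the sender
```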

Receiver arguments

  • --server : Required to indicate that this is the machine that will be RECEIVING data.

  • --port 5300 : Specify the port at which to listen for incoming data. Useful if testing with multiple sources simultaneously.

Sender arguments

  • --client 192.168.88.248 : The IP address to send data to. Must be the IP address or hostname of the receiver.

  • --port 5300 : The port to send data to. This must match the --port argument provided to the receiver.

  • --udp : Indicates that UDP traffic will be sent. If not supplied, TCP data will be sent.

  • --bitrate 300M : The rate (in bits per second) at which to send data to the receiver. This can be used to simulate different amounts of network load. It supports a suffix such as K , M , or G to indicate Kbps, Mbps, or Gbps instead of bps.

  • --length 20K