Finding session-related problems using EndaceVision

Network monitoring tends to focus heavily on bandwidth, addressing the question, “Do I have the capacity to carry the traffic that my business requires?” Capacity, however, must include session count and lifecycle, which are often overlooked until they become a problem. That’s why EndaceVision™ 6.0 Network Visibility Software has added two new tools to deal with sessions: TCP Flags view and client/server breakdown.

High session counts can break services due to session limits in middle boxes such as firewalls and load balancers. Hitting the limit on your firewall could suddenly “break the network” by blocking new sessions, and load balancers that hit their session limits could prevent access to services.

It’s easy to measure load balancer behavior from the session perspective using the new TCP features in the EndaceVision user interface (UI) for the EndaceProbe 6.0 Intelligent Network Recorders (INRs). Start by capturing traffic on the server side of the load balancer, and then view the traffic breakdown by server IP. The resulting graph shows exactly the type and volume of traffic to each server, eliminating the guesswork as to whether the load balancer is working or not.
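To make the idea concrete outside the UI, here is a minimal sketch of the same per-server breakdown, counting new sessions by server IP. The record shape (destination IP plus raw TCP flags byte) and the function names are hypothetical and are not an EndaceVision export format.

```python
from collections import Counter

SYN, ACK = 0x02, 0x10  # standard TCP flag bits


def sessions_per_server(packets):
    """Count session requests (SYN set, ACK clear) per server IP.

    `packets` is an iterable of (dst_ip, tcp_flags) tuples captured on the
    server side of the load balancer. This record shape is an assumption.
    """
    counts = Counter()
    for dst_ip, flags in packets:
        if flags & SYN and not flags & ACK:
            counts[dst_ip] += 1
    return counts


def share_per_server(counts):
    """Convert per-server counts into a fraction of all new sessions."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ip: n / total for ip, n in counts.items()}


sample = [
    ("10.0.0.1", 0x02),  # SYN to server 1
    ("10.0.0.1", 0x02),  # SYN to server 1
    ("10.0.0.2", 0x02),  # SYN to server 2
    ("10.0.0.2", 0x12),  # SYN/ACK, not a new request
    ("10.0.0.1", 0x10),  # plain ACK, ignored
]
counts = sessions_per_server(sample)
shares = share_per_server(counts)
```

A heavily skewed share for one server suggests the load balancer is not distributing sessions the way you expect.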

Middle boxes aren’t the only devices with maximum session limits. Servers often have hard limits on the number of supported sessions, whether due to configured capacity or licensing. When servers hit their limit, many of them will respond to a session request (TCP SYN) with a TCP RST instead of a SYN/ACK. This is a definitive response to the client that the session will not be established.

Spotting an overloaded server that’s not accepting new connections is straightforward with EndaceVision. The TCP Flags view graphs the count of SYN, RST, and other flags. For a healthy service, the SYN/ACK line should closely follow the SYN line. If instead the SYN/ACK line for a server is low but the RST line is high, then there is a strong implication that the service is not accepting new connections.
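The same rule of thumb can be written down as a small check. This is only a sketch: the thresholds `min_syn` and `accept_ratio` are made-up illustration values, not anything EndaceVision defines, and the flag counts are assumed to be for a single server over one time window.

```python
SYN, RST, ACK = 0x02, 0x04, 0x10  # standard TCP flag bits


def classify(flags):
    """Map a raw TCP flags byte onto the lines discussed above."""
    if flags & RST:
        return "RST"
    if flags & SYN and flags & ACK:
        return "SYN/ACK"
    if flags & SYN:
        return "SYN"
    return None  # other flag combinations are not of interest here


def refusing_new_connections(flag_counts, min_syn=10, accept_ratio=0.5):
    """Return True when SYN/ACKs are low relative to SYNs and RSTs are high.

    `min_syn` and `accept_ratio` are hypothetical thresholds.
    """
    syn = flag_counts.get("SYN", 0)
    if syn < min_syn:
        return False  # too little traffic to draw a conclusion
    syn_ack = flag_counts.get("SYN/ACK", 0)
    rst = flag_counts.get("RST", 0)
    return syn_ack / syn < accept_ratio and rst > syn_ack


healthy = {"SYN": 100, "SYN/ACK": 95, "RST": 3}
overloaded = {"SYN": 100, "SYN/ACK": 5, "RST": 90}
```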

The TCP Flags view is useful for finding many different service issues. It shows the total session count, in addition to graphing and counting lifecycle info, including creation (SYN, SYN/ACK) and termination (FIN, RST) signals. A service that has abruptly shut down may drop all of its open connections, sending a large number of RST packets to the formerly connected clients. This shows up visually as a RST spike in the EndaceVision TCP Flags view.
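If you have per-interval RST counts in hand, a spike like this can also be picked out programmatically. The sketch below uses a simple median-based threshold; the `factor` and `floor` values are arbitrary assumptions chosen for illustration, and this is not how EndaceVision itself detects anything.

```python
from statistics import median


def find_spikes(counts, factor=5.0, floor=10):
    """Return indexes of intervals whose count far exceeds the median.

    `counts` is a list of RST packets per interval. `factor` and `floor`
    are hypothetical tuning values.
    """
    if not counts:
        return []
    threshold = max(median(counts) * factor, floor)
    return [i for i, c in enumerate(counts) if c > threshold]


rst_per_interval = [2, 3, 1, 2, 250, 4, 2]
spikes = find_spikes(rst_per_interval)
```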

A server that has gone down and just come back up may experience an additional problem. On a frequently accessed server, there may be a “thundering herd” as clients reconnect, which shows up as a spike in SYNs. For an encrypted protocol, the rate at which a server can establish new sessions is much lower than the number of sessions it can hold open, because each SSL/TLS handshake starts with compute-intensive public/private key calculations. Fortunately, finding session establishment rates is easy with the EndaceProbe INR: just look at the SYN and SYN/ACK lines on the TCP Flags view.
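As a rough illustration of what those two lines represent, the sketch below buckets SYN and SYN/ACK packets into one-second intervals. The input shape (timestamp in seconds plus raw TCP flags byte) is an assumption made for the example.

```python
from collections import defaultdict

SYN, ACK = 0x02, 0x10  # standard TCP flag bits


def establishment_rates(events):
    """Count SYN and SYN/ACK packets per one-second bucket.

    `events` is an iterable of (timestamp_seconds, tcp_flags) tuples;
    this record shape is hypothetical.
    """
    buckets = defaultdict(lambda: {"SYN": 0, "SYN/ACK": 0})
    for ts, flags in events:
        if not flags & SYN:
            continue
        kind = "SYN/ACK" if flags & ACK else "SYN"
        buckets[int(ts)][kind] += 1
    return {second: buckets[second] for second in sorted(buckets)}


events = [
    (0.1, 0x02), (0.5, 0x12),               # second 0: one request, one accept
    (1.2, 0x02), (1.3, 0x02), (1.9, 0x12),  # second 1: two requests, one accept
    (1.95, 0x10),                           # plain ACK, ignored
]
rates = establishment_rates(events)
```

A widening gap between the SYN and SYN/ACK counts in the same bucket is the numeric counterpart of the two lines pulling apart on the graph.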

A famous example of a session establishment rate problem is a SYN flood denial of service (DoS) attack. A SYN flood can be found in EndaceVision by filtering for packets that have SYN set and filtering out those that have ACK set, then investigating with other EndaceVision tools to see the target server IP and server port, and the source(s) of the attack.
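The filter logic itself is simple enough to sketch. The example below keeps only packets with SYN set and ACK clear, then groups them by target IP and port along with the distinct sources. The tuple layout and function names are hypothetical; this does not reproduce EndaceVision's filter syntax.

```python
from collections import defaultdict

SYN, ACK = 0x02, 0x10  # standard TCP flag bits


def is_bare_syn(flags):
    """True for a session request: SYN set, ACK clear."""
    return (flags & (SYN | ACK)) == SYN


def syn_targets(packets):
    """Group bare SYNs by (server IP, server port).

    `packets` is an iterable of (src_ip, dst_ip, dst_port, tcp_flags)
    tuples; this record shape is an assumption.
    """
    targets = defaultdict(lambda: {"syns": 0, "sources": set()})
    for src_ip, dst_ip, dst_port, flags in packets:
        if is_bare_syn(flags):
            entry = targets[(dst_ip, dst_port)]
            entry["syns"] += 1
            entry["sources"].add(src_ip)
    return dict(targets)


packets = [
    ("198.51.100.7", "10.0.0.5", 443, 0x02),
    ("198.51.100.8", "10.0.0.5", 443, 0x02),
    ("198.51.100.7", "10.0.0.5", 443, 0x02),
    ("10.0.0.5", "198.51.100.7", 51000, 0x12),  # SYN/ACK reply, filtered out
]
targets = syn_targets(packets)
```

A target with a very high SYN count and few matching SYN/ACKs or completed handshakes is the one worth investigating further.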

Looking forward, a concern we have at Endace is session handling in software-defined networks (SDN). Switches have traditionally been optimized for destination address lookup using large-scale ternary content addressable memory (TCAM). That’s changing, however, with technologies such as OpenFlow, which lets administrators apply policy to traffic based on a large variety of source and destination options. Large numbers of sessions could fill the SDN flow tables in switches, which will likely mean that many established sessions will be prematurely removed from the table. Any new packets for those orphaned sessions will be processed according to the table-miss rule for the flow table, which means contacting the SDN controller, adding latency and creating more network traffic. Even worse, the default behavior in the latest OpenFlow specification (1.3) is to drop the packets. When traditional switches drop packets, it’s not a big deal because the sending node can just retransmit the lost packet. In an SDN environment, a switch dropping a session means the session is closed without notice, so that nodes have to wait for the session to time out and then be re-established. Unfortunately, this leads to a “thundering herd” issue, prolonging the service disruption.
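To see why a full flow table hurts, consider a toy model of a fixed-size table. This is only a sketch: the least-recently-used eviction policy and the `FlowTable` class are assumptions made for illustration, and real switches may evict entries differently.

```python
from collections import OrderedDict


class FlowTable:
    """Toy fixed-capacity flow table with least-recently-used eviction.

    The eviction policy is an assumption; it is not taken from the
    OpenFlow specification.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0     # packets that hit the table-miss path
        self.evictions = 0  # established flows pushed out of the table

    def packet(self, flow_id):
        """Process one packet; return True on a table hit."""
        if flow_id in self.entries:
            self.entries.move_to_end(flow_id)  # mark as recently used
            return True
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the oldest flow
            self.evictions += 1
        self.entries[flow_id] = True
        return False


table = FlowTable(capacity=2)
results = [table.packet(flow) for flow in ["a", "b", "c", "a", "c"]]
```

With room for only two flows, the third flow evicts the first, so the next packet for that first flow takes the table-miss path even though its session is still established.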

Given the issues associated with session errors, the TCP Flags view and client/server breakdowns are welcome additions to EndaceVision in the EndaceProbe Network Recorder. We hope that they help you find answers quickly. If you have other ideas on how these tools will help in your network, let us know in the comments below.

More information on EndaceVision.
