Finding session-related problems using EndaceVision

Posted March 4th, 2015 by Emulex

Network monitoring tends to focus heavily on bandwidth, addressing the question, “Do I have the capacity to carry the traffic that my business requires?” Capacity, however, must include session count and lifecycle, which are often overlooked until they become a problem. That’s why EndaceVision™ 6.0 Network Visibility Software has added two new tools to deal with sessions: the TCP Flags view and client/server breakdown.

High session counts can break services due to session limits in middle boxes such as firewalls and load balancers. Hitting the limit on your firewall could suddenly “break the network” by blocking new sessions, and load balancers that hit their session limits could prevent access to services.

It’s easy to measure load balancer behavior from the session perspective using the new TCP features in the EndaceVision user interface (UI) for the EndaceProbe 6.0 Intelligent Network Recorders (INRs). Start by capturing traffic on the server side of the load balancer, and then view the traffic breakdown by server IP. The resulting graph shows exactly the type and volume of traffic to each server, eliminating guesswork about whether the load balancer is working.
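For readers who want to reproduce a similar breakdown offline from an ordinary capture file, here is a minimal sketch of the same idea using Python and scapy. The capture file name and the server subnet are assumptions for illustration only; this is not how EndaceVision itself computes the view.

```python
# Minimal sketch: per-server traffic breakdown from a server-side capture.
# The pcap file name and the server pool subnet below are hypothetical.
from collections import Counter
from ipaddress import ip_address, ip_network

from scapy.all import rdpcap, IP  # pip install scapy

SERVER_POOL = ip_network("10.0.20.0/24")       # assumed pool behind the load balancer

bytes_per_server = Counter()
for pkt in rdpcap("server_side.pcap"):         # hypothetical capture file
    if IP in pkt and ip_address(pkt[IP].dst) in SERVER_POOL:
        bytes_per_server[pkt[IP].dst] += len(pkt)

for server, nbytes in bytes_per_server.most_common():
    print(f"{server:15}  {nbytes:>12} bytes")
```

If the load balancer is distributing sessions as expected, the byte counts per server should be roughly even; a server receiving all of the traffic, or none of it, stands out immediately.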

Middle boxes aren’t the only devices with maximum session limits. Servers often have hard limits on the number of supported sessions, whether due to configured capacity or licensing. When servers hit their limit, many of them will respond to a session request (TCP SYN) with a TCP RST instead of a SYN/ACK. This is a definitive response to the client that the session will not be established.

Spotting an overloaded server that’s not accepting new connections is straightforward with EndaceVision. The TCP Flags view graphs the count of SYN, RST, and other flags. For a healthy service, the SYN/ACK line should closely follow the SYN line. If instead the SYN/ACK line for a server is low but the RST line is high, then there is a strong implication that the service is not accepting new connections.
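As a rough offline equivalent of what the TCP Flags view graphs, the sketch below counts SYN, SYN/ACK, and RST packets per one-second bucket from a capture file. The file name and bucket size are assumptions for illustration; EndaceVision computes this interactively in the UI.

```python
# Minimal sketch: count SYN, SYN/ACK, and RST packets per second from a capture.
from collections import defaultdict

from scapy.all import rdpcap, TCP  # pip install scapy

buckets = defaultdict(lambda: {"SYN": 0, "SYN/ACK": 0, "RST": 0})

for pkt in rdpcap("service.pcap"):            # hypothetical capture file
    if TCP not in pkt:
        continue
    flags = int(pkt[TCP].flags)
    second = int(pkt.time)                    # one-second buckets
    if flags & 0x04:                          # RST bit set (covers RST and RST/ACK)
        buckets[second]["RST"] += 1
    elif flags & 0x02 and flags & 0x10:       # SYN and ACK bits set
        buckets[second]["SYN/ACK"] += 1
    elif flags & 0x02:                        # SYN bit only
        buckets[second]["SYN"] += 1

for second in sorted(buckets):
    counts = buckets[second]
    print(second, counts["SYN"], counts["SYN/ACK"], counts["RST"])
```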

The TCP Flags view is useful for finding many different service issues. It shows the total session count, in addition to graphing and counting lifecycle info, including creation (SYN, SYN/ACK) and termination (FIN, RST) signals. A service that has abruptly shut down may drop all of its open connections, sending a large number of RST packets to the formerly connected clients. This shows up visually as a RST spike in the EndaceVision TCP Flags view.

A server that has gone down and just come back up may face an additional problem. On a frequently accessed server, there may be a “thundering herd” as clients reconnect, which shows up as a spike in SYNs. For an encrypted protocol, the rate at which new sessions can be established is much lower than the rate at which established sessions can be serviced, because SSL/TLS setup starts with compute-intensive public/private key calculations. Fortunately, finding session establishment rates is easy with the EndaceProbe INR – just look at the SYN and SYN/ACK lines on the TCP Flags view.

A famous example of a session establishment rate problem is a SYN flood denial of service (DoS) attack. A SYN flood can be found in EndaceVision by filtering in SYN and filtering out ACK, then investigating with other EndaceVision tools to identify the target server IP and port, and the source(s) of the attack.
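The same “SYN in, ACK out” idea can be approximated outside the UI. The sketch below tallies SYN-only packets (SYN set, ACK clear) by target and by source from a capture file; the file name is a placeholder. The equivalent BPF filter expression is `tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn`.

```python
# Minimal sketch: find likely SYN flood targets and sources in a capture.
from collections import Counter

from scapy.all import rdpcap, IP, TCP  # pip install scapy

syn_only_targets = Counter()
syn_only_sources = Counter()

for pkt in rdpcap("suspect.pcap"):                    # hypothetical capture file
    if IP in pkt and TCP in pkt:
        flags = int(pkt[TCP].flags)
        if (flags & 0x02) and not (flags & 0x10):     # SYN set, ACK clear
            syn_only_targets[(pkt[IP].dst, pkt[TCP].dport)] += 1
            syn_only_sources[pkt[IP].src] += 1

print("Top targets (server IP, port):", syn_only_targets.most_common(5))
print("Top sources:", syn_only_sources.most_common(5))
```

A single server IP and port dominating the target list, combined with a large number of sources each sending a handful of SYNs, is the classic SYN flood signature.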

Looking forward, a concern we have at Emulex is session handling in software-defined networks (SDNs). Switches have traditionally been optimized for destination address lookup using large-scale ternary content addressable memory (TCAM). That’s changing, however, with technologies such as OpenFlow, which lets administrators apply policy to traffic based on a wide variety of source and destination options. Large numbers of sessions could fill the SDN flow tables in switches, which will likely mean that many established sessions are prematurely removed from the table. Any new packets for those orphaned sessions will be processed according to the table-miss rule for the flow table, which usually means contacting the SDN controller, adding latency and creating more network traffic. Even worse, the default behavior in the latest OpenFlow specification (1.3) is to drop the packets. When a traditional switch drops a packet, it’s not a big deal because the sending node can simply retransmit it. In an SDN environment, however, a switch silently dropping an orphaned session’s packets means the session is effectively closed without notice, so the nodes have to wait for it to time out and then re-establish it. Unfortunately, this leads to a “thundering herd” of reconnections, prolonging the service disruption.
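To make the mechanism concrete, the toy model below (plain Python, not OpenFlow code) shows what happens when a fixed-size flow table overflows: older entries are evicted, and packets from evicted sessions hit the table-miss rule, which either punts to the controller or, as in OpenFlow 1.3’s default, drops them. The table size and eviction policy are illustrative assumptions.

```python
# Toy model of flow-table overflow and table-miss handling (illustrative only).
from collections import OrderedDict

TABLE_SIZE = 4                    # tiny capacity so eviction is easy to see
MISS_ACTION = "drop"              # or "send_to_controller"

flow_table = OrderedDict()        # flow key -> action, oldest entry first

def handle_packet(flow_key):
    if flow_key in flow_table:
        flow_table.move_to_end(flow_key)              # refresh recently used flow
        return "forward"
    # Table miss: evict the oldest entry if full, install the new flow, and
    # apply the table-miss action to this packet.
    if len(flow_table) >= TABLE_SIZE:
        evicted, _ = flow_table.popitem(last=False)
        print(f"evicted established flow {evicted}")
    flow_table[flow_key] = "forward"
    return MISS_ACTION

for i in range(8):                                    # more flows than table slots
    print(f"flow-{i}:", handle_packet(f"flow-{i}"))
print("flow-0 again:", handle_packet("flow-0"))       # was evicted, misses again
```

In the drop case, the client never learns its session is gone and simply waits for a timeout, which is exactly the disruption described above.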

Given the issues associated with session errors, the TCP Flags view and client/server breakdowns are welcome additions to EndaceVision in the EndaceProbe INR. We hope they help you find answers quickly. If you have other ideas on how these tools could help in your network, let us know in the comments below.

Watch this video to see a demonstration of EndaceVision 6.0.

Improving network monitoring performance with the next generation Emulex EndaceProbe INR

Posted March 4th, 2015 by Erez Birenzwig

When the current Emulex EndaceProbe® Intelligent Network Recorder (INR) product range was introduced more than five years ago, most enterprise networks were only starting to think about upgrading to 10Gb Ethernet (10GbE) speeds. Since then, most IT departments have come to use 10GbE in their core, 1GbE to the desktop and laptop has become standard, and many organizations are looking to move up to 25GbE, 40GbE or higher speeds. At the time, Emulex EndaceProbe INRs were the highest performing and most reliable network packet capture devices available, helping our customers migrate their monitoring from 1GbE to 10GbE. In the same way that we enabled that migration, we are now introducing the next generation of INR products as enterprises adopt higher network speeds.

The three new Emulex EndaceProbe INR models address the increasing demands for performance and data security. Our aim is to provide IT teams with the tools to meet the challenges of running today’s networks under ever-increasing public scrutiny. The features of the next generation EndaceProbe INR address these needs:

  • Self-encrypting disks (SED) used in packet storage arrays
  • Improved dock support for virtual machines (VMs)
  • Solid state drives (SSD) used for packet storage
  • Multi-rate 1/10GbE monitoring ports

The EndaceProbe 4104 INR, for example, comes in a compact 1U form factor with four multi-rate monitoring ports that simplify product selection and deployment. The EndaceProbe 4104 INR also includes 9.6TB of SSD packet storage, which is extremely fast – delivering more than 20Gb per second (20Gbps) of sustained packet processing with no packet loss. These two features allow fast setup of on-demand packet capture using a Network Packet Broker (NPB) or existing network taps. The captures can then be used to analyze reported faults and access failures. The amount of link traffic is no longer a concern – the modern Emulex EndaceProbe INRs can handle the load. Nor do you need to worry about the interface speed, as both 1GbE and 10GbE are supported. You only need to concentrate on solving problems using the captured packets.

If data security is one of your key concerns, the EndaceProbe 8004 INR is the right network monitoring appliance for you. Fully encrypted SED storage, enforced at the hardware level, is locked to the installed RAID controller. This means that even if the drives are removed, they cannot be read outside the monitoring appliance. The appliance is also equipped with a large amount of main memory and sufficient compute cores to run third-party analysis programs in VMs. The VMs read the data directly from the built-in packet storage, further reducing the need to offload data to unsecured laptops or desktops.

In short, the next generation Emulex EndaceProbe INRs provide packet recording and analysis capabilities that fit any business, large or small, and that address the most pressing concerns in network management and maintenance. This is especially true when performing local or remote failure analysis with the recently released EndaceProbe 6.0 software.

Watch this video to see a demonstration of new capabilities in the new EndaceProbe INRs.

The growing requirement for fact-based fault domain isolation

Posted February 23rd, 2015 by Jeff Brown

Some days, I worry that the networked business applications of many organizations are like the Titanic steaming toward its iceberg. In this case, the iceberg is outsourcing, or more precisely, the challenges that outsourcing imposes on those responsible for fixing their organizations’ business applications when they slow down or break. It was hard enough getting visibility into what was actually happening when the entire infrastructure was owned and controlled by a single organization. With the rapid expansion of outsourcing, there are a growing number of blind spots developing throughout these end-to-end applications. That is because, not surprisingly, a lot of “something-as-a-service” vendors do not really want their customers poking around under the hood looking for problem areas!

It has always been a good idea to avoid involving too many departments when resolving an issue. With outsourcing, however, this becomes critical. Gone are the days when you could call up your buddy in network operations (NetOps) and ask him to drop everything and help you debug a critical problem being experienced by a high-priority end user. Now NetOps might be the responsibility of an outsourced network provider that has made a point of definitely NOT giving you the phone numbers of anyone in NetOps! And gone are the days when the organization’s expert analysts could log onto their network equipment, grab some network performance stats and packet traces, then log onto their web servers and databases and poke around the system logs looking for errors. When an entire technology tier is outsourced, what you have is a massive blind spot keeping you from performing root cause analysis within that technology domain.

So, how do you get someone in another organization to look for a fault when it’s not necessarily in their best interest to automatically assume it is actually their problem to resolve?  It won’t be good enough to say “I think your service might have a problem.”  No matter how much of an expert you are, your best guess may no longer be enough.  You’ll probably need hard evidence, irrefutable proof to really get their attention.  Indeed, having hard evidence before they will agree to respond might be a requirement of the Service Level Agreement.  That’s the important change we are seeing in incident response workflows.  To accommodate outsourced technology, we must clearly define the purpose and requirements of the Fault Domain Isolation (FDI) stage of the incident response workflow compared to the Root Cause Analysis (RCA) stage.

You may be thinking the technical infrastructure to perform evidence-based FDI is too complex and expensive to implement. It turns out that isn’t the case. If you are interested in learning more, see the Solution Brief, Fault Domain Isolation: FDI using EndaceProbe™ Intelligent Network Recorders. Hopefully, evidence-based FDI will help you avoid any outsourced icebergs!


NetPod: Dynatrace and Emulex Team to Modernize AA-NPM

Posted December 2nd, 2014 by Jeff Brown

By Jeff Brown (Emulex) and Gary Kaiser (Dynatrace)

So what’s going on?

Dynatrace and Emulex have announced NetPod™, a fully integrated solution that combines Dynatrace’s Data Center Real-User Monitoring and Emulex’s EndaceProbe™ Intelligent Network Recorder. It is no small thing when two independent companies agree to take their market-leading products and create a new branded offering, so you have to figure there is something valuable going on here.

Continue reading…

On decreasing incident response time

Posted October 14th, 2014 by Boni Bruno

It seems that security incidents are occurring more often, with mild to significant impact on consumers and on organizations such as Target and Sony.

Referring to the Verizon Data Breach Report year after year confirms that response times to such incidents are increasing rather than decreasing, with root cause identification in many cases not occurring until months after the security incident. This can cause a pessimistic view among many security teams; however, there are a lot of good things happening in the security space that I want to share with you.

Many organizations have readily invested in various effective security technologies and personnel training to help improve their security posture and minimize risk. A critical component of the incident response problem is the time spent weeding through all the false alarms generated by various security devices, including firewalls, intrusion prevention systems, and security reporting agents. The problem is further exacerbated by the growing speeds of networks and by network virtualization, where many security tools simply can’t process data fast enough in 10Gb Ethernet (10GbE), 40GbE, or 100GbE environments, or lack visibility altogether.

The good news is that solutions are available to help maintain visibility in such high-speed networks. These solutions can also correlate network transactions with security alarms to help identify problems faster and decrease incident response times. The key is to integrate lossless network recording systems with existing security tools using feature-rich application programming interfaces (APIs), which help automate security-related tasks.

Security automation is key to decreasing incident response time. Imagine being able to automate the retrieval and correlation of network transactions to any security log event aggregated into a security information event management (SIEM) system, or mapping packet data to any IPS alarm, or pinpointing application threads that trigger a specific application performance alarm. This is all possible now with high-speed lossless recording systems and API integration with SIEMs, firewalls, IPS devices, and Application Performance Monitoring (APM) systems. Yes, I am assuming your organization invested in these solutions…
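As an illustration of that kind of automation, the sketch below pulls the packets matching a SIEM alert from a network recorder over a REST API. The endpoint URL, query parameters, and token shown are hypothetical placeholders, not a documented EndaceProbe or SIEM interface.

```python
# Hypothetical sketch: fetch the packets behind a SIEM alert from a recorder.
import requests  # pip install requests

RECORDER_URL = "https://recorder.example.com/api/v1/packets"   # placeholder endpoint

def fetch_packets_for_alert(alert):
    """Turn a SIEM alert into a traffic filter and download the matching capture."""
    params = {
        "start": alert["start_time"],                           # alert time window
        "end": alert["end_time"],
        "filter": f"host {alert['src_ip']} and host {alert['dst_ip']}",
    }
    resp = requests.get(RECORDER_URL, params=params,
                        headers={"Authorization": "Bearer <token>"},  # placeholder auth
                        timeout=60)
    resp.raise_for_status()
    path = f"alert_{alert['id']}.pcap"
    with open(path, "wb") as f:
        f.write(resp.content)
    return path

# Example alert as it might arrive from a SIEM webhook:
alert = {"id": 1234, "src_ip": "192.0.2.10", "dst_ip": "198.51.100.5",
         "start_time": "2014-10-14T09:00:00Z", "end_time": "2014-10-14T09:05:00Z"}
# fetch_packets_for_alert(alert)
```

Chaining this kind of step into the SIEM’s alert workflow means the packet evidence is already waiting when an analyst opens the ticket, which is where the reduction in response time comes from.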

As a side note, real-time NetFlow generation on dedicated appliances is proving to be a good solution where full recording options are not available due to privacy policy conflicts. These solutions can provide much better network visibility than legacy NetFlow implementations that rely on network sampling, especially in 40GbE and 100GbE network environments. NetFlow is coming back in a strong way to provide security teams with much-needed visibility; NetFlow isn’t just for network operations anymore.

The bottom line is this: mainstream security products are becoming more open to integration with third-party solutions, and high-speed network recording systems are becoming more affordable. As a result, the security automation described above will become more prevalent among security operations teams as time goes on, and this is a very good thing in my humble opinion.

The security industry as a whole is improving; there is much more collaboration going on now than ever before, and I am seeing significant improvements among hardware and software vendors that make me feel very optimistic about our ability to decrease incident response times moving forward. If you’re interested in seeing some of the concepts discussed here in action, comment on this post and we would be glad to set up a conference call and provide a live demonstration of our network visibility technology.

User and device attribution comes to EndaceVision: Empowering network and security incident analysis

Posted October 1st, 2014 by Barry Shaw

We’ve all heard that the application is now the network. This paradigm shift moved us from the simple port-based definition of applications that was prevalent until the end of the last decade to the more awkward reality that applications are much more complex and no longer conform to such a simple scheme. For network operators, understanding the applications on their networks is paramount, and Emulex responded by incorporating deep packet inspection (DPI) technology into its EndaceProbe™ Intelligent Network Recorders (INRs) in 2012.

Continue reading…

Network (In)Visibility Leads to IT Blame Game

Posted August 26th, 2014 by Mike Heumann

Significant changes in the structure and use of IT, including such seismic trends as Bring Your Own Device (BYOD), virtualization and cloud computing, have introduced new challenges to IT administrators and staff. Added layers of complexity require new skill sets and knowledge bases as well as tools to effectively run a modern enterprise network. This raises a few questions about how IT teams are coping with the changes.

Well, it appears that IT teams are struggling to gain visibility into what is causing IT problems, and are in many cases not implementing monitoring tools to help. In an Emulex survey of 547 US and European network and security operations (NetOps and SecOps) professionals conducted in the spring of 2014, 77% of respondents said that they had inaccurately reported the root cause of a network or security event to their executive team on at least one occasion. Additionally, 73% of surveyed IT staff said they currently have unresolved network events.

With more than half of US respondents (52%) confirming that a network outage or performance degradation costs their organization more than half a million dollars in revenue per hour, you would assume that identifying unresolved network events would be a critical priority for IT organizations. Yet this is very much not the case – our survey revealed that 45% of organizations are still manually monitoring their networks.

With the flood of “unknown” devices resulting from BYOD (this can be hundreds or thousands of new devices daily), it would seem impossible for IT teams to derive the root cause of any network or security events if they do not have automated network surveillance tools. Startlingly, more than a quarter (26%) of European respondents said they have no plans to monitor the network for performance issues related to BYOD.

As a result of this lack of visibility, 79% of organizations have experienced network events that were attributed to the wrong IT group. This creates an “IT blame game” in which departments have to spend cycles proving their innocence, rather than getting to the root cause of network events and fixing them. If this trend continues, in tandem with increased virtualization and device proliferation, it will almost certainly lead to more outages and lost revenue.

It is also interesting to note that 83% of respondents said there has been an increase in the number of security events they have investigated in the past year. What will it take to make IT teams realize that without 100% visibility across their networks, the business is in jeopardy? The time is now for IT managers to take back control.

This blog originally appeared in APM Digest.

Black Hat 2014 – Gathering spot for government agencies, corporations, hackers and the occasional doomsday prepper

Posted August 5th, 2014 by Sonny Singh

Sorry to break it to you, but the annual “Doomsday Preppers Tradeshow and Convention” isn’t taking place this week. However, that’s not to say something far more interesting isn’t happening instead. You see, Black Hat 2014 is upon us and, for those that don’t know, it’s the show that sets the benchmark for all other security conferences the world over. Black Hat brings together the best minds in security to talk about the future of the security landscape and to arm IT pros with the knowledge and skills needed to defend governments and enterprises against today’s threats.

Continue reading…

Protecting what is of value isn’t always about dollars and cents

Posted July 31st, 2014 by Brett Moorgas

When you think of the cost of a security breach in your network, the immediate thought is often a dollar amount; for example, how much money has the breach caused in lost sales? Consequently, many think that private enterprises are the only ones at risk of attacks on their networks. The fact is that public sector bodies, educational institutions and non-profit organisations are just as much at risk, and the potential costs are both great and varied.

While public sector or educational institutions may not undertake as many financial transactions as a commercial enterprise, they do utilise their networks to transact data and, at times, large amounts of data. In the case of a government department, this may be in the form of personal data of its constituents. For an educational institution, it may be in the form of a key research project or vital intellectual property (IP). While this may not have any immediate dollar value, if it goes missing or is stolen, the concerns for those that own the data may feel the same as if there were a direct and tangible financial cost involved.

Continue reading…

The World Cup is Upon Us – But Are We Prepared?

Posted June 10th, 2014 by Mike Heumann

It seems as if the Sochi Winter Olympics and March Madness happened just yesterday, but the month of June is here, and with it, one of the most highly anticipated sporting events of any four-year period. It could be argued that the FIFA World Cup is the most popular sporting event in the world, and with a soccer powerhouse like Brazil hosting the tournament this year – across 12 venues – sports fans are getting the eye drops ready so they can watch every moment of the action between June 12 and July 13. ESPN will present all 64 matches across three networks (ESPN, ESPN2 and ABC).

Continue reading…
