By Jeff Brown and Gary Kaiser (Dynatrace)
So what’s going on?
Dynatrace and Endace have announced NetPod™, a fully integrated solution that combines Dynatrace’s Data Center Real-User Monitoring and Endace’s EndaceProbe™ Network Recorder. It is no small thing when two independent companies agree to take their market-leading products and create a new branded offering, so you have to figure there is something valuable going on here.
In this blog, we want to go beyond the solution’s cleaner integration, better packaging, and enhanced support. So let’s take a look at the “Top Three Technical Good Things” that the NetPod solution yields as it fully addresses the need for combined network, application and user insight for today’s complex and heterogeneous data center application architectures.
Technical Good Thing #1 – Ensuring “Closed loop” fault domain isolation
NetPod’s architecture provides automated and actionable fault domain isolation (FDI) within complex application and network environments. It is FDI at its best. Without user intervention, a problem affecting – or threatening to affect – application response time is highlighted on the NetPod dashboard; the icon corresponding to the faulty network or application component – what we call the technology tier – turns red, and an alert is generated. In response, the responsible technology team takes immediate and sole ownership, armed with transaction-specific insight. Very quickly, the FDI stage of the incident response workflow is done. Or is it?
That’s the theory, and very often the practice as well. But we all know the other paths this story can take, don’t we? Even with this information, your Integrated Operations Center (IOC) can face pushback. Your outsourced database provider, your mainframe team, or your network service provider wants more proof, because their own monitoring dashboards show no issues. Or your web team points the finger back at the network infrastructure, which has become quite complex (and may have a history fraught with odd behaviors). To address these challenges, NetPod’s contextual integration of user-experience-driven FDI and high-fidelity packet storage and retrieval provides the multi-disciplinary and irrefutable proof you need.
The NetPod report that automatically highlights the offending transaction – a database query, an IBM MQ operation, a Citrix user’s Outlook complaint, a web services call – also provides the filtering context from which you can go back to the time of the problem and, in just a few mouse clicks, retrieve the pertinent network packets. NetPod’s Network Analyzer then puts these packets into their transaction context, and the resulting integrated application and network-level views quickly supply the hard, irrefutable facts needed to force the feedback loop to close. Root cause analysis can then begin on the faulty technology tier, and only that tier.
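To make the idea of "filtering context" concrete, here is a minimal sketch of how a flagged transaction could be turned into a packet query: a time window plus a standard pcap/BPF filter expression. This is an illustration only, not NetPod's interface. The `SlowTransaction` record, its field names, the `packet_query` helper, the padding value and the sample addresses are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SlowTransaction:
    # Hypothetical record of a flagged transaction; the field names are
    # assumptions for illustration, not a NetPod schema.
    client_ip: str
    server_ip: str
    server_port: int
    start: datetime
    end: datetime


def packet_query(txn, padding_s=5.0):
    """Return (bpf_filter, window_start, window_end) for one transaction.

    The window is padded on both sides so that connection setup and
    trailing retransmissions are included in the retrieved packets.
    """
    pad = timedelta(seconds=padding_s)
    bpf = (
        f"host {txn.client_ip} and host {txn.server_ip} "
        f"and tcp port {txn.server_port}"
    )
    return bpf, txn.start - pad, txn.end + pad


# Made-up example: a slow database query between two hosts.
txn = SlowTransaction(
    client_ip="10.1.2.3",
    server_ip="10.9.8.7",
    server_port=1521,
    start=datetime(2020, 1, 1, 12, 0, 10),
    end=datetime(2020, 1, 1, 12, 0, 18),
)
bpf, t0, t1 = packet_query(txn)
print(bpf)
print(t0.isoformat(), "to", t1.isoformat())
```

The filter string uses ordinary pcap filter syntax, so the same expression could be tried in any tool that accepts such filters.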
The key is NetPod’s combination of transaction visibility and deep, back-in-time packet store capabilities, plus the full line-rate packet capture performance that NetPod’s custom hardware provides, even on the most heavily loaded 10G links.
Technical Good Thing #2 – Enabling “Drive to completion” intermittent fault resolution
Not all faults are persistent. Indeed, sometimes they are infuriatingly random, although they do seem to know when it’s the end of the quarter or the CEO is using the application! (In the words of more than one customer, for intermittent problems, capturing the data is the hardest part of the problem.) In today’s complex and constantly changing application delivery environment, the faulty technology behind a problem cannot always be identified immediately.
In these situations, you need two things to deterministically drive the fault isolation and subsequent root cause analysis to successful completion: the user and application contexts within which to start looking, and the tools and network data to investigate. And that combination is what NetPod delivers.
Why are the user and application contexts so important? Consider that a single application has many different transactions, and their response times may differ dramatically; some may complete in a few hundred milliseconds, while others may take many seconds. Unless these are distinguished and reported in separate buckets as unique transactions, there is no context relevant to the user’s experience. This is especially true for intermittent problems, but it also applies to any fault domain isolation approach. So-called “application metrics” such as session-layer response time or application response time (ART) measurements are no more useful than rudimentary network round-trip times.
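A small sketch shows why the buckets matter. The sample data, the transaction names and the "twice the median" threshold are all made up for illustration, and this is not how NetPod computes its baselines. A login that takes 1.5 seconds is several times slower than normal, yet it sits comfortably below the application-wide average and would go unnoticed there.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (transaction name, response time in seconds) samples.
samples = [
    ("Login", 0.2), ("Login", 0.3), ("Login", 0.25), ("Login", 1.5),
    ("Change Order", 4.0), ("Change Order", 4.2), ("Change Order", 4.1),
]


def by_transaction(samples):
    """Group response times into one bucket per transaction name."""
    buckets = defaultdict(list)
    for name, seconds in samples:
        buckets[name].append(seconds)
    return dict(buckets)


def slow_outliers(samples, factor=2.0):
    """Flag samples slower than `factor` times their own bucket's median."""
    medians = {name: median(times)
               for name, times in by_transaction(samples).items()}
    return [(name, seconds) for name, seconds in samples
            if seconds > factor * medians[name]]


overall_mean = mean(seconds for _, seconds in samples)
outliers = slow_outliers(samples)
print(f"application-wide mean: {overall_mean:.2f} s")
print("per-transaction outliers:", outliers)
```

Compared against its own bucket, the slow login stands out immediately; compared against the blended average, it looks healthy.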
NetPod defines transactions the same way a user does – e.g., page load time for web applications, T-code for SAP, or more generically a meaningful label – such as “Change Order” – on each transaction measurement corresponding to the user’s Click to Screen Update. Based on this alignment with the user’s perspective, and by extension with the interest of the business, NetPod continuously monitors and reports on end-user experience across complex network, web, middleware, and database tiers. This user transaction context, important for effective FDI, becomes critical for intermittent problems.
So when those strange unpredictable problems suddenly appear, and just as suddenly disappear, NetPod will have already identified and captured the combination of user, application and network context, enabling the IOC to determine exactly what is happening in a deterministic way, one that mirrors the generally more confident investigation of simpler reproducible problems. Only NetPod provides this contextual insight, not just for web applications, but across an extensive set of enterprise environments, including SAP, Oracle E-Business Suite, middleware, Exchange, Citrix-hosted applications, and many others.
Technical Good Thing #3 – Simplifying “What the #$% is the network doing?!” deep-dive investigations
Complex network infrastructures can significantly complicate fault domain isolation and root cause analysis. Indeed, the boundary between the network and the application becomes increasingly blurred as more and more “application fluent” network components are deployed. Understanding the performance impact of load balancers, firewalls, WAN optimization, high-latency WAN links, thin client protocols, etc., requires solutions that can correlate application and network behaviors, putting these in the context of the end user’s experience.
NetPod may isolate a fault domain to the network, and to the likely underlying condition causing the fault; for example, a routing change, an increase in packet loss or unexpected congestion may result in degraded user response time. It is important to be able to drill into network details to further isolate and begin network-centric root cause analysis. This is most commonly true for first-tier network analysis, where the availability of packet-level proof is critical to defend the network from all-too-common finger-pointing.
Ultimately, though, it is not simply the packet data, but rather the interpretation of that data that matters. NetPod offers compelling network and application visualizations that make short work of isolating even the most esoteric anomalies. Examples include:
- Detecting micro-bursts
- Viewing top talkers
- Analyzing TCP flags
- Illustrating any throughput constraint imposed by TCP window size
- Graphing the impact of packet loss on transaction time
- Detecting, measuring, and visualizing each application’s traffic across the network
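As one worked example from the list above, the throughput constraint imposed by TCP window size follows from a simple relationship: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is capped at roughly window size divided by round-trip time. The sketch below is a back-of-the-envelope calculation, not a NetPod feature; the function names and the sample link speed, window and latency are assumptions.

```python
def window_limited_bps(window_bytes, rtt_s):
    """Upper bound on TCP throughput (bits/s) for a given window and RTT."""
    return window_bytes * 8 / rtt_s


def required_window_bytes(link_bps, rtt_s):
    """Window needed to fill the link: the bandwidth-delay product in bytes."""
    return link_bps * rtt_s / 8


# Assumed scenario: 1 Gbit/s WAN link, 80 ms round trip, and a
# 65,535-byte window (the maximum without TCP window scaling).
link_bps = 1_000_000_000
rtt_s = 0.080
window = 65_535

ceiling = min(link_bps, window_limited_bps(window, rtt_s))
needed = required_window_bytes(link_bps, rtt_s)
print(f"throughput ceiling: {ceiling / 1e6:.2f} Mbit/s")
print(f"window needed to fill the link: {needed / 1e6:.1f} MB")
```

In this scenario the window, not the link, is the bottleneck: the transfer tops out at a few megabits per second on a gigabit path, which is the kind of constraint that only becomes obvious when packet data is visualized in context.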
The goal is to accurately and conclusively identify the exact nature of the problem. At times, the result will allow you to defend the network, absolving it from blame while identifying the real fault. Other times, the results will help identify and prove the root cause of a network-induced problem. In either case, NetPod is fully agnostic, focused solely on making quick work of network performance analysis regardless of where the fault lies.
While your responsibilities may spring from a role in application performance, network performance, or even packet analysis, they will necessarily cross these boundaries; NetPod can meet that challenge.