How an industry leader improved digital experience with network insights delivered without the time sink or money pit
While there’s no doubt that digital transformation and hybrid IT have delivered massive benefits to businesses, the evolution and complexity of a new wave of virtualized services also bring their fair share of headaches. The digital experience you want to give your employees, customers and partners is not always easy to optimize, and resolving performance issues can be a big challenge.
The harsh reality is that it’s impossible to keep everybody satisfied if you try to manage digital experiences without network expertise. With today’s highly virtualized architectures, it’s like looking for a needle in a haystack.
Here’s an example of what I’m talking about. We’ve been working with a commercial real estate services company with a problem that was starting to impact revenue. Typifying the challenges that many global organizations face, they had performance issues with an application hosted in one country that affected an entire business unit in another.
The problem was sporadic but chronic, causing major disruption to the business every time it happened. After several months of this, they turned to their network engineers.
This is where we came in. Network analytics is what rapidly puts our clients on the “find answers fast” path. Having Corvil automatically analyze traffic gave them immediate visibility into individual application transactions and types of database queries, along with packet sizes and performance metrics.
For example, by filtering and comparing characteristics of individual transactions, they discovered that the problem only affected particularly large data queries from the application to the desktop. This was why users only experienced intermittent glitches. It’s also why doing manual packet analysis intermittently, after the problem was reported, failed to identify the root cause (but more on that later).
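To make that comparison concrete, here is a minimal sketch of the idea in Python. It is not Corvil's implementation: the transaction records, the size threshold and the function name are all made up for illustration. It simply splits transactions by response size and compares the median latency of each group.

```python
from statistics import median

# Hypothetical transaction records: (response_bytes, latency_seconds).
# These values are invented for illustration, not taken from the case study.
transactions = [
    (800, 0.12), (1200, 0.15), (950, 0.11), (64000, 31.0),
    (1100, 0.14), (72000, 35.5), (900, 0.13), (58000, 0.9),
]

SIZE_THRESHOLD = 16_000  # assumed cutoff between "small" and "large" responses


def median_latency_by_size(records, threshold):
    """Split records at the size threshold and return each group's median latency."""
    small = [lat for size, lat in records if size < threshold]
    large = [lat for size, lat in records if size >= threshold]
    return {
        "small": median(small) if small else None,
        "large": median(large) if large else None,
    }


result = median_latency_by_size(transactions, SIZE_THRESHOLD)
print(result)
```

With data shaped like this, the small responses cluster around a fraction of a second while the large ones sit at half a minute or more, which is the kind of split that points at size as the common factor.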
In this case, the root cause was a bug in a virtual machine cluster. The bug caused large datagrams to be mishandled, so they were not delivered correctly between the users, application servers and databases. All those big packets were being resent. This 20/20 hindsight also explains why there was a visibility gap that could only be closed by network-based multi-tiered monitoring.
The virtualization layer is designed intentionally to abstract these activities from users, applications and databases. As a result, applications only logged that the initial request was sent from the user at a particular time and it took half a minute or more to get a response, even though the network performance was fine at the time.
On the database end, the failed delivery events from the users and applications were never logged because the first set of data deliveries never got past the virtualization layer to the database. The logs only saw the final request that came through and not all the prior requests that did not make it past the bug. When the database finally received and logged the request it was able to process the query quickly. Then the same thing happened in the opposite direction, when the query result would be sent back to the user.
Only when Corvil’s network analytics linked specific problematic transactions to packet sizes, and to the spikes in retransmission requests, was any real progress made towards nailing the root cause. However, this raises the question: why not get network engineers involved earlier?
The answer is money and time. Instrumenting every desktop and virtual machine for network intelligence isn’t (or, more accurately, historically hasn’t been) economically feasible, even for the largest, most profitable companies. As a result, troubleshooting this type of application issue is laborious and requires the most skilled network engineers.
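As a rough sketch of what linking retransmissions to packet sizes can look like, the Python below counts resends per size bucket. It rests on simplifying assumptions that are ours, not the case study's: a TCP-like flow where a repeated sequence number signals a resend, hand-written packet records, and an arbitrary size threshold. Real retransmission analysis has to handle far more cases.

```python
from collections import defaultdict

# Hypothetical packet records: (flow_id, sequence_number, payload_bytes).
packets = [
    ("a", 1, 500), ("a", 501, 500),
    ("b", 1, 9000), ("b", 1, 9000), ("b", 1, 9000),
    ("a", 1001, 400),
    ("b", 9001, 9000), ("b", 9001, 9000),
]

SIZE_THRESHOLD = 1500  # assumed cutoff between "small" and "large" payloads


def retransmission_rate_by_size(records, threshold):
    """Return the fraction of packets in each size bucket that repeat an
    already-seen (flow, sequence number) pair, i.e. look like resends."""
    seen = set()
    totals = defaultdict(int)
    resent = defaultdict(int)
    for flow, seq, length in records:
        bucket = "large" if length >= threshold else "small"
        totals[bucket] += 1
        if (flow, seq) in seen:
            resent[bucket] += 1
        else:
            seen.add((flow, seq))
    return {bucket: resent[bucket] / totals[bucket] for bucket in totals}


rates = retransmission_rate_by_size(packets, SIZE_THRESHOLD)
print(rates)
```

In this toy data the small payloads are never resent while most of the large ones are, mirroring the pattern described above.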
In this case, before Corvil got involved, the network engineers had to schedule time with users to install packet capture software, export hours’ worth of packets to Wireshark for manual investigation (requiring advanced skills), and then repeat if the problem couldn’t be identified. This took so long that they had the time and resources to resolve only the most severe issues. Worse yet, the manual approach perpetuates a visibility gap that is difficult to bridge, and performance problems continue.
And this again is where Corvil helped out. Activating our sensors within their virtual environment and on desktops eliminated all of the manual packet wrangling of their old process. The network insights needed to analyze the digital experience were delivered without the time sink or money pit.
At Corvil, we’ve had enough experience with different clients to be able to measure the difference that network analytics can make. Typically, we can get to the cause in half as many steps and deliver a 94 percent improvement in the time it takes to troubleshoot a customer experience issue – that’s 240 minutes down to 15.
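For anyone who wants to check the arithmetic behind that figure, a quick Python sketch follows. The function name is ours, and the only inputs are the two numbers quoted above.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


reduction = percent_reduction(240, 15)  # minutes, as quoted in the text
speedup = 240 / 15                      # how many times faster

print(f"{reduction:.2f}% less time, {speedup:.0f}x faster")
```

The exact value is 93.75 percent, which rounds to the 94 percent quoted, and it corresponds to troubleshooting that is 16 times faster.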
What this means for the digital enterprise is less disruption, increased user productivity and a reduced impact on revenues.
In short, there’s a compelling business case to be made for rethinking how to ensure your business has the digital experiences it will need to compete and succeed. You have to make network analytics part of the solution.