As a Sales Engineer with Corvil, my role includes working very closely with customers and helping them utilise the Corvil solution to resolve application and service issues across the network infrastructure.
A recent customer support call perfectly demonstrates how Corvil can be the difference between uptime and downtime. The customer, a multinational firm, was running VoIP across several continents and using Corvil to monitor VoIP performance in its regional data centers. Its voice infrastructure uses kit from multiple vendors and depends on interoperability between various combinations of software, servers and devices. After one supplier carried out an upgrade, problems started to occur, leading to a major outage and users complaining that they couldn’t make calls.
The customer had support staff from multiple vendors busy looking through server logs to find the cause, while the customer’s voice operations team checked their Corvil dashboards. The vendors had formed a couple of theories about what the issue might be but were struggling to prove any of them one way or the other. While looking at the dashboards, the voice operations team saw an anomaly around SIP messaging and contacted me to help them dig deeper.
Drilling down into the data collected on their Corvil appliances, we discovered that the problem centered on SIP SUBSCRIBE messages sent from one vendor’s devices to the other vendor’s server. We could see a huge increase in the number of these messages. Analyzing them showed that they were looping around the infrastructure without reaching their destination, eating up vital server resources and preventing calls from being properly processed.
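For readers who want a feel for what a looping request looks like on the wire, here is a minimal sketch in Python. It is not how Corvil performs this analysis, and the function names `parse_sip` and `find_looping` and the `min_sightings` threshold are my own hypothetical choices. It relies on one fact about SIP: every proxy that forwards a request adds a Via header, so the same SUBSCRIBE (same Call-ID and CSeq) seen repeatedly with a growing Via count is a reasonable loop suspect, whereas plain retransmissions keep the same Via count.

```python
from collections import defaultdict


def parse_sip(raw: str) -> dict:
    """Pull out the few fields this sketch needs from one raw SIP request."""
    lines = raw.strip().splitlines()
    method = lines[0].split()[0]  # e.g. "SUBSCRIBE"
    headers = defaultdict(list)
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()].append(value.strip())
    # Via may use its compact form "v" and may hold several comma-separated hops.
    via_hops = sum(len(v.split(",")) for v in headers["via"] + headers["v"])
    call_id = (headers["call-id"] or headers["i"] or [""])[0]
    cseq = (headers["cseq"] or [""])[0]
    return {"method": method, "call_id": call_id, "cseq": cseq, "via_hops": via_hops}


def find_looping(messages, min_sightings=3):
    """Return SUBSCRIBE requests seen repeatedly with a growing Via stack.

    min_sightings is an illustrative threshold, not a recommended value.
    """
    seen = defaultdict(list)
    for raw in messages:
        msg = parse_sip(raw)
        if msg["method"] == "SUBSCRIBE":
            seen[(msg["call_id"], msg["cseq"])].append(msg["via_hops"])
    return {
        key: hops
        for key, hops in seen.items()
        if len(hops) >= min_sightings and max(hops) > min(hops)
    }
```

Fed the SUBSCRIBE requests decoded from a capture, `find_looping` returns a dictionary keyed by (Call-ID, CSeq) with the Via hop counts observed for each suspect request.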
We were able to filter and pivot the data to see the messages in the days leading up to the event and compare that baseline with what happened during the outage. This combination of live and retrospective analysis was the key to understanding the issue.
We could see the servers being impacted and could supply pcaps to the vendors as proof and for further analysis.
I was lucky enough to be on the multi-vendor call where I heard one of the vendors say how great the customer’s analytics were, meaning the data provided by Corvil. Bringing visibility to complex environments is just one of our strengths. We don’t just identify issues that other tools miss; we provide the information needed to make sure they don’t happen again.
In this case, the subscribe feature was disabled and the way the servers receive messages was improved. At the same time, the client was able to add a new graph to their dashboard and set up a new alert that would immediately tell them if a similar problem occurred.
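The idea behind an alert like this can be sketched in a few lines of Python. This is an illustration only, not the customer's actual Corvil alert configuration: the function name `spike_alerts` and the `window`, `factor` and `floor` values are assumptions I have picked for the example. It compares each interval's SUBSCRIBE count against the median of the preceding intervals and flags any interval that exceeds that baseline by a chosen factor.

```python
from statistics import median


def spike_alerts(counts, window=5, factor=3.0, floor=10):
    """Return indexes of intervals whose count spikes above a trailing baseline.

    counts: SUBSCRIBE messages per interval (for example, per minute).
    window: number of preceding intervals used as the baseline.
    factor: how many times the baseline counts as a spike.
    floor:  minimum baseline, so very quiet periods do not trigger alerts.
    All three defaults are illustrative, not recommended values.
    """
    alerts = []
    for i in range(window, len(counts)):
        baseline = max(median(counts[i - window:i]), floor)
        if counts[i] > factor * baseline:
            alerts.append(i)
    return alerts


per_minute = [20, 22, 19, 21, 20, 23, 250, 400]
print(spike_alerts(per_minute))  # prints [6, 7]
```

Using the median rather than the mean keeps one bad interval from dragging the baseline up, so a sustained surge keeps triggering until it has filled most of the window.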
This type of incident is far from unique. Not long after, the same client was able to solve a second looping issue, this one caused by audio codec incompatibilities with one of their voice systems. The new Corvil alert they had set up caught the problem early; previously it would have gone unnoticed until it caused a major issue. Changes were quickly made to handle these calls, and a potential second outage was avoided.
As organizations invest in collaboration suites, multi-vendor services, and hybrid cloud infrastructure, they are finding themselves with increasingly complex communication platforms. The reality is that legacy voice service assurance tools struggle to identify technical issues, let alone solve them. At Corvil, it’s what we do.