[00:00:00] Speaker A: Hello, everyone, and thanks for tuning in to this edition of Metrigy's MetriCast podcast. Hi, I'm Beth Schultz. I'm the Vice President of Research and a Principal Analyst at Metrigy, and I'm pleased to introduce my guest for today, Mark Rowan. Mark is a co-founder and Chief Operating Officer at Klearcom. Klearcom is a CX testing, monitoring, and optimization vendor that focuses on the voice channel. It has been in the CX assurance space for half a dozen years or so, and like many of its traditional competitors and a variety of other companies, it is evolving its portfolio for generative and agentic AI. Now, CX assurance is a really hot market, and it's one that Metrigy explores in our recently published CX Assurance Market Overview report. And Mark is here today to share his perspective for enterprises exploring ways to assure their AI agents are, first of all, ready for deployment and, once live, don't go off the rails. So Mark, thank you for joining me.
[00:01:17] Speaker B: Thanks so much Beth. Pleasure to be on today and thank you for having me.
[00:01:21] Speaker A: Right, so let's dive in. As AI agents shift from just answering questions to taking actions in 2026 and beyond, the cost of a failure isn't just a bad answer, right? It's a broken transaction. So why is traditional testing no longer enough for these autonomous systems?
[00:01:39] Speaker B: Yeah, it's a great question. Traditional QA was built for deterministic systems: you write a test, and it passes or fails the same way every single time. But AI agents aren't deterministic, they're probabilistic. And what I mean by that is they don't just return answers anymore, they take actions. They're transferring calls, they're updating records, they're processing refunds, even things like booking appointments. So a failure isn't a wrong answer that you can log and fix later. It's a broken transaction, like you've just outlined, that a real customer experiences in real time. Worse than that, the failure might not surface immediately. An agent could confidently handle 95% of a call, and it only takes a small hallucination at the end of that, say a policy detail that voids a customer claim, and the damage is done before any test case catches it. What's needed is continuous validation in production, not just a gate before a release. A pre-launch lab test is still necessary in the current environment, but it's table stakes, not a safety net. So when we look at traditional testing versus automated AI testing, that's where the world is changing, like I said earlier on, into a more probabilistic world as opposed to a deterministic world. And the pathways can change significantly.
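[Editor's note: As a concrete illustration of testing a probabilistic system, here is a minimal sketch of a pass-rate gate: the same scenario is run many times and the release gate checks the pass rate rather than a single pass/fail. The run_agent and judge callables are hypothetical stand-ins for whatever harness you use, not a vendor API.]

```python
import statistics

def gate_on_pass_rate(run_agent, judge, prompt, expected,
                      n_runs=25, min_pass_rate=0.95):
    """Run one scenario many times against a non-deterministic agent and
    gate on the observed pass rate instead of a single pass/fail."""
    passes = [judge(run_agent(prompt), expected) for _ in range(n_runs)]
    pass_rate = statistics.mean(1.0 if ok else 0.0 for ok in passes)
    return pass_rate >= min_pass_rate, pass_rate
```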
[00:03:08] Speaker A: Let's talk about that lab environment, right? Because a lot of testing takes place in, I'll use quotes, a "lab environment." But there's danger in assuming that if your AI voice agent works in the lab, then it's going to work everywhere. Now, you have a lot of global customers. So what are the kind of regional gotchas that we might find during an AI voice agent rollout that maybe would catch a brand off guard if they're not paying attention?
[00:03:40] Speaker B: Yeah, this is such a hot topic within our sector and within our industry right now when we're talking about the CX impact from the customer's end. Because when we look at regional gotchas, traditional lab testing is essentially just that: a lab environment. If you're testing in a US environment into a US contact center, everything is pretty much on-net and localized, so you're not going to catch any of those regional, global gotchas. We work with some of the biggest companies in the world across all the major sectors, and the nature of what we do is we step outside that global contact center network and replicate the customer experience from the outside in. When you're doing that, and you're localizing that process in over 100 countries around the world, you're getting a much bigger picture of what's actually impacting the AI agents. Take, for example, the western US: that's still pretty much a localized lab environment. But if you start looking at calls traversing Asian infrastructure all the way back to the US, you're going to have challenges with codec compression, you can have issues with latency profiles, you can have different audio processing chains. You're introducing multiple layers into that communication chain, and there are multiple hops and multiple telco environments passing on voice services. Things like codec compression and latency profiles have a dramatic impact across that global voice chain. When you introduce that, not into a lab environment but into a real, live AI agent environment, latency has a dramatic impact, and with codec compression you can start to see degradation of voice quality. But not just that. Take it a step beyond: AI agents are trained on particular acoustic conditions and dialects. A Spanish-language agent trained on Castilian Spanish, for example, can have real trouble trying to comprehend Argentine or Caribbean Spanish. Even take me, right, Irish English. How the Irish dialect carries the English language is slightly different, and it's interpreted differently. So when you factor in all these variables in a real-world environment, away from a lab environment, it's going to have a dramatic impact. And then you're localizing this across all these countries: voice traversing multiple carriers and multiple environments, with multiple impacts from things like codec compression, latency, and different audio processing chains. All of these real-world attributes and environments have a massive impact on what the experience at the end destination is going to be.
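[Editor's note: To make the off-net impairments concrete, here is a toy sketch that injects packet loss and background noise into a test signal. This is a crude stand-in for real carrier chains (codec transcoding, multiple hops, varying latency), not how any particular vendor does it; numpy is the only dependency.]

```python
import numpy as np

def degrade(samples: np.ndarray, sr=8000, packet_ms=20,
            loss_rate=0.02, noise_db=-30.0, seed=0):
    """Crudely simulate an off-net voice path: packetize the signal,
    drop packets (clipped syllables), and add background noise."""
    rng = np.random.default_rng(seed)
    plen = max(1, sr * packet_ms // 1000)
    out = []
    for i in range(0, len(samples), plen):
        pkt = samples[i:i + plen]
        # A lost packet arrives as silence -- the "clipped digit" effect.
        out.append(np.zeros_like(pkt) if rng.random() < loss_rate else pkt)
    degraded = np.concatenate(out)
    # Additive noise at roughly noise_db dBFS (samples assumed in [-1, 1]).
    degraded = degraded + rng.normal(0.0, 10 ** (noise_db / 20), degraded.shape)
    return np.clip(degraded, -1.0, 1.0)
```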
[00:06:25] Speaker A: I mean, voice is still such a critical means of communicating for customers, and companies really have to think about this. They have to get their AI voice agents right. They can't mess around here.
[00:06:39] Speaker B: Yeah, absolutely.
[00:06:41] Speaker A: So, kind of a similar question. When a CX leader sees they have a 90% accuracy rate as they're building their AI agent, maybe that makes them feel really confident. But what happens to that accuracy when you start to factor in real-world audio quality and network jitter and all of that?
[00:07:00] Speaker B: Yeah, again, we're getting back to that kind of perfect lab environment scenario, right? When these environments are built out, that 90% figure is measured in a clean audio environment: full silence, distant microphone input, no packet loss. But then you move into a production environment, and that's when you're dealing with background noise, a customer calling from a car or from a grocery store. You can have network jitter that's chopping up syllables, even Bluetooth audio compression that smears frequencies. There are so many different variables within a real-world environment that have a dramatic impact. And when heads of global contact centers have that comfort level at 90%, but the real-world environment isn't giving that 90%, that's when things start to compound. In a noisy environment over a degraded cellular connection, that 90% is realistically probably down around 70% or even lower, depending on the exact transaction: confirmations, things like account numbers, addresses, all the details being passed back and forth over these engagements. That's where you really start to pick up on the nuances and the challenges.
If you lose a number when you're traversing an account number, for example, that has a dramatic impact on the experience and on what you're getting back from your AI agent. And that can be down to some background noise, a little bit of jitter on the line, a little bit of clipping on one of the numbers. None of these real-world scenarios are visible when you're running a lab test. The problem is that CX leaders rarely see this, because the dashboards report accuracy averaged across all sessions, with nothing specifically challenging the acoustic conditions of the call. So in a test environment you will get your 90%. But the outliers are where the brand damage tends to live, above anything else, and those outliers are the real-world interactions. They're the ones where you can start to see degradation, clipping, and loss of digits in financial transactions. That's where the customer frustration comes in, and that's where you start to see brand damage that has a much bigger live impact than anything the test environment shows.
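[Editor's note: The 90%-to-70% drop is easy to reproduce with back-of-the-envelope arithmetic. The per-digit accuracies below are illustrative assumptions, not measurements: a whole account number is only correct if every digit survives the channel.]

```python
clean_digit_acc, degraded_digit_acc = 0.995, 0.97
digits = 10  # e.g. a 10-digit account number

print(round(clean_digit_acc ** digits, 3))     # 0.951 -> the lab-style "90%+"
print(round(degraded_digit_acc ** digits, 3))  # 0.737 -> the real-world "~70%"
```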
[00:09:28] Speaker A: So I'm curious, have you seen sort of an accepted range of accuracy when you're deployed live?
[00:09:37] Speaker B: Yeah, yeah. When we deploy live for a lot of our customers, and like I said, some of the customers we're working with are the biggest financial institutions in the world, an acceptable level of accuracy with the current environment and current setup is anywhere from 70% up to about 85%. So the 90% accuracy is very much an on-net figure. It's on the off-net side of things where you're going to start to see a lot more degradation, because at that point you're taking in a global aspect, not just a localized one. And that's where there has to be a degree of acceptance in terms of what the deliverable is actually going to be versus the reality. What we tend to see mostly is a drop in audio quality. Over periods of time, things like transcoding can impact the AI agents, and then little things like clipping, like I said before, are where you can start to see more impact. But what we tend to see over time when customers test with us is that they move from that 70% or 75% up toward 80% or 85%, because they're changing how they handle background noise, or they're editing or amending the acceptance or tolerance levels of their AI agents. And that's where we're seeing the improvements when they're using solutions like ours.
[00:10:59] Speaker A: Okay. Now at this stage, as companies think about deploying AI voice agents, do they all value the need for CX assurance tools? Or are you encountering any concern from CX and IT leaders that adding these kinds of tests and tools in order to launch their new AI agents is really just adding bloat to an already bloated tech stack? Do you ever hear that? And if you do, what's your response to that kind of sentiment?
[00:11:29] Speaker B: Yeah, I suppose I'm going to be biased here given the nature of the industry we're in, but it's a fair concern, because businesses have spent a lot of money deploying, enhancing, and growing their customer experience. Years ago, customer experience wasn't even really a thing. It was almost a nice-to-have, not a need-to-have, whereas now it's very much a need-to-have, and things have evolved. But the cost of deploying CX assurance tools is very low in comparison to the impact of what happens when you don't have something in place. The honest answer is that some people think the assurance layer requires a separate team, separate dashboards, and months-long integration projects, and historically it may have required that type of setup. But with solutions like the Klearcom platform, deploying CX assurance is as easy as turning it on. We can have customers up and running within the space of two hours on a global scale. We can deploy those services very quickly and very effectively, because we've done all the hard work behind the scenes to ensure that CX leaders and CX assurance departments can deploy these types of solutions within a matter of minutes, globally, which makes a big impact. The other side of things is the non-intrusive element. We can deploy without any installation, across all of those countries, in real time, which from a customer perspective means they don't have to invest lots of time in separate teams, management, maintenance, and so on. The type of solution we deploy is very much hands-off. And the beauty of what we do is that it's eyes-on 24/7. It's not just a piece of software that generates all these AI environments and runs all these varying scripts with different LLMs across the board. Alongside that full AI test-analyst process on the AI agent side, it's backed by a 24/7/365 service assurance team as well, the human element, which I think is one of the most important things: it's overseen by a human.
So, to go back to one of my points from the beginning: what's the impact of a failure versus the impact of having this solution? If you look at any business out there, think of the impact where an AI agent mishandles a payment dispute, gives a customer wrong information, or gets something wrong in a situation that really affects the customer. Those types of things can surface in regular reviews, and the ROI conversation changes completely when you frame it that way. Deploying these types of solutions has become a lot more cost effective, and the benefits add multiples to what the business gets out of it long term. So it's not a nice-to-have anymore, it's a need-to-have, because the cost of not having something like this, the impact on the business and, most importantly, the impact on brand reputation, is significantly greater.
[00:14:45] Speaker A: You don't want to be the company, or the decision maker at a company, that forgoes that purchase and then ends up with an incident that really damages the brand.
[00:14:56] Speaker B: We've seen that historically it was always the engineers who were deploying these types of solutions and who had the biggest drivers and use cases for them. But that's no longer the case. Now it's the CX leaders at the C-suite, and it's a top-down approach, driven within the businesses because they really understand the ROI of a solution like this. When you're looking at something that costs a small percentage of the overall CX solution you've deployed, it makes financial sense every single time.
[00:15:24] Speaker A: I wanted to jump over to contact center metrics. That's something we really pay attention to here at Metrigy. Thinking ahead through the rest of the year, into 2027, et cetera, as companies deploy their AI agents, how should they be evolving what they look at in terms of contact center metrics? We used to have bot containment rate as the gold standard, but what about now? Are we moving toward something more meaningful, maybe solution quality?
[00:15:55] Speaker B: Yeah, let me touch on the bot containment rate point first, right? Bot containment rate made sense when the question was: can the bot handle this without a human? And that question is mostly settled. Yes, bots can handle high volumes, and it took a degree of time for people to get comfortable with that. But that metric is no longer valid on its own, because things have moved on so much further. The new question is: did the customer actually get what they needed, and do they trust us more or less after having interacted with the bot? So the metrics have dramatically changed from "is this working correctly?" to "okay, I know it's working correctly, but what's the interaction like, and what kind of resolution quality are we looking at?" Did the action the bot took actually resolve the underlying issue, or did it create a follow-up? These are the metrics people really need to start focusing on. And it points to things like customer effort: how many turns did it take the customer to get a resolution, and did the customer have to repeat themselves multiple times? That's where we get into quality degradation and other metrics that can impact the business itself. Most importantly, did the customer leave the interaction more frustrated than when they arrived? If you can pick up on sentiment analysis and so forth, that takes you to a whole new level of understanding of that customer journey.
So I suppose increasingly it points more towards the accuracy over time, not just based on one interaction, but based on all interactions over a period of time.
So if the bot told a customer that their package would arrive on Thursday, did the package actually arrive on Thursday? The metrics need to close that loop now, rather than stopping at containment.
So the metrics that we historically would have looked at are no longer sufficient. The metrics we're looking at now cover the whole communication chain right through to resolution, not just what the uptime was like, whether it worked, whether it was effective.
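[Editor's note: One way to operationalize these post-containment metrics is to score interactions on resolution, effort, and loop closure. The sketch below assumes a hypothetical log schema; the field names are illustrative, not any product's API.]

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    turns: int                 # customer turns to reach an answer
    repeats: int               # times the customer had to repeat themselves
    contained: bool            # never escalated to a human
    resolved: bool             # underlying issue actually closed out
    follow_up_within_7d: bool  # customer had to come back (loop not closed)

def journey_metrics(log: list[Interaction]) -> dict:
    n = len(log)
    return {
        "containment_rate": sum(i.contained for i in log) / n,
        "true_resolution_rate": sum(i.resolved and not i.follow_up_within_7d
                                    for i in log) / n,
        "avg_turns": sum(i.turns for i in log) / n,
        "repeat_rate": sum(i.repeats > 0 for i in log) / n,
    }
```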
[00:17:53] Speaker A: Yeah, that's really interesting. Just because the transaction or the interaction has ended, you still have to get to that end resolution, as you said: the package was delivered or it wasn't delivered.
I don't think that a lot of people think about that, you know, going beyond that step.
[00:18:10] Speaker B: No, and this is one of the most prevalent things we're looking at right now. When we think about that first interaction: did it start on chat? Did the next interaction traverse from chat into an IVR engagement? Was that IVR engagement an AI agent engaging with the end customer? Was there a dial-back function? And is that interaction carried all the way through? Are they seeing resolution and performance all the way through that interaction, right through to true resolution? That really is the most important thing. Not just "we interacted, we got a response," when it may not have been the one the customer needed. Now it's very much focused on the whole journey: we've given back that response, we're traversing various options, we're getting through all the various contact-center-type solutions, everything is correlated together, and all those varying solutions are talking to each other.
But that interaction could start in a chat environment, or it could start in an AI agent environment. The most important thing is how that interaction finishes and what the experience is like at the end.
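[Editor's note: Following an interaction across channels implies stitching events into one journey keyed on a shared identifier. A minimal sketch, assuming events arrive as (journey_id, timestamp, channel, resolved) tuples; the schema is hypothetical.]

```python
from collections import defaultdict

def stitch_journeys(events):
    """Group events by journey, order them in time, and report the
    cross-channel path plus whether the journey ended resolved."""
    journeys = defaultdict(list)
    for jid, ts, channel, resolved in events:
        journeys[jid].append((ts, channel, resolved))
    report = {}
    for jid, steps in journeys.items():
        steps.sort()  # chronological order across channels
        report[jid] = {
            "path": " -> ".join(ch for _, ch, _ in steps),  # e.g. chat -> ivr -> ai_agent
            "resolved_at_end": steps[-1][2],
            "hops": len(steps),
        }
    return report
```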
[00:19:14] Speaker A: Well, Mark, I have loads of other questions for you, but how about we just take a quick moment, leave our listeners with one last bit of advice. If there's one proactive step a CX leader can take this week to audit the health of their AI agent ecosystem, what should it be?
[00:19:29] Speaker B: Okay, that's a big question.
[00:19:33] Speaker A: Answer small.
[00:19:35] Speaker B: Yeah, yeah, keep it small. The biggest bit of advice I would give is to run a structured red-team exercise across your highest-volume call flow. Not in your lab environment, but through your actual production telephony stack: testing from two or three different regional points, using real-world scenarios and real-world conditions, and validating what that localized experience across those geographies is like all the way back to your contact center. Because your lab environment is going to tell you everything's fine. You're going to get your 90% metric and you're going to have that confidence level. But you don't want to find out after the fact that, once you localize across all these regions across your telephony stack, that's not the case, and you're dropping down into the 60s or 50s in terms of confidence levels. So that'd be my best bit of advice.
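[Editor's note: A structured red-team exercise like the one Mark describes can be expressed as a region-by-scenario matrix driven through the production telephony path. Everything below is a hypothetical sketch: place_call and judge stand in for your test dialer and evaluator, and the region and scenario names are made up.]

```python
REGIONS = ["us-west", "eu-dublin", "apac-singapore"]
SCENARIOS = [
    ("refund_dispute", "refund confirmed with correct amount"),
    ("account_lookup", "account number read back correctly"),
    ("appointment_booking", "booking reference issued"),
]

def run_red_team(place_call, judge):
    """Dial every scenario from every region through the production stack
    and judge each transcript against the expected outcome."""
    results = []
    for region in REGIONS:
        for name, expect in SCENARIOS:
            transcript = place_call(region=region, scenario=name)
            results.append((region, name, judge(transcript, expect)))
    return results
```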
[00:20:23] Speaker A: Okay. And that is great advice. So let's end there. Listeners, thank you for tuning in to this Metrigy podcast. And Mark, thank you for joining us. Thanks so much, everybody. Be sure to visit Metrigy.com to check out our new in-depth CX Assurance Market Overview report.
And as always, we're happy to hear from you, so feel free to reach out to me directly at [email protected], or you can use the contact button on the Metrigy website. That's all for now. On behalf of Mark and the Metrigy team, goodbye till next time, and take care, everybody.