Your Calls Already Know What's Wrong. You're Just Not Listening.

Nizan Shifman

Most QA programs grade a handful of calls and call it data. The phone line is the richest customer feedback channel you own - here's how to actually mine it.

conversation intelligence, call center QA, voice AI, contact center, customer experience

Here’s a number that used to keep me up at night, back when I ran QA the old way: most contact centers review something like one or two percent of their calls. Maybe less. A supervisor pulls a few recordings, fills out a scorecard, gives an agent a coaching note, and everyone moves on. The other ninety-eight percent? Gone. Unheard. The single largest stream of unfiltered customer truth your company generates, and you’re sampling it like it’s a focus group.

I think about the phone line differently now. It’s not a cost center to be monitored. It’s a sensor array. Every call is a customer telling you, in their own words, what they want, what confused them, why they’re angry, and whether they’re about to leave. The technology to read all of it now exists and is cheap enough to run on every call. So the interesting question isn’t ‘can we analyze calls’ anymore. It’s ‘what do we do once we can.’

Sampling was never about quality. It was about labor.

Let’s be honest about why QA used to mean grading two percent of calls. Not because two percent was statistically sound. Because a human can only listen to so many calls in a shift. The whole methodology was a workaround for the fact that listening was expensive.

That constraint is gone. When a system transcribes, structures, and scores every single call, the math flips. You’re not estimating agent performance from a thin sample anymore - you’re measuring it. And the difference matters most for the rare stuff. The compliance violation that happens on one call in three hundred. The objection that only the churned customers raise. Sampling structurally misses the things that hurt you most, because those things tend to be rare, and a thin sample almost never lands on a rare event.
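To put rough numbers on that, here is a back-of-envelope sketch. It assumes violations occur independently at the one-in-three-hundred rate and that reviewed calls are picked at random. The monthly volume of 1,000 calls is a made-up figure for illustration, not a benchmark.

```python
def chance_of_catching(event_rate: float, calls_reviewed: int) -> float:
    """Probability that at least one affected call is among those reviewed,
    assuming each call is independently affected with probability event_rate."""
    return 1.0 - (1.0 - event_rate) ** calls_reviewed

monthly_calls = 1_000      # hypothetical monthly volume for one agent
sample_rate = 0.02         # the two percent review rate
event_rate = 1 / 300       # the one-in-three-hundred violation

reviewed = round(monthly_calls * sample_rate)
sampled = chance_of_catching(event_rate, reviewed)
full = chance_of_catching(event_rate, monthly_calls)

print(f"reviewing {reviewed} calls: {sampled:.1%} chance of seeing it")
print(f"reviewing all {monthly_calls} calls: {full:.1%} chance of seeing it")
```

Under those assumptions, twenty reviewed calls give you well under a one-in-ten chance of ever seeing the violation in a given month, while reading everything makes it very likely that you do.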

Automated QA on a hundred percent of calls is the part of this people underrate. It sounds like a feature. It’s actually a different epistemology. You stop arguing about whether a problem is real and start arguing about what to do about it.

Measure outcomes and behaviors, not vibes

When teams first get conversation intelligence, they tend to drown themselves in metrics. Talk-to-listen ratio, sentiment curves, longest monologue, filler words. Some of that is useful. A lot of it is theater - numbers that move but don’t tell you to do anything.

I’d anchor on two buckets. First, outcomes: did the call resolve? Did the customer get what they came for, or will they call back tomorrow? Repeat-contact rate, tied to the actual reason for the contact, is worth more than almost any in-call metric. Second, behaviors that you have a hypothesis about: did the agent confirm the account before discussing the balance, did they offer the retention path, did they read the required disclosure. Behaviors you can coach. Vibes you can’t.
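As a minimal sketch of the first bucket, here is one way to compute repeat-contact rate per contact reason. The record shape (customer id, reason tag, timestamp) and the seven-day window are assumptions for illustration; your own definition of a repeat may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def repeat_contact_rate(contacts, window=timedelta(days=7)):
    """contacts: iterable of (customer_id, reason, timestamp) tuples.
    Returns {reason: share of contacts that were followed by another contact
    from the same customer, for the same reason, within `window`}."""
    by_key = defaultdict(list)
    for customer_id, reason, ts in contacts:
        by_key[(customer_id, reason)].append(ts)

    totals = defaultdict(int)
    repeats = defaultdict(int)
    for (_, reason), stamps in by_key.items():
        stamps.sort()
        for i, ts in enumerate(stamps):
            totals[reason] += 1
            # Treat a contact as unresolved if the same customer comes back
            # about the same reason inside the window.
            if i + 1 < len(stamps) and stamps[i + 1] - ts <= window:
                repeats[reason] += 1
    return {reason: repeats[reason] / totals[reason] for reason in totals}

# Hypothetical sample data
calls = [
    ("c1", "billing_fee", datetime(2024, 3, 1)),
    ("c1", "billing_fee", datetime(2024, 3, 3)),   # same customer, two days later
    ("c2", "billing_fee", datetime(2024, 3, 2)),
    ("c3", "password_reset", datetime(2024, 3, 2)),
]
rates = repeat_contact_rate(calls)
print(rates)
```

Keying on the reason as well as the customer is the point: a customer who calls twice about two unrelated things is not a failed resolution.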

One thing I’d skip early on: obsessing over a single sentiment score per call. Sentiment is real but noisy, and a number that swings from 0.3 to 0.6 doesn’t tell a supervisor what to say on Monday. Track the moments - where did sentiment drop, and what was said right before. The trajectory beats the average.
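Here is a rough sketch of what tracking the moment can look like. It assumes your transcript already carries a per-turn sentiment score between 0 and 1, which is an assumption about the tooling. The function and field names are hypothetical.

```python
def steepest_drop(turns):
    """turns: list of (speaker, text, sentiment) in call order.
    Returns (turn_index, drop, preceding_text) for the largest fall in
    customer sentiment between consecutive customer turns, or None if
    customer sentiment never falls."""
    customer = [(i, s) for i, (speaker, _, s) in enumerate(turns)
                if speaker == "customer"]
    worst = None
    for (_, prev_s), (cur_i, cur_s) in zip(customer, customer[1:]):
        drop = prev_s - cur_s
        if drop > 0 and (worst is None or drop > worst[1]):
            # Keep what was said right before the turn where sentiment fell.
            worst = (cur_i, drop, turns[cur_i - 1][1])
    return worst

# Hypothetical call
turns = [
    ("customer", "I have a question about my bill.", 0.6),
    ("agent", "Sure, I can help with that.", 0.7),
    ("customer", "Great, thanks.", 0.7),
    ("agent", "There is a $15 service fee on the account.", 0.5),
    ("customer", "Nobody told me about that fee.", 0.2),
]
result = steepest_drop(turns)
print(result)
```

The output a supervisor can use is the line that preceded the drop, not the size of the drop itself.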

The gold is in the aggregate, not the single call

Catching one bad call is nice. Catching a pattern is the actual product. When you can query across every conversation, you start seeing things no individual reviewer could: a spike in calls mentioning a fee nobody on the product team knew was confusing, a competitor’s name showing up more often in save calls, a new script line that’s quietly tanking resolution.
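A minimal sketch of the pattern-catching idea, assuming each call has already been tagged with a set of topics. The thresholds and tag names here are illustrative guesses, not recommended values.

```python
from collections import Counter

def topic_spikes(baseline_calls, current_calls, min_ratio=2.0, min_count=5):
    """Each argument is a list of calls; each call is a set of topic tags.
    Returns {topic: growth ratio} for topics whose share of calls grew by
    at least `min_ratio` between the baseline and current periods."""
    base = Counter(t for call in baseline_calls for t in call)
    cur = Counter(t for call in current_calls for t in call)
    spikes = {}
    for topic, count in cur.items():
        if count < min_count:
            continue  # too few mentions to separate from noise
        cur_share = count / len(current_calls)
        # Add-one smoothing so a brand-new topic does not divide by zero.
        base_share = (base[topic] + 1) / (len(baseline_calls) + 1)
        ratio = cur_share / base_share
        if ratio >= min_ratio:
            spikes[topic] = round(ratio, 2)
    return spikes

# Hypothetical periods: last month versus this week
baseline = [{"billing"} for _ in range(90)] + [{"late_fee"} for _ in range(10)]
current = [{"billing"} for _ in range(70)] + [{"late_fee"} for _ in range(30)]
spikes = topic_spikes(baseline, current)
print(spikes)
```

Comparing shares rather than raw counts keeps a busy week from looking like a spike in everything.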

This is where the phone line stops being a support function and becomes intelligence for the whole company. Product hears which features generate confusion. Marketing hears the exact words customers use - usually nothing like the words on the website. Ops sees which call types are ballooning before the staffing model breaks. We built our analytics layer at Harmony precisely because the per-call view, while necessary, buries the trend. The point of reading every call isn’t to grade every agent harder. It’s to find the three things that, if fixed, would deflect a third of your volume.

A good rule: if an insight can’t change a script, a workflow, a staffing decision, or a roadmap, it’s trivia. Stop reporting it.

Insight that doesn’t trigger action is just a prettier dashboard

This is the part everyone underbuilds. Teams stand up beautiful conversation-intelligence dashboards, and six weeks later nobody opens them. The analysis was the easy half. Closing the loop is the hard half, and it’s the only half that pays.

Wire the insights into where work already happens. A compliance miss should create a coaching task, not sit in a report. A surge in a new complaint topic should ping the product owner in the channel they actually read. A failed-resolution pattern should feed back into the agent’s prompts or the IVR routing logic so the next caller has a better path. If you run self-improving AI agents, this is the loop that makes them improve toward something real instead of drifting.
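The wiring can start as something as plain as a routing table. The sketch below is illustrative only: the insight kinds and handler names are hypothetical, and the handlers are stubs that return a description of the action where a real setup would call a ticketing or chat system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Insight:
    kind: str      # e.g. "compliance_miss", "topic_surge", "failed_resolution"
    subject: str   # agent id, topic name, or call type
    detail: str

# Stub handlers (hypothetical): each describes the action it would take.
def create_coaching_task(insight: Insight) -> str:
    return f"coaching task for {insight.subject}: {insight.detail}"

def notify_product_owner(insight: Insight) -> str:
    return f"message to product owner about {insight.subject}: {insight.detail}"

def propose_routing_change(insight: Insight) -> str:
    return f"routing review for {insight.subject}: {insight.detail}"

ROUTES: Dict[str, Callable[[Insight], str]] = {
    "compliance_miss": create_coaching_task,
    "topic_surge": notify_product_owner,
    "failed_resolution": propose_routing_change,
}

def dispatch(insight: Insight) -> Optional[str]:
    """Send an insight to its owner; return None if nobody owns that kind."""
    handler = ROUTES.get(insight.kind)
    if handler is None:
        return None  # no action attached, so it stays out of the reports
    return handler(insight)

print(dispatch(Insight("compliance_miss", "agent-17", "skipped required disclosure")))
```

The useful property is the `None` branch: an insight kind with no route has no owner, which is a signal to stop producing it.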

My blunt test for any insight program: name the decision it changed last month. If you can’t, you don’t have an insight program. You have a recording archive with good search.

A few things I’d watch out for

Don’t let automated QA become a surveillance gun pointed at agents. The fastest way to kill the program is to use it only to catch people. The same data that flags a missed step also surfaces who’s handling the hardest calls well, and who needs a script the company never gave them. Lead with that.

Be deliberate about compliance and consent. You’re transcribing real conversations with real PII, and in healthcare, banking, and collections the rules are not optional. Redaction, retention limits, and the right compliance posture (a SOC 2 attestation, HIPAA and GDPR obligations, depending on what you handle and where you operate) aren’t paperwork to bolt on later. Design for them or you’ll be ripping the program out.

And don’t trust the model blindly on edge cases. Auto-scoring is excellent at scale and occasionally confidently wrong on a tricky call. Keep a human spot-check on the calls the system flags as borderline. The goal is to put human judgment where it’s scarce and valuable, not to delete it.
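One way to pick the calls for that spot-check, as a sketch: queue anything scored close to the pass mark or scored with low confidence. This assumes the scoring system reports a confidence value alongside each score, which not every tool does. The field names and thresholds are hypothetical.

```python
def review_queue(scored_calls, pass_mark=0.80, margin=0.05, min_confidence=0.70):
    """scored_calls: list of dicts with 'call_id', 'score', and 'confidence'.
    Returns the ids of calls a human should listen to."""
    queue = []
    for call in scored_calls:
        near_threshold = abs(call["score"] - pass_mark) <= margin
        unsure = call["confidence"] < min_confidence
        if near_threshold or unsure:
            queue.append(call["call_id"])
    return queue

# Hypothetical scored calls
scored = [
    {"call_id": "a", "score": 0.95, "confidence": 0.90},  # clear pass
    {"call_id": "b", "score": 0.82, "confidence": 0.90},  # close to the pass mark
    {"call_id": "c", "score": 0.40, "confidence": 0.50},  # model is unsure
    {"call_id": "d", "score": 0.30, "confidence": 0.95},  # clear fail
]
queue = review_queue(scored)
print(queue)
```

Clear passes and clear fails skip the queue, so reviewer time goes to the calls where a second opinion is most likely to change the outcome.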

Start narrow

If you’re staring at thousands of calls a day and wondering where to begin, don’t try to measure everything. Pick one expensive problem - repeat contacts on a single issue, say, or one compliance line you have to nail every time. Read every call against just that. Fix what you find. Then add the next thing.

The line carries more honest signal about your business than any survey you’ll ever send. We built Harmony partly because we got tired of watching that signal evaporate one un-listened-to call at a time. If you’re thinking about turning your calls into something you can actually act on, I’m happy to compare notes - reach out and tell me what you’re trying to figure out.