• themacguffinman 10 months ago

    AI voice agents are weird to me because voice is already a very inefficient and ambiguous medium; the only reason I would make a voice call is to talk to a human who is equipped to tackle the ambiguous edge cases the engineers didn't anticipate.

    If you're going to develop AI voice agents to tackle pre-determined cases, why wouldn't you just develop a self-serve non-voice UI that's way more efficient? Why make your users navigate a nebulous conversation tree to fulfill a programmable task?

    Personally when I realize I can only talk to a bot, I lose interest and end the call. If I wanted to do something routine, I wouldn't have called.

    • jcims 10 months ago

      Think 1-800-CONTACTS not Siri. Call centers are super expensive and the user experience is usually pretty bad. There's a huge incentive to move to voice agents, but one of the challenges is building a framework to adequately test it. That seems to be what this is focused on.

      • yuppiepuppie 10 months ago

        If I understand correctly, the pushback is on the call center in general when using AI agents. Why go through that trouble at that point versus fixing my issue through some other channel?

        For example, when I need to activate a new SIM card, I need to call the company to get it activated. But if I’m talking to an AI agent at that point, why not have me go through another channel (website/app?) to activate it?

        • undefined 10 months ago
          [deleted]
        • Centigonal 10 months ago

          Some people (like me) are primarily verbal processors:

          - I am dictating this message through macOS's voice to text right now

          - I am a huge user of Google Assistant

          - I prefer to call people versus texting them

          - I tend to call restaurants instead of using something like Toast to order takeout (although this is partially because online services will add a surcharge onto the price sometimes, and sometimes I need to ask questions about dietary restrictions, etc.)

          Generally, wherever possible, I will use a voice interface versus a text based one to get my point across. It's just faster and more convenient for me. I'm pretty neutral on the consumption side: I read and listen to audiobooks in roughly equal amounts.

          All that to say that, just like there are people out there who prefer text UIs, there are also people who prefer voice interfaces.

          • sumanyusharma 10 months ago

            I use Superwhisper (no affiliation, just a happy user), which runs a local Whisper model, to create most of my email drafts and post-meeting notes. I find Whisper more accurate than Mac’s built-in speech-to-text, plus I’m faster at speaking than typing.

            Sometimes, I even ‘talk’ into Cursor’s chat window instead of typing. The only downside? It can get a bit annoying for others when you're talking to yourself all day.

            • smeej 10 months ago

              I'm looking for something like this that runs on Linux. Best thing I've found is LiveCaptions, but its output is janky. I can't just use it to type in any old text field, and its output requires substantial editing after the fact.

              I guess I understand that a lot of things are being developed for Apple silicon specifically. It's just frustrating that despite hours of searching, I'm not finding anything decent.

              • Centigonal 10 months ago

                Talon Voice is good and runs on linux.

                https://talonvoice.com/

                • smeej 10 months ago

                  This looks really powerful for controlling the system with different scripts, but what if all I want it to do is let me narrate something and print out the sentences as close to real-time as possible? It's really just good STT that I'm looking for out of it.

                  • Centigonal 10 months ago

                    The Talon Voice dev created his own STT model that's very performant. The transcription quality is... good, but not world-class. It's better than anything that came out before Whisper IMO, but the newest generation of models can do things like inferring punctuation and transcribing words outside their vocabulary (although the downside of the new generation of STT is that they can sometimes hallucinate words that are very different from what you said).

                    It's a bit overkill to use Talon for just voice dictation, but that is 90% of what I use it for, and it's pretty good at it.

              • Centigonal 10 months ago

                Interesting! I'll give Superwhisper a try.

            • smeej 10 months ago

              The example of "fast food drive-thru" really cleared this up for me.

              Frankly I'm surprised there isn't already some sort of NFC info transfer system in fast food restaurants' apps that lets you and everyone in your car enter your order while you're waiting in line, then knows when your car is up and brings you the food. Have the voice part be a fallback tier, not the primary one.

              My grocery store can know when I'm arriving and bring out my food, based on location services on my phone. So can Walmart or Home Depot. Granted, they make me wait a couple hours until they notify me that my order is "ready" before I come get it.

              I suppose it's possible this does exist and I just haven't seen it because I don't drive through fast food restaurants, but I don't get why a place that primarily takes orders in real time and hands them out the window can't broaden the way to submit them to include on-site online orders as well as "talk to our agent over a glorified walkie talkie" orders.

              • yuppiepuppie 10 months ago

                Sort of related, but I just came off a RyanAir flight of all things, and they have something similar. Instead of talking to a flight attendant to get a sandwich, I order it on the app and they bring it to me.

                It worked quite well, which is surprising coming from RyanAir.

                • vladsanchez 10 months ago

                  It's been a dream of mine for some years now. They're pretty dumb still today.

                • michaelmior 10 months ago

                  I'm the same way, and I don't have any data on this, but it's possible that we're in the minority. This probably isn't the case, but hopefully anyone implementing such a system has thought through whether it will actually provide any value.

                  For example, if you had an existing IVR system, tracked menu options, and found that a significant portion of calls could be answered by non-smart pre-recorded messages, upgrading to an AI voice agent could be a reasonable improvement.
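A back-of-the-envelope sketch of that kind of IVR log analysis (the call data, option labels, and the "simple" set below are all invented for illustration):

```python
# Toy sketch: estimate what share of IVR calls land on a "simple" menu option,
# i.e. one answerable by a non-smart pre-recorded message or API lookup.
from collections import Counter

# Invented sample of final menu selections from call logs.
calls = ["order_status", "balance", "agent", "order_status",
         "balance", "pay_bill", "agent"]
simple = {"order_status", "balance", "pay_bill"}  # assumed automatable options

counts = Counter(calls)
automatable = sum(n for opt, n in counts.items() if opt in simple) / len(calls)
print(f"{automatable:.0%}")  # → 71%
```

If a number like this is high, most traffic is already programmable and a smarter agent mainly helps with the long tail.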

                  • sumanyusharma 10 months ago

                    Our customers, who build voice agents, are often asked by their customers to make their voice agents more human-like and flexible. Their clients — businesses like pest control and automotive repairs — value providing a personalized experience but want the convenience and reliability of a 24/7 booking and answering service.

                    • ori_b 10 months ago

                      Can I upgrade to a web form instead?

                  • neilk 10 months ago

                    Why “Hamming”? As in Richard Hamming, ex-Bell Labs, “You and Your Research”?

                    • sumanyusharma 10 months ago

                      Yup, we named it after Richard Hamming. His essay 'You and Your Research' was deeply influential during my undergrad; I re-read it every quarter.

                      Our current product draws inspiration from Hamming distance because we're comparing the `distance` between current LLM output vs. desired LLM output.
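For anyone unfamiliar with the namesake metric: Hamming distance counts the positions at which two equal-length sequences differ. A toy illustration (the product presumably uses a looser notion of distance for free-form LLM output):

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined for equal-length inputs")
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("karolin", "kathrin"))  # → 3
```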

                    • pj_mukh 10 months ago

                      My 2.5 year old yesterday started saying "Hey, this is a test, can you hear me?", parroting me spending hours testing my LLM. Hah.

                      This will work with a https://www.pipecat.ai type system? Would love to wrap a continuous testing system with my bot.

                      • sumanyusharma 10 months ago

                        Pipecat looks awesome! I'll run the examples over the weekend and try to see what the integration hooks need to look like: https://github.com/pipecat-ai/pipecat/tree/main/examples

                        It should be pretty straightforward at first glance!

                        • pj_mukh 10 months ago

                          Yea, it's interesting. It's just a chatbot over a Zoom-style call (we use Daily) as opposed to a 1-on-1 websocket (or a phone call). Another advantage is using WebRTC!

                      • zebomon 10 months ago

                        As someone whose job has been negatively impacted by LLMs already, I'll echo the sentiment here that use cases like this one are sort of depressing, as they will primarily impact people who work long hours for small pay. It certainly seems like there's money to be made in this, so congratulations. The landing page is clear and inviting as well. I think I understand what my workflow inside it would be like based on your text and images.

                        I'm most excited to see well-done concepts in this space, though, as I hope it means we're fast-forwarding past this era to one in which we use AI to do new things for people and not just do old things more cheaply. There's undeniably value in the latter but I can't shake the feeling that the short-term effects are really going to sting for some low-income people who can only hope that the next wave of innovations will benefit them too.

                        • esafak 10 months ago

                          What line of work was it?

                          • zebomon 10 months ago

                            I'm a Top Rated/Pro-verified ghostwriter on Fiverr. It's been my full-time job since 2015. Went from mid-six figures in 2022 to scraping by today.

                        • diwank 10 months ago

                          Congratulations for the launch! We had a big QC need for https://kea.ai/ where we needed to stress test our CX agents in real time too. This would be a big life saver. kudos on the product and the brilliant demo!

                          • sumanyusharma 10 months ago

                            I am curious - how was the team solving this at Kea?

                          • atyro 10 months ago

                            Nice! Great to see the UI looks clean enough that it's accessible to non-engineers. The prompt management and active monitoring combo looks especially useful. Been looking for something with this combo for an expense app we're building.

                            • sumanyusharma 10 months ago

                              Yes! We're aiming to build a tool that both engineers and non-engineers love.

                              We've discovered that it's often faster for non-technical domain experts to iterate on prompts in a structured, eval-driven way, rather than relying on engineers to translate business requirements into prompts.

                              While storing prompts in code offers version control benefits, it can hinder collaboration. On the other hand, using a pure CMS for prompts enhances collaboration but sacrifices some modern software development practices.

                              We're working towards a solution that bridges this gap, combining the best of both approaches. We're not there yet, but we have a clear roadmap to achieve this vision!

                            • serjester 10 months ago

                              I feel like the better positioning would be evals for voice agents. It seems just as challenging to figure out all the ways your system can go wrong as it is to build the system in the first place. Doing this in a way that actually adds value, without any domain expertise, seems impossible.

                              If it did, wouldn't all the companies with production AI text interfaces be using similar techniques? That said, being able to easily replay a conversation that was recorded with a real user seems like a huge value add.

                              • sumanyusharma 10 months ago

                                Absolutely agree that creating effective evals requires domain expertise. Right now, we're co-building evals with customers, but we're identifying which aspects can be productized.

                                Regarding text-based evals — part of testing voice agents involves assessing their core reasoning logic. To do that, we bypass the voice layer and simulate conversations via text. So yes, the core simulation engine is reusable for both conversational text and voice interactions.

                                We're also excited about shipping the ability to replay a simulated conversation inspired by a real user!
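For the curious, "bypass the voice layer and simulate via text" can be pictured as a simple turn-taking loop like the sketch below. Every name here is a hypothetical stand-in, not Hamming's actual API; in practice both reply functions would be LLM calls.

```python
# Minimal text-only conversation simulator: a simulated user persona talks
# to the agent under test, with the TTS/STT layers stripped away.

def agent_reply(history):
    # Stand-in for the voice agent's core reasoning logic.
    return "Sure, what time would you like to book?"

def simulated_user_reply(history):
    # Stand-in for a vague / uncooperative caller persona.
    return "Uh, whenever, I guess?"

def simulate(turns=2):
    history = [("user", "I need an appointment.")]
    for _ in range(turns):
        history.append(("agent", agent_reply(history)))
        history.append(("user", simulated_user_reply(history)))
    return history

transcript = simulate()
print(len(transcript))  # → 5
```

The transcript that falls out of a loop like this is what the evals then score.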

                              • euvin 10 months ago

                                The idea of testing an agent with annoying situations, like uncooperative people or vague responses, makes me wonder if similar approaches might be tried on humans in the future. People could be (unknowingly) subjected to automated "social benchmarks" with artificially designed situations, and I'm sure I don't have to explain how dystopian that is.

                                It would essentially be another form of a behavioral interview. I wonder if this exists already, in some form?

                                • sumanyusharma 10 months ago

                                  I wonder if a more optimistic version of this could be used to train humans and improve their skills. I'm thinking along the lines of LeetCode / Project Euler, but more dynamic and personalized!

                                  Few examples:

                                  1) Customer service: Simulating challenging customer interactions could help reps develop patience and problem-solving skills.

                                  2) Emergency responders: Creating realistic crisis scenarios (like 911 calls) that could improve decision-making under pressure.

                                  3) Healthcare: Virtual patients with complex symptoms could speed up the learning rate for med students.

                                  4) Conflict resolution: Practicing with difficult personalities could aid mediators and negotiators.

                                  5) Sales: AI-simulated tough customers could help salespeople refine their pitches and objection-handling skills in a low-stakes environment.

                                  Thoughts?

                                  • euvin 10 months ago

                                    That does sound like an interesting idea. Upon further thought, I think that it would heavily depend on implementation.

                                    In a bad case, I envision a ton of companies or institutions applying very strict and narrow scenarios, to the point where they only accept one very homogenized personality. That could end up creating a stiffer, worse culture than if they had naturally accumulated a diverse population, if that makes sense. Discrimination already exists, but it would be made a lot easier, automated, and commonplace.

                                    In a good case, extremely antisocial behavior (failing situations that are "softballs," hard for reasonable people to screw up) could be easily caught at scale and addressed at an early age. Plus the cases you've listed, which would eliminate the need for special attention and mentorship from the limited number of people we meet irl.

                                    I'm sure there are other horrible or amazing cases I'm missing.

                                    So as all tools are, it would depend. Whether this will actually benefit more than harm will depend on the society you place it in, and I'm not sure I have that much faith in the corporate world.

                                    • fakedang 10 months ago

                                      There's already a startup for the last use case. I forgot the name though.

                                  • telecomhacker 10 months ago

                                    I work in the telecom space. I don't think this paradigm will get adopted in the near future. Customers are already building voice bots on top of Google Dialogflow e.g. Cognigy. Cognigy does have LLM capabilities, but it is not widely adopted. I think voice bots will still have to be manually configured for some time.

                                    • sumanyusharma 10 months ago

                                      I'm curious to learn more about what's blocking the widespread adoption of the LLM capabilities. Lack of knowledge, reliability, or something else?

                                      • telecomhacker 10 months ago

                                        The reality is that businesses usually already know the questions their customers ask most (based on feedback from call center agents) by the time they build a voice bot. E.g.: What is the status of my order? How much do I have left on my balance? Can I please pay my balance off?

                                        99% of the time, we can just build a simple intent flow in Dialogflow pointing to the customer's API endpoints that return that data. Nowhere here do we need an LLM / RAG, since the endpoint already has the answer. Hope that makes sense!

                                        • telecomhacker 10 months ago

                                          You guys should try and get acquired by Cognigy.

                                          • sumanyusharma 10 months ago

                                            No plans for acquisition :)

                                            Building product, talking to customers and making something people want!

                                      • xan_ps007 10 months ago

                                        Is there an open source variant available? I am building https://github.com/bolna-ai/bolna, which is an open source voice orchestration framework.

                                        Would love to have something like this integrated as part of our open source stack.

                                        • sumanyusharma 10 months ago

                                          Bolna looks awesome! We've considered going open-source, but we're not sure how to effectively manage a community.

                                          I'll reach out async!

                                          • xan_ps007 10 months ago

                                            Sure! Would love to discuss synergies and if we can integrate it. Thanks & all the best!

                                        • rstocker99 10 months ago

                                          That drive through customer… oh my. I have new found empathy for drive through operators.

                                          • sumanyusharma 10 months ago

                                            Yes! Drive-through customers can be very impatient. We tried to make the demo persona maximally annoying.

                                            Testing for edge cases is especially important because getting an order wrong can cause health hazards, long line-ups, and churn!

                                          • bazlan 10 months ago

                                            As someone who has worked in TTS for over 4 years now, I can tell you that evaluation is the most difficult aspect of generative audio ML.

                                            How will this really check that the models are performing well, versus just listening?

                                            • sumanyusharma 10 months ago

                                              Our end-to-end evals focus on the function-call accuracy, style, tone, and latency of the conversations between our sims and your voice agent. We're less focused on pure TTS evals at the moment!
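A toy sketch of what a function-call accuracy check might look like; the scoring rule and all call names below are assumptions for illustration, not Hamming's actual eval logic:

```python
# Compare the tool calls an agent actually made during a simulated
# conversation against the calls the scenario expected.

def function_call_accuracy(expected, actual):
    """Fraction of expected (name, args) calls the agent actually made."""
    if not expected:
        return 1.0
    made = set(actual)
    hits = sum(1 for call in expected if call in made)
    return hits / len(expected)

expected = [("book_appointment", ("2024-05-01", "10:00")),
            ("send_confirmation", ("sms",))]
actual = [("book_appointment", ("2024-05-01", "10:00"))]
print(function_call_accuracy(expected, actual))  # → 0.5
```

A real scorer would also have to handle near-miss arguments and call ordering, which is where the eval design gets hard.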

                                            • prithvi24 10 months ago

                                              This is great to see. Evals on voice are hard - we only have evals on text based prompting, but it doesn't fully capture everything. Excited to give this a try.

                                              • sumanyusharma 10 months ago

                                                This tracks. Text evals to test core logic and voice evals for overall end-to-end performance!

                                              • kinard 10 months ago

                                                I'm working on AI voice agents here in the UK for real estate professionals, unfortunately I couldn't try your service.

                                                • sumanyusharma 10 months ago

                                                  We forgot to enable non-US numbers in our config for the demo. (oops)

                                                  We're working on a fix right now!

                                                  • ripped_britches 10 months ago

                                                    What’s the name of your product / business?

                                                    • sumanyusharma 10 months ago

                                                      Should be fixed now; could you try again please?

                                                    • vizhang92 10 months ago

                                                      Awesome work guys! Which industries / jobs do you suspect will be adopting voice agents the fastest?

                                                      • sumanyusharma 10 months ago

                                                        Likely outsourced call centers since call complexity is low to medium. We also expect rapid adoption in industries like customer service, healthcare, and retail, where 24/7 availability could be high-impact for businesses and convenient for consumers!

                                                      • meiraleal 10 months ago

                                                        There isn't even one reliable and proven "voice agent" yet (correct me if I'm wrong, but the best available, elevenlabs, isn't good enough yet to be a voice agent), and there are already companies selling tests for voice agents?

                                                        Selling shovels in a gold rush seems to have become the only mantra here.

                                                        • sumanyusharma 10 months ago

                                                          It's a bit of a catch-22.

                                                          Making current voice agents reliable is incredibly time-consuming and complex. This challenge has kept many teams from pushing their agents into production. Those who do launch often release a very limited, basic version to minimize risk. We frequently talk to teams in both camps.

                                                          As a result, there aren't many 'killer' voice products on the market right now. But as models improve, we'll see more voice-centric companies emerge.

                                                          Teams are already calling their agents by hand and keeping track of experiment runs in a spreadsheet. We're just automating the workflow and making it easier to run experiments!

                                                          • bongodongobob 10 months ago

                                                            As a test, I asked GPT to call my phone company and get my account balance. It worked and even declined some program they tried to sign me up for. Blew my mind.

                                                            • kgc 10 months ago

                                                              What were the steps to get it to make a call?

                                                              • bongodongobob 10 months ago

                                                                I just set my phone next to another phone and put them both on speaker. It didn't actually dial a number, but I'm sure it could if you used the API and gave it a "tool".

                                                          • plurby 10 months ago

                                                            Wow, gonna test this with my Retell AI agent.

                                                            • sumanyusharma 10 months ago

                                                              Nice! What's the use case your agent solves for?

                                                              I'm happy to spin up some scenarios that are more relevant for you instead of our stock demo personas :)

                                                              Feel free to email me at sumanyu@hamming.ai

                                                            • henning 10 months ago

                                                              [flagged]

                                                              • rstocker99 10 months ago

                                                                If only handloom operators hadn’t been replaced by steam looms we’d all be in a better place.

                                                                • undefined 10 months ago
                                                                  [deleted]
                                                                • sumanyusharma 10 months ago

                                                                  I appreciate your candid feedback. Our aim isn't to push for replacing humans but to ensure that when companies do use LLMs, they work as intended and don't create more problems. We'd rather see well-functioning systems than glitchy ones that frustrate everyone involved!

                                                                  • Matticus_Rex 10 months ago

                                                                    Well, the idea behind this product is to make sure the LLMs they replace/augment workers with aren't total dogshit (at least relative to the workers they're replacing).

                                                                    But also remember -- the point of the economy is not jobs. It's value creation. If we can create the same or greater value with fewer people working/people working less, that's a great result! And it's the result we've seen continue over the last century and a half, even while people became much wealthier (because they were choosing to exchange some of that wealth for less working time).

                                                                    • euvin 10 months ago

                                                                      I'm speaking as a potentially ignorant layman here (in the US), but if you need a job for income, doesn't this take away a lot of avenues for entry-level career progression and create way more competition for the remaining roles?

                                                                      I don't think reducing the need for human labor is inherently bad, but our current society seems to be heavily centered around finding work.

                                                                      • Matticus_Rex 10 months ago

                                                                        At least at the moment, the jobs we're talking about (e.g. drive-through order-taker) aren't meaningful entry into a career path — they're jobs people take before they have a career path, or on a largely short-term basis. There are exceptions, but there are still going to be plenty of restaurant jobs for a while. We're still in a shortage of labor on that end of the job market.

                                                                        But also, the number of remaining roles isn't fixed. Jobs exist (at least in the private sector) because they create more value than they cost to fill, and we're always finding new and expanded ways for people to create more value. Saving resources through automation just means we can redirect that value creation somewhere else.

                                                                        Ultimately this is how economies grow and the world becomes wealthier over time; we increase the value of people's time because there's so much competition for it, to the point where we can then more cheaply automate some or part of the job. If the supply of labor gets too large for the uses we can find for it, prices for labor fall, and the relative cost of automation is increased.

                                                                  • aimattant 10 months ago

                                                                    [flagged]