We were investigating speech-to-speech for a project a while back and estimated that building an end-to-end solution with the previous approach would take us weeks at best for an MVP (because the pipeline was basically: speech -> Whisper STT model -> text, retrieval, API calls, etc. -> prompt -> LLM -> text -> TTS model -> speech). If this works as advertised it could cut the amount of work required quite significantly; excited to try it out (when it's available in Europe, that is…).
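For a sense of what that "long" pipeline looks like in code, here's a minimal sketch, assuming the openai-whisper package for STT and the OpenAI SDK for the LLM and TTS legs (file names, model choices, and the single-turn prompt are all placeholders; the retrieval/API-call step is elided):

```python
import whisper
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech -> text (local Whisper model)
stt_model = whisper.load_model("base")
user_text = stt_model.transcribe("input.wav")["text"]

# 2. Text (retrieval, API calls, etc. would slot in here) -> prompt -> LLM
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": user_text}],
)
reply = completion.choices[0].message.content

# 3. Text -> speech
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
speech.write_to_file("reply.mp3")
```

Every hop adds latency and a failure mode, which is exactly why collapsing it into one speech-to-speech model is appealing.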
It's not a production-grade thing, but Home Assistant has this pipeline built in, and you can swap out any of the three steps:
* STT
* LLM
* TTS
It's pretty cool to be able to swap one of the parts, run some tests, then change another part.
Again, it's nothing you would use directly for a product, but it's fairly easy to test your pipeline by plugging different components into each stage. (Also, HA provides each component out of the box if you want it to handle STT/TTS and just test your LLM.)
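This isn't HA's actual API, just the shape of the swappable-stage idea sketched in Python: three interchangeable interfaces (all names here are hypothetical) and a pipeline that composes whichever implementations you're currently testing:

```python
from typing import Protocol


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


def run_pipeline(audio: bytes, stt: STT, llm: LLM, tts: TTS) -> bytes:
    """Run one voice turn; swap any single stage independently to A/B test it."""
    return tts.synthesize(llm.reply(stt.transcribe(audio)))
```

To compare setups you'd call something like `run_pipeline(audio, stt=WhisperSTT(), llm=LocalLLM(), tts=AzureTTS())` and change exactly one argument at a time (those class names are made-up placeholders).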
Add VAD to this list and it's basically the same stack I'm running on mobile phones (on-device). It doesn't beat OpenAI's voice chat in terms of speed or intelligence, but it's fun.
The LLM part isn't great, of course, due to the small model size. Still experimenting with different models/tweaks until I'm satisfied enough with the overall result on a recent-ish iPhone/Pixel.
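For reference, the VAD step is usually just a gate in front of the STT model so you only transcribe actual speech. A minimal sketch with the webrtcvad package, assuming 16 kHz 16-bit mono PCM (the library only accepts 10/20/30 ms frames at 8/16/32/48 kHz):

```python
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness: 0 (lenient) .. 3 (strict)

SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 2 bytes per 16-bit sample


def speech_frames(pcm: bytes):
    """Yield only the frames the VAD flags as speech, to feed the STT model."""
    for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm[i : i + FRAME_BYTES]
        if vad.is_speech(frame, SAMPLE_RATE):
            yield frame
```

On-device this matters twice over: it keeps the STT model from burning battery on silence and gives you end-of-utterance detection for turn-taking.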
For what it's worth, I built an MVP with that pipeline in about 3 days, using the Azure AI Speech service and SDK. It worked pretty well despite the obviously long pipeline you described.
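For anyone curious, the Azure legs of such an MVP look roughly like this. This is a sketch from the public Python SDK, not my actual code; the key, region, and the LLM step in the middle are placeholders:

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="westeurope")

# STT leg: capture one utterance from the default microphone
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
user_text = recognizer.recognize_once().text

# ... retrieval / prompt / LLM call goes here ...
reply = user_text  # placeholder for the LLM's answer

# TTS leg: speak the reply through the default speaker
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async(reply).get()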
It's frustrating that things like this get released by oAI, but you still can't use voice in the web app, or any of the advanced voice model features, without essentially emulating a phone.
It's hard to know who oAI is working for -- is it a developer resource group or an actual customer-facing business? It feels like they don't know, either.