• tl2do 39 minutes ago

    This matches my experience. In Kaggle audio competitions, I've seen many competitors struggle with basics like proper PCM filtering - applying an anti-aliasing filter before downsampling, windowing to handle spectral leakage, etc. (there's a quick sketch at the end of this comment).

    Audio really is a blue ocean compared to text/image ML. The barriers aren't primarily compute or data - they're knowledge. You can't scale your way out of bad preprocessing or codec choices.

    When 4 researchers can build Moshi from scratch in 6 months while big labs consider voice "solved," it shows we're still in a phase where domain expertise matters more than scale. There's an enormous opportunity here for teams who understand both ML and signal processing fundamentals.
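
    For anyone who hasn't hit these issues yet, here's a minimal sketch of the two fixes mentioned above, assuming SciPy and a synthetic mono signal (the signal, rates, and frame size are illustrative, not from any particular competition pipeline):

      import numpy as np
      from scipy import signal

      sr_in, sr_out = 48_000, 16_000
      t = np.arange(sr_in) / sr_in
      x = np.sin(2 * np.pi * 440 * t).astype(np.float32)  # stand-in 440 Hz tone

      # Naive decimation (x[::3]) folds everything above 8 kHz back into band.
      # resample_poly applies an anti-aliasing FIR low-pass before decimating.
      y = signal.resample_poly(x, up=1, down=sr_in // sr_out)

      # For spectral features, taper each frame (here with a Hann window) to
      # reduce the leakage a raw rectangular frame would introduce.
      frame = y[:1024] * signal.get_window("hann", 1024)
      spectrum = np.abs(np.fft.rfft(frame))

    The point isn't this exact code - it's that a one-line indexing "downsample" silently corrupts the data, and no amount of model scale recovers it.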

    • dkarp an hour ago

      There's too much noise at large organizations

      • echelon 19 minutes ago

        They're focused on soaking up big money first.

        They'll optimize down the stack once they've sucked all the oxygen out of the room.

        Little players won't be able to grow through the ceiling the giants create.

      • bossyTeacher an hour ago

        Surprised ElevenLabs is not mentioned

      • amelius 2 hours ago

        Probably because the big companies have their focus elsewhere.

        • giancarlostoro an hour ago

          OpenAI being the Death Star and audio AI being the rebels is such a weird comparison, like what? Wouldn't the real rebels be the ones running their own models locally?

          • tl2do 35 minutes ago

            True, but there's a fun irony: the Rebels' X-Wings are powered by GPUs from a company that's... checks relationships ...also supplying the Empire.

            NVIDIA's basically the galaxy's most successful arms dealer, selling to both sides while convincing everyone they're just "enabling innovation." The real rebels would be training audio models on potato-patched RP2040s. Brave souls, if they exist.