• regenschutz an hour ago

    The Fast demo model is already very impressive. It was way better than expected, but I still had to be a bit verbose, since it didn't seem to understand rarer words ("sauna" didn't get me anything pleasant; "hot sauna" did).

    The generated palette seems to be a great indicator of whether the model understood the prompt or not.

    I haven't checked out the Python SDK yet, but it seems very interesting!

    I'm curious to know if there was any reason why you picked Gemma 1B for the Expressive model. Did it generate more cohesive parameters than other 1B models? Or was it just the first one you picked?

    • prabal97 an hour ago

      BTW - we used the Gemma 270M model, not the 1B one. It was purely about the size - I wanted to see if I could get a really, really tiny LLM to generate coherent music. Tbh, it didn't quite work as well as I expected. It barely beats a randomly generated track.

      In fact, the 'fast' model (literally an embedding lookup over a pre-generated library of music ... generated using Best-of-N on Gemini Flash) beats nearly everything - including Gemini Flash, Claude Opus, and the Gemma models.
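
      A rough sketch of that retrieval path - a cosine-similarity nearest-neighbor lookup over precomputed prompt embeddings. All data here is a toy placeholder, not the real library, and the function names are invented for illustration:

```python
import numpy as np

# Toy stand-ins for the pre-generated library: one embedding per
# pre-made track, plus the synth parameters that produced it.
library_embeddings = np.array([
    [0.9, 0.1, 0.0],   # e.g. "calm rain"
    [0.1, 0.9, 0.0],   # e.g. "upbeat chiptune"
    [0.0, 0.1, 0.9],   # e.g. "dark drone"
])
library_params = [
    {"bpm": 70,  "waveform": "sine"},
    {"bpm": 140, "waveform": "square"},
    {"bpm": 50,  "waveform": "saw"},
]

def lookup(prompt_embedding):
    """Return the preset whose embedding has the highest cosine similarity."""
    lib = library_embeddings / np.linalg.norm(library_embeddings, axis=1, keepdims=True)
    q = prompt_embedding / np.linalg.norm(prompt_embedding)
    return library_params[int(np.argmax(lib @ q))]
```

      Because the library is fixed and the lookup has no sampling step, this path is both fast and deterministic.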

    • hackingonempty 2 hours ago

      I get clicks and pops every few seconds, using Librewolf.

      But otherwise very cool!

      • prabal97 2 hours ago

        Yeah ... that's a bit of an issue - it's because I built a simple custom synthesizer that's not quite as good as a full-blown one. Thank you for the feedback! This is something I intend to tackle in upcoming releases.

        • lorenzohess an hour ago

          Same, on Vanadium. Very cool nonetheless.

          • prabal97 an hour ago

            Thank you!!!

      • blasphemous_dev 3 hours ago

        I kinda like how finely you can tune the parameters of the music. It could be useful for dynamic soundtracks in games in low-resource settings.

        • prabal97 3 hours ago

          Yeah! That's one of the reasons I've exposed the Python SDK ... you could, in theory, even attach some sort of DAW that manipulates the music in real-time.
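
          As a rough sketch of what that could look like - every name here (SynthSession, set_param) is invented for illustration; the actual SDK's API may differ:

```python
import math

class SynthSession:
    """Hypothetical stand-in for an SDK session holding live synth parameters."""
    def __init__(self):
        self.params = {"cutoff": 0.5}

    def set_param(self, name, value):
        self.params[name] = value

session = SynthSession()
# Sweep the filter cutoff the way a DAW automation lane (or LFO) would,
# updating the live parameter once per tick.
for tick in range(8):
    session.set_param("cutoff", 0.5 + 0.4 * math.sin(2 * math.pi * tick / 8))
```

          In a real-time setup the loop would be paced against the audio clock rather than running as fast as possible.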

          • thunfischtoast 42 minutes ago

            That's pretty cool! Some games, like Gothic, dynamically composed music from existing sound themes. It would be interesting to research how these approaches could fit together.

        • cprecioso an hour ago

          Server is down :(

          • prabal97 an hour ago

            Could you try again, please? Just redeployed with a tiny change. Should work now!

          • saranshmahajan 2 hours ago

            Really elegant approach - mapping sentence embeddings to a deterministic synth feels more like building an instrument than generating content, and the instant playback makes it great for flow.

            Would love to know if the same prompt always yields the same sound (reproducibility could be powerful), and whether you’ve considered semantic morphing between two moods over time.

            • prabal97 2 hours ago

              Thanks!

              The same prompt yields largely the same song because the 'Fast' (default) mode retrieves the synth parameters from a pre-existing library.

              But if you use the 'Custom LLM' model, it can generate new and creative music every time you play something - even for the same input!
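
              A toy way to picture that difference (assumed behavior with made-up data - not the actual implementation):

```python
import random

# Fake pre-generated library standing in for the 'Fast' mode's presets.
LIBRARY = {"calm rain": {"bpm": 70}, "dark drone": {"bpm": 50}}

def fast_mode(prompt):
    # Retrieval is a pure function of the prompt: same input, same preset.
    return LIBRARY[prompt]

def llm_mode(prompt, rng=random):
    # Stand-in for sampled LLM generation: repeated calls with the same
    # prompt can yield different parameters.
    return {"bpm": rng.randint(60, 160)}
```

              So reproducibility falls out of the 'Fast' mode for free, while the LLM mode trades it away for variety.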