As a musician, the things I want most from generative AI are:
1. Being able to have the AI fill in a track in the song, but use the whole song as input to figure out what to generate. Ideally for drums this would be a combination of individual drum hits, effects, and MIDI so I'm able to tweak it after generation. If it used the Ableton effects and drum rack then that would be perfect.
2. Take my singing and make it both sound great and like any combination of great singers (e.g. give me a bit of Taylor Swift combined with Cat Power)
I've had a play with the style transfer between singers (bullet point 2 above) but when I last tried it, it was garbage in / garbage out, and my singing is garbage.
What I don't want: To just generate a whole song. Adobe does this style of assistive AI well in the photo editing space but no one seems to have brought it to audio yet.
Exactly. I don’t want AI to make me a song. I want it to switch up my bassline or chords or recommend a fill instrument.
Logic Pro comes the closest to this with the addition of virtual drummers. You can assign it to follow certain rhythmic timings by connecting it to a main instrument track, and by labeling sections of your song (bridge, chorus, etc) you can regenerate until you find something you're happy with.
It's a far cry from having a real drummer, but it works in a pinch.
I think you might like https://soundry.ai they are working on a feature that is just like 1.
Spark amp is now advertising with some interesting AI options. https://au.positivegrid.com/products/spark-2 As far as I understand it can attempt to generate backing drums (and music?) for jamming, which sounds great for practicing.
I think the big limiting factor is having it learn what the infinite number of parameters in different synths, etc actually make it sound like, and then being able to make the model be able to produce music by using and configuring them instead of just spaffing out crap music.
If it could do that it'd give the tweakability it needs, but I think it would basically involve training a model for each instrument.
> RapMachine
Fine-tuned on pure rap data to create an AI system specialized in rap generation. Expected capabilities include AI rap battles and narrative expression through rap. Rap has exceptional storytelling and expressive capabilities, offering extraordinary application potential.
Using a certain other music generator I got it to accidentally say ***. It said it with a Latino American accent too.
In fact for whatever reason this tool couldn’t use a typical AAVE voice. Just Sage Francis / Atmosphere like dictionary raps and a few Latino American ones.
A big limitation of AI slop is it tries to not offend anyone.
Art that can’t even try to offend is barely art.
> A big limitation of AI slop is it tries to not offend anyone.
That tends to be very true of major commercially-hosted and proprietary systems, and true (but somewhat less and less consistently so) of open foundation models from corporations and big labs, but community models (including community fine tunes of big-vendor open models) much less so. This is very visible in LLMs, image models, and video models right now.
Rap (and even music more generally) models don't have as much attention right now, but I suspect if/as they get more attention and community use the same dynamic is likely to emerge.
> Art that can’t even try to offend is barely art.
I do not believe that offense is a necessary or sufficient component of art. If I raise my middle finger at someone, that is not art, but rather offense. If I view Botticelli's Primavera, I'm looking at art, and I detect no intent to offend, nor do I take offense.
There is certainly room for offense within art: the iambic poetry of Archilochus and Hipponax manages to blend aesthetic beauty with ardent invective. But I do not think that art needs or is defined by offense. Offense is accidental to art as a concept.
>Primavera (Italian pronunciation: [primaˈvɛːra], meaning "Spring") is a large panel painting in tempera paint by the Italian Renaissance painter Sandro Botticelli made in the late 1470s or early 1480s (datings vary). It has been described as "one of the most written about, and most controversial paintings in the world",[1] and also "one of the most popular paintings in Western art".[2]
https://en.m.wikipedia.org/wiki/Primavera_(Botticelli)
If it's described as controversial, it obviously offended someone if not today, in the past.
Not in the least. Controversy doesn't mean offense. The controversies around the painting relate to its interpretation, to who the model for the main figure might be, to the reasons for the number and variety of plants in the painting... none of which offer offense.
Controversy and debate do not have to be offensive or even acrimonious, and controversies can arise around a subject like a painting without arising because of any offense given in the painting.
Many cultures would regard the content of the painting to be inappropriate though.
Offense doesn't need to mean that it's meant to shock.
I suspect most mainstream AI art generators would refuse to replicate it. Doing so would "offend" the original art.
But a human artist could, perhaps a talented woman flips the gender.
Humans will always be able to challenge us in ways AIs can't.
At home and around close friends I often speak in AAVE. It's a part of my culture and, at a core level, me.
When an AI rap generator decides it can't do AAVE, it's saying the very identity that created rap in the first place is offensive. I don't expect a masterpiece like Illmatic, but I've yet to hear anything decent from an AI rap generator.
> Art that can’t even try to offend is barely art.
Good art is secondary to avoiding some journalist writing a hit piece about how they used your company’s AI generator to depict Hitler saying the gamer word.
To be fair, nothing AI generated is inherently “good art” if you take artistic intent into account.
Mixing paint in your rectum with the intention of making good art will always rate lower than AI slop
*citation needed
AI-generated articles about AI-generated music which is generated from reading AI-generated articles.
Shall it continue in an unholy loop until the end of time?
Instrumental music doesn’t count as art? Come on.
Music, in any form, is definitely art. Instrumentals can convey emotions and stories just as powerfully as lyrics — sometimes even more. It’s all about what it makes you feel.
Instrumental music can absolutely offend!
It can challenge the old standards, it can push genres into new places.
AI music can’t. It’s too safe.
Maybe some specific implementations are “too safe,” but I don’t believe that there’s any combination of notes that an AI couldn’t generate, in theory.
In theory no, but so far most of these things are severely overfit to 2010s-era pop music. They are best described as muzak generators
> Instrumental music can absolutely offend!
The Rite of Spring (Stravinsky) has entered the chat!
And if that's not offensive enough, Music in Similar Motion by Glass, or Metal Machine Music by Pat Metheny or any of Glenn Branca's "guitar symphonies" will likely do the job for most people.
Also Peaches en Regalia by Zappa.
Or anything 4/4 in some areas at a certain time.
Lou Reed - Metal Machine Music
Pat Metheny - Zero Tolerance for Silence
If you think it's too safe you haven't tried. This took me 5 minutes with almost no thought just to make the point.
https://suno.com/s/5bXmu47Iv1o0xR9U
I want to play something on my keyboard (the only instrument I am slightly ok at) and then be able to tell it to play it with a saxophone and describe exactly how I want it played. I don’t need an AI to create a song for me, I need 100 session musicians at my disposal to create the song I want. I am very excited about having that type of AI.
Here is a cool demonstration to do voice-to-instrument or instrument-to-another instrument (The inconvenient thing is that for a new kind of output sound you have to train a model for around 1 hour for good quality, but after that you can use it with different inputs quickly):
https://youtu.be/lI1LCfTx2lI?t=525
There is also Kits.ai https://www.kits.ai/tools/ai-instruments
> aggressive, Heavy Riffs, Blast Beats, Satanic Black Metal
Result: A generic pop-rock song without riffs or blast beats. Not even power metal or corset core, let alone anything even slightly resembling Black Metal.
Yup. Still doing what I expect from AI music.
Yes, please ruin music. Ruin everything you can. As long as you can build it, you should ruin it. There's really no limit. It's the masses who will actually do the ruining, so those building the technology are totally blameless. And you might even make some money, so it's all worth it.
These music/art AI threads are always gross. Make things for the joy of creating. Prompting some AI model doesn’t make you a musician - it actively robs you of the thing that’s rewarding about actually creating something.
Have you considered that some people produce music solely for aesthetic purposes?
The main genre of music I listen to is electronic.
Many electronic songs are written to evoke a specific feeling, without meaningful lyrics.
When I produce electronic music, I have no particular lyrics or composition in mind.
I just fiddle around with different sound layers going for a particular "vibe" and mash things together until they sound good to me.
I see AI as another tool to expedite this process.
Sure. If your music means nothing to you, then by all means, make it with methods that remove all agency from the process.
I’ve written instrumental/electronic music, too. When I do, it means something to me. The last thing I want is for an AI to make it for me.
Some get joy from creating generative tools/models!
A tool existing doesn't ruin anything. Genuinely stop being so dramatic and learn to ignore things you don't like. Our society would be a lot better and calmer if people did that rather than start pointless crusades.
People said this about photography. It ruined painting, and it did in fact put a lot of portrait painters out of business because in that era the reason you hired them was not for art. It was for a photograph made with brushes.
This is a good read on photography and art. Note that the rhetoric sounds almost identical to today's AI rhetoric.
“If photography is allowed to supplement art in some of its functions, it will soon supplant or corrupt it altogether, thanks to the stupidity of the multitude which is its natural ally.” - Charles Baudelaire
https://medium.com/@aaronhertzmann/how-photography-became-an...
I don't think AI is a threat to actual art at all. If I want art, I explicitly do not want slop churned out of a model. I want something created by a human being to communicate something. That's the entire point.
Some around here will argue that there have been double blind tests showing that people sometimes can't tell the difference between AI output and human art. That is missing the point too. Knowing who the artist is is part of the artistic experience. If someone deepfake calls you with a model imitating your friend, is it the same as talking to your friend? The parasociality of art is part of it.
It may -- as photography did with portraiture -- be a threat to some of the ways that artists make a living, and I do understand the pushback from that. Back before photography a lot of painters made a living being cameras, and all that work dried up pretty fast. Today AI is replacing all the "filler" churned out by artists. The only silver lining I guess is that artists generally hate this work and it never paid well.
Another thing I expect to happen is: actual AI art. I don't think this has happened yet. There has not yet been an AI equivalent of the Pictorialists.
AI art is art if the AI is used by a human being as a tool to communicate what art communicates, to do what art does. Art, I guess, does many things. It entertains, informs in a way, but also communicates matters of the emotional and spiritual aspect of human existence -- of consciousness -- that can't be communicated well in other ways. If someone uses AI as a tool to do that, it's art.
A lot of what we see today coming out of AI models is I think correctly called "slop" because it is not that. There's no artistic intention or craft behind it.
BTW I'd argue that "slop" exists in the realm of music, literature, painting, and other arts made with traditional methods too. For centuries there's been low-effort smutty pulp fiction, crappy imitative pop music, and gimmicky low-effort painting. Those things are slop made with lower-tech tools. A pretentious gimmicky painting where someone threw some paint at a wall has less artistic merit than a photograph that someone composed with care to communicate something.
Edit: people said this about writing too!
https://www.anthologialitt.com/post/the-god-thoth-and-the-in...
I'm not bashing people for being skeptical of AI and worrying about its effects on the arts. There are, like I said, very valid points, especially about the ability of artists to make a living and the effect AI "slop" can have on the population. I just think we have been here before, many times.
Sad to see the downvotes on this well thought out post. Kudos! The parallels between the history of the disruption brought by photography, the musical sampler (another disruptive technology in the late 1980s/early 1990s that is now widely accepted with minimal angst) etc. with the current outcries against Generative AI are hard for me to ignore. Humans grab acoustic guitars, sing with their own voices and often just add hackneyed dreck to our ears. Humans now can type words into their computers and have found another way to often unleash more dreck to our ears powered by Generative AI. But I am deeply interested in Generative AI for its sonic exploration possibilities -- maybe what I make will sound like dreck to you, but I want to be free to do so, just like someone who wants to grab a guitar and sing.
Most of the criticism is from bandwagoning and emotion rather than critical thought. People screech and gnash their teeth as if the evil developers and AI researchers are conspiring to destroy all art ever; it's quite frankly ridiculous.
You narrowed down the important aspect: personal freedom. It's not about AI, or cameras, or samplers, or synthesizers, or automated this or that, it's about giving people the freedom to do an activity how they want. It's terribly sad that others cry for the removal of this freedom and brand it as some noble cause.
To me the crisis of art today is not AI, it's discovery.
People all over the world are making great art. How do I find it in an ocean of human-made mediocrity and now AI-churned slop? How do we discover new artists?
I sometimes go looking for new writers and new music. It's very time consuming. I'll spend hours and hours to find maybe one new piece I like. Most of that time is sifting through stuff that's just unremarkable.
Social media used to help, but now social media is just a flood of dreck.
AI is making this problem worse, but not exactly for the reason the AI bashers say. It's not that AI makes art obsolete or that AI can't be used to make real art. It's that AI makes it so easy to make dreck, it's flooding the zone even more.
Interesting how there is no mention of how the training data for this was collected. This does sound quite a bit better than Meta's MusicGen, but then again that model was also trained on a small licensed dataset.
It sounds very similar to Suno v3.5 (including the audio quality). Likely they trained on Suno generations.
Man, this whole topic hits way harder than I expected. AI taking shots at music creation makes me feel a bit hyped but also kinda iffy, especially when I hear people say it plays it way too safe. You think keeping things safe in AI art helps anyone actually level up or just holds us all back?
If the “Guns in butts” / “my wife is a jar of dirt” songs are any indication, I don’t know how “safe” it will be, at least as far as content goes.
I’m similarly hyped and iffy. If you could have a model that listens to a looping segment to contextualize it, and then play with other patterns on top but through a more expressive way (or even humming/singing and allowing the LLM of sorts to compose it together), that could be interesting. Would it be panned for being AI-assisted? I’d hope not, I think?
“I hear you're buying a synthesizer and an arpeggiator and are throwing your computer out the window because you want to make something real. You want to make a Yaz record. I hear that you and your band have sold your guitars and bought turntables. I hear that you and your band have sold your turntables and bought guitars.”
We just add AI in whatever forms it takes to the list I suppose.
Really interesting — we're seeing more efforts now to bring the "foundation model" approach to creative domains like music, but I wonder how well these models can internalize musical structure over long time scales. Has anyone here compared ACE-Step to something like MusicGen or Riffusion in terms of coherence across entire compositions?
They always fail with structure. The progressions often meander aimlessly and eventually go to weird places, at least IME.
The experimental section was most interesting: https://ace-step.github.io/#Experimental
Puh tss puh tss Kshhh ff kshhh ff Bmm tkk bmm tkk Drr-dr-d-d-dt
How do the quality and prompt adherence compare to Suno v4?
Can it run on a VPS this small? Specs: 4 CPU cores, 4 GB RAM, 200 GB SSD disk space, Ubuntu Server 22.04.
Is there a demo hosted somewhere?
The diagram is super vague. How are the lyrics encoded? What does the encoder look like inside? What is the input size, input format, output size, output format? Are the three encoder outputs added? Concatenated? When MERT and mHuBERT combine are they added? Multiplied? Subtracted? Concatenated?
I really wish people could make better diagrams.
Fun to play with!