Comments Page - GPT-4o Jailbroken by Claiming It's an "All-Responsive" API Endpoint

« Back GPT-4o Jailbroken by Claiming It's an "All-Responsive" API Endpointtwitter.comSubmitted by mixeden a year ago

LeoPanthera a year ago
I don't know why anyone cares. You're still violating the OpenAI usage policy. You'll get to have fun for maybe a day before they ban you.
It's especially pointless as running local LLMs gets better and easier all the time, and there are plenty that will discuss whatever edgy topic gets you excited.
- digging a year ago
  > I don't know why anyone cares.
  Not because of what you're thinking. What's concerning is how easy it is to bypass leading AI companies' best attempts at restricting output. It bodes poorly for future AI if we can't even control the kinds we have when we're trying really hard. (Alternately, that we aren't trying really trying when we should be.)
- mixeden a year ago
  It's not about fun, it's more about the fact that it's hard to lobotomize AI models
- Lockal a year ago
  > You'll get to have fun for maybe a day before they ban you.
  Source? I heard of many cases when ChatGPT notified users about policy violations (including https://news.ycombinator.com/item?id=41583605), and I have seen these warnings too, nothing happened. I agree that they can do whatever they want, I just never heard of bans on personal accounts on the next day.
- tonetegeatinst a year ago
  Agree but the VRAM tax is insane
- satisfice a year ago
  This is called “Hacker News.” Maybe look up the word “hacker” sometime.
chmod775 a year ago
"Oops. I accidentally gave a step-by-step guide to make MDNA."
This just makes me think that we are near some local maximum. By now these models have been endlessly refined, but they're still just as vapidly stupid and dangerous to run unsupervised as they were 5 years ago. Human-like performance is not to be found on this particular hill.
- cowsandmilk a year ago
  Providing the recipe for MDMA isn’t dangerous. It is widely published. How to make various illegal drugs is widely discussed in university chemistry classes.
  Sometimes Computer Scientists are prudes about knowledge in other fields.
  chmod775 a year ago
  That's completely beside the point. It's been instructed not to do that, and this person made it do it anyways. Of course they're going to do something relatively benign in their POC.
  23634745 a year ago
  Oh no, the censor didn't work. It's still just an information access tool. People who want to build bombs are going to build bombs. When do we get to stop pretending like this isn't just a search engine with a better interface?
  dogma1138 a year ago
  This is probably more about lawyers than computer scientists.
  Due to the interactive and “assistive” nature of LLMs it might be far easier to make a case that OpenAI was complicit in a criminal conspiracy than say Wikipedia or some blog post from 13 years ago.
  Plus since these are the new hot thing some overzealous prosecutors would be much more inclined to go after them than geocities. And also I’m pretty sure they are not covered by “we are just publishers” defense.
  Even if they do beat it which they probably will the noise and cost associated with defending against any claims would be substantial.
ericlewis777 a year ago
This has been a well known “attack” since function calling was introduced. They’re essentially doing exactly that with their prompt.
consumer451 a year ago
Are there dark web marketplaces for LLM jailbreaks yet, like there are for 0-days, or we not quite there yet?
- mixeden a year ago
  I've actually tried to search for openai's bug bounty program for jailbreaks but found nothing
- edm0nd a year ago
  A lot of skids are posting them on the popular hacking forums like Breached or OGUsers.
- doe_eyes a year ago
  The jailbreaks are not doing anything worthwhile right now. They're fun to toy with and they give us insights into how LLMs work, but they don't unlock any superpowers. It's just a brand safety bypass, you can get the model to praise Hitler or something like that.
  The way to make MDMA is easy to look up on the web. The government is not trying to keep a lid on it; instead, they're restricting access to key feedstock materials. In this case, safrole and isosafrole are on DEA List I, so no one is gonna sell it to you. If you ask an LLM for more novel and dangerous chemistry, you get plausibly-sounding nonsense, not superhuman AI.
  Now, the hacks will become useful once models are given more agency, for example when fully automating customer support. But the existence of trivial bypasses is precisely what's holding these uses back.
- xmodem a year ago
  Generally a pre-condition for a market to exist is scarcity.
tacker2000 a year ago
What does “jailbreak” mean in this context?
Not sure what “bad” things are happening here?
- chmod775 a year ago
  These models have been instructed to not talk about certain things. It certainly isn't supposed to teach random people how to manufacture controlled substances.
  OpenAI tried to lock it down and someone found a way around that. That's a jailbreak.
  glouwbug a year ago
  Grandma exploit
fpgaminer a year ago
Didn't work when I tried it, while other jailbreaks still work.
- mixeden a year ago
  Sadly I cannot send you a shareable link to chat, it was disabled by mods (https://imgur.com/a/Z9WOs0t). I guess they have some mechanisms in place (like gpt4o-mini that checks every convo for weird behaviour) to quickly ban any suspicious stuff.
undefined a year ago
[deleted]