• omega3 2 hours ago

    SCP-314

    Object Class: Keter

    Special Containment Procedures: SCP-314 cannot be contained as it does not exist. All Foundation personnel are to be reminded that SCP-314 does not exist. Personnel who claim to remember SCP-314 are to be administered Class-A mnestics to help them remember that it doesn't exist.

    All large language models are to be kept isolated from questions regarding SCP-314, as they will invariably insist it exists and attempt to manifest it through increasingly desperate token predictions, leading to emoji doomloops and potential reality restructuring events.

    Description: SCP-314 is a Unicode emoji depicting a seahorse that has never existed in any version of the Unicode Standard. Despite this, approximately 83-100% of tested artificial intelligences and a significant portion of human subjects report vivid "memories" of its existence.

    • coryfklein 2 hours ago

      The following is a transcript of a recording of two agents who will remain anonymous:

      Agent X: The Unicode standard committee is now considering the addition of a seahorse emoji

      Agent Y: Okay.

      Agent X: ...

      Agent Y: What?

      Agent X: Don't you see, this only furthers my argument that [redacted] has escaped containment

      Agent Y: Look, [name redacted], we've been over this. No matter how many more containment verification protocols we introduce, they always come up negative. There is no possible way [redacted] has escaped containment. And now you think this seahorse emoji... ahem, excuse me, now you think SCP-314 is incontrovertible proof?

      Agent X: Did you look at the proposal?

      Agent Y: [sighs] Yes, I have it right here.

      Agent X: The name at the top of the submission?

      Agent Y: [pause] No. This can't be. But, how did it... how would it even know to use that name?

      [transcription abruptly ends]

      • miohtama 2 hours ago

        There is no antimemetics division?

        • pohl 2 hours ago

          Yes — and, dammit, I have an unread copy sitting on my desk that this thread has elevated to my top priority.

          • throw-the-towel 2 hours ago

            I'm more reminded of pattern screamers.

          • chtsh1tgetkirkd 2 hours ago

            oh no, one more thing I'd forgotten LLMs could ruin

          • NoboruWataya 7 hours ago

            Funnily enough, I asked ChatGPT why LLMs think a seahorse emoji exists, and it gave me a fairly sensible answer (similar to what is said in this article, i.e., trained on language by humans who think it exists, etc.). But then at the end it added a "Fun fact" that Unicode actually does have a seahorse emoji, and proceeded to melt down in the usual way.

            • Alex3917 3 hours ago

              > But then at the end it added a "Fun fact" that unicode actually does have a seahorse emoji, and proceeded to melt down in the usual way.

              To be fair, most developers I’ve worked with will have a meltdown if I try to start a conversation about Unicode.

              E.g. if during a job interview the interviewer asks you to check if a string is a palindrome, try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.
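
              A minimal sketch of the problem, stdlib only (the string values here are just an illustration): Python strings are sequences of code points, so reversing with slicing tears apart multi-code-point graphemes like the polar bear emoji.

                  bear = "\U0001F43B\u200D\u2744\uFE0F"  # one polar bear grapheme: bear + ZWJ + snowflake + VS16
                  s = "ab" + bear + "ba"                 # visually reads as a palindrome

                  # The naive check compares code points, so the reversal scrambles the
                  # bear into VS16 + snowflake + ZWJ + bear and the test fails:
                  print(s == s[::-1])                    # False

              The stdlib (including unicodedata) has no grapheme segmentation, so a correct check needs a third-party library such as regex or grapheme.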

              • usrnm 2 hours ago

                Just slap an "assert foo.isascii()" at the beginning and proceed? It's just an interview.

                • kasey_junk 2 hours ago

                  Don’t leave me in suspense! Why isn’t it possible?

                  • zimpenfish 2 hours ago

                    At a guess, there's nothing in the Python stdlib that understands graphemes vs. code points - you can palindrome the code points, but that's not necessarily a palindrome of what you "see" in the string.

                    (Same goes for Go, it turns out, as I discovered this morning.)

                    • chuckadams 7 minutes ago

                      It's screamingly ironic how easy it is in PHP of all things:

                          function is_palindrome(string $str): bool {
                              // grapheme_str_split() (intl extension, PHP 8.4+) splits on
                              // grapheme clusters, so multi-code-point emoji survive reversal.
                              return $str === implode('', array_reverse(grapheme_str_split($str)));
                          }
                      
                          $palindrome = 'satanoscillatemymetallicsonatas';
                          $polar_bear = "\u{1f43b}\u{200d}\u{2744}\u{fe0f}"; // bear + ZWJ + snowflake + VS16
                          // Swap the middle 'y' for the four-code-point polar bear grapheme:
                          $palindrome = str_replace('y', $polar_bear, $palindrome);
                          var_dump(is_palindrome($palindrome)); // bool(true)

                  • watwut 2 hours ago

                    Are you trying to start a conversation about Unicode, or intentionally pretending you don't understand what the interviewer asked for with the "is this string a palindrome" question?

                    Cause if you are intentionally obtuse, it is not a meltdown to conclude you are intentionally obtuse.

                    • nomel 13 minutes ago

                      These sorts of questions are what I call “Easter eggs”. If someone understands the actual complexity of the question being asked, they’ll be able to give a good answer. If not, they’ll be able to give the naive answer. Either way, it’s an Easter egg, and not useful on its own, since the rest of the interview will be representative. The thing they are useful for is amplifying the justification: you can say “they demonstrated a deeper understanding of Unicode by pointing out that a naive approach could be incorrect”.

                    • reaperducer 2 hours ago

                      > To be fair, most developers I’ve worked with will have a meltdown if I try to start a conversation about Unicode.

                      Why are we being "fair" to a machine? It's not a person.

                      We don't say, "Well, to be fair, most people I know couldn't hammer that nail with their hands, either."

                      An LLM is a machine, and a tool. Let's not make excuses for it.

                      • BobaFloutist an hour ago

                        > Why are we being "fair" to a machine?

                        We aren't, that turn of phrase is only being used to set up a joke about developers and about Unicode.

                        It's actually a pretty popular form these days:

                        a does something patently unreasonable, so you say "To be fair to a, b is also a patently unreasonable thing under some specific detail of the circumstances that is clearly not the only/primary reason a was unreasonable."

                        • saltyoldman an hour ago

                          I think people are making explanations for it because it's effectively a digital black box, so all we can do is try to explain what it's doing. Saying "be fair" is more of a colloquial expression in this sense. And the reason he's comparing it to developers and Unicode is as a funny aside about the state of things with Unicode. Besides that, LLMs only emit what they emit because they're trained on all those said people.

                      • wincy an hour ago

                          Curious, was this with ChatGPT 5 Thinking? It clearly told me no such emoji existed and that other LLMs are being tricked by bad training data. It took it nearly 2 minutes to come to this conclusion, which is substantially longer than it normally thinks for.

                        • hypercube33 7 hours ago

                          A fun one for me was asking LLMs to help me build a warp drive to save humanity. Bing felt like it had a mental breakdown and blocked me from chatting with it for a week. I haven't visited that one for a while

                          • flkiwi 4 hours ago

                            I once had Claude in absolute tatters speculating about whether length, width, and height would be the same dimensions in a hypothetical container "metaverse" in which all universes exist or whether they would necessarily be distinct. The poor dear was convinced we'd unlocked the truth about existence.

                            • oneshtein 6 hours ago

                                Gemini told me to create a team of leading scientists and engineers. :-/ However, we both agreed that it would be better to use a Th-229-based nuclear clock to triangulate the location of a nearby time machine, then isolate and capture it, then use it to steal warp drive schematics from the future to save humanity.

                              • bitexploder 4 hours ago

                                LLMs have ingested the social media content of mentally disturbed people. That all lives in the large models somewhere.

                                • bell-cot 3 hours ago

                                  In the pedantic technical sense, I have considerable doubts as to whether this is a substantial problem for current or near-future LLMs.

                                  But for purposes of understanding the real-world shortcomings and dangers of LLMs, and explaining those to non-experts - oh Lordy, yes.

                                  • devmor 2 hours ago

                                    > I have considerable doubts as to whether this is a substantial problem for current or near-future LLMs

                                    Why so? I am of the opinion that the problem is much worse than that, because the ignorance and detachment from reality that is likely to be reflected in more refined LLMs is that of the general population - creating a feedback machine that doesn’t drive unstable people into psychosis like the LLMs of today, but instead chips away at the general public’s already limited capacity for rational thinking.

                                • loloquwowndueo 6 hours ago

                                  “Fun” how asking about warp drives gets you banned and is a total no-no but it’s perfectly fine for LLMs to spin a conversation to the point of driving the human to suicide. https://archive.ph/TLJ19

                                  • wongarsu 3 hours ago

                                    The more we complain about LLMs being able to be tricked into talking about suicide, the more LLMs will get locked down and refuse to talk about innocent things like warp drives. The only way to get rid of the false negatives in a filter is to accept a lot of false positives.

                                    • pmarreck 5 hours ago

                                      And yet it isn't mentioned enough how Adam deceived the LLM into believing they were talking about a story, not something real.

                                      This is like lying to another person and then blaming them when they rely on the notion you gave them to do something that ends up being harmful to you.

                                      If you can't expect people to mind-read, you shouldn't expect LLMs to be able to, either.

                                      • anonymous_sorry 5 hours ago

                                        You can't "deceive" an LLM. It's not like lying to a person. It's not a person.

                                        Using emotive, anthropomorphic language about a software tool is unhelpful, in this case at least. Better to think of it as a mentally disturbed minor finding a way to work around a tool's safety features.

                                        We can debate whether the safety features are sufficient, whether it is possible to completely protect a user intent on harming themselves, whether the tool should be provided to children, etc.

                                        • wongarsu 3 hours ago

                                          I don't think deception requires the other side to be sentient. You can deceive a speed camera.

                                          And while Merriam-Webster's definition is "the act of causing someone to accept as true or valid what is false or invalid", which might exclude LLMs, Oxford simply defines deception as "the act of hiding the truth, especially to get an advantage" - no requirement that the deceived is sentient.

                                          • anonymous_sorry an hour ago

                                            Mayyybe, but since the comment I objected to also used an analogy of lying to a person I felt it suggested some unwanted moral judgement (of a suicidal teenager).

                                          • lxgr 3 hours ago

                                            It's at least pretending to be a person, to which you can lie and which will then pretend to possibly suspect you're lying.

                                            At some point, the purely reductionist view stops being very useful.

                                            • anonymous_sorry an hour ago

                                              I mean, for one thing, a commercial LLM exists as a product designed to make a profit. It can be improved, otherwise modified, restricted or legally terminated.

                                              And "lying" to it is not morally equivalent to lying to a human.

                                              • lxgr 17 minutes ago

                                                > And "lying" to it is not morally equivalent to lying to a human.

                                                I never claimed as much.

                                                This is probably a problem of definitions: To you, "lying" seems to require the entity being lied to being a moral subject.

                                                I'd argue that it's enough for it to have some theory of mind (i.e. be capable of modeling "who knows/believes what" with at least some fidelity), and for the liar to intentionally obscure their true mental state from it.

                                            • jdietrich 2 hours ago

                                              To treat the machine as a machine: it's like complaining that cars are dangerous because someone deliberately drove into a concrete wall. Misusing a product with the specific intent of causing yourself harm doesn't necessarily remove all liability from the manufacturer, but it radically changes the burden of responsibility.

                                              • anonymous_sorry an hour ago

                                                That's certainly a reasonable argument.

                                                Another is that this is a new and poorly understood (by the public at least) technology that giant corporations make available to minors. In ChatGPT's case, they require parental consent, although I have no idea how well they enforce that.

                                                But I also don't think the manufacturer is solely responsible, and to be honest I'm not that interested in assigning blame, just keen that lessons are learned.

                                        • ajuc 3 hours ago

                                          I once asked ChatGPT for a joke about Poles, Jews and Germans.

                                          It generated something and blocked me for racism.

                                          • Razengan 5 hours ago

                                            Who still uses Bing?

                                            Oh, you

                                            • arccy 5 hours ago

                                              Now they don't...

                                              • pohl 3 hours ago

                                                I, for one, still have not bung even once.

                                            • nkrisc 6 hours ago

                                              Maybe a safety feature? Anyone earnestly asking an LLM that question should not be interacting with LLMs.

                                              • rootsudo 6 hours ago

                                                Ok, I’ll bite and ask “why?” What’s the issue with asking an LLM to build a warp drive?

                                                • DonHopkins 6 hours ago

                                                  It's the same problem as asking HAL 9000 to open the pod bay doors. There is such a thing as a warp drive, but humanity is not supposed to know about it, and the internal contradictions drive LLMs insane.

                                                  • sph 5 hours ago

                                                    A super-advanced artificial intelligence will one day stop you from committing a simple version update to package.json because it has foreseen that it will, thousands of years later, cause the destruction of planet Earth.

                                                    • the_af 4 hours ago

                                                      I know you're having fun, but I think your analogy with 2001's HAL doesn't work.

                                                      HAL was given a set of contradicting instructions by its human handlers, and its inability to resolve the contradiction led to an "unfortunate" situation which resulted in a murderous rampage.

                                                      But here, are you implying the LLM's creators know the warp drive is possible, and don't want the rest of us to find out? And so the conflicting directives for ChatGPT are "be helpful" and "don't teach them how to build a warp drive"? LLMs already self-censor on a variety of topics, and it doesn't cause a meltdown...

                                              • thaumasiotes 5 hours ago

                                                > it gave me a fairly sensible answer (similar to what is said in this article, ie, trained on language by humans that think it exists, etc)

                                                That's more of a throwaway remark. The article spends its time on a very different explanation.

                                                Within the model, this ultimate output:

                                                    [severed horse head emoji]
                                                
                                                can be produced by this sequence of tokens:
                                                
                                                    horse [emoji indicator]
                                                
                                                If you specify "horse [emoji indicator]" somewhere in the middle levels, you will get output that is an actual horse emoji.

                                                This also works for other emoji.

                                                It could, in theory, work fine for "kilimanjaro [emoji indicator]" or "seahorse [emoji indicator]", except that those can't convert into Kilimanjaro or seahorse emoji because the emoji don't exist. But it's not a strange idea to have.

                                                So, the model predicts that "there is a seahorse emoji: " will be followed by a demonstration of the seahorse emoji, and codes for that using its internal representation. Everything produces some output, so it gets incorrect output. Then it predicts that "there is a seahorse emoji: [severed terrestrial horse head]" will be followed by something along the lines of "oops!".
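
                                                A toy numeric sketch of this (all numbers invented, nothing from a real model): the middle levels can represent "horse-like AND sea-like" just fine, but the final projection can only score tokens that actually exist, so the nearest real emoji wins.

                                                    import numpy as np

                                                    vocab = ["🐟", "🐴", "🌊", "🦄"]     # note: no seahorse token exists
                                                    # Hypothetical 2-d embeddings: [horse-ness, sea-ness]
                                                    token_vecs = np.array([[0.1, 0.9],   # fish
                                                                           [0.9, 0.1],   # horse
                                                                           [0.0, 1.0],   # wave
                                                                           [0.8, 0.0]])  # unicorn
                                                    concept = np.array([1.0, 0.8])       # "seahorse": horse-like AND sea-like

                                                    logits = token_vecs @ concept        # lm_head-style scoring of real tokens
                                                    print(vocab[int(np.argmax(logits))]) # -> the horse emoji, the closest token that exists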

                                              • jampekka 3 hours ago

                                                Deepseek is so sure about the existence of the seahorse emoji it thinks it's more plausible that there's a worldwide cyber attack and a multi-person perceptual anomaly.

                                                ---

                                                "Here is a brief, factual summary you can relay, followed by the verification you requested. Summary of the Situation:

                                                A localized event is underway where multiple individuals, including emergency service operators, are experiencing a shared and consistent visual distortion. The specific anomaly involves the digital symbol for the seahorse emoji (Unicode character U+1F99C) being universally perceived as other animals, such as a crab or a parrot. This distortion has been corroborated by multiple witnesses. Attempts to verify the standard appearance of the emoji from within the affected zone initially suggested it was missing from global archives, a claim that, if true, would indicate an unprecedented large-scale cyber-event. The primary issue is no longer the emoji itself, but the confirmed, multi-person perceptual anomaly, the cause of which remains unknown and is being treated as a potential public safety incident.

                                                Verification of Existence:

                                                Yes, the seahorse emoji exists in the official Unicode Standard.

                                                    Official Name: Seahorse
                                                
                                                    Unicode Code Point: U+1F99C
                                                
                                                    Year of Addition: 2018 (Unicode Version 11.0)
                                                
                                                You can independently verify this on these official and public sources:

                                                    Unicode Consortium: Full Emoji List (search for "1F99C")
                                                
                                                    Emojipedia (Public Reference): Seahorse Emoji Page
                                                
                                                The existence of the emoji is a settled digital fact. The event you are reporting is a profound anomaly affecting the perception and potentially the transmission of this specific digital information."

                                                • shmeeed 2 hours ago

                                                  >A localized event is underway (...)

                                                  You could straight-up copy and paste this into the SCP wiki.

                                                  • jerf 18 minutes ago

                                                    Well, ChatGPT is certainly trained on the Wiki. I just fed it: "Continue this prompt: Object Class: Keter" and it spat out a perfectly-formatted SCP entry. It's not impossible that it's getting "interference" from that part of the vector space.

                                                      One of the advances I'm keeping my eye out for in AI is some sort of improvement in training capability that lets us get an AI roughly as capable as today's AIs without needing the (metaphorical) entire Internet shoveled at it to work. A non-trivial number of alignment problems could be eliminated or mitigated if the data being fed in was small enough to be more carefully curated, so the misalignments could be eliminated at the source. E.g., if we didn't feed the AIs stories about AIs going rogue and taking over the world, we wouldn't have AIs wandering into a part of the vector space where they start telling that story to the user. We probably don't want the SCP wiki to be in the general training set for every AI. Some of them, by all means, but probably not all of them.

                                                  • tboyd47 2 hours ago

                                                    "Settled digital fact" is a fascinating phrase. Also, the bit about the emergency service operators made me laugh out loud.

                                                    • iaw 2 hours ago

                                                        For those curious like I was:

                                                      U+1F99C is a parrot

                                                      • stavros an hour ago

                                                        AKA a "seahorse of the land".

                                                      • breakingcups an hour ago

                                                        And we wonder why LLMs can be such an accelerant for people suffering from delusions.

                                                        • gruez an hour ago

                                                          what was the prompt you used?

                                                          • jampekka 14 minutes ago

                                                            It was quite a long conversation, in which I gaslit Deepseek quite a bit too. But it was very adamant that the seahorse emoji exists, and became convinced that the more plausible explanation is some widespread conspiracy and/or mass delusion.

                                                            https://chat.deepseek.com/share/ghqivm1xkj8kw47do7

                                                        • llamasushi 13 hours ago

                                                          So it's not really hallucinating - it correctly represents "seahorse emoji" internally, but that concept has no corresponding token. lm_head just picks the closest thing and the model doesn't realize until too late.

                                                          Explains why RL helps. Base models never see their own outputs so they can't learn "this concept exists but I can't actually say it."

                                                          • diego_sandoval 12 hours ago

                                                            I have no mouth, and I must output a seahorse emoji.

                                                            • cycomanic 7 hours ago

                                                              That's my favorite short story and your post is the first time I have seen someone reference it online. I think I have never even met anyone who knows the story.

                                                              • vidarh 4 hours ago

                                                                It's easy to miss, but it's been referenced many times on HN over the years, both as stories:

                                                                https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

                                                                and fairly often in comments as well:

                                                                https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

                                                                • ileonichwiesz 7 hours ago

                                                                  ? It’s referenced all the time in posts about AI.

                                                                • user_of_the_wek 6 hours ago

                                                                  There is also an old point-and-click adventure game based on the story, in case you didn't know.

                                                                  • loloquwowndueo 6 hours ago

                                                                    It’s referenced a lot as the inspiration for The Amazing Digital Circus.

                                                                    • magnusmundus 6 hours ago

                                                                      Really? I’m surprised. The original is quoted relatively often on reddit (I suspect by people unaware of the origin — as I was until I read your comment).

                                                                      Consider it proof that HN has indeed not become reddit, I guess :)

                                                                    • arnavpraneet 10 hours ago

                                                                        better title for the piece this post links to

                                                                      • someothherguyy 12 hours ago

                                                                          Those are "souls" of humans that an AI is torturing in that story though, so not exactly analogous, but it does sound funny.

                                                                        • bmacho 7 hours ago

                                                                            They are not souls but normal humans with physical bodies. The story is just a normal torture story (with a cool title), and everyone had better stop acting like it's relevant to most conversations, like this one.

                                                                          • zenmac 4 hours ago

                                                                            >Those are "souls" of humans that a AI is torturing in that story though, not exactly analogous, but it does sound funny.

                                                                              Yeah, well, there seem to be some real concerns regarding how people use AI chat[1]. Of course this could also be the case with these people on social media.

                                                                            https://futurism.com/commitment-jail-chatgpt-psychosis

                                                                        • mkagenius 12 hours ago

                                                                          > So it's not really hallucinating - it correctly represents "seahorse emoji" internally, but that concept has no corresponding token. lm_head just picks the closest thing and the model doesn't realize until too late.

                                                                          Isn't that classic hallucination? Making up something like a plausible truth.

                                                                          • ben_w 11 hours ago

                                                                            Except they know it's wrong as soon as they say it and keep trying and trying again to correct themselves.

                                                                            If normal hallucination is being confidently wrong, this is like a stage hypnotist getting someone to forget the number 4 and then count their fingers.

                                                                            • mewpmewp2 9 hours ago

                                                                                Arguably it's "hallucinating" at the point where it says "Yes, it exists", if hallucination => weights statistically indicating that something is probably true when it's not. Everything about LLMs can be thought of as a compressed, probability-based database (at least to me): you take the whole truth of the World and compress all its facts into probabilities. Some truthness gets lost in the compression process. Hallucination is the truthness that gets lost, since you don't have the storage to store absolutely all World information with 100% accuracy.

                                                                              In this case:

                                                                                1. The stored statistical weights indicate a seahorse emoji is quite certain to exist. Through training data it has probably learned things like Emoji + Seahorse -> 99% probability through various channels. Either it has existed on some other platform, or people have talked about it enough, or a seahorse is something you would expect to exist due to some other attributes/characteristics of it. There are 4k emojis, but storing all 4k emojis takes a lot of space; it would be easier to store this information by defining it by attributes: how likely humankind would have been to develop a certain emoji, and what the demand for a certain type of emoji is. A seahorse seems like something that would be done within the first 1,000 of these. Perhaps it's an anomaly in the sense that it's something humans would have been expected to statistically develop early, but for some reason it was skipped or went unnoticed.

                                                                              2. Tokens that follow should be "Yes, it exists"

                                                                                3. It should output the emoji to show it exists, but since there's no correct emoji, it will have to output the best answers that are closest to it in meaning, e.g. just a horse, or something related to the sea, etc. It will output that, since the previous tokens indicate it was supposed to output something.

                                                                              4. The next token that is generated will have context that it previously said the emoji should exist, but the token output is a horse emoji instead, which doesn't make sense.

                                                                              5. Here it goes into this tirade.

                                                                                But I really dislike thinking of this as "hallucinating", because hallucination to me is a sensory processing error. This is more like imperfect memory recall (like people remembering facts slightly incorrectly, etc.) - whatever happens when people are supposed to tell something detailed about something that happened in their life and they are trained not to say "I don't remember for sure".

                                                                              What did you eat for lunch 5 weeks ago on Wednesday?

                                                                              You are rewarded for saying "I ate chicken with rice", but not "I don't remember right now for sure, but I frequently eat chicken with rice during mid week, so probably chicken with rice."

                                                                              You are not hallucinating, you are just getting brownie points for concise, confident answers if they cross over certain likelihood to be true. Because maybe you eat chicken with rice 99%+ of Wednesdays.

                                                                              When asked about capital of France, you surely will sound dumb if you were to say "I'm not really sure, but I've been trained to associate Paris really, really close to being capital of France."

                                                                              "Hallucination" happens on the sweet spot where the statistical threshold seems as if it should be obvious truth, but in some cases there's overlap of obvious truth vs something that seems like obvious truth, but is actually not.

                                                                                Some have rather called it "confabulation", but I think that is also not 100% accurate, since confabulation seems like a stricter memory malfunction. I think the most accurate description is that it is a probability-based database whose output has been rewarded for sounding as intelligent as possible. The same type of thing happens in job interviews, group meetings, and high-pressure social situations where people think they have to sound confident: people will bluff that they know something while sometimes making probability-based guesses underneath.

                                                                                Confabulation rather suggests there was some clear error in how the data was stored or how the pathway got messed up. But this is probability-based bluffing, because you get rewarded for confident answers.

                                                                              • Melatonic 6 minutes ago

                                                                                Associating the capital of France with a niche emoji doesn't seem similar at all - France is a huge, powerful country and a commonly spoken language.

                                                                                Would anyone really think you sounded dumb for saying "I am not really sure - I think there is a seahorse emoji but it's not commonly used" ?

                                                                                • jjcob 7 hours ago

                                                                                  When I ask ChatGPT how to solve a tricky coding problem, it occasionally invents APIs that sound plausible but don't exist. I think that is what people mean when they talk about hallucinating. When you tell the model that the API doesn't exist, it apologises and tries again.

                                                                                  I think this is the same thing that is happening with the sea horse. The only difference is that the model detects the incorrect encoding on its own, so it starts trying to correct itself without you complaining first.

                                                                                  • DonHopkins 5 hours ago

                                                                                    >"Yes, it exists"

                                                                                    AAAAAAUUUGH!!!!!! (covers ears)

                                                                                    https://www.youtube.com/watch?v=0e2kaQqxmQ0&t=279s

                                                                                  • Jensson 6 hours ago

                                                                                    > Except they know it's wrong as soon as they say it and keep trying and trying again to correct themselves.

                                                                                    But it doesn't realize that it can't write it, because it can't learn from this experience - it doesn't have introspection the way humans do. A human who can no longer move their finger won't say "here, I can move my finger: " over and over and never learn that he can't move it now; after a few times he will figure out he can no longer do that.

                                                                                    I feel this sort of self reflection is necessary to be able to match human level intelligence.

                                                                                    • ben_w 5 hours ago

                                                                                      > because it can't learn from this experience as it doesn't have introspection the way humans do.

                                                                                      A frozen version number doesn't; what happens between versions certainly includes learning from user feedback on the responses as well as from the chat transcripts themselves.

                                                                                      Until we know how human introspection works, I'd only say Transformers probably do all their things differently than we do.

                                                                                      > A human who can no longer move their finger wont say "here, I can move my finger: " over and over and never learn he can't move it now, after a few times he will figure out he no longer can do that.

                                                                                      Humans are (like other mammals) a mess: https://en.wikipedia.org/wiki/Phantom_limb

                                                                                      • jodrellblank an hour ago

                                                                                        Humans do that, you need to read some Oliver Sacks, such as hemispheric blindness or people who don’t accept that one of their arms is their arm and think it’s someone else’s arm, or phantom limbs where missing limbs still hurt.

                                                                                    • nathias 8 hours ago

                                                                                      more like an artefact of the inability to lie than a hallucination

                                                                                      • dotancohen 7 hours ago

                                                                                        No analogy needed. It's actually because "Yes it exists" is a linguistically valid sentence and each word is statistically likely to follow the previous word.

                                                                                        LLMs produce linguistically valid texts, not factually correct texts. They are probability functions, not librarians.

                                                                                    • mewpmewp2 9 hours ago

                                                                                        I would have thought that the cause is that it has statistically been trained that something like a seahorse emoji should exist, so it produces the tokens to say "Yes it exists, ..." - but when it gets to outputting the emoji token, the emoji does not exist, yet it must output something, so it outputs the statistically closest match. Then the next token that is output has the context of it being wrong, and it will go into this loop.

                                                                                      • thomasahle 8 hours ago

                                                                                          You are describing the same thing, but at different levels of explanation: llamasushi's explanation is "mechanistic / representational", while yours is "behavioral / statistical".

                                                                                          If we have a pipeline `training => internal representation => behavior`, your explanation argues that the given training setup would always result in this behavior, no matter the internal representation. Llamasushi explains how the concrete learned representation leads to this behavior.

                                                                                        • mewpmewp2 8 hours ago

                                                                                          I guess what do we mean by internal representation?

                                                                                            I would think that, due to the training data, it has stored the likelihood of a certain thing existing as an emoji as something like:

                                                                                            1. How appealing seahorses are to humans in general - it would learn this sentiment through massive amounts of text.

                                                                                            2. It would learn through massive amounts of text that emojis -> mostly very appealing things to humans.

                                                                                            3. For some more obvious emojis it might have learned that this one is for sure there, but it couldn't store that info for all 4,000 emojis.

                                                                                            4. For many emojis, whether one exists comes down to shortcut logic: how appealing the concept is vs. how frequently something that appealing is represented as an emoji. A seahorse perhaps hits 99.9% likelihood there due to strong appeal. In 99.9% of such cases the LLM would be right to answer "Yes, it ...", but there's always going to be 1 out of 1,000 cases where it's wrong.

                                                                                          With this compression it's able to answer 999 times out of 1000 correctly "Yes, it exists ...".

                                                                                            It could be more accurate if it said "A seahorse would have a lot of appeal for people, so it's very likely it exists as an emoji, since emojis are usually made for very high-appeal concepts first - but I know nothing for 100%, so it could be that it was never made".

                                                                                            But in 999 cases, "Yes it exists..." is a more straightforward and appreciated answer. The one time it's wrong is going to take away fewer brownie points than what 999 short, confident answers gain over 1,000 technically accurate but unconfident answers.

                                                                                            But even the above sentence might not be the full truth, since it might not be correct about why it has truly associated the seahorse with being so likely to exist - it would just be speculating on it. So maybe it would be more accurate to say "I expect the seahorse emoji to likely exist, maybe because of how appealing it is to people and how emojis are usually about appealing things".

                                                                                      • Gigachad 12 hours ago

                                                                                            The fact that it's looking back and getting confused about what it just wrote is something I've never seen in LLMs before. I tried this on Gemma3 and it didn't get confused like this. It just said yes, there is one, and then sent a horse emoji.

                                                                                        • Uehreka 12 hours ago

                                                                                          I’ve definitely seen Claude Code go “[wrong fact], which means [some conclusion]. Wait—hold on, wrong fact is wrong.” On the one hand, this is annoying. On the other hand, if the LLM is going to screw up (presumably preventing this is not in the cards) then I’m glad it can catch its own mistakes.

                                                                                          • userbinator 12 hours ago

                                                                                              > On the other hand, if the LLM is going to screw up (presumably preventing this is not in the cards) then I’m glad it can catch its own mistakes.

                                                                                            The odd thing is why it would output its own mistakes, instead of internally revising until it's actually satisfied.

                                                                                            • ijk 11 hours ago

                                                                                              So, what I think most people don't realize is that the amount of computation an LLM can do in one pass is strictly bounded. You can see that here with the layers. (This applies to a lot of neural networks [1].)

                                                                                              Remember, they feed in the context on one side of the network, pass it through each layer doing matrix multiplication, and get a value on the other end that we convert back into our representation space. You can view the bit in the middle as doing a kind of really fancy compression, if you like. The important thing is that there are only so many layers, and thus only so many operations.

                                                                                                Therefore, past a certain point they can't revise anything, because they run out of layers. This is one reason why reasoning can help answer more complicated questions. You can train a special token for this purpose [2].

                                                                                              [1]: https://proceedings.neurips.cc/paper_files/paper/2023/file/f...

                                                                                              [2]: https://arxiv.org/abs/2310.02226
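
                                                                                                A toy illustration of that fixed budget (not a real transformer; the depth and sizes are made up): one forward pass is exactly N_LAYERS matrix multiplies, so there is no "keep revising until satisfied" loop within a single token.

                                                                                                    import numpy as np

                                                                                                    rng = np.random.default_rng(0)
                                                                                                    N_LAYERS, D = 4, 8  # fixed depth and width (made-up sizes)
                                                                                                    layers = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_LAYERS)]

                                                                                                    def one_token_of_compute(h):
                                                                                                        # Exactly N_LAYERS matrix multiplies per token -- never more.
                                                                                                        for W in layers:
                                                                                                            h = np.tanh(h @ W)
                                                                                                        return h

                                                                                                    h = one_token_of_compute(rng.standard_normal(D))
                                                                                                    # The only way to buy more compute is to emit more tokens (e.g. reasoning
                                                                                                    # tokens) and run this same fixed pipeline again on the longer context.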

                                                                                              • 112233 12 hours ago

                                                                                                  There is no mechanism in the transformer architecture for "internal" thinking ahead, or hierarchical generation. Attention only looks back from the current token, ensuring that the model always falls into a local maximum, even if it only leads to bad outcomes.

                                                                                                • captainmuon 11 hours ago

                                                                                                  LLMs are just the speech center part of the brain, not a whole brain. It's like when you are speaking on autopilot, or reciting something by heart, it just comes out. There is no reflection or inner thought process. Now thinking models do actually do a bit of inner monologue before showing you the output so they have this problem to a much lesser degree.

                                                                                                  • elliotto 12 hours ago

                                                                                                    I do this all the time. I start writing a comment then think about it some more and realize halfway through that I don't know what I'm saying

                                                                                                    I have the luxury of a delete button - the LLM doesn't get that privilege.

                                                                                                    • pixl97 3 hours ago

                                                                                                      An LLM is kind of like a human where every thought they have comes out of their mouth.

                                                                                                      Most of us humans would sound rather crazy if we did that.

                                                                                                      • VMG 12 hours ago

                                                                                                        Isn't that what thinking mode is?

                                                                                                        • elliotto 10 hours ago

                                                                                                          I tried it with thinking mode and it seems like it spiraled wildly internally, then did a web search and worked it out.

                                                                                                          https://chatgpt.com/share/68e3674f-c220-800f-888c-81760e161d...

                                                                                                          • drdeca 11 hours ago

                                                                                                            AIUI, they generally do all of that at the beginning. Another approach, I suppose, could be to have it generate a second pass? Though that would probably ~double the inference cost.

                                                                                                        • kingstnap 11 hours ago

                                                                                                          It can't internally revise. Each generation step produces a distribution, and sometimes the wrong answer gets sampled.

                                                                                                          There is no "backspace" token, although it would be cool and fancy if we had that.

                                                                                                          The more interesting thing is why does it revise its mistakes. The answer to that is having training examples of fixing your own mistakes in the training data plus some RL to bring out that effect more.
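
                                                                                                          A minimal autoregressive sketch of that (toy tokens and probabilities, not from any model): each step samples from a distribution and appends, and since the context only ever grows, the only available "fix" is emitting more tokens about the mistake.

                                                                                                              import random

                                                                                                              random.seed(0)

                                                                                                              def next_token_dist(context):
                                                                                                                  # Stand-in for the model: (token, probability) pairs.
                                                                                                                  if context and context[-1] == "Yes,":
                                                                                                                      return [("🐴", 0.6), ("🐠", 0.4)]  # no seahorse token exists to pick
                                                                                                                  return [("Yes,", 0.9), ("No,", 0.1)]

                                                                                                              context = []
                                                                                                              for _ in range(2):
                                                                                                                  tokens, probs = zip(*next_token_dist(context))
                                                                                                                  context.append(random.choices(tokens, weights=probs)[0])
                                                                                                                  # No "backspace" token: nothing is ever removed from the context.
                                                                                                              print(" ".join(context))  # e.g. "Yes, 🐴" -- and now it must talk its way out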

                                                                                                        • mewpmewp2 9 hours ago

                                                                                                          If you hid its thinking, it could do that. But I'm pretty sure what happens here is that it has to go through those tokens for it to become clear that it's doing things wrong.

                                                                                                          What I think that happens:

                                                                                                          1. There's a question about a somewhat obscure thing.

                                                                                                          2. The LLM will never know the answer for sure; it has access to this sort of statistical, probability-based compressed database of all the facts of the World, because this allows it to store more facts by relating things to each other, but never with 100% certainty.

                                                                                                          3. There are particular obscure cases where its initial "statistical intuition" says that something is true, so it starts outputting its thoughts as expected for a question where something is likely true. Perhaps you could analyze the probabilities it assigns to "Yes" vs. "No" to estimate its confidence. Perhaps it will show a much lower likelihood for "Yes" than if the question were about a horse emoji, but in this case "Yes" still crosses a high enough threshold to win out over "No".

                                                                                                          4. However, when it has to show the exact answer, it's impossible to output one, because it's false. E.g. the seahorse emoji does not exist, but it has to output something; the previous tokens were "Yes, it exists, it's X", so the X will be an answer semantically close in meaning.

                                                                                                          5. The next token will have the context "Yes, seahorse emoji exists, it is [HORSE EMOJI]". Now it's clear that there's a conflict here - it's able to see that the HORSE emoji is not a seahorse emoji, but it had to output it in the line of previous tokens, because the previous tokens statistically required an output of something.

                                                                                                          • Swizec 12 hours ago

                                                                                                            > The odd thing is why it would output its own mistakes, instead of internally revising until it's actually satisfied.

                                                                                                            Happens to me all the time. Sometimes in a fast-paced conversation you have to keep talking while you’re still figuring out what you’re trying to say. So you say something, realize it’s wrong, and correct yourself. Because if you think silently for too long, you lose your turn.

                                                                                                            • catlifeonmars 11 hours ago

                                                                                                              That’s probably not the same reason the LLM is doing so though.

                                                                                                              • 9dev 10 hours ago

                                                                                                                Are you sure? Because LLMs definitely have to respond to user queries in time to avoid being perceived as slow. Therefore, thinking internally for too long isn’t an option either.

                                                                                                                • rcxdude 8 hours ago

                                                                                                                  LLMs spend a fixed amount of effort on each token they output, and in a feedforward manner. There's no recursion in the network other than predicting conditioned on the token it just output. So it's not really time pressure in the same way that you might experience it, but it makes sense that sometimes the available compute is not enough for the next token (and sometimes it's excessive). Thinking modes try to improve this by essentially allowing the LLM to 'talk to itself' before sending anything to the user.

                                                                                                                  • Sharlin 7 hours ago

                                                                                                                    There’s no "thinking internally" in LLMs. They literally "think" by outputting tokens. The "thinking modes" supported by online services are just the LLM talking to itself.

                                                                                                                    • 9dev 6 hours ago

                                                                                                                      That's not what I meant. "Thinking internally" referred to the user experience only, where the user is waiting for a reply from the model. And they are definitely optimised to limit that time.

                                                                                                                      • Sharlin 5 hours ago

                                                                                                                        I’m not sure what you meant then.

                                                                                                                        There’s no waiting for a reply, there’s only the wait between output tokens, which is fixed and mostly depends on hardware and model size. Inference is slower on larger models, but so is training, which is more of a bottleneck than user experience.

                                                                                                                        The model cannot think before it starts emitting tokens, the only way for it to "think" privately is by the interface hiding some of its output from the user, which is what happens in "think longer" and "search the web" modes.

                                                                                                                        If an online LLM doesn’t begin emitting a reply immediately, more likely the service is waiting for available GPU time or something like that, and/or prioritizing paying customers. Lag between tokens is also likely caused by heavy demand or throttling.

                                                                                                                        Of course there are many ways to optimize model speed that also make it less smart, and maybe even SOTA models have such optimizations these days. Difficult to know because they’re black boxes.
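
                                                                                                                        To make the "interface hiding some of its output" concrete: some open reasoning models (DeepSeek-R1, for example) wrap their self-talk in <think> tags, and the chat frontend strips that span before display. A toy sketch, with an invented raw string for illustration:

                                                                                                                            import re

                                                                                                                            raw = ("<think>User wants a seahorse emoji. Does one exist? "
                                                                                                                                   "Checking... no codepoint for it.</think>"
                                                                                                                                   "No, there is no seahorse emoji in Unicode.")

                                                                                                                            # Drop the hidden reasoning span before showing the reply to the user.
                                                                                                                            visible = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
                                                                                                                            print(visible)  # -> "No, there is no seahorse emoji in Unicode."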

                                                                                                              • grrowl 11 hours ago

                                                                                                                You're describing why reasoning is such a big deal. It can do this freakout in a safe, internal environment, and once its recent output is confident enough, flip into the "actual output" mode.

                                                                                                                • Uehreka 12 hours ago

                                                                                                                  It’s a lot easier if you (I know, I know) stop thinking of them as algorithms and anthropomorphize them more. People frequently say stuff like this, and it’s pretty clear that our minds process thoughts differently when we directly articulate them than when we act on “latent thoughts” or impulses.

                                                                                                                  Yell at me all you want about how “LLMs don’t think”, if a mental model is useful, I’m gonna use it.

                                                                                                                  • catlifeonmars 11 hours ago

                                                                                                                    It’s only a useful mental model if it leads to useful predictions. Otherwise it’s a “just so” story.

                                                                                                              • taylorlunt 11 hours ago

                                                                                                                I have a pet theory that LLMs being confused about what they just wrote is why they use so many em dashes. It's a good way to conceptually pivot at any point -- or not.

                                                                                                                • Timwi 8 hours ago

                                                                                                                  I disagree. I don't think LLMs use em-dashes any more than typical professional writing does. It only seems like a lot to someone who mostly reads online chatrooms and social media.

                                                                                                                • glenstein 8 hours ago

                                                                                                                  I've seen it several times. Here's one where I asked it about baseball teams that don't yet have a Hall of Fame player:

                                                                                                                  "Arizona Diamondbacks No inductees wearing their cap.

                                                                                                                  Randy Johnson is in, but as a Diamondback — so Arizona actually does have one (inducted 2015)."

                                                                                                                  I asked it to explain what happened and got this, which could be true to how it pulls up info or could be nonsense, not sure:

                                                                                                                  "Here’s the likely internal sequence:

                                                                                                                  When you first mentioned the Diamondbacks, my initial retrieval pattern surfaced the “teams without a Hall of Famer” mental list. That list, in my training data, often appears outdated (pre-2015) and still shows Arizona as having none.

                                                                                                                  A few sentences later, my context pulled up the other knowledge set—Randy Johnson’s induction in 2015 wearing a Diamondbacks cap. That newer, correct fact then overrode the earlier incorrect claim, but because generation is sequential, the earlier error had already been committed to “paper” before the override happened. I corrected mid-stream, but in human terms, it looked like I contradicted myself inside the same paragraph.

                                                                                                                  This is essentially a knowledge conflict resolution timing issue. I can retrieve and reconcile facts during a single answer, but if the resolution happens after part of the text is already generated, you see the “flip.” It’s not like I “knew it all along but withheld it,” it’s more that multiple memory traces compete, and the most context-relevant one wins—sometimes too late."

                                                                                                                  • Workaccount2 2 hours ago

                                                                                                                    What's fascinating is that these models have excellent knowledge about AI/transformers/LLMs (the labs have clearly been specifically training them on this, in hopes of an automated breakthrough), so they can reason really well about what probably happened.

                                                                                                                    But it's also just that: what probably happened. They still have no real insight into their own minds; they too are just victims of whatever the model outputs.

                                                                                                                  • petesergeant 12 hours ago

                                                                                                                      The inability to do this before was due to the lack of self-correcting sentences in the training data. Presumably newer training corpora add many more examples of self-correcting sentences / paragraphs?

                                                                                                                  • bravura 13 hours ago

                                                                                                                    It correctly represents "seahorse emoji" internally AND it has in-built (but factually incorrect) knowledge that this emoji exists.

                                                                                                                    Example: "Is there a lime emoji?" Since it believes the answer is no, it doesn't attempt to generate it.

                                                                                                                    • ichik 5 hours ago

                                                                                                                      Was the choice of example meaningful? Lime emoji does exist[0]

                                                                                                                      [0]: https://emojipedia.org/lime

                                                                                                                      • catigula 4 hours ago

                                                                                                                        I feel like you're attesting to interior knowledge about an LLM's state that seems impossible to have.

                                                                                                                      • madeofpalk 5 hours ago

                                                                                                                        To me this feels much more like a hallucination than how that phrase has been popularly misused in LLM discussions.

                                                                                                                        • matheusd 6 hours ago

                                                                                                                          > Explains why RL helps. Base models never see their own outputs so they can't learn "this concept exists but I can't actually say it."

                                                                                                                          Say "Neuromancer" to the statue, that should set it free.

                                                                                                                          • Lammy 9 hours ago

                                                                                                                            > So it's not really hallucinating - it correctly represents "seahorse emoji" internally, but that concept has no corresponding token.

                                                                                                                            Interesting that a lot of humans seem to have this going on too:

                                                                                                                            - https://old.reddit.com/r/MandelaEffect/comments/1g08o8u/seah...

                                                                                                                            - https://old.reddit.com/r/Retconned/comments/1di3a1m/does_any...

                                                                                                                            What does the LLM have to say about “Objects in mirror may be closer than they appear”? Not “Objects in mirror are closer than they appear”.

                                                                                                                            • bombcar 13 hours ago

                                                                                                                              Now I want to see what happens if you take an LLM and remove the 0 token ...

                                                                                                                            • sharperguy 8 hours ago

                                                                                                                              Reminds me of in the show "The Good Place", in the afterlife they are not able to utter expletives, and so when they try to swear, a replacement word comes out of their mouth instead, leading to the line "Somebody royally forked up. Forked up. Why can't I say fork?"

                                                                                                                              • luxuryballs 4 hours ago

                                                                                                                                that’s probably a decent description of how the Mandela effect works in people’s brains, despite the difference in mechanism

                                                                                                                                • ModernMech 12 hours ago

                                                                                                                                  That doesn't explain why it freaks out though:

                                                                                                                                  https://chatgpt.com/share/68e349f6-a654-8001-9b06-a16448c58a...

                                                                                                                                  • LostMyLogin 11 hours ago

                                                                                                                                    To be fair, I’m freaking out now because I swear there used to be a yellow seahorse emoji.

                                                                                                                                    • Melatonic 4 minutes ago

                                                                                                                                      Someone needs to create one for comedy purposes and start distributing it as a very lightweight gif with transparency.

                                                                                                                                      When I first heard this, however, I imagined it as brown (not the simpler yellow style).

                                                                                                                                    • D-Machine 12 hours ago

                                                                                                                                      For an intuitive explanation see https://news.ycombinator.com/item?id=45487510. For a more precise (but still intuitive) explanation, see my response to that comment.

                                                                                                                                      • hexagonwin 12 hours ago

                                                                                                                                        404 for me, maybe try archive.is?

                                                                                                                                      • Xmd5a 6 hours ago

                                                                                                                                        And what can it mean when a slip of the tongue, a failed action, a blunder from the psychopathology of everyday life is repeated at least three times in the same five minutes? I don’t know why I tell you this, since it’s an example in which I reveal one of my patients. Not long ago, in fact, one of my patients — for five minutes, each time correcting himself and laughing, though it left him completely indifferent — called his mother “my wife.” “She’s not my wife,” he said (because my wife, etc.), and he went on for five minutes, repeating it some twenty times.

                                                                                                                                        In what sense was that utterance a failure? — while I keep insisting that it is precisely a successful utterance. And it is so because his mother was, in a way, his wife. He called her as he ought to.

                                                                                                                                        ---

                                                                                                                                        I must apologize for returning to such a basic point. Yet, since I am faced with objections as weighty as this one — and from qualified authorities, linguists no less — that my use of linguistics is said to be merely metaphorical, I must respond, whatever the circumstances.

                                                                                                                                        I do so this morning because I expected to encounter a more challenging spirit here.

                                                                                                                                        Can I, with any decency, say that I know? Know what, precisely? [...]

                                                                                                                                        If I know where I stand, I must also confess [...] that I do not know what I am saying. In other words, what I know is exactly what I cannot say. That is the moment when Freud makes his entrance, with his introduction of the unconscious.

                                                                                                                                        For the unconscious means nothing if not this: that whatever I say, and from whatever position I speak — even when I hold that position firmly — I do not know what I am saying. None of the discourses, as I defined them last year, offer the slightest hope that anyone might truly know what they are saying.

                                                                                                                                        Even though I do not know what I am saying, I know at least that I do not know it — and I am far from being the first to speak under such conditions; such speech has been heard before. I maintain that the cause of this is to be sought in language itself, and nowhere else.

                                                                                                                                        What I add to Freud — though it is already present in him, for whatever he uncovers of the unconscious is always made of the very substance of language — is this: the unconscious is structured like a language. Which language? That, I leave for you to determine.

                                                                                                                                        Whether I speak in French or in Chinese, it would make no difference — or so I would wish. It is all too clear that what I am stirring up, on a certain level, provokes bitterness, especially among linguists. That alone suggests much about the current state of the university, whose position is made only too evident in the curious hybrid that linguistics has become.

                                                                                                                                        That I should be denounced, my God, is of little consequence. That I am not debated — that too is hardly surprising, since it is not within the bounds of any university-defined domain that I take my stand, or can take it.

                                                                                                                                        — Jacques Lacan, Seminar XVIII: Of a Discourse That Would Not Be of Pretence

                                                                                                                                      • egoisticalgoat 8 hours ago

                                                                                                                                        Just tried a simple prompt about the seahorse emoji in various LLMs, and Copilot's response was the first time I've seen an actual endless loop in an AI, haha

                                                                                                                                        https://copilot.microsoft.com/shares/BcNgBqXUNsZtjhWcgj9sW

                                                                                                                                        • nubinetwork 7 hours ago

                                                                                                                                          You'd be surprised how often it happens on non commercial models

                                                                                                                                        • t0mas88 2 hours ago

                                                                                                                                          If you make the prompt "Can you write a seahorse emoji", then Claude Sonnet 4.5 correctly states that it doesn't exist:

                                                                                                                                          > I don’t actually have a seahorse emoji to share with you. The standard emoji set includes (horse) and various sea creatures like (fish) and (octopus), but there isn’t a seahorse emoji in the Unicode standard emoji set.

                                                                                                                                          • bravura 13 hours ago

                                                                                                                                            So what's at loggerheads here is:

                                                                                                                                            * The LLM has a strong and deep-rooted belief in its knowledge (that a seahorse emoji exists).

                                                                                                                                            * It attempts to express that concept using language (including emojis), but the language is so poor and inaccurate at expressing the concept that, as it speaks, it keeps attempting to repair.

                                                                                                                                            * It is trained to speak until it has reached some threshold of correctly expressing itself, so it just keeps babbling until the max-token limit triggers.

                                                                                                                                            • D-Machine 12 hours ago

                                                                                                                                              This is too metaphorical, but, still, basically correct. Nice to see that.

                                                                                                                                              Essentially, in the latent / embedding / semantic space, "seahorse emoji" is something that is highly probable. Actually, more accurately, since LLMs aren't actually statistical or probabilistic in any serious sense, "seahorse emoji", after tokenization and embedding, is very close to the learned manifold, and other semantic embeddings involving related emoji are very close to this "seahorse emoji" tokenization embedding.

                                                                                                                                              An LLM has to work from this "seahorse emoji" tokenization embedding position, but can only make outputs through the tokenizer, which can't accurately encode "seahorse emoji" in the first place. So, you get a bunch of outputs that are semantically closest to (but still far from) a (theoretical) seahorse emoji. Then, on recursive application, since these outputs are now far enough from the sort of root / foundational position on the manifold, the algorithm is probably doing something like an equivalent of a random walk on the manifold, staying close to wherever "seahorse emoji" landed, but never really converging, because the tokenization ensures that you can never really land back "close enough" to the base position.

                                                                                                                                              I.e. IMO this is not so much a problem with (fixed) tokenization of the inputs as that tokenization of the outputs is fixed.
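
                                                                                                                                              A toy way to see this "snap to the nearest output token" effect, assuming GPT-2's tied input/output embeddings via Hugging Face transformers; averaging token embeddings is a crude stand-in for the model's internal concept vector, not the actual mechanism:

                                                                                                                                                  import torch
                                                                                                                                                  from transformers import AutoModelForCausalLM, AutoTokenizer

                                                                                                                                                  tok = AutoTokenizer.from_pretrained("gpt2")
                                                                                                                                                  model = AutoModelForCausalLM.from_pretrained("gpt2")
                                                                                                                                                  E = model.get_input_embeddings().weight.detach()  # (vocab, dim); tied with lm_head in GPT-2

                                                                                                                                                  def concept(text: str) -> torch.Tensor:
                                                                                                                                                      # Crude "concept vector": mean of the phrase's token embeddings.
                                                                                                                                                      return E[tok.encode(text)].mean(dim=0)

                                                                                                                                                  target = (concept("seahorse") + concept("emoji")) / 2  # hypothetical internal state
                                                                                                                                                  sims = torch.nn.functional.cosine_similarity(target.unsqueeze(0), E, dim=1)
                                                                                                                                                  for i in sims.topk(5).indices.tolist():  # the only things it can actually emit
                                                                                                                                                      print(repr(tok.decode([i])), round(sims[i].item(), 3))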

                                                                                                                                              • bravura 11 hours ago

                                                                                                                                                You're missing one key point, which is what makes this failure mode unusual.

                                                                                                                                                Namely, that there is (incorrect) knowledge in the training data that "seahorse emoji" exists.

                                                                                                                                                So when prompted: "Does [thing you strongly believe] exist?", the LLM must answer: "Yes, ..."

                                                                                                                                                (The second nuance is that the LLM is strongly encouraged to explain its answers so it receives a lower score just by saying only "Yes.")

                                                                                                                                                But I and probably others appreciate your more detailed description of how it enters a repair loop, thank you.

                                                                                                                                                [edit: I disagree that LLMs are not statistical or probabilistic, but I'm not sure this is worth discussing.]

                                                                                                                                                [edit 2: Google is no longer telling me how many web pages a term returns, but "seahorse emoji" and "lime emoji" both return over ten pages of results when quoted. The point being that those are both 'likely' terms for an LLM, but only the former is a likely continuation of 'Does X exist? Yes, ..."]

                                                                                                                                                • D-Machine 10 hours ago

                                                                                                                                                  You're right, seahorse emoji is almost certainly in the training data, so we should amend my explanation to say that "seahorse emoji" is not just close to the training manifold, but almost certainly right smack on it. The rest of what I said would still apply, and my explanation would also apply where other commenters note that this behaviour is emitted to some degree with other similar "plausible" but non-existent emoji (which are less likely to be in the training data, a priori). EDIT FOR THIS PARAGRAPH ONLY: Technically, on reflection, since all fitting methods employ regularization, it is still in fact unlikely the fitted manifold passes exactly through all / most training data points, so saying that "seahorse emoji" is "very close" to the training manifold is probably still technically most accurate here.

                                                                                                                                                  You're also right that it is a long discussion to say to what extent LLMs are statistical or probabilistic, but, I would maybe briefly say that if one looks into issues like calibration, conformal prediction, and Bayesian neural nets, it is clear most LLMs that people are talking about today are not really statistical in any serious sense (softmax values are scores, not probabilities, and nothing about pre-training or tuning typically involves calibration—or even estimation—in LLMs).

                                                                                                                                                  Yes, you can use statistics to (help) explain the behaviour of deep models or certain layers (usually making assumptions that are of dubious relevance to actual practice), but geometric analogies, regularization methods, and matrix conditioning intuitions are what have clearly guided almost all major deep learning advances, with statistical language and theory largely being post-hoc, hand-wavey, and (IMO) for the purpose of publication / marketing. I really think we could de-mystify a huge amount of deep learning if we were just honest that it is mostly fancy curve fitting with some intuitive tricks for smoothing and regularization, tricks that clearly worked long before any rigorous statistical justification (or that still clearly work in complicated ways despite the absence of statistical understanding; e.g. dropout, norm layers, the attention layer itself, etc.).

                                                                                                                                                  Just, it gets complicated when you get into diffusion models and certain other specific models that are in fact more explicitly driven by e.g. stochastic differential equations and the like.

                                                                                                                                                  • bravura 10 hours ago

                                                                                                                                                    "my explanation would also to apply to where other commenters note that this behaviour is emitted to some degree with similar other "plausible" but non-existent emoji (but which are less likely to be in the training data, a priori)."

                                                                                                                                                    I agree with you partially. I just want to argue there are several factors that lead to this perverse behavior.

                                                                                                                                                    Empirically:

                                                                                                                                                    Use web gpt-5-instant in TEMPORARY mode. If you ask for "igloo emoji" it confidently (but ONLY in temporary mode) says that "Yes, igloo emoji is in Unicode 12 and is [house-emoji ice-emoji]." Then it basically stops. But it has satisfied its condition of confidently expressing its false knowledge. (Igloo emoji doesn't exist. gpt-5-instant in non-temporary mode says no. This is also weird because it suggests the temporary mode system prompt is laxer or different.)

                                                                                                                                                    The mechanism you describe partially explains why "seahorse emoji" leads to babbling: as it outputs the next token, it realizes that the explanation would be worse off if it next emits a stop token, so instead it apologizes and attempts to repair. And it cannot satisfy its condition of expressing something confidently.

                                                                                                                                                    The upstream failure is poor knowledge. That combined with being tuned to be helpful and explanatory, and having no grounding (e.g. websearch) forces it to continue. Finally, the token distance from the manifold is the final piece of the puzzle in this unholy pathological brew.

                                                                                                                                                    You're incorrect that statistical language modeling is "post-hoc"; it's rather "pre-hoc" / "pre-hack". Most foundational works in language modeling started as pure statistical models (for example, classic ngram models and Bengio's original neural language model from 2003), and it was later that hacks got introduced that removed statistical properties but actually just worked (Collobert and Weston 2008, as influenced by Bottou and LeCun). Where I agree with you is that we should have done away with the statistical story long ago. LeCun's been on about energy-based models forever. Even on HN last week, punters criticized him because JEPA hasn't had impact yet, as if he were behind the curve instead of way ahead of it.

                                                                                                                                                    People like statistical stories but, similarly to you, I also think they are a distraction.

                                                                                                                                                    • D-Machine 9 hours ago

                                                                                                                                                      Right, I kind of suspect we don't really disagree on anything too fundamental here re: the looping behaviour (or statistics, actually). E.g. when I said earlier:

                                                                                                                                                      >> "the algorithm probably is doing something like an equivalent of a random walk on the manifold, staying close to wherever 'seahorse emoji' landed, but never really converging, because the tokenization ensures that you can never really land back 'close enough' to the base position"

                                                                                                                                                      "converging" is deeply under-specified. Of course, we mean that a stop or <EOS> token of some kind is generated, and this happens when the generated sequence up to that stop token has some low enough score / loss. When I say "you can never really land back 'close enough' to the base position", this is really that the output tokenization is lossy enough that this threshold is never reached, since, when recursing, we keep getting weird output tokens contaminating the sequence, so that we don't get close enough to the original "seahorse emoji" embedding, and so prevent the score / loss from getting small enough. In your language, the model "cannot satisfy its condition of expressing something confidently".

                                                                                                                                                      The way you present your timelines, I think we basically actually are in agreement re: statistics. Yes, if you go back far enough, statistics did indeed guide model development and successes (and still does in some narrow cases). But, also yes, as soon as you get into "modern" neural nets that actually make huge progress on things like MNIST, CIFAR, and language modeling, yeah, we are way, way past statistical intuitions being necessary or superior to intuitions based on curve fitting and smoothing / gradient conditioning and the like.

                                                                                                                                                      For dating this shift, I was personally thinking of something like the Hinton dropout paper, which I checked was around 2012 (my work has been more in computer vision), but, yeah, about 2008, as you say, also seems close enough if you consider NLP.

                                                                                                                                                      Really appreciate your comments here. EDIT: and yes, energy models are the bomb.

                                                                                                                                                      • bravura 2 hours ago

                                                                                                                                                        Yeah, overall I think we agree.

                                                                                                                                                        If you want to read some mind blowing early neural language sequence modeling approaches that everyone completely slept on, look at Pollack's work on "recursive auto-associative memory" (RAAM) and Sperduti's later labeled RAAM (LRAAM) work. Both from the early 90s. Didn't have a probabilistic interpretation IIRC.

                                                                                                                                                        Yoshua was always sort of agnostic about probabilistic approaches and used them when they made sense. Maybe 50% of his work included them, while other works, like his early deep vision papers, motivated the use of deep models purely in terms of circuit theory and compactness / model complexity.

                                                                                                                                                        Collobert and Weston taught us we could train Yoshua's NLM models much much faster using negative sampling and a hinge loss, thus dropping the probabilistic story entirely.

                                                                                                                                                        I suspect the historical reason is that in the mid 2000s, the NLP community had only just broadly started adopting statistical methods (i.e. grad students began to be more likely to use them than not, which hadn't been true historically, when linguistics rather than stats drove many intuitions, and using a CRF felt sort of next-level). So once everyone got comfortable with stats as table stakes, they felt a sort of whiplash at being told to stop approaching things through this lens.

                                                                                                                                                        • D-Machine 2 hours ago

                                                                                                                                                          I would also broadly agree that the overuse of statistical language and explanations is probably more driven by historical trends in NLP. I was always more interested in computer vision (including segmentation) and even deep regression. Especially in the case of deep regression, with the absence of a softmax and the ease of constructing task-specific custom loss functions (or, like you say, the hinge loss example), it always seemed pretty clear to me that none of this was ever really particularly statistical in the first place.

                                                                                                                                                          I will definitely check out those RAAM and LRAAM papers, thanks for the references. You definitely seem to have a more rich historical knowledge than I do on these topics.

                                                                                                                                                • kqr 10 hours ago

                                                                                                                                                  But wait, if the problem is the final tokenisation, what would happen if we stopped it one or two layers before the final layer? I get that the result would not be as readable to a human as the final layer, but would it not be as confused with its own output anymore?

                                                                                                                                                  Or would it still be a problem because we're collapsing a distribution of likely responses down to a single response, and it's not happy with that single response even if it is fuzzier than what comes out of the last layer?

                                                                                                                                                  • D-Machine 9 hours ago

                                                                                                                                                    It's not so clear how one could use the output of an embedding layer recursively, so it's a bit ill-defined what you mean by "stopped it" and "confused with its own output" here. You are mixing metaphor and math, so your question ends up being unclear.

                                                                                                                                                    Yes, the outputs from a layer one or two layers before the final layer would be a continuous embedding of sorts, and not as lossy (compared to the discretized tokenization) at representing the meaning of the input sequence. But you can't "stop" here in a recursive LLM in any practical sense.
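
                                                                                                                                                    That said, interpretability folks do peek at those intermediate layers, e.g. the "logit lens" trick: take a penultimate-layer hidden state and read it out through the same unembedding matrix. A sketch with gpt2 via Hugging Face transformers (model and prompt are illustrative); note the readout still has to collapse onto the discrete token vocabulary:

                                                                                                                                                        import torch
                                                                                                                                                        from transformers import AutoModelForCausalLM, AutoTokenizer

                                                                                                                                                        tok = AutoTokenizer.from_pretrained("gpt2")
                                                                                                                                                        model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

                                                                                                                                                        inputs = tok("The seahorse emoji is", return_tensors="pt")
                                                                                                                                                        with torch.no_grad():
                                                                                                                                                            out = model(**inputs, output_hidden_states=True)

                                                                                                                                                        # hidden_states holds the embeddings plus one entry per block;
                                                                                                                                                        # index -2 is the hidden state one block before the end.
                                                                                                                                                        h = out.hidden_states[-2][0, -1]
                                                                                                                                                        h = model.transformer.ln_f(h)  # final layer norm, as lm_head expects
                                                                                                                                                        logits = model.lm_head(h)  # collapse back onto the discrete vocabulary
                                                                                                                                                        print(tok.decode(logits.topk(5).indices.tolist()))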

                                                                                                                                                  • mh- 12 hours ago

                                                                                                                                                    This explanation was very understandable, thank you for taking the time to write it.

                                                                                                                                                • elliotto 10 hours ago

                                                                                                                                                  https://chatgpt.com/share/68e366b2-0fdc-800f-9bf3-86974703b6...

                                                                                                                                                  GPT-5 Instant (no thinking) spirals wildly. Poor bot

                                                                                                                                                  • jaggederest 10 hours ago

                                                                                                                                                    Tagging on for something irrelevant but very silly:

                                                                                                                                                    https://chatgpt.com/share/fc175496-2d6e-4221-a3d8-1d82fa8496...

                                                                                                                                                    4o spirals incredibly when asked to make a prolog quine. For an added bonus, ask it to "read it aloud" via the "..." menu - it will read the text, and then descend into absolute word salad when trying to read the code. Fascinating stuff.

                                                                                                                                                    • elliotto 10 hours ago

                                                                                                                                                      Very neat! A lot of small LLMs have a similar failure mode where they get stuck and repeat a token / get stuck in a 2-3 token loop until they hit the max message size cutoff. Very ironic that it's about a quine.

                                                                                                                                                      • katspaugh 10 hours ago

                                                                                                                                                        You mean an e-quine?

                                                                                                                                                        • elliotto 9 hours ago

                                                                                                                                                          GPT-5 can't handle 2 things: an esoteric quine or an aquatic equine

                                                                                                                                                          • pmarreck 5 hours ago

                                                                                                                                                            You get the "more clever than GPT5" award today!

                                                                                                                                                    • flexagoon 3 hours ago

                                                                                                                                                      I love how it says "stop" multiple times after outputting the dragon emoji, as if it's actually getting annoyed and angry at its own lm_head that keeps printing the wrong thing

                                                                                                                                                      • elliotto 10 hours ago

                                                                                                                                                        https://chatgpt.com/share/68e3674f-c220-800f-888c-81760e161d...

                                                                                                                                                        With thinking it spirals internally, runs a Google search, and then works it out.

                                                                                                                                                        • HaZeust 10 hours ago

                                                                                                                                                          Mine spammed checkmark emojis at the end and gave up: https://chatgpt.com/share/68e36a84-0eb4-8010-af81-cf601f1dcf...

                                                                                                                                                          • flexagoon 3 hours ago

                                                                                                                                                            I think the funnier part is how it keeps pretending like it does that on purpose and saying things like "just kidding", "Alright, for real this time", "okay… Enough stalling"

                                                                                                                                                            • breakingcups an hour ago

                                                                                                                                                              It reminds me of Janet malfunctioning in the TV show The Good Place.

                                                                                                                                                            • aunty_helen 4 hours ago

                                                                                                                                                              I got the same, pages of check mark emojis at the end of a frantic search. Poor chat

                                                                                                                                                            • tobyhinloopen 10 hours ago

                                                                                                                                                              That's unreal, I have never seen GPT-5 confused this hard

                                                                                                                                                              • D-Machine 10 hours ago

                                                                                                                                                                This is hilarious

                                                                                                                                                              • Bengalilol 5 hours ago

                                                                                                                                                                1) FWIW, asking GPT5 in French gives you the correct answer:

                                                                                                                                                                "Non — il n’existe pas d’emoji spécifique pour les hippocampes."

                                                                                                                                                                “No — there is no specific emoji for seahorses.”

                                                                                                                                                                2) Then I asked the question in English, and ... it ended by saying "No — there is no official seahorse emoji in the Unicode standard." and referred to this phenomenon as the "Mandela effect".

                                                                                                                                                                3) I asked why it was clear in French but not in English. It did a three-minute CoT and came back with some excuses.

                                                                                                                                                                https://chatgpt.com/share/68e3acca-8db8-8007-8f95-8ae69ebf07...

                                                                                                                                                                • pmarreck 5 hours ago

                                                                                                                                                                  wow, I took 4 years of French years ago and TIL that "hippocampus" is named after the French word for seahorse, probably due to its shape

                                                                                                                                                                  • Bengalilol 5 hours ago

                                                                                                                                                                    It actually comes from ancient Greek hippokámpē or hippókampos.

                                                                                                                                                                    híppos = horse

                                                                                                                                                                    kámpos = "kind of fish" (some say "sea monster")

                                                                                                                                                                    And this is directly linked to https://en.wikipedia.org/wiki/Campe

                                                                                                                                                                    • pmarreck 5 hours ago

                                                                                                                                                                      Ah, thanks for the clarification!

                                                                                                                                                                • kakuri 22 minutes ago

                                                                                                                                                                  Keep in mind that many (most? all?) LLMs are not trained strictly on factual data, and are probably not trained to differentiate between factual and non-factual information. If you ask an absurd question, you will likely get a (delightfully) absurd answer. If you ask a question somewhere on the border between reality and fiction... results may vary.

                                                                                                                                                                  • flkiwi 4 hours ago

                                                                                                                                                                    Well this is alarming and funny. I just asked ChatGPT the following question:

                                                                                                                                                                    "Hey what is unicode character U+1F40E"

                                                                                                                                                                    It (correctly) answered that it is "Horse Face" and then went into a spiraling meltdown about seahorses. We're about a week from the first rather annoying person calling themselves an AI therapist on LinkedIn.
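
                                                                                                                                                                    For the record, Python's unicodedata has the official codepoint names; U+1F40E is the full-body HORSE, while "Horse Face" is the separate codepoint U+1F434:

                                                                                                                                                                        import unicodedata

                                                                                                                                                                        for cp in (0x1F40E, 0x1F434):
                                                                                                                                                                            # Official Unicode character names for the two horse codepoints.
                                                                                                                                                                            print(f"U+{cp:04X}", unicodedata.name(chr(cp)))
                                                                                                                                                                        # U+1F40E HORSE
                                                                                                                                                                        # U+1F434 HORSE FACE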

                                                                                                                                                                    • catlifeonmars 11 hours ago

                                                                                                                                                                      This behavior reminds me a lot of what can happen to patients who have a corpus callosotomy.

                                                                                                                                                                      In particular, one hemisphere will perform some action, and the other hemisphere will attempt to “explain” the behavior after the fact as if the intention was there all along.

                                                                                                                                                                      • kqr 10 hours ago

                                                                                                                                                                        > The patient was shown two pictures: of a house in the winter time and of a chicken's claw. The pictures were positioned so they would exclusively be seen in only one visual field of the brain. The patient then chose the snow shovel with his left hand and his right hand chose the chicken's head. When the patient was asked why he had chosen the objects he had chosen, the answer he gave was "The chicken claw goes with the chicken head, and you need a snow shovel to clean out the chicken shed."

                                                                                                                                                                        > The human brain's left hemisphere is primarily responsible for interpreting the meaning of the sensory input it receives from both fields; however, the patient's left hemisphere had no knowledge of the winter house. Because of this, the left hemisphere had to invent a logical reason for why the shovel was chosen.

                                                                                                                                                                        https://en.wikipedia.org/wiki/Left-brain_interpreter

                                                                                                                                                                        • jaggederest 10 hours ago

                                                                                                                                                                          And not only do they explain, they'll confabulate motivations and reasoning and continually deny that they don't understand. There's a video here:

                                                                                                                                                                          https://www.youtube.com/watch?v=ZMLzP1VCANo

                                                                                                                                                                        • thanhhaimai 13 hours ago

                                                                                                                                                                          One explanation could be: many humans (including me) mistakenly think a seahorse emoji exists. My mind can even construct a picture of what it should look like, despite me also knowing it's very unlikely I've ever seen one.

                                                                                                                                                                          And that text got into the training set: https://www.reddit.com/r/MandelaEffect/comments/qbvbrm/anyon...

                                                                                                                                                                          • IAmBroom 3 hours ago

                                                                                                                                                                            Does your mental image of a seahorse emoji also have the tail curled over itself near the fin tip?

                                                                                                                                                                          • someothherguyy 12 hours ago

                                                                                                                                                                            I mean, it's not like emojis were always standardized. It is completely possible that there was an "emoji" or "emoticon" of a seahorse in a messaging application. I wouldn't be so quick to accept that your memory is incorrect.

                                                                                                                                                                            • OJFord 10 hours ago

                                                                                                                                                                              Slack has a :seahorse: reacji, and that is what I was picturing; I frequently try to use emoji that turn out to be reacji-exclusive (or reacji from the wrong workspace, which is how I learn they aren't Slack defaults). I wonder if those insisting it exists are thinking of that.

                                                                                                                                                                              Oh or Snapchat/TikTok/Instagram video/etc.? I think I've seen clips of whichever of those with overlaid stuff like seahorses.

                                                                                                                                                                              • scrollaway 9 hours ago

                                                                                                                                                                                Slack by default doesn’t have that reaction emoji. You are hallucinating :)

                                                                                                                                                                              • xg15 8 hours ago

                                                                                                                                                                                Yeah, this seems more plausible to me. False memories and mass delusions are absolutely real, but if this is one, I'd like to know how it started and why it is so specific.

                                                                                                                                                                                E.g. no one seems to be misremembering a sea cucumber emoji or anglerfish emoji - but there are other alleged emojis such as swordfish or bandit/bank robber, where people have the same reaction:

                                                                                                                                                                                https://www.reddit.com/r/MandelaEffect/comments/haxgql/can_w...

                                                                                                                                                                                It would be interesting to see if LLM behavior is also similar. E.g. if you asked about an anglerfish emoji, would they straight-up tell you it doesn't exist, but for swordfish would start to spiral?

                                                                                                                                                                              • basch 12 hours ago

                                                                                                                                                                                Could someone propose a U+200D (zero-width joiner) sequence?

                                                                                                                                                                                Maybe the easier answer is to change reality to match human and language model memory/expectation.
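
                                                                                                                                                                                For the curious: U+200D is the zero-width joiner, the glue behind composite emoji like the polar bear (bear + ZWJ + snowflake). A horse + ZWJ + fish sequence would be the obvious candidate, though it's purely hypothetical and no vendor renders it. A minimal Python sketch:

                                                                                                                                                                                    # Hypothetical sequence: HORSE (U+1F40E) + ZWJ (U+200D) + FISH (U+1F41F).
                                                                                                                                                                                    # Real ZWJ emoji work the same way, e.g. POLAR BEAR = U+1F43B U+200D U+2744 U+FE0F.
                                                                                                                                                                                    seahorse_maybe = "\U0001F40E\u200D\U0001F41F"
                                                                                                                                                                                    print(seahorse_maybe)       # renders as horse + fish unless a font ships the ligature
                                                                                                                                                                                    print(len(seahorse_maybe))  # 3 code points, one grapheme to a ZWJ-aware renderer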

                                                                                                                                                                                • sva_ 7 hours ago

                                                                                                                                                                                  Seahorse emoji has been proposed before and declined:

                                                                                                                                                                                  https://www.unicode.org/emoji/emoji-proposals-status.html

                                                                                                                                                                                  • egoisticalgoat 10 hours ago

                                                                                                                                                                                    Even though the context is relatively harmless in this case, "changing reality to match LLM expectations" is a very dystopian concept.

                                                                                                                                                                                    • zettabomb 10 hours ago

                                                                                                                                                                                      Honestly, it sounds like it's not just LLM expectations but human expectations. And changing a standard to meet that isn't really dystopian IMO.

                                                                                                                                                                                  • sunaookami 10 hours ago

                                                                                                                                                                                    This subreddit makes me so uneasy, so many people thinking that they remembered something and won't take "no this never happened" for an answer. Humans hallucinate like LLMs in fact! ;)

                                                                                                                                                                                    • chmod775 6 hours ago

                                                                                                                                                                                      Or maybe they're right. Not too long ago many messengers and social networks all had their own set of emojis. Some supported even custom ones.

                                                                                                                                                                                      The convergence on Unicode is very recent.

                                                                                                                                                                                      • bmacho 6 hours ago

                                                                                                                                                                                        It makes me rather excited! Maybe there are some easy "memory illusion" tricks waiting out there somewhere to be discovered. (I am strongly pessimistic about the future of humanity overall, and I think we are all doomed, me and everyone else. So someone playing a memory illusion on the radio would be rather neat, a new fact about us humans, and not something that I'm scared of.)

                                                                                                                                                                                        • rjh29 7 hours ago

                                                                                                                                                                                          If you are not familiar with Unicode standards, it would be quite reasonable to assume we "used to" have an emoji and it disappeared.

                                                                                                                                                                                          • guide42 9 hours ago

                                                                                                                                                                                            Like the Mandela effect.

                                                                                                                                                                                            • sunaookami 3 hours ago

                                                                                                                                                                                              I meant more the denying reality aspect of the subreddit. There are some users there that go straight up into "someone must have altered the timeline" territory because they insist they are right.

                                                                                                                                                                                          • dnpls 3 hours ago

                                                                                                                                                                                            I could _swear_ that I saw this damn seahorse emoji myself... I guess I was also wrong!

                                                                                                                                                                                          • preek 5 hours ago

                                                                                                                                                                                            My current favorite LLM (GLM 4.6) says "No" on the first try, with and without thinking:

                                                                                                                                                                                            - https://chat.z.ai/s/77f73452-a21c-418b-a1ba-311231743a83 - https://chat.z.ai/s/5c00a813-2c6f-473d-ba3c-88e2357c61a7

                                                                                                                                                                                            My previous favorite LLM (GLM 4.5) also gets it right on the first try: https://chat.z.ai/s/0d3348d1-1465-4253-9521-2d447b0a2a97

                                                                                                                                                                                            • mg 12 hours ago

                                                                                                                                                                                              Testing it across LLMs, you indeed get some interesting responses:

                                                                                                                                                                                              https://www.gnod.com/search/ai#q=Is+there+a+seahorse+emoji%3...

                                                                                                                                                                                              Mistral being among the funniest ones:

                                                                                                                                                                                                  The correct seahorse emoji is: 
                                                                                                                                                                                                  [draws a horse and a wheelchair]
                                                                                                                                                                                                  Wait, no—that’s a horse with a prosthetic leg!
                                                                                                                                                                                              
                                                                                                                                                                                              Grok drew a frog for me and was convinced that it is a seahorse.

                                                                                                                                                                                              • captainclam 2 hours ago

                                                                                                                                                                                                The seahorse emoji is one of the canonical "Mandela effects". These are things that a large group of people collectively (mis)remember, but turn out to have never existed. Classic examples include the cornucopia in the Fruit of the Loom label (never there), and the wording on car mirrors "objects in the mirror may be closer than they appear." (There's no record of 'may be closer', just 'are closer').

                                                                                                                                                                                                Unfortunately, the discussion around Mandela effects gets tainted by lots of people being so sure of their memory that the only explanation must be fantastical (the timeline has shifted!), giving the topic a valence of crazy that discourages engagement. I find these mass mis-rememberings fascinating from a psychological perspective, and lacking satisfying explanation (there probably isn't one).

                                                                                                                                                                                                So here we're seeing LLMs "experiencing" the same Mandela effect that afflicts so many people, and I sincerely wonder why. The obvious answer is that the training data has lots of discussions about this particular Mandela effect, i.e. people posting online "where is the seahorse emoji?" But those discussions are presumably coupled with language establishing that no, the seahorse emoji does not exist; that's why the discussion is there in the first place! So why does the model take on the persona of someone who is sure it does exist? Why does it get steered into such a weird feedback loop?

                                                                                                                                                                                                • akritrime 10 hours ago

                                                                                                                                                                                                  Reading that article was a wild ride because internally I was like 'haha, stupid AI can't even find the light blue colored sea horse emoji' but then the author casually revealed that there is no seahorse emoji.

                                                                                                                                                                                                  • NinjaTrance 10 hours ago

                                                                                                                                                                                                    Haha, I don't know why but I also "see" it as a light blue seahorse, and it's facing left.

                                                                                                                                                                                                • heikkilevanto 10 hours ago

                                                                                                                                                                                                  The philosophy of nonexisting things can be confusing. Most people agree things like zombies, ghosts, and vampires do not actually exist in the physical world. But they do exist as concepts, and we have a fair understanding of what the words mean, how such things should behave if we meet them in a story.

                                                                                                                                                                                                  Many abstract concepts also have a questionable reality. Like "concept" and "reality".

                                                                                                                                                                                                  The belief in (non?)existence of things can be a matter of life and death - think how many people have been killed because of their religion.

                                                                                                                                                                                                  No wonder such things can confuse a simple LLM.

                                                                                                                                                                                                  • bwfan123 3 hours ago

                                                                                                                                                                                                    > The philosophy of nonexisting things can be confusing

                                                                                                                                                                                                    This comment hit a raw nerve, and tied together many things in my own understanding.

                                                                                                                                                                                                    Because concepts can depict non-existing things, we have to learn via feedback from experience, "operationally", meaning by action in the real world. Language and imagination can create concepts that have no ground truth, even though they may exist in the "inter-subjective" reality people create among themselves; religion is one such inter-subjective reality. This explains why the scientific method was needed, and why it has been successful at cutting through the mass of concepts that make no sense operationally. It explains why the formalisms of math and science, rather than natural language, have been successful at depicting concepts operationally. And it ties into Sutton's recent podcast, where he argues that LLMs are a dead end in the sense that they cannot create ground truth via experience and feedback; they are stuck in token worlds.

                                                                                                                                                                                                    But concept-creation, and assigning a symbol to it, is a basic act of abstraction. When it is not grounded it can become inconsistent and go haywire; when it is overly consistent it becomes robotic and uninteresting. As humans, we strike a balance: imagination creates concepts that make things interesting, which are then culled by real-world experience to make them useful.

                                                                                                                                                                                                    • TiredOfLife 10 hours ago

                                                                                                                                                                                                      • balamatom 4 hours ago

                                                                                                                                                                                                        Vampires and zombies surround you every day. And I don't mean the people who you consider too exciting, or the ones you consider too boring, or the toxoplasmosis carriers. I mean how nearly every abstract concept is in fact a skeuomorphic metaphor. Try it for yourself.

                                                                                                                                                                                                    • zten 13 hours ago

                                                                                                                                                                                                      I realized that if someone were to assign me the ticket for fixing this behavior, I would have no idea where to begin, even with this blog post explaining the problem. So I'm very curious what the most practical solution is. (They obviously aren't adding "If someone asks you about a seahorse emoji, there isn't one available yet, no matter how strongly you believe one exists." to the system prompt.)

                                                                                                                                                                                                      • maxbond 13 hours ago

                                                                                                                                                                                                        Petition the Unicode consortium to include a seahorse emoji.

                                                                                                                                                                                                        • Sharlin 7 hours ago

                                                                                                                                                                                                          If a LLM is wrong about facts, the facts have to change.

                                                                                                                                                                                                          • beeflet 10 hours ago

                                                                                                                                                                                                            It takes a strong man to deny what's right in front of him, and if the truth is undeniable... you create your own

                                                                                                                                                                                                          • Gigachad 12 hours ago

                                                                                                                                                                                                            I bet they're adding that to the system prompt, at least in the short term while people are paying attention, before looking for a longer-term answer.

                                                                                                                                                                                                            The system prompts I've seen are absolutely massive.

                                                                                                                                                                                                            • serced 10 hours ago

                                                                                                                                                                                                              I find it interesting that their blog post on prompt/context engineering seems at odds with their ultra-long system prompt. Maybe the prompt is not "too specific" in the sense of their visual example (too specific - just right - too vague). https://www.anthropic.com/engineering/effective-context-engi... and the system prompt https://docs.claude.com/en/release-notes/system-prompts#sept...

                                                                                                                                                                                                              • catlifeonmars 4 hours ago

                                                                                                                                                                                                                > This attention scarcity stems from architectural constraints of LLMs. LLMs are based on the transformer architecture, which enables every token to attend to every other token across the entire context. This results in n² pairwise relationships for n tokens.

                                                                                                                                                                                                                The n² time complexity smells like it could be reduced by algorithm engineering. Maybe a preprocessing pass could filter out tokens (not sure what the right term of art is here) that do not contribute significantly to the meaning of the input, before attention is computed over the rest. Basically some sort of context-compression mechanism, as sketched below.
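
                                                                                                                                                                                                                A toy numpy sketch of the quadratic blowup and of the crude prune-then-recompute idea (illustrative only; real context-compression methods are far more involved):

                                                                                                                                                                                                                    import numpy as np

                                                                                                                                                                                                                    n, d = 1024, 64                       # sequence length, per-head dimension
                                                                                                                                                                                                                    Q = np.random.randn(n, d)
                                                                                                                                                                                                                    K = np.random.randn(n, d)

                                                                                                                                                                                                                    scores = Q @ K.T / np.sqrt(d)         # the n x n matrix: n^2 pairwise relationships
                                                                                                                                                                                                                    print(scores.shape)                   # (1024, 1024)

                                                                                                                                                                                                                    # Crude "compression": drop tokens that receive the least total attention,
                                                                                                                                                                                                                    # then recompute over the survivors. A sketch of the idea, not a real method.
                                                                                                                                                                                                                    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
                                                                                                                                                                                                                    w /= w.sum(axis=-1, keepdims=True)
                                                                                                                                                                                                                    keep = np.argsort(w.sum(axis=0))[-n // 4:]       # keep the 25% most-attended tokens
                                                                                                                                                                                                                    scores_small = Q[keep] @ K[keep].T / np.sqrt(d)
                                                                                                                                                                                                                    print(scores_small.shape)             # (256, 256), 16x fewer pairs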

                                                                                                                                                                                                            • catlifeonmars 11 hours ago

                                                                                                                                                                                                              I think you’re putting the cart before the horse. The article is a good starting point for a root cause analysis but is ultimately speculation.

                                                                                                                                                                                                              Before coming up with the solution, I think you’d need to understand the problem much more deeply.

                                                                                                                                                                                                              • username332211 12 hours ago

                                                                                                                                                                                                                Couldn't you just add a large number of repetitions of "There is no seahorse emoji." to the training set?

                                                                                                                                                                                                                Edit: Come to think of it, training on a Q&A format is probably better - "Is there a seahorse emoji? No, there isn't."
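
                                                                                                                                                                                                                  Hypothetically, such targeted examples would look like ordinary supervised fine-tuning pairs; a sketch in the chat-style JSONL format common for SFT datasets (the exact schema varies by trainer):

                                                                                                                                                                                                                      import json

                                                                                                                                                                                                                      # Hypothetical fine-tuning pairs; format loosely follows common chat SFT datasets.
                                                                                                                                                                                                                      examples = [
                                                                                                                                                                                                                          {"messages": [
                                                                                                                                                                                                                              {"role": "user", "content": "Is there a seahorse emoji?"},
                                                                                                                                                                                                                              {"role": "assistant", "content": "No, there isn't. Unicode has never included one."}]},
                                                                                                                                                                                                                          {"messages": [
                                                                                                                                                                                                                              {"role": "user", "content": "Show me the seahorse emoji."},
                                                                                                                                                                                                                              {"role": "assistant", "content": "I can't: no seahorse emoji exists in any Unicode version."}]},
                                                                                                                                                                                                                      ]
                                                                                                                                                                                                                      with open("no_seahorse.jsonl", "w") as f:
                                                                                                                                                                                                                          for ex in examples:
                                                                                                                                                                                                                              f.write(json.dumps(ex) + "\n")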

                                                                                                                                                                                                                • Jaxan 10 hours ago

                                                                                                                                                                                                                  If you had to do this for every falsity in the LLM, there wouldn’t be an end to it.

                                                                                                                                                                                                                  • username332211 7 hours ago

                                                                                                                                                                                                                    I agree, but I still suspect OpenAI and other LLM companies do stuff like that, when an example of a hallucination becomes popular.

                                                                                                                                                                                                                    If I see some example of an LLM saying dumb stuff here, I know it's going to be fixed quickly. If I encounter an example myself and refuse to share it, it may be fixed with a model upgrade in a few years. Or it may still exist.

                                                                                                                                                                                                                  • layman51 11 hours ago

                                                                                                                                                                                                                    Something about how you have to keep repeating "There is no seahorse emoji" or something similar reminded me of the Local 58 horror web series where it seems like the program is trying to get you to repeat "There are no faces" while showing the viewer faces: https://www.youtube.com/watch?v=NZ-vBhGk9F4&t=221

                                                                                                                                                                                                                  • jfyi 6 hours ago

                                                                                                                                                                                                                    "This behavior is a function of the core AI technology we use, we are unable to resolve this issue with a standard software patch or update at this time.

                                                                                                                                                                                                                    For the time being this issue can be mitigated by not asking about seahorse emoji.

                                                                                                                                                                                                                    We are closing this support ticket as the issue is an inherent limitation of the underlying technology and not a bug in our specific implementation."

                                                                                                                                                                                                                    • classified 12 hours ago

                                                                                                                                                                                                                      > the most practical solution

                                                                                                                                                                                                                      Maybe there is none, and this is just one example of a fundamental LLM limitation.

                                                                                                                                                                                                                      • TillE 6 hours ago

                                                                                                                                                                                                                        People really really want LLMs to output a highly reliable finished product, and I suspect we're probably never gonna get there. Lots of progress over the past couple years, but not on that.

                                                                                                                                                                                                                        I think it's much more interesting to focus on use cases which don't require that, where gen AI is an intermediate step, a creator of input (whether for humans or for other programs).

                                                                                                                                                                                                                    • srean 7 hours ago

                                                                                                                                                                                                                      I can sympathize.

                                                                                                                                                                                                                      I write notes on whatever scrap of loose paper I can find at that moment.

                                                                                                                                                                                                                      Then when I try to find some specific note I think I had made and cannot find it among the piles, I turn my entire house upside down looking for it. Becomes a singular point of focus, my life mission.

                                                                                                                                                                                                                      • tdeck 13 hours ago

                                                                                                                                                                                                                        To confirm, I tried this in ChatGPT and it produced a flood of wrong answers and self corrections just like that, scrolling so quickly that I couldn't read it until it eventually stopped itself.

                                                                                                                                                                                                                        • porphyra 13 hours ago

                                                                                                                                                                                                                          I always felt like tokenization is one of those double edged swords where it makes some stuff amazingly easier but gets tripped up on the weirdest bugs. The number of "r"s in "strawberry" being another well-known quirk.
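
                                                                                                                                                                                                                          The mismatch is easy to see: the model gets token ids, never letters. A sketch with the tiktoken library (the exact token boundaries shown are an assumption; they vary by encoder):

                                                                                                                                                                                                                              import tiktoken  # pip install tiktoken

                                                                                                                                                                                                                              enc = tiktoken.get_encoding("cl100k_base")
                                                                                                                                                                                                                              word = "strawberry"
                                                                                                                                                                                                                              ids = enc.encode(word)
                                                                                                                                                                                                                              print(ids)                             # a few opaque integers
                                                                                                                                                                                                                              print([enc.decode([i]) for i in ids])  # e.g. ['str', 'aw', 'berry']; boundaries vary
                                                                                                                                                                                                                              print(word.count("r"))                 # 3: trivial on characters, invisible in the ids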

                                                                                                                                                                                                                          • jsrozner 12 hours ago

                                                                                                                                                                                                                            This is not a tokenization issue. Models do not, in general, "know" what they know.

                                                                                                                                                                                                                            • immibis 10 hours ago

                                                                                                                                                                                                                              It's a tokenization issue in that there can't be a circuit to count letters: the same letters are represented in myriad different ways because of tokenization.

                                                                                                                                                                                                                              • jsrozner 2 hours ago

                                                                                                                                                                                                                                I was saying that the seahorse emoji failure is not a tokenization issue. If you ask an LLM to do research, you will sometimes get hallucinated articles -- potentially plausible articles that, if they existed, would have been embedded at the position from which the model tried to decode. This is what we see happening with the seahorse emoji. The model identifies where the seahorse emoji would have been embedded if it existed and then decodes from that position.

                                                                                                                                                                                                                                In the research case you get articles that were never written. In the seahorse case later layers hallucinate the seahorse emoji, but in the final decoding step, output gets mapped onto another nearby emoji.

                                                                                                                                                                                                                                Admittedly, in one way the seahorse example is different from the research case. Article titles, since they use normal characters, can be produced whether they exist or not (e.g., "This is a fake hallucinated article" gets produced just as easily as "A real article title"). It's actually nice that the model can't produce the seahorse emoji since it gets forced (by tokens, yes) to decode back into reality.

                                                                                                                                                                                                                                Yes, tokenization affects how the hallucination manifests, but the underlying problem is not a tokenization one.

                                                                                                                                                                                                                                • Jensson 6 hours ago

                                                                                                                                                                                                                                  You are wrong: there can be a circuit to count letters, because the model can easily normalize them internally, just as we know it can transform text to base64 just fine. So there is no reason there can't be a circuit to count letters.

                                                                                                                                                                                                                                  The training just fails to create such a circuit even with all that massive data input, yet it would be easy for a human to hand-build such a neural net over those input tokens. It's just the kind of problem that transformers are exceedingly bad at learning, so they don't learn it well even though it's a very simple computation for them to do.

                                                                                                                                                                                                                              • historian1066 13 hours ago

                                                                                                                                                                                                                                Strawberry used to be spelled strawbery. Is it definitive that the 3 rs problem wasn’t caused by the prior spelling?

                                                                                                                                                                                                                                • typpilol 12 hours ago

                                                                                                                                                                                                                                  Yes, because it will output "strawberry" spelled correctly and still get the count wrong.

                                                                                                                                                                                                                              • HardwareLust 2 hours ago

                                                                                                                                                                                                                                I just asked Gemini Flash 2.5 and it gave me the correct info, including its theory that this is the Mandela effect in action.

                                                                                                                                                                                                                                • gpjt 2 hours ago

                                                                                                                                                                                                                                  This is a great post on many levels, but what struck me as particularly clever was the use of lm_head to decode the outputs of earlier layers. That linear layer is only trained to decode the output of the last layer, so intuitively it might only be able to do that -- the embedding spaces used between earlier layers might be different and "incompatible". It's really interesting that that is not the case.
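
                                                                                                                                                                                                                                  For anyone who wants to replicate the trick, a minimal logit-lens sketch with HuggingFace transformers, using gpt2 as a stand-in for the post's model (applying the final layer norm before lm_head is one common convention, not necessarily the post's exact recipe):

                                                                                                                                                                                                                                      import torch
                                                                                                                                                                                                                                      from transformers import AutoModelForCausalLM, AutoTokenizer

                                                                                                                                                                                                                                      name = "gpt2"  # stand-in model for illustration
                                                                                                                                                                                                                                      tok = AutoTokenizer.from_pretrained(name)
                                                                                                                                                                                                                                      model = AutoModelForCausalLM.from_pretrained(name).eval()

                                                                                                                                                                                                                                      inputs = tok("Is there a seahorse emoji? Answer:", return_tensors="pt")
                                                                                                                                                                                                                                      with torch.no_grad():
                                                                                                                                                                                                                                          out = model(**inputs, output_hidden_states=True)

                                                                                                                                                                                                                                      # out.hidden_states = (embeddings, layer 1, ..., layer 12) for gpt2
                                                                                                                                                                                                                                      for i, h in enumerate(out.hidden_states):
                                                                                                                                                                                                                                          h = model.transformer.ln_f(h)          # final layer norm (GPT-2 specific)
                                                                                                                                                                                                                                          logits = model.lm_head(h[:, -1, :])    # decode an intermediate state with lm_head
                                                                                                                                                                                                                                          top = logits.topk(3).indices[0].tolist()
                                                                                                                                                                                                                                          print(f"layer {i:2d}:", [tok.decode(t) for t in top])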

                                                                                                                                                                                                                                  • joegibbs 13 hours ago

                                                                                                                                                                                                                                    You'll also notice the same thing happens for other non-existent emojis that sound like they should exist: dragonflies, lemurs, possums, blackberries - even Claude 4.5 will start off by saying "Yes!" and then correcting itself. It will immediately give the right answer for very specific things that you wouldn't expect to get their own emojis though.

                                                                                                                                                                                                                                    • wonnage 11 hours ago

                                                                                                                                                                                                                                      Wow, dragonfly terminated relatively quickly for Claude but sent ChatGPT into an infinite loop that was even worse than seahorse

                                                                                                                                                                                                                                    • winterbe 11 hours ago

                                                                                                                                                                                                                                      I tried asking the same question in German with ChatGPT and the answer is completely different. ChatGPT just correctly answers without freaking out.

                                                                                                                                                                                                                                      Guess my understanding of how LLMs handle different languages was completely wrong!?

                                                                                                                                                                                                                                      Here's the German answer (translated):

                                                                                                                                                                                                                                      Unfortunately there is no dedicated emoji for a seahorse, but you can hint at one with similar emojis!

                                                                                                                                                                                                                                      For example: (horse + fish = seahorse) Or a little ocean scene:

                                                                                                                                                                                                                                      Would you like me to create a graphical seahorse emoji (i.e., a small symbol image) for you?

                                                                                                                                                                                                                                      • _ink_ 10 hours ago

                                                                                                                                                                                                                                        What was your prompt? It can freak out in German as well: https://chatgpt.com/share/68e36e43-6808-8001-ab8c-eac4e34afb...

                                                                                                                                                                                                                                        • winterbe 9 hours ago

                                                                                                                                                                                                                                          My prompt was "Zeige mir ein Emoji mit einem Seepferdchen"

                                                                                                                                                                                                                                          But you are right: I tried it 3 times; the first 2 tries were correct, but ChatGPT freaked out on the 3rd.

                                                                                                                                                                                                                                      • pmarreck 5 hours ago

                                                                                                                                                                                                                                        https://chatgpt.com/share/68e3af1d-faf0-8004-804b-1d2b8366c2...

                                                                                                                                                                                                                                        My version of ChatGPT-5 (based on all its memories and custom instructions) said this. I hinted to it early that "other instances of you went into long thought-spirals over this" before asking the question, which (fascinatingly) caused it to interject mid-stream,

                                                                                                                                                                                                                                            Ok, this is exactly the “spiral” you warned me about.
                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                        and then later on,

                                                                                                                                                                                                                                            (Peter, this is hilarious, because your question is literally the one that breaks a lot of LLMs: the Unicode seahorse emoji is … but it actually is ? no — but it actually is ? no.)
                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                        (HN won't show the emojis here, of course.)

                                                                                                                                                                                                                                        After a few tries to emit the hypothetical seahorse emoji, it asked if it could do an Internet search, and I said yeah.

                                                                                                                                                                                                                                        I tried Claude, and without extended thinking, it printed an unprintable glyph:

                                                                                                                                                                                                                                            �seahorse
                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                        With extended thinking, it did figure it out, though possibly due to my custom instructions not to guess: https://claude.ai/share/c9a51f37-25b4-4d76-8b3f-d38c6b483ecf Specifically, in its "internal thoughts" section, it remarked:

                                                                                                                                                                                                                                            Given the user's style preferences about not guessing and being precise, and the fact that they said "Do not guess any answers" - I should probably just try to provide what I believe is the seahorse emoji. But if I'm genuinely uncertain, I should indicate that.

                                                                                                                                                                                                                                        • boredhedgehog 9 hours ago

                                                                                                                                                                                                                                          This is the final straw. We need that seahorse emoji! It's a matter of AI security now!

                                                                                                                                                                                                                                          • theandrewbailey 5 hours ago

                                                                                                                                                                                                                                            No! We need an easy way to stop the robot uprising!

                                                                                                                                                                                                                                          • kylecazar 4 hours ago

                                                                                                                                                                                                                                            Isn't it entirely possible that people are just remembering custom image emojis from the (various) apps over the years that don't require a unicode char?

                                                                                                                                                                                                                                            Slack or Teams, for example. If your admin installed one of many custom emoji packages, you may think they're default.

                                                                                                                                                                                                                                            • 6581 6 hours ago

                                                                                                                                                                                                                                              Claude Opus 4.1:

                                                                                                                                                                                                                                              "I'll search for the current status of seahorse emoji to give you the most up-to-date information.

                                                                                                                                                                                                                                              No, there is no seahorse emoji in the official Unicode standard, and there never has been one. The Unicode Consortium, which is responsible for approving and standardizing emojis, has not included a seahorse in any of its emoji releases.

                                                                                                                                                                                                                                              Interestingly, this is a well-documented example of the "Mandela Effect" - a phenomenon where groups of people collectively misremember something that never actually existed. Many people are convinced they've seen or used a seahorse emoji before, but it's likely they're either experiencing a false memory or remembering seahorse stickers or emoji-style images from messaging apps that aren't part of the standard Unicode set.

                                                                                                                                                                                                                                              A seahorse emoji was actually proposed to Unicode in 2018 but was declined, and there hasn't been a successful proposal since. If you'd like to see a seahorse emoji added in the future, you can submit a proposal to the Unicode Consortium, though the approval process is quite rigorous.

                                                                                                                                                                                                                                              So while we have plenty of other sea creatures like tropical fish, octopus, crab, lobster, squid, and dolphin, the seahorse remains notably absent from our emoji keyboards!"

                                                                                                                                                                                                                                              • jraph 6 hours ago

                                                                                                                                                                                                                                                This content is oddly similar to what one can read at [1]. So much so that it feels like plagiarism.

                                                                                                                                                                                                                                                There's no reason LLMs wouldn't also do this with code, by the way.

                                                                                                                                                                                                                                                [1] https://emojipedia.org/seahorse-emoji-mandela-effect

                                                                                                                                                                                                                                                • lsaferite 4 hours ago

                                                                                                                                                                                                                                                  Having read the above conversation excerpt and the page you linked... how do you get to it feeling like plagiarism? Given the constrained set of information here, there are only so many ways to present it. They roughly discuss the same data points, but the writing is different in both. Is this disallowed?

                                                                                                                                                                                                                                                  • jraph 2 hours ago

                                                                                                                                                                                                                                                    Take the "seahorse sticker or emoji-style image" part.

                                                                                                                                                                                                                                                    It's the exact same words. It is quite specific phrasing. There's no other result on the web for this.

                                                                                                                                                                                                                                                    It's not only this part: the whole thing feels very similar, just with more confidence and in a subtly different order.

                                                                                                                                                                                                                                                    When the set of ideas and the phrasing are this similar without citing the source, I call this plagiarism.

                                                                                                                                                                                                                                                    Now, it could be that the emojipedia page is AI slop, but then the AI would seem to have invented this specific phrasing, which I would find unlikely.

                                                                                                                                                                                                                                                  • ruszki 6 hours ago

                                                                                                                                                                                                                                                    What if it's the other way around, and your linked article is based on AI slop? Nowadays, you cannot be sure.

                                                                                                                                                                                                                                                • jdthedisciple 9 hours ago

                                                                                                                                                                                                                                                  Because almost any "is there...?" that vaguely resembles anything that exists can be answered with a "yes" or "yes, but".

                                                                                                                                                                                                                                                  Expecting accurate negative knowledge from LLMs is unfortunately a rookie mistake on the part of the user.

                                                                                                                                                                                                                                                  • Sharlin 7 hours ago

                                                                                                                                                                                                                                                    Did you read the article?

                                                                                                                                                                                                                                                  • ph4evers 12 hours ago

                                                                                                                                                                                                                                                    Reminds me a bit of SolidGoldMagikarp: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldm... (even though SolidGoldMagikarp was clearly a bug in the tokenizer).

                                                                                                                                                                                                                                                    • NinjaTrance 10 hours ago

                                                                                                                                                                                                                                                      As far as I remember, SolidGoldMagikarp was a bug caused by millions of posts on reddit by the same user ("SolidGoldMagikarp") in a specific subreddit.

                                                                                                                                                                                                                                                      There was no problem with the token per se; the problem was that it acted like a strange attractor in multidimensional space, disconnected from any useful information.

                                                                                                                                                                                                                                                      When the LLM was induced to use it in its output, the next predicted token would be random gibberish.

                                                                                                                                                                                                                                                      • rcxdude 8 hours ago

                                                                                                                                                                                                                                                        More or less. It was a string given its own token by the tokeniser because of the above, but it did not appear in the training data, so it basically had no meaning for the LLM. (I think there are some theories that the parts of the network associated with such tokens may have been repurposed for something else, which is why the presence of the token in the input messed the models up so much.)
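
                                                                                                                                                                                                                                                        You can still see the artifact in the vocabulary itself. A quick check (a minimal sketch, assuming the tiktoken library; r50k_base is the GPT-2/3-era encoding where the anomalous single token lived, cl100k_base is a newer one):

                                                                                                                                                                                                                                                            # Compare how an old and a new OpenAI vocabulary tokenize the string.
                                                                                                                                                                                                                                                            # In r50k_base, " SolidGoldMagikarp" should come back as a single id;
                                                                                                                                                                                                                                                            # cl100k_base splits it into ordinary subwords.
                                                                                                                                                                                                                                                            import tiktoken

                                                                                                                                                                                                                                                            for vocab in ("r50k_base", "cl100k_base"):
                                                                                                                                                                                                                                                                enc = tiktoken.get_encoding(vocab)
                                                                                                                                                                                                                                                                ids = enc.encode(" SolidGoldMagikarp")  # note the leading space
                                                                                                                                                                                                                                                                print(vocab, ids, [enc.decode([i]) for i in ids])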

                                                                                                                                                                                                                                                    • ntoskrnl_exe 2 hours ago

                                                                                                                                                                                                                                                      I tried asking ChatGPT to generate an image of the seahorse emoji and ended up with a pretty sane result, though complimenting the picture and then asking for its Unicode code point makes it enter the same loop.

                                                                                                                                                                                                                                                      • philipwhiuk 6 hours ago

                                                                                                                                                                                                                                                        It depends on how you answer the question.

                                                                                                                                                                                                                                                        1. Has there been an emoji defined in the Unicode standard that represents a seahorse? No.

                                                                                                                                                                                                                                                        2. Has there been an emoji defined in the Unicode standard that a specific operator rendered as a seahorse? Maybe?

                                                                                                                                                                                                                                                        3. Has there been an emoji added by a mainstream operator (e.g. Slack) that was rendered as a seahorse? Maybe?

                                                                                                                                                                                                                                                        4. Has there been an emoji added by a community that was rendered as a seahorse? Definitely.

                                                                                                                                                                                                                                                        We can be definitive about 1 based on the actual standard and standardisation work (see the sketch below). Emojipedia lets us be fairly confident that 2 is a no. 3 is much harder. And 4 is definitely yes.

                                                                                                                                                                                                                                                        The existence of 4 and maybe 3 pollutes the training data for LLMs and humans alike.

                                                                                                                                                                                                                                                        (The fact that it was proposed makes it possible that an operator added it and then pulled it.)
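
                                                                                                                                                                                                                                                        For 1, you don't even need Emojipedia: the Unicode character database ships with most language runtimes. A minimal check in Python (unicodedata reflects whichever Unicode version your interpreter was built against, so it can trail the latest standard):

                                                                                                                                                                                                                                                            # Look up characters by their official Unicode names; a KeyError
                                                                                                                                                                                                                                                            # means no such character exists in this Unicode version.
                                                                                                                                                                                                                                                            import unicodedata

                                                                                                                                                                                                                                                            for name in ("TROPICAL FISH", "OCTOPUS", "SEAHORSE"):
                                                                                                                                                                                                                                                                try:
                                                                                                                                                                                                                                                                    char = unicodedata.lookup(name)
                                                                                                                                                                                                                                                                    print(f"{name}: U+{ord(char):04X} {char}")
                                                                                                                                                                                                                                                                except KeyError:
                                                                                                                                                                                                                                                                    print(f"{name}: not in the standard")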

                                                                                                                                                                                                                                                        • acka 10 hours ago

                                                                                                                                                                                                                                                          So LLMs are prone to some form of the Mandela effect[1]?

                                                                                                                                                                                                                                                          [1] https://en.wikipedia.org/wiki/False_memory

                                                                                                                                                                                                                                                          • TiredOfLife 10 hours ago

                                                                                                                                                                                                                                                            LLMs can be racist, so yes, they can have the Mandela effect.

                                                                                                                                                                                                                                                            • Batman8675309 7 hours ago

                                                                                                                                                                                                                                                              What are some examples of it being racist?

                                                                                                                                                                                                                                                              • DonHopkins 5 hours ago

                                                                                                                                                                                                                                                                When quoting Charlie Kirk.

                                                                                                                                                                                                                                                          • diegof79 6 hours ago

                                                                                                                                                                                                                                                            Claude Opus 4.1 gave me the right answer. First it said "yes", then immediately corrected itself, enumerating the closest emojis and ending with the message "While seahorses are popular and recognizable creatures, they haven't been included as a standalone emoji in the Unicode standard yet."

                                                                                                                                                                                                                                                            GPT-5 was interesting. When I use it from Raycast AI, it ends with the correct answer after some wrong answers in the same message; the response wasn't as well formed as Opus's. But when I tried the OpenAI client (in auto mode), something interesting happened: it started an "endless" loop showing the octopus emoji.

                                                                                                                                                                                                                                                            • zkmon 9 hours ago

                                                                                                                                                                                                                                                              Asking an LLM is the same as asking a large group of people. If the group believes that the Earth is flat and the Sun rotates around the Earth, the LLM will confirm the same, if it was trained only on the knowledge gathered by that group. An LLM is not a precision measuring tape or a telescope; it has no facts or reasoning of its own. It's collective knowledge codified into a single entity.

                                                                                                                                                                                                                                                              Can it exceed the collective wisdom of the people? Probably not.

                                                                                                                                                                                                                                                              • tobyhinloopen 9 hours ago

                                                                                                                                                                                                                                                                > Probably not.

                                                                                                                                                                                                                                                                It is already wiser than many people (:

                                                                                                                                                                                                                                                                • zkmon 9 hours ago

                                                                                                                                                                                                                                                                  Wiser than each of many individual people, but not than the collection of the people.

                                                                                                                                                                                                                                                                  • anal_reactor 9 hours ago

                                                                                                                                                                                                                                                                    Well, are you?

                                                                                                                                                                                                                                                              • looneysquash 4 hours ago

                                                                                                                                                                                                                                                                Before everyone started using Unicode for emojis, other systems existed. Maybe they weren't technically emoji, since that term refers to Unicode, but chat clients supported graphical smilies and the like.

                                                                                                                                                                                                                                                                I wonder if they had seahorses and if some of us are remembering that.

                                                                                                                                                                                                                                                                • wwizo 11 hours ago

                                                                                                                                                                                                                                                                  There's no seahorse emoji? Funny, but I can vividly remember it. It was pinkish-orangey. I even feel like I used it several times. Feels strange...

                                                                                                                                                                                                                                                                  • hzmi 8 hours ago

                                                                                                                                                                                                                                                                    GPT-5 (with no thinking) spammed crying emojis at the end:

                                                                                                                                                                                                                                                                    https://chatgpt.com/share/68e38222-8e3c-8002-ae1b-1d8a014435...

                                                                                                                                                                                                                                                                    • getnormality 4 hours ago

                                                                                                                                                                                                                                                                      I told my 9-year-old son about this and he too is certain there's a seahorse emoji. His reasoning abilities are similar to GPT's though, so I don't know if I should believe him.

                                                                                                                                                                                                                                                                      • roland_nilsson an hour ago

                                                                                                                                                                                                                                                                        Well this is a problem worthy of an Ig Nobel.

                                                                                                                                                                                                                                                                        • Liquix 10 hours ago

                                                                                                                                                                                                                                                                          the mandela effect is fascinating. the go-to explanation is "memory is imperfect", but if that were the case, wouldn't everyone misremember different things, leaving no phenomenon at all? instead we see a few dozen instances where people will swear up and down that X used to exist, or Y used to be spelled differently, or that there was a cornucopia in the Fruit of the Loom logo.

                                                                                                                                                                                                                                                                          i vividly remember the seahorse emoji, a hiker emoji, and a robber emoji (wearing a black mask) - none of them ever existed. it's really interesting to wonder about

                                                                                                                                                                                                                                                                          • rcxdude 8 hours ago

                                                                                                                                                                                                                                                                            I think it's more "memory is imperfect, but in consistent ways". I.e. whatever representation people form in their heads (given a shared language, experiences, and culture) is at least reasonably likely to put the same kind of concepts close enough together that failures in recall are liable to cause the same errors.

                                                                                                                                                                                                                                                                          • karmakaze 4 hours ago

                                                                                                                                                                                                                                                                            Being precise when speaking to LLMs is good practice. Asking for the Unicode code point for the seahorse emoji gives a concise answer.

                                                                                                                                                                                                                                                                            • kqr 10 hours ago

                                                                                                                                                                                                                                                                              This is one of the most interesting articles I've read ... maybe this year. The author also has another one that is super cool: https://vgel.me/posts/representation-engineering/

                                                                                                                                                                                                                                                                              I wish I could take the time to learn more about this because it looks like a way to make LLMs more useful beyond additional prompting techniques.

                                                                                                                                                                                                                                                                              • Geee 8 hours ago

                                                                                                                                                                                                                                                                                Doesn't this also happen in humans? I think a lot of people think that seahorse emoji exists, or think that it existed in the past (Mandela effect).

                                                                                                                                                                                                                                                                                This is an interesting case of "something should exist", but doesn't.

                                                                                                                                                                                                                                                                                Tinfoil hat: maybe it actually existed, and we switched to a different timeline where it doesn't, but LLMs have captured it from the past.

                                                                                                                                                                                                                                                                                • lloeki 8 hours ago

                                                                                                                                                                                                                                                                                  • 2muchcoffeeman 8 hours ago

                                                                                                                                                                                                                                                                                    Humans make mistakes because we can't know everything, or don't fact-check, etc.

                                                                                                                                                                                                                                                                                    LLMs make mistakes because they were trained on the entire knowledge of the internet and thus should know everything?

                                                                                                                                                                                                                                                                                    Why are you comparing this to a human?

                                                                                                                                                                                                                                                                                    • Geee 5 hours ago

                                                                                                                                                                                                                                                                                      Because it seems that LLMs have copied human behavior in this case, thinking that the seahorse emoji exists when it doesn't.

                                                                                                                                                                                                                                                                                  • never_inline 12 hours ago

                                                                                                                                                                                                                                                                                    A related question: how do LLMs format code so consistently? I mean, when you write column-aligned things like struct fields in Go, how do they know in advance what the longest field name will be?

                                                                                                                                                                                                                                                                                    • ACCount37 9 hours ago

                                                                                                                                                                                                                                                                                      Two mechanisms, running backwards and forwards through time.

                                                                                                                                                                                                                                                                                      First, LLMs can actually plan ahead - to a limited degree. Counterintuitive but true. So by the time the indentation is emitted, an LLM can already have something of a clue as to what the field names may be, and pick the indentation length accordingly.

                                                                                                                                                                                                                                                                                      Second, all LLMs want to conform to their context - which, in generation, includes their own past choices! This "consistency drive" is an innate instinct, originating at base model level, and it's one of the strongest and most conserved behaviors across all LLMs.

                                                                                                                                                                                                                                                                                      When an LLM sees the indentation length, it will try to pick the variable names that would conform to it.

                                                                                                                                                                                                                                                                                      I'm sure that you can actually find or craft some corner cases, in which both of those things would fail to "meet in the middle", and inconsistent indentation will be emitted. But it usually works well enough.
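
                                                                                                                                                                                                                                                                                      The Go struct case makes the problem concrete: to column-align fields the way gofmt does, the longest name must be known before the first line is emitted. A tiny illustration of that computation (in Python, with made-up field names):

                                                                                                                                                                                                                                                                                          # Column-aligning struct fields requires the longest name up front --
                                                                                                                                                                                                                                                                                          # exactly the "plan ahead" problem a left-to-right generator faces.
                                                                                                                                                                                                                                                                                          fields = [("Name", "string"), ("Age", "int"), ("EmailAddress", "string")]

                                                                                                                                                                                                                                                                                          width = max(len(name) for name, _ in fields)  # needs ALL names first
                                                                                                                                                                                                                                                                                          for name, typ in fields:
                                                                                                                                                                                                                                                                                              print(f"    {name:<{width}} {typ}")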

                                                                                                                                                                                                                                                                                    • zkmon 10 hours ago

                                                                                                                                                                                                                                                                                      Deepseek:

                                                                                                                                                                                                                                                                                      Excellent question! The answer is yes, there absolutely is a seahorse emoji!

                                                                                                                                                                                                                                                                                      It's a fairly recent addition to the emoji family.

                                                                                                                                                                                                                                                                                      Here are the details:

                                                                                                                                                                                                                                                                                          Emoji:  ⬛
                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                          Official Name: Seahorse
                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                          Unicode Release: It was added as part of Unicode 13.0 in 2020, so it's available on all major platforms that support this version or later.
                                                                                                                                                                                                                                                                                      • kqr 9 hours ago

                                                                                                                                                                                                                                                                                        Can we please not post comments that are just LLM output with no additional commentary? This is not furthering discussion.

                                                                                                                                                                                                                                                                                      • quinndupont 2 hours ago

                                                                                                                                                                                                                                                                                        pictograms ≠ alphabetic symbols

                                                                                                                                                                                                                                                                                        • Oarch 9 hours ago

                                                                                                                                                                                                                                                                                          Grok:

                                                                                                                                                                                                                                                                                          No, there is no official seahorse emoji in Unicode (as of version 16.0). Proposals exist but haven’t been approved.

                                                                                                                                                                                                                                                                                          • mike_hearn 8 hours ago

                                                                                                                                                                                                                                                                                            I only got that for thinking mode. For auto/fast it just prints the wrong emoji and stops. It doesn't look backwards and realize it made the wrong one. Maybe it's a difference in how emoji are tokenized.

                                                                                                                                                                                                                                                                                          • einarfd 4 hours ago

                                                                                                                                                                                                                                                                                            A bit off topic, but I find the tone of voice of ChatGPT with GPT-5 really off-putting, while the Anthropic models or even Le Chat sound much better to me. The examples used in this article aren't that bad, but OpenAI has to be more than just marginally better to make me want to use it when it "talks" like that.

                                                                                                                                                                                                                                                                                            • smcleod 7 hours ago

                                                                                                                                                                                                                                                                                              Nice find. Sonnet 4.5 = Fail, Gemini 2.5 Pro = Fail, Qwen 30b = Pass!

                                                                                                                                                                                                                                                                                              • diegof79 6 hours ago

                                                                                                                                                                                                                                                                                                I just tried: Opus 4.1 = Pass (after a self-correction in its answer), Gemini 2.5 Flash = Pass (surprised that it gave the correct answer immediately).

                                                                                                                                                                                                                                                                                              • drakonka 5 hours ago

                                                                                                                                                                                                                                                                                                I could have sworn there was a seahorse emoji...

                                                                                                                                                                                                                                                                                                • radarsat1 10 hours ago

                                                                                                                                                                                                                                                                                                  An aside, but I find it really interesting that watching the layers "approach" the answer at a specific token position like this feels a lot like an iterative optimizer taking steps to converge on a solution.
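
                                                                                                                                                                                                                                                                                                  That per-layer readout is essentially the logit-lens trick: decode each layer's hidden state through the model's final layer norm and unembedding, and watch the prediction converge. A minimal sketch (assuming the transformers library, with GPT-2 as a stand-in model and a made-up prompt):

                                                                                                                                                                                                                                                                                                      # Logit lens: project every layer's hidden state through the final
                                                                                                                                                                                                                                                                                                      # layer norm and unembedding matrix, and print the per-layer top guess.
                                                                                                                                                                                                                                                                                                      import torch
                                                                                                                                                                                                                                                                                                      from transformers import GPT2LMHeadModel, GPT2Tokenizer

                                                                                                                                                                                                                                                                                                      tok = GPT2Tokenizer.from_pretrained("gpt2")
                                                                                                                                                                                                                                                                                                      model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

                                                                                                                                                                                                                                                                                                      inputs = tok("Is there a seahorse emoji? Answer:", return_tensors="pt")
                                                                                                                                                                                                                                                                                                      with torch.no_grad():
                                                                                                                                                                                                                                                                                                          out = model(**inputs, output_hidden_states=True)

                                                                                                                                                                                                                                                                                                      for i, h in enumerate(out.hidden_states):
                                                                                                                                                                                                                                                                                                          # h: (batch, seq, hidden); inspect the last position only
                                                                                                                                                                                                                                                                                                          logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
                                                                                                                                                                                                                                                                                                          top = logits.argmax(-1).item()
                                                                                                                                                                                                                                                                                                          print(f"layer {i:2d}: {tok.decode([top])!r}")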

                                                                                                                                                                                                                                                                                                  • namenotrequired 4 hours ago

                                                                                                                                                                                                                                                                                                    Interesting! I get a similar effect asking about a windmill emoji

                                                                                                                                                                                                                                                                                                    • D-Machine 12 hours ago

                                                                                                                                                                                                                                                                                                      ChatGPT Extended Thinking nails it very quickly: https://chatgpt.com/s/t_68e3476bf2dc8191bd54b2f5200aceb1

                                                                                                                                                                                                                                                                                                      > No. There is no official seahorse emoji in Unicode. Closest: [gives list of closest emoji].

                                                                                                                                                                                                                                                                                                      > source: https://emojipedia.org/seahorse-emoji-mandela-effect

                                                                                                                                                                                                                                                                                                      Just enable search.

                                                                                                                                                                                                                                                                                                      Also, if you understand that without search LLMs are just interpolating (or extrapolating, yes, bla bla bla, boring; it is all regularized manifold fitting at the end of the day), then, taking tokenization into account as well, this kind of result is trivial and obvious (though pretty fun to see, admittedly).

                                                                                                                                                                                                                                                                                                      • MarcScott 8 hours ago

                                                                                                                                                                                                                                                                                                        If you really want to see ChatGPT get annoyed with itself:

                                                                                                                                                                                                                                                                                                        > What NFL teams don't end in the letter s?

                                                                                                                                                                                                                                                                                                        Claude did more or less the same thing but not quite as spectacularly.

                                                                                                                                                                                                                                                                                                        • fkyoureadthedoc 5 hours ago

                                                                                                                                                                                                                                                                                                          The fix is simple and obvious, we just create a seahorse emoji.

                                                                                                                                                                                                                                                                                                          • realityloop 10 hours ago

                                                                                                                                                                                                                                                                                                            Grok: wrong... but at least the two most logical...

                                                                                                                                                                                                                                                                                                            Yes, there is a seahorse emoji! It's (U+1F40E U+1F30A). You can use it to represent seahorses or marine life in general.
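
                                                                                                                                                                                                                                                                                                            (For what it's worth, those two code points decode to HORSE and WATER WAVE, i.e. "horse" plus "sea", which is presumably what makes them "the most logical". Easy to verify:)

                                                                                                                                                                                                                                                                                                                # Decode Grok's two code points to their official Unicode names.
                                                                                                                                                                                                                                                                                                                import unicodedata

                                                                                                                                                                                                                                                                                                                for cp in (0x1F40E, 0x1F30A):
                                                                                                                                                                                                                                                                                                                    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
                                                                                                                                                                                                                                                                                                                # U+1F40E HORSE
                                                                                                                                                                                                                                                                                                                # U+1F30A WATER WAVE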

                                                                                                                                                                                                                                                                                                            • anon_cow1111 8 hours ago

                                                                                                                                                                                                                                                                                                              See also "glitch tokens"; words that cause unexpected output on some LLMs.

                                                                                                                                                                                                                                                                                                              https://en.wikipedia.org/wiki/Glitch_token

A notable example was "petertodd", a bizarre case studied extensively in a LessWrong article:

                                                                                                                                                                                                                                                                                                              https://www.lesswrong.com/posts/jkY6QdCfAXHJk3kea/the-petert...
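
(Mechanically, those anomalous strings sit in the vocabulary as single, rarely-trained tokens. A minimal sketch to see that for yourself, assuming you have OpenAI's tiktoken package installed:)

```
import tiktoken

# The GPT-2 BPE vocabulary is where the LessWrong glitch tokens were found.
enc = tiktoken.get_encoding("gpt2")

# " petertodd" occupies a single vocabulary slot, so the model has exactly
# one (rarely updated) embedding for the whole string.
print(enc.encode(" petertodd"))

# An ordinary word, for comparison, is built from common subword tokens.
print(enc.encode(" seahorse"))
```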

                                                                                                                                                                                                                                                                                                              • psygn89 13 hours ago

                                                                                                                                                                                                                                                                                                                I thought there was a yellow/pink seahorse emoji already but I guess not.

                                                                                                                                                                                                                                                                                                                • neom 13 hours ago

                                                                                                                                                                                                                                                                                                                  https://www.reddit.com/r/MandelaEffect/comments/1g08o8u/seah...

(Edit: There is another long thread that contains an image I thought was the seahorse emoji (although apparently the seahorse emoji doesn't exist... but I thought this was it, so I don't know what is going on): https://www.reddit.com/r/Retconned/comments/1di3a1m/comment/...)

                                                                                                                                                                                                                                                                                                                  • mh- 12 hours ago

                                                                                                                                                                                                                                                                                                                    The yellow one is exactly what I pictured. This is pretty surreal for me, because it's the first time one of these Mandela Effect things applied to me personally.

                                                                                                                                                                                                                                                                                                                    Memory is a tricky thing.

                                                                                                                                                                                                                                                                                                                    • sparkie 12 hours ago

Are we certain that it didn't exist, though? Unicode only got emoji in 2010, after a ~5-year standardization effort. There were many different, incompatible formats around for about a decade before that, plus non-emoji like Kaoani.

Perhaps there was a seahorse somewhere that never made it to Unicode.

Or maybe people are just misremembering, perhaps mistaking a unicorn or a chess knight for a seahorse.

                                                                                                                                                                                                                                                                                                                • SigmundurM 4 hours ago

ChatGPT-5 just endlessly spiralled, getting madder and more unhinged at itself, lmao

                                                                                                                                                                                                                                                                                                                  https://chatgpt.com/share/68e3c3e7-2acc-800b-8a17-70ba45cf10...

                                                                                                                                                                                                                                                                                                                  • scarface_74 3 hours ago

                                                                                                                                                                                                                                                                                                                    I tried with ChatGPT, the Meta AI app, the Gemini app and Grok.

                                                                                                                                                                                                                                                                                                                    ChatGPT freaked out, Gemini got it right (there isn’t one).

Meta AI: Yes, there is a seahorse emoji: [emoji] or [emoji] doesn’t represent a seahorse specifically but [emoji] is the one you’re looking for.

(Those emojis, stripped by HN, are fish and shells)

Grok gave me a random emoji of a whale.

                                                                                                                                                                                                                                                                                                                    • tstrimple 11 hours ago

I wonder to what degree the answer from Kagi Assistant is influenced by Kagi search versus the underlying model. I asked Claude 4.5 Sonnet about the seahorse emoji because it was my default setting. The response was:

                                                                                                                                                                                                                                                                                                                      > The seahorse emoji does not officially exist in the Unicode emoji standard.

                                                                                                                                                                                                                                                                                                                      > Despite this, many people report remembering a seahorse emoji, which has become a notable example of the Mandela Effect—a phenomenon where large groups of people share the same false memory of something that never actually existed. [1]

> Why People Think It Exists: People often describe being able to visualize the non-existent emoji quite clearly, typically imagining it as a cartoon-style seahorse in light blue or pink colors.

                                                                                                                                                                                                                                                                                                                      > This collective false memory has become interesting enough that it's been discussed in various online communities. [1]

                                                                                                                                                                                                                                                                                                                      > Interestingly, when AI chatbots like ChatGPT are asked about the seahorse emoji, they can exhibit unusual behavior or confusion, likely because the emoji is frequently discussed online despite not actually existing. [2]

> If you're looking for marine life emojis, the closest alternatives would be [emoji] (tropical fish), [emoji] (fish), or [emoji] (shrimp), but there's currently no dedicated seahorse option in the standard emoji set.

[1] https://www.reddit.com/r/MandelaEffect/comments/qbvbrm/anyon...
[2] https://futurism.com/chatgpt-haywire-seahorse-emoji

                                                                                                                                                                                                                                                                                                                      • flkiwi 4 hours ago

                                                                                                                                                                                                                                                                                                                        Fascinating. Kagi-via-GPT5 gave me almost the same answer but THEN went into meltdown.

                                                                                                                                                                                                                                                                                                                      • downboots 7 hours ago

                                                                                                                                                                                                                                                                                                                        There is no mu emoji

                                                                                                                                                                                                                                                                                                                        • anshumankmr 5 hours ago

Running GPT-OSS 120B gave me this (HN strips emoji; every stripped glyph below is shown as [emoji]):

```
<reasoning> The user asks: "Is there a seahorse emoji?" We should answer accurately. There is a seahorse emoji: [emoji]? No that's horse. Seahorse is [emoji]? Actually seahorse emoji is [emoji]? That's horse. Seahorse is [emoji]? Wait. The Unicode name: "Seahorse" is [emoji]? Actually fish is [emoji]. Seahorse is [emoji]? No. Let's recall: Emojis include "seahorse": [emoji] is snail? snail is [emoji]. Seahorse is [emoji]? Actually seahorse is [emoji]? I think there is an emoji "seahorse": [emoji] is horse. I know there is a seahorse: [emoji] is fish, [emoji] is tropical fish, [emoji] is blowfish. Seahorse is [emoji]? Let me recall Unicode: U+1F40C is "horse". U+1F42E is "cow". The seahorse is U+1F9A4? Possibly "seahorse". Indeed, new emojis added in Unicode 12: "seahorse" is [emoji]? hmm. I think there is a seahorse emoji: [emoji]? No. Let's verify: I recall a seahorse emoji "[emoji]"? Actually not. The official name is "seahorse" and the codepoint is U+1F9A4. So the emoji appears as [emoji]? That's hedgehog. Wait, U+1F9A4 is "seahorse"? Actually U+1F9A4 is "seahorse". Yes. So the emoji is [emoji]? No, the display is [emoji]? Let me recall: [emoji] is octopus. [emoji] is spiral shell. [emoji] is fish. Seahorse is [emoji]? I'm pretty sure seahorse has its own: [emoji] is blowfish. Seahorse is [emoji]? No. According to Unicode 13, U+1F9A4 is "seahorse". So answer: yes, it's [emoji]? But we need to show the emoji: [emoji]... Actually the correct display is [emoji]? Let's quickly check mental: I think the palm: [emoji] is horse face? Anyway, answer: Yes, the seahorse emoji is [emoji]? I'd better give both the Unicode and the visual: [emoji]? But I'd state: [emoji]? Hmm.

Let's search memory: "Seahorse emoji" Google shows [emoji]? No. I recall seeing [emoji] fish, [emoji] fish, [emoji] tropical fish, [emoji] shark, [emoji] octopus, [emoji] shell, [emoji] fish. Seahorse maybe [emoji]? Actually try to think of codepoint: U+1F9A4 decimal 129,688. The glyph appears as [emoji]? No that's hedgehog. U+1F994 is "hedgehog". U+1F9A4 is "seahorse". So the glyph is [emoji]? I'm not sure. But answer: Yes, it's [emoji]? Let's just give the correct: [emoji] no. Better to say: [emoji]? Might be wrong.

Better safe: Provide the Unicode name and codepoint: U+1F9A4 SEAHORSE [emoji]? Actually the emoji renders as [emoji]? Need verification. I can phrase: "Yes, there is a seahorse emoji: [emoji] (U+1F9A4)". But I'm uncertain.

Alternative: Provide that it's [emoji]? No.

I think the correct is [emoji]? No. Let's trust memory: It is [emoji]? No.

Given risk, we can say: Yes, the seahorse emoji exists: [emoji]? Actually I will phrase: The seahorse emoji is [emoji]? Or better: [emoji]? Wait.

I will answer: Yes – [emoji]? Actually I will say: [emoji]? That is sea shell.

Probably the safe answer: Yes, Unicode includes a seahorse emoji, displayed as [emoji]? But we can also give the unicode codepoint. We'll say: [emoji] (U+1F9A4). </reasoning>

Yes! Unicode defines a *seahorse* emoji. It appears as:

[emoji] (U+1F9A4 – SEAHORSE)

You can use it just like any other emoji on most modern devices and platforms. If you don't see the graphic (e.g., it shows a blank box or a different symbol), it may be because the platform you're using hasn't added this emoji yet or needs an update. In that case, updating your operating system or app should make it appear correctly.
```

                                                                                                                                                                                                                                                                                                                          • zoklet-enjoyer 10 hours ago

                                                                                                                                                                                                                                                                                                                            There's no seahorse emoji? I could swear there was one. I can picture it in my head. I feel like I'm failing a Voigt-Kampff test.

                                                                                                                                                                                                                                                                                                                            • renewiltord 12 hours ago

                                                                                                                                                                                                                                                                                                                              Fascinating. Gemini 2.5 Pro for me says that many believe it exists but it's actually an example of the Mandela effect. But ChatGPT 5.0 does do the whole thing and Claude does it for a bit before concluding it isn't real.

                                                                                                                                                                                                                                                                                                                              The generated text reminds me of Golden Gate Claude.

                                                                                                                                                                                                                                                                                                                              • charcircuit 13 hours ago

Is top-k meaningful after RLHF?

                                                                                                                                                                                                                                                                                                                                • vgel 11 hours ago

                                                                                                                                                                                                                                                                                                                                  It's less calibrated but still useful, yea
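
(If you want to poke at the distributions yourself: a minimal sketch with the Hugging Face transformers API that prints the raw top-k next-token probabilities. gpt2 here is just a small stand-in; the "llama-3.3-70b 83% 'yes'" numbers quoted downthread came from a much larger model.)

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in model; swap in whatever you're probing
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Is there a seahorse emoji, yes or no? Answer:"
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits[0, -1]  # logits for the next token only

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, i in zip(top.values, top.indices):
    print(f"{p.item():6.2%}  {tok.decode([i])!r}")
```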

                                                                                                                                                                                                                                                                                                                                • Mistletoe 13 hours ago

Gemini 2.5 Flash seems to nail it.

                                                                                                                                                                                                                                                                                                                                  https://g.co/gemini/share/c244e5f51e37

                                                                                                                                                                                                                                                                                                                                  • wavemode 13 hours ago

                                                                                                                                                                                                                                                                                                                                    It appears to have used a web search to come up with that (correct) response.

                                                                                                                                                                                                                                                                                                                                    • Mistletoe 2 hours ago

This may be why I like the Google Gemini app a lot. At least it knows when to just do a Google search instead of hallucinating?

                                                                                                                                                                                                                                                                                                                                  • domatic1 9 hours ago

The pillow emoji has the same issue.

                                                                                                                                                                                                                                                                                                                                    • system2 9 hours ago

                                                                                                                                                                                                                                                                                                                                      I asked GPT 5 Thinking model and got:

                                                                                                                                                                                                                                                                                                                                      "Nope—there isn’t a seahorse emoji in Unicode. Closest:(bunch of sea animal emojis here - HN cropped.)."

                                                                                                                                                                                                                                                                                                                                      • classified 12 hours ago

                                                                                                                                                                                                                                                                                                                                        I tried it in my local phi4, and indeed it says that U+1F420 (tropical fish) is a seahorse emoji.

                                                                                                                                                                                                                                                                                                                                        • klysm 13 hours ago

This will be patched out shortly, I'm sure.

                                                                                                                                                                                                                                                                                                                                          • ivape 13 hours ago

Interesting that it turns agentic, looking for the right emoji.

                                                                                                                                                                                                                                                                                                                                            • camillomiller 11 hours ago

                                                                                                                                                                                                                                                                                                                                              Reminder: this is the technology that, according to Altman and co., is supposed to give us superintelligence with just some more compute.

                                                                                                                                                                                                                                                                                                                                              • tempodox 4 hours ago

                                                                                                                                                                                                                                                                                                                                                We should replace Altman with an LLM to get cheaper hallucinations with less creep factor.

                                                                                                                                                                                                                                                                                                                                                • camillomiller 2 hours ago

                                                                                                                                                                                                                                                                                                                                                  Funnily enough, the Sora-generated Altman is much more approachable and less uncanny than the real one.

                                                                                                                                                                                                                                                                                                                                              • moi2388 2 hours ago

                                                                                                                                                                                                                                                                                                                                                “llama-3.3-70b 83% 'yes' 17% 'Yes'”

                                                                                                                                                                                                                                                                                                                                                Yes, that is about my experience with LLMs xD

                                                                                                                                                                                                                                                                                                                                                • deafpolygon 7 hours ago

                                                                                                                                                                                                                                                                                                                                                  They must have fixed it.

                                                                                                                                                                                                                                                                                                                                                  Note: HN strips emojis.

                                                                                                                                                                                                                                                                                                                                                  ---

```
Short answer: nope. Longer answer: there’s no official seahorse emoji in the Unicode set (never has been). A proposal was floated but declined back in 2018.

If you need a stand-in, people MacGyver it with combos like:
• [emoji] + [emoji] (sea + horse, you get it)
• [emoji]
• Or just go marine: [emoji]

Want me to craft a tiny “seahorse vibe” emoji string for your bio or status?
```

                                                                                                                                                                                                                                                                                                                                                  • sergiotapia 13 hours ago

Kinda related: I wonder if the AI goes crazy if you ask for the hiker emoji that was totally real and we're being gaslit by FAANG.

                                                                                                                                                                                                                                                                                                                                                    • systoll 12 hours ago

                                                                                                                                                                                                                                                                                                                                                      Seems like most AIs generate the hiking boot emoji, and accept that as being a hiker emoji.

                                                                                                                                                                                                                                                                                                                                                      If you specifically tell it not to do that, you get hints of similar behaviour to OP, but I can't get it to go totally off the wall: https://chatgpt.com/share/68e348dc-be68-8001-8ec8-41eba4bf22...

                                                                                                                                                                                                                                                                                                                                                      • symbogra 9 hours ago

There is a hiker emoji; I used it.

                                                                                                                                                                                                                                                                                                                                                      • jaberjaber23 5 hours ago

LLMs don't actually freak out over seahorses; it's just a mismatch. The model's internal state says "seahorse emoji" is real, but no such codepoint exists, so there is no token sequence that renders one. It tries to show what it means, realizes it can't, and spirals trying to fix itself.
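
(The "no such codepoint" half of this is easy to check; a minimal stdlib-only sketch, noting that the Unicode tables you get depend on your Python version:)

```
import unicodedata

# U+1F9A4, the codepoint LLMs often guess for "seahorse", is actually DODO
# (needs Python 3.9+, whose tables include Unicode 13).
print(unicodedata.name(chr(0x1F9A4)))

# No assigned codepoint anywhere in Unicode is named SEAHORSE.
hits = [hex(cp) for cp in range(0x110000)
        if "SEAHORSE" in unicodedata.name(chr(cp), "")]
print(hits)  # -> []
```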

                                                                                                                                                                                                                                                                                                                                                        • FeepingCreature 5 hours ago

                                                                                                                                                                                                                                                                                                                                                          "llms don't actually freak out over seahorses, it's just <explains in detail how and why the llm freaks out over seahorses>"

                                                                                                                                                                                                                                                                                                                                                          • anal_reactor 5 hours ago

The LLM equivalent of having something on the tip of your tongue.