• m-hodges an hour ago

    To the “LLMs just interpolate their training data” crowd:

    Ayer, and in a different way early Wittgenstein, held that mathematical truths don’t report new facts about the world. Proofs unfold what is already implicit in axioms, definitions, symbols, and rules.

    I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.

    So either “recombining existing material” isn’t disqualifying, or a lot of Fields Medals need to be returned.

    • dvt an hour ago

      > I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.

      Most discoveries are indeed implied from axioms, but every now and then, new mathematics is (for lack of a better word) "created"—and you have people like Descartes, Newton, Leibnitz, Gauss, Euler, Ramanujan, Galois, etc. that treat math more like an art than a science.

      For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.

      • pulkitsh1234 12 minutes ago

        Creation is done by humans who have been trained on the data of their life experiences. Nothing new is being created, just changing forms.

        A scientist has to extract the "Creation" from an abstract dimension using the tools of "human knowledge". The creativity is often selecting the best set of tools or recombining tools to access the platonic space. For instance a "telescope" is not a new creation, it is a recombination of something which already existed: lenses.

        How can we truly create something? Everything is built upon something.

        You could argue that even "numbers" are a creation, but are they? Aren't they just a tool to access an abstract concept of counting?

        Another angle to look at it: even in dreams, do we really create something new? Or do we dream about "things" (i.e. data) we have ingested in our waking life? Someone could argue that dreams truly create something, as the exact set of events never happened anywhere in the real world... but we all know that dreams are derived: derived from brain chemistry, experiences and so on. We may not have the reduction of how each and every thing works.

        Just like energy is conserved, IMO everything we call "created" is just a changed form of "something". I fully believe LLMs (and humans) both can create tools to change the forms. Nothing new is being "created", just convenient tools which abstract upon some nature of reality.

        • kenjackson 10 minutes ago

          "new kind of math"

          Well I think the point is there is no "new kind of math". There are just the types of math we've discovered and those we haven't. No new math is created, just found.

          • grey-area 4 minutes ago

            The map is not the territory.

          • Tenobrus 44 minutes ago

            what basis do you have for assuming an LLM is fundamentally incapable of doing this?

            • truncate 31 minutes ago

              What's your basis for assuming an LLM is capable of doing this?

              I honestly don't know either way. Based on my limited understanding of how LLMs work, I don't see them making the next great song or next great book, and based on that reasoning I'm betting that they probably won't be able to do whatever the next "Descartes, Newton, Leibnitz, Gauss, Euler, Ramanujan, Galois" are going to do.

              Of course, if AI as a wider field comes up with something more powerful than LLMs, that would be different.

              • blueone 28 minutes ago

                > what basis do you have for assuming an LLM is fundamentally incapable of doing this?

                because I have no basis for assuming an LLM is fundamentally capable of doing this.

                • pickleRick243 7 minutes ago

                  Except this has been said since the 2010s and has been proven wrong again and again. Clearly the theory that LLMs can't "extrapolate" is woefully incomplete at best (and most likely simply incorrect). Before the rise of ChatGPT, the onus was on the labs to show it was plausible. At this point, I think the more epistemologically honest position is to put the burden back on the naysayers. At the least, they need to admit they were wrong and give a satisfactory explanation why their conceptual model was unable to account for the tremendous success of LLMs and why their model is still correct going forward. Realistically, progress on the "anti-LLM" side requires a more nuanced conceptual model to be developed, carefully outlining and demonstrating the fundamental deficiencies of LLMs (not just deficiencies in current LLMs, but a theory of why further advancements can't solve the deficiencies).

                  • sswatson 14 minutes ago

                    Good on you for spelling out this reasoning, but it is manifestly unsound. For a wide variety of values of X, people a few years ago had no reason to expect that LLMs would be capable of X. Yet here we are.

                    • TheOtherHobbes 8 minutes ago

                      In 1989, Garry Kasparov said that it was "ridiculous!" to suggest a computer would ever beat him at chess.

                      "Never shall I be beaten by a machine!"

                      In 1997 he lost to Deep Blue.

                  • voooduuuuu 30 minutes ago

                    Ask an LLM to invent a new word and post it here. You will see that it simply combines words already in the training data.

                    • satvikpendem 11 minutes ago

                      Funny that the replies are dead. It's true that generally we shouldn't have AI output on HN but this case is an exception as we are explicitly asking for it, so it's interesting that people still flag the replies.

                      • baq 11 minutes ago

                        Mathematics can be mostly boiled down to a few sentences with very lengthy possible combinations, so yeah that is not a problem

                        • konart 13 minutes ago

                          So LLM is German?

                          • Garlef 10 minutes ago

                            What does "new word" even mean?

                          • dvt 40 minutes ago

                            Because by definition LLMs are permutation machines, not creativity machines. (My premise, which you may disagree with, is that creativity/imagination/artistry is not merely permutation.)

                            • satvikpendem 10 minutes ago

                              What is creativity if not permutation? A brain has some model of the world and recombines concepts to create new concepts.

                              • fnordpiglet 29 minutes ago

                                I prefer to think of it as they’re interpolation machines not extrapolation machines. They can project within the space they’re trained in, and what they produce may not be in their training corpus, but it must be implied by it. I don’t know if this is sufficient to make them too weak to create original “ideas” of this sort, but I think it is sufficient to make them incapable of original thought vs a very complex to evaluate expected thought.

                                • lukol 26 minutes ago

                                  This "new math" might be a recombination of things that we already know - or an obvious pattern that emerges if you take a look at things from a far enough distance - or something that can be brute-forced into existence. All things LLMs are perfectly capable of.

                                  In the end, creativity has always been a combination of chance and the application of known patterns in new contexts.

                                  • dvt 22 minutes ago

                                    > This "new math" might be a recombination of things that we already know

                                    If you know anything about the invention of new math (analytic geometry, Calculus, etc.), you'd know how untrue this is. In fact, Calculus was extremely hand-wavy and without rigorous underpinnings until the mid 1800s. Again: more art than science.

                                    • baq 8 minutes ago

                                      And yet nowadays you can restate all of it using just combinations of sets of sets and some logic operators.

                                  • nh23423fefe 36 minutes ago

                                    god of the gaps

                                    • KoolKat23 37 minutes ago

                                      It pretty much is, otherwise it is randomness or entropy.

                                      • lajamerr 28 minutes ago

                                        LLMs by themselves are not able to but you are missing a piece here.

                                        LLMs are prompted by humans and the right query may make it think/behave in a way to create a novel solution.

                                        Then there's a third factor now: agentic AI system loops with LLMs, where the system can research, try, and experiment in its own loop that's tied to the real world for feedback.

                                        Agentic + LLM + Initial Human Prompter by definition can have it experiment outside of its domain of expertise.

                                        So that's extending the "LLM can't create novel ideas" claim, but I don't think anyone can disagree that the three elements above are enough ingredients for an AI to come up with novel ideas.

                                        • awesome_dude 10 minutes ago

                                          You're proving the GP's argument: LLMs aren't creative, you say as much; it's the driving that is the creative force.

                                          • lajamerr a minute ago

                                            You can tell an agentic system: "Go and find a novel area of math that has unresolved answers and solve it mathematically with verified properties in LEAN. Verify before you start working on a problem that no one has solved this area of math."

                                            That's not a creative prompt. That's a driving prompt to get it to start its engine.

                                            You could do that nowadays, and while it may spend $1,000 to $100,000 worth of tokens, it will create something humans haven't done before as long as you set it up with all its tool calls/permissions.

                                            • Barbing 3 minutes ago

                                              If that’s a requirement, aren’t LLMs driven by pretraining which was human driven?

                                              Who decides the last point at which it’s OK to provide text to the model in order to be able to describe it as creative? (non-rhetorical)

                                    • throw-the-towel an hour ago

                                      See the longstanding debate on whether new math is "invented" or "discovered". Most mathematicians I knew thought it's discovered.

                                      • ASalazarMX 11 minutes ago

                                        Math is an abstraction of reality, it had to be invented, so more inventions or discoveries could be made within it.

                                        • pigpop 5 minutes ago

                                          What is an abstraction? It is something that arises from human thought and human thought arises from the activity of neurons which are a part of reality. You can't escape reality unless you invoke some form of dualism.

                                          • baq 6 minutes ago

                                            The test goes like ‘is our universe, or any other universe, required for the axioms to exist’ and I don’t see how ‘yes’ is a defensible answer.

                                          • skybrian an hour ago

                                            Any design already exists as a possibility, so it could be said to be both invented and discovered, depending on how you look at it.

                                            • cubefox 41 minutes ago

                                              All inventions are discoveries, though not all discoveries are inventions.

                                              • FrustratedMonky 16 minutes ago

                                                Depending on your point of view? I see what you did there.

                                                Who knew Obi-Wan was just smoking and pontificating on Wittgenstein.

                                              • protoplancton 25 minutes ago

                                                One can argue that mathematical facts are discovered, but the tools that allow us to find, express, and prove them are mostly invented. This goes up to the axioms, which we can deliberately choose and craft.

                                                • soupspaces 44 minutes ago

                                                  Regardless of which, both Newton and Leibniz imprint in their findings a 'voice' and understanding different from each other and that of an LLM (for now?)

                                                  • atmosx an hour ago

                                                    ...long standing indeed. It can be traced back to Plato's works.

                                                    • lioeters 5 minutes ago

                                                      "The European philosophical tradition consists of a series of footnotes to Plato."

                                                  • sillysaurusx 18 minutes ago

                                                    It’s easy to see that LLMs don’t merely recombine their training data. Claude can program in Arc, a mostly dead language. It can also make use of new language constructs. So either all programming language constructs are merely remixes of existing ideas, or LLMs are capable of working in domains where no training data exists.

                                                    • baq a minute ago

                                                      LLMs ingest and output tokens, but they don’t compute with them. They have internal representations of concepts, so they have some capability to work with things which they didn’t see but can map onto what they know. The surprise and the whole revolution we’re going through is that it works so well.

                                                    • nomel 17 minutes ago

                                                      I feel this is the case whenever I "problem solve". I'm not really being creative, I'm pruning a graph of a conceptual space that already exists. The more possibilities I see, the easier it is to run more towards an optimal route between the nodes, but I didn't "create" those nodes or edges, they are just causal inevitabilities.

                                                      • awesome_dude 3 minutes ago

                                                        There was a project long long ago where every piece of knowledge known was cross-pollinated with every other piece of knowledge, creating a new and unique piece of knowledge, and it was intended to use that machine to invalidate the patent process - obviously everything had therefore been invented.

                                                        But that's not how new frontiers are conquered - there's a great deal of existing knowledge that is leveraged to get us into a position where we think we can succeed, yes, but there's also the recognition that there is knowledge we don't yet have that needs to be acquired in order for us to truly succeed.

                                                        THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.

                                                        Take fire, for example, it was in nature for billions of years before we as a species understood that it needed air, fuel, and heat in order for it to exist at all, and we then leveraged that knowledge into controlling fire - creating, growing, reducing, destroying it.

                                                        LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor does it appear to know how to leverage them.

                                                        • adam_arthur 25 minutes ago

                                                          Pretty much everything that appears novel in life is derivative of other works or concepts.

                                                          You can watch a rock roll down a hill and derive the concept for the wheel.

                                                          Seems pretty self evident to me

                                                          • voooduuuuu 42 minutes ago

                                                            I think you are conflating composition and prediction. LLMs don't compose higher abstractions from the "axioms, symbols and rules", they simply predict the next token, like a really large spinning wheel.

                                                            • peterlk 36 minutes ago

                                                              Yes they do…? Who cares if they just predict the next token? The outcome is that they can invent new abstractions. You could claim that the invention of this new idea is a combination of an LLM and a harness, but that combination can solve logic puzzles and invent abstractions. If a really large spinning wheel could invent proofs that were previously unsolved, that would be a wildly amazing spinning wheel. I view LLMs similarly. It is just fancy autocomplete, but look what we can do with it!

                                                              Said differently, what is prediction but composition projected forward through time/ideas?

                                                              • voooduuuuu 33 minutes ago

                                                                Ask an LLM to invent a new word and post it here, I will be waiting. You will see that it simply combines words already in the training data.

                                                                • romanhn 17 minutes ago

                                                                  I'm not sure what the point of this exercise is. My prompt to ChatGPT: "Create a new English word with a reasonably sounding definition. That word must not come up in a Google search." Two attempts did come up in a search, the third was "Thaleniq (noun)". Definition: The brief feeling that a conversation has permanently changed your opinion of someone, even if nothing dramatic was said. Nothing in Google. There, a new word, not sure it proves or disproves anything. Or is it time to move the goal posts?

                                                                  • jimmaswell 19 minutes ago

                                                                    Why is everyone who responds to this with a real example immediately flagged/dead?

                                                                    • sillysaurusx 16 minutes ago

                                                                      HN autokills LLM generated comments. People don’t seem to believe this, but there’s proof for you.

                                                                      • lcnPylGDnU4H9OF 2 minutes ago

                                                                        To be fair, it is pretty wild they seem to be able to detect them without a noticeable amount of false positives.

                                                                    • bossyTeacher 28 minutes ago

                                                                      Does a random sequence of letters qualify as a new word?

                                                                    • FrustratedMonky 15 minutes ago

                                                                      "Who cares if they just predict the next token?"

                                                                      Exactly. I also only write one word at a time. Who knows what is going on in order to come up with that word.

                                                                    • sunshowers 32 minutes ago

                                                                      One might argue that the composition of higher abstractions is the next token predicted after "here is a higher abstraction:"

                                                                      • adampunk 40 minutes ago

                                                                        How sure are you that this is correct?

                                                                        • frozenseven 23 minutes ago

                                                                          Show me on the anatomical prop where the magical "real reasoning" gland is.

                                                                        • paulddraper 24 minutes ago

                                                                          "LLMs just interpolate their training data"

                                                                          Cracks me up.

                                                                          What exactly do we think that human brains do?

                                                                          • gpugreg 6 minutes ago

                                                                            Human brains do not interpolate everything. To interpolate, you need some points to start with.

                                                                            • omnimus 9 minutes ago

                                                                              That has been the question since the beginning of humans.

                                                                              Maybe computers can help understand better because by now it's pretty clear brains aren't just LLMs.

                                                                          • lubujackson an hour ago

                                                                            For anyone using LLMs heavily for coding, this shouldn't be too surprising. It was just a matter of time.

                                                                            Mathematicians make new discoveries by building and applying mathematical tools in new ways. It is tons of iterative work, following hunches and exploring connections. While true that LLMs can't truly "make discoveries" since they have no sense of what that would mean, they can Monte Carlo every mathematical tool at a narrow objective and see what sticks, then build on that or combine improvements.

                                                                            Reading the article, that seems exactly how the discovery was made, an LLM used a "surprising connection" to go beyond the expected result. But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.

                                                                            • daishi55 3 minutes ago

                                                                              > the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.

                                                                              Isn't this just anthropocentrism? Why is understanding only valid if a human does it? Why is knowledge only for humans? If another species resolved the contradictions between gravity and quantum mechanics, does that not have meaning unless they explain it to us and we understand it?

                                                                              • cubefox 37 minutes ago

                                                                                There is a long and interesting recent essay on that topic by a mathematician: https://davidbessis.substack.com/p/the-fall-of-the-theorem-e...

                                                                                • zem 26 minutes ago

                                                                                  wow, that was indeed a brilliant essay. i particularly liked the framing that "solving a big conjecture was a cryptographic proof that you had come up with a genuine conceptual innovation".

                                                                              • mooreat 18 minutes ago

                                                                                I think one interesting thing to point out is that the proof (disproof) was done by finding a counterexample of Erdős' original conjecture.

                                                                                I agree with one of the mathematician's responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.

                                                                                In my eyes proving the conjecture true requires a bit more theory crafting. You have to explain why the conjecture is correct by grounding it in a larger theory while with the counterexample the model has to just perform a more advanced form of search to find the correct construction.

                                                                              Obviously this search is impressive, not naive, and requires many steps along the way to prove connections to the counterexample, but instead of developing new deep mathematics the model is still just connecting existing ideas.

                                                                                Not to discount this monumental achievement. I think we're really getting somewhere! To me, and this is just vibes based, I think the models aren't far from being able to theory craft in such a way that they could prove more complicated conjectures that require developing new mathematics. I think that's just a matter of having them able to work on longer and longer time horizons.

                                                                                • vatsachak an hour ago

                                                                                As I have stated before, AI will win a Fields Medal before it can manage a McDonald's.

                                                                                  A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.

                                                                                  LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.

                                                                                  • trostaft 37 minutes ago

                                                                                    > A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.

                                                                                    However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.

                                                                                    • vatsachak 7 minutes ago

                                                                                      Yeah, but I wouldn't be surprised if they train the model on verification assisted by Lean.

                                                                                    • Lerc an hour ago

                                                                                    I disagree. It will be able to perform work deserving of a Fields Medal before it is capable of running a McDonald's. I think it will be running a McDonald's well before either of those things happen, and be awarded a Fields Medal long after both have happened.

                                                                                      • vatsachak 5 minutes ago

                                                                                        Not necessarily. Obviously playing Kasparov on the board requires more planning ability than managing a McDonald's but look at where chess bots are now.

                                                                                        There's much more to being human than our "cognitive abilities"

                                                                                        • c7b 24 minutes ago

                                                                                          One could hardly ask for a task better suited for LLMs than producing math in Lean. Running a restaurant is so much fuzzier, from the definition of what it even means to the relation of inputs to outputs and evaluating success.

                                                                                          • edbaskerville an hour ago

                                                                                            I just visited a McDonald's for the first time in a while. The self-order kiosk UI is quite bad. I think this is evidence in favor of the idea that an incompetent AI will soon be incompetently running a McDonald's.

                                                                                            • Silamoth 26 minutes ago

                                                                                              Out of curiosity, what issue did you have with the McDonald’s self-order kiosk? I actually think McDonald’s has the best kiosk I’ve ever encountered. The little animation that plays when you add an item to your cart is a little annoying (but I think they’ve sped that up). But otherwise, it’s everything I’d want. It shows you all the items, tells you every ingredient, and lets you add or remove ingredients. I have a better experience ordering through the kiosk than I do talking to a cashier.

                                                                                              • ndiddy 12 minutes ago

                                                                                                It takes longer than ordering with a cashier, it keeps trying to upsell you, and it's always out of receipt paper because unsurprisingly the company that isn't willing to pay a person to take orders is also not willing to pay a person to maintain the kiosks.

                                                                                          • evenhash an hour ago

                                                                                            The proof is not written in Lean, though. It’s written in English and requires validation by human experts to confirm that it’s not gibberish.

                                                                                            • vatsachak 7 minutes ago

                                                                                              Yeah, but I wouldn't be surprised if they train the model on verification assisted by Lean

                                                                                            • Terr_ an hour ago

                                                                                              > manage a McDonald's

                                                                                              Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.

                                                                                              > At any given moment Manna had a list of things that it needed to do. There were orders coming in from the cash registers, so Manna directed employees to prepare those meals. There were also toilets to be scrubbed on a regular basis, floors to mop, tables to wipe, sidewalks to sweep, buns to defrost, inventory to rotate, windows to wash and so on. Manna kept track of the hundreds of tasks that needed to get done, and assigned each task to an employee one at a time. [...]

                                                                                              > At the end of the shift Manna always said the same thing. “You are done for today. Thank you for your help.” Then you took off your headset and put it back on the rack to recharge. The first few minutes off the headset were always disorienting — there had been this voice in your head telling you exactly what to do in minute detail for six or eight hours. You had to turn your brain back on to get out of the restaurant.

                                                                                              [0] https://en.wikipedia.org/wiki/Manna_(novel)

                                                                                              • kmeisthax 24 minutes ago

                                                                                                Casual reminder that the author's proposed solution to the labor-automation dystopia is to invent a second identity-verification dystopia. Also casual reminder that the author wanted the death penalty for anyone over the age of 65.

                                                                                              • volkercraig 12 minutes ago

                                                                                                > we'll see more specialized math AI resembling StockFish soon

                                                                                                Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.

                                                                                                • vatsachak a few seconds ago

                                                                                                  My claim is that LLMs waste a lot of time training on all available data.

                                                                                                  Math is a sequence of formal rules applied to construct a proof tree. Therefore an AI trained on these rules could be far more efficient, and search far deeper into proof space.

                                                                                                • ori_b 2 minutes ago

                                                                                                  We're automating art and science so that we can flip burgers. This future sucks.

                                                                                                  • sigmoid10 an hour ago

                                                                                                    Managing a McDonalds is a question of integration and modalities at this point. I don't think anyone still doubts that these models have the reasoning capability or world knowledge needed for the job. So it's less of a fundamental technical problem and more of a process engineering issue.

                                                                                                    • andy12_ 11 minutes ago

                                                                                                      I disagree. Even frontier models still achieve way worse results than the human baseline in VendingBench. As long as models can't optimally manage something as simple as a vending machine, they have no hope of managing a McDonalds.

                                                                                                      • throw-the-towel an hour ago

                                                                                                        The capability they lack is being able to be sued.

                                                                                                        • pear01 an hour ago

                                                                                                          Police officers are human. In the United States in the vast majority of cases you can't sue the police, only the community responsible for them.

                                                                                                          https://en.wikipedia.org/wiki/Qualified_immunity

                                                                                                          Assuming you can still sue McDonalds, I am not sure this is a problem in the robotic LLM case. I'm also trying to imagine a case where you would want to sue the LLM and not the company. Given that robots/LLMs don't have free will, I'm not sure the problem of qualified immunity making police unaccountable applies.

                                                                                                          There already exist a lot of similar conventions in corporate law. Generally, a main advantage of incorporation is protecting the people making the decisions from personal lawsuits.

                                                                                                          • nemomarx 33 minutes ago

                                                                                                            McDonald's are franchises - you generally want to sue the local owner or threaten them in addition to the holding company.

                                                                                                            That only requires that someone own the AI-managed McDonald's, though. So long as they can't avoid responsibility by pointing to the AI, I don't see why you couldn't sue them.

                                                                                                            • logicchains 21 minutes ago

                                                                                                              >Police officers are human. In the United States in the vast majority of cases you can't sue the police, only the community responsible for them.

                                                                                                              Police are a monopoly; nobody has a choice about which police company to use. McDonalds are not a monopoly, and many customers would prefer to eat at competitors run by entities that could be sued or jailed if they did anything particularly egregious.

                                                                                                        • forinti an hour ago

                                                                                                          AI is already too old for that.

                                                                                                          • segmondy 25 minutes ago

                                                                                                            our local AI models are already capable of running McDonalds.

                                                                                                            • whimsicalism 40 minutes ago

                                                                                                              the only thing keeping the mcdonalds from happening will be political, and likewise with the fields medal

                                                                                                              • soupspaces an hour ago
                                                                                                                • w29UiIm2Xz an hour ago

                                                                                                                  Enough body cameras and audio recordings of each job function should be enough to build the world model for operating a fast food franchise. Training on academic publications wouldn't yield the same effect.

                                                                                                                • zozbot234 an hour ago

                                                                                                                  The summarized chain of thought for this task (linked in the blogpost) is 125 pages. That's an insane scale of reasoning, quite akin to what Anthropic has been teasing with Mythos.

                                                                                                                  • 0x5FC3 an hour ago

                                                                                                                    Is there a reason why we only hear of Erdos problems being solved? I would imagine there are a myriad of other unsolved problems in math, but every single ChatGPT "breakthrough in math" I come across on r/singularity and r/accelerate is an Erdos problem.

                                                                                                                    • bananaflag an hour ago

                                                                                                                      Erdos problems are easier to state, thus they make a great benchmark for the first year of AI mathematics.

                                                                                                                      • tonfa an hour ago

                                                                                                                        Afaik this is because there is a community and database around them.

                                                                                                                        • 0x5FC3 an hour ago

                                                                                                                          Interesting. OpenAI could also be trying to solve other problems, but Erdos problems may be falling first?

                                                                                                                          • CSMastermind 26 minutes ago

                                                                                                                            No, Erdos problems were accepted as sort of a benchmark. There's a bunch of reasons they're favorable for this task:

                                                                                                                            1. They have a wide range of difficulties.

                                                                                                                            2. They were curated (Erdos didn't know at first glance how to solve them).

                                                                                                                            3. Humans already took the time to organize, formally state, add metadata to them.

                                                                                                                            4. There's a lot of them.

                                                                                                                            If you go around looking for a mathematics benchmark it's hard to do better than that.

                                                                                                                        • famouswaffles an hour ago

                                                                                                                          It's not just Erdos problems - https://news.ycombinator.com/item?id=48213189

                                                                                                                          • throw-the-towel an hour ago

                                                                                                                            They're just famous because Erdos was a great mathematician, kinda like the Hilbert problems a century earlier.

                                                                                                                            • empath75 an hour ago

                                                                                                                              It's a large set of problems that are both interesting and difficult, but not seen as foundational enough or important enough that they have already had sustained attention on them by mathematicians for decades or centuries, and so they might actually be solvable by an LLM.

                                                                                                                              • 1qaboutecs an hour ago

                                                                                                                                Also fewer prerequisites to understand the statement than the average research problem.

                                                                                                                            • atleastoptimal 2 minutes ago

                                                                                                                              To all AI skeptics:

                                                                                                                              What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?

                                                                                                                              If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?

                                                                                                                              • CGMthrowaway 4 minutes ago

                                                                                                                                How do you even get an LLM to try to solve one of these problems? When I ask, it just comes back with the name of the problem and says "it can't be done".

                                                                                                                                • endymi0n an hour ago

                                                                                                                                  To paraphrase Gwynne Shotwell: “Not too bad for just a large Markov chain, eh?”

                                                                                                                                  • rhubarbtree 39 minutes ago

                                                                                                                                    Erdos, or the model?

                                                                                                                                  • aurareturn an hour ago

                                                                                                                                    One thing that seems certain is that OpenAI models hold a distinct lead in academia over Anthropic and Google models.

                                                                                                                                    For those in academia, is OpenAI the vendor of choice?

                                                                                                                                    • Jcampuzano2 an hour ago

                                                                                                                                      OpenAI specifically targeted Academia a lot and gave out a lot of free/unlimited usage to top academics and universities/researchers.

                                                                                                                                      They also offer grants you can apply for as a researcher. I'm sure other labs may have this too but I believe OpenAI was first to this.

                                                                                                                                      • tracerbulletx an hour ago

                                                                                                                                        Hasn't AlphaFold been used to make real discoveries for a few years now?

                                                                                                                                        • bayindirh an hour ago

                                                                                                                                          From my limited testing, Gemini can dig out hard-to-find information, provided you make your prompt detailed enough.

                                                                                                                                          Given that Google is the "web indexing company", finding hard-to-find things is natural for their models, and this is the only thing I need these models for.

                                                                                                                                          If I can't find it after a week of digging through the internet, I give it a colossal prompt, and it digs out what I'm looking for.

                                                                                                                                          • FloorEgg an hour ago

                                                                                                                                            Gemini seems better trained for learning and I think Google has made a more deliberate effort to optimize for pedagogical best practices. (E.g. tutoring, formative feedback, cognitive load optimization)

                                                                                                                                            As far as academic research is concerned (e.g. this thread's topic), I can't say.

                                                                                                                                            • snaking0776 37 minutes ago

                                                                                                                                              Agreed, I usually use Gemini for explaining concepts and ChatGPT for getting things done on research projects.

                                                                                                                                              • aurareturn an hour ago

                                                                                                                                                Yes, I meant academic research.

                                                                                                                                                • cute_boi an hour ago

                                                                                                                                                  Gemini is like someone with short-term memory loss; after the first response, it forgets everything. That being said, I have checked multiple model and gemini can sometime give accurate answer.

                                                                                                                                                • karmasimida an hour ago

                                                                                                                                                  I think the mathematicians on X are all using GPT 5.5 Pro

                                                                                                                                                  • logicchains 20 minutes ago

                                                                                                                                                    OpenAI models seem to have been trained on a lot of auto-generated theorem proving data; GPT 5.5 is really good at writing Lean.

                                                                                                                                                    • causal an hour ago

                                                                                                                                                      A simpler explanation is that more people are using ChatGPT

                                                                                                                                                    • Fraterkes an hour ago

                                                                                                                                                      I guess if this stuff is going to make my employment more precarious, it’d be nice if it also makes some scientific breakthroughs. We’ll see

                                                                                                                                                      • ausbah an hour ago

                                                                                                                                                        shame we won’t see any of these medical breakthroughs when we all lose our jobs and thus our healthcare

                                                                                                                                                        • karmasimida an hour ago

                                                                                                                                                          There is a world where AI makes medical breakthroughs, but there is zero guarantee they are going to be affordable.

                                                                                                                                                        • cubefox 29 minutes ago

                                                                                                                                                          Breakthroughs in pure mathematics aren't scientific though. They tell us nothing about the world, and they are not useful.

                                                                                                                                                        • Jeff_Brown an hour ago

                                                                                                                                                          Can anyone find (or draw) a picture of the construction?

                                                                                                                                                          • ninjha an hour ago

                                                                                                                                                            They only proved that one exists; computing the actual construction is non-obvious (the naive way to construct it is computationally infeasible).

                                                                                                                                                            • gibspaulding an hour ago

                                                                                                                                                              This is only a proof that a field with more connections is possible, not what it looks like.

                                                                                                                                                              I’m very out of my depth, but the structure of the proof seems to follow a pattern similar to a proof by contradiction, where you’d say, for example, “assume for the sake of contradiction that the previously known limit is the highest possible” and then prove that if that statement is true you get some impossible result.
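The generic pattern described here can be sketched in Lean 4. This is only an illustration of "assume the known limit is the maximum, then derive an impossibility", not the actual proof from the linked result; `achievable`, `bound`, and `witness` are hypothetical names introduced for the example.

```lean
-- Hypothetical setup: `achievable n` means a configuration with value `n` exists,
-- and `bound` is the previously known limit.
-- If some achievable value exceeds `bound`, then `bound` cannot be the maximum.
example (achievable : Nat → Prop) (bound : Nat)
    (witness : ∃ n, achievable n ∧ bound < n) :
    ¬ (∀ n, achievable n → n ≤ bound) :=
  fun hmax =>
    -- Assume for contradiction that `bound` is the highest possible value.
    match witness with
    | ⟨n, hn, hlt⟩ =>
      -- `bound < n` and `n ≤ bound` together give `bound < bound`, which is impossible.
      Nat.lt_irrefl bound (Nat.lt_of_lt_of_le hlt (hmax n hn))
```

Note that `witness` is taken as a hypothesis: the sketch says nothing about how such a configuration is found or what it looks like, which matches the point above that existence was proved without an explicit picture.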

                                                                                                                                                              • pradn an hour ago

                                                                                                                                                                They have a "before" picture but not an "after"!

                                                                                                                                                              • throwaway2027 an hour ago

                                                                                                                                                                Not to dismiss the AI, but the important part is that you still need someone able to recognize these solutions in the first place. A lot of things were hidden in plain sight before AI, but no one noticed them, or no one had the framework, in maths or whatever field they specialize in, to recognize those feats.

                                                                                                                                                                • famouswaffles an hour ago

                                                                                                                                                                  Another entry in a growing list from the last couple of months (interestingly, mostly OpenAI):

                                                                                                                                                                  1. Erdos 1196, GPT-5.4 Pro - https://www.scientificamerican.com/article/amateur-armed-wit...

                                                                                                                                                                  There are a couple of other Erdos wins, but this was the most impressive, prior to the thread in question. And it's completely unsupervised.

                                                                                                                                                                  Solution - https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...

                                                                                                                                                                  2. Single-minus gluon tree amplitudes are nonzero, GPT-5.2 - https://openai.com/index/new-result-theoretical-physics/

                                                                                                                                                                  3. Frontier Math Open Problem, GPT-5.4 Pro and others - https://epoch.ai/frontiermath/open-problems/ramsey-hypergrap...

                                                                                                                                                                  4. GPT-5.5 Pro - https://gowers.wordpress.com/2026/05/08/a-recent-experience-...

                                                                                                                                                                  5. Claude's Cycles, Claude Opus 4.6 - https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...

                                                                                                                                                                  • alansaber an hour ago

                                                                                                                                                                    AI isn't going to supercharge science but I wouldn't be as dismissive as other posters here.

                                                                                                                                                                    • tombert an hour ago

                                                                                                                                                                      I'm not a scientist but I like to LARP as one in my free time, and I have found ChatGPT/Claude extremely useful for research, and I'd say I'd go as far as to say it supercharged it for me.

                                                                                                                                                                      When I'm learning about a new subject, I'll ask Claude to give me five papers that are relevant to what I'm learning about. Often three of the papers are either irrelevant or kind of shit, but that leaves 2/5 of them that are actually useful. Then from those papers, I'll ask Claude to give me a "dependency graph" by recursing on the citations, and then I start bottom-up.

                                                                                                                                                                      This was game-changing for me. Reading advanced papers can be really hard for a variety of reasons, but one big one can simply be because you don't know the terminology and vernacular that the paper writers are using. Sometimes you can reasonably infer it from context, but sometimes I infer incorrectly, or simply have to skip over a section because I don't understand it. By working from the "lowest common denominator" of papers first, it generally makes the entire process easier.

                                                                                                                                                                      I was already doing this to some extent prior to LLMs, as in I would get to a spot I didn't really understand, jump to a relevant citation, and recurse until I got to an understanding, but that was kind of a pain in the ass, so having a nice pretty graph for me makes it considerably easier for me to read and understand more papers.
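The bottom-up reading order described above amounts to a topological sort of the citation graph. A minimal sketch, assuming the graph has already been collected by hand or from an LLM's answer; the paper names and the `citations` mapping are made up for illustration:

```python
from graphlib import TopologicalSorter

# Hypothetical citation data: each paper maps to the set of papers it cites.
citations = {
    "advanced-paper": {"survey", "core-method"},
    "survey": {"foundations"},
    "core-method": {"foundations"},
    "foundations": set(),
}


def reading_order(cites):
    """Return papers ordered so that every cited paper comes before the papers citing it."""
    # TopologicalSorter treats each mapped set as prerequisites of the key,
    # so cited papers are emitted first.
    return list(TopologicalSorter(cites).static_order())


order = reading_order(citations)
print(order)
```

Papers with no dependency between them (here `survey` and `core-method`) can come out in either order, and a citation cycle would raise `graphlib.CycleError`, so real citation data may need cleaning first.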

                                                                                                                                                                      • kingkongjaffa 44 minutes ago

                                                                                                                                                                        One heuristic I used during my master's degree research thesis was to look for the seminal people or papers in a field by using Google Scholar to find the most cited research papers, then reading everything else by that author and looking at the paper's references for others. You often only need to go back 3-4 papers to find some really seminal/foundational stuff.

                                                                                                                                                                        • tombert 40 minutes ago

                                                                                                                                                                          Yeah, that's actually how I discovered Leslie Lamport like ten years ago. I was looking for papers on distributed consensus, and it's hard not to come across Paxos when doing that. It turns out that he has oodles of really great papers across a lot of different cool things in computer science and I feel like I understand a lot more about this space because of it.

                                                                                                                                                                          It doesn't hurt that Lamport is exceptionally good at explaining things in plain language compared to a lot of other computer scientists.

                                                                                                                                                                      • vatsachak an hour ago

                                                                                                                                                                        I absolutely believe that AI will supercharge science.

                                                                                                                                                                        I do not believe it will replace humans.

                                                                                                                                                                        • unsupp0rted an hour ago

                                                                                                                                                                          I absolutely believe that AI will supercharge science and replace humans.

                                                                                                                                                                          Why shouldn't it? Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together

                                                                                                                                                                          • vatsachak 2 minutes ago

                                                                                                                                                                            Well, for starters, AI doesn't have goals. If there were a superintelligence with goals, why would it work for us?

                                                                                                                                                                            • geraneum 11 minutes ago

                                                                                                                                                                              > Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together

                                                                                                                                                                              Goodness gracious!

                                                                                                                                                                              • stonogo an hour ago

                                                                                                                                                                                Not like large language models, which only required tens of megawatts of power and use highly efficient monte carlo methods, eh

                                                                                                                                                                              • seydor an hour ago

                                                                                                                                                                                replace, no. obsolete, yes

                                                                                                                                                                                • dvfjsdhgfv an hour ago

                                                                                                                                                                                  lol

                                                                                                                                                                                  (That's the first time I used that expression on HN.)

                                                                                                                                                                              • OldGreenYodaGPT an hour ago

                                                                                                                                                                                Isn’t that a joke? It already has supercharged science

                                                                                                                                                                                • ks2048 40 minutes ago

                                                                                                                                                                                  Since "supercharged science" is as ill-defined as AGI, ASI, etc., people will be able to debate it endlessly for no reason.

                                                                                                                                                                                  • datsci_est_2015 an hour ago

                                                                                                                                                                                    Where are the second order effects of this supercharging of science? Or has it not been enough time for those to propagate?

                                                                                                                                                                                  • comboy an hour ago

                                                                                                                                                                                    Not only has it supercharged science, it supercharges scientists. Research on any narrow topic is a different world now. Agents can read 50 papers for you and tell you what's where. This was impossible with pure text search. Looking up non-trivial stuff and having complex things explained to you is also amazing. They don't even have to be complex: they can come from an adjacent field where they are basics but happen to be useful in yours. The list goes on. It's a hammer: you need to watch your fingers, and it's not good at cutting wood, but it's definitely worth having.

                                                                                                                                                                                    • dvfjsdhgfv 39 minutes ago

                                                                                                                                                                                      It's a very heavy hammer. I used it in the way you describe and after double-checking noticed some crucial details were missed and certain facts were subtly misrepresented.

                                                                                                                                                                                      But I agree with you, especially in areas where they have a lot of training data, they can be very useful and save tons of time.

                                                                                                                                                                                      • Karrot_Kream 25 minutes ago

                                                                                                                                                                                        I don't think there's a substitute for reading the source material. You have to read the actual paper that's cited. You have to read the code that's being sourced/generated. But used as a reasoning search engine, it's a huge enabler. I mean so much of research literally is reasoning through piles of existing research. There's probably a large amount of good research (especially the kind that doesn't easily get grant funding) that can "easily" shake out of existing literature that humans just haven't been able to synthesize correctly.

                                                                                                                                                                                    • renegade-otter an hour ago

                                                                                                                                                                                      It will notice things that humans may have missed. That said - it can only work off the body of work SOMEONE did in the past.

                                                                                                                                                                                      • throw-the-towel an hour ago

                                                                                                                                                                                        > it can only work off the body of work SOMEONE did in the past.

                                                                                                                                                                                        And so do humans. Gotta stand on these shoulders of giants.

                                                                                                                                                                                        • bel8 an hour ago

                                                                                                                                                                                          Can't the previous body of work be from AI too?

                                                                                                                                                                                        • karmasimida an hour ago

                                                                                                                                                                                          To be strict, Math is not Science.

                                                                                                                                                                                          But AI is supercharging Math like there is no tomorrow.

                                                                                                                                                                                        • dwroberts 42 minutes ago

                                                                                                                                                                                          Would be interesting to know what kind of preparatory work actually went into this - how long did it take to construct an input that produced a real result, and how much input did they get from actual mathematicians to guide refining it

                                                                                                                                                                                          • ks2048 27 minutes ago

                                                                                                                                                                                            Timothy Gowers' tweet about this: "If you are a mathematician, then you may want to make sure you are sitting down before reading futher.".

                                                                                                                                                                                            woah.

                                                                                                                                                                                            • phkahler an hour ago

                                                                                                                                                                                              I would have thought a triangular grid works better than a grid of squares. You get ~3n links vs ~2n for the square grid. Curious what the AI came up with.

                                                                                                                                                                                              • bustermellotron 13 minutes ago

                                                                                                                                                                                                The grid of squares actually gets > Cn for any C. (More, in fact… C can grow like n^{a/log log n} for some constant a > 0.) The AI proved > n^{1 + b} for some small b > 0, which a human (Will Sawin) has now proved can be about b = 0.014. The grid can be rescaled so the edges are not necessarily length 1, but other pairs will have length 1; that is necessary to get more than 2n unit distances.
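
                                                                                                                                                                                                To make the rescaling point concrete, here is a small sketch, assuming a plain k x k integer grid (this is only an illustration on a tiny grid, not the construction from the proof). It counts point pairs by squared distance; if the most popular squared distance is d, shrinking the grid by a factor of sqrt(d) turns exactly those pairs into unit distances. `distance_counts` is a name made up for this example.

```python
from collections import Counter


def distance_counts(k):
    """Count unordered point pairs in a k x k integer grid, keyed by squared distance."""
    counts = Counter()
    for dx in range(-(k - 1), k):
        for dy in range(-(k - 1), k):
            if dx == 0 and dy == 0:
                continue
            # Number of ordered pairs (p, p + (dx, dy)) with both points inside the grid.
            counts[dx * dx + dy * dy] += (k - abs(dx)) * (k - abs(dy))
    # Every unordered pair was counted twice, once for each direction.
    return {d: c // 2 for d, c in counts.items()}


k = 10
counts = distance_counts(k)
best_d, best_count = max(counts.items(), key=lambda item: item[1])
print("points:", k * k)
print("pairs at the grid spacing (d = 1):", counts[1])
print("most popular squared distance:", best_d, "with", best_count, "pairs")
```

                                                                                                                                                                                                Even at this size the grid edges are not the most common distance, which is the sense in which rescaling beats the naive ~2n count.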

                                                                                                                                                                                                • comboy an hour ago

                                                                                                                                                                                                  Yes, not providing visualization of the solution seems criminal.

                                                                                                                                                                                                  • red_admiral 26 minutes ago

                                                                                                                                                                                                    Unless it's a non-constructive proof.

                                                                                                                                                                                                    • kmeisthax 27 minutes ago

                                                                                                                                                                                                      Knowing OpenAI, the solution's probably being withheld as a trade secret, lest it fall victim to distillation attacks (i.e. exactly the same shit they did to the open Internet).

                                                                                                                                                                                                    • kilotaras 29 minutes ago

                                                                                                                                                                                                      Both 3n and 2n are linear, the broken conjecture is that you can't do better than linear.
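
                                                                                                                                                                                                      A quick sketch of why both are linear, assuming a k x k square grid and a k x k rhombus cut from the triangular lattice (both function names are made up for this example). The triangular lattice is handled in integer coordinates, so no floating-point distances are needed.

```python
def square_unit_pairs(k):
    """Unit-distance pairs in a k x k square grid with spacing 1."""
    # k rows with (k - 1) horizontal edges each, plus the same vertically.
    return 2 * k * (k - 1)


def triangular_unit_pairs(k):
    """Unit-distance pairs in a k x k rhombus of the triangular lattice."""
    # Lattice point (a, b) sits at a*(1, 0) + b*(1/2, sqrt(3)/2), so an
    # offset (da, db) has squared length da^2 + da*db + db^2.
    points = [(a, b) for a in range(k) for b in range(k)]
    present = set(points)
    pairs = 0
    for a, b in points:
        # One offset from each +/- pair of unit offsets, so each pair is counted once.
        for da, db in ((1, 0), (0, 1), (1, -1)):
            if (a + da, b + db) in present:
                pairs += 1
    return pairs


for k in (10, 30, 100):
    n = k * k
    print(n, square_unit_pairs(k) / n, triangular_unit_pairs(k) / n)
```

                                                                                                                                                                                                      The two ratios creep toward 2 and 3 as n grows, so neither layout gets past a constant multiple of n.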

                                                                                                                                                                                                    • taimurshasan an hour ago

                                                                                                                                                                                                      I wonder how much this cost vs a Math Professor or a team of Math Professors.

                                                                                                                                                                                                      • Karrot_Kream 24 minutes ago

                                                                                                                                                                                                        Sadly math professors aren't very expensive. Academics are paid terrible wages. Until recently, tenure was the carrot at the end of a grueling education. But now that tenure positions are getting rarer (well, tenure positions aren't growing vs the number of aspiring academics is), they continue to be cheap highly educated labor.

                                                                                                                                                                                                        • forgot_old_user an hour ago

                                                                                                                                                                                                          it will only get cheaper in the long run

                                                                                                                                                                                                          • aspenmartin 26 minutes ago

                                                                                                                                                                                                            40x cheaper per year if trends continue

                                                                                                                                                                                                            • dvfjsdhgfv 41 minutes ago

                                                                                                                                                                                                              for a sufficiently long definition of long

                                                                                                                                                                                                              • aspenmartin 26 minutes ago

                                                                                                                                                                                                                No for a very short definition of long, look at data on: how fast do prices decrease for a constant level of performance

                                                                                                                                                                                                          • solomatov an hour ago

                                                                                                                                                                                                            How central is it in discrete geometry? Could anyone with knowledge of the field reply?

                                                                                                                                                                                                            • energy123 an hour ago

                                                                                                                                                                                                              There's pages of comments from like 8 mathematicians in the attached pdf

                                                                                                                                                                                                              • sigmar an hour ago

                                                                                                                                                                                                                The blog post links a pdf that OpenAI put together of nine mathematicians that commented on the proof. Each is quite brief and written in accessible terms (or more accessible terms, at least). https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...

                                                                                                                                                                                                              • yusufozkan an hour ago

                                                                                                                                                                                                                "The proof came from a general-purpose reasoning model, not a system built specifically to solve math problems or this problem in particular, and represents an important milestone for the math and AI communities."

                                                                                                                                                                                                                • seydor an hour ago

                                                                                                                                                                                                                  all reasoning is .. well problem reasoning. restricting black-box AIs to specific human-defined domains because we believe that's better is such a human-ist thing to do.

                                                                                                                                                                                                                  • Kwantuum an hour ago

                                                                                                                                                                                                                    I trust OpenAI's marketing team 100%

                                                                                                                                                                                                                    • krackers an hour ago

                                                                                                                                                                                                                      It seems plausible given that people have been using off the shelf 5.5 xhigh to decent success with some erdos problems. There is likely still some scaffolding around it though (like parallel sampling or separate verifier step) since it's not clear if you can just "one shot" problems like this.

                                                                                                                                                                                                                  • Kye 23 minutes ago

                                                                                                                                                                                                                    Is this something that can be made explainable to someone without any of the relevant background, or is this one of those things where all that background is needed to understand it? Because I have no idea what's going on here, but would like to.

                                                                                                                                                                                                                    • pizzao an hour ago

                                                                                                                                                                                                                      Can someone explain to me what their "prompting-scaffolding" is to make it work?

                                                                                                                                                                                                                      • yusufozkan an hour ago

                                                                                                                                                                                                                        "This is a general-purpose LLM. It wasn’t targeted at this problem or even at mathematics. Also, it’s not a scaffold. We have not pushed this model to the limit on open problems. Our focus is to get it out quickly so that everyone can use it for themselves." - Noam Brown (OpenAI reasoning researcher) on X

                                                                                                                                                                                                                      • seydor an hour ago

                                                                                                                                                                                                                        can the AI please tell us what to do now that all knowledge work will become unemployment?

                                                                                                                                                                                                                        • dadrian an hour ago

                                                                                                                                                                                                                          While the result is impressive, this blog post is extremely disappointing.

                                                                                                                                                                                                                          - It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)

                                                                                                                                                                                                                          - It does not even explain the previous best solution. The diagram of the rescaled unit grid doesn't indicate what the "points" are beyond the normal non-scaled unit grid. I have no idea what to take away from it.

                                                                                                                                                                                                                          - It's description of the new proof just cites some terms of art with no effort made to actually explain the result.

                                                                                                                                                                                                                          If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.

                                                                                                                                                                                                                          • Al-Khwarizmi an hour ago

                                                                                                                                                                                                                            Indeed, it's a pity. While many advanced math problems are highly abstract or convoluted to explain to a layman audience, this one in particular is about points in a 2D plane and distances. A drawing would have been nice.

                                                                                                                                                                                                                            • changoplatanero an hour ago

                                                                                                                                                                                                                              apparently the proof is not constructive in the sense of not giving an easy to compute recipe for generating a set of points that you can plot on a 2d plane

                                                                                                                                                                                                                            • arsan87 36 minutes ago

                                                                                                                                                                                                                              neato. can we do anything with this newfound knowledge or is this mathematical sports?

                                                                                                                                                                                                                              can we please put these groundbreaking AIs to work on actual problems humans have?

                                                                                                                                                                                                                              • clarle 24 minutes ago

                                                                                                                                                                                                                                People thought neural networks were just an interesting thought exercise a few decades ago and not for practical ML problems, and look what happened since then.

                                                                                                                                                                                                                              • catigula an hour ago

                                                                                                                                                                                                                                Every time I interact even with OpenAI's pro model, I am forced to come to the conclusion that anything outside the domain of specific technical problems is almost completely hopeless outside of a simple enhanced search and summary engine.

                                                                                                                                                                                                                                For example, these machines, if scaling intellect so fiercely that they are solving bespoke mathematics problems, should be able to generate mundane insights or unique conjectures far below the level of intellect required for highly advanced mathematics - and they simply do not.

                                                                                                                                                                                                                                Ask a model to give you the rundown and theory on a specific pharmacological substance, for example. It will cite the textbook and meta-analyses it pulls, but be completely incapable of any bespoke thinking on the topic. A random person pursuing a bachelor's in chemistry can do this.

                                                                                                                                                                                                                                Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.

                                                                                                                                                                                                                                • dvfjsdhgfv 44 minutes ago

                                                                                                                                                                                                                                  Yeah, I remember it was one of my biggest disappointments with LLMs.

                                                                                                                                                                                                                                • empath75 an hour ago

                                                                                                                                                                                                                                  Important note: this was not done with a special mathematics harness or specialized workflow.

                                                                                                                                                                                                                                  • dwroberts 37 minutes ago

                                                                                                                                                                                                                                    How/why should we know this? The text does not explain the process.

                                                                                                                                                                                                                                  • bradleykingz an hour ago

                                                                                                                                                                                                                                    ok. so what are the implications for math?

                                                                                                                                                                                                                                    • brcmthrowaway an hour ago

                                                                                                                                                                                                                                      End times are approaching

                                                                                                                                                                                                                                      • reactordev an hour ago

                                                                                                                                                                                                                                        I dunno, I'm skeptical without proof. I've had the MAX+ plan for a while and I'm sorry, the quality between GPT vs Claude is night and day difference. Claude understands. GPT stumbles over every request I give it.

                                                                                                                                                                                                                                        • nathan_compton an hour ago

                                                                                                                                                                                                                                          Weird thing to say about a report which literally has the attached mathematical proof.

                                                                                                                                                                                                                                          • reactordev 39 minutes ago

                                                                                                                                                                                                                                            Except it's not a proof. It's an existential proof of what? Projecting points and losing density? Nah, it's wrong. At least with Erdős you could solve f(x) or not solve it (inf). You cannot with this. All they did was balance a really fancy quadratic equation. The projection from C^f to R² doesn't demonstrate geometric injectivity, so nⱼ = |X| isn't established, and the bound collapses.