• PlasmonOwl 4 hours ago

    Ok so I am always interested in these papers as a chemist. Often, we find that LLMs are terrible at chemistry. This is because the lived experience of a chemist is fundamentally different from the education they receive. Often, a master's student takes 6 months to become productive at research in a new subfield; a PhD, around 3 months.

    Most chemists will begin to develop an intuition. This is where the issues develop.

    This intuition is a combination of the chemist's mental model and how the sensory environment stimulates it. As a polymer chemist, in a certain system brown might mean I'm seeing scattering, hence particles. My system is supposed to be homogeneous, so I bin the reaction.

    It's well known that good grades don't make good researchers. That's because researchers aren't doing rote recall.

    So the issue is this: we ask the LLM how many proton environments are in this NMR?

    We should ask: I’m intercalating Li into a perovskite using BuLi. Why does the solution turn pink?

    • Workaccount2 2 hours ago

      I think a huge reason why LLMs are so far ahead in programming is because programming exists entirely in a known and totally severed digital environment outside our own. To become a master programmer all you need is a laptop and an internet connection. The nature of it existing entirely in a parallel digital universe just lends itself perfectly to training.

      All of that is to say that I don't think the classic engineering fields have some kind of knowledge or intuition that is truly inaccessible to LLMs; I just think that it is in a form that is too difficult right now to train on. However, if you could train a model on them, I strongly suspect they would get to the same level they are at today with software.

    • calibas 12 hours ago

      I'm sure an LLM knows more about computer science than a human programmer.

      Not to say the LLM is more intelligent or better at coding, but that computer science is an incredibly broad field (like chemistry). There's simply so much to know that the LLM has an inherent advantage. It can be trained with huge amounts of generalized knowledge far faster than a human can learn.

      Do you know every common programming language? The LLM does, plus it can code in FRACTRAN, Brainfuck, Binary lambda calculus, and a dozen other obscure languages.
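      (For anyone who hasn't met FRACTRAN: a program is just a list of fractions, and each step multiplies the current integer by the first fraction that gives an integer result, halting when none applies. A minimal interpreter sketch, nothing beyond the standard definition; the adder program is Conway's classic example:)

```python
from fractions import Fraction

def fractran(program, n, max_steps=100):
    # Repeatedly multiply n by the first fraction in the program
    # that yields an integer; halt when no fraction applies.
    trace = []
    for _ in range(max_steps):
        for f in program:
            m = n * f
            if m.denominator == 1:
                n = m
                trace.append(int(n))
                break
        else:
            return trace  # no applicable fraction: halt
    return trace

# Conway's adder [3/2]: 2^a * 3^b reduces to 3^(a+b).
# Here a=2, b=3: 108 = 2^2 * 3^3 ends at 243 = 3^5.
print(fractran([Fraction(3, 2)], 108))  # -> [162, 243]
```

      The whole state of the machine is one integer, with registers encoded in its prime factorization, which is what makes it such an unusual target for a model trained mostly on conventional code.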

      It's very impressive, until you realize the LLM's knowledge is a mile wide and an inch deep. It has vast quantities of knowledge, but lacks depth. A human that specializes in a field is almost always going to outperform an LLM in that field, at least for the moment.

      • mumbisChungo 11 hours ago

        It's impressive until you realize its limitations.

        Then it becomes impressive again once you understand how to productively use it as a tool, given its limitations.

        • X6S1x6Okd1st 28 minutes ago

          Also, those limitations keep dropping away every six months.

        • logifail 9 hours ago

          > Do you know every common programing language?

          A long time ago my OH was introduced to someone who claimed "to speak seven languages fluently".

          Her response at the time was "Do they have anything interesting to say in any of them?"

          • dandellion 5 hours ago

            As a foreign English speaker, a huge pet peeve of mine is when people use acronyms without having used the full phrase first. Especially when the acronym is already a word or expression and looking it up just returns a bunch of useless examples (oh!). Eventually I'll find out the meaning (other half), and it always turns out they only saved a total of six or seven letters, which can be typed in less than 0.5 seconds, but in exchange they made their sentence more or less incomprehensible for a large group of people.

            • dylan604 3 hours ago

              As a native English speaker, I had no idea what OH was either. I’ve seen SO for significant other and not stack overflow, and I’ve seen reference to better half not just other half. By that choice, I am left to assume this person feels they are the better half which says a lot about them.

              • daveguy 2 hours ago

                > By that choice, I am left to assume this person feels they are the better half which says a lot about them.

                What a ridiculous assumption.

                Maybe they consider themselves and their partner to be equal halves of a whole. You know, the definition of half.

              • Shadowmist 2 hours ago

                Paste the comment into an LLM and ask it what it means. Don’t use Google.

                • glenneroo 4 hours ago

                  OTOH we are one of today's "lucky" 10,000? And future searches will possibly lead to this post, further reducing the friction of using this acronym. Newly trained LLMs will also be able to answer quicker. Yay?

                  I wonder how acronyms such as OTOH even become so well known that they can be used without fear of not being understood? When is that threshold reached? Is using OH now the beginning of a new well-known acronym? I guess only time will tell...

                  • theelous3 3 hours ago

                    the far more common and acceptable-to-use-without-introduction acronym for this is SO (significant other)

                    And to answer the question - the threshold is when people stop complaining about the use :)

                    • catigula 3 hours ago

                      I've literally never seen "OTOH" in my life. Anyhow, if you really feel your sentence can't do without it you can say "conversely" which is pretty short and clear.

                      • mitb6 31 minutes ago

                        OTOH dates back to the 90s and has since remained very common in internet writing. It is more surprising that you've never seen it than that someone used it.

                        It also isn't an exact synonym of "conversely".

                        • catigula 28 minutes ago

                          There aren't any exact synonyms in English.

                          I've been an extensive internet user for decades and I don't have it in memory, so I'm not sure how to feel about your assertion. I'm not the only person saying this.

                      • dylan604 2 hours ago

                        We are not in a text chat using T9 on a numeric keypad where typing is painful. There’s no need for acronyms now except for the attempt at not looking like an old or just lazy. We’re also not limited to 140 chars, so not an advantage there either.

                    • arcanemachiner 7 hours ago

                      > OH

                      Other half? I've never seen this acronym before.

                      • Upvoter33 3 hours ago

                        sounds snarky and defensive, tbh

                      • timschmidt 11 hours ago

                        > Do you know every common programing language? The LLM does, plus it can code in FRACTRAN, Brainfuck, Binary lambda calculus, and a dozen other obscure languages.

                        Not only this, but they're surprisingly talented at reading compiled binaries in a dozen different machine codes and bytecodes. I have seen one one-shot an applet rewrite from compiled Java bytecode to modern JavaScript.

                        • catigula 3 hours ago

                          And herein lies the fundamental power of the LLM and why it can even solve "impressive" problems: it is able to navigate a space that humans can't trivially - massive amounts of information and ability to parse through walls of simple logic/text.

                          LLMs are at their best when the context capacity of the human is stretched and the task doesn't really take any reasoning but requires an extraction of some basic, common pattern.

                          • dylan604 2 hours ago

                            > it is able to navigate a space that humans can't trivially - massive amounts of information and ability to parse through walls of simple logic/text.

                            That’s the very reason we built computers. If an LLM did not also meet this definition, there would be no point in it existing.

                            • catigula 2 hours ago

                              You're not the first person to suggest that LLMs have no reason to exist.

                          • anthk 9 hours ago

                            Binwalk, Unicorn... as if that were advanced wizardry. Unix systems have had file(1) since forever, and binutils from and to every arch.

                            • Energiekomin 4 hours ago

                              Yes it is, and you're comparing apples with pineapples.

                              file can't program in Brainfuck while doing basic binary analysis.

                              Binwalk and Unicorn can't do that either, and they can't write to you in multiple natural languages.

                          • yMEyUyNE1 7 hours ago

                            > There's simply so much to know that the LLM has an inherent advantage.

                            But do they understand it? I mean, a child may use swear words, but does it understand their meaning? In another comment, somebody's OH also mentioned artistic abilities and the utility of the words spoken.

                            • esafak 12 hours ago

                              But the LLM can already connect things that you cannot, by virtue of its breadth. Some may disagree, but I think it will soon go deeper too.

                              • anthk 9 hours ago

                                So impressive that every complex SUBLEQ program I've tried with an LLM has failed really fast.
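                                (For anyone unfamiliar: SUBLEQ is a one-instruction language, "subtract and branch if less than or equal to zero". A minimal interpreter sketch; conventions vary, and halting on a negative jump target is just one common choice, assumed here:)

```python
def run_subleq(mem, max_steps=1000):
    # The single instruction: SUBLEQ a, b, c  =>  mem[b] -= mem[a];
    # jump to c if the result is <= 0, else fall through to the
    # next triple. A negative program counter halts.
    pc = 0
    steps = 0
    while pc >= 0 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem

# One instruction clears cell 3 (7 -> 0), then jumps to -1 to halt.
print(run_subleq([3, 3, -1, 7])[3])  # -> 0
```

                                Even copying a value takes three of these instructions, so any nontrivial program is a long chain of raw memory offsets where a single wrong cell silently corrupts the rest.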

                              • marcodiego an hour ago

                                > [..] models are [...] limited in [...] ability to answer knowledge-intensive questions [...], they did not memorize the relevant facts. [...] This is probably because the required knowledge cannot easily be accessed via papers [...] but rather by lookup in specialized databases [...], which the humans [...] used to answer such questions [...]. This indicates that there is [...] room for improving [...] by training [...] on more specialized data sources or integrating them with specialized databases.

                                > [...] our analysis shows [...] performance of models is correlated with [...] size [...]. This [...] also indicates that chemical LLMs could, [...], be further improved by scaling them up.

                                Does that mean the world of chemists will be eaten by LLMs? Or will LLMs just improve chemists' output or productivity? I'd be scared if this happened in my area of work.

                                • X6S1x6Okd1st 29 minutes ago

                                  It's increasingly looking like, if you're young enough, most knowledge work will be eaten by LLMs (or the thing that comes next) within your lifetime.

                                  Hopefully we'll see humans assisted by AI, and induced demand, for a good while, but the idea that people work unassisted in knowledge work is gonna go the way of artisan clothing.

                                • pu_pe 8 hours ago

                                  Nice benchmark but the human comparison is a little lacking. They claim to have surveyed 19 experts, though the vast majority of them have only a master's degree. This would be akin to comparing LLM programming expertise to a sample of programmers with less than 5 years of experience.

                                  I'm also not sure it's a fair comparison to average human results like that. If you quiz physicians on a broad variety of topics, you shouldn't expect cardiologists to know that much about neurology and vice-versa. This is what they did here, it seems.

                                  • KSteffensen 7 hours ago

                                    I'll get some downvotes for this, but the PhD vs. master's degree difference is mostly work experience, plus an element of workload hazing and snobbery.

                                    Somebody with a master's degree and 5 years of work experience will likely know more than a freshly graduated PhD.

                                    • 698969 5 hours ago

                                      I think the breadth vs. depth thing applies here as well; the PhD will know more about the topic they're researching, of course.

                                      • eesmith 7 hours ago

                                        Sure, but all we know is that these "13 have a master’s degree (and are currently enroled in Ph.D. studies)". We only know they have at least "2 years of experience in chemistry after their first university-level course in chemistry."

                                        How does that qualify them as "domain experts"? What domain is their expertise? All of chemistry?

                                    • 6LLvveMx2koXfwn 11 hours ago

                                      Received 01 April 2024

                                      Accepted 26 March 2025

                                      Published 20 May 2025

                                      Probably normal, but it shows the built-in obsolescence of the peer-reviewed journal article model in such a fast-moving field.

                                      • rotis 6 hours ago

                                        Yes, this paper and many others will be forgotten as soon as they leave the front page. Afterwards no one refers to articles like these here. People just talk about anecdotes and personal experiences. Not that I think this is bad.

                                        • eesmith 9 hours ago

                                          How so?

                                          To me it looks like the paper was submitted last year, but the peer reviewers identified issues which required revision before the final acceptance in March.

                                          We can see the paper was updated since the 1 April 2024 version as it includes o1-preview (released September 2024, I believe), and GPT‑3.5 Turbo from August. I think a couple of other tested versions also post-date 1 April.

                                          Thus, one possible criticism might have been (and I stress that I am making this up) that the original paper evaluated only 3 systems and didn't reflect the full diversity of available tools.

                                          In any case, the main point of the paper was not the specific results of AI models available by the end of last year, but the development of a benchmark which can be used to evaluate models in general.

                                          How has that work been made obsolete?

                                          • bufferoverflow 9 hours ago

                                            How so? All the models they've tested are obsolete, multiple generations behind high-end versions.

                                            (Though even these obsolete models did better than the best humans and domain experts).

                                            • eesmith 8 hours ago

                                              As I wrote, the main point of the paper was not the specific model evaluation, but the development of a benchmark which can be used to test new models.

                                              Good benchmark development is hard work. The paper goes into the details of how it was carried out.

                                              Now that the benchmark is available, you or anyone else could use it to evaluate the current high-end versions, and measure how the performance has changed over time.

                                              You could also use their paper to help understand how to develop a new benchmark, perhaps to overcome some limitations in the benchmark.

                                              That benchmark and the contents of that paper are not obsolete until there is a better benchmark and description of how to build benchmarks.

                                          • Jimmc414 10 hours ago

                                            shows the value of preprint servers like arxiv.org and chemrxiv.org

                                          • gavinray 4 hours ago

                                            I asked several LLMs, after jailbreaking them with prompts, to provide viable synthesis routes for various psychoactive substances, and they did a remarkable job.

                                            This was neat to see, but it also raised some eyebrows. A clever kid with some pharmacology knowledge and a basic organic chemistry understanding could get up to no good.

                                            Especially since you can ask the model to use commonly available reagents + precursors and for synthesis routes that use the least amount of equipment and glassware.

                                            • Workaccount2 20 minutes ago

                                              You need a decent amount of experience to make psychoactive substances. Chemistry is one of those things that looks like you just follow the steps, but in practice it requires a ton of intuition and "feeling it". You can see this if you watch NileRed on YouTube: he is a pretty experienced chemist, and even then he still flops all the time trying to replicate reactions right out of the book.

                                              Besides, the books PiHKAL and TiHKAL lay out how to make most psychoactive substances, and those books have been online for free for decades now.[1][2] Maybe there are easier routes and easier-to-acquire precursor recipes, but I doubt those would be hard to find. The hardest part by far is the chemistry intuition.

                                              [1] https://erowid.org/library/books_online/pihkal/pihkal.shtml [2] https://erowid.org/library/books_online/tihkal/tihkal.shtml

                                              • dylan604 2 hours ago

                                                My limited knowledge of both chemistry and LLMs tells me that subtly incorrect chemistry can have disastrous effects, while being subtly incorrect is an LLM superpower, which suggests that this is precisely the inevitable outcome.

                                              • sgt101 6 hours ago

                                                Also, books! Books are really good for finding knowledge!

                                                Seriously, the "LLMs as a cultural technology" framing casts them as a super-interactive indexing system. I find that's a useful lens for understanding this kind of study.

                                                • AvAn12 3 hours ago

                                                  How much of this is because Scale AI and others have had human “taskers” create huge amounts of domain-specific content for OpenAI and other foundation model providers?

                                                  • fuzzfactor 2 days ago

                                                    Nothing to see here unless you have some kind of unsatisfied interest in the future of AI :\

                                                    This is all highly academic, and I'm highly industrial so take this with a grain of salt. Sodium salt or otherwise, your choice ;)

                                                    If you want things to be accomplished at the bench, you want any simulation to be made by those who have not been away from the bench for that many decades :)

                                                    Same thing with the industrial environment, some people have just been away from it for too long regardless of how much familiarity they once had. You need to brush up, sometimes the same plant is like a whole different world if you haven't been back in a while.

                                                    • mistrial9 14 hours ago

                                                      BASF Group - will they speak in public? probably not, given what is at stake IMHO