From the conclusions:
> I find that AI substantially boosts materials discovery, leading to an increase in patent filing and a rise in downstream product innovation. However, the technology is effective only when paired with sufficiently skilled scientists.
I can see the point here. Today I was exploring the possibility of some new algorithm. I asked Claude to generate a part that is well known (but doesn't have many examples on the internet), and it hallucinated some function. Despite being wrong, it was sufficiently close to the solution that I could "rehallucinate" it on my side and turn it into a creative solution. Of course, the hallucination would have been useless if I were not already an expert in the field.
I came to the same conclusion a while back. LLMs are very useful when user expertise level is medium to high, and task complexity is low to medium. Why? Because in those scenarios, the user can use the LLM as a tool for brainstorming or drawing the first sketch before improving it. Human in the loop is key and will stay key for the foreseeable future, no matter what the autonomous AI agent gurus are saying. https://www.lycee.ai/blog/mistral-ai-strategy-openai
"when user expertise level is medium to high, and task complexity is low to medium" – this reminds me of Python itself. Python isn't the best at anything, it's slow, sometimes vague in its formalisms, etc. But it keeps being super popular because most work is low to medium complexity. In everyone's work, from a simple webdev to an AI researcher, there are moments of complexity in the work but most of the work is getting through the relatively simple stuff.
Or maybe in general we can say that to do something really hard and complex you must and should put a lot of effort into getting all the not-hard not-complex pieces in place, making yourself comfortable with them so they don't distract, and setting the stage for that hard part. And when you look back you'll find it odd how the hard part wasn't where you spent most of the time, and yet that's how we actually do hard stuff. Like we have to spend time knolling our code to be ready for the creative part.
So it's not so much an "artificial intelligence" as it is an "intelligence amplifier", with the usual amplifier feedback loop.
Habitual Artificial Intelligence contrasts nicely with Artificial General Intelligence. It parses data and forms habits based on that data. When you want to discover something new, you have to break out of a habit and think. It also forms some habits better than others.
When I saw how AlphaZero played chess back in 2017, differently from other engines, that's how I usually described it: as a habit-forming machine.
Exactly, or one can say that all artificial intelligences are, and will be, human intelligence amplifiers.
Yes, amplification is a really apt analogy.
Just treat the hallucinations as the non-linear distortion and harmonics that come with the amplification process. You can filter out the unwanted signals and noise judiciously if you're well informed.
Taking this analogy further, you need appropriate and proper impedance matching to maximize accuracy, whether matching the source or doing load-pull (closed-loop or open-loop); for an LLM that can take the form of RAG, for example.
I love a good analogy. Can you expand on this one?
I wonder if the next generation of experts will be held back by use of AI tools. Having learned things “the hard way” without AI tools may allow better judgement of these semi-reliable outputs. A younger generation growing up in this era would not yet have that experience and may be more accepting of AI generated results.
> Having learned things “the hard way” without AI tools may allow better judgement
I see a parallel in how web search replaced other skills like finding information in physical libraries. We might not do research the old way, but we learned new tricks for the new tools. We know when to rely on them and how much, and how to tell useful from garbage. We don't write by hand much or do computation in our heads much, but we type and compute more.
I'm pretty sure people said the same thing about compilers.
That's how progress works. Clever people will still be clever, but maybe about slightly different things.
Pretty sure people say the same thing about compilers even today. They insist on using interpreters instead of compilers.
Exactly, agreed - as the author of the paper fears, there will be overreliance on AI.
Yeah, as a CS student, some professors allow use of LLMs because they will be part of the job going forward. I get that, and I use them for learning, as opposed to internet searches, but I still manually write my code and fully understand it, because I don't wanna miss out on those lessons. Otherwise I might not be able to verify an LLM's output.
Reminds me of the "Learn X the Hard Way" series, distributed as PDF I think, on the idea that if there's code samples you should transcribe them by hand because the act of transcribing matters.
Maybe that's an argument for simpler chat modalities over shared codepads, as forcing the human to assemble bits of code provided by the LLM helps keep the human in the driver's seat.
Excellent approach. You will be leagues ahead of someone who relies on LLM alone.
Yeah. My favorite professor this semester constantly says "hey, if you rely too much on the robot, and can't do this yourself, you won't get a job." I know some people are just here for the paper, but that makes me feel better when I'm having a hard time finding a new role.
I call this the "babysitting problem."
If a model is right 99.99% of the time (which nobody has come close to), we still need something that understands what it's doing enough to observe and catch that 0.01% where it's wrong.
Because wrong at that level is often dangerously wrong.
This is explored (in an earlier context) in the 1983 paper "Ironies of Automation".
> we still need something that understands what it's doing enough to observe and catch that 0.01% where it's wrong.
Nobody has figured out how to get a confidence metric out of the innards of a neural net. This is why chatbots seldom say "I don't know", but, instead, hallucinate something plausible.
Most of the attempts to fix this are hacks outside the LLM. Run several copies and compare. Ask for citations and check them. Throw in more training data. Punish for wrong answers. None of those hacks work very well. The black box part is still not understood.
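To make the "run several copies and compare" hack concrete, here is a minimal self-consistency sketch of my own; `ask_model` is a hypothetical stand-in for whatever call returns one sampled completion, and none of this comes from the paper or any particular vendor's API:

```python
from collections import Counter

def self_consistent_answer(ask_model, prompt, n_samples=5, min_agreement=0.6):
    """Sample the model several times and keep an answer only when a clear
    majority of samples agree; otherwise return None ("I don't know").

    ask_model(prompt) -> str is a hypothetical callable that returns one
    sampled completion (temperature > 0, so samples can differ).
    """
    answers = [ask_model(prompt).strip().lower() for _ in range(n_samples)]
    best, votes = Counter(answers).most_common(1)[0]
    agreement = votes / n_samples  # agreement rate, a crude proxy only
    return (best, agreement) if agreement >= min_agreement else (None, agreement)
```

The "confidence" here is just output agreement; it says nothing about what the network is doing internally, which is exactly the limitation described above.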
This is the elephant in the room of LLMs. If someone doesn't crack this soon, AI Winter #3 will begin. There's a lot of startup valuation which assumes this problem gets solved.
> There's a lot of startup valuation which assumes this problem gets solved.
Not just solved, but solved soon. I think this is an extremely difficult problem to solve to the point it'd involve new aspects of computer science to even approach correctly, but we seem to just think throwing more CPU and $$$ at the problem will work itself out. I myself am skeptical.
Is there any progress? About two years ago, there were people training neural nets to play games, looking for a representation of the game state inside the net, and claiming to find it. That doesn't seem to be mentioned any more.
As for "solved soon", the market can remain irrational longer than you can stay solvent. Look at Uber and Tesla, both counting on some kind of miracle to justify their market cap.
I get the impression that most of the 'understand the innards' work isn't scalable - you build out a careful experiment with a specific network, but the work doesn't transfer to new models, fine-tuned models, etc.
I'm just an outside observer, though...
Tesla was mildly successful right until its CEO started to fight its customers. It's unclear if this will reverse.
Uber seems to have become sustainable this year.
There's little reason to expect a correction anytime soon on either of those.
I’m pretty sure humans make mistakes too and it happens rather frequently that nobody catches them until it’s too late. In most fields we’re okay with that because perfection is prohibitively expensive.
Obviously systems have always had to be resilient. But the point here is how dangerous a "set it and forget it" AI can be. Because the mistakes it makes, although fewer, are much more dangerous, unpredictable, and inscrutable than the mistakes a human would make.
Which means the people who catch these mistakes have to be operating at a very high level.
This means we need to resist getting lulled into a false sense of security with these systems, and we need to make sure we can still get people to a high level of experience and education.
I find proofreading code-gen AI output less satisfying than writing it myself, though it does depend on the nature of the function. Migrating mindless mapping-type functions to autocomplete is nice.
This is one big point I subscribe to: I'd rather write the code and understand it that way than read and try to understand code I did not write.
Also, I think it would be faster to write my own code than to try to fully understand someone else's (the LLM's). I have developed my own ways of ensuring certain aspects of the code, like security, organization, and speed. Trying to tease out how those things are addressed in code I didn't write takes me longer.
Edit: spelling
Yes, I have experienced it, too. I was building a web crawler using Replit as an agent. I could have done it in 2 hours without LLM help, but I wanted to see how the LLM would do it. I gave it a set of instructions, but the LLM could not execute on them. It later chose an alternative path, but that also did not pan out. I then gave it an exact list of steps. Results were slightly better but not what I was expecting. Overall, it's good for getting something going, but you still have to hold its hand. It is not the best but also not the worst experience.
Yeah, I had a similar experience where I asked why a bug was happening and it gave me something that looked wrong, but upon closer inspection it pointed in a vague general direction I hadn't thought of, and I solved my bug with its help. The caveat is you still need to know your shit to decipher/recognize it.
“Survey evidence reveals that these gains come at a cost, however, as 82% of scientists report reduced satisfaction with their work due to decreased creativity and skill underutilization.”
What an interesting finding, and not what I was expecting. Is this an issue with the UX/tooling? Could we alleviate it with an interface that still incorporates the joy of problem solving?
I haven't seen any research showing that Copilot and similar tools for programmers cause a similar reduction in satisfaction. Likely because the tools feel like an extension of traditional autocomplete, and you still spend a lot of time "programming". You haven't abandoned your core skill.
Related: I often find myself disabling copilot when I have a fun problem I want the satisfaction of solving myself.
I feel that if people find programming as creative and interesting with AI as without, there's a chance they actually prefer product management?
Half statement, half question… I have personally stopped using AI assistance in programming as I felt it was making my mind lazy, and I stopped learning.
The thing I like the most about AI coding is how it lowers the threshold of energy and motivation needed to start a task. Being able to write a detailed spec of what I want, or even discussing an attack plan (for high-level architecture or solution design) and getting an initial draft is game-changing for me. I usually take it from there, because as far as I can tell, it sucks after that point anyway.
This makes sense for sure. Have you been getting good results with something more complicated than basic CRUD-type applications?
o1-preview is the best model I've tried thus far, but I wouldn't say it's even capable of putting a basic CRUD app together without constant coaxing and major adjustments on my part.
As a programmer I feel that software development as in "designing and building software products" can still be fun with AI. But what absolutely isn't fun is feeding requirements written by someone else to ChatGPT / Copilot and then just doing plumbing / QA work to make sure it works. The kind of work junior devs would typically do feels devalued now.
AI appears to have automated aspects of the job scientists found most intellectually satisfying.
- Reduced creativity and ideation work (dropping from 39% to 16% of time)
- Increased focus on evaluating AI suggestions (rising to 40% of time)
- Feelings of skill underutilization
> Related: I often find myself disabling copilot when I have a fun problem I want the satisfaction of solving myself.
The way things seem to be going, I'd be worried management will find a way to monitor this and try to cut out this "security risk" in the coming months and years.
"The tool automates a majority of “idea generation” tasks, reallocating scientists to the new task of evaluating model-suggested candidate compounds. In the absence of AI, researchers devote nearly half their time to conceptualizing potential materials. This falls to less than 16% after the tool’s introduction. Meanwhile, time spent assessing candidate materials increases by 74%"
So the AI is in charge, and mostly needs a bunch of lab assistants.
"Machines should think. People should work." - not a joke any more.
Interesting: a large US company with over 1,000 materials scientists (there can only be a handful of those) introduced a cutting-edge AI tool, decided to make a study out of it / randomize it, and gave all the credentials to some econ PhD student. Would love to know more about how this came to be. Also, why his PhD supervisor isn't a co-author; I've never seen that. I'm always slightly suspicious of these very strong results without any public data / way to reproduce them. We essentially have to take one guy's word for it.
It’s interesting to see how this research emphasizes the continued need for human expertise, even in the era of advanced AI. It highlights that while AI can significantly boost productivity, the value of human judgment and domain knowledge remains crucial.
Even Warren McCulloch and Walter Pitts, the two who originally modeled neurons with OR statements, realized it wasn't sufficient for a full replacement.
Biological neurons have many features like active dendritic compartmentalization that perceptrons cannot duplicate.
They are different with different advantages and limitations.
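As a toy illustration (my own, not from the paper): a McCulloch-Pitts unit is just a linear threshold function, which is enough to implement OR and AND but provably cannot implement XOR on its own, one concrete sense in which a single unit "isn't sufficient".

```python
# Toy McCulloch-Pitts style unit: fire iff the weighted input sum
# reaches a threshold. Purely illustrative.
def threshold_unit(weights, threshold):
    return lambda *x: int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)

OR  = threshold_unit([1, 1], threshold=1)  # fires if at least one input is 1
AND = threshold_unit([1, 1], threshold=2)  # fires only if both inputs are 1

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
assert [OR(a, b)  for a, b in inputs] == [0, 1, 1, 1]
assert [AND(a, b) for a, b in inputs] == [0, 0, 0, 1]
# XOR (0, 1, 1, 0) has no such weights/threshold: the four points are not
# linearly separable, so a single unit cannot realize it.
```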
We have also known about the specification and frame problems for a long time.
Note that part of the reason for the split between the symbolic camp and statistical camp in the 90s was due to more practical models being possible with existential quantification.
There have been several papers on HN talking about a shift to universal quantification to get around limitations lately.
Unfortunately, discussions about the limits of first-order logic face historical challenges, and adding in the limits of fragments of first-order logic, like grounding, compounds those challenges with cognitive dissonance.
While understanding the abilities of multi-layer perceptrons is challenging, I find it useful to work through the implications of an individual perceptron as a choice function.
The same limits that have been known for decades still hold in the general case, at least for those who can find a way to manage their own cognitive dissonance, but they are just lenses.
As an industry we need to find ways to avoid the traps of the Brouwer–Hilbert controversy and unsettled questions and opaque definitions about the nature of intelligence to fully exploit the advantages.
Hopefully experience will temper the fear and enthusiasm for AGI that have made it challenging to discuss the power and constraints of ML.
I know that even discussing dropping the a priori assumption of LEM with my brother who has a PhD in complex analysis is challenging.
But the Platonic ideals simply don't hold for non-trivial properties, and whether we are using ML or BoG Sat, the hard problems are too high in the polynomial hierarchy to make that assumption.
How generalizable are these findings given the rapid pace of AI advancement? The paper studies a snapshot in time with current AI capabilities, but the relationship between human expertise and AI could look very different with more advanced models. I would love to have seen the paper:
- Examine how the human-AI relationship evolved as the AI system improved during the study period
- Theorize more explicitly about which aspects of human judgment might be more vs less persistent
- Consider how their findings might change with more capable AI systems
Conclusion: Augmented Intelligence is more useful than Artificial Intelligence.
Any idea if the points raised here
https://pubs.acs.org/doi/10.1021/acs.chemmater.4c00643
were considered in the analysis?
Would there be a difference in the accuracy of the statement if you replaced AI with "data science and statistical models"?
Well, I hope it works well and fast enough. I cannot wait for my 10k-cycle, 300 Wh/kg batteries, 35%-efficiency solar modules on the market at cheap prices, and all the nanotech breakthroughs that were promised and that we are still waiting on.
Well damn, that’s a lot more specific and empirical than I was expecting given the title. Fascinating stuff, talk about a useful setup for studying the issue! “AI is useless to many but invaluable to some” (as mentioned in the abstract) is a great counterpoint to anti-AI luddites. No offense to any luddites on here ofc, the luddites were pretty darn woke for their time, all things considered