We're increasingly aware today of how the media operates in cycles of self-referential and self-justifying citations: a TV show will quote an article that reports "some people" taking issue with something, which turns out to be a quote from someone interviewed for another newspaper article... and so on. This "legitimacy laundering" is rampant, and media literacy is now getting to the level where many people can see through it.
However, most concerning: this is how academia has always worked. It's the great absurdity of peer review, and of citation. This is how entire fields can sustain themselves with little or no scientific validity (see, especially, psychometrics).
We are nowhere near the equivalent "academic literacy" that would let generally informed members of the public understand this problem. Entire fields can be sustained with zero "empirical pressure" to close them down. So long as one person can cite another who can cite another... and somewhere some government body will take these citations as prima facie evidence of a research programme, funding will be given and more papers published.
Economics works exactly like this, but at a fractal level: there are many different subfields living in the grand family of economics which barely communicate with one another and often have dramatically different theories that lead to opposing conclusions, none of which even tries to give an accurate description of the world; each instead promotes one particular ideological bias.
Every once in a while someone from one chapel actually does science by refuting another chapel's bullshit, but it's almost never seen as a good thing by anyone, and the Nobel committee sometimes ends up giving the prize jointly to people from different chapels even when their work contradicts each other (Fama being empirically proven wrong by Shiller and both being given the prize in 2013 to appease the feud).
Economic behaviour is so complex that studying economics seriously requires gigantic effort for little result, and people who attempt it stand no chance of survival in the publish-or-perish competition against the grandiose bullshitters. That's how we got where we are.
This happens everywhere. Let me tell you a story about who studies language and how.
Linguistics studies language, right? It has a few-hundred-year history of doing so seriously. But once AI people wanted to also have a look, linguists became unhappy. They didn't like the idea of someone else saying something about their problem with a different set of tools. Those tools don't provide explanations that they approve of. Even the thought of measuring things over a dataset to check whether your linguistic theories work made them bristle, with Chomsky famously rejecting the idea of evidence at scale.
So AI people created natural language processing: their own field to study the same problem. Once in a while someone tried to bridge the gap, which led to the famous "every time I fire a linguist my performance goes up".
Many people in NLP see linguists as outdated fossils. Most would probably vote to just shut down linguistics departments, I suspect.
Then something funny happened. The LLM people did to NLP what NLP did to linguistics. And the NLP community rejected them, because LLMs don't provide an explanation that they approve of. Just like in linguistics, the president of the ACL, the main NLP organization, recently said that NLP is not about machine learning! The absurdity of that statement just boggles the mind. So the LLM people now have their own conference and field.
A lot of people cannot accept that things must change over time. They want to do research the way they did it 10 or 40 years ago. So they create an immune response. It's human nature.
As someone who doesn't feel this way and enjoys moving forward, I'm really sad about what's happened to NLP in the past few years. It would be nice if we could close this rift, but it won't happen with the current hate-filled president. I've stopped sending any work there until they reform.
Just the cycle of academic life. I wonder if science would move faster if we could select for people who are willing to change with the times.
The goal of physics isn't to produce video games -- that a video game can generate a frame using unphysical formulae and unphysical processes (e.g., rasterisation) does not invalidate physics.
There's linguistics which produces descriptive theories of the practice of language. There's "computational linguistics" which aims to build discrete algorithmic models of language. Both of these are interested in explanations of actual physical phenomena.
Machine Learning is largely an engineering discipline which, like video games, can generate output using any method which will convince a human user. Indeed, often more radically, ML works by replaying variations of past convincing output rather than having any explicit simulation involved at all.
Thus there's no sense in which LLMs supersede linguistics, nor in which they somehow supersede computational linguistics.
This would be like claiming rasterisation means we should retire optics and the physics of light.
That's a nice dividing line; science and engineering are different, for sure. But science makes use of engineering knowledge and engineering makes use of scientific knowledge.
That's like saying it's not astronomy if I'm using a radio telescope, that only optical telescopes are real astronomy. It makes zero sense. The tools are irrelevant. We don't have separate journals for physics with ML and physics without it.
It's like saying AlphaFold isn't biology because it uses ML. It's absurd.
One goal of linguistics is to produce an account of how language works. If you understand how something works you can reproduce it. This was always a core part of linguistics. Just look at Chomsky's work. Or at what counts as evidence in a linguistics paper. It is a model that "explains" some feature of language.
NLP does exactly the same thing as linguistics. It produces explanations and makes predictions about language. But it does so with ML.
I think you're confused about optics and rasterization. There are countless ways to relate the two. Rasterization is not unphysical, no more so than, say, finite element methods are unphysical. Heck, physics is right there in the name of PBR (physically based rendering). It's an approximation, and a fine one that people actually use in both science and games.
My master's project was applying non-gradient-based ML methods to parameter optimisation in quantum metrology -- I'm aware that the reskin on curve-fitting algorithms called "Machine Learning" may be used as part of science. We've had Taylor series approximations to functions for 300 years.
But insofar as we're talking about LLMs, and most products you can use, we're talking about an engineering use. Engineers build systems that "perform", as you say; explanatory accuracy is not a kind of engineering performance.
It could equally well be said that, for a video game studio, "the more physicists I fired, the better the games performed" -- for, of course, physicists have never studied 'making a video-game image convincing', and their techniques are best run on supercomputers, since they'd build explanatory simulations with explanatory accuracy. Video game developers do not do this. I have developed video games too, so I know.
Convincingly placing pixels on a screen as if governed by the laws of physics does not require actually simulating them. It requires generating frames from a "point of view" that never exposes the absence of real physics, or approximations thereof. The world isn't made of triangles, and you cannot clip through solid objects.
The goal of a person making an LLM is similar to the goal of a video game developer; these are engineering goals: to give a user an experience they will pay for. Now, a 2D '80s video game has more actual physics in it than an LLM has models of language use, but still, this is irrelevant to the goals of these creators.
A linguist could analyse an LLM as the conditional probability structure of the English language, as captured largely in recent electronic texts -- but this structure is precisely part of what linguists and others are trying to explain.
This is why all curve-fitting to historical data isn't in itself science: it is a restatement of the very target of explanation, the data. The job of scientific fields is to explain that curve.
Its exploitation by engineers is explanation-free. It isn't a science, and it replaces no science.
And my PhD and day job both involve doing research on this.
If a student told me they had this view of what ML is, I would tell them that we've failed to educate them.
The thought that physics doesn't care about performance or approximation is silly. Just look at AlphaFold. Heck, I talk to climatologists and materials scientists who want the equivalent all the time.
Prediction is the heart of all science, whether we're talking neuroscience, linguistics, physics, etc.
You think people who run things on supercomputers want to do so out of some idealistic notion of what science is? No. They have to, because they don't have good approximations. Just like with protein folding: places like DESRES used to build supercomputers for that. This is over now.
ML models learn representations of data which can then be reused for many tasks. There's a whole field where we try to understand those representations. Those are explanations of what's going on, given that they're such good predictors. The embeddings you get from an LLM are better models of language than anything linguistics ever accomplished. Linguists' goal should be to explain them and probe their limits instead of complaining. If linguists had come up with GPT they would be celebrating; the method by which you do science is irrelevant as long as it works.
I'll close by quoting Dawkins: "Science. It works, bitches." That's the value of science. Can we predict which molecule will cure cancer? Everything else is ideology and silly thoughts from before the paradigm shift.
Prediction is not the heart of science; this is early 20th-century mumbo jumbo and Humean nonsense that gets repeated by curve-fitters because it's all they do.
Explanation is the heart of science, not prediction. All predictions Newton would have made of the orbits of the planets would have been wrong (and so on). And this goes for the vast majority of textbook physics when it's applied to very many ordinary situations: no predictive power at all.
ML models learn "representations" of the data, yes. They are models of measures. The model is just an f=sample({(all possible measures,)}). Science provides representations of the data generating process, ie., reality. It says why those are the measures, why the temperature of gas has that value, not a report on what those values were. Nor even a compressed conditional probability model of those values -- there is no atomic theory in a zip of temperatures.
The purpose of a scientific model is to explain these data-representations, not merely to predict them based on some naive regularity assumption about the measurement device: that it will always measure that way in the future.
The reality of ML is that it offers only intra-distribution generalisation, nothing of the kind of generalisation science offers, where all possible distributions induced by intervention on the (explanatory) variables of scientific models are captured. And this is often a scam that only academics can get away with: assuming, ex hypothesi, that the test distribution "will turn out as expected".
The reality is that this sort of repetition of historical data requires extreme control over the data-generating process which gives rise to the test distribution. How is that control delivered in practice? If it's a medical lab, through untold amounts of toil delivering, say, histological slides "just right" so this dumb process almost works. If it's face tracking, well, you'd better hope it isn't being used by the police -- because they aren't orienting the camera at 3.001 m and ISO 151.5 from the masses.
This is the problem with models that obtain predictive power without explanatory content: they rely on prediction time being rigged with, in practice, extreme control mechanisms that are wholly unstated and unknown by the "modellers" -- because these are no models of reality at all, but mere repetitions of historical data with unknown, unmodelled and hence unexplained similarity.
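To put "intra-distribution generalisation" concretely, here's a minimal toy sketch (the sine process, the regime boundaries and the noise level are all invented purely for illustration): a curve fit to data from a narrow, controlled regime looks accurate inside that regime and is badly wrong outside it, and nothing in the fitted numbers says why or where it will break.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "historical data": measurements only ever come from a tightly
    # controlled regime, x in [0, 0.5], of an underlying process y = sin(x).
    x_train = rng.uniform(0.0, 0.5, 200)
    y_train = np.sin(x_train) + rng.normal(0.0, 0.001, 200)

    # A straight-line fit looks excellent inside that regime...
    slope, intercept = np.polyfit(x_train, y_train, 1)

    # ...and fails once you step outside it. The fit carries no statement
    # about where the controlled regime ends, or what lies beyond it.
    for x in (0.25, 3.0):
        print(x, slope * x + intercept, np.sin(x))

The point isn't that curve fits are useless; it's that the fit itself contains no account of the conditions under which it holds.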
Science concerns itself with explanation; that is its goal. Prediction is instrumental. Engineering's goal is the utility of the product, and so any strategy whatsoever, even a dumb "if it's happened before, it'll happen again", is permitted.
There is no textbook of physics which models reality by saying, "Well, we suppose in the future the positions and velocities will just follow the same distribution, but we've no idea why, and how dare you ask, and get out, and doesn't my Ideal-Gas-TransformerModel look pretty? It gets the pressure right for argon at 20.0001 °C in glass jars of about 2.002 L."
Prediction and explanation are basically equivalent IMO. A predictive model entails an explanation, and an explanation entails a predictive model. Predictive models that are more accurate are more accurate explanations, and predictive models that are more precise are more precise explanations, and vice versa. They are not that distinct.
They are highly distinct.
Compare reporting the temperature tomorrow as the mode of all November temperatures at your location with a climate-and-weather simulation involving cloud layers, the ocean, etc.
The former is likely to be vastly more predictively accurate than the latter, but explains nothing.
Explanatory models are often less predictively accurate than these (weakly inductive) predictive models. Their purpose is to tell us how reality works, and that provides some insight as to when we can adopt merely predictive approaches. This is because merely predictive models capture accidental features of measurement which hold up for a while in some environments, and which we wouldn't wish to explain.
Without explanatory insight, merely predictive models catastrophically collapse and are otherwise highly fragile. E.g., consider the performance of "predict the mode" in a snowstorm.
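For concreteness, the "predict the mode" baseline is nothing more than this (the November temperatures are made up for illustration):

    from collections import Counter

    # Made-up historical November temperatures (deg C) at one location.
    november_temps = [8, 9, 9, 10, 9, 8, 11, 9, 10, 9]

    def predict_tomorrow(history):
        # Weakly inductive predictor: return the most common past value.
        # No clouds, no ocean, no radiation budget -- only the assumption
        # that tomorrow will resemble the historical record.
        value, _count = Counter(history).most_common(1)[0]
        return value

    print(predict_tomorrow(november_temps))  # 9: often accurate, explains nothing

It does fine until the environment stops cooperating (the snowstorm), at which point it has nothing to say.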
If you want a mildly formal analysis of the difference: explanatory models quantify over causal properties of reality, describe their relationships, and provide the necessary inferential methods for deducing conclusions from models. They permit arbitrary simulation across all relevant measuring systems.
Merely predictive models quantify over historical measurements, assume similarity conditions across them, and assume future similarity to past cases. They provide only extremely weak inferential grounds for any inference. They can offer only repetition of one kind of measure, and cannot simulate the state of other relevant measuring devices or in different environments of measurement.
Explanatory models describe reality. Predictive models describe the measurement device you happened to use, in the environment you specifically used it in.
> The former is likely to be vastly more predictively accurate than the latter, but explains nothing.
It actually explains quite a bit, most importantly that weather is cyclical with only statistically minor variations around the mode.
This predictive model is also very specific, rather than general. There are plenty of "explanations" that are not predictive at all, like "Thor creates thunder".
Explanations and predictive models that are both general and minimize parameters are better, and they are interchangeable.
As I've said, merely predictive models do not quantify over the features of reality.
A weakly inductive model is simply this: the next case will be some average of the prior cases because we assume unknown aspects of the environment will ensure it is so.
An explanatory model tells you why. It says: there are planets, gravity, molecules, mountains, germs, atoms, and so on.
The semantics of explanations are causal properties of reality, and their relationship.
Explanations do not have "parameters" to "minimise". The universal law of gravitation is not a curve fit to data. There was never a dataset of F, M, m, r; Newton did not fit any parameters -- there are no such parameters.
Explanations absolutely do have parameters, called premises, and we absolutely do try to minimize them. There's a whole principle about it called Occam's razor.
Those are propositions with existential quantifications, not parameters, and we do not minimise them.
We use comparisons over the richness of existential commitments in theory comparison, but NOT in the creation of theories, nor in their specification.
These are all distinctions without a difference when analyzing theory descriptions under a unified framework, like ordering them by Kolmogorov complexity; e.g., a Turing machine description is equivalent to a universal Turing machine plus a suitably encoded tape.
Yes, indeed, at the level of abstraction in which everything is syntax without any semantics, then everything is the same.
That's as content free as saying, "since the sun is ordered, and my CD collection is ordered, the sun and my CDs are the same".
This "unified framework" you're talking about is just an analysis of functions over whole numbers.
I enjoyed your conversation and just want to chip in that there are as many definitions of science and knowledge as there are philosophers. One doesn’t have to have only one definition, but usually one has to adhere to the ones within the realm of one’s scientific paradigm to be accepted and to develop the science. Normal science, as Kuhn called it.
If you look at the history of these accounts of science, though, they are "pre-modern" in a negative sense.
We didn't have formal methods of causal analysis until the 1920s, and it took until the '80s to have a really robust formalism and account of causal analysis.
Before, experimenters would "have in their heads" the causal knowledge of how to conduct and interpret these results -- but this was never formalised, or given explicitly.
So accounts of science before this "causal revolution" of the late 20th century are broken, based on a philosopher's literal reading of scientific experiments (and the like) with little understanding of how the experimenters actually thought about them -- and philosophers were often doctrinally opposed to entertaining these thoughts seriously.
Today the partition between science/engineering, prediction/explanation, etc. can be given on much more robust grounds, and there are few working philosophers of science who adopt positions wholly at odds with my account. What you find from ML-ists are Humean conceptions which ideologically honour their curve-fitting methods with a status equivalent to actual experimentation and explanation.
> There is no textbook of physics which models reality by saying, "Well, we suppose in the future the positions and velocities will just follow the same distribution, but we've no idea why, and how dare you ask, and get out, and doesn't my Ideal-Gas-TransformerModel look pretty? It gets the pressure right for argon at 20.0001 °C in glass jars of about 2.002 L."
That's precisely what physics textbooks say!
Put most famously by Feynman (originally by Mermin): "Shut up and calculate."
Entanglement is nonsense? Too bad, it works. There's no explanation. Don't like the notion of virtual particles that violate the conservation of energy? Too bad, it just works. Don't like the notion of wave-function collapse, and that we have no mathematical model of it? Too bad, it works. Etc.
Explanations are models. Models can be mental models. They can be mathematical models. They can be computational models. But all that matters is that they make good predictions. Nothing else.
It's totally irrelevant whether you or anyone else personally feels they're provided with an explanation or not.
There’s no single forward direction in science. We’re in a machine learning hype cycle. That doesn’t mean we must fire everyone in every other department.
Totally, and I think there is a fundamental, deeper, inherent problem in using statistics to determine how you want to manipulate a given object of study.
Researchers find that, on average, consumers want X. Companies decide they want to maximize reach and so begin to produce X. Consumers soon have little choice except X, reaffirming, to future researchers, that consumers want X.
This is why I think a plurality of data is so necessary, especially when it comes to anything in the social domain, and why it's imperative that we begin to invest more in cross-disciplinary research. Specialization has gotten us far, but it's starting to lead to breakdowns. The only thing that might offset the statistical observation of consumer behavior is a statistical study of consumer opinion that turns out to refute or contradict that behavior... but then in response some people will say "people don't know what they want", further reaffirming the conclusions that action based on the data itself caused (behavior or observation bias). The application of statistics to social problems essentially becomes a self-fulfilling prophecy.
I fully agree. How many IQ effects in studied populations are actually created by the use of IQ tests?
Consider how infiltrated IQ-like assessment is throughout society, selecting for doctors, lawyers, postgrads -- military, police, etc. Then consider what data is offered as evidence that IQ 1) exists, and 2) causes observable real-world outcomes. Filtering on the test becomes evidence that the test measures anything at all.
The application of "statistics" I dislike the most is where these feedback cycles exist, and large swathes of academia have some extreme responsibility here.
Whole fields of gene-trait studies were built up over a decade or more and then disappeared overnight once actual sequencing took place. All the rigour and splendour of "statistics", and then, poof, when the science was done, it disappeared.
Since there are basically no scientific theories of human psychology, society, and the like -- gluing together correlations here should be seen as prima facie absurd. The alternative? Rely on expertise, and build resilience-to-failure into the system and tolerance for higher variability.
Human expertise obtained in domain-specific environments is vastly superior to the species-wide correlations of surveys written by idiots who've never done any actual science.
Great video about that by Kurzgesagt: https://www.youtube.com/watch?v=bgo7rm5Maqg&ab_channel=Kurzg...
They show how deep they had to go to find the original source of the claim that a single human's blood vessels, if lined up, would stretch 100,000 km... and how it was quoted by so many that no one really knows where the claim came from. And of course, it was wrong.
Damn it, I had just posted this video too. Off to delete it.
Every single time (some four times in my past life) when I’m familiar with or close to the background story of something that played out in the news, I come to the same conclusion: the news doesn’t report the facts. I’m in Western Europe, btw. I advise running this experiment yourself.
Another name for a similar but slightly more pernicious problem is the Gell-Mann Amnesia effect: "the phenomenon of experts reading articles within their fields of expertise and finding them to be error-ridden and full of misunderstanding, but seemingly forgetting those experiences when reading articles in the same publications written on topics outside of their fields of expertise, which they believe to be credible."
https://en.wikipedia.org/wiki/Michael_Crichton#Gell-Mann_amn...
No journalist will ever have the expertise required to accurately report scientific results.
Instead we need to help average people understand that when the news says "Scientists say drinking red wine is healthy!", no actual scientist ever said that!
Instead, the journalist writing the segment cribbed notes from the University's PR page about the study, which was also written by someone with zero science background, and in fact almost always has a marketing background.
Oh, and those PR releases outright state false things that the actual scientific paper doesn't even discuss, maybe half the time.
And now you have decades of people insisting that nutrition science is awful, even though nobody in academia or science is saying any of the things the average person thinks they've said.
You might like the BBC's Science in action https://www.bbc.co.uk/programmes/p002vsnb/episodes/downloads - the presenter Roland Pease has a scientific background and each episode he interviews the actual researchers on several interesting findings/publications for that week.
I think science journalists often have some science background, and university press releases are in my experience collaborations between the scientist and the PR person.
That's the problem when the people responsible for news have the business model of ad-sponsored entertainment rather than fact reporting.
The conclusion I’m coming to is that this, too, is an expression of an inherent weakness: we like to be told stories. As children we like to be told stories, and this remains. We now have a complete spectrum of storytelling: science fiction, fiction, non-fiction, …, and news. News reporting itself is a continuum, with tabloid gossip on one side and political coverage on the other. We have so many stories to choose from.
In Western Europe a lot of the media isn't ad-sponsored.
Just "bad-sponsored"
I live in France, and saying that 90% of the media are ad-sponsored is a conservative estimate.
So I don't know which part of western Europe you're talking about, but it clearly doesn't apply to all of it.
This is the danger with non-empirical government funded “science”. Empiricism allows verification. Private funding means that things that matter get studied.
If you trace the oft-cited claim from “studies” that cats kill so-and-so many animals a year, you will find it’s just someone’s Fermi estimate. And you’ll find that the person has a personal distaste for housecats.
The idea that private funding means that what is studied matters is highly dubious at best. Most private research pertains to application and product development, and often relies on fundamental discoveries coming from academic labs. Only academia can afford to let scientists run loose. It certainly results in a good amount of inapplicable theories, but is nevertheless very much essential.
I doubt gender studies is essential.
Could one create a proof of pseudo-science by injecting a faked fundamental cornerstone paper, which becomes proof by inheritance that a whole field is rotten?
Also, why does this remind me of European politicians claiming everyone wants to live European lives, meanwhile whole countries go to war and commit atrocities without big counter-demonstrations by those Western-values citizens... narrative glider guns going ad absurdum...
Even a broken clock is right twice a day. I can publish a paper with results I think are wrong, but it's entirely possible that follow-up studies confirm it was accidentally correct. While this is improbable in general, if we restrict ourselves to publishing claims that are sufficiently plausible that people in the field would accept them as a cornerstone paper, then there is a very decent chance it sounds plausible because it's true. Even if I test the claim myself before publishing to confirm it's false, I may have made an error.
Possibly repeatedly publishing bogus papers in a certain manner might be able to confidently weed out poor academic hygiene, but it's not a trivial thing.
> Could one create a proof of pseudo-science by injecting a faked fundamental cornerstone paper, which becomes proof by inheritance that a whole field is rotten?
I don't think there's any doubt that pseudoscience exists, even amongst the most optimistic of scientists.
The problem is identifying what is bad science vs what is good. The fact that I can send this from a small phone from a parking lot is proof that someone did good science at some point in time. Or that I've seen therapies like CBT turn someone who was struggling mentally on a daily basis to thrive, or that I've seen valve replacement surgery give someone years of great life after a "six months to live diagnosis" -- all show that there is good science.
I think we almost need a "discipline" of people who validate scientific results, and people should be held accountable for results that do or don't validate.
Peer review is great. It's not a farce (I've gotten some incredible feedback on papers from it), but it is also extremely limited by design. We need more.
Besides the sociological problems listed, we must always be conscious of how counterintuitive and difficult statistical inference itself can be. Good things to search for are statistical fallacies, probabilistic paradoxes and books like Counterexamples in Probability.
And it is not sufficient to read about them once or twice; researchers who use statistical inference regularly must revisit these caveats at least as regularly.
Myself, I have taught Probability and Statistics many times, discussed and dispelled many misconceptions by students. Would I be 100% sure I will not be caught up in a fallacy while informally thinking about probability? I wouldn't even be 10% sure; any intuition I conjure up, I would triple check as rigorously as possible.
On page 11 there is a mention of taking the result of self reporting (surveys) at their word. I’ve wondered about this issue not just in science but other situations. For example political polling, data point in time surveys, census, etc. Without verification, what good is the data? And yet you often see such self reported data quoted by articles or papers as if it were factual.
Pollsters know about this, and the rule of thumb is that you can reliably ask people only about what they currently do routinely, almost daily. E.g., if you ask them who they voted for 10 years ago, they might remember it wrong. If you ask what they're going to do, it's completely unreliable.
In political polls, this is the only thing you can do. I guess they apply corrections for the real turnout of social/age strata.
In deep interviews, sociologists are more interested in how people explain their world picture, what they point at, and how they react to questions than in what they tell about themselves.
For many types of data you'd like, there is no alternative. For example for political polls, especially early ones where you're asking people who they would vote for if so and so were running, what other tool could you use? Or how would you verify the results?
Of course, given the weakness of the data, you have to remember to place very little trust in the results. Even so, "X% of people say they would vote for Y" can be interesting information in itself, even if those people were lying.
Polls may or may not be carried out in a neutral way. A good many are deliberately biased; even more have known flaws that are glossed over. A humorous take on this I love (from the '80s BBC comedy Yes, Minister, which is often scarily realistic):
Oh, absolutely, even in more subtle, unintentional ways (order of responses, whether you offer a neutral option or not, etc.). But that doesn't mean polling is entirely useless, or that you need to verify the responses in the way the original comment seemed to suggest.
There are multiple levels to data. If 57% of respondents to a self-reported survey say they are going to do X, it doesn't mean that 57% of people are actually going to do X. But if you run the same survey again and now only 43% of respondents say they will do X, then that is clear evidence that something has changed, even if you don't know exactly what effect that something will have on X. That is very useful.
The problem is when people only look at self-reporting. For example, if in the previous scenario 57% responded X but only 54% actually did X, then someone might naively assume that after 43% respond X, 40% will actually do X. Or just as naively they could say 54% will still do X. Or they could apply any of an infinite number of other models. There exists a model that will spit out any given answer for any given input, so without that follow-up work to actually verify and understand the underlying mechanism, models are worthless.
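To illustrate, here are those two naive models written out (the percentages are just the ones from the example above); both are consistent with the single prior observation, and they give contradictory answers:

    # Single prior observation: 57% said they would do X, 54% actually did.
    def offset_model(stated_pct):
        # "The gap between stated and actual is always 3 points."
        return stated_pct - 3

    def sticky_model(_stated_pct):
        # "Actual behaviour stays at the last observed 54%, whatever people say."
        return 54

    print(offset_model(43), sticky_model(43))  # 40 vs 54: same data, opposite stories

Without follow-up work on the mechanism linking stated intent to behaviour, there's no principled way to prefer either.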
It's like the uncertainty principle: even the act of asking someone a question will likely change their views.
> For example political polling, data point in time surveys, census, etc. Without verification, what good is the data?
Political polls are regularly wrong https://www.bbc.co.uk/news/articles/cj4ve004llxo
In my country, supporters of one political party seem a lot more likely to tell polling companies they're "undecided" - so the pollsters try to adjust their predictions, based on what they've seen in previous elections.
But the problem is getting the adjustment right is very difficult. The political landscape can change a great deal between elections. If John McCain and Donald Trump appeal to very different sets of voters, why should voter behaviour observed with McCain hold true with Trump?
It is a proxy.