Just sharing an anecdote:
I was searching for a quote that I'd heard in an audiobook the other day. I just had the general paraphrase, and didn't feel like scanning through the chapters to go find it. This was a somewhat obscure source.
Google had just straight garbage for me. The quote was political in nature, and I felt like the results were fighting general tone-policing filters and were tuned for recent events.
o1, on the other hand, found the author of the quote, summarized the general idea of what I might be searching for, and then cited potential sources.
It's just patently obvious to me that Google has failed in delivering the core value prop of their product; they're begging to be replaced.
A week ago, I was looking for the name of a semi-obscure late-'90s computer game. Google could not help me given the description I gave.
Described it to ChatGPT, and we had a back and forth where I explained why the games it suggested weren't right. It eventually found the correct game (Nocturne) and was able to explain to me that half of the things I remembered from the game were just plain wrong, and that's why Google couldn't find it.
ChatGPT helping me with my hallucinations, go figure
I've had a conversation almost exactly like this too, about an obscure dos based 3d fighting game. And some pessimistic instinct tells me I should worry, like this capability will be optimized away in future versions. It's like that precise pang of satisfaction is tied to a jaded feeling that I can't trust it to last if it ultimately depends on infrastructure and incentives that will lead openai toward eating the world like Google did.
Getting ahead of myself to be sure, Google absolutely deserves to be stomped, so for now I guess we just ride out this wave.
Edit: I see elsewhere that others are converging on this idea and expressing it more clearly, namely that we may be in a honeymoon period.
> And some pessimistic instinct tells me I should worry, like this capability will be optimized away in future versions.
A big difference here is that open-source models are available, effective, and checkpointed at this point in time!
Worst case scenario, we stop getting new open source models and are forced to query new, suboptimal models only for recent information.
Let me guess: One Must Fall 2097 or FX Fighter?
I loved One Must Fall, but it's not that and it was a lot more similar to FX fighter, but older and more grainy. I don't think I ever actually found out what it was.
One must fall was AMAZING! It had so much character and satisfying feedback in it!
(Or this is how I remember it)
No, it was beyond amazing. Both the pilots and robots had a distinct character and a history behind them, and some combinations just seemed a better fit than others. Like I can't imagine the kickboxer Milano piloting anything else besides a Jaguar or maybe a Shadow. I also liked how after every match, you got a news report with highlights of your match, with "screenshots" showing key moments - not sure how they managed to do that in an old DOS game - nothing short of genius IMO. I can't think of any other game which has this feature.
The manual included with the game was also a very interesting and fun read - full of humor and quips about the game devs - even including a humorous piece on the main developer's baby daughter. Reading the manual made the game and the game devs feel so much more alive. I haven't played OMF in decades, yet I still remember most of their names - Rob Elam, Ryan Elam, the genius music composer Kenny Chou[1] and not to forget the baby, Bethany Kay Elam. To this day I wonder what she's up to, and whether she's gotten over her habit of slobbering over the keyboard...
Personally, for me, OMF2097 was the definitive fighting game of all time. It's a pity that almost no one ever thinks of it, it's always either Tekken, Street Fighter, or MK. All great games, mind you, but they're nothing like OMF.
[1] Kenny Chou uploaded a remake of the OMF theme song a while ago, in case you weren't aware: https://www.youtube.com/watch?v=UvlVaQl7kEk
This is great, never thought to use ChatGPT for this. I have been trying to remember a game from extremely early memories where I was able to play it for a single weekend at an uncle's house in my youth when my parents were out of town.
Bugged me on and off for the better part of a decade and I couldn't figure it out from describing it to google - the only thing I knew for sure were a few graphical UI screens seared into my brain, the rest was far too generic to really narrow things down.
In the end ChatGPT got it correct on the first try from a minimal description, confirmed by watching a YouTube playthrough and the memories coming back immediately.
Master of Orion, if anyone was curious :)
I must now point out how, paralleling this article, MoO 1 is known for its elegant simplicity compared to most of the games of the 4X genre it created/popularized.
Master of Orion was fantastic!
MoO 2 is considered the best, III was a bit odd, the new one isn't bad apparently, gonna try it out.
Thanks for bringing that up.
I dumped thousands of hours into Moo 2. Great game. I haven't touched it in years.
I have tried the same, and had ChatGPT utterly fail.
> me: what's the relatively recent movie with a pig on the movie poster
> gpt: The movie you're referring to is likely "Pig," a 2021 American drama film. It stars Nicolas Cage as a truffle hunter who lives alone in the Oregon wilderness. He embarks on a quest to find his beloved foraging pig after she is kidnapped. The film was directed by Michael Sarnoski .
> me: no, that movie does not have a pig on the poster
https://en.wikipedia.org/wiki/Pig_(2021_film)#/media/File:Pi...
> gpt: My mistake, let's correct that. Given your description, another film that fits the criteria of having a pig on its movie poster and is relatively recent is "The Menu" (2022). This film features a darkly comedic and thrilling story set against the backdrop of an exclusive restaurant on a remote island, where a young couple experiences a very unusual dining experience. The pig on the poster is part of the film's intriguing promotional material, hinting at the film's critique of culinary culture and excess.
https://en.wikipedia.org/wiki/The_Menu_(2022_film)#/media/Fi...
> me: No, that movie does not have a pig poster
> gpt: It seems I misunderstood your initial request. After a more detailed search, I couldn't find a recent movie specifically featuring a pig prominently on its movie poster beyond the titles we've discussed. Could there be any other details about the movie you're thinking of that might help narrow it down? For instance, any specific elements of the plot, actors involved, or even the genre might provide more clues to identify the correct film.
movie I was looking for was "The Hunt": https://en.wikipedia.org/wiki/The_Hunt_%282020_film%29#/medi...
Inb4 we all tell you you're using it wrong. There are certainly better ways to do this, but this is really the major thing that drives me nuts. No matter how many times I tell some LLM I want it to ask clarifying questions to provide better answers, it just won't. You end up doing exactly this, guessing what information might trigger the right recall.
I've found that there are often different trigger phrases I can use to get an LLM to "change its mind".
For Llama, I can just say "Are you sure?" and it will change its tune (unless it's quite "certain" about the results).
Qwen is more insistent, but will change course if I say "I looked it up and it says you're wrong".
if we formalize what might trigger the recall we could keep a mental model of it, maybe call it bangs, something like !wikipedia would only index data sourced from training on wikipedia..you might be on to something..
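To make the "bangs" idea concrete, here's a minimal sketch of a hypothetical bang parser in the spirit of DuckDuckGo's `!g` syntax - the `!wikipedia` tag and the function name are made up for illustration, not any real API:

```python
from typing import Optional, Tuple

def parse_bang(query: str) -> Tuple[Optional[str], str]:
    """Split an optional leading !source tag off a query.

    A hypothetical "!wikipedia who invented the telephone" would
    return ("wikipedia", "who invented the telephone"), which a
    retrieval layer could then use to restrict recall to one source.
    """
    if query.startswith("!"):
        tag, _, rest = query.partition(" ")
        return tag[1:], rest
    return None, query
```

Whether a model's training data could actually be attributed to sources this cleanly is an open question, but the user-facing syntax is simple enough.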
> No matter how many times I tell some LLM I want it to ask clarifying questions to provide better answers, it just won't
You're telling me there's a modern frontier model that refused to ask clarifying questions after you told it to?
I can't tell if you're being sarcastic, but yes: when it asks questions, they are trivial disambiguations of language, never the information-seeking questions you'd expect from more than a superficial investigation.
If you can't tell if I'm being sarcastic about an LLM asking clarifying questions, you are possibly not great at using LLMs.
Prompting isn't necessarily the career some people were sold, but it's not a bad idea to practice a bit and build a sense of what a clear and effective prompt looks like.
-
To be clear, I get that telling people it's a "you" problem every time an issue with LLMs comes up isn't helpful... but sometimes the disconnect between someone's claimed experience and what most people can actually agree LLMs are capable of is so great that it must be PEBKAC.
I just tried the original checkpoint of GPT 3.5 Turbo and it was able to handle drilling up and down in specificity as appropriate with the prompt: "I need you to help me remember a movie. Ask clarifying questions as we go along."
I think it depends a lot on what llm and tooling you have access to.
I had great results prototyping agents at work for specific tasks and answering in a specific style including asking appropriate clarification questions.
But at home with just free/local options? I've nowhere near the same settings to play with and only had very mixed results. I couldn't get most models to follow simple instructions at all.
Nobody has more PEBKAC problems than me. It might be my experience is colored by older models. I can give it another shot. But I do use LLMs quite a lot and got decent and surprising functionality out of them in the past. I was just consistently vexed by this one thing when doing information seeking search -like activity.
The recent web search integration actually made ChatGPT much worse at this specific kind of search (I want a piece of media, but I only remember characteristics of it)
If you explicitly tell it not to search, it's less likely to anchor itself in trying to search the web and going down rabbit holes.
It's also more likely to pause at times and ask the right questions to nudge.
Agreed, it has been much more valuable to me for it to rely on its training than its web searches.
Ha ha, you should see what Google returns when you ask it for the recent movie with a pig on the movie poster.
(Hint: it is a film from 2021 starring Nicolas Cage.)
I've done the same thing with movies quite a few times now. I'd mention a scene or two from the movie and approximately what decade it came out in and some of the themes of the movie and after a couple of back and forths it was able to return the name that I had forgotten.
Wow! A million thanks goes to you! I never thought of using ChatGPT to find an old book I had 30 years ago! I just found it! Epic! I use ChatGPT but never tried this before! Thank you, thank you thank you!
There’s so much low-effort, low-hanging fruit that Google could do to improve results that I just assume they've realized those things are unprofitable for them.
I mean just de-ranking any article with an affiliate link alone would skyrocket the relevance of what content you surface on search.
The problem for Google is, they’re incentivized to make the results worse than the SERP ads. If the organic results are too good, nobody would ever click the ads. And they basically gutted the 3rd party Adsense ecosystem, so they no longer monetize off third party sites. That to me, was the dumbest decision in company history — basically leading to the dilemma they’re in now. They squeezed all the profit in the short term while killing the open web in the long term.
You can’t have a product that purports to help people search the open web…while simultaneously trying to sabotage people from organically clicking on the open web. It’s pure idiocy.
The answer is staring them right in the face with Youtube, who faces zero threat to their dominance. Turns out if you just surface the best stuff and rev-share the ad money with the content creators (like they used to do with blogs via Adsense) then 1) the creators keep producing good stuff 2) the product stays useful and 3) importantly the monopoly profits continue minting!
It's intentional.
https://www.wheresyoured.at/the-men-who-killed-google/
Traditional search still works. Ask anyone who uses Kagi. Google makes more money (for now) with garbage search, so they're optimizing for garbage. Thing is, garbage in, garbage out, and it will eventually catch up with them (might already have, given the disparity between Gemini and Claude/ChatGPT).
As an aside, I tried to love ddg before using kagi but I'd always revert to google with !g and end up not finding anything because google has sucked for the past 5 years
I've been using kagi and I've noticed that the only time I ever switch to google is for things that are truly local to where I am at the moment (restaurants, etc..).
Kagi is really great.
I've also been testing perplexity and I'm not that convinced yet. The only thing it's been decent at is finding relevant studies.
The thing I don’t like about Kagi is how you have to be logged in with an email to search.
I wish they’d setup something like Mullvad where you get an account number, pay for said account, and that’s it.
What's the difference? It's still a permanent ID tied to your account and linked to your payment. Just generate a random email address if that matters.
Kagi and mullvad make the same claims to privacy and non-tracking of accounts. What's the difference between a random numeric and a random alphanumeric account identifier?
Edit: actually I just remembered that my android devices are linked to my Kagi account with a token. Kagi gives you an alphanumeric token you can use in your query URL to authenticate searches. That's how my homescreen search widgets are set up, no email required.
Last I looked, it's not a verified email, you can use it as a random id, and they even documented that. (Maybe it changed?)
They have paid search, too, which might filter (or mark) the garbage. It required knowing how more of their stack works than I wanted to learn. I couldn’t find any docs to fix the problem I had. So, I ended up using SerpStack (IIRC) with custom rendering which worked well.
That said, is anyone using the Google Search API to try to cut out a lot of the junk the author is talking about? How well has that worked?
I cringe whenever I hear the "it's SEO that ruined the search results" excuse for Google's incompetence/negligence.
It's more IMO the logical conclusion when you have a company that was built by engineers to solve problems than then changed out all their leadership over time to business executives who don't do anything besides line-go-up.
We did a Christmas quiz and asked ChatGPT for one of the answers, as the magazine had missed printing some of the answers by accident.
It was to do with a song used as a theme to a sitcom, and it changed the name of the song in the first answer, then it changed the name of the song and the sitcom on the second, and I forget what the third one did. Then I Googled it and got my answer straight away.
That's the biggest issue with LLM searching: I cannot trust the answers. Search engines aren't much better, but at least I can check different sources. And to be fair, search engines were never meant to be answer machines.
Piling on, but I wanted to find a song I had once heard when I was a kid. I did not know the exact lyrics, just that it was a woman talking about her husband leaving for space. I had tried to find the song using regular search engines multiple times over the years. Eerily (a little over a year ago, using GPT-3.5), I was able to find the song within about 15 minutes of prompting with various pieces of information including tone, topic, and the rough year I heard the song. (The song was "Clouds Across the Moon" by the RAH Band.)
That's where I'm at too. I wanted to find the origin of a quote and couldn't remember the exact words. Google was just pulling up random shit related to the words in the query and nothing related to the quote itself. ChatGPT figured it out in a minute. Same goes for looking stuff up in books: just ask what chapter something happens in.
Interesting that worked. My experience with what I consider a similar type thing has been dismal. Perhaps not similar enough?
One thing I wish either was better at is finding things from fuzzy or minimal descriptions. My memory is pretty bad but I often want to find a movie I'd watched in the past.
I often remember little bits and pieces or themes from a movie or show and curious what it was from, and both Google and ChatGPT are absolutely terrible at this.
Compared to say, r/tipofmytongue, which is absolutely fantastic at such things. Sure, it's AI vs humans, but the difference in ability between the two on such queries is pretty staggering.
Same experience for me for a small plastic car part. Tried to find it via Google using multiple queries but nothing worked, ended up at generic SEO car part sites trying to sell me anything but the part I needed.
Asked Bing Copilot by describing the thing I wanted and got the exact part number and a link to a schematic on how it fits. Ordered it for just a few euros.
But recently it looks like Microsoft has been dialing back the AI answers on Bing. Maybe it got too expensive?
I didn't paraphrase, I put the exact quote[1] I had copied earlier from goodreads in google and got garbage.
I use only local LLM's, So if anyone has Open AI's search tools give it a shot.
[1] "Few hundred years of Western society that we have lost the ability to memorise vast amounts of information."
― Lynne Kelly, Memory Craft: Improve your memory using the most powerful methods from around the world
Open AI fails here:
https://chatgpt.com/share/6770c547-2f90-8004-ba41-21bfa4d3a7...
Curious about your local LLM usage -- do you have that documented, or can you recommend sources on how to get started in that domain? I self-host most of my infrastructure, but not LLMs so far. Do you need special hardware? How do you interact with the LLMs? How do you keep them updated? Do you fine-tune/do any training, or just use off-the-shelf Llama? Do you need to know a bunch about quantization? How fast are the responses? Can you use them in your IDE as a coding assistant? How is resource utilization?
Update:
I have documented how I run local LLMs here - https://abishekmuthian.com/how-i-run-llms-locally/ .
I will document it and share with you.
I tried with OpenAI’s o1 model and it came up with the right answer on the second try: https://chatgpt.com/share/6774a930-1e44-8013-8591-424c3b58d8...
Very interesting. Does o1 have a cutoff date for knowledge, or does it index HN in realtime?
Interestingly:
- Bing: Goodreads is third result
- DDG: Paywalled NZ Herald article appears to be the only result.
Of course what you've really uncovered is that Google search is the only thing that respects robots.txt: https://www.goodreads.com/robots.txt
No, you've uncovered that.
I gave a thought on whether this could be some copyright issue with Amazon but didn't bother to check the robots.txt file, nicely done!
The robots.txt does not prevent Google from scraping the quote page, which is the one that matches for me:
https://www.goodreads.com/quotes/11541816-few-hundred-years-...
The robots.txt seems to exclude things which are genuinely useful to exclude, like RSS feeds.
While true, it seems like every individual quote page also includes:
<meta content='noindex' name='robots'>
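The distinction matters: `robots.txt` governs whether a page may be crawled, while the `noindex` meta tag allows crawling but forbids indexing. A minimal sketch of checking both signals (the example rules and HTML are made up for illustration):

```python
import re
from urllib.robotparser import RobotFileParser

def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check the robots.txt rules: may this URL be fetched at all?"""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

def page_noindexed(html: str) -> bool:
    """Check for a meta robots noindex tag, which blocks indexing
    even when crawling the page is permitted."""
    return bool(re.search(r"<meta[^>]*noindex", html, re.I))
```

So a quote page can be fetchable per `robots.txt` and still never appear in results because of the per-page `noindex` tag.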
Worth noting as well, their sitemaps all 404.
Please refer to the thoughts of Google's former CEO: AI can create deadly viruses!
I couldn’t remember the phrase “oral history” and Google was terrible at figuring out what I wanted to search for, but ChatGpt got it for me after a few back and forths.
I had a similar experience last week. Searched and searched for a blurb from Middlemarch, without luck. o1 returned exactly what I was hoping for in one fell swoop.
Google can't even find exact matches most of the time and its heuristic matches are just not as good as the transformers.
> google has failed in delivering the core value prop of their product
They have long since been riding the downhill slope of the enshittification curve, so the 'core value prop' is advertising now, which I submit they've been delivering.
Google isn't for you or me, it's for Google.
Doesn’t Gemini perform just as well?
Google enshitified one product to require more money for the new one, while delivering zero added value overall.
I would say, well done.
SEO ruined the web, guided by Google's ranking algorithm.
Things will get even worse as scammy companies start flooding the web with LLM-generated content pushing their products, to bias LLMs toward outputting their name for keywords related to their business.
Libraries and librarians are starting to seem very relevant again. As are journalistic institutions.
Journalistic institutions have been requiring so much fact-checking, cross referencing and research lately it's a full time job to get informed.
Whenever I read or hear anything from the media now, I'm always asking myself: what are their political inclinations? Who owns them? What do they want me to believe? How much of a blind spot do they have? How lazy or ignorant are they in that context? Etc.
They've killed the trust I had in them so many times that I can't give them the benefit of the doubt anymore.
It's exhausting.
What I was taught is this is just the labor of being critical, or just "having a critical mind about things." I can maybe see how it is exhausting, but I am not sure I understand the implication that it could be better or different. If it is particularly exhausting to you, it is perfectly fine to suspend your judgement about certain things!
It could be better and different - trust. Being critical is not the same thing as not trusting anyone at all. Media has by and large become not worthy of trusting at all. There are exceptions, but they are few and far between.
The economics of just giving the news with little bias just aren't there anymore.
If running a marathon is not exhausting to you, I don't think expecting the rest of the world to feel fresh after it is the right way to see the world.
Except given the noise/signal ratio and the sheer mass of information we have today, the workload is much higher than training for a 42 km run.
That's not new, it's always been the case.
The signal/noise ratio is getting lower and lower.
News is leaning more and more into entertainment.
You did have all of this before, but 24h news channel with empty content are reaching new magnitude, fox news types of outlet are getting bolder and bolder, manufacturing facts is now automated and mass-produced, consequences for scandals are at an all time low, concentration of power at an all time high, etc.
It was bad.
It is getting worse.
I don't have a baseline (though can think of a few places I might look...)[1], but I do have some recent data based on a project I've been working on.
There's a simplified page for CNN news at <https://lite.cnn.com>.
I've found that frustrating as all the stories are jumbled together with little rhyme or reason (though they seem to be roughly date-ordered).
Ironically, the story URLs themselves include both date and news-section coding, as with:
https://lite.cnn.com/2024/12/28/us/patrick-thomas-egan-accused-tv-reporter-attack/index.html
That's a US story dated 2024-12-28.
It's possible to extract these and write a restructured page grouped by subject, which I've recently done. One work product is an archive of downloaded front-page views, which I've collected over about the past 5 days. Extracting unique news URLs from that and counting by classification we get a sense of what CNN considers "news":
Stories: 486
Sections: 27
76 (15.64%) US News
67 (13.79%) US Politics
9 (1.85%) World
8 (1.65%) World -- Americas
6 (1.23%) World -- Africa
15 (3.09%) World -- Asia
4 (0.82%) World -- Australia
5 (1.03%) World -- China
2 (0.41%) World -- India
37 (7.61%) World -- Europe
21 (4.32%) World -- MidEast
2 (0.41%) World -- UK
8 (1.65%) Economy
45 (9.26%) Business
4 (0.82%) Tech
3 (0.62%) Investing
8 (1.65%) Media
8 (1.65%) Science
7 (1.44%) Weather
4 (0.82%) Climate
22 (4.53%) Health
2 (0.41%) Food
1 (0.21%) Homes
39 (8.02%) Entertainment
52 (10.70%) Sport
22 (4.53%) Travel
9 (1.85%) Style
The ordering here is how I display sections within the rendered page, by my own assigned significance.
One element which had inspired this was that so much of CNN's "news" seemed entertainment-related. That's not just "Entertainment", but also much of Health, Food, Homes, Sport, Travel, and Style, which are collectively 147 of 486 stories, or about 1/3 of the total.
Further, much if not most of the "US-News" category is ... relatively mundane crime coverage. It's attention-grabbing, but not particularly significant. Stories in other sections (politics, business, investing, media) can also be markedly trivial.
Ballparking half of US news as non-trivial crime, at best about 60% of the headlines are what I'd consider to be actual journalistic news, and probably less than that.
On the one hand, I now have a tool which gives me a far more organised view of CNN headlines. On the other ... the actual content isn't especially significant.
I'm looking at similar tools for other news sites, though I'm limited to those which will serve JS-free content. Many sites have exceedingly complex page layouts, and some (e.g., the Financial Times) don't encode date or section clearly in the story URLs themselves, e.g.:
https://www.ft.com/content/d85f3f2d-9e9d-4d92-a851-64480e56a248
That's a presently current story, "Putin apologises to Azerbaijan for Kazakhstan air crash", classified as "Aviation accidents and safety".
-------------------------------
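The URL-based classification described above can be sketched in a few lines - this is my reconstruction of the approach from the URL pattern shown, not the author's actual code:

```python
import re
from collections import Counter

# lite.cnn.com story URLs embed date and section in the path:
# https://lite.cnn.com/YYYY/MM/DD/section/slug/index.html
STORY_RE = re.compile(r"https://lite\.cnn\.com/(\d{4})/(\d{2})/(\d{2})/([^/]+)/")

def classify(urls):
    """Count stories per section, extracted straight from the URL path."""
    counts = Counter()
    for url in urls:
        m = STORY_RE.match(url)
        if m:
            counts[m.group(4)] += 1  # group 4 is the section slug
    return counts
```

The same trick fails on the FT URLs above, since their paths are opaque content IDs with no date or section component.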
Notes:
1. For those interested, most readily accessed and parsed is the Vanderbilt TV News Archive (<https://tvnews.vanderbilt.edu/>), which has rundowns of US national news beginning 5 August 1968, to present (ABC, CBS, and NBC from inception, with CNN since 1995 and Fox News since 2004). It's not the most rigorous archive, but it's one that could probably be analysed more reasonably than others.
Newspapers and other media have always had a political slant. But the more respected media have maintained rough factual accuracy because it enhances their impact and so their political slant.
What's happened is that the income of media outlets has declined to the point that most can't get factual accuracy even if they want it.
I'm not sure that's true. I think that media has always had some inevitable inaccuracy, but it's only been in the past 20-30 years that people have had enough information to see that inaccuracy. Back when there were a dozen newspapers on the newsstand and 3 TV channels, there simply wasn't anywhere to see any information outside the mainstream media. This wasn't necessarily malicious or intentional; it was simply a reflection of culture and the type of people who worked in newsrooms. With the invention of the Internet anyone could easily find alternative sources of information. Sometimes those sources were more accurate than the mainstream, sometimes less. Nowadays there isn't a "mainstream" of media because there's so many sources, and the group labelled as "the mainstream media" is simply a group with similar biases.
Or to put it another way, the media's accuracy rate has stayed consistent at some value less than 100%, but if all three TV channels reported the same information then it looked like they had 100% accuracy. Once there were more sources of information then it became apparent that the media's accuracy was less than 100% despite their protests to the contrary.
The result is that the media landscape is fractured. A person can live in a bubble where all of their news sources (eg NYT, WaPo, and Bluesky for one bubble; Fox, Newsmax, and Truth Social for another bubble) all report the same information, making their accuracy appear to be 100%, while any single source of information outside the bubble that disagrees with the bubble is disagreeing with a bunch of apparently 100% accurate sources and so can safely be discarded.
The solution is to realize that no source is 100% accurate or unbiased even despite genuine efforts to be. That isn't to say that some sources aren't more accurate or unbiased than others, but you should apply some base level of skepticism to any and every source
Your claim that media outlets are no longer factual because they can't afford paying to be factual seems specious. They often make egregious errors that take a 5 minute Google to correct.
Instead of facts being unaffordable, it seems that lies and bias simply pay more (or at least the media outlets seem to think so).
Libraries are booming, but as gathering spots and a place for people to get wifi to ... consume the web. Books remain, but the selection is quite sparse.
And journalism has been gutted, more gutted than is obvious. Especially with mainstream journalists having few "feet on the ground", a lot can sneak by (what happened in East Palestine, for example, can be found on YouTube's Status Coup News but not in the mainstream).
What a relief then that those are all healthy, well-supported organizations with bright futures.
It's not a coincidence that the solution to this problem is exactly the organizations that are being systematically undermined and dismantled.
They aren't being undermined and dismantled, they're dying of the same cancer search is, and one they contracted voluntarily: advertising.
My local library doesn't have ads posted all over the place, or anywhere for that matter.
Libraries are?
I meant journalistic institutions.
Libraries themselves are a special thing - they've been invented before intellectual property was a thing, and have been attacked by IP proponents ever since. In this way, they're very much like LLMs, and many of the arguments against LLMs trained on copyrighted material apply just as much to public libraries. Oh the irony.
Wow, what a connection you've made here. I never made this connection, but you're right.
I've occasionally thought to myself "once an idea is created it wants to be free," but in the back of my mind I've always had some sympathy for artist of any medium trying to profit off their talents or anyone trying to create anything and profit from their efforts.
The library comparison is notable, but a library has a limited, physical nature. They can only hold a finite amount of books, while anything digital can be replicated indefinitely.
I wouldn't even be sure about libraries... Books or whatever you are storing have to come from somewhere. And you have to regularly enough get new items in. And if these are polluted by AI generated content in various ways... Being able to pick real things from fake is nearly impossible outside very specialised areas where you have gone to primary sources. And even then just look at science. Already buckling with various issues.
Unfortunately, many of the “journalistic” institutions are owned by large corporations who aren’t going to “speak truth to power” in fear of retribution.
We just saw this with ABC News’s settlement with Trump because its owner Disney wanted to stay in his good graces.
We also saw this with the Bezos-owned Washington Post.
Let's be honest: the major journalistic outlets only "speak truth to power" when it means they get to criticize their outgroup. Which means any time Republicans have power, they're falling over themselves to speak up. But when Democrats have power, they are conspicuously silent. Time and time again this happens, and they have completely undermined their own credibility by doing so.
You forgot LA Times oligarch, Patrick Soon-Shiong.
A library is similar to a search engine - it can't have (display) all possible items (results), so there is also a bias in selection.
It's not easy for truly creative, new and unique content to get into your local library.
I would argue that advertising ruined the web. SEO for sites selling real products only goes so far. People are often searching for information, and monetizing that activity through advertising is what caused the disaster of low quality content flooding the web today. I'm not saying things would be perfect without advertising, just much better than they are now.
Advertising ruined the UX of Google’s search page, but I would argue the exact opposite when it comes to the web itself.
The real thing that ruined the open web and viability of search was, ironically, when Google killed display advertising by cutting Adsense payouts to near zero.
Now publishers monetize via the much more sinister “affiliate” marketing. You know, when you search for “Best [X]” and get assaulted with 1,000 listicles packed with affiliate links for junk the author has never even seen in person.
At least in the old system, you knew that an ad was an ad! Now the content itself is corrupted to the core.
This. If Google kept at “Pages must be short and provide straight answers”, then we’d have much better search results today.
Google has been machine-gunning its foot since 2021; it's really unclear to me whether they're killing their baby just to make the job harder for competitors or something. For now… I open the Google Search results with a machete, and often don't find any answer.
Talk about severing your own foot to avoid gangrene.
It's fascinating to me that Google still hasn't cracked the actual discovery of websites and information. Google is constrained to 10 search results by design, because the majority of people won't ever go to the second page of results. So basically they have to figure out how to put as much useful information and as many links as possible on the first page. Btw, I think we need web directories now more than ever.
> If Google kept at “Pages must be short and provide straight answers”, then we’d have much better search results today.
I disagree. Any prescription for what the ranking should be that isn't simply the most relevant result is a worse ranking.
I don't care if the top search result is the fastest, leanest, shortest, straightest, most adless, most equitable answer to my query if it's not the best answer to my query. I'll take the slowest loading, most verbose, popup ridden, mobile-unfriendly site if it's the one that has what I asked for.
Trying to add weights for things other than relevance is probably exactly where Google started going wrong. And then when it turned out badly, people propose yet more weights beyond relevance to fix the problem of irrelevance?
Why ascribe to malice what can be explained by ineptitude?
I just don't think Google cares enough about the web as a whole to make strategic decisions for content quality in aggregate.
Sure it cares about geeky nuances and standards (e.g. page structure / load times), but Pichai isn't considering the impact on web content quality when debating an algorithm change or feature.
If Google continues driving web quality off the cliff? Well, the business KPIs stayed green.
> I just don't think Google cares enough about the web as a whole to make strategic decisions for content quality in aggregate.
The only thing they care about is ad revenue. Google created Chrome which vastly improved browser user experience. Google is a major participant in web standard & JavaScript language evolution, among other work. That's all true, but not necessarily because they "care about the web", but rather it helps their ad business. If people put the entire world's information on websites, and people spend more time in browsers, Google ends up earning more money from ads.
Maybe, but Google, in trying to stamp out SEO spam, now just gravitates to a few big company websites and shows them first, because it no longer matters. Google is now actively trying to not even show you the organic results.
Even better for Google: the worse the organic results are, the more you need to rely on ads or some sort of AI snippet.
Don't bet on AI staying clean.
A lot of HN readers conceptualize the forces attacking the integrity of the search results as just some isolated people taking occasional potshots, and then maybe slinking away if their trick gets blocked.
It is probably a lot more accurate to visualize the SEO industry as a Dark Google. Roughly as well resourced, with many smart people working on it full time, day in, day out, with information sharing and coordination. It isn't literally one company, but this conception is probably a lot closer than the one in the heads of most people reading this. Dark Google is motivated, resourced, and smart.
And then, once I started thinking of it that way for this post, I realized that Google is increasingly at the beck and call of Dark Google. They're increasingly the real customers of Google and the real source of money. It's why Google just seems to be getting worse and worse for us... it's because we're not the real customers any more. Dark Google rules.
And if Dark Google has not yet figured out how to scam AI... it is only a matter of time. Dark Google is where Google gets its money now. When Dark Google turns its attention to AI fully, OpenAI will be no more able to resist its economic incentives than Google did.
Can't wait for the first screenshot of someone searching for the impact of the Battle of Gettysburg on the Civil War and seeing the AI do its subtle best to slide an ad for Coca-Cola into it in some semantically bizarre manner.
I very much agree this is effectively a 'honeymoon' period.
Expect the SEO collective to shift focus to AI if the search approach becomes profitable in a few years.
That said, given that an "AI search" is estimated to be at least ten times [0] as expensive per query as traditional search, I hope you like ads.
For those hoping to see that cost go down: training costs for improved models have instead been going up. [1]
[0] https://www.scientificamerican.com/article/what-do-googles-a...
[1] https://www.pymnts.com/artificial-intelligence-2/2024/openai...
The kids running local LLMs then have the right idea. They can stop with the last good dataset, before Sellama (and AdGPT) poisoned the well.
Pure speculation on my part, but to play along, I suspect the models you mentioned will be a cheaper or free subscription option.
And again speculating, but given Apple's new emphasis on privacy, it might rhyme quite well to (1) make that into apple intelligence or even (2) make it fully local and, as a matter of principle, not dependent on external infrastructure.
> I very much agree this is effectively a 'honeymoon' period.
At this point I'd be much more interested to hear which "unicorn" tech company did not have such a honeymoon period which it later turned away from. This should really be the default, expected behaviour at this point.
> At this point I'd be much more interested to hear which "unicorn" tech company did not have such a honeymoon period which it later turned away from
Doctolib in France (and Italy, Germany, the Netherlands) is one such example. Founded in 2013, so it has had a decent lifespan, and it is still as good as in the beginning for both the consumers (people booking healthcare appointments) and the customers (doctors paying to use it for their appointment management). And they're only getting better, with e.g. an AI assistant in beta to take notes during appointments.
My only criticism is that they've made it increasingly hard to book a doctor's appointment for people who don't have a carte Vitale.
How so? Is a social security number a prerequisite for booking appointments?
Expect ads to be interspersed with ChatGPT replies in the near future. It's another place to advertise.
Must go look for a patent on it, there will be one. (lol)
Dark Google isn't a new thing. Isn't even an Internet thing. It's the very same cancerous growth on society that's been ruining every single communication medium we invented, the disease that's the answer to "why X sucks?" for any X.
Advertising industry.
I wish I knew what it would take for people to stop treating the unholy blend of marketing communications as respectable occupation, and consider it for what it is: a guild of liars ruining lives and societies.
Devs always complain about advertising, but it's why devs tend to do much better compared to other jobs in the same region. Without distribution mechanisms there are no sales; without sales there are no well-paid jobs, and eventually no jobs at all.
Similar can be said about almost any other job, except the well-paid part.
Look at the world before advertising. Everything had to be local and small-scale only, so everything was very expensive, because there was no large-scale manufacturing efficiency and no massive discounting on the end product. A lot of technology isn't even possible at small scale.
Nothing is worse than salty, uneducated devs who think they understand how the world works just because they know how to code, but know nothing about history, economics, business and marketing.
I don't like intrusive advertising either. Nobody says that it has to be pushed to the most extreme level possible for it to maintain the benefits. Not even remotely. I also use an aggressive adblocker, and I don't react to ads on Google or YouTube, because they tend to be bad and completely non-relevant. I occasionally react to Instagram ads, though. I've bought products off Instagram ads and am glad for it.
Some way to promote your product to consumers is extremely valuable and healthy to the modern world. Unless you want to separate yourself from it and become a self-sufficient small community oriented society, like Mormons.
The idea of virality and "build a great product and they will come" has to be the stupidest crap devs tend to believe. It only works briefly, in the early stage of a new market category. 99.9% of the world's products are in the mature stage, and those beliefs will get you bankrupt. The world needs advertising. New startup founders of great, valuable products need advertising. You can notice that those who understand that succeed vastly more often.
Going from 0 to profitable doesn't require that much marketing. People do discuss good products and recommend them for pain points, independent of being bombarded with ads for them. You can talk about your product to people who need it.
Going from profitable to "we own gigantic yachts that supply our humongous yachts", a.k.a. Jeff Bezos money, requires absurdly crafted, insidious, borderline evil marketing.
> And if Dark Google has not yet figured out how to scam AI... it is only a matter of time.
There is one thing the AI companies have going for them that Google doesn't: They don't have 90% market share.
That's a huge problem that only Google has, because if your black hat SEO stops working against Google, you dedicate all your efforts to fixing it until it does, and then Google has to win against the combined efforts of everyone in the world.
Now suppose there were a dozen search engines that all had around equal market share. You find something that works against the first one but that causes you to get dropped to page 10 by two of the other ones. You finally find something that works against three quarters of them... for a week... after which it still works against a third of them but doing that causes the other two thirds to drop you again. To have any chance with those now you have to stop doing that, but if you do you fall off the front page for the third it was still working against. Also, by the end of the month that third will be a quarter.
You could target only one specific search engine, but then you're doing the same amount of work for only 8% of the advantage, meanwhile that search engine can concentrate their efforts on thwarting the similarly smaller percentage of black hat SEOs who chose to target them instead of one of the competing search engines.
It's much harder to hit a dozen moving targets at once than just one.
If beating SEO was something a company with a different algorithm could do, then it would have happened by now. Google remains the default search engine because the alternatives are perhaps even more susceptible to SEO than Google, so there just isn't the motivation to use them. Why do I have to go to the second or third page to find an answer with a naive search on DDG?
> If beating SEO was something a company with a different algorithm could do, then it would have happened by now.
The reductio of this argument is that nothing can ever change or improve because if it was possible someone would have done it already.
More to the point, people who have used Kagi seem to speak favorably of it. A search engine you have to pay for is never going to be the mass market option, but it can have enough users to be self-sustaining and remain one of the options, and it proves it can be done.
I like the Dark Google metaphor, but SEO agencies being Google's real customers makes no sense to me.
Not OP, but here's how I read it: the SEO operators are driving traffic to Google's ad network, where Google makes its money. They aren't necessarily paying Google much — the ad buyers are doing that — but they deliver the eyeballs to Google's ads.
It's kind of like the US hospital system, where doctors are considered by the hospital, if not de jure then certainly de facto, to be the real customers. Doctors don't pay the hospital much of anything (the patients do, usually via their insurance company), but without the doctors, no procedures happen (i.e., no "traffic"). Without the doctors, hospitals can't bill for room and board, nursing services, therapy, etc., which is where the hospital makes its operating income; and in markets where multiple hospitals exist, doctors drive the patients to the hospital(s) of the doctor's choice. Ergo, the doctor is the "real" customer of the hospital.
The hospitals can and have adapted to get a bigger share of the revenue pie by hiring their own doctors and buying up the clinics that drive a lot of customers to hospitals in the first place, just like Google has introduced products that are more vertically integrated, but the basic dynamic still exists where they are dependent on third parties to deliver customers.
Hospitals are increasingly owned by insurance companies. The customers are not doctors but shareholders. That is why a cure is seen as a threat.
1) I was making an analogy.
2) I took a look at the most recent CMS data[0] I could find (from 2022), and out of the top ten owners of hospitals in the United States, zero are payers. I only recognize about half of the parties in the 11-20 part of the list, but of the ones I do recognize, one is related to a payer. I can find no data to support your assertion that insurance companies are purchasing hospitals. They are purchasing physician practice groups, but that only reinforces the dynamic I described, where hospitals have to court physicians to drive patients to their facilities.
0 - https://aspe.hhs.gov/sites/default/files/documents/582de65f2...
Hospitals vs. physician groups is a distinction without a difference in this context (people responsible for making you better). It is like how, on paper, J&J is not responsible for the cancer its baby powder causes (some bankrupt entity is left holding the bag).
To make sure I'm not misremembering, I've asked deepseek in web search mode: "Is it true that hospitals are increasingly owned by insurance companies in US" It says: "In summary, the ownership of hospitals by insurance companies is a growing trend in the U.S. healthcare system, driven by the desire to control costs and integrate care delivery. While this model offers potential efficiencies, it also raises significant concerns about the quality of care and the balance of power in the healthcare industry." I've looked at several links it provided and they are consistent with the conclusion. Try it for yourself.
I don't see much point in continuing this conversation. I linked you to primary source data about who owns hospitals in the United States. You responded with some AI slop with absolutely no reliable citations.
I posted an analogy about the dynamic that exists between hospitals and doctors. You responded by saying that's a distinction without a difference, when the dynamic I described is a primary difference between the two groups: doctors can practice medicine without hospitals. Hospitals cannot provide patient care without doctors.
You seem like a human, so I give you the benefit of the doubt. Try to see the forest behind the trees instead of just repeating yourself ignoring what is said.
> Try to see the forest behind the trees instead of just repeating yourself ignoring what is said.
I'm not ignoring anything. I posted an analogy about how the US hospital business has some similar incentive structures as Google's search business, in that both are dependent on third-parties to deliver revenue. You, for reasons that are not clear to me, felt the need to post an unhelpful, off-topic, and incorrect rant about the insurance system in the US backed up, apparently, by nothing more than your favorite AI hallucination engine.
I don't think you've even tried to understand what I posted or why, nor do I think you are capable of understanding why your claim is completely irrelevant to what I wrote. Guess what? Even if insurance companies purchased every single hospital in the country, those hospitals would still be dependent on the doctors to operate.
The biggest examples would be https://en.wikipedia.org/wiki/University_of_Pittsburgh_Medic... and https://en.wikipedia.org/wiki/Kaiser_Permanente.
I keep hoping my local university hospital system will offer such a thing.
I'm familiar with both organizations. 1) the OP asserted that insurance companies "increasingly" owned hospitals. UPMC and Kaiser have existed for decades. 2) together, they represent less than 2% of the hospitals in the United States.
That doesn’t make sense - a sick patient costs the shareholders money.
Setting aside the previous commenter's "the cure is a threat" thing, there's some precedent for this in the US healthcare system in the form of HMOs (particularly Kaiser Permanente). Part of that is supposed to be vertical efficiency. Part is the idea that it is possible to avoid extremely expensive acute care services with proactive, low-cost primary care.
One would think so. Unless there are programs that pay you for sick patients (risk adjustment) and you control how much you are willing to spend on a patient.
I made the mistake of consuming healthcare-related content due to the recent CEO assassination. Do not recommend.
Only if you treat the patient. Cures cost money to administer. Better to just deny coverage in the first place.
That's got nothing to do with cures. In that world administering a cure would be cheaper than treating a chronic illness, which is the opposite of what's being said.
But Google would do just as well (probably better) without the SEOs, no? The SEOs are just manipulating the order of search results to their own interests, but I don't see how this would benefit Google's bottom line.
SEO pages tend to be infested with ads (which Google makes money on), whereas legitimate businesses' pages much less so. Imagine this scenario: you run a B&B on a popular tourist island and take reservations over the web. You pay Google to run an ad relevant to your business. Some SEO turkey comes along and builds a ranked-up site that shows that ad, and has no other real reason to exist. Does Google make more money or less money with the SEO operator in the equation? Would the original business have even bought a Google ad if they could get organic traffic to their site without all the SEO spam?
Your last point is the most important: SEO forces businesses to advertise on Google, because it is so hard to appear on organic results. Without black SEO, there would be much less need for ads. It is clear that Google has a symbiotic relationship with black SEO.
Maybe one step removed, if I understand it right, the spam sites make money by offering ad space and traffic, and Google makes money by selling ad placement on those sites, so actually Google and Dark Google are on the same side of the marketplace.
In a way, it's the other way around, Google is paying the spammers for providing billboard space for their ad placement services.
The perverse incentive is that the harder it is to find what I'm looking for, the more ads I get served; hence all ad-supported products trend towards becoming useless (see also: Amazon.com).
I think it's more a symbiotic relationship, they feed on each other and both benefit at the expense of the users
SEO agencies often onboard people to AdWords
Yeah I also didn’t get this. Can you explain what you meant?
"dark Google" seems like the title of a blog post I would find on HN! This is intended as a compliment, in case not clear... Add some important facts and figures (what is the revenue of dark Google, who and how many are they employing) and write it up!
> we're not the real customers any more.
Were we ever? I didn’t pay for search. I don’t pay for email.
I pay for both now (Kagi, Runbox). It’s just so much better.
I’m a Kagi customer too, and I’ve got the family on it. I don’t miss Google at all. Block lists, Wiki hit at the top. It’s great.
Yeah it’s hard to see how most HN user were the ‘real customers’ of Google at any point in time. It makes no sense to claim that.
Google isn't a search company, it is an identification company. That is its moat.
"Dark Google" is dark as in bad, but not as in obfuscated. They can be seen and observed in places like LinkedIn where they discuss their plans in the open.
But I don't think Google is part of them, or their slave. I think Google is our last line of defense.
True, it's not very effective; more like the Black Knight in Monty Python and the Holy Grail. Still, when we lose Google, we'll have nothing else.
I think you're right, but this also doesn't contradict the article. Google reset to a cleaner state (leaving behind the advertising-choked portals of the late 90s). Over time, both internal and external forces have chipped away at Google's original model.
Perhaps AI could drive a similar reset.
It's taken 20 years of being a public company that has to show quarter-on-quarter, year-on-year revenue increases, which has resulted in (Dark) Google squeezing dollars over quality.
I think AI companies have a bit more time to avoid a similar situation.
Or not. If rates stay where they are, or do not drop drastically, there will be pressure to actually show profits a lot sooner. The 2010s were a very special economic environment.
Fun take
Disconcerting take
I like the narrative aspect of it
> Does ChatGPT Search have trust? Open AI isn't monetizing its search just yet, but AI has its own issues with hallucinations.
Everywhere SEO people congregate, they talk only about this: how to produce content that will eventually end up in training data for LLMs, so that when you ask about anything remotely connected to a given brand, its products will show up in the response.
Ads are bad enough today, but it's possible the future will be worse: product placement in everything, everywhere, all of the time.
This reminds me of that awesome analysis of all the speech of a diplomat's visit in Asimov's original Foundation book:
> Hardin threw himself back in the chair. “You know, that's the most interesting part of the whole business. I admit that I thought his Lordship a most consummate donkey when I first met him – but it turned out that he is an accomplished diplomat and a most clever man. I took the liberty of recording all his statements.”
>... When Houk, after two days of steady work, succeeded in eliminating meaningless statements, vague gibberish, useless qualifications—in short all the goo and dribble—he found he had nothing left. Everything canceled out. Lord Dorwin, gentlemen, in five days of discussion didn't say one damned thing, and said it so that you never noticed.
I'm pretty sure that we're now at the level of AI where it's possible to fully automate such an analysis, such that even if the original content is entirely corrupted by product placement, the AI could cut it out to leave only the valuable information, if any remains. The only question is whether the AI will be on the user's side or the advertiser's side.
What will happen is that when you ask the AI to summarize a book to remove the fluff, it will inject random mentions of how the main character decided to drink a Coke.
Let's go with truly open models! you say. That way we can be sure there are no shoddy behind-the-scenes deals going on between the model provider and some company or government.
But the ads are in the training data, they are part of the fabric of the world. You can't get rid of them except if you do the training yourself, which is a huge amount of work, and maybe impossible (because model providers escape copyright laws, and you can't).
Actually, training it yourself is probably better in terms of copyright. If you train an LLM that only you use, from books you own, I don't see how that is actually illegal. As long as you're not selling the LLM or the content it generates, is there any legal issue?
> If you train an LLM that only you use, from books you own, I don't see how that is actually illegal.
But it won't be functional, because Large Language Models require far more samples than your private library holds, at least if you want one to do things that you couldn't solve with a much simpler deterministic search.
For example, usefully extending the document "Which book had the sarcastic one-armed antagonist?" with another sentence that happens to also be the answer.
I agree about the difficulty, but am optimistic that if enough of us wanted to achieve this, we could train it as a distributed effort, with the coordination of a non-profit like the EFF. The question is whether we care enough.
>Ads are bad enough today, but it's possible the future will be worse: product placement in everything, everywhere, all of the time.
Right, and it could be that a measurable criterion to optimize for will be the path-finding from prompts to naturalistic conversations that mention products, or even ones that reinforce consumerist thought patterns and consumerist self-perception.
Suddenly your "friend" starts recommending Coke or talking about it. But your friend is the AI now because everyone is lonely and living in their Death Stranding cells.
I agree we need something radically different, because it's an open gate to product placement, with ads generated on the fly to people's tastes. An awful future lies ahead.
This is why there will be a market for personal LLMs that act as your agent, or maybe shared for a family or the like.
A lot of folks are focusing on the AI answers that Google gives, but for me the real downgrade has been the change in their algorithm a few years ago where it tries to search for what it thinks I mean instead of what I search for.
Even putting entire chunks of text in quotes isn’t enough anymore. I can never get Google to search for what I want without trying to engineer a prompt, when it could at some point.
In trying to become more helpful, it’s become worse.
Especially for programming identifiers. You put in some really tech sounding camel case identifier and it tries to split it out and match straight to celebrity trash scraped from Taboola.
The Yahooification of Google's search becomes more amusing when one considers that it is now led by the dude who ran Yahoo search into the ground.
Prabhakar Raghavan was recently "promoted" out of that: https://www.wheresyoured.at/requiem-for-raghavan/
Given how Vivek and Elon kicked the hornets' nest about this, I wonder if there will be blowback at Yahoo. A lot of people don't realize that Yahoo basically only survives today off the back of H1b labor.
Every large tech company only survives because of H1b labor.
Not anymore. Prabhakar was promoted from Head of Search to Chief Technologist whatever that means.
> Enter 2024 with AI. The top 20% of search results are a wall of text from AI...
I'll be the contrarian here and say I actually like Google's AI Overview? For the first time in a long time, I can search for an answer to a question and, instead of getting annoying ads and SEO-optimized uselessness, I actually get an answer.
Google is finally useful again. That said, once Google screws with this and starts making search challenging again, as it has been for years, I'll go elsewhere.
But "search" and "getting an answer to a question" are two different things, aren't they? I realize that the trend has been going this way for a long time - probably since Ask Jeeves started blurring the line - and this is indeed how a lot of people try / want to use search engines, but still... I wish that Google (and competitors) would have separate pages for something like "Ask Google" vs. traditional search (where I want to find a particular document or quality content on a certain topic instead of just getting a specific answer).
May I ask how old you are? I'm 38 and I've been trying hard to break my 10 year-old of the habit of just typing questions into search engines (or telling me to "Ask Google" whenever she asks me a question and I say, "Oh, I don't know").
Yes, they very much are two different things.
I loathe that products like Facebook, Messenger, Google Photos, etc. are turning their traditional "search" page/feature into a one-stop AI slop shop.
All I want to do is find a specific photo album by name.
They're perfectly capable of implementing all the same search operators as 1990s Yahoo and 2000s Google. It's a solved problem.
The issue is that they don't want to. They'd rather be a middleman offering you "useful recommendations" (that they may or may not sell to the highest bidder) instead of offering you value.
Agreed. So many times I have to append Wikipedia or Reddit to my search to get anything useful out of Google. So it can work. Google is clearly prioritizing junk over value.
Why don't you use Kagi?
I've been using it for a while now. It is marginally better, but not exactly night and day. It seems to struggle with intent at times, and others I just get the same bland results as the free engines. The privacy is a big plus however.
Probably because it costs money, and it will likely also quickly succumb to sloppification by experimenting with its own AI, and it has an unstable founder…
I've been using it for about two years and I haven't seen any sloppification. I see it as a feature that it is a paid service, because I hope that will be a sustainable model for them to keep it as it is. I think it's a no-brainer to pay for it instead of all the suffering people describe here. The founder remark I don't get.
Not to discourage you, but note it took a while before Google succumbed. Hopefully Kagi will hold out.
Right now Kagi is a better search engine than Google. Why should some eventual demise in the future discourage anyone? There is no cost of switching and you can start using it right away
> having an unstable founder
I can assure you that force is strong with me.
I'd never heard of it until I just Googled it... Is it a better experience compared to DuckDuckGo with bang operators?
Would recommend to just try their 100 free searches. Their results are good, but it’s hard to have an objective measure. For me, it’s the little features that make it worth it (and that they have a forum for feature requests, and a proper changelog).
Yes, it's great. I've been using it for about two years already and have never had a problem. I went to search for something on Google maybe twice during that time.
I've been disappointed with pretty much all recent SEs (DDG being among the very worst). Having been an early Scroogle user, ixquick (startpage) and a few other ones I dearly miss, I've been using https://freespoke.com/ lately and find it tolerable.
I was using searx instances with reasonable results but many of them started failing recently.
Anyway, I hope everyone finds a good one. I fear things will only get worse though.
Are you suggesting that the 2000s Google codebase would do a decent job against today's SEO?
The biggest reason SEO is profitable is because low quality sites run display ads. That is the lifeblood, and intrinsic motivation, for these sites to even exist.
Google operates the largest display ads network. They literally *pay* websites for SEO spam, and take a very healthy cut off the top.
I wish people would stop acting like Google has been in a noble battle against the spam sites, when those sites generate Google billions of dollars a year in revenue.
The obvious question is, why would they ruin search for display? The answer is greed combined with hubris. They were able to double dip for years, but they killed the golden goose.
Everybody with a brain knew this would happen when they bought Doubleclick, and it took longer than expected, but here we are.
Today's SEO isn't the reason, it's simply more profitable for Google to give you terrible search results.
It makes no financial sense for Google to give you good search results and get you off Google as soon as possible.
Instead, if they make you spend more time on Google due to having to go through more crappy results, they can sell more ads.
Most people won't change search engine and will stomach it.
Until ChatGPT happened, which can save you the pain of having to use Google's search engine.
I think the search and ad code base may not be explicitly co-mingled, but they are implicitly co-mingled. And return worse search results than the early 2000s code base.
Both are explicitly related to the web at large. Google sells ads on more than two million sites, and those mostly aim to be the kind of sites that feature in search results. I'd say that the two code bases are related, by virtue of operating on the same data structure.
You remember the PageRank paper? It described how Google classifies each page on a scale from "something that links to good pages" to "informative page". Google and other search engines produce links to the latter. And since then, website operators have had strong incentives to be on the "informative page" end of the range. Today I don't think the 2000s code can find a lot of pages of the former kind (well, outside Facebook).
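For what it's worth, the hub-vs-informative-page split described here is closer to Kleinberg's HITS algorithm than to PageRank proper (which assigns a single score per page), but the iterative scoring idea can be sketched on a toy link graph. Every page name below is invented for illustration:

```python
# Toy link graph: each key links out to the pages in its list.
links = {
    "directory": ["guide", "review"],  # a hub: links out to informative pages
    "blogroll": ["guide"],             # another hub
    "guide": [],                       # informative pages with no outlinks
    "review": [],
}

hub = {p: 1.0 for p in links}
auth = {p: 1.0 for p in links}

for _ in range(20):  # power iteration
    # authority score: sum of hub scores of the pages linking to you
    auth = {p: sum(hub[q] for q in links if p in links[q]) for p in links}
    # hub score: sum of authority scores of the pages you link to
    hub = {p: sum(auth[q] for q in links[p]) for p in links}
    # normalize so the scores stay bounded
    for scores in (auth, hub):
        norm = sum(v * v for v in scores.values()) ** 0.5
        for p in scores:
            scores[p] /= norm

print(max(auth, key=auth.get))  # guide (top "informative page")
print(max(hub, key=hub.get))    # directory (top "page that links to good pages")
```

On this graph, "guide" ends up the top authority because two hubs link to it, and "directory" the top hub because it links to both informative pages.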
It produced very nice results back then. It was/is good code, but good results need more than just good code, it needs good input too.
If they provided what you're asking for you'd leave the site and look at fewer ads.
I get very wrong and dangerous answers from AI frequently.
I just searched "what's the ld50 of caffeine" and it says:
> 367.7 mg/kg bw
This is the ld50 of rats from this paper: https://pubmed.ncbi.nlm.nih.gov/27461039/
This is higher than the ld50 estimated for humans: https://en.wikipedia.org/wiki/Caffeinism
> The LD50 of caffeine in humans is dependent on individual sensitivity, but is estimated to be 150–200 milligrams per kilogram of body mass (75–100 cups of coffee for a 70 kilogram adult).
Good stuff, Google.
Perhaps a more common question: "How many calories do men need to lose weight?"
Google AI responded: "To lose weight, men typically need to reduce their daily calorie intake by 1,500 - 1,800 calories"
Which is obviously dangerous advice.
IMO Google AI overviews should not show up for anything (a) medical or (b) numerical. LLMs just aren't safe enough yet.
I think even when the answer is "right" in some sense, it should probably come within the context of a bunch of caveats, explanations, etc.
But maybe I'm just weird. Oftentimes when my wife or kids ask me a question, I take a deep breath and start to say something like "I know what you're asking, but there's not a simple or straightforward answer; it's important to first understand ____ or define ____..." by which time they get frustrated with me.
> I think even when the answer is "right" in some sense, it should probably come within the context of a bunch of caveats, explanations, etc.
Funnily enough, this is exactly what the LLM does with these questions. So well that people usually try to tweak their prompts so they don't have to wade through additional info, context, hedging, and caveats.
So you are saying that Google should provide responses that are more likely to frustrate its users? ;)
If you’re aiming to lose weight safely the rule of thumb is 3 lbs a week. At roughly 3,500 kcal per pound that works out to an average deficit of about 1,500 calories per day. Max.
Obese people can lose a bit more under doctor supervision. My understanding is that it’s tied partially to % of body weight lost per week and partly to what your organs can process, which does not increase with body mass.
I don’t think absolute numbers are very useful here. You need around 5–10% reduction in calorie intake to get any weight-loss effect going, and I wouldn’t reduce by more than 20% (relative to weight-maintaining intake — it’s different if you’ve been seriously overeating) if you want it to be sustainable longer-term.
So for example if your weight is stable at 2500 kcal per day, I would start by reducing the intake by 250–500 kcal, but not more. If this works well for a month or two and then you want to lose weight faster, you can still reduce your intake further. You generally have to do that anyway even just to maintain the velocity, because weight loss also tends to reduce calorie expenditure.
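A quick worked version of that example, using the common ~3,500 kcal-per-pound rule of thumb (an approximation, not a physiological constant; the specific numbers are hypothetical):

```python
# Hypothetical numbers from the example above.
maintenance = 2500      # kcal/day at which weight is stable
deficit = 400           # kcal/day reduction, within the suggested 250-500
kcal_per_pound = 3500   # rough energy in a pound of body fat (approximation)

pct_reduction = 100 * deficit / maintenance
weekly_loss_lbs = deficit * 7 / kcal_per_pound
print(round(pct_reduction))       # 16 (% reduction, well under a ~20% ceiling)
print(round(weekly_loss_lbs, 2))  # 0.8 (lbs lost per week, roughly)
```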
First and foremost, you need to monitor your calorie intake against weight. Here is a useful text about that: https://www.fourmilab.ch/hackdiet/
Your body will get more efficient at whatever exercise you do to make the calories work out. So over time you’ll either have to increase your exercise or rein in the calories a bit more to achieve a sustained result.
That is assuming that you do any exercise. But yes, and the method I linked to explains how to handle any such variation in calorie expenditure regardless of its cause.
I don't see the problem with the answer, and the question is already garbage. Plus, the LLM hedges its advice with precautions.
I get a pretty good summary when I paste the question into Google. It comes up with a ballpark but also gives precautions and info on how to estimate what caloric restriction makes sense for you within the first 3 sentences.
And all in a format someone is likely to read instead of clicking on some verbose search result that only answers the question if they read a whole article which they aren't going to do.
This seems like really lame nit picking. And I don't think it passes the "compared to what?" test.
The basic problem is it says reduce "by 1,500 - 1,800" rather than "to 1,500 - 1,800" (not that that answer is much better). Yes, it's a garbage question, but the first answer is unsafe in all circumstances. The simplest solution here is to show nothing.
The question is garbage. But people will ask it with their best intentions and not know it’s garbage.
If your calorie intake is just 1500 today, it is bad advice. If your calorie surplus is 1800, it is good advice.
But I wonder, were those few words the full response? Information hiding to prove a point is too easy.
A calorie surplus of 1800/day is ~190 lbs/yr. Is that something people actually do?
Yes. You can't just say that someone eating 1800 over the recommended 2000 will perpetually gain weight. Weight maintenance calories will depend on the weight of the person.
A 500lbs man will need to consume 4000kcals/day to not lose weight. Cutting 1800 of that is realistic and might be good advice on the LLM's part, so it really depends on how GP asked the question.
Funny thing is you can train a small BERT model to detect queries that are in categories that aren’t ready for AI “answers” with like .00000001% of the energy of an LLM.
That's (obviously) a bit of an exaggeration. BERT is just another transformer architecture. Cut down from ~100 layers to 1, ~1k dimensions to ~10, and ~10k tokens to 100, and you're only 1e6 faster / more efficient, still a factor of 10k greater than your estimate and also too small to handle the detection you're describing with any reasonable degree of accuracy.
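That napkin math, under the generous assumption that per-query cost is linear in each of depth, width, and context length (real attention cost grows faster than linearly with sequence length):

```python
# Per-query cost treated as proportional to layers x width x tokens.
big   = {"layers": 100, "width": 1000, "tokens": 10_000}
small = {"layers": 1,   "width": 10,   "tokens": 100}

speedup = 1.0
for k in big:
    speedup *= big[k] / small[k]

print(f"{speedup:.0e}")  # 1e+06
```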
I literally have DistilBERT models that can do this exact task in ~14ms on an NVIDIA A6000. I don’t know the precise performance per watt, but it’s really fucking low.
I use LLMs to help with training data as they are great at zero-shot, but after the training corpus is built, a small, well-trained model will smoke an LLM in classification accuracy and is way faster, which means you can get scale and low carbon cost.
In my personal opinion there is a moral imperative to use the most efficient models possible at every step in a system design. LLMs are one type of architecture, and while they do a lot well, you can use a variety of energy-efficient techniques to do discrete tasks much better.
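As a sketch of the idea, here is a toy stand-in for such a query classifier: a tiny Naive Bayes model instead of the DistilBERT model described, with an invented eight-query training set. A real system would use far more data and a proper model; every query and label below is made up for illustration:

```python
from collections import Counter, defaultdict
import math

# Invented training set: queries in categories where AI answer boxes
# shouldn't be shown ("medical") vs everything else ("other").
TRAIN = [
    ("what is the ld50 of caffeine", "medical"),
    ("safe dose of ibuprofen per day", "medical"),
    ("how many calories to lose weight", "medical"),
    ("maximum daily dose of paracetamol", "medical"),
    ("best restaurants in chicago", "other"),
    ("obscure late 90s 3d fighting game", "other"),
    ("how to exit vim", "other"),
    ("weather in berlin tomorrow", "other"),
]

word_counts = defaultdict(Counter)  # per-class word frequencies
class_counts = Counter()            # for class priors
for text, label in TRAIN:
    class_counts[label] += 1
    word_counts[label].update(text.lower().split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(query):
    scores = {}
    for label in class_counts:
        # log prior plus log likelihoods with add-one smoothing
        score = math.log(class_counts[label] / len(TRAIN))
        total = sum(word_counts[label].values())
        for word in query.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("what's the ld50 of nicotine"))  # medical
print(classify("name of a dos fighting game"))  # other
```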
Thanks for providing a concrete model to work with. Compared to GPT3.5, the number you're looking for is ~0.04%. I pointed out the napkin math because 0.00000001% was so obviously wrong even at a glance that it was hurting your claim.
And, yes, purpose-built models definitely have their place even with the advent of LLMs. I'm happy to see more people working on that sort of thing.
I applaud you doing the math! Proves you aren’t an LLM :-D
> Which is obviously dangerous advice.
Same advice as my trainer gives me.
Did they say:
reduce their daily calorie intake to 1,500 - 1,800 calories
or
reduce their daily calorie intake by 1,500 - 1,800 calories
These are very different answers, unless you’re consuming ~3,300 calories per day. These kinds of ‘subtle’ phrasing issues often result in AI mistakes, as both words are commonly used in advice but the context is really important.
Oh yeah! No, reduce to, not reduce by. Though at the time I was eating a few things that had high calories that I didn’t realize, so it would have been the same.
Your trainer advises you to reduce your calorie intake to between 200 and 500 calories per day? [0] That sounds very, very hazardous for anything other than very short term use, and (given the body's inbuilt "starvation mode") probably counterproductive, even then.
[0] Note that the robot suggested to reduce calorie intake by 1,500-1,800 calories, and the recommended calorie intake is 2,000.
People losing weight are probably eating more than 2000 per day to begin with. But if you go from 2800 down to 1500 you’re already likely to exceed 3 lbs of weight loss per week that is recommended without doctor supervision. If you need to lose more than 150 lbs in a year because you’re well past morbid obesity then you need staff, not just a food plan.
If you’re eating out at Chilies, you could easily be eating 3000 calories per meal.
I recall when Krispy Kreme came out with a donut shake that was 1800 calories for the large size. It’s crazy out there.
It’s not even advice, and it’s not wrong.
That explains why I haven't been losing weight!
I think that would explain why you’re starving, not how you’re not losing weight.
For whatever it’s worth, in response to the same question posed by me (“what is the ld50 of caffeine”), Google’s AI properly reported it as 150-200 mg/kg.
I asked this about 1 minute after you posted your comment. Perhaps it learned of and corrected its mistake in that short span of time, perhaps it reports differently on every occasion, or perhaps it thought you were a rat :)
Perhaps Google AI reads HN at work just like us.
The median lethal dose (LD50) of caffeine in humans is estimated to be 150–200 milligrams per kilogram of body mass. However, the lethal dose can vary depending on a person's sensitivity to caffeine, and can be as low as 57 milligrams per kilogram.
Route of administration
Oral 367.7 mg/kg bw
Dermal >2000 mg/kg bw
Inhalation LC50 combined: ca. 4.94 mg/L
ref: https://i.imghippo.com/files/yeKK3113pE.png 13:25EST (by a Kagi shill ftr)
That’s the danger with thinking in terms of LD50.
That’s half the people in a caffeine chugging contest falling over dead. The first 911 call would come much, much earlier. I doubt you’d get to 57 mg/kg before someone thought they were having a heart attack (angina).
I also got similar and just tried, we are posting within minutes.
--
The median lethal dose (LD50) of caffeine in humans is estimated to be 150–200 milligrams per kilogram of body mass. However, the lethal dose can vary depending on a person's sensitivity to caffeine, and can be as low as 57 milligrams per kilogram.
Route of administration: LD50
Oral: 367.7 mg/kg bw
Dermal: 2000 mg/kg bw
Inhalation: LC50 combined: ca. 4.94 mg/L
The FDA estimates that toxic effects, such as seizures, can occur after consuming around 1,200 milligrams of caffeine.
There was a table in the middle there.
LLMs are non-deterministic by nature.
Is this really true? The linear algebra is deterministic, although maybe there is some chaotic behavior with floating-point handling. The non-deterministic part mostly comes from intentionally added randomness, which can be turned off, right?
Maybe the argument is that if you turn off the randomness you don’t have an LLM like result any more?
Floats are deterministic too (this winds up being helpful if you want to do something like test an algorithm on every single float); you just might get different deterministic outcomes on different compilation targets or with threaded intermediate values.
The argument is, as you suggest, that without randomness you don't have an LLM-like result any more. You _can_ use the most likely token every time, or beam search, or any number of other strategies to try to tease out an answer. Doing so gives you a completely different result distribution, and it's not even guaranteed to give a "likely" output (imagine, e.g., a string of tokens that are all 10% likely for any greedy choice, vs a different string where the first is 9% and the remainder are 90% -- with a 10-token answer the second option is roughly 350 million times more likely with random sampling but will never happen with a simple deterministic strategy, and you can tweak the example slightly to keep beam search and similar from finding good results).
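Checking the arithmetic of that example, using the comment's hypothetical token probabilities:

```python
# String A: ten tokens, each 10% likely at its step.
# String B: one 9% token followed by nine 90% tokens.
p_a = 0.10 ** 10
p_b = 0.09 * 0.90 ** 9
ratio = p_b / p_a
print(f"{ratio / 1e6:.0f} million")  # ~349 million
```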
That brings up an interesting UI/UX question.
Suppose (as a simplified example) that you have a simple yes/no question and only know the answer probabilistically, something like "will it rain tomorrow" with an appropriate answer being "yes" 60% of the time and "no" 40%. Do you try to lengthen the answer to include that uncertainty? Do you respond "yes" always? 60% of the time? To 60% of the users and then deterministically for a period of time for each user to prevent flip-flopping answers?
The LD50 question is just a more complicated version of that conundrum. The model isn't quite sure. The question forces its hand a bit in terms of the classes of answers. What should its result distribution be?
Yes, that’s the main issue as ideally they wouldn’t be non-deterministic on well-established quantitative facts.
But they can never be. RAG gets you somewhere, but it’s still a pile of RNGs under a trenchcoat.
>> ideally
It’s just not possible. You can do a lot with nondeterministic systems, they have value - but oranges and apples. They need to coexist.
ideal (def. #2) = Existing only in the mind; conceptual, imaginary
https://en.m.wiktionary.org/wiki/ideal
(We’re allowed to imagine the impossible.)
Fair, I am loath to take away your dreams!
I get that your point wasn't this specific example, but it's perhaps not a very good example of being dangerous: getting that much caffeine into your bloodstream takes quite a commitment, and someone who knows the term LD50 is perhaps not very likely to think it indicates what is safe to consume. It's also not something you're likely to do accidentally because you've looked it up online and decided to test it.
In the most concentrated form, typical commercial caffeine tablets, it's half to one fistful. In high-caffeine pre-workout supplements, it's still a quantity that you'd find almost impossible to get down and keep down... E.g., with one powder of mine it would be a large tumbler full with just enough water to make a thick slurry; you'd likely vomit it up long before much made it into your bloodstream...
I'm not saying it's impossible to overdose on caffeinated drinks, because some do, and you can run into health problems before that, but I don't think that error is likely to be very high on the list of dangerous advice.
Hmm my search returns “between 150 to 200 mg per kilogram”, which is maybe more correct?
Also, in what context is this dangerous? To reach dangerous levels one would have to drink well over 100 cups of coffee in a sitting, something remarkably hard to do.
> Also, in what context is this dangerous? To reach dangerous levels one would have to drink well over 100 cups of coffee in a sitting
some people use caffeine powder / pills for gym stuff apparently.
someone overdosed and died after incorrectly weighing a bunch of powder.
doubt it is a big leap to someone dying because they were told the wrong limits by google.
https://www.bbc.co.uk/news/uk-wales-60570470
as ever, machine learning is not really suitable for safety/security critical systems / use cases without additional non-ML measures. it hasn’t been in the past, and i’ve seen zero evidence recently to back up any claim that it is.
I don't doubt the news article on this, but even with caffeine pills/powder it's near half a fistful to get to LD50 judging by my caffeine tablets. It's not impossible to consume, but it'd be distinctly unpleasant long before you get even anywhere close to dangerous levels.
For my high-caffeine pre-workout powder, I suspect I'd vomit long before I'd get anywhere near. Pure caffeine is less unpleasant, but still pretty awful, which I guess is why we don't see more deaths from it despite the widespread use.
I agree with you that there really ought to be caution around giving advice on safety-critical things, but this one really is right up there in freak accident territory, in the intersection of somewhat dangerous substances sold in a poorly regulated form (e.g. there's little reason for these to be sold as bulk powders instead of pressed into pills other than making people feel more macho downing awful tasting drinks instead of taking pills).
I wonder if they’re thinking 200mg per kilo to trigger cell death. I have trouble believing a human heart surviving a dose of 50mg/kg. Half of them surviving four times that much? No. I don’t believe it.
Found an article about a teenager who died after three strong beverages. The coroner is careful to point out that this was likely an underlying medical condition not the caffeine. The health professional they interviewed claims 10g is lethal for “most” people, which would be 100-150mg/kg. That still seems like something an ER doctor would roll their eyes at.
Your example doesn't interact with the chicken littling in this thread.
> The hearing was told the scales Mr Mansfield had used to measure the powder had a weighing range from two to 5,000 grams, whereas he was attempting to weigh a recommended dose of 60-300mg.
Nothing to do with an LLM nor with someone not knowing the exact LD50 of caffeine. Just "this article contains someone dying of caffeine overdose, and we're talking about caffeine overdose here, therefore LLM is dangerous."
> some people use caffeine powder / pills for gym stuff apparently.
At 200mg per pill, which is the strongest I had, I'd still have to down some 70+ pills in one go. Not strictly impossible, but not something you could possibly do by accident, and even for the purpose of early check-out, it wouldn't be my first choice.
An accident with it in powdered form is possible - people who use them are often used to pre-workout supplements tasting awful, and so might be prepared to down it as fast as possible - but it's a big enough volume of powder that it really is a freak accident.
And if on purpose, using caffeine would just be staggeringly awful...
the problem isn’t someone’s intent (on purpose/by accident).
it’s intent (want to improve my gym performance so down a bunch of caffeine) combined with incorrect information gained from what is supposedly a trustworthy source (the limit presented is much higher than it actually is for humans).
If they're searching for LD50, they're already setting themselves up for errors, even with the right information. The LD50 isn't a safe dose, after all, but the mean lethal dose. While it's not great if its wrong, if people search for an LD50 thinking it indicates what they can safely take, it's already going to be hard to protect them against themselves.
This is why we let the pros do compounding. Slip a decimal point and you can kill yourself with many substances.
Even that seems high. I don’t feel good with 200mg per human, not per kilo. I can’t imagine drinking ten times as much and not being in the ER. A hundred times that much? No fucking way.
Yes, Google's AI chatbot confidently claimed yesterday that US passports have a fingerprinting requirement, which is absolutely not true. These things can't be trusted to emit even basic facts without somehow screwing it up and it's frankly depressing how they are worming their way into almost everything. I hope this particular hype train is derailed as soon as possible.
It won't stay long. You are now in the pre-ad phase. As soon as the ads are integrated, the answers will become worse.
They have something Google never had: a paid tier.
There are plenty of revenue models aside from ads.
The excluded middle here is a paid tier that nevertheless serves you ads :(
Google was originally fairly egalitarian, OpenAI never was, and never will be. For better or worse.
Streaming services have been introducing ads in their lower paid tiers. It will come eventually.
MS already talks about ads for Copilot
Obviously the future is to train the model with the ads, so that they're indistinguishable from the core of the answer.
I kid, but also hope I'm wrong.
Hm, m$ also runs a few giant adtech platforms, maybe they can just inject tracking code at the source.
> I've been trying hard to break my 10 year-old of the habit of just typing questions into search engines
Honest question: why?
I understand not wanting to use Google (the search engine) or not wanting to support Google (the company). But I don't see with the issue with just looking up questions.
I'm 10 years younger than you, and I've been reaching for search engines first since I was 7, I think. Basically since I learned how to turn the computer on and open a web browser.
Because I want her to find authoritative sources, read, learn, understand, think critically, etc. rather than taking a given answer at face value.
For me: because that‘s exactly what Google and/or seo optimize for, but with no regard for accuracy and quality.
Right, a lot of times I'm searching for a filing, or a site link. I do not ask questions when I'm doing so; that would be ridiculous. I don't ask questions if I'm searching for a recipe, or for something in my local area, either. Actually, I very rarely do this.
> But "search" and "getting an answer to a question" are two different things, aren't they?
Google exists, as both a successful enterprise and as a verb, precisely because to most people they are exactly the same thing.
No, this is wrong. People ask what they want to know. Sometimes the best answer is a link. Sometimes it's just an answer. The ability to intuit which is best is what makes products in this space worth making.
Like you, I thought typing questions into Google was wrong for a long time. The times have changed; this is how most people interact with Google, and it really does convey intent to the system better now that we have sufficiently powerful NLP.
That’s okay if your goal is to get an answer to a straightforward question. If, however, your goal is to research a topic, or to find sources for something, or any other scenario where your aim is to read actual web pages, then you want web search, not AI answers. These are two different use cases.
I absolutely agree that it handles natural language questions much better now than when I started using search engines in the late 1990s - in fact it's optimized for this task now, meeting demand where it's at - but a direct answer to a question is often not what I want. For example, I often want to find a page that I remember reading in the past, so that I can re-read or cite it. Or I want more reading material to get a deeper, more nuanced understanding of some topic; things that will provide more context around answers or lead me to generating new questions.
>But "search" and "getting an answer to a question" are two different things, aren't they?
The first conceptualization of "search" was web directories; then AltaVista and Google drove the complexity down for users by providing systems that actually crawl, index, and rank web information. Now the cycle will repeat and we will get Answer Machines, aka chatbots, which drive UX complexity for users down even further.
Why would I skim search result links and websites if the "AI" can do it for me? The only reason would be if you don't trust the "AI" and you want the actual website links so you can look for useful information yourself, but the majority of people want an instant answer/result to their search query; hence Google's old-school button "I'm feeling lucky".
Kagi has better search and you can tweak it however you like. So the product you are wishing for exists.
Getting an answer to a question is a superset - the answer can be a page.
Sometimes the answer we want is a specific page containing some term, but for most people, most of the time, I'd argue that getting a narrower piece of information is more likely to be valuable and helpful.
The answers come from the same websites. They just get stripped of their traffic. As someone who puts a ton of work into writing accurate, helpful guides, it's devastating to have my work plundered like that.
Once these monopolies have successfully established themselves, they will become indistinguishable from the ad-infested websites they replace. The only difference is that they will create no new information of their own, and they will destroy the indieweb that once provided it.
What value does the traffic have for you? Is it lost revenue from ads? Or are you selling something? If you're selling something, then the AIs could very well be giving you more sales than they take away.
I guide immigrants who settle in Germany
Have you noticed a decline in sales from AIs? I'd think that for such a service, people who don't want to pay wouldn't pay you anyway even if they went to your website first for the information, and people who do want to pay will find your business through the AI.
I don’t sell my services. My income comes from affiliate links on some pages (e.g. for choosing a first bank). A vast majority of the content is not monetised.
I noticed a drop in traffic, despite having added a lot of valuable guides this year. The traffic per page is way down, and only for Google.
You are correct that people don’t pay for this, even when they email me and I personally assist them. There’s just an expectation that things on the internet are free and that’s fine.
People pay thousands of dollars for immigration assistance, even if you're just filling out paperwork. If you want to make money from your webpage, you could simply offer to help them with these things for a fee. And leave the information up if you want, for the people who don't want to pay. Also, the more information you have up, the more comfortable people will be to hire you. E-mails from people who want free advice simply gets the delete button. They can read your free information or hire you.
As for affiliate links, I think they are a thing of the past. It's exactly these that web users want to avoid. Better to make a deal where you charge the bank a fee for each person you sign up. There's a lot of banks who offer these referral deals to all their customers. For example N26 gives you €70 per referral and Wise gives £25 per referral.
You can probably make good money from your website from doing what you love. But ads and affiliate links and getting money from traffic is a thing of the past.
Affiliate income is tied to the size of my audience. Income from relocation assistance is tied to the number of hours in a day. So far I have made more money doing less work, and it helped far more people.
I’d much rather help a greater number of people for free than the well-off minority that can afford my time. I’d sooner run my business into the ground than change this.
Since you're an expert in the field, the time you spend on an individual client is much less than what the person would spend figuring it out for themselves. And time is money. Whether a person is rich or poor.
I think you are thinking and operating in the old paradigm, thinking that somebody spending a few hundred dollars on getting expert help on the Internet is outrageous, even if it's a life changing spend. Would you consider somebody spending a few hundred dollars on expert advice in a fancy office to be outrageous? Then consider that your expertise is way higher. There's nothing wrong with you charging for your immigration services and expertise. And as I said, you can still offer as much information for free as you please. It's only better for sales.
Google does not owe you any traffic and they don't owe you any business income from affiliate links or ads. Zero. If you don't like Google using the information on your website that you are giving away for free, then it only takes a few minutes to remove your domain from Google.
You can combine giving information for free with getting paid for your work. But you can't demand that Google pays for that.
I just checked out your website -- what a beautiful labor of love!
Thank You For Making And Sharing :)
How are you monetizing your website? Is it with ads?
Who says they need to monetize it? Is that the only value we ascribe to traffic, now?
Indeed, this is what I likened to a "Dark Forest":
Either I’m monetizing my site and I care about traffic or why else would I care if people visit my site as long as information gets out there?
This whole website's raison d'être was to provide neutral and accurate information about German immigration.
> as long as information gets out there
A possibly incorrect summary of the information gets out there. Given how much nuance I weave into my content, and how much effort I put into getting the phrasing just right, it frustrates me to no end. There's a very high likelihood that AI could give someone an invalid answer _and_ put my name under it, surrounded by their ads.
And ChatGPT 4o (at least the paid version) and the AI overview in Google both give real time links to sources. Well at least you can ask the paid version of ChatGPT to give you sources and it will do a web search
I use Perplexity and it routinely confabulates while linking to the source it confabulates from. Parent has a valid gripe that AI is essentially damaging their reputation by pretending to cite its information with a credible source, I wish there were some legal avenue to sue but it's not quite libel is it?
And you have the source as proof. But that says a lot about Perplexity.
Does Perplexity actually give you a clickable link like ChatGPT does?
It gives you clickable links to a half dozen or so sources. It's not clear which of the information comes from source 1 vs source 2, etc.
And it’s too much to verify the sources? When you use Google and search for something do you not have to go to multiple sources?
Building a professional reputation? Letting people contact you with feedback and improvement suggestions? Pure personal pride? Plenty of reasons to want your work to be attributed to you regardless of whether you're directly monetising people reading it.
And who is going to find or even care about these websites except for people going to them specifically because of a link to your profile on social media sites, through public talks or otherwise through word of mouth?
I don't understand what you're getting at. This thread concerns how we used to be able to find good information with these contraptions called search engines, so that word of mouth was not the only way information was found.
What I’m getting at is simple: no one is going to find a random person’s obscure blog, where they are trying to build a “brand” or be a “thought leader”, if it is not on the first page of search results.
I subscribe to Ben Thompson’s writing and make it a habit to go to a few other websites because they have earned my trust.
The only method that most people have ever had of gaining traction is via word of mouth and not through search engines.
No one owes you traffic or discoverability, any more than they owed HuffPost or the other click-bait, SEO-optimized websites before the algorithm changes.
I don't know how old you are, or whether you ever really knew the web in the prior era that we're talking about. Forgive me if I'm making flawed guesses about where you're coming from.
Back in the day, if I wanted the answer to some specific question about, say, restaurants in Chicago, I'd search for it on Google. Even if I didn't know enough about the topic to recognize the highest quality sites, it was okay, because the sorts of people who spent time writing websites about the Chicago restaurant scene did know enough, and they mostly linked to the high-quality sites, and that was the basis of how Google formed its rankings. Word of mouth only had to spread among deeply-invested experts (which happens quite naturally), and that was enough to allow search engines to send the broader public to the best resources. So yeah, once upon a time, search engines were pretty darn good at pointing people to high quality sites, and a lot of those quality sites became well-known in exactly that way.
I’m old enough that my first paid project was making modifications to a home grown Gopher server built using XCMDs for HyperCard.
My first post was on Usenet in 1994 using the “nn” newsreader
The web has gotten much larger; it didn’t even exist when I started.
But web rings on GeoCities weren’t exactly places to do “high quality research”. You still had to go to trusted sites you knew about or start at Wikipedia and go to citations.
Before two years ago I would go to Yelp. Now I use the paid version of ChatGPT that searches the internet and returns sources with links
I've had numerous people contact me directly with follow up questions about various info I've put on my website. Many of those have turned into further conversations and collaborations.
You can't have that if Google is plagiarizing your site and delivering the info.
How many of those people found you by randomly searching Google versus via links via your profile on social media?
All of them, because I'm not on any social media. I also mostly put obscure things on my website that aren't easily found elsewhere online, so very specific searches tend to end up on my site.
Also probably why I get email from people visiting as it's one of the few places people can reference said info.
So you aren’t willing to put in the work necessary to get your site recognized in 2024 and you don’t see that as a problem?
Eh? My site is recognized and found on google searches. People find the info they are looking for and sometimes email me asking follow up questions. The site is working as intended so I'm not sure what you're talking about.
That's the real question, because younger generations use the open Web less and less.
That was actually one of Larry Page's main concerns back in the day: that the majority of the Web's information might end up locked behind walled gardens, paywalls or whatever else.
The web is just as “open” as it ever was. It is just as easy if not more so to create and host your own content.
You’re complaining about “discoverability”, which hasn’t been easy since 2000.
The most successful independent writer today is probably Ben Thompson of “Stratechery”.
https://stratechery.com/about/ https://blockbuster.thoughtleader.school/p/how-ben-thompson-...
Through organic search, you probably won’t find any of his free articles when searching for a topic on the first page. He had to put in the work over years and couldn’t depend on Google.
Walled gardens like Facebook, Instagram, LinkedIn and others are the missed opportunities for the Open Web. Neither Google nor any other search engine can crawl their information, so Web users who are not on the aforementioned sites are missing a lot of useful information and social dynamics that would otherwise take place on the Open Web.
At least for Facebook, if the information is not publicly available via Google it’s because the content creator has decided not to make their content public.
Google very much can crawl information on Facebook and Instagram that people have made “public”
As far as “social dynamics”, do you remember Cambridge Analytica? Why would I want my social graph to be publicly available?
It’s bad enough that people have their contacts synced with Facebook.
If most information on Facebook is private, it’s because everything else gets spammed to hell. Same with discord. They are not a replacement to public, curated information put out by relatively knowledgeable people.
If I had a site (no time lately to maintain one) it would be because I wanted to inform people and contribute to the world’s accessible knowledge. I would want my information presented in context, accurately, the way I intended, not digested and reworded (often inaccurately) by Google.
And how likely is someone to find your site through search instead of word of mouth?
I bet you if you had insightful posts on HN (not saying you don’t) and people knew you, you would get more traffic by putting a link in your profile here than people searching on Google.
I can answer that question with actual numbers: 90% of my traffic comes from search engines. The remaining 10% is much more time-consuming to acquire. It doesn’t help that external links are downranked by most social media sites.
It doesn't matter anymore. There won't be monetary reward, citation, personal brand building, or anything. Google just rips off the information, presents it as fact, and a visitor will never visit an author's original website again.
Websites are training data and will become an anachronism.
If I care about my “personal brand”, what are the chances that people are going to find me organically on the web?
If I want to get my name out there — which I don’t — I’m going to post to LinkedIn, give in-person talks at conferences, try to get on popular podcasts that have guests, etc.
Presumably, because no other method than ads or affiliate links works...
Where are you publishing your guides? Would love to add another bookmark to my collection.
Unless you are thinking of moving to Germany, it might not be helpful to you.
You’re correct but this is pretty interesting and I’m sure helpful for people in Berlin!
Do you just like collecting links to online "guides" to anything? No preference for any subject matter, just a collection of random "guides"? Interesting, you could make a guide for that!
You could say that. I have found that a lot of guides produced by folks on Hacker News to be generally interesting. Probably too much free time? Either way a guide of guides does seem like a good use of that free time.
It would be great if it wasn't completely wrong 50% of the time.
Describes my general experience with AI across the board: Copilot, ChatGPT, Claude, etc. It’s like I’m talking to a genius toddler. With OpenAI losing $5 billion on $3.7 billion in revenue, this is unsustainable. It feels like the dotcom bubble all over again.
This is true, but fairly or unfairly, asking a question of a chat bot feels like “opting in” to the possibility that the answers you get will be hallucinated garbage, in a way that doing a Google search does not. It’s a tough problem for Google to overcome (the fact that they will be held to a higher standard), but that’s what it is: we have already learned to accept bullshit from LLMs as a fact of life, whereas at the top of Google results it feels like an outrage.
I have been a paying ChatGPT user for a while. It’s simply a matter of saying “verify that” and it will give you web citations.
Aren’t those citations sometimes entirely made up? Like the lawyers who used it for a case and it cited ones that never happened?
I really do think hallucinated references are a thing of the past. Models will still make things up, but they won't make up references.
ChatGPT with web search does a good job of summarizing content.
No, ChatGPT has had a web search tool for paid users forever. It actually searches the web and you can click on the links
It invents citations too, constantly. You could look up the things it cites, although at that point, what are you actually gaining?
And I’m not saying this makes them useless: I pay for Claude and am a reasonably happy customer, despite the occasional bullshit. But none of that is relevant to my point that the bots get held to a different standard than Google search and I don’t see an easy way for Google to deal with that.
Do you pay for ChatGPT? The paid version of ChatGPT has had a web search tool for ages. It will search the web and give you live links.
ChatGPT has had web search for exactly 58 days. I guess our definitions of 'ages' differ by several orders of magnitude.
The paid version has had web access for at least a year
March 23rd 2023
https://openai.com/index/chatgpt-plugins/
That’s 666 days.
So you are off by over “one order of magnitude”
A plugin? You’re joking.
It’s a “plug in” built into the paid version of ChatGPT, run by default and created by OpenAI.
This isn’t a third party obscure plug in.
All “tools” use a plug-in architecture.
You're a troll, and I'm done feeding you.
What part is “trolling”? Paid users have been able to use ChatGPT using the built in web browsing plug in for over a year just by saying “please provide citations” or “verify that”.
What you say has been around for a few weeks has literally been around for paid users for over a year
> we have already learned to accept bullshit from LLMs as a fact of life, whereas on the top of Google results it feels like an outrage.
Sort of. Top results for any kind of question that applies to general population - health, lifestyle, etc. - are usually complete bullshit too. It's all pre-AI slop known as content marketing.
> genius toddler
I think it's closer to a well-spoken idiot.
A cat who can talk.
What are you using it for?
That's a very pessimistic take. It's right about 50% of the time!
Both of your requirements for correctness are just 50% too high.
The mark of a great product/feature is always when they feel the need to force it on users, because they know that a significant portion of users would switch it off if they could.
The difficulty of verifying that the answer isn't wrong is another important factor. Bad search results are often obvious, but LLM nonsense can contain tricky falsehoods.
If a process gives false results half the time, and verifying any result takes half as long as deriving a correct solution yourself... Well, I don't know the limiting sum of the infinite series offhand, but it's a terrible tool.
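For what it's worth, the limiting sum comes out cleanly. Assuming independent retries with success probability p = 1/2 and a per-attempt verification cost v = d/2, where d is the cost of deriving the answer yourself:

```latex
% Mean number of attempts until a correct, verified result (geometric series):
E[\text{attempts}] = \sum_{k=1}^{\infty} k \, p \, (1-p)^{k-1} = \frac{1}{p} = 2
% Expected total verification cost, versus cost d of just deriving it:
E[\text{cost}] = E[\text{attempts}] \cdot v = 2 \cdot \frac{d}{2} = d
```

So under those assumptions the tool only breaks even, before even counting the wrong answers that slip through verification.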
I find it mostly right 70% of the time.
Which would be great, except that in the past I found the top Google result to be more than 70% relevant to my searches; it's a clear downgrade in relevancy.
60% of the time, it works every time.
Yeah the AI summaries are garbage still
Compared to 0% of relevant results in first 10 pages it's an enormous improvement.
Have you seen an example where the AI hits on something that isn't in the first 10 pages of results?
Wait till the monetizing by ads starts
The AI answers are nowhere near good enough to always be at the top, without any clear indication that they are just a rough guess. Especially for critical things like visa requirements or medical information. When you search Google for these sort of things, you want the link to the authoritative source, not a best guess. It’s very different for queries like say “movies like blade runner”.
It seems damning enough that Google itself doesn't know what is a more authoritative source or they would have weighted their AI output appropriately.
What does that say about their traditional search results?
I doubt that was the decision process. It’s much more likely that there is a directive coming down from the top that “we need to go all in on AI”, which then gets amplified down by middle management, and the result is AI smeared over all results regardless of whether it's helpful. That then drives up some vanity metric like “searches answered by AI summaries”, while metrics like “bad AI summaries shown” don’t get attention. As a result the organization is happy, people can get promoted, etc.
Not all queries are the same but I agree with you that the authority of source is crucial. That's why for example .gov sites rank high and should rank high because government is usually the most trusted source.
But when you are looking for new shoes to buy or food recipes, .gov sites can't help you, and that's where things get ugly... SEO-spam ugly.
An example: I was looking up what a good diet is to follow after a child has been vomiting. The AI said to avoid giving fruit juice … yet the authoritative sources said the opposite. I already knew not to trust the AI, but this was nail in the coffin for me.
agreed and frankly I am a big fan of LLMs in general… it just doesn’t seem like the one behind google search is all that smart
Google's AI summary of search results hallucinates. You might like it, but you may also end up seeing, and believing in, something that just doesn't exist.
For example, it says there's a sequel to a Disney film called Encanto, and there just isn't. https://bsky.app/profile/jasonschreier.bsky.social/post/3lee...
It also misidentified an article by Bernard Vonnegut about ice crystals as written by Kurt Vonnegut
Most if not all of the time it "hallucinated", it was paraphrasing the web.
And in doing that it's endorsing some random website that makes things up and elevating that garbage to be something users trust 'because it's on Google.'
Users will need to learn that Google is only as trustworthy as the crappy websites it uses for data that drives its AI. I'll leave it up to you to work out how that might impact Google's brand.
For quick simple steps like how to get a Bluetooth keyboard into pairing mode, it seems to work really well. I hated the prior world where everyone attempted to hide the real answer 3/4ths of the way through a useless blog post or YouTube video.
Which, we should note, didn't happen 10 years ago, before the accountants took over search at Google. Those good, lean, helpful pages still exist. Google incentivizes websites to have pages of slop on everything now because they track how long you spend on a site as a "metric of a good match". Forest for the trees...
Baking recipes are the fucking worst.
I shouldn’t have to read 2000 words to make a cheesecake. And I shouldn’t have to read it three times before starting to make sure I combine the ingredients in the right order.
Even the good ones are often subtly wrong. For example, never add baking powder or especially cinnamon to wet ingredients. Stir them into the dry ingredients first, then combine. Otherwise they clump. With cinnamon it makes it look bad. With chemically reactive ingredients it can lead to insufficient rise. Who taught you people to cook? Obviously not grandma or PBS.
I see a lot of people blame “stale” baking powder and while that is a thing, mixing it in wrong or subbing oil for butter or not chilling (eg cookie dough) is just as likely a culprit.
My friend made two sheets of cookies from the same batch and the second ones were terrible. She left the dough on the counter while the first batch was in the oven. Rookie mistake. And she has adult children.
Stack Overflow launched 16 years ago, when most Google results were already ExpertsExchange-type sites with the answers obfuscated pages deep, and had been for years.
> expertexchange
This reminded me that, rather hilariously, it used to be called expertsexchange.com before adding a dash (experts-exchange.com).
Expert sex change will long be remembered. I recall there was an Italian site that had a very spicy “Le Tits Now” reading but it escapes me.
I think if anything Google actively penalizes long slop articles with lots of affiliate links.
Were you not googling before? They had a bullet-point summary that was actually more accurate because it scraped direct quotes from the website. Now I am getting wrong info from the AI summary. It's a huge step back from what was there previously, but it's sold as some advancement.
It definitely didn't seem more accurate to me. It quite frequently either scraped quotes that weren't actually an answer to my search (the webpage was correct, but the link between my search and the webpage was not), or it was an answer but the answer was wrong (because the webpage was wrong).
The AI summary now isn't perfect because it can still regurgitate wrong information from the Internet, or hallucinate information when there isn't any -- but it seems to actually understand what I want now, so it doesn't suffer from the incorrect matching problem.
Also, there are way more AI answers now than there ever were snippet answers.
Agreed.
My friend and I used to paste pre-AI Google search snippets to each other when they were so bad, especially when it quoted a comment on Reddit.
…I’m no fan of the Google AI feature, but it is way more accurate than the scraped-bullet-point predecessor, which would often scrape things while missing something key, like a “here is the opposite of what we are talking about:” in the webpage.
They are also wrong just slightly too often. After the fifth time I was twenty minutes in to trying to use command line options that just don’t exist before realizing that I was being led down the winding path by an ai hallucination that I mistook for a stack overflow quote, I broke and paid for Kagi. Which then immediately added an AI drek feature, fml.
> I can search for an answer to a question and, instead of getting annoying ads and SEO-optimized uselessness, I actually get an answer.
You get the average of the SEO-optimized answers.
An answer is only as good as the expertise behind it. When searching I always pay attention to the source, and will skip ones that look less trustworthy.
One major advantage of Google's original PageRank was that it worked well: the number of links to a page was a good proxy for trustworthiness and authority on a subject. It used to be that you'd find what you were looking for in the top few Google search results, a massive improvement over AltaVista, the existing competition, where you'd have to wade through pages of keyword-matched sites listed in no particular order.
Anyway, source is critically important, and if I'm looking for something authoritative then the output of an LLM, even if RAG-based, is not what I'm looking for! Increasingly, people may be using search to verify stuff suggested by an LLM, which makes a search engine that puts LLM output as its top result rather unhelpful!
It doesn't help that with Google in particular their AI output is all heavily DEI biased, and who knows what else ... I just don't trust it as objective.
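The links-as-endorsements idea behind the original PageRank is simple enough to sketch. Here's a toy power-iteration version; the graph, damping factor, and iteration count are illustrative, nothing like Google's production setup:

```python
# Toy sketch of the original PageRank idea: a page's score is fed by the
# scores of the pages linking to it, so incoming links act as endorsements.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_ranks = {}
        for p in pages:
            # Each page q that links to p passes along an equal share
            # of its own rank.
            inbound = sum(ranks[q] / len(links[q])
                          for q in pages if p in links[q])
            new_ranks[p] = (1 - damping) / n + damping * inbound
        ranks = new_ranks
    return ranks

# A and B both link to C; C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
```

On this tiny graph, C, with two incoming links, ends up ranked above A and B — exactly the "links as votes" behavior the old rankings leaned on.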
I totally agree, I really appreciate them. Half the time they give me the answer straight away.
And when they're not helpful, it's no different from the first search result not being helpful and going to the second. Plus, they do a pretty good job of only showing them for the types of searches where they're appropriate.
Are they right 100% of the time? Of course not. But snippets weren't right 100% of the time either, and not infrequently the top search result contains information that's wrong as well. Because the Internet isn't 100% right.
The idea that a "wall of text from AI" is somehow bad doesn't make any sense to me. And it's not a "wall", it's basically paragraph-sized. Where the context is really helpful in determining whether the answer seems correct/reasonable.
They're strictly worse.
They're just a summary, so any information is in the results or hallucinated.
If the AI could accurately point to the correct information, they would just order the results as such, but instead it's just a paragraph of spaghetti on a wall to look cutting edge.
I just want to be able to turn the stupid overview off. That's all. One simple toggle.
I don't get why a Google Workspaces account can have Gemini forcibly disabled across the entire enterprise yet still have these AI features seep in with no way to manage it at the enterprise level.
Why don't you use Kagi? It has better search and you can customize a lot of things, even turning off their LLM
Because it gives money to people I don’t want to have that money, even if I turn off any features related to the people that I don’t want to have money.
Is Google supposed to give you an answer, or help you find something you're looking for?
Back when they only tried to help you find something, they were good at that. Really good. Then the ads and meta-slop came in and you couldn't find things anymore.
Then they decided they also wanted to answer questions, which is hard enough (they're often wrong). So they have to focus harder on answering questions.
And since they're trying to do both in one page/place, the question-answering has taken center stage, and finding things is now next to impossible.
So they're no longer a search engine. They're a crap version of OpenAI.
It's wrong enough and unsourced enough that it's more cognitive load to vet the result than not having it.
Google is barely more useful because of this.
It's hit-or-miss for me. This week I was googling how to use libarchive, and the AI-generated responses at the top of each query were either incorrect or hallucinations of methods that don't exist.
I don’t mind playing with AI to help scratch together some code, but I do that using better models. Whatever model google is using for search results is too crappy for me to consider trusting.
I sometimes like it, but I've gotten very skeptical of it. One day a friend and I searched the exact same question on Google and got opposing answers for the identical search string. This wasn't in the "AI" widget, but one of their usual widgets that give answers to questions. I assume both use some form of AI anyway.
I think when it’s good it’s pretty good.
But knowing when it is good is still hard, as I can't trust it any more than an LLM. And with an LLM I have a simple chat window, not a bag of rabid SVPs fighting to be on the SERP.
I don't find it useful for the things I search.
I still have to check the sources and then add "reddit" to the end of my search query,
so for me it's actually an additional third step, or remembering not to trust the AI overview.
I use google.com for search and Gemini for Q&A. Two sites for two modes. I also use uBlock to remove the ai response from my search results to keep them clean and separate
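For anyone wanting to replicate the uBlock part: a cosmetic filter under "My filters" is enough. Note the selector below is a placeholder, not the real class — Google renames its container classes often, so grab the actual one with uBlock's element picker:

```
! Hide Google's AI Overview block on result pages.
! ".ai-overview-container" is a placeholder; use uBlock's element picker
! to find the actual class on your results page.
google.com##.ai-overview-container
```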
Same with Brave search. The AI answer is often good enough for me to not need to go further. I’m with you, except I don’t use Google.
Search for anything with Google that has high ad monetization potential. You will find that Google turns off the AI overview.
Arc Search is what Gemini dreams of being. I’ve found it to be incredibly useful tool to cut through a lot of the crap.
The few times I gave it a try it was dead wrong. The dream is nice but the execution is lacking.
Google search had gotten so bad that the AI overview is passable in comparison. They don't deserve credit for that! Search was better at surfacing useful information fifteen years ago than it is now. (And yes, the internet is way more full of garbage now, but they did that, they are responsible for that too!)
... unless you want anything like a perspective or an opinion on something, instead of a factual answer to a question, in which case it's totally useless.
Why do we think the AI is any better? Isn't it based on the same dataset as search? How can it be anything but strictly worse for any given query?
Er.. I used it and that's what I thought?
Search goes out of its way to hide what I want and show me bullshit shopping ads and influencer videos. The overview at least tries to help. For now.
But I will emphasize: it's still not that helpful, it's just less corrupted than the main body results are... so far.
If Google's AI overview were as smart as 4o, I would like it a lot more.
"Be Socially destructively evil via diffusion of content"
You're definitely a contrarian.
Google search is awfully bad these days.
Google in the 2000s was excellent. Modern Google makes me feel like I'm a product being sold a lemon at a barbed-wire-fenced used-car lot. It's horrible, the things being shown are horrible, and there's questionable ethics and value to be had by even going there.
I remember when the more tech-savvy folks were migrating from AltaVista and Yahoo to Google because they understood, sooner than others, that it was better. The same thing is happening now. I consider myself tech savvy (still!) and I have to admit that I rarely use Google for any sort of information.
As a side note: I am using Safari and I noticed that Apple's search is also replacing my Google searches. In the past if I knew name of a company or organization but not their website I'd Google it. Now I put it in the address bar and Safari very often finds the website for me.
Of course the savvy are using LLMs now, but they’re also reckoning with two things:
* you have to check an LLM result, especially if it cites something (because it may or may not exist)
* you can’t cite an LLM result
It’s a useful tool, but it lacks certain utility features that a useful web + effective search has. Or had.
> Of course the savvy are using LLMs now
I don't think this is true at all. It is people prone to hype, or who are naive, who are using LLMs. The savvy know that a tool which you have to verify every single time (because it isn't deterministic and makes shit up) isn't actually saving you any effort.
The point that there’s effort involved in verification is an important one and definitely part of any thorough inventory of liabilities and advantages of LLMs.
I am less convinced this means it's always a wash (or worse). Sometimes it's good to have something to start with. "You can't edit a blank page" is a truism among fiction and non-fiction writers of all stripes for a reason. Of course, the quality of what you start with matters, and it matters more the less of an evaluative mindset you bring to your tools. I know at least some early adopters have that mindset, but perhaps I'm too optimistic about how well it generalizes.
Even without using LLMs, Duckduckgo is good at providing little widgets that just give you a straight answer. Try searching "150 usd to eur" or "Weather Philadelphia".
> * you have to check an LLM result, especially if it cites something (because it may or may not exist)
You think that you don't have the check the link at the top of the page from a google search?
In fact, last I checked, for every google search I do, I have literally hit the page down key before starting to read.
> * you can’t cite an LLM result
And you think you can cite the top search result from a google search?
> And you think you can cite the top search result from a google search?
I think the issue with LLM answers is that I often have to go to Google or something to check if the answer is right. Maybe it shaves some time off, but we're still back to the original problem.
Of course you can. The top result from a Google search yields (eventually) a URL, a uniform resource locator, which is something that can be cited.
100%. But gradually being solved as source links are being provided. If the source link is as good as what you’d find on Google anyway then you have a way better search experience and don’t really burn much extra time checking the LLM work
Apple's search is Google search. Google pays Apple $20 billion a year for that.
He means the top part above Google suggestions. If you type a url or company name Apple sometimes guesses the url and provides a direct link. Google not involved there - only if you click a Google suggestion or hit search/enter or whatever
In the 2000s we were freeloading at the investors' expense; in the 2020s, investors are recouping their investments.
Also, most of the content was either stolen (DivX, MP3, etc.) or created without expectation of immediate reward (mostly passion projects).
Oh, and by the way, Google didn't get infested after the proliferation of LLMs. Google results were useless well before that. With LLMs there's even an improvement, as the spam is at least mediocre content.
The question of course is how well that level of quality will hold up now that more and more of the internet is AI generated, and said AI generated content is being sourced for said AI tools. It feels like our choices are either to only get information from a certain time period or earlier, or to accept the information provided by Open AI and co is only going to get worse and worse over time...
There is no cure to the endless stream of AI generated SEO-trash. As long as there is a system to game and an incentive to do so, it will be gamed.
The only solution I can dream of is to remove the incentive, aka remove advertising. I'm afraid I'll be dead long before that.
OpenAI trains on sources like BusinessInsider which allow sponsored content for a price. This is how they will monetize their products, by intentionally introducing it to bias for a profit motive. It's a good thing they convinced you that its a black box, otherwise you would see this coming.
I'll be honest, I don't understand the example in this article -- it doesn't seem to be evidence of the thesis/complaint. The two dark mode screenshots are basically the same and, at a glance, I trust the Google one more because it's showing me a snippet of the authoritative source, Apple.com. Beyond that, there's no ads in the Google screenshot, which was a huge part of the thesis statement.
> OpenAI's search is becoming Google in the 2000s
Google started going bad in the 2000s (albeit not as bad as now).
> if it can remain trustworthy.
At no point was it trustworthy - even if it were an abstract LLM, trust would be an issue; but this is the opaque product of a corporation heavily invested in by untrustworthy entities and people.
That does not mean it isn't often useful, but "trust" and "usefulness" are two very different things.
In 1999 I worked at Yahoo! It was great: the stock was flying, and I worked on really cool tech writing C++ code. Then one day I went to a social dinner where a high schooler got up, made a speech, and at one point said "I googled it". Right then and there I knew the jig was up.
So far I have not heard anyone say I GPT’d it, but Google is running very dangerously close to the edge here. For one thing the founders have checked out, never a good sign.
Something that also bugs people is that GOOG wants to follow you everywhere: when you sign in to many websites, that little blurb asking for your Google account comes from a Google server (<script src="https://accounts.google.com/gsi/client" async defer>).
I was responsible for servers that ran 100m page views a day at Yahoo! One day I was approached by this smarmy little guy who asked if he could pull logs from the machines. Alarm bells. Who the heck was this and what was he doing with the logs. I knew of course he worked for Filo and so I had to give over the data. This was the start of the spying on the customers. Google is a master of this, and it really irks a lot of their customers. Another red flag.
Alternatives like DuckDuckGo and Brave have made some inroads, but their percentages are still quite low.
There have been layoffs in the name of cost cutting. Googlers have had some very public employee-dissatisfaction meetings (my name for them), employee compensation problems, problems with the businesses the company is in, etc.
One last thing: Mark Cuban sold Broadcast.com to Yahoo for $5.7 billion on April 1, 1999 (seriously!). “Apollo Global Management acquired a 90% stake in Verizon Media, which included Yahoo and AOL, for $5 billion. Verizon retained a 10% stake in the new company, which was rebranded as Yahoo upon the deal's completion.” The deal was finalized Sept 1, 2021, according to ChatGPT.
I say "I ChatGPTed it" almost every day, or some form of it; OpenAI dropped the ball on the name. My friends and I say "asked ChatGPT", "asked GPT", "asked AI" (but mean ChatGPT)... it hurts though, as it's all too long and awkward, with no intuitive verb to use.
Probably just my limited perspective, but I'm also noticing it's vastly men who use ChatGPT daily for anything from random questions to health queries or personal growth. Not sure why, but somehow I don't know a single woman who uses it much beyond super basic queries. Meanwhile guys of all kinds of backgrounds, nerds or not, technical or not, young or old, it doesn't matter... if there is a heavy user, it's a guy. But as I say, it's just a limited perspective; I don't know a big enough number of people for a substantial sample size. Just recently I see the stark contrast more and more: even very smart, nerdy, and highly curious women I know are not interested in ChatGPT.
Definitely agree, but I'm surprised Perplexity wasn't mentioned in this post. It's currently Perplexity Vs. Google
Once ChatGPT and Claude (through MCP) added web search functionality, I completely dropped Perplexity. I assume I'm not unique in this regard. Feels like the writing is on the wall for Perplexity.
Could you explain more? Are the results better, or are there other features?
I assume he's just being overdramatic. I use Perplexity every day, multiple times per day, and it has almost completely replaced Google for me in areas like coding and technological research. Also, I have never used nor plan to use ChatGPT or Claude (I use private open models instead - Mistral Nemo, Qwen, etc.). But I also feel like "I'm not unique in this regard", lol.
It's mostly a decision to manage the total amount I spend on LLM tools. Given unlimited money I suppose I'd still be subscribed to Perplexity, because its UI is slightly better than Claude's and ChatGPT's for web results. But Claude and ChatGPT are plenty for my web use cases, while allowing me full access to all of their models for non-web-search use cases.
Yeah, I keep using the $1/mo promotions for Perplexity but I only really use it because it can read the results to me which I use for practicing foreign languages, so there's that.
Why do you need Perplexity when ChatGPT can do search now?
This! "I don't want to watch a 10-minute video for a quick answer."
And this: "OpenAI's search is becoming Google in the 2000s, if it can remain trustworthy."
The problem I see: people use OpenAI/Perplexity for knowledge, not to seek out websites. I think sooner or later most websites will block AI crawlers. What does a website get out of it?
I'm still waiting for the AI service that turns those 10 minute videos into a text tutorial with photos.
I think it's pretty damning that it's not a built in YouTube feature by now.
Lots of limitations here but they do offer an 'ask' button under some circumstances: https://support.google.com/youtube/answer/14110396?hl=en
I've found it to be helpful in getting quicker information out of some 'review' style videos, where I can ask a few pointed questions and get answers faster than the narrator can get to that info. I hadn't found it to be wrong in my few attempts with it, but ymmv
Let's go Kantian and imagine everyone used such a service:
why would someone create a 10-minute video if no one would watch it and they would get no money for it?
I assume some folks prefer the videos but if not, then why are they getting made?
The creator can use the tool themselves and post it with ads. In fact, I'd prefer it.
> I think sooner or later, most website will block AI crawlers.
Get a third party to do the crawling and the AI company can buy the data from the third party.
So... the next step clearly is ChatGPT adding Ads to its output.
Of course OpenAI can't stuff their UI with ads yet. In the middle of escalating anti-AI sentiment due to the rapid slopification of the infosphere that they initiated? With cultural resentment growing over the devaluation of visual evidence? With video and audio modalities creating distrust and obsolescence in the creative class?
Google only had to provide superior value with a clean UI. OpenAI has to contend with normalizing the mechanisms that are upsetting the lives of the customer base that pays them; the customers they'll replace with ad servers as soon as it becomes prudent to start indicating their end-game.
So, as a company, when can I pay OpenAI to twist its responses towards my company?
Because it will happen. I’m afraid that ChatGPT is my friend for 20€ today, but prices will increase and response quality will go the way of Siri.
> when can I pay OpenAI to twist its responses towards my company?
They are probably thinking about it.
ChatGPT will become one of the most valuable companies if they can pull it off and still keep the users happy.
Today I had a better experience with Google than with ChatGPT.
I started with chat and asked why I couldn’t change the passcode on a kid's device.
I tried 3 answers. None worked.
So I google. First hit is an Apple forum with the exact problem. Solved.
What was the Apple forum's answer, and what was ChatGPT's answer?
I think I’ve had such bad experiences with OpenAI's ChatGPT answering a question falsely while sounding very convincing that I skip the AI overview, or double-check to confirm it’s correct. Am I mad?
I already regularly am helping someone with something, they'll Google it at the same time as me and they'll authoritatively regurgitate the AI summary. More often than not, it's been misleading, or a partial answer to the point of being detrimental.
I get it though. I'm sort of the news guy of my family, and they constantly ask me where I go for news. It's an impossible answer. I grew up with the internet in its early days, using it for policy debate research. It's an imprecise method of consuming various sources, judging their answers on plausibility and commonalities, with a healthy dose of constant skepticism.
But people don't want that. They want the answer spoonfed to them.
In my experience that's not limited to chatGPT, all of the LLM tools seem worthless to me.
Trying to replace Google w/GPT but....
- Local search results for businesses, phone #s, and directions aren't available
- I'm a paying GPT subscriber and it constantly logs me out on my different devices
- In Sora I can't upload photos of whoever & make them do whatever, like https://hailuoai.video/ which is free and at times fairly convincing (fun for the 12 yr old in you lol).
I know GPT Search is new, and I'm excited for GPT to become a phone AI OS, or for OpenAI & Microsoft to develop their own phone with a new personal device paradigm... i.e. create a H.E.R. phone: your personal AI assistant that does it all for you via text, voice, hand gestures, facial expressions, etc. Once you pick up your phone you see your assistant waiting to assist. You can skin your AI assistant to look like anyone living or dead (a loved one could live on & help you throughout your day). Probably some crazy ideas, but a H.E.R. phone / personal device as described (some parts) would be something new/different and could possibly give Apple and Android a run for their money!
I am working with a tech that's new to me, and got stuck integrating an SDK from a hardware manufacturer into my runtime. The SDK is in C, and my runtime isn’t.
A friend suggested to use his LLM wrangling skills to get the SDK adapted. The results were interesting, but we wasted a day or two on trying to make it actually work and failed.
Then it turned out I can call C code from within my runtime, which the LLM did not point out. I figured out the specifics with a bit of googling and successfully managed to call C functions. Yet there were still issues in parsing the data, now in C SDK itself; the friend used different LLMs to refactor and comment C code, but they did a bad job with missing implementations and a lot of general obvious comments (an empty deinitialization function with “clean up as needed” comment and so on).
Then it turned out there was an SDK for my runtime. I just did not search enough. Of course, the LLM would not say that. The LLM would just obediently try to do what it’s asked to do and never question why.
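The FFI route the commenter stumbled onto is worth a sketch. Their runtime isn't named, so this is purely illustrative: Python's `ctypes` stands in for "my runtime", and libm's `sqrt` stands in for a vendor SDK function.

```python
# Hypothetical sketch of calling a C library directly instead of porting
# the whole SDK. ctypes stands in for the unnamed runtime; libm's sqrt
# stands in for the vendor's C SDK function.
import ctypes
import ctypes.util

# Locate and load the shared library (falls back to a common soname).
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C signature so arguments and return values marshal correctly.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # calls straight into the C function
```

Most managed runtimes (JVM, .NET, Erlang, Node, etc.) have an equivalent mechanism, which is often far less work than an LLM-assisted rewrite of the SDK.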
People want to spend less time on a task. Whichever tool succeeds at this will win.
Time on task only matters for the lowest paid. At high levels robust, predictable, and other characteristics compete strongly for value. Do you want the most accurate medical diagnosis or the fastest?
Is that your typical search engine use case or are you more likely trying to find out when the taco stand opens?
So you want the answer to when the taco stand opens to be quick and wrong?
That's possibly the biggest waste of my time you could think of, because I'm probably going to have to spend a half-hour and a trip out of the house finding out that the answer was wrong. I'd rather get the answer in 5 minutes and for it to always be right. Dying in an accident coming back from a taco stand AI didn't know closed 6 months ago would be the worst death.
There is no "typical" search engine use case. The fallacy that there is such a thing is a big part of the problem. The enshittification and decline of the digital window to the internet is so complete that even basic information management tools like browser bookmarks are deprecated: people will "search" even for sites they use repeatedly.
Minimally there needs to be a transparent split between commercial queries (searching to buy something) and knowledge / abstract queries (searching to learn something).
Users should be context aware (ideally using completely separate tools) of when they are simply accessing a pay-to-play online product catalog versus when they are querying what is effectively a decentralized wikipedia.
Commingling the commercial with the factual was always going to be a dead-end.
Sadly, this is true. Getting the wrong answers quickly is valued more than having to work for the right answers.
I fear the day when the poor kids who grew up believing everything LLMs hand out reach positions of power.
You get wrong answers from either source. When AI builds in fact-checking that's accurate enough to validate its answers, then it's game over - and that wouldn't be too hard to implement; future versions will have far fewer hallucinations and faulty facts.
We're in the early stages here.
> And even these ads weren't that bad. What made Google successful was showing ads you wanted to see. I'm searching for a bottle of wine, and ads for bottles of wine were shown to me.
Even at that time, Google hardly ever showed me ads that I wanted to see (even though in former days Google's ads were less intrusive than they are today): as I sometimes mention on Hacker News, most advertising networks have difficulty pigeonholing me into interest groups - many of my interests are somewhat niche.
For example, many Google searches that I do are about scientific topics from my areas of expertise - a rather hard topic for advertisers (though it is plausible that it might be a lucrative one for anybody capable of monetizing people like me).
On the other hand, if (rather accidentally ;-) ) Google's interest-sorting algorithm actually "finds" some interest of mine that can be "monetized" via ads, it's usually the case that I have already bought such a product recently - in which situation I clearly don't need another one. The reason for this is simple: after I buy a product, I have to understand some of its "complicated convoluted details", so a lot of my Google queries are of the type "[product name] [potential problem that I have with the product]". Thus, I do far more Google searches for a product after I have bought it.
It should be obvious that for the ad to pay off for the advertiser, it has to be shown before I buy the product - thus Google has to be capable of interpreting the quite weak signals that in the future I might want to buy such a product, not the strong signals it sees in the weeks after my purchase.
Google seriously needs to scale up their generative models across all of their crawling/indexing/ranking infrastructure. Their current ranking models are not capable of dealing with the next-gen web filled with 99% gen-AI crap. I think they also know this. The problem is the cost; they're hyper-focused on bringing it down, but not fast enough.
This is correct. A ton of the hype about LLMs is you can type in a question and get a direct answer without needing to struggle through listicles. Phenomenal compared to the trash a search result provides.
It is ironic that LLMs are the source of much of the garbage in search results. A good business model to produce unbearable noise and the filter that recovers some signal, I guess.
I don't think it will be as easy to supplant Google search as it was in the past with Yahoo and AltaVista. Back then internet search was just starting; we were all used to trying different search engines, and the results were different enough - with all the different ways of indexing and searching - that moving from one to another made a difference. Most of the people using the internet were pioneers navigating something brand new and unique.
Today the majority of internet users, i.e. 10-32 year olds, or 60%+ of internet users, grew up using Google and knowing how to get the best answers out of it. ChatGPT might bring in some churn, but as long as Google is close enough, it won't get replaced easily.
> When Google came onto the scene, I credit its success to the tried and true paradigm that makes companies successful: simple and easy to use.
If Google's competitive advantage was only that they had a clean and simple web page, I don't think they'd have nearly the sticking power they ended up having. You can design a front-end that looks like Google's original UI in a matter of hours.
It seems an implausible tale that in the 25 years since their ascendance, nobody has tried to compete with them with a simple-looking search box.
Google became dominant because Google was much better than the alternatives.
The example he shows here has the answer to his question highlighted in the AI overview, and in huge text in the InfoBox. He got his answer instantly without having to go to another page.
Really wish such bold claims had better evidence.
Agreed, I actually prefer the Google page in the example screenshots. And in my experience Google still returns much much faster than GPT Search.
I'm open to the suggestion that OpenAI can disrupt Google, but this was a weird case to highlight.
Google's results from the AIStudio are really good results. Everything there is good. Probably because the studio has better models than are currently being used for general search.
AI is just doing what SEO people have been doing for decades: inflating results that have no business being there. At least now there's no pretense of having any skills.
Once timeless facts are solidified in AI training - historical dates, laws of physics, math formulas, specific programming APIs - there will be no reason to search for these things using Google.
The web battle will then happen over the things that move quickly, like the news.
This will give immensely more power to the media, and I fear that a lot, given they have demonstrated time and again that they can't be trusted with it.
> Once timeless facts will be solidified in AI training
Who decides those are facts?
What if someone comes up with alternative facts?
I don't think you'll find a lot of people fighting about Newton's gravity equations, the year of the French Revolution, or the fundamentals of steam engines.
They lost way too much trust across the board the last years, culminating in the Biden/Harris/Trump debacle.
Youtube/Instagram Reels/Tiktok is the (sad) future. "Classical" media is a zombie.
Google is deader than dead on the search front. But I think they'll go the IBM route, albeit more successfully.
In addition to the ChatGPT’s search, the advanced voice model also supports web search, making it even easier to search.
I wouldn't be concerned about trusting the results of ChatGPT if it also were providing links to the sources it had cited or used as a reference in its answers.
Unfortunately, it doesn't, so I can't verify them. I'm not sure if it's an actual limitation of current LLMs or whether they're intentionally filtering out the sources.
> I wouldn't be concerned about trusting the results of ChatGPT if it also were providing links to the sources it had cited or used as a reference in its answers.
This will only be a problem for a few more years. Soon every article, paper, and website will be generated by a LLM. Verifying the output of ChatGPT by referencing other LLM generated source material will be a pointless exercise.
If the system was willing to link to the sources from which it obtained particular information, it would be too easy to hold them accountable for having stolen a bunch of content from people who didn't opt in!
> Unfortunately, it doesn't, and so I can't verify them.
Try asking for sources, whenever I have asked, it gives them.
Investors value OpenAI at more than $150bn, but it just reported a $5bn loss on $3.7bn revenue: the financial pressure is tremendous, and that cannot last long. It's the same kind of situation as Google vs. Yahoo in the early days, when the clean interface without many ads was a very strong USP.
I may be overly cynical, may live in the rarified bubble of the tech-savvy, but I think Google as it has existed is already dead and doesn't yet know it.
Their search results have been less than useless to me for some years now. And then LLMs came along and have become my go-to for all queries.
I was using ChatGPT to compare Docker and Podman and getting reasonable comparisons. I also asked it about c code searching tools and getting a reasonable list with what I think were reasonable comparisons.
It hit me that in a few years, this may not be available as Docker and other tool suppliers start paying for advertising. We’ll see.
This. I think Amazon is doing the same thing and leaving the door open for Walmart or another player with some logistical muscle to take a bunch of the market from them. Amazon search used to be so simple. Now you sift through so many ads and hidden ad garbage to find the thing you want to buy.
Will OpenAI eventually deliver ads along with responses?
“Before getting a response, a word from our sponsors:” type thing?
> I don't want to watch a 10-minute video for a quick answer
Nailed it. Google should show the transcript.
All I see on Google for most searches is ads and SEO spam that gets the ranking and clicks. It’s only once I spend time scrolling it that I realize it has the telltale signs like very lengthy text that doesn’t answer the title succinctly.
> Google is losing trust with all these buzzing results, and its answer is to throw more jam at the wall to see what sticks. But this just attracts more flies.
If you're going to work the metaphor, work the metaphor!
Infestation also applies to the actual "organic" results. The old Google philosophy would have ranked pages with annoying pop-ups, paywalls, sign-in walls, etc. out of existence. Today's Google does not. It even tolerates sites that show Google articles to index but then hide them from the general public behind a paywall.
The old Internet still exists, but Google's ranking behaviour hides it.
> When Google came onto the scene, I credit its success to the tried and true paradigm that makes companies successful: simple and easy to use.
> Yahoo was dominant back then, and it tried to put everyone and everything in front of you. Then we learned about the paralysis of choice. Too many choices, the mental fatigue weighed in, and the product became difficult to use.
This nonsense again? I was around then, and I switched from Yahoo and AltaVista to Google despite its dumb name and stupid, childish logo because Google's results were hands-down better. Instead of a solely full-text search paradigm based only on keyword density, Google also ranked pages based on how many other pages linked to them, the so-called "PageRank" algorithm.
This worked much, much better, and was much harder (for a while) to game. Before Google, it was common when searching to find pages that gamed the search engines by stuffing their <meta> keyword tags with SEO crap or putting it in giant footer sections in a tiny font the same color as the background (to render it invisible). Google's PageRank wasn't fooled by this.
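The link-based ranking idea described above can be sketched in a few lines. This is a toy illustration, not Google's production algorithm: a hard-coded three-page link graph and the commonly cited damping factor of 0.85 are assumptions for the example.

```python
# Minimal PageRank sketch: rank pages by who links to them,
# not by keyword density. Toy link graph; damping factor d = 0.85.
links = {
    "a": ["b", "c"],  # page "a" links to "b" and "c"
    "b": ["c"],
    "c": ["a"],
}
pages = list(links)
n = len(pages)
rank = {p: 1.0 / n for p in pages}  # start with a uniform distribution
d = 0.85

for _ in range(50):  # power iteration until roughly converged
    new = {p: (1 - d) / n for p in pages}  # "random jump" baseline
    for p, outs in links.items():
        share = rank[p] / len(outs)  # each page splits its rank among its outlinks
        for q in outs:
            new[q] += d * share
    rank = new

# The ranks form a probability distribution over pages; the page with
# the most incoming "link juice" ("c" here) ends up on top.
```

Keyword stuffing in `<meta>` tags or invisible footers doesn't change who links to you, which is why this was so much harder to game at the time.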
Also most of the major search engines adopted similarly minimalist UIs, and it did zero to stop the bleeding. They all lost to google. (AltaVista, the pre-Google Google, was still useful for a while for some specialty searching, like for anonymous FTP servers, and I wonder if DEC had never gone under or if Compaq had spun off AltaVista, maybe history would be different.)
EDIT: I just realized the article doesn't even mention AltaVista. Unbelievable.
I actually think this might push me towards using ChatGPT more. I just hate having to type, "What version was the iPhone in 2016?" rather than "iphone 2016 version number" lol
Google became an asset of the government. By selecting what you see they control what you think, what you buy who you vote for, similar to legacy media.
Open AI has started on day one with that goal.
This conspiracy theory is broken, because governments around the world have been struggling more than previously to get reelected. It might be that Google has been used to "select what you see so they control what you think, what you buy who you vote for, similar to legacy media", but they're not doing so under the control of government.
This was proven in the Twitter Files. The same was happening at Facebook and Google.
There is also independent research that proves this. Hard to Google ;)
Basically, it's proven that just switching the autocomplete suggestions when you search for Trump or Kamala can sway the votes of the undecided by a huge margin.
They seek those queries out and switch the suggestions where it matters.
They also incite people to go vote depending on their choice, among a bunch of other dirty tricks.
You are right that it no longer works. In the end truth will always win, because it connects to reality. Having just X switch to truth will swarm the world.
Between Kagi for ordinary web search and Claude for “complicated questions I don’t know how to phrase for a search engine”, I’m pretty happy right now.
Great job to both organizations!
Also check out Grok, it's surprisingly good, especially at organizing the results to get the job done for you. You can access it as part of X.
I have been using Perplexity 95% of the time this past year.
I'm happy to pay for ChatGPT, and I would happily pay for a clean Google search experience without ads. How much are they making per visitor a month?
Have you looked into Kagi? I'm very happy with it.
I'm surprised nobody (in the comments I scrolled by) has brought up the Web option (which only shows links) yet
Brave Search might be a good alternative.
Its index is created by people surfing with the Brave browser, so only websites used by real people are included.
Google has put its early-2000s history where it can't find it! In the garbage most likely, but somewhere deeper.
This article woefully assumes OpenAI has had the same R&D that Google has had as a search business.
ChatGPT offers bullshit answers faster and more confidently. However, the issue for OpenAI is that the cost of doing business is _horrifically_ expensive.
Sure, they charge some users for premium access, but those fees aren't currently enough to cover costs.
OpenAI needs a step change in performance and a step change in compute efficiency - and for Meta not to release an open-source version of it.
Is it faster? I just did similar queries to the ones in the article and ChatGPT (web) spent a long time with an animated "Searching the web" placeholder, then served an error.
jeffbee - this is an unrelated comment because I need to follow up with you on a comment you made (https://news.ycombinator.com/item?id=41873615) a couple of months ago. Could you reach out via email to paul@linuxaudiosystems.com ? Thanks. I'll delete this comment in 24 hrs.
For searching the web, probably not. For "what's the difference between tiffin and rocky road", it's probably much quicker.
However, if you want to get _more_ information, or assess whether that's bollocks or not, it's much slower and click-ier.
I honestly don't mind the current Google Search Results page design and additional features.
For me, it's the web in general that isn't the same. Google Search is probably just as good, but I have an idea that the content to be indexed isn't what it was.
Has the world changed, or have I changed?
Probably both.
The WWW of content isn't the same, and i don't search for the same things anymore.
My use of search has become integrated in my patterns in the physical world. I need to know what products are in stock, where things are, when a restaurant is open and so on.
For those searches, Google is generally excellent I find.
Also, I actually find Google's AI Answers to be pretty decent.
Even if I do ethically disagree with denying the original content authors those page views.
Agree on the idea; however, I think Perplexity is a better product than OpenAI's in search.
What is the current status of SEO weasels trying to poison AI data by stuffing it with spam?
What do you think is the endgame of AI monetization other than marketing/spam in the AI output? Whether the SEO weasels get there before OpenAI themselves is largely irrelevant.
google's playbook from the 2000s was to run a search engine based on a novel ranking algorithm that actually worked with a ui designed by engineers who like things fast, simple and minimal.
the rest of the process described herein has been best described by cory doctorow as "enshittification."
i’ll buy this. seems reasonable.
Why OpenAI in particular?
I mean, Gemini is much better than the "traditional" google search as well.
Anything is better than 2024 google search actually.
I for one love the “wall of ai” this author decries. It is super helpful for most of the questions I ask. I don’t know why he doesn’t like it.
yeah Google Search is awful and there's still nothing better. What's new?
Still using Google I see? How primitive..
What do you use?
Kagi.