The conspiracy theorist in me wonders what was accidentally copied into the archive that powerful interests want removed and if this is all smoke and mirrors while they make that happen.
More details here about the data breach. Stolen database contains 31 million records.
https://www.bleepingcomputer.com/news/security/internet-arch...
> The data will soon be added to HIBP
My unique-to-archive.org email address is not there yet.
I just checked and my unique-to-archive.org email is showing up in the breach as of 2024-08-09.
Mine too.
How do they get a hold of all these leaks so fast?
Voluntary sharing, since afaik they don't pay the criminals to get the data. Someone else bought it and shared it either publicly, privately with HIBP, or privately with someone who then reported it to HIBP
How this specific instance unfolded, time will have to tell
Friendly reminder to generate a unique password for every account you create so database leaks like this one don't bother you.
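For anyone not using a password manager's built-in generator, here is a minimal sketch of the idea using Python's standard `secrets` module. The function names, the alphabet, and the default length of 24 are illustrative assumptions, not a recommendation from anyone in this thread.

```python
import secrets
import string

# Assumed character set; some sites reject certain punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24) -> str:
    """Return a random password drawn from ALPHABET using the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def passwords_for(sites):
    """Generate an independent password per site (hypothetical helper)."""
    return {site: generate_password() for site in sites}


if __name__ == "__main__":
    for site, password in passwords_for(["archive.org", "example.com"]).items():
        print(site, password)
```

Because every password is generated independently, a leaked hash from one site tells an attacker nothing about your credentials anywhere else.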
https://www.reddit.com/r/DataHoarder/comments/h02jl4/lets_sa...
I found this reddit thread from /r/DataHoarder about backing up the internet archive particularly interesting, given the circumstances
50 PB * $0.014/GB = $0.7M. $0.014/GB is from[1], bare drive cost without chassis, power, or redundancy.
1: https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/
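The arithmetic above can be checked with a short sketch. It assumes decimal units (1 PB = 1,000,000 GB) and uses only the two figures quoted in the comment; the function name is made up for illustration.

```python
PB_IN_GB = 1_000_000  # assumption: decimal units, 1 PB = 1,000,000 GB


def bare_drive_cost(petabytes: float, usd_per_gb: float) -> float:
    """Bare drive cost only: no chassis, power, or redundancy."""
    return petabytes * PB_IN_GB * usd_per_gb


cost = bare_drive_cost(50, 0.014)
print(f"${cost:,.0f}")  # $700,000
```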
How long does an average hard drive last? You'd have to spend that 700k every that many years (plus the extra bits you mentioned). Quite an operation actually
If this is a backup, you don't need it to be powered up and available 24x7.
So the question becomes more like "how long does an average hard drive last while powered down and still reliably be able to power back up and be read?".
I'm fairly sure that is a lot longer than the single digit years that'd be the probable answer to your question.
I wonder if there are useful guidelines for long term storage of powered down hard drives? My gut feel is the major failure modes would be electrolytic capacitor failure, bearings sticking as the lubrication ages, and obsolescence of the interfaces. I wonder how hard it'd be to find hardware that'd read my Mac SCSI hard drives from 25 years ago?
> How long does an average hard drive last?
This is a great question, and a state of the art kind of thing.
HDDs are sold with a lifetime drive read/write amount and power cycle warranty, along with usually some environmental operating envelope. read/write relates to the quality/space of the platter, power cycle is usually the actuator & read/write head being reseated/wearing out. Environment is the same as all other devices in a DC.
Most folks replace drives when they die (reads/writes stall or return garbage), or when the warranty runs out. Some will pay for a warranty exception, and some will just use the drive outside of warranty. How you use the drive, what environment it's in, etc. all change how much you can push things.
I'd say anywhere from 4-8 years, depending on how it's used. In many cases it can be cheaper to have a worse environment for your fleet (thus using less power on hvac) and replace devices more frequently.
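Taking the $0.7M bare-drive figure from upthread and this 4-8 year range at face value, a rough sketch of the yearly replacement budget looks like the following. It assumes straight-line replacement of the whole fleet and ignores chassis, power, redundancy, and falling drive prices; the function name is hypothetical.

```python
def annual_replacement_cost(fleet_cost_usd: float, drive_lifetime_years: float) -> float:
    """Spread the cost of replacing the whole fleet evenly over its lifetime."""
    return fleet_cost_usd / drive_lifetime_years


for years in (4, 8):
    print(f"{years} years: ${annual_replacement_cost(700_000, years):,.0f} per year")
# 4 years: $175,000 per year
# 8 years: $87,500 per year
```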
I actually find that fairly tame. For a point of comparison, Wikipedia gets ~$150M in revenue a year, an "asset rise" (I presume this is what non-profits call profit?) of ~$15M a year, and is sitting on about a quarter billion in the bank.
Not that they want to, but I think Wikipedia could fund this using their current donations if they wanted. Hell, I almost wonder if one of the big storage providers would do it for free if they could do it in their staging environment so they get real traffic. It would be less good than real backups, but extra copies are still extra copies even if they're unreliable.
It's been tried several times, but it's hard because it's such a massive quantity of data. The IPFS backup never really got off the ground.
They have their own backups which I think is good enough for now unless someone plans on donating a few hundred million.
Oh no! I didn't know their IPFS initiative didn't pan out. What happened to it? I am surprised how hard it is to google. I remember interviewing for a role on that team at the archive to help move it to filecoin. Was so happy to hear that the effort was underway to decentralize their datastore. We need this more than ever.
There are people still working on trying to make it happen but it's just a colossal amount of data so it's a very hard problem.
From my own personal experience unrelated to the Archive.org project, Filecoin/IPFS's UX isn't quite there yet. They still don't let you serve data to the network from a normal filesystem; you have to let their system ingest all of your stuff, so you end up double-storing data, or you have to give in to everything being stored as inscrutable binary blobs.
Perhaps you can persuade Elon that it owns the libs?
I don't want Elon anywhere near Archive.org, please don't give him any ideas.
“According to their twitter, they’re doing it just to do it. Just because they can. No statement, no idea, no demands.”
A special place in Hell…
That's a strange thing to read on Hacker news. Isn't that description the definition of hack value? As in http://www.catb.org/jargon/html/H/hack-value.html
Now, it depends what the "it" is referring to here, but so far all I've heard is about an alert() message saying the usernames will be sent to a breach alerting site. If they're doing it just for the heck of it, it's still costing a lot of people a lot of time that they could have spent doing better things, but I'd reserve special places in hell for the people who do plan this out carefully and make malicious demands
There is a big difference between doing something for pure curiosity, love, or exploration and doing something directly harmful to other people for the same reasons. One is art; the other is sadism.
I'm not sure that placing free long distance calls isn't harmful to the org whose infrastructure you're using for your own benefit, but 2600 (Hz) is a respected hacker magazine and phreaking and Cap'n crunch whistles are seen as cool
Hacking the Internet Archive and only placing an alert with a provocative message, I could see my teenage self do that. My judgment of the character is going to depend on what it turns out they've actually done
Of course, my grown up self (or late teen also, as I've done responsible disclosures back then as well) would rather have seen them do a coordinated vulnerability disclosure, but alas, I just meant to remark upon the "special place in hell" for not having a plan or motive bit
*Edit:* wait, I just saw in the article (I opened the thread before the link was changed) that this quote refers to a DDoS, not the alert() message that the thread was initially about
> the site was experiencing a DDoS attack, posting on Mastodon that “According to their twitter, they’re doing it just to do it.
That's indeed just destructive and not related to (hacker) curiosity...
I get your point and your edit. I think most people's reaction is less because of the destruction itself and more because the Internet Archive is being targeted. It is a place that most would say represents hacker values, and few such places exist in the current internet landscape.
There are so many other possible targets that would get even positive reactions from people. The only kind of people that might be happy about TIA being down is maybe some big corporations that want to control and sell the information being freely preserved there.
> I'm not sure that placing free long distance calls isn't harmful to the org whose infrastructure you're using for your own benefit,
If there's a call you wouldn't make unless it was free, the infrastructure isn't at capacity, and you're not acting otherwise in a detrimental fashion to other users of the infrastructure-- there's no harm to that organization.
Certainly a fair point, but it also costs a lot of person-hours to patch up that infrastructure's security and trace who's placing the calls when one could just choose not to do this fraud in the first place. I am not old enough to know whether carriers also charged each other back then, but at least nowadays it could also incur charges for the originating party; costs which the caller isn't covering
Toying with the system, learning how it works and finding what you can make it do, there's a certain art to it and I'd encourage anyone to at least tinker with the systems they own (and everything else within reason and ethics), but there's two sides to nearly everything
We have lost the ability to meaningfully compare the magnitude of things.
It's a special feeling when someone seems to lose faith in humanity based on something I wrote in good faith
There's a spectrum and case by case judgement. I'd agree your examples are harmless even if technically they harm the phone company. Taking down the internet archive just for the hell of it has a distinctly less "cool" or "fun" flavor, to my eye.
Doing the internet equivalent of burning the largest library in the world is not exactly a good person's behavior.
This isn't the equivalent of burning it, a closer equivalent would be barricading it for a while.
Still awful, but nowhere near as awful as the former.
Is it better to deface a website for ransom or to support a scam than it is to deface a website because you're bored?
The action is reprehensible either way, but if this is truly just an old-fashioned Anonymous attack with no ulterior motive beyond just being bad that's honestly kind of refreshing.
It isn't "breaking into things" hackers.
It's "whipping something together" hackers.
Breaking into the Internet Archive's servers is like breaking into your public library. There's no honor to be had.
https://www.ccc.de/en/hackerethik
> Make public data available, protect private data.
For all I know, they've given the private data to an organisation dedicated to alerting people about breaches. If they fear that the data may also have been accessed by others, that's not a reprehensible thing to do by itself. Besides the DDoS apparently being from the same author (which seems odd because those ethics are incongruous), I don't know what else they've done so it doesn't seem in violation of what you linked
This isn't Cracker News.
>No statement, no idea, no demands. A special place in Hell…
I mean... would it be better if the hackers had asked for money or did it to protest global warming or something?
"Say what you will about the tenets of National Socialism, but at least it's an ethos."
>No statement, no idea, no demands
Sure.
They're probably doing this because it's filled with evidence of war crimes to be used as evidence in the ICJ/ICC cases against Israel. Luckily most of the evidence gathering projects have backups of backups.
Is there any reason to think this? (Honestly asking). It seems like quite a stretch to me unless there is some reason to connect the two.
100%. https://x.com/Sn_darkmeta/status/1844080692772401399?t=j3xDz...
This Twitter account is suspicious and odd. I don't think anyone doing this is stupid enough to actually believe that they're doing it to "help Palestine." Seems like a job by Israel or supporting countries pretending to be supporters of Palestine.
What is the connection? I don’t understand how this would help either Israel or Palestine?
We have no idea, that’s just what they said
We have an entire generation of activists who have somehow been programmed into believing that disruptive, moronic, antisocial acts of “protest” are a way to effect change, whether it’s vandalizing historic artwork or blockading a freeway. And the Internet Archive is even a museum of sorts, so you can see how the rationale would track.
Are you suggesting something similar along the lines of murdering your own citizens and showcasing them as victims? Something akin to 911 being an insider job?
Reporting on security issues is always so terrible. Is it a data breach or is it a DDoS? (Or both). Those are opposite things. One is trying to release secret information one is trying to make the site inaccessible.
It is both. They got attacked by a DDOS after the security breach.
That's like complaining the reporting on the weather forecast channel is so often wrong. This news broke about an hour ago and the IA is down, what witchcraft do you expect news media to practice! Nobody yet has the answers you're looking for, give it some time and log files will be audited and the reporting becomes useful :)
Actually figure out what is happening, or at least say how confident they are in what they know.
They aren't predicting the future, they are reporting on an ongoing event.
> or at least say how confident they are in what they know
This I can very much underwrite. Error bars or rough confidence indicators are missing far too often, also from sites reporting on e.g. benchmark values of hardware they've been testing... such professional organisations yet such basic omissions
Should we be linking to the site that is very likely to be breached? Could start to host any type of malware until the access can be definitively revoked
This - dang/mods is there a policy for this?
Verge article as possible replacement: https://www.theverge.com/2024/10/9/24266419/internet-archive...
Just noticed the site now alerts this:
> Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!
Joke's on them... I'm already on HIBP countless times...
It's all good, as long as you're not in that recent AI Girlfriend breach which exposed a ton of users who were trying to coax it into generating CSAM images.
“I went to the site to jerk off (to an adult scenario, to be clear) and noticed that it looked like it [the Muah.ai website] was put together pretty poorly,” the hacker told 404 Media. “It's basically a handful of open-source projects duct-taped together. I started poking around and found some vulnerabilities relatively quickly. At the start it was mostly just curiosity but I decided to contact you once I saw what was in the database.”
What a nice guy.
I assume that if this is a bad actor, then account email/name will be leaked?
Is it a genuine alert, or hacking artifact?
Sometimes with friendly / attempt-at-humorous error messages it’s difficult to tell
I feel like it's safe to assume the official Internet Archive would not write a "friendly"/attempt-at-humorous/unprofessional/confusing/delivered-by-popup message advertising a devastating security breach. Oh, and while announcing it nowhere else.
Obviously an attacker's ability to insert a message does imply a breach beyond a DoS. But I am pretty confident that message was not from the IA.
It's a literal window.alert()
But was that code placed there by IA or by the malicious party?
Verge reports someone has taken credit for an ongoing DDOS against IA. "An account on X called SN_Blackmeta said it was behind the attack and implied that another attack was planned for tomorrow" https://www.theverge.com/2024/10/9/24266419/internet-archive...
Ok, let's switch to that link. Thanks!
Submitted URL was https://archive.org/.
Sounds snarky to me. I'll bet it was the malicious party.
A few minutes ago (22:48 UTC), I got three emails from HIBP about accounts of mine breached on the Internet Archive. Troy is quick! And I'm surprised the author of that alert() actually had the data as well as followed through
Bit of a shame the emails contain an ad for a password manager, saying there's two easy steps to become more secure: Step 1: use our password manager (fair enough), "Step 2: Enable 2 factor authentication and store the codes inside your [password manager]" ehh now it's back to 1 factor or am I missing something?
Edit: according to https://www.bleepingcomputer.com/news/security/internet-arch... (via https://news.ycombinator.com/item?id=41793669), Troy Hunt / HIBP already received and verified this "three days ago" as of yesterday 6pm AoE
I was going to disagree with you (and I sort of do about password managers and storing 2FA in them, but I also unlock my password manager with a yubikey).
But doesn't a DB compromise mean that the attacker would have the TOTP seed as well? 2FA can only increase your account security elsewhere, but not reusing passwords also prevents the IA leak from hurting you elsewhere?
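To make the point about the TOTP seed concrete: a TOTP code is a pure function of the shared seed and the current time, so anyone holding the seed (a breached server database, or a password manager that stores it) can produce valid codes. Below is a minimal sketch of the RFC 6238 algorithm with HMAC-SHA1, using only the standard library. It is illustrative only; real services may use other hash functions, step sizes, or digit counts.

```python
import hashlib
import hmac
import struct


def totp(seed: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) from a raw seed and a timestamp."""
    counter = unix_time // step                      # number of time steps since epoch
    mac = hmac.new(seed, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                          # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 test seed (ASCII "12345678901234567890") at T = 59 seconds.
print(totp(b"12345678901234567890", 59, digits=8))  # 94287082
```

Nothing in the function depends on the device that runs it, which is why a second factor only stays independent if its seed lives somewhere other than the place the password does.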
I pulled an old friend's website down from the Internet Archive.
He's moved on to the next stage, but I was glad I was able to put his site back up.
It'll be a shame if IA goes down permanently, but we need a decentralized solution anyway.
Having a single mega organization in charge of our collective heritage isn't a good idea.
I have always thought about this. It would be interesting to have users actually store small amounts of redundant info on a device connected to the internet. Very similar to what a torrent does, but with more peers (more data shards than full copies) and fewer seeds. And try to keep a huge database for everyone. Obviously open source, and it would end up something like Tor, where they just assist the network with security patches but they don’t actually have any real “control” (admin dashboard control) over the network at large. We already do something like that with website static file caching, but at a much smaller scale. Obviously the security implications of this would be very hard but maybe not impossible to overcome. IPFS comes close, but again it has more seeds than peers.
if anyone knows something like what I'm suggesting, I'd love to hear about it!
IIRC there were a few storage based projects that popped up using alt coins to encourage people to offer excess storage space for other randos on the internet. The possibility you might be storing illegal content might have been what killed it/them.
https://en.wikipedia.org/wiki/Cooperative_storage_cloud gives a few examples, like Filecoin.
It's called torrent protocol and it doesn't work, no one wants to spend money and bandwidth hosting a god forsaken movie or book that only a handful of people care about.
Not much money and bandwidth if you aren't on a metered connection. You can share tens of gigabytes or more on a cheap read-only flash drive plugged into a $25 single board computer that draws way less than a full PC and can be left sitting there near the router. Just limit its bandwidth on the torrent client and you won't even notice it during online gaming. The client can be as small as the Transmission daemon running headless on one of the many Debian based embedded distros: all control through either the web or from its client: no monitor, mouse, keyboard etc. just a small cheap box.
https://www.friendlyelec.com/index.php?route=product/product...
(just an example, as it's way overkill for the task)
It does work, when you don't notice it. We need sane limits and permanent seeders. This is why so many regular people get hit with ISP notices, they don't know they've seeded Captain America for the last six months every time they started their PC.
Yup. If browsers built in support for magnet links and (on desktop) defaulted to seeding with some capped bandwidth then a lot of centralized hosting platforms would become unnecessary.
This thread is looking like it'll be one of the first places this incident will be documented (seems to be on the top of Google).
Already there are two new users just for this.
i see more than 2
Yeah, I was looking around, but saw no mention of it anywhere until I realized it just happened.
It looks like someone has compromised one of their subdomains for Polyfill
Update: Subdomain seems to be returning normal responses again now.
You mean the IA included some JS polyfill from a subdomain and that's what's compromised / where the alert is coming from?
yes, "https://polyfill.archive.org/v3/polyfill.min.js?features=fet..." is the URL with the malicious code
It looks like it is running the service that was part of the supply chain attack earlier this year. https://github.com/polyfillpolyfill/polyfill-service/issues/...
The service was fine, it was the "official" hosted instance of the service which was compromised. IA appears to be running their own instance.
That was a DNS hack of polyfill.io though right? This looks like it was/is self hosted.
Yeah I'm getting this exact response from the above URL now:
https://sourcegraph.com/github.com/polyfillpolyfill/polyfill...
Seems like they self hosted that service
Correct. The source subdomain of the popup seems to be hxxps[:]//polyfill[.]archive[.]org
That would perhaps explain how they managed to inject the JS alert popup, right?
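For anyone wanting to check this kind of thing themselves, here is a rough sketch that lists the hosts a page loads external scripts from, using only Python's standard library. The sample HTML is a made-up stand-in, not the actual archive.org markup, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptSrcCollector(HTMLParser):
    """Collect the host of every <script src="..."> that names one."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            host = urlparse(src).netloc  # empty for relative URLs
            if host:
                self.hosts.add(host)


def script_hosts(html: str) -> set:
    collector = ScriptSrcCollector()
    collector.feed(html)
    return collector.hosts


# Hypothetical page: one script from a subdomain, one relative (same-origin) script.
sample = (
    '<html><head>'
    '<script src="https://polyfill.archive.org/v3/polyfill.min.js"></script>'
    '<script src="/static/app.js"></script>'
    '</head></html>'
)
print(script_hosts(sample))  # {'polyfill.archive.org'}
```

Any host in that set can run arbitrary JavaScript in the page, so whoever controls it can show an alert() or do worse.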
Why go for the Internet Archive? Go for something else, not the fucking archive!
We all need our easily accessible decentralized archive of some sort...
They're hiring, if you're looking for a job.
> Software Engineer, Archiving & Data Services (Remote) [...] Preliminary duties of the role will primarily focus on developing Archive-It
That is. Paying over 100k at the lower end of the range for 3y experience as software engineer
It's a non profit. You're probably not choosing to work for the IA for high compensation.
The undertone was intended to be: that's an insane amount of money, something one with quadruple that amount of experience would maybe earn in a for-profit organisation, but I guess your reaction further proves it's different where you're from
The way you worded it was confusing to read, I thought it was a complaint about "only 100k".
Thanks for clarifying your intent.
It's not high for bay area software jobs; there are new grads who were paid more than that 10 years ago and I assume new grad wages have gone up since. Of course cost of living (particularly rent) and taxes are high there too, but if you don't blow it all on renting a higher-end place or luxuries you can still save a lot.
For context someone making less than $105k is classified as "low income" in San Francisco. https://www.sfgate.com/local/article/under-100k-low-income-s...
Not even in the 10th % for the area per https://www.levels.fyi/heatmap/
https://blog.archive.org/2021/02/04/thank-you-ubuntu-and-lin... "The Internet Archive is wholly dependent on Ubuntu and the Linux communities that create a reliable, free (as in beer), free (as in speech), rapidly evolving operating system. It is hard to overestimate how important that is to creating services such as the Internet Archive." Maybe CUPS?
Let's hope it was someone dumb enough to be extraditable.
No one gets extradited when the attack aligns with US interests abroad.
What weird conspiracy is this? US interests don't involve taking down archive.org
There is no US, there are just a bunch of interest groups. Some interest group definitely wants IA down. I wouldn't be surprised this is a paid attack.
People in other parts of the thread say it's Israel. (Which certainly is "aligned with US interests abroad", as the powerful see it anyway). I think it is ridiculous conspiracism, right now anything anyone doesn't like they think Israel is behind it.
The crazy rise of conspiracism in our society in general, combined with Israel really is doing some nasty stuff (but not controlling everything you don't like), combined with the latent antisemitism in most conspiracism.
And I say this as a strong supporter of and activist on Palestinian rights and liberation. Free Palestine. (But there is no reasonable reason to think Israel is behind an IA hack. Or the fact that your mail came late, or anything else except what they're actually doing which is bad enough. Call your senators and tell them to vote for Bernie's JRD resolutions).
There are so many well documented awful things IL has done that most people don't know about (many still haven't even heard of the Sde Teiman video) that folks could be spreading the word about instead. It's a shame to see this kind of conspiracy mindset from at least some people who probably mean well. There is no harm in waiting a little bit for facts to emerge.
Just for completeness sake and my own opinion based on my own witnessing of history, every political party of every government of every country would love to see all the archives gone. It's easier to twist the truth if one can memory hole reports and make the original source go offline or pressure them to change their words. There will always be individuals that archive stories they find interesting, but many stories are uninteresting until people learn what more may have been left out at a later time as part of a much bigger story. That is when the archives become a treasure trove and big archives sites are the first that people turn to for the original reporting. As a generic example, many news sites will redact what they knew to be false after the vast majority saw their misinformation but they can't redact an archive of their twisted truth. The internet has made it a little harder to control a narrative. It was so much easier to control when it was just a few big newspaper publishers that owned the smaller ones and a few big cable companies that owned most of the smaller ones. They would all literally parrot the same lines.
Curious to see if they go after archive.is next.
Wouldn't be surprised if the service was purchased by some publishing empires. This kind of thing usually costs some $$$.
Hachette Book Group or Hack-it Boot Group?
Any information on SN_Blackmeta?
Archive.org is now down. Could anyone explain what it used to show?
A pop-up that said,
"Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!"
I had to look it up, but I guess HIBP refers to https://haveibeenpwned.com/
Yes. Not the hacker but as a hacker, that's what hibp refers to
I just got a Discord "breaking news" notification about this from a server I am in; it said it may not show on Have I Been Pwned as it is so new.
Related submission: https://news.ycombinator.com/item?id=41792614
What are they looking for here? Negative karma?
Probably someone wants to purge incriminating documents against a nation state?
I hope it will be back again soon
They reported a DDOS attack yesterday, wonder if this is their alert as they manage the fallout?
Bet it’s just a stored XSS alert from a poisoned cache.
Truly unnecessary
Strange, I just received this message when going to the archive.org website. I thought I might have misspelled the URL.
They have a Telegram channel and there's some blurb about it being pushback on US support of Israel, but it reads as bullshit. Probably a script kiddie.
"You are all cooked" vibes from that message hahaha
After this, error 504 Gateway Time-out. Now 503 Service Unavailable: "No server is available to handle this request." Not looking good.
This is why humanity can't have nice things.
I don't know who this is but a lot of people are linking them: https://x.com/Sn_darkmeta/status/1844080692772401399
DDoSed Archive because "the archive belongs to the USA, and as we all know, this horrendous and hypocritical government supports the genocide that is being carried out by the terrorist state of “Israel”."
Now it shows a 'Temporarily Offline' message
Security breach, we intended to make this guy homeless so when we stole his ex girlfriend she wouldn’t get jealous. Quickly destroy his career and reputation!!
Damn I get the notice too
I saw it too
They seem to roll out the we're being DDOS'd every time there's some other thing happening.