Comments Page - Compiler Explorer and the promise of URLs that last forever

« Back Compiler Explorer and the promise of URLs that last foreverxania.orgSubmitted by anarazel 18 days ago

kccqzy 18 days ago
Before 2010 I had this unquestioned assumption that links are supposed to last forever. I used the bookmark feature of my browser extensively. Some time afterwards, I discovered that a large fraction of my bookmarks were essentially unusable due to linkrot. My modus operandi after that was to print the webpage as a PDF. A bit afterwards when reader views became popular reliable, I just copy-pasted the content from the reader view into an RTF file.
- lappa 18 days ago
  I use the SingleFile extension to archive every page I visit.
  It's easy to set up, but be warned, it takes up a lot of disk space.
  $ du -h ~/archive/webpages 1.1T /home/andrew/archive/webpages
  https://github.com/gildas-lormeau/SingleFile
  internetter 18 days ago
  storage is cheap, but if you wanted to improve this:
  1. find a way to dedup media
  2. ensure content blockers are doing well
  3. for news articles, put it through readability and store the markdown instead. if you wanted to be really fancy, instead you could attempt to programatically create a "template" of sites you've visited with multiple endpoints so the style is retained but you're not storing the content. alternatively a good compression algo could do this, if you had your directory like /home/andrew/archive/boehs.org.tar.gz and inside of the tar all the boehs.org pages you visited are saved
  4. add fts and embeddings over the pages
  ashirviskas 18 days ago
  1 and partly 3 - I use btrfs with compression and deduping for games and other stuff. Works really well and is "invisible" to you.
  bombela 17 days ago
  dedup on btrfs requires to setup a cronjob. And you need to pick one of the dedup too. It's not completely invisible in my mind bwcause of this ;)
  windward 17 days ago
  >storage is cheap
  It is. 1.1TB is both:
  - objectively an incredibly huge amount of information
  - something that can be stored for the cost of less than a day of this industry's work
  Half my reluctance to store big files is just an irrational fear of the effort of managing it.
  IanCal 17 days ago
  > - something that can be stored for the cost of less than a day of this industry's work
  Far, far less even. You can grab a 1TB external SSD from a good name for less than a days work at minimum wage in the UK.
  I keep getting surprised at just how cheap large storage is every time I need to update stuff.
  davidcollantes 18 days ago
  How do you manage those? Do you have a way to search them, or a specific way to catalogue them, which will make it easy to find exactly what you need from them?
  nirav72 17 days ago
  KaraKeep is a decent self hostable app that has support for receiving singlefile pages via singlefile browser extension and pointing to karakeep API. This allows me to search for archived pages. (Plus auto summarization and tagging via LLM).
  dotancohen 17 days ago
  Very naive question, surely. What does KaraKeep provide that grep doesn't?
  nirav72 17 days ago
  jokes aside. It has a mobile app
  dotancohen 16 days ago
  I don't get it aside. How does that help him search files on his local file system? Or is he syncing an index of his entire web history to his mobile device?
  nirav72 16 days ago
  GP is using SingleFile browser extension. Which allows him to download the entire page as a single .html file. But SingleFile also allows sending that page to Karakeep directly instead of downloading it to his local file system. (if he's hosting karakeep on a NAS on his network). He can then use the mobile app or Karakeep web UI to search and view that archived page. Karakeep does the indexing. (Including auto-tagging via LLM)
  dotancohen 15 days ago
  I see now, thank you.
  snthpy 17 days ago
  Thanks. I didn't know about this and it looks great.
  A couple of questions:
  - do you store them compressed or plain?
  - what about private info like bank accounts or health issuance?
  I guess for privacy one could train oneself to use private browsing mode.
  Regarding compression, for thousands of files don't all those self-extraction headers add up? Wouldn't there be space savings by having a global compression dictionary and only storing the encoded data?
  d4mi3n 17 days ago
  > do you store them compressed or plain?
  Can’t speak to your other issues but I would think the right file system will save you here. Hopefully someone with more insight can provide color here, but my understanding is that file systems like ZFS were specifically built for use cases like this where you have a large set of data you want to store in a space efficient manner. Rather than a compression dictionary, I believe tech like ZFS simply looks at bytes on disk and compresses those.
  genewitch 17 days ago
  By default, singlefile only saves when you tell it to, so there's no worry about leaking personal information.
  I haven't put the effort in to make a "bookmark server" that will accomplish what singlefile does but on the internet because of how well singlefile works.
  shwouchk 17 days ago
  i was considering a similar setup, but i don’t really trust extensions. Im curious;
  - Do you also archive logged in pages, infinite scrollers, banking sites, fb etc? - How many entries is that? - How often do you go back to the archive? is stuff easy to find? - do you have any organization or additional process (eg bookmarks)?
  did you try integrating it with llms/rag etc yet?
  eddd-ddde 17 days ago
  You can just fork it, audit the code, add your own changes, and self host / publish.
  shwouchk 12 days ago
  yes, you right. im not helpless and all the new ai tools make this even easier.
  nyarlathotep_ 17 days ago
  Are you automating this in some fashion? Is there another extension you've authored or similar to invoke SingleFile functionality on a new page load or similar?
  dataflow 17 days ago
  Have you tried MHTML?
  RiverCrochet 17 days ago
  SingleFile is way more convenient as it saves to a standard HTML file. The only thing I know that easily reads MHTML/.mht files is Internet Explorer.
  dataflow 17 days ago
  Chrome and Edge read them just fine? The format is actually the same as .eml AFAIK.
  RiverCrochet 17 days ago
  I remember having issues but it could be because the .mht's I had were so old I think I used Internet Explorer's Save As... function to generate them.
  dataflow 17 days ago
  I've had such issues with them in the past too, yeah. I never figured out the root cause. But in recent times I haven't had issues, for whatever that's worth. (I also haven't really tried to open many of the old files either.)
  90s_dev 18 days ago
  You must have several TB of the internet on disk by now...
- flexagoon 18 days ago
  By the way, if you install the official Web Archive browser extension, you can configure it to automatically archive every page you visit
  petethomas 18 days ago
  This a good suggestion with the caveat that entire domains can and do disappear: https://help.archive.org/help/how-do-i-request-to-remove-som...
  Akronymus 17 days ago
  That's especially annoying when a formerly useful site gets abandoned, a new owner picks up the domain, then gets IA to delete the old archives as well.
  Or even worse, when a domain parking company does that: https://archive.org/post/423432/domainsponsorcom-erasing-pri...
  internetter 18 days ago
  recently I've come to believe even IA and especially archive.is are ephermal. I've watched sites I've saved disappear without a trace, except in my selfhosted archives.
  A technological conundrum, however, is the fact that I have no way to prove that my archive is an accurate representation of a site at a point in time. Hmmm, or maybe I do? Maybe something funky with cert chains could be done.
  akoboldfrying 18 days ago
  There are timestamping services out there, some of which may be free. It should (I think) be possible to basically submit the target site's URL to the timestamping service, and get back a certificate saying "I, Timestamps-R-US, assert that the contents of https://targetsite.com/foo/bar downloaded at 12:34pm on 29/5/2025 hashes to abc12345 with SHA-1", signed with their private key and verifiable (by anyone) with their public key. Then you download the same URL, and check that the hashes match.
  IIUC the timestamping service needs to independently download the contents itself in order to hash it, so if you need to be logged in to see the content there might be complications, and if there's a lot of content they'll probably want to charge you.
  XorNot 18 days ago
  Websites don't really produce consistent content even from identical requests though.
  But you also don't need to do this: all you need is a service which will attest that it saw a particular hashsum at a particular time. It's up to other mechanisms to prove what that means.
  account42 13 days ago
  > But you also don't need to do this: all you need is a service which will attest that it saw a particular hashsum at a particular time. It's up to other mechanisms to prove what that means.
  "That URL served a particular hash at a particular time" or "someone submitted a particular hash at a particular time" provide very different guarantees and the latter will be insufficient to prove your archive is correct.
  akoboldfrying 17 days ago
  > Websites don't really produce consistent content even from identical requests though.
  Often true in practice unfortunately, but to the extent that it is true, any approach that tries to use hashes to prove things to a third party is sunk. (We could imagine a timestamping service that allows some kind of post-download "normalisation" step to strip out content that varies between queries and then hash the results of that, but that doesn't seem practical to offer as a free service.)
  > all you need is a service which will attest that it saw a particular hashsum at a particular time
  Isn't that what I'm proposing?
  shwouchk 17 days ago
  sign it with gpg and upload the sig to bitcoin
  edit: sorry, that would only prove when it was taken, not that it wasn’t fabricated.
  fragmede 17 days ago
  hash the contents
  shwouchk 17 days ago
  signing it is effectively the same thing. question is how to prove that what you hashed is what was there?
  chii 17 days ago
  you can't, because unless you're not the only one with a copy, your hash cannot be verified (since both hash and claim comes from you).
  One way to make this work is to have a mechanism like bitcoin (proof of work), where the proof of work is put into the webpage itself as a hash (made by the original author of that page). Then anyone can verify that the contents wasn't changed, and if someone wants to make changes to it and claim otherwise, they'd have to put in even more proof of work to do it (so not impossible, but costly).
  notpushkin 17 days ago
  I think there was a way to preserve TLS handshake information in a way that something something you can verify you got the exact response from the particular server? I can’t look it up now though, but I think there was a Firefox add-on, even.
  account42 13 days ago
  I don't think how this can work. While the handshake uses asymmetric crypto, that step then gives you a symmetric key that will be used for the actual content. You need that key to decrypt the content but if you have it you can also use it to encrypt your own content and substitute it in the encrypted stream.
  fragmede 17 days ago
  what if instead of the proof of work being in the page as a hash, that the distributed proof of work is that some subset of nodes download a particular bit of html or json from a particular URI, and then each node hashes that, saves the contents and the hash to a blockchain-esque distributed database. Subject to 51% attack, same as any other chain, but still.
  vitorsr 18 days ago
  > you can configure it to automatically archive every page you visit
  What?? I am a heavy user of the Internet Archive services, not just the Wayback Machine, including official and "unofficial" clients and endpoints, and I had absolutely no idea the extension could do this.
  To bulk archive I would manually do it via the web interface or batch automate it. The limitations of manually doing it one by one are obvious, and the limitations of doing it in batches requires, well, keeping batches (lists).
- 90s_dev 18 days ago
  My solution has been to just remember the important stuff, or at least where to find it. I'm not dead yet so I guess it works.
  TeMPOraL 18 days ago
  It was my solution too, and I liked it, but over the past decade or so, I noticed that even when I remember where to find some stuff, hell, even if I just remember how to find it, when I actually try and find it, it often isn't there anymore. "Search rot" is just as big a problem as link rot.
  As for being still alive, by that measure hardly anything anyone does is important in the modern world. It's pretty hard to fail at thinking or remembering so badly that it becomes a life-or-death thing.
  90s_dev 17 days ago
  > hardly anything anyone does is important
  Agreed.
  mock-possum 17 days ago
  I’ve found that whenever I think “why don’t other people just do X” it’s because I’m misunderstanding what’s involved in X for them, and that generally if they could ‘just’ do X then they would.
  “Why don’t you just” is a red flag now for me.
  90s_dev 17 days ago
  Not always. I love it when people offer me a much simpler solution to a problem I overengineered, so I can throw away my solution and use the simpler one.
  Half the time people are suggested a better way, it's because they're actually doing it wrong, they've gotten the solution's requirements all wrong in the first place, and this perspective helps.
  chii 17 days ago
  this applies to basically any suggested solution to any problem.
  "Why don't you just ..." is just lazy idea suggestion from armchair internet warriors.
- mycall 17 days ago
  Is there some browser extension that automatically goes to web.archive.org if the link timesout?
  theblazehen 17 days ago
  I use the Resurrect Pages addon
- account42 13 days ago
  I really is a travesty that Browsers still haven't updated their bookmark feature based on this realization - all bookmarks should store not only the link but a full copy of the rendered page (not just the source which could rely on dynamic content that will no longer be available).
  Also, open tabs should work the same way: I never want to see a network error while going back to a tab while not having an internet connection because the browser has helpfully evicted that tab from memory. It should just reload the state from disk instead of the network in this case until I manually refresh the page.
- macawfish 18 days ago
  Use WARC: https://en.wikipedia.org/wiki/WARC_(file_format) with WebRecorder: https://webrecorder.net/
  shwouchk 17 days ago
  warc is not a panacea; for example, gemini makes it super annoying to get a transcript of your conversation, so i started saving those as pdf and warc.
  turns out that unlike most webpages, the pdf version is only a single page of what is visible on screen.
  turns out also that opening the warc immediately triggers a js redirect that is planted in the page. i can still extract the text manually - it’s embedded there - but i cannot “just open” the warc in my browser and expect an offline “archive” version - im interacting with a live webpage! this sucks from all sides - usability, privacy, security.
  Admittedly, i don’t use webrecorder - does it solve this problem? did you verify?
  weinzierl 17 days ago
  Not sure if you tried that. Chrome has a take full page screenshot command. Just open the command bar in dev tools and search for "full" and you will fund it. Firefox has it right in the context menu, no need for dev tools.
  Unfortunately there are sites where it does not work.
  eMPee584 17 days ago
  Apart from small UX nits, FF's screenshot feature is great - it's just that storing a 2-15MiB bitmap copy of a text medium still feels dirty to me every time.. would much prefer a PDF export, page size matching the scroll port, with embedded fonts and vectors and without print CSS..
- andai 18 days ago
  Is there some kind of thing that turns a web page into a text file? I know you can do it with beautiful soup (or like 4 lines of python stdlib), but I usually need it on my phone, where I don't know a good option.
  My phone browser has a "reader view" popup but it only appears sometimes, and usually not on pages that need it!
  Edit: Just installed w3m in Termux... the things we can do nowadays!
  XorNot 18 days ago
  You want Zotero.
  It's for bibliographies, but it also archives and stores web pages locally with a browser integration.
  _huayra_ 17 days ago
  I frankly don't know how I'd collect any useful info without it.
  I'm sure there are bookmark services that also allow notes, but the tagging, linking related things, etc, all in the app is awesome, plus the ability to export bib tex for writing a paper!
- m-p-3 13 days ago
  I export text-based content I want to retain into Markdown files, and when I find something useful for work I also send the URL to the Wayback Machine.
- nonethewiser 18 days ago
  A reference is a bet on continuity.
  At a fundamental level, broken website links and dangling pointers in C are the same.
- jwe 16 days ago
  I can recommend to use Pinboard with the archive option
- taeric 18 days ago
  That assumption isn't true of any sources? Things flat out change. Some literally, others more in meaning. Some because they are corrected, but there are other reasons.
  Not that I don't think there is some benefit in what you are attempting, of course. A similar thing I still wish I could do is to "archive" someone's phone number from my contact list. Be it a number that used to be ours, or family/friends that have passed.
- rubit_xxx16 18 days ago
  > Before 2010 I had this unquestioned assumption that links are supposed to last forever
  Any site/company whatsoever of this world (and most) that promises that anything will last forever is seriously deluded or intentionally lying, unless their theory of time is different than that of the majority.
mananaysiempre 18 days ago
May be worth cooperating with ArchiveTeam’s project[1] on Goo.gl?
> url shortening was a fucking awful idea[2]
[1] https://wiki.archiveteam.org/index.php/Goo.gl
[2] https://wiki.archiveteam.org/index.php/URLTeam
- MallocVoidstar 18 days ago
  IIRC ArchiveTeam were bruteforcing Goo.gl short URLs, not going through 'known' links, so I'd assume they have many/all of Compiler Explorer's URLs. (So, good idea to contact them)
- tech234a 17 days ago
  Real-time status for that project indicates 7.5 billion goo.gl URLs found out of 42 billion goo.gl URLs scanned: https://tracker.archiveteam.org:1338/status
- mattgodbolt 16 days ago
  Thanks! Someone posted on GitHub about that and I'll be looking at that tomorrow!
s17n 18 days ago
URLs lasting forever was a beautiful dream but in reality, it seems that 99% of URLs don't in fact last forever. Rather than endlessly fighting a losing battle, maybe we should build the technology around the assumption that infrastructure isn't permanent?
- nonethewiser 18 days ago
  >maybe we should build the technology around the assumption that infrastructure isn't permanent?
  Yes. Also not using a url shortener as infrastructure.
- dreamcompiler 17 days ago
  URNs were supposed to solve that problem by separating the identity of the thing from the location of the thing.
  But they never became popular and then link shorteners reimplemented the idea, badly.
  https://en.m.wikipedia.org/wiki/Uniform_Resource_Name
- hoppp 18 days ago
  Yes.
  domain names often exchange hands and a URL that is supposed to last forever can turn into malicious phishing link over time.
  emaro 18 days ago
  In theory a content-addressed system like IPFS would be the best: if someone online still has a copy, you can get it too.
  mananaysiempre 17 days ago
  It feels as though, much like cryptography in general reduces almost all confidentiality-adjacent problems to key distribution (which is damn near unsolvable in large uncoordinated deployments like Web PKI or PGP), content-addressable storage reduces almost all data-persistence-adjacent problems to maintenance of mutable name-to-hash mappings (which is damn near unsolvable in large uncoordinated deployments like BitTorrent, Git, or IP[FN]S).
  dreamcompiler 17 days ago
  DNS seems to solve the problem of a decentralized loosely-coordinated mapping service pretty well.
  emaro 17 days ago
  True, but then you're back on square one. Because it's not guaranteed that using a (DNS) name will point to the same content forever.
  hoppp 17 days ago
  But then all content should be static and never update?
  If you serve an SPA via IPFS, the SPA still needs to fetch the data from an endpoint which could go down or change
  Even if you put everything on a blockchain, an RPC endpoint to read the data must have a URL
  mananaysiempre 17 days ago
  > But then all content should be static and never update?
  And thus we arrive at the root of the conflict. Many users (that care about this kind of thing) want to publications that they’ve seen to stay where they’ve seen them; many publishers have become accustomed to being able to memory-hole things (sometimes for very real safety reasons; often for marketing ones). That on top of all the usual problems of maintaining a space of human-readable names.
  emaro 15 days ago
  No, not all content should never change. This is just the core of the dilemma: dynamic content (and identifiers) rots faster that static content (content addressed). We can have both, but not at the same time.
  immibis 17 days ago
  Note that IPFS is now on the EU Piracy Watchlist which may be a precursor to making it illegal.
  emaro 15 days ago
  Didn't know that, interesting. Although maybe it's not that surprising...
- jjmarr 17 days ago
  URL identify the location of a resource on a network, not the resource itself, and so are not required to be permanent or unique. That's why they're called "uniform resource locators".
  This problem was recognized in 1997 and is why the Digital Object Identifier was invented.
creatonez 18 days ago
There's something poetic about abusing a link shortener as a database and then later having to retrieve all your precious links from random corners of the internet because you've lost the original reference.
- rs186 18 days ago
  Shortening long URLs is the intended use case for a ... URL shortener.
  The real abusers are the people who use a shortener to hide scam/spam/illegal websites behind a common domain and post it everywhere.
  creatonez 18 days ago
  These are not just "long URLs". These are URLs where the entire content is stored in the fragment suffix of the URL. They are blobs, and always have been.
- nonethewiser 18 days ago
  Didnt they just use the link shortener to compress the url? They used their url as the "database" (ie holding the compiler state).
  Arcuru 18 days ago
  They didn't store anything themselves since they encoded the full state in the urls that were given out. So the link shortener was the only place where the "database", the urls, were being stored.
  nonethewiser 18 days ago
  Yeah but the purpose of the url shortener was not to store the data, it was to shorten the url. The fact that the data was persisted on google's sever somewhere is incidental.
  In other words, every shortened url is "using the url shortener as a database" in that sense. Taking a url with a long query parameter and using a url shortener to shorten it does not constitute "abusing a link shortener as a database."
  cortesoft 18 days ago
  Except in this case the url IS the data, so storing the url is the same as storing the data.
  nonethewiser 18 days ago
  Its incidental. The state is in the url which is only shortened because its so long. Google’s url shortener is not needed to store the data.
  It’s simply a normal use-case for a url shortener. A long url, usually because of some very large query parameter, which gets mapped to a short one.
amiga386 18 days ago
https://killedbygoogle.com/
> Google Go Links (2010–2021)
> Killed about 4 years ago, (also known as Google Short Links) was a URL shortening service. It also supported custom domain for customers of Google Workspace (formerly G Suite (formerly Google Apps)). It was about 11 years old.
- zerocrates 18 days ago
  "Killing" the service in the sense of minting new ones is no big deal and hardly merits mention.
  Killing the existing ones is much more of a jerk move. Particularly so since Google is still keeping it around in some form for internal use by their own apps.
  ruune 17 days ago
  Don't they use https://g.co now? Or are there still new internal goo.gl links created?
  Edit: Google is using a g.co link on the "Your device is booting another OS" screen that appears when booting up my Pixel running GrapheneOS. Will be awkward when they kill that service and the hard coded link in the phones bios is just dead
  zerocrates 17 days ago
  Google Maps creates "maps.app.goo.gl" links; I don't know if there are others, they called Maps out specifically in their message.
  Possibly those other ones are just using the domain name and the underlying service is totally different, not sure.
layer8 18 days ago
I find it somewhat surprising that it’s worth the effort for Google to shut down the read-only version. Unless they fear some legal risks of leaving redirects to private links online.
- actuallyalys 18 days ago
  Hard to say from the outside, but it’s possible the service relies on some outdated or insecure library, runtime, service, etc. they want to stop running. Although frankly it seems just as possible it’s a trivial expense and they’re cutting it because it’s still a net expense, goodwill and past promises be dammed.
  Scaevolus 18 days ago
  Typically services like these are side projects of just a few Google employees, and when the last one leaves they are shut down.
  mbac32768 17 days ago
  yeah but nobody wants to put "spent two months migrating goo.gl url shortener to work with Sisyphus release manager and Dante 7 SRE monitoring" in their perf packet
  that's a negative credit activity
  mmooss 18 days ago
  Another possibility is that it's a distraction - whatever the marginal costs, there's a fixed cost to each system in terms of cognitive overhead, if not documentation, legal issues (which can change as laws and regulations change), etc. Removing distractions is basic management.
olalonde 18 days ago
> This article was written by a human, but links were suggested by and grammar checked by an LLM.
This is the second time today I've seen a disclaimer like this. Looks like we're witnessing the start of a new trend.
- tester756 18 days ago
  It's crazy that people feel that they need to put such disclaimers
  actuallyalys 18 days ago
  It makes sense to me. After seeing a bunch of AI slop, people started putting no AI buttons and disclaimers. Then some people using AI for little things wanted to clarify it wasn’t AI generated wholesale without falsely claiming AI wasn’t involved at all.
  layer8 18 days ago
  It’s more a claimer than a disclaimer. ;)
  danadam 17 days ago
  I'd probably call it "disclosure".
  psychoslave 18 days ago
  This comment was written by a human with no check by any automaton, but how will you check that?
  acquisitionsilk 17 days ago
  Business emails, other comments here and there of a more throwaway or ephemeral nature - who cares if LLMs helped?
  Personal blogs, essays, articles, creative writing, "serious work" - please tell us if LLMs were used, if they were, and to what extent. If I read a blog and it seems human and there's no mention of LLMs, I'd like to be able to safely assume it's a human who wrote it. Is that so much to ask?
  qingcharles 17 days ago
  That's exactly what a bot would say!
- chii 17 days ago
  i dont find the need to have such a disclaimer at all.
  If the content can stand on its own, then it is sufficient. If the content is slop, then why does it matter that it is an ai generated slop vs human generated slop?
  The only reason anyone wants to know/have the disclaimer is if they cannot themselves discern the quality of the contents, and is using ai generation as a proxy for (bad) quality.
  johannes1234321 17 days ago
  For the author it matters. To which degree do they want to be associated with the resulting text.
  And I differentiate between "Matt Godbolt" who is an expert in some areas and in my experience careful about avoiding wrong information and an LLM which may produce additional depth, but may also make up things.
  And well, "discern the quality of the contents" - I often read texts to learn new things. On new things I don't have enough knowledge to qualify the statements, but I may have experience with regards to the author or publisher.
  chii 17 days ago
  and what do you do to make this differentiation if what you're reading is a scientific paper?
  johannes1234321 17 days ago
  Same?
  (Some researcher's names I know, some institutions published good reports in the past and that I take into consideration on how much I trust it ... and since I'm human I trust it more if it confirms my view and less if it challenges it or put in different words: there are many factors going into subjective trust)
wrs 18 days ago
I hate to say it, but unless there’s a really well-funded foundation involved, Compiler Explorer and godbolt.org won’t last forever either. (Maybe by then all the info will have been distilled into the 487 quadrillion parameter model of everything…)
- mattgodbolt 17 days ago
  We've done alright so far: 13 years this week. I have funding for another year and change even assuming growth and all our current sponsors pull out.
  I /am/ thinking about a foundation or similar though: the single point of failure is not funding but "me".
- badmintonbaseba 17 days ago
  Well, that's true, but at least now compiler explorer links will stop working when compiler explorer vanishes, but not before that.
  I think the most valuable long-living compiler explorer links are in bug reports. I like to link to compiler explorer in bug reports for convenience, but I also include the code in the report itself, and specify what compiler I used with what version to reproduce the bug. I don't expect compiler explorer to vanish anytime soon, but making bug reports self-contained like this protects against that.
- layer8 18 days ago
  Thanks to the no-hiding theorem, the information will live forever. ;)
2YwaZHXV 18 days ago
Presumably there's no way to get someone at Google to query their database and find all the shortened links that go to godbolt.org?
swyx 18 days ago
idk man how can URLs last forever if it costs money to keep a domain name alive?
i also wonder if url death could be a good thing. humanity makes special effort to keep around the good stuff. the rest goes into the garbage collection of history.
- johannes1234321 18 days ago
  Historians however would love to have more garbage from history, to get more insights on "real" life rather than just the parts one considered worth keeping.
  If I could time jump it would be interesting to see how historians inna thousand years will look back at our period where a lot of information will just disappear without traces as digital media rots.
  mrguyorama 18 days ago
  I regularly wonder if modern educated people do not journal as much as previous century educated people who were kind of rare.
  Maybe we should get a journaling boom going.
  But it has to be written, because pen and paper is literally ten times more durable than even good digital storage.
  swyx 18 days ago
  > pen and paper is literally ten times more durable than even good digital storage.
  citation needed lol. data replication >>>> paper's single point of failure.
  johannes1234321 17 days ago
  The question is: What is more likely in 1000 years to still exist and being readable. The papers caught in some lost ruins or some form of storage media?
  Sure, as long as the media is copied there is a chance of survival, but will this then be "average" material or things we now consider interesting, only? Will the chain hold or will it become as uninteresting as many other things were over time? Will the Organisation doing it be funded? Will the location where this happens be spares from war?
  For today's historians the random finds are important artifacts to understand "average" people's lives as the well preserved documents are legends on the mighty people.
  Having lots of material all over gives a chance for some to survive and from 40 years or so back we were in a good spot. Lots of paper allover about everything. Analog vinyl records, which might be readable in a future to learn about our music. But now all on storage media, where many forms see data loss, where the format is outdated and (when looking from a thousand years away) fast change of data formats etc.
  KPGv2 17 days ago
  > What is more likely in 1000 years to still exist and being readable. The papers caught in some lost ruins or some form of storage media?
  The storage media. We have evidence to support this:
  * original paper works from 1000 years ago are insanely rare
  * more recent storage media provide much more content
  How many digital copies of Beowulf do we have? Millions?
  How many paper copies from 1000 years ago? one
  how many other works from 1000 years ago do we have zero copies of thanks to paper's fragility and thus don't even know existed? probably a lot
  johannes1234321 17 days ago
  However that one paper, stating a random fact, might tell more about the people than an epic poem.
  You can't have a fully history without either.
  tredre3 17 days ago
  > The question is: What is more likely in 1000 years to still exist and being readable. The papers caught in some lost ruins or some form of storage media?
  But that's just survivorship bias. The vast vast vast majority of all written sheets of paper have been lost to history. Those deemed worthy were carefully preserved, some of the rest was preserved by a fluke. The same is happening with digital media.
  swyx 18 days ago
  we'd keep the curiosities around, like so much Ea Nasir Sells Shit Copper. we have room for like 5-10 of those per century. not like 8 billion. much of life is mundane.
  woodruffw 18 days ago
  > much of life is mundane.
  The things that make (or fail to make) life mundane at some point in history are themselves subjects of significant academic interest.
  (And of course we have no way to tell what things are "curiosities" or not. Preservation can be seen as a way to minimize survivorship bias.)
  johannes1234321 17 days ago
  Yes, at the same time we'd be excited about more mundane sources from history. The legends about the mighty are interesting, but what do we actually know about every day love from people a thousand years ago? Very little. Most things are speculation based on objects (tools etc.), on structure of buildings and so on. If we go back just few hundred years there is (using European perspective) a somewhat interesting source in court cases from legal conflicts between "average" people, but in older times more or less all written material is on the powerful, be it worldly or religious power, which often describes the rulers in an extra positive way (from their perspective) and their opponents extra weak.
  Having more average sources certainly helps and we now aren't good judges on what will be relevant in future. We can only try to keep some of everything.
  cortesoft 18 days ago
  Today’s mundane is tomorrow’s fascination
  shakna 18 days ago
  We also have rooms full of footprints. In a thousand years, your mundane is the fascination of the world.
  rightbyte 18 days ago
  Imagine being judged 1000s of year later by some Yelp reviews like poor Nasir.
- internetter 18 days ago
  > i also wonder if url death could be a good thing. humanity makes special effort to keep around the good stuff. the rest goes into the garbage collection of history.
  agreed. formerly wrote some thoughts here: https://boehs.org/node/internet-evanescence
sedatk 18 days ago
Surprisingly, purl.org URLs still work after a quarter century, thanks to Internet Archive.
rurban 17 days ago
He missed the archive.org crawl for those links in the blog post. they have them stored also now. https://github.com/compiler-explorer/compiler-explorer/discu...
- mattgodbolt 16 days ago
  He didn't know at the time but he's definitely pleased this is happening and will get to looking at it tomorrow!
jimmyl02 18 days ago
This is great perspective about how assumptions play out over longer period of time. I think that this risk is much greater for free third party services for critical infrastructure.
Someone has to foot the bill somewhere and if there isn't a source of income then the project is bound to be unsupported eventually.
- tekacs 18 days ago
  I think I would struggle to say that free services die at a higher rate consistently…
  So many paid offerings, whether from startups or even from large companies, have been sunset over time, often with frustratingly short migration periods.
  If anything, I feel like I can think of more paid services that have given their users short migration periods than free ones.
- cortesoft 18 days ago
  Nah, businesses go under all the time, whether their services are paid or not.
- lqstuart 18 days ago
  Counterexample: the Linux kernel
  charcircuit 18 days ago
  How? Big tech foots the bill.
  shlomo_z 18 days ago
  But goo.gl is also big tech...
  0x1ceb00da 17 days ago
  Google wasn't making money off of goo.gl
  johannes1234321 17 days ago
  But what did it cost them? Especially in read only mode
  Sure, an service more to monitor, while for the most part "fix by restart" is a good enough approach. And then once in a while have an intern switching to latest backend choice.
  undefined 18 days ago
  [deleted]
  iainmerrick 18 days ago
  Linux isn't a service (in the SaaS sense).
sebstefan 17 days ago
>Over the last few days, I’ve been scraping everywhere I can think of, collating the links I can find out in the wild, and compiling my own database of links1 – and importantly, the URLs they redirect to. So far, I’ve found 12,000 links from scraping:
>Google (using their web search API)
>GitHub (using their API)
>Our own (somewhat limited) web logs
>The archive.org Stack Overflow data dumps
>Archive.org’s own list of archived webpages
You're an angel Matt
- mattgodbolt 16 days ago
  Thanks! It's been a fun learning experience. I just found out the internet archive has a much more comprehensive effort going so it might have been in vain, but I tried :)
  account42 13 days ago
  What really matters is caring about keeping the links going in the first place. Most website operators never really get that far. So, thanks for caring.
shepmaster 18 days ago
As we all know, Cool URIs don't change [1]. I greatly appreciate the care taken to keep these Compiler Explorer links working as long as possible.
The Rust playground uses GitHub Gists as the primary storage location for shared data. I'm dreading the day that I need to migrate everything away from there to something self-maintained.
[1]: https://www.w3.org/Provider/Style/URI
3cats-in-a-coat 17 days ago
Nothing lasts forever.
I've pondered that a lot in my system design which bears some resemblance to the principles of REST.
I have split resources in ephemeral (and mutable), and immutable, reference counted (or otherwise GC-ed), which are persistent while referred to, but collected when no one refers to them.
In a distributed system the former is the default, the latter can exist in little islands of isolated context.
You can't track references throughout the entire world. The only thing that works is timeouts. But those are not reliable. Nor you can exist forever, years after no one needs you. A system needs its parts to be useful, or it dies full of useless parts.
90s_dev 18 days ago
Some famous programmer once wrote about how links should last forever.
He advocated for /foo/bar with no extension. He was right about not using /foo/bar.php because the implementation might change.
But he was wrong, it should be /foo/bar.html because the end-result will always be HTML when it's served by a browser, whether it's generated by PHP, Node.js or by hand.
It's pointless to prepare for some hypothetical new browser that uses an alternate language other than HTML and that doesn't use HTML.
Just use .html for your pages and stop worrying about how to correctly convert foo.md to foo/index.html and configure nginx accordingly.
- Sesse__ 18 days ago
  > Some famous programmer once wrote about how links should last forever.
  You're probably thinking of W3C's guidance: https://www.w3.org/Provider/Style/URI
  > But he was wrong, it should be /foo/bar.html because the end-result will always be HTML
  20 years ago, it wasn't obvious at all that the end-result would always be HTML (in particular, various styled forms of XML was thought to eventually take over). And in any case, there's no reason to have the content-type in the URL; why would the user care about that?
  90s_dev 18 days ago
  There's strong precedence for associating file extensions with content types. And it allows static files to map 1:1 to URLs.
  I agree though that I was too harsh, I didn't realize it was written in 1998 when HTML was still new. I probably first read it around 2010.
  But now that we have hindsight, I think it's safe to say .html files will continue to be supported for the next 50 years.
- esafak 18 days ago
  If it's always .html, it's cruft; get rid of it. And what if it's not HTML but JSON? Besides, does the user care? Berners-Lee was right.
  https://www.w3.org/Provider/Style/URI
  90s_dev 18 days ago
  If it's JSON then name it /foo/bar.json, and as a bonus you can also have /foo/bar.html!
  You say the extension is cruft. That's your opinion. I don't share it.
  marcosdumay 18 days ago
  The alternative is to declare what you want on the Accept header, what is way less transparent but is more flexible.
  I never saw any site where the extra flexibility added any value. So, right now I do favor the extension.
  kelnos 17 days ago
  At the risk of committing the appeal-to-authority fallacy, it's also the opinion of Tim Berners-Lee, which I would hope carries at least some weight.
  The way I look at it is that yes, the extension can be useful for requesting a particular file format (IMO the Accept header is not particularly accessible, especially if you are just a regular web browser user). But if you have a default/canonical representation, then you should give that representation in response to a URL that has no extension. And when you link to that document in a representation-neutral way, you should link without the extension.
  That doesn't stop you from also serving that same content from a URL that includes the extension that describes the default/canonical representation. And people who want to link to you and ensure they get a particular representation can use the extension in their links. But someone who doesn't care, and just wants the document in whatever format the website owner recommends, should be able to get it without needing to know the extension. For those situations, the extension is an implementation detail that is irrelevant to most visitors.
  90s_dev 17 days ago
  > it's also the opinion of Tim Berners-Lee, which I would hope carries at least some weight
  Not at all. He's famous for helping create the initial version of JavaScript, which was a fairly even mixture of great and terrible. Which means his initial contributions to software were not extremely noteworthy, and he just happened to be in the right time and right place, since something like JavaScript was apparently inevitable. Plus, I can't think of any of his major contributions to software in the decades since. So no, I don't even think that's really an appeal to authority.
  wolfgang42 17 days ago
  > [Tim Berners-Lee is] famous for helping create the initial version of JavaScript
  You may be thinking of Brendan Eich? Berners-Lee is famous for HTML, HTTP, the first web browser, and the World Wide Web in general; as far as I know he had nothing to do with JS.
- 90s_dev 18 days ago
  Found it: https://www.w3.org/Provider/Style/URI
  Why did I think Joel Spolsky or Jeff Atwood wrote it?
- crackalamoo 18 days ago
  I use /foo/bar/ with the trailing slash because it works better with relative URLs for resources like images. I could also use /foo/bar/index.html but I find the former to be cleaner
  90s_dev 18 days ago
  It's always bothered me in a small way that github doesn't honor this:
  https://github.com/sdegutis/bubbles
  https://github.com/sdegutis/bubbles/
  No redirect, just two renders!
  It bothers me first because it's semantically different.
  Second and more importnatly, because it's always such a pain to configure that redirect in nginx or whatever. I eventually figure it out each time, after many hours wasted looking it up all over again and trial/error.
- Dwedit 18 days ago
  mod_rewrite means you can redirect the .php page to something else if you stop using php.
  shakna 18 days ago
  Unless mod_rewrite is disabled, because it has had a few security bugs over the years. Like last year. [0]
  [0] https://nvd.nist.gov/vuln/detail/CVE-2024-38475
  account42 13 days ago
  It also means you can internally redirect the extension-less version to .php in the first place so you never have to change your public URL in the future.
devrandoom 17 days ago
> despite Google solemnly promising ...
I'm pretty sure the lore says that a solemn promise from Google carries the exact same value as a prostitute saying she likes you.
nssnsjsjsjs 17 days ago
The collolary of URLs that last forever is we have both forever storage (costs money forever) and forever institutional care and memory.
Where URLs may last longer is where they are not used for the RL bit. But more like a UUID for namespacing. E.g. in XML, Java or Go.
devnullbrain 18 days ago
>despite Google solemnly promising that “all existing links will continue to redirect to the intended destination,” it went read-only a few years back, and now they’re finally sunsetting it in August 2025
It's become so trite to mention that I'm rolling my eyes at myself just for bringing it up again but... come on! How bad can it be before Google do something about the reputation this behaviour has created?
Was Stadia not an expensive enough failure?
- iainmerrick 17 days ago
  I'm very surprised, even though I shouldn't be, that they're actually shutting the read-only goo.gl service down.
  For other obsolete apps and services, you can argue that they require some continual maintenance and upkeep, so keeping them around is expensive and not cost-effective if very few people are using them.
  But a URL shortener is super simple! It's just a database, and in this case we don't even need to write to it. It's literally one of the example programs for AWS Lambda, intentionally chosen because it's really simple.
  I guess the goo.gl link database is probably really big, but even so, this is Google! Storage is cheap! Shutting it down is such a short-sighted mean-spirited bean-counter decision, I just don't get it.
diggan 18 days ago
URLs (uniform resource locator) cannot ever last forever, as it's a location and locations can't last forever :)
URIs however, can be made to last forever! Also comes with the added benefit that if you somehow integrate content-addressing into the identifier, you'll also be able to safely fetch it from any computer, hostile or not.
- 90s_dev 18 days ago
  I've been making websites for almost 30 years now.
  I still don't know the difference between URI and URL.
  I'm starting to think it doesn't matter.
  Sesse__ 18 days ago
  It doesn't matter.
  URI is basically a format and nothing else. (foo://bar123 would be a URI but not a URL because nothing defines what foo: is.)
  URLs and URNs are thingies using the URI format; https://news.ycombinator.com is a URL (in addition to being a URI) because there's an RFC that specifies that https: means and how to go out and fetch them.
  urn:isbn:0451450523 (example cribbed from Wikipedia) is an URN (in addition to being an URI) that uniquely identifies a book, but doesn't tell you how to go find that book.
  Mostly, the difference is pedantic, given that URNs never took off.
  account42 13 days ago
  Too bad that URLs and URNs are generally distinct subsets. It would be better if URLs also uniquely identified the resource they are pointing to so you could find it elsewhere if the original location goes away.
  90s_dev 18 days ago
  It's almost like URNs were born in an urn! [1]
  [1]: ba dum tss
  layer8 18 days ago
  URLs in the strict sense are a subset of URIs. They specify a mechanism (like HTTP or FTP) for how to access the referenced resource. The other type of URIs are opaque IDs, like doi:10.1000/182 or urn:isbn:9780141036144. These technically can’t expire, though that doesn’t mean you’ll be able to access what they reference.
  However, “URL” in the broader sense is used as an umbrella term for URIs and IRIs (internationalized resource identifiers), in particular by WHATWG.
  In practice, what matters is the specific URI scheme (“http”, “doi”, etc.).
  immibis 17 days ago
  A URL tells you where to get some data, like https://example.com/index.html
  A URN tells you which data to get (usually by hash or by some big centralized registry), but not how to get it. DOIs in academia, for example, or RFC numbers. Magnet links are borderline.
  URIs are either URLs or URNs. URNs are rarely used since they're less practical since browsers can't open them - but note that in any case each URL scheme (https) or URN scheme (doi) is unique - there's no universal way to fetch one without specific handling for each supported scheme. So it's not actually that unusual for a browser not to be able to open a certain scheme.
  diggan 18 days ago
  > I still don't know the difference between URI and URL.
  One is a location, the other one is a ID. Which is which is referenced in the name :)
  And sure, it doesn't matter as long as you're fine with referencing locations rather than the actual data, and aware of the tradeoffs.
  marcosdumay 18 days ago
  An URI is an standard way to write names of documents.
  And URL is an URI that also tells you how to find the document.
- postoplust 18 days ago
  For example: IPFS URI's are content addresses
  https://docs.ipfs.tech/
- bowsamic 17 days ago
  Does this have any actual grounding in reality or does your lack of suggestion for action confirm my suspicion that this is just a theoretical wish?
  diggan 17 days ago
  > Does this have any actual grounding in reality
  Depends on your use case I suppose. For things I want to ensure I can reference forever (theoretical forever), then using location for that reference feels less than ideal, I cannot even count the number of dead bookmarks on both hands and feet, so "link rot" is a real issue.
  If those bookmarks instead referenced the actual content (via content-addressing for example), rather than the location, then those would still work today.
  But again, not everyone cares about things sticking around, not all use cases require the reference to continue being alive, and so on, so if it's applicable to you or not is something only you can decide.
Ericson2314 17 days ago
The only type of reference that lasts forever is a content address.
We should be using more of them.
- account42 13 days ago
  A content address doesn't guarantee that there is anyone still serving that content so it doesn't actually improve much over an URL + reference date.
mbac32768 17 days ago
it seems a bit crazy to try to avoid storing a relatively small amount of data when a link is shared when storage costs and bandwidth costs are rapidly dropping with time
but perhaps I don't appreciate how much traffic godbolt gets
- mattgodbolt 17 days ago
  It was a simpler time and I didn't want the responsibility of storing other people's data. We do now though!
  mattgodbolt 17 days ago
  Oh and traffic: https://stats.compiler-explorer.com/
sdf4j 18 days ago
> One of my founding principles is that Compiler Explorer links should last forever.
And yet… that was a very self-destructive decision.
- mattgodbolt 17 days ago
  I'm not sure why so?
  MyPasswordSucks 17 days ago
  Because URL shortening is relatively trivial to implement, and instead of just doing so on their own end, they decided to rely on a third-party service.
  Considering link permanence was a "founding principle", that's just unbelievably stupid. If I decide one of my "founding principles" is that I'm never going to show up at work with a dirty windshield, then I shouldn't rely on the corner gas station's squeegee and cleaning fluid.
  gwd 17 days ago
  First of all, how the links are made permanent has nothing to do with the principle that they should be made permanent.
  There seemed to be two principles at play here:
  1. Links should always work
  2. We don't want to store any user data
  #2 is a bit complicated, because although it sounds nice, it has two potential justifications:
  2a: For privacy reasons, don't store any user data
  2b: To avoid having to think through the implications of storing all those things ourselves
  I'm not sure how much each played into their thinking; possibly because of a lack of clarity, 2a sounded nice and 2b was the real motivation.
  I'd say 2a is a reasonable aspiration; but using a link shortener changed it from "don't store any user data" to "store the user data somewhere we can't easily get at it", which isn't the same thing.
  2b, when stated more clearly, is obviously just taking on technical debt and adding dependencies which may come back to bite you -- as it did.
  account42 13 days ago
  You're always relying on someone else, no matter what you do.
  Also, "they" is the person you are replying to.
sahil_sharma0 17 days ago
[dead]
merillecuz56 15 days ago
[dead]
curtisszmania 18 days ago
[dead]