HTTrack Website Copier (github.com)
Submitted by iscream26 4 hours ago
  • Felk 26 minutes ago

    Funny seeing this here now, as I _just_ finished archiving an old MyBB PHP forum. I used `wget`, though, and it took 2 weeks and 260GB of uncompressed disk space (12GB compressed with zstd). The process was not interruptible, so I had to start over each time my hard drive got full. Maybe I should have given HTTrack a shot to see how it compares.

    If anyone wants to know the specifics of how I used wget, I wrote them down here: https://github.com/SpeedcubeDE/speedcube.de-forum-archive
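    For a rough idea, the mirror boils down to something of this shape. This is a simplified sketch with a placeholder URL and illustrative flags, not necessarily the exact invocation I used (that one is in the repo):

        # typical recursive-mirror flags (illustrative; placeholder URL):
        #   --mirror             recurse with infinite depth + timestamping
        #   --convert-links      rewrite links so the copy browses offline
        #   --adjust-extension   save dynamic pages (e.g. .php) as .html
        #   --page-requisites    also fetch the CSS/JS/images each page needs
        #   --wait/--random-wait go easy on the server
        wget --mirror --convert-links --adjust-extension \
             --page-requisites --no-parent \
             --wait=1 --random-wait \
             "https://forum.example.com/"

        # then compress the resulting tree (260GB -> 12GB in my case):
        tar --zstd -cf forum-archive.tar.zst forum.example.com/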

    Also, if anyone has experience archiving similar websites with HTTrack and maybe knows how it compares to wget for my use case, I'd love to hear about it!

    • oriettaxx 14 minutes ago

      I don't get it: the last release was in 2017, yet on GitHub I see more recent releases...

      So did the developer of the GitHub repo take over and keep updating/upgrading it? Very good!

      • xnx 3 hours ago

        Great tool. Does it still work for the "modern" web (i.e. now that even simple/content websites have become "apps")?

        • alganet an hour ago

          Nope. It is for the classic web (the only websites worth saving anyway).

          • freedomben 44 minutes ago

            Even for the classic web, if a site is behind Cloudflare, HTTrack no longer works.

            It's a sad point to be at. Fortunately, the SingleFile extension still works really well for single pages, even when they are built dynamically by JavaScript on the client side. There isn't a solution for cloning an entire site, though, at least not one that I know of.

            • alganet 7 minutes ago

              If it is Cloudflare human verification, then httrack will have an issue. But in the end it's just a cookie: you can use a browser with JS to pass the check and grab the cookie, then feed it to httrack's headers.

              If Cloudflare DDoS protection is the issue, you can throttle httrack's requests (rough sketch of both below).
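              As a hedged sketch (the cookie value, user agent, and site URL are placeholders; httrack reads a Netscape-format cookies.txt from the project directory, and cf_clearance is the usual Cloudflare clearance cookie name):

                  # put the cookie where httrack will find it
                  # (Netscape cookies.txt format, fields TAB-separated)
                  mkdir -p mirror && cd mirror
                  printf '.example.com\tTRUE\t/\tTRUE\t1999999999\tcf_clearance\tPASTE_VALUE_HERE\n' > cookies.txt

                  # -F  sends the same User-Agent as the browser that passed the check
                  # -c2 limits simultaneous connections; --max-rate caps bytes/sec,
                  #     which covers the DDoS-protection throttling case
                  httrack "https://example.com/" -O . \
                      -F "Mozilla/5.0 (X11; Linux x86_64) ..." \
                      -c2 --max-rate=50000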

        • corinroyal 2 hours ago

            One time I was trying to create an offline backup of a botanical medicine site for my studies. Somehow I turned off the link-depth limit and let it follow offsite links, then forgot about it. A few days later the machine crashed with a full disk, after trying to cram as much of the WWW onto it as it could.

          • Alifatisk 2 hours ago

            Good ol' days

            • dark-star 3 hours ago

                Oh wow, that brings back memories. I used httrack in the late '90s and early 2000s to mirror interesting websites from the early internet over a modem connection (and early DSL).

                Good to know it's still around. However, now that the web is much more dynamic, I guess it's not as useful as it was back then.