• isoprophlex 5 hours ago

    > It took me quite a few iterations to get to this recipe for compiling the tool itself

    Hehe, I really feel this remark, having recently built a sqlite3 binary with Lua support.[*]

    The LLMs were no help here either, which is maybe a good thing for our collective future employment.

    That said, I found the quality of the documentation and legibility of the sqlite source code to be absolutely fantastic. When you work with the code, you can almost feel the programming enlightenment rub off on you.

    [*] Why, for heaven's sake? So you can decouple your business logic from the database and test it a bit outside the DB, whilst still keeping the data all in one place without sending it over the wire. It lets you write SELECT lua('some lua code that operates on your sqlite data') FROM some-table, so you can put gnarly nested conditionals or whatever into Lua, build some comprehensive testing around that, and execute it on your data as if everything were a single, plain old SQL statement.

    • simonw 4 hours ago

      I got a little bit of assistance from Claude in figuring this out, but it pointed me in the right direction more than actually giving me the right sequence of commands. Claude got me as far as:

          gcc -o sqldiff sqldiff.c ../sqlite3.c -I.. -ldl -lpthread
      
      That was enough for me to figure out that I needed to build the sqlite3.c amalgamation first and then run gcc in the tool/ directory. I landed on this after a bit more poking:

          gcc -o sqlite3-rsync sqlite3-rsync.c ../sqlite3.c -DSQLITE_ENABLE_DBPAGE_VTAB
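
      For anyone else following along: as far as I can tell, the sqlite3.c amalgamation itself can be generated from the top level of a sqlite checkout with just:

          ./configure
          make sqlite3.c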
      
      I did have a much more useful interaction with an LLM later on: I was curious about the protocol used over SSH, so I copied the whole of this 1,800-line C file:

      https://github.com/sqlite/sqlite/blob/sqlite3-rsync/tool/sql...

      And pasted it into OpenAI o1-preview with the prompt "Explain the protocol over SSH part of this" - and got back a genuinely excellent high-level explanation of how that worked: https://chatgpt.com/share/6701450c-bc9c-8006-8c9e-468ab6f67e...

  • gary_0 4 hours ago

    sqlite3-rsync is in the Makefile. I got sqlite3-rsync to build by just checking out the main branch and running:

        ./configure
        make sqlite3-rsync
    
    I think the rsync branch has been merged?

    I'm really excited about this new tool; I'm hoping to use it for nightly backups of a webapp I'm building that uses sqlite exclusively, although I'm hesitant to rely on sqlite3-rsync right away because it's so new.

    • pkhuong 2 hours ago

      > I'm hesitant to rely on sqlite3-rsync right away because it's so new.

      If you want something less new that only needs an S3-compatible endpoint, there's https://gist.github.com/pkhuong/555be66f5e573a2d56186d658fd8...

      • simonw 3 hours ago

        Thanks for that! I just updated my TIL to promote this recipe instead - I didn't realize it was A) in the Makefile and B) merged into the main branch.

        • ralferoo 3 hours ago

          > I'm hoping to use it for nightly backups of a webapp I'm building that uses sqlite exclusively

          The easiest recipe for this is to use:

              sqlite3 $SRC "vacuum into '$DEST'"
          
          $DEST must not already exist; personally, I prefer to put a timestamp in the filename and keep the last n copies.
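
          A rough sketch of the timestamp-plus-retention part (untested; assumes GNU tools, a hypothetical app.db, and keeping the 7 newest copies - adjust to taste):

              DEST="backup-$(date +%Y%m%d-%H%M%S).db"
              sqlite3 app.db "vacuum into '$DEST'"
              ls -1t backup-*.db | tail -n +8 | xargs -r rm
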
          • gary_0 3 hours ago

            I'm aware of the existing backup methods, but AFAIK they don't do a delta of only the changed pages (unless I use separate software), and "vacuum into" doesn't work over SSH.

            I have no need to store a backup on the same physical server. Also, the backup server will keep snapshots, and with sqlite3-rsync I can just update a previous snapshot with the changed pages rather than redundantly transferring the whole database over every time.

            • simonw 3 hours ago

              Yeah, the unique benefit of sqlite3-rsync looks to be efficiently backing up medium-to-large databases that don’t change enormously (so deltas are worthwhile) over an SSH connection. I particularly like that you don’t need double the disk space to create the temporary backup copy before downloading it.

        • tosh 4 hours ago

          I wonder how this compares to Litestream and other existing sqlite replication strategies.

          • ralferoo 3 hours ago

            I had a quick scan through the source and it gets the hash of each page on both sides and sends the page if it's different (so conceptually very similar to rsync). I don't think the page hashes are cached anywhere normally, which implies it needs to do a full read of the database on both sides, so it'll be slow and will probably trash your disk cache on a large database.
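
            You can get a feel for what that full scan costs with something like this (purely illustrative, not what sqlite3-rsync actually does internally; assumes bash, GNU stat, and the default 4096-byte page size):

                PAGESIZE=4096   # check with: sqlite3 app.db 'pragma page_size;'
                PAGES=$(( $(stat -c %s app.db) / PAGESIZE ))
                for ((i = 0; i < PAGES; i++)); do
                  dd if=app.db bs=$PAGESIZE skip=$i count=1 2>/dev/null | sha256sum
                done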

            In theory, it could use the WAL file directly if it tracked the WAL checksums, but that information would get lost every time the WAL is checkpointed and the data is moved into the main db file.

            Litestream knows exactly which pages have been written because it tails the WAL, so it doesn't need to do all that work for unchanged pages.

            • simonw 3 hours ago

              Ugh, yeah, that's a good point: it looks like each time you run sqlite3-rsync it calculates a hash for every database page on both the source and the replica side of things - that's a lot of disk access and CPU, way more than you'd want to incur on an every-ten-seconds basis.

            • simonw 3 hours ago

              I guess this is probably fast enough that you could run it on a schedule every 10 seconds or so (at least for databases measured in GBs, not TBs) to keep a warm copy updated elsewhere. [UPDATE: maybe not; it has to hash everything, see the comment here: https://news.ycombinator.com/item?id=41749288#41760395]

              Litestream gives you a backup in S3 within less than a second of data being committed, though, and since it records the full WAL archive you also get point-in-time recovery for previous timestamps.