• nick238 19 minutes ago

    Double your compression ratio for the low, low price of 100,000x slower decompression (zstd: 215 MB, 2.2 ns/byte vs. nncp: 106 MB, 230 µs/byte)!
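
    A quick sanity check on those ratios in Python (figures as quoted in this comment, treated as compressed size in MB and per-byte decompression time on enwik9):

        # Ratios implied by the figures quoted above.
        zstd_size_mb, zstd_ns_per_byte = 215, 2.2
        nncp_size_mb, nncp_ns_per_byte = 106, 230_000  # 230 µs/byte = 230,000 ns/byte

        print(f"size:  {zstd_size_mb / nncp_size_mb:.2f}x smaller output from nncp")   # ~2.03x
        print(f"speed: {nncp_ns_per_byte / zstd_ns_per_byte:,.0f}x slower to decode")  # ~104,545x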

    The neural network architectures are technically impressive, but unless there's some standard compression dictionary that works for everything (so the training/compression costs amortize down to nil), and silicon architectures dramatically shift toward compute-in-memory, I don't know if this would ever take off. Lossy compression would probably provide huge advantages, but then you need to be domain-specific, and can't just slap it on anything.

    • lxgr 5 minutes ago

      One interesting application could be instant messaging over extremely bandwidth-constrained paths. I wouldn’t be surprised if Apple were doing something like this for their satellite-based iMessage implementation.

      Of course it could also just be a very large “classical” shared dictionary (zstd and brotli can work in that mode, for example).
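
      For illustration, a minimal Python sketch of that shared-dictionary mode using the python-zstandard bindings; the corpus and message are made up, and in practice both ends would ship the same (ideally trained) dictionary ahead of time:

          import zstandard as zstd

          # Stand-in for a corpus both ends already share; a real deployment would
          # likely build a tuned dictionary from message logs with
          # zstd.train_dictionary() and bundle it with the app.
          corpus = (b"omw, be there in 10. can you call me back? running late, sorry. "
                    b"see you at the usual place. ") * 50
          shared_dict = zstd.ZstdCompressionDict(corpus)

          msg = b"running late, sorry -- see you at the usual place in 10"
          wire = zstd.ZstdCompressor(dict_data=shared_dict, level=19).compress(msg)
          back = zstd.ZstdDecompressor(dict_data=shared_dict).decompress(wire)

          assert back == msg
          print(f"{len(msg)} bytes -> {len(wire)} bytes on the wire")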

    • hyperpape an hour ago

      It's worth noting that the benchmark has not been updated as frequently over the past several years, and the listed versions of some compressors are quite far behind the current implementations (http://www.mattmahoney.net/dc/text.html#history).

      In the one instance I double-checked (zstd), I don't recall it making a massive difference, but it did make one (iirc, the current version compressed slightly smaller than what was listed in the benchmark).
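
      For anyone repeating that check, a rough Python sketch with the python-zstandard bindings (level 22 and the one-shot stream copy are assumptions here; the listed benchmark entry may have used different flags):

          import io
          import zstandard as zstd

          # Compress enwik9 (the benchmark's 10^9-byte test file) with whatever
          # libzstd build is installed and report the resulting size.
          cctx = zstd.ZstdCompressor(level=22)  # zstd's maximum level

          with open("enwik9", "rb") as src:
              sink = io.BytesIO()
              read, written = cctx.copy_stream(src, sink)

          print(f"libzstd {'.'.join(map(str, zstd.ZSTD_VERSION))}: "
                f"{read:,} -> {written:,} bytes")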

    • pama an hour ago

      It would be nice to also have a competition of this type where, within reasonable limits, the size of the compressor does not matter and the material to be compressed is hidden and varied over time. For example: up to 10 GB compressor size, and the dataset is a different random chunk of fineweb every week.
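
      One hypothetical way to cut such a rotating test set, sketched in Python with Hugging Face's streaming datasets API (the weekly-seed rule, the sample-10BT config, and the 100 MB budget are all made up for illustration):

          import datetime
          from datasets import load_dataset

          # Seed a streaming shuffle with the ISO week number so every entrant
          # sees the same chunk, but it rotates weekly.
          week = datetime.date.today().isocalendar()[1]
          stream = (load_dataset("HuggingFaceFW/fineweb", name="sample-10BT",
                                 split="train", streaming=True)
                    .shuffle(seed=week, buffer_size=10_000))

          docs, budget = [], 100 * 1024 * 1024  # target roughly 100 MB of raw text
          for doc in stream:
              docs.append(doc["text"])
              budget -= len(doc["text"].encode("utf-8"))
              if budget <= 0:
                  break

          with open(f"fineweb_week{week:02d}.txt", "w", encoding="utf-8") as out:
              out.write("\n".join(docs))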

      • pmayrgundter an hour ago

        The very notable thing here is that the best method uses a Transformer, and no other entry does.
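
        For context, a sketch of why a strong sequence model doubles as a compressor: any predictor of P(next byte | context) can drive an arithmetic coder, and the coded size approaches the sum of -log2 P bits. nncp's predictor is a Transformer trained on the fly; the toy adaptive order-1 model in this Python sketch only stands in to show the accounting:

            import math
            from collections import defaultdict

            def ideal_bits(data: bytes) -> float:
                counts = defaultdict(lambda: [0] * 256)  # per-context byte counts
                prev, bits = 0, 0.0
                for b in data:
                    ctx = counts[prev]
                    p = (ctx[b] + 1) / (sum(ctx) + 256)  # Laplace-smoothed P(b | prev byte)
                    bits += -math.log2(p)                # what an arithmetic coder would pay
                    ctx[b] += 1                          # update after coding, exactly as
                    prev = b                             # the decoder would
                return bits

            text = b"the transformer predicts the next byte; " * 8
            print(f"{len(text) * 8} raw bits -> ~{ideal_bits(text):.0f} ideal coded bits")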