I find it a little annoying that the paper[0] shows various graphs of megabytes saved, but, as far as I can tell, no actual sizes for the binaries these policies are applied to.
So when they say the inlining policies end up saving 20 MiB on the training data, and then only a few megabytes on a different binary not in the training data, I lack the context to really judge what that says. Is the other binary much smaller? The same size? What if it's bigger, so the same absolute number hides a much smaller relative saving?
Only at the very end of the paper do they mention one binary size: they save about 3 MB on the Chrome on Android binary, which is 213.32 MB after applying the policy. A solid 1%, which probably makes an enormous difference at Google scale, especially for their main Android browser, so I hope it's obvious that I'm not trying to diminish the achievement of these people. But I find the other benchmarks kind of hard to interpret.
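For what it's worth, here is the back-of-the-envelope math with the only two numbers given (treating 213.32 MB as the post-policy size, as stated), plus what the same 3 MB would mean on some hypothetical base sizes:

```python
# Back-of-the-envelope: relative saving implied by the two numbers in the paper
# (3 MB saved, 213.32 MB Chrome-on-Android binary *after* applying the policy).
saved = 3.0
after = 213.32
print(f"{saved / (after + saved):.2%}")  # ~1.39%

# The same absolute 3 MB looks very different on other binary sizes,
# which is why MiB-saved charts are hard to judge without the base sizes.
for base in (50.0, 213.32, 1000.0):      # hypothetical pre-policy sizes in MB
    print(f"{base:7.2f} MB -> {saved / (base + saved):.2%}")
```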
They mention an overall improvement of 1%:

> After seven iterations of our algorithm we find a size reduction of approximately 1% compared to the evolutionary strategy baseline. See the paper for more detailed results.
Someone once said the most fruitful research in AI is making models scale to larger compute/data.
I think the same could become true for compilers, and I think equality saturation is the key: AI + equality saturation could scale the optimization of a single program out to an entire data center's worth of compute.
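To make the idea concrete, here is a deliberately naive Python sketch of the equality-saturation loop: apply rewrite rules non-destructively until no new equivalent forms appear, then extract the cheapest form under a cost model. The rules and cost model here are made up for illustration; real implementations (e.g. egg-style e-graphs) share subterms instead of enumerating whole expression trees, which is what makes saturation tractable.

```python
# Naive sketch of equality saturation (illustrative only; real implementations
# use e-graphs to share subterms instead of enumerating whole trees).
# Expressions are nested tuples: ("*", "x", 2) means x * 2.

RULES = [
    # x * 2  ->  x << 1
    lambda e: ("<<", e[1], 1) if isinstance(e, tuple) and e[0] == "*" and e[2] == 2 else None,
    # x * 1  ->  x
    lambda e: e[1] if isinstance(e, tuple) and e[0] == "*" and e[2] == 1 else None,
]

def rewrites(expr):
    """Yield every expression reachable by one rewrite anywhere in the tree."""
    for rule in RULES:
        out = rule(expr)
        if out is not None:
            yield out
    if isinstance(expr, tuple):
        for i, child in enumerate(expr):
            for new_child in rewrites(child):
                yield expr[:i] + (new_child,) + expr[i + 1:]

def saturate(expr, max_iters=100):
    """Grow the set of equivalent expressions until a fixed point (saturation)."""
    equiv = {expr}
    for _ in range(max_iters):
        new = {r for e in equiv for r in rewrites(e)} - equiv
        if not new:
            break
        equiv |= new
    return equiv

def cost(expr):
    """Toy cost model: multiplies are expensive, shifts and leaves are cheap."""
    if not isinstance(expr, tuple):
        return 1
    return {"*": 4, "<<": 1}.get(expr[0], 1) + sum(cost(c) for c in expr[1:])

if __name__ == "__main__":
    start = ("*", ("*", "x", 1), 2)        # (x * 1) * 2
    best = min(saturate(start), key=cost)  # -> ("<<", "x", 1), i.e. x << 1
    print(best)
```

Both the rewrite exploration and the cost model are natural places to plug in learned guidance, and the search parallelizes well, which is where the data-center angle comes in.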
I feel there is only so much you can squeeze out of a compiler.
IMHO we need to accept profile-guided compilation or dynamic JIT compilation to keep making progress on performance.
EDIT: of course there is still lots of low-hanging fruit in exploiting specific instruction sets or memory architectures.
Happy to collaborate towards that.