I know it's not the same idea, but I think it's worth mentioning the adjacent concept of 'neural CA':
https://google-research.github.io/self-organising-systems/di...
https://google-research.github.io/self-organising-systems/is...
I can see why Mordvintsev et al. are doing what they're doing, but to be honest I'm struggling to understand the point of using a neural net to 'emulate' CAs like OP seems to be doing (and as far as I can gather, only totalistic ones too?).
It sounds a bit like swatting a fly using an H-bomb tbh, but maybe someone who knows more about the project can share some of the underlying rationale?
I don't get it, does the prediction go backwards or forward along CA generations?
I think the biggest advantage NNs have over CAs is that most CAs only provide localized computation: a cell sees only its immediate neighborhood, so it can take a large number of iterations before information propagates to the right location in the 1d/2d/3d/etc. space. Contrast this with arbitrary NN topology, where instant global connectivity is possible between any elements.
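To illustrate the locality point (my own sketch, not from OP's project): in a radius-1 1D CA such as rule 110, flipping a single cell can only affect cells within distance t after t steps, a "light cone" of speed one cell per step.

```python
import numpy as np

def step(state, rule=110):
    # One step of an elementary (radius-1) 1D CA with wraparound boundaries.
    left = np.roll(state, 1)    # left neighbor of each cell
    right = np.roll(state, -1)  # right neighbor of each cell
    idx = 4 * left + 2 * state + right  # 3-bit neighborhood code, 0..7
    return (rule >> idx) & 1            # look up output bit in the rule number

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=201)
b = a.copy()
b[100] ^= 1  # flip one cell in the middle

for t in range(1, 6):
    a, b = step(a), step(b)
    diff = np.flatnonzero(a != b)
    # every differing cell lies within distance t of the flipped cell
    assert all(abs(i - 100) <= t for i in diff)
```

So after t steps the perturbation has reached at most 2t+1 cells, whereas a fully connected layer can mix any two positions in one step.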
CNNs are CA if you don't insert fully connected layers, actually.
Interesting idea to do it in a distributed way with people's help.