If you are interested in (2026-)internet-scale data engineering challenges (e.g. processing 10s to 100s of petabytes) and pre-training/mid-training/post-training scale challenges, please email me at d+data@krea.ai!
English version: https://github.com/datascale-ai/data_engineering_book/blob/m...
Oh thanks! I've switched the top URL to that now. Submitted URL was https://github.com/datascale-ai/data_engineering_book.
I hope xx123122 won't mind my mentioning that they emailed us about this post, which originally got caught in a spam filter. I invited them to post a comment giving the background to the project but they probably haven't seen my reply yet. Hopefully soon, given that the post struck a chord!
The figures in the different chapters are in English (which is not the case for the image in README_en.md).
Thanks
How is it possible that a Chinese publication gets to the top of HN?
Never mind.