• Scene_Cast2 4 hours ago

    I wonder how much of the improvement comes from each change. I'd also never heard of "Muon - Momentum Orthogonalized by Newton-Schulz" being used.

    EDIT: there's a bit more info on his Twitter: https://x.com/kellerjordan0

    It looks like he created this optimizer. It only works on 2D matrices.
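
    Going by the name, the idea is to take the momentum matrix and replace it with an approximately orthogonal matrix computed by Newton-Schulz iteration, which also explains the 2D restriction: the iteration is built from matrix products like X @ X.T. Below is a minimal NumPy sketch of that orthogonalization step only. It swaps in the textbook cubic Newton-Schulz iteration rather than whatever tuned variant Muon actually uses, and the function name `newton_schulz_orthogonalize` and the step count are hypothetical choices for illustration.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 30, eps: float = 1e-7) -> np.ndarray:
    """Approximate the orthogonal polar factor U @ V.T of a 2D matrix g = U S V.T.

    Hypothetical helper: textbook cubic Newton-Schulz, not Muon's exact iteration.
    """
    if g.ndim != 2:
        raise ValueError("Newton-Schulz orthogonalization needs a 2D matrix")
    # Scale so every singular value is at most 1 (the Frobenius norm bounds the spectral norm).
    x = g / (np.linalg.norm(g) + eps)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        # Work in the wide orientation so x @ x.T is the smaller Gram matrix.
        x = x.T
    for _ in range(steps):
        # Maps each singular value s to 1.5*s - 0.5*s**3, driving values in (0, 1] toward 1
        # while leaving the singular vectors unchanged.
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x
```

    Applied to, say, a 4x3 momentum buffer, the result has (nearly) orthonormal columns, so every direction in the update gets roughly the same magnitude. A 1D bias vector has no such structure, which is why it would be rejected here.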

    • molticrystal 2 hours ago

      Now it just needs a Zero To Hero episode with line-by-line commentary explaining why each choice was made over the alternatives.

      • whiplash451 4 hours ago

        Cool work. No license?

        • m3kw9 2 hours ago

          So it compresses info better.

          • pyinstallwoes an hour ago

            That is literally intelligence.

          • gavindean90 4 hours ago

            Seems like this is a modded NanoGPT, not the original.

            • munchler 4 hours ago

              Yes. It’s literally called “Modded-NanoGPT”.