• Scene_Cast2 9 months ago

    I wonder how much of the improvement is owed to which changes. I've also never heard of "Muon - Momentum Orthogonalized by Newton-Schulz" being used before.

    EDIT: there's a bit more info on his twitter - https://x.com/kellerjordan0

    It looks like he created this optimizer. It works on 2D matrices only.
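
    EDIT 2: if it helps, here's my rough reading of what "momentum orthogonalized by Newton-Schulz" means for a single 2D weight matrix. This is an illustrative sketch, not the repo's code: the function names are mine, I've used the textbook cubic Newton-Schulz iteration, and the real implementation may use different polynomial coefficients and precision tricks.

        import torch

        def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
            # Approximate the nearest (semi-)orthogonal matrix to g, i.e. the
            # U @ V^T factor of its SVD, with the classic cubic Newton-Schulz
            # iteration. Normalizing by the Frobenius norm keeps the singular
            # values inside the iteration's convergence range.
            x = g / (g.norm() + 1e-7)
            transposed = x.size(0) > x.size(1)
            if transposed:              # keep the smaller dimension first
                x = x.T
            for _ in range(steps):
                x = 1.5 * x - 0.5 * x @ x.T @ x
            return x.T if transposed else x

        def muon_like_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
            # Hypothetical single-matrix update: ordinary momentum, then swap
            # the update direction for its orthogonalized version.
            momentum_buf.mul_(beta).add_(grad)
            update = newton_schulz_orthogonalize(momentum_buf)
            param.add_(update, alpha=-lr)

    That would also explain the 2D restriction: the orthogonalization is only defined for matrices, so embeddings, biases, etc. presumably go through a regular optimizer.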

    • molticrystal 9 months ago

      Just needs a Zero To Hero series episode offering line-by-line commentary on why each choice was made over the alternatives.

      • whiplash451 9 months ago

        Cool work. No license?

        • byyoung3 9 months ago

          Do you have a baseline of the regular implementation with a 3x learning rate?

          • m3kw9 9 months ago

            So it compresses info better.

            • pyinstallwoes 9 months ago

              That is literally intelligence.

              • parineum 9 months ago

                It's not.

                • nikonyrh 9 months ago

                  I suppose you aren't a fan of the Hutter Prize, then: https://en.wikipedia.org/wiki/Hutter_Prize

                  > The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers believe that text compression and AI are equivalent problems.
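
                  To make the equivalence concrete, here's a toy sketch of my own (the function name and numbers are made up for illustration, nothing from the prize itself): a predictive model plus an ideal arithmetic coder is a compressor, and the model's log-loss on a text is the number of bits that text compresses to.

                      import math

                      # An ideal arithmetic coder spends -log2(p) bits on a symbol the model
                      # assigned probability p, so summing that over a text gives its
                      # compressed size under the model. Better prediction = better compression.
                      def compressed_bits(probs_of_actual_symbols):
                          return sum(-math.log2(p) for p in probs_of_actual_symbols)

                      weak = [0.25, 0.30, 0.20, 0.25]    # ~2 bits per symbol
                      strong = [0.90, 0.85, 0.95, 0.80]  # ~0.2 bits per symbol
                      print(compressed_bits(weak), compressed_bits(strong))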

                  • parineum 9 months ago

                    I believe that they believe it, and that it _could_ be true. That's far from declaratively stating that the two are the same thing, as if there were some sort of evidence and consensus behind such a claim.

            • gavindean90 9 months ago

              Seems like this is a modded NanoGPT, not the original.

              • munchler 9 months ago

                Yes. It’s literally called “Modded-NanoGPT”.