Comments Page - Modded-NanoGPT: NanoGPT (124M) quality in 3.25B tokens

Scene_Cast2 a year ago
I wonder how much of the improvement is owed to which changes. I've also never heard of "Muon - Momentum Orthogonalized by Newton-Schulz" being used.
EDIT: there's a bit more info on his Twitter - https://x.com/kellerjordan0
It looks like he created this optimizer. Works on 2D matrices only.
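For readers who have not seen a Newton-Schulz orthogonalization before, here is a minimal sketch of the idea in plain Python. It uses the textbook cubic iteration, which is an assumption made for illustration and may differ from the variant and coefficients the repository actually uses. The names `matmul`, `transpose`, and `newton_schulz_orthogonalize` are hypothetical helpers. The shape check also suggests why such a step would apply to 2D matrices only: the iteration is built from matrix products.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Push the singular values of a 2D matrix g toward 1.

    Textbook cubic iteration X <- 1.5*X - 0.5*(X X^T) X. This is an
    illustrative assumption, not necessarily the repository's variant.
    """
    if not g or not all(
        isinstance(row, list) and len(row) == len(g[0]) for row in g
    ):
        raise ValueError("expected a non-empty 2D matrix (list of equal-length rows)")

    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    norm = math.sqrt(sum(v * v for row in g for v in row))
    if norm == 0.0:
        raise ValueError("cannot orthogonalize an all-zero matrix")
    x = [[v / norm for v in row] for row in g]

    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * a - 0.5 * b for a, b in zip(row_x, row_c)]
            for row_x, row_c in zip(x, xxt_x)
        ]
    return x
```

For a full-rank input with at least as many columns as rows, such as `[[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]`, the result `q` should satisfy `matmul(q, transpose(q))` being close to the identity. In an optimizer, a step like this would presumably be applied to the momentum-averaged gradient of each weight matrix, but that wiring is not shown here.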
molticrystal a year ago
Just needs a Zero To Hero series episode offering line-by-line commentary to follow along with, explaining why each choice was made over the alternatives.
whiplash451 a year ago
Cool work. No license?
byyoung3 a year ago
Do you have a baseline of the regular implementation with 3x learning rate?
m3kw9 a year ago
So it compresses info better.
- pyinstallwoes a year ago
  That is literally intelligence.
  parineum a year ago
  It's not.
  nikonyrh a year ago
  I suppose you aren't a fan of the https://en.wikipedia.org/wiki/Hutter_Prize .
  > The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers believe that text compression and AI are equivalent problems.
  parineum a year ago
  I believe that they believe that and that it _could_ be true. That's far from declaratively stating that they are the same thing, as if there were some sort of evidence for, and consensus on, such a claim.
gavindean90 a year ago
Seems like this is a modded NanoGPT, not the original.
- munchler a year ago
  Yes. It’s literally called “Modded-NanoGPT”.