• exq a day ago

    So it's okay when big American corps raid the internet ignoring any terms of service or licenses they see in order to train models they rent back to us, but when a foreign entity trains off of Anthropic it's illegal?

    • riku_iki a day ago

      From the tweet, Anthropic's point is that distillation is OK, unless the new model has its safeguards removed or is used for military or surveillance purposes.

      • dmonitor a day ago

        The fact that they're calling it an "attack" implies otherwise.

        I find the entire premise of this announcement absurd. Fraudulent accounts? They're just accounts. They paid for the access the same as any other. They're accessing Claude just like a human (or *claw) would.

        There's no argument against their strategy that doesn't make them complete hypocrites with respect to how they got the model training data in the first place.

        • mongrelion 17 hours ago

          I agree with you, especially with this:

          > They paid for the access the same as any other.

          If anything, this makes them more legit than Anthropic because they are paying for the content, whereas Anthropic just stole *all* the data they got a hold of. So, in this case the Chinese AI labs stand on higher moral ground LOL.

          • riku_iki a day ago

            > them complete hypocrites with respect to how they got the model training data in the first place.

            Sure, hypocrisy is part of the rules of big games: politics and business.

            > Fraudulent accounts? They're just accounts.

            They tell the story in the blog post: they don't allow Claude in China, but those labs used proxy services to access Claude and mixed their traffic in with regular users' to hide the activity.

          • _aavaa_ a day ago

            I don’t think so. It reads much more like “distillation is okay when you do it to your own models.”

        • credit_guy a day ago

          I don't think this counts as distillation. Distillation is when you use a teacher model to train a student model, but, crucially, you have access to the entire probability distribution over the generated tokens, not just to the tokens themselves. That probability distribution tremendously increases the strength of the training signal, so training converges much faster. Claude does not provide these probabilities. So Claude was used for synthetic training data generation, but not really for distillation.
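
          For the curious, roughly what the soft-target loss looks like (a minimal PyTorch sketch; the temperature and weighting here are illustrative, not anything the Claude API exposes):

              import torch.nn.functional as F

              def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
                  # Soft targets: KL divergence against the teacher's full
                  # (temperature-softened) distribution -- the strong signal.
                  soft = F.kl_div(
                      F.log_softmax(student_logits / T, dim=-1),
                      F.softmax(teacher_logits / T, dim=-1),
                      reduction="batchmean",
                  ) * (T * T)
                  # Hard targets: plain cross-entropy on the sampled tokens --
                  # all you can build when an API returns text only.
                  hard = F.cross_entropy(student_logits, hard_labels)
                  return alpha * soft + (1 - alpha) * hard

          Without teacher_logits you only ever have the second term, i.e. ordinary supervised fine-tuning on synthetic data.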

          • hooloovoo_zoo a day ago

            Sampling repeatedly gives them an estimate of the probability distribution in any case though.

            • hooloovoo_zoo a day ago

              That would be an interesting paper, actually: what is the optimal sampling technique given that you only have access to the token outputs? Surely someone has already done it.
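
              The naive baseline is just Monte Carlo: resample the next token many times and count (sketch below; sample_next_token stands in for whatever temperature-1 completion call you have, purely hypothetical):

                  from collections import Counter

                  def estimate_next_token_dist(sample_next_token, prompt, n=10_000):
                      # Draw n independent next-token samples at temperature 1
                      # and turn the frequencies into an empirical distribution.
                      counts = Counter(sample_next_token(prompt) for _ in range(n))
                      return {tok: c / n for tok, c in counts.items()}

              The catch is variance: with a ~100k-token vocabulary you need an enormous n before the tail probabilities mean anything, which is presumably what an "optimal sampling" paper would attack.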

          • m4rtink a day ago

            Oh no! They are stealing all the data we have stolen ourselves! This needs to be stopped and punished immediately!

            • veunes 13 hours ago

              If just 16 million examples were enough to significantly boost model quality (as Anthropic claims), then data quality really does beat quantity.

              Instead of vacuuming petabytes of trash from Common Crawl, you can just take high-quality distillate from a SOTA model and get comparable results. Bad news for anyone betting solely on massive compute clusters and closed datasets.

              • kingstnap a day ago

                Cry me a river, build a bridge, and get over it?

                They publish weights and useful research for everyone to benefit.

                I mean this is incredibly tone deaf for a company facing multiple lawsuits over where they got their training data from.

                • ChrisArchitect a day ago

                  • SilverElfin a day ago

                    One difference between Anthropic and others is that Anthropic is crawling publicly visible information, and their argument is that this is fair use. Whereas these Chinese LLMs are circumventing an account creation process and terms of service to misuse non-public information.

                    Lots of people think Anthropic training their own LLM is the same but it really isn’t.

                    • saberience a day ago

                      Pot, meet kettle!

                      I don’t think I’m the only one feeling some schadenfreude at this news. I suppose it’s ok when you’re a hot Silicon Valley scale-up to slurp up the rest of the world’s data for free and then hire hotshot lawyers to defend you against all the creatives you ripped off, but when it’s the “evil” Chinese doing the same to you it’s a dastardly “attack”?

                      • m4rtink a day ago

                        Yeah - not only have we seen some of the same large companies that trampled regular people and made examples of them in the name of defending copyright fully ignore it when it came time to feed their AI models.

                        And now the hypocrisy has come full circle, with complaints that others aren't respecting their rights!