Comments Page - DeepSeek R2 launch stalled as CEO balks at progress

« Back DeepSeek R2 launch stalled as CEO balks at progressreuters.comSubmitted by nsoonhui 19 days ago

teruakohatu 18 days ago
The title of the article is "DeepSeek R2 launch stalled as CEO balks at progress" but the body of the article says launch stalled because there is a lack of GPU capacity due to export restrictions, not because a lack of progress. The body does not even mention the word "progress".
I can't imagine demand would be greater for R2 than for R1 unless it was a major leap ahead. Maybe R2 is going to be a larger/less performant/more expensive model?
Deepseek could deploy in a US or EU datacenter ... but that would be admitting defeat.
- Thorrez 18 days ago
  The article says this:
  >June 26 (Reuters) - Chinese AI startup DeepSeek has not yet determined the timing of the release of its R2 model as CEO Liang Wenfeng is not satisfied with its performance,
  >Over the past several months, DeepSeek's engineers have been working to refine R2 until Liang gives the green light for release, according to The Information.
  But yes, it is strange how the majority of the article is about lack of GPUs.
  chvid 18 days ago
  I am pretty sure that the information has no access to / sources at Deepseek. At most they are basing their article on selective random internet chatter amongst those who follow Chinese ai.
  roenxi 18 days ago
  Presumably there is a CEO statement somewhere. If DeepSeek said May, but it is almost July, that would call for some comment from them.
  Although I'd like to know the source for the "this is because of chip sanctions" angle. SMIC is claiming they can manufacture at 5nm and a large number of chips at 7nm can get get the same amount of compute of anything Nvidia produces. It wouldn't be market-leading competitive but delaying the release for a few months doesn't change that. I don't really see how DeepSeek production release dates and the chip sanctions could be linked in the small. Unless they're just including that as an aside.
  archon1410 18 days ago
  > If DeepSeek said May
  It is pretty strange that DeepSeek didn't say May anywhere, that was also a Reuters report based on "three people familiar with the company".[1] DeepSeek itself did not respond and did not make any claims about the timeline, ever.
  [1]: https://www.reuters.com/technology/artificial-intelligence/d...
  rvnx 18 days ago
  How it is written it could be 3 anonymous and random guys from Reddit who heard about DeepSeek online.
  tonfa 18 days ago
  The phrasing for quoting sources is extremely codified, it means the journalists have verified who the sources are (either insider or people with access with insider information).
  MichaelZuo 18 days ago
  How does this matter?
  If the journalists aren’t fully trusted in the first place… trusting them to strictly adhere to even the best codified rules seems even less likely.
  tonfa 16 days ago
  Sure, if you don't trust anything what's the point. There's a lot of information that relies on anonymous sources and we usually use third party to vet them (otherwise how would they stay anonymous). Without this system we'd be missing out on a lot of things (if only named sources are used, a lot of things would never come out).
  (A lot of things break down in society without trust, maybe that's already how the US is? Where I live it is thankfully still somewhat ok)
  rvnx 16 days ago
  https://www.ndtv.com/world-news/donald-trumps-big-warning-to...
  The Washington Post, The New York Times, The New Republic, The Intercept, Rolling Stone, CBS News, CNN, Newsweek, USA Today, NBC News, Der Spiegel (Germany), The Sunday Times (UK), Daily Mail (UK), Al Jazeera (Qatar), RT (Russia), Xinhua (China), Press TV (Iran), Haaretz (Israel), Le Monde (France), El País (Spain) all have been caught using fake anonymous sources.
  MichaelZuo 15 days ago
  Do you not understand what “fully trust” means?
  No one I’ve ever heard of on HN fully trusts journalists.
  FooBarWidget 18 days ago
  Welcome to most China news. Many "well-documented" China "facts" are in fact cases like this: the media taking rumors or straight up fabricating things for clicks, and then self-referencing (or different media referencing each other in a circle) to put up the guise of reliable news.
  This is why we need to be critical of journalists nowadays. No longer are they the Fourth Column, protecting society and democracy by providing accurate information.
  chrisweekly 18 days ago
  Not just "China news", unfortunately.
  ethbr1 18 days ago
  The second (be critical of journalism as a field's accuracy) doesn't follow from the first (there are bad journalists).
  Especially since the alternative is to live in a world without facts.
  Which some people would probably love, but I prefer my reality to be constructed from objectivity rather than authority.
  FooBarWidget 18 days ago
  That sounds to me like you are excusing a bad reality based on a nonexistant ideal. Saying "there are bad journalists" is a huge understatement. There are many, perhaps even the majority. Ask yourself why society at large has stopped trusting mainstream media, it's not just because there are a "few" bad apples but because the bad apples are widespread and systemic.
  The tendency to compare to a nonexistant ideal is also something I find very very weird. This tendency does not exist for many other concepts. For example when people talk about communism, and someone say "hey $COUNTRY is just one bad apple, it doesn't mean real communism is bad" then others are quick to respond with "but all countries doing communism have devolved into tyranny/dictatorship/etc, so real communism doesn't exist and what we've seen is the real deal". I am not criticizing that (common) point of view, but people ought to take responsibility and apply this principle equally to all concepts, including "journalism".
  It also doesn't follow that my critique of journalists/journalism means tearing down journalism altogether. It can also mean:
  - that people need to stop trusting mainstream journalists blindly on topics they're not adept in. Right now many people have stopped trusting mainstream journalists only for topics they're adept in, but as soon as those journalists write nonsense about something else (e.g. $ENEMY_STATE) then they swallow that uncritically. No. The response should be "they lied about X, what else are they lying about?" instead of letting themselves be manipulated in other areas.
  - that society as a whole needs to hold journalism accountable, and demand that they return to the role of the Fourth Column.
  ethbr1 18 days ago
  > Ask yourself why society at large has stopped trusting mainstream media
  Because certain political interests take the existence of a fact-based, independent power center as a threat to their own power?
  And so engineered a multi-decade campaign to indoctrinate people against the news/media, thus removing a roadblock to imposing their own often contrary-to-fact narratives?
  Pretending this happened in a vacuum or was grassroots ignores mountains of money deployed with specific intent over spans of time.
  > It can also mean that society as a whole needs to hold journalism accountable, and demand that they return to the role of the Fourth Column.
  I absolutely agree with this.
  If I had my druthers, the US would reinstate the fairness doctrine (abolished in 1987) and specifically the components requiring large media corporations to subsidize non-profit newsrooms as a public good.
  The US would be a better place if we banned 24/7 for-profit news.
  Davidzheng 18 days ago
  Actually I think one of the researchers at Deepseek did say on Twitter but I think that tweet has since been deleted.
  maxglute 18 days ago
  >Reporting by Deborah Sophia in Bengaluru; Editing by Arun Koyyur
  Kek. Reminder after Sino India drama, India has basically 0 accredited journalist in China. The chances of Indian journalist "citing two people with knowledge of the situation" in Deepseek in Bengalurur before it's spreads over PRC rumor mill is vanishingly small.
  rfoo 18 days ago
  Yes. And those random Internet chatter almost certainly doesn't know what they are talking about at all.
  First, nobody is training on H20s, it's absurd. Then their logic was, because of high inference demand of DeepSeek models there are high demand of H20 chips, and H20s were banned so better not release new model weights now, otherwise people would want H20s harder.
  Which is... even more absurd. The reasoning itself doesn't make any sense. And the technical part is just wrong, too. Using H20 to serve DeepSeek V3 / R1 is just SUPER inefficient. Like, R1 is the most anti-H20 model released ever.
  The entire thing makes no sense at all and it's a pity that Reuters fall for that bullshit.
  reliabilityguy 18 days ago
  > Using H20 to serve DeepSeek V3 / R1 is just SUPER inefficient. Like, R1 is the most anti-H20 model released ever.
  Why? Any chance you have some links to read about why it’s the case?
  terafo 18 days ago
  MLA uses way more flops in order to conserve memory bandwidth, H20 has plenty of memory bandwidth and almost no flops. MLA makes sense on H100/H800, but on H20 GQA-based models are a way better option.
  pama 18 days ago
  Not sure what you are referring to—do you have a pointer to a technical writeup perhaps? In training and inference MLA has way less flops than MHA, which is the gold standard, and way better accuracy (model performance) than GQA (see comparisons in the DeepSeek papers or try deepseek models vs llama for long context.)
  More generally, with any hardware architecture you use, you can optimize the throughput for your main goal (initially training; later inference) by balancing other parameters of the architecture. Even if training is suboptimal, if you want to make a global impact with a public model, you aim for the next NVidia inference hardware.
  cma 18 days ago
  Didn't deep-seek figure out how to train with mixed precision and so get much more out of the cards, with a lot of the training steps able to run at what was traditionally post training quantization type precisions (block compressed).
  reliabilityguy 18 days ago
  MLA as in multi-head latent attention?
  terafo 18 days ago
  Yes
  reliabilityguy 18 days ago
  Ah, gotcha. Thank you
  selfselfgo 18 days ago
  I also find it strange. Do you think that it could be for political reasons?
  Thorrez 17 days ago
  No idea. I guess that's one possibility. Another idea is journalists grasping at straws to try to find something to expand an article.
- sschueller 18 days ago
  > lack of GPU capacity due to export restrictions
  Human progress that benefits everyone being stalled by the few and powerful who want to keep their moats. Sad world we live in.
  crazygringo 18 days ago
  It's not about people wanting to keep it in moats.
  It's about China being expansionist, actively preparing to invade Taiwan, and generally becoming an increasing military threat that does not respect the national integrity of other states.
  The US is fine with other countries having AI if the countries "play nice" with others. Nobody is limiting GPU's in France or Thailand.
  This is very specific to China's behavior and stated goals.
  mistrial9 18 days ago
  or also there is an interpretation of State actions that says that a government serves itself first and foremost. Unfortunately this includes governments you like and also governments you do not like. Before rising to invective here, also consider that this is not new at all, since early Kingdoms were ruled by Kings and abuse of power was more common than not, so lots of present day politics around the entire world, do have alternates to simple depostic will.
  So instead of splitting hairs about that description, lets highlight an idea that actually, millions of people doing millions of things per day consitutes its own system, despite what name you call it or who collects the taxes. Observing the actual behavior of that system ("data driven"?) has more benefits than hairsplitting of nomenclature for political studies.
  Why bother writing this? because simplistic labels for government actions in international affairs is Step 2 of "brain-off" us versus them thinking.
  Let's find ways to remove fuels from the fires of war. The stakes are too high. Third call to start thinking instead of invective here. Negotiation and trade are the tools. Name calling on those that work for "peace" is Step 2 again. IMHO
  FirmwareBurner 18 days ago
  >It's about China being expansionist, actively preparing to invade Taiwan, and generally becoming an increasing military threat that does not respect the national integrity of other states.
  Remove the word Taiwan and you are describing the US.
  >It's about China being expansionist
  US has been doing that since their inception as a country. Are you telling me the USs 750 foreign military bases located in at least 80 foreign countries and territories is NOT expansionism? Come on!
  >actively preparing to invade Taiwan
  The US illegally invaded Iraq and Afghanistan for 20 years killing and torturing innocents in the process and leaving the Taliban in power to further cause harm. Wow many countries did China invade? Yet somehow China is the boogieman? Please!
  > generally becoming an increasing military threat that does not respect the national integrity of other states.
  Same with the US, Trump threatened to annex Greenland and Canada, yet I don't see sanctions on the US.
  I don't see the US having any ground to stand on criticizing China.
  GoatInGrey 18 days ago
  But the US doesn't imprison me for criticizing the government, or being Muslim.
  FirmwareBurner 17 days ago
  Then you're clueless about or ignoring Guantanamo Bay. The ignorance and hypocrisy here is astonishing.
  properpopper 18 days ago
  Nice media-led narrative.
  The real reason is that the US cannot compete fairly
  IncreasePosts 18 days ago
  Is rampant IP theft and corporate espionage considered fair in your book?
  mjmsmith 18 days ago
  Whose rampant IP theft are you referring to?
  sschueller 18 days ago
  Hollywood from Edison...
  > Thomas Edison's aggressive patent enforcement in the early days of filmmaking, particularly his control over motion picture technology, played a significant role in the development of Hollywood as the center of the film industry. Driven by a desire to control the market and eliminate competition, Edison's lawsuits and business practices pushed independent filmmakers westward, ultimately leading them to establish studios in Los Angeles, away from Edison's legal reach.
  IncreasePosts 17 days ago
  China is the pretty obvious answer in this context.
  azemetre 17 days ago
  Odd. I’d think it’d be all the companies in the US ignoring IP and copyright laws.
  IncreasePosts 17 days ago
  Examples please.
  azemetre 17 days ago
  This source provides on-going case numbers for court litigation:
  https://www.bakerlaw.com/services/artificial-intelligence-ai...
  Seems like it would be a definitive list to me as it shows US AI companies getting sued for copyright infringement.
  IncreasePosts 17 days ago
  So you're capable of knowing what the judges will decide on these cases? You've already decided that they are liable for what they're accused of?
  And, isn't this the system working exactly how it is supposed to? Someone makes a claim and the courts decide, and then some kind of punishment will be doled out of the claim was found to be true?
  azemetre 16 days ago
  You're changing the the topic of conversation, you asked for cases and now you want judgements as well.
  IncreasePosts 16 days ago
  No, I'm not. You made the claim " I’d think it’d be all the companies in the US ignoring IP and copyright laws" - and then point to ongoing cases, where the outcome hasn't even been decided yet. They may be ignoring IP and copyright laws, but no one knows whether they are or aren't yet.
  azemetre 15 days ago
  I'm sure the parties suing the defendants feel very differently than you.
  SR2Z 18 days ago
  What exactly does fairly look like to you? Uyghurs in camps, suicide nets strung around factories?
  Like, even if you just want to talk about protectionism, China is way worse than the US pre-Trump. "Fairly" has nothing to do with foreign policy.
  sschueller 18 days ago
  I recall kids in cages back before trump was around. The US doesn't exactly have a clean track record when it comes to human rights and international law yet they are quick to point the finger at anyone else when they cross the line.
  [1] https://www.aclu.org/news/smart-justice/president-obama-want...
  SR2Z 13 days ago
  There is a vast distance between putting unaccompanied minors in holding cells and what Trump is doing.
  No nation is perfect, but the US has historically been better than many others.
  pfraze 18 days ago
  What are you talking about
  jaggs 18 days ago
  >> It's about China being expansionist
  Citations? Apart from usual Western government propaganda outlets perhaps?
  apples_oranges 18 days ago
  They are aiming for world domination by buying themselves into businesses all over the planet and by building up a very large army. But that’s just normal human behavior I guess.
  The problem is rather that if the only moral compass is the communist party it will suck
  ethbr1 18 days ago
  It's a bit rich to attribute goals of world domination to buying into businesses all over the world and building a very large army.
  By those metrics, the rest of the world should have been terrified by the US for the last 60 years...
  Those are necessary precursors to aggressive expansionism, but insufficient without political will.
  ericd 18 days ago
  Context matters, who is leading that machinery? They seem to have the political will to invade.
  ethbr1 18 days ago
  Taiwan is a tricky case. The CCP isn't unjustified in making a claim to it. Granted: that claim is contrary to international norms, law, and the population's self determination.
  But if China were only threatening to invade Taiwan it would be a gray area.
  Imho, their claims in the South China Sea are much more obviously expansionist, given the settled cases against them under international law.
  Much easier to see those boiling over into China invading a few populated islands of the Philippines.
  sschueller 18 days ago
  Interesting, I don't recall Iran having to flying bombers for 36 hours to bomb a US military base...
  jaggs 18 days ago
  Okay so thanks very much. That's not really a citation that's an opinion?
  To translate what you're saying. The Chinese are trying to establish the same kind of global trade collaboration that Europe and the US have done for the past hundred and x years? But the Chinese civilization is over 2000 years old, and they had a much larger global trade network back when the west was a pile of wooden shacks and feudal barbarism?
  They're also building up a large army in in the same way that the US and Europe have with NATO? I'm also not really sure what's wrong with the moral compass of the Chinese communist party? From what I can see at the moment it is authoritative, but not necessarily venal?
  It seems that the Chinese people themselves are enjoying a pretty good standard of living and quality of life? I've only been there two or three times, but I never saw the same kind of deprivation in China that I saw behind the Iron Curtain for instance.
  ethbr1 18 days ago
  > I'm also not really sure what's wrong with the moral compass of the Chinese communist party? From what I can see at the moment it is authoritative, but not necessarily venal?
  It's certainly corrupt. Xi didn't launch major, disruptive anti-corruption drives for no reason, but because he saw it as an existential threat to the CCP's legitimacy (after all, it did torpedo the Soviet Union).
  Granted, an alternate rationale was internecine power struggles within the party and removing political enemies, but there was some real corruption.
  The strongman argument against the CCP's moral compass is that it has no concept of or respect for individual rights: the party is above all.
  Historically, this has always ended tragically because eventually it will be abused to either justify suffering or party gain at the expense of people.
  Authoritarianism only works until someone bad grabs the reigns, and single-party non-democratic systems have a way of rewarding sociopaths.
  jaggs 18 days ago
  I think stones in glass houses comes to mind right now? :)
  ethbr1 18 days ago
  The fact that the US still has functioning separation of powers is counter evidence.
  People may gripe about fuzzy areas being stepped on and norms pushed (and they should gripe!), but there's a huge chasm between separation of powers in democracies and China.
  jaggs 18 days ago
  Calling in the marine guard without congress approval seems a little bit un-separate, but I'm not an expert so I'm not going to continue this conversation. You have an opinion and I have my very inexpert one too.
  ethbr1 18 days ago
  The Marines were rebased within the bounds of Constitutional and legal powers, as was the National Guard federalization and deployment.
  Not agreeing with a thing doesn't make it illegal.
  If Congress wants to prohibit Presidents from pushing these areas, then they're free to do so. (And expect they will once the clock tocks)
- Davidzheng 18 days ago
  but deepseek doesn't actually need to host inference right if they opensource it? I don't see why these companies even bother to host inference. deepseek doesn't need outreach (everyone knows about them) and the huge demand for sota will force western companies to host them anyway.
  teruakohatu 18 days ago
  Releasing the model has paid off handsomely with name recognition and making a significant geopolitical and cultural statement.
  But will they keep releasing the weights or do an OpenAI and come up with a reason they can't release them anymore?
  At the end of the day, even if they release the weights, they probably want to make money and leverage the brand by hosting the model API and the consumer mobile app.
  ngruhn 18 days ago
  If they continue to release the weights + detailed reports what they did, I seriously don't understand why. I mean it's cool. I just don't understand why. It's such a cut throat environment where every little bit of moat counts. I don't think they're naive. I think I'm naive.
  senko 18 days ago
  If you’re not appearing, you’re disappearing.
  Now they are firmly on the map, which presumably helps with hiring, doing deals, influence. If they stop publishing something, they run the risk of being labelled a one-hit wonder who got lucky.
  If they have a reason to believe they can do even better in the near future, releasing current tech might make sense.
  ngruhn 17 days ago
  I think those are valid points but it's hard for me to see that this is worth it. With the might of the CCP in the back and the giant labor pool that is China, surely they can make hiring work either way. If they now start offering a model that's cheaper and better then anyone else's, surely anyone will take notice, even if the weights are not open.
  bionhoward 18 days ago
  If moving faster is a most, then open source AI could move faster than closed AI by not needing to be paranoid about privacy and welcoming external contributions
  ngruhn 17 days ago
  But if an open model ever pulls ahead then the closed vendors can immediately piggyback on that.
  Davidzheng 18 days ago
  I don't think any of these companies are aiming at long term goal of making money from inference pricing of customers.
  diggan 18 days ago
  > I don't think any of these companies are aiming at long term goal of making money from inference pricing of customers.
  What is DeepSeek aiming for if not that, which is currently the only thing they offer that cost money? They claim their own inference endpoints has a cost profit margin of 545%, which might be true or not, but the very fact that they mentioned this at all seems to indicate it is of some importance to them and others.
  Davidzheng 18 days ago
  Well it's certainly helpful in the interim that they can recoup some money from inference. I'm just saying that with systems with more intelligence in the future can be used to make money in much better ways than charging customers for interacting with it. For instance it could conduct research on projects which can generate massive revenue if successful.
  coderatlarge 18 days ago
  maybe they benefit from the usage data they collect?
- torginus 18 days ago
  I am a bit sceptical about whether this whole thing is true at all. This article links to another, which happens to be behind a paywall, saying 'GPU export sanctions are working' is a message a lot of US administration, people and investors want to hear, so I think there's a good chance that unsubstantiated speculation and wishful thinking is presented as fact here.
- slowmovintarget 17 days ago
  > ...but that would be admitting defeat.
  Given that DeepSeek is used by the Chinese military, I doubt that it would be a reasonable move for them to host in the U.S., because the capability is about more than profit.
- undefined 18 days ago
  [deleted]
- impossiblefork 18 days ago
  The lack of GPU capacity sounds like bullshit though, and it's unsourced. It's not like you can't offer it as a secondary thing, sort of like O-3 or even just turning on the reasoning.
  coderatlarge 18 days ago
  maybe they’re just waiting to see if they can run on chinese sourced silicon? just speculating
  impossiblefork 17 days ago
  I think my real problem with it is that how slow it is is easy to predict beforehand. If it doesn't meet the goals set for it speed-wise, they could go for something smaller.
  It should only be quality which could be unpredictable before training.
  coderatlarge 17 days ago
  maybe they need to allocate that hardware to other uses and are waiting for alternatives?
  coderatlarge 13 days ago
  to wit:
  https://news.ycombinator.com/item?id=44441089
wizee 18 days ago
They just recently released the r1-0528 model which was a massive upgrade over the original R1 and is roughly on par with the current best proprietary western models. Let them take their time on R2.
- A_D_E_P_T 18 days ago
  At this point the only models I use are o3/o3-pro and R1-0528. The OpenAI model is better at handling data and drawing inferences, whereas the DeepSeek model is better at handling text as a thing in itself -- i.e. for all writing and editing tasks.
  With this combo, I have no reason to use Claude/Gemini for anything.
  People don't realize how good the new Deepseek model is.
  energy123 18 days ago
  My experience with R1-0528 for python code generation was awful. But I was using a context length of 100k tokens, so that might be why. It scores decently in the lmarena code leaderboard, where context length is short.
  diggan 18 days ago
  Would love to see the system/user prompts involved, if possible.
  Personally I get it to write the same code I'd produce, which obviously I think is OK code, but seems other's experience differs a lot from my own so curious to understand why. I've iterated a lot on my system prompt so could be as easy as that.
  Workaccount2 18 days ago
  The biggest reason I use Gemini is because it can still get stuff done at 100k context. The other models start wearing out at 30k and are done by 50k.
  diggan 18 days ago
  The biggest reason I avoid Gemini (and all of Google's models I've tried) is because I cannot get them to produce the same code I'd produce myself, while with OpenAI's models it's fairly trivial.
  There is something deeper in the model that seemingly can be steered/programmed with the system/user prompts and it still produces kind of shitty code for some reason. Or I just haven't found the right way of prompting Google's stuff, could also be the reason, but seemingly the same approach works for OpenAI, Anthropic and others, not sure what to make of it.
  brokegrammer 18 days ago
  I'm having the same issue with Gemini as soon as the context length exceeds 50k-ish. At that point, it starts to blurp out random code of terrible quality, even with clear instructions. It would often mix up various APIs. I spend a lot of time instructing it about not writing such code, with plenty of fewshot examples, but it doesn't seem to work. It's like it gets "confused".
  The large context length is a huge advantage, but it doesn't seem to be able to use it effectively. Would you say that OpenAI models don't suffer from this problem?
  JKCalhoun 18 days ago
  New to me: is more context worse? Is there an ideal context length that maps to a bell curve or something?
  diggan 18 days ago
  > New to me: is more context worse?
  Yes, definitely. For every model I've used and/or tested, the more context there is, the worse the output, even within the context limits.
  When I use chat UIs (which admittedly is less and less), I never let the chat go beyond one of my messages and one response from the LLM. If something is wrong with the response, I figure out what I need to change with my prompt and start new chat/edit the first message and retry, until it works. Any time I've tried to "No, what I meant was ..." or "Great, now change ..." the responses drop sharply in quality.
  tazjin 18 days ago
  Do you use the DeepSeek hosted R1, or a custom one?
  The published model has a note strongly recommending that you should not use system prompts at all, and that all instructions should be sent as user messages, so I'm just curious about whether you use system prompts and what your experience with them is.
  Maybe the hosted service rewrites them into user ones transparently ...
  diggan 18 days ago
  > Do you use the DeepSeek hosted R1, or a custom one?
  Mainly the hosted one.
  > The published model has a note strongly recommending that you should not use system prompts at all
  I think that's outdated, the new release (deepseek-ai/DeepSeek-R1-0528) has the following in the README:
  > Compared to previous versions of DeepSeek-R1, the usage recommendations for DeepSeek-R1-0528 have the following changes: System prompt is supported now.
  The previous ones, while they said to put everything in user prompts, still seemed steerable/programmable via the system prompt regardless, but maybe it wasn't as effective as it is for other models.
  But yeah outside of that, heavy use of system (and obviously user) prompts.
- Art9681 17 days ago
  A lemon is on-par with the best western models for the majority of use cases because they do not require "state of the art" intelligence to solve or respond to the user's query. This is what the benchmarks show.
  For anything that requires "AI level of intelligence", the difference is vast.
Aeolun 18 days ago
So Nvidia stock is going to crash hard when the Chinese inevitably produce their own competitive chip. Though I’m baffled by the fact they don’t just license and pump out billions of AMD chips. Nvidia is ahead, but not that far ahead.
My consumer AMD card (7900 XTX) outperforms the 15x more expensive Nvidia server chip (L40S) that I was using.
- Papazsazsa 18 days ago
  I don't know why this isn't the crux of our current geopolitical spat.
  Surely it would be cheaper and easier for the CCP to develop their own chipmaking capacity than going to war in the Taiwan strait?
  WJW 18 days ago
  China doesn't want Taiwan for the chip making plants, but because they consider its existence to be an ongoing armed rebellion against the "rightful" rulers. Getting the fabs intact would be nice, but it's not the main objective.
  The USA doesn't want to lose Taiwan because of the chip making plants, and a little bit because it is beneficial to surround their geopolitical enemies with a giant ring of allies.
  tw1984 18 days ago
  > China doesn't want Taiwan for the chip making plants, but because they consider its existence to be an ongoing armed rebellion against the "rightful" rulers.
  that is what the CCP tells you and its own people.
  the truth is taiwan is just the symbol of US presence in western pacific. getting taiwan back means the permanent withdrawal of US influence in the western pacific region and the offical end of US global dominance.
  CCP doesn't care the island of taiwan, they care about their historical positioning.
  WJW 18 days ago
  I think that is basically what I said already? What is ensuring historical positioning if not the righting of (perceived) old wrongs?
  In any case it's clear that it is not the fabs that China cares about when it is talking about (re)conquering Taiwan.
  sundache 18 days ago
  A problem they face in building their own capacity is that ASML isn't allowed to export their newest machines to China. The US has even pressured them to stop servicing some machines already in China. They've been working on getting their own ASML competitor for decades, but so far unsuccessfully.
  tw1984 18 days ago
  > A problem they face in building their own capacity is that ASML isn't allowed to export their newest machines to China.
  building their own capacity means building everything in China, that is the entire semiconductor ecosystem. just look at the mobile phones and EVs built by Chinese companies.
  Aeolun 18 days ago
  This is just a question of time. They can afford to wait, since the US is currently in the process of destroying itself.
  If I were China I’d be more worried about the other up and coming world power in India.
  phatfish 18 days ago
  Indeed, once western governments finally get the message about Indian migrants they are going to have to go somewhere else.
  Davidzheng 18 days ago
  US will intervene militarily to stop China from taking control of TSMC if Taiwan isn't pressured by US to destroy the plants themselves, so I don't think taking Taiwan is a viable path to leading in silica only lowering US ability but given the current gap in GPUs it's not clear how helpful this is to China. So all in all I don't think China views taking Taiwan as beneficial in the AI race at all.
  tw1984 18 days ago
  > US will intervene militarily
  with a reality tv show dude being the commander in chief and a news reporter being the defense secretary.
  life is tough in america, man.
  lossolo 18 days ago
  > US will intervene militarily to stop China from taking control of TSMC if Taiwan isn't pressured by US to destroy the plants themselves
  That's not certain. Most war games show that the U.S. would lose the war (also read official reports done for congress). You can't win against the world's leading producer of goods by trying to attack them from the sea when they can reach you with rockets from their own territory.
  Davidzheng 17 days ago
  I didn't say the goal would be to stop China from taking Taiwan, only TSMC. I don't think it's impossible US launches its own attacks on TSMC fabs if TSMC doesn't destroy their production line voluntarily.
  missedthecue 18 days ago
  China has wanted reunification since before the invention of the transistor.
  hopelite 18 days ago
  China will not go to war in or over Taiwan sort of the USA doing its common narcissistic psychopathic thing of instigating, orchestrating, and agitating for aggression. It seems though that some parts of the world have started understanding how to defuse and counter the narcissistic, psychopathic abusive cabal that controls the USA and is constantly agitating for war, destruction, and world domination due to some schizophrenic messianic self-fulfilling prophecies.
  kstrauser 18 days ago
  [flagged]
  hopelite 17 days ago
  You should understand what “whataboutism” is before you make accusations of it.
  TulliusCicero 18 days ago
  Yeah, and Putin will definitely never invade Ukraine! Any suggestion to the contrary is just blatant American warmongering.
  hopelite 17 days ago
  They are not the same situation. I won’t list all the reasons, but in broad strokes; Taiwan is a Pacific Ocean away from the warmongering USA (in American, btw, and a life long member of the war machine), Taiwan is not culturally inclined as Ukraine, Taiwan is not led by an alien group, China has time on their side, China has proximity on their side, China’s has dependency on its side, China has civilizational momentum on its side, China can see what has been done to the USA by a parasitic group from within. I could make a list that is several pages long, but it all amounts to the fact that at this point China really only has to be patient and offer some carrots, which Taiwan can already clearly see itself.
  America is a society and component of a civilization that has never really understood itself or its place in the world and history. We peaked in the industrial warfare of WWII, and then bumbled our way through trying to rely on our past self-involved achievement.
  TulliusCicero 16 days ago
  Whoosh
- Art9681 17 days ago
  "The hardware is the easiest part." - AMD probably
KronisLV 18 days ago
I wonder how different things would be if the CPU and GPU supply chain was more distributed globally: if we were at a point where we'd have models (edit: of hardware, my bad on the wording) developed and produced in the EU, as well as other parts of the world.
Maybe then we wouldn't be beholden to Nvidia's whims (sour spot in regards to buying their cards and the costs of those, vs what Intel is trying to do with their Pro cards but inevitably worse software support, as well as import costs), or those of a particular government. I wonder if we'll ever live in such a world.
- diggan 18 days ago
  > if we were at a point where we'd have models developed and produced in the EU, as well as other parts of the world.
  But we have models developing and being produced outside of the US already, both in Asia but also Europe. Sure, it would be cool to see more from South America and Africa, but the playing field is not just in the US anymore, particularly when it comes to open weights (which seems more of a "world benefit" than closed APIs), then the US is lagging far behind.
  ignoramous 18 days ago
  > when it comes to open weights (which seems more of a "world benefit" than closed APIs), then the US is lagging far behind.
  Llama (v4 notwithstanding) and Gemma (particularly v3) aren't my idea of lagging far behind...
  diggan 18 days ago
  > Llama (v4 notwithstanding) and Gemma (particularly v3) aren't my idea of lagging far behind...
  While neat and of course Llama kicked off a large part of the ecosystem, so credit where credit is due, both of those suffer from "open-but-not-quite" as they have large documents of "Acceptable Use" which outlines what you can and cannot do with the weights, while the Chinese counter-parts slap a FOSS-compatible license on the weights and calls it a day.
  We could argue if that's the best approach, or even legal considering the (probable) origin of their training data, but the end result remains the same, Chinese companies are doing FOSS releases and American companies are doing something more similar to BSL/hybrid-open releases.
  It should tell you something when the legal department of one of these companies calls the model+weights "proprietary" while their marketing department continues to calling the same model+weights "open source". I know who I trust of those two to be more accurate.
  I guess that's why I see American companies as being further behind, even though they do release something.
  cesarb 18 days ago
  > both of those suffer from "open-but-not-quite" as they have large documents of "Acceptable Use" which outlines what you can and cannot do with the weights
  Even worse, the "Acceptable Use" document is a separate web page, which can be updated at any time. Nothing prevents it from, for instance, being updated to say "company X is no longer allowed to use these weights".
  The "FOSS-compatible" licenses for these Chinese and European models are self-contained and won't suddenly change under your feet. They also have no "field of use" restrictions and, by virtue of actually being traditional FOSS licenses being applied to slightly unusual artifacts (they were originally meant for source code, not huge blobs of numeric data), are already well-known and therefore have a lower risk of unusual gotchas.
lossolo 18 days ago
May 2025, Shenzhen, China
HGX 8x Nvidia H100 cluster for sale.
https://imgur.com/a/r6tBkN3
You can buy whatever you want. Export controls are basically fiction. Trying to stop global trade is like trying to stop a river with your bare hands.
numair 18 days ago
> The Information reported on Thursday, citing two people with knowledge of the situation.
I miss the old days of journalism, when they might feel inclined to let the reader know that their source for the indirect source is almost entirely funded by the fortune generated by a man who worked slavishly to become a close friend of the boss of one of DeepSeek’s main competitors (Meta).
Feel bad for anyone who gets their news from The Information and doesn’t have this key bit of context.
- gwern 18 days ago
  I don't think it's well known that TI has FB CoIs. I didn't know myself until fairly recently. You can talk to a lot of people about this sort of stuff without anyone pointing it out.
  numair 18 days ago
  Absolutely. I think the FB CoI within American venture capital, where they have fully infested the LP and GP ranks of most funds, is a much bigger and more important story. It really helped me understand that we need to work really hard to keep the rest of the world’s capital markets free and open — a major focus for me these days.
  You never know which stories The Information won’t run, or which “negative” articles are actually deflections. Similarly, you never know which amazing startups remain shut out of funding, and a lot of entrepreneurs have no idea about the amount of back channel collusion goes on in creating the funding rounds and “overnight successes” they’re told to idolize.
  A random dude on HN such as me shouldn’t be the source of this knowledge. Hope someone takes up the cause, but we live in a time of astounding cowardice.
- Voloskaya 18 days ago
  I missed the old days of HN commenters when they might feel inclined to let the reader know who they are talking about without having to solve a 6 steps enigma.
sigmoid10 18 days ago
https://archive.is/byKrB
spaceman_2020 18 days ago
Honestly, AI progress suffers because of these export restrictions. An open source model that can compete with Gemini Pro 2.5 and o3 is good for the world, and good for AI
- energy123 18 days ago
  Your views on this question are going to differ a lot depending on the probability you assign to a conflict with China in the next five years. I feel like that number should be offered up for scrutiny before a discussion on the cost vs benefits of export controls even starts.
  spaceman_2020 18 days ago
  I'm not American. Ever since I've been old enough to understand the world, the only country constantly at war everywhere is America. An all-powerful American AI is scarier to me than an open source Chinese one
  throwaway290 18 days ago
  As Russian I only recently started to understand that russian government was at wars for a lot of its existence from USSR times: https://en.wikipedia.org/wiki/List_of_wars_involving_Russia#.... Many invasions and wars in places Russia should have no business in. Most of them not publicized in the country. Unlike US it was not spreading liberal values of individual freedom and against violent dictatorships, actually maybe the other way around
  Thlom 18 days ago
  The US is not at perpetual war to spread "liberal values".
  throwaway290 18 days ago
  I didn't say it is always the goal but if one country prevails over another country somewhere then usually it means first country's values propagate
  undefined 18 days ago
  [deleted]
  rescbr 18 days ago
  > against violent dictatorships
  Then look up Latin America’s history, where the US actively worked to install and support such violent dictatorships.
  Some under the guise of protecting countries from the threat of communism - like Brazil, Argentina and Chile, and some explicitly to protect US company’s interests - like in Guatemala
  throwaway290 18 days ago
  > Chile
  Yes fuckups happened. But then for results Russian intervention see CCP and how many people died from their hands and policies
  CamperBob2 18 days ago
  Well, that's different. The Russians meant well, you see.
  CamperBob2 18 days ago
  Well, at least protecting Latin American countries from the threat of communism was a nice thing for us to do, wasn't it? Communism would certainly have done more harm than we did.
  The lesson of present-day America is that democracy is too important to be left to the people.
  rescbr 17 days ago
  Ehhh… don’t forget that I wrote “under the guise”.
  Lots of US companies got a lot of money out of those US-supported dictatorships, while destroying local businesses and torturing and killing people. Those were also the era of closed-off economies, hyperinflation and environmental destruction, so what the local people got out of it?
  So yeah, thanks for protecting us from the dictatorship of the proletariat and fucking up our economies for decades. And I’m not also defending USSR and their imperialistic practices disguised as making the people as equal as possible - fuck them as well!
  There’s an old book called “Confessions of an Economic Hitman” that gets into some details of how the US supported those dictatorships under Project Condor and other CIA programs. Is it 100% truthful? Maybe not, but the gist of it is.
  CamperBob2 17 days ago
  Yeah, it's always someone else's fault, isn't it.
  The proles have let us both down. All my life, I was led to believe that a "dictatorship of the proletariat" would involve a bunch of morons wearing red hats, casting one last vote against their own interests to tear down the established order. So at least that turned out to be technically correct.
  jaggs 18 days ago
  Mmmm...
  vbezhenar 18 days ago
  Communism is good. By "protecting" them from communism, you doomed them to the horrors of capitalism.
  CamperBob2 18 days ago
  You don't have to hold a gun to someone's head to get them to practice capitalism. People will trade labor, goods and services voluntarily unless you go out of your way to stop them.
  And spare us the false equivalence bullshit that we all know is coming.
  andrekandre 17 days ago
  > trade labor, goods and services voluntarily
  small nitpick, but trading != capitalism
  capitalism is using capital (money, materials, and employees/work) as inputs to produce finished products with the goal of re-investing those profits into said production or into other markets
  simply trading or rendering services can be done without the need for constant growth/profits or investment as capital over time (e.g coops, traditional businesses etc)
  throwaway290 17 days ago
  That's not the definition of capitalism I see in my dictionary.
  CamperBob2 17 days ago
  It's just a case of the Redditification of HN. "Anything I don't like is capitalism. Either that, or the CIA is to blame. ... Wait, why not both?"
  throwaway290 17 days ago
  > You don't have to hold a gun to someone's head to get them to practice capitalism. People will trade labor, goods and services voluntarily unless you go out of your way to stop them.
  This is an important point. The only way communism "works" is top down enforcement.
  undefined 18 days ago
  [deleted]
  diggan 18 days ago
  > probability you assign to a conflict with China in the next five years. I feel like that number should be offered up for scrutiny before a discussion
  Might as well talk about the probability of a conflict with South Africa, China might not be the best country to live in nor be country that takes care of its own citizens the best, but they seem non-violent towards other sovereign nations (so far), although of course there is a lot of posturing. But from the current "world powers", they seem to be the least violent.
  energy123 18 days ago
  What is the security competition between South Africa and the US that would justify such an analogy?
  China is peaceful recently, at least since their invasion of Vietnam. But (1) their post-Deng culture is highly militaristic and irredentist, (2) this is the first time in history that they actually can rollback US influence, their previous inability explains the peace rather than lack of will (3) Taiwan from a realist perspective makes too much sense, as the first in the island chain to wedge between Philippines and Japan, and its role in supplying chips to the US.
  The lesson we should learn from Russia's invasion of Ukraine is to believe countries when they say they own another country. Not assume the best and design policy around that assumption.
  If you want to read some experts on this question, see this: https://warontherocks.com/?s=taiwan
  The general consensus seems to be around a 20-25% chance of an invasion of Taiwan within the next 5 years. The remaining debate isn't about whether they want to do it, it's about whether they'll be able to do it and what their calculation will be around those relative capabilities.
  layer8 18 days ago
  Are you saying that large-model capabilities would make a substantial difference in a military conflict within the next five years? Because we aren’t seeing any signs of that in, say, the Ukraine war.
  randomname93857 18 days ago
  We do see the signs and reports, you just have to look. LLMs are being adopted to warfare, with drones or otherwise, there is progress there, but it's not currently at the level of "substantial difference". And 5 years is huge time from progress perspective in this domain - just try to compare LLMs of today with LLMs of 2020.
  Davidzheng 18 days ago
  small scale drones are in use in that conflict. On device AI would be a game-changer no?
  layer8 18 days ago
  It’s not impossible, but also highly nontrivial. Apart from the actual AI implementation, power supply might be a challenge. And there is a multitude of anti-drone technology being continuously developed. Already today, an autonomous drone would have to deal with RF jamming and GPS jamming, which means it’s easily defeated unless it has the ability to navigate purely visually. Drones also tend to be limited to good weather conditions and daytime.
  energy123 18 days ago
  In terms of countermeasures, what's the difference between having a human drone pilot and having an AI (computer vision plus control) do it over cloud? I know I'm moving the goalposts away from edge compute, but if we are discussing the relevance of GPU compute for warfare it seems relevant.
  layer8 18 days ago
  Assuming human-level AI capabilities, not much of a difference, obviously. But I also don’t think that human operators are a bottleneck currently. Cost, failure rate, and technical limitations of drones is. If you are alluding to superhuman AI capabilities, that’s highly speculative as well with regard to what is needed for drone piloting, and also unclear how large the benefits of that would be in terms of actual operational success rate.
  energy123 18 days ago
  GPUs are used for signals intelligence today.
  tazjin 18 days ago
  > depending on the probability you assign to a conflict with China in the next five years
  And on who you would support in such a conflict! ;)
- tw1984 18 days ago
  > Honestly, AI progress suffers because of these export restrictions. An open source model that can compete with Gemini Pro 2.5 and o3 is good for the world, and good for AI
  DeepSeek is not a charity, they are the largest hedge fund in China, nothing different from a typical wall street funds. They don't spend billions to give the world something open and free just because it is good.
  When the model is capable of generating decent amount of revenues, or when there is conclusive evidence of showing being closed would lead to much higher profit, it will be closed.
- Papazsazsa 18 days ago
  "Then business will have to suffer."
- kstrauser 18 days ago
  DeepSeek refuses to acknowledge Tiananmen Square. I don’t want to use a model that’s known to heavily censor historical data. What else is it denying or lying about that’s going to affect how I use it?
  (In before “whatabout”: maybe US-made models do the same, but I’ve yet to hear reports of any anti-US information that they’re censoring.)
  subarctic 18 days ago
  I thought that censoring was done separately and doesn't occur if you run the model locally
  kstrauser 18 days ago
  Nope, it’s local too. I ran it in Ollama and it refused to tell me: https://honeypot.net/2025/01/27/i-like-running-ollama-on.htm...
rsanek 18 days ago
"wildly popular"? maybe there was alot of interest when it was released, but who even is still using R1 these days? i previously utilized it through perplexity but the o3/Gemini pro models are so much better i rarely bother to read its responses.
it's not even in the top ten based on OpenRouter https://openrouter.ai/rankings?view=month
- pama 18 days ago
  V3 is number 5 in your list. R1-0528(free) is number11 and R1(free) is number15. Openrouter separates the free (in the top 20 list you shared) vs paid (further down) instances of V3 and R1, and of course it doesnt count the direct connection to the providers, or the various self-hosted solutions (choice of companies working in sensitive areas, including many of my friends).
jekwoooooe 18 days ago
Ah well this time they can’t just illegally acquire a bunch of gpus and then just train a model from openai outputs. R1 was so overhyped
b0a04gl 18 days ago
no way this delay's about gpus lol. deepseek prob has r2 cooked already. r1‑0528 already pumped expectations too high. if r2 lands flat ppl start doubting.
or
who knows maybe they just chillin watching how west labs burn gpu money, let eval metas shift. then drop r2 when oai/claude trust graph dips a bit
- Art9681 17 days ago
  Distilling western SOTA models that now summarize their thought process is expensive in 2025.
qwertox 18 days ago
"We had difficulties accessing OpenAI, our data provider." /s
- jamesblonde 18 days ago
  Rumour was that DeepSeek used the outputs of the thinking steps in OpenAI's reasoning model (o1 at the time) to traing DeepSeek's Large Reasoning Model R1.
  orbital-decay 18 days ago
  More like a direct (and extremely dubious) accusation without proof from Altman. In reality those two models have as little in common as possible, and o1 reasoning chain wasn't available anyway.
  dachworker 18 days ago
  Maybe they also do that, but I work with a class of problems* that no other model has managed to crack, except for R1 and that is still the case today.
  Remember that DeepSeek is the offshoot of a hedge fund that was already using machine learning extensively, so they probably have troves of high quality datasets and source code repos to throw at it. Plus, they might have higher quality data for the Chinese side of the internet.
  * Of course I won't detail my class of problems else my benchmark would quickly stop being useful. I'll just say that it is a task at the undergraduate level of CS, that requires quite a bit of deductive reasoning.
  Art9681 17 days ago
  This is completely irrelevant without knowing if you are effectively prompting each model. Your workflow may just be suitable for a particular model and not others. And tuning a workflow for each model is tedious. I seriously doubt there is ANY class of problem DSR1 can solve that OAI's third tier model can't at this point (o4-mini).
  WiSaGaN 18 days ago
  Deepseek published thinking trace before OpenAI did, not after.
  msgodel 18 days ago
  I don't think so. They came up with a new RL algorithm that's just better.
  Art9681 17 days ago
  Better how? DeepSeek has never held the top spot in any aggregated benchmark. The Chinese bot armies are certainly better at convincing the internet they are trailing western models, despite the fact that in practical use this is not the case and at this rate, likely will never be. If AI progress is exponential, so is falling behind. What doesnt change however, is holding your ace card and dropping it when the time is right. China is competing with the public western models. Western SOTA labs are competing with their previous unreleased SOTA model. Publicly, China is ~3 months behind. But in reality, they are much further behind, and will never catch up.
  Be mindful of what this means. A kid in his garage fine tuning a model can "catch up" to SOTA models for most use cases. For actual "frontier" work that requires SOTA levels of intelligence, there are only 3 companies in the race. None of them are from China or Europe.
  tw1984 18 days ago
  OpenAI used literally all available text owned by the entire human race to train o1/o3.
  so what?
- astar1 18 days ago
  This, my guess is OpenAI wised up after r1 and put safeguards in place for o3 that it didn't have for o1, hence the delay.
  ozgune 18 days ago
  I think that's unlikely.
  DeepSeek-R1 0528 performs almost as well as o3 in AI quality benchmarks. So, either OpenAI didn't restrict access, DeepSeek wasn't using OpenAI's output, or using OpenAI's output doesn't have a material impact in DeepSeek's performance.
  https://artificialanalysis.ai/?models=gpt-4-1%2Co4-mini%2Co3...
  astar1 18 days ago
  almost as well as o3? kind of like gemini 2.5? I dug deeper and surprise surprise: https://techcrunch.com/2025/06/03/deepseek-may-have-used-goo...
  I am not at all surprised, the CCP views AI race as absolutely critical for their own survival...
  orbital-decay 18 days ago
  Not everything that's written is worth reading, let alone drawing conclusions from. That benchmark shows different trees each time the author runs it, which should tell you something about it. It also stacks grok-3-beta together with gpt-4.5-preview in the GPT family, making the former appear to be trained on the latter. This doesn't make sense if you check the release dates. And previously it classified gpt-4.5-preview to be in a completely different branch than 4o (which does make some sense but now it's different).
  EQBench, another "slop benchmark" from the same author, is equally dubious, as is most of his work, e.g. antislop sampler which is trying to solve an NLP task in a programmatic manner.
  Art9681 17 days ago
  The benchmarks are not reflective of real world use case. This is why OpenAI dominates B2B. As a business, its in your best interest to save money without sacrificing quality.
  "Follow the money."
  Businesses are pouring money into the OpenAI API. This is your biggest clue.
  undefined 18 days ago
  [deleted]
- imiric 18 days ago
  It would be hypocritical to criticize DeepSeek if this is true, since OpenAI and all major players in this space train their models on everything they can get their hands on, with zero legal or moral concerns. Pot, meet kettle.
- nsoonhui 18 days ago
  Not too sure why you are downvoted but OpenAI did announce that they are investigating on the Deepseek (mis)use of their outputs, and that they were tightening up the validation of those who use the API access, presumably to prevent the misuse.
  To me that does seem like a reasonable speculation, though unproven.
  xdennis 18 days ago
  I still find it amusing to call it "misuse". No AI company has ever asked for permission to train.
  viraptor 18 days ago
  Exactly because it's phrased like the poster knows this is the reason. I wouldn't downvote it if it was a clear speculation with the link to the OAI announcement you mentioned for bonus points.