The article is focused on Nvidia, but note that Apple [1][2] and Google [3] have also been working in this area and will undoubtedly continue to do so.
[1]: https://developer.apple.com/machine-learning/core-ml/
[2]: https://machinelearning.apple.com/research/neural-engine-tra...
[3]: https://research.google/blog/improved-on-device-ml-on-pixel-...
400B parameters in a "plug-and-play" form factor for $6k is wild.
Meanwhile all the usual "desktop" players are still trying to find a way to make good on their promises to develop their own competitive chips for AI inference and training workloads in the cloud.
I'm betting on Nvidia to continue to outperform them. The talent, culture and capabilities gap just feels insurmountable for the next decade at least barring major fumbles from Nvidia.
Even 200B for 3k USD is really good, 128 GB of memory for that price is surprising!
Right now I'm using a two RTX 2080 setup for a pilot project, running ollama and qwen2.5-coder (the quantized 14B version). A more serious step up from there on a budget would realistically be an RX 7900 XTX (24 GB, though I've had issues setting up ROCm) for 1k EUR or an RTX A5000 (24 GB) for 2.5k EUR in the current market.
Honestly, I don't even need the best performance for what I'm trying to do; even an Arc A770 (16 GB) for 350 EUR would be enough to iterate, except that it's not actually supported by ollama and lots of other tools out there: https://github.com/ollama/ollama/blob/main/docs/gpu.md (I know ollama isn't the only solution, but it sucks when the tools you like aren't available)
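For anyone curious what that kind of pilot looks like in practice, here's a minimal sketch of hitting a local ollama server over its HTTP API. It assumes ollama is running on the default port and that qwen2.5-coder:14b has already been pulled; swap in whatever model tag you actually use.

    # Minimal sketch: query a local ollama server (default port 11434).
    # Assumes `ollama pull qwen2.5-coder:14b` has already been run.
    import requests

    def ask_local_model(prompt: str, model: str = "qwen2.5-coder:14b") -> str:
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    print(ask_local_model("Write a Python function that reverses a string."))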
What are you doing that requires local over calling an API provider? If you're developing an AI app that makes lots of calls and is designed for a local GPU it makes some sense perhaps?
If performance isn't an issue, why not get a Mac mini?
A really good suggestion! I actually used a MacBook Air with an M1 in the first stages and it was fine for prototyping too!
Though the local market here is a bit bad. There's a Mac Mini with an M2 Pro and 16 GB for 1.9k EUR, so more expensive than just two GPUs. I'm guessing that the local sellers are trying to profit quite a bit, because on Apple's site, the M4 16 GB version starts at 600 USD (and for 24 GB it's 800 USD and for 32 GB it's 1000 USD). That actually makes it a good option!
Nvidia could just release variations of their "gaming" cards with more RAM. There is absolutely nothing stopping them from releasing 64 GB or 128 GB 5080s and 5090s.
But they don't, because that would cannibalize the extremely overpriced enterprise offerings. The #1 reason people are forced into cards costing tens of thousands of dollars is memory capacity.
So considering that, ask what niche this device really fills. Is it a new "supercomputer" for the home? Not really: it is so silicon- and memory-bandwidth-restricted that their $600 GPU can beat it soundly on every metric (not surprising when you look at the power and airflow/cooling needs of "real" GPUs) except in scenarios requiring large memory. And while it can host those larger models, it is going to be far removed from state of the art.
It's neat, but the market for this is being grossly overstated in a lot of these hype advertorials. The large models you'll run on it will be quantized to the point of absurdity, not to mention that for 99.9%+ of users, anything short of state of the art is basically useless.
It's a neat eGPU of sorts for a Mac or something (they really hype the fact that you use CUDA for this). Still really trying to figure out what value it possibly brings outside of trying to lure a bunch of enthusiasts to blow money on this so they can fiddle with Llama for a week and then realize it's a waste of time.
Would it really cannibalize the enterprise offering? The whole point of that NVIDIA Enterprise license is that their TOS forbids using anything else in datacenters.
Is there any info about how many tokens a second you would get with a 400b model? Without that it's like claiming a graphics card can output at 8k, but neglecting to state the frames per second. Suspect, in other words.
Outputs at 8k resolution! (+)
...at 5 fps (+)
I saw somewhere it has 0.5 TB/s of memory bandwidth. So you can read the whole memory (128 GB) about 4 times per second, and if you fill it all up with a model you get roughly 4 t/s, give or take.
For comparison, an H100 can read its memory about 40 times per second, so if you use it all you get around 40 t/s.
Of course, in either case you don't have to fill it up; you can instead use a smaller model or more GPUs.
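Back-of-the-envelope version of that math, in case you want to plug in your own numbers; the bandwidth figures below are just the rough ones quoted in this thread, not official specs.

    # Rough upper bound for a memory-bandwidth-bound LLM: each generated token
    # has to stream the whole model through memory once.
    def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
        return bandwidth_gb_s / model_size_gb

    # Figures quoted in the thread, not official specs:
    print(max_tokens_per_second(500, 128))   # ~3.9 t/s, the box filled to the brim
    print(max_tokens_per_second(3350, 80))   # ~42 t/s, H100 (~3.35 TB/s HBM, 80 GB)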
What special sauce do they have to run 400B models fast exactly?
From the article:
> Huang also revealed ‘Project Digits,’ a new product based on its Grace Blackwell AI-specific architecture that aims to offer at-home AI processing capable of running 200 billion-parameter models locally for a projected retail cost of around $3,000.
> There are many exciting things about Project Digits, including the fact that two can be paired to offer 405 billion-parameter model support for ‘just’ $6,000
My experience with running local LLMs is quite limited, but most tools can split the workload between GPUs (or more commonly GPU+CPU) with minimal fuss. It parallelizes fairly well. There may not be any actual secret sauce beyond just having the necessary gobs and gobs of fast memory to load the model into.
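To make the "minimal fuss" concrete, here's a rough sketch using llama-cpp-python, which exposes layer offloading and multi-GPU weight splitting as constructor arguments. The model path and split ratios are placeholders, not a recommendation.

    # Sketch: splitting a GGUF model across two GPUs with llama-cpp-python.
    # Requires a GPU-enabled build of llama.cpp; path and ratios are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/qwen2.5-coder-14b-q4_k_m.gguf",  # hypothetical path
        n_gpu_layers=-1,          # offload all layers to GPU
        tensor_split=[0.5, 0.5],  # share the weights evenly across two cards
        n_ctx=8192,
    )

    out = llm("Explain the difference between GDDR and HBM.", max_tokens=128)
    print(out["choices"][0]["text"])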
LPDDR5x.
Which is way slower than GDDR6X or GDDR7, let alone HBM. I don't expect these machines to be anywhere near as fast as the hype suggests.
256-bit LPDDR5X is impressive, don't get me wrong. But it's impressive for a CPU platform. It's actually pretty bad for a GPU.
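For a sense of scale, peak bandwidth is roughly bus width times per-pin data rate. The data rates below are ballpark figures for each memory type, not the spec of any particular product.

    # Peak memory bandwidth ~= (bus width in bits / 8) * per-pin data rate in GT/s.
    def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gt_s: float) -> float:
        return bus_width_bits / 8 * data_rate_gt_s

    # Ballpark numbers, not tied to any specific product:
    print(peak_bandwidth_gb_s(256, 8.5))   # ~272 GB/s, 256-bit LPDDR5X
    print(peak_bandwidth_gb_s(384, 21.0))  # ~1008 GB/s, 384-bit GDDR6X
    print(peak_bandwidth_gb_s(5120, 5.2))  # ~3.3 TB/s, 5120-bit HBM3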
Quantization and hype.
I don’t get the need. With gaming there’s a real benefit to having the card close to a display. There’s enough benefit that you don’t mind it being unused 20 hours a day. There’s relatively little benefit to having training happen a few feet away rather than a data center. Solid chance it sits unused most of the time, and when you really need it you run into capacity issues, so you’d need to predict your future needs carefully or be happy waiting for a job to finish.
AI training feels like transport. You rent the capacity/vehicle you need on demand, benefit from yearly upgrades. Very few people are doing so much training that they need a local powerhouse, upgraded every year or so.
Even sharing the hardware in a pool seems more rational. Pay 200/month for access to a semi private cluster rather than having it sit on your desk.
I'm not too familiar with the AI space, but I wonder if this is an effort from NVIDIA to combine their AI and gaming markets. Did this come out of a conversation along the lines of "How do we sell discrete cards through our existing manufacturing partnerships to both gaming enthusiasts and AI enthusiasts?" I do wonder how comfortable they are pivoting back to being a consumer hardware company if AI becomes a more competitive space, or if the "hype" subsides. Pure speculation and I'm probably off the mark.
Yeah quite possible. They have distribution, brand, customers used to paying a lot of money. Can go far just selling to the portion of existing customers who would like "the best local setup for AI".
Privacy. For machine learning to be locally usable in a meaningful way beyond a chat interface, I would have to give it access to pretty much all my digital data.
Sending that over the network feels very, idk, icky, because it's not just photos or emails.
I see the DIGITs box as mainly for inference, not training. It allows me to load a fairly large model (e.g. 70B llama or 12B flux) and run it locally at decent speeds.
Then surely far simpler custom chips are the eventual model, like happened with crypto? Groq, Etched etc. In that universe, Nvidia has absolutely no moat and a thousand chips are coming.
A video editor wants the tool sitting on their desk. Not pay-per-gen SaaS where the results are garbage nine out of ten times.
The ComfyUI ecosystem is full of people who want local tools.
6000 / 24 = 250 USD
If you replace your GPU every 2 years, that's 250 USD per month.
If the price halves, it's still 125 USD per month.
Even if the price halves and you use it for 5 years, it's 50 USD per month.
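Same arithmetic as a tiny helper, in case you want to compare scenarios against a monthly cloud bill; the prices and lifetimes are just the ones above, and power/resale are ignored.

    # Monthly cost of owning the box outright, ignoring power and resale value.
    def monthly_cost(price_usd: float, lifetime_months: int) -> float:
        return price_usd / lifetime_months

    print(monthly_cost(6000, 24))  # 250.0 - two units, replaced every 2 years
    print(monthly_cost(3000, 24))  # 125.0 - if the price halves
    print(monthly_cost(3000, 60))  # 50.0  - price halves and kept for 5 years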
Related: Nvidia's Project Digits is a 'personal AI supercomputer' (622 points, 8 days ago, 501 comments & jeans) https://news.ycombinator.com/item?id=42619139
What is Nvidia's track record with releasing/supporting its own Linux-based OS? Can I easily switch to a different OS?
"I'll never buy a SBC from Nvidia unless all the SW support is up-streamed to Linux kernel," top comment from prev discussion https://news.ycombinator.com/item?id=42623030
https://www.tomshardware.com/pc-components/cpus/nvidia-arm-s...
> Nvidia will be introducing two new chips, the N1X at the end of this year and the N1 in 2026. Nvidia is expected to ship 3 million N1X chips in Q4 this year and 13 million vanilla N1 units next year. Nvidia will be partnering with MediaTek to build these chips, with MediaTek receiving $2 billion in revenue.. Nvidia will show off its upcoming ARM-based SoCs in Computex in May.
Pretty unconvinced. When desktop gaming started, you didn't have low-latency, high-bandwidth, reliable internet. If you did, you probably wouldn't have people buying cards at all; GeForce Now would have been the whole market.
We're already at that stage now with AI / LLMs. This type of physical product will remain niche.
We do now, but Stadia failed to have a meaningful impact for a reason.
Obviously companies will try to migrate everything to the SaaS model because it's better for revenue and bug fixes, but it's not necessarily the best user experience.
I don't want games to stop working if I don't have wifi, and I don't want ML stuff to require internet, nor do I feel comfortable sending my entire digital data to a third party or having models updated because of some compliance policy.
But it's still in its infancy. Companies will do everything to push it over the network; one or two nice tech people will try to make it local/hybrid, and hopefully they succeed.
Not everyone has low-latency, high-bandwidth, reliable internet, and almost no one has it all the time. In addition, the privacy and security benefits of running things locally are a requirement for some.
The people I know that use local AI instead of remote AI like the privacy, response time, and not being charged per-query.
Still no word on where and how to buy these?
It’s just been announced. Shipping in July, last I heard, so it will be a few months before you can pre-order.
Is this just an ad for nvidia's new box or is the author actually making a point?
An ad masquerading as “insight” is nothing new in this space. Author is eating up the NVDA marketing and joining the cult.
And it's kind of a poor choice of words, since what they "did for gaming" recently is abandon what made them a powerhouse company to begin with for some new adventure of borderline scam chasing.
Even so, I expected something a little more insightful than a copy paste of nvidia's press release. And for some reason it is upvoted to the top of HN.
They did stuff to desktop AI?
This is just an ad.
From what I could gather in related communities, Project Digits will run 200B models very slowly, so there is no breakthrough there.
I’ll wait for the benchmarks. NVDA marketing is known to oversell.
Expensive and rare?
These things look pretty small; I wonder if someone will make a 2U rack tray to hold a few of them.
The track record of standalone Nvidia appliances is pretty poor. The Shield console and its portable version disappeared fairly quickly, and the Jetson dev board is a joke since software support is awful, so I'm not holding my breath for this one.
It will take more than jeans and leather jackets to sell those
The Shield TV came out in 2015, was last refreshed in 2019 (which you can still buy today), and the entire line is still getting updates. That's longer than e.g. the PS Vita.