• divbzero an hour ago

    The article is focused on Nvidia, but note that Apple [1][2] and Google [3] have also been working in this area and will undoubtedly continue to do so.

    [1]: https://developer.apple.com/machine-learning/core-ml/

    [2]: https://machinelearning.apple.com/research/neural-engine-tra...

    [3]: https://research.google/blog/improved-on-device-ml-on-pixel-...

    • airstrike 2 hours ago

      400B parameters in a "plug-and-play" form factor for $6k is wild.

      Meanwhile all the usual "desktop" players are still trying to find a way to make good on their promises to develop their own competitive chips for AI inference and training workloads in the cloud.

      I'm betting on Nvidia to continue to outperform them. The talent, culture and capabilities gap just feels insurmountable for the next decade at least barring major fumbles from Nvidia.

      • KronisLV 2 hours ago

        Even 200B for 3k USD is really good; 128 GB of memory for that price is surprising!

        Right now I'm using a two RTX 2080 setup for a pilot project, running ollama and qwen2.5-coder (the 14B quantized version). A more serious step up from there on a budget would realistically be an RX 7900 XTX (24 GB, though I've had issues setting up ROCm) for 1k EUR or an RTX A5000 (24 GB) for 2.5k EUR in the current market.

        Honestly, I don't even need the best performance for what I'm trying to do. Even an Arc A770 (16 GB) for 350 EUR would be enough to iterate, except that it's not actually supported by ollama and lots of other stuff out there: https://github.com/ollama/ollama/blob/main/docs/gpu.md (I know ollama isn't the only solution, but it sucks when the tools you like aren't available)
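
        If it helps anyone reproduce this kind of pilot, here's a minimal sketch of hitting a local ollama server over its REST API from Python. The model tag and prompt are just illustrative, and it assumes ollama is running on its default port 11434 with the model already pulled:

          import requests

          # Ask the local ollama server for a non-streaming completion.
          # Model tag and prompt are placeholders; any locally pulled model works.
          resp = requests.post(
              "http://localhost:11434/api/generate",
              json={
                  "model": "qwen2.5-coder:14b",
                  "prompt": "Write a Python function that reverses a string.",
                  "stream": False,
              },
          )
          print(resp.json()["response"])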

        • kristianp an hour ago

          What are you doing that requires local over calling an API provider? If you're developing an AI app that makes lots of calls and is designed for a local GPU, it makes some sense perhaps?

          • redundantly an hour ago

            If performance isn't an issue, why not get a Mac mini?

            • KronisLV an hour ago

              A really good suggestion, I actually used a MacBook Air with an M1 in the first stages and it was fine for prototyping too!

              Though the local market here is a bit bad. There's a Mac Mini with an M2 Pro and 16 GB for 1.9k EUR, so more expensive than just two GPUs. I'm guessing that the local sellers are trying to profit quite a bit, because on Apple's site, the M4 16 GB version starts at 600 USD (and for 24 GB it's 800 USD and for 32 GB it's 1000 USD). That actually makes it a good option!

          • llm_nerd 2 hours ago

            Nvidia could just release variations of their "gaming" cards with more RAM. There is absolutely nothing stopping them from releasing 64GB or 128GB 5080s and 5090s.

            But they don't, because that would cannibalize the extremely overpriced enterprise offerings. The #1 reason people are forced into the tens-of-thousands-of-dollars cards is memory capacity.

            So considering that, ask what niche this device really fills. Is it a new "supercomputer" for the home? Not really: it is so silicon- and memory-bandwidth-restricted that their $600 GPU can beat it soundly on every metric except scenarios requiring large memory (not surprising when you look at the power and airflow/cooling needs of "real" GPUs). And while it can host those larger models, it is going to be far removed from state of the art.

            It's neat, but the market for this is being grossly overstated in a lot of these hype advertorials. The large models you'll run on it will be quantized to the point of absurdity, not to mention that for 99.9%+ of users, anything short of state of the art is basically useless.

            It's a neat eGPU of sorts for a Mac or something (they really hype the fact that you use CUDA for this). I'm still trying to figure out what value it possibly brings outside of luring a bunch of enthusiasts to blow money on this so they can fiddle with Llama for a week and then realize it's a waste of time.

            • throwup238 an hour ago

              Would it really cannibalize the enterprise offering? The whole point of the NVIDIA Enterprise license is that their TOS forbids using anything else in datacenters.

            • esperent 2 hours ago

              Is there any info about how many tokens a second you would get with a 400b model? Without that it's like claiming a graphics card can output at 8k, but neglecting to state the frames per second. Suspect, in other words.

              Outputs at 8k resolution! (+)

              ...at 5 fps (+)

              • mmoskal an hour ago

                I saw somewhere it's 0.5 TB/s of memory bandwidth, so you can read the whole memory (128 GB) about 4 times per second. If you fill it up with a model, you get 4 t/s, give or take.

                For comparison, an H100 can read its memory about 40 times per second, so if you use all of it you can get around 40 t/s.

                Of course, in either case you don't have to fill it up; you can instead use a smaller model or more GPUs.
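
                A back-of-envelope version in Python, for plugging in your own numbers. It assumes each generated token streams the full set of weights once and ignores compute, KV cache and batching, so treat it as a rough upper bound; the 0.5 TB/s figure is the rumored number above, not a confirmed spec:

                  # tokens/s upper bound: memory bandwidth / bytes of weights read per token
                  def tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
                      return bandwidth_gb_s / weights_gb

                  print(tokens_per_second(500, 128))   # ~4 t/s if all 128 GB is weights at 0.5 TB/s
                  print(tokens_per_second(3350, 80))   # ~40 t/s for an H100 (~3.35 TB/s, 80 GB)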

              • ekianjo 2 hours ago

                What special sauce do they have to run 400B models fast exactly?

                • zeta0134 2 hours ago

                  From the article:

                  > Huang also revealed ‘Project Digits,’ a new product based on its Grace Blackwell AI-specific architecture that aims to offer at-home AI processing capable of running 200 billion-parameter models locally for a projected retail cost of around $3,000.

                  > There are many exciting things about Project Digits, including the fact that two can be paired to offer 405 billion-parameter model support for ‘just’ $6,000

                  My experience with running local LLMs is quite limited, but most tools can split the workload between GPUs (or more commonly GPU+CPU) with minimal fuss. It parallelizes fairly well. There may not be any actual secret sauce beyond just having the necessary gobs and gobs of fast memory to load the model into.
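
                  As a rough sizing sketch of why the memory itself is the main requirement: weight memory is roughly parameters times bits per parameter. The 4-bit quantization here is an assumption for illustration, and real runs also need headroom for the KV cache and activations:

                    # Approximate weight memory in GB: params (billions) * bits per param / 8
                    def weights_gb(params_billion: float, bits_per_param: float) -> float:
                        return params_billion * bits_per_param / 8

                    print(weights_gb(200, 4))   # ~100 GB: a 4-bit 200B model fits in one 128 GB box
                    print(weights_gb(405, 4))   # ~203 GB: a 4-bit 405B model needs two boxes paired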

                  • dragontamer 2 hours ago

                    LPDDR5X.

                    Which is way slower than GDDR6X or GDDR7, let alone HBM. I don't expect these machines to be anywhere near as fast as the hype.

                    256-bit LPDDR5X is impressive, don't get me wrong. But it's impressive for a CPU platform. It's actually pretty bad for a GPU.
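
                    To put rough numbers on that, peak bandwidth is bus width times transfer rate. The 8533 MT/s LPDDR5X speed below is a guess at a common speed grade, not a confirmed spec for this box; the RTX 4090 figures are just for comparison:

                      # Peak memory bandwidth in GB/s: bus width (bits) * transfer rate (MT/s) / 8 / 1000
                      def bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
                          return bus_width_bits * transfer_rate_mt_s / 8 / 1000

                      print(bandwidth_gb_s(256, 8533))    # ~273 GB/s for 256-bit LPDDR5X at 8533 MT/s (assumed)
                      print(bandwidth_gb_s(384, 21000))   # ~1008 GB/s for an RTX 4090's 384-bit GDDR6X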

                    • llm_trw 2 hours ago

                      Quantization and hype.

                  • richardw 2 hours ago

                    I don’t get the need. With gaming there’s a real benefit to having the card close to a display. There’s enough benefit that you don’t mind it being unused 20 hours a day. There’s relatively little benefit to having training happen a few feet away rather than in a data center. There’s a solid chance it sits unused most of the time, and when you really need it you run into capacity issues, so you’d need to predict your future needs carefully or be happy waiting for a job to finish.

                    AI training feels like transport. You rent the capacity/vehicle you need on demand, benefit from yearly upgrades. Very few people are doing so much training that they need a local powerhouse, upgraded every year or so.

                    Even sharing the hardware in a pool seems more rational. Pay 200/month for access to a semi private cluster rather than having it sit on your desk.

                    • ClemFandango22 2 hours ago

                      I'm not too familiar with the AI space, but I wonder if this is an effort from NVIDIA to combine their AI and gaming markets. Did this come from an internal question like, 'How do we sell discrete cards through our existing manufacturing partnerships to both gaming enthusiasts and AI enthusiasts?' I do wonder how comfortable they are pivoting back to being a consumer hardware company if AI becomes a more competitive space, or if the 'hype' subsides. Pure speculation and I'm probably off the mark.

                      • richardw an hour ago

                        Yeah quite possible. They have distribution, brand, customers used to paying a lot of money. Can go far just selling to the portion of existing customers who would like "the best local setup for AI".

                      • gunian 32 minutes ago

                        Privacy. For machine learning to be usable locally in a meaningful way, beyond just a chat interface, I would have to give it access to pretty much all my digital data.

                        Sending that over the network feels very, idk, icky, because it's not just photos or emails.

                        • kadushka 2 hours ago

                          I see the DIGITS box as mainly for inference, not training. It allows me to load a fairly large model (e.g. 70B llama or 12B flux) and run it locally at decent speeds.

                          • richardw an hour ago

                            Then surely far simpler custom chips are the eventual model, as happened with crypto? Groq, Etched, etc. In that universe, Nvidia has absolutely no moat and a thousand chips are coming.

                          • echelon 2 hours ago

                            A video editor wants the tool sitting on their desk. Not pay-per-gen SaaS where the results are garbage nine out of ten times.

                            The ComfyUI ecosystem is full of people who want local tools.

                            • returnInfinity 9 minutes ago

                              6000 / 24 = 250 USD

                              If you replace your GPU every 2 years, that's 250 USD per month.

                              If the price halves, it's still 125 USD per month.

                              Even if the price halves and you use it for 5 years, it's 50 USD per month.
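
                              Or as a tiny sketch you can tweak with your own assumptions about price and replacement cycle:

                                # Amortized hardware cost per month over its useful life
                                def monthly_cost(price_usd: float, lifetime_years: float) -> float:
                                    return price_usd / (lifetime_years * 12)

                                print(monthly_cost(6000, 2))   # 250 USD/month for the paired setup on a 2-year cycle
                                print(monthly_cost(3000, 2))   # 125 USD/month if the price halves
                                print(monthly_cost(3000, 5))   # 50 USD/month if the price halves and it lasts 5 years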

                          • gnabgib 2 hours ago

                            Related: Nvidia's Project Digits is a 'personal AI supercomputer' (622 points, 8 days ago, 501 comments & jeans) https://news.ycombinator.com/item?id=42619139

                            • aquietlife 2 hours ago

                              What is Nvidia's track record with releasing/supporting its own Linux-based OS? Can I easily switch to a different OS?

                            • walterbell 2 hours ago

                              https://www.tomshardware.com/pc-components/cpus/nvidia-arm-s...

                              > Nvidia will be introducing two new chips, the N1X at the end of this year and the N1 in 2026. Nvidia is expected to ship 3 million N1X chips in Q4 this year and 13 million vanilla N1 units next year. Nvidia will be partnering with MediaTek to build these chips, with MediaTek receiving $2 billion in revenue.. Nvidia will show off its upcoming ARM-based SoCs in Computex in May.

                              • rsanek an hour ago

                                Pretty unconvinced. When desktop gaming started, you didn't have low-latency, high-bandwidth, reliable internet. If you did, you probably wouldn't have people buying cards at all; instead GeForce Now would have been the whole market.

                                We're already at that stage now with AI / LLMs. This type of physical product will remain niche.

                                • gunian 24 minutes ago

                                  We do now, but Stadia failed to have a meaningful impact for a reason.

                                  Obviously companies will try to migrate everything to the SaaS model because it is better for revenue and bug fixes, but it's not necessarily the best user experience.

                                  I don't want games to stop working if I don't have wifi, and I don't want ML stuff to require internet, to have to feel comfortable sending my entire digital data to a third party, or to have models be updated because of some compliance policy.

                                  But it's still in its infancy. Companies will do everything to push it over the network; one or two nice tech people will try to make it local/hybrid, and hopefully they succeed.

                                  • divbzero an hour ago

                                    Not everyone has low-latency, high-bandwidth, reliable internet, and almost no one has it all the time. In addition, the privacy and security benefits of running things locally are a requirement for some.

                                    • hx8 an hour ago

                                      The people I know that use local AI instead of remote AI like the privacy, response time, and not being charged per-query.

                                    • prashp 2 hours ago

                                      Still no word on where and how to buy these?

                                      • jazzyjackson 2 hours ago

                                        It’s just been announced; shipping in July, last I heard, and it will be a few months before you can pre-order.

                                      • paxys 2 hours ago

                                        Is this just an ad for nvidia's new box or is the author actually making a point?

                                        • xyst 2 hours ago

                                          An ad masquerading as “insight” is nothing new in this space. Author is eating up the NVDA marketing and joining the cult.

                                          • beefnugs 30 minutes ago

                                            And kind of a poor choice of words, since what they "did for gaming" recently is abandon what made them a powerhouse company to begin with, in favor of some new adventure of borderline scam-chasing.

                                            • paxys 2 hours ago

                                              Even so, I expected something a little more insightful than a copy-paste of Nvidia's press release. And for some reason it is upvoted to the top of HN.

                                          • MetroWind 2 hours ago

                                            They did stuff to desktop AI?

                                            • enasterosophes 2 hours ago

                                              This is just an ad.

                                              • ekianjo 2 hours ago

                                                From what I could gather in related communities, Project Digits will run 200B models very slowly, so there is no breakthrough there.

                                                • xyst 2 hours ago

                                                  I’ll wait for the benchmarks. NVDA marketing is known to oversell.

                                                  • skywhopper 2 hours ago

                                                    Expensive and rare?

                                                    • ls612 2 hours ago

                                                      These things look pretty small; I wonder if someone will make a 2U rack tray to hold a few of them.

                                                      • ekianjo 2 hours ago

                                                        The track record of standalone Nvidia appliances is pretty poor. The Shield console and its portable version disappeared fairly quickly, and the Jetson dev board is for laughs since software support is awful, so I am not holding my breath for this one.

                                                        It will take more than jeans and leather jackets to sell those.

                                                        • murderfs 2 hours ago

                                                          The Shield TV came out in 2015, was last refreshed in 2019 (which you can still buy today), and the entire line is still getting updates. That's longer than e.g. the PS Vita.