• interroboink 4 hours ago

    Seems like a nice idea — instead of the stark CPU/GPU divide we have today, this would fit somewhere in the middle.

    Reminds me slightly of the Cell processor, with its dedicated SPUs for fast processing, orchestrated by a traditional CPU. But we all saw how successful that was (: And that had some pretty big backing.

    Overcoming the inertia of the current computing hardware landscape is such a huge task. Maybe they can find some niche(s).

    • winwang 22 minutes ago

      I'd believe more in a heterogeneous chip (e.g. MI300X, Apple M series, or even APUs) than in completely new chip tech.

    • wmf 9 minutes ago

      This is based on legitimate (although second-tier) academic research that appears to combine aspects of GPU-style SIMD/SIMT with Tera-style massive multithreading. (main paper appears to be https://www.utupub.fi/bitstream/handle/10024/164790/MPP-TPA-... )

      Historically, the chance of such research turning into a chip you can buy is zero.

      • Animats 3 hours ago

        Does anyone know what they mean by "wave synchronization"? That's supposedly their trick to prevent all those parallel CPUs from blocking waiting for data. Found a reference to something called that for transputers, from 1994.[1] May be something else.

        Historically, this has been a dead end. Most problems are hard to cut up into pieces for such machines. But now that there's much interest in neural nets, there's more potential for highly parallel computers. Neural net operations are very regular. The inner loop for backpropagation is about a page of code. This is a niche, but it seems to be a trillion dollar niche.
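        To give a sense of how regular that inner loop is, here's a minimal sketch in plain Python (no ML library; the toy linear model, data, and learning rate are my own illustrative choices, not anything from the article):

```python
# A toy backprop "inner loop": a single linear model y = w*x + b
# trained with squared-error loss and plain gradient descent.
# Every iteration is the same handful of multiply-accumulates,
# which is what makes this workload so amenable to parallel hardware.

def train_step(w, b, xs, ys, lr):
    dw = db = 0.0
    for x, y in zip(xs, ys):      # forward and backward passes, fused
        pred = w * x + b          # forward pass
        err = pred - y            # dL/dpred for loss 0.5*(pred - y)**2
        dw += err * x             # chain rule: dL/dw
        db += err                 # chain rule: dL/db
    n = len(xs)
    return w - lr * dw / n, b - lr * db / n

w, b = 0.0, 0.0
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # target function: y = 2x
for _ in range(2000):
    w, b = train_step(w, b, xs, ys, lr=0.05)
print(w, b)   # w converges toward 2, b toward 0
```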

        Neural net operations are so regular they belong on purpose-built hardware. Something even more specialized than a GPU. We're starting to see "AI chips" in that space. It's not clear that something highly parallel and more general purpose than a GPU has a market niche. What problem is it good for?

        [1] https://www.sciencedirect.com/science/article/abs/pii/014193...

        • bhouston 3 hours ago

          GPUs have wavefronts so I assume it is similar? Here is a page that explains it:

          https://gpuopen.com/learn/occupancy-explained/

          • narag 3 hours ago

            > We're starting to see "AI chips" in that space.

            "Positronic" came to my mind.

          • gnabgib 4 hours ago

            Discussion (28 points, 3 months ago, 32 comments) https://news.ycombinator.com/item?id=40650662

            • elromulous 2 hours ago

              Does anyone have any knowledge/understanding on how this is (or isn't?) fundamentally different from Intel's Xeon Phi?

              https://en.wikipedia.org/wiki/Xeon_Phi

              • throwawayffffas 3 hours ago

                  How is this different from an integrated GPU, other than that it presumably doesn't do graphics?

                • yeahwhatever10 4 hours ago

                  When will we get the “Mill” cpu?

                  • theLiminator 2 hours ago

                    I've been following that saga for a long time. Seems mostly like vapourware sadly.

                    • mshook 3 hours ago

                      At this point, probably never it seems...

                    • exabrial an hour ago

                      I’m still waiting for a clockless core… some day

                      • aidenn0 35 minutes ago

                        www.greenarraychips.com

                      • pier25 3 hours ago

                        I'm probably missing something, but why not use GPUs for parallel processing?

                        • nine_k 2 hours ago

                          GPUs work on massive amounts of data in parallel, but at every step they execute essentially the same operations, perhaps skipping or slightly varying some steps depending on the data seen by a particular processing unit. The processing units cannot execute independent streams of instructions.

                          GPUs of course have several parts that can work in parallel, but those are few, and each part consists of a large number of units that execute the same instruction stream simultaneously over a large chunk of data.
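                          A toy illustration of that lockstep model (ordinary Python simulating lanes, not real GPU code; the lane values and the abs() example are made up for illustration):

```python
# Toy simulation of SIMT lockstep execution: every "lane" follows the
# same instruction stream, and a data-dependent branch is handled by
# masking lanes on or off rather than letting lanes run independent code.

def simt_abs(lanes):
    # Scalar code would read: if x < 0: x = -x
    # In lockstep execution, the condition becomes a per-lane mask:
    mask = [x < 0 for x in lanes]
    # Every lane steps through the negation; the mask decides who keeps it.
    return [-x if m else x for x, m in zip(lanes, mask)]

print(simt_abs([3, -1, -4, 2]))   # -> [3, 1, 4, 2]
```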

                          • winwang 31 minutes ago

                            This is not true. Take the Nvidia 4090: 128 SMs, each with 4 SMSPs, so 4x128 = 512 SMSPs. That is the number of warps which can execute independently of each other. A warp, in contrast, is a 32-wide vector, i.e. 32 "same operations", with up to 512 different batches in parallel. So it's more like a 512-core, 1024-bit vector processor.

                            That said, I believe the typical number of warps needed to saturate an SM is around 6 rather than 4, so more like 768 concurrent 32-wide "different" operations to saturate compute. Of course, at that point you run into overhead problems and memory bandwidth issues, both of which are hard to navigate around -- the register file storing all the registers of each thread is extremely power-hungry (in fact, the most power-hungry part, I believe), for example.

                            A PPU with a smaller vector width (e.g. AVX512) would have proportionally more overhead (possibly more than linearly so in terms of the circuit design). And this is before getting into how most programs depend on latency-optimized RAM (rather than bandwidth-optimized GDDR/HBM).
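                            Spelling out the arithmetic above (4090 figures as given in this thread; treat the exact values as approximate and SKU-dependent):

```python
# Back-of-envelope numbers for the RTX 4090 as described above.

SMS = 128            # streaming multiprocessors
SMSP_PER_SM = 4      # sub-partitions (independent warp schedulers) per SM
WARP_WIDTH = 32      # lanes per warp ("same operation" width)

independent_warps = SMS * SMSP_PER_SM      # independent instruction streams
vector_bits = WARP_WIDTH * 32              # warp width in bits, fp32 lanes

# Saturation estimate: ~6 resident warps per SM rather than 4:
saturating_warps = SMS * 6

print(independent_warps, vector_bits, saturating_warps)   # 512 1024 768
```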

                            • nine_k a few seconds ago

                              I'm happy to stand corrected; apparently my understanding of GPUs has become obsolete.

                          • JackSlateur 3 hours ago

                            Because GPUs are physically built to manage parallel tasks, but only a few kinds.

                            They are very specialized.

                            CPUs are generic; they have lots of transistors to handle a lot of different instructions.

                            • Groxx 2 hours ago

                              Also, moving data to and from the GPU takes MUCH more time than moving it between CPU cores (though combined chips drastically lower this difference).

                          • somat an hour ago

                            Whenever I see the word "fintech" this is the article I am expecting. Instead I am disappointed with some drivel about banks.

                            I am not sure what is wrong with me; you would think my brain would have figured it out by now, but it always parses it wrong. Perhaps if it were "finctech" that would help.

                            • cryptoz an hour ago

                              I’ve not yet had this problem but I surely will now! Thanks I guess.

                            • brotchie 4 hours ago

                              “Now, the team is working on a compiler for their PPU” good luck!

                              • bhouston 3 hours ago

                                While the Itanium failed, the Ageia PPU did succeed with its compiler. It was acquired by NVIDIA and became CUDA.

                                https://en.wikipedia.org/wiki/Ageia

                                • gdiamos 3 hours ago

                                  It did indeed get merged into the CUDA group, but I think the internal CUDA project predated it, or at least several of the engineers working on it did.

                                  • mepian an hour ago

                                    That's not the same PPU, is it?

                                  • claxo 3 hours ago

                                    Indeed, a very smart compiler would be necessary, perhaps beyond the current compiler art, as with the Itanium.

                                    But... how about specializing in problems with inherent parallelism? LLMs, maybe?

                                    • greenavocado 3 hours ago

                                      Is this like the Itanium architecture with its compiler challenges?

                                    • petermcneeley 3 hours ago

                                      > Now, the team is working on a compiler for their PPU

                                      I think a language is also required here. Extracting parallelism from C++ is non trivial.

                                      • poincaredisk 3 hours ago

                                        Something similar to CUDA or OpenCL should do it, right?

                                      • johnklos 3 hours ago

                                        Tell us something new, please.