• peutetre a day ago

    Video encoding. To get the best quality for a given bitrate you need to use a software encoder, and to get the best encode times you need to give it plenty of CPU resources.

    It's been impressive how much SVT-AV1 has increased performance between releases. SVT-AV1 2.2 is a significant step up from 2.1:

    https://www.phoronix.com/news/SVT-AV1-2.2-Released

    https://gitlab.com/AOMediaCodec/SVT-AV1
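
    For a sense of what that looks like in practice, here's a minimal sketch of a software AV1 encode through ffmpeg's SVT-AV1 wrapper (file names and quality settings are placeholders; SVT-AV1 spreads its work across available cores on its own):

        # software encode with SVT-AV1; more cores -> shorter encode at the same quality
        ffmpeg -i input.mkv -c:v libsvtav1 -preset 6 -crf 32 -c:a copy output.mkv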

    • noizejoy a day ago

      I’m fascinated by the idea of my future music-making DAW computer having one of those manycore monsters.

      That’s also because my favourite software synthesizers are increasingly modelling instruments rather than sample-based instruments. And that reduces the need for RAM and storage, but increases the thirst for CPU cycles.

      • teleforce 18 hours ago

        Imagine one of these 128-core CPUs utilizing several TBs of RAM connected via CXL and several PBs of storage, combined with real-time Linux, and I'm in software-defined-X (SD-x, as in SDN, SDR, etc.) workstation wonderland [1][2][3].

        [1] Samsung Unveils CXL Memory Module Box: Up to 16 TB at 60 GB/s:

        https://www.anandtech.com/show/21333/samsung-unveils-cxl-mem...

        [2] Huawei unveils its OceanStor A800 AI-specific storage solution; announces 128TB high-capacity SSD

        https://www.datacenterdynamics.com/en/news/huawei-unveils-it...

        [3] Real-time Linux is officially part of the kernel:

        https://news.ycombinator.com/item?id=41594862

        • dagurp 16 hours ago

          imagine a Beowulf cluster of those

        • mbrumlow a day ago

          Me?

          I compile most of my OS and I would like it to go faster. I also like being able to compile and game at the same time, or run many OSes in VMs at the same time.
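
          One sketch of how that can work (core range and job count are arbitrary; taskset and nice are standard Linux tools): pin the build to a subset of cores at low priority and leave the rest free for the game or the VMs.

              # run the build on cores 0-149 at low priority, keeping the remaining cores free
              taskset -c 0-149 nice -n 19 make -j150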

          • shepherdjerred a day ago

            So many developer workloads can be sped up significantly by throwing more cores at the problem. Classic examples are compilation and linting, but even unit/integration/end-to-end testing can be made feasible locally if you have a strong machine and architect your tests properly.

            I’ve done this before with the testcontainers library and it enables some really nice workflows.
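
            As a rough sketch of what that looks like on the command line (assumes pytest-xdist for the Python case and tests written with t.Parallel() for the Go case; adjust to your own stack):

                pytest -n "$(nproc)"                # pytest-xdist: one test worker per core
                go test -parallel "$(nproc)" ./...  # Go: run t.Parallel() tests concurrently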

            • stouset 21 hours ago

              Honest question: what do you actually accomplish by compiling your OS?

              • bravetraveler 17 hours ago

                I want to get this out first: an OS is not a single unit. It is a collection of parts. Operating... System. With that out of the way, modification!

                It actually becomes my OS, not at the mercy of The Build or provider. Licensing gets tricky here. You have to be careful if intending to redistribute the work.

                Open source allows you to compile and not care about much else for your own use. The code lets you do what you want or need. Compiled binaries limit your options; one can't as readily change them or find out what they do.

                Imagine BigCo gives you something to support a widget or tool you use. Nay, need. It stops working; the job is no longer being done. What do you do? What you can, or what they allow.

                While everyone may not do this, they absolutely benefit from the ability.

                Many compile simply to get the CPU optimizations that best match their hardware. See '-march=native', for example. Binary distributions have to make conservative assumptions; compiling from source lets you correct them.
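
                A minimal illustration (hypothetical file names; the flag itself is standard GCC/Clang):

                    # generate code tuned to the CPU of the machine doing the build
                    gcc -O2 -march=native -o myprog myprog.c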

                On the Linux side I prefer Fedora. Binary distribution with excellent packaging tools. I can rebuild anything with the same commands, but rarely have to. Good defaults with build options and patches.

                • stouset 7 hours ago

                  I’m not asking what’s the benefit of open source. I’m asking what’s the benefit of compiling it yourself versus using distro-provided binary packages, and based on this answer you’re confirming what I learned from two years of Gentoo twentyish years ago: pretty much not a damn thing.

                  Spending dozens of hours compiling packages from source over and over is a comically poor tradeoff in time, energy, and dollars for getting a few minor CPU-hyperspecific optimizations that are utterly unnoticeable for everything outside of extremely specific circumstances. And in those cases compiling one or two things yourself can get you 98% of the sought-after benefits.

                  To each their own but… I’ll pass.

                  • bravetraveler 6 hours ago

                    I cover both, but you're right about choice. Not looking to sway. I explained background, my position, and some reasoning. I'll try again.

                    I don't compile the whole OS, or as you say, spend dozens of hours compiling packages from source.

                    In the small cases where I do compile something, it's this process:

                        fedpkg clone -a -b ... packagename   # anonymous clone of the branch I'm targeting
                        cd packagename
                        # toy around, make customizations that justify the compilation
                        fedpkg mockbuild                     # build the RPMs in a clean mock chroot
                        dnf up ./results*/*/*.rpm            # upgrade to the locally built packages
                    
                    You're making a strawman with this literal position of compiling everything, bathing it in hyperbole. I agree, that's a waste of time, so I don't run Gentoo. I run Fedora where I can compile what I need to. Reliably. Or I'd still be on Arch.

                    I apply patches, test them, suggest fixes upstream. That's what I accomplish. I can't speak for everyone but I can try to help. You're welcome; it's literal maintenance. Again, not everyone does it. Remember when I said this?

                    > While everyone may not do this, they absolutely benefit from the ability.

                    Another reminder, this was the question:

                    > what do you actually accomplish by compiling your OS?

                    I do it [for components] so others don't have to. Christ. I'm explaining how it's used, not why you should change anything. I feel we mostly agree, I rambled - like I did here. It's rarely worth it. I found a decent middle ground. Build/test what I have to, most easily.

                    Getting back on topic: more CPU cores help with the time this requires. Machine time is spent so human time isn't.

                    • stouset 6 hours ago

                      Sorry, but I wasn’t making a straw man: the GP I responded to explicitly made the point that they wanted a manycore monster CPU to speed up compiling their entire OS from scratch.

                      If all you’re talking about is compiling a few targeted packages for customization or performance reasons, that’s absolutely a reasonable and sensible approach to things.

                      My question was targeted at the GP’s use case which I wholeheartedly believe is a largely senseless waste of time, effort, CPU cycles, and money.

                      • bravetraveler 6 hours ago

                        > Sorry, but I wasn’t making a straw man: the GP I responded to explicitly made the point that they wanted a manycore monster CPU to speed up compiling their entire OS from scratch.

                        This is what they said, emphasis mine:

                        > **most** of my OS

                        Want to try again? It's so arbitrary, I'm good. "49% would be acceptable, 51% - hah!". Nobody dare modify/test 'glibc' or any of the precious libraries.

                        Later, test police :) No hard feelings, I'll admit I don't communicate well and like this stuff a little too much

                        • stouset 5 hours ago

                          Every continuum has arbitrary points, a la “when is a pile a heap”.

                          In this case it becomes insensible somewhere between selecting a few packages which are genuine performance bottlenecks for your workloads or need specific compiler flags for necessary features, and spec’ing out hardware to fit a $5,000+ 192-core CPU so you can feel good about your kernel being able to take advantage of AVX512 for your Linux desktop box.

              • credit_guy 9 hours ago

                It looks to me like Intel will cannibalize its older, 5th-generation Xeon processors. Here's a comparison of two 64-core processors: one is 5th generation, launched in Q4 '23 [1], and one is 6th generation, launched in Q2 '24 [2]. The first has an MSRP of $12,400 (not a mistake) and the second $2,749. It is true that the 5th-generation CPU has 128 threads vs. only 64 for the 6th generation, but is that worth a price premium of 350% (roughly 4.5x the price)?

                [1] https://www.intel.com/content/www/us/en/products/sku/237252/...

                [2] https://www.intel.com/content/www/us/en/products/sku/240363/...

                • m463 4 days ago

                    make -j$(nproc)
                  • physicsguy 4 days ago

                    Supercomputers? I've done work on 192-core machines in the past doing MPI jobs.

                    • bee_rider a day ago

                      It is crazy how few MPI processes it takes to get to 1000 cores now, haha.

                      • highfrequency 21 hours ago

                        What is MPI?

                        • bee_rider 21 hours ago

                          Message Passing Interface. It is a very popular API for distributed computing, particularly in the scientific computing / HPC world.

                          • porcoda 19 hours ago

                            A distributed-memory parallel computing library. It is most common on supercomputers with a high-performance interconnect, where you want to run one program that coordinates separate instances of itself via message passing (“SPMD” - single program, multiple data parallelism). That is slightly different from general distributed computing, which usually doesn’t assume a tightly coupled system with a low-latency, high-bandwidth interconnect. MPI lets you scale up beyond what you can do within a shared memory space, where you are limited to low core counts (hundreds) and low memory (<1TB) - it is what you use when you’re in the many-thousands-of-cores, many-TB-of-RAM regime. In practice, supercomputers require a mixture: shared memory for many-core parallelism, some hybrid host programming model for bringing accelerators in, and then MPI to coordinate between these shared-memory/accelerator-enabled hosts.
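
                            For a rough sense of how it's used at launch time (Open MPI-style launcher here; the program name and host file are placeholders):

                                mpirun -np 192 ./solver                      # one rank per core on a single node
                                mpirun -np 1536 --hostfile hosts ./solver    # the same binary spread across 8 such nodes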

                          • NGRhodes 15 hours ago

                            As a fresh example, we (a university) have just this month taken delivery of about 9000 cores (and about 80 GPUs) of machines; these are being added to our existing ~6000 cores of roughly 5-year-old machines. This is a modest HPC system.

                            • CaliforniaKarl a day ago

                              Supercomputers, yes, but also genomics workloads. The vast majority of those workloads are effectively tied to one machine, since they don’t support cross-machine scaling, nor things like MPI.

                              • Jtsummers 21 hours ago

                                Do they not support MPI and scaling across machines because of the features of the workloads or because of the features of the systems performing the work?

                                If the latter, then maybe they can use multiple cores with things like OpenMP to get some "free" scaling, but they could also be made to work with MPI or across machines. However, if it's the former, more cores won't help them.
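
                                As a sketch of the "free" shared-memory case (assumes the program was built with OpenMP support; the binary and input names are placeholders):

                                    # let an OpenMP-enabled program use every core on the box
                                    OMP_NUM_THREADS="$(nproc)" ./genomics_tool input.fa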

                            • wyldfire a day ago

                              It's tremendously difficult to keep 192 cores truly busy at the same time. Unless you have enormous caches and enormous memory throughput, that is.

                              • c2h5oh a day ago

                                That 192-core CPU comes with 12 DDR5 memory channels at up to 6000 MT/s; test results show 420-450 GB/s for both reads and writes. (Theoretical peak is 12 channels x 6000 MT/s x 8 bytes ≈ 576 GB/s, so that's roughly 75-80% of peak.)

                                • bubaumba 21 hours ago

                                  NVidia should be afraid. Same GB/s on CPU are much more than on GPU. The problem here will be the price of the whole system. Anyway it's a step in the right direction if we want something like Copilot on premises or at home.

                                  • exceptione 17 hours ago

                                    Just curious, much more what?

                                  • janwas 10 hours ago

                                    Most of the SIMD code I write can consume tens of GB/s of bandwidth per core. From this perspective, it seems that 192 cores make for an imbalanced system.

                                  • bubaumba 21 hours ago

                                    Don't worry, 20 years back most people had no clue how to use the second core. They believed in GHz, and when it became obvious the theoretical limit was close, they thought that was the end of evolution.

                                    • Jtsummers 20 hours ago

                                      People knew how to use the second core 20 years ago; SMP machines had been around for decades by that point. For the most part, the second core was taken care of by the OS, which already had support for those SMP systems and for multithreaded or multiprocess workloads.

                                      • 7speter 17 hours ago

                                        I remember reading, back in 2004, about how Intel thought they'd be making 10 GHz CPUs by 2010.

                                    • bhouston a day ago

                                      Yeah, they are mostly for cloud providers. They will be running all of our Docker containers, Kubernetes clusters, and function-as-a-service workloads, as well as our databases.

                                      It really should mean that cloud data centres can greatly increase capacity without growing in physical size. That is a huge net win for the cloud providers.

                                      • mikepurvis a day ago

                                        I assume you end up bottlenecked on RAM, disk, and network when there’s this much parallelism in the CPU. So even for cloud providers it’s probably best suited to some fairly specific CPU-heavy workloads.

                                        • adgjlsfhk1 a day ago

                                          You're underestimating the power of 512 MB of cache, 12-channel DDR5, PCIe 5 NVMe drives, and 880GB/s networking cards. The giant L3 cache means that memory pressure is significantly reduced (especially if many of the cores use similar data). RAM is the weakest link here (12 channels of DDR5-6000 is a ton from a consumer standpoint, but scaling up proportionally from dual channel on a 16-core CPU would only get you to 96 cores at the same per-core bandwidth). The 128 lanes of PCIe, though, mean that you don't have much of a bottleneck outside of RAM (if you're doing things properly). If you use 64 of those lanes for storage, that can be 100s of TB of ridiculously fast storage, and you still have another 64 left over for ridiculously fast networking.

                                          • everforward 14 hours ago

                                            In the cloud vendor space (and really virtualization in general) local disk is practically non-existent. The vast majority of apps these days are stateless and generate very little IO, and even if they do generate IO, it's more than likely going to some kind of network share so that the VM can fail over. If the VM's data is on local disk, you can't migrate it to a new host if the host it's on dies (along with its local disks).

                                            Presuming the other responder is correct about there being 880GB/s of network cards (and that the units are correct at GB/s instead of Gb/s or Gbps), that should be more than plenty. On the largest CPUs, that works out to roughly 4.5 GB/s per core across 192 cores (2.3 GB/s if dual socket). That's more than enough; I see a lot of dual-socket servers with 24 or so total cores running on 10 Gbps, or maybe bonded 10 Gbps, which is on the order of 50-100 MB/s per core.

                                          • woleium 8 hours ago

                                            Usually DCs are limited by the power and/or cooling per rack; you end up having to run more power and HVAC, which can be prohibitively expensive. Alternatively, you half-fill racks when you upgrade.

                                            • jiggawatts 16 hours ago

                                              You would think so, but both Azure and Amazon have dragged their feet deploying the last couple of CPU generations from both Intel and AMD.

                                              I suspect it's a sign of the global downturn; cloud adoption in general seems to have stalled. Hence, no new CPU models being deployed at scale.

                                              When the downturn lifts, I fully expect the big cloud providers to start deploying huge numbers of these, but that seems to be at least a year away.

                                              • bhouston 6 hours ago

                                                I think that Amazon has been deploying Graviton though.

                                            • bubblesnort 18 hours ago

                                              I wonder how hard it is to kill a fork bomb with 192 cores running full blast.

                                              • andrewstuart a day ago

                                                Me. The more CPU the better for all my systems.

                                                • bhaney a day ago

                                                  Me

                                                  • fhdsgbbcaA a day ago

                                                    Likewise, and I will USE all the cores @ 100%, as The Good Lord commanded.

                                                  • vlovich123 a day ago

                                                    TLDR: As you’d expect, hyperscalers.