• gnabgib 7 hours ago

    Small discussion already (16 points, 10 hours ago, 7 comments) https://news.ycombinator.com/item?id=41768144

    • blueflow 2 hours ago

      > To support those needs, there were clutches like Intel’s PAE, which allowed manipulating up to 64GB of RAM on 32-bit machines without changing the programming model, but they were just that: hacks.

      You can go look up how the 32-bit protected mode got hacked on top of the 16-bit segmented virtual memory that the 286 introduced. The Global Descriptor Table is still with us in 64-bit long mode.

      So it's not PAE that is particularly hacky; it's a broader thing with x86.

      • renox an hour ago

        The RISC example in the article is a bit weird: on one hand, it may take even more instructions to load an address on a RISC; on the other hand, all the RISCs I know have an 'addi' instruction, so there is no need to do 'li r2, 1; add r3, r1, r2' to add 1 to a register!

        • tzot 2 hours ago

          x32 ABI support exists at least in the kernel of Debian (and Debian-based) distributions, and I know because I've built Python versions (along with everything else needed for specific workloads) as x32 executables. The speed difference was minor but real, and the memory usage decreased considerably. I've worked with a similar ABI known as n32 (there was o32 for old 32, n32 for new 32 and n64 for fully 64-bit programs) on SGI systems with 64-bit capable MIPS CPUs; it made a difference there too.

          Unfortunately I've read articles where quite-more-respected-than-me people said in a nutshell “no, x32 does not make a difference”, which is contrary to my experience, but I could only provide numbers where the reply was “that's your numbers in your case, not mine”.

          The Amazon Linux kernel did not support x32 calls the last time I tried, so you can't provide images for more compact lambdas.
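
          To make the memory point concrete, here is a minimal sketch (not a measurement from my builds) of how a pointer-heavy structure shrinks when pointers are 4 bytes instead of 8. The `node` layout and both helper functions are hypothetical, and `node_size_for` assumes natural alignment, which is what the usual x86-64 and x32 ABIs give you for this layout:

          ```c
          #include <assert.h>
          #include <stddef.h>
          #include <stdint.h>

          /* A pointer-heavy node, the kind of thing interpreter object graphs are full of. */
          struct node {
              struct node *left;
              struct node *right;
              void *payload;
              int32_t key;
          };

          /* Hypothetical helper: bytes needed for n nodes under the ABI this file is compiled for. */
          size_t nodes_footprint(size_t n) {
              assert(n <= SIZE_MAX / sizeof(struct node)); /* guard against overflow */
              return n * sizeof(struct node);
          }

          /* Hypothetical helper: size the same node would have if pointers were
             ptr_bytes wide, assuming natural alignment and tail padding. */
          size_t node_size_for(size_t ptr_bytes) {
              size_t raw = 3 * ptr_bytes + sizeof(int32_t);
              size_t align = ptr_bytes > sizeof(int32_t) ? ptr_bytes : sizeof(int32_t);
              return (raw + align - 1) / align * align; /* round up to the alignment */
          }
          ```

          Under those assumptions `node_size_for(8)` is 32 and `node_size_for(4)` is 16, so the same graph takes half the space. Real programs mix in non-pointer data, so the actual saving depends on the workload.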

          • zokier 2 hours ago

            I find it weird that the convention to use char/short/int/long/long long has persisted so widely to this day. I would have thought that already back in the 16 -> 32 bit transition period people would have standardized and moved to stdint.h style types instead (i.e. int32_t etc).

            Sure, that doesn't change pointer sizes, but it would have reduced the impact of the different 64-bit data models, like Unix LP64 vs Windows LLP64.
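
            A small sketch of what I mean, assuming a C11 compiler. The enum and the `classify_data_model` helper are made up for illustration; the point is that `long` is the type that differs between LP64 and LLP64, while the stdint.h types do not move:

            ```c
            #include <assert.h>
            #include <limits.h>
            #include <stddef.h>
            #include <stdint.h>

            /* Fixed-width types are the same width under every data model. */
            static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
            static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t is exactly 64 bits");

            enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_LLP64, MODEL_OTHER };

            /* Hypothetical helper: name the data model from the byte widths of
               int, long and pointers. */
            enum data_model classify_data_model(size_t int_bytes, size_t long_bytes,
                                                size_t ptr_bytes) {
                if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 4) return MODEL_ILP32;
                if (int_bytes == 4 && long_bytes == 8 && ptr_bytes == 8) return MODEL_LP64;
                if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 8) return MODEL_LLP64;
                return MODEL_OTHER;
            }

            /* The data model of whatever platform this is compiled for. */
            enum data_model current_data_model(void) {
                return classify_data_model(sizeof(int), sizeof(long), sizeof(void *));
            }
            ```

            Code that stores a file offset in `long` changes meaning between the two 64-bit models; code that stores it in `int64_t` does not.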

            • chipdart an hour ago

              > I find it weird that the convention to use char/short/int/long/long long has persisted so widely to this day.

              I don't think this is a reasonable take. Beyond ABI requirements and how developers use int over short, there are indeed requirements where the size of an integer value matters a lot, especially as this has a direct impact on data size and vectorization. To frame your analysis, I would recommend you take a peek at the not-so-recent push for hardware support for IEEE754 half-precision float/float16 types.

              • zokier 23 minutes ago

                The cases where you want a platform-specific integer width (that is not something like size_t/uintptr_t) are extremely niche compared to cases where you want an integer to have a specific width.

                I don't see the relation to fp16; I don't think anyone is pushing for `float` to refer to fp16 (or fp64 for that matter) anywhere. `long double` is already bad enough.

            • jauntywundrkind 10 minutes ago

              Talking about the ISA needing to spend so much time addressing memory, I'm reminded of the really interesting Stream Semantic Registers (SSRs) in Occamy, the excellent PULP group's 4096-core RISC-V research multichip design. https://arxiv.org/abs/1911.08356

              Just like the instruction pointer which implicitly increments as code executes, there are some dedicated data-pointer registers. There's a dedicated ALU for advancing/incrementing, so you can have interesting access patterns for your data.

              Rather than loops needing to load data, compute, store data, and loop, you can just compute and loop. The SSRs give the cores a DSP-like level of performance. So so so neat. Ship it!
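
              A rough software model of the idea, not the actual ISA extension: `read_stream` and `stream_pop` are names I made up, and in hardware the "pop" would be implicit in reading the register rather than a function call. It only shows how address generation moves out of the loop body:

              ```c
              #include <assert.h>
              #include <stddef.h>

              /* Software stand-in for a data-pointer register with its own
                 address generator: every read yields the next element. */
              struct read_stream {
                  const double *base;
                  size_t pos;       /* index of the next element */
                  size_t stride;    /* elements to advance per read */
                  size_t remaining; /* reads left before the stream is exhausted */
              };

              double stream_pop(struct read_stream *s) {
                  assert(s->remaining > 0);
                  double v = s->base[s->pos];
                  s->pos += s->stride; /* the "dedicated ALU" step */
                  s->remaining--;
                  return v;
              }

              /* Conventional loop: the body carries the loads and index arithmetic. */
              double dot_explicit(const double *a, const double *b, size_t n) {
                  double acc = 0.0;
                  for (size_t i = 0; i < n; i++)
                      acc += a[i] * b[i];
                  return acc;
              }

              /* Stream-style loop: the body only computes; operands arrive from the streams. */
              double dot_streamed(struct read_stream *a, struct read_stream *b, size_t n) {
                  double acc = 0.0;
                  for (size_t i = 0; i < n; i++)
                      acc += stream_pop(a) * stream_pop(b);
                  return acc;
              }
              ```

              Setting `stride` to something other than 1 gives strided access patterns without touching the loop body, which is the part I find neat.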