• andikleen2 14 hours ago

    Early x86-64 Linux had a similar problem. The x86-64 ABI uses registers for the first 6 arguments. Supporting a variable number of arguments (like printf) requires passing the number of arguments in an extra register (RAX), so that the callee can save the registers to memory for va_arg() and friends. Doing this for every call is too expensive, so it's only done when the prototype is marked as stdarg.

    Now the initial gcc implemented this saving to memory with a kind of Duff's device, with a computed jump into a block of register-saving instructions so that only the needed registers were saved. There was no bounds check, so if the argument-count register (RAX) was not initialized correctly it would jump to a random location based on the junk value, causing very confusing bug reports.

    This bit quite a lot of software that didn't use correct prototypes, calling stdarg functions without indicating that in the prototype. In 32-bit code, which didn't use register arguments, this wasn't a problem.

    Later compiler versions switched to saving all registers unconditionally.
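
    A minimal sketch of the well-behaved case, assuming a hypothetical helper named sum_ints (not from any real codebase). The point is only that the "..." prototype is visible at the call site, which is what lets the compiler use the variadic calling convention; under the SysV AMD64 ABI that includes setting AL before the call. Calling the same function through a declaration without "..." is undefined behaviour, which is the situation described above.

```c
#include <assert.h>
#include <stdarg.h>

/* The "..." in this prototype is what tells every caller to use the
   variadic calling convention. sum_ints is a hypothetical example name. */
int sum_ints(int count, ...);

/* Adds up `count` int arguments passed after `count`. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);            /* start walking the unnamed arguments */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* each unnamed argument is read as int */
    va_end(ap);
    return total;
}
```

    With the prototype in scope, sum_ints(3, 1, 2, 3) yields 6. The failure mode in the comment comes from call sites that never saw this prototype.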

    • veltas 13 hours ago

      In the SysV ABI for AMD64 the AL register is used to pass an upper bound on the number of vector registers used, is this related to what you're talking about?

    • Joker_vD 17 hours ago

      Raymond Chen has a whole "Introduction to IA-64" series of posts on his blog, by the way. It's such an unconventional ISA that I am baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86: it's very poorly suited for general-purpose computations. Number crunching, sure, but anything more freeform, and you stare at the specs and wonder how the hell the designers supposed this thing to be programmed and used.

      • pjmlp 13 hours ago

        Itanium only failed because AMD for various reasons was able to come up with AMD64 and rug pull Intel's efforts.

        In an alternative universe without AMD64, Intel would have kept pushing Itanium while sorting out its issues, HP-UX was on it, and Windows XP as well.

        • pajko 4 minutes ago

          The first generation was complete garbage. Itanium 2 came too late and never became widespread due to bad business decisions and marketing. By the time it could have been successful, AMD64 was out. And even then Intel targeted only the same high-end enterprise market segment when they implemented 64-bit on Xeon: https://www.cnet.com/tech/tech-industry/intel-expanding-64-b...

          • pjc50 11 hours ago

            Other way round: the only way any company other than Intel was able to get a new instruction set launched into the PC space was because Intel face-planted so hard with Itanium, and AMD64 was the architecture developers actually wanted to use - just make the registers wider and have more of them, and make it slightly more orthogonal.

            • pjmlp 9 hours ago

              Developers get to use the architectures OEM vendors make available to them.

              • pjc50 8 hours ago

                Sure, but the fact remains that AMD64 won in the market, despite the incumbent near-monopoly "Wintel" advantages.

                • pjmlp 5 hours ago

                  The whole premise is that it only won because AMD exists, and was allowed to come up with it.

                  People don't read comments before replying?

                  • Joker_vD 5 hours ago

                    In that different world, Transmeta would actually succeed in the market of x86-compatible CPUs and, perhaps, would even come up with their own 64-bit extension. Itanium would still flop.

                    Or maybe, if push came to shove, the desktops would switch to something entirely different like Alpha. Or ARM! Such an event would likely force ARM to come up with their AArch64 several years sooner than it actually happened.

                    • pjmlp 3 hours ago

                      Transmeta wasn't a success story to start with, died before Itanium, and Intel is one of the patent holders.

            • kelnos an hour ago

              We can't know for sure, but my guess is that Itanium still could have failed. I could imagine an alternative universe where, even with HP-UX and WinXP running on it, no one wanted to deal with porting their application software. And its emulation of 32-bit code (both in hardware and in software) was atrocious, so running existing, unported code wouldn't really take off either.

              Eventually Intel gives up after motherboard/desktop/laptop makers can't build a proper market for it. Maybe Intel then decides to go back and do something similar to what AMD did with x86_64. Maybe Intel just gives up on 64-bit and tries to convince people it's not necessary, but then starts losing market share to other companies with viable 64-bit ISAs, like IBM's POWER64 or Sun's SPARC64 or whatever.

              Obviously we can't know, but I think my scenario is at least as likely as yours.

                • bell-cot 10 hours ago

                  Suggested: https://en.wikipedia.org/wiki/Itanium#History

                  With how long it took Intel to ship expensive, incompatible, so-so performance ia64 chips - your theory needs an alternate universe where Intel has no competitors, ever, to take advantage of the obvious market opportunity.

                  • pjmlp 9 hours ago

                    I don't need suggestions for a time I lived through; I have been in computers since the 1980s.

                    Without AMD, there was no alternative in the PC world. Itanium was already the target of the first 64-bit version of Windows XP.

                    Since we're providing suggestions in computing history, I assume you can follow the dates,

                    https://en.wikipedia.org/wiki/Windows_XP_editions#Windows_XP...

                    • bell-cot 6 hours ago

                      > Without AMD, ...

                      Perhaps? I don't know enough to judge whether one of the other companies working on IA-32 compatible processors could plausibly have stepped in -

                      https://en.wikipedia.org/wiki/List_of_former_IA-32_compatibl...

                      It's true that most of those would have lacked the resources to replicate AMD's feat with AMD64. OTOH, AMD itself had to buy out NexGen to produce their K6. Without AMD and/or AMD64, there'd be plenty of larger players who might decide to fill the void.

                    • bombcar 9 hours ago

                      It was also an era where people were happily staying on PAE 32-bit x86 rather than pay the price and performance premium for Itanium.

                      4 GB of RAM existed but many, many systems weren’t even close to it yet.

                  • jcranmer 15 hours ago

                    Some guesses here:

                    First off, Itanium was definitely meant to be the 64-bit successor to x86 (that's why it's called IA-64 after all), and moving from 32-bit to 64-bit would absolutely have been a killer feature. It's basically only after the underwhelming launch of Itanium that AMD comes out with AMD64, which becomes the actual 64-bit version of x86; once that comes out, the 64-bitness of Itanium is no longer a differentiation.

                    Second... given that Itanium basically implements every weird architecture feature you've ever heard of, my guess is that they decided they had the resources to make all of this stuff work. And they got into a bubble where they just simply ignored any countervailing viewpoints anytime someone brought up a problem. (This does seem to be a particular specialty of Intel.)

                    Third, there's definitely a baseline assumption of a sufficiently-smart compiler. And my understanding is that the Intel compiler was actually halfway decent at Itanium, whereas gcc was absolute shit at it. So while some aspects of the design are necessarily inferior (a sufficiently-smart compiler will never be as good as hardware at scavenging ILP, hardware architects, so please stop trying to foist that job on us compiler writers), it actually did do reasonably well on performance in the HPC sector.

                    • happosai 14 hours ago

                      It appeared to me (from far outside) that Intel was trying to segment the market into "Affordable Home and office PC:s with x86" and "Expensive serious computing with itanium". Having everything so different was a feature, to justify the eyewateringly expensive itanium pricetag.

                      • windward 9 hours ago

                        Seems shortsighted (I'm not saying you're wrong, I can imagine Intel being shortsighted). Surely the advantage of artificial segmentation is that it's artificial: you don't double up the R&D costs.

                        • kuschku 10 hours ago

                          The same trick they pulled again with AVX512 and ECC support later on.

                          • clausecker 7 hours ago

                            And the same reason NVRAM was dead on arrival. No affordable dev systems meant that only enterprise software supported it.

                          • Earw0rm 12 hours ago

                            The IBM PS/2 play. And we all know how well that one worked out.

                            • happosai 4 hours ago

                              I'm sure it worked out for many bosses. They got their bonuses and promotions and someone else got to clean up mess.

                        • kragen 14 hours ago

                          They took technical risks that didn't pan out. They thought they'd be able to solve whatever problems they ran into, but they couldn't. They didn't know ahead of time that the result was going to suck. If you try to run an actual tech company, like Intel, without taking any technical risks, competitors who do take technical risks will leave you in the dust.

                          This doesn't apply to fake tech companies like AirBnB, Dropbox, and Stripe, and if you've spent your career at fake tech companies, your intuition is going to be "off" on this point.

                          • twoodfin 7 hours ago

                            They also aimed at what turned out to be the wrong target: When Itanium was conceived, high-performance CPUs were for technical applications like CAD and physics simulation. Raw floating point throughput was what mattered. And Itanium ended up pretty darn good at that.

                            But between conception and delivery, the web took over the world. Branchy integer code was now the dominant server workload & workstations were getting crowded out of their niche by the commodity economics of x86.

                            • kuschku 10 hours ago

                              Thanks for this comment - that's a beautiful perspective I hadn't considered before. A clean and simple definition of technology as everything that increases human productivity.

                              Now I can finally explain why some "tech" jobs feel like they're just not moving the needle.

                              • eru 12 hours ago

                                Computer hardware isn't the only 'tech' that exists, you know?

                                Problems in operations research (like logistics) or fraud detection can be just as technical.

                                • kragen 12 hours ago

                                  Fraud detection is a Red Queen's race. If the amount of resources that goes into fraud detection and fraud commission grows by 10×, 100×, 1000×, the resulting increase in human capacities and improvement in human welfare will be nil. It may be technically challenging but it isn't technology.

                                  Operations research is technology, but Uber isn't Gurobi, which is a real tech company like Intel, however questionable their ethics may be.

                                  • pjc50 11 hours ago

                                    > It may be technically challenging but it isn't technology.

                                    This feels like a distinction without a difference based on whether kragen thinks something is hardcore enough to count?

                                    • kragen 11 hours ago

                                      No, as I explained, it's based on the resulting increase in human capacities and improvement in human welfare. Technology is a collaborative, progressive endeavor in which we advance a skill (techne), generation by generation, through discourse (logos).

                                      Fraud detection can be (and is) extremely hardcore, but it isn't progressive in that way. It's largely tradecraft. Consequently its relationship to novelty and technical risk is fundamentally different.

                                    • eru 11 hours ago

                                      > Operations research is technology, but Uber isn't Gurobi, [...]

                                      Intel isn't ASML, either. They merely use their products. So what?

                                      Presumably Gurobi doesn't write their own compilers or fab their own chips. It's turtles all the way down.

                                      > Fraud detection is a Red Queen's race. If the amount of resources that goes into fraud detection and fraud commission grows by 10×, 100×, 1000×, the resulting increase in human capacities and improvement in human welfare will be nil. It may be technically challenging but it isn't technology.

                                      By that logic no military anywhere uses any technology? Nor is there any technology in Formula 1 cars?

                                      • kragen 10 hours ago

                                        "So what" is that Intel is making things ASML can't, things nobody has done before, and they have to try things that might not work in order to make things nobody yet knows how to make. Just to survive, they have to do things experts believe to be impossible.

                                        AirBnB isn't doing that; they're just booking hotel rooms. Their competitive moat consists of owning a two-sided marketplace and political maneuvering to legalize it. That's very valuable, but it's not the same kind of business as Intel or Gurobi.

                                        Nuclear weapons are certainly a case that tests the category of "technology" and which, indeed, sparked widespread despair and abandonment of progressivism: they increase human capabilities, but probably don't improve human welfare. But I don't think that categories become meaningless simply because they have fuzzy edges.

                                • eru 12 hours ago

                                  > It's such an unconventional ISA that I am baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86 [...]

                                  I don't know, most people don't care about the ISA being weird as long as the compiler produces reasonably fast code?

                                  • fulafel 15 hours ago

                                    > baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86

                                    They did persuade SGI, DEC and HP to switch from their RISCs to it though. Which turned out to be rather good for business.

                                    • fredoralive 14 hours ago

                                      I suspect SGI and DEC / Compaq could look at a chart and see that with P6 Intel was getting very close to their RISC chips, through the power of MONEY (simplification). They weren't hitting a CISC wall, and the main moat custom RISC had left was 64 bit. Intel's 64 bit chip would inevitably become the standard chip for PCs, and therefore Intel would be able to turn its money cannon onto overpowering all 64 bit RISCs in short order. May as well get aboard the 64 bit Intel train early.

                                      Which is nearly true: 64 bit Intel chips did (mostly) kill RISC. But not their (and HP's) fun science project IA64; they had to copy AMD's "what if x86, but 64 bit?" idea instead.

                                      • zinekeller 14 hours ago

                                        SGI and DEC, yes, but HP? Itanium was HP's idea all along! [1]

                                        [1] https://en.wikipedia.org/wiki/Itanium#History

                                      • yongjik 17 hours ago

                                        Well, they did persuade HP to ditch their own homegrown PA-RISC architecture and jump on board with Itanium, so there's that. I wonder how much that decision contributed to the eventual demise of HP's high performance server division ...

                                        • classichasclass 14 hours ago

                                          A lot, I think. PA-RISC had a lot going for it, high performance, solid ISA, even some low-end consumer grade parts (not to the same degree as PowerPC but certainly more so than, say, SPARC). It could have gone much farther than it did.

                                          Not that HP was the only one to lose their minds over Itanic (SGI in particular), but I thought they were the ones who walked away from the most.

                                          • pjc50 10 hours ago

                                            Am I right in thinking that the old PA-Semi team was bought by Apple, and are substantially responsible for the success of the M-series parts?

                                            • scrlk 9 hours ago

                                              Acquiring P.A. Semi got them Dan Dobberpuhl and Jim Keller, which laid a good design foundation. However, IMO, I'd lean towards these as the decisive factors today:

                                              1) Apple's financial firepower allowing them to book out SOTA process nodes

                                              2) Apple being less cost-sensitive in their designs vs. Qualcomm or Intel. Since Apple sells devices, they can justify 'expensive' decisions like massive caches that require significantly more die area.

                                              • bombcar 8 hours ago

                                                They also had years to keep improving the iPhone chips until they were so good at power efficiency that they could slap it into a laptop.

                                                That’s much better than a decade of development with no product yet.

                                              • sgerenser 8 hours ago

                                                PA Semi (Palo Alto Semiconductor) had no relation to HP’s PA-RISC (Precision Architecture RISC).

                                          • AndrewStephens 15 hours ago

                                            I remember when IA-64 was going to be the next big thing and being utterly baffled when the instruction set was made public. Even if you could somehow ship code that efficiently used the weird instruction bundles, there was no indication that future IA-64 CPUs would have the same limits for instruction grouping.

                                            It did make a tiny bit of sense at the time. Java was ascendant and I think Intel assumed that JIT compiled languages were going to dominate the new century and that a really good compiler could unlock performance. It was not to be.

                                            • kragen 13 hours ago

                                              That is not what happened.

                                              EPIC development at HP started in 01989, and the Intel collaboration was publicly announced in 01994. The planned ship date for Merced, the first Itanic, was 01998, and it was first floorplanned in 01996, the year Java was announced. Merced finally taped out in July 01999, three months after the first JIT option for the JVM shipped. Nobody was assuming that JIT compiled languages were going to dominate the new century at that time, although there were some promising signs from Self and Strongtalk that maybe they could be half as fast as C.

                                              • AndrewStephens 11 hours ago

                                                By the time IA-64 actually got close to shipping Intel was certainly talking about JIT being a factor in its success. At least that was mentioned in the marketing guff they were putting out.

                                                • kragen 11 hours ago

                                                  You mean, in 01999? I'd have to see that, because my recollection of that time is that JIT was generally considered unproven (and Java slow). That was 9 years before Chrome shipped the first JavaScript JIT, for example. The only existing commercial products using JIT were Smalltalk implementations like VisualAge, which were also slow. Even HP's "Dynamo" research prototype paper wasn't published until 02000.

                                                  Or do you not count Merced as "shipping"?

                                                  • AndrewStephens 8 hours ago

                                                    Wikipedia tells me that Merced shipped in May 2001, which matches my recollection of not actually seeing a manufacturer’s sample until about then. That box was the largest computer I had ever seen and had so many fans it sounded like an engine. It was also significantly slower than the cheap x86 clones we had on our own desks at running general purpose software.

                                                    JIT compilation was available before but became the default in Java1.3, released a year earlier to incredible hype.

                                                    Source: I was there, man.

                                                    • bombcar 8 hours ago

                                                      Also back then the hype was more important than the reality in many cases. The JIT hype was everywhere and reached an “of course everyone will use it” level, kind of like where AI is right now.

                                            • msla 16 hours ago

                                              "We don't care, we don't have to, we're Intel."

                                              Plus, DEC managed to move all of its VAX users to Alpha through the simple expedient of no longer making VAXen, so I wonder if HP (which by that point had swallowed what used to be DEC) thought it could repeat that trick and sunset x86, which Intel has wanted to do for very nearly as long as the x86 has existed. See also: Intel i860

                                              https://en.wikipedia.org/wiki/Intel_i860

                                              • kruador 11 hours ago

                                                The 8086 was a stop-gap solution until iAPX432 was ready.

                                                The 80286 was a stop-gap solution until iAPX432 was ready.

                                                The 80386 started as a stop-gap solution until iAPX432 was ready, until someone higher up finally decided to kill that one.

                                                • pjc50 11 hours ago

                                                  https://en.wikipedia.org/wiki/Intel_iAPX_432

                                                  I'd never heard of it myself, and reading that Wikipedia page it seems to have been a collection of every possible technology that didn't pan out in IC-language-OS codesign.

                                                  Meanwhile, in Britain a few years later in 1985, a small company and a dedicated engineer, Sophie Wilson, decided that what they needed was a RISC processor that was as plain and straightforward as possible ...

                                            • nayuki 15 hours ago

                                              > The ia64 is a very demanding architecture. In tomorrow’s entry, I’ll talk about some other ways the ia64 will make you pay the penalty when you take shortcuts in your code and manage to skate by on the comparatively error-forgiving i386.

                                              https://devblogs.microsoft.com/oldnewthing/20040120-00/?p=40... "ia64 – misdeclaring near and far data"

                                              https://devblogs.microsoft.com/oldnewthing/2004/01

                                              • jcalvinowens 3 hours ago

                                                At least they made the stack grow in the right direction! Well, half of it, anyway...

                                                  • vardump 20 hours ago

                                                    Pretty surprising. So IA64 registers were 65 bit, with the extra bit describing whether the register contains garbage or not. If NaT (Not a Thing) is set, the register contents are invalid and that can cause "fun" things to happen...

                                                    Not that this matters to anyone anymore. IA64 utterly failed long ago.
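
                                                    A toy model of the idea, in C, assuming hypothetical names (Reg65, spec_load, reg_add, reg_store) and a NULL pointer standing in for "an address that would fault". It only sketches the rule of thumb: a speculative load (ld.s on IA-64) defers its fault by setting NaT, arithmetic propagates NaT, and an ordinary store of a NaT register is where things finally blow up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit general register plus its NaT bit. */
typedef struct {
    uint64_t value;
    bool     nat;    /* true: the value is garbage ("Not a Thing") */
} Reg65;

/* Speculative load: instead of faulting on a bad address (modelled here
   as NULL), record the failure in the NaT bit and carry on. */
static Reg65 spec_load(const uint64_t *addr)
{
    Reg65 r = { 0, false };
    if (addr == NULL)
        r.nat = true;
    else
        r.value = *addr;
    return r;
}

/* Arithmetic propagates NaT: garbage in, garbage out. */
static Reg65 reg_add(Reg65 a, Reg65 b)
{
    Reg65 r = { a.value + b.value, a.nat || b.nat };
    return r;
}

/* Ordinary store: returns false where real hardware would raise a
   NaT consumption fault, and leaves *dst untouched in that case. */
static bool reg_store(Reg65 src, uint64_t *dst)
{
    assert(dst != NULL);
    if (src.nat)
        return false;
    *dst = src.value;
    return true;
}
```

                                                    In this model an uninitialized variable that happens to live in a register with NaT set behaves like the failing reg_store case: merely copying it to memory is enough to trap.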

                                                    • kragen 13 hours ago

                                                      It matters to people designing new hardware and maybe new virtual machine instruction sets.

                                                      • nottorp 7 hours ago

                                                        Or to people caring about their software working on more than just Chrome.

                                                        ... oh wait, on more than x86(64).

                                                      • ashleyn 18 hours ago

                                                        There are modern VLIW architectures. I think Groq uses one. The lessons on what works and what doesn't are worth learning from history.

                                                        • bri3d 18 hours ago

                                                          VLIW works for workloads where the compiler can somewhat accurately predict what will be resident in cache. It’s used everywhere in DSP, was common in GPUs for a while, and is present in lots of niche accelerators. It’s a dead end for situations where cache residency is not predictable, like any kind of multitenant general purpose workload.

                                                          • addaon 18 hours ago

                                                            A more everyday example is the Hexagon DSP ISA in Qualcomm chips. Four-wide VLIW + SMT.

                                                            • 0dyl 16 hours ago

                                                              The new TI C2000 F29 series of microcontrollers are VLIW

                                                              • vardump 18 hours ago

                                                                I meant that narrowly, only about IA64. There is surely some lessons-learned value.

                                                                • msla 16 hours ago

                                                                  IA64 was EPIC, which, itself, was a "lessons learned" VLIW design, in that it had things like stop bits to explicitly demarcate dependency boundaries so instructions from multiple words could be combined on future hardware with more parallelism, and speculative execution and loads, which, well, see the article on how the speculative loads were a mixed blessing.

                                                                  https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...
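
                                                                  For readers who have not seen the encoding: an IA-64 bundle is 128 bits, made of a 5-bit template (which selects the execution unit types and where the stops fall) followed by three 41-bit instruction slots. A rough sketch of packing and unpacking that layout, with hypothetical names (Bundle, bundle_pack, bundle_slot) and no attempt to decode what the template values mean:

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_MASK ((1ULL << 41) - 1)   /* each instruction slot is 41 bits */

/* Hypothetical container: lo holds bundle bits 0..63, hi holds bits 64..127. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Bundle;

/* Layout: template in bits 0..4, slot 0 in 5..45, slot 1 in 46..86,
   slot 2 in 87..127. */
static Bundle bundle_pack(unsigned tmpl, uint64_t s0, uint64_t s1, uint64_t s2)
{
    Bundle b;
    b.lo = (uint64_t)(tmpl & 0x1F)
         | ((s0 & SLOT_MASK) << 5)
         | ((s1 & SLOT_MASK) << 46);    /* low 18 bits of slot 1 */
    b.hi = ((s1 & SLOT_MASK) >> 18)     /* high 23 bits of slot 1 */
         | ((s2 & SLOT_MASK) << 23);
    return b;
}

static unsigned bundle_template(Bundle b)
{
    return (unsigned)(b.lo & 0x1F);
}

/* Extracts slot i (0, 1 or 2); slot 1 straddles the two 64-bit halves. */
static uint64_t bundle_slot(Bundle b, int i)
{
    assert(i >= 0 && i <= 2);
    if (i == 0)
        return (b.lo >> 5) & SLOT_MASK;
    if (i == 1)
        return ((b.lo >> 46) | (b.hi << 18)) & SLOT_MASK;
    return (b.hi >> 23) & SLOT_MASK;
}
```

                                                                  Because the stops live in the template rather than in a fixed issue width, a later chip with more units can in principle keep issuing across bundle boundaries until it hits a stop.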

                                                                • msla 19 hours ago

                                                                  In case someone hasn't heard:

                                                                  https://en.wikipedia.org/wiki/Itanium

                                                                  > In 2019, Intel announced that new orders for Itanium would be accepted until January 30, 2020, and shipments would cease by July 29, 2021.[1] This took place on schedule.[9]

                                                                • ronsor 19 hours ago

                                                                  Yet another reason IA64 was a design disaster.

                                                                  VLIW architectures still live on in GPUs and special purpose (parallel) processors, where these sorts of constraints are more reasonable.

                                                                  • MindSpunk 18 hours ago

                                                                    Are any relevant GPUs VLIW anymore? As far as I'm aware they all dropped it too, moving to scalar ISAs on SIMT hardware. The last VLIW GPU I remember was AMD TeraScale, replaced by GCN where one of the most important architecture changes was dropping VLIW.

                                                                    • nneonneo 19 hours ago

                                                                      I mean, there is a reason why these sorts of constructs are UB, even if they work on popular architectures. The problems aren’t unique to IA64, either; the better solution is to be aware that UB means UB and to avoid it studiously. (Unfortunately, that’s also hard to do in C).

                                                                      • loeg 18 hours ago

                                                                        It's a very weird architecture to have these NAT states representable in registers but not main memory. Register spilling is a common requirement!

                                                                        • amluto 18 hours ago

                                                                          Hah, this is IA-64. It has special hardware support for register spills, and you can search for “NaT bits” here:

                                                                          https://portal.cs.umbc.edu/help/architecture/aig.pdf

                                                                          to discover at least two magical registers to hold up to 127 spilled registers worth of NaT bits. So they tried.

                                                                          The NaT bits are truly bizarre and I’m really not convinced they worked well. I’m not sure what happens to bits that don’t fit in those magic registers. And it’s definitely a mistake to have registers where the register’s value cannot be reliably represented in the common in-memory form of the register. x87 FPU’s 80-bit registers that are usually stored in 64-bit words in memory are another example.
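
                                                                          A simplified sketch of the spill/fill idea, assuming hypothetical names (Reg65, spill, fill) and an explicit bit index. On real hardware the bit position in the collection register is derived from the spill address rather than passed in, so treat this only as an illustration of why a 64-bit memory slot plus one side bit is enough to round-trip a 65-bit register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit register plus its NaT bit. */
typedef struct {
    uint64_t value;
    bool     nat;
} Reg65;

/* Spill: the 64 value bits go to memory, the NaT bit goes into bit
   `index` of a separate 64-bit collection word (standing in for UNAT). */
static void spill(Reg65 r, uint64_t *slot, unsigned index, uint64_t *unat)
{
    assert(index < 64);
    *slot = r.value;
    if (r.nat)
        *unat |= (1ULL << index);
    else
        *unat &= ~(1ULL << index);
}

/* Fill: rebuild the register from the memory slot and the saved bit. */
static Reg65 fill(const uint64_t *slot, unsigned index, uint64_t unat)
{
    assert(index < 64);
    Reg65 r = { *slot, ((unat >> index) & 1) != 0 };
    return r;
}
```

                                                                          The catch is visible even in the toy version: the collection word holds only 64 bits, so it has to be saved somewhere itself once more than 64 slots are live.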

                                                                          • dwattttt 18 hours ago

                                                                            CHERI looks at this and says "64+1 bits? A childish effort", and brings 128+1 to the table.

                                                                            EDIT: to be fair to it, they carry it through to main memory too

                                                                            • amluto 14 hours ago

                                                                              I have no real complaints about CHERI here. What’s a pointer, anyway? Lots of old systems thought it was 8 or 16 bits that give a linear address. 8086 thought it was 16 + 16 bits split among two registers, with some interesting arithmetic [0]. You can’t add, say, 20000 to a pointer and get a pointer to a byte 20000 farther into memory. 80286 changed it so those high bits index into a table, and the actual segment registers are much wider than 16 bits and can’t be read or written directly [1]. Unprivileged code certainly cannot load arbitrary values into a segment register. 80386 added bits. Even x86_64 still technically has those extra segment registers, but they mostly don’t work any more.

                                                                              So who am I to complain if CHERI pointers are even wider and have strange rules? At least you can write a pointer to memory and read it back again.

                                                                              [0] I could be wrong. I’ve hacked on Linux’s v8086 support, but that’s virtual and I never really cared what its effect was in user mode so long as it worked.

                                                                              [1] You can read and write them via SMM entry or using virtualization extensions.
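
                                                                              As a rough illustration of the 8086 arithmetic mentioned above, here is a toy model in C. The struct and function names are made up for this sketch and are not any real API; it only assumes the usual real-mode rule that the linear address is segment * 16 + offset, truncated to 20 bits.

                                                                              ```c
                                                                              #include <stdint.h>

                                                                              /* Toy model of an 8086 real-mode far pointer (hypothetical type). */
                                                                              struct far_ptr {
                                                                                  uint16_t segment;
                                                                                  uint16_t offset;
                                                                              };

                                                                              /* Real-mode translation: segment * 16 + offset, wrapped to 20 bits. */
                                                                              static uint32_t far_to_linear(struct far_ptr p)
                                                                              {
                                                                                  return (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFFu;
                                                                              }

                                                                              /* Arithmetic that only touches the 16-bit offset: it wraps modulo
                                                                                 65536 and the segment never changes. */
                                                                              static struct far_ptr far_add_naive(struct far_ptr p, uint16_t delta)
                                                                              {
                                                                                  p.offset = (uint16_t)(p.offset + delta);
                                                                                  return p;
                                                                              }

                                                                              /* Arithmetic done on the linear address, then split back into a
                                                                                 segment and an offset below 16. */
                                                                              static struct far_ptr far_add_normalized(struct far_ptr p, uint32_t delta)
                                                                              {
                                                                                  uint32_t linear = (far_to_linear(p) + delta) & 0xFFFFFu;
                                                                                  struct far_ptr q = { (uint16_t)(linear >> 4), (uint16_t)(linear & 0xFu) };
                                                                                  return q;
                                                                              }
                                                                              ```

                                                                              Starting from segment 0x1000 and offset 0xF000 (linear 0x1F000), the offset-only add of 20000 wraps and lands at linear 0x13E20 rather than 0x23E20. The same model also shows that different segment:offset pairs can name the same byte, e.g. 0x1234:0x0005 and 0x1000:0x2345.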

                                                                          • Someone 13 hours ago

                                                                            Old-time x86 sort-of has “states representable in registers but not main memory”, too.

                                                                            Compilers used to use its 80-bit floating point registers for 64-bit float computations, but also might spill them to memory as 64-bit float numbers.

                                                                            https://hal.science/hal-00128124v3/file/floating-point.pdf section 3 has some examples, including one where the assert can fail in:

                                                                              #include <assert.h>
                                                                              void do_nothing (double *x);
                                                                              int main (void) {
                                                                                double x = 0x1p-1022, y = 0x1p100, z;
                                                                                do_nothing(&y);
                                                                                z = x / y;
                                                                                if (z != 0) {
                                                                                  do_nothing(&z);
                                                                                  assert(z != 0);
                                                                                }
                                                                              }
                                                                            
                                                                            with

                                                                              void do_nothing (double *x) { }
                                                                            
                                                                            in a different compilation unit.
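
                                                                            The range mismatch behind that failure can also be shown without depending on register allocation. This is only a sketch: the function names are made up, and it assumes `long double` has a wider exponent range than `double` (true for the x87 80-bit format on typical x86 Linux toolchains, but not on every platform).

                                                                            ```c
                                                                            #include <float.h>

                                                                            /* Divide and store the quotient in a 64-bit double object.
                                                                               volatile keeps the compiler from holding it only in a wider register. */
                                                                            static double divide_as_double(double x, double y)
                                                                            {
                                                                                volatile double z = x / y;
                                                                                return z;
                                                                            }

                                                                            /* Same division carried out and stored as long double. */
                                                                            static long double divide_as_extended(double x, double y)
                                                                            {
                                                                                volatile long double z = (long double)x / (long double)y;
                                                                                return z;
                                                                            }
                                                                            ```

                                                                            The exact quotient of 0x1p-1022 / 0x1p100 is 2^-1122, far below the smallest subnormal double (2^-1074), so `divide_as_double` returns 0. Where `long double` is the 80-bit format, `divide_as_extended` returns a nonzero value for the same inputs, which is the value the first `z != 0` test sees when `z` is still in an x87 register.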

                                                                            • mwkaufma 18 hours ago

                                                                              I assume they were stored in an out-of-band mask word

                                                                            • awesome_dude 18 hours ago

                                                                              The bigger problem is that a user cannot avoid an application where someone was writing code with UB, unless they both have the source code, and expertise in understanding it.

                                                                              • eru 12 hours ago

                                                                                Isn't that a general problem?