Early x86-64 Linux had a similar problem. The x86-64 ABI uses registers for the first 6 arguments. Supporting a variable number of arguments (like printf) requires passing the number of arguments in an extra register (RAX), so that the callee can save the registers to memory for va_arg() and friends. Doing this for every call is too expensive, so it's only done when the prototype is marked as stdarg.
Now the initial gcc implemented this saving to memory with a kind of Duff's device: a computed jump into a block of register-saving instructions, so that only the needed registers were saved. There was no bounds check, so if the argument-count register (RAX) was not initialized correctly it would jump somewhere random based on the junk value, and cause very confusing bug reports.
This bit quite a lot of software that didn't use correct prototypes, calling stdarg functions without indicating that in the prototype. In 32-bit code, which didn't use register arguments, this wasn't a problem.
Later compiler versions switched to saving all registers unconditionally.
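To make the "computed jump into a block of register saves" idea concrete, here is a loose C model of it, not gcc's actual prologue. The function name save_arg_regs and the arrays standing in for registers are made up for illustration. The switch with deliberate fall-through plays the role of the computed jump, and the range check at the top is the part that, per the comment above, the early implementation lacked.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ARG_REGS 6  /* integer argument registers in the SysV x86-64 ABI */

/* Hypothetical model: copy the first `count` entries of regs[] into
 * save_area[] by jumping into the middle of a run of stores, in the
 * style of Duff's device. Returns the number of values saved, or -1
 * when count is out of range. */
int save_arg_regs(const long regs[NUM_ARG_REGS],
                  long save_area[NUM_ARG_REGS], int count)
{
    /* The bounds check: with a junk count, refuse instead of jumping
     * to an arbitrary place. */
    if (count < 0 || count > NUM_ARG_REGS)
        return -1;

    switch (count) {  /* the "computed jump"; every case falls through */
    case 6: save_area[5] = regs[5]; /* fall through */
    case 5: save_area[4] = regs[4]; /* fall through */
    case 4: save_area[3] = regs[3]; /* fall through */
    case 3: save_area[2] = regs[2]; /* fall through */
    case 2: save_area[1] = regs[1]; /* fall through */
    case 1: save_area[0] = regs[0]; /* fall through */
    case 0: break;
    }
    return count;
}
```

Calling save_arg_regs(regs, area, 3) stores only the first three slots, and a garbage count such as 200 is rejected rather than turned into a jump target.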
In the SysV ABI for AMD64 the AL register is used to pass an upper bound on the number of vector registers used, is this related to what you're talking about?
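For reference, a minimal sketch of the caller-side rule being discussed, assuming a made-up variadic function called sum_doubles. The point is only that the "..." must be visible in the prototype at the call site; that is what lets the compiler emit the extra register setup (AL, in the SysV AMD64 ABI) before the call.

```c
#include <assert.h>
#include <stdarg.h>

/* The "..." in this prototype is what tells the caller's compiler to
 * use the variadic calling convention. Calling the function through a
 * declaration without it is undefined behavior. */
double sum_doubles(int count, ...);

/* Hypothetical example function: adds up `count` double arguments. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);  /* doubles travel in vector registers */
    va_end(ap);
    return total;
}
```

With the prototype in scope, a call like sum_doubles(3, 1.0, 2.0, 3.5) is well defined; the failure mode described above comes from call sites that saw no prototype, or a non-variadic one.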
Raymond Chen has a whole "Introduction to IA-64" series of posts on his blog, by the way. It's such an unconventional ISA that I am baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86: it's very poorly suited for general-purpose computations. Number crunching, sure, but anything more freeform, and you stare at the specs and wonder how the hell the designers supposed this thing to be programmed and used.
Itanium only failed because AMD for various reasons was able to come up with AMD64 and rug pull Intel's efforts.
In an alternative universe without AMD64, Intel would have kept pushing Itanium while sorting out its issues, HP-UX was on it, and Windows XP as well.
The first generation was complete garbage. Itanium 2 came too late and never became widespread due to wrong business decisions and marketing. By the time it could have been successful, AMD64 was out. And even then Intel targeted only the same high-end enterprise market segment when they implemented 64-bit on Xeon: https://www.cnet.com/tech/tech-industry/intel-expanding-64-b...
Other way round: the only way any company other than Intel was able to get a new instruction set launched into the PC space was because Intel face-planted so hard with Itanium, and AMD64 was the architecture developers actually wanted to use - just make the registers wider and have more of them, and make it slightly more orthogonal.
Developers get to use the architectures OEM vendors make available to them.
Sure, but the fact remains that AMD64 won in the market, despite the incumbent near-monopoly "Wintel" advantages.
The whole premise is that it only won because AMD exists, and was allowed to come up with it.
People don't read comments before replying?
In that different world, Transmeta would actually succeed in the market of x86-compatible CPUs and, perhaps, would even come up with their own 64-bit extension. Itanium would still flop.
Or maybe, if push came to shove, the desktops would switch to something entirely different like Alpha. Or ARM! Such an event would likely force ARM to come up with their AArch64 several years sooner than it actually happened.
Transmeta wasn't a success story to start with, died before Itanium, and Intel is one of the patent holders.
We can't know for sure, but my guess is that Itanium still could have failed. I could imagine an alternative universe where, even with HP-UX and WinXP running on it, no one wanted to deal with porting their application software. And its emulation of 32-bit code (both in hardware and in software) was atrocious, so running existing, unported code wouldn't really take off either.
Eventually Intel gives up after motherboard/desktop/laptop makers can't build a proper market for it. Maybe Intel then decides to go back and do something similar to what AMD did with x86_64. Maybe Intel just gives up on 64-bit and tries to convince people it's not necessary, but then starts losing market share to other companies with viable 64-bit ISAs, like IBM's POWER64 or Sun's SPARC64 or whatever.
Obviously we can't know, but I think my scenario is at least as likely as yours.
Suggested: https://en.wikipedia.org/wiki/Itanium#History
With how long it took Intel to ship expensive, incompatible, so-so performance ia64 chips - your theory needs an alternate universe where Intel has no competitors, ever, to take advantage of the obvious market opportunity.
I don't need suggestions for a time I lived through; I have been in computers since the 1980s.
Without AMD, there was no alternative in the PC world. Itanium already had the first 64-bit version of Windows XP.
Since we're providing suggestions in computing history, I assume you can follow the dates,
https://en.wikipedia.org/wiki/Windows_XP_editions#Windows_XP...
> Without AMD, ...
Perhaps? I don't know enough to judge whether one of the other companies working on IA-32 compatible processors could plausibly have stepped in -
https://en.wikipedia.org/wiki/List_of_former_IA-32_compatibl...
It's true that most of those would have lacked the resources to replicate AMD's feat with AMD64. OTOH, AMD itself had to buy out NexGen to produce their K6. Without AMD and/or AMD64, there'd be plenty of larger players who might decide to fill the void.
It was also an era where people were happily staying on PAE 32-bit x86 rather than pay the price and performance premium for Itanium.
4 GB of RAM existed, but many, many systems weren’t even close to it yet.
Some guesses here:
First off, Itanium was definitely meant to be the 64-bit successor to x86 (that's why it's called IA-64 after all), and moving from 32-bit to 64-bit would absolutely have been a killer feature. It's basically only after the underwhelming launch of Itanium that AMD comes out with AMD64, which becomes the actual 64-bit version of x86; once that comes out, the 64-bitness of Itanium is no longer a differentiation.
Second... given that Itanium basically implements every weird architecture feature you've ever heard of, my guess is that they decided they had the resources to make all of this stuff work. And they got into a bubble where they just simply ignored any countervailing viewpoints anytime someone brought up a problem. (This does seem to be a particular specialty of Intel.)
Third, there's definitely a baseline assumption of a sufficiently-smart compiler. And my understanding is that the Intel compiler was actually halfway decent at Itanium, whereas gcc was absolute shit at it. So while some aspects of the design are necessarily inferior (a sufficiently-smart compiler will never be as good as hardware at scavenging ILP, hardware architects, so please stop trying to foist that job on us compiler writers), it actually did do reasonably well on performance in the HPC sector.
It appeared to me (from far outside) that Intel was trying to segment the market into "Affordable home and office PCs with x86" and "Expensive serious computing with Itanium". Having everything so different was a feature, to justify the eye-wateringly expensive Itanium price tag.
Seems shortsighted (I'm not saying you're wrong, I can imagine Intel being shortsighted). Surely the advantage of artificial segmentation is that it's artificial: you don't double up the R&D costs.
The same trick they pulled again with AVX512 and ECC support later on.
And the same reason NVRAM was dead on arrival. No affordable dev systems meant that only enterprise software supported it.
The IBM PS/2 play. And we all know how well that one worked out.
I'm sure it worked out for many bosses. They got their bonuses and promotions and someone else got to clean up the mess.
They took technical risks that didn't pan out. They thought they'd be able to solve whatever problems they ran into, but they couldn't. They didn't know ahead of time that the result was going to suck. If you try to run an actual tech company, like Intel, without taking any technical risks, competitors who do take technical risks will leave you in the dust.
This doesn't apply to fake tech companies like AirBnB, Dropbox, and Stripe, and if you've spent your career at fake tech companies, your intuition is going to be "off" on this point.
They also aimed at what turned out to be the wrong target: When Itanium was conceived, high-performance CPUs were for technical applications like CAD and physics simulation. Raw floating point throughput was what mattered. And Itanium ended up pretty darn good at that.
But between conception and delivery, the web took over the world. Branchy integer code was now the dominant server workload & workstations were getting crowded out of their niche by the commodity economics of x86.
Thanks for this comment - that's a beautiful perspective I hadn't considered before. A clean and simple definition of technology as everything that increases human productivity.
Now I can finally explain why some "tech" jobs feel like they're just not moving the needle.
Computer hardware isn't the only 'tech' that exists, you know?
Problems in operations research (like logistics) or fraud detection can be just as technical.
Fraud detection is a Red Queen's race. If the amount of resources that goes into fraud detection and fraud commission grows by 10×, 100×, 1000×, the resulting increase in human capacities and improvement in human welfare will be nil. It may be technically challenging but it isn't technology.
Operations research is technology, but Uber isn't Gurobi, which is a real tech company like Intel, however questionable their ethics may be.
> It may be technically challenging but it isn't technology.
This feels like a distinction without a difference based on whether kragen thinks something is hardcore enough to count?
No, as I explained, it's based on the resulting increase in human capacities and improvement in human welfare. Technology is a collaborative, progressive endeavor in which we advance a skill (techne), generation by generation, through discourse (logos).
Fraud detection can be (and is) extremely hardcore, but it isn't progressive in that way. It's largely tradecraft. Consequently its relationship to novelty and technical risk is fundamentally different.
> Operations research is technology, but Uber isn't Gurobi, [...]
Intel isn't ASML, either. They merely use their products. So what?
Presumably Gurobi doesn't write their own compilers or fab their own chips. It's turtles all the way down.
> Fraud detection is a Red Queen's race. If the amount of resources that goes into fraud detection and fraud commission grows by 10×, 100×, 1000×, the resulting increase in human capacities and improvement in human welfare will be nil. It may be technically challenging but it isn't technology.
By that logic no military anywhere uses any technology? Nor is there any technology in Formula 1 cars?
"So what" is that Intel is making things ASML can't, things nobody has done before, and they have to try things that might not work in order to make things nobody yet knows how to make. Just to survive, they have to do things experts believe to be impossible.
AirBnB isn't doing that; they're just booking hotel rooms. Their competitive moat consists of owning a two-sided marketplace and political maneuvering to legalize it. That's very valuable, but it's not the same kind of business as Intel or Gurobi.
Nuclear weapons are certainly a case that tests the category of "technology" and which, indeed, sparked widespread despair and abandonment of progressivism: they increase human capabilities, but probably don't improve human welfare. But I don't think that categories become meaningless simply because they have fuzzy edges.
> It's such an unconventional ISA that I am baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86 [...]
I don't know, most people don't care about the ISA being weird as long as the compiler produces reasonably fast code?
> baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86
They did persuade SGI, DEC and HP to switch from their RISCs to it though. Which turned out to be rather good for business.
I suspect SGI and DEC / Compaq could look at a chart and see that with P6 Intel was getting very close to their RISC chips, through the power of MONEY (simplification). They weren't hitting a CISC wall, and the main moat custom RISC had left was 64 bit. Intel's 64 bit chip would inevitably become the standard chip for PCs, and therefore Intel would be able to turn its money cannon onto overpowering all 64 bit RISCs in short order. May as well get aboard the 64 bit Intel train early.
Which is nearly true: 64-bit Intel chips did (mostly) kill RISC. But not with their (and HP's) fun science project IA64; they had to copy AMD's "what if x86, but 64 bit?" idea instead.
SGI and DEC, yes, but HP? Itanium was HP's idea all along! [1]
Well, they did persuade HP to ditch their own homegrown PA-RISC architecture and jump on board with Itanium, so there's that. I wonder how much that decision contributed to the eventual demise of HP's high performance server division ...
A lot, I think. PA-RISC had a lot going for it, high performance, solid ISA, even some low-end consumer grade parts (not to the same degree as PowerPC but certainly more so than, say, SPARC). It could have gone much farther than it did.
Not that HP was the only one to lose their minds over Itanic (SGI in particular), but I thought they were the ones who walked away from the most.
Am I right in thinking that the old PA-Semi team was bought by Apple, and are substantially responsible for the success of the M-series parts?
Acquiring P.A. Semi got them Dan Dobberpuhl and Jim Keller, which laid a good design foundation. However, IMO, I'd lean towards these as the decisive factors today:
1) Apple's financial firepower allowing them to book out SOTA process nodes
2) Apple being less cost-sensitive in their designs vs. Qualcomm or Intel. Since Apple sells devices, they can justify 'expensive' decisions like massive caches that require significantly more die area.
They also had years to keep improving the iPhone chips until they were so good at power efficiency that they could slap it into a laptop.
That’s much better than a decade of development with no product yet.
PA Semi (Palo Alto Semiconductor) had no relation to HP’s PA-RISC (Precision Architecture RISC).
I remember when IA-64 was going to be the next big thing and being utterly baffled when the instruction set was made public. Even if you could somehow ship code that efficiently used the weird instruction bundles, there was no indication that future IA-64 CPUs would have the same limits for instruction grouping.
It did make a tiny bit of sense at the time. Java was ascendant and I think Intel assumed that JIT compiled languages were going to dominate the new century and that a really good compiler could unlock performance. It was not to be.
That is not what happened.
EPIC development at HP started in 01989, and the Intel collaboration was publicly announced in 01994. The planned ship date for Merced, the first Itanic, was 01998, and it was first floorplanned in 01996, the year Java was announced. Merced finally taped out in July 01999, three months after the first JIT option for the JVM shipped. Nobody was assuming that JIT compiled languages were going to dominate the new century at that time, although there were some promising signs from Self and Strongtalk that maybe they could be half as fast as C.
By the time IA-64 actually got close to shipping Intel was certainly talking about JIT being a factor in its success. At least that was mentioned in the marketing guff they were putting out.
You mean, in 01999? I'd have to see that, because my recollection of that time is that JIT was generally considered unproven (and Java slow). That was 9 years before Chrome shipped the first JavaScript JIT, for example. The only existing commercial products using JIT were Smalltalk implementations like VisualAge, which were also slow. Even HP's "Dynamo" research prototype paper wasn't published until 02000.
Or do you not count Merced as "shipping"?
Wikipedia tells me that Merced shipped in May 2001, which matches my recollection of not actually seeing a manufacturer’s sample until about then. That box was the largest computer I had ever seen and had so many fans it sounded like an engine. It was also significantly slower than the cheap x86 clones we had on our own desks at running general purpose software.
JIT compilation was available before but became the default in Java 1.3, released a year earlier to incredible hype.
Source: I was there, man.
Also back then the hype was more important than the reality in many cases. The JIT hype was everywhere and reached an “of course everyone will use it” level, kind of like where AI is right now.
"We don't care, we don't have to, we're Intel."
Plus, DEC managed to move all of its VAX users to Alpha through the simple expedient of no longer making VAXen, so I wonder if HP (which by that point had swallowed what used to be DEC) thought it could repeat that trick and sunset x86, which Intel has wanted to do for very nearly as long as the x86 has existed. See also: Intel i860
The 8086 was a stop-gap solution until iAPX432 was ready.
The 80286 was a stop-gap solution until iAPX432 was ready.
The 80386 started as a stop-gap solution until iAPX432 was ready, until someone higher up finally decided to kill that one.
https://en.wikipedia.org/wiki/Intel_iAPX_432
I'd never heard of it myself, and reading that Wikipedia page it seems to have been a collection of every possible technology that didn't pan out in IC-language-OS codesign.
Meanwhile, in Britain a few years later in 1985, a small company and a dedicated engineer, Sophie Wilson, decided that what they needed was a RISC processor that was as plain and straightforward as possible ...
> The ia64 is a very demanding architecture. In tomorrow’s entry, I’ll talk about some other ways the ia64 will make you pay the penalty when you take shortcuts in your code and manage to skate by on the comparatively error-forgiving i386.
https://devblogs.microsoft.com/oldnewthing/20040120-00/?p=40... "ia64 – misdeclaring near and far data"
At least they made the stack grow in the right direction! Well, half of it, anyway...
Pretty surprising. So IA64 registers were 65 bit, with the extra bit describing whether the register contains garbage or not. If NaT (Not a Thing) is set, the register contents are invalid and that can cause "fun" things to happen...
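A toy C model of what that extra bit does, for anyone who hasn't seen it. Everything here (the reg65 type, the function names, using NULL to stand in for "an address that would fault") is invented for illustration and is not an IA-64 API. The behavior modeled is the general one described for the architecture: a speculative load defers its fault by setting NaT, arithmetic propagates NaT, and a non-speculative use of a NaT register is where the fault finally lands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one general register: 64 data bits plus NaT. */
typedef struct {
    uint64_t value;
    bool nat;  /* "Not a Thing": the value bits are meaningless */
} reg65;

/* Model of a speculative load. A bad address (NULL in this sketch)
 * produces a NaT register instead of faulting right away. */
reg65 speculative_load(const uint64_t *addr)
{
    reg65 r = {0, false};
    if (addr == NULL)
        r.nat = true;
    else
        r.value = *addr;
    return r;
}

/* Arithmetic propagates NaT: garbage in, garbage out. */
reg65 reg_add(reg65 a, reg65 b)
{
    reg65 r;
    r.nat = a.nat || b.nat;
    r.value = r.nat ? 0 : a.value + b.value;
    return r;
}

/* Model of a non-speculative consumer such as an ordinary store.
 * Returns false where real hardware would raise a NaT consumption
 * fault, and leaves *dest untouched in that case. */
bool store_checked(reg65 r, uint64_t *dest)
{
    if (r.nat)
        return false;
    *dest = r.value;
    return true;
}
```

The "fun" part is that the fault shows up at the consumer, possibly far from the load that actually went wrong, which is why an uninitialized register that happens to hold a NaT can blow up code that would have skated by on x86.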
Not that this matters to anyone anymore. IA64 utterly failed long ago.
It matters to people designing new hardware and maybe new virtual machine instruction sets.
Or to people caring about their software working on more than just Chrome.
... oh wait, on more than x86(64).
There are modern VLIW architectures. I think Groq uses one. The lessons on what works and what doesn't are worth learning from history.
VLIW works for workloads where the compiler can somewhat accurately predict what will be resident in cache. It’s used everywhere in DSP, was common in GPUs for a while, and is present in lots of niche accelerators. It’s a dead end for situations where cache residency is not predictable, like any kind of multitenant general purpose workload.
A more everyday example is the Hexagon DSP ISA in Qualcomm chips. Four-wide VLIW + SMT.
The new TI C2000 F29 series of microcontrollers are VLIW
I meant that narrowly, only about IA64. There is surely some lessons-learned value.
IA64 was EPIC, which, itself, was a "lessons learned" VLIW design, in that it had things like stop bits to explicitly demarcate dependency boundaries so instructions from multiple words could be combined on future hardware with more parallelism, and speculative execution and loads, which, well, see the article on how the speculative loads were a mixed blessing.
https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...
In case someone hasn't heard:
https://en.wikipedia.org/wiki/Itanium
> In 2019, Intel announced that new orders for Itanium would be accepted until January 30, 2020, and shipments would cease by July 29, 2021.[1] This took place on schedule.[9]
Yet another reason IA64 was a design disaster.
VLIW architectures still live on in GPUs and special purpose (parallel) processors, where these sorts of constraints are more reasonable.
Are any relevant GPUs VLIW anymore? As far as I'm aware they all dropped it too, moving to scalar ISAs on SIMT hardware. The last VLIW GPU I remember was AMD TeraScale, replaced by GCN where one of the most important architecture changes was dropping VLIW.
I mean, there is a reason why these sorts of constructs are UB, even if they work on popular architectures. The problems aren’t unique to IA64, either; the better solution is to be aware that UB means UB and to avoid it studiously. (Unfortunately, that’s also hard to do in C).
It's a very weird architecture to have these NaT states representable in registers but not main memory. Register spilling is a common requirement!
Hah, this is IA-64. It has special hardware support for register spills, and you can search for “NaT bits” here:
https://portal.cs.umbc.edu/help/architecture/aig.pdf
to discover at least two magical registers to hold up to 127 spilled registers worth of NaT bits. So they tried.
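Roughly, the spill mechanism works like the sketch below: the 64 value bits go to an ordinary memory word, and the NaT bit is collected into a separate 64-bit mask register. This is a simplified model under my own assumptions. The spill_area struct and the spill/fill names are hypothetical, and the slot index stands in for the address bits the hardware uses to pick a position in the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one general register: 64 data bits plus NaT. */
typedef struct {
    uint64_t value;
    bool nat;
} reg65;

/* Hypothetical spill area: plain 64-bit words plus one mask word that
 * collects the NaT bit of each spilled register. */
typedef struct {
    uint64_t slots[64];
    uint64_t unat;  /* bit i holds the NaT bit for slots[i] */
} spill_area;

/* Spill: the value goes to memory, the NaT bit goes to the mask. */
void spill(spill_area *m, unsigned slot, reg65 r)
{
    assert(slot < 64);
    m->slots[slot] = r.value;
    if (r.nat)
        m->unat |= UINT64_C(1) << slot;
    else
        m->unat &= ~(UINT64_C(1) << slot);
}

/* Fill: rebuild the 65-bit register from the word and its mask bit. */
reg65 fill(const spill_area *m, unsigned slot)
{
    reg65 r;
    assert(slot < 64);
    r.value = m->slots[slot];
    r.nat = ((m->unat >> slot) & 1) != 0;
    return r;
}
```

The sketch stops at 64 slots on purpose: past that point the mask word itself has to be saved somewhere before it can be reused, which is bookkeeping this model leaves out.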
The NaT bits are truly bizarre and I’m really not convinced they worked well. I’m not sure what happens to bits that don’t fit in those magic registers. And it’s definitely a mistake to have registers where the register’s value cannot be reliably represented in the common in-memory form of the register. x87 FPU’s 80-bit registers that are usually stored in 64-bit words in memory are another example.
CHERI looks at this and says "64+1 bits? A childish effort", and brings 128+1 to the table.
EDIT: to be fair to it, they carry it through to main memory too
I have no real complaints about CHERI here. What’s a pointer, anyway? Lots of old systems thought it was 8 or 16 bits that give a linear address. 8086 thought it was 16 + 16 bits split among two registers, with some interesting arithmetic [0]. You can’t add, say, 20000 to a pointer and get a pointer to a byte 20000 farther into memory. 80286 changed it so those high bits index into a table, and the actual segment registers are much wider than 16 bits and can’t be read or written directly [1]. Unprivileged code certainly cannot load arbitrary values into a segment register. 80386 added bits. Even x86_64 still technically has those extra segment registers, but they mostly don’t work any more.
So who am I to complain if CHERI pointers are even wider and have strange rules? At least you can write a pointer to memory and read it back again.
[0] I could be wrong. I’ve hacked on Linux’s v8086 support, but that’s virtual and I never really cared what its effect was in user mode so long as it worked.
[1] You can read and write them via SMM entry or using virtualization extensions.
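The 8086 "interesting arithmetic" can be sketched in a few lines of C. The far_ptr struct and helper names are made up for the example; the address formula (segment times 16 plus offset, truncated to 20 bits) is the standard real-mode rule.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation of an 8086 far pointer. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* Real-mode address formation: segment * 16 + offset, on a 20-bit bus. */
uint32_t linear_address(far_ptr p)
{
    return (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFF;
}

/* Naive pointer arithmetic only touches the 16-bit offset, so it wraps
 * around inside the 64 KiB segment instead of carrying into the segment. */
far_ptr add_offset(far_ptr p, uint16_t delta)
{
    p.offset = (uint16_t)(p.offset + delta);
    return p;
}
```

For example, starting from 1000:F000 (linear 0x1F000) and adding 20000 to the offset gives 1000:3E20, which is linear 0x13E20, lower in memory than where it started. Different segment:offset pairs can also name the same byte: 1000:0010 and 1001:0000 both resolve to 0x10010.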
Old-time x86 sort-of has “states representable in registers but not main memory”, too.
Compilers used to use its 80-bit floating point registers for 64-bit float computations, but also might spill them to memory as 64-bit float numbers.
https://hal.science/hal-00128124v3/file/floating-point.pdf section 3 has some examples, including one where the assert can fail in:
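  #include <assert.h>

  void do_nothing(double *x);  /* defined in another compilation unit */

  int main(void) {
    double x = 0x1p-1022, y = 0x1p100, z;
    do_nothing(&y);
    z = x / y;  /* nonzero in an 80-bit x87 register, but underflows to 0 as a 64-bit double */
    if (z != 0) {
      do_nothing(&z);  /* taking the address can force z out to memory as a 64-bit double */
      assert(z != 0);  /* may now fail */
    }
  }
with void do_nothing(double *x) { } in a different compilation unit.
I assume they were stored in an out-of-band mask word.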
The bigger problem is that a user cannot avoid an application where someone was writing code with UB, unless they both have the source code, and expertise in understanding it.
Isn't that a general problem?