Comments Page - The Case of the Missing Increment

dzaima 2 days ago
uops.info's measurements show 'inc r64', interleaved with 'movsxd' instructions, still having zero latency [0], so it can't just be merging the immediates of successive increments (or there's additional fusion happening). Plain unrolled 'inc r64' shows an average latency of 0.2 cycles, i.e. 5 dependent ops per cycle, and 0.2 used ports per instruction [1].
The same holds for 'lea r64, [r64+8]' (imm8), 'lea r64, [r64+128]' (imm32), and 'add r64, 2' (imm8), but not for 'add r64, 0x1000000' (imm32).
[0]: https://uops.info/html-lat/ADL-P/INC_R64-Measurements.html
[1]: https://uops.info/html-tp/ADL-P/INC_R64-Measurements.html
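For anyone wondering why +8 is labelled imm8 but +128 is imm32: the one-byte form is sign-extended, so it only covers -128..127. Here is a rough sketch of that classification. The helper names `fits_signed` and `immediate_class` are made up for illustration, and strictly speaking the 'lea' constants are displacements (disp8/disp32) rather than immediates. This only models encoding width and says nothing about what the renamer actually does.

```python
def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is representable as a signed integer of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi


def immediate_class(value: int) -> str:
    """Smallest x86-64 constant width (hypothetical labels) that can hold value."""
    if fits_signed(value, 8):
        return "imm8"
    if fits_signed(value, 32):
        return "imm32"
    return "too large"  # no single sign-extended 32-bit constant can hold it


# The four constants mentioned in the comment above.
for constant in (8, 128, 2, 0x1000000):
    print(f"{constant:#x}: {immediate_class(constant)}")
```

Note that both 128 and 0x1000000 land in the imm32 bucket, so going by the numbers quoted above, encoding width alone doesn't seem to separate the fast cases from the slow one.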