I agree with the article, but I think it could go further. Instead of having primitives for every 32/48/64/122-bit block, we need good format-preserving encryption. Then all of this advice boils down to "use as many bits as you need", and we can keep using the standard primitives with hardware support. If you need more security in the future, you only need to decrypt and re-encrypt with the new size.
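To make "use as many bits as you need" concrete, here is a toy sketch of one textbook way to get a format-preserving permutation: a balanced Feistel network with an HMAC-SHA256 round function, plus cycle-walking to stay inside an arbitrary integer range. This is not FF1/FF3 or any vetted scheme. The names `fpe_encrypt`, `fpe_decrypt`, and `ROUNDS` are made up for illustration, and the round count is an arbitrary assumption, not a security recommendation.

```python
import hashlib
import hmac

ROUNDS = 8  # illustrative assumption, not a vetted security parameter


def _half_bits(domain: int) -> int:
    # Smallest even bit width 2*h such that 2**(2*h) >= domain.
    bits = max(2, (domain - 1).bit_length())
    return (bits + 1) // 2


def _round(key: bytes, rnd: int, half: int, half_bits: int) -> int:
    # Round function: HMAC-SHA256 truncated to half_bits (supports half_bits <= 64).
    msg = bytes([rnd]) + half.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)


def _permute(key: bytes, value: int, half_bits: int, decrypt: bool) -> int:
    # One pass of the Feistel network over a 2*half_bits-bit value.
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    order = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    for rnd in order:
        if decrypt:
            left, right = right ^ _round(key, rnd, left, half_bits), left
        else:
            left, right = right, left ^ _round(key, rnd, right, half_bits)
    return (left << half_bits) | right


def _walk(key: bytes, value: int, domain: int, decrypt: bool) -> int:
    if not 0 <= value < domain:
        raise ValueError("value outside domain")
    h = _half_bits(domain)
    out = _permute(key, value, h, decrypt)
    # Cycle-walking: re-apply the permutation until we land back in range.
    while out >= domain:
        out = _permute(key, out, h, decrypt)
    return out


def fpe_encrypt(key: bytes, value: int, domain: int) -> int:
    return _walk(key, value, domain, decrypt=False)


def fpe_decrypt(key: bytes, value: int, domain: int) -> int:
    return _walk(key, value, domain, decrypt=True)


# A 6-digit serial number stays a 6-digit serial number.
ct = fpe_encrypt(b"demo key", 123456, 10**6)
print(f"{ct:06d}", fpe_decrypt(b"demo key", ct, 10**6))
```

Moving to a bigger size later would just mean decrypting with the old `domain` and encrypting with the new one, which is the migration step described above.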
Small sizes have to be used with extra care, so I wouldn't want to make a generic function for all sizes. For bigger sizes we already have nice functions that take care of everything.
Are you suggesting a very large custom blocksize? I don't think this would be feasible beyond a few megabytes.
Nowadays even many small microcontrollers get AES acceleration, so I don't see much reason
Basically all of the use cases in the article don't make sense with AES. That's not because it's AES. That's because its blocks are significantly larger than the data you want to protect. That's the point the article was making: in very specific circumstances, there is practical value in having the cipher output be small.
If you want to encrypt a serial number, you don't want the output to be 256 bits.
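Rough arithmetic for the size blow-up, as a sketch. The 32-bit serial number, the 96-bit GCM nonce, and the full 128-bit tag are my assumptions for illustration; the 128-bit AES block size is fixed regardless of key length.

```python
AES_BLOCK_BYTES = 16   # AES block size is 128 bits for every key length
GCM_NONCE_BYTES = 12   # assumption: the commonly used 96-bit nonce
GCM_TAG_BYTES = 16     # assumption: full-length 128-bit authentication tag


def padded_block_bits(plaintext_bytes: int) -> int:
    # PKCS#7-style padding always adds 1..16 bytes, rounding up to whole blocks.
    blocks = plaintext_bytes // AES_BLOCK_BYTES + 1
    return blocks * AES_BLOCK_BYTES * 8


def gcm_message_bits(plaintext_bytes: int) -> int:
    # Nonce + ciphertext (same length as the plaintext) + tag.
    return (GCM_NONCE_BYTES + plaintext_bytes + GCM_TAG_BYTES) * 8


serial_bytes = 4  # hypothetical 32-bit serial number
print(padded_block_bits(serial_bytes))  # 128
print(gcm_message_bits(serial_bytes))   # 256
```

So a 32-bit value becomes at least 128 bits with a bare padded block, and 256 bits once you carry a nonce and a tag along with it.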
Slightly unrelated, but aren't these AES-specific custom CPU instructions just a way to easily collect the encryption keys? There is a speedup but is it worth the risks?
If I were a nation-state actor, I'd just store the encryption keys supplied to the AES CPU instruction somewhere and, if the data ever needs to be accessed, read the stored keys back.
No need to waste time deploying backdoored CPU firmware, waiting for days or weeks, and then touching the hardware a second time to extract the information.
When all AES encryption keys are already stored somewhere on the CPU, you can easily do a drive-by readout at any point in time.
The Linux kernel has a compile-time flag to disable the use of custom CPU instructions for encryption, but this can't be done at runtime. If "software encryption" is used, the nation-state actor needs to physically access the device at least twice, or use a network-based exploit, which could be logged.