This is pretty bad. Let's start with the very first instruction:
mov rax, 1
An actual "mov rax, 1" would assemble to 48 B8 01 00 00 00 00 00 00 00, a whopping TEN bytes.nasm will optimize this to the equivalent "mov eax, 1", that's 6 bytes, but still:
xor eax, eax ; 2 bytes
inc eax ; 2 bytes
would be much smaller. Second line: mov rdi, 1
You already have the value 1 in eax, so a "mov edi, eax" (two bytes) would suffice. Etc. etc. push 1
pop rax
is even shorter (credit: https://old.reddit.com/r/programming/comments/q6mnz1/what_is...)I feel like I shouldn't love x86 encoding, but there is something charming about this. Probably echoing its 8-bit predecessors. It seems like it's designed for tiny memory environments (embedded, bootstrapping, etc.) where you don't mind taking a hit for memory access.
I was able to shave off one additional byte with this:
...
xor rax, rax ; = 0
inc rax ; = 1 - syscall: sys_write
mov rdi, rax ; copy 1 - file descriptor: stdout
lea rsi, [rel msg] ; pointer to message
mov rdx, 14 ; message length
syscall
...
$ nasm -f bin -o elf elf.asm; wc -c elf; ./elf
166 elf
Hello, World!
So I guess NASM already optimizes this quite wellHowever, using the stack-based instructions as xpasky hinted at:
...
push 1 ; syscall: sys_write
pop rax
pop rdi ; copy 1 - file descriptor: stdout
lea rsi, [rel msg] ; pointer to message
push 14 ; message length
pop rdx
syscall
...
I get down to 159 bytes! I updated the article to reflect thatThat second snippet is pretty funny:
push 1
pop rax
pop rdi
You can't push a value once and pop it twice, that's not how a stack works! You're popping something else off the stack. So why does this even work?Linux passes your program arguments on the stack, with argc on top. So when you don't pass any arguments, argc just HAPPENS to be 1. Which you then pop into rdi. Gross!
Of course - you are completely right, an oversight in wanting to correct my mistake as quickly as possible.
With that fixed, is there any reason not to use push here?
Yes, because:
push 1 ; 6A 01 (2 bytes)
pop rdi ; 5F (1 byte)
is longer than a simple: mov edi, eax ; 89 C7 (2 bytes)
I think your statement might only apply to 32 bit (one of the constraints mentioned early in the blog post was 64 bit).
But even if it was 32 bit, then we would't have to copy a 1, since the syscall number for sys_write would be 4 instead of 1.
I get the same total size with both variants in 64 bit mode.
push 1
pop rax
mov rdi, rax
Assembling to 48 89 C7 (3 bytes)seems to be same in size as
push 1
pop rax
push 1
pop rdi
Assembling to 6A 01 5F (3 bytes)That's because you're using `mov rdi, rax` again. You keep changing `edi, eax` to `rdi, rax`. Why?
The default operand size in 64-bit mode is, for most instructions, still 32 bits. So `mov edi, eax` encodes the same in 32- and 64-bit mode.
For `mov rdi, rax` you need an extra REX prefix byte [1], that's the 48 you're seeing above, but you don't need it here.
[1] https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefi...
okay, I didn't know that, thanks for the background. I wonder why the assembler would not optimize this though.
I noticed that I then could also shave of one byte more by using lea esi, [rel msg] instead of lea rsi, [rel msg].
should be ... push 1 ; syscall: sys_write pop rax push 1 pop rdi
of course
Linux initializes all general purpose registers to zero. It's not documented AFAIK, but should be reliable - it has to init them to some value anyway to avoid leaking kernel state. So you can get away with:
mov al,1 ;write
mov edi,eax ;handle=stdout
mov esi,msg ;assumes load address below 4G
mov dl,msg.len
syscall
mov al,60 ;assuming syscall succeeded, EAX was bytes written
xor edi,edi
syscall
The load address stays constant unless there's some magic GNU extension header to enable ASLR. If we could get the code loaded below 64K, we could save another byte by using SI instead of ESI; however this doesn't work by default, you'd have to run 'echo 0 > /proc/sys/vm/mmap_min_addr' as root first.Initial register state is documented to be undefined except for rbp, rsp and rdx [1].
Can you say for certain that no other Linux version ever used GPRs to pass something else?
[1] System V ABI, page 29 (last line) and 30, https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf
For certain? No, but I wouldn't expect it. Not sure what that function pointer in rdx is intended for, but Linux doesn't use it.
(Note for pedants: rsp is technically a "general purpose register", but of course it is initialized to point to the userspace stack instead of zero.)
Assuming it is initial zero
inc eax
is a byte shorter than mov al, 1Yes, but only in 32 bit mode. Not that it matters, except for the hypothetical future processor or Linux kernel that is no longer compatible with that :)
Thanks, that makes total sense. I was so focused on the ELF part that I didn't even consider optimizing the initial assembly further. Will fix it and edit the article.
Here's a tiny DOS COM file that does it in 18 bytes:
;; 18 bytes
DB 'HELLO_WOIY<$' ; executes as machine code, returning SP to original position without overwriting return address
mov dx, si ; mov dx,0100h MS-DOS (all versions), FreeDOS 1.0, many other DOSes
xchg ax, bp ; mov ah,9 MS-DOS 4.0 and later, and FreeDOS 1.0
int 21h
ret
(credits: https://stackoverflow.com/questions/72635031/assembly-hello-...)Well, it prints something that is the same length as the correct message at least.
COM files for CP/M and DOS really are a no-nonsense executable format.
I'm a bit disappointed that Linux (or BSD, macOS, etc.) doesn't support them (or similar) out of the box, though Windows will sort of run them via ntvdm.
My favorite language for implementing short Hello World programs in is HQ9+ [1].
Joking aside, this page [2] used to be a great tutorial on writing small ELF binaries, but I'm not sure whether it will still work in 64-bit land. It proved very helpful for writing a 4K intro back in 1999.
[1] https://esolangs.org/wiki/HQ9%2B
[2] https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...
These challenges are funny - they remind me of the old days. Back in the DOS/Windows days, we used to have the .com format, which was perfect for tiny programs. One could even write a program of less than 10 bytes that could actually do something!
We've come a long way since then, and is like, at some point, nobody cared about optimizing executable size anymore
People in the embedded space care. Symbian OS was compiled for small size, only certain parts were allowed use O3, such as the jpeg decoder.
Some people care about executable size, (mostly) everyone else ships Electron apps.
Tell it to my boss, he wants his app last week.
I learned to write COM programs at some point but quickly unlearned it. There were some spots where you can use them and not .bat files, but outside of that it’s a lot.
debug
-a 100
178A:0100 int 19
178A:0102
-r cx
CX 0000
:2
-n reboot.com
-w
Writing 00002 bytes
-q
Some more:
Quick'n'dirty:
.model small
.code
org 100h
start:
int 19h ; Bootstrap loader
end start
More "correct": .model small
.code
org 100h
start:
db 0EAh ; Jump to Power On Self Test - Cold Boot
dw 0,0FFFFh
end start
Even more "correct": .model small
.code
org 100h
start:
mov ah,0Dh
int 21h ; DOS Services ah=function 0Dh
; flush disk buffers to disk
sti ; Enable interrupts
hlt ; Halt processor
mov al,0FEh
out 64h,al ; port 64h, kybd cntrlr functn
; al = 0FEh, pulse CPU reset
end start
Great example, a two bytes reboot utility. From the times when we could turn off the computer with a push of a button without fearing a global catastrophe...
JMP FFFF:0000
INT 13h... uff chills
here's an 80 byte x86_64 linux 'hello world' (okay, not 'Hello world!'). convert to binary with xxd -r -p:
7f454c46488d3537000000ffc7b20eeb03003e00
b001eb1a01000000050000001800000000000000
1800000005000000b03c0f05ebfa380001006865
6c6c0000010068656c6c00006f20776f726c640a
i'm sure this can be improved -- but i could never get any x86_64 linux elf to under 80 bytes. see if you can fit the exclamation point still.Yeah I thought sth like this is possible, but (correct me if I'm wrong) this (ab)uses the ELF header and punts data in there, which goes against my requirement
> It should be a ‘proper‘ executable binary according to the spec
yes, this one conforms to 'whatever linux agrees to exec(2)', which apparently is a lot that is out of spec.
Could a script be a program?
Because it would be much smaller in a bat file than contains :
echo Hello World!
For the purposes of this challenge, no.
> Let’s first establish some rules for our ‘Hello World’ program:
> It should be able to execute directly without passing to any other programs first (so no decompression)
> It should be a ‘proper‘ executable binary according to the spec
Include the shebang but it’s still crazy how big minimal programs are.
php would be just
Hello World
Linking a similar, very popular past example of this: Teensy: https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...
Thank you, I knew I had read somewhere someone put the program in the ELF header itself and got it down to 45 bytes, this is that exact post.
I realize TFA is trying for object code, but for source code, QuickBASIC (and its successors) isn't bad:
? "hello, world!"
PILOT eliminates the quotes: T:hello, world!
Of course a typical REPL (Python, JavaScript, Lisp, etc.) will print out something similar (but often quoted) if you just type the quoted string.And I'm sure there is already some language (call it HELLO) which simply prints "hello, world!" for an empty program.
There are probably some golfing languages out there where an empty program outputs Hello World.
I'm certain there is, but I don't have a reference for it yet other than my imaginary HELLO (Highly Efficient Limited Line Output) language.
Would it be fair to name the program 'Hello World!' and than use argv[0], which is on the stack, to print out 'Hello World!'?
Hehe a nice idea. But then it would not always print Hello World and you cannot execute it directly.
Now, can we make it even smaller applying https://nathanotterness.com/2021/10/tiny_elf_modernized.html ? We shouldn't need the full ELF header...
Oh it has been done: https://nathanotterness.com/2021/10/hello_105.asm
The smallest "hello, world" programs in Rust I did for Arduino are 294 bytes for blink and 388 bytes for hw.
129 byes...supposedly that's 2 punch cards according to Dr gpt. Small program!