• bd01 5 days ago

    This is pretty bad. Let's start with the very first instruction:

      mov rax, 1
    
    An actual "mov rax, 1" would assemble to 48 B8 01 00 00 00 00 00 00 00, a whopping TEN bytes.

    nasm will optimize this to the equivalent "mov eax, 1", that's 6 bytes, but still:

      xor eax, eax ; 2 bytes
      inc eax      ; 2 bytes
    
    would be much smaller. Second line:

      mov rdi, 1
    
    You already have the value 1 in eax, so a "mov edi, eax" (two bytes) would suffice. Etc. etc.
    • xpasky 5 days ago

        push 1
        pop rax
      
      is even shorter (credit: https://old.reddit.com/r/programming/comments/q6mnz1/what_is...)
      • musicale 5 days ago

        I feel like I shouldn't love x86 encoding, but there is something charming about this. Probably echoing its 8-bit predecessors. It seems like it's designed for tiny memory environments (embedded, bootstrapping, etc.) where you don't mind taking a hit for memory access.

      • michidk 4 days ago

        I was able to shave off one additional byte with this:

          ...
          xor rax, rax       ; = 0
          inc rax            ; = 1 - syscall: sys_write
          mov rdi, rax       ; copy 1 - file descriptor: stdout
          lea rsi, [rel msg] ; pointer to message
          mov rdx, 14        ; message length
          syscall
          ...
        
          $ nasm -f bin -o elf elf.asm; wc -c elf; ./elf
          166 elf
          Hello, World!
        
        So I guess NASM already optimizes this quite well

        However, using the stack-based instructions as xpasky hinted at:

          ...
          push 1             ; syscall: sys_write
          pop rax
          pop rdi       ; copy 1 - file descriptor: stdout
          lea rsi, [rel msg] ; pointer to message
          push 14            ; message length
          pop rdx
          syscall
          ...
        
        I get down to 159 bytes! I updated the article to reflect that
        • bd01 4 days ago

          That second snippet is pretty funny:

            push 1
            pop rax
            pop rdi
          
          You can't push a value once and pop it twice, that's not how a stack works! You're popping something else off the stack. So why does this even work?

          Linux passes your program arguments on the stack, with argc on top. So when you don't pass any arguments, argc just HAPPENS to be 1. Which you then pop into rdi. Gross!

          • michidk 3 days ago

            Of course - you are completely right, an oversight in wanting to correct my mistake as quickly as possible.

            With that fixed, is there any reason not to use push here?

            • bd01 3 days ago

              Yes, because:

                push 1       ; 6A 01 (2 bytes)
                pop rdi      ; 5F    (1 byte)
              
              is longer than a simple:

                mov edi, eax ; 89 C7 (2 bytes)
              • michidk 2 days ago

                I think your statement might only apply to 32 bit (one of the constraints mentioned early in the blog post was 64 bit).

                But even if it was 32 bit, then we would't have to copy a 1, since the syscall number for sys_write would be 4 instead of 1.

                I get the same total size with both variants in 64 bit mode.

                  push 1
                  pop rax
                  mov rdi, rax
                
                Assembling to 48 89 C7 (3 bytes)

                seems to be same in size as

                  push 1
                  pop rax
                  push 1
                  pop rdi
                
                Assembling to 6A 01 5F (3 bytes)
                • bd01 2 days ago

                  That's because you're using `mov rdi, rax` again. You keep changing `edi, eax` to `rdi, rax`. Why?

                  The default operand size in 64-bit mode is, for most instructions, still 32 bits. So `mov edi, eax` encodes the same in 32- and 64-bit mode.

                  For `mov rdi, rax` you need an extra REX prefix byte [1], that's the 48 you're seeing above, but you don't need it here.

                  [1] https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefi...

                  • michidk 2 days ago

                    okay, I didn't know that, thanks for the background. I wonder why the assembler would not optimize this though.

                    I noticed that I then could also shave of one byte more by using lea esi, [rel msg] instead of lea rsi, [rel msg].

          • michidk 3 days ago

            should be ... push 1 ; syscall: sys_write pop rax push 1 pop rdi

            of course

          • rep_lodsb 5 days ago

            Linux initializes all general purpose registers to zero. It's not documented AFAIK, but should be reliable - it has to init them to some value anyway to avoid leaking kernel state. So you can get away with:

                mov     al,1       ;write
                mov     edi,eax    ;handle=stdout
                mov     esi,msg    ;assumes load address below 4G
                mov     dl,msg.len
                syscall
                mov     al,60      ;assuming syscall succeeded, EAX was bytes written
                xor     edi,edi
                syscall
            
            The load address stays constant unless there's some magic GNU extension header to enable ASLR. If we could get the code loaded below 64K, we could save another byte by using SI instead of ESI; however this doesn't work by default, you'd have to run 'echo 0 > /proc/sys/vm/mmap_min_addr' as root first.
            • bd01 5 days ago

              Initial register state is documented to be undefined except for rbp, rsp and rdx [1].

              Can you say for certain that no other Linux version ever used GPRs to pass something else?

              [1] System V ABI, page 29 (last line) and 30, https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

              • rep_lodsb 5 days ago

                For certain? No, but I wouldn't expect it. Not sure what that function pointer in rdx is intended for, but Linux doesn't use it.

                (Note for pedants: rsp is technically a "general purpose register", but of course it is initialized to point to the userspace stack instead of zero.)

              • retrac 5 days ago

                Assuming it is initial zero

                   inc eax
                
                is a byte shorter than mov al, 1
                • rep_lodsb 5 days ago

                  Yes, but only in 32 bit mode. Not that it matters, except for the hypothetical future processor or Linux kernel that is no longer compatible with that :)

              • michidk 5 days ago

                Thanks, that makes total sense. I was so focused on the ELF part that I didn't even consider optimizing the initial assembly further. Will fix it and edit the article.

              • Tepix 5 days ago

                Here's a tiny DOS COM file that does it in 18 bytes:

                    ;; 18 bytes
                    DB 'HELLO_WOIY<$'  ; executes as machine code, returning SP to original position without overwriting return address
                    
                    mov  dx, si    ; mov dx,0100h MS-DOS (all versions), FreeDOS 1.0, many other DOSes
                    xchg ax, bp    ; mov ah,9     MS-DOS 4.0 and later, and FreeDOS 1.0
                    int  21h
                    ret
                
                (credits: https://stackoverflow.com/questions/72635031/assembly-hello-...)
                • rep_lodsb 5 days ago

                  Well, it prints something that is the same length as the correct message at least.

                  • undefined 4 days ago
                    [deleted]
                  • musicale 5 days ago

                    COM files for CP/M and DOS really are a no-nonsense executable format.

                    I'm a bit disappointed that Linux (or BSD, macOS, etc.) doesn't support them (or similar) out of the box, though Windows will sort of run them via ntvdm.

                  • smokel 5 days ago

                    My favorite language for implementing short Hello World programs in is HQ9+ [1].

                    Joking aside, this page [2] used to be a great tutorial on writing small ELF binaries, but I'm not sure whether it will still work in 64-bit land. It proved very helpful for writing a 4K intro back in 1999.

                    [1] https://esolangs.org/wiki/HQ9%2B

                    [2] https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...

                    • mrfinn 5 days ago

                      These challenges are funny - they remind me of the old days. Back in the DOS/Windows days, we used to have the .com format, which was perfect for tiny programs. One could even write a program of less than 10 bytes that could actually do something!

                      We've come a long way since then, and is like, at some point, nobody cared about optimizing executable size anymore

                      • secondcoming 5 days ago

                        People in the embedded space care. Symbian OS was compiled for small size, only certain parts were allowed use O3, such as the jpeg decoder.

                        • theandrewbailey 5 days ago

                          Some people care about executable size, (mostly) everyone else ships Electron apps.

                          • ramon156 5 days ago

                            Tell it to my boss, he wants his app last week.

                          • hinkley 5 days ago

                            I learned to write COM programs at some point but quickly unlearned it. There were some spots where you can use them and not .bat files, but outside of that it’s a lot.

                            • smokel 5 days ago

                                debug
                                -a 100
                                178A:0100 int 19
                                178A:0102
                                -r cx
                                CX 0000
                                :2
                                -n reboot.com
                                -w
                                Writing 00002 bytes
                                -q
                              • dim13 4 days ago

                                Some more:

                                Quick'n'dirty:

                                    .model small
                                    .code
                                     org 100h
                                    start:
                                     int 19h                 ; Bootstrap loader
                                    end start
                                
                                More "correct":

                                    .model small
                                    .code
                                     org 100h
                                    start:
                                     db 0EAh                 ; Jump to Power On Self Test - Cold Boot
                                     dw 0,0FFFFh
                                    end start
                                
                                Even more "correct":

                                    .model small
                                    .code
                                     org 100h
                                    start:
                                     mov ah,0Dh
                                     int 21h                 ; DOS Services  ah=function 0Dh
                                                             ;  flush disk buffers to disk
                                     sti                     ; Enable interrupts
                                     hlt                     ; Halt processor
                                     mov al,0FEh
                                     out 64h,al              ; port 64h, kybd cntrlr functn
                                                             ;  al = 0FEh, pulse CPU reset
                                    end start
                                • mrfinn 4 days ago

                                  Great example, a two bytes reboot utility. From the times when we could turn off the computer with a push of a button without fearing a global catastrophe...

                                • xpasky 5 days ago

                                  JMP FFFF:0000

                                  • mrfinn 5 days ago

                                    INT 13h... uff chills

                                • 5- 5 days ago

                                  here's an 80 byte x86_64 linux 'hello world' (okay, not 'Hello world!'). convert to binary with xxd -r -p:

                                    7f454c46488d3537000000ffc7b20eeb03003e00
                                    b001eb1a01000000050000001800000000000000
                                    1800000005000000b03c0f05ebfa380001006865
                                    6c6c0000010068656c6c00006f20776f726c640a
                                  
                                  i'm sure this can be improved -- but i could never get any x86_64 linux elf to under 80 bytes. see if you can fit the exclamation point still.
                                  • michidk 4 days ago

                                    Yeah I thought sth like this is possible, but (correct me if I'm wrong) this (ab)uses the ELF header and punts data in there, which goes against my requirement

                                    > It should be a ‘proper‘ executable binary according to the spec

                                    • 5- 4 days ago

                                      yes, this one conforms to 'whatever linux agrees to exec(2)', which apparently is a lot that is out of spec.

                                  • whynotmaybe 5 days ago

                                    Could a script be a program?

                                    Because it would be much smaller in a bat file than contains :

                                    echo Hello World!

                                    • theandrewbailey 5 days ago

                                      For the purposes of this challenge, no.

                                      > Let’s first establish some rules for our ‘Hello World’ program:

                                      > It should be able to execute directly without passing to any other programs first (so no decompression)

                                      > It should be a ‘proper‘ executable binary according to the spec

                                      • hinkley 5 days ago

                                        Include the shebang but it’s still crazy how big minimal programs are.

                                        • __m 4 days ago

                                          php would be just

                                          Hello World

                                        • gr33kdude 5 days ago

                                          Linking a similar, very popular past example of this: Teensy: https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...

                                          • BergAndCo 4 days ago

                                            Thank you, I knew I had read somewhere someone put the program in the ELF header itself and got it down to 45 bytes, this is that exact post.

                                          • musicale 5 days ago

                                            I realize TFA is trying for object code, but for source code, QuickBASIC (and its successors) isn't bad:

                                                ? "hello, world!"
                                            
                                            PILOT eliminates the quotes:

                                                T:hello, world!
                                            
                                            Of course a typical REPL (Python, JavaScript, Lisp, etc.) will print out something similar (but often quoted) if you just type the quoted string.

                                            And I'm sure there is already some language (call it HELLO) which simply prints "hello, world!" for an empty program.

                                            • charlieyu1 5 days ago

                                              There are probably some golfing languages out there where an empty program outputs Hello World.

                                              • musicale 5 days ago

                                                I'm certain there is, but I don't have a reference for it yet other than my imaginary HELLO (Highly Efficient Limited Line Output) language.

                                                • undefined 5 days ago
                                                  [deleted]
                                              • undefined 5 days ago
                                                [deleted]
                                                • fjfaase 4 days ago

                                                  Would it be fair to name the program 'Hello World!' and than use argv[0], which is on the stack, to print out 'Hello World!'?

                                                  • michidk 4 days ago

                                                    Hehe a nice idea. But then it would not always print Hello World and you cannot execute it directly.

                                                  • xpasky 5 days ago

                                                    Now, can we make it even smaller applying https://nathanotterness.com/2021/10/tiny_elf_modernized.html ? We shouldn't need the full ELF header...

                                                  • oneshtein 5 days ago

                                                    The smallest "hello, world" programs in Rust I did for Arduino are 294 bytes for blink and 388 bytes for hw.

                                                    • Multicomp 5 days ago

                                                      129 byes...supposedly that's 2 punch cards according to Dr gpt. Small program!