• bd01 6 months ago

    This is pretty bad. Let's start with the very first instruction:

      mov rax, 1
    
    An actual "mov rax, 1" would assemble to 48 B8 01 00 00 00 00 00 00 00, a whopping TEN bytes.

    nasm will optimize this to the equivalent "mov eax, 1", that's 6 bytes, but still:

      xor eax, eax ; 2 bytes
      inc eax      ; 2 bytes
    
    would be much smaller. Second line:

      mov rdi, 1
    
    You already have the value 1 in eax, so a "mov edi, eax" (two bytes) would suffice. Etc. etc.
    • xpasky 6 months ago

        push 1
        pop rax
      
      is even shorter (credit: https://old.reddit.com/r/programming/comments/q6mnz1/what_is...)
      • musicale 6 months ago

        I feel like I shouldn't love x86 encoding, but there is something charming about this. Probably echoing its 8-bit predecessors. It seems like it's designed for tiny memory environments (embedded, bootstrapping, etc.) where you don't mind taking a hit for memory access.

      • rep_lodsb 6 months ago

        Linux initializes all general purpose registers to zero. It's not documented AFAIK, but should be reliable - it has to init them to some value anyway to avoid leaking kernel state. So you can get away with:

            mov     al,1       ;write
            mov     edi,eax    ;handle=stdout
            mov     esi,msg    ;assumes load address below 4G
            mov     dl,msg.len
            syscall
            mov     al,60      ;assuming syscall succeeded, EAX was bytes written
            xor     edi,edi
            syscall
        
        The load address stays constant unless there's some magic GNU extension header to enable ASLR. If we could get the code loaded below 64K, we could save another byte by using SI instead of ESI; however this doesn't work by default, you'd have to run 'echo 0 > /proc/sys/vm/mmap_min_addr' as root first.
        • bd01 6 months ago

          Initial register state is documented to be undefined except for rbp, rsp and rdx [1].

          Can you say for certain that no other Linux version ever used GPRs to pass something else?

          [1] System V ABI, page 29 (last line) and 30, https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

          • rep_lodsb 6 months ago

            For certain? No, but I wouldn't expect it. Not sure what that function pointer in rdx is intended for, but Linux doesn't use it.

            (Note for pedants: rsp is technically a "general purpose register", but of course it is initialized to point to the userspace stack instead of zero.)

          • retrac 6 months ago

            Assuming it is initial zero

               inc eax
            
            is a byte shorter than mov al, 1
            • rep_lodsb 6 months ago

              Yes, but only in 32 bit mode. Not that it matters, except for the hypothetical future processor or Linux kernel that is no longer compatible with that :)

          • michidk 6 months ago

            I was able to shave off one additional byte with this:

              ...
              xor rax, rax       ; = 0
              inc rax            ; = 1 - syscall: sys_write
              mov rdi, rax       ; copy 1 - file descriptor: stdout
              lea rsi, [rel msg] ; pointer to message
              mov rdx, 14        ; message length
              syscall
              ...
            
              $ nasm -f bin -o elf elf.asm; wc -c elf; ./elf
              166 elf
              Hello, World!
            
            So I guess NASM already optimizes this quite well

            However, using the stack-based instructions as xpasky hinted at:

              ...
              push 1             ; syscall: sys_write
              pop rax
              pop rdi       ; copy 1 - file descriptor: stdout
              lea rsi, [rel msg] ; pointer to message
              push 14            ; message length
              pop rdx
              syscall
              ...
            
            I get down to 159 bytes! I updated the article to reflect that
            • bd01 6 months ago

              That second snippet is pretty funny:

                push 1
                pop rax
                pop rdi
              
              You can't push a value once and pop it twice, that's not how a stack works! You're popping something else off the stack. So why does this even work?

              Linux passes your program arguments on the stack, with argc on top. So when you don't pass any arguments, argc just HAPPENS to be 1. Which you then pop into rdi. Gross!

              • michidk 6 months ago

                Of course - you are completely right, an oversight in wanting to correct my mistake as quickly as possible.

                With that fixed, is there any reason not to use push here?

                • bd01 6 months ago

                  Yes, because:

                    push 1       ; 6A 01 (2 bytes)
                    pop rdi      ; 5F    (1 byte)
                  
                  is longer than a simple:

                    mov edi, eax ; 89 C7 (2 bytes)
                  • michidk 6 months ago

                    I think your statement might only apply to 32 bit (one of the constraints mentioned early in the blog post was 64 bit).

                    But even if it was 32 bit, then we would't have to copy a 1, since the syscall number for sys_write would be 4 instead of 1.

                    I get the same total size with both variants in 64 bit mode.

                      push 1
                      pop rax
                      mov rdi, rax
                    
                    Assembling to 48 89 C7 (3 bytes)

                    seems to be same in size as

                      push 1
                      pop rax
                      push 1
                      pop rdi
                    
                    Assembling to 6A 01 5F (3 bytes)
                    • bd01 6 months ago

                      That's because you're using `mov rdi, rax` again. You keep changing `edi, eax` to `rdi, rax`. Why?

                      The default operand size in 64-bit mode is, for most instructions, still 32 bits. So `mov edi, eax` encodes the same in 32- and 64-bit mode.

                      For `mov rdi, rax` you need an extra REX prefix byte [1], that's the 48 you're seeing above, but you don't need it here.

                      [1] https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefi...

                      • michidk 6 months ago

                        okay, I didn't know that, thanks for the background. I wonder why the assembler would not optimize this though.

                        I noticed that I then could also shave of one byte more by using lea esi, [rel msg] instead of lea rsi, [rel msg].

              • michidk 6 months ago

                should be ... push 1 ; syscall: sys_write pop rax push 1 pop rdi

                of course

              • michidk 6 months ago

                Thanks, that makes total sense. I was so focused on the ELF part that I didn't even consider optimizing the initial assembly further. Will fix it and edit the article.

              • Tepix 6 months ago

                Here's a tiny DOS COM file that does it in 18 bytes:

                    ;; 18 bytes
                    DB 'HELLO_WOIY<$'  ; executes as machine code, returning SP to original position without overwriting return address
                    
                    mov  dx, si    ; mov dx,0100h MS-DOS (all versions), FreeDOS 1.0, many other DOSes
                    xchg ax, bp    ; mov ah,9     MS-DOS 4.0 and later, and FreeDOS 1.0
                    int  21h
                    ret
                
                (credits: https://stackoverflow.com/questions/72635031/assembly-hello-...)
                • rep_lodsb 6 months ago

                  Well, it prints something that is the same length as the correct message at least.

                  • undefined 6 months ago
                    [deleted]
                  • musicale 6 months ago

                    COM files for CP/M and DOS really are a no-nonsense executable format.

                    I'm a bit disappointed that Linux (or BSD, macOS, etc.) doesn't support them (or similar) out of the box, though Windows will sort of run them via ntvdm.

                  • smokel 6 months ago

                    My favorite language for implementing short Hello World programs in is HQ9+ [1].

                    Joking aside, this page [2] used to be a great tutorial on writing small ELF binaries, but I'm not sure whether it will still work in 64-bit land. It proved very helpful for writing a 4K intro back in 1999.

                    [1] https://esolangs.org/wiki/HQ9%2B

                    [2] https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...

                    • mrfinn 6 months ago

                      These challenges are funny - they remind me of the old days. Back in the DOS/Windows days, we used to have the .com format, which was perfect for tiny programs. One could even write a program of less than 10 bytes that could actually do something!

                      We've come a long way since then, and is like, at some point, nobody cared about optimizing executable size anymore

                      • secondcoming 6 months ago

                        People in the embedded space care. Symbian OS was compiled for small size, only certain parts were allowed use O3, such as the jpeg decoder.

                        • theandrewbailey 6 months ago

                          Some people care about executable size, (mostly) everyone else ships Electron apps.

                          • ramon156 6 months ago

                            Tell it to my boss, he wants his app last week.

                          • hinkley 6 months ago

                            I learned to write COM programs at some point but quickly unlearned it. There were some spots where you can use them and not .bat files, but outside of that it’s a lot.

                            • smokel 6 months ago

                                debug
                                -a 100
                                178A:0100 int 19
                                178A:0102
                                -r cx
                                CX 0000
                                :2
                                -n reboot.com
                                -w
                                Writing 00002 bytes
                                -q
                              • dim13 6 months ago

                                Some more:

                                Quick'n'dirty:

                                    .model small
                                    .code
                                     org 100h
                                    start:
                                     int 19h                 ; Bootstrap loader
                                    end start
                                
                                More "correct":

                                    .model small
                                    .code
                                     org 100h
                                    start:
                                     db 0EAh                 ; Jump to Power On Self Test - Cold Boot
                                     dw 0,0FFFFh
                                    end start
                                
                                Even more "correct":

                                    .model small
                                    .code
                                     org 100h
                                    start:
                                     mov ah,0Dh
                                     int 21h                 ; DOS Services  ah=function 0Dh
                                                             ;  flush disk buffers to disk
                                     sti                     ; Enable interrupts
                                     hlt                     ; Halt processor
                                     mov al,0FEh
                                     out 64h,al              ; port 64h, kybd cntrlr functn
                                                             ;  al = 0FEh, pulse CPU reset
                                    end start
                                • mrfinn 6 months ago

                                  Great example, a two bytes reboot utility. From the times when we could turn off the computer with a push of a button without fearing a global catastrophe...

                                • xpasky 6 months ago

                                  JMP FFFF:0000

                                  • mrfinn 6 months ago

                                    INT 13h... uff chills

                                • 5- 6 months ago

                                  here's an 80 byte x86_64 linux 'hello world' (okay, not 'Hello world!'). convert to binary with xxd -r -p:

                                    7f454c46488d3537000000ffc7b20eeb03003e00
                                    b001eb1a01000000050000001800000000000000
                                    1800000005000000b03c0f05ebfa380001006865
                                    6c6c0000010068656c6c00006f20776f726c640a
                                  
                                  i'm sure this can be improved -- but i could never get any x86_64 linux elf to under 80 bytes. see if you can fit the exclamation point still.
                                  • michidk 6 months ago

                                    Yeah I thought sth like this is possible, but (correct me if I'm wrong) this (ab)uses the ELF header and punts data in there, which goes against my requirement

                                    > It should be a ‘proper‘ executable binary according to the spec

                                    • 5- 6 months ago

                                      yes, this one conforms to 'whatever linux agrees to exec(2)', which apparently is a lot that is out of spec.

                                  • whynotmaybe 6 months ago

                                    Could a script be a program?

                                    Because it would be much smaller in a bat file than contains :

                                    echo Hello World!

                                    • theandrewbailey 6 months ago

                                      For the purposes of this challenge, no.

                                      > Let’s first establish some rules for our ‘Hello World’ program:

                                      > It should be able to execute directly without passing to any other programs first (so no decompression)

                                      > It should be a ‘proper‘ executable binary according to the spec

                                      • hinkley 6 months ago

                                        Include the shebang but it’s still crazy how big minimal programs are.

                                        • __m 6 months ago

                                          php would be just

                                          Hello World

                                        • gr33kdude 6 months ago

                                          Linking a similar, very popular past example of this: Teensy: https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...

                                          • BergAndCo 6 months ago

                                            Thank you, I knew I had read somewhere someone put the program in the ELF header itself and got it down to 45 bytes, this is that exact post.

                                          • musicale 6 months ago

                                            I realize TFA is trying for object code, but for source code, QuickBASIC (and its successors) isn't bad:

                                                ? "hello, world!"
                                            
                                            PILOT eliminates the quotes:

                                                T:hello, world!
                                            
                                            Of course a typical REPL (Python, JavaScript, Lisp, etc.) will print out something similar (but often quoted) if you just type the quoted string.

                                            And I'm sure there is already some language (call it HELLO) which simply prints "hello, world!" for an empty program.

                                            • charlieyu1 6 months ago

                                              There are probably some golfing languages out there where an empty program outputs Hello World.

                                              • musicale 6 months ago

                                                I'm certain there is, but I don't have a reference for it yet other than my imaginary HELLO (Highly Efficient Limited Line Output) language.

                                                • undefined 6 months ago
                                                  [deleted]
                                              • undefined 6 months ago
                                                [deleted]
                                                • fjfaase 6 months ago

                                                  Would it be fair to name the program 'Hello World!' and than use argv[0], which is on the stack, to print out 'Hello World!'?

                                                  • michidk 6 months ago

                                                    Hehe a nice idea. But then it would not always print Hello World and you cannot execute it directly.

                                                  • xpasky 6 months ago

                                                    Now, can we make it even smaller applying https://nathanotterness.com/2021/10/tiny_elf_modernized.html ? We shouldn't need the full ELF header...

                                                  • oneshtein 6 months ago

                                                    The smallest "hello, world" programs in Rust I did for Arduino are 294 bytes for blink and 388 bytes for hw.

                                                    • Multicomp 6 months ago

                                                      129 byes...supposedly that's 2 punch cards according to Dr gpt. Small program!