Comments:"Hello from a libc-free world! (Part 1) (Ksplice Blog)"
URL:https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free
As an exercise, I want to write a Hello World program in C simple enough that I can disassemble it and be able to explain all of the assembly to myself.
This should be easy, right?
This adventure assumes compilation and execution on a Linux machine. Some familiarity with reading assembly is helpful.
Here's our basic Hello World program:
jesstess@kid-charlemagne:~/c$ cat hello.c
#include <stdio.h>

int main()
{
    printf("Hello World\n");
    return 0;
}
Let's compile it and get a bytecount:
jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
jesstess@kid-charlemagne:~/c$ wc -c hello
10931 hello
Yikes! Where are 11 kilobytes worth of executable coming from? objdump -t hello gives us 79 symbol-table entries, most of which we can blame on our using the standard library.
So let's stop using it. We won't use printf, so we can get rid of our include file:
jesstess@kid-charlemagne:~/c$ cat hello.c
int main()
{
    char *str = "Hello World";
    return 0;
}
Recompiling and checking the bytecount:
jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
jesstess@kid-charlemagne:~/c$ wc -c hello
10892 hello
What? That barely changed anything!
The problem is that gcc is still using standard library startup files when linking. Want proof? We'll compile with -nostdlib, which according to the gcc man page won't "use the standard system libraries and startup files when linking. Only the files you specify will be passed to the linker".
jesstess@kid-charlemagne:~/c$ gcc -nostdlib -o hello hello.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000004000e8
Well, it's just a warning; let's check it anyway:
jesstess@kid-charlemagne:~/c$ wc -c hello
1329 hello
That looks pretty good! We got our bytecount down to a much more reasonable size (an order of magnitude smaller!)...
jesstess@kid-charlemagne:~/c$ ./hello
Segmentation fault
...at the expense of segfaulting when it runs. Hrmph.
For fun, let's get our program to be actually runnable before digging into the assembly.
So what is this _start entry symbol that appears to be required for our program to run? Where is it usually defined if you're using libc?
From the perspective of the linker, by default _start is the actual entry point to your program, not main. It is normally defined in the crt1.o ELF relocatable. We can verify this by linking against crt1.o and noting that _start is now found (although we develop other problems by not having defined other necessary libc startup symbols):
# Compile the source files but don't link
jesstess@kid-charlemagne:~/c$ gcc -Os -c hello.c
# Now try to link
jesstess@kid-charlemagne:~/c$ ld /usr/lib/crt1.o -o hello hello.o
/usr/lib/crt1.o: In function `_start':
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:106: undefined reference to `__libc_csu_fini'
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:107: undefined reference to `__libc_csu_init'
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:113: undefined reference to `__libc_start_main'
This check conveniently also tells us where _start lives in the libc source: sysdeps/x86_64/elf/start.S for this particular machine. This delightfully well-commented file exports the _start symbol, sets up the stack and some registers, and calls __libc_start_main. If we look at the very bottom of csu/libc-start.c we see the call to our program's main:
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
and down the rabbit hole we go.
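To make "sets up the stack and some registers" a little more concrete: on x86-64 Linux a new process starts with a stack whose top word is argc, followed by the argv pointers, a NULL, the environment pointers, and another NULL, and the startup code recovers argc, argv, and the environment from that layout before main is called. What follows is only a minimal sketch of that decoding step in C, run against an ordinary array standing in for the stack. The names startup_args and decode_initial_stack are made up for illustration and do not appear in glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for what the startup code recovers. */
struct startup_args {
    int argc;
    char **argv;
    char **envp;
};

/*
 * Decode a process-entry stack image laid out as
 *   [argc][argv[0]] ... [argv[argc-1]][NULL][envp[0]] ... [NULL]
 * where every slot is one pointer-sized word. `sp` plays the role of
 * the stack pointer at _start; slot 0 carries argc as an integer
 * stored in a pointer-sized slot.
 */
struct startup_args decode_initial_stack(char **sp)
{
    struct startup_args args;

    assert(sp != NULL);
    args.argc = (int)(intptr_t)sp[0];
    args.argv = sp + 1;
    /* The environment starts just past argv's terminating NULL. */
    args.envp = args.argv + args.argc + 1;
    return args;
}
```

For a plain ./hello with no arguments, argc would be 1 and argv[0] would point at the program name.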
So that's what _start is all about. Conveniently, we can summarize what happens between _start and the call to main as "set up a bunch of stuff for libc and then call main", and since we don't care about libc, let's just export our own _start symbol that just calls main and link against that:
jesstess@kid-charlemagne:~/c$ cat stubstart.S
.globl _start

_start:
    call main
Compiling and running with our stub _start assembly file:
jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ ./hello
Segmentation fault
Hurrah, our compilation problems go away! However, we still segfault. Why? Let's compile with debugging information and take a look in gdb. We'll set a breakpoint at main and step through until the segfault:
jesstess@kid-charlemagne:~/c$ gcc -g -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ gdb hello
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) break main
Breakpoint 1 at 0x4000f4: file hello.c, line 3.
(gdb) run
Starting program: /home/jesstess/c/hello
Breakpoint 1, main () at hello.c:5
5         char *str = "Hello World";
(gdb) step
6         return 0;
(gdb) step
7       }
(gdb) step
0x00000000004000ed in _start ()
(gdb) step
Single stepping until exit from function _start,
which has no line number information.
main () at helloint.c:4
4       {
(gdb) step
Breakpoint 1, main () at helloint.c:5
5         char *str = "Hello World";
(gdb) step
6         return 0;
(gdb) step
7       }
(gdb) step
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000001 in ?? ()
(gdb)
Wait, what? Why are we running through main twice? ...It's time to look at the assembly:
jesstess@kid-charlemagne:~/c$ objdump -d hello
hello:     file format elf64-x86-64
Disassembly of section .text:
00000000004000e8 <_start>:
  4000e8:   e8 03 00 00 00          callq  4000f0 <main>
  4000ed:   90                      nop
  4000ee:   90                      nop
  4000ef:   90                      nop
00000000004000f0 <main>:
  4000f0:   55                      push   %rbp
  4000f1:   48 89 e5                mov    %rsp,%rbp
  4000f4:   48 c7 45 f8 03 01 40    movq   $0x400103,-0x8(%rbp)
  4000fb:   00
  4000fc:   b8 00 00 00 00          mov    $0x0,%eax
  400101:   c9                      leaveq
  400102:   c3                      retq
D'oh! Let's save a detailed examination of the assembly for later, but in brief: when we return from the callq to main we hit some nops and run right back into main. Since we re-entered main by falling through rather than via a call instruction, nothing pushed a return instruction pointer onto the stack this time, so the second retq tries to pop a bogus return instruction pointer off the stack and jump to it, and we bomb out. We need an exit strategy.
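To see why the second pass through main is fatal, it can help to model the stack by hand. The sketch below is a toy simulation in C, not real machine state: toy_stack, toy_push, toy_pop, run_main, and simulate_stub_start are hypothetical names, the addresses are taken from the objdump listing above, and it assumes the word on top of the stack at entry to _start is argc (1 for a plain ./hello), as the x86-64 Linux process-entry layout specifies. Under that assumption the second retq pops that 1, which would be consistent with the 0x0000000000000001 in ?? () that gdb reports.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model of a downward-growing stack; sp indexes the top word. */
struct toy_stack {
    uint64_t words[STACK_WORDS];
    int sp;
};

void toy_push(struct toy_stack *s, uint64_t value)
{
    assert(s->sp > 0);
    s->words[--s->sp] = value;
}

uint64_t toy_pop(struct toy_stack *s)
{
    assert(s->sp < STACK_WORDS);
    return s->words[s->sp++];
}

/* Model main's stack traffic and return the address its retq jumps to. */
uint64_t run_main(struct toy_stack *s, uint64_t *rbp)
{
    toy_push(s, *rbp);          /* push %rbp */
    *rbp = (uint64_t)s->sp;     /* mov %rsp,%rbp (stack index as stand-in) */
    /* The movq stores just below the stack pointer and the mov to %eax
     * touches no memory, so neither one moves sp. */
    s->sp = (int)*rbp;          /* leaveq: mov %rbp,%rsp ... */
    *rbp = toy_pop(s);          /*         ... then pop %rbp */
    return toy_pop(s);          /* retq */
}

/*
 * Simulate the stub: callq main, then fall through the nops into main
 * again. ret_targets[0] and ret_targets[1] receive where each retq jumps.
 */
void simulate_stub_start(uint64_t argc, uint64_t ret_targets[2])
{
    struct toy_stack s = { {0}, STACK_WORDS };
    uint64_t rbp = 0;

    toy_push(&s, argc);                    /* assumed: argc on top at entry */
    toy_push(&s, 0x4000ed);                /* callq pushes its return address */
    ret_targets[0] = run_main(&s, &rbp);   /* first retq goes back to _start */
    /* nops at 0x4000ed..0x4000ef, then execution falls into main */
    ret_targets[1] = run_main(&s, &rbp);   /* no call this time: retq pops argc */
}
```

Calling simulate_stub_start(1, targets) leaves 0x4000ed in targets[0] and 1 in targets[1]: the first return lands back in _start, the second "returns" to address 1.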
Literally. After the return from callq, move 1, the syscall number for SYS_exit, into %eax, and because we want to say that we're exiting cleanly, put a status of 0, SYS_exit's only argument, into %ebx. Then trigger the interrupt to drop into the kernel with int $0x80.
jesstess@kid-charlemagne:~/c$ cat stubstart.S
.globl _start

_start:
    call main
    movl $1, %eax
    xorl %ebx, %ebx
    int $0x80
jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ ./hello
jesstess@kid-charlemagne:~/c$
Success! It compiles, it runs, and if we step through this new version under gdb it even exits normally.
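If you want to convince yourself that the status handed to SYS_exit is what the parent process ends up seeing, here is a rough sketch that makes the same request from ordinary C and reads the result back with waitpid. It leans on libc (fork, waitpid, and the syscall() wrapper), so it is an observation aid rather than something our libc-free program could use, and status_from_raw_exit is a made-up name. Note also that SYS_exit here expands to whatever number the build's native syscall ABI uses, which need not be the 1 that the int $0x80 path expects.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that terminates through a raw SYS_exit request and
 * return the exit status the parent observes, or -1 on failure.
 * (Hypothetical helper, for illustration only.)
 */
int status_from_raw_exit(int status)
{
    pid_t pid;
    int wstatus;

    /* Only the low 8 bits of an exit status reach the parent. */
    assert(status >= 0 && status <= 255);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: ask the kernel to end this thread directly,
         * skipping exit() and any atexit handlers. */
        syscall(SYS_exit, status);
        _exit(127); /* only reached if the syscall somehow returned */
    }
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

With a status of 0, as in our stub, the parent should see a clean exit; passing 7 instead should come back as 7.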
Hello from a libc-free world!
Stay tuned for Part 2, where we'll walk through the parts of the executable in earnest and watch what happens to it as we add complexity, in the process understanding more about x86 linking and calling conventions and the structure of an ELF binary.