Hi, Could somebody please explain what GCC is doing for this piece of code? What is it initializing? The original code is:
#include <stdio.h>
int main()
{
}
And it was translated to:
.file "test1.c"
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
leave
ret
I would be grateful if a compiler/assembly guru got me started by explaining the stack, register and the section initializations. I cant make head or tail out of the code.
EDIT: I am using gcc 3.4.5. and the command line argument is gcc -S test1.c
Thank You, kunjaan.
-
Well, dont know much about GAS, and i'm a little rusty on Intel assembly, but it looks like its initializing main's stack frame.
if you take a look, __main is some kind of macro, must be executing initializations. Then, as main's body is empty, it calls leave instruction, to return to the function that called main.
From http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax#.22hello.s.22_line-by-line:
This line declares the "_main" label, marking the place that is called from the startup code.
pushl %ebp movl %esp, %ebp subl $8, %espThese lines save the value of EBP on the stack, then move the value of ESP into EBP, then subtract 8 from ESP. The "l" on the end of each opcode indicates that we want to use the version of the opcode that works with "long" (32-bit) operands;
andl $-16, %espThis code "and"s ESP with 0xFFFF0000, aligning the stack with the next lowest 16-byte boundary. (neccesary when using simd instructions, not useful here)
movl $0, %eax movl %eax, -4(%ebp) movl -4(%ebp), %eaxThis code moves zero into EAX, then moves EAX into the memory location EBP-4, which is in the temporary space we reserved on the stack at the beginning of the procedure. Then it moves the memory location EBP-4 back into EAX; clearly, this is not optimized code.
call __alloca call ___mainThese functions are part of the C library setup. Since we are calling functions in the C library, we probably need these. The exact operations they perform vary depending on the platform and the version of the GNU tools that are installed.
Here's a useful link.
-
Here's a good step-by step breakdown of a simple
main()function as compiled by GCC, with lots of detailed info: GAS Syntax (Wikipedia)For the code you pasted, the instructions break down as follows:
- First four instructions (pushl through andl): set up a new stack frame
- Next five instructions (movl through sall): generating a weird value for eax, which will become the return value (I have no idea how it decided to do this)
- Next two instructions (both movl): store the computed return value in a temporary variable on the stack
- Next two instructions (both call): invoke the C library init functions
leaveinstruction: tears down the stack frameretinstruction: returns to caller (the outer runtime function, or perhaps the kernel function that invoked your program)
-
It looks like GCC is acting like it is ok to edit
main()to include CRT initialization code. I just confirmed that I get the exact same assembly listing from MinGW GCC 3.4.5 here, with your source text.The command line I used is:
gcc -S emptymain.c
Interestingly, if I change the name of the function to
qqq()instead ofmain(), I get the following assembly:.file "emptymain.c" .text .globl _qqq .def _qqq; .scl 2; .type 32; .endef _qqq: pushl %ebp movl %esp, %ebp popl %ebp retwhich makes much more sense for an empty function with no optimizations turned on.
-
It would really help to know what gcc version you are using and what libc. It looks like you have a very old gcc version or a strange platform or both. What's going on is some strangeness with calling conventions. I can tell you a few things:
Save the frame pointer on the stack according to convention:
pushl %ebp movl %esp, %ebpMake room for stuff at the old end of the frame, and round the stack pointer down to a multiple of 4 (why this is needed I don't know):
subl $8, %esp andl $-16, %espThrough an insane song and dance, get ready to return 1 from
main:movl $0, %eax addl $15, %eax addl $15, %eax shrl $4, %eax sall $4, %eax movl %eax, -4(%ebp) movl -4(%ebp), %eaxRecover any memory allocated with
alloca(GNU-ism):call __allocaAnnounce to libc that
mainis exiting (more GNU-ism):call ___mainRestore the frame and stack pointers:
leaveReturn:
retHere's what happens when I compile the very same source code with gcc 4.3 on Debian Linux:
.file "main.c" .text .p2align 4,,15 .globl main .type main, @function main: leal 4(%esp), %ecx andl $-16, %esp pushl -4(%ecx) pushl %ebp movl %esp, %ebp pushl %ecx popl %ecx popl %ebp leal -4(%ecx), %esp ret .size main, .-main .ident "GCC: (Debian 4.3.2-1.1) 4.3.2" .section .note.GNU-stack,"",@progbitsAnd I break it down this way:
Tell the debugger and other tools the source file:
.file "main.c"Code goes in the text section:
.textBeats me:
.p2align 4,,15mainis an exported function:.globl main .type main, @functionmain's entry point:main:Grab the return address, align the stack on a 4-byte address, and save the return address again (why I can't say):
leal 4(%esp), %ecx andl $-16, %esp pushl -4(%ecx)Save frame pointer using standard convention:
pushl %ebp movl %esp, %ebpInscrutable madness:
pushl %ecx popl %ecxRestore the frame pointer and the stack pointer:
popl %ebp leal -4(%ecx), %espReturn:
retMore info for the debugger?:
.size main, .-main .ident "GCC: (Debian 4.3.2-1.1) 4.3.2" .section .note.GNU-stack,"",@progbitsBy the way,
mainis special and magical; when I compileint f(void) { return 17; }I get something slightly more sane:
.file "f.c" .text .p2align 4,,15 .globl f .type f, @function f: pushl %ebp movl $17, %eax movl %esp, %ebp popl %ebp ret .size f, .-f .ident "GCC: (Debian 4.3.2-1.1) 4.3.2" .section .note.GNU-stack,"",@progbitsThere's still a ton of decoration, and we're still saving the frame pointer, moving it, and restoring it, which is utterly pointless, but the rest of the code make sense.
kunjaan : I am using gcc 3.4.5. which do you recommend?kunjaan : what are those .scl 2; .type 32; .endef after main btw? -
I should preface all my comments by saying, I am still learning assmebly.
I will ignore the section initialization. A explanation for the section initialization and basically everything else I cover can be found here: http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
The ebp register is the stack frame base pointer, hence the BP. It stores a pointer to the beginning of the current stack.
The esp register is the stack pointer. It holds the memory location of the top of the stack. Each time we push something on the stack esp is updated so that it always points to an address the top of the stack.
So ebp points to the base and esp points to the top. So the stack looks like:
esp -----> 000a3 fa 000a4 21 000a5 66 000a6 23 esb -----> 000a7 54If you push e4 on the stack this is what happens:
esp -----> 000a2 e4 000a3 fa 000a4 21 000a5 66 000a6 23 esb -----> 000a7 54Notice that the stack grows toward lower addresses, this fact will be important below.
The first two steps are known as the procedure prolog or more commonly the function prolog, they prepare the stack for use by local variables. See procedure prolog quote at the bottom.
In step 1 we save the pointer to the old stack frame on the stack by calling, pushl %ebp. Since main is the first function called, I have no idea what the previous value of %ebp points too.
Step 2, We are entering a new stack frame because we are entering a new function (main). Therefore, we must set a new stack frame base pointer. We use the value in esp to be the beginning of our stack frame.
Step 3. Allocates 8 bytes of space on the stack. As we mentioned above, the stack grows toward lower addresses thus, subtracting by 8, moves the top of the stack by 8 bytes.
Step 4; Alligns the stack, I've found different opinions on this. I'm not really sure exactly what this is done. I suspect it is done to allow large instructions (SIMD) to be allocated on the stack,
http://gcc.gnu.org/ml/gcc/2008-01/msg00282.html
This code "and"s ESP with 0xFFFF0000, aligning the stack with the next lowest 16-byte boundary. An examination of Mingw's source code reveals that this may be for SIMD instructions appearing in the "_main" routine, which operate only on aligned addresses. Since our routine doesn't contain SIMD instructions, this line is unnecessary.
http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
Steps 5 through 11 seem to have no purpose to me. I couldn't find any explanation on google. Could someone who really knows this stuff provide a deeper understanding. I've heard rumors that this stuff is used for C's exception handling.
Step 5, stores the return value of main 0, in eax.
Step 6 and 7 we add 15 in hex to eax for unknown reason. eax = 01111 + 01111 = 11110
Step 8 we shift the bits of eax 4 bits to the right. eax = 00001 because the last bits are shift off the end 00001 | 111.
Step 9 we shift the bits of eax 4 bits to the left, eax = 10000.
Steps 10 and 11 moves the value in the first 4 allocated bytes on the stack into eax and then moves it from eax back.
Steps 12 and 13 setup the c library.
We have reached the function epilogue. That is, the part of the function which returns the stack pointers, esp and ebp to the state they were in before this function was called.
Step 14, leave sets esp to the value of ebp, moving the top of stack to the address it was before main was called. Then it sets ebp to point to the address we saved on the top of the stack during step 1.
Leave can just be replaced with the following instructions:
mov %ebp, %esp pop %ebpStep 15, returns and exits the function.
1. pushl %ebp 2. movl %esp, %ebp 3. subl $8, %esp 4. andl $-16, %esp 5. movl $0, %eax 6. addl $15, %eax 7. addl $15, %eax 8. shrl $4, %eax 9. sall $4, %eax 10. movl %eax, -4(%ebp) 11. movl -4(%ebp), %eax 12. call __alloca 13. call ___main 14. leave 15. retProcedure Prolog:
The first thing a function has to do is called the procedure prolog. It first saves the current base pointer (ebp) with the instruction pushl %ebp (remember ebp is the register used for accessing function parameters and local variables). Now it copies the stack pointer (esp) to the base pointer (ebp) with the instruction movl %esp, %ebp. This allows you to access the function parameters as indexes from the base pointer. Local variables are always a subtraction from ebp, such as -4(%ebp) or (%ebp)-4 for the first local variable, the return value is always at 4(%ebp) or (%ebp)+4, each parameter or argument is at N*4+4(%ebp) such as 8(%ebp) for the first argument while the old ebp is at (%ebp).
http://www.milw0rm.com/papers/52
A really great stack overflow thread exists which answers much of this question. http://stackoverflow.com/questions/499842/why-are-there-extra-instructions-in-my-gcc-output
A good reference on x86 machine code instructions can be found here: http://programminggroundup.blogspot.com/2007/01/appendix-b-common-x86-instructions.html
This a lecture which contains some of the ideas used below: http://csc.colstate.edu/bosworth/cpsc5155/Y2006_TheFall/MySlides/CPSC5155_L23.htm
Here is another take on answering your question: http://www.phiral.net/linuxasmone.htm
None of these sources explain everything.
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.