


sebas_phoenix
03-18-2011, 10:31 PM
Understanding the Stack: The Assembly Language Perspective


By Sebas Sujeen aka "0x90" and Sridhar aka "phr3ak"

To understand and develop exploits, it is essential to understand how the stack works from the assembly language perspective as well. This article explains how the stack works from that perspective. To get the most out of it, we advise you to go through the Assembly Language Primer for Hackers by Vivek Ramachandran (http://www.securitytube.net) before reading this.



An example C Program

#include <stdio.h>

int main(void)
{
    int i;
    for (i = 0; i < 10; i++)
        printf("Hello");
}


Compile it and generate an assembly file using gcc's '-S' switch:



$gcc -mpreferred-stack-boundary=2 -fno-stack-protector -z execstack -o test.s test.c -S


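Note that we set the preferred stack boundary to 2^2 = 4 bytes (the 2 is the exponent of the power of two) to simplify debugging. If you are using a recent Linux distribution, you can also turn off the stack protector with '-fno-stack-protector' and make the stack executable (disabling NX, the non-executable stack protection) with '-z execstack'. Keep in mind that '-z execstack' is a linker option, so it only has an effect when the program is linked, not in this '-S' step.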

Below is the listing of the test.s file (a few housekeeping directives are left out).
The instructions are numbered only so that we can refer to them later.


    .section .rodata
.LC0:
    .string "Hello"         # defines the string "Hello" under the label .LC0, similar to a constant string in C
    .text                   # the text section: the actual code goes here. It is read-only, so any attempt to write into it results in a segmentation fault
    # .globl makes the symbol main visible to the linker. The Assembly Language Primer uses _start here,
    # but gcc links in the C runtime's startup code, which provides _start and eventually calls main, so we define main
    .globl main
    .type main, @function   # marks main as a function
main:                       # start of the main function
    pushl %ebp              # (1)
    movl %esp, %ebp         # (2)
    subl $8, %esp           # (3)
    movl $0, -4(%ebp)       # (4)
    jmp .L2                 # (5)
.L3:
    movl $.LC0, %eax        # (6)
    movl %eax, (%esp)       # (7)
    call printf             # (8)
    addl $1, -4(%ebp)       # (9)
.L2:
    cmpl $9, -4(%ebp)       # (10)
    jle .L3                 # (11)
    leave                   # (12)
    ret                     # (13)


OK, here we go! Let's walk through the code.


pushl %ebp
movl %esp,%ebp
subl $8,%esp


These 3 instructions make up the function prologue. Refer to the Assembly Language Primer for more details about function prologues. One interesting thing to note: if main is the first function to be executed, where does it return to? Keep that thought in mind; we will come back to it when we debug the program. Finally, we subtract 8 from esp to allocate space for the local variable 'i' and for the address of "Hello", which printf() needs as its argument.

Let's debug it with gdb.

Before that, let's create the executable:


$gcc -o test test.s -ggdb
The '-ggdb' switch adds debugging information to the executable for the debugger to use. Since the input is test.s, the line information refers to the lines of the assembly file.
Then let's fire up gdb:

$gdb ./test -q

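'-q' (quiet) suppresses gdb's introductory banner.
The list command shows the source code. Because we built the program from test.s, the source gdb lists is the assembly file itself: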


(gdb) list
1 .file "first.c"
2 .section .rodata
3 .LC0:
4 .string "Hello"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushl %ebp
10 movl %esp, %ebp
(gdb)
11 subl $8, %esp
12 movl $0, -4(%ebp)
13 jmp .L2
14 .L3:
15 movl $.LC0, %eax
16 movl %eax, (%esp)
17 call printf
18 addl $1, -4(%ebp)
19 .L2:
20 cmpl $9, -4(%ebp)
(gdb)
21 jle .L3
22 leave
23 ret
24 .size main, .-main
25 .ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
26 .section .note.GNU-stack,"",@progbits

Set a breakpoint at an appropriate location. Let's set it at the first instruction of the main function, which in this case is line 9:


(gdb)break 9
This sets a breakpoint at line 9, so when you run the program, execution will pause at line 9.
(gdb)run
This runs the program, and execution halts at the breakpoint.
Now inspect the esp register, which points to the top of the stack ('i r' is short for 'info registers'):
(gdb)i r esp
This displays the value of esp, i.e. the address of the top of the stack,
in our case "0xbffff44c".
(gdb)s
This steps to the next source line, which here means executing one instruction: pushl %ebp. What happens? A push first decrements the stack pointer esp by 4 bytes (4 bytes because we are on a 32-bit processor), since the stack grows down in memory from higher to lower addresses, and then stores the pushed value, here the current ebp, at the new top of the stack.
(gdb)i r ebp
This displays the value of ebp, in our case "0xbffff4c8". This is the caller's frame pointer, the value that was just saved on the stack.
(gdb) i r esp
This displays "0xbffff448"; note the decrease of 4 bytes from the previous value.
(gdb)s
This executes instruction (2), movl %esp, %ebp, which makes ebp point to the new stack frame.
(gdb)i r esp
(gdb)i r ebp
Both display "0xbffff448".
(gdb)s
This executes instruction (3), which subtracts 8 from esp to allocate space for the local variable "i" and for the address of "Hello" required by printf().
(gdb) i r esp
This will display "0xbffff440"
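To double-check the arithmetic of the prologue, here is a toy C model of instructions (1) to (3). It is only a sketch: the struct, the helper names and the memory window starting at 0xbffff400 are hypothetical choices made for illustration, not anything gcc or gdb provides. The starting esp and ebp are the values observed in the session above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: only a small window of the stack is simulated. */
#define WINDOW_BASE  0xbffff400u  /* assumed lowest modelled address */
#define WINDOW_WORDS 32           /* models 0xbffff400 .. 0xbffff47f  */

struct cpu {
    uint32_t esp;
    uint32_t ebp;
    uint32_t mem[WINDOW_WORDS];   /* mem[i] is the 4-byte word at WINDOW_BASE + 4*i */
};

/* Return a pointer to the modelled word at a 4-byte-aligned address. */
static uint32_t *word_at(struct cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0);
    assert(addr >= WINDOW_BASE && addr < WINDOW_BASE + 4 * WINDOW_WORDS);
    return &c->mem[(addr - WINDOW_BASE) / 4];
}

/* pushl: the stack grows toward lower addresses, so esp is decremented
   by 4 first and the value is then stored at the new top of the stack. */
static void pushl(struct cpu *c, uint32_t value)
{
    c->esp -= 4;
    *word_at(c, c->esp) = value;
}

/* Hypothetical helper mirroring the prologue of main. */
static void prologue(struct cpu *c, uint32_t frame_bytes)
{
    pushl(c, c->ebp);       /* (1) pushl %ebp: save the caller's frame pointer       */
    c->ebp = c->esp;        /* (2) movl %esp, %ebp: ebp marks the new frame          */
    c->esp -= frame_bytes;  /* (3) subl $8, %esp in our listing: room for i and the argument */
}
```

Starting from esp = 0xbffff44c and ebp = 0xbffff4c8 and calling prologue with 8, the model ends with esp = 0xbffff440, ebp = 0xbffff448, and the old ebp value 0xbffff4c8 stored at 0xbffff448, which matches what gdb showed.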
(gdb)s
This executes instruction (4). First, a few words about it. ebp is used to refer to variables through an offset. You might ask why we don't use esp for the same purpose: esp changes with every push and pop, so offsets relative to it would have to be recalculated each time, whereas ebp does not change within a stack frame. Of the 8 bytes allocated, the 4 bytes just below ebp (at ebp-4) hold "i", and the 4 bytes below those, at the top of the stack, hold the address of "Hello". movl $0, -4(%ebp) uses indirect addressing: the parentheses act as a dereference operator, similar to '*' used with pointers in C. So this instruction stores 0 at the address ebp - 4. Using the debugger will make it clear:
(gdb)i r ebp
gives "0xbffff448"
(gdb)print $ebp-4
This takes the value of ebp and subtracts 4 from it,
i.e. 0xbffff444, the address of "i".
You can examine memory using gdb's 'x' command. Refer to the Assembly Language Primer to learn more about the examine command.
(gdb)x/1xw $ebp-4
This examines one word (4 bytes), displayed in hex, at address 0xbffff444.
It returns 0x00000000.
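The disp(%ebp) notation simply means "add disp to ebp, then dereference". The following C sketch mimics that on a 16-byte array standing in for part of the stack. The frame size, the assumption that ebp points at byte 8, and the helper names are all hypothetical. memcpy is used instead of casting to a uint32_t pointer so that the sketch stays within what C allows.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 16  /* hypothetical size of the modelled piece of stack */
#define EBP_OFFSET   8  /* hypothetical: ebp points at byte 8 of the frame   */

struct frame {
    unsigned char bytes[FRAME_BYTES];
};

/* Effective address of disp(%ebp): simply ebp + disp. */
static unsigned char *effective_address(struct frame *f, int disp)
{
    int i = EBP_OFFSET + disp;
    assert(i >= 0 && i + 4 <= FRAME_BYTES);  /* a 4-byte access must stay inside the model */
    return &f->bytes[i];
}

/* movl $value, disp(%ebp): the parentheses dereference, like '*' in C. */
static void movl_store(struct frame *f, int disp, uint32_t value)
{
    memcpy(effective_address(f, disp), &value, sizeof value);
}

/* Read the word at disp(%ebp), similar in spirit to gdb's x/1xw $ebp-4. */
static uint32_t movl_load(struct frame *f, int disp)
{
    uint32_t value;
    memcpy(&value, effective_address(f, disp), sizeof value);
    return value;
}
```

Here movl_store(&f, -4, 0) plays the role of instruction (4), and movl_store(&f, -4, movl_load(&f, -4) + 1) plays the role of instruction (9), addl $1, -4(%ebp). Only the 4 bytes just below the modelled ebp are touched.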
(gdb)s
The next instruction is "cmpl $9, -4(%ebp)" (instruction (10)).
It compares i, currently 0, with 9 by computing 0 - 9, but it does not write the result back to -4(%ebp); only the flags in EFLAGS are updated. The result is obviously negative.
You can check this by inspecting the EFLAGS register: the Sign Flag (SF) is set. SF is a copy of the most significant bit of the result, so it is set when the result, read as a signed number, is negative.
(gdb)i r eflags
[ CF AF SF IF ID ] --> note that the sign flag is set
(gdb)s
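The next instruction is jle .L3, a conditional jump ("jump if less or equal"). It jumps to .L3 if the preceding signed comparison found -4(%ebp) less than or equal to 9. In terms of flags, it jumps if ZF is set or if SF differs from OF. Here OF is clear and SF is set, so the jump is taken.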
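To make the flag logic concrete, here is a small C sketch that mimics what cmpl computes and when jle jumps. It models only CF, ZF, SF and OF, and the type and function names are hypothetical; the flag formulas are the usual ones for a 32-bit subtraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: only four of the EFLAGS bits are modelled. */
struct flags {
    bool cf;  /* carry: unsigned borrow          */
    bool zf;  /* zero: the result was 0          */
    bool sf;  /* sign: top bit of the result     */
    bool of;  /* overflow: signed result wrapped */
};

/* cmpl src, dst (AT&T operand order) computes dst - src, sets the flags
   and throws the difference away; dst itself is not modified. */
static struct flags cmpl(int32_t src, int32_t dst)
{
    uint32_t a = (uint32_t)dst;
    uint32_t b = (uint32_t)src;
    uint32_t r = a - b;  /* wraps modulo 2^32, as the CPU does */
    struct flags f;

    f.cf = a < b;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1u;
    /* overflow: the operands differ in sign and the result's sign differs from dst */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}

/* jle: taken when ZF = 1 or SF != OF, i.e. dst <= src as signed numbers. */
static bool jle_taken(struct flags f)
{
    return f.zf || (f.sf != f.of);
}
```

cmpl(9, 0), the first pass through the loop, gives CF = 1, SF = 1, ZF = 0 and OF = 0, which agrees with the CF and SF bits gdb showed, and jle_taken returns true. The case cmpl(1, INT32_MIN) shows why SF alone is not enough: the subtraction overflows, so SF is clear, but OF is set and jle still jumps.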
(gdb)s
The next instruction is "movl $.LC0, %eax".
It moves the address of "Hello" into the eax register.
(gdb)i r eax
this returns "0x80484d0"
Now examine the memory at this address
(gdb)x/1s 0x80484d0
This will return '0x80484d0: "Hello"'
(gdb)s
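The next instruction is call printf (it's obvious what it does!). Just before it, instruction (7), movl %eax, (%esp), copies the address of "Hello" into the 4-byte slot at the top of the stack that instruction (3) reserved; that slot is where printf() finds its argument.
There are a few things to remember when you call a function from an assembly program.
Assume the function call in C looks like this: fn(1,2)
In assembly language this is typically translated like this: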
pushl $2
pushl $1
call fn
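Note that the arguments are pushed in reverse order. Since the stack is a LIFO structure, the argument pushed last, which is the first argument, ends up on top of the stack, directly above the return address that call pushes. This keeps the first argument at a fixed offset from the stack pointer no matter how many arguments follow, which is what allows a variadic function like printf() to locate its format string.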
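Here is a toy C model of what the caller does for fn(1,2): push 2, push 1, then call pushes the return address. The memory window, the starting esp of 0xbffff440, the return address 0x08048400 and the helper names are all made-up values for illustration. (In our test program gcc does not push the argument; it reserved the slot with subl $8, %esp and fills it with movl %eax, (%esp), but at the moment of the call the argument sits at the top of the stack in the same way.)

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE  0xbffff400u  /* hypothetical lowest modelled address */
#define WINDOW_WORDS 32           /* models 0xbffff400 .. 0xbffff47f  */

struct stack {
    uint32_t esp;
    uint32_t mem[WINDOW_WORDS];   /* mem[i] is the word at WINDOW_BASE + 4*i */
};

/* Pointer to the modelled word at a 4-byte-aligned address. */
static uint32_t *slot(struct stack *s, uint32_t addr)
{
    assert(addr % 4 == 0);
    assert(addr >= WINDOW_BASE && addr < WINDOW_BASE + 4 * WINDOW_WORDS);
    return &s->mem[(addr - WINDOW_BASE) / 4];
}

/* pushl: decrement esp by 4, then store at the new top of the stack. */
static void pushl(struct stack *s, uint32_t value)
{
    s->esp -= 4;
    *slot(s, s->esp) = value;
}

/* Hypothetical helper: the caller's side of fn(1, 2). */
static void call_fn_1_2(struct stack *s, uint32_t return_address)
{
    pushl(s, 2);               /* pushl $2                                     */
    pushl(s, 1);               /* pushl $1                                     */
    pushl(s, return_address);  /* call fn: pushes the address of the next instruction */
}
```

On entry to fn the model shows the return address at (%esp), the first argument 1 at 4(%esp) and the second argument 2 at 8(%esp). Once fn runs the usual pushl %ebp / movl %esp, %ebp prologue, the same arguments are found at 8(%ebp) and 12(%ebp).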
(gdb)s
The next instruction is addl $1, -4(%ebp).
It increments the value of i by 1.
The process repeats as long as i <= 9, printing "Hello" ten times on the screen.
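It can help to see the loop written back in C with the same shape as the compiled code: gcc placed the test at the bottom and enters the loop with a jump to it. The following sketch rewrites the for loop with gotos, one C line per numbered instruction; the function name and the printed counter are hypothetical additions for checking.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical rewrite of the test program's loop in the shape gcc emitted. */
static int hello_loop(void)
{
    int i;
    int printed = 0;          /* added only so the result can be checked */

    i = 0;                    /* (4)  movl $0, -4(%ebp)                      */
    goto L2;                  /* (5)  jmp .L2                                */
L3:
    printf("Hello");          /* (6)-(8) load &"Hello", store it at (%esp), call printf */
    printed++;
    i = i + 1;                /* (9)  addl $1, -4(%ebp)                      */
L2:
    if (i <= 9)               /* (10) cmpl $9, -4(%ebp)                      */
        goto L3;              /* (11) jle .L3                                */

    assert(i == 10);          /* the loop exits after the tenth increment */
    return printed;
}
```

Calling hello_loop() prints "Hello" ten times and returns 10. Testing at the bottom means each iteration needs only one conditional jump.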
Set a breakpoint at line 22 (leave) and continue:
(gdb)break 22
(gdb)cont
This continues execution until the leave instruction is reached and pauses there.
The leave instruction releases the space allocated for local variables and function arguments.
It is equivalent to "movl %ebp, %esp"
followed by "popl %ebp", which pops the saved frame pointer so that the function that "called" this function can resume with its own stack frame.
The last instruction, "ret", pops the saved return address (the saved EIP) into the eip register, so that execution resumes in the function that "called" main.
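Here is a toy C model of the epilogue: leave is movl %ebp, %esp followed by popl %ebp, and ret pops the return address into eip. The memory window and helper names are hypothetical. The register values are the ones from the session above, and the return address 0x00144bd6 is taken from the address gdb reports after ret (see below).

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE  0xbffff400u  /* hypothetical lowest modelled address */
#define WINDOW_WORDS 32           /* models 0xbffff400 .. 0xbffff47f  */

struct cpu {
    uint32_t esp, ebp, eip;
    uint32_t mem[WINDOW_WORDS];   /* mem[i] is the word at WINDOW_BASE + 4*i */
};

/* Pointer to the modelled word at a 4-byte-aligned address. */
static uint32_t *word_at(struct cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0);
    assert(addr >= WINDOW_BASE && addr < WINDOW_BASE + 4 * WINDOW_WORDS);
    return &c->mem[(addr - WINDOW_BASE) / 4];
}

/* popl: read the word at the top of the stack, then move esp up by 4. */
static uint32_t popl(struct cpu *c)
{
    uint32_t value = *word_at(c, c->esp);
    c->esp += 4;
    return value;
}

/* leave == movl %ebp, %esp ; popl %ebp */
static void leave(struct cpu *c)
{
    c->esp = c->ebp;   /* discard the locals and the argument slot  */
    c->ebp = popl(c);  /* restore the caller's frame pointer        */
}

/* ret: pop the saved return address into eip. */
static void ret(struct cpu *c)
{
    c->eip = popl(c);
}
```

Starting from esp = 0xbffff440 and ebp = 0xbffff448, with 0xbffff4c8 saved at 0xbffff448 and 0x00144bd6 at 0xbffff44c, leave restores ebp to 0xbffff4c8 and esp to 0xbffff44c; ret then sets eip to 0x00144bd6 and leaves esp at 0xbffff450.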

Finally, after the ret instruction has executed, step once more with gdb:
(gdb)s
and you will see: 0x00144bd6 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
This means that main is called by __libc_start_main() and returns to it, which answers the question we asked about the prologue.

Hope it helps! Upcoming papers will be about exploit development (mainly on Linux).

Greetz : fb1h2s, Team SG and all g4h members!!!

Feedback is welcome, whatever it may be!

References: "Smashing the Stack for Fun and Profit" (Aleph One); Hacking: The Art of Exploitation, 2nd Edition (Jon Erickson); http://www.securitytube.net


Food for thought:
Try disassembling this program and play with it!
#include <stdio.h>

int mul(int x, int y)
{
    return x * y;
}

int main(void)
{
    int a, b, c;
    scanf("%d %d", &a, &b);
    c = mul(a, b);
    return 0;
}

This is a bit more complicated than the previous program! But if you are able to understand its disassembled code, you will have much greater clarity in understanding the stack. Good luck... Until next time. \m/ Peace Out

fb1h2s
03-18-2011, 11:10 PM
Hey, thanks for the share and good job! One suggestion: if you could add a diagram of the stack for these instructions, that would make it better and give readers a better understanding.

sebas_phoenix
03-19-2011, 01:39 AM

Thanks, man! Yeah, I will provide diagrams with my next article.

carlos2000
07-30-2012, 04:23 AM
Thank you a lot! I agree with my friends: diagrams would make it complete :D

marc_kriss
08-05-2012, 09:17 PM
Hello sebas, nice tutorial. I have a problem: when I try to access the memory address of ESI or EDI in a simple assembly program via GDB, GDB gives the error "value cant be converted to integer". Can you help me out with this?

sebas_phoenix
08-06-2012, 04:29 PM

Can you please post your code?