Assembly for Hackers

10-07-2010, 12:50 PM
Dear All,

The following is an attempt to provide some supplementary material for those who are new to assembly language programming and find it hard to get started with shellcoding and/or exploitation techniques.

The readers of this document fall broadly into two categories, depending on the prerequisites they already have:

1. Those who understand the basics of assembly and are familiar with assembly instructions, memory layout etc.
2. Those who are totally new to this subject.

For those who fall into category 2, it’s strongly suggested to grab the video series “Assembly Primer for Hackers” by Vivek Ramachandran. He has done an awesome job of creating simple-to-understand video tutorials on assembly programming. There are 11 videos, each 10-30 minutes long. They give a kick start in understanding the basics of assembly language programming, the memory layout, registers and the stack.

Our attempt is to present a set of programs (10-15) which will help beginners learn and practice. A few things might not be clear, as this is going to be a quick manual, so dig deeper on Google or shoot your query.

So let's start...

10-07-2010, 12:59 PM
Development Platform: Linux
Assembler: GAS (The GNU Assembler)
Linker: ld
Compiler: GCC
Debugger: GDB
Operation on 32-bit registers on Intel architecture

Some one-liners to refresh your concepts:

1. GAS terminology: movl source, destination

addl S, D --> Add source to destination and store in destination
subl S, D --> Subtract source from destination and store in destination
imull S, D --> Multiply source by the destination and store in destination
idivl divisor --> The dividend has to be in edx:eax (for a 32-bit dividend, load it into eax and sign-extend it into edx with cltd). The quotient is then stored in eax and the remainder in edx. The divisor can be any register or memory location, but not an immediate value.
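
The four arithmetic instructions above can be sketched in one small program. This is only an illustration: the file name and the values are arbitrary, the program layout and the exit system call are explained later in this thread, and cltd is used to sign-extend eax into edx before the division.

```asm
# arith.s (hypothetical file name)
# build as 32-bit code: as --32 -o arith.o arith.s && ld -m elf_i386 -o arith arith.o
.section .text
.globl _start
_start:
    movl $7, %eax          # eax = 7
    addl $5, %eax          # eax = 7 + 5 = 12
    subl $2, %eax          # eax = 12 - 2 = 10
    movl $3, %ecx
    imull %ecx, %eax       # eax = 10 * 3 = 30
    movl $4, %ecx          # divisor
    cltd                   # sign-extend eax into edx (dividend is edx:eax)
    idivl %ecx             # eax = 30 / 4 = 7, edx = 30 % 4 = 2
    movl %edx, %ebx        # use the remainder as the exit status
    movl $1, %eax          # system call number of exit
    int $0x80
```

After running it, echo $? should print 2, the remainder left in edx.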

2. Moving the values between registers:
movl %eax, %ebx --> Moving a double-word value (4 bytes) from the register eax into register ebx. The value in eax remains the same
movw %ax, %bx --> Moving a word value (2 bytes) from the register ax into register bx
movb %ah, %bh --> Moving a byte value (1 byte) from the register ah into register bh

Hence you can perform operations on either of the following:
• The whole 32-bit register, as in the first mov statement, by appending the suffix ‘l’ (small L) and naming a 32-bit register, or
• The lower 16 bits of the register, as in the second mov statement, by appending the suffix ‘w’ (word) and naming a 16-bit register, or
• Either byte of those lower 16 bits, addressed as ah (bits 8-15) and al (bits 0-7), using the movb (b ~ byte) instruction.
Please note that the register “eax” has been used just for the sake of example. It could have been ebx, ecx or edx.

Word = 2 bytes
Dword = 4 bytes
Short = 16 bit
Int = 32 bit

The mov instruction is useful for transferring data along any of the following paths:
• To a register from memory
• To memory from a register
• Between general registers
• Immediate data to a register
• Immediate data to a memory

The mov instruction cannot move from memory to memory. Memory-to-memory moves can be performed, however, by the string move instruction MOVSx series discussed later in the document.

3. Some Jump instructions:

cmpl %eax, %ebx

je --> Jump if the values under comparison are equal
jg --> Jump if the 2nd value is greater than the 1st value (signed comparison)
jge --> Jump if the 2nd value is greater than or equal to the 1st value
jl --> Jump if the 2nd value is less than the 1st value (signed comparison)
jle --> Jump if the 2nd value is less than or equal to the 1st value
jmp --> Unconditional jump
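
As a rough sketch of how cmpl and a conditional jump work together, the following program leaves the larger of two numbers in ebx and returns it as the exit status. The label name and the values are made up for illustration.

```asm
# build as 32-bit code: as --32 -o max.o max.s && ld -m elf_i386 -o max max.o
.section .text
.globl _start
_start:
    movl $10, %eax
    movl $25, %ebx
    cmpl %eax, %ebx        # compares ebx (2nd value) against eax (1st value)
    jg second_is_greater   # taken here, because 25 > 10
    movl %eax, %ebx        # otherwise the larger value is in eax
second_is_greater:
    movl $1, %eax          # exit system call; the status is already in ebx
    int $0x80
```

Running it and then typing echo $? should print 25.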

4. The difference between “call” and “jmp” is that “call” also pushes the return address onto the stack, so that the function can return to the point it was called from, while “jmp” does not. This will become clearer with the examples in the later part of the document.
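
A minimal sketch of the difference, assuming a tiny hypothetical function add_ten: “call” lets the function come back with “ret”, while “jmp” simply transfers control.

```asm
# build as 32-bit code: as --32 -o call.o call.s && ld -m elf_i386 -o call call.o
.section .text
.globl _start
_start:
    movl $5, %ebx
    call add_ten           # pushes the address of the next instruction, then jumps
    jmp finish             # plain transfer of control, nothing is pushed
add_ten:
    addl $10, %ebx         # ebx = 15
    ret                    # pops the return address and jumps back to it
finish:
    movl $1, %eax          # exit with the status in ebx
    int $0x80
```

The exit status (echo $?) should be 15.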

5. A specific integer value is associated with each syscall; this value must be placed into the register eax.
Six registers are used for the arguments that the system call takes. The first argument goes in EBX, the second in ECX, then EDX, ESI, EDI, and finally EBP, if there are that many. If there are more than six arguments, EBX must contain the memory location where the list of arguments is stored, but don’t worry about this because it’s unlikely that you’ll use a syscall with more than six arguments.
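
To make the register convention concrete, here is a sketch that uses the write system call (number 4 on 32-bit Linux) with three arguments, followed by exit (number 1). The label Message and its text are placeholders.

```asm
# build as 32-bit code: as --32 -o write.o write.s && ld -m elf_i386 -o write write.o
.section .data
Message:
    .ascii "Hello\n"       # 6 bytes, not NULL terminated
.section .text
.globl _start
_start:
    movl $4, %eax          # system call number of write
    movl $1, %ebx          # 1st argument: file descriptor 1 (stdout)
    movl $Message, %ecx    # 2nd argument: address of the buffer
    movl $6, %edx          # 3rd argument: number of bytes to write
    int $0x80
    movl $1, %eax          # system call number of exit
    movl $0, %ebx          # 1st argument: exit status 0
    int $0x80
```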

to be contd...

10-08-2010, 07:21 AM
6. Moving Strings from one memory location to another (MOVSx series)

movsb --> move a byte (8 bits)
movsw --> move a word (16 bits)
movsl --> move a double word (32 bits)

Source --> ESI points to memory location
Destination --> EDI points to memory location

Interestingly, whenever any of the MOVSx instructions is executed, ESI and EDI are automatically incremented or decremented, by the size of the data moved, according to the Direction Flag (DF).

If DF (part of the EFLAGS register) is set, i.e. has a value of ‘1’, the ESI and EDI registers are decremented.
If DF is cleared, i.e. has a value of ‘0’, the ESI and EDI registers are incremented.

We can set DF using the STD instruction and it can be cleared using the CLD instruction.
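
A small sketch of the MOVSx instructions with DF cleared. The labels Source and Destination are made up, and the last copied byte is returned as the exit status only so that the result can be checked from the shell.

```asm
# build as 32-bit code: as --32 -o movs.o movs.s && ld -m elf_i386 -o movs movs.o
.section .data
Source:
    .ascii "ABCD"
.section .bss
    .lcomm Destination, 4
.section .text
.globl _start
_start:
    movl $Source, %esi       # ESI points to the source
    movl $Destination, %edi  # EDI points to the destination
    cld                      # DF = 0, so ESI and EDI are incremented
    movsb                    # copies 'A'; esi and edi advance by 1
    movsw                    # copies "BC"; esi and edi advance by 2
    movsb                    # copies 'D'
    movl $0, %ebx
    movb Destination+3, %bl  # last copied byte, 'D' = 68
    movl $1, %eax
    int $0x80
```

echo $? should print 68, the ASCII code of ‘D’.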

7. Moving Strings from memory location into registers (LODSx series)

lodsb --> load a byte from memory location into AL
lodsw --> load a word from memory location into AX
lodsl --> load a double word from memory location into EAX

The data is always loaded into the EAX register (AL, AX or the whole of EAX, depending on the size) and the source string has to be pointed to by ESI.

The register ESI would be automatically incremented or decremented based on DF flag after the LODSx instruction executes.

8. Storing Strings from registers into memory location (STOSx series)

stosb --> store a byte from AL into memory location
stosw --> store a word from AX into memory location
stosl --> stores a double word from EAX into memory location

The data is always stored from the EAX register (AL, AX or the whole of EAX, depending on the size) and EDI points to the destination memory.

The register EDI is automatically incremented or decremented based on the DF flag after the STOSx instruction executes.
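
LODSx and STOSx are often used as a pair: load a piece of the string, change it in the register, store it back. The following sketch (labels and data are hypothetical) converts three lower-case letters to upper case one byte at a time. It also uses decl and jnz for the loop counter, which are not covered above.

```asm
# build as 32-bit code: as --32 -o upper.o upper.s && ld -m elf_i386 -o upper upper.o
.section .data
Source:
    .ascii "abc"
.section .bss
    .lcomm Destination, 3
.section .text
.globl _start
_start:
    movl $Source, %esi
    movl $Destination, %edi
    movl $3, %ecx            # number of bytes to process
    cld                      # DF = 0, move forward through the strings
next_byte:
    lodsb                    # al = byte at ESI; ESI is incremented
    subb $32, %al            # lower-case ASCII letter to upper case
    stosb                    # byte at EDI = al; EDI is incremented
    decl %ecx
    jnz next_byte            # repeat until ecx reaches zero
    movl $0, %ebx
    movb Destination, %bl    # first stored byte, 'A' = 65
    movl $1, %eax
    int $0x80
```

echo $? should print 65, the ASCII code of ‘A’.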

9. Comparing Strings (CMPSx series to compare various strings)

cmpsb --> compares a byte value
cmpsw --> compares a word value
cmpsl --> compares a double word value

For comparison, the ESI should point to the source string and EDI should point to the destination string.

The registers ESI and EDI are automatically incremented or decremented based on the DF flag after the CMPSx instruction executes.
When a CMPSx instruction executes, it subtracts the destination value from the source value and sets the Zero Flag (ZF) in the EFLAGS register accordingly. When the two values match, the result is zero and ZF is set to ‘1’; otherwise it is cleared to ‘0’.

*Remember that when ZF or DF are ‘set’, they have a numeral value of ‘1’ and when they are ‘not set’, they have a numeral value of ‘0’.

CLD --> clear the DF (DF = 0). ESI and EDI would get incremented
STD --> set the DF (DF = 1). ESI and EDI would get decremented
CMPSx --> When both of the strings are same, the subtraction of destination from source comes out to be ‘0’ and ZF gets set i.e. it gets a value of ‘1’
CMPSx --> When both the strings are different, ZF gets a value of ‘0’ and is not set.
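
A sketch of a comparison with cmpsl followed by je, which jumps when ZF is ‘1’. The two labels and their contents are made up; the exit status is 0 when the strings match and 1 when they differ.

```asm
# build as 32-bit code: as --32 -o cmps.o cmps.s && ld -m elf_i386 -o cmps cmps.o
.section .data
First:
    .ascii "ABCD"
Second:
    .ascii "ABCD"
.section .text
.globl _start
_start:
    movl $First, %esi      # source string
    movl $Second, %edi     # destination string
    cld                    # DF = 0
    cmpsl                  # compares the 4 bytes at ESI with the 4 bytes at EDI
    je equal               # taken when ZF = 1, i.e. the values match
    movl $1, %ebx          # strings differ
    jmp finish
equal:
    movl $0, %ebx          # strings are the same
finish:
    movl $1, %eax
    int $0x80
```

With the data above, echo $? should print 0.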

(gdb) info registers --> would show only the ‘set’ components of EFLAGS

GDB will be covered in detail in a later part.

to be contd...

10-10-2010, 07:03 AM
Data Accessing Modes

Data accessing modes (addressing modes) are the different ways a processor can access data. This section deals with how those addressing modes are represented in assembly language instructions.

The general form of a memory address reference is the following:

BaseAddress(%Offset, %Index, DataSize)

The address is calculated as follows:

Final_address = BaseAddress + %Offset + (DataSize x %Index)

BaseAddress and DataSize must both be constants (DataSize can only be 1, 2, 4 or 8), while the other two, i.e. %Offset and %Index, must be registers. If any of the pieces is left out, it is simply substituted with zero in the equation (a missing DataSize defaults to 1).

All of the addressing modes discussed below, except the immediate and register modes, can be represented in this fashion.

If you are new to this stuff, you might not digest all of it right away. Just go through the modes once and keep referring back to them while programming.

1. Immediate Addressing Mode

Instruction --> movl $10, %eax

It says: load the value 10 into the register eax. This mode is used to load literal values into registers or memory locations. Please pay attention to the $ sign. It’s the $ sign which makes this “Immediate Addressing Mode”. Without it, the instruction would load the ‘value’ present at memory location 10 into eax rather than the number 10 itself, making it “Direct Addressing Mode” instead of “Immediate Addressing Mode”.

2. Direct Addressing Mode

Instruction --> movl ADDRESS, %eax

It says: load the value at ADDRESS into the register eax. This should be quite clear to readers acquainted with pointers in programming languages.

This is done by using only the BaseAddress portion; the rest of the fields are substituted with zero in the equation.

.section .data
IntValue:
.int 16
.section .text
.globl _start
_start:
movl IntValue, %eax

The above code loads the value 16 into register eax. Please do not worry about the code if you are not comfortable with it at the moment. It will become clearer as you proceed with the document.

Another example could be:

movl 1002, %eax
This is Direct Addressing Mode, treating 1002 as a memory address containing some value.

3. Indirect Addressing Mode

Instruction --> movl (%eax), %ebx

It says: eax is holding some address, and we want to move the value at that address into register ebx. Hence, the “Indirect Addressing Mode” loads a value from the address held in a register.

A very nice example of this addressing mode is to obtain the top of the stack without popping out the top value:

movl (%esp), %eax

4. Indexed Addressing Mode

Instruction --> movl BaseAddress(%Offset, %Index, DataSize), %DestinationRegister

.section .data
IntArray:
.long 1, 2, 3, 4, 5
.section .text
.globl _start
_start:
movl $0, %esi
movl $0, %edi
movl IntArray(%esi, %edi, 4), %eax

This will move the value “1” from the initialized array into the register eax.
The last statement actually says: “start at the beginning of IntArray, as %Offset is zero, and take the first item (because %Index is 0 and array counting starts from 0).
Also remember that each number takes up four storage locations (because the data type is ‘long’, i.e. 4 bytes).”

If edi is incremented to 1, i.e. if %Index holds the value 1, the last statement would move the number ‘2’ from IntArray into eax.
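
To see the index register doing real work, the following sketch walks through the whole array and adds up its elements, leaving out the %Offset register. The loop label is made up, and the sum is returned as the exit status so it can be checked with echo $?.

```asm
# build as 32-bit code: as --32 -o sum.o sum.s && ld -m elf_i386 -o sum sum.o
.section .data
IntArray:
    .long 1, 2, 3, 4, 5
.section .text
.globl _start
_start:
    movl $0, %ebx                    # running sum
    movl $0, %edi                    # %Index, starts at the first element
next_item:
    addl IntArray(, %edi, 4), %ebx   # address = IntArray + (4 x edi)
    incl %edi                        # move on to the next element
    cmpl $5, %edi
    jl next_item                     # loop while edi is less than 5
    movl $1, %eax                    # exit with the sum in ebx
    int $0x80
```

The exit status should be 15 (1 + 2 + 3 + 4 + 5).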

5. Base Pointer Addressing Mode

Instruction --> movl 4(%eax), %ebx

Base-pointer addressing is similar to indirect addressing, except that it adds a constant value to the address in the register.

movl (%esp), %eax --> Indirect addressing mode. It would copy the value on the top of the stack into eax

movl 4(%esp), %eax --> Base pointer addressing mode, here used to access the value just below the top of the stack (4 bytes further up in memory)

movl $9, 4(%edi) --> copy the value 9 into the memory pointed to by (edi + 4)

movl $9, -2(%edi) --> copy the value 9 into the memory pointed to by (edi - 2)

We will be using base pointer addressing mode very frequently while writing programs in this guide.
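
A short sketch of indirect and base pointer addressing on the stack. The two pushed values are arbitrary, and pushl is used only to put something on the stack to read back.

```asm
# build as 32-bit code: as --32 -o stack.o stack.s && ld -m elf_i386 -o stack stack.o
.section .text
.globl _start
_start:
    pushl $11              # pushed first, so it ends up at 4(%esp)
    pushl $22              # pushed last, so it is the top of the stack
    movl (%esp), %eax      # indirect addressing: eax = 22
    movl 4(%esp), %ebx     # base pointer addressing: ebx = 11
    addl %eax, %ebx        # ebx = 33
    movl $1, %eax          # exit with the status in ebx
    int $0x80
```

The exit status should be 33.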

6. Register Addressing Mode

Instruction --> movl %eax, %ebx

Register mode simply moves data in or out of a register.

10-10-2010, 07:14 AM
A few examples

With the above terminology covered, let us practice moving some values in and out of memory and registers.

Instead of taking examples one by one at this stage, let us write down the questions that generally arise in the mind of a newbie programmer.

Before that, we need to declare some memory locations. Also keep in mind that “moving” data from “source” to “destination” does not actually change the value at the source. Contrary to the word “move”, the value is simply copied into the destination.

.section .data
mem_location:
.int 10
IntegerArray:
.int 10, 20, 30, 40, 50

1. How to move the value 15 into a register?
movl $15, %eax --> Immediate Addressing Mode

2. How to move the value 15 into a memory location?
movl $15, mem_location --> This would change the value in mem_location from 10 to 15

3. How to move the value at mem_location into a register and vice versa?
movl mem_location, %eax
movl %eax, mem_location

4. What if I need to copy the address of mem_location into a register, i.e. the value stored in the register would be the address of mem_location?

movl $mem_location, %eax --> Notice the “$” (dollar) prepended to the memory location

In GDB (which you will come across later), print &mem_location and print /x $eax would then show the same address.
Similarly, movl $mem_location, another_location will store the address of mem_location in another_location.

5. What if I need to copy something from one register to another?

movl %eax, %ebx --> To move a 32 bit value
movw %ax, %bx --> To move a 16 bit value
movb %ah, %bh --> To move an 8 bit value

The bottom line is that the source and the destination must be of the same size.

6. How to access a value in an array?

BaseAddress(Offset, Index, Data_Size)
The trap here is that “Offset” and “Index” need to be given in registers. “Data_Size” is an integer value; it is basically the size of the data type under operation.
Let us say you want to change the 4th element of the array to 44. The instructions would be:

movl $0, %eax
movl $3, %ebx
movl $44, IntegerArray(%eax, %ebx, 4)

7. How to access a value indirectly (Indirect Addressing Mode)?

movl $mem_location, %eax --> move the address of label mem_location into register eax

movl (%eax), %ebx --> move the value at the address stored in register eax into register ebx, i.e. move the value of label mem_location into register ebx

movl $35, (%eax) --> move the value 35 to the location pointed to by the address stored in register eax, i.e. in the current case the value of label mem_location is overwritten with the integer value 35
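
Put together as a complete program, the indirect addressing steps look roughly like this. The order of the last two moves is swapped here so that the overwritten value can be read back and returned as the exit status.

```asm
# build as 32-bit code: as --32 -o indirect.o indirect.s && ld -m elf_i386 -o indirect indirect.o
.section .data
mem_location:
    .int 10
.section .text
.globl _start
_start:
    movl $mem_location, %eax   # eax = address of mem_location
    movl $35, (%eax)           # overwrite the value 10 with 35
    movl (%eax), %ebx          # ebx = value at that address = 35
    movl $1, %eax              # exit with the status in ebx
    int $0x80
```

The exit status should be 35.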

to be contd...

10-11-2010, 01:58 PM
# Start of Program.
# Anything after the symbol “#” is a comment, as in many other programming languages.
# Any assembly program has the following three sections and structure:

.section .data

# All initialized data goes here

.section .bss

# All uninitialized data goes here

.section .text

.globl _start
_start:
Program Instructions
More Instructions
Some more Instructions
# End of Program

.section .data

Under this section you declare your initialized data. Initialized data consumes memory and contributes to the size of the executable file. The space is reserved at assembly time. Some examples of declarations:

.ascii --> A non-NULL terminated string
.asciz --> A NULL terminated string
.byte --> 1 byte value
.short --> 16 bit integer
.int --> 32 bit integer
.long --> 32 bit integer (the same as .int in GAS on x86)
.float --> Single precision floating point number
.double --> Double precision floating point number
.int 10, 20, 30, 40, 50 --> Declaration of Integer Array
db ‘/bin/bash’ --> The DB, or define byte, directive (it’s not technically an instruction) sets aside space in memory for a string. Note that DB is NASM syntax; the GAS equivalents are .ascii and .asciz above.

.section .bss

All uninitialized data is stored here. Anything declared in this segment is created at run time. Hence, whatever you declare here is not going to occupy any space inside the executable. Only when the program is loaded into memory is the space actually created. Some example declarations:

.comm buffer, 1000 --> declares a ‘buffer’ of 1000 bytes. ‘buffer’ is the Label_name, i.e. it refers to the start of the reserved area.

.comm --> declares common memory area
.lcomm --> declares local common memory area

This section can reserve storage, but it cannot initialize it. It is primarily useful for buffers, because we do not need to initialize them anyway; we just need to reserve storage.
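
A typical use of a .bss buffer is as the target of a read. The sketch below reserves 100 bytes, reads a line from standard input into it (read is system call number 3 on 32-bit Linux) and writes the same bytes back out. The buffer name and size are arbitrary, and error handling is left out.

```asm
# build as 32-bit code: as --32 -o echo.o echo.s && ld -m elf_i386 -o echo echo.o
.section .bss
    .lcomm buffer, 100     # 100 uninitialized bytes
.section .text
.globl _start
_start:
    movl $3, %eax          # system call number of read
    movl $0, %ebx          # file descriptor 0 (stdin)
    movl $buffer, %ecx     # address of the buffer
    movl $100, %edx        # read at most 100 bytes
    int $0x80              # on return, eax = number of bytes read
    movl %eax, %edx        # write back exactly that many bytes
    movl $4, %eax          # system call number of write
    movl $1, %ebx          # file descriptor 1 (stdout)
    movl $buffer, %ecx
    int $0x80
    movl $1, %eax          # exit(0)
    movl $0, %ebx
    int $0x80
```

Since the buffer lives in .bss, it should add nothing to the size of the executable on disk.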

.section .text

This section contains the program instructions.

.globl _start

This is somewhat like the “main()” function of the ‘C’ programming language: .globl makes the _start label visible to the linker, which treats it as the start (entry point) of the program.

We are free to include only those sections which have some data or significance in our program. For example, if we do not have any uninitialized data in our program, we can leave out the .bss section without any harm.

to be contd...

10-11-2010, 10:31 PM
And yes, grab the book Professional Assembly Language and start reading. It is a good book for starters.

10-25-2010, 08:56 AM
Learn your debugger well to debug code efficiently. This section consists of some tricks, commands and shortcuts for using GDB efficiently. In short, it’s a cheat sheet for GDB.

1. If you intend to open compiled ‘C’ programs in GDB, you need to tell your compiler to compile your code with symbolic debugging information included. E.g.

# gcc -g -o hello hello.c
# gcc -ggdb -o hello hello.c
# g++ -g -o hello hello.c

2. To run the program in GDB, do either of the following:

# gdb ./<binary> [Return Key] --> This will open up the binary in GDB
# gdb [Return Key] --> This will open up the debugger without loading any program. At the gdb prompt, give the command “file <binary_name>” and that will load the executable:
(gdb) file <binary_name> [Return Key]
# gdb -tui ./<binary> [Return Key] --> For console-cum-GUI GDB

3. If arguments have to be passed to the program being loaded into GDB, either of the following options can be used:

# gdb --args <binary> arg1 arg2 arg3 …. argN [Return Key]
# gdb <binary> [Return Key]
(gdb) run arg1 arg2 arg3 ….. argN

4. Hitting the ‘RETURN’ at gdb prompt will repeat the last command entered.

5. Break Points
Use the “break” or “b” command at the gdb prompt to specify a location, which could be a function name, a line number, or a source file and line number.

Set Break Point
break main, to set a break point at the function “main”
break 5, to set a break point at the code line number 5
break hello.c:5, to set a break point at code line number 5 of the source file hello.c
break *_start+1, to set a break point one byte past the _start label. Put a “nop” as the first instruction after _start so that this address falls on an instruction boundary.

Check Break Point

• info breakpoints, to list the current break points (type ‘i b’ without quotes for shortcut)

Clear Break Point
• clear main, to clear the break point set at a particular function
• delete <breakpoint number>, to delete a break point by its number as listed by info breakpoints
• If the program has already been “run” but you forgot to set breakpoints, hit CTRL-C and that will stop the program wherever it happens to be and return you to the gdb prompt. At that point, you can set up a proper breakpoint somewhere and ‘continue’ to that break point.

6. ‘next’ (n for shortcut) and ‘step’ (s for shortcut) proceed step by step after you have hit the breakpoint; ‘continue’ (c) continues until the next breakpoint or the end of the program.
One shortcut is to just hit RETURN, as it repeats the last command entered. This saves you typing ‘next’ or ‘s’ over and over again.

7. The commands in this point and the next (8) are gdb commands which you will use very frequently while debugging your program:

(gdb) list To list the source code of executable loaded
(gdb) disassemble <function_name> To dump the assembly code of function referred
(gdb) help <keyword> gdb help pages
(gdb) info registers To see the content and state of all registers
(gdb) info variables To see all variables and their respective addresses

8. Examine command

(gdb) print variable_name To see the value of a variable in decimal
(gdb) print /x variable_name To see the value of a variable in hex
(gdb) print /c variable_name To see the value of a variable in ASCII
(gdb) print &Label_name To see the address of Label_name
(gdb) print /x &Label_name To see the address of Label_name in hex
(gdb) x/FMT &Label_name To see the value of variable (useful in case of integers)
(gdb) x/1s &Label_name To see the whole string in single-shot (useful in case of strings)
(gdb) x/1s $register To see the whole string in single-shot located at the address stored in register
(gdb) x/1s 0x080000 To see the whole string in single-shot at a particular address (here 0x080000)
(gdb) print /c $eax To see the value in register in ASCII
(gdb) print /d $eax To see the value in register in Decimal
(gdb) print /x $eax To see the value in register in HEX
(gdb) x/FMT Address Address could be something like 0x08.. or ‘&Label_name’
If there is no Label_name, pass the raw address to the examine command

10-29-2010, 08:02 AM
Purpose: To exit the program “cleanly” and pass the exit code to the Linux kernel
Input: Nothing
Program Flow:

• Call the “exit()” function and exit out of the program
• Check the return code at the console
Output: Nothing. Just check the exit code.

# Program to explain the way to exit() from a Linux Assembly Program

.section .data

.section .bss

.section .text

.globl _start
_start:
movl $1, %eax
movl $0, %ebx
int $0x80

# End of program

Let’s dissect the program

We have not declared anything in the .data or .bss sections, as we are only interested in exiting from the program successfully. They have been included just for the sake of completeness and can be dropped from the program code.

The ‘C’ programming terminology for exit is:
e.g. exit(0) or exit(1)

As programmers, we generally pass the integer value ‘0’ on success and the integer value ‘1’ on failure. So the program logic is: call the exit function and pass the relevant integer value to it as the exit status.

Following are the steps we need to follow in assembly language programs:
• Load the system call number of the relevant function (i.e. the exit function in the current program)
• Load its parameters (i.e. pass the integer value to it)
• Call the Linux kernel interrupt to run the command (i.e. execute the exit function in the current program)

The system call number is always loaded into the register eax with the instruction:

movl $System_Call_Number, %eax
In the current case of exit, the System_Call_Number is ‘1’, hence the instruction would be:

movl $1, %eax
The parameters required by the function call are loaded sequentially into ebx, ecx, edx and so on.

In the current case of exit, only one parameter is required, which is either 0 (success status) or 1 (failure status), hence just ebx needs to be loaded:

movl $0, %ebx
(In the examples of the read() and write() function calls we will see how other parameters are loaded into registers.)

Finally, control is handed over to the Linux kernel by raising the interrupt int $0x80 to run the exit command.

int $0x80

So the following three instructions in assembly language are equivalent to the exit(0) function call in ‘C’ programming language:

movl $1, %eax
movl $0, %ebx
int $0x80

For all such calls we need to follow the same pattern, i.e. load the system call number into the register eax and load the required parameters into ebx, ecx, edx and so on. Finally, call the Linux kernel interrupt with the instruction int $0x80 to run the desired command.

EAX --> System Call number
EBX --> First argument
ECX --> Second argument
EDX --> Third argument
ESI --> Fourth argument
EDI --> Fifth argument
EBP --> Sixth argument

A few older system calls (the original mmap, for example) instead take in EBX a pointer to a structure containing their arguments.


Name the program --> exit.s
Assemble the program --> $ as -gstabs -o exit.o exit.s
Link the program --> $ ld -o exit exit.o
Execute the program --> $ ./exit
Check the output --> $ echo $?
You should get ‘0’ at the console as output.

On a 64-bit Linux host, add --32 to the as command and -m elf_i386 to the ld command to build 32-bit code.

If any of the above commands reports errors, check the source code and the commands for spelling mistakes. After correcting the source code, you have to re-run all the commands.

You must always re-assemble and re-link assembly programs after the source code file has been modified.

11-16-2010, 10:37 PM
Assembly is really essential to step into exploitation and hacking ..Thanks a million b0nd bro for the wonderful effort.. :)

11-16-2010, 11:09 PM
also Windows Assembly Language Primer has just been started by Vivek on SecurityTube . Check out the 1st video >>