x86 32-bit Assembly for Atheists

12-21-2010, 09:05 PM
Found a nice article on the internet a long time ago. Thought I'd share it with you all :) Hope it proves useful ;)


This document is an introduction to creating programs for microprocessors of the x86 architecture family, in particular 32-bit code. The reader is expected to be familiar with programming in C/C++ (or at least a similar language such as Java) and with the essential API of the operating system they are using. Some mathematical knowledge up to high school/university level is also essential for understanding a lot of things. I will try not to be too OS-specific, but the environments I am going to focus on in this document are Windows, Linux and BSD. Pure knowledge of assembly by itself is useless if you do not know how to combine it with the API of the operating system you are going to run it on, so I want to make sure that this is demonstrated to a certain extent. It is not that different from the way you would do it in a high level programming language such as C++, so it is not difficult to understand.
You might wonder why the name of this document contains the "for Atheists" part. I added it primarily to distinguish it from other documents by name but also because I am an outspoken atheist, as every scientist should be. It was more of a joke really, though.

What is Assembly?

Assembly (ASM for short) is the lowest level of programming possible: if you are a programmer and you want to get as close to the hardware as you can, this is the place you want to be. In assembly code you get to control every single instruction that your CPU (central processing unit) is going to execute. Each microprocessor architecture has its own assembly language, and these languages are usually totally incompatible with each other, so you will have to write assembly code that is specific to the particular architecture you want your program to run on. In assembly you write instructions as plain ASCII text, and each one directly represents a machine code instruction executed by your processor. The names of these instructions are usually extremely short and are often abbreviations of full names (for example, mov for "move"). These assembly instruction names are called mnemonics.
Before your microprocessor can actually run the program you have written in assembly, you have to run it through a program which translates all the mnemonics and their arguments into numerical machine code. This program is called an assembler. Assemblers often support more features than just the pure instructions in order to make the programmer's job easier, but you will see how they do that later on.
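To make the assembler's job concrete, here is a toy sketch in Python (chosen only for brevity). The function `assemble` and its lookup tables are hypothetical and handle only a tiny subset of 32-bit x86: `nop`, `ret`, `int3`, `push`/`pop` of a register, and `mov` of a constant into a register. The opcode bytes themselves (90h, C3h, CCh, 50h+reg, 58h+reg, B8h+reg) are the real x86 encodings; everything else a real assembler does (labels, memory operands, directives) is left out.

```python
import struct

# Register numbers that x86 encodes inside these opcodes (eax=0 ... edi=7).
REGS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
        "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# Instructions without operands map to a single fixed opcode byte.
NO_OPERAND = {"nop": 0x90, "ret": 0xC3, "int3": 0xCC}


def assemble(line: str) -> bytes:
    """Translate one line of a tiny x86 subset into machine code bytes."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest else []

    if mnemonic in NO_OPERAND and not operands:
        return bytes([NO_OPERAND[mnemonic]])
    if mnemonic == "push" and len(operands) == 1:
        return bytes([0x50 + REGS[operands[0]]])  # push r32 = 50h + reg
    if mnemonic == "pop" and len(operands) == 1:
        return bytes([0x58 + REGS[operands[0]]])  # pop r32 = 58h + reg
    if mnemonic == "mov" and len(operands) == 2:
        # mov r32, imm32 = B8h + reg, followed by the 32-bit constant
        # stored in little-endian byte order.
        imm = int(operands[1], 0) & 0xFFFFFFFF
        return bytes([0xB8 + REGS[operands[0]]]) + struct.pack("<I", imm)
    raise ValueError(f"unsupported instruction: {line!r}")


program = ["push ebp", "mov eax, 1", "pop ebp", "ret"]
machine_code = b"".join(assemble(line) for line in program)
print(machine_code.hex(" "))  # prints: 55 b8 01 00 00 00 5d c3
```

Notice that the same mnemonic (`mov`) produces different bytes depending on its operands, and that the constant 1 turns into four bytes stored least significant byte first.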

What is the Purpose of Learning Assembly?

This is a very important question and subject to a lot of discussion. My answer is really a list of reasons. A better understanding of what goes on at the lowest level can make you a better programmer at a higher level: it shows you what happens behind the scenes and often gives you a totally new perspective on things. Assembly is also useful for writing performance-critical parts of programs written in high level languages, particularly when you need processor features that the language does not easily expose. Knowing assembly is obviously also necessary for writing compilers, which convert high level language code to machine instructions for a particular microprocessor architecture.
The truth is that most people learn x86 ASM nowadays to crack commercial software, to reverse engineer closed source programs and to write cheats for computer games. Cracking is the one that made me learn it but I have to admit that I never got particularly good at it and I got totally distracted from my original goal in the process of understanding how it works. It is my experience that it is essential to learn how to write x86 ASM yourself first in order to be successful at cracking and reverse engineering. Knowing how to manually translate a C++ program to assembly is a valuable skill to have for this purpose.

Getting to know the x86 Architecture

So, what are we dealing with here? The x86 microprocessor is basically a register machine which uses a CISC (complex instruction set computer) instruction set. First I am going to explain what a register machine is; after that I will move on to the CISC part. A register machine is a computing device which stores the results of arithmetic operations and the like primarily in so-called registers. These are small but extremely fast storage units inside the processor, and when you write assembly you deal with them a lot. They are simple hardware implementations of fixed-width integers (on a 32-bit x86 CPU the general purpose registers are 32 bits wide). They cannot hold much data at once, but they are essential as temporary placeholders used by most instructions executed by the processor. In terms of the memory hierarchy of contemporary computers they are at the very top. The hierarchy looks like this:
1. CPU registers
2. CPU cache
3. RAM (random access memory)
4. HDD (hard disk drive storage)
5. External storage, DVDs and such
Registers hold minimal amounts of data and are extremely fast. The cache holds far larger amounts of data but it is still pretty fast. The RAM holds even larger amounts of data and it is far slower than any memory operation inside your CPU. Hard disks have an even larger capacity than your RAM and they are very slow in comparison to the objects at the top of the hierarchy.
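Because registers are fixed-width integers, arithmetic on them wraps around instead of growing without limit. The following Python sketch models this; the class `Register32` is a hypothetical illustration, not a real API. It treats a 32-bit register as an unsigned value modulo 2^32 and shows that the same 32 bits can also be read as a two's complement signed number.

```python
MASK32 = 0xFFFFFFFF  # all 32 bits set


class Register32:
    """Hypothetical model of a 32-bit general purpose register such as EAX."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & MASK32  # only the low 32 bits fit

    def add(self, operand: int) -> None:
        """Add like the hardware does: any bits above bit 31 are lost."""
        self.value = (self.value + (operand & MASK32)) & MASK32

    def signed(self) -> int:
        """Read the same 32 bits as a two's complement signed integer."""
        return self.value - (1 << 32) if self.value & 0x80000000 else self.value


eax = Register32(0xFFFFFFFF)
print(eax.signed())    # prints -1 (the largest unsigned value is also -1)
eax.add(1)
print(hex(eax.value))  # prints 0x0 (the result wrapped around)
```

The register itself only stores bits; whether 0xFFFFFFFF means 4294967295 or -1 depends entirely on which instructions you use to interpret it.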
At this point I should probably briefly explain what the CPU cache actually is. It is a small, high performance storage unit inside your CPU to which chunks of memory from the RAM are copied whenever you access RAM. This way the CPU does not have to access the RAM over and over again when it is processing the same piece of data (or data located right next to it). This speeds up the execution of code a lot, since RAM access is very slow in comparison to cache access. In reality the cache is not a single unit but is divided into multiple cache levels. The Level 1 cache is the smallest but fastest one; the next level is bigger but slower, and so on. Caching is not of much interest to somebody who is new to assembly, though. I might cover this topic later in sections which deal with optimising code for speed.
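The idea of copying whole chunks of RAM into the cache can be illustrated with a small simulation. This is a deliberately simplified sketch, not how any particular CPU works: it assumes a single direct-mapped cache with hypothetical sizes of 8 lines of 64 bytes each, whereas real x86 caches are multi-level and set-associative.

```python
LINE_SIZE = 64  # hypothetical: bytes copied from RAM per cache line
NUM_LINES = 8   # hypothetical: number of lines in this tiny cache


def simulate(addresses):
    """Count cache hits and misses for a sequence of byte addresses."""
    lines = [None] * NUM_LINES  # which chunk each line holds (None = empty)
    hits = misses = 0
    for addr in addresses:
        chunk = addr // LINE_SIZE  # the 64-byte chunk of RAM containing addr
        index = chunk % NUM_LINES  # the only line this chunk may occupy
        tag = chunk // NUM_LINES   # identifies which chunk sits in that line
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1            # slow path: copy the chunk in from RAM
            lines[index] = tag
    return hits, misses


# Read 256 bytes as 4-byte integers from front to back: only the first
# access to each 64-byte chunk has to go to RAM.
print(simulate(range(0, 256, 4)))  # prints (60, 4)

# Alternate between two addresses that compete for the same line:
# every access misses because each one evicts the other's chunk.
print(simulate([0, 512] * 4))      # prints (0, 8)
```

The first pattern is why processing neighbouring data is fast; the second hints at why the cache becomes relevant once you start optimising code for speed.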
Back to the CISC part I mentioned earlier. There are two major microprocessor architecture philosophies, known as RISC (reduced instruction set computer) and CISC (complex instruction set computer). In RISC architectures instructions are really fundamental and atomic: they are incapable of performing multiple actions at once, but they are very fast. The instruction format for such architectures is usually quite uniform, with all instructions taking up the same number of bytes, which makes them very easy for the processor to decode. In CISC architectures it is common practice to have complex instructions which perform multiple tasks sequentially, like loading a value from memory, performing an arithmetic operation on it and then writing the result back to memory. In a RISC architecture this would be divided into multiple instructions. CISC instructions are usually of variable length and are very complicated for the CPU to decode.
The CISC philosophy actually predates RISC, even though it was not called CISC at that time. Back in the early days of microprocessors (around 1971) most code had to be written in architecture-specific assembly. High level languages as we know them today were pretty much impractical on those machines, so it was desirable to make the programmers' job easier by providing them with comfortable instructions which would perform commonly grouped actions at once. By the time people realised that this was not very efficient (around 1975) it was already too late. The microprocessor industry based around CISC architectures would flourish, dominate and eventually conquer the desktop market completely. I would like to emphasise that this did not happen because of superior technology but because of superior resources and marketing strategies. We are basically using an inherently inferior instruction set architecture in most desktop computers nowadays.
The pure design philosophies do not really exist as such in practice anymore; you will find elements of both in modern architectures.
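The load/modify/store difference between CISC and RISC described above can be sketched with a toy machine in Python. The function names, the register names `r1`/`r2` and the dictionaries standing in for memory and registers are all hypothetical; the CISC-style function only loosely mirrors an x86 instruction such as `add [addr], ecx`, and the RISC-style functions stand for a generic load/store design rather than any specific real architecture.

```python
MASK32 = 0xFFFFFFFF  # keep every value within 32 bits


def cisc_add_mem(memory, regs, addr, src):
    """One CISC-style instruction: load, add and store all in one step."""
    memory[addr] = (memory[addr] + regs[src]) & MASK32


# RISC-style instructions: each one does exactly one thing.
def risc_load(memory, regs, dst, addr):
    regs[dst] = memory[addr]


def risc_add(regs, dst, a, b):
    regs[dst] = (regs[a] + regs[b]) & MASK32


def risc_store(memory, regs, addr, src):
    memory[addr] = regs[src]


# CISC: a single instruction updates memory directly.
mem1 = {0x1000: 40}
regs1 = {"r1": 2}
cisc_add_mem(mem1, regs1, 0x1000, "r1")

# RISC: three instructions, plus a spare register (r2) to hold the value.
mem2 = {0x1000: 40}
regs2 = {"r1": 2, "r2": 0}
risc_load(mem2, regs2, "r2", 0x1000)
risc_add(regs2, "r2", "r2", "r1")
risc_store(mem2, regs2, 0x1000, "r2")

print(mem1[0x1000], mem2[0x1000])  # prints 42 42
```

Both versions produce the same result; the difference is how much work a single instruction is allowed to do, which is exactly what makes CISC instructions more convenient to write but harder for the CPU to decode and execute.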

The complete document can be downloaded from here (http://www.2shared.com/document/F9sEbXgq/x86_programming.html). Keep sharing :)

11-06-2012, 06:02 AM
You might wonder why the name of this document contains the "for Atheists" part. I added it primarily to distinguish it from other documents by name but also because I am an outspoken atheist, as every scientist should be.

Haha, did you know that a very large number of major scientists have been Christians...
Isaac Newton, Lord Kelvin, Johannes Kepler, Blaise Pascal, Robert Boyle, George Boole, Michael Faraday, Charles Babbage (the dude who 'invented' the computer), James Clerk Maxwell, Louis Pasteur and many more