In the previous tutorial, Getting started with Assembly-Assembly Language for Hacker And Security Researcher!, we discussed why assembly language matters to hackers and security researchers. Consider a case where you need bind TCP shellcode that spawns a shell on connect. If you search Google for bind TCP shellcode, you may find what you want, but publicly available shellcode is likely to be detected by antivirus software, an IDS, or a firewall. In that case you need to write your own shellcode that fulfills your requirements and is not detected by antivirus. To write shellcode, assembly language is a must.
Before we start writing assembly programs, let's look at the structure of an assembly language program. The basic structure of an assembly program is:
.section .data
# All initialized data goes here
.section .bss
# All uninitialized data goes here
.section .text
.globl _start
_start:
Program Instructions
# End of Program
Now let's go through the outline of an assembly language program:
Anything after the symbol "#" is a comment. Comments are not translated by the assembler. They exist only so the programmer can talk to anyone who looks at the code in the future.
.section .data : This section holds your initialized data. Initialized data consumes memory and contributes to the size of the executable file. The .section directive breaks your program up into sections; this one starts the data section, where you list any memory storage you will need for data. Our program doesn't use any, so we don't need the section; it is here just for completeness. The space is reserved at build time, when the program is assembled. Some examples of declarations:
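.ascii : A string without a terminating NUL byte
.byte : A 1-byte value
.int : A 32-bit integer
.float : A single-precision floating-point number
db '/bin/bash' : The DB (define byte) directive sets aside space in memory for a string. Note that db is NASM syntax; in GNU as, which this tutorial uses, the equivalent is .ascii "/bin/bash".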
.section .bss : All uninitialized data is stored here. Anything declared in this section is created at run time, so it does not occupy any space inside the executable. The space is actually allocated only when the program is loaded into memory.
.section .text : The text section is where the program instructions live.
.globl _start : This tells the assembler that _start is important to remember and must be kept visible after assembly. _start is a symbol, which means that it is going to be replaced by something else (an address) during either assembly or linking. Without symbols, every time you had to insert a piece of data or code you would have to change all the addresses in your program! Symbols are used so that the assembler and linker can take care of keeping track of addresses, and you can concentrate on writing your program. _start is a special symbol that always needs to be marked with .globl because it marks the location of the start of the program. Without marking this location in this way, the computer won't know where to begin running your program when it loads it.
_start: This is somewhat like the main() function of the C programming language, i.e. the linker looks for it and treats it as the start of the program. This line defines the value of the _start label. A label is a symbol followed by a colon. Labels define a symbol's value. When the assembler is assembling the program, it has to assign each data value and instruction an address. Labels tell the assembler to make the symbol's value be wherever the next instruction or data element will be.
The process layout map in memory looks as follows:
[Figure: process layout map in memory]
Writing Your First Assembly Program:
Type the following source code and save it as exit.s:
.section .data
.section .bss
.section .text
.globl _start
_start:
movl $1, %eax
movl $0, %ebx
int $0x80
# End of program
Let's dissect the program. We have not declared anything in the .data or .bss sections because we are only interested in exiting from the program successfully. They are included just for the sake of completeness and can be dropped from the program code.
Now we get into actual computer instructions. The first such instruction is this:
movl $1, %eax
When the program runs, this instruction transfers the number 1 into the %eax register. In assembly language, many instructions have operands. movl has two operands: the source and the destination. On Linux, the system call number is always loaded into the %eax register with the instruction:
movl $System_Call_Number, %eax
In the current case of exit, the System_Call_Number is "1", hence the instruction would be:
movl $1, %eax
Next, the parameters that the system call requires are loaded into registers in order, starting with %ebx:
movl $0, %ebx
The exit system call expects the exit status in %ebx. It takes only this one parameter; by convention 0 means success and a nonzero value (such as 1) means failure, so only %ebx needs to be loaded:
movl $0, %ebx
Finally, control is handed over to the Linux kernel by raising the interrupt int $0x80, which runs the exit system call:
int $0x80
int stands for interrupt, and 0x80 is the interrupt number to use. An interrupt interrupts the normal program flow and transfers control from our program to Linux so that it will perform the system call.
Now that we understand the program, let's run it. To transform it into a program that a computer can run, we need to assemble and link it. The first step is to assemble it. Assembling is the process that transforms what you typed into instructions for the machine. To assemble the program, type the command:
as exit.s -o exit.o
as is the command that runs the assembler, exit.s is the source file, and -o exit.o tells the assembler to put its output in the file exit.o. exit.o is an object file. An object file is code that is in the machine's language but has not been completely put together.
To link the file, enter the command:
ld exit.o -o exit
ld is the command to run the linker, exit.o is the object file we want to link, and -o exit instructs the linker to output the new program into a file called exit.
You can run exit by typing the command:
./exit
If you like this post or have any questions, please feel free to comment!