Introduction
Binary Exploitation is about finding vulnerabilities in programs and using them to make the program do what you want. Sometimes this can result in an authentication bypass or the leaking of classified information, but occasionally (if you’re lucky) it can also result in Remote Code Execution (RCE). The most basic forms of binary exploitation occur on the stack, a region of memory that stores temporary variables created by functions in code. When a new function is called, the address of the next instruction in the calling function is pushed to the stack, so the program knows where to return to once the called function finishes execution. Let’s look at a basic binary to show this.
Prerequisites
This post uses radare2, pwndbg, and similar tools for dynamic analysis. The pwntools Python package is also useful for binary exploitation.
Investigation
- Basic
file ./example
strings ./example
objdump -d ./example
# -M intel: disassemble using Intel syntax instead of the default AT&T syntax
objdump -M intel -d ./example
Security Properties
First check the executable properties.
checksec --file=./example
RELRO (stands for Relocation Read-Only)
- Partial RELRO - The global offset table (GOT) is readable and writable, so GOT entries can be overwritten.
- Full RELRO - The GOT is read-only, so we cannot overwrite GOT entries.
STACK CANARY
- No canary found - A stack buffer overflow can overwrite the saved return address without being detected.
NX (stands for Non-eXecutable segments)
- NX enabled - The stack is not executable, so we cannot run custom shellcode placed on it.
PIE (stands for Position Independent Executable)
- No PIE - The binary is always loaded at the same base address.
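One of these properties can be read straight from the ELF header. The following is a minimal sketch, not a replacement for checksec: it assumes the standard ELF layout (16 identification bytes followed by a 2-byte e_type field), and the function name elf_load_kind and the hand-built sample header are made up for illustration.

```python
import struct

ET_EXEC = 2  # fixed-address executable (what checksec reports as "No PIE")
ET_DYN = 3   # position-independent executable or shared object


def elf_load_kind(header: bytes) -> str:
    """Classify an ELF file from the first 18 bytes of its header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    if e_type == ET_EXEC:
        return "No PIE"
    if e_type == ET_DYN:
        return "PIE or shared object"
    return "other"


# Hand-built header for illustration only:
# magic, 64-bit class, little-endian, version 1, 9 padding bytes, then e_type.
sample = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", ET_EXEC)
print(elf_load_kind(sample))
```

On a real binary you would pass the first 18 bytes of the file, for example open("./example", "rb").read(18).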
ASLR (Address Space Layout Randomization) in Machine
ASLR is a security technique involved in preventing exploitation of memory corruption vulnerabilities.
cat /proc/sys/kernel/randomize_va_space
2
- 0 - The address space is NOT randomized.
- 1 - The stack, the mmap base (which includes shared libraries), and the vDSO are randomized.
- 2 - Same as 1, and the heap (brk area) is randomized as well.
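The three values can be turned into a small lookup. This is only a sketch: the helper names describe_aslr and read_aslr_setting are hypothetical, and the descriptions are a paraphrase of the list above.

```python
from pathlib import Path

ASLR_MODES = {
    0: "address space is not randomized",
    1: "stack, mmap base and vDSO are randomized",
    2: "same as 1, and the heap is randomized as well",
}


def describe_aslr(raw: str) -> str:
    """Map the text read from randomize_va_space to a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, "unknown setting")


def read_aslr_setting(path: str = "/proc/sys/kernel/randomize_va_space"):
    """Return the description for this machine, or None if the file is absent."""
    setting_file = Path(path)
    if not setting_file.exists():
        return None
    return describe_aslr(setting_file.read_text())


print(describe_aslr("2\n"))
```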
x86 Architecture
General-Purpose Registers (GPR) - 16-bit naming conventions
The 8 GPRs are as follows :
- Accumulator register (AX). Used in arithmetic operations.
- Counter register (CX). Used in shift/rotate instructions and loops.
- Data register (DX). Used in arithmetic operations and I/O operations.
- Base register (BX). Used as a pointer to data (located in segment register DS, when in segmented mode).
- Stack Pointer register (SP). Pointer to the top of the stack.
- Stack Base Pointer register (BP). Used to point to the base of the stack.
- Source Index register (SI). Used as a pointer to a source in stream operations.
- Destination Index register (DI). Used as a pointer to a destination in stream operations.
Identifiers to access registers and parts thereof.
Register | 64-bit | 32-bit | 16-bit | 8-bit |
---|---|---|---|---|
Accumulator | RAX | EAX | AX | AL |
Counter | RCX | ECX | CX | CL |
Data | RDX | EDX | DX | DL |
Base | RBX | EBX | BX | BL |
Stack Ptr | RSP | ESP | SP | SPL |
Base Ptr | RBP | EBP | BP | BPL |
Source | RSI | ESI | SI | SIL |
Destination | RDI | EDI | DI | DIL |
Register
Registers are essentially small storage locations inside the processor. You can think of them as buckets in which the processor can store information. Here is a list of the x64 registers and their common use cases.
- rbp: Base Pointer, points to the bottom of the current stack frame
- rsp: Stack Pointer, points to the top of the current stack frame
- rip: Instruction Pointer, points to the instruction to be executed
Register | Description |
---|---|
rax | Accumulator register |
rbx | Base register |
rcx | Counter register |
rdx | Data register |
rsi | Source index register |
rdi | Destination index register |
r8 | General-purpose register |
r9 | General-purpose register |
r10 | General-purpose register |
r11 | General-purpose register |
r12 | General-purpose register |
r13 | General-purpose register |
r14 | General-purpose register |
r15 | General-purpose register |
There are sixteen 64-bit General Purpose Registers (GPRs). They are described in the following table. A GPR can be accessed as a full 64-bit value, or only its lower 32, 16, or 8 bits can be accessed.
64-bit Register | 32-bit | 16-bit | 8-bit |
---|---|---|---|
rax | eax | ax | al |
rbx | ebx | bx | bl |
rcx | ecx | cx | cl |
rdx | edx | dx | dl |
rsi | esi | si | sil |
rdi | edi | di | dil |
rbp | ebp | bp | bpl |
rsp | esp | sp | spl |
r8 | r8d | r8w | r8b |
r9 | r9d | r9w | r9b |
r10 | r10d | r10w | r10b |
r11 | r11d | r11w | r11b |
r12 | r12d | r12w | r12b |
r13 | r13d | r13w | r13b |
r14 | r14d | r14w | r14b |
r15 | r15d | r15w | r15b |
In x64 Linux, arguments to a function are passed via registers. The first six integer or pointer arguments are passed in these registers:
Register | Argument Number |
---|---|
rdi | First Argument |
rsi | Second Argument |
rdx | Third Argument |
rcx | Fourth Argument |
r8 | Fifth Argument |
r9 | Sixth Argument |
With the 32-bit x86 ELF architecture, arguments are passed on the stack. Also, as you may know, a C function can return a value. In x64, this value is returned in the rax register. In x86, it is returned in the eax register.
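The register order above can be sketched as a small model. It assumes integer or pointer arguments only; the function name place_arguments is hypothetical and only mirrors the table, it does not call any real function.

```python
# Order of the argument registers on x64 Linux, as listed in the table above.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def place_arguments(args):
    """Return (register assignments, values that spill onto the stack)."""
    in_registers = dict(zip(ARG_REGISTERS, args))
    on_stack = list(args[len(ARG_REGISTERS):])
    return in_registers, on_stack


registers, stack_args = place_arguments([10, 20, 30, 40, 50, 60, 70, 80])
print(registers)   # the first six values, keyed by register name
print(stack_args)  # the seventh and eighth values
```

With eight arguments, the first six land in rdi through r9 and the remaining two are passed on the stack.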
Registers also come in different sizes. The typical sizes we will be dealing with are 8 bytes, 4 bytes, 2 bytes, and 1 byte. These sizes exist because the architecture grew over time: each new generation widened the registers so that more data fits in one, while keeping the older, smaller names usable.
8 Byte Register | Lower 4 Bytes | Lower 2 Bytes | Lower Byte |
---|---|---|---|
rbp | ebp | bp | bpl |
rsp | esp | sp | spl |
rip | eip | | |
rax | eax | ax | al |
rbx | ebx | bx | bl |
rcx | ecx | cx | cl |
rdx | edx | dx | dl |
rsi | esi | si | sil |
rdi | edi | di | dil |
r8 | r8d | r8w | r8b |
r9 | r9d | r9w | r9b |
r10 | r10d | r10w | r10b |
r11 | r11d | r11w | r11b |
r12 | r12d | r12w | r12b |
r13 | r13d | r13w | r13b |
r14 | r14d | r14w | r14b |
r15 | r15d | r15w | r15b |
In x64 we will see the 8 byte registers. However, in x86 the largest registers we can use are the 4 byte registers such as ebp, esp, and eip. We can also use registers smaller than the maximum size for the architecture.
In x64 there are the rax, eax, ax, and al registers. The rax register refers to the full 8 bytes. The eax register is the lower 4 bytes of the rax register. The ax register is the lower 2 bytes of the rax register. Lastly, the al register is the lowest byte of the rax register.
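The relationship between rax, eax, ax, and al is just bit masking. A minimal sketch (the function name sub_registers and the sample value are chosen for illustration):

```python
def sub_registers(rax: int) -> dict:
    """Return the values seen through rax, eax, ax and al for one 64-bit value."""
    rax &= 0xFFFFFFFFFFFFFFFF  # keep 8 bytes
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,  # lower 4 bytes
        "ax": rax & 0xFFFF,       # lower 2 bytes
        "al": rax & 0xFF,         # lowest byte
    }


views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name} = {value:#x}")
```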
Stack
In computer architecture, the stack is a hardware manifestation of the stack data structure (a Last In, First Out structure).
In x86, the stack is simply an area in RAM that was chosen to be the stack; there is no special hardware to store stack contents. The esp/rsp register holds the address in memory where the top of the stack resides, i.e. the most recently pushed value. When something is pushed to the stack, esp decrements by 4 (or rsp by 8 on 64-bit x86), and the value that was pushed is stored at that location in memory. Likewise, when a pop instruction is executed, the value at esp is retrieved (i.e. esp is dereferenced), and esp is then incremented by 4 (or 8).
N.B. The stack "grows" down to lower memory addresses!
Conventionally, ebp/rbp contains the address of the base of the current stack frame, and so local variables are often referenced as an offset relative to ebp rather than an offset to esp. A stack frame is essentially just the space used on the stack by a given function.
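The push and pop behaviour can be modelled in a few lines. This is a toy model under stated assumptions: 64-bit mode, memory represented as a dictionary, and an arbitrary starting address; the class name Stack64 is made up.

```python
MASK64 = (1 << 64) - 1


class Stack64:
    """Toy model of the x64 stack: memory is a dict keyed by address."""

    def __init__(self, rsp: int = 0x1000):
        self.rsp = rsp
        self.memory = {}

    def push(self, value: int) -> None:
        self.rsp -= 8                          # the stack grows to lower addresses
        self.memory[self.rsp] = value & MASK64

    def pop(self) -> int:
        value = self.memory[self.rsp]          # dereference rsp
        self.rsp += 8                          # the stack shrinks
        return value


stack = Stack64()
stack.push(0xAA)
stack.push(0xBB)
print(hex(stack.rsp))    # 0xff0, two 8-byte slots below the start
print(hex(stack.pop()))  # 0xbb, last in, first out
```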
Uses
The stack is primarily used for a few things:
- Storing function arguments
- Storing local variables
- Storing processor state between function calls
One of the most common memory regions you will be dealing with is the stack. It is where local variables in the code are stored.
For instance, in this code the variable x is stored on the stack:
#include <stdio.h>
void main(void)
{
    int x = 5;
    puts("hi");
}
Specifically, we can see that it is stored on the stack at rbp-0x4.
0000000000001135 <main>:
1135: 55 push rbp
1136: 48 89 e5 mov rbp,rsp
1139: 48 83 ec 10 sub rsp,0x10
113d: c7 45 fc 05 00 00 00 mov DWORD PTR [rbp-0x4],0x5
1144: 48 8d 3d b9 0e 00 00 lea rdi,[rip+0xeb9] # 2004 <_IO_stdin_used+0x4>
114b: e8 e0 fe ff ff call 1030 <puts@plt>
1150: 90 nop
1151: c9 leave
1152: c3 ret
1153: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
115a: 00 00 00
115d: 0f 1f 00 nop DWORD PTR [rax]
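The raw bytes in the listing can be decoded by hand to see where rbp-0x4, 0x2004, and 0x1030 come from. The sketch below uses only the bytes and addresses shown above; the byte offsets of the operands inside each instruction are written out by hand for these three encodings and are not a general decoder.

```python
import struct

# 113d: c7 45 fc 05 00 00 00    mov DWORD PTR [rbp-0x4],0x5
mov_bytes = bytes.fromhex("c745fc05000000")
(displacement,) = struct.unpack_from("<b", mov_bytes, 2)  # signed 8-bit offset from rbp
(immediate,) = struct.unpack_from("<i", mov_bytes, 3)     # 32-bit little-endian value
print(displacement, immediate)                            # -4 5

# 1144: 48 8d 3d b9 0e 00 00    lea rdi,[rip+0xeb9]
lea_bytes = bytes.fromhex("488d3db90e0000")
(rel_lea,) = struct.unpack_from("<i", lea_bytes, 3)
# rip already points at the next instruction when the offset is applied.
string_address = 0x1144 + len(lea_bytes) + rel_lea
print(hex(string_address))                                # 0x2004

# 114b: e8 e0 fe ff ff          call 1030 <puts@plt>
call_bytes = bytes.fromhex("e8e0feffff")
(rel_call,) = struct.unpack_from("<i", call_bytes, 1)     # negative: jumps backwards
call_target = 0x114b + len(call_bytes) + rel_call
print(hex(call_target))                                   # 0x1030
```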
Values are usually added to the stack by pushing them and removed by popping them (it is a LIFO data structure). However, we can still reference values anywhere on the stack, such as [rbp-0x4] above.
The bounds of the current stack frame are recorded by two registers, rbp and rsp. The base pointer rbp points to the bottom of the frame. The stack pointer rsp points to the top of the stack.
Flags
There is one register that contains flags (FLAGS, extended to EFLAGS on x86 and RFLAGS on x64). A flag is a particular bit of this register, and whether it is set or not typically means something. Here is the list of flags.
Flag Index | Flag Name | Description |
---|---|---|
00 | Carry Flag | Indicates a carry or borrow occurred in an operation |
01 | Always 1 | Always set to 1 |
02 | Parity Flag | Indicates the parity (even or odd) of the result |
03 | Always 0 | Always set to 0 |
04 | Adjust Flag | Indicates a carry or borrow out of the lowest 4 bits, used in BCD arithmetic |
05 | Always 0 | Always set to 0 |
06 | Zero Flag | Indicates the result of an operation is zero |
07 | Sign Flag | Indicates the sign (negative or positive) of the result |
08 | Trap Flag | Allows single-step execution for debugging purposes |
09 | Interrupt Flag | Enables or disables maskable hardware interrupts |
10 | Direction Flag | Specifies the direction for string instructions |
11 | Overflow Flag | Indicates signed arithmetic overflow or underflow |
12 | I/O Privilege Field (Lower) | Represents the privilege level for I/O operations (Lower bit) |
13 | I/O Privilege Field (Higher) | Represents the privilege level for I/O operations (Higher bit) |
14 | Nested Task Flag | Indicates if the current task is nested |
15 | Reserved | Always 0 on 80386 and later processors |
16 | Resume Flag | Temporarily disables debug exceptions for instruction breakpoints |
There are other flags than the ones listed, but we rarely deal with them (and even among those listed, there are only a few we actively deal with).
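Since each flag is a single bit, a flags value as shown by a debugger can be decoded with shifts and masks. A minimal sketch using the bit indexes from the table (the abbreviations CF, PF, and so on are the usual short names; the function name decode_flags is made up):

```python
# Bit index -> short flag name, following the table above.
FLAG_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF",
}


def decode_flags(value: int) -> list:
    """Return the names of the flags that are set in a flags register value."""
    return [name for bit, name in FLAG_BITS.items() if (value >> bit) & 1]


print(decode_flags(0x246))  # ['PF', 'ZF', 'IF']
```

The value 0x246 has bits 1, 2, 6, and 9 set; bit 1 is the always-1 bit and has no name here.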
Instructions
Now we will cover some of the more common instructions. This is not a complete list, but these are the ones you will run into most often.
mov
The mov instruction copies data from its second operand into its first, for example from one register to another. For instance:
mov rax, rdx
This will copy the data from the rdx register into the rax register. The rdx register is left unchanged.
dereference
If you ever see brackets like [], they dereference a pointer. A pointer is a value that holds a particular memory address. Dereferencing a pointer means accessing the value stored at the address it holds. For instance:
mov rax, [rdx]
This will copy the value pointed to by rdx into the rax register. On the flip side:
mov [rax], rdx
This will copy the value of the rdx register into whatever memory is pointed to by the rax register. The actual value of the rax register does not change.
lea
The lea instruction calculates the address of the second operand and stores that address in the first operand, without reading memory. For instance:
lea rdi, [rbx+0x10]
This will move the address rbx+0x10 into the rdi register.
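The difference between mov with brackets and lea can be simulated with a register dictionary and a memory dictionary. This is a toy model: the register values and memory contents are arbitrary, and the final store uses rdi as the pointer purely for illustration.

```python
regs = {"rax": 0, "rdi": 0, "rdx": 0x1000, "rbx": 0x2000}
memory = {0x1000: 0xDEADBEEF, 0x2010: 0x41}

# mov rax, [rdx]       -> read memory at the address held in rdx
regs["rax"] = memory[regs["rdx"]]

# lea rdi, [rbx+0x10]  -> compute the address only, memory is not read
regs["rdi"] = regs["rbx"] + 0x10

# mov [rdi], rdx       -> write rdx to memory at the address held in rdi
memory[regs["rdi"]] = regs["rdx"]

print(hex(regs["rax"]))   # 0xdeadbeef
print(hex(regs["rdi"]))   # 0x2010
print(hex(memory[0x2010]))  # 0x1000
```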
add
This adds the two operands together and stores the sum in the first operand. For instance:
add rax, rdx
That will set rax equal to rax + rdx
sub
This instruction subtracts the second operand from the first one and stores the difference in the first operand. For instance:
sub rsp, 0x10
This will set the rsp register equal to rsp - 0x10
xor
This performs the bitwise operation xor on its two operands and stores the result in the first operand:
xor rdx, rax
That will set the rdx register equal to rdx ^ rax.
The and and or instructions work the same way, except that they use the bitwise AND and OR operators.
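These arithmetic and bitwise instructions operate on fixed-width registers, so results wrap around at 64 bits. A small sketch of that behaviour (the helper names are made up and flags are ignored):

```python
MASK64 = (1 << 64) - 1


def add64(a: int, b: int) -> int:
    """add a, b: the sum wraps around at 64 bits."""
    return (a + b) & MASK64


def sub64(a: int, b: int) -> int:
    """sub a, b: the difference wraps around at 64 bits."""
    return (a - b) & MASK64


def xor64(a: int, b: int) -> int:
    """xor a, b: bitwise exclusive or."""
    return (a ^ b) & MASK64


rsp = 0x7FFFFFFFE000
print(hex(sub64(rsp, 0x10)))   # 0x7fffffffdff0, as in sub rsp, 0x10
print(hex(add64(MASK64, 1)))   # 0x0, the carry out of bit 63 is dropped
print(hex(xor64(0x1234, 0x1234)))  # 0x0, xoring a value with itself clears it
```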
push
The push instruction will grow the stack by 8 bytes (on x64; 4 bytes on x86), then store the contents of a register in the new stack space. For instance:
push rax
This will grow the stack by 8 bytes, and the contents of the rax register will be on top of the stack.
pop
The pop instruction will copy the top 8 bytes (on x64; 4 bytes on x86) of the stack into its operand, then shrink the stack by the same amount. For instance:
pop rax
The top 8 bytes of the stack will end up in the rax register.
jmp
The jmp instruction jumps to an instruction address. It is used to redirect code execution. For instance:
jmp 0x602010
That instruction will cause the code execution to jump to 0x602010, and execute whatever instruction is there.
call & ret
The call instruction is similar to the jmp instruction. The difference is that call first pushes the return address (the address of the instruction after the call, i.e. the next value of rip) onto the stack, then jumps to the address it is given. This is used for calling functions. The called function usually saves the caller's rbp itself, with a push rbp at its start. When the function is finished, it executes a ret instruction, which pops the saved return address back into rip so that execution continues right where it left off.
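A toy model makes the mechanics concrete: the call instruction itself pushes only the return address, and ret pops it back into rip. The sketch assumes 64-bit mode and a 5-byte call instruction, and the class name Cpu is made up; the addresses are those of the call to puts in the disassembly shown earlier.

```python
class Cpu:
    """Toy CPU with just rip, rsp and a memory dict."""

    def __init__(self, rip: int, rsp: int):
        self.rip = rip
        self.rsp = rsp
        self.memory = {}

    def call(self, target: int, length: int = 5) -> None:
        return_address = self.rip + length   # instruction after the call
        self.rsp -= 8
        self.memory[self.rsp] = return_address
        self.rip = target

    def ret(self) -> None:
        self.rip = self.memory[self.rsp]     # pop the saved return address
        self.rsp += 8


cpu = Cpu(rip=0x114B, rsp=0x7000)
cpu.call(0x1030)
print(hex(cpu.rip), hex(cpu.memory[cpu.rsp]))  # 0x1030 0x1150
cpu.ret()
print(hex(cpu.rip), hex(cpu.rsp))              # 0x1150 0x7000
```

This saved return address on the stack is exactly what a stack buffer overflow aims to overwrite.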
cmp
The cmp instruction is similar to the sub instruction, except that it doesn’t store the result in the first operand. It only computes whether the result is less than zero, greater than zero, or equal to zero, and sets the flags accordingly.
jnz / jz
The jump if not zero and jump if zero (jnz/jz) instructions are similar to the jmp instruction. The difference is that they only perform the jump depending on the status of the zero flag. jz only jumps if the zero flag is set. The opposite is true for jnz.
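How cmp and jz/jnz work together can be sketched as follows. This models only three flags and 64-bit unsigned operands; the function names and the sample addresses are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def cmp_flags(a: int, b: int) -> dict:
    """Compute a - b like cmp does, keep only the flags, discard the result."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    return {
        "ZF": result == 0,         # operands were equal
        "SF": bool(result >> 63),  # top bit of the result is set
        "CF": a < b,               # an unsigned borrow occurred
    }


def next_rip(flags: dict, mnemonic: str, target: int, fallthrough: int) -> int:
    """Return where execution continues after a jz or jnz."""
    if mnemonic not in ("jz", "jnz"):
        raise ValueError("only jz and jnz are modelled")
    taken = flags["ZF"] if mnemonic == "jz" else not flags["ZF"]
    return target if taken else fallthrough


flags = cmp_flags(5, 5)
print(hex(next_rip(flags, "jz", 0x602010, 0x401005)))   # 0x602010, jump taken
print(hex(next_rip(flags, "jnz", 0x602010, 0x401005)))  # 0x401005, falls through
```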
Analysis
When conducting an analysis, the first steps are:
- Check the file with the file command to see which executable format it uses (on Linux this is normally ELF). It is recommended to follow along in a virtual machine of your own, preferably running Linux.
- Then check the security properties of the program with the checksec command, as described above.
GDB Introductions
GDB, or the GNU Debugger, is the standard debugger of Linux systems developed by the GNU Project. It has been ported to many systems and supports the programming languages C, C++, Objective-C, FORTRAN, Java, and many more.
GDB provides us with the usual traceability features like breakpoints or stack trace output and allows us to intervene in the execution of programs. It also allows us, for example, to manipulate the variables of the application or to call functions independently of the normal execution of the program.
We use the GNU Debugger (GDB) to view the compiled binary at the assembly level. Once we have loaded the binary in GDB, we can disassemble the program’s main function.
- Start Debug
# Change permission for debugging
chmod +x example
# -q: quiet mode (suppresses the startup banner)
plugin -q example
$ gdb -q <File>
$ r2 -d -A <file>
The -d flag starts the program under the radare2 debugger, while the -A flag performs analysis.
Once the debugger is running, the next thing to look at is the list of functions in the program. In GDB, for instance, the info functions command prints such a list. For example:
0x08049000 _init
0x08049030 gets@plt
0x08049040 puts@plt
0x08049050 __libc_start_main@plt
0x08049060 _start
0x080490a0 _dl_relocate_static_pie
0x080490b0 __x86.get_pc_thunk.bx
0x080490c0 deregister_tm_clones
0x08049100 register_tm_clones
0x08049140 __do_global_dtors_aux
0x08049170 frame_dummy
0x08049172 unsafe
0x080491ab main
0x080491c3 __x86.get_pc_thunk.ax
0x080491d0 __libc_csu_init
0x08049230 __libc_csu_fini
0x08049231 __x86.get_pc_thunk.bp
0x08049238 _fini
Note that this is only an example; the functions listed differ from program to program.
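A listing like this is easy to process in a script. The sketch below parses a few lines copied from the example above and estimates the size of a function as the distance to the next listed address. That is only a rough upper bound, since padding between functions is counted too, and the helper names are hypothetical.

```python
listing = """\
0x08049170  frame_dummy
0x08049172  unsafe
0x080491ab  main
0x080491c3  __x86.get_pc_thunk.ax
"""


def parse_functions(text: str) -> dict:
    """Turn 'address name' lines into a {name: address} dictionary."""
    functions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("0x"):
            functions[parts[1]] = int(parts[0], 16)
    return functions


def estimated_size(functions: dict, name: str):
    """Distance from the function to the next listed address, or None if it is last."""
    start = functions[name]
    later = [address for address in functions.values() if address > start]
    return min(later) - start if later else None


functions = parse_functions(listing)
print(hex(functions["unsafe"]))             # 0x8049172
print(estimated_size(functions, "unsafe"))  # 57 bytes up to the start of main
```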