Introduction

Binary exploitation is about finding vulnerabilities in programs and utilising them to make the program do what you wish. Sometimes this can result in an authentication bypass or the leaking of classified information, but occasionally (if you’re lucky) it can also result in Remote Code Execution (RCE). The most basic forms of binary exploitation occur on the stack, a region of memory that stores temporary variables created by functions. When a new function is called, the return address (the address of the instruction that follows the call in the calling function) is pushed to the stack. This way, the program knows where to return to once the called function finishes execution. Let’s look at a basic binary to show this.

Prerequisites

This post uses radare2, pwndbg (a GDB plugin), etc. for dynamic analysis, and the pwntools Python package is useful for writing exploits.

Investigation

Basic

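# Identify the file type (ELF class, architecture, linking, stripped or not)
file ./example
# Print the printable strings embedded in the binary
strings ./example

# Disassemble the executable sections (AT&T syntax by default)
objdump -d ./example
# -M intel: use Intel syntax instead
objdump -M intel -d ./example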

Security Properties

First check the executable properties.

checksec --file=./example

RELRO (stands for Relocation Read-Only)

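Partial RELRO - Some sections are made read-only, but the global offset table (GOT) entries used by the PLT stay writable, so they can be overwritten.
Full RELRO - All symbols are resolved at startup and the whole GOT is then made read-only, so we cannot overwrite the GOT.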

STACK CANARY

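No canary found - There is no stack canary (a random value placed between the local variables and the saved return address, checked before the function returns). If the program has a stack buffer overflow, the return address can be overwritten without the program detecting it.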

NX (stands for Non-eXecutable segments)

NX enabled - We cannot execute custom shellcode from the stack.

PIE (stands for Position Independent Executable)

No PIE - The binary is always loaded at the same address, so the addresses of its functions do not change between runs.

ASLR (Address Space Layout Randomization) on the Machine

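ASLR is a security technique that makes memory corruption vulnerabilities harder to exploit by randomizing where the stack, heap, and shared libraries are placed in memory each time a program runs. The current setting can be checked with: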

cat /proc/sys/kernel/randomize_va_space
2

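0 - ASLR is disabled: the address space is NOT randomized.
1 - Conservative randomization: the stack, the mmap base (shared libraries), and the vDSO are randomized.
2 - Full randomization: everything from level 1, plus the heap (the data segment managed by brk).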
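The same check can be scripted. The following is a minimal Python sketch (Python because exploit tooling such as pwntools is Python-based). `read_aslr` and `describe_aslr` are our own hypothetical helper names, and the descriptions paraphrase the levels listed above.

```python
from pathlib import Path
from typing import Optional

# Meaning of each value accepted by /proc/sys/kernel/randomize_va_space
ASLR_LEVELS = {
    0: "disabled: nothing is randomized",
    1: "conservative: stack, mmap base (shared libraries) and vDSO are randomized",
    2: "full: level 1 plus the heap (brk-managed data segment)",
}


def describe_aslr(level: int) -> str:
    """Return a human-readable description of an ASLR level."""
    return ASLR_LEVELS.get(level, f"unknown level {level}")


def read_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> Optional[int]:
    """Read the current ASLR level, or return None if the file cannot be read (e.g. not Linux)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_aslr()
    print("unavailable" if level is None else f"{level}: {describe_aslr(level)}")
```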

x86 Architecture

General-Purpose Registers (GPR) - 16-bit naming conventions

The 8 GPRs are as follows:

Accumulator register (AX). Used in arithmetic operations.
Counter register (CX). Used in shift/rotate instructions and loops.
Data register (DX). Used in arithmetic operations and I/O operations.
Base register (BX). Used as a pointer to data (located in segment register DS, when in segmented mode).
Stack Pointer register (SP). Pointer to the top of the stack.
Stack Base Pointer register (BP). Used to point to the base of the current stack frame.
Source Index register (SI). Used as a pointer to a source in stream operations.
Destination Index register (DI). Used as a pointer to a destination in stream operations.

Identifiers to access registers and parts thereof.

Register	64-bit	32-bit	16-bit	8-bit
Accumulator	RAX	EAX	AX	AL
Counter	RCX	ECX	CX	CL
Data	RDX	EDX	DX	DL
Base	RBX	EBX	BX	BL
Stack Ptr	RSP	ESP	SP	SPL
Base Ptr	RBP	EBP	BP	BPL
Source	RSI	ESI	SI	SIL
Destination	RDI	EDI	DI	DIL

Register

Registers are small storage locations inside the processor itself. You can think of them as buckets in which the processor can store information. Here is a list of the x64 registers and their common use cases.

rbp: Base Pointer, points to the base of the current stack frame
rsp: Stack Pointer, points to the top of the stack (the most recently pushed value)
rip: Instruction Pointer, points to the next instruction to be executed

Register	Description
rax	Accumulator register
rbx	Base register
rcx	Counter register
rdx	Data register
rsi	Source index register
rdi	Destination index register
r8	General-purpose register
r9	General-purpose register
r10	General-purpose register
r11	General-purpose register
r12	General-purpose register
r13	General-purpose register
r14	General-purpose register
r15	General-purpose register

There are sixteen 64-bit general-purpose registers (GPRs), described in the following table. A GPR can be accessed in full (all 64 bits) or through a smaller portion of its lower bits.

64-bit Register	32-bit	16-bit	8-bit
rax	eax	ax	al
rbx	ebx	bx	bl
rcx	ecx	cx	cl
rdx	edx	dx	dl
rsi	esi	si	sil
rdi	edi	di	dil
rbp	ebp	bp	bpl
rsp	esp	sp	spl
r8	r8d	r8w	r8b
r9	r9d	r9w	r9b
r10	r10d	r10w	r10b
r11	r11d	r11w	r11b
r12	r12d	r12w	r12b
r13	r13d	r13w	r13b
r14	r14d	r14w	r14b
r15	r15d	r15w	r15b

On x64 Linux (the System V AMD64 calling convention), the first six integer or pointer arguments to a function are passed in these registers:

Register	Argument Number
rdi	First Argument
rsi	Second Argument
rdx	Third Argument
rcx	Fourth Argument
r8	Fifth Argument
r9	Sixth Argument

On 32-bit x86 ELF binaries (the cdecl convention), arguments are instead pushed onto the stack, in right-to-left order. A C function can also return a value: on x64 this value is passed back in the rax register, and on x86 in the eax register.
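As an illustration, here is a small Python sketch that lists where each argument lives at the moment a function starts executing. It assumes integer or pointer arguments only (floating-point arguments use separate registers on x64) and the cdecl convention on 32-bit x86. `arg_locations` and `RETURN_REG` are hypothetical names used only for this example.

```python
# Registers used for the first six integer/pointer arguments (System V AMD64 ABI)
X64_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
# Register that carries an integer/pointer return value
RETURN_REG = {"x64": "rax", "x86": "eax"}


def arg_locations(num_args: int, arch: str) -> list:
    """Where each argument is when the callee starts (before its prologue runs).

    At that point [rsp]/[esp] holds the return address pushed by `call`,
    so stack arguments start one slot above it.
    """
    locations = []
    for i in range(num_args):
        if arch == "x64":
            if i < len(X64_ARG_REGS):
                locations.append(X64_ARG_REGS[i])
            else:
                # 7th argument onwards: on the stack, 8 bytes per slot
                slot = i - len(X64_ARG_REGS) + 1
                locations.append(f"[rsp+{8 * slot:#x}]")
        elif arch == "x86":
            # cdecl: every argument is on the stack, 4 bytes per slot
            locations.append(f"[esp+{4 * (i + 1):#x}]")
        else:
            raise ValueError(f"unsupported architecture: {arch}")
    return locations


print(arg_locations(3, "x64"))  # ['rdi', 'rsi', 'rdx']
print(arg_locations(2, "x86"))  # ['[esp+0x4]', '[esp+0x8]']
```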

Registers can also be accessed at different sizes. The typical sizes we will be dealing with are 8 bytes, 4 bytes, 2 bytes, and 1 byte. These sizes exist for historical reasons: as x86 grew from 16-bit to 32-bit and then 64-bit processors, the registers were widened, and the older, smaller names were kept as views of their lower bytes.

8 Byte Register	Lower 4 Bytes	Lower 2 Bytes	Lower Byte
rbp	ebp	bp	bpl
rsp	esp	sp	spl
rip	eip
rax	eax	ax	al
rbx	ebx	bx	bl
rcx	ecx	cx	cl
rdx	edx	dx	dl
rsi	esi	si	sil
rdi	edi	di	dil
r8	r8d	r8w	r8b
r9	r9d	r9w	r9b
r10	r10d	r10w	r10b
r11	r11d	r11w	r11b
r12	r12d	r12w	r12b
r13	r13d	r13w	r13b
r14	r14d	r14w	r14b
r15	r15d	r15w	r15b

In x64 we will see the 8-byte registers, whereas in x86 the largest registers available are the 4-byte ones, such as ebp, esp, and eip. On either architecture we can also use registers smaller than the maximum size.

For example, in x64 there are the rax, eax, ax, and al registers. The rax register holds the full 8 bytes. The eax register is the lower four bytes of rax, the ax register is its lowest 2 bytes, and the al register is its lowest byte.
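The relationship between these names is plain bit masking. Here is a minimal Python sketch with hypothetical helper names: `register_views` extracts the sub-registers from a 64-bit value, and `write_sub_register` shows one quirk worth knowing, namely that writing a 32-bit register such as eax zeroes the upper half of rax, while writing ax or al leaves the other bits alone.

```python
MASK64 = (1 << 64) - 1


def register_views(rax: int) -> dict:
    """Return the eax/ax/al views of a 64-bit rax value."""
    rax &= MASK64
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,  # lower 4 bytes
        "ax": rax & 0xFFFF,       # lower 2 bytes
        "al": rax & 0xFF,         # lowest byte
    }


def write_sub_register(rax: int, value: int, size: int) -> int:
    """New rax after writing `value` into its lower `size` bytes (8, 4, 2 or 1)."""
    if size == 4:
        # Writing a 32-bit register (e.g. mov eax, ...) zeroes the upper 32 bits
        return value & 0xFFFFFFFF
    mask = (1 << (8 * size)) - 1
    # 64-, 16- and 8-bit writes replace only their own bits
    return (rax & ~mask & MASK64) | (value & mask)


for name, value in register_views(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
```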

Stack

In computer architecture, the stack is a hardware manifestation of the stack data structure (a Last In, First Out structure).

In x86, the stack is simply an area in RAM that was chosen to be the stack; there is no special hardware to store stack contents. The esp/rsp register holds the address of the top of the stack, which is the lowest address in use because the stack grows down. When something is pushed to the stack, esp is decremented by 4 (or rsp by 8 on 64-bit x86), and the pushed value is stored at that location in memory. Likewise, when a pop instruction is executed, the value at esp is retrieved (i.e. esp is dereferenced), and esp is then incremented by 4 (or 8).

N.B. The stack "grows" down to lower memory addresses!
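To see these push/pop mechanics in action, here is a toy Python model of a 64-bit stack. It is a sketch, not an emulator: memory is a dictionary of 8-byte slots, and the starting rsp value is made up.

```python
class Stack:
    """Toy model of an x64 stack: memory is a dict of 8-byte slots keyed by address."""

    def __init__(self, rsp: int) -> None:
        self.rsp = rsp
        self.memory = {}

    def push(self, value: int) -> None:
        self.rsp -= 8                   # grow down: move rsp to a lower address first...
        self.memory[self.rsp] = value   # ...then store the value at the new top

    def pop(self) -> int:
        value = self.memory[self.rsp]   # read the value rsp points to (dereference)...
        self.rsp += 8                   # ...then shrink the stack; the old bytes stay in memory
        return value


stack = Stack(rsp=0x7FFE0000)
stack.push(0x1111)
stack.push(0x2222)
print(hex(stack.rsp))     # 0x7ffdfff0: two pushes moved rsp down by 16 bytes
print(hex(stack.pop()))   # 0x2222: last in, first out
```

Note that pop only moves rsp: the popped value is still in memory until something overwrites it, which is why stale stack data can sometimes be leaked.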

Conventionally, ebp/rbp contains the address of the base of the current stack frame, so local variables are often referenced as an offset relative to ebp (such as ebp-0x4) rather than relative to esp, which moves as values are pushed and popped. A stack frame is essentially just the space used on the stack by a given function.

-> Uses

The stack is primarily used for a few things:

Storing function arguments
Storing local variables
Storing processor state between function calls (such as the return address and the saved rbp)

One of the most common memory regions you will be dealing with is the stack, since it is where the local variables in the code are stored.

For instance, in this code the variable x is stored on the stack:

  
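#include <stdio.h>

// void main is non-standard C (int main is the portable form); it is kept
// here so that the disassembly below matches this source exactly
void main(void)
{
    int x = 5;      // local variable, stored in main's stack frame
    puts("hi");
}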

Specifically, we can see that it is stored on the stack at rbp-0x4 (the mov DWORD PTR [rbp-0x4],0x5 instruction writes 5 there):

0000000000001135 <main>:
    1135:       55                      push   rbp
    1136:       48 89 e5                mov    rbp,rsp
    1139:       48 83 ec 10             sub    rsp,0x10
    113d:       c7 45 fc 05 00 00 00    mov    DWORD PTR [rbp-0x4],0x5
    1144:       48 8d 3d b9 0e 00 00    lea    rdi,[rip+0xeb9]        # 2004 <_IO_stdin_used+0x4>
    114b:       e8 e0 fe ff ff          call   1030 <puts@plt>
    1150:       90                      nop
    1151:       c9                      leave  
    1152:       c3                      ret    
    1153:       66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
    115a:       00 00 00
    115d:       0f 1f 00                nop    DWORD PTR [rax]

Values are typically added to the stack by pushing them and removed by popping them, which follows the LIFO order of a stack. However, the stack is ordinary memory, so a function can also reserve space by subtracting from rsp (sub rsp,0x10 above) and then read or write any slot directly through an offset, like [rbp-0x4].

The bounds of the current stack frame are recorded by two registers, rbp and rsp. The base pointer rbp points to the base of the frame (the higher address), and the stack pointer rsp points to the top of the stack (the lower address).
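Putting this together for a function with the usual prologue (push rbp; mov rbp,rsp): the saved rbp sits at [rbp], and the return address pushed by call sits just above it at [rbp+8]. So the distance from a local buffer at rbp-N to the saved return address is N + 8 bytes on x64 (N + 4 on x86); for x in main above, that is 0x4 + 8 = 12 bytes. The Python sketch below computes this and builds a classic overflow payload. The buffer offset 0x40 and the target address 0x401196 are hypothetical, and in practice N should be read from the disassembly (e.g. an lea rax,[rbp-0x40]) rather than from the buffer size declared in C, since the compiler may add padding.

```python
import struct


def offset_to_return_address(buf_offset_from_rbp: int, word_size: int = 8) -> int:
    """Bytes from a local buffer at [rbp - buf_offset_from_rbp] to the saved return address.

    Frame layout after `push rbp; mov rbp, rsp`:
      [rbp + word_size]  return address (pushed by call)
      [rbp]              saved rbp (pushed by the prologue)
      [rbp - N]          start of the local buffer
    """
    return buf_offset_from_rbp + word_size


def build_payload(buf_offset_from_rbp: int, new_return: int) -> bytes:
    """x64 only: filler over the buffer and saved rbp, then a little-endian 8-byte address."""
    padding = b"A" * offset_to_return_address(buf_offset_from_rbp)
    return padding + struct.pack("<Q", new_return)


print(offset_to_return_address(0x4))     # 12: for x at rbp-0x4 in main
payload = build_payload(0x40, 0x401196)  # hypothetical buffer offset and target address
print(len(payload))                      # 80
```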

Flags

The flags are stored in a single register, EFLAGS (RFLAGS on x64). Each flag is one bit of this register, and whether it is set or not records something about the result of a previous instruction or the state of the processor. Here is the list of flags.

Flag Index	Flag Name	Description
00	Carry Flag (CF)	Indicates a carry or borrow out of the most significant bit (unsigned overflow)
01	Reserved	Always set to 1
02	Parity Flag (PF)	Set if the lowest byte of the result has an even number of 1 bits
03	Reserved	Always set to 0
04	Adjust Flag (AF)	Indicates a carry or borrow out of bit 3, used in BCD arithmetic
05	Reserved	Always set to 0
06	Zero Flag (ZF)	Set if the result of an operation is zero
07	Sign Flag (SF)	Set to the most significant (sign) bit of the result
08	Trap Flag (TF)	Enables single-step execution for debugging purposes
09	Interrupt Enable Flag (IF)	Enables or disables maskable hardware interrupts
10	Direction Flag (DF)	Specifies the direction (up or down) for string instructions
11	Overflow Flag (OF)	Indicates signed arithmetic overflow
12	I/O Privilege Level (Lower)	Privilege level required for I/O operations (lower bit)
13	I/O Privilege Level (Higher)	Privilege level required for I/O operations (higher bit)
14	Nested Task Flag (NT)	Indicates that the current task is nested (linked to a previous task)
15	Reserved	Always set to 0
16	Resume Flag (RF)	Temporarily suppresses debug exceptions so execution can resume after a breakpoint

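There are other flags than the ones listed (in the higher bits of the register), however we rarely deal with them, and even out of these only a few are used actively: mainly the zero, carry, sign, and overflow flags, because conditional jumps depend on them.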
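In a debugger, the flags register is usually shown as a single hex number, for example 0x246. A small Python sketch (the helper `decode_flags` is a hypothetical name) that turns such a value into flag names using the bit positions from the table above:

```python
# Single-bit status/control flags from the table (reserved bits, IOPL, NT and RF left out)
FLAG_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF", 10: "DF", 11: "OF"}


def decode_flags(eflags: int) -> list:
    """Return the names of the flags that are set, lowest bit first."""
    return [name for bit, name in sorted(FLAG_BITS.items()) if (eflags >> bit) & 1]


print(decode_flags(0x246))  # ['PF', 'ZF', 'IF']
```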

Instructions

Now we will cover some of the most common instructions you will see. This isn’t everything, but these are the ones you will encounter most often.

mov

The mov instruction copies data from its second operand (the source) into its first operand (the destination). For instance:

mov rax, rdx

This will copy the value in the rdx register into the rax register; rdx itself is unchanged.

dereference

If you ever see brackets like [], they denote a dereference, which deals with pointers. A pointer is a value that holds a memory address. Dereferencing a pointer means accessing the value stored at the address it points to. For instance:

mov rax, [rdx]

This will move the value pointed to by rdx into the rax register. On the flip side:

mov [rax], rdx

This will move the value of the rdx register into whatever memory is pointed to by the rax register. The value of the rax register itself does not change.

lea

The lea (load effective address) instruction calculates the address described by the second operand and stores that address in the first operand, without reading memory. For instance:

lea rdi, [rbx+0x10]

This will set rdi to rbx+0x10: the address itself, not the value stored there.
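The difference between a register copy, a dereference, and lea is easiest to see side by side. The Python sketch below models registers and memory as dictionaries. The addresses and values are made up, and the helper functions are hypothetical names that each mirror one instruction form.

```python
# Toy machine state: registers and memory as dictionaries (made-up addresses/values)
regs = {"rax": 0, "rbx": 0x404000, "rdx": 0x404010, "rdi": 0}
memory = {0x404000: 0x41414141, 0x404010: 0x1337}


def mov_reg(dst: str, src: str) -> None:
    regs[dst] = regs[src]                  # mov dst, src: copy the register value


def mov_load(dst: str, src: str, disp: int = 0) -> None:
    regs[dst] = memory[regs[src] + disp]   # mov dst, [src+disp]: read memory


def mov_store(dst: str, src: str) -> None:
    memory[regs[dst]] = regs[src]          # mov [dst], src: write memory; dst itself is unchanged


def lea(dst: str, src: str, disp: int = 0) -> None:
    regs[dst] = regs[src] + disp           # lea dst, [src+disp]: address only, no memory access


lea("rdi", "rbx", 0x10)     # rdi = 0x404010, just the address
mov_reg("rax", "rdx")       # rax = 0x404010, a copy of rdx
mov_load("rax", "rdx")      # rax = 0x1337, the value stored at 0x404010
mov_store("rbx", "rdx")     # memory[0x404000] = 0x404010; rbx still 0x404000
print(hex(regs["rdi"]), hex(regs["rax"]), hex(memory[0x404000]))
```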

add

This adds the two operands together and stores the sum in the first operand. For instance:

add rax, rdx

That will set rax equal to rax + rdx.

sub

This instruction subtracts the second operand from the first one and stores the difference in the first operand. For instance:

sub rsp, 0x10

This will set the rsp register equal to rsp - 0x10 (as in main above, this reserves 16 bytes of stack space).

xor

This performs the bitwise xor operation on the two operands it is given and stores the result in the first operand:

xor rdx, rax

That will set the rdx register equal to rdx ^ rax.

The and and or instructions work the same way, except with the bitwise AND and OR operators.
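Registers have a fixed width, so these operations wrap around at 64 bits. A minimal Python sketch (the helper names are our own) that masks every result to 64 bits and shows the common xor idiom for zeroing a register:

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide on x64


def add64(a: int, b: int) -> int:
    return (a + b) & MASK64   # add: a carry out of bit 63 is dropped


def sub64(a: int, b: int) -> int:
    return (a - b) & MASK64   # sub: a borrow wraps around


def xor64(a: int, b: int) -> int:
    return (a ^ b) & MASK64


def and64(a: int, b: int) -> int:
    return (a & b) & MASK64


def or64(a: int, b: int) -> int:
    return (a | b) & MASK64


rsp = sub64(0x7FFE0000, 0x10)          # sub rsp, 0x10: reserve 16 bytes of stack
print(hex(rsp))                        # 0x7ffdfff0
print(hex(add64(MASK64, 1)))           # 0x0: the sum wrapped around
print(xor64(0xDEADBEEF, 0xDEADBEEF))   # 0: `xor rax, rax` is a common way to zero a register
```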

push

The push instruction grows the stack by 8 bytes on x64 (4 bytes on x86) by decrementing rsp, then stores its operand (for instance, the contents of a register) in the new stack space. For instance:

push rax

This will grow the stack by 8 bytes, and the contents of the rax register will be on top of the stack.

pop

The pop instruction copies the top 8 bytes (for x64, 4 for x86) of the stack into its operand, then shrinks the stack by incrementing rsp. For instance:

pop rax

The top 8 bytes of the stack will end up in the rax register.

jmp

The jmp instruction will jump to an instruction address. It is used to redirect code execution. For instance:

jmp 0x602010

That instruction will cause the code execution to jump to 0x602010, and execute whatever instruction is there.

call & ret

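This is similar to the jmp instruction. The difference is that call first pushes the return address (the address of the instruction right after the call) onto the stack, then jumps to whatever address it is given. This is used for calling functions. The call instruction does not save rbp: the called function does that itself in its prologue (push rbp; mov rbp,rsp, as in main above). After the function is finished, leave restores rsp and rbp, and the ret instruction pops the saved return address into rip, so execution continues right where it left off. Because ret trusts whatever value sits on the stack at that moment, overwriting the saved return address lets an attacker redirect execution.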
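The following toy Python model (a sketch, not an emulator) shows what call and ret do to rip and rsp. The addresses are the ones from the call to puts in the main listing above, taken as shown there (before any PIE relocation). The last lines simulate an overwritten return address, using a hypothetical target 0x401196, to show why this matters for exploitation.

```python
class Cpu:
    """Toy x64 model with just enough state to show how call and ret use the stack."""

    def __init__(self, rip: int, rsp: int) -> None:
        self.rip = rip
        self.rsp = rsp
        self.stack = {}

    def call(self, target: int, insn_len: int = 5) -> None:
        return_address = self.rip + insn_len   # the instruction right after the call
        self.rsp -= 8
        self.stack[self.rsp] = return_address  # push the return address (rbp is NOT pushed here)
        self.rip = target                      # then jump to the callee

    def ret(self) -> None:
        self.rip = self.stack[self.rsp]        # pop whatever is on top of the stack into rip
        self.rsp += 8


# `call puts@plt` from the main listing: 5 bytes at 0x114b, target 0x1030
cpu = Cpu(rip=0x114B, rsp=0x7FFE0000)
cpu.call(0x1030)
print(hex(cpu.stack[cpu.rsp]))  # 0x1150: the saved return address
cpu.ret()
print(hex(cpu.rip))             # 0x1150: execution resumes after the call

# Simulate an overflow that replaced the saved return address before ret runs
cpu = Cpu(rip=0x114B, rsp=0x7FFE0000)
cpu.call(0x1030)
cpu.stack[cpu.rsp] = 0x401196   # hypothetical attacker-chosen address
cpu.ret()
print(hex(cpu.rip))             # 0x401196
```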

cmp

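The cmp instruction performs the same subtraction as the sub instruction (first operand minus second operand), except that it doesn’t store the result in the first operand. Instead, it only sets the flags according to the result: for example, the zero flag is set if the two operands are equal, and the carry flag is set if the first operand is smaller when both are treated as unsigned numbers.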

jnz / jz

The jump if zero (jz) and jump if not zero (jnz) instructions are conditional versions of jmp: they only jump depending on the status of the zero flag. jz jumps only if the zero flag is set, and jnz only if it is clear. After a cmp, this means jz jumps when the two operands were equal (which is why it is also written je), and jnz jumps when they differ (jne).
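A minimal Python sketch of how cmp sets the four flags most often used by conditional jumps on 64-bit operands, and how jz/jnz then decide. The helper names are hypothetical; the carry and overflow rules follow the standard definitions for subtraction.

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def cmp_flags(a: int, b: int) -> dict:
    """Flags set by `cmp a, b` on 64-bit operands: compute a - b, keep only the flags."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    return {
        "ZF": result == 0,                               # operands were equal
        "SF": bool(result & SIGN_BIT),                   # copy of the result's top bit
        "CF": a < b,                                     # borrow: a < b as unsigned numbers
        "OF": bool((a ^ b) & (a ^ result) & SIGN_BIT),   # signed overflow of a - b
    }


def jz_taken(flags: dict) -> bool:
    return flags["ZF"]          # jz / je: jump if the zero flag is set


def jnz_taken(flags: dict) -> bool:
    return not flags["ZF"]      # jnz / jne: jump if the zero flag is clear


# e.g. `cmp rax, 0x1337` followed by `jnz fail`: the jump is skipped only when rax == 0x1337
print(jnz_taken(cmp_flags(0x1337, 0x1337)))  # False
print(jnz_taken(cmp_flags(0x1336, 0x1337)))  # True
```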

Analysis

When analysing a binary, start with the following:

First, run file on it to check its executable format (for example, a 32-bit or 64-bit ELF executable for Linux). It is recommended to follow along in a virtual machine of your own, preferably running Linux.

Then, check which security protections the program has with the checksec command, as described above.
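The core facts that file reports can also be read straight from the ELF header. Below is a Python sketch that checks the magic bytes, the class (32 or 64-bit), the architecture, and the object type. An EXEC type means the binary is not PIE, while DYN means a PIE executable or a shared library. Only two machine values are mapped (x86 and x86-64), and `elf_summary` is a hypothetical name.

```python
import struct

MACHINES = {3: "x86 (32-bit)", 62: "x86-64"}
TYPES = {2: "EXEC (fixed load address, no PIE)", 3: "DYN (PIE executable or shared library)"}


def elf_summary(header: bytes) -> dict:
    """Summarise the first 20 bytes of an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]          # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little-endian, 2 = big-endian
    # e_type and e_machine are 2-byte fields right after the 16-byte e_ident
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "bits": bits,
        "type": TYPES.get(e_type, f"other ({e_type})"),
        "machine": MACHINES.get(e_machine, f"other ({e_machine})"),
    }


def elf_summary_from_file(path: str) -> dict:
    with open(path, "rb") as f:
        return elf_summary(f.read(20))


# Usage, with the example binary from this post:
# print(elf_summary_from_file("./example"))
```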

GDB Introductions

GDB, or the GNU Debugger, is the standard debugger on Linux systems, developed by the GNU Project. It has been ported to many systems and supports languages including C, C++, Objective-C, Fortran, Ada, Go, and Rust.

GDB provides us with the usual traceability features like breakpoints or stack trace output and allows us to intervene in the execution of programs. It also allows us, for example, to manipulate the variables of the application or to call functions independently of the normal execution of the program.

We use the GNU Debugger (GDB) to view the compiled binary at the assembly level. Once the binary is loaded in GDB, we can disassemble the program’s main function, for example with the disassemble main command.

Start Debug

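# Make the binary executable
chmod +x ./example

# -q: quiet mode, skip the GDB startup banner (pwndbg, if installed, loads automatically)
gdb -q ./example

# Or open it in radare2
r2 -d -A ./example

For radare2, -d starts the program under the debugger, while -A runs a full analysis (the aaa command) when the file is opened.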

After loading the program in the debugger, the next thing to look at is the list of functions it contains (for example with info functions in GDB, or afl in radare2).

Example:

0x08049000  _init
0x08049030  gets@plt
0x08049040  puts@plt
0x08049050  __libc_start_main@plt
0x08049060  _start
0x080490a0  _dl_relocate_static_pie
0x080490b0  __x86.get_pc_thunk.bx
0x080490c0  deregister_tm_clones
0x08049100  register_tm_clones
0x08049140  __do_global_dtors_aux
0x08049170  frame_dummy
0x08049172  unsafe
0x080491ab  main
0x080491c3  __x86.get_pc_thunk.ax
0x080491d0  __libc_csu_init
0x08049230  __libc_csu_fini
0x08049231  __x86.get_pc_thunk.bp
0x08049238  _fini

Note that this is only an example: the list of functions differs from program to program. In this one, gets@plt and the custom function unsafe stand out, since gets reads input without any bounds check.
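Scanning such a listing for risky imports can be automated. Here is a small Python sketch, assuming the address-then-name format shown in the example listing. The watch list of unsafe functions is our own and deliberately short, not a complete audit.

```python
# Hypothetical, deliberately short watch list: libc functions that can write
# past the end of a buffer because they take no destination size
UNSAFE = {"gets", "strcpy", "strcat", "sprintf"}


def find_unsafe_imports(listing: str) -> list:
    """Return PLT entries (e.g. 'gets@plt') from an 'address  name' listing that are on the watch list."""
    found = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].endswith("@plt"):
            if parts[1][: -len("@plt")] in UNSAFE:
                found.append(parts[1])
    return found


listing = """\
0x08049030  gets@plt
0x08049040  puts@plt
0x08049172  unsafe
"""
print(find_unsafe_imports(listing))  # ['gets@plt']
```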

BINARY EXPLOITATION