CPU & Registers
The CPU (Central Processing Unit) is the brain of your computer. To reverse engineer binaries, you must understand how CPUs work at a fundamental level. This chapter explains CPU architecture, registers, and why they matter for security.
What is a CPU?
A CPU executes instructions in sequence. It reads data from memory, processes it, and writes results back. The two main components of a CPU are:
Control Unit (CU). The Control Unit directs traffic in the CPU. It reads instructions from memory, decodes them, tells other parts what to do, and manages the flow of data. Think of it as the conductor of an orchestra.
Execution Unit (EU). The Execution Unit actually performs calculations and operations. It executes arithmetic (ADD, SUB), logical operations (AND, OR), comparisons (CMP), and memory operations. It's the worker that does the real work.
What Are Registers?
Registers are tiny, ultra-fast storage units inside the CPU. They hold data that the CPU is actively using. Unlike RAM (which is gigabytes), registers are measured in bits and are incredibly fast.
Register Hierarchy (x86-64 Architecture)
On modern 64-bit x86 CPUs, registers come in different sizes. The main "General Purpose Registers" are:
| 64-bit (QWORD) | 32-bit (DWORD) | 16-bit (WORD) | 8-bit HIGH | 8-bit LOW |
|---|---|---|---|---|
| RAX | EAX | AX | AH | AL |
| RBX | EBX | BX | BH | BL |
| RCX | ECX | CX | CH | CL |
| RDX | EDX | DX | DH | DL |
| RSI | ESI | SI | - | SIL |
| RDI | EDI | DI | - | DIL |
| RSP | ESP | SP | - | SPL |
| RBP | EBP | BP | - | BPL |
| R8 | R8D | R8W | - | R8B |
| R9 | R9D | R9W | - | R9B |
| R10 | R10D | R10W | - | R10B |
| R11 | R11D | R11W | - | R11B |
| R12 | R12D | R12W | - | R12B |
| R13 | R13D | R13W | - | R13B |
| R14 | R14D | R14W | - | R14B |
| R15 | R15D | R15W | - | R15B |
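When an instruction writes to a 32-bit register (like EAX), the CPU automatically zeros the upper 32 bits of the full 64-bit register (RAX). Writes to 16-bit and 8-bit registers (AX, AL, AH) leave the remaining bits unchanged. This matters in reverse engineering because it determines which data survives an instruction.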
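The aliasing rules can be sketched with a small Python model. This is an illustration only: the helper name `write_reg` and its parameters are made up for this example, and it models just the RAX family.

```python
MASK64 = (1 << 64) - 1


def write_reg(rax: int, value: int, width: int, high_byte: bool = False) -> int:
    """Return the new 64-bit RAX after writing `value` to a sub-register."""
    if width == 64:                      # mov rax, value
        return value & MASK64
    if width == 32:                      # mov eax, value: upper 32 bits are zeroed
        return value & 0xFFFFFFFF
    if width == 16:                      # mov ax, value: bits 63..16 are kept
        return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)
    if width == 8 and high_byte:         # mov ah, value: only bits 15..8 change
        return (rax & ~0xFF00 & MASK64) | ((value & 0xFF) << 8)
    if width == 8:                       # mov al, value: only bits 7..0 change
        return (rax & ~0xFF & MASK64) | (value & 0xFF)
    raise ValueError("width must be 8, 16, 32 or 64")


rax = 0x1122334455667788
after_eax = write_reg(rax, 0xAABBCCDD, 32)
after_ax = write_reg(rax, 0xBEEF, 16)
after_al = write_reg(rax, 0xFF, 8)
after_ah = write_reg(rax, 0xFF, 8, high_byte=True)
print(hex(after_eax), hex(after_ax), hex(after_al), hex(after_ah))
```

Only the 32-bit write discards the old upper half; the narrower writes merge into the existing value.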
Key x86-64 Registers
📌 RAX (Accumulator) - General Purpose
RAX is the primary accumulator register. It's used for:
- Arithmetic operations (ADD, SUB, MUL)
- Return values from function calls
- Syscall numbers in system calls
- Division operations (quotient stored here)
- Special I/O operations
Example: When a function returns an integer, it's in RAX.
mov rax, 1 ; Set RAX to 1 (often a success code)
ret ; Return - RAX contains the return value
📌 RBX (Base) - General Purpose
RBX is traditionally the base register for addressing. It's used for:
- Addressing memory (calculating addresses)
- General-purpose data storage
- Preserved register (must be saved by functions)
Important: RBX is a "callee-saved" register, meaning if a function modifies RBX, it must restore it before returning.
📌 RCX (Counter) - General Purpose
RCX is traditionally the counter register. It's used for:
- Loop counters (REP instructions)
- Fourth function argument (x86-64 calling convention)
- Shift and rotate counts
- General-purpose operations
Example: In a for loop, you might load the loop count into RCX and use the LOOP instruction.
📌 RDX (Data) - General Purpose
RDX is traditionally the data register. It's used for:
- Third function argument (x86-64 calling convention)
- Division operations (remainder stored here)
- I/O operations
- General-purpose operations
Example: DIV RBX divides the 128-bit value RDX:RAX by RBX, so RDX is normally zeroed first. The quotient is stored in RAX and the remainder in RDX.
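As a rough sketch of how unsigned DIV treats RDX and RAX as one 128-bit dividend, here is a Python model. The function name `div64` is hypothetical, and the exceptions stand in for the CPU's divide-error fault.

```python
MASK64 = (1 << 64) - 1


def div64(rdx: int, rax: int, divisor: int) -> tuple[int, int]:
    """Model unsigned DIV r64: returns (new RAX = quotient, new RDX = remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)   # RDX:RAX as one number
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK64:
        # The real CPU also faults when the quotient does not fit in RAX.
        raise OverflowError("divide error: quotient does not fit in 64 bits")
    return quotient, remainder


quotient, remainder = div64(rdx=0, rax=17, divisor=5)
print(quotient, remainder)
```

The overflow case explains why leftover data in RDX before a DIV can crash a program: the dividend becomes far larger than intended.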
📌 RSI/RDI (Source/Destination Index) - General Purpose
RSI (Source Index) and RDI (Destination Index) are used for:
- RSI = second function argument
- RDI = first function argument
- String operations (MOVS, STOS, SCAS)
- Memory operations
Important: In the x86-64 calling convention, RDI holds the first argument to a function. This is crucial for understanding function calls.
mov rdi, 10 ; 1st argument: 10
mov rsi, 20 ; 2nd argument: 20
mov rdx, 30 ; 3rd argument: 30
call add_three ; Call function - returns result in RAX
📌 RIP (Instruction Pointer) - Program Counter
RIP (or EIP in 32-bit mode) is the Instruction Pointer. It always contains the address of the next instruction to execute.
- Automatically advanced past each instruction as it executes
- Jumps change RIP to branch to different code
- Function calls push return address and change RIP
- Can't be written with MOV (use JMP, CALL, or RET to change it)
Why important for reversing: When you see a breakpoint or a crash, RIP tells you exactly where in the code execution stopped.
📌 RSP & RBP (Stack Pointers)
RSP (Stack Pointer) and RBP (Base Pointer) manage the call stack:
- RSP: Points to the top (most recent item) of the stack
- RBP: Points to the base of the current stack frame (where local variables are)
- PUSH decrements RSP by 8 and writes data at the new top
- POP reads data from the top and increments RSP by 8
- CALL pushes the return address onto the stack
Stack Layout Example:
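The PUSH, POP, CALL, and RET behaviour listed above can be traced with a toy Python model. The class name `StackModel` and the addresses used are invented for illustration; real stacks live at much higher addresses.

```python
MASK64 = (1 << 64) - 1


class StackModel:
    """Toy model of RSP plus stack memory (address -> 64-bit value)."""

    def __init__(self, rsp: int = 0x7FFFFFFFE000):
        self.rsp = rsp
        self.memory = {}

    def push(self, value: int) -> None:
        self.rsp -= 8                          # the stack grows toward lower addresses
        self.memory[self.rsp] = value & MASK64

    def pop(self) -> int:
        value = self.memory[self.rsp]
        self.rsp += 8
        return value

    def call(self, return_address: int, target: int) -> int:
        """CALL: push the return address, then continue at target (the new RIP)."""
        self.push(return_address)
        return target

    def ret(self) -> int:
        """RET: pop the return address, which becomes the new RIP."""
        return self.pop()


stack = StackModel(rsp=0x1000)
stack.push(123)                                # RSP: 0x1000 -> 0xFF8
rip = stack.call(return_address=0x401005, target=0x401020)  # RSP -> 0xFF0
rip_after_ret = stack.ret()                    # RSP -> 0xFF8
value = stack.pop()                            # RSP -> 0x1000
print(hex(rip), hex(rip_after_ret), value, hex(stack.rsp))
```

Because RET simply pops whatever sits at the top of the stack, overwriting that slot redirects execution, which is the basis of classic stack-smashing attacks.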
Special Registers - Flags Register (RFLAGS)
The RFLAGS register (EFLAGS in 32-bit mode) contains condition flags: single bits that indicate the status of the last operation.
| Flag | Name | Meaning | Set When |
|---|---|---|---|
| ZF | Zero Flag | Result is zero | Result of last operation = 0 |
| CF | Carry Flag | Unsigned overflow | Addition/subtraction carries/borrows |
| SF | Sign Flag | Result is negative | MSB (most significant bit) = 1 |
| OF | Overflow Flag | Signed overflow | Signed arithmetic overflow occurs |
| PF | Parity Flag | Even parity | Low byte of result has an even number of 1 bits |
| AF | Adjust Flag | BCD carry | Carry out of the lower nibble (bit 3) |
Why flags matter: Conditional jumps (JE, JNE, JZ, JG, etc.) check these flags to decide whether to jump. Understanding flags is essential for reading assembly code.
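To make the flag definitions concrete, the following Python sketch computes ZF, CF, SF, and OF for a subtraction, which is what CMP and SUB do. The function name `flags_after_sub` is hypothetical, and PF and AF are left out to keep it short.

```python
def flags_after_sub(dst: int, src: int, bits: int = 64) -> dict:
    """Flags produced by `sub dst, src` or `cmp dst, src` on `bits`-wide operands."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dst &= mask
    src &= mask
    result = (dst - src) & mask
    return {
        "ZF": int(result == 0),
        "CF": int(dst < src),                       # unsigned borrow
        "SF": int(bool(result & sign)),             # copy of the result's top bit
        # Signed overflow: operands had different signs and the result's sign
        # differs from the destination's sign.
        "OF": int(bool((dst ^ src) & (dst ^ result) & sign)),
    }


equal = flags_after_sub(10, 10)
smaller = flags_after_sub(5, 10)
wrapped = flags_after_sub(0x8000000000000000, 1)    # most negative value minus 1
print(equal, smaller, wrapped)
```

The last case shows why signed jumps need both SF and OF: the result looks positive (SF = 0), yet the comparison is "less than" because OF = 1.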
x86-64 Calling Convention
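When a function is called, arguments are passed through specific registers in a specific order. This is the calling convention. On Linux, the System V AMD64 ABI places the first six integer or pointer arguments in RDI, RSI, RDX, RCX, R8, and R9, in that order, and returns the result in RAX. Understanding it is crucial for debugging.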
Any arguments beyond the 6th are passed on the stack.
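A minimal sketch of that assignment, assuming the System V AMD64 ABI used on Linux and only integer or pointer arguments (floating-point arguments use XMM registers and are not modeled). The function name `place_arguments` is made up for this example.

```python
INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def place_arguments(args: list) -> tuple[dict, list]:
    """Split integer/pointer arguments into register and stack arguments."""
    in_registers = dict(zip(INT_ARG_REGS, args))
    # Remaining arguments go on the stack; the 7th ends up closest to the
    # return address because the caller pushes them in reverse order.
    on_stack = list(args[len(INT_ARG_REGS):])
    return in_registers, on_stack


registers, stack_args = place_arguments([10, 20, 30, 40, 50, 60, 70, 80])
print(registers)
print(stack_args)
```

When reading a disassembly, this is why values loaded into RDI, RSI, and RDX just before a CALL are usually that function's first three parameters.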
Putting It Together
Now you understand:
- CPUs have two main components: Control Unit and Execution Unit
- Registers are ultra-fast CPU storage
- Different registers have different purposes
- The Flags register controls conditional jumps
- Function arguments are passed through specific registers
- RIP points to the next instruction
- RSP and RBP manage the call stack
System Calls (Syscalls)
A system call is a request from a user program to the kernel to perform a privileged operation. When a program needs to read a file, write to the screen, or allocate memory, it can't do it directly — it must ask the kernel through a syscall.
User Space vs Kernel Space
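Modern operating systems use a layered privilege model: user programs run in user space (ring 3 on x86) with restricted access, while the kernel runs in kernel space (ring 0) with full access to hardware and memory.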
How System Calls Work
When your program executes a SYSCALL instruction:
1. Program sets up registers with syscall number and arguments
2. SYSCALL instruction transitions to kernel mode
3. Kernel executes the requested operation
4. Control returns to user program with result in RAX
x86-64 Linux Syscall ABI
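On 64-bit Linux, syscalls follow a specific convention: the syscall number goes in RAX, the arguments go in RDI, RSI, RDX, R10, R8, and R9 (in that order), and the result comes back in RAX. The SYSCALL instruction itself overwrites RCX and R11, which is why the fourth argument uses R10 instead of RCX.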
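One way to try a raw syscall without writing assembly is the C library's generic `syscall()` wrapper, reached here through Python's `ctypes`. This is a sketch under assumptions: it only runs the call on 64-bit x86-64 Linux (syscall numbers differ on other platforms), and the helper name `raw_getpid` is invented. The wrapper loads the number into RAX and executes SYSCALL on our behalf.

```python
import ctypes
import os
import platform
import sys

SYS_GETPID = 39  # x86-64 Linux syscall number for getpid


def raw_getpid():
    """Call getpid by number via libc's syscall(); returns None elsewhere."""
    is_linux_x86_64 = (
        sys.platform.startswith("linux")
        and platform.machine() == "x86_64"
        and ctypes.sizeof(ctypes.c_void_p) == 8
    )
    if not is_linux_x86_64:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(SYS_GETPID))


pid = raw_getpid()
print("raw getpid:", pid, "os.getpid():", os.getpid())
```

On a matching system both numbers should agree, since `os.getpid()` ends up making the same kernel request.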
Common Syscalls Explained
📌 exit (Syscall #60) - Terminate Program
exit(int code) terminates the program with an exit status.
mov rax, 60 ; exit syscall number
mov rdi, 0 ; exit code = 0 (success)
syscall ; Call kernel to exit
Equivalent C code:
exit(0); // Terminate with status 0
📌 write (Syscall #1) - Write to File/Console
write() writes data to a file descriptor (like stdout for console output).
mov rax, 1 ; write syscall number
mov rdi, 1 ; fd = 1 (stdout)
mov rsi, msg ; rsi = pointer to message
mov rdx, 5 ; length = 5 bytes
syscall ; Write to stdout
File Descriptor Reference:
| FD | Name | Purpose |
|---|---|---|
| 0 | stdin | Standard input (keyboard) |
| 1 | stdout | Standard output (console) |
| 2 | stderr | Standard error (console) |
📌 read (Syscall #0) - Read from File/Console
read() reads data from a file descriptor into a buffer.
mov rax, 0 ; read syscall number
mov rdi, 0 ; fd = 0 (stdin)
mov rsi, buffer ; rsi = pointer to buffer
mov rdx, 10 ; read up to 10 bytes
syscall ; Read from stdin
; RAX now contains number of bytes read
📌 open (Syscall #2) - Open a File
open() opens a file and returns a file descriptor.
| Flag | Value | Meaning |
|---|---|---|
| O_RDONLY | 0 | Read only |
| O_WRONLY | 1 | Write only |
| O_RDWR | 2 | Read and write |
| O_CREAT | 64 | Create if doesn't exist |
| O_APPEND | 1024 | Append to file |
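The flags are combined with bitwise OR, so a single integer in RSI can carry several of them. The sketch below decodes such a value using only the numbers from the table above; the function name `describe_open_flags` is hypothetical, and flags not listed in the table are ignored.

```python
ACCESS_MODES = {0: "O_RDONLY", 1: "O_WRONLY", 2: "O_RDWR"}
OPTION_FLAGS = {64: "O_CREAT", 1024: "O_APPEND"}


def describe_open_flags(flags: int) -> list:
    """Name the flags from the table that are present in an open() flags value."""
    # The access mode lives in the two lowest bits; the rest are single-bit options.
    names = [ACCESS_MODES.get(flags & 3, "invalid access mode")]
    for bit, name in OPTION_FLAGS.items():
        if flags & bit:
            names.append(name)
    return names


log_flags = 1 | 64 | 1024          # O_WRONLY | O_CREAT | O_APPEND
print(log_flags, describe_open_flags(log_flags))
```

Seeing a constant like this in RSI before an open syscall tells you the program is appending to a file and creating it if needed.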
📌 close (Syscall #3) - Close a File
close() closes a file descriptor, freeing the resource.
Complete Syscall Example: Write to Console
Let's write a complete program that uses syscalls to print "Hello World":
section .data
msg: db "Hello World", 0x0a
len: equ $ - msg ; Calculate length
section .text
global _start
_start:
; write syscall to print message
mov rax, 1 ; syscall: write
mov rdi, 1 ; fd: stdout
mov rsi, msg ; buffer
mov rdx, len ; length
syscall
; exit syscall
mov rax, 60 ; syscall: exit
mov rdi, 0 ; status: 0 (success)
syscall
Syscall Return Values & Error Handling
After a syscall, the kernel returns a value in RAX:
- If RAX ≥ 0: Success, RAX contains the result
- If RAX < 0: Error occurred, RAX contains the negated errno value (for example, -2 for ENOENT)
Error codes fall in the range -1 to -4095. Common errors:
| Error Code | Constant | Meaning |
|---|---|---|
| -1 | EPERM | Operation not permitted |
| -2 | ENOENT | No such file or directory |
| -13 | EACCES | Permission denied |
| -14 | EFAULT | Bad address |
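When a debugger shows RAX as an unsigned 64-bit number, error returns look like huge values. The following sketch converts such a value back into an error name using Python's standard `errno` module; the function name `decode_syscall_result` is made up for this example.

```python
import errno

MASK64 = (1 << 64) - 1


def decode_syscall_result(rax: int):
    """Classify a raw RAX value returned by a Linux syscall."""
    rax &= MASK64
    signed = rax - (1 << 64) if rax >= (1 << 63) else rax   # two's complement
    if -4095 <= signed <= -1:
        code = -signed
        return ("error", errno.errorcode.get(code, f"errno {code}"))
    return ("ok", rax)


print(decode_syscall_result(0xFFFFFFFFFFFFFFFE))   # what a failed open might show
print(decode_syscall_result(3))                    # a typical new file descriptor
```

Any result outside the -4095 to -1 window is treated as a success value.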
Complete System Call Database
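Here's a reference of commonly used Linux x86-64 system calls with arguments, return values, and descriptions. Return values are shown as the C library wrappers report them (-1 with errno set); the raw SYSCALL instruction instead leaves the negative error code in RAX, as described above.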
| RAX | Name | Arguments (RDI, RSI, RDX, R10, R8, R9) | Return Value | Description |
|---|---|---|---|---|
| 0 | read | RDI: unsigned int fd; RSI: char *buf; RDX: size_t count | ssize_t (bytes read or -1) | Reads up to count bytes from file descriptor fd into buffer buf. Returns number of bytes read, 0 on EOF, or -1 on error. Commonly used with fd=0 (stdin) to read user input. |
| 1 | write | RDI: unsigned int fd; RSI: const char *buf; RDX: size_t count | ssize_t (bytes written or -1) | Writes up to count bytes from buffer buf to file descriptor fd. Returns number of bytes written or -1 on error. Use fd=1 (stdout) for console output, fd=2 (stderr) for error messages. |
| 2 | open | RDI: const char *filename; RSI: int flags; RDX: mode_t mode | int (file descriptor or -1) | Opens file specified by filename. Flags: O_RDONLY(0), O_WRONLY(1), O_RDWR(2), O_CREAT(64), O_APPEND(1024). Mode specifies permissions (e.g., 0644). Returns file descriptor on success. |
| 3 | close | RDI: unsigned int fd | int (0 or -1) | Closes file descriptor fd, freeing the resource. Returns 0 on success, -1 on error. Always close files when done to prevent resource leaks. |
| 4 | stat | RDI: const char *filename; RSI: struct stat *statbuf | int (0 or -1) | Retrieves file information (size, permissions, timestamps) for filename and stores it in statbuf. Returns 0 on success. Follows symbolic links; lstat (syscall 6) reports on the link itself instead. |
| 5 | fstat | RDI: unsigned int fd; RSI: struct stat *statbuf | int (0 or -1) | Like stat, but operates on an already-opened file descriptor instead of a filename. Useful when you already have the file open. |
| 8 | lseek | RDI: unsigned int fd; RSI: off_t offset; RDX: unsigned int whence | off_t (new position or -1) | Repositions file offset of fd. Whence: SEEK_SET(0)=absolute, SEEK_CUR(1)=relative, SEEK_END(2)=from end. Returns new offset from beginning of file. |
| 9 | mmap | RDI: void *addr; RSI: size_t length; RDX: int prot; R10: int flags; R8: int fd; R9: off_t offset | void* (address or MAP_FAILED) | Maps file or device into memory. Prot: PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Flags: MAP_PRIVATE(2), MAP_ANONYMOUS(32). Returns pointer to mapped area. Critical for memory management. |
| 11 | munmap | RDI: void *addr; RSI: size_t length | int (0 or -1) | Unmaps a previously mapped memory region starting at addr with length. Returns 0 on success. Always unmap when done to free memory. |
| 12 | brk | RDI: void *addr | void* (new break or -1) | Changes program break (end of data segment) to addr. Used by malloc() internally. Returns new program break on success. Rarely used directly in modern programs (prefer mmap). |
| 16 | ioctl | RDI: unsigned int fd; RSI: unsigned int cmd; RDX: unsigned long arg | int (varies) | Device-specific input/output control. Command and arguments vary by device. Used for operations that don't fit read/write model (terminal settings, disk operations, etc.). |
| 22 | pipe | RDI: int pipefd[2] | int (0 or -1) | Creates a pipe (unidirectional data channel). pipefd[0] is read end, pipefd[1] is write end. Returns 0 on success. Used for inter-process communication. |
| 32 | dup | RDI: unsigned int fildes | int (new fd or -1) | Duplicates file descriptor fildes using the lowest available fd number. Both fds refer to same file. Returns new fd on success. |
| 33 | dup2 | RDI: unsigned int oldfd; RSI: unsigned int newfd | int (new fd or -1) | Duplicates oldfd to newfd. If newfd is open, it's closed first. Commonly used to redirect stdin/stdout/stderr in child processes. |
| 57 | fork | (none) | pid_t (child PID or 0 in child) | Creates new process by duplicating calling process. Returns child PID to parent, returns 0 in child process. Child gets copy of parent's memory and file descriptors. |
| 59 | execve | RDI: const char *filename; RSI: char *const argv[]; RDX: char *const envp[] | int (never returns on success) | Executes program specified by filename, replacing current process image. argv is argument array, envp is environment. Only returns on error. Used with fork() to run new programs. |
| 60 | exit | RDI: int status | void (never returns) | Terminates the calling thread (the whole process when it is single-threaded) with exit status. 0 indicates success, non-zero indicates error. The kernel closes file descriptors and reports the status to the parent; flushing stdio buffers is done by the C library's exit() before it makes the syscall. |
| 61 | wait4 | RDI: pid_t pid; RSI: int *status; RDX: int options; R10: struct rusage *rusage | pid_t (PID or -1) | Waits for child process to change state. Returns child PID on success. Status contains exit code. Used by parent to collect terminated children (prevent zombies). |
| 39 | getpid | (none) | pid_t (process ID) | Returns process ID (PID) of calling process. Always succeeds. Useful for logging, creating unique filenames, or process identification. |
| 110 | getppid | (none) | pid_t (parent PID) | Returns parent process ID of calling process. If parent has exited, returns 1 (init/systemd). Always succeeds. |
| 102 | getuid | (none) | uid_t (user ID) | Returns real user ID of calling process. Used for permission checks. Always succeeds. |
| 104 | getgid | (none) | gid_t (group ID) | Returns real group ID of calling process. Used for permission checks. Always succeeds. |
| 105 | setuid | RDI: uid_t uid | int (0 or -1) | Sets effective user ID. If privileged, sets real, effective, and saved UIDs. Returns 0 on success. Used for privilege dropping or SUID executables. |
| 106 | setgid | RDI: gid_t gid | int (0 or -1) | Sets effective group ID. If privileged, sets real, effective, and saved GIDs. Returns 0 on success. |
| 62 | kill | RDI: pid_t pid; RSI: int sig | int (0 or -1) | Sends signal sig to process pid. Common signals: SIGTERM(15), SIGKILL(9), SIGUSR1(10). Returns 0 on success. If pid=0, sends to process group. |
| 13 | rt_sigaction | RDI: int sig; RSI: const struct sigaction *act; RDX: struct sigaction *oldact; R10: size_t sigsetsize | int (0 or -1) | Examines or changes signal handler for signal sig. act specifies new action, oldact receives old action. sigsetsize is the size of the signal mask in bytes (8 on x86-64). Returns 0 on success. Modern signal handling interface. |
| 34 | pause | (none) | int (always -1) | Suspends process until signal is received. Always returns -1 with errno=EINTR after signal handler returns. Used for waiting on signals. |
| 41 | socket | RDI: int domain; RSI: int type; RDX: int protocol | int (socket fd or -1) | Creates communication endpoint. Domain: AF_INET(2)=IPv4, AF_INET6(10)=IPv6. Type: SOCK_STREAM(1)=TCP, SOCK_DGRAM(2)=UDP. Returns socket file descriptor. |
| 49 | bind | RDI: int sockfd; RSI: const struct sockaddr *addr; RDX: socklen_t addrlen | int (0 or -1) | Assigns address (IP and port) to socket sockfd. Must be called before listen() for servers. Returns 0 on success. Port numbers below 1024 require root. |
| 50 | listen | RDI: int sockfd; RSI: int backlog | int (0 or -1) | Marks socket as passive (ready to accept connections). Backlog specifies maximum queue length for pending connections. Returns 0 on success. |
| 43 | accept | RDI: int sockfd; RSI: struct sockaddr *addr; RDX: socklen_t *addrlen | int (new socket fd or -1) | Accepts incoming connection on listening socket. Blocks until connection arrives. Returns new socket for the connection, original socket continues listening. |
| 42 | connect | RDI: int sockfd; RSI: const struct sockaddr *addr; RDX: socklen_t addrlen | int (0 or -1) | Initiates connection to remote address. For TCP, performs 3-way handshake. Blocks until connection established or timeout. Returns 0 on success. |
| 44 | sendto | RDI: int sockfd; RSI: const void *buf; RDX: size_t len; R10: int flags; R8: const struct sockaddr *dest_addr; R9: socklen_t addrlen | ssize_t (bytes sent or -1) | Sends message on socket to specific address. For UDP sockets. Use send() for connected sockets. Returns number of bytes sent. |
| 45 | recvfrom | RDI: int sockfd; RSI: void *buf; RDX: size_t len; R10: int flags; R8: struct sockaddr *src_addr; R9: socklen_t *addrlen | ssize_t (bytes received or -1) | Receives message from socket and captures sender's address. For UDP. Returns number of bytes received, 0 on connection close. |
| 201 | time | RDI: time_t *tloc | time_t (seconds since epoch) | Returns current time as seconds since Unix epoch (Jan 1, 1970). If tloc is non-NULL, also stores there. Simple but low precision (1 second). |
| 96 | gettimeofday | RDI: struct timeval *tv; RSI: struct timezone *tz | int (0 or -1) | Gets current time with microsecond precision. tv contains seconds and microseconds since epoch. tz is obsolete (pass NULL). Returns 0 on success. |
| 35 | nanosleep | RDI: const struct timespec *req; RSI: struct timespec *rem | int (0 or -1) | Suspends execution for time specified in req (seconds + nanoseconds). If interrupted by signal, remaining time stored in rem. Returns 0 on success. |
| 228 | clock_gettime | RDI: clockid_t clk_id; RSI: struct timespec *tp | int (0 or -1) | Retrieves time from specified clock. clk_id: CLOCK_REALTIME(0)=wall clock, CLOCK_MONOTONIC(1)=monotonic time. Nanosecond precision. Returns 0 on success. |
| 83 | mkdir | RDI: const char *pathname; RSI: mode_t mode | int (0 or -1) | Creates directory specified by pathname. Mode specifies permissions (e.g., 0755). Returns 0 on success, -1 if already exists or permission denied. |
| 84 | rmdir | RDI: const char *pathname | int (0 or -1) | Removes empty directory. Returns -1 if directory not empty or doesn't exist. Returns 0 on success. |
| 87 | unlink | RDI: const char *pathname | int (0 or -1) | Deletes file specified by pathname. Decrements link count; if 0 and no process has file open, file is deleted. Returns 0 on success. |
| 82 | rename | RDI: const char *oldpath; RSI: const char *newpath | int (0 or -1) | Renames/moves file from oldpath to newpath. Atomic operation. If newpath exists, it's replaced. Returns 0 on success. |
| 90 | chmod | RDI: const char *pathname; RSI: mode_t mode | int (0 or -1) | Changes file permissions. Mode is octal like 0644 (rw-r--r--) or 0755 (rwxr-xr-x). Returns 0 on success. Only owner or root can change permissions. |
| 92 | chown | RDI: const char *pathname; RSI: uid_t owner; RDX: gid_t group | int (0 or -1) | Changes file owner and/or group. Pass -1 to leave unchanged. Only root can change owner. File owner can change group to one they belong to. Returns 0 on success. |
| 80 | chdir | RDI: const char *path | int (0 or -1) | Changes current working directory to path. Affects relative path resolution. Returns 0 on success. Each process has its own working directory. |
| 79 | getcwd | RDI: char *buf; RSI: size_t size | char* (buf or NULL) | Copies current working directory into buf (max size bytes). The C wrapper returns buf on success, NULL on error; the raw syscall returns the length of the path instead of a pointer. Size must be large enough for full path. |
| 21 | access | RDI: const char *pathname; RSI: int mode | int (0 or -1) | Checks whether calling process can access file. Mode: F_OK(0)=exists, R_OK(4)=read, W_OK(2)=write, X_OK(1)=execute. Returns 0 if permitted. |
| 86 | link | RDI: const char *oldpath; RSI: const char *newpath | int (0 or -1) | Creates hard link named newpath to existing file oldpath. Both names refer to same inode. Deleting one doesn't affect the other. Returns 0 on success. |
| 88 | symlink | RDI: const char *target; RSI: const char *linkpath | int (0 or -1) | Creates symbolic link named linkpath containing string target. Symlink can point to non-existent files and cross filesystems. Returns 0 on success. |
| 89 | readlink | RDI: const char *pathname; RSI: char *buf; RDX: size_t bufsiz | ssize_t (bytes copied or -1) | Reads value of symbolic link pathname into buf. Does not null-terminate. Returns number of bytes placed in buf, -1 on error. |
| 10 | mprotect | RDI: void *addr; RSI: size_t len; RDX: int prot | int (0 or -1) | Changes memory protection for page(s) starting at addr. Prot: PROT_NONE(0), PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Used for security and exploit mitigation. Returns 0 on success. |
| 28 | madvise | RDI: void *addr; RSI: size_t length; RDX: int advice | int (0 or -1) | Gives kernel advice about memory usage. Advice: MADV_NORMAL(0), MADV_SEQUENTIAL(2), MADV_DONTNEED(4). Hints for performance optimization. Returns 0 on success. |
| 56 | clone | RDI: unsigned long flags; RSI: void *child_stack; RDX: int *ptid; R10: int *ctid; R8: unsigned long newtls | pid_t (child PID or -1) | Creates new process/thread. More flexible than fork(). Flags control what is shared (memory, files, etc.). Used to implement threads. Returns child PID in parent, 0 in child. |
| 231 | exit_group | RDI: int status | void (never returns) | Terminates all threads in calling process's thread group. Like exit(), but affects all threads. Used by exit() in threaded programs. Never returns. |
| 157 | prctl | RDI: int option; RSI: unsigned long arg2; RDX: unsigned long arg3; R10: unsigned long arg4; R8: unsigned long arg5 | int (varies) | Process control operations. Options include PR_SET_NAME (set process name), PR_SET_DUMPABLE, PR_SET_SECCOMP. Highly versatile. Return value depends on option. |
| 37 | alarm | RDI: unsigned int seconds | unsigned int (previous alarm) | Arranges for SIGALRM to be delivered in seconds. Pass 0 to cancel. Returns number of seconds remaining from previous alarm. Only one alarm can be scheduled. |
| 72 | fcntl | RDI: unsigned int fd; RSI: unsigned int cmd; RDX: unsigned long arg | int (varies) | Performs various operations on file descriptor. Cmd: F_GETFL(3)=get flags, F_SETFL(4)=set flags, F_DUPFD(0)=duplicate. Returns depend on cmd. |
| 76 | truncate | RDI: const char *path; RSI: off_t length | int (0 or -1) | Truncates file to specified length. If the file was longer, extra data is discarded. If it was shorter, it is extended with null bytes. Returns 0 on success. |
| 74 | fsync | RDI: unsigned int fd | int (0 or -1) | Synchronizes file's in-memory state with storage device (flushes all modified data and metadata). Returns 0 when data is safely on disk. Critical for data integrity. |
| 14 | rt_sigprocmask | RDI: int how; RSI: const sigset_t *set; RDX: sigset_t *oldset; R10: size_t sigsetsize | int (0 or -1) | Examines and changes blocked signals. How: SIG_BLOCK(0)=add, SIG_UNBLOCK(1)=remove, SIG_SETMASK(2)=replace. sigsetsize is the size of the signal mask in bytes (8 on x86-64). Returns 0 on success. Used to protect critical sections. |
| 131 | sigaltstack | RDI: const stack_t *ss; RSI: stack_t *old_ss | int (0 or -1) | Sets or gets alternate signal stack. Used when main stack is compromised (stack overflow). Returns 0 on success. Important for robust signal handling. |
| 101 | ptrace | RDI: long request; RSI: pid_t pid; RDX: void *addr; R10: void *data | long (varies) | Process trace and debug. Allows parent to control child execution, read/write memory and registers. Used by debuggers (GDB). Powerful and dangerous. Returns vary by request. |
| 169 | reboot | RDI: int magic; RSI: int magic2; RDX: int cmd; R10: void *arg | int (never on success) | Reboots or halts system. Requires CAP_SYS_BOOT capability (root). Cmd: LINUX_REBOOT_CMD_RESTART, HALT, POWER_OFF. For emergency use. Returns only on error. |
| 23 | select | RDI: int nfds; RSI: fd_set *readfds; RDX: fd_set *writefds; R10: fd_set *exceptfds; R8: struct timeval *timeout | int (ready fds or -1) | Monitors multiple file descriptors for I/O readiness. Returns when fd is ready or timeout. Used for non-blocking I/O multiplexing. Returns number of ready fds. |
| 7 | poll | RDI: struct pollfd *fds; RSI: nfds_t nfds; RDX: int timeout | int (ready fds or -1) | Like select but better API. Monitors file descriptors for events (POLLIN, POLLOUT, POLLERR). Timeout in milliseconds (-1=infinite). Returns number of ready fds. |
| 213 | epoll_create | RDI: int size | int (epoll fd or -1) | Creates epoll instance for scalable I/O event notification. More efficient than select/poll for many file descriptors. Size is ignored (kept for compatibility). Returns epoll fd. |
Assembly Language - Complete Tutorial
Assembly language is the lowest-level programming language that directly corresponds to machine instructions. Each assembly instruction performs one CPU operation. To reverse engineer binaries, you must be able to read and understand assembly.
What is Assembly?
Assembly is a symbolic representation of machine code. Instead of writing binary (1s and 0s), you write mnemonics like MOV, ADD, JMP that are more readable. An assembler converts this into machine code.
Instruction Format
Most assembly instructions follow this format: MNEMONIC destination, source
Important: In Intel syntax (which we use), the destination comes first, then the source. This is opposite to AT&T syntax.
Core Assembly Instructions
📌 MOV - Move/Copy Data
MOV destination, source copies data from source to destination.
Important Notes:
- Cannot move between two memory addresses directly
- Does NOT affect flags
- 32-bit operations zero the upper 32 bits (e.g., mov eax, 1 zeros RAX[63:32])
- Square brackets [] indicate memory address dereferencing
mov rax, 5 ; RAX = 5
mov rbx, rax ; RBX = RAX (which is 5)
mov rcx, [rax] ; RCX = value at address RAX
mov qword [rax], 10 ; Store 10 as 8 bytes at address in RAX (size must be stated)
📌 ADD - Addition
ADD destination, source adds source to destination: destination = destination + source
Flags affected: ZF, CF, SF, OF (PF and AF are updated as well)
mov rax, 10 ; RAX = 10
mov rbx, 20 ; RBX = 20
add rax, rbx ; RAX = 30 (10 + 20)
add rax, 5 ; RAX = 35 (30 + 5)
📌 SUB - Subtraction
SUB destination, source subtracts source from destination: destination = destination - source
Flags affected: ZF, CF, SF, OF (PF and AF are updated as well)
mov rax, 50 ; RAX = 50
mov rbx, 30 ; RBX = 30
sub rax, rbx ; RAX = 20 (50 - 30)
📌 CMP - Compare
CMP destination, source compares two values by subtracting source from destination and setting flags (but doesn't store the result).
What CMP does: CMP rax, rbx performs RAX - RBX, sets flags, discards the result
Why use CMP: To check if two values are equal, which is greater, etc.
- If RAX == RBX: ZF = 1 (zero flag set)
- If RAX != RBX: ZF = 0
- If RAX < RBX (unsigned): CF = 1 (carry/borrow)
- If RAX ≥ RBX (unsigned): CF = 0
Signed comparisons (JL, JG) look at SF and OF instead of CF.
mov rax, 10
cmp rax, 10 ; Compare RAX with 10
je equal ; Jump if Equal (checks ZF)
mov rax, 5
cmp rax, 10 ; 5 < 10
jl less_than ; Jump if Less
📌 TEST - Logical AND
TEST destination, source performs a bitwise AND but only affects flags (doesn't store result).
Why: TEST rax, rax sets ZF=1 if RAX==0, ZF=0 if RAX!=0. It has a shorter encoding than CMP rax, 0, which is why compilers prefer it.
mov rax, 0
test rax, rax ; ZF = 1 (result is zero)
jz zero_case ; Jump if Zero
📌 JMP - Unconditional Jump
JMP label unconditionally jumps to a label (changes program flow).
What it does: Sets RIP to the target address, so the next instruction executed is at the jump target.
mov rax, 1
jmp skip ; Skip next instruction
mov rax, 2 ; This is never executed
skip:
mov rbx, rax ; RBX = 1 (not 2)
📌 Conditional Jumps (JE, JNE, JZ, JG, JL, etc.)
Conditional jumps jump only if certain flags are set. They check the result of a previous CMP or TEST instruction.
| Instruction | Condition | Jumps When |
|---|---|---|
| JE / JZ | Jump if Equal / Jump if Zero | ZF = 1 (result was zero) |
| JNE / JNZ | Jump if Not Equal / Jump if Not Zero | ZF = 0 (result was non-zero) |
| JG | Jump if Greater | Destination > Source (signed) |
| JL | Jump if Less | Destination < Source (signed) |
| JGE | Jump if Greater or Equal | Destination ≥ Source (signed) |
| JLE | Jump if Less or Equal | Destination ≤ Source (signed) |
| JA | Jump if Above | Destination > Source (unsigned) |
| JB / JC | Jump if Below / Jump if Carry | Destination < Source (unsigned) / CF = 1 |
| JAE / JNC | Jump if Above or Equal / Jump if No Carry | Destination ≥ Source (unsigned) / CF = 0 |
| JBE | Jump if Below or Equal | Destination ≤ Source (unsigned): CF = 1 or ZF = 1 |
| JO | Jump if Overflow | OF = 1 (overflow occurred) |
| JNO | Jump if No Overflow | OF = 0 |
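Each row of the table boils down to a test on the flag bits. The sketch below encodes those tests in Python so the signed and unsigned variants can be compared side by side; the dictionary name `JUMP_CONDITIONS` and the function `would_jump` are invented for this example, and the flag values are written out by hand.

```python
JUMP_CONDITIONS = {
    "je":  lambda f: f["ZF"] == 1,
    "jne": lambda f: f["ZF"] == 0,
    "jl":  lambda f: f["SF"] != f["OF"],                    # signed <
    "jge": lambda f: f["SF"] == f["OF"],                    # signed >=
    "jg":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],   # signed >
    "jle": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],    # signed <=
    "jb":  lambda f: f["CF"] == 1,                          # unsigned <
    "jae": lambda f: f["CF"] == 0,                          # unsigned >=
    "ja":  lambda f: f["CF"] == 0 and f["ZF"] == 0,         # unsigned >
    "jbe": lambda f: f["CF"] == 1 or f["ZF"] == 1,          # unsigned <=
    "jo":  lambda f: f["OF"] == 1,
    "jno": lambda f: f["OF"] == 0,
}


def would_jump(mnemonic: str, flags: dict) -> bool:
    """Report whether a conditional jump is taken for the given flag values."""
    return JUMP_CONDITIONS[mnemonic](flags)


# Flags after `cmp rax, 20` with RAX = 10 (10 - 20 borrows and is negative).
after_cmp_10_20 = {"ZF": 0, "CF": 1, "SF": 1, "OF": 0}
# Flags after `cmp rax, 1` with RAX = 0xFFFFFFFFFFFFFFFF (-1 signed, huge unsigned).
after_cmp_minus1_1 = {"ZF": 0, "CF": 0, "SF": 1, "OF": 0}

print(would_jump("jl", after_cmp_10_20), would_jump("jb", after_cmp_10_20))
print(would_jump("jl", after_cmp_minus1_1), would_jump("jb", after_cmp_minus1_1))
```

The second case is the one to watch for when auditing code: the same comparison is "less" for JL and "not below" for JB, so a bounds check that uses the wrong family can be bypassed with a negative value.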
mov rax, 10
cmp rax, 20
jl less_than ; Jump if Less (10 < 20) - WILL jump
mov rbx, 1
jmp done
less_than:
mov rbx, 0
done:
📌 CALL & RET - Function Calls
CALL label pushes the return address onto the stack and jumps to the function. RET pops that address and returns to the caller.
global _start
_start:
call my_function ; Call function
; After function returns, we continue here
mov rax, 60
mov rdi, 0
syscall
my_function:
; Function code here
mov rax, 1 ; Put return value in RAX
ret ; Return to caller
📌 SYSCALL - System Call
SYSCALL transitions to kernel mode and executes a system call.
mov rax, 60 ; exit syscall number
mov rdi, 0 ; exit code = 0
syscall ; Call kernel
Memory Operations - Loading & Storing
To access memory, use square brackets []:
mov rax, [rbx] ; Load 8 bytes from address in RBX
mov qword [rax], 100 ; Store 100 as 8 bytes at address in RAX (size must be stated)
mov rcx, [rax + 8] ; Load from address (RAX + 8)
mov [rbx - 16], rax ; Store to address (RBX - 16)
Stack Operations - PUSH & POP
The stack is a Last-In-First-Out (LIFO) data structure. PUSH and POP manage it:
📌 PUSH - Push Value onto Stack
PUSH source decrements RSP and writes value to stack.
mov rax, 123
push rax ; RSP decreases by 8, [RSP] = 123
📌 POP - Pop Value from Stack
POP destination reads value from stack and increments RSP.
pop rbx ; RBX = [RSP], RSP increases by 8
Complete Assembly Program Example
Let's combine everything into a complete program:
section .text
global _start
_start:
mov rax, 0 ; Counter = 0
count_loop: ; "loop" is an instruction mnemonic, so the label needs another name
; Print number (simplified)
add rax, 1 ; Increment counter
cmp rax, 10 ; Compare with 10
jl count_loop ; Jump if Less - repeat loop
; Exit
mov rax, 60
mov rdi, 0
syscall
Assembly Code Structure
Every assembly program has a specific structure with defined sections for code and data. Understanding this structure is essential for writing and analyzing assembly.
The Three Main Sections
📌 .text Section - Executable Code
The .text section contains all executable code — the assembly instructions that the CPU actually runs.
📌 .data Section - Initialized Data
The .data section contains data with known values — strings, constants, arrays, etc.
section .data
msg: db "Hello World", 0x0a
num: dq 12345
array: dd 1, 2, 3, 4, 5
| Directive | Size | Meaning |
|---|---|---|
| db | 1 byte | Define Byte |
| dw | 2 bytes | Define Word |
| dd | 4 bytes | Define Double-word |
| dq | 8 bytes | Define Quad-word |
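To see what these directives put into the binary, the sketch below reproduces the bytes with Python's `struct` module, assuming little-endian x86-64 layout. The helper name `emit` and the `DIRECTIVES` mapping are made up for illustration.

```python
import struct

# Little-endian unsigned formats matching each directive's element size.
DIRECTIVES = {"db": "<B", "dw": "<H", "dd": "<I", "dq": "<Q"}


def emit(directive: str, *values: int) -> bytes:
    """Return the bytes an assembler would place in memory for the values."""
    fmt = DIRECTIVES[directive]
    return b"".join(struct.pack(fmt, value) for value in values)


msg = b"Hello World" + emit("db", 0x0A)     # msg: db "Hello World", 0x0a
num = emit("dq", 12345)                     # num: dq 12345
array = emit("dd", 1, 2, 3, 4, 5)           # array: dd 1, 2, 3, 4, 5
print(len(msg), num.hex(), len(array))
```

The least significant byte comes first in memory, which is worth remembering when reading hex dumps of a .data section.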
📌 .bss Section - Uninitialized Data
The .bss section reserves space for variables without initial values (like buffers, arrays, etc.).
section .bss
buffer: resb 256 ; Reserve 256 bytes (uninitialized)
array: resq 10 ; Reserve space for 10 quad-words
- resb: Reserve bytes
- resw: Reserve words (2 bytes each)
- resd: Reserve double-words (4 bytes each)
- resq: Reserve quad-words (8 bytes each)
Complete Program Structure
; ============================================
; DATA SECTION - Initialized data
; ============================================
section .data
msg: db "Hello", 0x0a
msg_len: equ $ - msg
; ============================================
; BSS SECTION - Uninitialized data (buffers)
; ============================================
section .bss
buffer: resb 1024
; ============================================
; TEXT SECTION - Code (executable)
; ============================================
section .text
global _start
_start:
; Program entry point
mov rax, 1 ; write syscall
mov rdi, 1 ; fd = stdout
mov rsi, msg ; buffer = msg
mov rdx, msg_len ; count = length
syscall
; Exit cleanly
mov rax, 60
mov rdi, 0
syscall
Global Symbols & Labels
In assembly, you can define:
- Labels: Mark positions in code (used for jumps)
- Global symbols: Names exported from the object file (such as _start) so that the linker and other object files can reference them
global _start ; _start is accessible from outside
section .text
_start: ; Program entry (global symbol)
call my_function
mov rax, 60
syscall
my_function: ; Local label (not global)
mov rax, 1
ret
loop_start: ; Another label
add rcx, 1
cmp rcx, 10
jl loop_start ; Jump back to label
Symbol Definition with EQU
Use EQU to define constants:
section .data
msg: db "Hello World", 0x0a
msg_len: equ $ - msg ; $ = current position, msg_len = length
section .text
global _start
BUFFER_SIZE equ 256
_start:
sub rsp, BUFFER_SIZE ; Allocate space on stack
Key insight: $ - msg calculates the distance between the current position and msg, which gives the string length in bytes.
Assembler, Compiler, Linker & ELF Format
To run an assembly program, you need to convert it from assembly language to machine code. This involves the assembler, linker, and understanding the ELF binary format.
The Compilation Pipeline
Step 1: Assembly → Machine Code (NASM)
NASM (Netwide Assembler) converts assembly source code to machine code object files.
nasm -f elf64 program.asm -o program.o
What happens:
- NASM parses your .asm file
- Converts each instruction to machine code bytes
- Produces program.o (object file)
Step 2: Linking (LD)
The linker (ld) combines object files and resolves all symbol references to create the final executable.
ld program.o -o program
Complete Assembly Workflow Example
# Step 1: Create assembly file
cat > program.asm << 'EOF'
section .text
global _start
_start:
mov rax, 60
mov rdi, 0
syscall
EOF
# Step 2: Assemble (ASM → Object)
nasm -f elf64 program.asm -o program.o
# Step 3: Link (Object → Executable)
ld program.o -o program
# Step 4: Run
./program
echo $? # Exit code: 0 (success)
Understanding ELF64 Format
ELF (Executable and Linkable Format) is the standard binary format for Linux. All executables, libraries, and object files use this format.
$ file program
program: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
Meaning of each part:
| Term | Meaning |
|---|---|
| 64-bit | 64-bit architecture (x86-64) |
| LSB | Little-Endian Byte Order (least significant byte first) |
| executable | Can be directly run as a program |
| x86-64 | x86-64 (AMD64 / Intel 64) instruction set |
| statically linked | All libraries compiled in (no external .so dependencies) |
| not stripped | Symbol table intact (function names visible) |
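Much of what file reports comes straight from the first bytes of the ELF header. The sketch below decodes those fields from a hand-built 20-byte header; the sample bytes and the `describe_elf` helper are assumptions for illustration, and the lookup tables cover only a few common values.

```python
import struct

ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "LSB (little-endian)", 2: "MSB (big-endian)"}
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object / PIE"}
ELF_MACHINE = {3: "Intel 80386", 62: "x86-64"}

def describe_elf(header: bytes) -> dict:
    """Decode class, byte order, type and machine from an ELF header prefix."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic bytes)")
    ei_class, ei_data = header[4], header[5]
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "class": ELF_CLASS.get(ei_class, "unknown"),
        "data": ELF_DATA.get(ei_data, "unknown"),
        "type": ELF_TYPE.get(e_type, "unknown"),
        "machine": ELF_MACHINE.get(e_machine, "unknown"),
    }

# Hand-built header prefix: magic, 64-bit, little-endian, version 1, SYSV ABI,
# 8 padding bytes, then e_type = 2 (executable) and e_machine = 62 (x86-64).
sample_header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 62)
print(describe_elf(sample_header))
```

To try it on a real binary, read the first 20 bytes with `open(path, "rb").read(20)` and pass them to `describe_elf`.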
Viewing ELF Structure with readelf & objdump
📌 readelf - Display ELF Information
readelf displays detailed information about ELF files.
readelf -h program # Show ELF header
readelf -S program # Show sections
readelf -l program # Show program headers
readelf -s program # Show symbol table
readelf -d program # Show dynamic section
📌 objdump - Disassemble Binaries
objdump displays disassembled machine code and detailed binary information.
objdump -d program # Disassemble all code
objdump -M intel -d program # Disassemble in Intel syntax
objdump -s program # Show all sections (hex dump)
objdump -t program # Show symbol table
objdump -h program # Show section headers
Reverse Engineering with GDB
GDB (GNU Debugger) is the industry-standard debugger for Linux. It lets you execute programs step-by-step, inspect memory and registers, and understand exactly what code is doing.
GDB Installation
# Ubuntu/Debian
sudo apt-get install gdb
# Fedora/RHEL
sudo dnf install gdb
# macOS
brew install gdb
Starting GDB
gdb ./program
gdb -q ./program # Quiet mode (no banner)
Essential GDB Commands
📌 run - Execute the Program
run [args] starts the program with optional command-line arguments.
(gdb) run
(gdb) run arg1 arg2 ; Pass arguments
(gdb) run < input.txt ; Redirect stdin
📌 break - Set Breakpoints
break location sets a breakpoint, pausing execution at that point.
(gdb) break main ; Break at function main
(gdb) break *0x08048400 ; Break at address (the * is required; without it GDB looks for a function with that name)
(gdb) info break ; List all breakpoints
(gdb) delete 1 ; Delete breakpoint 1
(gdb) disable 1 ; Disable (don't remove) breakpoint 1
(gdb) enable 1 ; Re-enable breakpoint 1
📌 continue - Resume Execution
continue (or c) resumes execution until the next breakpoint.
(gdb) continue
(gdb) c ; Shorthand
📌 info functions - List Functions
info functions shows all functions in the binary.
(gdb) info functions
All defined functions:
File program.c:
void check_password(char*);
int main();
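void hidden_function();
Why useful: Quickly lists every function GDB has a symbol for. In a stripped binary only the remaining symbols (typically imported library functions) are shown, so expect a much shorter list.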
📌 info registers - Show Register Values
info registers (or i r) displays current register values.
(gdb) info registers
rax 0x1 1
rbx 0x0 0
rcx 0x7ffffffde7f8 140737488281592
rdx 0x7ffffffde8f8 140737488282360
rsi 0x7ffffffde8e8 140737488282344
rdi 0x1 1
rbp 0x7ffffffde820 0x7ffffffde820
rsp 0x7ffffffde800 0x7ffffffde800
rip 0x401000 0x401000 <_start>
(gdb) print $rax ; Print RAX in decimal
(gdb) print/x $rax ; Print RAX in hex
(gdb) print/d $rax ; Print RAX in decimal
(gdb) x/s $rsi ; Print the string that RSI points to
📌 disassemble - Show Assembly Code
disassemble function shows assembly code for a function.
(gdb) disassemble main ; Disassemble main function
(gdb) disassemble ; Disassemble current function
(gdb) disassemble 0x401000,0x401050 ; Range of addresses (start,end)
📌 x - Examine Memory
x [/format] address displays memory at an address.
(gdb) x/10x $rsp ; View 10 hex values at RSP
(gdb) x/10w $rbp ; View 10 words (4 bytes) at RBP
(gdb) x/20b $rax ; View 20 bytes at RAX
(gdb) x/s $rsi ; View string at RSI
(gdb) x/i $rip ; View instruction at RIP
| Format | Display As |
|---|---|
| x | Hex |
| d | Decimal (signed) |
| u | Unsigned decimal |
| s | String |
| i | Instruction |
| c | Character |
| o | Octal |
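The format letters change only how the same bytes are displayed, not what is stored. As a rough illustration, this sketch renders a single byte under several of the formats from the table; `gdb_style_views` is a hypothetical helper, and GDB's own output formatting differs in detail (for example in how it prints octal and character values).

```python
import struct

def gdb_style_views(byte_value: int) -> dict:
    """Interpret one byte the way the x, d, u, o and c formats would."""
    raw = bytes([byte_value])
    return {
        "x": hex(byte_value),              # hexadecimal
        "d": struct.unpack("<b", raw)[0],  # signed decimal
        "u": byte_value,                   # unsigned decimal
        "o": oct(byte_value),              # octal
        "c": repr(chr(byte_value)),        # character
    }

for value in (0x41, 0xF6):
    print(hex(value), gdb_style_views(value))
```

The byte 0xF6 shows why the signed and unsigned formats matter: the same bits read as 246 unsigned and as -10 signed.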
📌 si/ni - Step Instructions
si (step into) executes one instruction, stepping into function calls.
ni (next instruction) executes one instruction, stepping over function calls.
(gdb) si ; Step into next instruction
(gdb) si 5 ; Step 5 times
(gdb) ni ; Next instruction (over calls)
(gdb) step ; Source-level step (into calls)
(gdb) next ; Source-level step to the next line (over calls)
📌 set $register = value - Modify Registers
set $register = value modifies register values during debugging.
(gdb) set $rax = 100 ; Set RAX to 100
(gdb) set $rdi = 0 ; Set RDI to 0
(gdb) info registers ; Verify changes
Why useful: Bypass password checks, change comparison results, test alternative code paths.
Complete GDB Debugging Walkthrough
Let's debug a real binary step-by-step:
$ gdb ./crackme
(gdb) set disassembly-flavor intel ; Use Intel syntax
(gdb) info functions ; List all functions
(gdb) break main ; Set breakpoint at main
Breakpoint 1 at 0x0010149a
(gdb) run secret123 ; Run with password argument
(gdb) disassemble main ; View main function code
(gdb) si ; Step one instruction
(gdb) info registers ; Check all registers
(gdb) x/s *((char **)$rsi + 1) ; View argv[1] (at entry to main, RDI = argc and RSI = argv)
(gdb) continue ; Run to next breakpoint
(gdb) quit ; Exit GDB
pwndbg - Enhanced GDB
pwndbg is an awesome GDB plugin that adds powerful reverse engineering features.
git clone https://github.com/pwndbg/pwndbg
cd pwndbg
./setup.sh
pwndbg enhancements:
- Better disassembly display (syntax highlighting)
- Visual stack and register display
- Memory map view
- Additional commands: nearpc, telescope, vmmap
Radare2 - Advanced Binary Analysis
Radare2 is a powerful, open-source framework for reverse engineering and analyzing binaries. It combines static analysis, dynamic analysis, and visualization in one tool.
Radare2 Installation
# Linux
git clone https://github.com/radareorg/radare2
cd radare2
sys/install.sh
# Or via package manager
sudo apt-get install radare2
Launching Radare2
r2 ./binary # Open binary for analysis
r2 -w ./binary # Write mode (can modify binary)
Essential Radare2 Commands
📌 aaa - Analyze All
aaa performs full analysis on the binary — finds functions, data, and creates control flow graphs.
[0x08048400]> aaa ; Full analysis
[0x08048400]> afl ; List all functions (use after aaa)
📌 afl - List Functions
afl lists all discovered functions with addresses.
[0x08048400]> afl
0x08048400 1 42 entry0
0x08048432 1 37 sym.main
0x08048460 1 52 sym.check_password
0x08048495 1 25 sym.print_success
📌 pdf - Print Disassembly of Function
pdf @address prints disassembled function at address.
[0x08048400]> pdf @ sym.main ; Disassemble main
[0x08048400]> pdf ; Disassemble current function
📌 db - Breakpoints in Debug Mode
db sets a breakpoint. It needs a debug session, which you start by launching radare2 with r2 -d ./binary.
[0x08048400]> db main ; Set breakpoint at main
[0x08048400]> dc ; Continue execution
[0x08048400]> dr ; Show registers
[0x08048400]> ds ; Step instruction
📌 dc - Continue Execution
dc continues binary execution until breakpoint.
📌 V - Visual Mode
V opens visual/interactive mode with graphical display.
[0x08048400]> V ; Enter visual mode
; Inside visual mode:
p ; Change view mode
j/k ; Move down/up
q ; Quit visual mode
📌 VV - Graph Mode
VV shows control flow graph in visual mode.
[0x08048400]> VV ; Enter graph mode
j/k ; Navigate blocks
Enter ; Follow jump
Esc ; Go back
q ; Exit
📌 iz - Show Strings
iz lists all strings found in the binary.
[0x08048400]> iz
Strings
0x08049f00 11 Wrong password
0x08049f0c 10 Access granted
0x08049f17 15 Enter password:
Complete Radare2 Workflow
$ r2 ./crackme
[0x08048400]> aaa ; Analyze everything
[0x08048400]> afl ; List functions
[0x08048400]> iz ; Show strings
[0x08048400]> pdf @ sym.main ; View main function
[0x08048400]> V ; Visual mode to explore
Static Analysis - Professional Tools
Static analysis means examining a binary without running it. You analyze code structure, disassembly, and data flow to understand what a program does. Professional tools like Ghidra, IDA Pro, and Binary Ninja dominate this space.
Professional Static Analysis Tools
📌 Ghidra - NSA Open-Source Reverse Engineering
Ghidra is the free, open-source reverse engineering tool from the NSA. It's powerful enough to compete with commercial tools.
# Launch Ghidra
ghidraRun
# Then:
1. File → New Project
2. Import → Select binary
3. Double-click binary to open
4. Let it analyze (auto analyze runs)
5. View → Functions to see all functions
6. Double-click function to decompile
Decompilation window shows:
- Left: Function list
- Center: Decompiled C-like code
- Right: Assembly code
- Bottom: Comments and cross-references
📌 IDA Pro - Industry Standard
IDA Pro is the gold standard in reverse engineering. Used by security researchers worldwide.
📌 Binary Ninja - Modern Alternative
Binary Ninja is a modern reverse engineering platform with excellent Python API and collaborative features.
Command-Line Static Analysis Tools
Essential tools for quick binary inspection and analysis:
📌 file - Identify File Type
file determines file type by examining magic bytes and file structure.
file ./program # Identify binary type
file -i ./program # Show MIME type
file -b ./program # Brief mode (no filename)
file * | grep ELF # Find all ELF files in directory
Example output:
program: ELF 64-bit LSB executable, x86-64, dynamically linked, stripped
When to use: First step in binary analysis to understand architecture, linking type, and whether symbols are present.
📌 strings - Extract Printable Strings
strings extracts human-readable text from binary files, useful for finding hardcoded credentials, URLs, error messages, and function names.
strings ./program # Extract ASCII strings (default min length: 4)
strings -n 10 ./program # Minimum string length 10
strings -a -t x ./program # Show all strings with hex offset
strings -e l ./program # 16-bit little-endian strings (UTF-16LE)
strings ./program | grep -i password # Search for specific strings
strings ./program | grep "^/" # Find file paths
When to use: First step in binary analysis to quickly identify interesting text, function names, library paths, or hardcoded secrets.
Pro tip: Combine with grep to search for URLs, IP addresses, API keys, or specific keywords.
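Under the hood, the core idea of strings is simple: scan for runs of printable bytes of at least a minimum length. Here is a minimal sketch of that idea, assuming printable means ASCII 0x20 to 0x7e (the real tool handles more cases, such as tabs and other encodings). The sample bytes are made up.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for each run of printable ASCII of min_len or more."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

# Made-up binary blob: an ELF-like magic, two messages, and some noise.
sample = b"\x7fELF\x02\x01\x00\x00Enter password:\x00\x01\x02ab\x00Access granted\x00"

for offset, text in extract_strings(sample):
    print(f"{offset:#06x}  {text}")
```

Note that "ELF" and "ab" are skipped because they are shorter than four characters, which mirrors the default behavior described above.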
📌 hexedit / hexdump / xxd - Hex Editors & Viewers
Hex editors allow viewing and modifying binary files at the byte level.
# hexdump - View hex representation
hexdump -C ./program | head # Canonical hex+ASCII view
hexdump -C ./program | grep "ELF" # Find ELF magic bytes
# xxd - Hex dump tool
xxd ./program | head # Hex dump with ASCII
xxd -l 100 ./program # First 100 bytes only
# hexedit - Terminal hex editor (interactive)
hexedit ./program # Edit binary files
When to use: Examine file headers, find magic bytes, analyze packed/obfuscated binaries, or patch binaries directly.
📌 nm - List Symbols
nm lists symbols from object files and libraries. Fails gracefully on stripped binaries.
nm ./program # List all symbols
nm -D ./program # List dynamic symbols only
nm -g ./program # List external symbols
nm -C ./program # Demangle C++ symbols
nm -A *.o # List symbols from all object files
Symbol types:
- T: Text section (code)
- D: Initialized data
- B: Uninitialized data (BSS)
- U: Undefined (external reference)
When to use: Check if binary is stripped, identify imported/exported functions, or find specific symbols.
📌 ldd - Print Shared Library Dependencies
ldd prints shared libraries required by a dynamically linked binary.
ldd ./program # Show library dependencies
ldd -v ./program # Verbose (version information)
ldd -r ./program # Report missing symbols
Example output:
linux-vdso.so.1 => (0x00007fff...)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
/lib64/ld-linux-x86-64.so.2 (0x00007f...)
When to use: Understand binary dependencies, troubleshoot missing libraries, or identify which libc version is required.
📌 readelf - Display ELF File Information
readelf displays detailed information about ELF files (covered in assembler section, but worth repeating here).
readelf -h ./program # Show ELF header
readelf -S ./program # Show section headers
readelf -l ./program # Show program headers (segments)
readelf -s ./program # Show symbol table
readelf -d ./program # Show dynamic section
readelf -r ./program # Show relocations
readelf -n ./program # Show notes (build ID, etc.)
When to use: Deep dive into ELF structure, find entry points, analyze security features (NX, PIE, RELRO), or debug linking issues.
📌 objdump - Object File Dumper & Disassembler
objdump is GNU's swiss-army knife for binary analysis and disassembly.
objdump -d ./program # Disassemble executable sections
objdump -M intel -d ./program # Disassemble in Intel syntax
objdump -D ./program # Disassemble ALL sections
objdump -s ./program # Full hex dump of all sections
objdump -t ./program # Symbol table
objdump -T ./program # Dynamic symbol table
objdump -h ./program # Section headers
objdump -p ./program # Program headers
objdump -R ./program # Dynamic relocations
When to use: Quick disassembly, examine specific sections, or verify compiler output.
📌 radare2 / r2 - Reverse Engineering Framework
radare2 is covered in detail in its own section, but deserves mention here as a powerful command-line static analysis tool.
r2 -A ./program # Auto-analyze on load
r2 -c "aaa; pdf @ main" ./program # Analyze and disassemble main
See the Radare2 section for comprehensive commands and usage.
📌 checksec - Check Binary Security Properties
checksec checks security features enabled in a binary (RELRO, Stack Canary, NX, PIE, RPATH, RUNPATH).
# Install checksec
sudo apt-get install checksec # Debian/Ubuntu
wget https://github.com/slimm609/checksec.sh/raw/master/checksec && chmod +x checksec
# Check single binary
checksec --file=./program
# Check all binaries in directory
checksec --dir=/bin
Security features explained:
- RELRO (Relocation Read-Only): Makes GOT read-only after relocation
- Stack Canary: Detects stack buffer overflows before a function returns
- NX (No Execute): Marks stack/heap non-executable
- PIE (Position Independent Executable): Lets ASLR randomize the load address of the executable itself
When to use: Assess exploit difficulty, verify compiler flags, or check if binary was compiled with security hardening.
📌 binwalk - Firmware Analysis Tool
binwalk analyzes, extracts, and reverse engineers firmware images and embedded files.
binwalk firmware.bin # Scan for embedded files/filesystems
binwalk -e firmware.bin # Extract embedded files
binwalk -E firmware.bin # Entropy analysis (detect encryption/compression)
binwalk -A firmware.bin # Scan for executable code
When to use: Analyze firmware images, extract embedded file systems (squashfs, cramfs), or identify packed/encrypted sections.
📌 exiftool - Extract Metadata
exiftool reads and writes metadata in files. Useful for forensics and identifying compilation details.
exiftool ./program # Extract all metadata
exiftool -time:all ./program # Show timestamps
exiftool -binary ./program # Output tag values in binary format
When to use: Find compilation timestamps, compiler versions, or embedded metadata that may reveal development environment.
📌 ltrace - Library Call Tracer (Static Context)
ltrace is primarily for dynamic analysis (covered in Dynamic Analysis section), but can reveal which library functions a binary uses.
See Dynamic Analysis section for comprehensive ltrace usage.
📌 strace - System Call Tracer (Static Context)
strace is primarily for dynamic analysis (covered in Dynamic Analysis section), but understanding syscall usage is part of static analysis.
See Dynamic Analysis section for comprehensive strace usage.
Advanced Static Analysis Tools
📌 Cutter - GUI for Radare2
Cutter provides a modern Qt-based GUI for radare2 with decompilation support.
# Download from https://cutter.re
sudo apt-get install cutter # Ubuntu 20.04+
Features: Graphical control flow, decompiler (Ghidra plugin), hex editor, debugger integration.
When to use: Modern alternative to IDA/Ghidra for free, visual binary analysis with radare2 backend.
📌 Hopper - macOS/Linux Disassembler
Hopper is a commercial reverse engineering tool for macOS and Linux.
Price: ~$100 (personal), cheaper than IDA Pro.
Features: Disassembler, pseudo-code decompiler, Python scripting, x86/ARM/MIPS support.
When to use: Professional alternative to IDA at lower cost, especially on macOS.
String Analysis & Pattern Matching
Beyond basic string extraction, pattern analysis helps identify functionality:
strings ./binary | grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" # Find emails
strings ./binary | grep -E "https?://[^[:space:]]+" # Find URLs
strings ./binary | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" # Find IP addresses
strings ./binary | grep -i "key\|password\|secret\|token\|api" # Find credentials
Why strings are useful:
- Often reveal hardcoded passwords or API keys
- Show error messages that hint at program logic
- Identify libraries and functions
- Quickly find interesting areas to analyze
- Discover hidden features or debug messages
- Identify encryption algorithms by string constants
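When the output of strings is large, it can help to sort the hits into categories in a single pass. The sketch below does that with Python's re module; the patterns are deliberately loose, and both the `classify` helper and the sample lines are made up for illustration.

```python
import re

# Loose patterns: they favor finding candidates over avoiding false positives.
PATTERNS = {
    "url": re.compile(r"https?://[^\s\"']+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "secret": re.compile(r"key|password|secret|token", re.IGNORECASE),
}

def classify(lines):
    """Group lines by which pattern they match (a line may match several)."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line)
    return hits

# Stand-in for the output of `strings ./binary`.
sample = [
    "Connecting to http://203.0.113.7/update",
    "api_token=%s",
    "libc.so.6",
]

for category, lines in classify(sample).items():
    print(category, lines)
```

Every hit is only a lead. Confirm it by finding where the string is referenced in the disassembly.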
Control Flow Analysis
Understanding how code branches and jumps helps identify:
- Conditional logic: If/else patterns in assembly
- Function calls: External dependencies
- Dead code: Unreachable branches
All professional tools (Ghidra, IDA, Binary Ninja) show control flow graphs that visualize this.
Dynamic Analysis - Runtime Behavior
Dynamic analysis means running the binary in a controlled environment while monitoring its behavior. Watch system calls, library calls, memory modifications, and network traffic to understand what code actually does.
System Call Tracing with strace
strace intercepts and logs all system calls made by a process.
strace ./program # Trace all syscalls
strace -e trace=open,openat,read ./program # Trace specific syscalls
strace -o trace.txt ./program # Save to file
strace -c ./program # Summary (count syscalls)
strace -p 1234 # Attach to running process
What strace reveals:
- Files being read/written
- Network connections (socket, connect syscalls)
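- The environment passed to new processes (visible in execve with -v); lookups through getenv are library calls and show up in ltrace instead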
- Memory mappings
- Signal handling
Library Call Tracing with ltrace
ltrace traces library function calls (libc, libcrypto, etc.).
ltrace ./program # Trace library calls
ltrace -c ./program # Summary (count function calls)
ltrace -o trace.txt ./program # Save to file
ltrace -e strcmp ./program # Trace specific functions
Useful library functions to trace:
- strcmp: String comparison (password checks)
- malloc/free: Memory allocation
- printf: Output (what's being printed)
- getenv: Environment variable access
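Saved ltrace output can be searched automatically for the comparisons that matter. This sketch pulls the two arguments of every strcmp call out of a trace; the trace text is invented for the example, and the regular expression assumes simple arguments without escaped quotes.

```python
import re

# Matches lines such as: strcmp("left", "right") = -6
STRCMP_CALL = re.compile(
    r'strcmp\("(?P<left>[^"]*)", "(?P<right>[^"]*)"\)\s*=\s*(?P<result>-?\d+)'
)

def compared_strings(trace: str):
    """Return (left, right, result) for each strcmp call found in the trace."""
    return [
        (m["left"], m["right"], int(m["result"]))
        for m in STRCMP_CALL.finditer(trace)
    ]

# Invented trace text in the shape that ltrace prints.
sample_trace = '''strcmp("myinput", "secretpass") = -6
puts("Incorrect password") = 19
'''

for left, right, result in compared_strings(sample_trace):
    print(f"input {left!r} was compared against {right!r} (result {result})")
```

A nonzero result means the strings differed, so the second argument is a strong candidate for the expected password.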
Combined strace + ltrace
Use together for complete picture:
ltrace -S ./program # Library calls and system calls in one trace (slower)
strace -e trace=file ./program # Focus on file operations
Advanced Dynamic Analysis - Frida
Frida is a powerful instrumentation framework. Inject code into running processes to hook functions and modify behavior in real-time.
# Install
pip install frida frida-tools
# List processes
frida-ps
# Attach to process
frida -p 1234
# Spawn a program under Frida (-f spawns; -n attaches to a running process by name)
frida -f ./program
Frida capabilities:
- Hook any function (intercept and modify behavior)
- Read/write process memory
- Dump arguments and return values
- Modify program flow in real-time
- Works on binaries you don't have source for
Analyzing Stripped Binaries
A stripped binary has all debug symbols removed — function names, variable names, and type information are gone. This makes reverse engineering harder but not impossible.
Identifying Stripped Binaries
file ./program
Output examples:
not stripped - has symbols
stripped - symbols removed
file -i ./program # MIME type info
readelf -S ./program # Show sections (.symtab is missing if stripped)
nm ./program # Reports "no symbols" if stripped
objdump -t ./program # Symbol table
Techniques for Stripped Binaries
📌 Function Identification via Signatures
Even without names, you can identify common library functions by their machine code patterns.
How it works: Compiler generates same code patterns for common functions (strlen, malloc, etc.). Tools match these patterns and identify functions automatically.
In Ghidra:
1. Window → Function ID
2. Load database → Select standard library
3. Search → Auto-identify known functions
Many libc functions automatically named
📌 Heuristic Analysis - Entry Points
Without symbols, look for patterns that reveal function boundaries:
- Function prologue: push rbp; mov rbp, rsp (function start)
- Function epilogue: pop rbp; ret (function end)
- Call patterns: call followed by function prologue = new function
- Loops: Backwards jumps to earlier code
- Data references: Addresses that reference strings or constants
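The prologue heuristic from the list above can be sketched as a plain byte search. The example assumes the common encoding 55 48 89 e5 for push rbp; mov rbp, rsp, and the code bytes and base address are made up. Optimized builds often omit the frame pointer, and the same bytes can appear inside data, so treat each hit as a candidate rather than a confirmed function.

```python
# push rbp (55) followed by mov rbp, rsp (48 89 e5)
PROLOGUE = bytes.fromhex("55 48 89 e5")

def find_prologues(code: bytes, base: int = 0) -> list[int]:
    """Return the address of every occurrence of the prologue pattern."""
    hits = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return hits

# Made-up .text contents: two tiny functions separated by NOP padding.
sample_code = bytes.fromhex(
    "55 48 89 e5 5d c3 90 90"   # push rbp; mov rbp, rsp; pop rbp; ret; nop; nop
    "55 48 89 e5 31 c0 5d c3"   # push rbp; mov rbp, rsp; xor eax, eax; pop rbp; ret
)

for address in find_prologues(sample_code, base=0x401000):
    print(f"possible function start at {address:#x}")
```

Disassemblers combine this kind of pattern with call targets and other evidence, which is why their function lists are usually more reliable than a raw byte search.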
📌 Cross-referencing & String Analysis
Strings often identify function purposes:
# Step 1: Extract strings
strings ./program | grep -i error
# Step 2: Find where strings are referenced
In Ghidra: Search → For Strings...
Double-click string → Shows code that uses it
# Step 3: Identify surrounding function
Look at prologue/epilogue to find function bounds
Analyze logic based on string context
📌 Machine Learning-Based Symbol Recovery
Modern research uses LLMs to recover function names from stripped binaries.
How it works: Train ML models on decompiled code patterns. Given stripped binary, model predicts likely function names and variable types.
Dynamic Analysis of Stripped Binaries
Use runtime tracing to understand behavior without symbols:
# Trace syscalls to understand behavior
strace -o syscalls.txt ./program
# Trace library calls
ltrace -o libcalls.txt ./program
# Use GDB to set breakpoints and inspect registers
gdb ./program
(gdb) break *0x401000
(gdb) run
(gdb) info registers ; See actual values
Practical Example - Analyzing Stripped Binary
# 1. Identify if stripped
$ file ./crackme
crackme: ELF 64-bit, stripped
# 2. Extract strings - look for clues
$ strings ./crackme | grep -i password
Incorrect password
Access granted
# 3. Open in Ghidra
- Window → Function ID → Load standard library
- Many stdlib functions now identified
- Search → For Strings → Find "password" references
- Double-click string to see code using it
# 4. Analyze the function using string as anchor
- Look at function prologue/epilogue
- Identify comparisons and jumps
- Look for password check logic
# 5. Use dynamic analysis if stuck
$ ltrace ./crackme
strcmp("myinput", "secretpass") = -37
puts("Incorrect password") = 19
Now you know the password!
Binary Patching - Code Modification
Binary patching means modifying a binary's machine code to change its behavior. Used to bypass password checks, remove license verification, or modify logic flow.
Why Patch Binaries?
- Bypass authentication/license checks
- Change program behavior for analysis
- Create custom versions without source
- Remove anti-debugging code
- Test vulnerability fixes
Three Patching Approaches
📌 Method 1: Hex Editor - Direct Modification
Most direct method: Use hex editor to change machine code bytes.
# Step 1: Find the instruction to patch in IDA/Ghidra
cmp eax, 0x12345
jne fail ; This is at offset 0x1234
# Step 2: Convert jne to NOP (0x90)
jne opcode = 75 05 (short conditional jump; EB would be an unconditional jmp)
NOP opcode = 90
# Step 3: Open hex editor, go to offset 0x1234
Replace: 75 05 → 90 90 (2 NOPs to fill space)
# Step 4: Save and test
./patched_binary
Key instruction to know:
NOP (0x90): No operation - does nothing, safe filler
Replace conditional jumps with NOPs to bypass checks
📌 Method 2: IDA/Ghidra Built-in Patching
Both IDA and Ghidra have native patching capabilities.
# In IDA hex view:
1. Right-click on byte
2. Select "Edit"
3. Type new hex values
4. Right-click → "Apply changes"
# Save patched binary:
File → Produce file → Create DIF file (diff/patch file)
Ghidra Patching
# In Ghidra disassembly view:
1. Window → Hex
2. Right-click byte → Edit (pencil icon)
3. Type replacement values
4. File → Export Program → Binary
# Now you have a modified binary
📌 Method 3: Assembly Modification + Reassemble
For more complex changes, write assembly, assemble it, patch in.
Advanced Patching - Replace Function
# Step 1: Identify function to replace (offset 0x401000, 50 bytes)
# Step 2: Write replacement assembly (replacement.asm)
bits 64 ; nasm -f bin defaults to 16-bit mode, so request 64-bit code
mov rax, 1 ; Return 1 (success)
ret
# Step 3: Assemble it
nasm -f bin replacement.asm -o replacement.bin
hexdump -C replacement.bin
Output: 48 c7 c0 01 00 00 00 c3 (8 bytes)
# Step 4: Pad with NOPs to match original size (50 bytes)
Need 50 bytes total, have 8, so add 42 NOPs (0x90)
# Step 5: Patch hex in original binary at the file offset that corresponds to address 0x401000
hex editor: Go to that file offset (not the virtual address itself), replace with new bytes
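The padding in Step 4 is easy to get wrong by hand, so here is a small sketch that does it programmatically. The stub bytes are taken as given from Step 3 above, and `pad_with_nops` is a hypothetical helper name.

```python
NOP = 0x90

def pad_with_nops(replacement: bytes, original_size: int) -> bytes:
    """Extend replacement code with NOPs so it fills the original function exactly."""
    if len(replacement) > original_size:
        raise ValueError("replacement is larger than the function it replaces")
    return replacement + bytes([NOP]) * (original_size - len(replacement))

# Bytes quoted in Step 3: mov rax, 1 followed by ret.
stub = bytes.fromhex("48 c7 c0 01 00 00 00 c3")
patched = pad_with_nops(stub, 50)

print(len(patched), "bytes:", patched.hex(" "))
```

Because the stub ends in ret, the padding bytes never execute. Filling the rest with NOPs simply keeps the disassembly of the patched region tidy.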
Real-World Patching Example
Complete patching workflow - Bypass password
# Binary: crackme - asks for password
$ ./crackme
Enter password: test
Incorrect!
# Step 1: Open in Ghidra, find password check
0x401234: mov rax, [rip + 0x2dc6] ; Load input
0x40123b: mov rbx, [rip + 0x2dc5] ; Load expected password
0x401242: cmp rax, rbx ; Compare
0x401245: jne 0x401260 ; Jump to fail if not equal
0x401247: call print_success ; Otherwise print success
# Step 2: We want to skip the jne (jump to fail)
# Option A: Replace jne with NOPs
jne opcode at 0x401245: 75 19 (2 bytes)
Replace with: 90 90 (2 NOPs)
# Step 3: Use hex editor to patch
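Go to the file offset that corresponds to address 0x401245 (virtual address minus the segment's load address, plus the segment's file offset)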
Find bytes: 75 19
Replace with: 90 90
Save file
# Step 4: Test
$ ./crackme_patched
Enter password: anything
Success!
# Password check bypassed! Any input works now
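The same patch can be scripted, which makes it repeatable and lets you verify the original bytes before overwriting them. In this sketch the segment layout (load address 0x401000 at file offset 0x1000) is an assumption, so read the real values from readelf -l for your binary. The image is a zero-filled stand-in for the file contents.

```python
def va_to_file_offset(va: int, seg_vaddr: int, seg_offset: int) -> int:
    """Translate a virtual address into an offset within the file."""
    return va - seg_vaddr + seg_offset

def patch_bytes(image: bytearray, offset: int, expected: bytes, new: bytes) -> None:
    """Overwrite bytes at offset, but only if the current bytes match expected."""
    if len(expected) != len(new):
        raise ValueError("patch must not change the size of the code")
    if image[offset:offset + len(expected)] != expected:
        raise ValueError("unexpected bytes at offset; wrong file or wrong offset")
    image[offset:offset + len(new)] = new

# Assumed layout: the code segment is loaded at 0x401000 from file offset 0x1000.
offset = va_to_file_offset(0x401245, seg_vaddr=0x401000, seg_offset=0x1000)

# Stand-in for the file contents, with the jne placed where the example expects it.
image = bytearray(0x2000)
image[offset:offset + 2] = bytes.fromhex("75 19")

# Sanity check on the encoding: a short jump lands at next instruction + displacement.
jump_target = 0x401245 + 2 + 0x19

patch_bytes(image, offset, expected=bytes.fromhex("75 19"), new=bytes.fromhex("90 90"))
print(f"patched file offset {offset:#x}; the jne used to target {jump_target:#x}")
```

For a real file, read it into a bytearray, apply `patch_bytes`, and write the result to a new filename so the original stays intact.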
Common Patching Targets
| What to Patch | Pattern | Replacement |
|---|---|---|
| Password check | cmp; jne failure | Replace jne with NOPs |
| License validation | call validate_license; jne fail | NOP out the jne |
| Anti-debug | call is_debugged; jne exit | Make function return 0 |
| Trial expiration | cmp rax, expiration_date | Change expiration_date value |
| Error message | lea rdi, [rip + error_str] | Change string pointer/content |
✓ Binary Patching Mastered!
You can modify binaries to change behavior, bypass checks, and test modifications.
Anti-Reversing Techniques & Bypasses
Software developers implement anti-reversing techniques to protect intellectual property and prevent cracking. Understanding these techniques helps you bypass them and analyze protected binaries.
Common Anti-Reversing Techniques
📌 Anti-Debugging - Detect Debuggers
Anti-debug code detects if a debugger is attached and terminates or behaves differently.
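#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>

int main(void) {
    // PTRACE_TRACEME fails if a debugger is already tracing this process
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) {
        printf("Debugger detected! Exiting.\n");
        exit(1);
    }
    // Program continues if not debugged
    return 0;
}
📌 Code Obfuscation - Hide Logic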
Code obfuscation makes code hard to understand without changing functionality.
ORIGINAL:
if (x > 10)
print("big")
else
print("small")
OBFUSCATED:
a = random()
if (a == 1)
if (x > 10) print("big")
else if (a == 2)
if (x <= 10) print("small")
else if (a == 3) ...
Same logic, much harder to follow!
📌 Packing & Compression - Hide Code
Packers compress/encrypt the entire binary. Only decompressed in memory at runtime.
Popular Packers
UPX: Open-source, compresses binaries
Themida: Commercial, strong obfuscation + packing
Code Virtualizer: Turns native code into VM bytecode
UPX Example
# Pack a binary
upx -9 ./program -o program.packed
# Detect if packed
file ./program.packed
Output: packed with UPX
# Unpack (if UPX)
upx -d ./program.packed
# If custom packer, must unpack manually:
1. Run in GDB
2. Find OEP (Original Entry Point)
3. Dump memory region
4. Analyze dumped binary
📌 Anti-Tampering - Detect Modifications
Anti-tampering detects if binary or memory has been modified.
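A common form of anti-tampering is an integrity check: hash the code and compare the result with a value recorded at build time. The sketch below illustrates the idea with SHA-256 over a few instruction bytes; real protections vary widely, typically hide the expected value, and check from several places. All names here are illustrative.

```python
import hashlib

def digest(code: bytes) -> str:
    """Hash a region of code."""
    return hashlib.sha256(code).hexdigest()

def is_tampered(code: bytes, expected_digest: str) -> bool:
    """Report whether the code no longer matches the recorded digest."""
    return digest(code) != expected_digest

# cmp rax, rbx; jne +0x19 (the kind of check a cracker would patch)
original = bytes.fromhex("48 39 d8 75 19")
baseline = digest(original)  # recorded when the program was built

# The same region after the jne has been replaced with two NOPs.
patched = original[:3] + bytes.fromhex("90 90")

print("original tampered?", is_tampered(original, baseline))
print("patched tampered? ", is_tampered(patched, baseline))
```

This is why a simple NOP patch sometimes makes a protected program exit: the check itself also has to be found and neutralized.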
ASLR - Address Space Layout Randomization
ASLR randomizes memory addresses each run. Makes exploitation and analysis harder.
# Check ASLR status
cat /proc/sys/kernel/randomize_va_space
0 = disabled, 1 = conservative, 2 = full
# Disable ASLR (requires root)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
# Or run single binary without ASLR
setarch $(uname -m) -R ./program
# In GDB
(gdb) set disable-randomization on
Stack Canaries
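Stack canaries detect stack buffer overflows by placing a random value between local buffers and the saved return address, then checking it before the function returns.
checksec --file=./program
Output shows: Canary found = yes/no
readelf -s ./program | grep __stack_chk_fail
A reference to __stack_chk_fail indicates the binary was built with stack protection
DEP/NX - Data Execution Prevention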
DEP/NX marks data pages as non-executable. Prevents shellcode execution.
checksec --file=./program
Output shows: NX enabled/disabled
readelf -lW ./program | grep GNU_STACK
RWE = no NX protection, RW = NX enabled
angr - Automated Symbolic Execution
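angr is a powerful binary analysis framework that uses symbolic execution to find inputs that reach specific code paths. Instead of tracing every branch by hand, you let angr explore the feasible paths and solve the constraints along them. On large programs the number of paths grows quickly, so exploration usually has to be guided toward a target.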
What is Symbolic Execution?
Instead of concrete values, variables are treated as symbolic — representing all possible values. Branches create constraints.
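To see what "branches create constraints" means, consider this toy sketch. It writes down by hand the constraints for one path through a small made-up function and then searches for an input that satisfies all of them. A real engine such as angr derives the constraints automatically and hands them to an SMT solver instead of enumerating values.

```python
def check(x: int) -> str:
    """Concrete program under test (made up for this example)."""
    if x * 3 + 1 == 46:      # branch A
        if x % 5 == 0:       # branch B
            return "success"
    return "fail"

# Constraints for the path "A taken, then B taken", written by hand here.
path_constraints = [
    lambda x: x * 3 + 1 == 46,
    lambda x: x % 5 == 0,
]

def solve(constraints, domain=range(256)):
    """Stand-in for a solver: exhaustively search a small input domain."""
    return [x for x in domain if all(c(x) for c in constraints)]

solutions = solve(path_constraints)
print("inputs reaching success:", solutions)
print("check on first solution:", check(solutions[0]))
```

Each additional branch on a path adds one more constraint, and a path is feasible only if some input satisfies all of them at once.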
Installation & Setup
Install angr
pip install angr
pip install angr[all] # Install with optional dependencies
Basic angr Workflow
Simple angr Script - Crack Password
import angr

# Load binary
project = angr.Project("./crackme")
# Create a symbolic 16-byte stdin
initial_state = project.factory.entry_state(
    stdin=angr.SimFile("stdin", size=16)  # 16-byte input
)
# Create simulation manager
simgr = project.factory.simgr(initial_state)
# Addresses of the success and failure messages (specific to this binary)
success_addr = 0x401234
failure_addr = 0x401256
# Explore until we find success or hit failure
simgr.explore(
    find=success_addr,
    avoid=failure_addr
)
# Get the solution
if simgr.found:
    solution_state = simgr.found[0]
    solution = solution_state.posix.dumps(0)  # 0 = stdin
    print(f"Password found: {solution!r}")  # bytes, may contain non-printable values
else:
    print("No solution found")
Key angr Concepts
📌 State - Program Snapshot
State represents a point in program execution - registers, memory, constraints.
Working with States
import claripy

state = project.factory.entry_state()
# Access registers
print(state.regs.rax)
# Read memory (address and size are placeholders)
data = state.memory.load(address, size)
# Symbolic variable (BVS lives in claripy, not in the angr module)
sym_input = claripy.BVS('input', 64)  # 64-bit symbolic input
📌 SimulationManager - Explore States
SimulationManager (simgr) manages multiple execution states simultaneously.
SimulationManager Usage
simgr = project.factory.simgr(initial_state)
# Explore automatically
simgr.explore(find=success_address)
# Manual stepping
simgr.step()
# Check state categories
print(simgr.active)  # Still executing
print(simgr.found)  # Reached the find target
print(simgr.avoided)  # Hit an avoided address
print(simgr.deadended)  # Finished executing (no successor states)
📌 Constraint Solving with Z3
angr uses the Z3 SMT solver, wrapped by its claripy library, to check whether a path's constraints can be satisfied and to produce concrete values that satisfy them.
Solve Constraints
# Get concrete values from symbolic state
solution = state.solver.eval(sym_variable)  # Get one solution
some_solutions = state.solver.eval_upto(sym_variable, 10)  # Get up to 10 solutions
Real-World Example - CTF Challenge
import angr
import claripy
# Load the binary
binary_path = "./crackme"
project = angr.Project(binary_path, auto_load_libs=False)
# Create symbolic argv[1] (16 bytes)
password = claripy.BVS('password', 128)  # 16 bytes * 8 bits
# Start at the program entry point with the symbolic value as argv[1]
# (assumes the binary reads the password from argv[1])
state = project.factory.entry_state(args=[binary_path, password])
# Create simulation manager
simgr = project.factory.simgr(state)
# Explore: find the "Correct!" message at 0x401300,
# avoid the "Incorrect!" message at 0x401350
simgr.explore(find=0x401300, avoid=[0x401350])
# Check results
if simgr.found:
    solution_state = simgr.found[0]
    password_value = solution_state.solver.eval(password, cast_to=bytes)
    print(f"[+] Password found: {password_value}")
else:
    print("[-] No solution found")
    if simgr.avoided:
        print(f"[!] Hit avoided addresses: {simgr.avoided}")
Advanced Techniques
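Hard-coded addresses are not the only way to steer exploration. `explore` also accepts functions for `find` and `avoid`; each one receives a state and returns a boolean. A minimal sketch, assuming the binary prints "Correct!" or "Incorrect!" to stdout (file descriptor 1); the helper names are made up for illustration:

```python
def printed_success(state) -> bool:
    """True once this state's stdout contains the success message."""
    # The match is case-sensitive, so "Incorrect!" does not count as "Correct!".
    return b"Correct!" in state.posix.dumps(1)  # 1 = stdout

def printed_failure(state) -> bool:
    """True once this state's stdout contains the failure message."""
    return b"Incorrect!" in state.posix.dumps(1)

# Usage with an existing simulation manager (not executed here):
#   simgr.explore(find=printed_success, avoid=printed_failure)
```

This avoids looking up addresses in a disassembler and keeps working if the binary is recompiled, at the cost of checking the output of every state at every step.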
📌 Function Hooking - Speed Up Analysis
Hook slow or complex functions with lightweight replacements so angr does not have to execute them symbolically.
Hooking Example
# Hook strlen so angr uses a fast built-in summary instead of
# simulating the function instruction by instruction
strlen_summary = angr.SIM_PROCEDURES['libc']['strlen']()
project.hook(0x401000, strlen_summary)  # Hook at function address
📌 Taint Analysis - Track Data Flow
Track how user input flows through the program to find the sensitive operations it can influence.
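The underlying idea can be shown without any framework. The sketch below is a plain-Python model, not an angr API: the `Tainted` wrapper is hypothetical and simply carries a flag through arithmetic, so any value derived from user input stays marked.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """An integer plus a flag recording whether user input influenced it."""
    value: int
    tainted: bool = False

    def _combine(self, other, op):
        other = other if isinstance(other, Tainted) else Tainted(other)
        # The result is tainted if either operand was.
        return Tainted(op(self.value, other.value), self.tainted or other.tainted)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __xor__(self, other):
        return self._combine(other, lambda a, b: a ^ b)

user_input = Tainted(0x41, tainted=True)  # taint source: attacker-controlled byte
key = Tainted(0x13)                       # constant from the binary, untainted

index = (user_input ^ key) + 4            # derived from input, so still tainted
length = key + 4                          # never touched by input

for name, v in (("index", index), ("length", length)):
    print(f"{name} = {v.value:#x}, tainted = {v.tainted}")
```

A tainted `index` reaching a memory access or a length argument is exactly the kind of sensitive operation a taint analysis is meant to flag.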
Taint Input
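# state.memory has no taint methods; a common substitute is to make the
# input symbolic and treat any value that depends on it as tainted
tainted_input = claripy.BVS('tainted_input', input_size * 8)
state.memory.store(input_addr, tainted_input)
# Later: check whether RAX depends on the tainted input
if any(name.startswith('tainted_input') for name in state.regs.rax.variables):
    print("RAX contains tainted data (user input)")
When angr Excels vs Struggles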
| Best For | Struggles With |
|---|---|
| Finding password/key (simple comparison) | Complex floating-point math |
| Reaching specific code path | Cryptographic operations (very slow) |
| Constraint solving (small inputs) | Large state spaces (too many branches) |
| CTF challenges (designed for automation) | Real-world complex binaries |
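Much of the gap between the two columns comes down to how many paths the program has. A rough plain-Python sketch (both path-counting models are simplifications chosen for illustration): a byte-by-byte comparison that exits on the first mismatch adds one path per byte, while a sequence of independent branches doubles the path count each time.

```python
def paths_early_exit_compare(n_bytes: int) -> int:
    """strcmp-style loop: mismatch at byte 0, 1, ..., n-1, or all n bytes match."""
    return n_bytes + 1

def paths_independent_branches(n_branches: int) -> int:
    """Each independent if/else doubles the number of distinct paths."""
    return 2 ** n_branches

for n in (8, 16, 32, 64):
    print(
        f"n={n:>2}: early-exit compare {paths_early_exit_compare(n):>2} paths, "
        f"independent branches {paths_independent_branches(n):,} paths"
    )
```

Linear growth is why simple password checks fall quickly, and exponential growth is why large state spaces need help such as hooks, `avoid` targets, or tighter input constraints.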