
INTERCEPT·DOC-2026-001-reverse

Reverse Engineering Masterclass

FILED 2026-05-14·69 min read·REVERSE-ENGINEERING · BINARY-EXPLOITATION · LINUX · TUTORIAL

A 15-part walkthrough — CPU registers, syscalls, x86-64 assembly, GDB, radare2, dynamic analysis, patching, anti-reversing, and beyond.

CPU & Registers

The CPU (Central Processing Unit) is the brain of your computer. To reverse engineer binaries, you must understand how CPUs work at a fundamental level. This chapter explains CPU architecture, registers, and why they matter for security.

What is a CPU?

A CPU executes instructions in sequence. It reads data from memory, processes it, and writes results back. The two main components of a CPU are:

Control Unit (CU). The Control Unit directs traffic in the CPU. It reads instructions from memory, decodes them, tells other parts what to do, and manages the flow of data. Think of it as the conductor of an orchestra.

Execution Unit (EU). The Execution Unit actually performs calculations and operations. It executes arithmetic (ADD, SUB), logical operations (AND, OR), comparisons (CMP), and memory operations. It's the worker that does the real work.

What Are Registers?

Registers are tiny, ultra-fast storage units inside the CPU. They hold data that the CPU is actively using. Unlike RAM, which is measured in gigabytes, each general-purpose register on x86-64 holds just 64 bits, and reading or writing it is far faster than accessing memory.

Register Hierarchy (x86-64 Architecture)

On modern 64-bit x86 CPUs, registers come in different sizes. The main "General Purpose Registers" are:

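| 64-bit (QWORD) | 32-bit (DWORD) | 16-bit (WORD) | 8-bit HIGH | 8-bit LOW |
|---|---|---|---|---|
| RAX | EAX | AX | AH | AL |
| RBX | EBX | BX | BH | BL |
| RCX | ECX | CX | CH | CL |
| RDX | EDX | DX | DH | DL |
| RSI | ESI | SI | - | SIL |
| RDI | EDI | DI | - | DIL |
| RSP | ESP | SP | - | SPL |
| RBP | EBP | BP | - | BPL |
| R8 | R8D | R8W | - | R8B |
| R9 | R9D | R9W | - | R9B |
| R10 | R10D | R10W | - | R10B |
| R11 | R11D | R11W | - | R11B |
| R12 | R12D | R12W | - | R12B |
| R13 | R13D | R13W | - | R13B |
| R14 | R14D | R14W | - | R14B |
| R15 | R15D | R15W | - | R15B |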

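When you write to a 32-bit register (like EAX), the CPU automatically zeros the upper 32 bits of the 64-bit register (RAX). Writes to the 16-bit and 8-bit registers (AX, AL, AH) leave the remaining bits unchanged. This is important in reverse engineering because it affects what data is preserved.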
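To make the sub-register behavior concrete, here is a minimal Python sketch that models what happens to a 64-bit register when one of its narrower views is written. The function name `write_subregister` and its parameters are made up for illustration; this is a model of the rule, not how any real tool represents registers.

```python
MASK64 = (1 << 64) - 1

def write_subregister(full, value, width, high_byte=False):
    """Return the new 64-bit register value after writing `value` to a sub-register.

    width is 64, 32, 16, or 8; high_byte selects AH-style access (bits 8..15).
    """
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit writes zero-extend: the upper 32 bits are cleared.
        return value & 0xFFFFFFFF
    if width == 16:
        # 16-bit writes preserve bits 16..63.
        return (full & ~0xFFFF & MASK64) | (value & 0xFFFF)
    if width == 8:
        # 8-bit writes preserve everything outside the selected byte.
        shift = 8 if high_byte else 0
        return (full & ~(0xFF << shift) & MASK64) | ((value & 0xFF) << shift)
    raise ValueError("width must be 64, 32, 16, or 8")

rax = 0x1122334455667788
print(hex(write_subregister(rax, 0xAABBCCDD, 32)))  # models: mov eax, 0xAABBCCDD
print(hex(write_subregister(rax, 0xBEEF, 16)))      # models: mov ax, 0xBEEF
print(hex(write_subregister(rax, 0x99, 8)))         # models: mov al, 0x99
print(hex(write_subregister(rax, 0x99, 8, True)))   # models: mov ah, 0x99
```

Only the 32-bit case discards the old upper half; the 16-bit and 8-bit cases merge the new bits into the old value.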

Key x86-64 Registers

📌 RAX (Accumulator) - General Purpose

RAX is the primary accumulator register. It's used for:

  • Arithmetic operations (ADD, SUB, MUL)
  • Return values from function calls
  • Syscall numbers in system calls
  • Division operations (quotient stored here)
  • Special I/O operations

Example: When a function returns an integer, it's in RAX.

ASM
mov rax, 1       ; Set RAX to 1 (often a success code)
ret              ; Return - RAX contains the return value

📌 RBX (Base) - General Purpose

RBX is traditionally the base register for addressing. It's used for:

  • Addressing memory (calculating addresses)
  • General-purpose data storage
  • Preserved register (must be saved by functions)

Important: RBX is a "callee-saved" register, meaning if a function modifies RBX, it must restore it before returning.

📌 RCX (Counter) - General Purpose

RCX is traditionally the counter register. It's used for:

  • Loop counters (LOOP and REP-prefixed string instructions)
  • Fourth function argument (x86-64 calling convention)
  • Shift and rotate counts
  • General-purpose operations

Example: In a for loop, you might load the loop count into RCX and use the LOOP instruction.

📌 RDX (Data) - General Purpose

RDX is traditionally the data register. It's used for:

  • Third function argument (x86-64 calling convention)
  • Division operations (remainder stored here)
  • I/O operations
  • General-purpose operations

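Example: DIV RBX divides the 128-bit value in RDX:RAX by RBX, storing the quotient in RAX and the remainder in RDX. RDX is therefore usually zeroed before an unsigned 64-bit division.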
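The role of RDX in division is easier to see in a small model. The sketch below imitates an unsigned 64-bit DIV in Python, treating RDX:RAX as one 128-bit dividend; `div64` is a hypothetical helper name, and the exceptions stand in for the CPU's divide error (#DE).

```python
def div64(rdx, rax, divisor):
    """Model unsigned `div r64`: divide the 128-bit value RDX:RAX by divisor.

    Returns (quotient, remainder) as they would land in (RAX, RDX).
    """
    if divisor == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    dividend = (rdx << 64) | rax
    quotient, remainder = divmod(dividend, divisor)
    if quotient >= 1 << 64:
        # The CPU also raises #DE when the quotient does not fit in RAX.
        raise OverflowError("#DE: quotient does not fit in 64 bits")
    return quotient, remainder

# With RDX = 0 this is an ordinary 64-bit division: 17 / 5.
print(div64(0, 17, 5))
```

A stale non-zero value in RDX changes the dividend, which is why forgetting to clear it is a common source of crashes in hand-written assembly.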

📌 RSI/RDI (Source/Destination Index) - General Purpose

RSI (Source Index) and RDI (Destination Index) are used for:

  • RSI = second function argument
  • RDI = first function argument
  • String operations (MOVS, STOS, SCAS)
  • Memory operations

Important: In the x86-64 calling convention, RDI holds the first argument to a function. This is crucial for understanding function calls.

ASM
mov rdi, 10      ; 1st argument: 10
mov rsi, 20      ; 2nd argument: 20
mov rdx, 30      ; 3rd argument: 30
call add_three   ; Call function - returns result in RAX

📌 RIP (Instruction Pointer) - Program Counter

RIP (or EIP in 32-bit mode) is the Instruction Pointer. It always contains the address of the next instruction to execute.

  • Automatically advanced by the length of each instruction as it executes
  • Jumps change RIP to branch to different code
  • Function calls push return address and change RIP
  • Can't directly modify RIP (use JMP or CALL)

Why important for reversing: When you see a breakpoint or a crash, RIP tells you exactly where in the code execution stopped.

📌 RSP & RBP (Stack Pointers)

RSP (Stack Pointer) and RBP (Base Pointer) manage the call stack:

  • RSP: Points to the top (most recent item) of the stack
  • RBP: Points to the base of the current stack frame (where local variables are)
  • PUSH decrements RSP and writes data
  • POP increments RSP and reads data
  • CALL pushes the return address onto the stack

Stack Layout Example:

Special Registers - Flags Register (RFLAGS)

The RFLAGS register (EFLAGS in 32-bit mode, FLAGS in 16-bit mode) contains condition flags: single bits that indicate the status of the last operation.

| Flag | Name | Meaning | Set When |
|---|---|---|---|
| ZF | Zero Flag | Result is zero | Result of last operation = 0 |
| CF | Carry Flag | Unsigned overflow | Addition carries out of, or subtraction borrows into, the most significant bit |
| SF | Sign Flag | Result is negative | MSB (most significant bit) = 1 |
| OF | Overflow Flag | Signed overflow | Signed arithmetic overflow occurs |
| PF | Parity Flag | Even parity | Low byte of result has an even number of 1 bits |
| AF | Adjust Flag | BCD carry | Carry out of the lower nibble (bit 3) |

Why flags matter: Conditional jumps (JE, JNE, JZ, JG, etc.) check these flags to decide whether to jump. Understanding flags is essential for reading assembly code.
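As a rough illustration of how these flags are derived, the following Python sketch computes them for an ADD of two values at a chosen operand width. The helper `add_flags` is hypothetical and models only addition; other instructions set the flags by their own rules.

```python
def add_flags(a, b, bits=64):
    """Return (result, flags) for `add a, b` at the given operand width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    flags = {
        "ZF": int(result == 0),
        "CF": int(full > mask),                              # carry out of the top bit
        "SF": int(bool(result & sign)),                      # copy of the most significant bit
        # Signed overflow: both operands share a sign that the result does not.
        "OF": int(bool((~(a ^ b)) & (a ^ result) & sign)),
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),   # low byte only
        "AF": int(((a & 0xF) + (b & 0xF)) > 0xF),            # carry out of bit 3
    }
    return result, flags

# 8-bit examples keep the numbers small enough to check by hand.
print(add_flags(0x7F, 0x01, bits=8))   # largest positive signed byte plus one
print(add_flags(0xFF, 0x01, bits=8))   # wraps around to zero
```

The first call overflows as a signed addition but not as an unsigned one; the second does the opposite. That difference is exactly why OF and CF are separate flags.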

x86-64 Calling Convention

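When a function is called, arguments are passed through specific registers in a specific order. This is the calling convention. On Linux x86-64 (the System V AMD64 ABI), the first six integer or pointer arguments go in RDI, RSI, RDX, RCX, R8, and R9, in that order, and the return value comes back in RAX. Understanding it is crucial for debugging.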

Any arguments beyond the 6th are passed on the stack.
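Here is a small Python sketch of that placement rule, assuming the System V AMD64 order for integer and pointer arguments (RDI, RSI, RDX, RCX, R8, R9). The function `place_arguments` is hypothetical, and the model ignores floating-point arguments, which travel in XMM registers instead.

```python
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def place_arguments(args):
    """Split integer/pointer arguments into register slots and stack slots."""
    registers = dict(zip(ARG_REGISTERS, args))   # first six go in registers
    stack = list(args[len(ARG_REGISTERS):])      # 7th argument onward, in order
    return registers, stack

print(place_arguments([10, 20, 30]))
print(place_arguments([1, 2, 3, 4, 5, 6, 7, 8]))
```

When you see a `call` in a disassembly, reading the preceding writes to these registers in this order usually tells you what the arguments were.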

Putting It Together

Now you understand:

  • CPUs have two main components: Control Unit and Execution Unit
  • Registers are ultra-fast CPU storage
  • Different registers have different purposes
  • The Flags register controls conditional jumps
  • Function arguments are passed through specific registers
  • RIP points to the next instruction
  • RSP and RBP manage the call stack

System Calls (Syscalls)

A system call is a request from a user program to the kernel to perform a privileged operation. When a program needs to read a file, write to the screen, or allocate memory, it can't do it directly — it must ask the kernel through a syscall.

User Space vs Kernel Space

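Modern operating systems use a layered privilege model: application code runs in user space (ring 3 on x86-64) with restricted access, while the kernel runs in kernel space (ring 0) with full access to hardware and memory.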

How System Calls Work

When your program executes a SYSCALL instruction:

  1. Program sets up registers with syscall number and arguments
  2. SYSCALL instruction transitions to kernel mode
  3. Kernel executes the requested operation
  4. Control returns to user program with result in RAX

x86-64 Linux Syscall ABI

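On 64-bit Linux, syscalls follow a specific convention: the syscall number goes in RAX; up to six arguments go in RDI, RSI, RDX, R10, R8, and R9, in that order; and the result comes back in RAX. The SYSCALL instruction itself overwrites RCX and R11, which is why the fourth argument uses R10 rather than RCX.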

Common Syscalls Explained

📌 exit (Syscall #60) - Terminate Program

exit(int code) terminates the program with an exit status.

ASM
mov rax, 60        ; exit syscall number
mov rdi, 0         ; exit code = 0 (success)
syscall            ; Call kernel to exit

Equivalent C code:

C
exit(0);  // Terminate with status 0

📌 write (Syscall #1) - Write to File/Console

write() writes data to a file descriptor (like stdout for console output).

ASM
mov rax, 1         ; write syscall number
mov rdi, 1         ; fd = 1 (stdout)
mov rsi, msg       ; rsi = pointer to message
mov rdx, 5         ; length = 5 bytes
syscall            ; Write to stdout

File Descriptor Reference:

| FD | Name | Purpose |
|---|---|---|
| 0 | stdin | Standard input (keyboard) |
| 1 | stdout | Standard output (console) |
| 2 | stderr | Standard error (console) |

📌 read (Syscall #0) - Read from File/Console

read() reads data from a file descriptor into a buffer.

ASM
mov rax, 0         ; read syscall number
mov rdi, 0         ; fd = 0 (stdin)
mov rsi, buffer    ; rsi = pointer to buffer
mov rdx, 10        ; read up to 10 bytes
syscall            ; Read from stdin
                   ; RAX now contains number of bytes read

📌 open (Syscall #2) - Open a File

open() opens a file and returns a file descriptor.

| Flag | Value | Meaning |
|---|---|---|
| O_RDONLY | 0 | Read only |
| O_WRONLY | 1 | Write only |
| O_RDWR | 2 | Read and write |
| O_CREAT | 64 | Create if doesn't exist |
| O_APPEND | 1024 | Append to file |

📌 close (Syscall #3) - Close a File

close() closes a file descriptor, freeing the resource.
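The open, write, read, and close calls described above can be tried without writing assembly. Python's `os` module exposes thin wrappers around them, so the sketch below follows the same sequence a hand-written program would: open a file, write a buffer, close, reopen, read, close. The helper name `roundtrip` and the file name `demo.txt` are made up for the example.

```python
import os
import tempfile

def roundtrip(path, data):
    """Write data to path, read it back, and return (bytes_written, bytes_read)."""
    # open(path, O_WRONLY | O_CREAT | O_TRUNC, 0o644) returns a file descriptor.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = os.write(fd, data)             # returns the number of bytes written
    finally:
        os.close(fd)                             # release the descriptor

    fd = os.open(path, os.O_RDONLY)
    try:
        return written, os.read(fd, len(data))   # reads up to len(data) bytes
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip(os.path.join(tmp, "demo.txt"), b"hello"))
```

Running a script like this under `strace` is a convenient way to see the underlying syscalls and their arguments as the kernel receives them.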

Complete Syscall Example: Write to Console

Let's write a complete program that uses syscalls to print "Hello World":

ASM
section .data
msg:        db "Hello World", 0x0a
len:        equ $ - msg        ; Calculate length
 
section .text
global _start
 
_start:
    ; write syscall to print message
    mov rax, 1         ; syscall: write
    mov rdi, 1         ; fd: stdout
    mov rsi, msg       ; buffer
    mov rdx, len       ; length
    syscall
 
    ; exit syscall
    mov rax, 60        ; syscall: exit
    mov rdi, 0         ; status: 0 (success)
    syscall

Syscall Return Values & Error Handling

After a syscall, the kernel returns a value in RAX:

  • If RAX ≥ 0: Success, RAX contains the result
  • If RAX < 0: Error occurred, RAX contains negative error code

Error codes are typically in the range -1 to -4095. Common errors:

| Error Code | Constant | Meaning |
|---|---|---|
| -1 | EPERM | Operation not permitted |
| -2 | ENOENT | No such file or directory |
| -13 | EACCES | Permission denied |
| -14 | EFAULT | Bad address |
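When you read RAX in a debugger after a syscall, an error shows up as a very large unsigned number rather than a small negative one. The sketch below shows one way to decode it in Python, using the -1 to -4095 range mentioned above; `decode_syscall_return` is a hypothetical helper, and the errno names come from Python's standard `errno` module.

```python
import errno

def decode_syscall_return(rax):
    """Interpret a raw 64-bit RAX value left by a Linux syscall.

    Returns ("ok", value) or ("error", errno_number, errno_name).
    """
    rax &= (1 << 64) - 1
    signed = rax - (1 << 64) if rax >> 63 else rax   # two's-complement view
    if -4095 <= signed <= -1:
        number = -signed
        return ("error", number, errno.errorcode.get(number, "UNKNOWN"))
    # Anything else is a result, including large addresses returned by mmap.
    return ("ok", rax)

print(decode_syscall_return(5))
print(decode_syscall_return(0xFFFFFFFFFFFFFFFE))   # RAX as a debugger would show -2
```

Values outside the error range are treated as results even when the top bit is set, because a syscall such as mmap can legitimately return a high address.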

Complete System Call Database

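Here's a reference of commonly used Linux x86-64 system calls with arguments, return values, and descriptions. Return values are shown as the C library wrappers report them (-1 with errno set); the raw SYSCALL instruction instead returns the negative error code in RAX, as described above.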

| RAX | Name | Arguments (RDI, RSI, RDX, R10, R8, R9) | Return Value | Description |
|---|---|---|---|---|
| 0 | read | RDI: unsigned int fd; RSI: char *buf; RDX: size_t count | ssize_t (bytes read or -1) | Reads up to count bytes from file descriptor fd into buffer buf. Returns number of bytes read, 0 on EOF, or -1 on error. Commonly used with fd=0 (stdin) to read user input. |
| 1 | write | RDI: unsigned int fd; RSI: const char *buf; RDX: size_t count | ssize_t (bytes written or -1) | Writes up to count bytes from buffer buf to file descriptor fd. Returns number of bytes written or -1 on error. Use fd=1 (stdout) for console output, fd=2 (stderr) for error messages. |
| 2 | open | RDI: const char *filename; RSI: int flags; RDX: mode_t mode | int (file descriptor or -1) | Opens file specified by filename. Flags: O_RDONLY(0), O_WRONLY(1), O_RDWR(2), O_CREAT(64), O_APPEND(1024). Mode specifies permissions (e.g., 0644). Returns file descriptor on success. |
| 3 | close | RDI: unsigned int fd | int (0 or -1) | Closes file descriptor fd, freeing the resource. Returns 0 on success, -1 on error. Always close files when done to prevent resource leaks. |
| 4 | stat | RDI: const char *filename; RSI: struct stat *statbuf | int (0 or -1) | Retrieves file information (size, permissions, timestamps) for filename and stores in statbuf. Returns 0 on success. Follows symlinks; lstat (6) does not. |
| 5 | fstat | RDI: unsigned int fd; RSI: struct stat *statbuf | int (0 or -1) | Like stat, but operates on an already-opened file descriptor instead of a filename. Useful when you already have the file open. |
| 8 | lseek | RDI: unsigned int fd; RSI: off_t offset; RDX: unsigned int whence | off_t (new position or -1) | Repositions file offset of fd. Whence: SEEK_SET(0)=absolute, SEEK_CUR(1)=relative, SEEK_END(2)=from end. Returns new offset from beginning of file. |
| 9 | mmap | RDI: void *addr; RSI: size_t length; RDX: int prot; R10: int flags; R8: int fd; R9: off_t offset | void* (address or MAP_FAILED) | Maps file or device into memory. Prot: PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Flags: MAP_PRIVATE(2), MAP_ANONYMOUS(32). Returns pointer to mapped area. Critical for memory management. |
| 11 | munmap | RDI: void *addr; RSI: size_t length | int (0 or -1) | Unmaps a previously mapped memory region starting at addr with length. Returns 0 on success. Always unmap when done to free memory. |
| 12 | brk | RDI: void *addr | void* (new break or -1) | Changes program break (end of data segment) to addr. Used by malloc() internally. Returns new program break on success. Rarely used directly in modern programs (prefer mmap). |
| 16 | ioctl | RDI: unsigned int fd; RSI: unsigned int cmd; RDX: unsigned long arg | int (varies) | Device-specific input/output control. Command and arguments vary by device. Used for operations that don't fit read/write model (terminal settings, disk operations, etc.). |
| 22 | pipe | RDI: int pipefd[2] | int (0 or -1) | Creates a pipe (unidirectional data channel). pipefd[0] is read end, pipefd[1] is write end. Returns 0 on success. Used for inter-process communication. |
| 32 | dup | RDI: unsigned int fildes | int (new fd or -1) | Duplicates file descriptor fildes using the lowest available fd number. Both fds refer to same file. Returns new fd on success. |
| 33 | dup2 | RDI: unsigned int oldfd; RSI: unsigned int newfd | int (new fd or -1) | Duplicates oldfd to newfd. If newfd is open, it's closed first. Commonly used to redirect stdin/stdout/stderr in child processes. |
| 57 | fork | (none) | pid_t (child PID or 0 in child) | Creates new process by duplicating calling process. Returns child PID to parent, returns 0 in child process. Child gets copy of parent's memory and file descriptors. |
| 59 | execve | RDI: const char *filename; RSI: char *const argv[]; RDX: char *const envp[] | int (never returns on success) | Executes program specified by filename, replacing current process image. argv is argument array, envp is environment. Only returns on error. Used with fork() to run new programs. |
| 60 | exit | RDI: int status | void (never returns) | Terminates calling process with exit status. 0 indicates success, non-zero indicates error. Closes file descriptors and returns status to parent. Unlike libc's exit(), the raw syscall does not flush stdio buffers. |
| 61 | wait4 | RDI: pid_t pid; RSI: int *status; RDX: int options; R10: struct rusage *rusage | pid_t (PID or -1) | Waits for child process to change state. Returns child PID on success. Status contains exit code. Used by parent to collect terminated children (prevent zombies). |
| 39 | getpid | (none) | pid_t (process ID) | Returns process ID (PID) of calling process. Always succeeds. Useful for logging, creating unique filenames, or process identification. |
| 110 | getppid | (none) | pid_t (parent PID) | Returns parent process ID of calling process. If parent has exited, returns 1 (init/systemd). Always succeeds. |
| 102 | getuid | (none) | uid_t (user ID) | Returns real user ID of calling process. Used for permission checks. Always succeeds. |
| 104 | getgid | (none) | gid_t (group ID) | Returns real group ID of calling process. Used for permission checks. Always succeeds. |
| 105 | setuid | RDI: uid_t uid | int (0 or -1) | Sets effective user ID. If privileged, sets real, effective, and saved UIDs. Returns 0 on success. Used for privilege dropping or SUID executables. |
| 106 | setgid | RDI: gid_t gid | int (0 or -1) | Sets effective group ID. If privileged, sets real, effective, and saved GIDs. Returns 0 on success. |
| 62 | kill | RDI: pid_t pid; RSI: int sig | int (0 or -1) | Sends signal sig to process pid. Common signals: SIGTERM(15), SIGKILL(9), SIGUSR1(10). Returns 0 on success. If pid=0, sends to process group. |
| 13 | rt_sigaction | RDI: int sig; RSI: const struct sigaction *act; RDX: struct sigaction *oldact | int (0 or -1) | Examines or changes signal handler for signal sig. act specifies new action, oldact receives old action. Returns 0 on success. Modern signal handling interface. |
| 34 | pause | (none) | int (always -1) | Suspends process until signal is received. Always returns -1 with errno=EINTR after signal handler returns. Used for waiting on signals. |
| 41 | socket | RDI: int domain; RSI: int type; RDX: int protocol | int (socket fd or -1) | Creates communication endpoint. Domain: AF_INET(2)=IPv4, AF_INET6(10)=IPv6. Type: SOCK_STREAM(1)=TCP, SOCK_DGRAM(2)=UDP. Returns socket file descriptor. |
| 49 | bind | RDI: int sockfd; RSI: const struct sockaddr *addr; RDX: socklen_t addrlen | int (0 or -1) | Assigns address (IP and port) to socket sockfd. Must be called before listen() for servers. Returns 0 on success. Port numbers below 1024 require root. |
| 50 | listen | RDI: int sockfd; RSI: int backlog | int (0 or -1) | Marks socket as passive (ready to accept connections). Backlog specifies maximum queue length for pending connections. Returns 0 on success. |
| 43 | accept | RDI: int sockfd; RSI: struct sockaddr *addr; RDX: socklen_t *addrlen | int (new socket fd or -1) | Accepts incoming connection on listening socket. Blocks until connection arrives. Returns new socket for the connection, original socket continues listening. |
| 42 | connect | RDI: int sockfd; RSI: const struct sockaddr *addr; RDX: socklen_t addrlen | int (0 or -1) | Initiates connection to remote address. For TCP, performs 3-way handshake. Blocks until connection established or timeout. Returns 0 on success. |
| 44 | sendto | RDI: int sockfd; RSI: const void *buf; RDX: size_t len; R10: int flags; R8: const struct sockaddr *dest_addr; R9: socklen_t addrlen | ssize_t (bytes sent or -1) | Sends message on socket to specific address. For UDP sockets. Use send() for connected sockets. Returns number of bytes sent. |
| 45 | recvfrom | RDI: int sockfd; RSI: void *buf; RDX: size_t len; R10: int flags; R8: struct sockaddr *src_addr; R9: socklen_t *addrlen | ssize_t (bytes received or -1) | Receives message from socket and captures sender's address. For UDP. Returns number of bytes received, 0 on connection close. |
| 201 | time | RDI: time_t *tloc | time_t (seconds since epoch) | Returns current time as seconds since Unix epoch (Jan 1, 1970). If tloc is non-NULL, also stores there. Simple but low precision (1 second). |
| 96 | gettimeofday | RDI: struct timeval *tv; RSI: struct timezone *tz | int (0 or -1) | Gets current time with microsecond precision. tv contains seconds and microseconds since epoch. tz is obsolete (pass NULL). Returns 0 on success. |
| 35 | nanosleep | RDI: const struct timespec *req; RSI: struct timespec *rem | int (0 or -1) | Suspends execution for time specified in req (seconds + nanoseconds). If interrupted by signal, remaining time stored in rem. Returns 0 on success. |
| 228 | clock_gettime | RDI: clockid_t clk_id; RSI: struct timespec *tp | int (0 or -1) | Retrieves time from specified clock. clk_id: CLOCK_REALTIME(0)=wall clock, CLOCK_MONOTONIC(1)=monotonic time. Nanosecond precision. Returns 0 on success. |
| 83 | mkdir | RDI: const char *pathname; RSI: mode_t mode | int (0 or -1) | Creates directory specified by pathname. Mode specifies permissions (e.g., 0755). Returns 0 on success, -1 if already exists or permission denied. |
| 84 | rmdir | RDI: const char *pathname | int (0 or -1) | Removes empty directory. Returns -1 if directory not empty or doesn't exist. Returns 0 on success. |
| 87 | unlink | RDI: const char *pathname | int (0 or -1) | Deletes file specified by pathname. Decrements link count; if 0 and no process has file open, file is deleted. Returns 0 on success. |
| 82 | rename | RDI: const char *oldpath; RSI: const char *newpath | int (0 or -1) | Renames/moves file from oldpath to newpath. Atomic operation. If newpath exists, it's replaced. Returns 0 on success. |
| 90 | chmod | RDI: const char *pathname; RSI: mode_t mode | int (0 or -1) | Changes file permissions. Mode is octal like 0644 (rw-r--r--) or 0755 (rwxr-xr-x). Returns 0 on success. Only owner or root can change permissions. |
| 92 | chown | RDI: const char *pathname; RSI: uid_t owner; RDX: gid_t group | int (0 or -1) | Changes file owner and/or group. Pass -1 to leave unchanged. Only root can change owner. File owner can change group to one they belong to. Returns 0 on success. |
| 80 | chdir | RDI: const char *path | int (0 or -1) | Changes current working directory to path. Affects relative path resolution. Returns 0 on success. Each process has its own working directory. |
| 79 | getcwd | RDI: char *buf; RSI: size_t size | char* (buf or NULL) | Copies current working directory into buf (max size bytes). Returns buf on success, NULL on error. Size must be large enough for full path. |
| 21 | access | RDI: const char *pathname; RSI: int mode | int (0 or -1) | Checks whether calling process can access file. Mode: F_OK(0)=exists, R_OK(4)=read, W_OK(2)=write, X_OK(1)=execute. Returns 0 if permitted. |
| 86 | link | RDI: const char *oldpath; RSI: const char *newpath | int (0 or -1) | Creates hard link named newpath to existing file oldpath. Both names refer to same inode. Deleting one doesn't affect the other. Returns 0 on success. |
| 88 | symlink | RDI: const char *target; RSI: const char *linkpath | int (0 or -1) | Creates symbolic link named linkpath containing string target. Symlink can point to non-existent files and cross filesystems. Returns 0 on success. |
| 89 | readlink | RDI: const char *pathname; RSI: char *buf; RDX: size_t bufsiz | ssize_t (bytes copied or -1) | Reads value of symbolic link pathname into buf. Does not null-terminate. Returns number of bytes placed in buf, -1 on error. |
| 10 | mprotect | RDI: void *addr; RSI: size_t len; RDX: int prot | int (0 or -1) | Changes memory protection for page(s) starting at addr. Prot: PROT_NONE(0), PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Used for security and exploit mitigation. Returns 0 on success. |
| 28 | madvise | RDI: void *addr; RSI: size_t length; RDX: int advice | int (0 or -1) | Gives kernel advice about memory usage. Advice: MADV_NORMAL(0), MADV_SEQUENTIAL(2), MADV_DONTNEED(4). Hints for performance optimization. Returns 0 on success. |
| 56 | clone | RDI: unsigned long flags; RSI: void *child_stack; RDX: int *ptid; R10: int *ctid; R8: unsigned long newtls | pid_t (child PID or -1) | Creates new process/thread. More flexible than fork(). Flags control what is shared (memory, files, etc.). Used to implement threads. Returns child PID in parent, 0 in child. |
| 231 | exit_group | RDI: int status | void (never returns) | Terminates all threads in calling process's thread group. Like exit(), but affects all threads. Used by exit() in threaded programs. Never returns. |
| 157 | prctl | RDI: int option; RSI: unsigned long arg2; RDX: unsigned long arg3; R10: unsigned long arg4; R8: unsigned long arg5 | int (varies) | Process control operations. Options include PR_SET_NAME (set process name), PR_SET_DUMPABLE, PR_SET_SECCOMP. Highly versatile. Return value depends on option. |
| 37 | alarm | RDI: unsigned int seconds | unsigned int (previous alarm) | Arranges for SIGALRM to be delivered in seconds. Pass 0 to cancel. Returns number of seconds remaining from previous alarm. Only one alarm can be scheduled. |
| 72 | fcntl | RDI: unsigned int fd; RSI: unsigned int cmd; RDX: unsigned long arg | int (varies) | Performs various operations on file descriptor. Cmd: F_GETFL(3)=get flags, F_SETFL(4)=set flags, F_DUPFD(0)=duplicate. Return value depends on cmd. |
| 76 | truncate | RDI: const char *path; RSI: off_t length | int (0 or -1) | Truncates file to specified length. If the file is longer, extra data is discarded. If the file is shorter, it is extended with null bytes. Returns 0 on success. |
| 74 | fsync | RDI: unsigned int fd | int (0 or -1) | Synchronizes file's in-memory state with storage device (flushes all modified data and metadata). Returns 0 when data is safely on disk. Critical for data integrity. |
| 14 | rt_sigprocmask | RDI: int how; RSI: const sigset_t *set; RDX: sigset_t *oldset | int (0 or -1) | Examines and changes blocked signals. How: SIG_BLOCK(0)=add, SIG_UNBLOCK(1)=remove, SIG_SETMASK(2)=replace. Returns 0 on success. Used to protect critical sections. |
| 131 | sigaltstack | RDI: const stack_t *ss; RSI: stack_t *old_ss | int (0 or -1) | Sets or gets alternate signal stack. Used when main stack is compromised (stack overflow). Returns 0 on success. Important for robust signal handling. |
| 101 | ptrace | RDI: long request; RSI: pid_t pid; RDX: void *addr; R10: void *data | long (varies) | Process trace and debug. Allows parent to control child execution, read/write memory and registers. Used by debuggers (GDB). Powerful and dangerous. Return value varies by request. |
| 169 | reboot | RDI: int magic; RSI: int magic2; RDX: int cmd; R10: void *arg | int (never on success) | Reboots or halts system. Requires CAP_SYS_BOOT capability (root). Cmd: LINUX_REBOOT_CMD_RESTART, HALT, POWER_OFF. For emergency use. Returns only on error. |
| 23 | select | RDI: int nfds; RSI: fd_set *readfds; RDX: fd_set *writefds; R10: fd_set *exceptfds; R8: struct timeval *timeout | int (ready fds or -1) | Monitors multiple file descriptors for I/O readiness. Returns when fd is ready or timeout. Used for non-blocking I/O multiplexing. Returns number of ready fds. |
| 7 | poll | RDI: struct pollfd *fds; RSI: nfds_t nfds; RDX: int timeout | int (ready fds or -1) | Like select but better API. Monitors file descriptors for events (POLLIN, POLLOUT, POLLERR). Timeout in milliseconds (-1=infinite). Returns number of ready fds. |
| 213 | epoll_create | RDI: int size | int (epoll fd or -1) | Creates epoll instance for scalable I/O event notification. More efficient than select/poll for many file descriptors. Size is ignored (kept for compatibility). Returns epoll fd. |

Assembly Language - Complete Tutorial

Assembly language is the lowest-level programming language that directly corresponds to machine instructions. Each assembly instruction performs one CPU operation. To reverse engineer binaries, you must be able to read and understand assembly.

What is Assembly?

Assembly is a symbolic representation of machine code. Instead of writing binary (1s and 0s), you write mnemonics like MOV, ADD, JMP that are more readable. An assembler converts this into machine code.

Instruction Format

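Most assembly instructions follow this format: a mnemonic followed by its operands, written as MNEMONIC destination, source. For example, mov rax, 5 has the mnemonic mov, the destination rax, and the source 5.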

Important: In Intel syntax (which we use), the destination comes first, then the source. This is opposite to AT&T syntax.

Core Assembly Instructions

📌 MOV - Move/Copy Data

MOV destination, source copies data from source to destination.

Important Notes:

  • Cannot move between two memory addresses directly
  • Does NOT affect flags
  • 32-bit operations zero the upper 32 bits (e.g., mov eax, 1 zeros RAX[63:32])
  • Square brackets [] indicate memory address dereferencing
ASM
mov rax, 5             ; RAX = 5
mov rbx, rax           ; RBX = RAX (which is 5)
mov rcx, [rax]         ; RCX = value at address RAX
mov qword [rax], 10    ; Store 10 as 8 bytes at address in RAX (size must be stated)

📌 ADD - Addition

ADD destination, source adds source to destination: destination = destination + source

Flags affected: ZF, CF, SF, OF, PF, AF

ASM
mov rax, 10           ; RAX = 10
mov rbx, 20           ; RBX = 20
add rax, rbx           ; RAX = 30 (10 + 20)
add rax, 5             ; RAX = 35 (30 + 5)

📌 SUB - Subtraction

SUB destination, source subtracts source from destination: destination = destination - source

Flags affected: ZF, CF, SF, OF, PF, AF

ASM
mov rax, 50           ; RAX = 50
mov rbx, 30           ; RBX = 30
sub rax, rbx           ; RAX = 20 (50 - 30)

📌 CMP - Compare

CMP destination, source compares two values by subtracting source from destination and setting flags (but doesn't store the result).

What CMP does: Performs RAX - RBX, sets flags, discards the result

Why use CMP: To check if two values are equal, which is greater, etc.

  • If RAX == RBX: ZF = 1 (zero flag set)
  • If RAX != RBX: ZF = 0
  • If RAX < RBX (unsigned): CF = 1 (carry/borrow)
  • If RAX >= RBX (unsigned): CF = 0
  • Signed comparisons (JL, JG, etc.) use SF and OF instead of CF

ASM
mov rax, 10
cmp rax, 10            ; Compare RAX with 10
je equal               ; Jump if Equal (checks ZF)
 
mov rax, 5
cmp rax, 10            ; 5 < 10
jl less_than           ; Jump if Less

📌 TEST - Logical AND

TEST destination, source performs a bitwise AND but only affects flags (doesn't store result).

Why: TEST rax, rax sets ZF=1 if RAX==0, ZF=0 if RAX!=0. It has the same effect on ZF as CMP rax, 0 but encodes in fewer bytes, so compilers prefer it.

ASM
mov rax, 0
test rax, rax          ; ZF = 1 (result is zero)
jz zero_case           ; Jump if Zero

📌 JMP - Unconditional Jump

JMP label unconditionally jumps to a label (changes program flow).

What it does: Sets RIP to the target address, so the next instruction executed is at the jump target.

ASM
mov rax, 1
jmp skip              ; Skip next instruction
mov rax, 2             ; This is never executed
skip:
mov rbx, rax           ; RBX = 1 (not 2)

📌 Conditional Jumps (JE, JNE, JZ, JG, JL, etc.)

Conditional jumps jump only if certain flags are set. They check the result of a previous CMP or TEST instruction.

| Instruction | Condition | Jumps When |
|---|---|---|
| JE / JZ | Jump if Equal / Jump if Zero | ZF = 1 (result was zero) |
| JNE / JNZ | Jump if Not Equal / Jump if Not Zero | ZF = 0 (result was non-zero) |
| JG | Jump if Greater | Destination > Source (signed): ZF = 0 and SF = OF |
| JL | Jump if Less | Destination < Source (signed): SF ≠ OF |
| JGE | Jump if Greater or Equal | Destination ≥ Source (signed): SF = OF |
| JLE | Jump if Less or Equal | Destination ≤ Source (signed): ZF = 1 or SF ≠ OF |
| JA | Jump if Above | Destination > Source (unsigned): CF = 0 and ZF = 0 |
| JB / JC | Jump if Below / Jump if Carry | Destination < Source (unsigned): CF = 1 |
| JAE / JNC | Jump if Above or Equal / Jump if No Carry | Destination ≥ Source (unsigned): CF = 0 |
| JBE | Jump if Below or Equal | Destination ≤ Source (unsigned): CF = 1 or ZF = 1 |
| JO | Jump if Overflow | OF = 1 (overflow occurred) |
| JNO | Jump if No Overflow | OF = 0 |

ASM
mov rax, 10
cmp rax, 20
jl less_than           ; Jump if Less (10 < 20) - WILL jump
mov rbx, 1
jmp done
 
less_than:
mov rbx, 0
done:
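To see why signed and unsigned jumps can disagree on the same operands, the following Python sketch models CMP followed by a conditional jump. The names `cmp_flags`, `CONDITIONS`, and `jump_taken` are invented for this example, and only a subset of the jump mnemonics is covered.

```python
def cmp_flags(dst, src, bits=64):
    """Flags produced by `cmp dst, src` (computes dst - src, discards the result)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dst &= mask
    src &= mask
    result = (dst - src) & mask
    return {
        "ZF": result == 0,
        "CF": dst < src,                                   # unsigned borrow
        "SF": bool(result & sign),
        "OF": bool((dst ^ src) & (dst ^ result) & sign),   # signed overflow
    }

# Each condition reads only the flags, never the original operands.
CONDITIONS = {
    "je":  lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jl":  lambda f: f["SF"] != f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jg":  lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "jle": lambda f: f["ZF"] or f["SF"] != f["OF"],
    "jb":  lambda f: f["CF"],
    "jae": lambda f: not f["CF"],
    "ja":  lambda f: not f["CF"] and not f["ZF"],
    "jbe": lambda f: f["CF"] or f["ZF"],
}

def jump_taken(mnemonic, dst, src, bits=64):
    """Return True if the jump would be taken after `cmp dst, src`."""
    return bool(CONDITIONS[mnemonic](cmp_flags(dst, src, bits)))

print(jump_taken("jl", 10, 20))   # signed: 10 < 20
print(jump_taken("jl", -1, 1))    # signed: -1 < 1
print(jump_taken("jb", -1, 1))    # unsigned: 0xFFFF...FFFF is not below 1
```

The last two calls use the same bit patterns but different jumps. Spotting whether a compiler emitted JL or JB is often the quickest way to tell whether the original variable was signed or unsigned.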

📌 CALL & RET - Function Calls

CALL label calls a function. RET returns from a function.

ASM
global _start
_start:
    call my_function       ; Call function
    ; After function returns, we continue here
    mov rax, 60
    mov rdi, 0
    syscall
 
my_function:
    ; Function code here
    mov rax, 1            ; Put return value in RAX
    ret                   ; Return to caller

📌 SYSCALL - System Call

SYSCALL transitions to kernel mode and executes a system call.

ASM
mov rax, 60           ; exit syscall number
mov rdi, 0            ; exit code = 0
syscall               ; Call kernel

Memory Operations - Loading & Storing

To access memory, use square brackets []:

ASM
mov rax, [rbx]         ; Load 8 bytes from address in RBX
mov qword [rax], 100   ; Store 100 as 8 bytes at address in RAX
mov rcx, [rax + 8]     ; Load from address (RAX + 8)
mov [rbx - 16], rax    ; Store to address (RBX - 16)
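If bracketed operands feel abstract, it can help to model memory as a flat byte array. The sketch below is a toy Python model, assuming little-endian 8-byte accesses as on x86-64; `store`, `load`, and the 64-byte `mem` region are hypothetical names, and the addresses are small offsets chosen for illustration.

```python
def store(mem, addr, value, size=8):
    """Write `value` as `size` little-endian bytes at mem[addr]."""
    mem[addr:addr + size] = (value & ((1 << (8 * size)) - 1)).to_bytes(size, "little")

def load(mem, addr, size=8):
    """Read `size` little-endian bytes from mem[addr] as an unsigned integer."""
    return int.from_bytes(mem[addr:addr + size], "little")

mem = bytearray(64)           # hypothetical flat memory
rax, rbx = 16, 40             # registers holding addresses

store(mem, rax, 100)          # models: mov qword [rax], 100
store(mem, rbx - 16, rax)     # models: mov [rbx - 16], rax
rcx = load(mem, rax + 8)      # models: mov rcx, [rax + 8]
print(rcx)
```

Here RBX - 16 and RAX + 8 happen to name the same address, so the load returns the value the previous store wrote. Recognizing when two different expressions alias the same location is a routine part of reading disassembly.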

Stack Operations - PUSH & POP

The stack is a Last-In-First-Out (LIFO) data structure. PUSH and POP manage it:

📌 PUSH - Push Value onto Stack

PUSH source decrements RSP and writes value to stack.

ASM
mov rax, 123
push rax               ; RSP decreases by 8, [RSP] = 123

📌 POP - Pop Value from Stack

POP destination reads value from stack and increments RSP.

ASM
pop rbx                ; RBX = [RSP], RSP increases by 8
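The PUSH and POP behavior described above can be sketched as a tiny Python class. This is a toy model with a made-up 64-byte stack region, not a description of how a real process lays out its stack; the class name `Stack` and its fields are hypothetical.

```python
class Stack:
    """Toy model of the x86-64 stack: it grows toward lower addresses."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.rsp = size                    # empty stack: RSP at the top of the region

    def push(self, value):
        self.rsp -= 8                      # PUSH decrements RSP first...
        self.mem[self.rsp:self.rsp + 8] = (value & (2**64 - 1)).to_bytes(8, "little")

    def pop(self):
        value = int.from_bytes(self.mem[self.rsp:self.rsp + 8], "little")
        self.rsp += 8                      # ...and POP increments it after reading
        return value

s = Stack()
s.push(123)
s.push(456)
print(s.rsp)      # two 8-byte slots below the starting value
print(s.pop())    # last in, first out
print(s.pop())
```

Note that POP does not erase anything: the old bytes stay in memory below RSP until something overwrites them, which is why stale stack data sometimes shows up during debugging.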

Complete Assembly Program Example

Let's combine everything into a complete program:

ASM
section .text
global _start
 
_start:
    mov rax, 0             ; Counter = 0
 
count_loop:                ; not named "loop", which is an instruction mnemonic
    ; Print number (simplified)
    add rax, 1             ; Increment counter
    cmp rax, 10            ; Compare with 10
    jl count_loop          ; Jump if Less - repeat loop
 
    ; Exit
    mov rax, 60
    mov rdi, 0
    syscall

Assembly Code Structure

Every assembly program has a specific structure with defined sections for code and data. Understanding this structure is essential for writing and analyzing assembly.

The Three Main Sections

📌 .text Section - Executable Code

The .text section contains all executable code — the assembly instructions that the CPU actually runs.

📌 .data Section - Initialized Data

The .data section contains data with known values — strings, constants, arrays, etc.

ASM
section .data
    msg:    db "Hello World", 0x0a
    num:    dq 12345
    array:  dd 1, 2, 3, 4, 5

| Directive | Size | Meaning |
|---|---|---|
| db | 1 byte | Define Byte |
| dw | 2 bytes | Define Word |
| dd | 4 bytes | Define Double-word |
| dq | 8 bytes | Define Quad-word |
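To check your understanding of these directives, you can reproduce the bytes they emit. The sketch below uses Python's `struct` module to encode the same three items as the .data example, assuming little-endian layout and no padding between items; the variable names mirror the labels but are otherwise illustrative.

```python
import struct

# Little-endian encodings matching the NASM directives in the .data example.
msg = b"Hello World" + bytes([0x0A])          # db "Hello World", 0x0a
num = struct.pack("<Q", 12345)                # dq 12345  (one 8-byte value)
array = struct.pack("<5I", 1, 2, 3, 4, 5)     # dd 1, 2, 3, 4, 5  (five 4-byte values)

data_section = msg + num + array
offsets = {"msg": 0, "num": len(msg), "array": len(msg) + len(num)}

print(len(data_section), offsets)
print(num.hex())    # least significant byte comes first
```

Comparing output like this against a hex dump of the assembled object file is a quick way to confirm which directive produced which bytes.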

📌 .bss Section - Uninitialized Data

The .bss section reserves space for variables without initial values (like buffers, arrays, etc.)

ASM
section .bss
    buffer:     resb 256    ; Reserve 256 bytes (uninitialized)
    array:      resq 10     ; Reserve space for 10 quad-words
  • resb: Reserve bytes
  • resw: Reserve words (2 bytes each)
  • resd: Reserve double-words (4 bytes each)
  • resq: Reserve quad-words (8 bytes each)

Complete Program Structure

ASM
; ============================================
; DATA SECTION - Initialized data
; ============================================
section .data
    msg:        db "Hello", 0x0a
    msg_len:    equ $ - msg
 
; ============================================
; BSS SECTION - Uninitialized data (buffers)
; ============================================
section .bss
    buffer:     resb 1024
 
; ============================================
; TEXT SECTION - Code (executable)
; ============================================
section .text
global _start
 
_start:
    ; Program entry point
    mov rax, 1             ; write syscall
    mov rdi, 1             ; fd = stdout
    mov rsi, msg           ; buffer = msg
    mov rdx, msg_len       ; count = length
    syscall
 
    ; Exit cleanly
    mov rax, 60
    mov rdi, 0
    syscall

Global Symbols & Labels

In assembly, you can define:

  • Labels: Mark positions in code (used for jumps)
  • Global symbols: Mark entry points that external code can jump to
ASM
global _start            ; _start is accessible from outside
 
section .text
 
_start:                 ; Program entry (global symbol)
    call my_function
 
    mov rax, 60             ; exit syscall
    mov rdi, 0              ; Exit status 0
    syscall
 
my_function:            ; Local label (not global)
    mov rax, 1
    ret
 
loop_start:             ; Another label
    add rcx, 1
    cmp rcx, 10
    jl loop_start             ; Jump back to label

Symbol Definition with EQU

Use EQU to define constants:

ASM
section .data
    msg:        db "Hello World", 0x0a
    msg_len:    equ $ - msg    ; $ = current position, msg_len = length
 
section .text
global _start
BUFFER_SIZE equ 256
 
_start:
    sub rsp, BUFFER_SIZE   ; Allocate space on stack

Key insight: $ - msg calculates the distance between current position and msg, giving the string length.

Assembler, Compiler, Linker & ELF Format

To run an assembly program, you need to convert it from assembly language to machine code. This involves the assembler, linker, and understanding the ELF binary format.

The Compilation Pipeline

Step 1: Assembly → Machine Code (NASM)

NASM (Netwide Assembler) converts assembly source code to machine code object files.

BASH
nasm -f elf64 program.asm -o program.o

What happens:

  • NASM parses your .asm file
  • Converts each instruction to machine code bytes
  • Produces program.o (object file)

Step 2: Linking (LD)

The linker (ld) combines object files and resolves all symbol references to create the final executable.

BASH
ld program.o -o program

Complete Assembly Workflow Example

BASH
# Step 1: Create assembly file
cat > program.asm << 'EOF'
section .text
global _start
_start:
    mov rax, 60
    mov rdi, 0
    syscall
EOF
 
# Step 2: Assemble (ASM → Object)
nasm -f elf64 program.asm -o program.o
 
# Step 3: Link (Object → Executable)
ld program.o -o program
 
# Step 4: Run
./program
echo $?  # Exit code: 0 (success)

Understanding ELF64 Format

ELF (Executable and Linkable Format) is the standard binary format for Linux. All executables, libraries, and object files use this format.

BASH
$ file program
program: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

Meaning of each part:

Term | Meaning
64-bit | 64-bit architecture (x86-64)
LSB | Little-Endian Byte Order (least significant byte first)
executable | Can be directly run as a program
x86-64 | Intel x86-64 instruction set
statically linked | All libraries compiled in (no external .so dependencies)
not stripped | Symbol table intact (function names visible)
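
Most of those fields come straight from the first 20 bytes of the ELF header. The following Python sketch decodes them by hand; `describe_elf` is a hypothetical helper, the lookup tables cover only a few common values, and the sample header is synthetic rather than read from a real file.

```python
import struct

ELF_CLASS = {1: "32-bit", 2: "64-bit"}                     # e_ident[EI_CLASS]
ELF_DATA = {1: "LSB", 2: "MSB"}                            # e_ident[EI_DATA]
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object / PIE"}
ELF_MACHINE = {3: "Intel 80386", 62: "x86-64"}

def describe_elf(header):
    """Decode class, byte order, type and machine from an ELF header prefix."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "class": ELF_CLASS.get(ei_class, "unknown"),
        "data": ELF_DATA.get(ei_data, "unknown"),
        "type": ELF_TYPE.get(e_type, "unknown"),
        "machine": ELF_MACHINE.get(e_machine, "unknown"),
    }

# Synthetic header: magic, 64-bit, little-endian, version 1, padding, ET_EXEC, EM_X86_64
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 62)
print(describe_elf(sample))
```

To try it on a real binary, pass the first 20 bytes of the file (opened in binary mode) to `describe_elf`.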

Viewing ELF Structure with readelf & objdump

📌 readelf - Display ELF Information

readelf displays detailed information about ELF files.

BASH
readelf -h program        # Show ELF header
readelf -S program        # Show sections
readelf -l program        # Show program headers
readelf -s program        # Show symbol table
readelf -d program        # Show dynamic section

📌 objdump - Disassemble Binaries

objdump displays disassembled machine code and detailed binary information.

BASH
objdump -d program               # Disassemble all code
objdump -M intel -d program      # Disassemble in Intel syntax
objdump -s program               # Show all sections (hex dump)
objdump -t program               # Show symbol table
objdump -h program               # Show section headers

Reverse Engineering with GDB

GDB (GNU Debugger) is the industry-standard debugger for Linux. It lets you execute programs step-by-step, inspect memory and registers, and understand exactly what code is doing.

GDB Installation

BASH
# Ubuntu/Debian
sudo apt-get install gdb
 
# Fedora/RHEL
sudo dnf install gdb
 
# macOS
brew install gdb

Starting GDB

BASH
gdb ./program
gdb -q ./program        # Quiet mode (no banner)

Essential GDB Commands

📌 run - Execute the Program

run [args] starts the program with optional command-line arguments.

ASM
(gdb) run
(gdb) run arg1 arg2     ; Pass arguments
(gdb) run < input.txt   ; Redirect stdin

📌 break - Set Breakpoints

break location sets a breakpoint, pausing execution at that point.

ASM
(gdb) break main           ; Break at function main
(gdb) break *0x08048400 ; Break at address (the * is required for a raw address)
(gdb) info break        ; List all breakpoints
(gdb) delete 1          ; Delete breakpoint 1
(gdb) disable 1         ; Disable (don't remove) breakpoint 1
(gdb) enable 1          ; Re-enable breakpoint 1

📌 continue - Resume Execution

continue (or c) resumes execution until the next breakpoint.

ASM
(gdb) continue
(gdb) c                 ; Shorthand

📌 info functions - List Functions

info functions shows all functions in the binary.

ASM
(gdb) info functions
All defined functions:
 
File program.c:
void check_password(char*);
int main();
void hidden_function();

Why useful: Quickly lists every function GDB has symbols for. In a stripped binary only the imported (dynamic) symbols remain, so expect a much shorter list there.

📌 info registers - Show Register Values

info registers (or i r) displays current register values.

ASM
(gdb) info registers
rax            0x1                 1
rbx            0x0                 0
rcx            0x7ffffffde7f8      140737488281592
rdx            0x7ffffffde8f8      140737488282360
rsi            0x7ffffffde8e8      140737488282344
rdi            0x1                 1
rbp            0x7ffffffde820      0x7ffffffde820
rsp            0x7ffffffde800      0x7ffffffde800
rip            0x401000            0x401000 <_start>
ASM
(gdb) print $rax         ; Print RAX in decimal
(gdb) print/x $rax    ; Print RAX in hex
(gdb) print/d $rax    ; Print RAX in decimal
(gdb) x/s $rsi        ; Print the string RSI points to

📌 disassemble - Show Assembly Code

disassemble function shows assembly code for a function.

ASM
(gdb) disassemble main    ; Disassemble main function
(gdb) disassemble      ; Disassemble current function
(gdb) disassemble 0x401000,0x401050  ; Range of addresses (start,end)

📌 x - Examine Memory

x [/format] address displays memory at an address.

ASM
(gdb) x/10x $rsp         ; View 10 hex values at RSP
(gdb) x/10w $rbp         ; View 10 words (4 bytes) at RBP
(gdb) x/20b $rax         ; View 20 bytes at RAX
(gdb) x/s $rsi           ; View string at RSI
(gdb) x/i $rip           ; View instruction at RIP
Format | Display As
x | Hex
d | Decimal (signed)
u | Unsigned decimal
s | String
i | Instruction
c | Character
o | Octal
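
When reading x output, remember that multi-byte units are decoded little-endian. The sketch below mimics what a command like x/2xw does with raw memory bytes; `examine_words` is a hypothetical helper and the memory contents are made up for illustration.

```python
import struct

def examine_words(memory, count):
    """Mimic `x/<count>xw`: decode little-endian 4-byte words and show them as hex."""
    words = struct.unpack_from("<%dI" % count, memory)
    return ["0x%08x" % w for w in words]

# Eight raw bytes as they sit in memory, lowest address first
stack = bytes.fromhex("efbeadde78563412")
print(examine_words(stack, 2))   # ['0xdeadbeef', '0x12345678']
```

The same eight bytes viewed with the b (byte) size would appear in memory order: ef be ad de 78 56 34 12.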

📌 si/ni - Step Instructions

si (step into) executes one instruction, stepping into function calls.

ni (next instruction) executes one instruction, stepping over function calls.

ASM
(gdb) si                  ; Step into next instruction
(gdb) si 5              ; Step 5 times
(gdb) ni                ; Next instruction (over calls)
(gdb) step              ; Source-level step into
(gdb) next              ; Source-level step over (next source line)

📌 set $register = value - Modify Registers

set $register = value modifies register values during debugging.

ASM
(gdb) set $rax = 100      ; Set RAX to 100
(gdb) set $rdi = 0     ; Set RDI to 0
(gdb) info registers   ; Verify changes

Why useful: Bypass password checks, change comparison results, test alternative code paths.

Complete GDB Debugging Walkthrough

Let's debug a real binary step-by-step:

ASM
$ gdb -q ./crackme
(gdb) set disassembly-flavor intel   ; Use Intel syntax
(gdb) info functions                 ; List all functions
(gdb) break main                     ; Set breakpoint at main
Breakpoint 1 at 0x0010149a
(gdb) run secret123                  ; Run with password argument
 
(gdb) disassemble main               ; View main function code
(gdb) si                             ; Execute one instruction
(gdb) info registers                 ; Check all registers
(gdb) print $rdi                     ; argc (1st argument of main)
(gdb) x/s *((char **)$rsi + 1)       ; argv[1], the password we passed (RSI = argv)
(gdb) continue                       ; Run to next breakpoint
(gdb) quit                           ; Exit GDB

pwndbg - Enhanced GDB

pwndbg is an awesome GDB plugin that adds powerful reverse engineering features.

BASH
git clone https://github.com/pwndbg/pwndbg
cd pwndbg
./setup.sh

pwndbg enhancements:

  • Better disassembly display (syntax highlighting)
  • Visual stack and register display
  • Memory map view
  • Additional commands: nearpc, telescope, vmmap

Radare2 - Advanced Binary Analysis

Radare2 is a powerful, open-source framework for reverse engineering and analyzing binaries. It combines static analysis, dynamic analysis, and visualization in one tool.

Radare2 Installation

BASH
# Linux
git clone https://github.com/radareorg/radare2
cd radare2
sys/install.sh
 
# Or via package manager
sudo apt-get install radare2

Launching Radare2

BASH
r2 ./binary           # Open binary for analysis
r2 -w ./binary        # Write mode (can modify binary)

Essential Radare2 Commands

📌 aaa - Analyze All

aaa performs full analysis on the binary — finds functions, data, and creates control flow graphs.

ASM
[0x08048400]> aaa    ; Full analysis
[0x08048400]> afl     ; List all functions (use after aaa)

📌 afl - List Functions

afl lists all discovered functions with addresses.

ASM
[0x08048400]> afl
0x08048400  1  42   entry0
0x08048432  1  37   sym.main
0x08048460  1  52   sym.check_password
0x08048495  1  25   sym.print_success

📌 pdf - Print Disassembly of Function

pdf @address prints disassembled function at address.

ASM
[0x08048400]> pdf @ sym.main    ; Disassemble main
[0x08048400]> pdf                           ; Disassemble current function

📌 db - Set Breakpoints (Debug Mode)

db sets a breakpoint. The d commands only work in debug mode, which you enter by starting radare2 with -d (r2 -d ./binary).

ASM
[0x08048400]> db main           ; Set breakpoint at main
[0x08048400]> dc               ; Continue execution
[0x08048400]> dr               ; Show registers
[0x08048400]> ds               ; Step instruction

📌 dc - Continue Execution

dc continues binary execution until breakpoint.

📌 V - Visual Mode

V opens visual/interactive mode with graphical display.

ASM
[0x08048400]> V           ; Enter visual mode
                        ; Inside visual mode:
p                       ; Change view mode
j/k                     ; Move down/up
q                       ; Quit visual mode

📌 VV - Graph Mode

VV shows control flow graph in visual mode.

ASM
[0x08048400]> VV          ; Enter graph mode
h/j/k/l                 ; Scroll the graph
Tab                     ; Select next block
t / f                   ; Follow the true / false branch
u                       ; Undo seek (go back)
q                       ; Exit

📌 iz - Show Strings

iz lists all strings found in the binary.

ASM
[0x08048400]> iz
Strings (abbreviated: address, length, text)
0x08049f00 14 Wrong password
0x08049f0f 14 Access granted
0x08049f1e 15 Enter password:

Complete Radare2 Workflow

BASH
$ r2 ./crackme
[0x08048400]> aaa                ; Analyze everything
[0x08048400]> afl                ; List functions
[0x08048400]> iz                 ; Show strings
[0x08048400]> pdf @ sym.main     ; View main function
[0x08048400]> V                  ; Visual mode to explore

Static Analysis - Professional Tools

Static analysis means examining a binary without running it. You analyze code structure, disassembly, and data flow to understand what a program does. Professional tools like Ghidra, IDA Pro, and Binary Ninja dominate this space.

Professional Static Analysis Tools

📌 Ghidra - NSA Open-Source Reverse Engineering

Ghidra is the free, open-source reverse engineering tool from the NSA. It's powerful enough to compete with commercial tools.

BASH
# Launch Ghidra
ghidraRun
 
# Then:
1. File → New Project
2. File → Import File → select binary
3. Double-click binary to open
4. Let it analyze (auto analyze runs)
5. Window → Functions to see all functions
6. Double-click function to decompile

The default CodeBrowser layout shows:

  • Left: Program tree, symbol tree (functions), and data types
  • Center: Listing (assembly code)
  • Right: Decompiled C-like code
  • Bottom: Console

📌 IDA Pro - Industry Standard

IDA Pro is the gold standard in reverse engineering. Used by security researchers worldwide.

📌 Binary Ninja - Modern Alternative

Binary Ninja is a modern reverse engineering platform with excellent Python API and collaborative features.

Command-Line Static Analysis Tools

Essential tools for quick binary inspection and analysis:

📌 file - Identify File Type

file determines file type by examining magic bytes and file structure.

BASH
file ./program                    # Identify binary type
file -i ./program                 # Show MIME type
file -b ./program                 # Brief mode (no filename)
file * | grep ELF                 # Find all ELF files in directory

Example output:

ASM
program: ELF 64-bit LSB executable, x86-64, dynamically linked, stripped

When to use: First step in binary analysis to understand architecture, linking type, and whether symbols are present.

📌 strings - Extract Printable Strings

strings extracts human-readable text from binary files, useful for finding hardcoded credentials, URLs, error messages, and function names.

BASH
strings ./program                    # Extract ASCII strings (default min length: 4)
strings -n 10 ./program              # Minimum string length 10
strings -a -t x ./program            # Scan the whole file, show hex offset of each string
strings -e l ./program               # 16-bit little-endian strings (UTF-16LE)
strings ./program | grep -i password # Search for specific strings
strings ./program | grep "^/"        # Find file paths

When to use: First step in binary analysis to quickly identify interesting text, function names, library paths, or hardcoded secrets.

Pro tip: Combine with grep to search for URLs, IP addresses, API keys, or specific keywords.
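
To see what strings is doing under the hood, here is a rough Python approximation: it scans for runs of printable ASCII of at least a minimum length and reports each run with its offset. `extract_strings` is a hypothetical helper, and the real tool has more options and slightly different rules about which bytes count as printable.

```python
import re

def extract_strings(data, min_len=4):
    """Return (offset, text) for every run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

# Synthetic "binary": text fragments separated by non-printable bytes
blob = b"\x00\x01Enter password:\x00\xff\xfeabc\x00Access granted\x00"
for offset, text in extract_strings(blob):
    print(hex(offset), text)
# "abc" is skipped because it is shorter than the 4-byte minimum
```

The offsets correspond to what strings -t x prints, which is handy when you want to jump to the same spot in a hex editor.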

📌 hexedit / hexdump / xxd - Hex Editors & Viewers

Hex editors allow viewing and modifying binary files at the byte level.

BASH
# hexdump - View hex representation
hexdump -C ./program | head        # Canonical hex+ASCII view
hexdump -C ./program | grep "ELF"  # Find ELF magic bytes
 
# xxd - Hex dump tool
xxd ./program | head               # Hex dump with ASCII
xxd -l 100 ./program               # First 100 bytes only
 
# hexedit - Terminal hex editor (interactive)
hexedit ./program                  # Edit binary files

When to use: Examine file headers, find magic bytes, analyze packed/obfuscated binaries, or patch binaries directly.
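
The layout these viewers print is easy to reproduce, which helps when reading their output. This Python sketch formats one 16-byte row in a style similar to hexdump -C (simplified: the real tool adds an extra gap after the eighth byte); `hexdump_line` is a hypothetical helper and the header bytes are synthetic.

```python
def hexdump_line(data, offset=0):
    """Format up to 16 bytes as: offset, hex bytes, printable-ASCII column."""
    chunk = data[offset:offset + 16]
    hex_part = " ".join("%02x" % b for b in chunk)
    # Non-printable bytes are shown as '.', as hex viewers do
    ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
    return "%08x  %-47s  |%s|" % (offset, hex_part, ascii_part)

# First 16 bytes of a typical 64-bit little-endian ELF file
header = b"\x7fELF\x02\x01\x01\x00" + bytes(8)
line = hexdump_line(header)
print(line)
```

The `.ELF` at the start of the ASCII column is the magic number that the file command looks for.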

📌 nm - List Symbols

nm lists symbols from object files and libraries. On a stripped binary it reports "no symbols"; nm -D still lists the dynamic symbols.

BASH
nm ./program                    # List all symbols
nm -D ./program                 # List dynamic symbols only
nm -g ./program                 # List external symbols
nm -C ./program                 # Demangle C++ symbols
nm -A *.o                       # List symbols from all object files

Symbol types:

  • T: Text section (code)
  • D: Initialized data
  • B: Uninitialized data (BSS)
  • U: Undefined (external reference)

When to use: Check if binary is stripped, identify imported/exported functions, or find specific symbols.

📌 ldd - Print Shared Library Dependencies

ldd prints shared libraries required by a dynamically linked binary.

BASH
ldd ./program                   # Show library dependencies
ldd -v ./program                # Verbose (version information)
ldd -r ./program                # Report missing symbols

Example output:

ASM
linux-vdso.so.1 => (0x00007fff...)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
/lib64/ld-linux-x86-64.so.2 (0x00007f...)

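When to use: Understand binary dependencies, troubleshoot missing libraries, or identify which libc version is required. For untrusted binaries prefer objdump -p ./program | grep NEEDED, because ldd may execute code from the binary while resolving its dependencies.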

📌 readelf - Display ELF File Information

readelf displays detailed information about ELF files (covered in assembler section, but worth repeating here).

BASH
readelf -h ./program           # Show ELF header
readelf -S ./program           # Show section headers
readelf -l ./program           # Show program headers (segments)
readelf -s ./program           # Show symbol table
readelf -d ./program           # Show dynamic section
readelf -r ./program           # Show relocations
readelf -n ./program           # Show notes (build ID, etc.)

When to use: Deep dive into ELF structure, find entry points, analyze security features (NX, PIE, RELRO), or debug linking issues.

📌 objdump - Object File Dumper & Disassembler

objdump is GNU's swiss-army knife for binary analysis and disassembly.

BASH
objdump -d ./program               # Disassemble executable sections
objdump -M intel -d ./program      # Disassemble in Intel syntax
objdump -D ./program               # Disassemble ALL sections
objdump -s ./program               # Full hex dump of all sections
objdump -t ./program               # Symbol table
objdump -T ./program               # Dynamic symbol table
objdump -h ./program               # Section headers
objdump -p ./program               # Program headers
objdump -R ./program               # Dynamic relocations

When to use: Quick disassembly, examine specific sections, or verify compiler output.

📌 radare2 / r2 - Reverse Engineering Framework

radare2 is covered in detail in its own section, but deserves mention here as a powerful command-line static analysis tool.

BASH
r2 -A ./program            # Auto-analyze on load
r2 -c "aaa; pdf @ main" ./program  # Analyze and disassemble main

See the Radare2 section for comprehensive commands and usage.

📌 checksec - Check Binary Security Properties

checksec checks security features enabled in a binary (RELRO, Stack Canary, NX, PIE, RPATH, RUNPATH).

BASH
# Install checksec
sudo apt-get install checksec   # Debian/Ubuntu
wget https://github.com/slimm609/checksec.sh/raw/master/checksec && chmod +x checksec
 
# Check single binary
checksec --file=./program
 
# Check all binaries in directory
checksec --dir=/bin

Security features explained:

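  • RELRO (Relocation Read-Only): Makes the GOT read-only after relocation (the whole GOT only with Full RELRO)
  • Stack Canary: Detects stack buffer overflows before a function returns
  • NX (No Execute): Marks stack/heap non-executable
  • PIE (Position Independent Executable): Lets ASLR randomize the load address of the executable itself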

When to use: Assess exploit difficulty, verify compiler flags, or check if binary was compiled with security hardening.

📌 binwalk - Firmware Analysis Tool

binwalk analyzes, extracts, and reverse engineers firmware images and embedded files.

BASH
binwalk firmware.bin            # Scan for embedded files/filesystems
binwalk -e firmware.bin         # Extract embedded files
binwalk -E firmware.bin         # Entropy analysis (detect encryption/compression)
binwalk -A firmware.bin         # Scan for executable code

When to use: Analyze firmware images, extract embedded file systems (squashfs, cramfs), or identify packed/encrypted sections.
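
Entropy analysis rests on a simple measurement: Shannon entropy in bits per byte, where values near 8 suggest compressed or encrypted data and low values suggest padding or plain text. A minimal sketch, assuming whole-buffer entropy rather than the sliding-window plot a firmware tool would draw (`shannon_entropy` is a hypothetical helper):

```python
import math
import os
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

padding = bytes(4096)                        # all zero bytes
text = b"Incorrect password\n" * 200         # repetitive ASCII
random_like = os.urandom(4096)               # stands in for encrypted/compressed data

for name, blob in [("padding", padding), ("text", text), ("random", random_like)]:
    print(name, round(shannon_entropy(blob), 2))
```

Running this over fixed-size windows of a file and plotting the results gives a picture much like an entropy graph: flat high plateaus are the packed or encrypted regions.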

📌 exiftool - Extract Metadata

exiftool reads and writes metadata in files. Useful for forensics and identifying compilation details.

BASH
exiftool ./program              # Extract all metadata
exiftool -time:all ./program    # Show timestamps
exiftool -Binary ./program      # Output tag values in binary format (same as -b)

When to use: Find compilation timestamps, compiler versions, or embedded metadata that may reveal development environment.

📌 ltrace - Library Call Tracer (Static Context)

ltrace is primarily for dynamic analysis (covered in Dynamic Analysis section), but can reveal which library functions a binary uses.

See Dynamic Analysis section for comprehensive ltrace usage.

📌 strace - System Call Tracer (Static Context)

strace is primarily for dynamic analysis (covered in Dynamic Analysis section), but understanding syscall usage is part of static analysis.

See Dynamic Analysis section for comprehensive strace usage.

Advanced Static Analysis Tools

📌 Cutter - GUI for Radare2

Cutter provides a modern Qt-based GUI with decompilation support. It started as a radare2 front end and is now built on Rizin, a radare2 fork.

BASH
# Download from https://cutter.re
sudo apt-get install cutter    # Ubuntu 20.04+

Features: Graphical control flow, decompiler (Ghidra plugin), hex editor, debugger integration.

When to use: Free, visual alternative to IDA/Ghidra built on a radare2-family backend.

📌 Hopper - macOS/Linux Disassembler

Hopper is a commercial reverse engineering tool for macOS and Linux.

Price: ~$100 (personal), cheaper than IDA Pro.

Features: Disassembler, pseudo-code decompiler, Python scripting, x86/ARM/MIPS support.

When to use: Professional alternative to IDA at lower cost, especially on macOS.

String Analysis & Pattern Matching

Beyond basic string extraction, pattern analysis helps identify functionality:

BASH
strings ./binary | grep -Eo "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"  # Find emails
strings ./binary | grep -Eo "https?://[^[:space:]]+"                          # Find URLs
strings ./binary | grep -Eo "([0-9]{1,3}\.){3}[0-9]{1,3}"                     # Find IPv4-looking values
strings ./binary | grep -iE "key|password|secret|token|api"                   # Find credentials

Why strings are useful:

  • Often reveal hardcoded passwords or API keys
  • Show error messages that hint at program logic
  • Identify libraries and functions
  • Quickly find interesting areas to analyze
  • Discover hidden features or debug messages
  • Identify encryption algorithms by string constants
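
The same pattern matching can be scripted directly over the raw bytes, which avoids a separate strings pass. A minimal Python sketch, assuming ASCII-encoded indicators; `find_indicators` is a hypothetical helper and the sample blob is synthetic.

```python
import re

# Restrict matches to printable, non-space bytes so a match stops at binary data
URL_RE = re.compile(rb"https?://[\x21-\x7e]+")
IPV4_RE = re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_indicators(data):
    """Return URLs and plausible IPv4 addresses found in a byte string."""
    urls = [m.decode("ascii") for m in URL_RE.findall(data)]
    ips = []
    for match in IPV4_RE.findall(data):
        text = match.decode("ascii")
        # The regex alone accepts 999.1.1.1, so check each octet's range
        if all(int(part) <= 255 for part in text.split(".")):
            ips.append(text)
    return {"urls": urls, "ips": ips}

blob = (b"\x00GET http://example.com/update\x00"
        b"connect 192.168.1.10\x00" + b"999.1.1.1\x00")
print(find_indicators(blob))
```

Version strings such as 1.2.3.4 will also match the IPv4 pattern, so treat the results as leads to verify, not as confirmed network indicators.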

Control Flow Analysis

Understanding how code branches and jumps helps identify:

  • Conditional logic: If/else patterns in assembly
  • Function calls: External dependencies
  • Dead code: Unreachable branches

All professional tools (Ghidra, IDA, Binary Ninja) show control flow graphs that visualize this.

Dynamic Analysis - Runtime Behavior

Dynamic analysis means running the binary in a controlled environment while monitoring its behavior. Watch system calls, library calls, memory modifications, and network traffic to understand what code actually does.

System Call Tracing with strace

strace intercepts and logs all system calls made by a process.

BASH
strace ./program                        # Trace all syscalls
strace -e trace=openat,read ./program   # Trace specific syscalls (modern glibc opens files with openat)
strace -o trace.txt ./program           # Save to file
strace -c ./program                     # Summary (count syscalls)
strace -p 1234                          # Attach to running process

What strace reveals:

  • Files being read/written
  • Network connections (socket, connect syscalls)
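  • Programs being executed and the arguments passed to them (execve); getenv is a library call, so reading an environment variable does not appear here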
  • Memory mappings
  • Signal handling
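
Traces get long quickly, so it helps to post-process a saved log. The sketch below pulls out every file the program tried to open and whether the call succeeded. `opened_files` is a hypothetical helper, and the sample lines are illustrative, written in the usual strace line format rather than captured from a real run.

```python
import re

# Matches lines such as: openat(AT_FDCWD, "/path", FLAGS) = 3
OPEN_RE = re.compile(r'^(?:open|openat)\((?:AT_FDCWD, )?"([^"]+)", [^)]*\)\s*= (-?\d+)')

def opened_files(trace_text):
    """Return (path, succeeded) for each open/openat line in strace output."""
    results = []
    for line in trace_text.splitlines():
        match = OPEN_RE.match(line)
        if match:
            path, retval = match.group(1), int(match.group(2))
            results.append((path, retval >= 0))   # negative return value = error
    return results

sample_trace = "\n".join([
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'read(3, "...", 832) = 832',
    'openat(AT_FDCWD, "/tmp/license.key", O_RDONLY) = -1 ENOENT (No such file or directory)',
])
for path, ok in opened_files(sample_trace):
    print(path, "opened" if ok else "FAILED")
```

Failed opens are often the most interesting lines: a missing license or config file tells you exactly what the program expects to find.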

Library Call Tracing with ltrace

ltrace traces library function calls (libc, libcrypto, etc.).

BASH
ltrace ./program                    # Trace library calls
ltrace -c ./program                 # Summary (count function calls)
ltrace -o trace.txt ./program       # Save to file
ltrace -e strcmp ./program          # Trace specific functions

Useful library functions to trace:

  • strcmp: String comparison (password checks)
  • malloc/free: Memory allocation
  • printf: Output (what's being printed)
  • getenv: Environment variable access

Combined strace + ltrace

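Use both for a complete picture. A process can only have one tracer at a time, so run the tools separately or let ltrace show system calls as well with -S:

BASH
ltrace -S ./program                 # Library calls and syscalls together (slower)
strace -e trace=file ./program      # Focus on file operations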

Advanced Dynamic Analysis - Frida

Frida is a powerful instrumentation framework. Inject code into running processes to hook functions and modify behavior in real-time.

BASH
# Install
pip install frida frida-tools
 
# List processes
frida-ps
 
# Attach to process
frida -p 1234
 
# Spawn a program under Frida (-f spawns a file; -n attaches to a running process by name)
frida -f ./program

Frida capabilities:

  • Hook any function (intercept and modify behavior)
  • Read/write process memory
  • Dump arguments and return values
  • Modify program flow in real-time
  • Works on binaries you don't have source for

Analyzing Stripped Binaries

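A stripped binary has its symbol table and debug information removed: local function names, variable names, and type information are gone, although the dynamic symbols needed for linking (imports and exports) remain. This makes reverse engineering harder but not impossible.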

Identifying Stripped Binaries

BASH
file ./program
# The output ends with one of:
#   not stripped   (has symbols)
#   stripped       (symbols removed)
 
file -i ./program               # MIME type info
readelf -S ./program            # Show sections (.symtab is missing if stripped)
nm ./program                    # Prints "no symbols" if stripped
objdump -t ./program            # Symbol table

Techniques for Stripped Binaries

📌 Function Identification via Signatures

Even without names, you can identify common library functions by their machine code patterns.

How it works: Statically linked library functions (strlen, malloc, etc.) compile to the same byte patterns in every binary built against the same library version. Tools match these patterns against a signature database and name the functions automatically.

ASM
In Ghidra:
1. Tools → Function ID → Attach existing FidDb... (load a signature database)
2. Analysis → One Shot → Function ID
3. Matching library functions are renamed automatically
Many libc functions automatically named

📌 Heuristic Analysis - Entry Points

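Without symbols, look for patterns that reveal function boundaries. Optimized builds often omit the frame pointer, so treat these as hints, not guarantees:

  • Function prologue: push rbp; mov rbp, rsp (function start)
  • Function epilogue: pop rbp; ret or leave; ret (function end)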
  • Call patterns: call followed by function prologue = new function
  • Loops: Backwards jumps to earlier code
  • Data references: Addresses that reference strings or constants
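
The prologue heuristic is simple enough to script. This Python sketch scans raw code bytes for the encoding of push rbp; mov rbp, rsp (55 48 89 e5) and reports candidate function starts. `find_prologues` and the base address are illustrative assumptions, and the byte sequence can also occur by chance inside data or in the middle of another instruction.

```python
PROLOGUE = bytes.fromhex("554889e5")   # push rbp; mov rbp, rsp

def find_prologues(code, base=0):
    """Return the addresses of every classic frame-pointer prologue in code."""
    hits = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return hits

# Synthetic .text contents: padding, then two tiny functions
text = (b"\x90\x90"
        + PROLOGUE + b"\x5d\xc3"       # pop rbp; ret
        + PROLOGUE + b"\xc9\xc3")      # leave; ret
print([hex(addr) for addr in find_prologues(text, base=0x401000)])
# ['0x401002', '0x401008']
```

Disassemblers combine this kind of scan with call-target analysis, which is more reliable because it does not depend on the compiler keeping a frame pointer.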

📌 Cross-referencing & String Analysis

Strings often identify function purposes:

BASH
# Step 1: Extract strings
strings ./program | grep -i error
 
# Step 2: Find where strings are referenced
In Ghidra: Search → For Strings...
Double-click string → shows code that uses it
 
# Step 3: Identify surrounding function
Look at prologue/epilogue to find function bounds
Analyze logic based on string context

📌 Machine Learning-Based Symbol Recovery

Modern research uses LLMs to recover function names from stripped binaries.

How it works: Train ML models on decompiled code patterns. Given stripped binary, model predicts likely function names and variable types.

Dynamic Analysis of Stripped Binaries

Use runtime tracing to understand behavior without symbols:

BASH
# Trace syscalls to understand behavior
strace -o syscalls.txt ./program
 
# Trace library calls
ltrace -o libcalls.txt ./program
 
# Use GDB to set breakpoints and inspect registers
gdb ./program
(gdb) break *0x401000
(gdb) run
(gdb) info registers   ; See actual values

Practical Example - Analyzing Stripped Binary

BASH
# 1. Identify if stripped
$ file ./crackme
crackme: ELF 64-bit, stripped
 
# 2. Extract strings - look for clues
$ strings ./crackme | grep -iE "password|granted"
Incorrect password
Access granted
 
# 3. Open in Ghidra
- Load Function ID signatures for the standard library
- Many stdlib functions now identified
- Search → For Strings... → find "password" references
- Double-click string to see code using it
 
# 4. Analyze the function using string as anchor
- Look at function prologue/epilogue
- Identify comparisons and jumps
- Look for password check logic
 
# 5. Use dynamic analysis if stuck
$ ltrace ./crackme
strcmp("myinput", "secretpass") = -6
puts("Incorrect password") = 19
# The second strcmp argument is the expected password

Binary Patching - Code Modification

Binary patching means modifying a binary's machine code to change its behavior. Used to bypass password checks, remove license verification, or modify logic flow.

Why Patch Binaries?

  • Bypass authentication/license checks
  • Change program behavior for analysis
  • Create custom versions without source
  • Remove anti-debugging code
  • Test vulnerability fixes

Three Patching Approaches

📌 Method 1: Hex Editor - Direct Modification

Most direct method: Use hex editor to change machine code bytes.

BASH
# Step 1: Find the instruction to patch in IDA/Ghidra
cmp eax, 0x12345
jne fail            ; This is at file offset 0x1234
 
# Step 2: Work out the replacement bytes
jne (short) = 75 05   (opcode 75 + 1-byte displacement; EB would be an unconditional jmp)
NOP         = 90
 
# Step 3: Open hex editor, go to offset 0x1234
Replace: 75 05 → 90 90 (2 NOPs to fill the space)
 
# Step 4: Save and test
./patched_binary

Key instruction to know: NOP (0x90) is "no operation". It does nothing and is a safe one-byte filler, so replacing a conditional jump with NOPs makes execution fall through past the check.

📌 Method 2: IDA/Ghidra Built-in Patching

Both IDA and Ghidra have native patching capabilities.

BASH
# In IDA hex view:
1. Right-click on byte
2. Select "Edit"
3. Type new hex values
4. Right-click → "Apply changes"
 
# Save the patch:
File → Produce file → Create DIF file (diff/patch file)
 
# In Ghidra:
1. Window → Bytes
2. Enable editing (pencil icon), then select the byte
3. Type replacement values
4. File → Export Program → Binary
 
# Now you have a modified binary
📌 Method 3: Assembly Modification + Reassemble

For more complex changes, write assembly, assemble it, patch in.

BASH
# Advanced patching: replace a function
# Step 1: Identify function to replace (address 0x401000, 50 bytes)
 
# Step 2: Write replacement assembly (replacement.asm)
BITS 64         ; nasm -f bin assumes 16-bit code without this
mov eax, 1      ; Return 1 (success); writing EAX zero-extends into RAX
ret
 
# Step 3: Assemble it
nasm -f bin replacement.asm -o replacement.bin
hexdump -C replacement.bin
Output: b8 01 00 00 00 c3 (6 bytes)
 
# Step 4: Pad with NOPs to match original size (50 bytes)
Need 50 bytes total, have 6, so add 44 NOPs (0x90)
 
# Step 5: Patch hex in original binary at the file offset that corresponds to address 0x401000
hex editor: Go to that offset, replace with new bytes
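
The padding step is easy to get wrong by one byte, so it is worth scripting. A minimal Python sketch, assuming the replacement is the 6-byte encoding of mov eax, 1; ret (b8 01 00 00 00 c3) and the original function is 50 bytes; `pad_with_nops` is a hypothetical helper.

```python
NOP = b"\x90"

def pad_with_nops(code, size):
    """Pad replacement code with NOPs so it exactly fills the original function."""
    if len(code) > size:
        raise ValueError("replacement (%d bytes) does not fit in %d bytes" % (len(code), size))
    return code + NOP * (size - len(code))

replacement = bytes.fromhex("b801000000c3")   # mov eax, 1; ret
padded = pad_with_nops(replacement, 50)

print(len(padded))             # 50
print(padded[:8].hex(" "))     # b8 01 00 00 00 c3 90 90
```

Because the code ends with ret, the padding is never executed; filling the remainder simply keeps every following function at its original offset.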
                    
                
            
 
Real-World Patching Example

BASH
# Complete patching workflow: bypass password
# Binary: crackme - asks for password
$ ./crackme
Enter password: test
Incorrect!
 
# Step 1: Open in Ghidra, find password check
0x401234: mov rax, [rip + 0x2dc6]  ; Load input
0x40123b: mov rbx, [rip + 0x2dc5]  ; Load expected password
0x401242: cmp rax, rbx              ; Compare
0x401245: jne 0x401260              ; Jump to fail if not equal
0x401247: call print_success        ; Otherwise print success
 
# Step 2: We want to skip the jne (jump to fail)
# Option A: Replace jne with NOPs
jne opcode at 0x401245: 75 19 (2 bytes)
Replace with: 90 90 (2 NOPs)
 
# Step 3: Use hex editor to patch
Convert address 0x401245 to a file offset (address - segment address + segment file offset, from readelf -l)
Find bytes: 75 19
Replace with: 90 90
Save file
 
# Step 4: Test
$ ./crackme_patched
Enter password: anything
Success!
# Password check bypassed! Any input works now
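
The hex-editor step can also be scripted, which makes the patch repeatable and lets you verify the original bytes first. In this Python sketch the segment values (virtual address 0x401000 mapped from file offset 0x1000) are assumptions for illustration; on a real binary read them from the LOAD entry in readelf -l. `va_to_offset` and `patch` are hypothetical helpers, and the image is a synthetic buffer rather than a real file.

```python
def va_to_offset(va, seg_vaddr, seg_offset):
    """Translate a virtual address into a file offset within one LOAD segment."""
    return va - seg_vaddr + seg_offset

def patch(image, offset, expected, replacement):
    """Overwrite bytes in place, but only if the expected original bytes are there."""
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the file size")
    if bytes(image[offset:offset + len(expected)]) != expected:
        raise ValueError("unexpected bytes at offset 0x%x" % offset)
    image[offset:offset + len(replacement)] = replacement

# Synthetic file image with the jne (75 19) placed where the example expects it
offset = va_to_offset(0x401245, seg_vaddr=0x401000, seg_offset=0x1000)
image = bytearray(0x2000)
image[offset:offset + 2] = b"\x75\x19"

patch(image, offset, expected=b"\x75\x19", replacement=b"\x90\x90")
print(hex(offset), image[offset:offset + 2].hex(" "))   # 0x1245 90 90
```

For a real file, read it into a bytearray, apply `patch`, and write the result to a new file so the original stays untouched.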
            
 
Common Patching Targets

What to Patch | Pattern | Replacement
Password check | cmp; jne failure | Replace jne with NOPs
License validation | call validate_license; jne fail | NOP out the jne
Anti-debug | call is_debugged; jne exit | Make function return 0
Trial expiration | cmp rax, expiration_date | Change expiration_date value
Error message | lea rdi, [rip + error_str] | Change string pointer/content

✓ Binary Patching Mastered! You can modify binaries to change behavior, bypass checks, and test modifications.

Anti-Reversing Techniques & Bypasses

Software developers implement anti-reversing techniques to protect intellectual property and prevent cracking. Understanding these techniques helps you bypass them and analyze protected binaries.

Common Anti-Reversing Techniques

📌 Anti-Debugging - Detect Debuggers

Anti-debug code detects if a debugger is attached and terminates or behaves differently.

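C
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>

int main(void) {
    /* A process can have only one tracer, so this call fails under a debugger */
    if (ptrace(PTRACE_TRACEME, 0, 1, 0) == -1) {
        printf("Debugger detected! Exiting.\n");
        exit(1);
    }
    // Program continues if not debugged
    return 0;
}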
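
A second common Linux check reads the TracerPid field of /proc/self/status, which is non-zero while a debugger is attached. Knowing it helps you recognize the pattern when a binary opens that file. The sketch below shows the logic in Python; `tracer_pid` and `being_traced` are hypothetical helper names and the sample status text is illustrative.

```python
def tracer_pid(status_text):
    """Extract the TracerPid value from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1])
    return 0

def being_traced(status_path="/proc/self/status"):
    """True if some process (debugger, strace, ...) is tracing this one. Linux only."""
    with open(status_path) as f:
        return tracer_pid(f.read()) != 0

# Illustrative excerpt of a status file for a process running under a debugger
sample = "Name:\tcrackme\nState:\tt (tracing stop)\nTracerPid:\t4242\nUid:\t1000\n"
print(tracer_pid(sample))   # 4242
```

When you see this check in a target, the usual bypasses are to patch the comparison or to hook the file read so the program sees TracerPid: 0.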

📌 Code Obfuscation - Hide Logic

Code obfuscation makes code hard to understand without changing functionality.

PSEUDOCODE
ORIGINAL:
if (x > 10)
    print("big")
else
    print("small")
 
OBFUSCATED:
a = random(1, 3)
if (a == 1)
    if (x > 10) print("big") else print("small")
else if (a == 2)
    if (x <= 10) print("small") else print("big")
else if (a == 3) ...
# Every branch does the same thing: same logic, much harder to follow!
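
One widely used obfuscation building block is the opaque predicate: a condition whose result is fixed, but not obviously so to someone reading the code. The sketch below is an illustrative Python example, not taken from any particular obfuscator; the function names and the "medium" dead branch are invented for the demo.

```python
def classify(x: int) -> str:
    """Original, readable logic."""
    return "big" if x > 10 else "small"


def classify_obfuscated(x: int) -> str:
    """Same behaviour, hidden behind an opaque predicate."""
    # x*x + x equals x*(x+1), a product of two consecutive integers,
    # so it is always even and this condition is always true.
    if (x * x + x) % 2 == 0:
        return "big" if x > 10 else "small"
    return "medium"  # dead code: never reached, only there to mislead


if __name__ == "__main__":
    for value in (3, 10, 11, 42):
        print(value, classify(value), classify_obfuscated(value))
```

When reversing, spotting that a condition can never be false lets you delete the dead branch and recover the original control flow.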

📌 Packing & Compression - Hide Code

Packers compress or encrypt the entire binary. The original code is only restored in memory at runtime, so static analysis of the file on disk sees little more than the unpacking stub.

Popular Packers
UPX: Open-source, compresses binaries
Themida: Commercial, strong obfuscation + packing
Code Virtualizer: Turns native code into VM bytecode
BASH
UPX Example
# Pack a binary
upx -9 -o program.packed ./program
 
# Detect if packed: UPX leaves a "UPX!" marker in the file
strings ./program.packed | grep 'UPX!'
 
# Or let UPX test the file (fails if it is not UPX-packed)
upx -t ./program.packed
 
# Unpack (if UPX)
upx -d ./program.packed
 
# If custom packer, must unpack manually:
#   1. Run in GDB
#   2. Find OEP (Original Entry Point)
#   3. Dump memory region
#   4. Analyze dumped binary
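
When no packer signature is present, byte entropy is a common heuristic: compressed or encrypted data looks close to random, while ordinary code and data do not. Here is a minimal sketch using only the Python standard library. The 7.0 bits-per-byte threshold is an assumed rule of thumb, not a fixed standard, and the function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 (constant data) up to 8.0 (uniformly random)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that occurs
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: high entropy suggests compression or encryption."""
    return shannon_entropy(data) >= threshold


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        blob = f.read()
    print(f"entropy = {shannon_entropy(blob):.2f} bits/byte, packed? {looks_packed(blob)}")
```

Running it over individual sections instead of the whole file gives a sharper signal, since a packed payload is often stored in one high-entropy region next to a small stub.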

📌 Anti-Tampering - Detect Modifications

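Anti-tampering code detects whether the binary on disk or its code in memory has been modified, typically by comparing a checksum or hash of the code against a stored value.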
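
A typical integrity check hashes a region of code and compares the result with a value recorded at build time. The following Python sketch shows the idea on a few raw instruction bytes. It is an illustration under assumed names (digest, is_untampered), not how any specific protector works.

```python
import hashlib


def digest(code: bytes) -> str:
    """Hash a code region; SHA-256 is an arbitrary choice for this demo."""
    return hashlib.sha256(code).hexdigest()


def is_untampered(code: bytes, expected_digest: str) -> bool:
    """The self-check: recompute the hash and compare with the stored value."""
    return digest(code) == expected_digest


# cmp rax, 10 ; jne +5   (the kind of check a cracker would patch)
original = bytes.fromhex("4883f80a7505")
expected = digest(original)  # stored inside the program at build time

# The same bytes with the jne replaced by two NOPs
patched = bytes.fromhex("4883f80a9090")

print(is_untampered(original, expected))
print(is_untampered(patched, expected))
```

This is why a simple NOP patch sometimes makes a program exit silently: the patch itself was fine, but an integrity check noticed it. The usual bypass is to patch the integrity check as well, or to update the stored digest.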

ASLR - Address Space Layout Randomization

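ASLR randomizes the addresses of the stack, heap, shared libraries, and (for PIE binaries) the executable itself on every run. This makes both exploitation and analysis harder, because addresses change between sessions.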

BASH
# Check ASLR status
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled, 1 = conservative, 2 = full
 
# Disable ASLR (requires root)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
 
# Or run single binary without ASLR
setarch $(uname -m) -R ./program
 
# In GDB (this is already GDB's default)
(gdb) set disable-randomization on
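
If you leave ASLR on, you can still relate runtime addresses to the addresses in your disassembler by subtracting the module's load base, which you can read from /proc/<pid>/maps. Here is a small sketch of that arithmetic. The sample maps line and the helper names are made up for illustration.

```python
def parse_maps_line(line: str):
    """Split one /proc/<pid>/maps line into (start, end, perms, path)."""
    fields = line.split()
    start_s, end_s = fields[0].split("-")
    path = fields[5] if len(fields) > 5 else ""  # anonymous mappings have no path
    return int(start_s, 16), int(end_s, 16), fields[1], path


def to_static_offset(runtime_addr: int, load_base: int) -> int:
    """Convert a runtime address to an offset from the module's load base."""
    return runtime_addr - load_base


# Hypothetical first mapping of a PIE binary
sample = "555555554000-555555555000 r--p 00000000 08:01 1234  /home/user/program"
start, end, perms, path = parse_maps_line(sample)

# A breakpoint hit at this runtime address...
offset = to_static_offset(0x555555555149, start)
print(hex(offset))  # ...corresponds to this offset in the file's address space
```

The offset stays the same on every run, so notes and breakpoints expressed as base plus offset survive address randomization.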

Stack Canaries

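Stack canaries detect stack buffer overflows by placing a random value between a function's local buffers and its saved return address. The function checks the value before returning and aborts if it has changed.

BASH
checksec ./program
# Output shows: "Canary found" or "No canary found"
 
# Without checksec: protected binaries reference __stack_chk_fail
readelf -s ./program | grep __stack_chk_fail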
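
To see why the canary works, it helps to model the stack frame as a flat byte array. The toy simulation below is not real stack memory. The 16-byte buffer, the frame layout, and the return address 0x401136 are assumptions chosen for the demo.

```python
import os

BUF_SIZE = 16      # size of the local buffer (assumed for the demo)
CANARY_SIZE = 8    # canary width on x86-64


def make_frame(canary: bytes, return_addr: int) -> bytearray:
    """Lay out a toy frame: [buffer][canary][saved return address]."""
    return bytearray(BUF_SIZE) + canary + return_addr.to_bytes(8, "little")


def unsafe_copy(frame: bytearray, data: bytes) -> None:
    """Copy like strcpy would: no check against BUF_SIZE."""
    frame[:len(data)] = data


def canary_intact(frame: bytearray, canary: bytes) -> bool:
    """The check a protected function performs just before returning."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) == canary


canary = b"\x00" + os.urandom(CANARY_SIZE - 1)  # glibc-style: lowest byte is zero
frame = make_frame(canary, 0x401136)

unsafe_copy(frame, b"A" * BUF_SIZE)             # fits exactly: canary untouched
fits_ok = canary_intact(frame, canary)

unsafe_copy(frame, b"A" * (BUF_SIZE + 12))      # overflow: canary overwritten
overflow_ok = canary_intact(frame, canary)

print(fits_ok, overflow_ok)  # True False
```

An overflow cannot reach the return address without first writing over the canary, which is why the check catches it. In disassembly the same check appears as a compare against a value loaded from fs:0x28, followed by a call to __stack_chk_fail.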

DEP/NX - Data Execution Prevention

DEP/NX marks data pages (such as the stack and heap) as non-executable, which prevents injected shellcode from running there.

BASH
checksec ./program
# Output shows: NX enabled/disabled
 
readelf -W -l ./program | grep GNU_STACK
# Flags RWE = no NX protection (executable stack), RW = NX enabled
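
The readelf check can also be done by hand, which is useful when scripting over many binaries. The sketch below parses the program headers of a 64-bit little-endian ELF with the standard struct module and reports the flags of the PT_GNU_STACK entry. It is a minimal illustration and deliberately ignores 32-bit and big-endian files.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type that carries the stack flags
PF_X = 0x1                 # "executable" permission bit


def stack_is_executable(elf: bytes):
    """Return True/False from PT_GNU_STACK, or None if the header is absent."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)  # entry size and count
    for i in range(e_phnum):
        # p_type and p_flags are the first two 32-bit fields of an ELF64 program header
        p_type, p_flags = struct.unpack_from("<II", elf, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print("executable stack:", stack_is_executable(f.read()))
```

A result of False corresponds to the RW flags in the readelf output, and True corresponds to RWE.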

angr - Automated Symbolic Execution

angr is a powerful binary analysis framework that uses symbolic execution to find inputs that reach specific code paths. Instead of tracing the logic by hand, you let angr explore the feasible paths and ask its constraint solver for an input that reaches the target.

What is Symbolic Execution?

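Instead of concrete values, inputs are treated as symbolic variables that stand for all possible values. Each branch adds a constraint on those variables (for example, x > 10 on the taken side and x <= 10 on the other), and a solver then produces concrete inputs that satisfy the constraints along a chosen path.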
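
To make the idea concrete before bringing in angr, here is a toy symbolic executor in plain Python. It is a deliberately tiny model, not how angr is implemented: the "program" is a hand-written tree of comparisons on one integer x, and the "solver" only intersects integer intervals, whereas angr works on real machine code and hands constraints to Z3. All names here are invented for the sketch.

```python
from typing import List, Optional, Tuple

Constraint = Tuple[str, int]  # (operator, constant) applied to the symbolic x

# The "program": each node is a result string, or
# (operator, constant, subtree_if_true, subtree_if_false).
PROGRAM = (">", 10,
           ("<", 20, "win", "too big"),
           "too small")

NEGATE = {">": "<=", "<": ">=", "<=": ">", ">=": "<"}


def explore(node, path: List[Constraint]):
    """Yield (result, constraints) for every path through the program."""
    if isinstance(node, str):
        yield node, list(path)
        return
    op, const, if_true, if_false = node
    yield from explore(if_true, path + [(op, const)])           # branch taken
    yield from explore(if_false, path + [(NEGATE[op], const)])  # branch not taken


def solve(constraints: List[Constraint],
          lo: int = -2**31, hi: int = 2**31 - 1) -> Optional[int]:
    """Tiny 'solver': intersect integer intervals and return one satisfying x."""
    for op, c in constraints:
        if op == ">":
            lo = max(lo, c + 1)
        elif op == ">=":
            lo = max(lo, c)
        elif op == "<":
            hi = min(hi, c - 1)
        elif op == "<=":
            hi = min(hi, c)
    return lo if lo <= hi else None  # None means the path is infeasible


solutions = {result: solve(cons) for result, cons in explore(PROGRAM, [])}
print(solutions["win"])  # 11
```

The path that ends in "win" carries the constraints x > 10 and x < 20, and the solver returns 11 as one input that satisfies both. angr's find and avoid arguments play the role of picking which result you want.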

Installation & Setup

BASH
Install angr
pip install angr
pip install "angr[all]"  # Install with optional dependencies

Basic angr Workflow

PYTHON
Simple angr Script - Crack Password
import angr
import claripy
 
# Load binary
project = angr.Project("./crackme")
 
# Create symbolic variable for input (stdin)
password = claripy.BVS("password", 16 * 8)  # 16-byte input
initial_state = project.factory.entry_state(
    stdin=angr.SimFileStream(name="stdin", content=password, has_end=False)
)
 
# Create simulation manager
simgr = project.factory.simgr(initial_state)
 
# Addresses of the success and failure messages (taken from your disassembly)
success_addr = 0x401234
failure_addr = 0x401256
 
# Explore until we find success, discarding paths that hit failure
simgr.explore(
    find=success_addr,
    avoid=failure_addr
)
 
# Get the solution
if simgr.found:
    solution_state = simgr.found[0]
    solution = solution_state.posix.dumps(0)  # 0 = stdin
    print(f"Password found: {solution!r}")
else:
    print("No solution found")

Key angr Concepts

📌 State - Program Snapshot

A state represents a single point in program execution: its registers, memory, and accumulated path constraints.

PYTHON
Working with States
state = project.factory.entry_state()
 
# Access registers
print(state.regs.rax)
 
# Read memory (address and size are placeholders for your own values)
data = state.memory.load(address, size)
 
# Symbolic variable
sym_input = state.solver.BVS('input', 64)  # 64-bit symbolic input

📌 SimulationManager - Explore States

SimulationManager (simgr) manages multiple execution states simultaneously.

PYTHON
SimulationManager Usage
simgr = project.factory.simgr(initial_state)
 
# Explore automatically
simgr.explore(find=success_address)
 
# Manual stepping
simgr.step()
 
# Check state categories
print(simgr.active)      # Active (still executing)
print(simgr.found)       # Reached the target address
print(simgr.avoided)     # Hit an avoided address
print(simgr.deadended)   # Terminated (no successors, e.g. the program exited)

📌 Constraint Solving with Z3

angr uses the Z3 solver (through its claripy layer) to solve path constraints and find satisfying values.

PYTHON
Solve Constraints
# Get concrete values from symbolic state
solution = state.solver.eval(sym_variable)  # Get one solution
solutions = state.solver.eval_upto(sym_variable, 10)  # Get up to 10 solutions

Real-World Example - CTF Challenge

PYTHON
import angr
import claripy
 
# Load the binary (skip shared libraries to speed up analysis)
binary_path = "./crackme"
project = angr.Project(binary_path, auto_load_libs=False)
 
# Create symbolic argv[1] (16 bytes)
password = claripy.BVS('password', 16 * 8)  # 16 bytes * 8 bits
 
# Start at the entry point with the symbolic value passed as argv[1]
# (assumes binary reads argv[1] as password)
state = project.factory.entry_state(args=[binary_path, password])
 
# Create simulation manager
simgr = project.factory.simgr(state)
 
# Explore: find the "Correct!" message at 0x401300,
# avoid the "Incorrect!" message at 0x401350
simgr.explore(find=0x401300, avoid=[0x401350])
 
# Check results
if simgr.found:
    solution_state = simgr.found[0]
    password_value = solution_state.solver.eval(password, cast_to=bytes)
    print(f"[+] Password found: {password_value}")
else:
    print("[-] No solution found")
    if simgr.avoided:
        print(f"[!] Hit avoided addresses: {simgr.avoided}")

Advanced Techniques

📌 Function Hooking - Speed Up Analysis

Hook slow functions to avoid symbolic execution overhead.

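PYTHON
Hooking Example
# Replace strlen with angr's built-in summary instead of simulating its loop
strlen_summary = angr.SIM_PROCEDURES['libc']['strlen']()
project.hook(0x401000, strlen_summary)  # 0x401000 = address of strlen in the binary
 
# Or hook by name when the binary still has symbols
project.hook_symbol('strlen', angr.SIM_PROCEDURES['libc']['strlen']())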

📌 Taint Analysis - Track Data Flow

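Track how user input flows through the program to find the sensitive operations it can influence.

PYTHON
Taint Input
# angr has no built-in taint API. A common approximation is to make the
# input symbolic and check which values depend on it.
user_input = state.solver.BVS('user_input', 64)
state.memory.store(input_addr, user_input)  # input_addr is a placeholder
 
# Later: check whether RAX depends on the input
if state.regs.rax.variables & user_input.variables:
    print("RAX depends on user input")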

When angr Excels vs Struggles

Best For | Struggles With
Finding password/key (simple comparison) | Complex floating-point math
Reaching specific code path | Cryptographic operations (very slow)
Constraint solving (small inputs) | Large state spaces (too many branches)
CTF challenges (designed for automation) | Real-world complex binaries
// END OF INTERCEPT //