ctf Lesson 37 30 min read

NCL: Binary Analysis & Reverse Engineering

strings, objdump, ltrace, strace, and ELF format

When you compile a C program with gcc, the output is a binary — a file of machine instructions the CPU executes directly. The source code is gone. All that remains is bytes.

Binary analysis is the process of figuring out what a binary does without having the source code. This is how security researchers analyze malware, how CTF players solve reverse engineering challenges, and how forensic investigators examine suspicious programs found on compromised systems.

This page covers the core command-line tools for binary analysis on Linux. Every tool here ships with a standard Linux install or is one apt install away. No expensive commercial software, no GUI — just your terminal.

Prerequisites: You should understand compilation from Lesson 2.4 and have written programs with arrays/pointers from Lesson 2.10.


1. The ELF Format — What’s Inside a Binary

Linux executables use ELF, the Executable and Linkable Format. When gcc compiles your code, it produces an ELF binary. Every ELF file is divided into sections, each holding a different kind of data:

Section    Contains                Purpose
.text      Machine code            The actual instructions the CPU runs
.data      Initialized globals     Global variables with initial values
.bss       Uninitialized globals   Global variables zeroed at startup
.rodata    Read-only data          String literals, constants
.symtab    Symbol table            Function and variable names (if not stripped)
.dynamic   Dynamic linking info    Shared library references

When you write printf("hello") in C, the string "hello" goes into .rodata and the compiled printf call instruction goes into .text. The printf function itself lives in a shared library (libc.so), not in your binary — .dynamic records that dependency.

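The ELF header sits at the very beginning of the file. Its magic bytes are 7F 45 4C 46: the byte 0x7F followed by the ASCII letters E, L, F. xxd displays them as .ELF because 0x7F is not a printable character. If you see those bytes at the start of a file, you are looking at an ELF file, which on Linux is usually an executable, a shared library, or an object file.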
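
The magic-byte check can be sketched as a single C helper. This is only an illustration, not how file or the kernel implement it, and the name has_elf_magic is made up for this example:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with the 4-byte ELF magic (7F 45 4C 46), else 0.
   has_elf_magic is a hypothetical helper name, not part of any library. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };

    /* A buffer shorter than the magic cannot match. */
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

To try it, read the first four bytes of a file with fread and pass the buffer and the number of bytes read to has_elf_magic.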


2. strings — The First Tool You Reach For

strings extracts every sequence of printable ASCII characters (4+ characters by default) from any file. This is always step one in binary analysis because it requires zero expertise and catches an astonishing amount:

strings binary                        # Dump all readable text
strings binary | grep -i flag         # Search for flags
strings binary | grep -i http         # Find URLs
strings binary | grep -i password     # Find credential strings
strings -n 8 binary                   # Minimum 8 chars (cuts noise)

Why does this work? Compilers embed string literals directly into the binary. When a programmer writes:

if (strcmp(input, "s3cr3t_p4ss") == 0) {
    printf("Access granted\n");
}

Both "s3cr3t_p4ss" and "Access granted\n" end up as plaintext in the .rodata section. strings finds them instantly. No disassembly needed.
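
To see why this takes no expertise, here is a simplified sketch of the kind of scan strings performs: walk the bytes and report every run of printable characters that is long enough. The real tool handles more encodings and options. The names print_strings and is_printable are hypothetical, and counting tab as printable is an assumption of this sketch:

```c
#include <stdio.h>
#include <stddef.h>

/* Printable ASCII (space through '~'); tab is included by assumption. */
static int is_printable(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7E) || c == '\t';
}

/* Writes every run of at least min_len printable bytes to out, one per line.
   Returns the number of runs written. */
size_t print_strings(const unsigned char *buf, size_t len, size_t min_len, FILE *out)
{
    size_t found = 0;
    size_t start = 0;                 /* index where the current run began */

    for (size_t i = 0; i <= len; i++) {
        if (i < len && is_printable(buf[i]))
            continue;                 /* still inside a printable run */

        /* Run [start, i) ended at a non-printable byte or at end of buffer. */
        if (i - start >= min_len) {
            fwrite(buf + start, 1, i - start, out);
            fputc('\n', out);
            found++;
        }
        start = i + 1;
    }
    return found;
}
```

Called with min_len set to 4 on the bytes of a compiled password checker, a scan like this would print the password literal and the "Access granted" message alongside the library noise.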

In CTF challenges, flags are frequently embedded as string literals. In malware analysis, strings reveals command-and-control URLs, encryption keys left in debug builds, and error messages that reveal program logic.

Checkpoint: You run strings on a binary and get 3,000 lines of output. Most of it is library junk. How do you narrow it down?
  1. strings binary | grep -i "flag\|key\|secret\|pass" — search for keywords
  2. strings binary | grep -E '^.{20,}$' — show only long strings (interesting ones are usually longer)
  3. strings binary | sort -u | less — deduplicate and browse
  4. strings -t x binary | grep flag — show the hex offset of each match (useful for locating it in the binary later)

3. file and readelf — Identify What You’re Looking At

Before analysis, know what you have. The file command reads magic bytes and reports the binary’s type:

$ file mystery
mystery: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
         dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
         for GNU/Linux 3.2.0, not stripped

That single line tells you:

  • ELF 64-bit — Linux executable, 64-bit architecture
  • LSB — Little-endian byte order
  • x86-64 — Intel/AMD 64-bit instruction set
  • dynamically linked — Uses shared libraries (libc, etc.)
  • not stripped — Symbol table is intact (function names are present)

If it says stripped, the developer removed the symbol table. Function names are gone, which makes analysis harder — but not impossible.
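
The "64-bit" and "LSB" parts of that report come from two bytes near the start of the ELF header. The sketch below decodes them. The byte positions and values follow the ELF specification, while the function and constant names are hypothetical:

```c
#include <stddef.h>

/* Byte positions in the ELF identification array, per the ELF specification
   (<elf.h> calls them EI_CLASS and EI_DATA). */
enum { IDENT_CLASS = 4, IDENT_DATA = 5 };

/* Returns "32-bit" or "64-bit", matching the wording file uses. */
const char *elf_class_name(const unsigned char *ident, size_t len)
{
    if (len <= IDENT_CLASS)
        return "unknown";
    switch (ident[IDENT_CLASS]) {
    case 1:  return "32-bit";
    case 2:  return "64-bit";
    default: return "unknown";
    }
}

/* Returns "LSB" (little-endian) or "MSB" (big-endian). */
const char *elf_byte_order(const unsigned char *ident, size_t len)
{
    if (len <= IDENT_DATA)
        return "unknown";
    switch (ident[IDENT_DATA]) {
    case 1:  return "LSB";
    case 2:  return "MSB";
    default: return "unknown";
    }
}
```

Both helpers expect the first bytes of the file, for example the first 16 bytes read with fread.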

readelf digs deeper into the ELF structure:

readelf -h mystery    # ELF header: entry point, architecture, type
readelf -S mystery    # Section headers (list all sections and their sizes)
readelf -s mystery    # Symbol table (all function and variable names)
readelf -d mystery    # Dynamic section (shared library dependencies)

$ readelf -S mystery | head -15
There are 29 section headers, starting at offset 0x1960:

Section Headers:
  [Nr] Name              Type             Address           Offset    Size
  [ 1] .interp           PROGBITS         0000000000000318  00000318  0000001c
  [ 6] .text             PROGBITS         0000000000001060  00001060  00000195
  [15] .rodata           PROGBITS         0000000000002000  00002000  00000028
  [23] .data             PROGBITS         0000000000004000  00003000  00000010

The .text section at offset 0x1060 is where the executable code lives. The .rodata section at 0x2000 holds string literals. These offsets matter when you start reading disassembly.

Checkpoint: file reports "ELF 32-bit MSB executable, MIPS". Can you run it on your x86 laptop?

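Not directly. It is compiled for the MIPS architecture (a different CPU instruction set, common in routers and embedded devices). You would need a MIPS emulator such as qemu-mips to run it. You can still analyze it statically with strings and readelf, and disassemble it with objdump -d binary, provided your objdump can decode MIPS. The stock x86 build usually cannot, so install a multi-architecture or cross build first (on Debian and Ubuntu, the binutils-multiarch package).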


4. objdump — Disassembly

objdump -d binary converts machine code back into assembly language — a human-readable representation of CPU instructions. You do not need to be an assembly expert. Focus on recognizing patterns.

objdump -d mystery | less         # Full disassembly
objdump -d mystery | grep -A 20 '<main>:'  # Just main()

Here is what disassembled main looks like for a simple program that calls printf("Hello %s\n", name):

0000000000001149 <main>:
    1149:  55                   push   %rbp
    114a:  48 89 e5             mov    %rsp,%rbp
    114d:  48 83 ec 10          sub    $0x10,%rsp
    1151:  48 8d 05 ac 0e 00 00 lea    0xeac(%rip),%rax    # 2004 <_IO_stdin_used+0x4>
    1158:  48 89 45 f8          mov    %rax,-0x8(%rbp)
    115c:  48 8b 45 f8          mov    -0x8(%rbp),%rax
    1160:  48 89 c6             mov    %rax,%rsi
    1163:  48 8d 3d a2 0e 00 00 lea    0xea2(%rip),%rdi    # 200c <_IO_stdin_used+0xc>
    116a:  b8 00 00 00 00       mov    $0x0,%eax
    116f:  e8 dc fe ff ff       call   1050 <printf@plt>
    1174:  b8 00 00 00 00       mov    $0x0,%eax
    1179:  c9                   leave
    117a:  c3                   ret

You do not need to understand every instruction. objdump prints AT&T syntax by default: registers carry a % prefix and the source operand comes before the destination. Here is what to look for:

  • push %rbp / mov %rsp,%rbp at the top: function prologue (marks the start of a function)
  • call: function calls. The target name appears if the binary is not stripped (printf@plt above)
  • lea ... (%rip): loading an address, often a pointer to a string in .rodata
  • cmp / je / jne: comparisons and conditional jumps (if/else logic)
  • ret: function return

The comment # 2004 after the lea instruction tells you the address in .rodata where the string lives. You can check what’s there:

objdump -s -j .rodata mystery

This dumps the raw contents of .rodata, showing the actual string bytes.
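
The address in that comment is simple arithmetic. A %rip-relative operand is resolved against the address of the next instruction, so the target is the instruction's address plus its length plus the displacement. The sketch below reproduces the numbers from the main listing; the helper name rip_target is made up for this example:

```c
#include <stdint.h>

/* Target of a %rip-relative operand: the address of the NEXT instruction
   (insn_addr + insn_len) plus the signed 32-bit displacement.
   rip_target is a hypothetical helper name. */
uint64_t rip_target(uint64_t insn_addr, uint64_t insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For the lea at 0x1151 (7 bytes long, displacement 0xeac), rip_target(0x1151, 7, 0xeac) gives 0x2004. The same rule explains the call at 0x116f: its bytes dc fe ff ff encode a displacement of -0x124, and rip_target(0x116f, 5, -0x124) gives 0x1050, the printf@plt entry.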

Checkpoint: In the disassembly, you see call strcmp@plt followed by test eax, eax and je 0x1234. What is the program doing?

It is comparing two strings with strcmp, then branching based on the result. strcmp returns 0 if the strings match. test eax, eax checks if the return value is zero, and je (jump if equal/zero) jumps to address 0x1234 if the strings matched. This is a compiled if (strcmp(a, b) == 0) block. The two arguments to strcmp were loaded into rdi and rsi before the call — trace backward to find them.


5. ltrace and strace — Watch It Run

Instead of reading static code, you can watch the program execute and record the library functions and system calls it makes. This is dynamic analysis.

ltrace — Library Function Calls

ltrace intercepts calls to shared library functions (anything in libc: printf, strcmp, malloc, fopen, etc.):

$ ltrace ./mystery
__libc_start_main(0x401149, 1, 0x7ffd8e2b0a28, 0)
printf("Enter password: ")
fgets("s3cr3t\n", 256, 0x7f...)
strcmp("s3cr3t", "hunter2")                   = 1
puts("Wrong password!")

Look at that strcmp call. The program compared the user’s input "s3cr3t" against the hardcoded password "hunter2". Both arguments are visible in plaintext. The challenge is solved — the password is hunter2.

This is why ltrace is devastating against simple password checks. If the program uses strcmp to validate input, ltrace shows both the input and the expected value.

strace — System Calls

strace goes one level deeper. It records every system call — the interface between the program and the operating system kernel:

$ strace ./mystery 2>&1 | head -20
execve("./mystery", ["./mystery"], ...) = 0
brk(NULL)                               = 0x55a...
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY) = 3
read(3, "\x7fELF...", 832)              = 832
openat(AT_FDCWD, "secret.txt", O_RDONLY) = 4
read(4, "FLAG{hidden_in_file}\n", 4096) = 21
write(1, "Nothing to see here\n", 20)   = 20

The program opened secret.txt, read FLAG{hidden_in_file} from it, but only printed “Nothing to see here” to the screen. Without strace, you would never know it read that file.

Key system calls to watch for:

System Call     What It Does
openat / open   Opens a file; reveals which files the program accesses
read / write    Reads from or writes to a file descriptor
connect         Opens a network connection (shows IP address and port)
execve          Runs another program
fork / clone    Creates a child process

strace -e openat ./mystery    # Only show file opens
strace -e network ./mystery   # Only show network calls
strace -f ./mystery           # Follow child processes too

Checkpoint: ltrace shows strcmp(user_input, "XoR_kEy") but the program still rejects "XoR_kEy" as input. What's happening?

The program likely transforms your input before comparing. It might XOR it, reverse it, hash it, or apply some other transformation. The strcmp is comparing the transformed input against "XoR_kEy". You need to figure out the transformation (check the disassembly for operations between fgets/scanf and strcmp) and provide input that, after transformation, equals "XoR_kEy".
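
As one hypothetical instance of such a transformation, suppose the program XORs every input byte with a single-byte key before calling strcmp. XOR undoes itself, so applying the same operation to the expected string yields the input you need to type. The key value 0x01 and the helper name xor_bytes are assumptions made for illustration; a real challenge will use its own transformation:

```c
#include <stddef.h>

/* XORs each of the first len bytes of buf with key, in place.
   Applying it twice with the same key restores the original bytes.
   xor_bytes is a hypothetical helper, not taken from any challenge. */
void xor_bytes(char *buf, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (char)((unsigned char)buf[i] ^ key);
}
```

With the assumed key 0x01, transforming "XoR_kEy" gives "YnS^jDx". Typing that string would cause the program to turn it back into "XoR_kEy" just before the comparison.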


6. nm — The Symbol Table

nm lists all symbols in a binary — function names, global variables, and their memory addresses:

$ nm mystery
0000000000004010 B __bss_start
0000000000004018 b completed.0
0000000000004000 D __data_start
0000000000001149 T main
0000000000001180 T check_password
00000000000011b0 T decrypt_flag
                 U printf@GLIBC_2.2.5
                 U strcmp@GLIBC_2.2.5

The single-letter codes tell you what each symbol is:

Code   Meaning                                    Location
T      Function (defined in this binary)          .text section
t      Static/local function                      .text section
D      Initialized global variable                .data section
B      Uninitialized global variable              .bss section
U      Undefined (imported from shared library)   External

Lowercase letters (such as t and b) mark local symbols; uppercase letters mark global ones.

From the output above, you can see that mystery has three functions: main, check_password, and decrypt_flag. It imports printf and strcmp from libc. The function names alone tell you a lot about the program’s logic.
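
To connect the codes back to source, the sketch below annotates each declaration with the letter nm would typically report after an unoptimized gcc build and link. All names here are hypothetical. An optimizing build may inline the static function or replace the strcmp call, which would change the symbol list:

```c
#include <string.h>

int attempts_left = 3;     /* initialized global:   D (.data) */
int last_result;           /* uninitialized global: B (.bss)  */

/* static function, visible only inside this file: lowercase t */
static int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;    /* strcmp lives in libc: listed as U */
}

/* non-static function, visible to other files: uppercase T */
int check_password(const char *input)
{
    if (attempts_left > 0)
        attempts_left--;
    last_result = same_text(input, "hunter2");
    return last_result;
}
```

Adding a main that calls check_password, compiling with gcc, and running nm on the result is a quick way to compare the real output against these annotations.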

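If the binary is stripped (file says “stripped”), plain nm reports no symbols. nm -D still lists the dynamic symbols, so you can see which library functions are imported, but the program's own function names are gone. That is when you rely on strings, ltrace, and objdump.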

nm mystery            # List symbols
nm -C mystery         # Demangle C++ names (turns _ZN3Foo3barEv into Foo::bar())
nm --defined-only mystery  # Only symbols defined in this binary

7. The Binary Analysis Workflow

Put the tools together in order. Each step narrows what you are looking for:

Step 1: file binary              → What is it? (ELF, PE, Mach-O? 32 or 64-bit? Stripped?)
         ↓
Step 2: strings binary | less    → Any readable clues? (flags, passwords, URLs, error messages)
         ↓
Step 3: nm binary                → What functions exist? (check_password? decrypt? validate?)
         ↓
Step 4: ltrace ./binary          → What library calls does it make at runtime?
         ↓
Step 5: strace ./binary          → What files/network/processes does it touch?
         ↓
Step 6: objdump -d binary | less → Read the disassembly (last resort — most time-consuming)
         ↓
Step 7: readelf -S binary        → Examine section layout for anything unusual

Many introductory CTF challenges can be solved with steps 1-4. Reading the disassembly in step 6 takes the most time, so save it for challenges that obfuscate strings or use custom encryption.

A real workflow example

You download a challenge binary called vault. Here is how you would approach it:

$ file vault
vault: ELF 64-bit LSB executable, x86-64, dynamically linked, not stripped

$ strings vault | grep -i flag
Enter the vault code to get the flag
FLAG FORMAT: flag{...}

$ strings vault | grep -i pass
incorrect password

$ nm vault | grep ' T '
0000000000401180 T main
00000000004011b0 T validate_code
00000000004011f0 T print_flag

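$ ltrace -x validate_code ./vault
printf("Enter the vault code to get the flag: ")
scanf("%d", 0x7ffc...)
validate_code(42, 1337, 0, 0)                = 0
puts("incorrect password")

By default ltrace traces only calls into shared libraries, so the -x option is used here to also trace the binary's own validate_code function (the exact output format varies between ltrace versions). The output shows validate_code was called with 42 (your input) and 1337. If the function checks whether your input equals 1337, try entering 1337. If there is a more complex check, disassemble validate_code with objdump.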


8. Practice Exercises

Exercise 1 — strings scavenger hunt. Compile this program, then use only strings and grep to find the hidden message without reading the source:

#include <stdio.h>
int main(void) {
    char *visible = "This is the visible output";
    char *hidden = "HIDDEN{strings_finds_everything}";
    printf("%s\n", visible);
    return 0;
}

gcc -O0 -o scavenger scavenger.c   # no optimization, so the unused string is kept
strings scavenger | grep HIDDEN

Exercise 2 — file identification. Download five unknown files. Use file and xxd | head -1 on each to determine the true file type, regardless of extension.

Exercise 3 — ltrace password recovery. Compile this program and use ltrace to recover the password without reading the source:

#include <stdio.h>
#include <string.h>
int main(void) {
    char input[64];
    printf("Password: ");
    fgets(input, sizeof(input), stdin);
    input[strcspn(input, "\n")] = '\0';
    if (strcmp(input, "buffalo_soldier") == 0) {
        printf("Access granted\n");
    } else {
        printf("Access denied\n");
    }
    return 0;
}

Exercise 4 — full workflow. Compile a C program of your own that has at least two functions and a string comparison. Give the compiled binary to a classmate. They must find the password using the workflow from Section 7 without seeing your source code.


Resources

Practice: picoCTF (search “reverse engineering”) · CrackMes (reverse engineering challenges) · pwnable.kr (binary exploitation)

Reference: ELF Format Specification · x86-64 Instruction Reference · Linux man pages for strace, ltrace, objdump, readelf, nm

Video: LiveOverflow — Binary Exploitation · John Hammond — Reverse Engineering CTF