NCL: Binary Analysis & Reverse Engineering
strings, objdump, ltrace, strace, and ELF format
When you compile a C program with gcc, the output is a binary — a file of machine instructions the CPU executes directly. The source code is gone. All that remains is bytes.
Binary analysis is the process of figuring out what a binary does without having the source code. This is how security researchers analyze malware, how CTF players solve reverse engineering challenges, and how forensic investigators examine suspicious programs found on compromised systems.
This page covers the core command-line tools for binary analysis on Linux. Every tool here ships with a standard Linux install or is one apt install away. No expensive commercial software, no GUI — just your terminal.
Prerequisites: You should understand compilation from Lesson 2.4 and have written programs with arrays/pointers from Lesson 2.10.
1. The ELF Format — What’s Inside a Binary
Linux executables use ELF (Executable and Linkable Format). When gcc compiles your code, it produces an ELF binary. Every ELF file is divided into sections, each holding a different kind of data:
| Section | Contains | Purpose |
|---|---|---|
| .text | Machine code | The actual instructions the CPU runs |
| .data | Initialized globals | Global variables with initial values |
| .bss | Uninitialized globals | Global variables zeroed at startup |
| .rodata | Read-only data | String literals, constants |
| .symtab | Symbol table | Function and variable names (if not stripped) |
| .dynamic | Dynamic linking info | Shared library references |
When you write printf("hello") in C, the string "hello" goes into .rodata and the compiled printf call instruction goes into .text. The printf function itself lives in a shared library (libc.so), not in your binary — .dynamic records that dependency.
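The ELF header sits at the very beginning of the file. Its magic bytes are 7F 45 4C 46 (the byte 0x7F followed by ASCII "ELF", which xxd prints as .ELF). If you see those bytes with xxd, you are looking at an ELF file: an executable, a shared library, an object file, or a core dump.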
2. strings — The First Tool You Reach For
strings extracts every sequence of printable ASCII characters (4+ characters by default) from any file. This is always step one in binary analysis because it requires zero expertise and catches an astonishing amount:
strings binary # Dump all readable text
strings binary | grep -i flag # Search for flags
strings binary | grep -i http # Find URLs
strings binary | grep -i password # Find credential strings
strings -n 8 binary # Minimum 8 chars (cuts noise)
Why does this work? Compilers embed string literals directly into the binary. When a programmer writes:
if (strcmp(input, "s3cr3t_p4ss") == 0) {
    printf("Access granted\n");
}
Both "s3cr3t_p4ss" and "Access granted\n" end up as plaintext in the .rodata section. strings finds them instantly. No disassembly needed.
In CTF challenges, flags are frequently embedded as string literals. In malware analysis, strings reveals command-and-control URLs, encryption keys left in debug builds, and error messages that reveal program logic.
Checkpoint: You run strings on a binary and get 3,000 lines of output. Most of it is library junk. How do you narrow it down?
- strings binary | grep -i "flag\|key\|secret\|pass": search for keywords
- strings binary | grep -E '^.{20,}$': show only long strings (interesting ones are usually longer)
- strings binary | sort -u | less: deduplicate and browse
- strings -t x binary | grep flag: show the hex offset of each match (useful for locating it in the binary later)
3. file and readelf — Identify What You’re Looking At
Before analysis, know what you have. The file command reads magic bytes and reports the binary’s type:
$ file mystery
mystery: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
for GNU/Linux 3.2.0, not stripped
That single line tells you:
- ELF 64-bit — Linux executable, 64-bit architecture
- LSB — Little-endian byte order
- x86-64 — Intel/AMD 64-bit instruction set
- dynamically linked — Uses shared libraries (libc, etc.)
- not stripped — Symbol table is intact (function names are present)
If it says stripped, the developer removed the symbol table. Function names are gone, which makes analysis harder — but not impossible.
readelf digs deeper into the ELF structure:
readelf -h mystery # ELF header: entry point, architecture, type
readelf -S mystery # Section headers (list all sections and their sizes)
readelf -s mystery # Symbol table (all function and variable names)
readelf -d mystery # Dynamic section (shared library dependencies)
$ readelf -S mystery | head -15
There are 29 section headers, starting at offset 0x1960:
Section Headers:
[Nr] Name Type Address Offset Size
[ 1] .interp PROGBITS 0000000000000318 00000318 0000001c
[ 6] .text PROGBITS 0000000000001060 00001060 00000195
[15] .rodata PROGBITS 0000000000002000 00002000 00000028
[23] .data PROGBITS 0000000000004000 00003000 00000010
The .text section at offset 0x1060 is where the executable code lives. The .rodata section at 0x2000 holds string literals. These offsets matter when you start reading disassembly.
Checkpoint: file reports "ELF 32-bit MSB executable, MIPS". Can you run it on your x86 laptop?
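Not directly. It is compiled for the MIPS architecture (a different CPU instruction set, common in routers and embedded devices). You would need a MIPS emulator like qemu-mips to run it, or you can still analyze it statically with strings, readelf, and objdump -m mips -d binary. Disassembling it requires an objdump build that supports MIPS, such as a multiarch or cross-toolchain binutils; a stock x86 build may not recognize the architecture.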
4. objdump — Disassembly
objdump -d binary converts machine code back into assembly language — a human-readable representation of CPU instructions. You do not need to be an assembly expert. Focus on recognizing patterns.
objdump -d mystery | less # Full disassembly
objdump -d mystery | grep -A 20 '<main>' # Just main()
Here is what disassembled main looks like for a simple program that calls printf("Hello %s\n", name):
0000000000001149 <main>:
1149: 55 push %rbp
114a: 48 89 e5 mov %rsp,%rbp
114d: 48 83 ec 10 sub $0x10,%rsp
1151: 48 8d 05 ac 0e 00 00 lea 0xeac(%rip),%rax # 2004 <_IO_stdin_used+0x4>
1158: 48 89 45 f8 mov %rax,-0x8(%rbp)
115c: 48 8b 45 f8 mov -0x8(%rbp),%rax
1160: 48 89 c6 mov %rax,%rsi
1163: 48 8d 3d a2 0e 00 00 lea 0xea2(%rip),%rdi # 200c <_IO_stdin_used+0xc>
116a: b8 00 00 00 00 mov $0x0,%eax
116f: e8 dc fe ff ff call 1050 <printf@plt>
1174: b8 00 00 00 00 mov $0x0,%eax
1179: c9 leave
117a: c3 ret
You do not need to understand every instruction. Here is what to look for:
- push %rbp / mov %rsp,%rbp at the top: function prologue (marks the start of a function)
- call: function calls. The target name appears if the binary is not stripped (printf@plt above)
- lea ...(%rip): loading an address, often a pointer to a string in .rodata
- cmp / je / jne: comparisons and conditional jumps (if/else logic)
- ret: function return
The comment # 2004 after the lea instruction tells you the address in .rodata where the string lives. You can check what’s there:
objdump -s -j .rodata mystery
This dumps the raw contents of .rodata, showing the actual string bytes.
Checkpoint: In the disassembly, you see call strcmp@plt followed by test %eax,%eax and je 1234. What is the program doing?
It is comparing two strings with strcmp, then branching based on the result. strcmp returns 0 if the strings match. test %eax,%eax checks if the return value is zero, and je (jump if equal/zero) jumps to address 0x1234 if the strings matched. This is a compiled if (strcmp(a, b) == 0) block. The two arguments to strcmp were loaded into %rdi and %rsi before the call, so trace backward to find them.
5. ltrace and strace — Watch It Run
Instead of reading static code, you can watch the program execute and record every function it calls. This is dynamic analysis.
ltrace — Library Function Calls
ltrace intercepts calls to shared library functions (anything in libc: printf, strcmp, malloc, fopen, etc.):
$ ltrace ./mystery
__libc_start_main(0x401149, 1, 0x7ffd8e2b0a28, 0)
printf("Enter password: ")
fgets("s3cr3t\n", 256, 0x7f...)
strcmp("s3cr3t", "hunter2") = 1
puts("Wrong password!")
Look at that strcmp call. The program compared the user’s input "s3cr3t" against the hardcoded password "hunter2". Both arguments are visible in plaintext. The challenge is solved — the password is hunter2.
This is why ltrace is devastating against simple password checks. If the program uses strcmp to validate input, ltrace shows both the input and the expected value.
strace — System Calls
strace goes one level deeper. It records every system call — the interface between the program and the operating system kernel:
$ strace ./mystery 2>&1 | head -20
execve("./mystery", ["./mystery"], ...) = 0
brk(NULL) = 0x55a...
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY) = 3
read(3, "\x7fELF...", 832) = 832
openat(AT_FDCWD, "secret.txt", O_RDONLY) = 4
read(4, "FLAG{hidden_in_file}\n", 4096) = 21
write(1, "Nothing to see here\n", 20) = 20
The program opened secret.txt, read FLAG{hidden_in_file} from it, but only printed “Nothing to see here” to the screen. Without strace, you would never know it read that file.
Key system calls to watch for:
| System Call | What It Does |
|---|---|
| openat / open | Opens a file; reveals which files the program accesses |
| read / write | Reads from or writes to a file descriptor |
| connect | Opens a network connection (shows IP address and port) |
| execve | Runs another program |
| fork / clone | Creates a child process |
strace -e openat ./mystery # Only show file opens
strace -e network ./mystery # Only show network calls
strace -f ./mystery # Follow child processes too
Checkpoint: ltrace shows strcmp(user_input, "XoR_kEy") but the program still rejects "XoR_kEy" as input. What's happening?
The program likely transforms your input before comparing. It might XOR it, reverse it, hash it, or apply some other transformation. The strcmp is comparing the transformed input against "XoR_kEy". You need to figure out the transformation (check the disassembly for operations between fgets/scanf and strcmp) and provide input that, after transformation, equals "XoR_kEy".
6. nm — The Symbol Table
nm lists all symbols in a binary — function names, global variables, and their memory addresses:
$ nm mystery
0000000000004010 B __bss_start
0000000000004018 b completed.0
0000000000004000 D __data_start
0000000000001149 T main
0000000000001180 T check_password
00000000000011b0 T decrypt_flag
U printf@GLIBC_2.2.5
U strcmp@GLIBC_2.2.5
The single-letter codes tell you what each symbol is:
| Code | Meaning | Location |
|---|---|---|
| T | Function (defined in this binary) | .text section |
| t | Static/local function | .text section |
| D | Initialized global variable | .data section |
| B | Uninitialized global variable | .bss section |
| U | Undefined (imported from shared library) | External |
From the output above, you can see that mystery has three functions: main, check_password, and decrypt_flag. It imports printf and strcmp from libc. The function names alone tell you a lot about the program’s logic.
If the binary is stripped (file says “stripped”), nm reports no symbols. nm -D still lists the dynamic symbols, which are mostly the imported library functions. For everything else you rely on strings, ltrace, and objdump.
nm mystery # List symbols
nm -C mystery # Demangle C++ names (turns _ZN3Foo3barEv into Foo::bar())
nm --defined-only mystery # Only symbols defined in this binary
7. The Binary Analysis Workflow
Put the tools together in order. Each step narrows what you are looking for:
Step 1: file binary → What is it? (ELF, PE, Mach-O? 32 or 64-bit? Stripped?)
↓
Step 2: strings binary | less → Any readable clues? (flags, passwords, URLs, error messages)
↓
Step 3: nm binary → What functions exist? (check_password? decrypt? validate?)
↓
Step 4: ltrace ./binary → What library calls does it make at runtime?
↓
Step 5: strace ./binary → What files/network/processes does it touch?
↓
Step 6: objdump -d binary | less → Read the disassembly (last resort — most time-consuming)
↓
Step 7: readelf -S binary → Examine section layout for anything unusual
Many introductory CTF challenges can be solved with steps 1-4. The disassembly in step 6 is a last resort for challenges that obfuscate strings or use custom encryption.
A real workflow example
You download a challenge binary called vault. Here is how you would approach it:
$ file vault
vault: ELF 64-bit LSB executable, x86-64, dynamically linked, not stripped
$ strings vault | grep -i flag
Enter the vault code to get the flag
FLAG FORMAT: flag{...}
$ strings vault | grep -i pass
incorrect password
$ nm vault | grep ' T '
0000000000401180 T main
00000000004011b0 T validate_code
00000000004011f0 T print_flag
$ ltrace -x validate_code ./vault
printf("Enter the vault code to get the flag: ")
scanf("%d", 0x7ffc...)
validate_code(42, 1337, 0, 0) = 0
puts("incorrect password")
Plain ltrace only traces calls into shared libraries. The -x option adds validate_code, a function defined in the binary itself, to the trace. The output shows validate_code was called with 42 (your input) and 1337. If the function checks whether your input equals 1337, try entering 1337. If there is a more complex check, disassemble validate_code with objdump.
8. Practice Exercises
Exercise 1 — strings scavenger hunt. Compile this program, then use only strings and grep to find the hidden message without reading the source:
#include <stdio.h>
int main(void) {
    char *visible = "This is the visible output";
    char *hidden = "HIDDEN{strings_finds_everything}";
    printf("%s\n", visible);
    return 0;
}
gcc -o scavenger scavenger.c
strings scavenger | grep HIDDEN
Exercise 2 — file identification. Download five unknown files. Use file and xxd | head -1 on each to determine the true file type, regardless of extension.
Exercise 3 — ltrace password recovery. Compile this program and use ltrace to recover the password without reading the source:
#include <stdio.h>
#include <string.h>
int main(void) {
    char input[64];
    printf("Password: ");
    fgets(input, sizeof(input), stdin);
    input[strcspn(input, "\n")] = '\0';
    if (strcmp(input, "buffalo_soldier") == 0) {
        printf("Access granted\n");
    } else {
        printf("Access denied\n");
    }
    return 0;
}
Exercise 4 — full workflow. Compile a C program of your own that has at least two functions and a string comparison. Give the compiled binary to a classmate. They must find the password using the workflow from Section 7 without seeing your source code.
Resources
Practice: picoCTF (search “reverse engineering”) · CrackMes (reverse engineering challenges) · pwnable.kr (binary exploitation)
Reference: ELF Format Specification · x86-64 Instruction Reference · Linux man pages for strace, ltrace, objdump, readelf, nm
Video: LiveOverflow — Binary Exploitation · John Hammond — Reverse Engineering CTF