ctf Lesson 23 30 min read

NCL: Digital Forensics

File analysis, data recovery, and memory forensics with command-line tools

Digital Forensics

Digital forensics is the process of analyzing files, recovering data, and investigating system artifacts. In NCL, forensics challenges give you a file (an image, a document, a disk image, or a memory dump) and ask you to extract specific information from it.

This page covers the six forensics skill areas tested in NCL: file identification, metadata extraction, file carving, magic byte repair, memory analysis, and Git repository forensics. Each section lists the tools and commands to use.

Prerequisites: You should be comfortable with file navigation, piping, and basic shell commands from Weeks 1-2.


1. File Identification

The file command identifies a file’s true type by reading its magic bytes — the first few bytes that encode the file format. This works regardless of the file extension, which can be changed or missing entirely.

file suspicious_document        # Identify true type
file -i suspicious_document     # Show MIME type

Magic bytes reference

Each file format starts with a specific byte sequence:

Hex Bytes      ASCII   File Type
89 50 4E 47    .PNG    PNG image
FF D8 FF       ...     JPEG image
25 50 44 46    %PDF    PDF document
50 4B 03 04    PK..    ZIP archive (also DOCX, XLSX, PPTX)
47 49 46 38    GIF8    GIF image
7F 45 4C 46    .ELF    Linux executable
4D 5A          MZ      Windows executable (PE)
1F 8B          ..      Gzip compressed
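The lookup that the file command performs can be approximated in a few lines of Python. This is a minimal sketch that only knows the eight signatures from the table; the names MAGIC_SIGNATURES, identify_header, and identify_file are made up for illustration, and the real file command checks far more than a prefix.

```python
# Signatures copied from the reference table; real tools know many more.
MAGIC_SIGNATURES = [
    (b"\x89PNG", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also DOCX, XLSX, PPTX)"),
    (b"GIF8", "GIF image"),
    (b"\x7fELF", "Linux executable"),
    (b"MZ", "Windows executable (PE)"),
    (b"\x1f\x8b", "Gzip compressed"),
]


def identify_header(header: bytes) -> str:
    """Return the type whose signature matches the start of the header bytes."""
    for signature, description in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return description
    return "data (unknown type)"


def identify_file(path: str) -> str:
    """Read the first 8 bytes of a file and look them up in the table."""
    with open(path, "rb") as f:
        return identify_header(f.read(8))


# Header bytes of a file that is named photo.jpg but starts with 50 4B 03 04.
print(identify_header(bytes.fromhex("504B0304") + b"\x14\x00"))
# prints: ZIP archive (also DOCX, XLSX, PPTX)
```

The extension never enters the decision, which is why a renamed file is still identified correctly.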

To view magic bytes directly:

xxd file | head -2          # Hex dump of first 32 bytes
hexdump -C file | head -2   # Alternative hex viewer

Checkpoint: A file named "photo.jpg" has magic bytes 50 4B 03 04. What is it actually?

A ZIP archive, not a JPEG. The extension is misleading — the magic bytes 50 4B 03 04 (ASCII: PK) identify it as ZIP. It could also be a DOCX, XLSX, or PPTX file, which are all ZIP archives internally. Try unzip photo.jpg to see what’s inside.


2. Text and Metadata Extraction

Two commands handle the majority of NCL forensics challenges:

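strings extracts sequences of printable characters (four or more in a row by default) from any file. Add -e l to also catch UTF-16 little-endian text, which is common in Windows files. In CTFs, flags are frequently hidden as plaintext inside binary files: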

strings suspicious_file                    # All readable text
strings suspicious_file | grep -i "flag"   # Search for flag patterns
strings suspicious_file | grep "SKY-"      # NCL flag format
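If strings is not available, its core behaviour is easy to reproduce. The sketch below is an approximation, not a reimplementation: it treats bytes 0x20 through 0x7E as printable and ignores the other encodings strings supports. The function names and the sample flag value are invented for the example.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII bytes (space to tilde)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]


def find_flags(data: bytes, marker: str = "SKY-") -> List[str]:
    """Keep only the extracted strings that contain the flag marker."""
    return [s for s in extract_strings(data) if marker in s]


# Made-up binary blob with a placeholder flag buried between non-printable bytes.
sample = b"\x00\x01\x02JFIF\x00\xff\xfeSKY-ABCD-1234\x00\x7f\x10ab\x00"
print(extract_strings(sample))  # prints: ['JFIF', 'SKY-ABCD-1234']
print(find_flags(sample))       # prints: ['SKY-ABCD-1234']
```

The two-character run "ab" is dropped because it is shorter than the minimum length, which is the same reason strings output stays readable on large binaries.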

exiftool reads embedded metadata (EXIF, XMP, IPTC) from images, PDFs, Office documents, and other file types:

exiftool document.pdf          # All metadata
exiftool -s3 -Creator doc.pdf  # Just the creator software (value only)
pdfinfo document.pdf           # PDF-specific metadata

PDF forensics

PDFs are a common forensics target. Key questions NCL asks:

  • What software created the PDF? → exiftool -s3 -Creator file.pdf
  • What is the PDF version? → exiftool -s3 -PDFVersion file.pdf
  • Is there hidden text? → strings file.pdf | grep -i flag
  • Are there embedded files? → binwalk file.pdf
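The PDF version question can also be answered from the raw bytes, because a PDF begins with a header of the form %PDF-x.y. The sketch below reads that header only; the function name is hypothetical, and searching the first 1024 bytes rather than offset 0 is an assumption made to tolerate leading junk. A document can override the header version internally, so treat exiftool as the authority if the two disagree.

```python
import re
from typing import Optional

# The header looks like "%PDF-1.7" and normally sits at the very start of the file.
PDF_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from the %PDF-x.y header, or None if no header is found."""
    match = PDF_HEADER.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


print(pdf_header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"))  # prints: 1.7
print(pdf_header_version(b"PK\x03\x04"))                              # prints: None
```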

Checkpoint: You run strings on a PDF and find nothing useful. What else should you try?

  1. exiftool to check metadata fields (Author, Creator, Subject, Keywords; flags often hide here)
  2. binwalk to scan for embedded files inside the PDF
  3. pdftotext file.pdf - to extract the visible text (unlike strings, it decodes compressed PDF content streams)
  4. Open in a PDF viewer and check for hidden layers, white-on-white text, or redaction boxes drawn over text that is still selectable underneath

3. File Carving with binwalk

binwalk scans a file for known file signatures embedded inside it. Files can contain other files — a JPEG might have a ZIP appended after its end-of-image marker, or a disk image might contain dozens of recoverable files.

binwalk suspicious_file         # Scan and list embedded signatures
binwalk -e suspicious_file      # Extract embedded files to _suspicious_file.extracted/

After extraction, check what was recovered:

ls _suspicious_file.extracted/
file _suspicious_file.extracted/*
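To see what a signature scan does in principle, the sketch below searches a byte string for a few known signatures at any offset. It is a simplified illustration, not how binwalk is implemented: binwalk validates each candidate, while this version reports every raw match and will produce false positives on short signatures. The table and function names are invented for the example.

```python
from typing import List, Tuple

# A deliberately small signature set; the descriptions are illustrative labels.
EMBEDDED_SIGNATURES = {
    b"PK\x03\x04": "Zip archive data",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF": "PDF document",
}


def scan_signatures(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, description) for every signature occurrence, sorted by offset."""
    hits = []
    for signature, name in EMBEDDED_SIGNATURES.items():
        offset = data.find(signature)
        while offset != -1:
            hits.append((offset, name))
            offset = data.find(signature, offset + 1)
    return sorted(hits)


# Fake JPEG: start marker, filler, end-of-image marker FF D9, then an appended ZIP header.
sample = b"\xff\xd8\xff\xe0" + b"A" * 12 + b"\xff\xd9" + b"PK\x03\x04rest"
for offset, name in scan_signatures(sample):
    print(hex(offset), name)  # prints: 0x12 Zip archive data
```

The offset it reports is the value you would pass to dd when extracting by hand.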

Manual carving with dd

When binwalk cannot auto-extract, use dd to carve bytes manually. You need the offset (from binwalk or xxd). With bs=1, skip and count are measured in single bytes, which is exact but slow on large images:

# Extract everything starting at byte offset 0x2A00
dd if=suspicious_file bs=1 skip=$((0x2A00)) of=carved_file

# Extract a specific number of bytes
dd if=suspicious_file bs=1 skip=$((0x2A00)) count=1024 of=carved_file
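The same carve can be done in Python, which avoids the byte-at-a-time cost of bs=1. This is a minimal sketch: carve_file is a hypothetical helper, offset plays the role of skip=, and count plays the role of count=. The demo builds its own throwaway input so it runs anywhere.

```python
import os
import tempfile
from typing import Optional


def carve_file(src: str, dst: str, offset: int, count: Optional[int] = None) -> int:
    """Copy bytes from src, starting at offset, into dst; return the bytes written."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(offset)
        chunk = fin.read() if count is None else fin.read(count)
        fout.write(chunk)
    return len(chunk)


# Demo input: 0x2A00 filler bytes followed by a fake embedded payload.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "suspicious_file")
with open(source, "wb") as f:
    f.write(b"\x00" * 0x2A00 + b"PK\x03\x04payload")

carved = os.path.join(workdir, "carved_file")
print(carve_file(source, carved, 0x2A00))     # everything after the offset: prints 11
print(carve_file(source, carved, 0x2A00, 4))  # only the first 4 bytes: prints 4
```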

Alternative: foremost

foremost is a dedicated file carving tool that recovers files by matching headers and footers:

foremost -i disk_image.dd -o recovered_files/

Checkpoint: binwalk shows "Zip archive data" at offset 0x5000 inside a JPEG file. How do you extract it?

Option 1: binwalk -e file.jpg (auto-extract — creates _file.jpg.extracted/ with the ZIP inside).

Option 2: dd if=file.jpg bs=1 skip=$((0x5000)) of=hidden.zip followed by unzip hidden.zip.

Auto-extract is faster; dd is the fallback when binwalk’s extraction doesn’t produce a clean file.


4. Magic Byte Repair

Some NCL challenges give you a file with corrupted or zeroed-out magic bytes. The file command reports it as “data” (unknown type). Your job is to figure out the original type and fix the header.

Process

  1. View the current header bytes: xxd corrupted_file | head -3
  2. Compare against known magic bytes (table above)
  3. Fix the header using Python or a hex editor:
# Fix a PNG whose magic bytes were zeroed out
with open('corrupted_file', 'rb') as f:
    data = bytearray(f.read())

# Overwrite the first 8 bytes with the PNG signature (89 50 4E 47 0D 0A 1A 0A)
data[0:8] = b'\x89PNG\r\n\x1a\n'

# Write to a new file so the original evidence stays untouched
with open('recovered.png', 'wb') as f:
    f.write(data)

Clues for identifying the original type

If the magic bytes are completely gone, look for other indicators:

  • File size (a 50 KB file is unlikely to be a memory dump)
  • Internal structure (search strings output for format-specific text like “IHDR” for PNG, “Exif” for JPEG)
  • The challenge description may hint at the expected type
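The internal-structure clue can be checked programmatically, because these markers sit at fixed offsets in well-formed files. The sketch below assumes the standard layouts: in a PNG the first chunk type, IHDR, occupies bytes 12-15, and in a JPEG the first segment's identifier (Exif, or JFIF, which is an addition to the list above) occupies bytes 6-9. The function name is hypothetical, and a file with unusual leading segments will not match.

```python
from typing import Optional


def guess_type_from_structure(data: bytes) -> Optional[str]:
    """Guess the format from markers that survive when the magic bytes are wiped."""
    # PNG: 8-byte signature, 4-byte chunk length, then the chunk type "IHDR".
    if data[12:16] == b"IHDR":
        return "PNG"
    # JPEG: 2-byte start marker, 2-byte segment marker, 2-byte length, then the identifier.
    if data[6:10] in (b"JFIF", b"Exif"):
        return "JPEG"
    return None


# Headers with the leading magic bytes zeroed out but the rest intact.
broken_png = b"\x00" * 8 + b"\x00\x00\x00\x0dIHDR" + b"\x00" * 13
broken_jpeg = b"\x00\x00\x00\x00\x00\x10JFIF\x00\x01\x01"
print(guess_type_from_structure(broken_png))   # prints: PNG
print(guess_type_from_structure(broken_jpeg))  # prints: JPEG
```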

Checkpoint: xxd shows the first bytes as 00 00 00 00 0D 0A 1A 0A. The 5th-8th bytes (0D 0A 1A 0A) are part of the PNG signature. What should the first 4 bytes be?

89 50 4E 47 — the ASCII representation is .PNG. The full 8-byte PNG signature is 89 50 4E 47 0D 0A 1A 0A. The first 4 bytes were zeroed out; replacing them restores the file.


5. Memory Forensics with Volatility

Memory forensics is the most advanced NCL category. You are given a RAM dump (typically from a Windows machine) and must extract information about the running system: OS version, logged-in users, running processes, network connections, and password hashes.

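Volatility is the standard tool. The commands below use Volatility 2 syntax; Volatility 3 drops profiles and uses plugin names such as windows.pslist instead. In Volatility 2, the workflow always starts with identifying the OS profile (which determines how memory structures are interpreted):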

# Step 1: Identify the OS
volatility -f memory.dmp imageinfo

This outputs suggested profiles like Win7SP1x64. Use the first suggestion for subsequent commands:

# System information
volatility -f memory.dmp --profile=Win7SP1x64 printkey \
  -K "ControlSet001\Control\ComputerName\ComputerName"

# Running processes
volatility -f memory.dmp --profile=Win7SP1x64 pslist

# Network connections
volatility -f memory.dmp --profile=Win7SP1x64 netscan

# Password hashes (one line per account: user:RID:LM hash:NT hash:::)
volatility -f memory.dmp --profile=Win7SP1x64 hashdump

# Open files
volatility -f memory.dmp --profile=Win7SP1x64 filescan
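Before cracking, it helps to split the hashdump output into fields. The sketch below assumes the colon-separated user:RID:LM:NT::: layout and skips any line that does not fit it, such as a tool banner. The function name and the sample account are illustrative; the two hash values shown are the well-known LM and NT hashes of an empty password.

```python
from typing import Dict, List


def parse_hashdump(output: str) -> List[Dict[str, str]]:
    """Split hashdump-style lines into user, RID, LM hash, and NT hash."""
    accounts = []
    for line in output.splitlines():
        fields = line.strip().split(":")
        if len(fields) < 4:
            continue  # not an account line (banner, blank line, ...)
        user, rid, lm_hash, nt_hash = fields[:4]
        accounts.append({"user": user, "rid": rid, "lm": lm_hash, "nt": nt_hash})
    return accounts


# Illustrative output: a banner line followed by one account with an empty password.
sample_output = (
    "Volatility Foundation Volatility Framework 2.6\n"
    "Administrator:500:aad3b435b51404eeaad3b435b51404ee:"
    "31d6cfe0d16ae931b73c59d7e0c089c0:::\n"
)
for account in parse_hashdump(sample_output):
    print(account["user"], account["nt"])
# prints: Administrator 31d6cfe0d16ae931b73c59d7e0c089c0
```

The NT hash column is the one to feed to a cracker; with hashcat that would typically be NTLM mode (-m 1000).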

NCL memory forensics questions

Question                Volatility command
OS type / version       imageinfo
Computer name           printkey with ComputerName key
Logged-in username      getsids -p <PID of explorer.exe> (take the PID from pslist), or envars (USERNAME variable)
Suspicious processes    pslist or pstree (check for unusual names, high PIDs, wrong parents)
Network activity        netscan (look for connections to unusual IPs/ports)
Password hashes         hashdump (crack these with John the Ripper or hashcat)

Checkpoint: Volatility's imageinfo suggests three profiles. You try the first one and pslist returns empty output. What do you do?

The profile is probably wrong: an incorrect profile causes Volatility to misinterpret memory structures, producing empty or garbled output. Try the second suggested profile, then the third if that also fails. A correct profile produces recognizable process names like System, smss.exe, csrss.exe, and explorer.exe.
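Trying profiles one after another is easy to script. The sketch below rests on several assumptions: that imageinfo prints a line containing "Suggested Profile(s) :" followed by a comma-separated list, that the executable is called volatility as in the commands above (some installs use vol.py), and that seeing System and smss.exe in the pslist output means the profile works. Both function names are hypothetical.

```python
import re
import subprocess
from typing import List, Optional


def parse_suggested_profiles(imageinfo_output: str) -> List[str]:
    """Pull the profile names out of the 'Suggested Profile(s)' line."""
    for line in imageinfo_output.splitlines():
        if "Suggested Profile(s)" in line:
            names = line.split(":", 1)[1]
            names = re.sub(r"\(.*?\)", "", names)  # drop notes such as "(Instantiated with ...)"
            return [n.strip() for n in names.split(",") if n.strip()]
    return []


def first_working_profile(image: str, profiles: List[str]) -> Optional[str]:
    """Run pslist with each profile and return the first one that lists known processes."""
    for profile in profiles:
        result = subprocess.run(
            ["volatility", "-f", image, f"--profile={profile}", "pslist"],
            capture_output=True,
            text=True,
        )
        if "System" in result.stdout and "smss.exe" in result.stdout:
            return profile
    return None


# Illustrative imageinfo line; real output contains additional lines around it.
sample_imageinfo = "          Suggested Profile(s) : Win7SP1x64, Win7SP0x64, Win2008R2SP0x64\n"
print(parse_suggested_profiles(sample_imageinfo))
# prints: ['Win7SP1x64', 'Win7SP0x64', 'Win2008R2SP0x64']
```

With a real dump you would pass the parsed list to first_working_profile("memory.dmp", profiles) and reuse the returned name for the remaining plugins.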


6. Git Repository Forensics

NCL occasionally tests your ability to find information in version control history. Git repositories store the complete history of every committed file, including content that was later deleted.

# View commit history
git log --all --oneline

# Search all commits for sensitive strings
git log --all -p | grep -i "password\|secret\|flag\|key"

# Show a specific commit's changes
git show <commit-hash>

# List all branches (including remote-tracking)
git branch -a

# Show deleted files in history
git log --all --diff-filter=D --name-only
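
Checkpoint: You clone a git repo and find no flags in the current files. Where else should you look?

  1. git log --all -p | grep -i flag: search the full diff history for flags in deleted or modified content
  2. git stash list: check for stashed changes (stashes are local, so they exist only if you were given the original repository directory rather than a fresh clone)
  3. git branch -a: check other branches
  4. git reflog and git fsck --lost-found: look for commits no longer reachable from any branch (the reflog is local too; git fsck inspects whatever objects are present)
  5. The .git/ directory itself: sometimes flags are in .git/config or in hooks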
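When digging through the .git/ directory by hand, it is useful to know that loose objects under .git/objects/ are zlib-compressed, with a small header of the form "type size" followed by a NUL byte and then the content. The sketch below decodes one such object; the function names are hypothetical, and it does not handle packed objects in .git/objects/pack/, for which git cat-file -p <hash> is the simpler route.

```python
import zlib
from typing import Tuple


def parse_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Decompress a loose git object and split it into (type, content)."""
    decompressed = zlib.decompress(raw)
    header, _, content = decompressed.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), content


def read_loose_object(path: str) -> Tuple[str, bytes]:
    """Read and decode one file from .git/objects/<2 hex chars>/<38 hex chars>."""
    with open(path, "rb") as f:
        return parse_loose_object(f.read())


# Build a fake blob the way git stores it, then decode it again.
payload = b"placeholder for a deleted secret\n"
stored = zlib.compress(b"blob %d\x00" % len(payload) + payload)
print(parse_loose_object(stored))
# prints: ('blob', b'placeholder for a deleted secret\n')
```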

Resources

Practice: CyberDefenders (blue team forensics) · MemLabs (memory forensics CTF) · DFIR Training

Reference: Volatility Command Reference · File Signatures Table

Video: 13Cubed — Volatility tutorials · John Hammond — forensics CTF