File Processing Patterns
Line-by-line, token-by-token, and writing output files
After this lesson, you will be able to:
- Process files line-by-line with hasNextLine()/nextLine()
- Process files token-by-token with hasNext()/next() and typed variants
- Combine line-based and token-based reading using a second Scanner on a line
- Count, sum, and aggregate values read from a file
- Write output to files with PrintStream
- Apply the count-first, two-pass pattern for sizing arrays from file data
Your First Real File Task
You know how to open a file with Scanner. You know how to read tokens and lines. But here is the question that actually matters: how do you structure a program that reads an entire file, processes every piece of data, and writes the results somewhere?
That is what this lesson is about. Not individual Scanner methods — you already have those. This lesson is about patterns: reusable recipes for the file processing tasks that come up again and again.
We will work through a running example: a CSV gradebook file. By the end, you will read it, compute averages, and write a report to a new file.
From CSCD 110: In Python, you read files with for line in open("file.txt") — one line at a time, automatically. Java requires you to be explicit about how you consume the file: line-by-line, token-by-token, or a mix of both. The upside is more control. The downside is more code.
Pattern 1: Line-by-Line Processing
The most common file processing pattern reads one complete line at a time. Use this when each line is a meaningful unit — a record, a sentence, a CSV row.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class LineByLine {
public static void main(String[] args) throws FileNotFoundException {
Scanner input = new Scanner(new File("gradebook.csv"));
// Skip the header row
String header = input.nextLine();
int count = 0;
while (input.hasNextLine()) {
String line = input.nextLine();
System.out.println("Row " + count + ": " + line);
count++;
}
input.close();
System.out.println("Total students: " + count);
}
}
Given gradebook.csv:
Name,Midterm,Final,Homework
Alice,88,92,95
Bob,72,68,80
Charlie,95,91,97
Output:
Row 0: Alice,88,92,95
Row 1: Bob,72,68,80
Row 2: Charlie,95,91,97
Total students: 3
The structure is always the same: while (input.hasNextLine()) guards the loop, input.nextLine() consumes one line, and you process it inside the loop body.
Pattern 2: Token-by-Token Processing
Sometimes a file contains individual values separated by whitespace, and you want to process each value independently. Scanner’s default behavior splits on whitespace — spaces, tabs, and newlines all count as delimiters.
Scanner input = new Scanner(new File("numbers.txt"));
int sum = 0;
int count = 0;
while (input.hasNextInt()) {
int value = input.nextInt();
sum += value;
count++;
}
input.close();
System.out.println("Sum: " + sum);
System.out.println("Count: " + count);
System.out.println("Average: " + (double) sum / count);
Given numbers.txt:
10 20 30
40 50
60
Output:
Sum: 210
Count: 6
Average: 35.0
Scanner treats the entire file as a stream of tokens. It does not care where the line breaks are — hasNextInt() skips over whitespace (including newlines) to find the next token.
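You can see this newline-blindness directly by scanning the same numbers with different line breaks. A minimal sketch using Scanner over Strings instead of a file (the class name TokenStreamDemo is ours, not from the lesson):

```java
import java.util.Scanner;

public class TokenStreamDemo {
    // Sums every int token the Scanner can find, ignoring line breaks.
    public static int sumTokens(String data) {
        Scanner sc = new Scanner(data);
        int sum = 0;
        while (sc.hasNextInt()) {
            sum += sc.nextInt();
        }
        sc.close();
        return sum;
    }

    public static void main(String[] args) {
        // Same tokens, different layout -- same result.
        System.out.println(sumTokens("10 20 30 40 50 60"));   // one line
        System.out.println(sumTokens("10 20 30\n40 50\n60")); // three lines
    }
}
```

Both calls print 210: to the Scanner, a newline is just another delimiter between tokens.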
The hasNext Cascade: Mixed Token Types
What if a file contains integers, doubles, and strings mixed together? You check types in a specific order — integers first, then doubles, then strings as a fallback:
Scanner input = new Scanner(new File("mixed.txt"));
while (input.hasNext()) {
if (input.hasNextInt()) {
int value = input.nextInt();
System.out.println("Integer: " + value);
} else if (input.hasNextDouble()) {
double value = input.nextDouble();
System.out.println("Double: " + value);
} else {
String value = input.next();
System.out.println("String: " + value);
}
}
input.close();
Given mixed.txt:
42 3.14 hello 7 world 2.71
Output:
Integer: 42
Double: 3.14
String: hello
Integer: 7
String: world
Double: 2.71
The order matters. Every integer is also a valid double (42 parses as 42.0). If you check hasNextDouble() first, you will never detect integers — they all get classified as doubles. Always check hasNextInt() before hasNextDouble().
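Here is the mistake in miniature, a sketch that deliberately checks hasNextDouble() first (the class name CascadeOrder is ours):

```java
import java.util.Scanner;

public class CascadeOrder {
    // Classifies a single token -- but checks hasNextDouble() FIRST,
    // which is the wrong order.
    public static String classifyWrong(String token) {
        Scanner sc = new Scanner(token);
        if (sc.hasNextDouble()) return "double";  // "42" matches here too!
        if (sc.hasNextInt())    return "int";     // never reached for numbers
        return "string";
    }

    public static void main(String[] args) {
        System.out.println(classifyWrong("42"));     // prints "double", not "int"
        System.out.println(classifyWrong("3.14"));   // prints "double"
        System.out.println(classifyWrong("hello"));  // prints "string"
    }
}
```

With the checks reversed, "42" is reported as a double and the int branch is dead code.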
In the hasNext cascade, why must hasNextInt() be checked before hasNextDouble()?
Pattern 3: Line-Then-Tokens (The Two-Scanner Pattern)
This is the most powerful pattern and the one you will use most often with structured data files. Read one line at a time from the file, then create a second Scanner on that line to extract individual tokens from it.
This pattern is essential for CSV and tabular data where each row has a fixed structure.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class GradebookProcessor {
public static void main(String[] args) throws FileNotFoundException {
Scanner fileScanner = new Scanner(new File("gradebook.csv"));
// Skip header
fileScanner.nextLine();
while (fileScanner.hasNextLine()) {
String line = fileScanner.nextLine();
// Second Scanner: parse tokens within this line
Scanner lineScanner = new Scanner(line);
lineScanner.useDelimiter(",");
String name = lineScanner.next();
int midterm = lineScanner.nextInt();
int finalExam = lineScanner.nextInt();
int homework = lineScanner.nextInt();
lineScanner.close();
double average = (midterm + finalExam + homework) / 3.0;
System.out.printf("%s: %.1f%n", name, average);
}
fileScanner.close();
}
}
Given the same gradebook.csv from before, this outputs:
Alice: 91.7
Bob: 73.3
Charlie: 94.3
The outer loop reads lines from the file. The inner Scanner reads tokens from the line. The useDelimiter(",") call tells the inner Scanner to split on commas instead of whitespace.
This two-scanner approach is cleaner than calling line.split(",") when the tokens are a mix of types — you get nextInt(), nextDouble(), and next() for free instead of parsing strings manually with Integer.parseInt().
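For comparison, here is roughly what the same row parse looks like with split(",") — it works, but every numeric field needs an explicit Integer.parseInt call (the sample row is taken from the gradebook above):

```java
public class SplitComparison {
    public static void main(String[] args) {
        String line = "Alice,88,92,95";
        // split(",") gives you Strings only; you convert each numeric
        // field yourself instead of using nextInt().
        String[] parts = line.split(",");
        String name = parts[0];
        int midterm = Integer.parseInt(parts[1]);
        int finalExam = Integer.parseInt(parts[2]);
        int homework = Integer.parseInt(parts[3]);
        double average = (midterm + finalExam + homework) / 3.0;
        System.out.printf("%s: %.1f%n", name, average);  // Alice: 91.7
    }
}
```

Either approach is fine for all-string data; the two-scanner pattern pays off once the columns are mixed types.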
Common Pitfall: Do not forget to close the inner lineScanner. While it does not hold a file handle (it is scanning a String), closing it is good practice and prevents resource warnings from your IDE.
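If you have seen Java's try-with-resources statement, it closes the inner Scanner for you automatically, even if an exception is thrown mid-parse. A sketch (the sample line is from the gradebook above):

```java
import java.util.Scanner;

public class TryWithResourcesDemo {
    public static void main(String[] args) {
        String line = "Alice,88,92,95";
        // try-with-resources calls lineScanner.close() automatically
        // when the block exits, even on an exception.
        try (Scanner lineScanner = new Scanner(line)) {
            lineScanner.useDelimiter(",");
            String name = lineScanner.next();
            int midterm = lineScanner.nextInt();
            System.out.println(name + ": " + midterm);  // Alice: 88
        }  // lineScanner.close() happens here
    }
}
```

This is optional here — an explicit close() works fine — but it is the idiomatic form you will see in later courses and production code.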
Counting and Summing Values from Files
Real file processing almost always involves aggregation: counting rows, summing values, computing averages, finding max/min. These are the same accumulator patterns from arrays, applied to file data.
Summing a Column
Scanner fileScanner = new Scanner(new File("gradebook.csv"));
fileScanner.nextLine(); // skip header
int totalMidterm = 0;
int studentCount = 0;
while (fileScanner.hasNextLine()) {
String line = fileScanner.nextLine();
Scanner lineScanner = new Scanner(line);
lineScanner.useDelimiter(",");
lineScanner.next(); // skip name
totalMidterm += lineScanner.nextInt(); // midterm column
lineScanner.close();
studentCount++;
}
fileScanner.close();
double avgMidterm = (double) totalMidterm / studentCount;
System.out.printf("Average midterm: %.1f%n", avgMidterm);
The Count-First Pattern (Two-Pass Reading)
Sometimes you need to store file data in an array, but you do not know how many rows the file contains. Arrays have a fixed size — you must know the size at creation. The solution is two passes through the file:
- First pass: count the items.
- Allocate an array of the correct size.
- Second pass: read the file again and fill the array.
// Pass 1: count lines
Scanner counter = new Scanner(new File("data.txt"));
int lineCount = 0;
while (counter.hasNextLine()) {
counter.nextLine();
lineCount++;
}
counter.close();
// Allocate array
String[] lines = new String[lineCount];
// Pass 2: fill array
Scanner reader = new Scanner(new File("data.txt"));
for (int i = 0; i < lines.length; i++) {
lines[i] = reader.nextLine();
}
reader.close();
A Scanner is like a bookmark — once it reaches the end of the file, you cannot rewind it. You must close it and create a new one. You can reuse the same variable name:
Scanner sc = new Scanner(new File("data.txt"));
// ... first pass ...
sc.close();
sc = new Scanner(new File("data.txt")); // new Scanner object, same variable
// ... second pass ...
sc.close();
Why does the count-first pattern require two separate Scanner objects (or closing and re-creating one)?
Writing Output with PrintStream
Reading is half the story. PrintStream lets you write output to a file using the same print(), println(), and printf() methods you already use with System.out — because System.out is a PrintStream.
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintStream;
import java.util.Scanner;
public class GradebookReport {
public static void main(String[] args) throws FileNotFoundException {
Scanner fileScanner = new Scanner(new File("gradebook.csv"));
PrintStream out = new PrintStream(new File("report.txt"));
// Write report header
out.println("=== Gradebook Report ===");
out.println();
// Skip CSV header
fileScanner.nextLine();
int studentCount = 0;
int grandTotal = 0;
while (fileScanner.hasNextLine()) {
String line = fileScanner.nextLine();
Scanner lineScanner = new Scanner(line);
lineScanner.useDelimiter(",");
String name = lineScanner.next();
int midterm = lineScanner.nextInt();
int finalExam = lineScanner.nextInt();
int homework = lineScanner.nextInt();
lineScanner.close();
double average = (midterm + finalExam + homework) / 3.0;
grandTotal += midterm + finalExam + homework;
studentCount++;
// Write to file, not console
out.printf("%-10s Midterm: %3d Final: %3d HW: %3d Avg: %5.1f%n",
name, midterm, finalExam, homework, average);
}
out.println();
double classAvg = grandTotal / (studentCount * 3.0);
out.printf("Class average: %.1f (%d students)%n", classAvg, studentCount);
fileScanner.close();
out.close();
System.out.println("Report written to report.txt");
}
}
This produces report.txt:
=== Gradebook Report ===
Alice Midterm: 88 Final: 92 HW: 95 Avg: 91.7
Bob Midterm: 72 Final: 68 HW: 80 Avg: 73.3
Charlie Midterm: 95 Final: 91 HW: 97 Avg: 94.3
Class average: 86.4 (3 students)
Key PrintStream Facts
| Method | Behavior |
|---|---|
| print(x) | Writes x without a newline |
| println(x) | Writes x followed by a newline |
| printf(format, args) | Writes formatted output (same format strings as System.out.printf) |
| println() | Writes a blank line |
PrintStream creates the file if it does not exist. If the file already exists, it overwrites the contents — it does not append. Always close the PrintStream when you are done writing; otherwise, some output may be buffered and never written to disk.
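If you do want to append instead of overwrite, you can wrap a FileOutputStream opened in append mode. A sketch — the filename log.txt is hypothetical, but the two-argument FileOutputStream constructor is standard:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;

public class AppendExample {
    public static void main(String[] args) throws FileNotFoundException {
        File log = new File("log.txt");
        // The second argument (true) opens the stream in append mode:
        // existing contents are kept and new output goes at the end.
        PrintStream out = new PrintStream(new FileOutputStream(log, true));
        out.println("another line");
        out.close();
    }
}
```

Running this program twice leaves two lines in log.txt, where the plain new PrintStream(new File(...)) form would leave only one.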
next() vs. nextLine() — The Buffer Trap
These two methods look similar but behave differently:
| | next() | nextLine() |
|---|---|---|
| Reads | One token (up to whitespace) | Everything until the next newline |
| Stops at | Space, tab, or newline | Newline only |
| Consumes delimiter? | No (leaves it in the buffer) | Yes (consumes the newline) |
The danger appears when you mix nextInt() (or nextDouble()) with nextLine():
Scanner sc = new Scanner(new File("test.txt"));
// File contents: "42\nhello world\n"
int num = sc.nextInt(); // reads 42, leaves \n in buffer
String line = sc.nextLine(); // reads "" (the leftover \n)
String real = sc.nextLine(); // reads "hello world"
After nextInt() consumes 42, the newline character is still in the buffer. The next nextLine() call reads that leftover newline and returns an empty string.
Fix: Add an extra nextLine() call after nextInt() or nextDouble() to consume the leftover newline. Or use nextLine() for everything and parse with Integer.parseInt().
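Here is the first fix in miniature, using Scanner over a String so it runs without a file (the class name BufferTrapFix is ours):

```java
import java.util.Scanner;

public class BufferTrapFix {
    public static void main(String[] args) {
        Scanner sc = new Scanner("42\nhello world\n");
        int num = sc.nextInt();       // reads 42, leaves \n in the buffer
        sc.nextLine();                // extra call: consume the leftover newline
        String line = sc.nextLine();  // now reads "hello world", not ""
        System.out.println(num);      // 42
        System.out.println(line);     // hello world
        sc.close();
    }
}
```

Delete the extra sc.nextLine() and line becomes the empty string, exactly as in the trap above.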
A file contains "5\nAlice\n". What does line hold after this code runs?
Scanner sc = new Scanner(new File("data.txt"));
int num = sc.nextInt();
String line = sc.nextLine();
Putting It Together: Complete ETL Program
Here is a complete program that demonstrates the full read-process-write cycle. It reads a gradebook CSV, computes letter grades, and writes both a summary report and a new CSV with the grades added.
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintStream;
import java.util.Scanner;
public class GradeCalculator {
public static void main(String[] args) throws FileNotFoundException {
File inputFile = new File("gradebook.csv");
if (!inputFile.exists()) {
System.out.println("File not found: " + inputFile.getAbsolutePath());
return;
}
// --- Pass 1: Count students ---
Scanner counter = new Scanner(inputFile);
counter.nextLine(); // skip header
int studentCount = 0;
while (counter.hasNextLine()) {
counter.nextLine();
studentCount++;
}
counter.close();
// --- Pass 2: Read and process ---
String[] names = new String[studentCount];
double[] averages = new double[studentCount];
Scanner reader = new Scanner(inputFile);
reader.nextLine(); // skip header
for (int i = 0; i < studentCount; i++) {
Scanner line = new Scanner(reader.nextLine());
line.useDelimiter(",");
names[i] = line.next();
int midterm = line.nextInt();
int finalExam = line.nextInt();
int homework = line.nextInt();
line.close();
averages[i] = (midterm + finalExam + homework) / 3.0;
}
reader.close();
// --- Write output ---
PrintStream out = new PrintStream(new File("grades_output.csv"));
out.println("Name,Average,Grade");
for (int i = 0; i < studentCount; i++) {
String grade = letterGrade(averages[i]);
out.printf("%s,%.1f,%s%n", names[i], averages[i], grade);
}
out.close();
System.out.println("Wrote grades for " + studentCount + " students.");
}
public static String letterGrade(double avg) {
if (avg >= 90) return "A";
if (avg >= 80) return "B";
if (avg >= 70) return "C";
if (avg >= 60) return "D";
return "F";
}
}
This program uses three patterns together: the count-first two-pass approach to size the arrays, the two-scanner pattern to parse CSV lines, and PrintStream to write the results. The output file grades_output.csv looks like:
Name,Average,Grade
Alice,91.7,A
Bob,73.3,C
Charlie,94.3,A
Quick Reference
| Task | Pattern |
|---|---|
| Read every line | while (sc.hasNextLine()) { String line = sc.nextLine(); } |
| Read every token | while (sc.hasNext()) { String token = sc.next(); } |
| Read typed tokens | while (sc.hasNextInt()) { int n = sc.nextInt(); } |
| Parse a line’s fields | Scanner lineSc = new Scanner(line); lineSc.useDelimiter(","); |
| Count lines in a file | Loop with hasNextLine(), increment counter, close, re-open |
| Write to a file | PrintStream out = new PrintStream(new File("out.txt")); |
| Mixed type detection | hasNextInt() then hasNextDouble() then next() |
Summary
File processing in Java comes down to a handful of reusable patterns. Line-by-line reading with hasNextLine() / nextLine() is for structured row data. Token-by-token reading with hasNext() / next() and typed variants is for flat value streams. The two-scanner pattern — line from the file, tokens from the line — handles CSV and other delimited formats cleanly.
When you need arrays but do not know the file size, use the count-first two-pass pattern: one Scanner to count, a new Scanner to fill. When you need to write results, PrintStream gives you the same print, println, and printf you already know from console output.
Watch for the buffer trap when mixing nextInt() with nextLine() — the leftover newline will get you if you forget to consume it.
These patterns cover the vast majority of file processing tasks you will encounter in this course and beyond. Master them, and you can handle any data file that lands on your desk.