file-io Lesson 2 20 min read

File Processing Patterns

Line-by-line, token-by-token, and writing output files

Reading: Reges & Stepp: Ch. 6

After this lesson, you will be able to:

  • Process files line-by-line with hasNextLine() / nextLine()
  • Process files token-by-token with hasNext() / next() and typed variants
  • Combine line-based and token-based reading using a second Scanner on a line
  • Count, sum, and aggregate values read from a file
  • Write output to files with PrintStream
  • Apply the count-first, two-pass pattern for sizing arrays from file data

Your First Real File Task

You know how to open a file with Scanner. You know how to read tokens and lines. But here is the question that actually matters: how do you structure a program that reads an entire file, processes every piece of data, and writes the results somewhere?

That is what this lesson is about. Not individual Scanner methods — you already have those. This lesson is about patterns: reusable recipes for the file processing tasks that come up again and again.

We will work through a running example: a CSV gradebook file. By the end, you will read it, compute averages, and write a report to a new file.

From CSCD 110: In Python, you read files with for line in open("file.txt") — one line at a time, automatically. Java requires you to be explicit about how you consume the file: line-by-line, token-by-token, or a mix of both. The upside is more control. The downside is more code.


Pattern 1: Line-by-Line Processing

The most common file processing pattern reads one complete line at a time. Use this when each line is a meaningful unit — a record, a sentence, a CSV row.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class LineByLine {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner input = new Scanner(new File("gradebook.csv"));

        // Skip the header row
        String header = input.nextLine();

        int count = 0;
        while (input.hasNextLine()) {
            String line = input.nextLine();
            System.out.println("Row " + count + ": " + line);
            count++;
        }
        input.close();

        System.out.println("Total students: " + count);
    }
}

Given gradebook.csv:

Name,Midterm,Final,Homework
Alice,88,92,95
Bob,72,68,80
Charlie,95,91,97

Output:

Row 0: Alice,88,92,95
Row 1: Bob,72,68,80
Row 2: Charlie,95,91,97
Total students: 3

The structure is always the same: while (input.hasNextLine()) guards the loop, input.nextLine() consumes one line, and you process it inside the loop body.
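
One edge case the program above does not cover: calling nextLine() to skip the header throws NoSuchElementException if the file is completely empty. A minimal sketch of a guarded version follows. The class and method names (SafeHeaderSkip, countDataRows) are illustrative only, and the example scans a String instead of a file so it runs without gradebook.csv on disk.

```java
import java.util.Scanner;

public class SafeHeaderSkip {
    // Counts the rows that follow the header; returns 0 for empty input.
    public static int countDataRows(Scanner input) {
        // Only skip the header if there is a line to skip
        if (input.hasNextLine()) {
            input.nextLine();
        }

        int count = 0;
        while (input.hasNextLine()) {
            input.nextLine();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: a Scanner over a String stands in for the file here
        Scanner full = new Scanner("Name,Midterm,Final,Homework\nAlice,88,92,95\nBob,72,68,80\n");
        System.out.println(countDataRows(full));   // prints 2
        full.close();

        Scanner empty = new Scanner("");
        System.out.println(countDataRows(empty));  // prints 0
        empty.close();
    }
}
```

The same hasNextLine() check that guards the loop also guards the header skip, so no extra technique is needed.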


Pattern 2: Token-by-Token Processing

Sometimes a file contains individual values separated by whitespace, and you want to process each value independently. Scanner’s default behavior splits on whitespace — spaces, tabs, and newlines all count as delimiters.

Scanner input = new Scanner(new File("numbers.txt"));

int sum = 0;
int count = 0;

while (input.hasNextInt()) {
    int value = input.nextInt();
    sum += value;
    count++;
}
input.close();

System.out.println("Sum: " + sum);
System.out.println("Count: " + count);
System.out.println("Average: " + (double) sum / count);

Given numbers.txt:

10 20 30
40 50
60

Output:

Sum: 210
Count: 6
Average: 35.0

Scanner treats the entire file as a stream of tokens. It does not care where the line breaks are — hasNextInt() skips over whitespace (including newlines) to find the next token.
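
Because the loop condition is hasNextInt(), reading stops at the first token that is not an integer, even if more numbers follow it. If stray tokens should be skipped instead of ending the loop, one possible variation is to loop on hasNext() and discard non-integers with next(). The sketch below assumes that silently skipping bad tokens is acceptable for your data. The class and method names are made up for illustration, and it scans a String so it runs without numbers.txt.

```java
import java.util.Scanner;

public class SkipBadTokens {
    // Sums every integer token, discarding tokens that are not integers.
    public static int sumInts(Scanner input) {
        int sum = 0;
        while (input.hasNext()) {
            if (input.hasNextInt()) {
                sum += input.nextInt();
            } else {
                input.next();  // consume and ignore the non-integer token
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Neither the line breaks nor the stray token "oops" stop the loop
        Scanner input = new Scanner("10 20 oops\n30\n");
        System.out.println(sumInts(input));  // prints 60
        input.close();
    }
}
```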

The hasNext Cascade: Mixed Token Types

What if a file contains integers, doubles, and strings mixed together? You check types in a specific order — integers first, then doubles, then strings as a fallback:

Scanner input = new Scanner(new File("mixed.txt"));

while (input.hasNext()) {
    if (input.hasNextInt()) {
        int value = input.nextInt();
        System.out.println("Integer: " + value);
    } else if (input.hasNextDouble()) {
        double value = input.nextDouble();
        System.out.println("Double: " + value);
    } else {
        String value = input.next();
        System.out.println("String: " + value);
    }
}
input.close();

Given mixed.txt:

42 3.14 hello 7 world 2.71

Output:

Integer: 42
Double: 3.14
String: hello
Integer: 7
String: world
Double: 2.71

The order matters. Every integer is also a valid double (42 parses as 42.0). If you check hasNextDouble() first, you will never detect integers — they all get classified as doubles. Always check hasNextInt() before hasNextDouble().
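
To see the difference, the sketch below runs the cascade in both orders on the same input. The classify method and its intFirst flag are hypothetical helpers written for this comparison, and the call to useLocale(Locale.US) is an assumption that the data uses a period as the decimal separator.

```java
import java.util.Locale;
import java.util.Scanner;

public class CascadeOrder {
    // Labels each token, checking for int first (correct) or double first (wrong order).
    public static String classify(String text, boolean intFirst) {
        Scanner input = new Scanner(text);
        input.useLocale(Locale.US);  // assumption: '.' is the decimal separator
        StringBuilder labels = new StringBuilder();
        while (input.hasNext()) {
            if (intFirst && input.hasNextInt()) {
                input.nextInt();
                labels.append("int ");
            } else if (input.hasNextDouble()) {
                input.nextDouble();
                labels.append("double ");
            } else {
                input.next();
                labels.append("string ");
            }
        }
        input.close();
        return labels.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(classify("42 3.14 hello", true));   // prints: int double string
        System.out.println(classify("42 3.14 hello", false));  // prints: double double string
    }
}
```

With the double check first, 42 is labeled a double and the int branch is never reached.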

Check Your Understanding

In the hasNext cascade, why must hasNextInt() be checked before hasNextDouble()?


Pattern 3: Line-Then-Tokens (The Two-Scanner Pattern)

This is the most powerful pattern and the one you will use most often with structured data files. Read one line at a time from the file, then create a second Scanner on that line to extract individual tokens from it.

This pattern is essential for CSV and tabular data where each row has a fixed structure.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class GradebookProcessor {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner fileScanner = new Scanner(new File("gradebook.csv"));

        // Skip header
        fileScanner.nextLine();

        while (fileScanner.hasNextLine()) {
            String line = fileScanner.nextLine();

            // Second Scanner: parse tokens within this line
            Scanner lineScanner = new Scanner(line);
            lineScanner.useDelimiter(",");

            String name = lineScanner.next();
            int midterm = lineScanner.nextInt();
            int finalExam = lineScanner.nextInt();
            int homework = lineScanner.nextInt();
            lineScanner.close();

            double average = (midterm + finalExam + homework) / 3.0;
            System.out.printf("%s: %.1f%n", name, average);
        }
        fileScanner.close();
    }
}

Given the same gradebook.csv from before, this outputs:

Alice: 91.7
Bob: 73.3
Charlie: 94.3

The outer loop reads lines from the file. The inner Scanner reads tokens from the line. The useDelimiter(",") call tells the inner Scanner to split on commas instead of whitespace.

This two-scanner approach is cleaner than calling line.split(",") when the tokens are a mix of types — you get nextInt(), nextDouble(), and next() for free instead of parsing strings manually with Integer.parseInt().
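
For comparison, here is a sketch of the split-based alternative, which converts each field by hand. The class and method names are hypothetical, and the code assumes every row has exactly four comma-separated fields with no spaces around the commas.

```java
public class SplitComparison {
    // Parses one CSV row with String.split and returns the average of the three scores.
    public static double averageOf(String line) {
        String[] fields = line.split(",");
        // fields[0] is the name; the scores must be converted manually
        int midterm = Integer.parseInt(fields[1]);
        int finalExam = Integer.parseInt(fields[2]);
        int homework = Integer.parseInt(fields[3]);
        return (midterm + finalExam + homework) / 3.0;
    }

    public static void main(String[] args) {
        double average = averageOf("Alice,88,92,95");
        System.out.printf("%.1f%n", average);  // 275 / 3.0, rounded to one decimal place
    }
}
```

Both approaches produce the same numbers. The split version trades the second Scanner for array indexing and explicit parseInt calls.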

Common Pitfall: Do not forget to close the inner lineScanner. While it does not hold a file handle (it is scanning a String), closing it is good practice and prevents resource warnings from your IDE.


Counting and Summing Values from Files

Real file processing almost always involves aggregation: counting rows, summing values, computing averages, finding max/min. These are the same accumulator patterns from arrays, applied to file data.

Summing a Column

Scanner fileScanner = new Scanner(new File("gradebook.csv"));
fileScanner.nextLine(); // skip header

int totalMidterm = 0;
int studentCount = 0;

while (fileScanner.hasNextLine()) {
    String line = fileScanner.nextLine();
    Scanner lineScanner = new Scanner(line);
    lineScanner.useDelimiter(",");

    lineScanner.next();              // skip name
    totalMidterm += lineScanner.nextInt();  // midterm column
    lineScanner.close();

    studentCount++;
}
fileScanner.close();

double avgMidterm = (double) totalMidterm / studentCount;
System.out.printf("Average midterm: %.1f%n", avgMidterm);
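
The same loop shape handles the other aggregates mentioned above. As a sketch, the hypothetical helper below finds the student with the highest midterm score. It takes an already-open Scanner so it can be tried on a String, and it assumes the data follows the gradebook.csv layout (name first, midterm second).

```java
import java.util.Scanner;

public class MaxMidterm {
    // Returns the name of the student with the highest midterm, or null if there are no rows.
    public static String topMidterm(Scanner fileScanner) {
        if (fileScanner.hasNextLine()) {
            fileScanner.nextLine();  // skip header
        }

        String bestName = null;
        int bestScore = Integer.MIN_VALUE;  // any real score beats this starting value

        while (fileScanner.hasNextLine()) {
            Scanner lineScanner = new Scanner(fileScanner.nextLine());
            lineScanner.useDelimiter(",");
            String name = lineScanner.next();
            int midterm = lineScanner.nextInt();
            lineScanner.close();

            if (midterm > bestScore) {  // strict >, so the first student wins ties
                bestScore = midterm;
                bestName = name;
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        Scanner data = new Scanner(
            "Name,Midterm,Final,Homework\nAlice,88,92,95\nBob,72,68,80\nCharlie,95,91,97\n");
        System.out.println(topMidterm(data));  // prints Charlie
        data.close();
    }
}
```

Finding a minimum works the same way: start from Integer.MAX_VALUE and flip the comparison.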

The Count-First Pattern (Two-Pass Reading)

Sometimes you need to store file data in an array, but you do not know how many rows the file contains. Arrays have a fixed size — you must know the size at creation. The solution is two passes through the file:

  1. First pass: count the items.
  2. Allocate an array of the correct size.
  3. Second pass: read the file again and fill the array.

// Pass 1: count lines
Scanner counter = new Scanner(new File("data.txt"));
int lineCount = 0;
while (counter.hasNextLine()) {
    counter.nextLine();
    lineCount++;
}
counter.close();

// Allocate array
String[] lines = new String[lineCount];

// Pass 2: fill array
Scanner reader = new Scanner(new File("data.txt"));
for (int i = 0; i < lines.length; i++) {
    lines[i] = reader.nextLine();
}
reader.close();

A Scanner is like a bookmark — once it reaches the end of the file, you cannot rewind it. You must close it and create a new one. You can reuse the same variable name:

Scanner sc = new Scanner(new File("data.txt"));
// ... first pass ...
sc.close();

sc = new Scanner(new File("data.txt"));  // new Scanner object, same variable
// ... second pass ...
sc.close();

Check Your Understanding

Why does the count-first pattern require two separate Scanner objects (or closing and re-creating one)?


Writing Output with PrintStream

Reading is half the story. PrintStream lets you write output to a file using the same print(), println(), and printf() methods you already use with System.out — because System.out is a PrintStream.

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintStream;
import java.util.Scanner;

public class GradebookReport {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner fileScanner = new Scanner(new File("gradebook.csv"));
        PrintStream out = new PrintStream(new File("report.txt"));

        // Write report header
        out.println("=== Gradebook Report ===");
        out.println();

        // Skip CSV header
        fileScanner.nextLine();

        int studentCount = 0;
        int grandTotal = 0;

        while (fileScanner.hasNextLine()) {
            String line = fileScanner.nextLine();
            Scanner lineScanner = new Scanner(line);
            lineScanner.useDelimiter(",");

            String name = lineScanner.next();
            int midterm = lineScanner.nextInt();
            int finalExam = lineScanner.nextInt();
            int homework = lineScanner.nextInt();
            lineScanner.close();

            double average = (midterm + finalExam + homework) / 3.0;
            grandTotal += midterm + finalExam + homework;
            studentCount++;

            // Write to file, not console
            out.printf("%-10s  Midterm: %3d  Final: %3d  HW: %3d  Avg: %5.1f%n",
                       name, midterm, finalExam, homework, average);
        }

        out.println();
        double classAvg = grandTotal / (studentCount * 3.0);
        out.printf("Class average: %.1f (%d students)%n", classAvg, studentCount);

        fileScanner.close();
        out.close();

        System.out.println("Report written to report.txt");
    }
}

This produces report.txt:

=== Gradebook Report ===

Alice       Midterm:  88  Final:  92  HW:  95  Avg:  91.7
Bob         Midterm:  72  Final:  68  HW:  80  Avg:  73.3
Charlie     Midterm:  95  Final:  91  HW:  97  Avg:  94.3

Class average: 86.4 (3 students)

Key PrintStream Facts

Method                  Behavior
print(x)                Writes x without a newline
println(x)              Writes x followed by a newline
printf(format, args)    Writes formatted output (same format strings as System.out.printf)
println()               Writes a blank line

PrintStream creates the file if it does not exist. If the file already exists, it overwrites the contents — it does not append. Always close the PrintStream when you are done writing; otherwise, some output may be buffered and never written to disk.
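
If you do want to add to an existing file instead of overwriting it, one option is to wrap a FileOutputStream opened in append mode. The sketch below also uses try-with-resources, which closes the stream automatically even if an exception is thrown. The class name, the method name, and log.txt are placeholders, not part of the gradebook example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;

public class AppendDemo {
    // Appends one line to the file, creating the file if it does not exist.
    public static void appendLine(File file, String text) throws FileNotFoundException {
        // The second argument 'true' opens the file in append mode
        try (PrintStream out = new PrintStream(new FileOutputStream(file, true))) {
            out.println(text);
        }  // out.close() runs here automatically, flushing buffered output
    }

    public static void main(String[] args) throws FileNotFoundException {
        File log = new File("log.txt");  // placeholder file name
        appendLine(log, "first run");
        appendLine(log, "second run");   // added after "first run", not replacing it
    }
}
```

Running the program twice leaves four lines in log.txt, since each call adds to whatever is already there.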


next() vs. nextLine() — The Buffer Trap

These two methods look similar but behave differently:

                       next()                          nextLine()
Reads                  One token (up to whitespace)    Everything until the next newline
Stops at               Space, tab, or newline          Newline only
Consumes delimiter?    No (leaves it in the buffer)    Yes (consumes the newline)

The danger appears when you mix nextInt() (or nextDouble()) with nextLine():

Scanner sc = new Scanner(new File("test.txt"));
// File contents: "42\nhello world\n"

int num = sc.nextInt();       // reads 42, leaves \n in buffer
String line = sc.nextLine();  // reads "" (the leftover \n)
String real = sc.nextLine();  // reads "hello world"

After nextInt() consumes 42, the newline character is still in the buffer. The next nextLine() call reads that leftover newline and returns an empty string.

Fix: Add an extra nextLine() call after nextInt() or nextDouble() to consume the leftover newline. Or use nextLine() for everything and parse with Integer.parseInt().
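
Both fixes can be sketched side by side. The method names below are illustrative, and each method takes the file contents as a String so the example runs without test.txt on disk.

```java
import java.util.Scanner;

public class BufferTrapFixes {
    // Fix 1: call nextLine() once after nextInt() to discard the rest of that line.
    public static String fixWithExtraNextLine(String contents) {
        Scanner sc = new Scanner(contents);
        int num = sc.nextInt();       // reads the number, leaves the newline
        sc.nextLine();                // consumes the leftover newline
        String line = sc.nextLine();  // now reads the real next line
        sc.close();
        return num + ":" + line;
    }

    // Fix 2: read whole lines only and convert the number yourself.
    public static String fixWithParseInt(String contents) {
        Scanner sc = new Scanner(contents);
        int num = Integer.parseInt(sc.nextLine().trim());
        String line = sc.nextLine();
        sc.close();
        return num + ":" + line;
    }

    public static void main(String[] args) {
        String contents = "42\nhello world\n";
        System.out.println(fixWithExtraNextLine(contents));  // prints 42:hello world
        System.out.println(fixWithParseInt(contents));       // prints 42:hello world
    }
}
```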

Check Your Understanding

A file contains "5\nAlice\n". What does line hold after this code runs?

Scanner sc = new Scanner(new File("data.txt"));
int num = sc.nextInt();
String line = sc.nextLine();


Putting It Together: Complete ETL Program

Here is a complete program that demonstrates the full read-process-write cycle, often called ETL (extract, transform, load). It reads a gradebook CSV, computes each student's average and letter grade, and writes a new CSV with the results.

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintStream;
import java.util.Scanner;

public class GradeCalculator {
    public static void main(String[] args) throws FileNotFoundException {
        File inputFile = new File("gradebook.csv");
        if (!inputFile.exists()) {
            System.out.println("File not found: " + inputFile.getAbsolutePath());
            return;
        }

        // --- Pass 1: Count students ---
        Scanner counter = new Scanner(inputFile);
        counter.nextLine(); // skip header
        int studentCount = 0;
        while (counter.hasNextLine()) {
            counter.nextLine();
            studentCount++;
        }
        counter.close();

        // --- Pass 2: Read and process ---
        String[] names = new String[studentCount];
        double[] averages = new double[studentCount];

        Scanner reader = new Scanner(inputFile);
        reader.nextLine(); // skip header

        for (int i = 0; i < studentCount; i++) {
            Scanner line = new Scanner(reader.nextLine());
            line.useDelimiter(",");

            names[i] = line.next();
            int midterm = line.nextInt();
            int finalExam = line.nextInt();
            int homework = line.nextInt();
            line.close();

            averages[i] = (midterm + finalExam + homework) / 3.0;
        }
        reader.close();

        // --- Write output ---
        PrintStream out = new PrintStream(new File("grades_output.csv"));
        out.println("Name,Average,Grade");

        for (int i = 0; i < studentCount; i++) {
            String grade = letterGrade(averages[i]);
            out.printf("%s,%.1f,%s%n", names[i], averages[i], grade);
        }
        out.close();

        System.out.println("Wrote grades for " + studentCount + " students.");
    }

    public static String letterGrade(double avg) {
        if (avg >= 90) return "A";
        if (avg >= 80) return "B";
        if (avg >= 70) return "C";
        if (avg >= 60) return "D";
        return "F";
    }
}

This program uses three patterns together: the count-first two-pass approach to size the arrays, the two-scanner pattern to parse CSV lines, and PrintStream to write the results. The output file grades_output.csv looks like:

Name,Average,Grade
Alice,91.7,A
Bob,73.3,C
Charlie,94.3,A

Quick Reference

Task                     Pattern
Read every line          while (sc.hasNextLine()) { String line = sc.nextLine(); }
Read every token         while (sc.hasNext()) { String token = sc.next(); }
Read typed tokens        while (sc.hasNextInt()) { int n = sc.nextInt(); }
Parse a line’s fields    Scanner lineSc = new Scanner(line); lineSc.useDelimiter(",");
Count lines in a file    Loop with hasNextLine(), increment counter, close, re-open
Write to a file          PrintStream out = new PrintStream(new File("out.txt"));
Mixed type detection     hasNextInt() then hasNextDouble() then next()

Summary

File processing in Java comes down to a handful of reusable patterns. Line-by-line reading with hasNextLine() / nextLine() is for structured row data. Token-by-token reading with hasNext() / next() and typed variants is for flat value streams. The two-scanner pattern — line from the file, tokens from the line — handles CSV and other delimited formats cleanly.

When you need arrays but do not know the file size, use the count-first two-pass pattern: one Scanner to count, a new Scanner to fill. When you need to write results, PrintStream gives you the same print, println, and printf you already know from console output.

Watch for the buffer trap when mixing nextInt() with nextLine() — the leftover newline will get you if you forget to consume it.

These patterns cover the vast majority of file processing tasks you will encounter in this course and beyond. Master them, and most plain-text data files become routine work.