
The I/O Class Recognition Guide

Why the JDK has so many reader and writer classes, and which one this course actually uses

In a nutshell

You have learned two I/O classes this week: Scanner (for reading) and PrintStream (for writing). The first time you peek into real Java code outside of class, you will see half a dozen other names: FileReader, BufferedReader, FileWriter, BufferedWriter, PrintWriter, Files.readString. None of them are bugs, and none of them are wrong. Each solves the same general problem (move characters between memory and disk), tuned for a different situation.

This lesson is a recognition guide, not a fluency guide. After it, you will be able to:

  • Open a file in a textbook or a Stack Overflow answer and translate the I/O code in your head: “this is reading line by line, just like Scanner; this is writing tokens, just like PrintStream.”
  • Explain in one sentence why the JDK has both PrintStream and PrintWriter, and why this course picks PrintStream.
  • Recognize the byte-stream / character-stream split for what it is (a 1996 vs 1997 historical layer that still shows up in every modern Java program).
  • Pick the right reader and writer for an unfamiliar problem, given two facts: how big is the file, and do you need to parse types?

You are not picking up any new APIs this week. The point is the opposite: by knowing what else is out there, you’ll trust your own choices more.

Today in three sentences. Java has many I/O classes because each is tuned for a different job (binary vs text, performance vs convenience, type parsing vs raw read). For this course, Scanner + PrintStream cover everything we do. The rest of the JDK I/O exists, you’ll see it in real code, and the recognition table below is enough to read it.

From CSCD 110. Python’s file API is small: open(path, mode) returns one object that can be iterated as lines, read as a string, or written to. Java’s API fans out across roughly fifteen classes that each do one slice of the same job. The fan-out exists because Java is twenty-eight years old and has never deleted an I/O class. Recognizing the families is most of the battle.


Why so many classes?

Look at any modern Java codebase and you’ll find I/O written four or five different ways, often inside the same project. That isn’t laziness; it’s the result of three design splits the JDK has been carrying since 1996.

Split 1: bytes vs characters. A file on disk is just a sequence of bytes (numbers from 0 to 255). A character is a human-readable symbol like A, é, or 🌮. The bridge between them is a character encoding (UTF-8, ASCII, windows-1252, etc.). Java 1.0 in 1996 only had byte streams; if you wanted to read text, you read bytes and converted manually. Java 1.1 in 1997 added a parallel hierarchy of character streams that handle the byte-to-character conversion for you. Both still exist in modern Java. Both are useful: byte streams for binary files (images, executables, ZIP archives), character streams for text (CSV, JSON, Java source code).

Split 2: raw vs convenient. A raw stream like FileInputStream knows how to give you one byte at a time. A convenient stream like Scanner or PrintStream adds methods like nextInt, println, and printf that handle parsing and formatting. The convenient classes are usually built on top of the raw ones. You can almost always replace a chain of two or three raw classes with a single convenient one, and vice versa.
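To make the “chain of raw classes vs one convenient class” idea concrete, here is a minimal sketch that reads the first integer from a file both ways. The class name, the method names, and the temporary demo file are made up for illustration; the JDK classes and constructors are real, and Scanner(Path, Charset) assumes Java 10 or later.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class RawVersusConvenient {

    // Raw chain: bytes -> characters -> buffered lines, then parse by hand.
    static int firstIntWithRawChain(Path file) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            String line = br.readLine();           // assumes the file has at least one line
            return Integer.parseInt(line.trim());
        }
    }

    // Convenient class: Scanner does the decoding and the parsing itself.
    static int firstIntWithScanner(Path file) throws IOException {
        try (Scanner sc = new Scanner(file, StandardCharsets.UTF_8)) {
            return sc.nextInt();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("raw-vs-convenient", ".txt"); // hypothetical demo file
        Files.writeString(file, "42\n");
        System.out.println(firstIntWithRawChain(file)); // 42
        System.out.println(firstIntWithScanner(file));  // 42
        Files.delete(file);
    }
}
```

Three raw classes stacked together do what one Scanner does. You would not write the first method in this course; the goal is to recognize the stack when you see it.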

Split 3: unbuffered vs buffered. Reading or writing one byte at a time is slow because each operation crosses from your program into the operating system. Buffered classes like BufferedReader and BufferedWriter accumulate ~8 KB at a time, then move it across the boundary in one big chunk. On a large file, that single design choice can be the difference between a program that takes minutes and one that takes seconds.
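If you want to see the buffering effect yourself, the sketch below runs the same byte-at-a-time loop twice, once on a bare FileInputStream and once wrapped in a BufferedInputStream (the byte-stream cousin of BufferedReader). The class name, method names, and demo file are hypothetical, and the timings depend on your machine, so no specific numbers are promised.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedVersusUnbuffered {

    // Reads the stream one byte at a time and counts newline bytes.
    static long countNewlines(InputStream in) throws IOException {
        long count = 0;
        int b;
        while ((b = in.read()) != -1) {   // read() returns -1 at end of stream
            if (b == '\n') {
                count++;
            }
        }
        return count;
    }

    static long countUnbuffered(Path file) throws IOException {
        // Every read() call here goes to the operating system.
        try (InputStream in = new FileInputStream(file.toFile())) {
            return countNewlines(in);
        }
    }

    static long countBuffered(Path file) throws IOException {
        // Same loop, but most read() calls are served from an in-memory buffer.
        try (InputStream in = new BufferedInputStream(new FileInputStream(file.toFile()))) {
            return countNewlines(in);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffer-demo", ".txt");  // hypothetical demo file
        Files.writeString(file, "line\n".repeat(200_000));        // about 1 MB of text

        long t0 = System.nanoTime();
        long slow = countUnbuffered(file);
        long t1 = System.nanoTime();
        long fast = countBuffered(file);
        long t2 = System.nanoTime();

        System.out.println("unbuffered: " + slow + " lines, " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("buffered:   " + fast + " lines, " + (t2 - t1) / 1_000_000 + " ms");
        Files.delete(file);
    }
}
```

Both methods return the same count. Only the number of trips to the operating system changes.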

Three splits, two values each, gives eight cells in the matrix (2 × 2 × 2). The JDK fills most of those cells with a class. That’s why the family tree is wide. The good news: you only need to memorize where Scanner and PrintStream sit on the map.

Common pitfall: assuming the most-mentioned class is the best one. BufferedReader is the class students see most often in Stack Overflow answers because professional code does prefer it for performance. That doesn’t make it wrong to use Scanner in CS1; the two solve different problems (Scanner parses tokens, BufferedReader reads raw lines fast). Pick the class whose API matches the job you have, not the one with the biggest internet presence.


The read side

Five classes you’ll actually see in the wild. The first one is what you’ve been using; the rest are the alternatives.

| Class | Family | What you say to it | What it gives back | Best for |
| --- | --- | --- | --- | --- |
| Scanner | utility (Java 5) | nextInt, nextDouble, nextLine, next | typed values (int, double, String) | CS1, APE, mixed-type files. Slow on huge files because of regex parsing. |
| BufferedReader | character + buffered (Java 1.1) | readLine | one String per line | Large text files where you don’t need type parsing. The professional default. |
| FileReader | character (Java 1.1) | read | one int (the next character) | Rarely used directly. Usually wrapped inside a BufferedReader in older code. |
| FileInputStream | byte (Java 1.0) | read | one int (the next byte) | Binary files: images, ZIP, compiled .class, anything that isn’t text. |
| Files.readString(path) | utility (Java 11) | nothing (single call) | the whole file as one String | Tiny config or template files. UTF-8 by default. Don’t use on large files (loads the whole thing into memory). |

A small map: each cell tells you what shape the class returns. Scanner returns parsed values. BufferedReader returns whole lines. FileReader returns one character at a time. FileInputStream returns one byte. Files.readString returns the entire file in one shot.

In your own code this term, every “read” is a Scanner. When you see BufferedReader in someone else’s code, translate it as “Scanner without the type parsing, just gives me whole lines.”

Reading the same file three ways

To anchor the recognition skill, here is the same job (sum the integers in a file, one per line) written three ways. You will only write the first one. The other two are for recognition.

The Scanner version (CS1):

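// Needs: import java.io.File; import java.util.Scanner;
// new Scanner(File) can throw FileNotFoundException, so the enclosing method must declare or catch it.
Scanner sc = new Scanner(new File("nums.txt"));
int sum = 0;
while (sc.hasNextInt()) {   // stops at end of file or at the first token that is not an int
    sum += sc.nextInt();
}
sc.close();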

The BufferedReader version (professional):

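// Needs: import java.io.BufferedReader; import java.io.FileReader;
// The enclosing method must declare or catch IOException.
BufferedReader br = new BufferedReader(new FileReader("nums.txt"));
int sum = 0;
String line;
while ((line = br.readLine()) != null) {    // readLine returns null at end of file
    sum += Integer.parseInt(line.trim());   // throws NumberFormatException on a blank or non-numeric line
}
br.close();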

The Files.readAllLines version (tiny file, all-at-once):

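// Needs: import java.nio.file.Files; import java.nio.file.Path; import java.util.List;
// The enclosing method must declare or catch IOException. Path.of needs Java 11 or later.
List<String> lines = Files.readAllLines(Path.of("nums.txt"));   // the whole file is now in memory
int sum = 0;
for (String line : lines) {
    sum += Integer.parseInt(line.trim());
}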

Read all three. On a well-formed file (one integer per line) they produce the same answer. The BufferedReader version returns each line as a raw String, so you have to call Integer.parseInt yourself; that is the price of giving up Scanner’s type parsing. The Files.readAllLines version is the shortest of the three, but it loads the whole file into memory, so it isn’t the right tool when the file might be millions of lines.
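If you would like to run the three approaches side by side, here is one way to wrap them into a complete program. This is a sketch, not course-required code: the class and method names are invented, the demo file is a temporary one, and it uses try-with-resources (a shorthand that calls close() for you) instead of an explicit close().

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Scanner;

class SumThreeWays {

    // Scanner: asks for ints directly and stops at the first token that is not one.
    static int sumWithScanner(String fileName) throws IOException {
        int sum = 0;
        try (Scanner sc = new Scanner(new File(fileName))) {
            while (sc.hasNextInt()) {
                sum += sc.nextInt();
            }
        }
        return sum;
    }

    // BufferedReader: hands back raw lines, so parsing is our job.
    static int sumWithBufferedReader(String fileName) throws IOException {
        int sum = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                sum += Integer.parseInt(line.trim());
            }
        }
        return sum;
    }

    // Files.readAllLines: the whole file becomes a List<String> in memory.
    static int sumWithReadAllLines(String fileName) throws IOException {
        int sum = 0;
        List<String> lines = Files.readAllLines(Path.of(fileName));
        for (String line : lines) {
            sum += Integer.parseInt(line.trim());
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path demo = Files.createTempFile("nums", ".txt");  // hypothetical demo file
        Files.writeString(demo, "10\n20\n30\n");
        String name = demo.toString();
        System.out.println(sumWithScanner(name));          // 60
        System.out.println(sumWithBufferedReader(name));   // 60
        System.out.println(sumWithReadAllLines(name));     // 60
        Files.delete(demo);
    }
}
```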

Common pitfall: writing a BufferedReader because Stack Overflow showed one. If your file is small (under a few megabytes) and you need to parse types, Scanner is the right answer. The BufferedReader pattern is only better when (a) the file is large and (b) you don’t need type parsing. Two conditions, both required. CS1 problems satisfy neither.

Check your understanding. A program needs to read a 50 GB log file and count the lines that start with ERROR. Which class is the right tool, and why?

Answer. BufferedReader.readLine() in a loop. Reasons: (1) the file is far too large for Files.readAllLines or Files.readString (they would try to load all 50 GB into RAM and run out of memory); (2) we don’t need type parsing, since we’re just looking at the start of each line as a String. Scanner would also work but is noticeably slower on multi-gigabyte files because of regex overhead. A buffered, line-oriented reader is the right fit for a file of this size and shape.
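A minimal sketch of that answer, assuming the log is plain text with one entry per line. The class name, method name, and sample log lines are invented for the demo; only one line is held in memory at a time, so the same loop works whether the file is 4 lines or 50 GB.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ErrorLineCounter {

    // Streams the file line by line and counts lines that begin with "ERROR".
    static long countErrorLines(String fileName) throws IOException {
        long count = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {   // null means end of file
                if (line.startsWith("ERROR")) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("demo", ".log");  // hypothetical stand-in for the real log
        Files.writeString(log,
                "ERROR disk full\nINFO started\nERROR timeout\nWARN ERROR in the middle\n");
        System.out.println(countErrorLines(log.toString()));  // 2
        Files.delete(log);
    }
}
```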


The write side

The same fan-out, mirrored. You’ve been using PrintStream. The alternatives are useful to recognize.

| Class | Family | Convenience methods? | Best for |
| --- | --- | --- | --- |
| PrintStream | byte + buffered (Java 1.0) | yes (print, println, printf) | CS1, APE, anything System.out does. Same API as the console. |
| PrintWriter | character + buffered (Java 1.1) | yes (print, println, printf) | Same API as PrintStream, better for non-English text (encoding-aware). Common in professional code. |
| BufferedWriter | character + buffered (Java 1.1) | no (just write and newLine) | High-performance writing of plain String data, no formatting. |
| FileWriter | character (Java 1.1) | no (just write) | Simple text output. Has a built-in append-mode constructor. Often wrapped in BufferedWriter. |
| FileOutputStream | byte (Java 1.0) | no (just write) | Writing binary files. Also: the way to get append mode for PrintStream. |
| Files.writeString(path, s) | utility (Java 11) | nothing (single call) | Tiny output files, one shot, UTF-8 by default. |

The recognition shape: every line of “convenient text writing” boils down to sending a string down a pipe. PrintStream and PrintWriter give you the formatted-text API on that pipe. BufferedWriter and FileWriter only give you raw write calls. FileOutputStream is the binary version, useful for images and for the append-mode trick we’ll see in the next lesson.
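As a preview of that append-mode trick, here is a short sketch of what it tends to look like. The class name, method name, and demo file are hypothetical; the two-argument FileOutputStream constructor is the real JDK one, and the next lesson covers it properly.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;

class AppendDemo {

    // Opens the file in append mode, writes one line, and closes it again.
    static void appendLine(String fileName, String line) throws IOException {
        // The 'true' argument asks FileOutputStream to append instead of truncating.
        try (PrintStream out = new PrintStream(new FileOutputStream(fileName, true))) {
            out.println(line);
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("append-demo", ".log");  // hypothetical demo file
        appendLine(log.toString(), "first run");
        appendLine(log.toString(), "second run");                // does not erase "first run"
        System.out.println(Files.readAllLines(log));             // [first run, second run]
        Files.delete(log);
    }
}
```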

When the JDK uses each

It helps to know where each class shows up in the standard library and in real code:

  • System.out and System.err are PrintStream instances (Java 1.0 backwards compatibility: that decision predates the character-stream split).
  • Many frameworks and web servers hand you a PrintWriter for text output (the servlet API’s response.getWriter() returns one, for example) because it handles non-English characters cleanly.
  • Logging frameworks generally buffer their output (through a BufferedWriter or a buffered byte stream) for speed.
  • Small tools and scripts often read configuration with Files.readString and write small reports with Files.writeString.

For this course, the choice is PrintStream for the same reason System.out is a PrintStream: the API is already in your fingers, and English-only output for a homework file does not need encoding awareness.

Common pitfall: mixing PrintWriter and PrintStream in the same program. They look identical at the call site (out.println(...)). If you copy a snippet that uses PrintWriter into a program that uses PrintStream, the println calls will look fine, but the code will not compile until the import and the declared type match the class you actually construct. Pick one for a given file and stay consistent. The simplest “always works” rule for CS1: PrintStream, because that’s what System.out is.
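To see how alike the two classes are at the call site, the sketch below writes the same two lines with each and compares the resulting files. The class name, method names, and file contents are invented for illustration. Notice that the only differences between the two methods are the type name and the import.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.nio.file.Files;

class SameCallsTwoClasses {

    static void writeWithPrintStream(File file) throws IOException {
        try (PrintStream out = new PrintStream(file)) {
            out.println("scores");
            out.printf("%s: %d%n", "Ada", 95);
        }
    }

    static void writeWithPrintWriter(File file) throws IOException {
        try (PrintWriter out = new PrintWriter(file)) {   // only the type name changed
            out.println("scores");
            out.printf("%s: %d%n", "Ada", 95);
        }
    }

    public static void main(String[] args) throws IOException {
        File a = File.createTempFile("stream", ".txt");   // hypothetical demo files
        File b = File.createTempFile("writer", ".txt");
        writeWithPrintStream(a);
        writeWithPrintWriter(b);

        String fromStream = Files.readString(a.toPath());
        String fromWriter = Files.readString(b.toPath());
        System.out.println(fromStream.equals(fromWriter)); // true (the text here is plain ASCII)

        a.delete();
        b.delete();
    }
}
```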

Check your understanding. Why does System.out.println("résumé") work fine on macOS but produce mangled characters when run on a Windows machine before Java 18?

Answer. PrintStream is a byte stream. To print a character, it converts that character to bytes using a character encoding, and unless you name one it uses a default. Before Java 18, that default was platform-specific: typically UTF-8 on macOS and Linux, but usually windows-1252 on Western-locale Windows. The character é exists in both encodings but at different byte values, so text written on one platform looks scrambled on another. Java 18 made UTF-8 the default for the standard file APIs everywhere (JEP 400), eliminating most of these “looks fine here, broken there” bugs; console output through System.out still follows the terminal’s own encoding. PrintWriter is a character stream built around an explicit encoding (PrintStream has charset-taking constructors too), which is why professional code names its encoding and tends to prefer PrintWriter for international text.
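The “scrambled on another platform” effect is easy to reproduce on one machine by writing with one encoding and reading with another. In this sketch the class and method names are hypothetical, ISO-8859-1 stands in for windows-1252 (they agree on é), and the PrintWriter(File, Charset) constructor assumes Java 10 or later. The string uses \u escapes so the demo does not depend on how your editor saved the source file.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class EncodingMismatch {

    // Writes text with one encoding, then reads the same bytes back with another.
    static String writeThenRead(String text, Charset writeAs, Charset readAs) throws IOException {
        Path file = Files.createTempFile("encoding-demo", ".txt");  // hypothetical demo file
        try (PrintWriter out = new PrintWriter(file.toFile(), writeAs)) {
            out.print(text);
        }
        String decoded = Files.readString(file, readAs);
        Files.delete(file);
        return decoded;
    }

    public static void main(String[] args) throws IOException {
        String word = "r\u00e9sum\u00e9";   // "résumé", 6 characters

        String matched = writeThenRead(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = writeThenRead(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(matched.equals(word));   // true: same encoding on both sides
        System.out.println(mismatched.length());    // 8: each é was read back as two wrong characters
    }
}
```

With matching encodings the word survives the round trip. With mismatched ones, each two-byte é is decoded as two separate one-byte characters, which is the mangled text you see on screen.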


Bytes vs characters: the recognition story

The whole “byte stream vs character stream” split confuses students every term because the names look interchangeable. Here is the one-paragraph version worth keeping.

A byte is a number from 0 to 255 stored on disk. A character is a human-readable symbol. They are not the same thing. ASCII gives you 128 characters that fit in one byte each, so for those characters, “byte” and “character” feel identical. Outside ASCII (accented letters, Cyrillic, Chinese, emoji, every non-English alphabet), one character takes more than one byte in UTF-8. The translation table is called a character encoding, and UTF-8 is the modern standard.
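You can check the byte counts yourself with String.getBytes. This is a small sketch with a made-up class name and helper method; the three sample characters are the ones used earlier in this lesson (A, é, and the taco emoji), written as code points so the source file’s own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class BytesPerCharacter {

    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String letterA = "A";
        String eAcute = "\u00e9";                                // é
        String taco = new String(Character.toChars(0x1F32E));    // the taco emoji

        System.out.println(utf8Length(letterA));  // 1
        System.out.println(utf8Length(eAcute));   // 2
        System.out.println(utf8Length(taco));     // 4

        // The same character can have a different size in a different encoding.
        System.out.println(eAcute.getBytes(StandardCharsets.ISO_8859_1).length);  // 1
    }
}
```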

The byte-stream classes (InputStream, OutputStream, FileInputStream, FileOutputStream, PrintStream) read and write bytes directly. If a single character is multiple bytes, they neither know nor care; they treat it as a sequence of bytes. (PrintStream is the partial exception: its print methods accept characters and encode them to bytes on the way out.)

The character-stream classes (Reader, Writer, FileReader, FileWriter, BufferedReader, BufferedWriter, PrintWriter) read and write characters, applying the encoding behind the scenes. They handle multi-byte characters correctly without any extra effort from you.

For pure ASCII text in CS1 (English letters, digits, punctuation), the two families are functionally interchangeable. The character-stream classes only earn their keep when the file might contain anything beyond plain ASCII, which is most files in production code.

Common pitfall: thinking PrintStream is the “old” class and PrintWriter is the “new” one. Both still exist for a reason. PrintStream is what System.out is, and Java cannot rename it without breaking every Java program ever written. PrintWriter was added later as the proper character-aware version. Modern code uses both, depending on context. Neither is deprecated.


Picking the right tool for an unfamiliar problem

Two questions decide most file-I/O class choices. Memorize them.

Question 1: do you need to parse types as you read? If yes (you want nextInt / nextDouble to give you typed values), Scanner is the answer almost regardless of file size. If no (you want each line as a String and you’ll handle parsing yourself), reach for BufferedReader for large files or Files.readAllLines for small ones.

Question 2: how big is the file? “Small” (under a few megabytes) means you can afford to load the whole thing into memory; Files.readString and Files.readAllLines are clean one-liners. “Large” (tens of megabytes or more) means you need to stream it; use BufferedReader.readLine in a loop and process one line at a time so memory stays flat.

For writing, the questions are similar. Need printf / println? PrintStream (CS1) or PrintWriter (professional). Just need to dump a String? Files.writeString. Maximum throughput? BufferedWriter.
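Here is a sketch of the three writing choices producing the same file contents, so you can recognize each shape. The class name, method names, and sample data are hypothetical. Each method writes one name per line.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class WriteThreeWays {

    // PrintStream: the println / printf API you already know from System.out.
    static void withPrintStream(Path file, List<String> names) throws IOException {
        try (PrintStream out = new PrintStream(file.toFile())) {
            for (String name : names) {
                out.println(name);
            }
        }
    }

    // BufferedWriter: raw write calls only, so the line break is a separate step.
    static void withBufferedWriter(Path file, List<String> names) throws IOException {
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(file.toFile()))) {
            for (String name : names) {
                bw.write(name);
                bw.newLine();
            }
        }
    }

    // Files.writeString: build the whole String first, then write it in one call.
    static void withWriteString(Path file, List<String> names) throws IOException {
        String newline = System.lineSeparator();
        Files.writeString(file, String.join(newline, names) + newline);
    }

    public static void main(String[] args) throws IOException {
        List<String> names = List.of("Ada", "Grace", "Linus");   // sample data for the demo
        Path file = Files.createTempFile("names", ".txt");       // hypothetical demo file

        withPrintStream(file, names);
        System.out.println(Files.readAllLines(file));  // [Ada, Grace, Linus]
        withBufferedWriter(file, names);
        System.out.println(Files.readAllLines(file));  // [Ada, Grace, Linus]
        withWriteString(file, names);
        System.out.println(Files.readAllLines(file));  // [Ada, Grace, Linus]

        Files.delete(file);
    }
}
```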

A tiny reference card you can keep on the side of your monitor:

Reading:
  parse types?       -> Scanner
  large + lines?     -> BufferedReader
  tiny + all-at-once -> Files.readString or Files.readAllLines
  binary?            -> FileInputStream

Writing:
  println / printf?  -> PrintStream (CS1) or PrintWriter
  raw fast?          -> BufferedWriter
  tiny + one shot?   -> Files.writeString
  binary?            -> FileOutputStream

Eight cells. Most professional programs only ever touch three or four. CS1 only touches two: Scanner and PrintStream.

Check your understanding. You’re reading a 200-line CSV of student scores. Each line is name,grade1,grade2,grade3. You want to compute each student’s average. Which reader is most natural and why?

Answer. Two reasonable answers. Scanner with useDelimiter(",") works but is awkward because the separator alternates between commas (within a line) and newlines (between students). The cleaner shape is Scanner (or BufferedReader) reading one line at a time with nextLine (or readLine), then String.split(",") on each line. Each line is read as a single String, then split on commas, then parsed: parts[0] is the name, and Integer.parseInt applied to parts[1], parts[2], and parts[3] gives the grades. The nextLine + split pattern (introduced in lesson 7c) is the idiomatic CSV reader in CS1 Java. For a 200-line file, Files.readAllLines is also fine. For a 200-million-line file you’d want BufferedReader.readLine in a loop to keep memory bounded.
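A minimal sketch of the nextLine + split pattern for this problem, assuming every line is well formed (a name followed by integer grades, no blank lines). The class name, helper method, and sample students are invented for the demo.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class CsvAverages {

    // Takes one line such as "Ada,90,80,100" and averages every field after the name.
    static double averageOf(String csvLine) {
        String[] parts = csvLine.split(",");
        int total = 0;
        for (int i = 1; i < parts.length; i++) {          // parts[0] is the name
            total += Integer.parseInt(parts[i].trim());
        }
        return (double) total / (parts.length - 1);
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("scores", ".csv");   // hypothetical demo file
        Files.writeString(csv, "Ada,90,80,100\nGrace,70,85,95\n");

        try (Scanner sc = new Scanner(new File(csv.toString()))) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                String name = line.split(",")[0];
                System.out.println(name + " " + averageOf(line));
            }
        }
        Files.delete(csv);
    }
}
```

Swapping the Scanner loop for BufferedReader.readLine would leave averageOf untouched, which is the point: the line-splitting logic does not care which reader produced the line.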


Wrap up and what’s next

Recap.

  • Java’s I/O zoo is wide because of three design splits (byte vs character, raw vs convenient, unbuffered vs buffered) that the JDK has carried since 1996.
  • This course writes Scanner for reading and PrintStream for writing, because Scanner gives you type parsing and PrintStream gives you the same API as System.out.
  • The other classes you will see in real code (BufferedReader, BufferedWriter, FileReader, FileWriter, PrintWriter, FileInputStream, FileOutputStream, Files.readString, Files.writeString) all solve the same general problem but optimize for size, encoding, or performance.
  • The byte-stream / character-stream split is the bridge between “raw bytes on disk” and “human-readable characters in memory.” For ASCII text, the two are interchangeable; outside ASCII, character streams handle the encoding.
  • Two questions decide the right reader: do you need to parse types, and is the file large? Two analogous questions decide the right writer.

What you can do now. Read someone else’s Java code and translate the I/O calls without panic. Explain to a classmate why their BufferedReader and your Scanner both compile, both work, and make different trade-offs. Pick the right tool when you encounter a kind of file this course didn’t cover (a giant log, a tiny config, a CSV).

Next up: File-Class Power Tools. What the File class can do beyond exists() and getAbsolutePath(). Append mode (so PrintStream doesn’t clobber yesterday’s log). Creating parent directories on the fly. Inspecting, deleting, and renaming files. The operations that turn “I can read and write a file” into “I can manage a folder full of them.”


  • The java.io package overview lists every reader and writer class with one-line summaries. Skim it once after this lesson.
  • Reges & Stepp, Building Java Programs, Chapter 6 mentions BufferedReader and PrintWriter briefly; the comparison tables here are a more complete map.
  • The full reference dump in COMPREHENSIVE-IO-REFERENCE.md (course materials) covers Files.lines, Files.walk, the NIO.2 Path API, and Java 18’s UTF-8 default if you want to go deeper.