arrays-and-algorithms 16 min read

Searching Arrays

Linear search walks the row; binary search halves the space (sorted required)

In a nutshell

You have an array. You want to know whether some target value is in it, and if so, at what index. There are two algorithms for this question, and the choice between them is the first algorithmic decision in this course where the shape of your data matters.

  • Linear search walks the array from 0 upward, comparing each slot against the target. It works on any array, sorted or not. In the worst case it inspects every slot.
  • Binary search jumps to the middle, decides whether the target is to the left or to the right, and halves the search space each step. It works only on a sorted array. In the worst case it inspects about $\log_2 n$ slots.

Both algorithms answer the same question and use the same return convention: return the index where the target was found, or return -1 if it was not. Today you will write both, trace a binary search step by step, and learn when each is the right tool.

Today in three sentences. Linear search is the everyday default and it works on any array. Binary search is faster but requires a sorted array. Both return -1 for “not found” and the index for “found.”

After this lesson, you will be able to:

  • Write linear search and binary search as static methods that return the index of the target (or -1).
  • Trace a binary search by hand using a lo / mid / hi table for a small input.
  • State the precondition that binary search relies on, and what happens if you call it on an unsorted array.
  • Compare the two algorithms by counting comparisons in the worst case.

From CSCD 110. Python’s nums.index(target) is linear search under the hood, and it raises ValueError if the target is missing. Java conventionally returns -1 instead of throwing. The standard library’s Arrays.binarySearch does the binary version, and it returns a negative number with a slightly different meaning (the negated insertion point) — but for everything you write this week, “found → index, not found → −1” is the convention.


Linear search and the -1 sentinel

The simplest possible array search. Walk every slot. As soon as the slot’s value equals the target, return the index. If you finish the loop without finding it, return -1.

public static int indexOf(int[] arr, int target) {
    for (int i = 0; i < arr.length; i++) {
        if (arr[i] == target) {
            return i;
        }
    }
    return -1;   // walked the whole array, no match
}

That is the entire algorithm. Read the body and notice three things.

The early return matters. As soon as the loop finds a match, it returns the index right there. The rest of the array is never inspected. This is what makes linear search “fast in the best case” — if the target happens to be at the front, you stop after one comparison.

The final return -1 matters too. If the loop finishes naturally (the index i reached arr.length), no match was found anywhere. The compiler will refuse to compile a method missing this final return: it cannot prove that the loop will always find a match, so it cannot let you fall off the end of the method.

-1 is a sentinel. It is a value chosen specifically because it can never be a valid array index (which always run 0..length-1). If a caller writes int idx = indexOf(arr, target); if (idx == -1) { ... }, the check is unambiguous. Using 0 as the sentinel would be a bug, because 0 is a perfectly valid index for “the target is in the first slot.”

Common pitfall: returning 0 for “not found”. Some students reach for 0 because it feels like “nothing.” It is not. 0 is a valid index. If indexOf(arr, target) returns 0, the caller cannot tell whether the target was found at the front or was missing. Always use -1 as the missing-element sentinel. Document the convention in the Javadoc.

For non-numeric arrays you compare with .equals(), not ==:

public static int indexOf(String[] arr, String target) {
    for (int i = 0; i < arr.length; i++) {
        if (arr[i] != null && arr[i].equals(target)) {
            return i;
        }
    }
    return -1;
}

The arr[i] != null guard handles the awkward case where some slots in the array are null (which they will be by default for any reference-type array; see lesson 6a). Calling .equals() on null throws NullPointerException.

Check your understanding. What does indexOf return for these calls, given int[] arr = {3, 7, 2, 9, 7}?

A. indexOf(arr, 9) B. indexOf(arr, 7) C. indexOf(arr, 100)

Reveal answer

A. 3. The value 9 is in slot 3. B. 1. The value 7 first appears at index 1; the loop returns immediately on the first match, so the second 7 (at index 4) is never reached. (If you wanted the last index, you would walk the array right to left.) C. -1. The loop runs to completion without finding a match.

Check your understanding. Spot the bug.

public static int indexOf(int[] arr, int target) {
    for (int i = 0; i < arr.length; i++) {
        if (arr[i] == target) {
            return i;
        } else {
            return -1;     // ???
        }
    }
    return -1;
}
Reveal answer

The return -1 inside the else branch fires on the first iteration whenever arr[0] does not equal the target. The method only ever inspects slot 0. The loop never reaches slot 1. The fix is to delete the entire else branch (the trailing return -1 after the loop already handles the not-found case correctly).


Binary search: halving the search space

Linear search inspects up to n slots in an n-element array. Binary search inspects up to about $\log_2 n$ slots — but it has a precondition: the array must already be sorted.

The idea, in English: keep track of a region of the array that might contain the target (call it lo to hi, inclusive). Look at the middle slot. If it matches, you are done. If the target is smaller than the middle, the answer (if any) must be in the left half: shrink hi to one less than the middle. If the target is larger, shrink lo to one more than the middle. Repeat. When lo passes hi, the region is empty, and the target is not in the array.

public static int binarySearch(int[] arr, int target) {
    int lo = 0;
    int hi = arr.length - 1;

    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (arr[mid] == target) {
            return mid;
        } else if (arr[mid] < target) {
            lo = mid + 1;             // target must be to the right
        } else {
            hi = mid - 1;             // target must be to the left
        }
    }

    return -1;   // region became empty
}

Three subtle pieces.

The loop condition is lo <= hi. Both endpoints are inclusive. The region lo == hi still has one slot in it (the slot at index lo). The loop should run that final iteration. Off-by-one alert: writing lo < hi makes the loop quit one step too early.

mid = (lo + hi) / 2 rounds down. If lo = 0 and hi = 7, then mid = 3. If lo = 4 and hi = 7, then mid = 5. The bias does not matter for correctness; it just means you tend to look slightly left of true center. (Industrial code uses lo + (hi - lo) / 2 to avoid integer overflow when lo + hi exceeds Integer.MAX_VALUE. For Reges-style CS1 arrays you will never hit that.)

The mid + 1 and mid - 1 adjustments matter. When you decide the target is to the right, the new region must exclude the slot you just looked at. Otherwise you risk looping forever, repeatedly examining the same mid. Off-by-one alert: writing lo = mid instead of lo = mid + 1 is a classic infinite-loop bug.

If the array is not sorted, the algorithm still runs and still terminates, but its answer is meaningless. It might return -1 for a target that is in the array. It might return the index of one occurrence and miss another. Treat “the array is sorted” as a precondition that the caller is responsible for honoring; document it in the Javadoc.

Common pitfall: writing lo < hi for the loop condition. With <, the loop stops when lo == hi, which means the single-element region never gets inspected. The target can be in the array and the search will return -1. The correct condition is <=.

Common pitfall: writing lo = mid instead of lo = mid + 1. If the target is greater than arr[mid], the new region should exclude index mid (you already know it does not match). Forgetting the + 1 keeps mid in the region, and the very next iteration may pick the same mid, looping forever. The same applies to hi = mid vs hi = mid - 1.


Tracing a binary search by hand

Look up target = 41 in this sorted array of length 8:

Index 0 1 2 3 4 5 6 7
Value 3 8 12 19 25 33 41 58

Build a table. Each row is one iteration. Columns: lo, hi, mid, arr[mid], action.

Iter lo hi mid = (lo+hi)/2 arr[mid] Compare to target (41) Action
1 0 7 3 19 19 < 41 lo = 4
2 4 7 5 33 33 < 41 lo = 6
3 6 7 6 41 41 == 41 return 6

Three iterations. (Linear search would have inspected 7 slots before reaching the match.)

Now trace a miss. Look up target = 30 in the same array.

Iter lo hi mid arr[mid] Compare Action
1 0 7 3 19 19 < 30 lo = 4
2 4 7 5 33 33 > 30 hi = 4
3 4 4 4 25 25 < 30 lo = 5
4 5 4     lo > hi, exit return -1

Four iterations to be confident the target is missing. The region shrunk from 8 slots → 4 → 2 → 1 → 0, and once it was empty the loop exited.

The mechanical rhythm — mid, compare, narrow — is the whole algorithm. Practice on small inputs by hand until it is second nature. Trace tables are also exactly how you should solve binary-search exam questions: write the table, do not try to hold all of lo, hi, mid in your head at once.

Check your understanding. Trace binarySearch(arr, 12) on the same array {3, 8, 12, 19, 25, 33, 41, 58}. How many iterations? What does the method return?

Reveal answer
Iter lo hi mid arr[mid] Compare Action
1 0 7 3 19 19 > 12 hi = 2
2 0 2 1 8 8 < 12 lo = 2
3 2 2 2 12 12 == 12 return 2

Three iterations. Returns 2.

Check your understanding. What does binarySearch({3, 8, 12, 19}, 8) return?

Reveal answer
Iter lo hi mid arr[mid] Compare Action
1 0 3 1 8 8 == 8 return 1

One iteration. The integer division (0 + 3) / 2 = 1 lands on the match immediately.


When to use which

A small table.

  Linear search Binary search
Precondition None Array is sorted
Worst-case comparisons n about $\log_2 n$
Best-case comparisons 1 1
Code complexity About 4 lines About 12 lines
Easy to write correctly Yes Off-by-one prone
Works on String[] Yes (with .equals and a comparison rule) Yes (with .compareTo and a sorted array)

Pick linear search when the array is unsorted, when you are searching only a few times, or when you need every match (in which case you want the linear walk anyway, because binary search returns one index and forgets the rest). Pick binary search when the array is already sorted and you are going to search it many times — the cost of sorting once amortizes across many fast lookups.

A concrete number to internalize. For an array of one million elements:

  • Linear search worst case: one million comparisons.
  • Binary search worst case: about 20 comparisons.

The gap grows fast. For a billion elements, linear is a billion and binary is about 30. This is one of the few cases in CS1 where the algorithm choice matters more than the language, the loop syntax, or the variable names. Sort once, then search billions of times in 30 comparisons each.

Common pitfall: calling binary search on an unsorted array. The method runs and returns something, but the result is wrong: a target that is in the array can come back as -1. The standard library’s Arrays.binarySearch documents this with a hard “the results are undefined if the array is not sorted.” Treat your own binary search the same way: precondition it.

Check your understanding. You have an array int[] scores that students enter in arbitrary order. The program needs to answer “is 85 in this array?” exactly once. Which search should you use?

Reveal answer

Linear search. The array is unsorted, so binary search would either give wrong answers or you would have to sort first. Sorting an n-element array costs at least n log n comparisons (much more than the n of a single linear walk). For a single search on an unsorted array, linear wins.

Check your understanding. A program reads 10,000 sorted student IDs into an array, and over the course of the day handles 50,000 lookup queries: “is ID X in the array?” Which search should it use, and roughly how many comparisons total?

Reveal answer

Binary search. Each lookup costs at most about $\log_2 10\,000 \approx 14$ comparisons. Across 50,000 queries, that is roughly 700,000 comparisons. Linear search would average 5,000 comparisons per lookup (worst case 10,000), totaling 250 million to 500 million. Same answer; very different runtime.


Wrap up and what’s next

Recap.

  • Linear search walks every slot until it finds a match. Works on any array. Returns the index, or -1 for not found.
  • Binary search keeps a [lo, hi] region, halves it each iteration by comparing against arr[mid], and returns the index or -1. Works only on sorted arrays.
  • Use -1 as the not-found sentinel. Never use 0.
  • The binary-search shape — lo <= hi, mid = (lo + hi) / 2, lo = mid + 1 and hi = mid - 1 — is off-by-one prone. Trace small examples on paper until the pattern is automatic.
  • For a few searches on an unsorted array, prefer linear. For many searches on data that can be kept sorted, binary.

What you can do now. Write either search method from a blank page. Trace a binary search by hand on a small array. Decide which algorithm fits a given problem, and explain the trade-off in one sentence.

Next up: Sorting an Array. Two ways to put an array in order. Selection sort scans for the min and swaps it to the front. Insertion sort grows a sorted prefix one element at a time. Both rest on the swap pattern from this week, both are O(n²), and the choice between them turns out to depend on how nearly sorted the input already is.


  • Reges & Stepp, Building Java Programs, Chapter 13 sections 13.1 (linear) and 13.2 (binary). The Reges treatment is the source of the -1-sentinel convention.
  • FAQ entries on infinite loops in binary search and on Arrays.binarySearch’s slightly different return convention.