pointers-memory Lesson 8 11 min read

Pointer Arithmetic & Arrays

p + n scales by sizeof(*p), arr[i] is *(arr + i), and array parameters decay

Based on content from Dr. Stu Steiner, Eastern Washington University.

Reading: Hanly & Koffman: §7.5 (pp. 390–403); K&R §5.3–§5.4 (pp. 97–100)

In a nutshell

When you write p + 1 in C, the address does not move by one byte. It moves by one element: sizeof(*p) bytes. For an int * on x86-64, that is four bytes. This scaling rule is the reason arrays and pointers click together so cleanly. The C standard literally defines a[i] as *(a + i), so a subscript is pointer arithmetic in disguise. In most expressions an array name decays into a pointer to its first element, which is why you can pass arr to a function and index through the pointer on the other side. The catch is that C does not bounds-check pointer arithmetic, so an attacker-controlled length walks right past the end of your buffer. That missing check, not the scaling itself, is what made Heartbleed (CVE-2014-0160) possible.


After this lesson, you will be able to:

  • Compute the byte-accurate result of p + n for any pointer type
  • Explain why a[i] and *(a + i) are defined as the same expression
  • Walk an array with either a subscript or a pointer, and translate between the two
  • Say why sizeof(arr) applied to an array parameter evaluates to 8 (the size of a pointer), not the array’s total bytes
  • Parse *p + 1, *(p + 1), *p++, and (*p)++ correctly, and know why they differ

Quick reference

| Expression | Parses as | Type | Effect |
| --- | --- | --- | --- |
| p + n | p + n | T * | advance by n * sizeof(T) bytes |
| a[i] | *(a + i) | T | element at index i (language definition) |
| *p + 1 | (*p) + 1 | T | integer add on the dereferenced value |
| *(p + 1) | *(p + 1) | T | pointer arithmetic, then dereference |
| *p++ | *(p++) | T | read *p, then advance p |
| (*p)++ | (*p)++ | T | post-increment the pointed-to object |

Coming from CSCD 210

In Java an int[] is an object. It carries its length in arr.length, and every access goes through a bounds check that throws ArrayIndexOutOfBoundsException on overflow. In C an array is a run of bytes with a name. The name decays into a pointer in almost every expression, and once it has decayed there is no length attached. No bounds check happens at runtime, so arr[100] on a size-10 array reads whatever is 400 bytes past the start. Every function that takes an array has to also take a length parameter, because the pointer alone cannot tell the callee how many elements are there.


The scaling rule

Memory is byte-addressable. Every byte has its own address. An int on the lab machines (x86-64 Linux) occupies four bytes. So an int array at address 0x7540 lays out like this:

| Element | Address | Bytes |
| --- | --- | --- |
| arr[0] | 0x7540 | 0x7540–0x7543 |
| arr[1] | 0x7544 | 0x7544–0x7547 |
| arr[2] | 0x7548 | 0x7548–0x754B |
| arr[3] | 0x754C | 0x754C–0x754F |

The gap between elements is four bytes, not one. C’s pointer arithmetic accounts for that automatically: for any pointer p of type T *,

p + n  ==  (byte address of p) + n * sizeof(T)

So if arr lives at 0x7540:

arr + 0  ->  0x7540   /* type int * */
arr + 1  ->  0x7544   /* 0x7540 + 1*4 */
arr + 2  ->  0x7548   /* 0x7540 + 2*4 */
arr + 3  ->  0x754C   /* 0x7540 + 3*4 */
arr + 4  ->  0x7550   /* legal to form; illegal to dereference */

The last line is the one-past-the-end pointer. The C standard guarantees you can form it and compare against it; dereferencing it is undefined behavior. That one-past-the-end pointer is what powers the idiomatic for (int *p = arr; p < arr + n; p++) loop.
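The scaling rule and the one-past-the-end loop can be checked directly. The sketch below is a minimal illustration, not course-supplied code: byte_offset and sum_with_pointer are hypothetical names. It measures the distance between p and p + n in bytes by viewing both addresses as char *, and it sums an array by walking a pointer up to arr + n without ever dereferencing that final address.

```c
#include <stddef.h>

/* Byte distance between p and p + n, measured by viewing both as char *.
   For an int * this is n * sizeof(int).
   Assumption: p + n stays inside the array or exactly one past its end. */
size_t byte_offset(const int *p, size_t n)
{
    return (size_t)((const char *)(p + n) - (const char *)p);
}

/* Sum n ints by walking a pointer up to, but never through, the
   one-past-the-end address arr + n. */
long sum_with_pointer(const int *arr, size_t n)
{
    long total = 0;
    for (const int *p = arr; p < arr + n; p++) {
        total += *p;   /* p is always a real element here */
    }
    return total;
}
```

For int arr[4] = {10, 20, 30, 40}, byte_offset(arr, 1) equals sizeof(int) (4 on the lab machines), and sum_with_pointer(arr, 4) is 100. The loop compares against arr + 4 but stops before reading it.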

Canonical sizes on the lab machines

Type sizeof
char 1
int 4
long 8
double 8
any T * 8

These are not the ISO C guarantees (the standard only promises minimum ranges); they are what gcc on Linux x86-64 produces, which is what every CSCD 240 quiz will assume. When you trace pointer arithmetic on paper, always show the scale factor:

base + n * sizeof(T) = result

If the pointer is int *, the scale is 4. If it is double *, long *, or any pointer-to-pointer, the scale is 8. void * cannot be incremented in standard C because it has no element size to scale by.
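The on-paper formula translates almost word for word into code. The following is a sketch under stated assumptions: predict_address, stride_of_double, and stride_of_int_ptr_ptr are hypothetical helpers invented for this illustration. The first does the arithmetic on plain integers, the way you would on a quiz. The other two measure the real step of a double * and an int ** so you can compare the prediction with what the compiler does.

```c
#include <stddef.h>
#include <stdint.h>

/* The on-paper rule: result = base + n * sizeof(T).
   Works on plain integers, so any example address is fine. */
uintptr_t predict_address(uintptr_t base, size_t n, size_t elem_size)
{
    return base + (uintptr_t)(n * elem_size);
}

/* Actual stride of a double *: bytes between p and p + 1. */
size_t stride_of_double(void)
{
    double one[1] = {0.0};
    const double *p = one;
    return (size_t)((const char *)(p + 1) - (const char *)p);
}

/* Actual stride of an int **: the element type is int *, so the
   step is sizeof(int *), not sizeof(int). */
size_t stride_of_int_ptr_ptr(void)
{
    int *one[1] = {NULL};
    int **p = one;
    return (size_t)((const char *)(p + 1) - (const char *)p);
}
```

predict_address(0x7540, 3, 4) gives 0x754C, matching the arr + 3 row in the trace earlier in this section. The two stride functions return sizeof(double) and sizeof(int *), which are both 8 on the lab machines but are not guaranteed to be 8 everywhere.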

Check your understanding (fill in the blank)

long data[4];     /* lives at 0xA000 */
long *p = data;

p + 2 has type ______ and value 0x______. p + 4 has type ______ and value 0x______; dereferencing it is ______.

Reveal answer
  • p + 2 has type long * and value 0xA010 (0xA000 + 2 * sizeof(long) = 0xA000 + 16 = 0xA010).
  • p + 4 has type long * and value 0xA020 (0xA000 + 4 * 8 = 0xA020). It is the one-past-the-end pointer; forming it is legal, dereferencing it is undefined behavior.

Arrays and pointers are the same expression

a[i] is defined as *(a + i)

This is the language definition, not an implementation detail. ISO/IEC 9899:2018 §6.5.2.1 paragraph 2: “The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).” Three consequences:

  • a[i] and *(a + i) produce the same value, the same type, and the same machine code. You can mix the notations freely.
  • Because addition commutes, a[i] is the same as i[a]. 5[arr] is a legal (if ugly) way to write arr[5]. Do not write it in production, but know why it is legal: [] is pointer arithmetic.
  • Indexing works identically on stack arrays, malloc-ed blocks, and function parameters, because the compiler emits the same arithmetic regardless of where the storage lives.
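These consequences are easy to confirm with a few lines. The sketch below uses hypothetical function names (three_spellings_agree, heap_indexing_agrees) and is meant only as an illustration. It reads one element three ways, then writes a malloc-ed block with pointer syntax and reads it back with a subscript.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Reads element i three ways; the three results are always equal.
   Assumption: i is a valid index into the array a points at. */
bool three_spellings_agree(const int *a, size_t i)
{
    return a[i] == *(a + i) && *(a + i) == i[a];
}

/* Same check on heap storage: indexing does not care where the bytes live. */
bool heap_indexing_agrees(void)
{
    int *block = malloc(4 * sizeof *block);
    if (block == NULL) {
        return false;   /* allocation failed; nothing to compare */
    }
    for (size_t i = 0; i < 4; i++) {
        *(block + i) = (int)(i * 10);   /* write with pointer syntax */
    }
    /* read with subscript syntax */
    bool ok = block[3] == 30 && three_spellings_agree(block, 2);
    free(block);
    return ok;
}
```

Both functions return true for any valid input. The i[a] spelling appears here only to show that it compiles and agrees; arr[i] remains the readable choice.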

Array-name decay

When you declare int arr[5], the name arr has type “array of 5 int.” In almost every expression, C silently converts it to int *, pointing at &arr[0]. This is array-to-pointer decay (§6.3.2.1 paragraph 3).

Three contexts where decay does not happen:

  1. sizeof(arr) returns the size of the whole array, not the size of a pointer.
  2. &arr yields a pointer of type int (*)[5], the address of the array as a whole object.
  3. A string literal used to initialize an array: char s[] = "hello"; copies the literal rather than decaying it.

Everywhere else (arr + 1, arr[i], printf("%p", arr), foo(arr), arr == other) the name arr is an int * pointing at the first element.
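The three exceptions can each be observed from the sizes and strides involved. The following is a minimal sketch with hypothetical function names, not a canonical test. It compares sizeof on the array itself, the step of arr + 1 against the step of &arr + 1, and the size of a char array initialized from a literal.

```c
#include <stddef.h>

/* Exception 1: sizeof sees the whole array, 5 * sizeof(int). */
size_t whole_array_bytes(void)
{
    int arr[5] = {0};
    return sizeof(arr);
}

/* Ordinary decay: arr + 1 steps over one int. */
size_t step_after_decay(void)
{
    int arr[5] = {0};
    return (size_t)((const char *)(arr + 1) - (const char *)arr);
}

/* Exception 2: &arr has type int (*)[5], so +1 steps over the whole array. */
size_t step_of_address_of_array(void)
{
    int arr[5] = {0};
    int (*whole)[5] = &arr;
    return (size_t)((const char *)(whole + 1) - (const char *)whole);
}

/* Exception 3: the literal initializes the array, giving six bytes
   ('h' through the terminating '\0') that belong to s. */
size_t copied_literal_bytes(void)
{
    char s[] = "hello";
    s[0] = 'j';          /* legal: s is our own writable copy */
    return sizeof(s);
}
```

On the lab machines the four functions return 20, 4, 20, and 6. The first three depend on sizeof(int); the last is 6 on every conforming implementation because sizeof(char) is always 1.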

sizeof(arr) versus sizeof(p)

Inside the scope that declared the array, sizeof tells the truth:

int arr[10];
size_t count = sizeof(arr) / sizeof(arr[0]);   /* 40 / 4 == 10 */

Cross a function boundary, and the story changes. ISO §6.7.6.3 paragraph 7 says: “A declaration of a parameter as ‘array of type’ shall be adjusted to ‘qualified pointer to type.’” All three of the following declare the same function:

void f(int arr[]);     /* looks like an array parameter   */
void f(int arr[10]);   /* the 10 is ignored by the compiler */
void f(int *arr);      /* what the compiler actually sees  */

The [10] is decoration. Inside f, the parameter is int *, and sizeof(arr) is 8, not 40.

void print_all(int arr[])
{
    size_t n = sizeof(arr) / sizeof(arr[0]);   /* 8 / 4 == 2. WRONG. */
    for (size_t i = 0; i < n; i++) {
        printf("%d\n", arr[i]);
    }
}

The sizeof(arr) / sizeof(arr[0]) idiom only works inside the scope that declared the array. Across a function boundary, pass the length as a separate parameter.
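One common way to follow that advice is to compute the count at the call site, where the array type is still visible, and hand it to the callee. The sketch below assumes a hypothetical ARRAY_LEN macro and hypothetical functions sum_ints and sum_demo; none of them come from the course materials.

```c
#include <stddef.h>

/* Element count of a true array. Only valid where the array's declaration
   is in scope; applied to a pointer it silently gives the wrong answer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* The callee cannot recover the length from the pointer, so it takes n. */
long sum_ints(const int *arr, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += arr[i];
    }
    return total;
}

/* Caller side: scores still has array type here, so ARRAY_LEN works. */
long sum_demo(void)
{
    int scores[] = {10, 20, 30, 40, 50};
    return sum_ints(scores, ARRAY_LEN(scores));
}
```

sum_demo returns 150. The design choice is that the length travels with the pointer as an explicit size_t argument, which is the same shape the standard library uses for functions such as memcpy and qsort.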

Check your understanding (predict the output)

#include <stdio.h>

void inspect(int arr[20])
{
    printf("inside:  %zu\n", sizeof(arr));
}

int main(void)
{
    int a[20];
    printf("outside: %zu\n", sizeof(a));
    inspect(a);
    return 0;
}
Reveal answer

Prints:

outside: 80
inside:  8

Inside main, a still has array type, so sizeof(a) is 20 * sizeof(int) = 80. Inside inspect, the int arr[20] parameter has already been adjusted to int *arr; the [20] is cosmetic. sizeof(arr) is sizeof(int *) = 8.

The fix is to pass the length: void inspect(int *arr, size_t n).

For the full machine-level story of array layout, &arr vs arr, and how a compiler lowers arr[i] to an address plus offset, see the machine model deep dive.


Precedence traps

The precedence table has 15 levels. Today you need five rows.

| Precedence | Operators | Associativity |
| --- | --- | --- |
| Highest (postfix) | ++ -- (postfix), [], () | left-to-right |
| Next (prefix/unary) | ++ -- (prefix), unary *, unary & | right-to-left |
| Additive | binary +, binary - | left-to-right |
| Relational | < > <= >= | left-to-right |
| Equality | == != | left-to-right |

Two facts fall out. Unary operators bind tighter than binary +, so *p + 1 is (*p) + 1, not *(p + 1). Postfix binds tighter than prefix, so *p++ is *(p++), not (*p)++.

*p + 1 versus *(p + 1)

With int arr[] = {10, 20, 30, 40} at 0x7540 and int *p = arr;:

| Expression | Parse | Type | Value |
| --- | --- | --- | --- |
| *p + 1 | (*p) + 1 | int | 10 + 1 = 11 |
| *(p + 1) | *(p + 1) | int | *0x7544 = 20 |

Same operators, same result type, different parentheses, different value. *p + 1 dereferences first, then does integer addition. *(p + 1) does pointer arithmetic first (scaled by sizeof(int)), then dereferences.
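The two parses can be put side by side as tiny functions. This is an illustrative sketch; deref_then_add and add_then_deref are hypothetical names chosen to describe the order of operations.

```c
/* Dereference first, then integer add: (*p) + 1. */
int deref_then_add(const int *p)
{
    return *p + 1;
}

/* Pointer arithmetic first (scaled by sizeof(int)), then dereference.
   The caller must guarantee that p + 1 points at a real element. */
int add_then_deref(const int *p)
{
    return *(p + 1);
}
```

With int arr[] = {10, 20, 30, 40}, deref_then_add(arr) is 11 and add_then_deref(arr) is 20, matching the two rows of the table.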

*p++ versus (*p)++

| Expression | Parse | What it does |
| --- | --- | --- |
| *p++ | *(p++) | read *p, then advance p by one element |
| *++p | *(++p) | advance p first, then read *p |
| (*p)++ | (*p)++ | post-increment the object p points to; p unchanged |
| ++*p | ++(*p) | pre-increment the object p points to; p unchanged |
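To see all four forms behave differently, it helps to record three things after each one: the value the expression produced, whether p moved, and whether arr[0] changed. The sketch below does that with a hypothetical struct trace and run_form function; each call starts from a fresh {10, 20, 30, 40} array with p pointing at arr[0].

```c
#include <stddef.h>

struct trace {
    int value;        /* value the expression produced              */
    ptrdiff_t moved;  /* how many elements p advanced (0 or 1)      */
    int first;        /* arr[0] afterwards, to show object changes  */
};

enum form { POST_INC_PTR, PRE_INC_PTR, POST_INC_OBJ, PRE_INC_OBJ };

struct trace run_form(enum form f)
{
    int arr[] = {10, 20, 30, 40};
    int *p = arr;
    struct trace t = {0, 0, 0};

    switch (f) {
    case POST_INC_PTR: t.value = *p++;   break;  /* read arr[0], then p moves   */
    case PRE_INC_PTR:  t.value = *++p;   break;  /* p moves, then read arr[1]   */
    case POST_INC_OBJ: t.value = (*p)++; break;  /* yields old arr[0], then +1  */
    case PRE_INC_OBJ:  t.value = ++*p;   break;  /* arr[0] + 1, yields new value */
    }
    t.moved = p - arr;
    t.first = arr[0];
    return t;
}
```

The four calls give (value, moved, first) of (10, 1, 10), (20, 1, 10), (10, 0, 11), and (11, 0, 11), in the order the enum lists them. Only the first two move the pointer; only the last two modify the array.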

K&R §5.5 uses *p++ in the canonical string copy:

while ((*dst++ = *src++) != '\0') { }

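Every iteration reads one byte from src, writes it to dst, and advances both pointers. The value of the assignment is the byte just written, so when that byte is the \0 terminator, the != '\0' test fails and the loop exits. The terminator has already been copied by that point, so the destination string is properly terminated.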
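Wrapped in a function, the loop looks like the sketch below. copy_string is a hypothetical name used to avoid clashing with the library's strcpy, and the function makes the same unchecked assumptions the idiom always makes: dst has room for the whole string, and the two buffers do not overlap.

```c
/* Copies src, including its '\0', into dst and returns dst.
   Assumes dst is large enough and the buffers do not overlap;
   nothing here checks either condition. */
char *copy_string(char *dst, const char *src)
{
    char *start = dst;   /* dst is about to move, so remember the beginning */
    while ((*dst++ = *src++) != '\0') {
        /* empty body: the copy and both increments happen in the condition */
    }
    return start;
}
```

After char buf[8]; copy_string(buf, "hi");, buf holds 'h', 'i', '\0'. Copying an empty string runs the condition exactly once: the terminator is written, the test fails, and the loop ends.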

Defensive habit: parenthesize when operators cross classes

CERT C EXP00-C puts it plainly: when an expression mixes operators from different precedence classes, parenthesize. The classic traps:

  • x & 0xFF == y parses as x & (0xFF == y) because == binds tighter than &. Write (x & 0xFF) == y.
  • *p + 1 vs *(p + 1): same type, different value. Parenthesize the one you mean.
  • *p++ vs (*p)++: different semantics. If you want “increment the object p points to,” write (*p)++.

If you cannot draw the parse tree for an expression in under ten seconds, add parentheses. Redundant parentheses cost nothing in the generated code; the reader of your code keeps their sanity.
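The mask-and-compare trap is worth running once to see how quietly it misbehaves. In the sketch below, as_parsed spells out what the unparenthesized expression means and low_byte_equals is what the author almost certainly intended. Both names are hypothetical.

```c
#include <stdbool.h>

/* What "x & 0xFF == y" means to the compiler: x & (0xFF == y).
   The comparison yields 0 or 1, which is then ANDed with x. */
unsigned as_parsed(unsigned x, unsigned y)
{
    return x & (0xFF == y);
}

/* What the author meant: compare the low byte of x with y. */
bool low_byte_equals(unsigned x, unsigned y)
{
    return (x & 0xFF) == y;
}
```

For x = 0x1234 and y = 0x34, the intended test is true but as_parsed returns 0. For x = 0x1235 and y = 0xFF, the intended test is false but as_parsed returns 1. The two versions can disagree in both directions, which is why the parentheses are not optional.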

Check your understanding (what-is-wrong)

A student writes this to print every element of a 10-element array:

void print_all(int arr[])
{
    for (int i = 0; i <= sizeof(arr) / sizeof(arr[0]); i++) {
        printf("%d\n", arr + i);
    }
}

Three bugs hide in four lines. Find all three.

Reveal answer
  1. sizeof(arr) inside the function yields 8 (the size of the pointer), not 40, so sizeof(arr) / sizeof(arr[0]) is 2, not 10. With the <= in the condition, the loop visits only indices 0 through 2 instead of all ten elements. Fix: take the length as a parameter (int *arr, size_t n).
  2. i <= n (when n is the element count) accesses one past the last element, which is undefined behavior for a read through the subscript. Use i < n.
  3. printf("%d\n", arr + i) prints the address, not the element. %d with an int * argument is a format-specifier mismatch (undefined behavior). Write arr[i] or *(arr + i).

Corrected:

void print_all(const int *arr, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        printf("%d\n", arr[i]);
    }
}

What comes next

Given int arr[] = {10, 20, 30, 40}; stored at address 0x7540, and int *p = arr;, which rows are correct?
A. p + 2 has type int * and value 0x7548.
B. *p + 1 has type int * and value 0x7544.
C. *(p + 1) has type int and value 20.
D. arr[3] and *(arr + 3) produce identical machine code.
E. Inside void f(int a[4]), sizeof(a) is 16.
Correct: A, C, D.
  • B is wrong: *p + 1 parses as (*p) + 1, which is 10 + 1 = 11, type int. The type is int, not int *.
  • E is wrong: the [4] in the parameter is adjusted to int *, so sizeof(a) is 8, not 16.

In C, every time you do pointer arithmetic on an untrusted index, you are the bounds check. The memory-safety deep dive walks through Heartbleed and the other named exploits built on exactly this omission.
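Being the bounds check usually means one comparison before the access. The sketch below shows one possible shape, assuming a hypothetical read_checked helper that reports failure through its return value instead of reading out of range. It is an illustration of the habit, not a complete defense.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reads arr[index] into *out only if index is inside [0, n).
   Returns false, and leaves *out untouched, when the index is out of range.
   Assumption: n really is the number of elements arr points at. */
bool read_checked(const int *arr, size_t n, size_t index, int *out)
{
    if (arr == NULL || out == NULL || index >= n) {
        return false;
    }
    *out = arr[index];
    return true;
}
```

With a 10-element array, read_checked(arr, 10, 9, &v) succeeds and read_checked(arr, 10, 100, &v) returns false without touching memory past the array. Taking the index as size_t also means a negative int passed by mistake converts to a very large value and is rejected by the same comparison.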

Next, Double Pointers introduces int **, the level of indirection you need when a function must reassign the caller’s pointer (not just write through it). Drill this page with the Pointer Arithmetic skill card, or browse the practice gallery.