c-foundations · Lesson 10 · 12 min read

Arrays & Strings

Fixed-size arrays, null-terminated strings, and bounded I/O

Based on content from Dr. Stu Steiner, Eastern Washington University.

Reading: Hanly & Koffman: §7.1–7.2 (pp. 378–383), §8.1–8.4 (pp. 456–479)

In a nutshell

Java’s int[] knew its own length. Java’s String was an object with methods. C does neither. A C array is a fixed-size, contiguous block of memory that does not carry its length and does not check its bounds. A C string is a char array whose last used byte is '\0'; every string function scans forward until it finds that byte. If the byte is missing, the function keeps reading. This lesson covers the habits that make arrays and strings safe to use without the safety net.


After this lesson, you will be able to:

  • Declare and initialize C arrays, including zero-filling with = {0}
  • Compute element count with sizeof(arr) / sizeof(arr[0]) and say why it fails inside functions
  • Say why a C string is a char array with '\0' at the end
  • Use strlen, strcmp, strcpy, and strcat from <string.h>
  • Read input with fgets and say why gets is never an acceptable substitute
  • Cap a scanf("%s", ...) read with a width specifier

Quick reference

Goal                            C code                          Notes
Fixed-size array                int scores[5];                  uninitialized; holds garbage
Zero-filled array               int zeros[100] = {0};           first element set to 0; the rest are zero-initialized
Partially initialized           int g[5] = {90, 85};            remaining slots get 0
Element count (where declared)  sizeof(arr) / sizeof(arr[0])    does not work on function parameters
String literal                  char s[] = "hi";                3 bytes including '\0'
String length                   strlen(s)                       does not count the '\0'
Compare strings for equality    strcmp(a, b) == 0               zero means equal
Read a word, bounded            scanf("%49s", name)             for char name[50]; leaves room for '\0'
Read a line, bounded            fgets(buf, sizeof(buf), stdin)  reads at most size - 1 chars
Dangerous input function        gets(buf)                       never; removed in C11

Coming from CSCD 210

In Java you called arr.length, s.length(), s.equals(other), and the language refused out-of-bounds access. In C you do none of that. A “string” is a plain char array with a sentinel byte at the end. Every <string.h> function you will meet exists because the language does not have string methods. The techniques below are the defensive habits for working without the safety net.

Style note for this lesson

This lesson appears in week 5. The course has moved from C90 to C99, so for (int i = 0; ...) is legal. For consistency with the earlier lessons, examples below still declare the loop counter at the top and use for (i = 0; ...). That style compiles under both C90 and C99.


Arrays without a safety net

In Java, arr[100] on a size-10 array throws ArrayIndexOutOfBoundsException. In C, the same access is undefined behavior: it compiles, runs, and does something unpredictable. Maybe it reads garbage. Maybe it overwrites another variable. Maybe it crashes. C never checks array bounds at run time.
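Because the language will not check the index, the check has to live in your code. A minimal sketch of one way to do that follows; get_checked is a hypothetical helper name, not part of any library.

```c
#include <stddef.h>

/* Hypothetical helper: copies arr[index] into *out only when index is in range.
   Returns 1 on success, 0 when the index is out of bounds. */
int get_checked(const int *arr, size_t size, size_t index, int *out)
{
    if (index >= size) {
        return 0;              /* refuse instead of reading past the end */
    }
    *out = arr[index];
    return 1;
}
```

The caller tests the return value before trusting *out, which is the C stand-in for Java's exception.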

Declaring arrays

int scores[5];                            /* 5 ints, UNINITIALIZED (garbage) */
int grades[5] = {90, 85, 77, 92, 88};     /* fully initialized */
int zeros[100] = {0};                     /* first element 0, rest auto-filled with 0 */
int partial[5] = {90, 85};                /* rest auto-filled with 0 */

Always initialize, either with values or with = {0} to zero-fill. An uninitialized local array holds whatever bytes were last in that memory.
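A common misreading is that = {0} "copies" the 0 into every slot. It does not: the listed values go into the first slots and every remaining slot is set to 0. The sketch below illustrates this with = {1}; count_nonzero and nonzero_after_brace_one are hypothetical names chosen for the demonstration.

```c
/* Hypothetical helper: counts elements that are not zero. */
int count_nonzero(const int *arr, int n)
{
    int i;
    int count = 0;
    for (i = 0; i < n; i++) {
        if (arr[i] != 0) {
            count++;
        }
    }
    return count;
}

/* = {1} sets element 0 to 1 and zero-fills the rest; it does not fill with 1s. */
int nonzero_after_brace_one(void)
{
    int a[4] = {1};            /* {1, 0, 0, 0} */
    return count_nonzero(a, 4);
}
```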

Size must be a compile-time constant

#define MAX_SCORES 10
int scores[MAX_SCORES];       /* fine */

int n = 100;
int buf[n];                   /* C99 VLA; not in C90; avoid */

A literal, a #define, or an enum constant all qualify (enum constants are constant expressions in C90 as well as C99). A const int variable does not qualify in C (it does in C++). Variable-length arrays (VLAs) are legal in C99 but optional in C11 and not supported by every toolchain. For a runtime-sized buffer, use malloc (covered this week).
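As a small sketch of the two portable choices, the block below sizes one array with a #define and one with an enum constant. MAX_NAMES, names, and the two capacity functions are illustrative names, not from the lesson.

```c
#define MAX_SCORES 10          /* preprocessor constant */
enum { MAX_NAMES = 20 };       /* enum constant; also a constant expression */

int scores[MAX_SCORES];
char names[MAX_NAMES];

/* Element counts, computed in the scope where the arrays are declared. */
int scores_capacity(void)
{
    return (int)(sizeof(scores) / sizeof(scores[0]));
}

int names_capacity(void)
{
    return (int)(sizeof(names) / sizeof(names[0]));
}
```

The enum form has one practical advantage over #define: the name is visible to a debugger.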

No .length property

C arrays do not know their own size. You track it yourself:

int grades[5] = {90, 85, 77, 92, 88};
int size = sizeof(grades) / sizeof(grades[0]);   /* 20 / 4 = 5 */
int i;

for (i = 0; i < size; i++) {
    printf("%d\n", grades[i]);
}

The sizeof trick works where the array was declared. It does not work inside functions that receive the array as a parameter. Inside a function, sizeof(arr) gives the size of a pointer (typically 8 on 64-bit), not the array.
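The difference can be made visible in a few lines. This is a sketch only: ARRAY_LEN is a hypothetical macro name (many codebases define something similar), and size_seen_by_callee exists purely to show what sizeof reports for a parameter.

```c
#include <stddef.h>

/* Hypothetical macro: element count of a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* A parameter is a pointer, so sizeof sees the pointer, not the caller's array. */
size_t size_seen_by_callee(const int *arr)
{
    return sizeof(arr);        /* same as sizeof(const int *) */
}
```

Called with a 5-element int array, ARRAY_LEN gives 5 at the declaration site, while size_seen_by_callee returns the pointer size regardless of how long the array is.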

Passing arrays to functions

When you pass an array to a function, the function receives a pointer to the original storage, not a copy. You must also pass the size.

void print_array(const int arr[], int size)
{
    int i;
    for (i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

const documents “I promise not to write to this” and the compiler enforces it. Without const, any write to arr[i] inside the function changes the caller’s array, because both names refer to the same storage.
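To see the no-copy behavior from the other side, here is a sketch of a function that is meant to modify the caller's array, so it omits const. add_bonus is a hypothetical name used only for illustration.

```c
/* Hypothetical helper: adds bonus to every element of the caller's array. */
void add_bonus(int arr[], int size, int bonus)
{
    int i;
    for (i = 0; i < size; i++) {
        arr[i] += bonus;       /* writes through to the caller's storage */
    }
}
```

After add_bonus(scores, 3, 10) returns, the caller's scores array holds the updated values; nothing needs to be returned or copied back.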

Check your understanding (what is the bug?)

void total(int arr[])
{
    int i;
    int sum = 0;
    int n = sizeof(arr) / sizeof(arr[0]);
    for (i = 0; i < n; i++) {
        sum += arr[i];
    }
    printf("sum = %d\n", sum);
}

The caller passes a 10-element array. total prints a sum much smaller than expected. What is wrong?

Answer

Inside total, arr is a pointer to the first element, not a 10-element array. On 64-bit systems sizeof(arr) is 8 and sizeof(arr[0]) is 4, so n is 2, and only the first two elements get summed.

Fix: pass the size as a separate parameter.

void total(const int arr[], int n) { ... }

Strings as character arrays

There is no String class in C. A string is a char array ending with '\0' (one byte, value 0). Every string function walks forward until it hits that byte.

char greeting[] = "Hello";

In memory:

Index:  [0]   [1]   [2]   [3]   [4]   [5]
Value:  'H'   'e'   'l'   'l'   'o'   '\0'

The array is 6 bytes: 5 characters plus the terminator. If the terminator is missing, a string function such as strlen keeps reading past the end of the array.
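The "walk forward until '\0'" rule is easy to see in code. The function below is a sketch of the idea behind strlen, not the library's actual implementation; count_until_nul is a made-up name.

```c
#include <stddef.h>

/* Sketch of what strlen does: count bytes until the terminator. */
size_t count_until_nul(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {     /* with no '\0' this loop runs off the array */
        n++;
    }
    return n;
}
```

For char greeting[] = "Hello"; this returns 5, while sizeof(greeting) is 6 because sizeof counts the terminator too.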

For the historical reason C strings look this way (register pressure on the PDP-11, BCPL heritage), see the deep dive on C standards and string history.

<string.h> functions

#include <string.h> to use these.

char name[] = "Alice";
printf("%zu\n", strlen(name));            /* 5, not 6 */

if (strcmp(name, "Alice") == 0) { /* equal */ }
if (strcmp(name, "Bob") < 0) { /* name sorts before "Bob" */ }

char dest[50];
strcpy(dest, "Hello");                    /* dest is now "Hello\0" */

char greeting[50] = "Hello, ";
strcat(greeting, "world!");               /* greeting is now "Hello, world!" */

strlen returns size_t. Print it with %zu, or cast to (int) and print with %d.

strcmp returns 0 when equal, a negative value when the first sorts before the second, a positive value when it sorts after. Zero means equal: memorize it. The idiom is strcmp(a, b) == 0 for equality; writing strcmp(a, b) == 1 is non-portable (the standard only promises the sign).

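strcpy and strcat do not check destination size. Copying a 100-character string into a 50-character buffer overflows silently. Use bounded variants (strncpy, strncat, or POSIX strlcpy, strlcat) when the source is not a known literal. Note that strncpy does not write '\0' when the source is at least as long as the limit, so terminate the destination yourself.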
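One way to wrap a bounded copy so that the result is always a valid string is sketched below. copy_bounded is a hypothetical helper, and truncating silently is a design choice made here for brevity; a real program might prefer to reject over-long input.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst, truncating to fit.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return 0;                      /* no room even for '\0' */
    }
    strncpy(dst, src, dst_size - 1);   /* may leave dst unterminated ... */
    dst[dst_size - 1] = '\0';          /* ... so terminate explicitly */
    return strlen(src) < dst_size;
}
```

Pass sizeof(dest) as dst_size from the scope where dest is declared, for the same reason the element-count trick only works there.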

Never use == on strings

if (a == b)            /* compares two ADDRESSES, not contents */

In an expression, an array name becomes the address of its first element, so a == b asks “do both names refer to the same memory location?” not “do they hold the same text?” Two distinct arrays holding identical text compare as not equal. This is CWE-597.
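The two questions can be put side by side. In this sketch, same_address and same_text are hypothetical names that wrap the two comparisons so the difference is explicit.

```c
#include <string.h>

/* Compares addresses: true only if both point at the same storage. */
int same_address(const char *a, const char *b)
{
    return a == b;
}

/* Compares contents: true if both hold the same text. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Given char a[] = "apple"; and char b[] = "apple";, same_address(a, b) is 0 and same_text(a, b) is 1.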

Check your understanding (predict the output)

#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[] = "apple";
    char b[] = "apricot";

    printf("%d\n", strcmp(a, b) == 0);
    printf("%d\n", strcmp(a, b) == -1);
    printf("%d\n", strcmp(a, b) < 0);
    return 0;
}

Answer

0
0 or 1 (depends on the implementation)
1

  • strcmp(a, b) == 0 is false. Prints 0.
  • strcmp(a, b) == -1 is the anti-pattern. The standard only promises a negative value here. An implementation that returns exactly -1 prints 1; one that returns the byte difference 'p' - 'r' = -2 prints 0. Do not write this.
  • strcmp(a, b) < 0 is the portable form of “a sorts before b.” Prints 1.
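If some code genuinely needs exactly -1, 0, or 1 (for example, to use as a table index offset), normalize the sign yourself rather than relying on the library. A minimal sketch, with compare_sign as a hypothetical name:

```c
#include <string.h>

/* Hypothetical helper: collapses strcmp's result to exactly -1, 0, or 1. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

Each comparison yields 0 or 1, so the subtraction produces -1, 0, or 1 on every implementation.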

Reading strings safely

scanf("%s", ...) stops at whitespace and has no upper bound on how many characters it writes unless you give it one. There are two bounded options.

Width specifier for a single word

char name[50];
scanf("%49s", name);              /* at most 49 chars, leave room for '\0' */

The cap is one less than the array size so the null terminator fits. Lab 1 does not require this cap, but the reflection on CWE-120 walks through what happens without it.
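The effect of the width can be checked without typing anything by using sscanf, which applies the same format rules to a string instead of the keyboard. This is a sketch with a 10-byte buffer and a matching %9s; first_word is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the first word of input into a 10-byte buffer.
   sscanf stands in for scanf so the example needs no keyboard input.
   Returns 1 if a word was read, 0 otherwise. */
int first_word(const char *input, char word[10])
{
    return sscanf(input, "%9s", word) == 1;   /* 9 chars + '\0' = 10 bytes */
}
```

A 20-letter word is cut off after 9 characters and still terminated. Checking that the return value is 1 is a habit worth carrying over to scanf itself.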

fgets for a whole line, including spaces

char line[100];
if (fgets(line, sizeof(line), stdin) != NULL) {   /* reads at most 99 chars plus '\0' */
    line[strcspn(line, "\n")] = '\0';             /* strip the trailing newline if present */
}

fgets(buf, N, stream) reads at most N - 1 characters (the -1 leaves room for '\0'), stops after a newline or at EOF, and writes '\0' immediately after the last character it stored. If the newline arrives within the limit it is stored in buf; the strcspn trick finds and overwrites it. If nothing could be read (EOF or error), fgets returns NULL, so check the return value before using buf.
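The read-then-strip pair is used often enough that it is worth wrapping once. The sketch below is one possible shape; read_line is a hypothetical helper, and it takes any FILE * so it works with stdin or an open file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf and strips the newline.
   Returns 1 on success, 0 on EOF or read error (do not use buf in that case). */
int read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL) {
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';    /* no-op when no newline was stored */
    return 1;
}
```

A typical call is read_line(line, (int)sizeof(line), stdin) inside a while loop that ends when the function returns 0.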

Never use gets

char buf[100];
gets(buf);                        /* NEVER. Removed from C11. */

gets reads until a newline with no upper bound. An unbounded gets call in fingerd is what the 1988 Morris Worm, the first internet-scale worm, exploited. gets was removed from the C standard in C11. If you see it in legacy code, file a ticket. For the exploit mechanics (attacker-controlled bytes overwriting boarding_zone or a return address), see the CWE-120 walkthrough in the memory-safety deep-dive.


Java vs. C, and what comes next

Operation    Java                 C
Declare      String s = "hello";  char s[50] = "hello";
Length       s.length()           strlen(s)
Compare      s.equals("hello")    strcmp(s, "hello") == 0
Copy         s2 = s;              strcpy(s2, s);
Concatenate  s + " world"         strcat(s, " world"); (pre-sized buffer)
Read line    scanner.nextLine()   fgets(s, sizeof(s), stdin);

Java’s String is an immutable object with methods. A C string is a mutable char array plus a pointer convention. You manage the buffer, you check the bounds, you track the null terminator.
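As one illustration of "you manage the buffer, you check the bounds," here is a sketch of a concatenation that refuses to overflow. append_checked is a hypothetical helper, and returning 0 on failure (rather than truncating) is an assumption made for this example.

```c
#include <string.h>

/* Hypothetical helper: appends src to dst only if the result, including
   the '\0', fits in dst_size bytes. Returns 1 on success, 0 if it would overflow. */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t extra = strlen(src);

    if (used >= dst_size || extra >= dst_size - used) {
        return 0;                      /* not enough room; dst is unchanged */
    }
    strcat(dst, src);
    return 1;
}
```

With char s[12] = "hello";, appending " world" succeeds because "hello world" needs exactly 12 bytes with its terminator; appending one more character after that is refused.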

Select every statement that is true about C arrays and strings.
A. char s[5] = "Alice"; leaves no room for '\0', so printf("%s", s) reads past the end of the array.
B. sizeof(arr) / sizeof(arr[0]) gives the correct element count for an array parameter inside a function.
C. fgets(buf, N, stdin) reads at most N − 1 characters and always writes a '\0'.
D. Relying on strcmp(a, b) == 1 is non-portable; strcmp(a, b) > 0 is the portable form.
E. gets(buf) is still in the C standard and is safe as long as the user types less than sizeof(buf) characters.
F. scanf("%49s", name) caps the read for a char name[50], leaving room for '\0'.
Correct: A, C, D, F.
  • B is wrong: inside a function, sizeof(arr) gives the size of a pointer. Pass the length separately.
  • E is wrong: gets was removed from the C standard in C11. Trusting the user does not make an unbounded read safe.

Next, Headers, Makefiles & CLI Args covers splitting a program across multiple files with headers, how extern shares a declaration, and how argc / argv work.