c-foundations Lesson 13 20 min read

How Do I Work with Strings as Character Arrays?

Null terminators, string.h, and the biggest difference from Java

Reading: C Text: Ch. 8 §1–4 (pp. 477–520), §6 (pp. 535–541)

After this lesson, you will be able to:

  • Explain that C strings are char arrays terminated by '\0'
  • Use strlen, strcmp, strcpy, and strcat from <string.h>
  • Explain why == compares addresses for strings and use strcmp instead
  • Use fgets for safe string input and strip the trailing newline
  • Identify buffer overflow risks with strcpy, strcat, and scanf("%s")
  • Use <ctype.h> functions to test and convert individual characters

There Is No String Type

In Java, you write:

String name = "Alice";
if (name.equals("Alice")) { ... }
String greeting = "Hello, " + name;

Clean, safe, and object-oriented. Now here’s the C equivalent:

char name[50] = "Alice";
if (strcmp(name, "Alice") == 0) { ... }
// String concatenation? It's... complicated.

C has no String class. A “string” in C is just an array of char that ends with a special byte: '\0' (the null terminator). Every string function — strlen, strcmp, printf("%s") — works by scanning forward until it hits that '\0'. If it’s missing, your program reads off into random memory.


Character Arrays and String Functions

How C Strings Work

When you write:

char greeting[] = "Hello";

C stores this in memory:

Index:   [0]  [1]  [2]  [3]  [4]  [5]
Value:   'H'  'e'  'l'  'l'  'o'  '\0'

The array is 6 bytes — 5 characters plus the null terminator. The '\0' is ASCII value 0, and it marks the end of the string.

Key Insight: A C string is just an array of characters ending with '\0' (the null terminator, ASCII value 0). Every string function — strlen, strcmp, printf("%s") — scans forward until it finds that '\0'. Without it, functions read past the end of your array into garbage memory.

String Library Functions (<string.h>)

strlen — string length (not counting '\0'):

char name[] = "Alice";
printf("%zu\n", strlen(name));    // 5 (not 6!)

strcmp — compare two strings:

if (strcmp(name, "Alice") == 0)       // Equal
if (strcmp(name, "Bob") < 0)          // name sorts before "Bob" (compared by character code, so 'Z' < 'a')
if (strcmp(name, "Aaron") > 0)        // name sorts after "Aaron"

Common Pitfall: NEVER use == to compare strings in C. if (str1 == str2) compares memory addresses, not contents. Two identical strings stored in different locations would compare as “not equal.” Always use strcmp(str1, str2) == 0.

Check Your Understanding
What does strlen("Hello") return?
A 6 — five characters plus the null terminator
B 5 — the number of visible characters, not counting '\0'
C 4 — it doesn't count the first character
D It depends on the size of the array holding the string
Answer: B. strlen counts characters until it hits '\0', but does not count the null terminator itself. The string "Hello" occupies 6 bytes in memory (H-e-l-l-o-\0), but strlen returns 5. This distinction matters when allocating buffer space — you always need strlen + 1 bytes to store a string.

strcpy — copy one string to another:

char dest[50];
strcpy(dest, "Hello");             // dest is now "Hello"

strcat — concatenate (append) strings:

char greeting[50] = "Hello, ";
strcat(greeting, "world!");        // greeting is now "Hello, world!"

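Common Pitfall: Neither strcpy nor strcat checks whether the destination buffer is large enough. Copying a 100-character string into a 50-character buffer writes past the end of the array, which is undefined behavior and often goes unnoticed until something else breaks. Always make sure the destination has room for the result plus '\0'. The strncpy/strncat variants take a length limit, but strncpy does not add '\0' when the source is longer than the limit, so you must terminate the buffer yourself.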

Reading Strings from Users

scanf("%s") — reads one word (stops at whitespace):

char word[50];
scanf("%s", word);          // Note: no & needed (array name IS an address)

Common Pitfall: scanf("%s") has no way to know how large your buffer is. If the user types 1000 characters into a 50-byte buffer, scanf writes past the end — a buffer overflow. Use fgets for safe input.

fgets — reads a full line (safe):

char line[100];
fgets(line, sizeof(line), stdin);

fgets reads at most sizeof(line) - 1 characters, always adds '\0', and includes the newline if the line fits. But that trailing \n is often unwanted:

// Strip the trailing newline
line[strcspn(line, "\n")] = '\0';

The Trick: Use fgets + newline stripping for safe line reading. scanf("%s") stops at whitespace and has no overflow protection. The strcspn approach cleanly removes the trailing newline.

Check Your Understanding
You write char name[5] = "Alice";. What's wrong with this?
A Nothing — "Alice" has 5 characters and fits perfectly in a size-5 array
B No room for the null terminator — "Alice" needs 6 bytes (5 characters + '\0')
C You can't initialize a char array with a string literal
D The array is too small for the pointer to the string
Answer: B. The string "Alice" requires 6 bytes: 5 for the characters plus 1 for the null terminator '\0'. A char[5] array holds only 5 bytes. C accepts this initializer but silently leaves out the terminator (C++ rejects it), so name holds five characters and is not a valid string. String functions like strlen and printf("%s") will then read past the array into whatever memory follows. Use char name[6] or char name[] = "Alice"; (the compiler works out the size).
Why does this matter?

Off-by-one errors with string buffers are one of the most common C bugs. Every string needs one extra byte for the null terminator. When you allocate buffers for user input, always account for it: a 50-character input needs a char[51] at minimum, and a char[52] if you read it with fgets, which also stores the newline.

Deep dive: Why gets() was removed from C — a real security disaster

C once had a function called gets() that read a line from stdin:

char buffer[100];
gets(buffer);          // NEVER USE THIS — removed from C11

The problem: gets() had no way to specify buffer size. A user could type 10,000 characters and overflow any buffer. This was so dangerous that it was the attack vector for the 1988 Morris Worm — one of the first internet worms, which exploited gets() in the Unix fingerd daemon to gain unauthorized access to thousands of machines.

The function was deprecated in C99, removed entirely in C11, and is classified as CWE-242 (Use of Inherently Dangerous Function).

Always use fgets() instead:

char buffer[100];
fgets(buffer, sizeof(buffer), stdin);   // Safe — reads at most 99 chars + '\0'

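The general rule: any function that writes to a buffer without a size limit (gets, strcpy, sprintf, scanf("%s")) is a potential buffer overflow. Prefer the bounded versions (fgets, strncpy, snprintf, scanf("%49s")). Two details to remember: strncpy leaves the destination without a '\0' if the source does not fit, and the width in scanf("%49s") must be one less than the buffer size to leave room for the terminator.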

| Dangerous | Safe alternative | Why |
|-----------|------------------|-----|
| gets(buf) | fgets(buf, size, stdin) | No size limit vs. bounded read |
| strcpy(d, s) | strncpy(d, s, size) | No overflow check vs. bounded copy |
| sprintf(d, ...) | snprintf(d, size, ...) | No overflow check vs. bounded write |
| scanf("%s", buf) | scanf("%49s", buf) or fgets | No size limit vs. field width limit |

Character Functions (<ctype.h>)

Test and convert individual characters:

#include <ctype.h>

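char ch = 'a';
// Cast to unsigned char: the <ctype.h> functions are undefined for negative char values
if (isalpha((unsigned char)ch))  { ... }    // Is it a letter?
if (isdigit((unsigned char)ch))  { ... }    // Is it a digit?
if (isupper((unsigned char)ch))  { ... }    // Is it uppercase?
char upper = toupper((unsigned char)ch);     // 'A'
char lower = tolower('B');                   // 'b'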

Java vs. C String Comparison

| Operation | Java | C |
|-----------|------|---|
| Declare | String s = "hello"; | char s[50] = "hello"; |
| Length | s.length() | strlen(s) |
| Compare | s.equals("hello") | strcmp(s, "hello") == 0 |
| Copy | s2 = s; (reference copy) | strcpy(s2, s); |
| Concatenate | s + " world" | strcat(s, " world"); |
| Read line | scanner.nextLine() | fgets(s, size, stdin); |
| Char at index | s.charAt(i) | s[i] |

From Java: Java’s String is an immutable object with methods. C’s “string” is a mutable char array with library functions. There’s no garbage collection, no automatic resizing, no bounds checking. You manage the buffer, you check the bounds, you track the null terminator.

Check Your Understanding
What does if (str1 == str2) actually compare when str1 and str2 are char arrays?
A The contents of both strings, character by character
B The lengths of both strings
C The memory addresses where each string starts
D The first character of each string
Answer: C. Array names decay to pointers (memory addresses). So str1 == str2 checks whether both arrays start at the same memory location — which is almost never what you want. Two arrays containing "Hello" at different locations would compare as "not equal." Use strcmp(str1, str2) == 0 to compare actual string contents.

Complete Example: Name Processor

#include <stdio.h>
#include <string.h>
#include <ctype.h>

void capitalize(char str[])
{
    if (strlen(str) > 0)
    {
        str[0] = (char)toupper((unsigned char)str[0]);
    }
}

int main(void)
{
    char first[50];
    char last[50];
    char full[100];

    printf("Enter first name: ");
    fgets(first, sizeof(first), stdin);
    first[strcspn(first, "\n")] = '\0';

    printf("Enter last name: ");
    fgets(last, sizeof(last), stdin);
    last[strcspn(last, "\n")] = '\0';

    capitalize(first);
    capitalize(last);

    strcpy(full, last);
    strcat(full, ", ");
    strcat(full, first);

    printf("Full name: %s\n", full);
    printf("Name length: %zu characters\n", strlen(full));

    return 0;
}

Quick Check: What does the null terminator '\0' do?

It marks the end of a string. Every string function (strlen, strcmp, printf("%s")) scans characters until it encounters '\0'. Without it, these functions read past the array into undefined memory.

Quick Check: Why is if (str1 == str2) wrong for comparing strings?

== compares memory addresses, not string contents. Two identical strings stored at different memory locations would compare as “not equal.” Use strcmp(str1, str2) == 0 to compare contents.

Quick Check: Why is fgets safer than scanf("%s")?

fgets takes a size parameter and stores at most size - 1 characters plus the '\0', so it cannot write past the end of the buffer. scanf("%s") has no size limit: it reads until whitespace, potentially writing far past the end of your buffer.


Strings Are Where C Gets Real

The gap between Java and C is widest with strings. Java gives you an immutable, bounds-checked, garbage-collected String object. C gives you a mutable array of bytes with a null terminator and no safety net. This is harder, but it is also much closer to how text is actually laid out in memory, and understanding this makes you a stronger programmer in any language.

Next: sorting and searching. You’ll implement selection sort on arrays and use it to find medians — building the algorithms by hand that Java’s Arrays.sort() hides from you.

Big Picture: String handling is one of the most common sources of security vulnerabilities in C. Buffer overflows from strcpy, strcat, and scanf("%s") have caused some of the worst security breaches in computing history. Understanding null terminators and buffer sizes isn't academic; it's the foundation of writing secure C code. Every string function you use from here on should make you ask: "Is my buffer big enough?"