Normalization & Schema Design Principles

Normalization is the practice of organizing tables so each fact lives in exactly one place. It prevents update anomalies and data drift.

The Problem: Duplicated Data

tasks
┌────┬─────────┬──────────────┬───────────────────┐
│ id │ title   │ owner_name   │ owner_email       │
├────┼─────────┼──────────────┼───────────────────┤
│ 1  │ Buy     │ Ada Lovelace │ ada@example.com   │
│ 2  │ Ship    │ Ada Lovelace │ ada@example.com   │
└────┴─────────┴──────────────┴───────────────────┘

If Ada changes her email, you must update every task row. Miss one and your data is inconsistent. This is an update anomaly.
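The anomaly can be reproduced in a few lines. This is a minimal sketch using Python's built-in sqlite3 module and an in-memory database; the new address `ada@new.example.com` and the deliberately incomplete `UPDATE` are illustrative assumptions, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    "id INTEGER PRIMARY KEY, title TEXT, owner_name TEXT, owner_email TEXT)"
)
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [
        (1, "Buy", "Ada Lovelace", "ada@example.com"),
        (2, "Ship", "Ada Lovelace", "ada@example.com"),
    ],
)

# Hypothetical buggy update: it touches only one of Ada's rows.
conn.execute(
    "UPDATE tasks SET owner_email = ? WHERE id = ?",
    ("ada@new.example.com", 1),
)

# The same person now has two different emails on record.
emails = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT owner_email FROM tasks WHERE owner_name = ? "
        "ORDER BY owner_email",
        ("Ada Lovelace",),
    )
]
print(emails)
```

Nothing in the schema stops this from happening, which is why the duplication itself is the problem rather than the careless query.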

The Fix: Separate the Concerns

users                        tasks
┌────┬──────────────┬──────┐ ┌────┬───────┬──────────┐
│ id │ name         │ email│ │ id │ title │ owner_id │
└────┴──────────────┴──────┘ └────┴───────┴──────────┘

Now Ada's email lives in one row. Tasks just reference her by owner_id.
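A sketch of the two-table layout above, again using Python's sqlite3 module. The column types, the `NOT NULL` constraints, and the replacement address are assumptions made for illustration. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL
    );
    CREATE TABLE tasks (
        id       INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        owner_id INTEGER NOT NULL REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'Ada Lovelace', 'ada@example.com');
    INSERT INTO tasks VALUES (1, 'Buy', 1), (2, 'Ship', 1);
    """
)

# One UPDATE on one row changes the email everywhere it is read.
conn.execute(
    "UPDATE users SET email = ? WHERE id = ?",
    ("ada@new.example.com", 1),
)

rows = conn.execute(
    "SELECT t.title, u.email FROM tasks t "
    "JOIN users u ON u.id = t.owner_id ORDER BY t.id"
).fetchall()
print(rows)
```

The cost is a join on read, which is usually a fair price for having a single source of truth.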

The First Three Normal Forms (plain English)

1. 1NF: each cell holds a single value (no comma-separated lists in a column).
2. 2NF: every non-key column depends on the whole primary key, not just part of a composite key.
3. 3NF: non-key columns depend on nothing but the key, so no non-key column is determined by another non-key column.

A practical rule of thumb: "Each fact, once, in the right place."
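As one illustration of the 1NF rule, the sketch below moves a comma-separated column into its own table. The `tasks_flat` and `task_tags` tables and the tag values are hypothetical; they do not appear in the schema above and are only here to show the shape of the fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Violates 1NF: the tags column holds a list in a single cell.
    CREATE TABLE tasks_flat (id INTEGER PRIMARY KEY, title TEXT, tags TEXT);
    INSERT INTO tasks_flat VALUES (1, 'Buy', 'home,urgent'), (2, 'Ship', 'work');

    -- 1NF-friendly: one row per (task, tag) pair.
    CREATE TABLE task_tags (
        task_id INTEGER NOT NULL,
        tag     TEXT NOT NULL,
        PRIMARY KEY (task_id, tag)
    );
    """
)

# Split each list and store one value per row.
flat_rows = conn.execute("SELECT id, tags FROM tasks_flat").fetchall()
for task_id, tags in flat_rows:
    for tag in tags.split(","):
        conn.execute(
            "INSERT INTO task_tags VALUES (?, ?)", (task_id, tag.strip())
        )

# Filtering by tag is now an exact match instead of a LIKE on a string.
urgent = [
    row[0]
    for row in conn.execute(
        "SELECT task_id FROM task_tags WHERE tag = ?", ("urgent",)
    )
]
print(urgent)
```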

When to Denormalize

Normalization optimizes for correctness and writes. Sometimes you trade it for read speed:

- A cached task_count on users to avoid COUNT(*) on every page load.
- A duplicated author_name snapshot so deleting a user doesn't blank old posts.

Denormalize deliberately, knowing you now own the job of keeping copies in sync. Start normalized; denormalize only when a real performance need appears.
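One possible way to own that sync job for a cached task_count is to let the database maintain it with triggers. This is a sketch under assumptions: the trigger names are made up, SQLite is used only because it ships with Python, and reassigning a task to a different owner is not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        task_count INTEGER NOT NULL DEFAULT 0  -- denormalized copy
    );
    CREATE TABLE tasks (
        id       INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        owner_id INTEGER NOT NULL REFERENCES users(id)
    );

    -- Hypothetical triggers that keep the cached count in step with tasks.
    CREATE TRIGGER tasks_after_insert AFTER INSERT ON tasks
    BEGIN
        UPDATE users SET task_count = task_count + 1 WHERE id = NEW.owner_id;
    END;
    CREATE TRIGGER tasks_after_delete AFTER DELETE ON tasks
    BEGIN
        UPDATE users SET task_count = task_count - 1 WHERE id = OLD.owner_id;
    END;

    INSERT INTO users (id, name) VALUES (1, 'Ada Lovelace');
    INSERT INTO tasks VALUES (1, 'Buy', 1), (2, 'Ship', 1);
    DELETE FROM tasks WHERE id = 2;
    """
)

cached = conn.execute(
    "SELECT task_count FROM users WHERE id = 1"
).fetchone()[0]
actual = conn.execute(
    "SELECT COUNT(*) FROM tasks WHERE owner_id = 1"
).fetchone()[0]
print(cached, actual)
```

Comparing the cached value against a fresh COUNT(*), as the last lines do, is also a cheap consistency check to run periodically against a real database.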