Database Normalization Explained

Database Normalization Explained

Learn database normalization step by step. Understand 1NF, 2NF, 3NF, BCNF with practical examples to design efficient, scalable, and reliable relational databases.

Database Normalization Explained

When designing a database, one of the biggest mistakes beginners make is storing duplicate data in multiple places. Duplicate data leads to inconsistent information, larger databases, and difficult maintenance.

Database Normalization is a design technique used to organize data efficiently while reducing redundancy and improving data integrity.

In this guide, you'll learn what normalization is, why it matters, and how to apply the most common normal forms with practical examples.

What is Database Normalization?

Database normalization is the process of organizing tables and relationships to:

  • Reduce duplicate data
  • Improve data consistency
  • Prevent update anomalies
  • Make maintenance easier
  • Improve database scalability

Instead of storing everything in one large table, normalization divides data into multiple related tables.

Why Normalization Matters

Imagine an online shopping system.

Without normalization:

Order IDCustomerPhoneProductPrice
1001John123456Laptop1200
1002John123456Mouse25
1003John123456Keyboard50

Problems:

  • Customer phone repeats every order.
  • Updating John's phone requires updating many rows.
  • If one row is missed, inconsistent data appears.

Normalization solves this problem.

Benefits of Normalization

  • Eliminates duplicate information
  • Improves data integrity
  • Easier updates
  • Easier maintenance
  • Smaller database size
  • Better scalability
  • Cleaner database structure

Common Database Problems

1. Insert Anomaly

You cannot add a customer unless an order exists.

2. Update Anomaly

Updating one customer's phone requires changing multiple rows.

If one row isn't updated:

John
123456

John
987654

Now the database contains conflicting information.

3. Delete Anomaly

Deleting the last order may accidentally delete customer information.

The Normal Forms

The most common normal forms are:

  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)

Let's understand each one.

First Normal Form (1NF)

Rule

Every column should contain atomic values.

One cell = One value.

Bad Example

StudentSubjects
AliceMath, Science
BobMath, English

The Subjects column stores multiple values.

Good Example

StudentSubject
AliceMath
AliceScience
BobMath
BobEnglish

Now every field contains only one value.

✅ Database satisfies 1NF.

Second Normal Form (2NF)

Rule

Must already satisfy 1NF.

Every non-key column should depend on the entire primary key.

Imagine:

Student IDCourse IDStudent NameCourse Name

Primary Key:

(Student ID, Course ID)

Problem:

Student Name depends only on Student ID.

Course Name depends only on Course ID.

This is called a Partial Dependency.

Solution

Split into:

Students

Student IDName
1Alice

Courses

Course IDCourse Name
101Database

Student Courses

Student IDCourse ID
1101

Now every attribute depends on the whole primary key.

✅ Database satisfies 2NF.

Third Normal Form (3NF)

Rule

Must satisfy 2NF.

No non-key column should depend on another non-key column.

This is called a Transitive Dependency.

Bad example:

Employee IDEmployee NameDepartmentManager

Manager depends on Department.

Not Employee ID.

This violates 3NF.

Solution

Employees

Employee IDNameDepartment ID
1John10

Departments

Department IDDepartmentManager
10ITDavid

Now:

Employee → Department

Department → Manager

No transitive dependency.

✅ Database satisfies 3NF.

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF.

Rule:

Every determinant must be a candidate key.

Most real-world applications using Laravel, Spring Boot, ASP.NET, Django, or Node.js are well-designed if they reach 3NF, while BCNF is useful for more complex database relationships.

Before vs After Normalization

Before

Orders

------------------------------------------
OrderID
CustomerName
Phone
Address
Product
Price
------------------------------------------

Many duplicate values.

After

Customers
-------------------
CustomerID
Name
Phone
Address
-------------------

Products
-------------------
ProductID
Name
Price
-------------------

Orders
-------------------
OrderID
CustomerID
-------------------

OrderItems
-------------------
OrderID
ProductID
Quantity
-------------------

Advantages:

  • No duplicate customer information
  • Easier updates
  • Better scalability

Real Example: E-commerce Database

Instead of

Order
Customer
Customer Phone
Customer Address
Product
Price

Use

Customers

Products

Orders

Order Details

Each table has a single responsibility.

SQL Example

Create Customers

CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
phone VARCHAR(20)
);

Create Products

CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10,2)
);

Create Orders

CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
);

Create Order Items

CREATE TABLE order_items (
order_item_id INT PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT,

FOREIGN KEY(order_id)
REFERENCES orders(order_id),

FOREIGN KEY(product_id)
REFERENCES products(product_id)
);

Advantages of Normalization

  • Less duplicate data
  • Better consistency
  • Easier maintenance
  • More flexible database
  • Better relationships
  • Reduced storage usage
  • Cleaner SQL queries

Disadvantages

Normalization is excellent for transactional systems, but it also has trade-offs:

  • More tables to manage
  • More JOIN operations in queries
  • Slightly more complex SQL
  • Can impact performance for large analytical queries

For reporting or data warehouses, some denormalization is often used to improve read performance.

Best Practices

  • Always identify primary keys clearly.
  • Use foreign keys to maintain relationships.
  • Normalize up to Third Normal Form (3NF) for most applications.
  • Avoid unnecessary duplication of data.
  • Index frequently searched columns.
  • Review your schema as requirements evolve.

Conclusion

Database normalization is one of the most important concepts in relational database design. By organizing data into logical tables and removing unnecessary duplication, you create databases that are easier to maintain, scale, and keep consistent.

For most web applications, designing your schema to Third Normal Form (3NF) provides an excellent balance between simplicity, data integrity, and performance. As your application grows, you can selectively denormalize specific areas when profiling shows a real performance need.

Souy Soeng

Souy Soeng

Hi there 👋, I’m Soeng Souy (StarCode Kh)
-------------------------------------------
🌱 I’m currently creating a sample Laravel and React Vue Livewire
👯 I’m looking to collaborate on open-source PHP & JavaScript projects
💬 Ask me about Laravel, MySQL, or Flutter
⚡ Fun fact: I love turning ☕️ into code!

Post a Comment

CAN FEEDBACK
close