Normalisation in SQL

Pranav Bakare - Oct 2 - - Dev Community

Normalization in SQL is a process used to organize a database into tables and columns to reduce redundancy and dependency. The goal is to ensure that data is stored efficiently, minimize the amount of duplicated data, and ensure data integrity. The process is divided into different normal forms (NF), and each form addresses specific issues with data structure and relationships.

Let’s go through each normal form in detail:

  1. First Normal Form (1NF)

A table is in 1NF if:

  1. All the values in a column are atomic (indivisible).

  2. Each entry in a column must contain only one value (no multiple values or sets).

  3. There should be no repeating groups of data (e.g., arrays or lists within a single column).

Example:

Suppose we have the following table:

This table violates 1NF because:

The CustomerPhone column contains multiple values (a list of phone numbers).

The ProductsOrdered column contains multiple values (a list of products).

Conversion to 1NF:

We split the columns into atomic values by removing lists:

Now, the table is in 1NF because all fields contain atomic values and there are no repeating groups.


  1. Second Normal Form (2NF)

A table is in 2NF if:

  1. It is already in 1NF.

  2. All non-key attributes are fully dependent on the entire primary key (no partial dependencies).

In simpler terms, there should not be any column that is dependent on part of the primary key (if the primary key is composite).

Example:

Let’s assume the following table that is in 1NF:

In this case:

OrderID and ProductID together form the composite primary key.

CustomerName and CustomerPhone only depend on OrderID, not on ProductID. This is a partial dependency, which violates 2NF.

Conversion to 2NF:

We split the table into two tables to remove the partial dependency:

Orders Table:

OrderDetails Table:

Now, CustomerName and CustomerPhone depend only on OrderID, and ProductName and Price depend only on ProductID. The table is now in 2NF.


  1. Third Normal Form (3NF)

A table is in 3NF if:

  1. It is already in 2NF.

  2. There are no transitive dependencies, meaning no non-key attribute depends on another non-key attribute.

Example:

Consider the following table that is in 2NF:

Here, CustomerCity depends on CustomerID (which is not the primary key), creating a transitive dependency. This violates 3NF.

Conversion to 3NF:

We split the table into two tables to remove the transitive dependency:

Orders Table:

Customers Table:

Now, there is no transitive dependency, and the table is in 3NF.


  1. Boyce-Codd Normal Form (BCNF)

A table is in BCNF if:

  1. It is already in 3NF.

  2. Every determinant must be a candidate key. A determinant is any attribute that can uniquely identify another attribute.

BCNF is a stricter version of 3NF. It handles situations where a 3NF table still has anomalies due to a non-candidate key acting as a determinant.

Example:

Consider the following table that is in 3NF:

In this table:

TeacherID and CourseID form the composite primary key.

However, CourseID determines CourseName (each course is taught by multiple teachers, but each course has only one name). This means CourseID is a determinant, but it is not a candidate key, violating BCNF.

Conversion to BCNF:

We need to create a separate table for the courses to resolve this issue:

Teachers Table:

Courses Table:

Now, each determinant is a candidate key, and the table is in BCNF.


Summary of Normal Forms:

Why Normalize?

Data Integrity: By eliminating redundancy, normalization reduces the risk of anomalies (like update, insert, or delete anomalies).

Efficiency: Normalized databases are more efficient in terms of space utilization.

Scalability: As the data grows, normalized structures make it easier to maintain and query large datasets.

Trade-offs:

Performance: Normalization often results in more tables, which can lead to more complex joins and potentially slower performance. In highly transactional systems, denormalization is sometimes considered to optimize read operations.

Complexity: Higher normal forms can result in many small tables, making queries more complex to write and maintain.

In most real-world cases, databases are normalized up to 3NF, with some denormalization applied for performance reasons in read-heavy systems.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player