Database Normalization — How it Works

12 min readMar 21, 2021

Intro

Redundant data may not be an issue for non-relational databases.

For relational databases, this is a pest that must be disposed of.

Data redundancy increases the difficulty of inserting, updating or deleting rows from a database consistently. Database normalization is a technique to organize data into multiple tables to eliminate data redundancy.

Systems that require specific information retrieval from a database benefit from the precision normalization provides. However, systems performing mass amounts of information retrieval may experience a decrease in access speed. The lack of data duplicity in normalized databases results in slower read times from the required use of table joins. The performance effects of database normalization will not be discussed further here, but it is worthwhile to mention that such ramifications should be considered when choosing to normalize a database. You can learn more on that here.

There are multiple synonyms utilized when discussing databases. Though terms used will remain consistent throughout this article, feel free to use the following as a reference when coming across additional sources.

Database synonyms found throughout other discussions related to database normalization — Examples of database synonyms

Keys in a Database Table

There exists numerous types of keys within a database table.

C# Corner defines a key in this way:

“A key is a unique data item or group of columns that exclusively identifies a row in a table.” — C# Corner

There can be any number of keys within a database table. The options available to uniquely identify a table row is proportional to the amount of keys present. Many different mechanisms exist to specify a single row in a table within many keys; less mechanisms exist in a table with few keys. A table without any keys cannot specify any row of data uniquely, making it difficult to perform accurate read or write operations.

Consider the following PERSON table:

PERSON database for simple explanations — Simple PERSON database table

In the PERSON table, three columns exist: Person_ID, First_Name and Last_Name. Note that only one of the columns can uniquely identify any individual table row. First_Name and Last_Name can’t uniquely identify a single person; neither can we identify a single person by combining First_Name and Last_Name into one key. Names are bound to overlap at some point — in a table of only three rows, two rows of data both have the same First_Name and Last_Name!

Only one column in the PERSON table can act as a key: Person_ID. This is because:

There will never be two people with the same Person_ID
Each column value for each table row can be obtained using Person_ID

Types of Database Keys

🔑 Super

A column or set set of columns used to uniquely identify all columns of a row
Are not always candidate keys

Super keys are the most general type of keys found in a table. Super keys can be created by:

- Using individual keys { Person_ID }

- Combining keys with other keys { Person_ID, SSN }

- Combing keys with columns that are not keys { Person_ID, First_Name }

Not all existing super keys will be useful in the database normalization process.

🔑 Candidate

Also known as a minimal super key, indicating the minimal amount of columns that can be used to identify a single table row
Are chosen from the list of super keys
Are super keys with no repeated columns

We may have two super keys from the super key description above:

1. { Person_ID }

2. { Person_ID, First_Name }

Only { Person_ID } is denotable as a candidate key. { Person_ID, First_Name } is not a minimal super key and consequently it is not a candidate key as { Person_ID, First_Name } unnecessarily contains the general column First_Name in its key definition.

🔑 Primary

Chosen from the list of candidate keys
Used to uniquely identify any table row

Only one primary key can exist in a database table. Person_ID is the only candidate key that exists in the PERSON table above, making it the natural choice as primary key.

🔑 Composite

Candidate key created by combining two or more columns from within a single table
Are not always candidate keys

Functional Dependency

When one column in a database table depends upon another column, functional dependency exists. The types of keys present in each table can help identify dependencies that are either beneficial or detrimental to the normalization of a database.

There are two specific types of negative functional dependency discussed in this article:

1. Partial Dependency

A partial dependency exists when a non-primary column depends upon a single column that is a part of a composite primary key.

An example of this type of dependency can be found in the 2NF explanation below.

2. Transitive Dependency

A transitive dependency exists when a non-primary column depends on some other non-primary column. There exists an indirect relationship between columns of the same table.

A transitive dependency only exists within tables that consist of at least three columns. Just as the transitive property in mathematics can only exist with three or more entities, the transitive property in database tables can only exist with three or more columns.

An example of this type of dependency can be found in the 3NF explanation below.

“Good“ Dependency

A positive form of functional dependency can exist.

In the PERSON table used earlier, Person_ID is the primary key. Both columns First_Name and Last_Name are dependent upon the Person_ID column for their values. We can say Person_ID determines both First_Name and Last_Name. This is denoted by:

Person_ID — -> First_Name

and

Person_ID — -> Last_Name

These two dependencies are acceptable because both non-primary columns depend upon the primary key for their values.

Additionally, each column in the PERSON table can be obtained by using the primary key, Person_ID.

Database Normal Forms (1NF, 2NF & 3NF)

As mentioned above, data redundancy causes insertion, deletion and modification anomalies. Database normalization is a technique to organize data into multiple tables to eliminate data redundancy.

Database normalization consists of six levels of progression. Each level is referred to as a normal form, only able to be reached after obtaining the previous. Second Normal Form (2NF) can only be reached after achieving First Normal Form (1NF), Third Normal Form can only be reached after achieving both Second Normal Form and First Normal Form, and so on. This process continues until Sixth Normal Form (6NF).

Forms beyond 4NF are generally pursued solely for academic research. The scope of this article focuses briefly on the first three normal forms: 1NF, 2NF & 3NF.

First Normal Form (1NF)

Five rules exist for a database to conform to 1NF:

Each column must have a unique name
The order of table rows does not matter
Each table row in a column follows the same data format
Each column contains purely atomic values
No column is subverted (Name_1, Name_2, etc.)

1NF violation to 1NF violation — 1NF Violation to 1NF Conformance

In the initial PROGRAMMERs table above, the Certifications column contains values that are not atomic; there is more than one value for at least one of the rows of data.

The first row contains two values: Python and Go
The third row contains two values: TypeScript and C#

Degree_1 and Degree_2 also exist as subverted columns. If a programmer obtained a third degree using the current approach, a third column, Degree_3, would have to be created. This increases the likelihood for data anomalies in the future.

The solution to both issues is to create two new tables:

CERTIFICATIONS
DEGREES

Any reference to a certification or degree may now be removed from the PROGRAMMERS table.

A reference to an individual programmer may now be maintained by using Prog_Id in both the CERTIFICATIONS and DEGREES table.

Second Normal Form (2NF)

There are two rules for a database to conform to 2NF:

Already conforms to 1NF
There exists no partial dependencies

Example of 2NF violation — 2NF Violation Example

The WORKOUT table above conforms to 1NF. However, an issue exists with compliance to 2NF. One rule remains unfollowed: no partial dependencies allowed.

As previously stated, a partial dependency exists when a non-primary column depends on only part of a composite primary key, rather than the composite primary key in its entirety. In the WORKOUT table above, Workout_Type and Workout_Duration both depend on the Workout_ID column for its value. They do not depend on the { Athlete_ID, Workout_ID } key for its value, indicating a partial dependency. Athlete_Name also only depends upon Athlete_ID, rather than the { Athlete_ID, Workout_ID } key.

These partial dependency issues can be resolved by:

Getting rid of the Athlete_Name column in the WORKOUT table
Creating a new ATHLETE table
Creating Athlete_ID and Athlete_Name columns in the ATHLETE table
Simplifying Workout_Type and Workout_Duration to Type and Duration respectively in the WORKOUT table

Example of 2NF conformance — 2NF Conformance Example

The dependencies from Workout_Type and Workout_Duration to Workout_ID are maintained; the dependency from Athlete_Name to Athlete_ID is also maintained — Second Normal Form has been achieved.

Third Normal Form (3NF)

There are two rules for a database to conform to 3NF:

Conforms to 1NF and 2NF
There exists no transitive dependencies

In the MOVIE_INFO table above, we have information relating to two main entities:

A movie
A director

For the sake of this example, let’s assume the director’s salary is consistent across every movie directed.

In the MOVIE_INFO table, Director_ID is the primary key. The column values of Director_Name, Movie_Name and Director_Salary may each be obtained from the Director_ID column.

The MOVIE_INFO table above conforms to 2NF. However, an issue exists with compliance to 3NF.

In the MOVIE_INFO table, Movie_Name determines Director_Name. Director_Name determines the Director_Salary. This means that Movie_Name also determines Director_Salary. This indicates a transitive dependency, as there is an indirect relationship between columns in the same table; a non-primary column is able to determine another non-primary column.

This transitive dependency issue can be resolved by:

Getting rid of the Movie_Name column in the MOVIE_INFO table
Changing the MOVIE_INFO table name to DIRECTOR to keep movie and director entities in separate tables
Creating a new MOVIE table
Creating a Director_ID and Name column in the MOVIE table

3NF Conformance example — 3NF Conformance Example

There is now no indirect relationship between Movie_Name and Director_Salary in the same database table. Our database conforms to 3NF.

How Database Normalization Makes Better Programmers

Through the normalization process, a database becomes organized and data redundancy is eliminated.

Normalization teaches:

Data should be atomic
Column names should not be subverted
Partial dependencies should not exist
Transitive dependencies should not exist

Database normalization accomplishes more than the act of increasing the quality of a database’s structure; by identifying key concepts behind normalization and applying it to programming, code quality improves dramatically.

Normalizing Structures

The results found when normalizing databases can be mirrored to results when normalizing structures within programs. Structures can include a number of constructs. In TypeScript, a prime example of a structure is an interface (More on TypeScript interfaces here).

A PROGRAMMERS interface can be created to mimic the unnormalized PROGRAMMERS table from the 1NF section above:

// Unnormalized structure analogous to a table within an
// unnormalized databaseinterface PERSON { 
    Prog_ID: number;
    Name: String;
    Certifications: String;
    Degree_1: String;
    Degree_2: String;
}

The issues that existed within the PROGRAMMERS table are analogous to the issues existing in this PROGRAMMERS interface:

Certifications allows only a single value while indicating plurality, violating atomicity
The degree property is subverted as Degree_1 and Degree_2

The solution to the unorganized structure of the PERSON interface can be fixed by adhering to the rules of 1NF. We can eliminate the 1NF violation by creating a separate interface for both certifications and degrees. Creating separate interfaces for violating attributes of an object is the same as creating separate tables for violating columns of a table. The normalized structure now looks like this:

// Normalized to 1NFinterface PERSON {
    Prog_ID: number;
    Name: String;
    Certifications: Certification[];
    Degree: Degree[];
}interface Certification {
    Prog_ID: number;
    title: String;
}interface Degree {
    Prog_ID: number;
    title: String;
}

Normalizing Logic

A database table with a partial dependency means a general column only depends on part of a composite primary key for its value. This can be represented through programming logic by part of a function return value depending on only a part of the function arguments.

Consider the following COFFEE_SNOB database table:

Table to represent normalizing logic — Logic normalization table representation

In the table above, notice that the Creamer column only depends on the Age column, and the Sugar column only depends on the Snob_Level column.

Now consider the following code, representing the same logic:

interface Coffee {
  creamer: boolean;
  sugar: boolean;
}interface CoffeeSnob {
  age: number;
  snobLevel: number; // 1-10
}function serveCoffeeSnob(snob: CoffeeSnob): Coffee {
    let coffee: Coffee;
    if (snob.age >= 60 && snob.snobLevel >= 8) {
        coffee = { creamer: false, sugar: false };
    } else if (snob.age >= 60 && snob.snobLevel < 8) {
        coffee = { creamer: false, sugar: true };
    } else if (snob.age < 60 && snob.snobLevel >= 8 ) {
        coffee = { creamer: true, sugar: false };
    } else {
        coffee = { creamer: true, sugar: true };
    }
    return coffee;
}const snob: CoffeeSnob = { age: 60, snobLevel: 10 };
const coffee = serveCoffeeSnob(snob); // without creamer or sugar

Ugly, right?

In the coffee function above, the creamer property of the returned coffee object is only dependent upon the age property of the snob function argument. Likewise, the sugar property of the returned coffee object is only dependent upon the snobLevel property of the snob function argument.

In a database table, the solution to partial dependency is to give the columns causing the dependency their own table. A similar solution exists in the coffee function; the assignment of both properties can be pulled out into their own expressions. These expressions are now located in the returned coffee object.

The result is cleaner, shorter code. 11 lines have been reduced to 4.

function coffee(snob: CoffeeSnob): Coffee {
    return {
      creamer: snob.age < 60,
      sugar: snob.snobLevel < 8
    };
}const snob: CoffeeSnob = { age: 60, snobLevel: 10 };
const coffee = serveCoffeeSnob(snob); // without creamer or sugar

Conclusion

This article covers a wide range of material: types of keys within a database, the first three forms of database normalization and how the normalization process can result in higher-quality code.

For more information on the subject of database normalization, refer to the sources listed below.

Sources

(With helpful advice from Anthony Feldhake, Zachary Temple & Zechariah Schwerman)

Normalization

Page Navigation: [ Forms of normalization | Additional normalization tips ] Other Navigation: [ About Steve Litt |…

www.troubleshooters.com

Pros and Cons of Database Normalization - DZone Database

When using a relational database, normalization can help keep the data free of errors and can also help ensure that the…

dzone.com

Functional Dependency in DBMS: What is, Types and Examples

Functional Dependency (FD) is a constraint that determines the relation of one attribute to another attribute in a…

www.guru99.com

Types Of Keys In Database

Super Key Minimal Super Key Candidate Key Primary Key Unique Key Alternate Key Composite Key Foreign Key Natural Key…

www.c-sharpcorner.com

Natural key - Wikipedia

A natural key (also known as business key or domain key) is a type of unique key in a database formed of attributes…

en.wikipedia.org

Database normalization - Wikipedia

This article See the needs attention from an expert in Databases. talk page for details. WikiProject Databases may be…

en.wikipedia.org

Unnormalized form - Wikipedia

Unnormalized form (UNF), also known as an unnormalized relation or non first normal form (NF 2), is a simple database…

en.wikipedia.org

First normal form - Wikipedia

First normal form ( 1NF) is a property of a relation in a relational database. A relation is in first normal form if…

en.wikipedia.org

Second normal form

Second normal form ( 2NF) is a normal form used in database normalization. 2NF was originally defined by E. F. Codd in…

en.wikipedia.org

Third normal form - Wikipedia

Third normal form ( 3NF) is a database schema design approach for relational databases which uses normalizing…

en.wikipedia.org

Database Normalization — How it Works

Intro

Keys in a Database Table

Types of Database Keys

Functional Dependency

Database Normal Forms (1NF, 2NF & 3NF)

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

How Database Normalization Makes Better Programmers

Normalizing Structures

Normalizing Logic

Conclusion

Sources

Normalization

Page Navigation: [ Forms of normalization | Additional normalization tips ] Other Navigation: [ About Steve Litt |…

Pros and Cons of Database Normalization - DZone Database

When using a relational database, normalization can help keep the data free of errors and can also help ensure that the…

Functional Dependency in DBMS: What is, Types and Examples

Functional Dependency (FD) is a constraint that determines the relation of one attribute to another attribute in a…

Types Of Keys In Database

Super Key Minimal Super Key Candidate Key Primary Key Unique Key Alternate Key Composite Key Foreign Key Natural Key…

Natural key - Wikipedia

A natural key (also known as business key or domain key) is a type of unique key in a database formed of attributes…

Database normalization - Wikipedia

This article See the needs attention from an expert in Databases. talk page for details. WikiProject Databases may be…

Unnormalized form - Wikipedia

Unnormalized form (UNF), also known as an unnormalized relation or non first normal form (NF 2), is a simple database…

First normal form - Wikipedia

First normal form ( 1NF) is a property of a relation in a relational database. A relation is in first normal form if…

Second normal form

Second normal form ( 2NF) is a normal form used in database normalization. 2NF was originally defined by E. F. Codd in…

Third normal form - Wikipedia

Third normal form ( 3NF) is a database schema design approach for relational databases which uses normalizing…

Written by Chase Grainger