# Relational Database Design - Normalisation - Normal Forms

## Normalisation

Wikipedia says that Normalisation is:

In the design of a relational database management system (RDBMS), the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Source - Wikipedia Normalisation page - Checked May 2011

So there you go -eaasy! I must have a very small brain, because I find that very difficult to understand.

My definition of normalisation is:

### Normalisation - Organise data into logical groups with no duplicate data

## Normalisation Methods

There are many ways of normalising data, my favourite is:

## Intuitive Normalisation

If you like rules and regulations then pass this section over and go immediately to the next section - The Normal Forms!

To Intuitively normalise data:

- Step 1: Draw or refine a Data Model
- Step 2: Look at the Data Model
- Step 3: Intuitively group the data into entities
- Step 4 : Fiddle with your Data Model
- Step 5: If it doesn't feel right - go to Step 1

Ha Ha! Intuitive Normalisation hasn't got steps - it's intuitive! Basically you just draw and redraw your data model until it looks and feels right! After about 25 years of designing databases you should become an expert!

## The 'Formal Normal' Forms

I like that! 'Formal Normal' Maybe we should try and rhyme all our database design stuff it would be more fun!

Normalisation was invented, like everything else in the Universe, by Codd back in the early seventies, Folks like Chris Date and Boyce have amended it and there will always be controversy over how the Normal Forms should be applied.

Most wise folks agree that there are three stages of normalisation to go through if you are using formal normalisation methodology. I have put together some more detailed pages on each stage of Normalisation - if you really want to know click on a link below:

So just 3 Normal forms then? NO! There are FIVE and some folks are even talking about SIX! Cripes! Only read on from here if you have a lot of spare time on your hands!

Only folks with no friends or social skills should read beyond this point - you may turn into a database geek!

*These formal definitions are taken from the book by Chris J. Date: An Introduction to Database Systems Volume 14th edition, © 1996, Addison-Wesley Publishing Co., Inc., Reading, Massachusetts.*

**First Normal Form**

'A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.'

**Second Normal Form**

'A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key.'

**Third Normal Form**

'A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.'

**Boyce/Codd Normal Form**

'A relation R is in Boyce/Codd normal form (BCNF) if and only if every determinant is a candidate key.'

**Fourth Normal Form**

'A relation R is in fourth normal form (4NF) if and only if, wherever there exists an MVD in R, say A -> -> B, then all attributes of R are also functionally dependent on A. In other words, the only dependencies (FDs or MVDs) in R are of the form K -> X (i.e. a functional dependency from a candidate key K to some other attribute X). Equivalently: R is in 4NF if it is in BCNF and all MVD's in R are in fact FDs.'

**Fifth Normal Form**

'A relation R is in fifth normal form (5NF) – also called projection-join normal form (PJ/NF) if and only if every join dependency in R is a consequence of the candidate keys of R.'

For every normal form it is assumed that every occurrence of R can be uniquely identified by a primary key using one or more attributes in R.

FD = Functional Dependency

MVD = Multi-Valued Dependency