Relational Database Design - Normalisation - 1st Normal Form

First Normal Form

Data in First Normal Form has been described as 'the elimination of repeating groups of data through the creation of separate tables of related data', but I'm going to just say that we sort our data into entities.

Chris Date's rules for First Normal Form:

  1. There's no top-to-bottom ordering to the rows.
  2. There's no left-to-right ordering to the columns.
  3. There are no duplicate rows.
  4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
  5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

Time for an example. In the SQL School lessons we created some tables for our parents, children and classes. We are now going to go back in time and sort out how we designed these tables. You might think that we should have started at design and you would be right in the real world, but when learning a new subject I think it is useful to do some hands on stuff first. In fact the initial design was pretty bad - so we'll redesign it!

We would start off with the information we know we want to store in our database. We'll start off with a list of stuff we want to record about a particular class (it's easier to use some dummy data to get a better feel for the data).

An Example

We want to design a system that monitors: Classes, held in Class Rooms, on certain Days and Times, taught by Teachers and attended by Children with the permission of their Guardian:

In our school we have Two teachers - Mr Cane and Miss Stick.

There are 3 Classrooms - 1b, 2c and 3a.

We have 3 lessons - Woodworking at 10.00 on Wednesday in class 1c taught by Miss Stick, Cookery at 1200 on Thursday in class 2c taught by Mr Cane and Maths at 11.00 on Wednesday in class 1b taught by Miss Stick.

We have some children and the children have guardians - Daisy Smith / Mr Smith , Brian Jones / Mr Jones , Julie Andrews / Miss Andrews , Fred Jones / Mr Jones , Harry Trotter / Mrs Trotter and Mary Hill / Miss Stick.

Here is the data in spreadsheet form:

Class Name Class Day Class Time Class Room Teacher Name Child Name Guardian
Woodworking Wednesday 10.00 1b Miss Stick Daisy Smith Mr Smih
            Mrs Smith
             
          Brian Jones Mr Jones
             
          Julie Andrews Miss Andrews
             
Cookery Thursday 12.00 2c Mr Cane Daisy Smith Mr Smith
            Mrs Smith
             
          Julie Andrews Miss Andrews
             
          Fred Jones Mr Jones
            Mr Lloyd
             
Maths Wednesday 11.00 1b Miss Stick Fred Jones Mr Jones
             
          Harry Trotter Mrs Trotter
             
          Mary Hill Miss Stick
             
          John Smith Mr Smith
            Mrs Smith

Spreadsheet format is a good way of identifying repeating data groups

That will do for now - this is only an overview! To turn this set of jumbled data into First Normal Form we have to decide on the Entities (which will become our tables) we need. We also have to make sure that there are no repeating groups of data in our entities - each entity must relate to a unique record. We could also identify a candidate primary key field here - one piece of data that would uniquely identify a record, but I'm (as usual) going to use Surrogate Primary Keys ( see the SQL School Primary Keys lesson for more information).

We also need to make sure that with one or a combination of our data items in our entities, we will be able to uniquely identify the record. In this example I am going to assume that Child Name and Guardian Name will uniquely identify a particular person. This is NOT true in real life - you need more information as folks often do not have unique names! You may have to create an extra column like Person ID Number.

First Normal Form Rules (mine not Codds or Dates!):

Remove repeating groups of data

We must be able to identify a unique record from one or more columns

There is no ordering of rows or columns

There are no duplicate rows

Every cell has just one piece of data

We can see straight away that there in fact two repeating groups of data. Each Class has repeating groups of Children (ie: more than one child attends the class ) and the some of the Children have repeating groups of Guardians ( ie: children have one or more guardians). So our first step in normalising is to make sure that every cell in our table has just one value and now would look like this:

Class Name Class Day Class Time Class Room Teacher Name Child Name Guardian
Woodworking Wednesday 10.00 1b Miss Stick Daisy Smith Mr Smith
Woodworking Wednesday 10.00 1b Miss Stick Daisy Smith Mrs Smith
Woodworking Wednesday 10.00 1b Miss Stick Brian Jones Mr Jones
Woodworking Wednesday 10.00 1b Miss Stick Julie Andrews Miss Andrews
Cookery Thursday 12.00 2c Mr Cane Daisy Smith Mr Smith
Cookery Thursday 12.00 2c Mr Cane Daisy Smith Mrs Smith
Cookery Thursday 12.00 2c Mr Cane Julie Andrews Miss Andrews
Cookery Thursday 12.00 2c Mr Cane Fred Jones Mr Jones
Cookery Thursday 12.00 2c Mr Cane Fred Jones Mr Lloyd
Maths Wednesday 11.00 1b Miss Stick Fred Jones Mr Jones
Maths Wednesday 11.00 1b Miss Stick Harry Trotter Mrs Trotter
Maths Wednesday 11.00 1b Miss Stick Mary Hill Miss Stick
Maths Wednesday 11.00 1b Miss Stick John Smith Mr Smith
Maths Wednesday 11.00 1b Miss Stick John Smith Mrs Smith

Now we can split our data into entities. I think that Classes, Children and Guardians would make good entities!

So now let's have a look at our data in the three entities:

Classes Entity:

Class Name Class Day Class Time Class Room Teacher Name
Woodworking Wednesday 10.00 1b Miss Stick
Cookery Thursday 12.00 2c Mr Cane
Maths Wednesday 11.00 1b Miss Stick

Children Entity:

Child Name
Daisy Smith
Brian Jones
Julie Andrews
Fred Jones
etc...

Guardian Entity:

Guardian
Mr Smith
Mrs Smith
Mr Jones
Miss Andrews
etc.

These entities obey all the rules for 1st Normal Form. The next section will convert this data into second normal form.