Relational Database Design - Normalisation - 1st Normal Form
First Normal Form
Data in First Normal Form has been described as 'the elimination of repeating groups of data through the creation of separate tables of related data', but I'm going to just say that we sort our data into entities.
Chris Date's rules for First Normal Form:
- There's no top-to-bottom ordering to the rows.
- There's no left-to-right ordering to the columns.
- There are no duplicate rows.
- Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
- All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
Time for an example. In the SQL School lessons we created some tables for our parents, children and classes. We are now going to go back in time and sort out how we designed these tables. You might think that we should have started at design and you would be right in the real world, but when learning a new subject I think it is useful to do some hands on stuff first. In fact the initial design was pretty bad - so we'll redesign it!
We would start off with the information we know we want to store in our database. We'll start off with a list of stuff we want to record about a particular class (it's easier to use some dummy data to get a better feel for the data).
An Example
We want to design a system that monitors: Classes, held in Class Rooms, on certain Days and Times, taught by Teachers and attended by Children with the permission of their Guardian:
In our school we have Two teachers - Mr Cane and Miss Stick.
There are 3 Classrooms - 1b, 2c and 3a.
We have 3 lessons - Woodworking at 10.00 on Wednesday in class 1c taught by Miss Stick, Cookery at 1200 on Thursday in class 2c taught by Mr Cane and Maths at 11.00 on Wednesday in class 1b taught by Miss Stick.
We have some children and the children have guardians - Daisy Smith / Mr Smith , Brian Jones / Mr Jones , Julie Andrews / Miss Andrews , Fred Jones / Mr Jones , Harry Trotter / Mrs Trotter and Mary Hill / Miss Stick.
Here is the data in spreadsheet form:
Class Name | Class Day | Class Time | Class Room | Teacher Name | Child Name | Guardian |
Woodworking | Wednesday | 10.00 | 1b | Miss Stick | Daisy Smith | Mr Smih |
Mrs Smith | ||||||
Brian Jones | Mr Jones | |||||
Julie Andrews | Miss Andrews | |||||
Cookery | Thursday | 12.00 | 2c | Mr Cane | Daisy Smith | Mr Smith |
Mrs Smith | ||||||
Julie Andrews | Miss Andrews | |||||
Fred Jones | Mr Jones | |||||
Mr Lloyd | ||||||
Maths | Wednesday | 11.00 | 1b | Miss Stick | Fred Jones | Mr Jones |
Harry Trotter | Mrs Trotter | |||||
Mary Hill | Miss Stick | |||||
John Smith | Mr Smith | |||||
Mrs Smith |
Spreadsheet format is a good way of identifying repeating data groups
That will do for now - this is only an overview! To turn this set of jumbled data into First Normal Form we have to decide on the Entities (which will become our tables) we need. We also have to make sure that there are no repeating groups of data in our entities - each entity must relate to a unique record. We could also identify a candidate primary key field here - one piece of data that would uniquely identify a record, but I'm (as usual) going to use Surrogate Primary Keys ( see the SQL School Primary Keys lesson for more information).
We also need to make sure that with one or a combination of our data items in our entities, we will be able to uniquely identify the record. In this example I am going to assume that Child Name and Guardian Name will uniquely identify a particular person. This is NOT true in real life - you need more information as folks often do not have unique names! You may have to create an extra column like Person ID Number.
First Normal Form Rules (mine not Codds or Dates!):
Remove repeating groups of data
We must be able to identify a unique record from one or more columns
There is no ordering of rows or columns
There are no duplicate rows
Every cell has just one piece of data
We can see straight away that there in fact two repeating groups of data. Each Class has repeating groups of Children (ie: more than one child attends the class ) and the some of the Children have repeating groups of Guardians ( ie: children have one or more guardians). So our first step in normalising is to make sure that every cell in our table has just one value and now would look like this:
Class Name | Class Day | Class Time | Class Room | Teacher Name | Child Name | Guardian |
Woodworking | Wednesday | 10.00 | 1b | Miss Stick | Daisy Smith | Mr Smith |
Woodworking | Wednesday | 10.00 | 1b | Miss Stick | Daisy Smith | Mrs Smith |
Woodworking | Wednesday | 10.00 | 1b | Miss Stick | Brian Jones | Mr Jones |
Woodworking | Wednesday | 10.00 | 1b | Miss Stick | Julie Andrews | Miss Andrews |
Cookery | Thursday | 12.00 | 2c | Mr Cane | Daisy Smith | Mr Smith |
Cookery | Thursday | 12.00 | 2c | Mr Cane | Daisy Smith | Mrs Smith |
Cookery | Thursday | 12.00 | 2c | Mr Cane | Julie Andrews | Miss Andrews |
Cookery | Thursday | 12.00 | 2c | Mr Cane | Fred Jones | Mr Jones |
Cookery | Thursday | 12.00 | 2c | Mr Cane | Fred Jones | Mr Lloyd |
Maths | Wednesday | 11.00 | 1b | Miss Stick | Fred Jones | Mr Jones |
Maths | Wednesday | 11.00 | 1b | Miss Stick | Harry Trotter | Mrs Trotter |
Maths | Wednesday | 11.00 | 1b | Miss Stick | Mary Hill | Miss Stick |
Maths | Wednesday | 11.00 | 1b | Miss Stick | John Smith | Mr Smith |
Maths | Wednesday | 11.00 | 1b | Miss Stick | John Smith | Mrs Smith |
Now we can split our data into entities. I think that Classes, Children and Guardians would make good entities!
So now let's have a look at our data in the three entities:
Classes Entity:
Class Name | Class Day | Class Time | Class Room | Teacher Name |
Woodworking | Wednesday | 10.00 | 1b | Miss Stick |
Cookery | Thursday | 12.00 | 2c | Mr Cane |
Maths | Wednesday | 11.00 | 1b | Miss Stick |
Children Entity:
Child Name |
Daisy Smith |
Brian Jones |
Julie Andrews |
Fred Jones |
etc... |
Guardian Entity:
Guardian |
Mr Smith |
Mrs Smith |
Mr Jones |
Miss Andrews |
etc. |
These entities obey all the rules for 1st Normal Form. The next section will convert this data into second normal form.