Relational Database Design - Background History

Flat Files

Before we had Relational Databases we had flat files. A collection of information (data) could be stored in these files. The You could have many files, each storing a particular group of information for example a file called 'Children', each line of which was a particular student, another file could be called 'Classes'.

Children File Classes File
Susan Smith Cooking
Harry Totter Needlework
... ....

As you can see it would be very difficult to link these two files together to produce a list of Children taking particular courses. We could use another type of flat file where all the required data is listed in a single file. The groups of information would be 'delimited' (separated) by various means; there could be commas separating the pieces of information - this is what a CSV (Comma Separated Values) file does.

Susan Smith,Cooking
Susan Smith,Needlework
Susan Smith,Gardening
Harry Totter,Cooking
Harry Totter,Nuclear Physics
......

Comma separated data, One set of data per line

Tabs could be used to separate groups of information, or we could use the character position in the file, padding out the bits of information with spaces:

12345678901234567890123456789012
Susan Smith______Cooking
Susan Smith______Needlework
Susan Smith______Gardening
Harry Totter_____Cooking
Harry Totter_____Nuclear Physics
......

Positional Data Separation - Name 1-17, Class 18-32

There are some problems associated with these sorts of files. There is a lot of data duplication. In the examples above changing a person's name would mean identifying every occurrence of the name and making sure it was changed. One of the aims of Normalisation - don't worry we'll cover that later in the Database Design - Normalisation lesson - is to remove duplication.

Finding data is also difficult to find one particular record we would have to read all the records in the file.

So, along came indexes. An index was a smaller file that held key information on the data in our flat file, so we might have records a bit like this:

Susan Smith,1,2,3
Harry Totter,4,5
......
This would tell us which lines (records) in the main file we needed to read to find a particular person's data.

These worked quite well and kept a lot of us in jobs as there was a LOT of coding to do to join all these files (all with different structures and formats) together. Then some clever people decided it would be a good idea if we could create a system where data could be linked together in a Master / Detail or Parent / Child format with this system dealing with all the indexing - the Relational Database.