Learning SQL - Design Tip - Self Referencing & Domain Table

These tips are mostly applicable to all forms of SQL, some are specific to SQLite and versions of SQLite like RealSQL Databases and RealSQL Server databases.

Lookup Values

Every Database Project I have ever worked on has needed a lookup Table. There will always be lists of values you need to select from.

In our Test Project table QiSQL_Children we decide we need to add a column 'Gender'. So, for every record we would need to enter 'Male' or 'Female' and possibly in the future 'Transgender' or there may be an option for 'Non Specified'. We don't want to have to type these values in every time we add a new record - it's a waste of time, a waste of disk space and we have the possibility of incorrect data being added to our table.

So let's consider adding a table for Gender. A Table just for Gender would be daft! There would only normally be two values in it. Maybe it would be better to create a 'Domains' Table where we could group together different sets of Lookup Data. So how about a Table that stored Lookup Data like this:

Domain Value
Gender Male
Gender Female
Hair Colour Brown
Hair Colour Black
Hair Colour Blonde
Hair Colour Green

NO!! Loads of Repeating Data that breaks all our Normalisation Rules. We will have to break this design down into separate entities (you remember what they are don't you - chunks of data). Let's create a simple Entity Relationship Model for Domains and Values.

Data Model - Domains 1

So we would have a Foreign Key in the Child Table -' Domain Values' linking it to the Primary Key in its Parent 'Domains'

Domain Hierarchy

The solution above would work reasonably well until we decided that we need a more complex Domain Hierarchy with more than one detail level.

Hierarchies are used a lot in Databases. Say for example we wanted a lookup field to store Towns. We may want to know which county or state the town was in , we may also need to know which country the town was in, which continent etc. So we would need a Data Model a bit like this:

Data Model for Multi Level Domain

 

 

Can you spot the problem with this kind of design? How would we know how many levels to build in - our Domain table is supposed to handle ALL our lookup data. It would be also be a real pain maintaining these multiple tables.

I am now going to introduce another new concept:

Self Referential Tables

How about this for a Data relationship Model:

Data Model - Selef Referential

A Table with a Foreign Key linking it to itself! Lets have a look at the Table Creation SQL command and add a few record to get a better idea of how this works:

create table lookup_domains
(id integer primary key autoincrement,
parent_id integer,
domain text );

insert into lookup_domains ( domain ) values ( 'Colours' );

This new record has No Parent_id - so it IS a parent Domain. When we want to insert Child records for we use the ID of the Parent. If you are using command line SQLite you will have to do a Select to find out the last ID number allocated (id is an Integer Primary Key and as we did not insert a value SQLite will automatically populate it for us). If you are using the SQL Execute screen in QiSQL the last ID number will be automatically displayed for you.

Now we can add another level to the Colours Domain

insert into lookup_domains ( parent_id , domain ) values ( 1 , 'Bright Colours' );

That record returned ID of 2 as Parent Id so let's use to insert some actual colours:

insert into lookup_domains ( parent_id , domain ) values ( 2 , 'Red' );
insert into lookup_domains ( parent_id , domain ) values ( 2 , 'Blue' );
insert into lookup_domains ( parent_id , domain ) values ( 2 , 'Yellow' );

Now let's add some more parent domains and domain values to make our example more realistic:

insert into lookup_domains ( parent_id , domain ) values ( 1 , 'Light Colours' );

insert into lookup_domains ( parent_id , domain ) values ( 6 , 'Grey' );
insert into lookup_domains ( parent_id , domain ) values ( 6 , 'Pink' );
insert into lookup_domains ( parent_id , domain ) values ( 6 , 'White' );


insert into lookup_domains ( domain ) values ( 'Sizes' );

insert into lookup_domains ( parent_id , domain ) values ( 10 , 'Small' );
insert into lookup_domains ( parent_id , domain ) values ( 10 , 'Medium' );
insert into lookup_domains ( parent_id , domain ) values ( 10 , 'large' );

Now we have a variable level hierarchy - Colours is divided into to sub divisions - Bright Colours and Light Colours, Sizes does not have any sub divisions. We got the Parent Id from the Last Rowid (ID Primary Key) from the previous insert, QiSQL data Manager will return this for you or you can issue a SQL Select statement like this:

select max (id) from lookup_domains;

PLEASE make sure you have read the section on Primary Keys and Rowids before relying on this max(rowid) solution. It would probably be best to use an Autoincrement Primary Key field if you are going to used max(rowid) as you can be sure that Primary Key numbers will not be reused.

Walking the Tree?

So far so good.But, and it's a big BUT, Sqlite does not currently support Recursive Selects, (also known as Tree Walking). The concept of Tree Walking is probably a bit advanced for this lesson, but I'll try and explain it simply. Using statements like Connect By Prior (Not in SQLite - shame!) you can write a SQL Select command to return a hierarchy of Values a bit like this (selecting ID, Parent_ID, Domain):

1		 Colours
2 1 Bright Colours
3 2 Red
4 2 Blue
5 2 Yellow 6 1 Light Colours 7 6 Grey 8 6 Pink 9 6 White 10 Sizes 11 10 Small 12 10 Medium 13 10 Large

Well we can't do that with SQLite (yet) so what can we do? Luckily for the purposes of this example (and creating A Domain Lookup table generally) we are only really interested in Bottom level domains and their parents. We can write some fancy code later to show our Hierarchies, but for now knowing that the important bit of information is right at the bottom of out 'tree' we can issue a sql select statement like this

create view vw_bottom_level_domains as
select d1.parent_id parent_id,
d2.domain Parent_Domain,
d1.id Child_Id,
d1.domain Child_Domain
from lookup_domains d1 ,
lookup_domains d2
where d1.id not in ( select d3.parent_id from lookup_domains d3 )
and d1.parent_id = d2.id

What we are saying here is give me all the records that are NOT parents - with this bit of the code : where d1.id not in ( select d3.parent_id from lookup_domains d3 ). So if they are not parents they must be bottom level children - cool!

Notice how I created a View? Now I can just query my Bottom Level Domains and their parents by issuing this SQL Select query:

select * from vw_bottom_level_domains ;

And here's the result:

Parent Id Parent_Domain Child_Id Child_Domain
2 Bright Colours 3 Red
2 Bright Colours 4 Blue
2 Bright Colours 5 Yellow
6 Light Colours 7 Grey
6 Light Colours 8 Pink
6 Light Colours 9 White
10 Sizes 11 Small
10 Sizes 12 Medium
10 Sizes 13 Large