Relational Database Design - Primary Keys

Primary Key

A Primary Key is used to uniquely identify a row of data in a database.

There are more discussions, arguments and probably fights over Primary Keys than anything else in the Relational Database World, but these are the undisputed rules for creating a Primary Key:

A Primary Key value must be UNIQUE in its own table

A Primary Key value may NOT be NULL

A Primary key may value NOT be changed

These may be undisputed rules.... but there are, as always exceptions. If you are creating Databases that will be distributed and will need Merging or Replicating then you MUST read this bit - Replication & Merging. The best thing to do would be to never create a Database that can live many separate existences, but this is the real World so.....

The Big Fight!

Natural Keys vs Surrogate Keys - this is where the arguments go on and on and on......

Natural Primary Keys

A Natural Key is a key that uses meaningful data. For example in the UK most people would consider a person's National Insurance number (Social Security number in the US?) as a candidate for a Natural Primary Key for a People table as it is unique, never changing and everyone has to have one.

But just hold on a minute! Imagine you were building a database to store details of folks buying products from your online store. By using the NI number you have just excluded most of the world from your shop - that's a bit insular in this global age. You have also excluded everyone under the age of 16 and anyone who hasn't applied for a number! Add to this that most people would not, never, no way tell you their NI number and some people may even lie! So this 'Natural' key is NOT a good candidate for a Primary Key.

In my experience I have found very few real Natural Primary Keys - most clients will insist that their Product ID will never change, they will never add a digit to their Payroll number set.... BUT, this is the real world and stuff does change.

Surrogate Primary Keys

A Surrogate Primary Key is an anonymous, meaningless (except the database!) integer value. Most database systems will offer to auto create the values for these fields.

I use Surrogate Primary Keys and I would strongly suggest that you do too!

So, why the arguments? Some designers get a bit touchy about extraneous data, and adding a whole integer field to a database table would give them sleepless nights for years! Don't worry about it! In the old days when we had to fit all our code and variables into 32k and disk space was more expensive than prime real estate and relational databases were just a dream - we had to worry about every bit (yes bit not byte!) - those days are gone so add those surrogate keys!

As I have mentioned previously I add an Integer Primary Key named 'ID' to EVERY table I create. I never have to remember what I have called the Primary Key and writing generic code based on the primary key is easy!

The SQL School section of this site will cover the actual creation of keys and we see that there are subtle differences in the type of value allocation provided.

Simon's Rule : Every Table will have ONE non recycled (surrogate), Primary Key named 'ID'

BUT... When the rules don't fit

All the above rules work just fine on a single Database. It could be a humongous Database being accessed by millions of users from all over the Internet or a tiny standalone Database - the rules are good.

However when you are designing Databases that will have distributed versions that will need to Replicated or Merged then the rules will fail you.

Please have a look at this IMPORTANT section for more information on Replication & Merging