When to use an auto-incremented primary key and when not to?

I'm trying to figure out the "best practices" for deciding whether or not to add an auto-incrementing integer as the primary key to a table.

Let's say I have a table containing data about the chemical elements. The atomic number of each element is unique and will never change. So rather than using an auto-incrementing integer for each column, it would probably make more sense to just use the atomic number, correct?

Would the same be true if I had a table of books? Should I use the ISBN or an auto-incrementing integer for the primary key? Or a table of employees containing each person's SSN?


Solution 1:

There are a lot of already addressed questions on Stack Overflow that can help you with your questions. See here, here, here and here.

The term you should be looking for: surrogated keys.

Hope it helps.

Solution 2:

This is a highly debated question, with lots of emotion on both sides.

In my humble opinion, if there's a good, useable natural key available -- like an ISBN -- I use it. I'm going to store it on the database anyway. Yes, a natural key is usually bigger than an integer auto-increment key, but I think this issue is overblown. Disk space is cheap today. I'd worry more about it taking longer to process. If you were talking about an 80 byte text field as a primary key, I'd say no. But if you're thinking of using a 10-byte ISBN instead of an 8-byte big integer, I can't imagine that brings much of a performance penalty.

Sometimes there's a performance advantage to natural keys. Suppose, for example, I want to find how many copies of a given book have been sold. I don't care about any of the data from the Book master record. If the primary key is ISBN, I could simply write "select count(*) from sale where isbn='143573338X'". If I used an autoincrement key, I would have to do a join to look up the isbn, and the query becomes more complex and slower, like "select count(*) from book join sale using (bookid) where isbn='143573338X'". (And I can assure you that that as this particular ISBN is for my book, the number of sale records is very small, so doing the join and reading one extra record is a big percentage difference!)

Another advantage of natural keys is that when you have to work on the database and you look at records that refer back to this table by key, it's easy to see what record they're referring to.

On the other hand, if there is no good, obvious natural key, don't try to cobble together a crazy one. I've seen people try to make a natural key by concatenating together the first 6 letters of the customers first name, his year of birth, and his zip code, and then pray that that will be unique. That sort of silliness is just making trouble for yourself. Often people end up taking on a sequence number to insure it's unique anyway, and at that point, why bother? Why not just use the sequence number by itself as the key?

Solution 3:

You've got the idea right there.

Auto-increment should be used as a unique key when no unique key already exists about the items you are modelling. So for Elements you could use the Atomic Number or Books the ISBN number.

But if people are posting messages on a message board then these need a unique ID, but don't contain one naturally so we assign the next number from a list.

It make sense to use natural keys where possible, just remember to make the field as the primary key and ensure that it is indexed for performance

Solution 4:

With regards to using ISBN and SSN you really have to Think about how many rows in other tables are going to reference these through foreign keys because those ids will take up much more space than an integer and thus may lead to a waste of disk space and possibly to worse join performance.