What's the best practice for primary keys in tables? [closed]

When designing tables, I've developed a habit of having one column that is unique and that I make the primary key. This is achieved in three ways depending on requirements:

Identity integer column that auto increments.
Unique identifier (GUID)
A short character(x) or integer (or other relatively small numeric type) column that can serve as a row identifier column

Number 3 would be used for fairly small lookup, mostly read tables that might have a unique static length string code, or a numeric value such as a year or other number.

For the most part, all other tables will either have an auto-incrementing integer or unique identifier primary key.

The Question :-)

I have recently started working with databases that have no consistent row identifier and primary keys are currently clustered across various columns. Some examples:

datetime/character
datetime/integer
datetime/varchar
char/nvarchar/nvarchar

Is there a valid case for this? I would have always defined an identity or unique identifier column for these cases.

In addition there are many tables without primary keys at all. What are the valid reasons, if any, for this?

I'm trying to understand why tables were designed as they were, and it appears to be a big mess to me, but maybe there were good reasons for it.

A third question to sort of help me decipher the answers: In cases where multiple columns are used to comprise the compound primary key, is there a specific advantage to this method vs. a surrogate/artificial key? I'm thinking mostly in regards to performance, maintenance, administration, etc.?

I follow a few rules:

Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character formats. This is because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, the less pages in the cache you will use.
Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause of ripple effect of changes.
Do NOT use "your problem primary key" as your logic model primary key. For example passport number, social security number, or employee contract number as these "natural keys" can change in real world situations. Make sure to add UNIQUE constraints for these where necessary to enforce consistency.

On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.

Natural verses artifical keys is a kind of religious debate among the database community - see this article and others it links to. I'm neither in favour of always having artifical keys, nor of never having them. I would decide on a case-by-case basis, for example:

US States: I'd go for state_code ('TX' for Texas etc.), rather than state_id=1 for Texas
Employees: I'd usually create an artifical employee_id, because it's hard to find anything else that works. SSN or equivalent may work, but there could be issues like a new joiner who hasn't supplied his/her SSN yet.
Employee Salary History: (employee_id, start_date). I would not create an artifical employee_salary_history_id. What point would it serve (other than "foolish consistency")

Wherever artificial keys are used, you should always also declare unique constraints on the natural keys. For example, use state_id if you must, but then you'd better declare a unique constraint on state_code, otherwise you are sure to eventually end up with:

state_id    state_code   state_name
137         TX           Texas
...         ...          ...
249         TX           Texas

I avoid using natural keys for one simple reason -- human error. Although natural unique identifiers are often available (SSN, VIN, Account Number, etc.), they require a human to enter them correctly. If you're using SSNs as a primary key, someone transposes a couple of numbers during data entry, and the error isn't discovered immediately, then you're faced with changing your primary key.

My primary keys are all handled by the database program in the background and the user is never aware of them.

What's the best practice for primary keys in tables? [closed]

The Question :-)

Related

Recent Posts