What are the differences between a superkey and a candidate key?
Candidate key
is a super key
from which you cannot remove any fields.
For instance, a software release can be identified either by major/minor version, or by the build date (we assume nightly builds).
Storing date in three fields is not a good idea of course, but let's pretend it is for demonstration purposes:
year month date major minor
2008 01 13 0 1
2008 04 23 0 2
2009 11 05 1 0
2010 04 05 1 1
So (year, major, minor)
or (year, month, date, major)
are super keys (since they are unique) but not candidate keys, since you can remove year
or major
and the remaining set of columns will still be a super key.
(year, month, date)
and (major, minor)
are candidate keys, since you cannot remove any of the fields from them without breaking uniqueness.
A super key is any combination of columns that uniquely identifies a row in a table. A candidate key is a super key which cannot have any columns removed from it without losing the unique identification property. This property is sometimes known as minimality or (better) irreducibility.
A super key ≠ a primary key in general. The primary key is simply a candidate key chosen to be the main key. However, in dependency theory, candidate keys are important and the primary key is not more important than any of the other candidate keys. Non-primary candidate keys are also known as alternative keys.
Consider this table of Elements:
CREATE TABLE elements
(
atomic_number INTEGER NOT NULL PRIMARY KEY
CHECK (atomic_number > 0 AND atomic_number < 120),
symbol CHAR(3) NOT NULL UNIQUE,
name CHAR(20) NOT NULL UNIQUE,
atomic_weight DECIMAL(8,4) NOT NULL,
period SMALLINT NOT NULL
CHECK (period BETWEEN 1 AND 7),
group CHAR(2) NOT NULL
-- 'L' for Lanthanoids, 'A' for Actinoids
CHECK (group IN ('1', '2', 'L', 'A', '3', '4', '5', '6',
'7', '8', '9', '10', '11', '12', '13',
'14', '15', '16', '17', '18')),
stable CHAR(1) DEFAULT 'Y' NOT NULL
CHECK (stable IN ('Y', 'N'))
);
It has three unique identifiers - atomic number, element name, and symbol. Each of these, therefore, is a candidate key. Further, unless you are dealing with a table that can only ever hold one row of data (in which case the empty set (of columns) is a candidate key), you cannot have a smaller-than-one-column candidate key, so the candidate keys are irreducible.
Consider a key made up of { atomic number, element name, symbol }. If you supply a consistent set of values for these three fields (say { 6, Carbon, C }), then you uniquely identify the entry for an element - Carbon. However, this is very much a super key that is not a candidate key because it is not irreducible; you can eliminate any two of the three fields without losing the unique identification property.
As another example, consider a key made up of { atomic number, period, group }. Again, this is a unique identifier for a row; { 6, 2, 14 } identifies Carbon (again). If it were not for the Lanthanoids and Actinoids, then the combination of { period, group } would be unique, but because of them, it is not. However, as before, atomic number on its own is sufficient to uniquely identify an element, so this is a super key and not a candidate key.