What is the difference between a primary key and a surrogate key?

Solution 1:

The primary key is a unique key in your table that you choose that best uniquely identifies a record in the table. All tables should have a primary key, because if you ever need to update or delete a record you need to know how to uniquely identify it.

A surrogate key is an artificially generated key. They're useful when your records essentially have no natural key (such as a Person table, since it's possible for two people born on the same date to have the same name, or records in a log, since it's possible for two events to happen such they they carry the same timestamp). Most often you'll see these implemented as integers in an automatically incrementing field, or as GUIDs that are generated automatically for each record. ID numbers are almost always surrogate keys.

Unlike primary keys, not all tables need surrogate keys, however. If you have a table that lists the states in America, you don't really need an ID number for them. You could use the state abbreviation as a primary key code.

The main advantage of the surrogate key is that they're easy to guarantee as unique. The main disadvantage is that they don't have any meaning. There's no meaning that "28" is Wisconsin, for example, but when you see 'WI' in the State column of your Address table, you know what state you're talking about without needing to look up which state is which in your State table.

Solution 2:

A surrogate key is a made up value with the sole purpose of uniquely identifying a row. Usually, this is represented by an auto incrementing ID.

Example code:

CREATE TABLE Example
(
    SurrogateKey INT IDENTITY(1,1) -- A surrogate key that increments automatically
)

A primary key is the identifying column or set of columns of a table. Can be surrogate key or any other unique combination of columns (for example a compound key). MUST be unique for any row and cannot be NULL.

Example code:

CREATE TABLE Example
(
    PrimaryKey INT PRIMARY KEY -- A primary key is just an unique identifier
)

Solution 3:

All keys are identifiers used as surrogates for the things they identify. E.F.Codd explained the concept of system-assigned surrogates as follows [1]:

Database users may cause the system to generate or delete a surrogate, but they have no control over its value, nor is its value ever displayed to them.

This is what is commonly referred to as a surrogate key. The definition is immediately problematic however because Codd was assuming that such a feature would be provided by the DBMS. DBMSs in general have no such feature. The keys are normally visible to at least some DBMS users as, for obvious reasons, they have to be. The concept of a surrogate has therefore morphed slightly in usage. The term is generally used in the data management profession to mean a key that is not exposed and used as an identifier in the business domain. Note that this is essentially unrelated to how the key is generated or how "artificial" it is perceived to be. All keys consist of symbols invented by humans or machines. The only possible significance of the term surrogate therefore relates how the key is used, not how it is created or what its values are.

[1] Extending the database relational model to capture more meaning, E.F.Codd, 1979

Solution 4:

This is a great treatment describing the various kinds of keys:

http://www.agiledata.org/essays/keys.html

Solution 5:

A surrogate key is typically a numeric value. Within SQL Server, Microsoft allows you to define a column with an identity property to help generate surrogate key values.

The PRIMARY KEY constraint uniquely identifies each record in a database table. Primary keys must contain UNIQUE values. A primary key column cannot contain NULL values. Most tables should have a primary key, and each table can have only ONE primary key.

http://www.databasejournal.com/features/mssql/article.php/3922066/SQL-Server-Natural-Key-Verses-Surrogate-Key.htm