Why use an auto-incrementing primary key when other unique fields exist?

I'm taking a course called "database systems" and for our class project I have to design a website.

Here's an example of a table I created:

CREATE TABLE users
(
  uid INT NOT NULL AUTO_INCREMENT,
  username VARCHAR(60),
  passhash VARCHAR(255),
  email VARCHAR(60),
  rdate DATE,
  PRIMARY KEY(uid)
);

The professor told me "uid" (user id) was completely useless and unnecessary and I should have used the username as the primary key, since no two users can have the same username.

I told him it was convenient for me use a user id because when I call something like domain.com/viewuser?id=5 I just check the parameter with: is_numeric($_GET['id'])... needless to say he was not convinced.

Since I've seen user_id and other similar attributes (thread_id, comment_id, among others) on plenty of tutorials and looking at the database schema of popular software (eg. vbulletin) there must be plenty of other (stronger) reasons.

So my question is: How would you justify the need of a not null auto incrementing id as a primary key vs using another attribute like the username?


Solution 1:

Auto-incrementing primary keys are useful for several reasons:

  • They allow duplicate user names as on Stack Overflow
  • They allow the user name (or email address, if that's used to login) to be changed (easily)
  • Selects, joins and inserts are faster than varchar primary keys as its much faster to maintain a numeric index
  • As you mentioned, validation becomes very simple: if ((int)$id > 0) { ... }
  • Sanitation of input is trivial: $id = (int)$_GET['id']
  • There is far less overhead as foreign keys don't have to duplicate potentially large string values

I would say trying to use any piece of string information as a unique identifier for a record is a bad idea when an auto-incrementing numeric key is so readily available.

Systems with unique user names are fine for very small numbers of users, but the Internet has rendered them fundamentally broken. When you consider the sheer number of people named "john" that might have to interact with a website, it's ridiculous to require each of them to use a unique display name. It leads to the awful system we see so frequently with random digits and letters decorating a username.

However, even in a system where you enforced unique usernames, it's still a poor choice for a primary key. Imagine a user with 500 posts: The foreign key in the posts table is going to contain the username, duplicated 500 times. The overhead is prohibitive even before you consider that somebody might eventually need to change their username.

Solution 2:

If the username is the primary key and a user changes his/her username, you will need to update all the tables which have foreign key references to the users table.

Solution 3:

If you have demonstrated to your professor that assigning a unique arbitrary integer to each user is of value to your application then of course he would be wrong to say that it is "completely useless and unnecessary".

However, maybe you missed his point. If he told you that the requirement is that "no two users can have the same username" then you haven't met that requirement.

Sincere thanks for posting your SQL DDL, it is very useful but most don't bother on SO.

Using your table, I can do this:

INSERT INTO users (username) VALUES (NULL);
INSERT INTO users (username) VALUES (NULL);
INSERT INTO users (username) VALUES (NULL);
INSERT INTO users (username) VALUES (NULL);
INSERT INTO users (username) VALUES (NULL);

Which results in this:

SELECT uid, username, passhash, email, rdate 
FROM users;

uid   username   passhash   email   rdate
1     <NULL>     <NULL>     <NULL>  <NULL>
2     <NULL>     <NULL>     <NULL>  <NULL>
3     <NULL>     <NULL>     <NULL>  <NULL>
4     <NULL>     <NULL>     <NULL>  <NULL>

I think is the point your professor was trying to make: without enforcing the natural key on username you don't really have any data integrity at all.

If I was the prof, I'd also urge you to remove nullable columns from your design.

Solution 4:

This is typically called a surrogate key and it has many benefits. One of which is insulating your database relationships from the application data. More details and the corresponding disadvantages can be found at the wiki link provided above.

Solution 5:

Because someone might want to change their username (or any name for that matter).