NULL permitted in Primary Key - why and in which DBMS?

Further to my question "Why to use ´not null primary key´ in TSQL?"...

As I understood from other discussions, some RDBMS (for example SQLite, MySQL) permit "unique" NULL in the primary key.

Why is this allowed and how might it be useful?

Background: I believe it is beneficial for communication with colleagues and database professionals to know the differences in fundamental concepts, approaches and their implementations in different DBMS.

Notes

  • MySQL is rehabilitated and returned to the "NOT NULL PK" list.
  • SQLite has been added (thanks to Paul Hadfield) to "NULL PK" list:

For the purposes of determining the uniqueness of primary key values, NULL values are considered distinct from all other values, including other NULLs.

If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows feature identical primary key values, it is a constraint violation. According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a long-standing coding oversight, this is not the case in SQLite.

Unless the column is an INTEGER PRIMARY KEY SQLite allows NULL values in a PRIMARY KEY column. We could change SQLite to conform to the standard (and we might do so in the future), but by the time the oversight was discovered, SQLite was in such wide use that we feared breaking legacy code if we fixed the problem.

So for now we have chosen to continue allowing NULLs in PRIMARY KEY columns. Developers should be aware, however, that we may change SQLite to conform to the SQL standard in future and should design new programs accordingly.

— SQL As Understood By SQLite: CREATE TABLE


Suppose you have a primary key containing a nullable column Kn.

If you want to have a second row rejected on the ground that in that second row, Kn is null and the table already contains a row with Kn null, then you are actually requiring that the system would treat the comparison "row1.Kn = row2.Kn" as giving TRUE (because you somehow want the system to detect that the key values in those rows are indeed equal). However, this comparison boils down to the comparison "null = null", and the standard already explicitly specifies that null doesn't compare equal to anything, including itself.

To allow for what you want, would thus amount to SQL deviating from its own principles regarding the treatment of null. There are innumerable inconsistencies in SQL, but this particular one never got past the committee.


I don't know whether older versions of MySQL differ on this, but as of modern versions a primary key must be on columns that are not null. See the manual page on CREATE TABLE: "A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL. If they are not explicitly declared as NOT NULL, MySQL declares them so implicitly (and silently)."


As far as relational database theory is concerned:

  • The primary key of a table is used to uniquely identify each and every row in the table
  • A NULL value in a column indicates that you don't konw what the value is
  • Therefore, you should never use the value of "I don't know" to uniquely identify a row in a table.

Depending upon the data you are modelling, a "made up" value can be used instead of NULL. I've used 0, "N/A", 'Jan 1, 1980', and similar values to represent dummy "known to be missing" data.

Most, if not all, DB engines do allow for a UNIQUE constraint or index, which does allow for NULL column values, though (ideally) only one row may be assigned the value null (otherwise it wouldn't be a unique value). This can be used to support the irritatingly pragmatic (but occasionally necessary) situations that don't fit neatly into relational theory.


Well, it could allow you to implement the Null Object Pattern natively within the database. So if you were using something similar in code, which interacted very closely with the DB, you could just look up the object corresponding to the key without having to special-case a null check.

Now whether this is worthwhile functionality I'm not sure, but it's really a question of whether the pros of disallowing null pkeys in absolutely all cases outweigh the cons of obstructing someone who (for better or worse) actually wants to use null keys. This would only be worth it if you could demonstrate some non-trivial improvement (such as faster key lookup) from being able to guarantee that keys are non-null. Some DB engines would show this, others might not. And if there aren't any real pros from forcing this, why artificially restrict your clients?