Are primary keys passé?

A completely different perspective:

SQL is a language that is defined by an ISO standard. That standard has "mandatory" features and "optional conformance" features.

If you build a DBMS with some data manipulation language, then you are entitled to call your language "SQL" only if:

(a) you have implemented ALL of the syntax prescribed by the standard (the "mandatory" features), and (b) all of the language features that you have implemented (all the mandatory ones as a minimum, but also the "optional" ones you "opted in" for) exhibit exactly the behaviour defined in the standard.

The "PRIMARY KEY" syntax is a very old feature, and it's not unlikely that it is one of those "mandatory" ones. Ditching the word from your language means you can no longer legitimately call your language SQL. Big commercial vendors are unlikely to make such a move any time soon.

The idea of designating one key per table as a "primary" one is essentially superfluous, outdated and in many ways very unhelpful.

It is superfluous because logically speaking all keys can and do serve the same function. Leaving aside the limitations of any particular DBMS, logically speaking a "primary" key enjoys exactly the same features and functionality as any other key of the same table. The designation of one key as "primary" is therefore only as important as the database designer or user wants it to be. The distinction is arbitrary (that's the word used by E.F.Codd) and purely psychological (C.J.Date).

The concept is outdated because in modern practice it is commonplace for tables to have more than one key and for different users and consumers of data to have different "preferred" or "most significant" identifiers for the same piece of data. E.g.: an end user may recognise and use one key of a table (often the one called a "business" or "natural" key); a middle-tier programmer will possibly be more interested in a different key in the same table (e.g. a "surrogate" key); the DBA on the other hand may view the "clustered" key as the most important or maybe he is equally concerned with all keys that have indexes. So the preferred or most important key depends on the point-of-view and the intended usage - it is not a rigid structural feature at all.
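As a rough illustration of the point that every key does the same logical job, here is a minimal sketch using Python's built-in sqlite3 module. The `employee` table, its columns and the `violates_a_key` helper are hypothetical names invented for this example, and SQLite is only an assumption; any DBMS would do. The table has a surrogate key declared as PRIMARY KEY and two other keys declared as UNIQUE, and a duplicate in any of them is rejected in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,   -- surrogate key (hypothetical)
        national_id TEXT NOT NULL UNIQUE,  -- natural / business key (hypothetical)
        email       TEXT NOT NULL UNIQUE   -- a third candidate key (hypothetical)
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'AB123', 'ann@example.com')")


def violates_a_key(row):
    """Try to insert `row`; return True if any key constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


results = [
    violates_a_key((1, "CD456", "bob@example.com")),  # duplicates the "primary" key
    violates_a_key((2, "AB123", "bob@example.com")),  # duplicates national_id
    violates_a_key((2, "CD456", "ann@example.com")),  # duplicates email
    violates_a_key((2, "CD456", "bob@example.com")),  # duplicates nothing
]
print(results)
```

Which of the three keys a given user treats as "the" identifier is not visible anywhere in the enforcement itself.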

The "primary key" concept is unhelpful for at least two reasons. Firstly, vendors of database development tools, DBMSs and modelling tools have unfortunately attached all sorts of software features to the keys designated as "primary key". This actually works against the original concept. No longer do we just need to select one key per table that has some logical significance for the designer or user. We are encouraged or even compelled to choose "primary" keys to support this or that feature in X, Y or Z piece of software, regardless of other considerations. This is very regrettable because it represents a limitation and a lack of flexibility in software. We ought to be free to choose an appropriate key for each purpose and not be restricted to just one key per table for every purpose.

The final reason that primary keys are unhelpful is that they are a needless distraction from more important issues of database design. The primary key concept is often given vastly exaggerated significance in education, in textbooks on database design and in everyday data management practice. This is frequently to the detriment or actual exclusion of the more fundamental issue, i.e. that all of the keys and all of the other integrity constraints can be just as important to successful database design and implementation.

I have often argued that the term "primary key" ought to be deprecated and dropped from data management vocabulary as well as from data management software.

Primary key is a logical concept. It is the key that defines the entity identity: in a table of Widgets, each individual Widget is distinguished by its primary key value. The PK is not the clustered index (that is a physical storage property), nor a unique constraint (that is a different logical property). While the primary key and the clustered key often overlap, that is just a coincidence (the PK is a convenient clustered key) or even just negligence (the PK is used as the clustered key even though better candidates exist for the given workload).

Changing the clustered key is a change that can be done at any moment, in the field, by ops, to better fit this storage requirement or that performance workload requirement. The app should not notice the change (in an ideal world...). Changing the PK is a design change that requires modifying the data model of the app, because the identifier of the object changes, and it usually percolates through the data model and the app code.

BTW, this topic has been asked and answered ad nauseam already here:

What column should the clustered index be put on?
Should primary keys be always assigned as clustered index
Should I design a table with a primary key of varchar or int?

To elaborate a bit on the difference between PK and UNIQUE constraints: even if several properties have unique constraints and could therefore serve as a PK, only one will be the right choice; they are not equal. Which one is entirely up to the data model: the business meaning of the entity and what each property represents. The PK is not important for the DBMS; the DBMS cares a whole lot more about the clustered key and uniqueness than about the PK. The PK is for you, the developer, and for your toolset. You don't want each developer who points an ORM tool at the database to pick a different unique key as the entity identity, and then write code that stores a different property as the identity. You want them all to pick the same one, the primary key, because it has other attributes in addition to being unique. A prime example is stability: the PK value is stable for the entire lifetime of the entity (if it is not, then the PK was not chosen correctly).

what features do PK's provide that cannot reasonably be implemented using other features?

The little SSMS icon. Seriously, ultimately this is what it boils down to: the PK conveys the extra information about which of the possible keys is actually the one that identifies the entities in the table. Path dependence does play a significant role in the PK's position today, agreed, but if not for that, some other construct would rise up to fulfill exactly this role of conveying intent about the logical model.

Theoretically, all keys are equivalent, but we choose one of them as "primary" for psychological and practical reasons. Some considerations:

The PK fields are automatically NOT NULL. UNIQUE fields are NOT NULL only if you specify them as such (BTW, NULLs in the UNIQUE constraint are often treated differently by different DBMSes).
FOREIGN KEY syntax defaults to parent's PK. If you want to use parent's UNIQUE constraint, you need to specify it explicitly.
Clustering is typically based on PK.
PK is often displayed in a visually distinct way by ER tools. This can document that (psychologically) we consider one key more "important" than others.
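Two of the considerations above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The `parent` and `child` tables are hypothetical, and SQLite is an assumption made only so the example is self-contained; as noted above, NULL handling in UNIQUE constraints differs between DBMSes, so the first result is specific to SQLite. The NOT NULL point is deliberately left out, because SQLite is documented to deviate from the standard there for some primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE parent (
        id   INTEGER PRIMARY KEY,
        code TEXT UNIQUE                     -- nullable: NOT NULL was not specified
    );
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent  -- no column list: refers to parent's PK
    );
""")

# In SQLite, a UNIQUE column accepts several NULLs (other DBMSes may differ).
conn.execute("INSERT INTO parent (id, code) VALUES (1, NULL)")
conn.execute("INSERT INTO parent (id, code) VALUES (2, NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM parent WHERE code IS NULL"
).fetchone()[0]

# The bare REFERENCES clause is checked against parent.id, the primary key.
conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 2)")
try:
    conn.execute("INSERT INTO child (id, parent_id) VALUES (11, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(null_rows, fk_rejected)
```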

A lot of this is tradition - we could just as easily have the conventions and tools that make all the keys equivalent not just in theory but in practice as well, but history can be a powerful force, even in a relatively young industry such as ours.

You point out that

logically a Primary Key conveys a bit of intention

and Aaron Bertrand points out in the comments

You can have multiple unique constraints or unique indexes, but only one should be the way that you typically expect to identify a row

I'm guessing that Aaron used words like "should" and "typically" because he knows that even foreign key constraints only require a unique constraint.

From MSDN docs on SQL FOREIGN KEY Constraints

A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint in another table
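The quoted passage describes SQL Server, but the same idea can be sketched in a self-contained way with Python's built-in sqlite3 module (an assumption made for portability; the `product` and `order_line` tables are hypothetical). Here the foreign key names a UNIQUE column of the referenced table explicitly instead of its primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,   -- the designated primary key
        sku        TEXT NOT NULL UNIQUE   -- another key of the same table
    );
    CREATE TABLE order_line (
        line_id INTEGER PRIMARY KEY,
        sku     TEXT NOT NULL REFERENCES product (sku)  -- FK to the UNIQUE key
    );
    INSERT INTO product (product_id, sku) VALUES (1, 'WIDGET-1');
""")

# A row that matches an existing sku is accepted.
conn.execute("INSERT INTO order_line (line_id, sku) VALUES (1, 'WIDGET-1')")

# A row with an unknown sku is rejected, exactly as it would be for a PK reference.
try:
    conn.execute("INSERT INTO order_line (line_id, sku) VALUES (2, 'NO-SUCH-SKU')")
    unknown_sku_rejected = False
except sqlite3.IntegrityError:
    unknown_sku_rejected = True

line_count = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
print(unknown_sku_rejected, line_count)
```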

Furthermore, C.J. Date also notes in An Introduction to Database Systems

If the set of candidate keys actually does include more than one member, then the choice of which is to be primary is essentially arbitrary *

This leads me to conclude that Primary Keys indeed don't provide much except for convention. But it is a convention so heavily integrated into the tools we use and the mental models of most people that it can't be ignored.

*C.J. Date does explain here that choosing a Primary Key isn't completely arbitrary. For example a volatile primary key would be a bad idea.
