Although one can use composite primary keys, for the case below, is it really a bad practice? The consensus on Stackoveflow seems to go both ways on this issue.

Why?


I want to store payments for the orders in a separate table. The reason is that, an order can have many items which are handled in a separate table in the form of many to many relationship. Now, if I don't use composite primary keys for my payment table, I'll lose my unique PaymentID:

[PaymentId] INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
[OrderId] INT NOT NULL PRIMARY KEY --Also a Foreign Key--

Now, if I just remove the Primary Key for the OrderId, I'll lose my one to one relationship here so Many OrderIds can be associated to many PaymentIds, and I don't want this.

This seems to be why other answers on SO have concluded (mostly) that the composite key is a bad idea. If it is bad, what's the best practice then?


Solution 1:

There is no conclusion that composite primary keys are bad.

The best practice is to have some column or columns that uniquely identify a row. But in some tables a single column is not enough by itself to uniquely identify a row.

SQL (and the relational model) allows a composite primary key. It is a good practice is some cases. Or, another way of looking at it is that it's not a bad practice in all cases.

Some people have the opinion that every table should have an integer column that automatically generates unique values, and that should serve as the primary key. Some people also claim that this primary key column should always be called id. But those are conventions, not necessarily best practices. Conventions have some benefit, because it simplifies certain decisions. But conventions are also restrictive.

You may have an order with multiple payments because some people purchase on layaway, or else they have multiple sources of payment (two credit cards, for instance), or two different people want to pay for a share of the order (I frequently go to a restaurant with a friend, and we each pay for our own meal, so the staff process half of the order on each of our credit cards).

I would design the system you describe as follows:

Products  : product_id (PK)

Orders    : order_id (PK)

LineItems : product_id is (FK) to Products
            order_id is (FK) to Orders
            (product_id, order_id) is (PK)

Payments  : order_id (FK)
            payment_id - ordinal for each order_id
            (order_id, payment_id) is (PK)

This is also related to the concept of identifying relationship. If it's definitional that a payment exists only because an order exist, then make the order part of the primary key.

Note the LineItems table also lacks its own auto-increment, single-column primary key. A many-to-many table is a classic example of a good use of a composite primary key.

Solution 2:

This question is dangerously close to asking for opinions, which can generate religious wars. As someone who is highly biased toward having auto-increasing integer primary keys in my tables (called something like TablenameId, not Id), there is one situation where it is optional.

I think the other answers address why you want primary keys.

One very important reason is for reference purposes. In a relational database, any entity could -- in theory -- be referenced by another entity via foreign key relationships. For foreign keys, you definitely want one column to uniquely define a row. Otherwise, you have to deal with multiple columns in different tables that align with each other. This is possible, but cumbersome.

The table you are referring to is not an "entity" table it is a "junction" table. It is a relational database construct for handling many-to-many relationships. Because it doesn't really represent an entity, it should not have foreign key relationships. Hence, a composite primary key is reasonable. There are some situations, such as when you are concerned about database size, where leaving out an artificial primary key is even desirable.

Solution 3:

Disk space is cheap, so a primary key clustered on an int identity(1,1) named after a convention (like pk + table name) is a good practice. It will make queries, joins, indexes and other constraints easy to manage.

However there's one good reason to no do that (in MS SQL Server at least): if you want to manage the physical sorting of your data in the underlying storage system.

The primary key clustered determines the physical sorting order. If you do it on an identity column, the physical sorting order is basically the insert order. However, this may not be the best, especially if you always query the table the same way. On very large tables, getting the right physical sorting order makes queries a lot faster. For example you may want the clustered index on a composite of two columns.

Solution 4:

Best practices are helpful at best, but blinding at worst. Going against a best practice isn't a sin. Just be sure you know what kind of trade-off you are making.

Database engines can be very complicated things. Without knowing what particular optimizations are made by a given engine, it will be difficult to determine what kinds of constructs will yield the best performance (because I assume that the issue we are talking about here is performance). Composite keys may be problematic for large tables in one kind of database, but not have any noticeable impact for another.

A useful practice I've learned is to always strive for having my applications as simple as possible. Do using composite keys save you from having to perform lookups before insertions, or some other nuisance? Use them. If you, however, notice that using them makes your application no longer satisfy some significant performance requirement, consider a solution without them.