Key value pairs in relational database

Solution 1:

Before you continue on your approach, I would humbly suggest you step back and consider if you really want to store this data in a "Key-Value Pair"table. I don't know your application but my experience has shown that every time I have done what you are doing, later on I wish I had created a color table, a fabric table and a size table.

Think about referential integrity constraints, if you take the key-value pair approach, the database can't tell you when you are trying to store a color id in a size field

Think about the performance benefits of joining on a table with 10 values versus a generic value that may have thousands of values across multiple domains. How useful is an index on Key Value really going to be?

Usually the reasoning behind doing what you are doing is because the domains need to be "user definable". If that is the case then even I am not going to push you towards creating tables on the fly (although that is a feasible approach).

However, if your reasoning is because you think it will be easier to manage than multiple tables, or because you are envisioning a maintenance user interface that is generic for all domains, then stop and think really hard before you continue.

Solution 2:

There is another solution that falls somewhere between the two. You can use an xml type column for the keys and values. So you keep the itemid field, then have an xml field that contains the xml defined for some key value pairs like <items> <item key="colour" value="red"/><item key="xxx" value="blah"/></items> Then when you extract your data fro the database you can process the xml in a number of different ways. Depending on your usage. This is an extend able solution.

Solution 3:

In most cases that you would use the first method, it's because you haven't really sat down and thought out your model. "Well, we don't know what the keys will be yet". Generally, this is pretty poor design. It's going to be slower than actually having your keys as columns, which they should be.

I'd also question why your id is a varchar.

In the rare case that you really must implement a key/value table, the first solution is fine, although, I'd generally want to have the keys in a separate table so you aren't storing varchars as the keys in your key/value table.

eg,

CREATE TABLE valid_keys ( 
    id            NUMBER(10) NOT NULL,
    description   varchar(32) NOT NULL,
    CONSTRAINT pk_valid_keys PRIMARY KEY(id)
);

CREATE TABLE item_values ( 
    item_id NUMBER(10) NOT NULL,
    key_id  NUMBER(10) NOT NULL,
    item_value VARCHAR2(32) NOT NULL,
    CONSTRAINT pk_item_values PRIMARY KEY(item_id),
    CONSTRAINT fk_item_values_iv FOREIGN KEY (key_id) REFERENCES valid_keys (id)
);

You can then even go nuts and add a "TYPE" to the keys, allowing some type checking.

Solution 4:

I once used key-value pairs in a database for the purpose of creating a spreadsheet (used for data entry) in which a teller would summarize his activity from working a cash drawer. Each k/v pair represented a named cell into which the user entered a monetary amount. The primary reason for this approach is that the spreadsheet was highly subject to change. New products and services were added routinely (thus new cells appeared). Also, certain cells were not needed in certain situations and could be dropped.

The app I wrote was a rewrite of an application that did break the teller sheet into separate sections each represented in a different table. The trouble here was that as products and services were added, schema modifications were required. As with all design choices there are pros and cons to taking a certain direction as compared to another. My redesign certainly performed slower and more quickly consumed disk space; however, it was highly agile and allowed for new products and services to be added in minutes. The only issue of note, however, was disk consumption; there were no other headaches I can recall.

As already mentioned, the reason I usually consider a key-value pair approach is when users—this could be a the business owner—want to create their own types having a user-specific set of attributes. In such situations I have come to the following determination.

If there is either no need to retrieve data by these attributes or searching can be deferred to the application once a chunk of data has been retrieved, I recommend storing all the attributes in a single text field (using JSON, YAML, XML, etc.). If there is a strong need to retrieve data by these attributes, it gets messy.

You can create a single "attributes" table (id, item_id, key, value, data_type, sort_value) where the sort column coverts the actual value into a string-sortable representation. (e.g. date: “2010-12-25 12:00:00”, number: “0000000001”) Or you can create separate attribute tables by data-type (e.g. string_attributes, date_attributes, number_attributes). Among numerous pros and cons to both approaches: the first is simpler, the second is faster. Both will cause you to write ugly, complex queries.

Solution 5:

From experience, i have found that certain keys will be more widely used or queried more often. We have usually then slightly de-normalized the design to include a specific field back in the main "item" table.

eg. if every Item has a Colour, you might add the Colour column to your item table. Fabric and Size may be used less often and can be kept separate in the key-value pair table. You may even keep the colour in the key-value pair table, but duplicate the data in the item table to get the performance benefits.

Obviously this varies depending on the data and how flexible you need the key-value pairs to be. It can also result in your attribute data not being located consistantly. However, de-normalizing does greatly simplify the queries and improves their performance as well.

I would usually only consider de-normalizing when performance becomes and issue, not just to simplify a query.