Designing an EAV database correctly for historical data
An EAV 'database' [sic] is literally mathematically straightforwardly an undocumented description in triples of a database and its metadata, with no functionality to tablulate relationships, or query relationships, or query metadata, or type check, or maintain integrity, or optimize, or transact atomically, or control concurrency.
Software engineering principles dictate that sound EAV database [sic] use consist entirely of defining appropriate abstractions (types, operators, processes, interpreters, modules) reconstructing functionality of a DBMS.
The mechanical nature of the mapping from one's EAV triples and their meanings to a (fragmented) database description makes this easy to show.
To paraphrase Greenspun, any sufficiently complex EAV project contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of a DBMS.
I repeat: EAV is an undocumented description in triples of a database and its metadata, with no DBMS. Use EAV only for parts of a database where you have demonstrated that a DDL solution cannot meet performance requirements and that an EAV solution can and is worth it.
Here are some problems with this design.
How would you query for the current value of all integer attributes of a given entity?
How would you model an attribute that should be
NOT NULL
? That is, make sure a given attribute is mandatory for its entity, and the entity cannot be created without a value for that attribute.How would you model a UNIQUE column? Assume you can change an attribute's value and then change it back to the original value.
How do you support foreign keys that reference a entity with something other than an integer primary key?
How do you restrict a given attribute to the set of values in a lookup table?
The only way to solve most of these is by using application code. That's the problem with EAV: you end up reinventing many constraints we take for granted with SQL. It's an example of the Inner Platform Effect antipattern:
The inner-platform effect is the tendency of software architects to create a system so customizable as to become a replica, and often a poor replica, of the software development platform they are using.
Sixth normal form is not EAV. In Sixth normal form, you need a separate table per attribute, not per data type. You use conventional columns with the appropriate name and data type. Storing this attribute in a distinct table is what gives you the ability to store historical revisions.
This means you still can't model NOT NULL
in 6NF, but at least you can model UNIQUE
and FOREIGN KEY
in a pretty conventional way.
"I have been reading about EAV database and most of the short comings seem to be related to really, really, bad EAV designs or difficulty generating reports from the data."
The difficulties generating reports derive inherently and inevitably from the kind of facts that an EAV DB represents : "The value of attribute BIRTHDATE for person XYZ is ..." "The value of attribute DECEASEDATE for person XYZ is ..." etc. etc.
That is NOT the typical form in which end users think of the data structures for carrying information about person XYZ (or any other), ergo somewhere in between the end user and the DB, additional transformation (very akin to pivoting, though not exactly 100%) is necessary. Each additional transformation is a potential source of bugs and of performance loss.
"Usually when you see people complaining about EAV they are using less than three tables to try to replicate the functionally of separate tables + columns in a RDBMS. Sometimes that means storing everything from decimals to strings in a single TEXT value column."
That is ONLY ONE of the downsides of EAV. Attribute-level type constraints become harder or impossible to define. But there are others besides this.
"EAV also messes with the safe-guards over data integrity which can be very bad if you are not careful."
This one is entirely correlated to the difficulty of report generation, which is the exact same thing as the difficulty of expressing meaningful queries, which is the exact same thing as the difficulty of expressing the scenarios that constitute a violation of some given rule.
"However, EAV does provide an easy way to track historical data and allows us to move parts of the system back and forth between SQL and key-value store systems."
BS & baloney. EAV applied rigorously will move the time information just as far away from the things it applies to as any other "regular" attribute. If you don't do this, then you're not applying EAV any longer (strictly). See Bill Karwin's answer : EAV != 6NF !!!!!!!!! 6NF still has all the "structure" that any other "regular" DB also has, EAV is all about (see philip's answer and Bill's 'inner platform' remark) effectively removing that structure from the DB.