How do you determine how far to normalize a database?
When creating a database structure, what are good guidelines to follow or good ways to determine how far a database should be normalized? Should you create an un-normalized database and split it apart as the project progresses? Should you create it fully normalized and combine tables as needed for performance?
Start by designing a normalized database, up to 3rd normal form. As you develop the business logic layer you may decide you have to denormalize a bit, but never, never go below 3rd normal form. Always stay 1st and 2nd normal form compliant. You want to denormalize for simplicity of code, not for performance. Use indexes and stored procedures for that :)
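To make the target concrete, here is a minimal sketch of what 3rd normal form looks like in practice (the table and column names are hypothetical): every non-key column depends on the key, the whole key, and nothing but the key.

    -- Hypothetical 3NF design: each fact lives in exactly one place.
    CREATE TABLE Customers (
        CustomerId INT PRIMARY KEY,
        Name       VARCHAR(100) NOT NULL,
        City       VARCHAR(50)  NOT NULL
    );

    CREATE TABLE Orders (
        OrderId    INT PRIMARY KEY,
        CustomerId INT NOT NULL REFERENCES Customers (CustomerId),
        OrderDate  DATE NOT NULL
    );

A denormalized version would copy Name and City onto every order row: the code that reads orders gets simpler, but every customer update gets harder.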
The reason not to "normalize as you go" is that you would have to modify the code you have already written nearly every time you modify the database design.
Here's a good article on the subject:
http://www.agiledata.org/essays/dataNormalization.html
@GrizzlyGuru A wise man once told me "normalize till it hurts, denormalize till it works".
It hasn't failed me yet :)
I disagree about starting with it in un-normalized form, however. In my experience it's been easier to adapt your application to deal with a less normalized database than a more normalized one. Starting un-normalized can also lead to situations where it's working "well enough" that you never get around to normalizing it (until it's too late!)
Normalization means eliminating redundant data. In other words, an un-normalized or de-normalized database is one where the same information is repeated in multiple places. That means you have to write more complex update statements to ensure you update the same data everywhere; otherwise you get inconsistent data, which in turn makes the output of queries unreliable.
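As a hypothetical illustration of that update problem, suppose a customer's city has been copied onto every one of their order rows:

    -- Denormalized: the city is repeated on every order row, so one
    -- address change must touch many rows (and silently leaves the
    -- data inconsistent if any row is missed).
    UPDATE Orders
    SET    CustomerCity = 'Boston'
    WHERE  CustomerName = 'Acme Corp';

    -- Normalized: the city lives in exactly one row.
    UPDATE Customers
    SET    City = 'Boston'
    WHERE  Name = 'Acme Corp';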
This is a pretty huge problem, so I would say denormalization hurts, not the other way around.
In some cases you may deliberately decide to denormalize specific parts of a database, if you judge that the benefit outweighs the extra work in updating data and the risk of data corruption. Data warehouses are a common example: data is aggregated for performance reasons, and it is often not updated after the initial load, which reduces the risk of inconsistencies.
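A sketch of what that deliberate denormalization might look like in a reporting table (the names are made up): region and product names are repeated on every row in exchange for join-free reads.

    -- Hypothetical warehouse table: written once per day by the load
    -- job, then only read, so the usual update-anomaly risk is low.
    CREATE TABLE DailySales (
        SaleDate    DATE          NOT NULL,
        Region      VARCHAR(50)   NOT NULL,  -- repeated, not a foreign key
        ProductName VARCHAR(100)  NOT NULL,  -- repeated, not a foreign key
        TotalAmount DECIMAL(12,2) NOT NULL,
        PRIMARY KEY (SaleDate, Region, ProductName)
    );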
But in general, be wary of denormalizing for performance. The performance benefit of a denormalized join can typically be achieved with a materialized view (called an indexed view in SQL Server), which is as fast to query as a denormalized table but still protects the consistency of the data.
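For instance, here is roughly what an indexed view looks like in SQL Server, built over the hypothetical Customers/Orders tables sketched earlier (PostgreSQL and Oracle offer materialized views instead, with different syntax and refresh semantics):

    -- SQL Server requires SCHEMABINDING, two-part table names, and
    -- COUNT_BIG(*) in a grouped view before the view can be indexed.
    CREATE VIEW dbo.CustomerOrderCounts
    WITH SCHEMABINDING
    AS
    SELECT c.CustomerId, c.Name, COUNT_BIG(*) AS OrderCount
    FROM   dbo.Orders o
    JOIN   dbo.Customers c ON c.CustomerId = o.CustomerId
    GROUP BY c.CustomerId, c.Name;

    -- The unique clustered index is what actually materializes the view;
    -- the engine keeps it in sync with the base tables automatically.
    CREATE UNIQUE CLUSTERED INDEX IX_CustomerOrderCounts
        ON dbo.CustomerOrderCounts (CustomerId);

Queries then read the precomputed result instead of re-running the join, while the base tables stay fully normalized.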