How many columns is too many columns?

I've noticed that a lot of folks here cite tables with 20+ columns in one table (I've seen as many as 55). Now I don't pretend to be a database design expert, but I've always heard that this is a horrible practice. When I see this, I usually suggest splitting into two tables with a one-to-one relationship: one containing the most frequently used data, the other with the least often used data (a sketch of what I mean follows the questions below). At the same time, there's a possible performance argument for the wide table (fewer JOINs and such). So my question is this:

When it comes to really LARGE scale databases, is there actually an advantage to having a large number of columns, despite the fact that this usually leads to many NULL values?

Which is more of a performance hit: lots of columns with lots of NULLs, or fewer columns with lots of JOINs?
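
For concreteness, here's a minimal sketch of the split I have in mind; the customer/customer_details names and columns are made up for illustration:

    -- Hypothetical wide table split vertically into a 1:1 pair.
    -- "Hot" columns that most queries touch:
    CREATE TABLE customer (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        email       VARCHAR(255) NOT NULL
    );

    -- "Cold" columns that are rarely read, keyed by the same id:
    CREATE TABLE customer_details (
        customer_id       INT PRIMARY KEY REFERENCES customer (customer_id),
        mailing_notes     VARCHAR(1000),
        referral_source   VARCHAR(100),
        newsletter_opt_in BOOLEAN
    );

    -- Reassembling the full row costs a JOIN:
    SELECT c.name, d.referral_source
    FROM customer AS c
    JOIN customer_details AS d ON d.customer_id = c.customer_id;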


Solution 1:

The design of the table depends on the entity it needs to store. If all the data belongs together, then 50 columns (or even 100) might be the correct thing to do.

So long as the table is normalized, there is no rule of thumb regarding size, apart from database capabilities and the need to optimize.

Solution 2:

I agree with Solution 1. I have seen tables with 500 columns, and every one of those columns was in the correct place. Just consider the number of facts one might wish to store about an everyday object, and you'll soon see why.

If it proves inconvenient to select all those columns, or to specify which columns to select when you are only interested in a small proportion of them, you may find it worthwhile to define a view.
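
For instance, a view can expose just the handful of columns most consumers actually care about (the product table and its columns here are invented for illustration):

    -- Hypothetical: expose only the commonly used columns of a wide table.
    CREATE VIEW product_summary AS
    SELECT product_id, name, unit_price, in_stock
    FROM product;    -- product might have dozens more columns

    -- Consumers then query the narrow view:
    SELECT name, unit_price FROM product_summary WHERE in_stock = TRUE;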

Solution 3:

How many columns is too many columns?

When you feel it no longer makes sense or is right to add another column.

Generally, it depends on the application.

Solution 4:

Having too many columns results in a lot of NULLs (evil) and an unwieldy object that the table is mapped to, which hurts readability in the IDE and hinders maintenance (increasing development costs). If you need fast reads in some cases, use denormalized tables that exist solely for reporting or queries (search for the "CQRS" pattern). Yes, "Person" has a million attributes, but you can break these monolithic tables down (design precedes normalization) into smaller entities ("address," "phone," "hobby") instead of adding new columns for each new use case; a sketch of such a split follows. Smaller objects (and tables) bring many advantages: they enable things like unit testing, OOP, and SOLID practices.
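
As a rough sketch of the decomposition described above (all table and column names are illustrative assumptions, not a prescribed schema):

    -- Instead of one monolithic person table with address_*, phone_*,
    -- hobby_* column groups, factor each group into its own table:
    CREATE TABLE person (
        person_id INT PRIMARY KEY,
        full_name VARCHAR(100) NOT NULL
    );

    CREATE TABLE address (
        address_id INT PRIMARY KEY,
        person_id  INT NOT NULL REFERENCES person (person_id),
        street     VARCHAR(200),
        city       VARCHAR(100)
    );

    CREATE TABLE phone (
        phone_id  INT PRIMARY KEY,
        person_id INT NOT NULL REFERENCES person (person_id),
        number    VARCHAR(30) NOT NULL
    );
    -- New use cases add rows (or new small tables), not new columns.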

Also, as regards bunching numerous columns together to avoid joins: I think any performance gain from avoiding joins is lost to index maintenance, assuming a typical workload of both reads and writes. Adding indexes on fields for the sake of read performance can itself indicate that those fields belong in their own table.

Solution 5:

ODBC has a character limit of 8,000, so that is a physical limit beyond which things get highly frustrating.

I worked on a table that had 138 columns; it was horribly written and could have been normalised. That database seems to have been the creation of someone who wondered why there are conventions in database design and decided to test them all at once.

Having very wide, flattened tables is fairly common when you get into data warehousing and reporting servers. They are just a lot faster for that workload and mean that you don't have to hold your database entirely in RAM for performance.
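
As a minimal illustration (the sales_report_flat table and its columns are invented), a flattened reporting table copies dimension attributes onto every row so typical report queries scan a single table with no JOINs:

    -- Denormalized reporting table: descriptive attributes are kept
    -- inline, trading storage and redundancy for JOIN-free reads.
    CREATE TABLE sales_report_flat (
        sale_id        INT PRIMARY KEY,
        sale_date      DATE NOT NULL,
        customer_name  VARCHAR(100),   -- duplicated from the customer table
        customer_city  VARCHAR(100),
        product_name   VARCHAR(100),   -- duplicated from the product table
        product_group  VARCHAR(50),
        quantity       INT,
        total_amount   DECIMAL(12, 2)
    );

    -- A typical report touches exactly one table:
    SELECT product_group, SUM(total_amount)
    FROM sales_report_flat
    WHERE sale_date >= DATE '2024-01-01'
    GROUP BY product_group;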