Is there any reason to worry about the column order in a table?

I know you can ALTER the column order in MySQL with FIRST and AFTER, but why would you want to bother? Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?


Solution 1:

Column order had a big performance impact on some of the databases I've tuned, spanning Sql Server, Oracle, and MySQL. This post has good rules of thumb:

  • Primary key columns first
  • Foreign key columns next.
  • Frequently searched columns next
  • Frequently updated columns later
  • Nullable columns last.
  • Least used nullable columns after more frequently used nullable columns

An example for difference in performance is an Index lookup. The database engine finds a row based on some conditions in the index, and gets back a row address. Now say you are looking for SomeValue, and it's in this table:

 SomeId int,
 SomeString varchar(100),
 SomeValue int

The engine has to guess where SomeValue starts, because SomeString has an unknown length. However, if you change the order to:

 SomeId int,
 SomeValue int,
 SomeString varchar(100)

Now the engine knows that SomeValue can be found 4 bytes after the start of the row. So column order can have a considerable performance impact.

EDIT: Sql Server 2005 stores fixed-length fields at the start of the row. And each row has a reference to the start of a varchar. This completely negates the effect I've listed above. So for recent databases, column order no longer has any impact.

Solution 2:

Update:

In MySQL, there may be a reason to do this.

Since variable datatypes (like VARCHAR) are stored with variable lengths in InnoDB, the database engine should traverse all previous columns in each row to find out the offset of the given one.

The impact may be as big as 17% for 20 columns.

See this entry in my blog for more detail:

  • Choosing column order

In Oracle, trailing NULL columns consume no space, that's why you should always put them to the end of the table.

Also in Oracle and in SQL Server, in case of a large row, a ROW CHAINING may occur.

ROW CHANING is splitting a row that doesn't fit into one block and spanning it over the multiple blocks, connected with a linked list.

Reading trailing columns that didn't fit into the first block will require traversing the linked list, which will result in an extra I/O operation.

See this page for illustration of ROW CHAINING in Oracle:

That's why you should put columns you often use to the beginning of the table, and columns you don't use often, or columns that tend to be NULL, to the end of the table.

Important note:

If you like this answer and want to vote for it, please also vote for @Andomar's answer.

He answered the same thing, but seems to be downvoted for no reason.

Solution 3:

During Oracle training at a previous job, our DBA suggested that putting all the non-nullable columns before the nullable ones was advantageous... although TBH I don't remember the details of why. Or maybe it was just the ones that were likely to get updated should go at the end? (Maybe puts off having to move the row if it expands)

In general, it shouldn't make any difference. As you say, queries should always specify columns themselves rather than relying on the ordering from "select *". I don't know of any DB that allows them to be changed... well, I didn't know MySQL allowed it until you mentioned it.

Solution 4:

Readability of the output when you have to type:

select * from <table>

in your database management software?

It's a very spurious reason, but at the moment I can't think of anything else.