Preserving ORDER BY in SELECT INTO

Solution 1:

I know this is a bit old, but I needed to do something similar. I wanted to insert the contents of one table into another, but in a random order. I found that I could do this by using select top n and order by newid(). Without the 'top n', order was not preserved and the second table had rows in the same order as the first. However, with 'top n', the order (random in my case) was preserved. I used a value of 'n' that was greater than the number of rows. So my query was along the lines of:

insert Table2 (T2Col1, T2Col2)
  select top 10000 T1Col1, T1Col2
  from Table1
  order by newid()

Solution 2:

What for?

Point is – data in a table is not ordered. In SQL Server the intrinsic storage order of a table is that of the (if defined) clustered index.

The order in which data is inserted is basically "irrelevant". It is forgotten the moment the data is written into the table.

As such, nothing is gained, even if you get this stuff. If you need an order when dealing with data, you HAVE To put an order by clause on the select that gets it. Anything else is random - i.e. the order you et data is not determined and may change.

So it makes no sense to have a specific order on the insert as you try to achieve.

SQL 101: sets have no order.

Solution 3:

I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:

Amount of rows is very big (from 200 to 2000 millions of rows), so we are using SELECT INTO instead of CREATE TABLE + INSERT because needs to be loaded as fast as possible (minimal logging). We have tested using the trace flag 610 for loading an already created empty table with a clustered index but still takes longer than the following approach.
We need the data to be ordered by specific columns for query performances, so we are creating a CLUSTERED INDEX just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.

It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.

The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.

A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.

So you can rewrite this query:

SELECT
    FirstColumn = T.FirstColumn,
    SecondColumn = T.SecondColumn
INTO
    #NewTable
FROM
    VeryBigTable AS T
ORDER BY            -- ORDER BY is ignored!
    FirstColumn,
    SecondColumn

SELECT
    FirstColumn = T.FirstColumn,
    SecondColumn = T.SecondColumn
INTO
    #NewTable
FROM
    VeryBigTable AS T

UNION ALL

-- A "fake" row to be deleted
SELECT
    FirstColumn = 0,
    SecondColumn = 0

ORDER BY
    FirstColumn,
    SecondColumn

We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.

Solution 4:

Just add top to your sql with a number that is greater than the actual number of rows:

SELECT top 25000 * 
into spx_copy
  from SPX
  order by date

Solution 5:

You cannot do this with ORDER BY but if you create a Clustered Index on VibeFGEvents.id after your SELECT INTO the table will be sorted on disk by VibeFGEvents.id.