DISTINCT() and ORDERBY issue

I am learning about LINQ-to-SQL and everything was going well until something strange happened:

I tried to make an example of distinct, so, using the Northwind dabatase I wrote the following query:

var query = 
    from o in db.Orders
    orderby o.CustomerID
    select new
    {
        o.CustomerID
    };

If I print the SQL generated by LINQ-to-SQL for the query stored in query it looks like this:

SELECT [t0].[CustomerID]
FROM [dbo].[Orders] AS [t0]
ORDER BY [t0].[CustomerID]

So, as usual, the query brings all the CustomerID for each Order in the Orders table ordered alphabetically.

But! If I use the Distinct() method like this:

var query = (
    from o in db.Orders
    orderby o.CustomerID
    select new
    {
        o.CustomerID
    }).Distinct();

The query brings the expected results of the Distinct clause, but the CustomerIDs are not ordered despite I wrote orderby o.CustomerID!

The SQL query for this second LINQ query is the following:

SELECT DISTINCT [t0].[CustomerID]
FROM [dbo].[Orders] AS [t0]

As we can see **the ORDER BY clause is missing. Why is that?

Why does the ORDER BY clause disappears when I use the Distinct() method?


Solution 1:

From the Queryable.Distinct documentation;

The expected behavior is that it returns an unordered sequence of the unique items in source.

In other words, any order the existing IQueryable has is lost when you use Distinct() on it.

What you want is probably something more like this, an OrderBy() after the Distinct() is done;

var query = (from o in db.Orders
             select new
             {
                 o.CustomerID
             }).Distinct().OrderBy(x => x.CustomerID);

Solution 2:

Try rearranging the members to place the OrderBy after the Distinct. You'll have to revert to method chaining:

db.Orders.Select(o=>o.CustomerId).Distinct().OrderBy(id=>id);

This would be the more efficient way to set up the query in Enumerable Linq anyway, because the OrderBy would then operate only on the unique items and not on all of them. Also, according to MSDN, Enumerable.Distinct does not guarantee the return order of the elements anyway, so ordering before deduping is pointless.

Solution 3:

Due to the use of distinct, the order of the returned list is not guaranteed. LinqToSql is smart enough to recognize this, therefor it ignores it.

If you place the order by AFTER your Distinct, everything will happen as you desire.

var query = (from o in db.Orders
             select new
             {
                 o.CustomerID
             }).Distinct().OrderBy(o => o.CustomerID);

or

var query = db.Orders.Select(o => o.CustomerID).Distinct().OrderBy(o => o.CustomerID);

Please see this article for clarification:

http://programminglinq.com/blogs/marcorusso/archive/2008/07/20/use-of-distinct-and-orderby-in-linq.aspx