DISTINCT() and ORDERBY issue
I am learning about LINQ-to-SQL and everything was going well until something strange happened:
I tried to make an example of distinct
, so, using the Northwind dabatase I wrote the following query:
var query =
from o in db.Orders
orderby o.CustomerID
select new
{
o.CustomerID
};
If I print the SQL generated by LINQ-to-SQL for the query stored in query
it looks like this:
SELECT [t0].[CustomerID]
FROM [dbo].[Orders] AS [t0]
ORDER BY [t0].[CustomerID]
So, as usual, the query brings all the CustomerID
for each Order
in the Orders
table ordered alphabetically.
But! If I use the Distinct()
method like this:
var query = (
from o in db.Orders
orderby o.CustomerID
select new
{
o.CustomerID
}).Distinct();
The query brings the expected results of the Distinct
clause, but the CustomerID
s are not ordered despite I wrote orderby o.CustomerID
!
The SQL query for this second LINQ query is the following:
SELECT DISTINCT [t0].[CustomerID]
FROM [dbo].[Orders] AS [t0]
As we can see **the ORDER BY
clause is missing. Why is that?
Why does the ORDER BY
clause disappears when I use the Distinct()
method?
Solution 1:
From the Queryable.Distinct documentation;
The expected behavior is that it returns an unordered sequence of the unique items in source.
In other words, any order the existing IQueryable has is lost when you use Distinct()
on it.
What you want is probably something more like this, an OrderBy() after the Distinct() is done;
var query = (from o in db.Orders
select new
{
o.CustomerID
}).Distinct().OrderBy(x => x.CustomerID);
Solution 2:
Try rearranging the members to place the OrderBy after the Distinct. You'll have to revert to method chaining:
db.Orders.Select(o=>o.CustomerId).Distinct().OrderBy(id=>id);
This would be the more efficient way to set up the query in Enumerable Linq anyway, because the OrderBy would then operate only on the unique items and not on all of them. Also, according to MSDN, Enumerable.Distinct does not guarantee the return order of the elements anyway, so ordering before deduping is pointless.
Solution 3:
Due to the use of distinct, the order of the returned list is not guaranteed. LinqToSql is smart enough to recognize this, therefor it ignores it.
If you place the order by AFTER your Distinct, everything will happen as you desire.
var query = (from o in db.Orders
select new
{
o.CustomerID
}).Distinct().OrderBy(o => o.CustomerID);
or
var query = db.Orders.Select(o => o.CustomerID).Distinct().OrderBy(o => o.CustomerID);
Please see this article for clarification:
http://programminglinq.com/blogs/marcorusso/archive/2008/07/20/use-of-distinct-and-orderby-in-linq.aspx