Linq to Entities, random order

A simple way of doing this is to order by Guid.NewGuid(), but then the ordering happens on the client side. You may be able to persuade EF to do something random on the server side, but that's not necessarily simple, and doing it using "order by random number" is apparently broken.

To make the ordering happen on the .NET side instead of in EF, you need AsEnumerable:

IEnumerable<MyEntity> results = context.MyEntity
                                       .Where(en => en.type == myTypeVar)
                                       .AsEnumerable()
                                       .OrderBy(en => Guid.NewGuid());

It would be better, though, to load the unordered results into a list and then shuffle that:

Random rnd = ...; // Assume a suitable Random instance
List<MyEntity> results = context.MyEntity
                                .Where(en => en.type == myTypeVar)
                                .ToList();

results.Shuffle(rnd); // Assuming an extension method on List<T>

Aside from anything else, shuffling is more efficient than sorting: a Fisher-Yates shuffle runs in O(n) time, whereas a comparison sort needs O(n log n). See my article on randomness for details about acquiring an appropriate Random instance, though. There are lots of Fisher-Yates shuffle implementations available on Stack Overflow.
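The results.Shuffle(rnd) call above assumes an extension method on List<T>. A minimal sketch of one, assuming an in-place Fisher-Yates shuffle declared on IList<T> (so it also accepts List<T>); the class name ShuffleExtensions is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; extension methods must live in a non-generic static class.
public static class ShuffleExtensions
{
    // In-place Fisher-Yates (Knuth) shuffle: walk backwards through the list,
    // swapping each element with a uniformly chosen element at or before it.
    public static void Shuffle<T>(this IList<T> list, Random rnd)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));
        if (rnd == null) throw new ArgumentNullException(nameof(rnd));

        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = rnd.Next(i + 1); // 0 <= j <= i
            T tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}
```

Passing the Random in, rather than creating one inside the method, leaves the choice of Random instance (and its seeding) to the caller.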


Jon's answer is helpful, but you can actually have the DB do the ordering using Guid.NewGuid() and Linq to Entities (at least, you can in EF4):

from e in MyEntities
orderby Guid.NewGuid()
select e

This generates SQL resembling:

SELECT
[Project1].[Id] AS [Id], 
[Project1].[Column1] AS [Column1]
FROM ( SELECT 
    NEWID() AS [C1],                     -- Guid created here
    [Extent1].[Id] AS [Id], 
    [Extent1].[Column1] AS [Column1]
    FROM [dbo].[MyEntities] AS [Extent1]
)  AS [Project1]
ORDER BY [Project1].[C1] ASC             -- Used for sorting here

In my testing, using Take(10) on the resulting query (which translates to TOP 10 in SQL), the query ran consistently in 0.42 to 0.46 seconds against a table with 1,794,785 rows. I have no idea whether SQL Server does any kind of optimisation on this or whether it generated a GUID for every row in that table. Either way, that would be considerably faster than bringing all those rows into my process and trying to sort them there.
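The same query in method syntax, combined with the Take(10) mentioned above, can be wrapped in a small helper. This is a sketch: TakeRandom and RandomQueryExtensions are hypothetical names, and whether the ordering actually runs server side depends on the LINQ provider (the answer above reports it for EF4). Over in-memory data exposed through AsQueryable(), Guid.NewGuid() is simply evaluated once per element by LINQ to Objects:

```csharp
using System;
using System.Linq;

// Hypothetical helper class.
public static class RandomQueryExtensions
{
    // Orders by a fresh Guid per row and keeps the first 'count' rows.
    // With EF4 + SQL Server this is reported to become ORDER BY NEWID() plus TOP n.
    public static IQueryable<T> TakeRandom<T>(this IQueryable<T> source, int count)
    {
        return source.OrderBy(e => Guid.NewGuid()).Take(count);
    }
}
```

Usage against a context would look like context.MyEntities.TakeRandom(10).ToList(). Because the helper stays on IQueryable<T>, the provider (not your process) gets the chance to translate the ordering.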


The simple solution would be to create an array (or a List<T>) and then randomize its indexes.

EDIT:

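// Must be declared inside a non-generic static class; the rnd parameter is an assumption.
public static IEnumerable<T> Randomize<T>(this IEnumerable<T> source, Random rnd) {
  var array = source.ToArray();
  // Fisher-Yates shuffle (one of several possible approaches)
  for (int i = array.Length - 1; i > 0; i--) {
    int j = rnd.Next(i + 1); // 0 <= j <= i
    (array[i], array[j]) = (array[j], array[i]);
  }
  return array;
}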

EDIT: Personally, I find Jon Skeet's answer more elegant:

var results = from ... in ... where ... orderby Guid.NewGuid() select ...

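And sure, you can use a random number generator instead of Guid.NewGuid() when ordering in memory, although ordering by a random number is reportedly broken when EF has to translate it to SQL.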


Unfortunately, the NewGuid hack for sorting server side causes entities to be duplicated when the query involves joins (or eager-loading includes).

See this question about this issue.

To work around this, instead of NewGuid you can order by a SQL checksum of some unique value, computed server side, combined with a random seed computed once on the client side to randomize it. See my answer on the previously linked question.
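The actual EF/SQL code lives in the linked answer and is not reproduced here. As an in-memory illustration of the idea only (not EF code and not SQL Server's CHECKSUM), the sketch below orders by a deterministic integer mix of each item's unique id and a seed chosen once on the client. Unlike NEWID(), which yields a new value every time it is evaluated, this key depends only on the id and the seed. SeededOrder, Key and OrderBySeed are hypothetical names, and the mixing constants are borrowed from MurmurHash3's 32-bit finalizer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for "checksum of a unique value plus a client-side seed".
public static class SeededOrder
{
    // Deterministic mix of (id, seed): the same id and seed always give the same key.
    // For a fixed seed the mix is a bijection, so distinct ids never tie.
    public static uint Key(int id, int seed)
    {
        unchecked
        {
            uint x = (uint)id ^ ((uint)seed * 0x9E3779B9u);
            x ^= x >> 16;
            x *= 0x85EBCA6Bu;
            x ^= x >> 13;
            x *= 0xC2B2AE35u;
            x ^= x >> 16;
            return x;
        }
    }

    // Orders items by the seeded key of their unique id.
    public static List<T> OrderBySeed<T>(IEnumerable<T> source, Func<T, int> idOf, int seed)
    {
        return source.OrderBy(item => Key(idOf(item), seed)).ToList();
    }
}
```

The seed would typically be picked once per request (for example int seed = new Random().Next();); reusing it reproduces the same order, and a new seed gives a different one.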