C# Sort and OrderBy comparison
I can sort a list using Sort or OrderBy. Which one is faster? Are both working on same algorithm?
List<Person> persons = new List<Person>();
persons.Add(new Person("P005", "Janson"));
persons.Add(new Person("P002", "Aravind"));
persons.Add(new Person("P007", "Kazhal"));
1.
persons.Sort((p1,p2)=>string.Compare(p1.Name,p2.Name,true));
2.
var query = persons.OrderBy(n => n.Name, new NameComparer());
class NameComparer : IComparer<string>
{
public int Compare(string x,string y)
{
return string.Compare(x, y, true);
}
}
No, they aren't the same algorithm. For starters, the LINQ OrderBy
is documented as stable (i.e. if two items have the same Name
, they'll appear in their original order).
It also depends on whether you buffer the query vs iterate it several times (LINQ-to-Objects, unless you buffer the result, will re-order per foreach
).
For the OrderBy
query, I would also be tempted to use:
OrderBy(n => n.Name, StringComparer.{yourchoice}IgnoreCase);
(for {yourchoice}
one of CurrentCulture
, Ordinal
or InvariantCulture
).
List<T>.Sort
This method uses Array.Sort, which uses the QuickSort algorithm. This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.
Enumerable.OrderBy
This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved. In contrast, an unstable sort does not preserve the order of elements that have the same key. sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.
Why not measure it:
class Program
{
class NameComparer : IComparer<string>
{
public int Compare(string x, string y)
{
return string.Compare(x, y, true);
}
}
class Person
{
public Person(string id, string name)
{
Id = id;
Name = name;
}
public string Id { get; set; }
public string Name { get; set; }
}
static void Main()
{
List<Person> persons = new List<Person>();
persons.Add(new Person("P005", "Janson"));
persons.Add(new Person("P002", "Aravind"));
persons.Add(new Person("P007", "Kazhal"));
Sort(persons);
OrderBy(persons);
const int COUNT = 1000000;
Stopwatch watch = Stopwatch.StartNew();
for (int i = 0; i < COUNT; i++)
{
Sort(persons);
}
watch.Stop();
Console.WriteLine("Sort: {0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int i = 0; i < COUNT; i++)
{
OrderBy(persons);
}
watch.Stop();
Console.WriteLine("OrderBy: {0}ms", watch.ElapsedMilliseconds);
}
static void Sort(List<Person> list)
{
list.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true));
}
static void OrderBy(List<Person> list)
{
var result = list.OrderBy(n => n.Name, new NameComparer()).ToArray();
}
}
On my computer when compiled in Release mode this program prints:
Sort: 1162ms
OrderBy: 1269ms
UPDATE:
As suggested by @Stefan here are the results of sorting a big list fewer times:
List<Person> persons = new List<Person>();
for (int i = 0; i < 100000; i++)
{
persons.Add(new Person("P" + i.ToString(), "Janson" + i.ToString()));
}
Sort(persons);
OrderBy(persons);
const int COUNT = 30;
Stopwatch watch = Stopwatch.StartNew();
for (int i = 0; i < COUNT; i++)
{
Sort(persons);
}
watch.Stop();
Console.WriteLine("Sort: {0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int i = 0; i < COUNT; i++)
{
OrderBy(persons);
}
watch.Stop();
Console.WriteLine("OrderBy: {0}ms", watch.ElapsedMilliseconds);
Prints:
Sort: 8965ms
OrderBy: 8460ms
In this scenario it looks like OrderBy performs better.
UPDATE2:
And using random names:
List<Person> persons = new List<Person>();
for (int i = 0; i < 100000; i++)
{
persons.Add(new Person("P" + i.ToString(), RandomString(5, true)));
}
Where:
private static Random randomSeed = new Random();
public static string RandomString(int size, bool lowerCase)
{
var sb = new StringBuilder(size);
int start = (lowerCase) ? 97 : 65;
for (int i = 0; i < size; i++)
{
sb.Append((char)(26 * randomSeed.NextDouble() + start));
}
return sb.ToString();
}
Yields:
Sort: 8968ms
OrderBy: 8728ms
Still OrderBy is faster
Darin Dimitrov's answer shows that OrderBy
is slightly faster than List.Sort
when faced with already-sorted input. I modified his code so it repeatedly sorts the unsorted data, and OrderBy
is in most cases slightly slower.
Furthermore, the OrderBy
test uses ToArray
to force enumeration of the Linq enumerator, but that obviously returns a type (Person[]
) which is different from the input type (List<Person>
). I therefore re-ran the test using ToList
rather than ToArray
and got an even bigger difference:
Sort: 25175ms
OrderBy: 30259ms
OrderByWithToList: 31458ms
The code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
class Program
{
class NameComparer : IComparer<string>
{
public int Compare(string x, string y)
{
return string.Compare(x, y, true);
}
}
class Person
{
public Person(string id, string name)
{
Id = id;
Name = name;
}
public string Id { get; set; }
public string Name { get; set; }
public override string ToString()
{
return Id + ": " + Name;
}
}
private static Random randomSeed = new Random();
public static string RandomString(int size, bool lowerCase)
{
var sb = new StringBuilder(size);
int start = (lowerCase) ? 97 : 65;
for (int i = 0; i < size; i++)
{
sb.Append((char)(26 * randomSeed.NextDouble() + start));
}
return sb.ToString();
}
private class PersonList : List<Person>
{
public PersonList(IEnumerable<Person> persons)
: base(persons)
{
}
public PersonList()
{
}
public override string ToString()
{
var names = Math.Min(Count, 5);
var builder = new StringBuilder();
for (var i = 0; i < names; i++)
builder.Append(this[i]).Append(", ");
return builder.ToString();
}
}
static void Main()
{
var persons = new PersonList();
for (int i = 0; i < 100000; i++)
{
persons.Add(new Person("P" + i.ToString(), RandomString(5, true)));
}
var unsortedPersons = new PersonList(persons);
const int COUNT = 30;
Stopwatch watch = new Stopwatch();
for (int i = 0; i < COUNT; i++)
{
watch.Start();
Sort(persons);
watch.Stop();
persons.Clear();
persons.AddRange(unsortedPersons);
}
Console.WriteLine("Sort: {0}ms", watch.ElapsedMilliseconds);
watch = new Stopwatch();
for (int i = 0; i < COUNT; i++)
{
watch.Start();
OrderBy(persons);
watch.Stop();
persons.Clear();
persons.AddRange(unsortedPersons);
}
Console.WriteLine("OrderBy: {0}ms", watch.ElapsedMilliseconds);
watch = new Stopwatch();
for (int i = 0; i < COUNT; i++)
{
watch.Start();
OrderByWithToList(persons);
watch.Stop();
persons.Clear();
persons.AddRange(unsortedPersons);
}
Console.WriteLine("OrderByWithToList: {0}ms", watch.ElapsedMilliseconds);
}
static void Sort(List<Person> list)
{
list.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true));
}
static void OrderBy(List<Person> list)
{
var result = list.OrderBy(n => n.Name, new NameComparer()).ToArray();
}
static void OrderByWithToList(List<Person> list)
{
var result = list.OrderBy(n => n.Name, new NameComparer()).ToList();
}
}
I think it's important to note another difference between Sort
and OrderBy
:
Suppose there exists a Person.CalculateSalary()
method, which takes a lot of time; possibly more than even the operation of sorting a large list.
Compare
// Option 1
persons.Sort((p1, p2) => Compare(p1.CalculateSalary(), p2.CalculateSalary()));
// Option 2
var query = persons.OrderBy(p => p.CalculateSalary());
Option 2 may have superior performance, because it only calls the CalculateSalary
method n times, whereas the Sort
option might call CalculateSalary
up to 2n log(n) times, depending on the sort algorithm's success.