String sorting issue in C#

I have a List like this:

    List<string> items = new List<string>();
    items.Add("-");
    items.Add(".");
    items.Add("a-");
    items.Add("a.");
    items.Add("a-a");
    items.Add("a.a");

    items.Sort();

    string output = string.Empty;
    foreach (string s in items)
    {
        output += s + Environment.NewLine;
    }

    MessageBox.Show(output);

The output is coming back as

-
.
a-
a.
a.a
a-a

whereas I am expecting the results to be

-
.
a-
a.
a-a
a.a

Any idea why "a-a" does not come before "a.a", whereas "a-" does come before "a."?


Solution 1:

I suspect that in the last case "-" is treated in a different way due to culture-specific settings (perhaps as a "dash" as opposed to "minus" in the first strings). MSDN warns about this:

The comparison uses the current culture to obtain culture-specific information such as casing rules and the alphabetic order of individual characters. For example, a culture could specify that certain combinations of characters be treated as a single character, or uppercase and lowercase characters be compared in a particular way, or that the sorting order of a character depends on the characters that precede or follow it.

Also see in this MSDN page:

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them; for example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases; therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.

So the hyphen gets special treatment in the default sort mode (word sort) in order to make the ordering feel more "natural".
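To see the three sort modes from the quote side by side, a minimal sketch along these lines may help. It assumes the invariant culture and uses CompareInfo.Compare with CompareOptions; the class and method names (SortModes, Compare) are made up for illustration. The culture-sensitive results can vary with the runtime and platform, so no expected values are shown for them.

```csharp
using System;
using System.Globalization;

static class SortModes
{
    // Hypothetical helper: compares two strings with the invariant culture
    // under the given comparison options.
    public static int Compare(string x, string y, CompareOptions options)
    {
        return CultureInfo.InvariantCulture.CompareInfo.Compare(x, y, options);
    }

    static void Main()
    {
        // Word sort (the default): the hyphen may be given a reduced weight.
        Console.WriteLine(Compare("a.a", "a-a", CompareOptions.None));

        // String sort: the hyphen is treated like any other symbol.
        Console.WriteLine(Compare("a.a", "a-a", CompareOptions.StringSort));

        // Ordinal sort: compares the raw UTF-16 code unit values.
        Console.WriteLine(Compare("a.a", "a-a", CompareOptions.Ordinal));
    }
}
```

Only the ordinal result is fixed by definition: '.' (U+002E) is greater than '-' (U+002D), so the last call returns a positive number.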

You can get "normal" ordinal sort if you specifically turn it on:

     // Culture-sensitive comparison (the default, word sort)
     Console.WriteLine(string.Compare("a.", "a-"));     // 1
     Console.WriteLine(string.Compare("a.a", "a-a"));   // -1

     // Ordinal comparison
     Console.WriteLine(string.Compare("a.", "a-", StringComparison.Ordinal));    // 1
     Console.WriteLine(string.Compare("a.a", "a-a", StringComparison.Ordinal));  // 1

To sort the original collection using ordinal comparison use:

     items.Sort(StringComparer.Ordinal);
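As a rough, self-contained sketch of what that call does to the list from the question (the class and method names OrdinalSortDemo and SortOrdinal are only illustrative):

```csharp
using System;
using System.Collections.Generic;

static class OrdinalSortDemo
{
    // Returns a copy of the input sorted by UTF-16 code unit values.
    public static List<string> SortOrdinal(IEnumerable<string> source)
    {
        var sorted = new List<string>(source);
        sorted.Sort(StringComparer.Ordinal);
        return sorted;
    }

    static void Main()
    {
        var items = new List<string> { "-", ".", "a-", "a.", "a-a", "a.a" };

        // '-' (U+002D) sorts before '.' (U+002E) at every position, so this prints:
        // -, ., a-, a-a, a., a.a (one per line)
        Console.WriteLine(string.Join(Environment.NewLine, SortOrdinal(items)));
    }
}
```

Note that this is still not exactly the order listed in the question: with ordinal comparison "a-a" comes before "a.", because the strings are compared character by character and '-' is less than '.' at the second position.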

Solution 2:

If you want your string sort to be based on the numeric value of each character (its UTF-16 code unit) rather than on the rules defined by the current culture, you can sort by Ordinal:

    items.Sort(StringComparer.Ordinal);

This makes the results consistent across all cultures, but the order can be unintuitive: all uppercase letters sort before all lowercase ones ("B" before "a"), and, as with any plain string comparison, "14" comes before "9". That may or may not be what you're looking for.
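A small sketch of that trade-off, using made-up sample values (the class name OrdinalQuirks and its Sorted method are hypothetical):

```csharp
using System;
using System.Collections.Generic;

static class OrdinalQuirks
{
    // Sorts a copy of the sample values by UTF-16 code unit values.
    public static List<string> Sorted()
    {
        var values = new List<string> { "9", "14", "b", "B", "a" };
        values.Sort(StringComparer.Ordinal);
        return values;
    }

    static void Main()
    {
        // Digits come first, then uppercase letters, then lowercase letters,
        // and "14" precedes "9" because '1' < '9'.
        // Prints: 14, 9, B, a, b
        Console.WriteLine(string.Join(", ", Sorted()));
    }
}
```

If numbers embedded in strings need to sort by numeric value, a custom comparer would be required; ordinal comparison alone does not do that.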

Solution 3:

The parameterless Sort method of the List<string> class relies on the default string comparer of the .NET Framework, which performs a culture-sensitive comparison using the current culture of the thread (CultureInfo.CurrentCulture).

The CultureInfo specifies the alphabetical order of characters, and it seems that the default one uses an order different from the one you expect.

When sorting, you can specify a particular CultureInfo, one that you know will match your sorting requirements. For example (German culture):

    // requires: using System.Globalization;
    var sortCulture = new CultureInfo("de-DE");
    // List<T>.Sort expects an IComparer<string>, so wrap the culture in a StringComparer
    items.Sort(StringComparer.Create(sortCulture, false));

More info can be found here:
http://msdn.microsoft.com/en-us/library/b0zbh7b6.aspx
http://msdn.microsoft.com/de-de/library/system.stringcomparer.aspx