LINQ Where Ignore Accentuation and Case

What is the easiest way to filter elements with LINQ through the Where method ignoring accentuation and case?

So far, I've been able to ignore Casing by calling methods on the properties, which I dont think is a good idea because it calls the same method for every element (right?).

So here's what I got so far:

var result = from p in People
             where p.Name.ToUpper().Contains(filter.ToUpper())
             select p;

Please tell me if this is a good practice, and the easiest way to ignore accentuation.


Solution 1:

To ignore case and accents (diacritics) you can first define an extension method like this:

    public static string RemoveDiacritics(this String s)
    {
        String normalizedString = s.Normalize(NormalizationForm.FormD);
        StringBuilder stringBuilder = new StringBuilder();

        for (int i = 0; i < normalizedString.Length; i++)
        {
            Char c = normalizedString[i];
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                stringBuilder.Append(c);
        }

        return stringBuilder.ToString();
    }

(Modified from Ignoring accented letters in string comparison)

Now you can run your query:

string queryText = filter.ToUpper().RemoveDiacritics();

var result = from p in People
         where p.Name.ToUpper().RemoveDiacritics() == queryText
         select p;

This is fine if you are just iterating over a collection in C#, but if you are using LINQ to SQL it is preferable to avoid non-standard methods (including extension methods) in your LINQ query. This is because your code cannot be converted into valid SQL and hence run on SQL Server with all its lovely performance optimization.

Since there doesn't seem to be a standard way of ignoring accents within LINQ to SQL, in this case I would suggest changing the field type that you want to search to be case- and accent-insensitive (CI_AI).

With your example:

ALTER TABLE People ALTER COLUMN Name [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AI

Your query should now ignore accentuation and case.

Note that you will need to temporarily remove any unique constraints on the field before running the above query, e.g.

ALTER TABLE People DROP CONSTRAINT UQ_People_Name

Now your LINQ query would simply be:

var result = from p in People
         where p.Name == filter
         select p;

See related question here.

Solution 2:

Change collate:

ALTER TABLE dbo.MyTable 
ALTER COLUMN CharCol varchar(10)**COLLATE Latin1_General_CI_AS** NOT NULL;