How do I ignore the UTF-8 Byte Order Marker in String comparisons?

Solution 1:

Well, I assume it's because the raw binary data includes the BOM. You could always remove the BOM yourself after decoding, if you don't want it - but you should consider whether the byte array should consider the BOM to start with.

EDIT: Alternatively, you could use a StreamReader to perform the decoding. Here's an example, showing the same byte array being converted into two characters using Encoding.GetString or one character via a StreamReader:

using System;
using System.IO;
using System.Text;

class Test
{
    static void Main()
    {
        byte[] withBom = { 0xef, 0xbb, 0xbf, 0x41 };
        string viaEncoding = Encoding.UTF8.GetString(withBom);
        Console.WriteLine(viaEncoding.Length);

        string viaStreamReader;
        using (StreamReader reader = new StreamReader
               (new MemoryStream(withBom), Encoding.UTF8))
        {
            viaStreamReader = reader.ReadToEnd();           
        }
        Console.WriteLine(viaStreamReader.Length);
    }
}

Solution 2:

There is a slightly more efficient way to do it than creating StreamReader and MemoryStream:

1) If you know that there is always a BOM

string viaEncoding = Encoding.UTF8.GetString(withBom, 3, withBom.Length - 3);

2) If you don't know, check:

string viaEncoding;
if (withBom.Length >= 3 && withBom[0] == 0xEF && withBom[1] == 0xBB && withBom[2] == 0xBF)
    viaEncoding = Encoding.UTF8.GetString(withBom, 3, withBom.Length - 3);
else
    viaEncoding = Encoding.UTF8.GetString(withBom);