byte[] array pattern search

Solution 1:

May I suggest something that doesn't involve creating strings, copying arrays or unsafe code:

using System;
using System.Collections.Generic;

static class ByteArrayRocks
{    
    static readonly int[] Empty = new int[0];

    public static int[] Locate (this byte[] self, byte[] candidate)
    {
        if (IsEmptyLocate(self, candidate))
            return Empty;

        var list = new List<int>();

        for (int i = 0; i < self.Length; i++)
        {
            if (!IsMatch(self, i, candidate))
                continue;

            list.Add(i);
        }

        return list.Count == 0 ? Empty : list.ToArray();
    }

    static bool IsMatch (byte[] array, int position, byte[] candidate)
    {
        if (candidate.Length > (array.Length - position))
            return false;

        for (int i = 0; i < candidate.Length; i++)
            if (array[position + i] != candidate[i])
                return false;

        return true;
    }

    static bool IsEmptyLocate (byte[] array, byte[] candidate)
    {
        return array == null
            || candidate == null
            || array.Length == 0
            || candidate.Length == 0
            || candidate.Length > array.Length;
    }

    static void Main()
    {
        var data = new byte[] { 23, 36, 43, 76, 125, 56, 34, 234, 12, 3, 5, 76, 8, 0, 6, 125, 234, 56, 211, 122, 22, 4, 7, 89, 76, 64, 12, 3, 5, 76, 8, 0, 6, 125 };
        var pattern = new byte[] { 12, 3, 5, 76, 8, 0, 6, 125 };

        foreach (var position in data.Locate(pattern))
            Console.WriteLine(position);
    }
}

Edit (by IAbstract) - moving contents of post here since it is not an answer

Out of curiosity, I've created a small benchmark with different answers.

Here are the results for a million iterations:

solution [Locate]:            00:00:00.7714027
solution [FindAll]:           00:00:03.5404399
solution [SearchBytePattern]: 00:00:01.1105190
solution [MatchBytePattern]:  00:00:03.0658212

Solution 2:

Use LINQ Methods.

public static IEnumerable<int> PatternAt(byte[] source, byte[] pattern)
{
    for (int i = 0; i < source.Length; i++)
    {
        if (source.Skip(i).Take(pattern.Length).SequenceEqual(pattern))
        {
            yield return i;
        }
    }
}

Very simple!

Solution 3:

This is my propossal, more simple and faster:

int Search(byte[] src, byte[] pattern)
{
    int maxFirstCharSlot = src.Length - pattern.Length + 1;
    for (int i = 0; i < maxFirstCharSlot; i++)
    {
        if (src[i] != pattern[0]) // compare only first byte
            continue;
        
        // found a match on first byte, now try to match rest of the pattern
        for (int j = pattern.Length - 1; j >= 1; j--) 
        {
           if (src[i + j] != pattern[j]) break;
           if (j == 1) return i;
        }
    }
    return -1;
}

The logic behind this code is this: in first place it search ONLY THE FIRST BYTE (this is the key improvement) and when is found this first byte, i try to match the rest of pattern

Solution 4:

Originally I posted some old code I used but was curious about Jb Evain's benchmarks. I found that my solution was stupid slow. It appears that bruno conde's SearchBytePattern is the fastest. I could not figure why especially since he uses an Array.Copy and an Extension method. But there is proof in Jb's tests, so kudos to bruno.

I simplified the bits even further, so hopefully this will be the clearest and simplest solution. (All the hard work done by bruno conde) The enhancements are:

  • Buffer.BlockCopy
  • Array.IndexOf<byte>
  • while loop instead of a for loop
  • start index parameter
  • converted to extension method

    public static List<int> IndexOfSequence(this byte[] buffer, byte[] pattern, int startIndex)    
    {
       List<int> positions = new List<int>();
       int i = Array.IndexOf<byte>(buffer, pattern[0], startIndex);  
       while (i >= 0 && i <= buffer.Length - pattern.Length)  
       {
          byte[] segment = new byte[pattern.Length];
          Buffer.BlockCopy(buffer, i, segment, 0, pattern.Length);    
          if (segment.SequenceEqual<byte>(pattern))
               positions.Add(i);
          i = Array.IndexOf<byte>(buffer, pattern[0], i + 1);
       }
       return positions;    
    }
    

Note that, the last statement in the while block should be i = Array.IndexOf<byte>(buffer, pattern[0], i + 1); instead of i = Array.IndexOf<byte>(buffer, pattern[0], i + pattern.Length);. Look at the comment by Johan. A simple test could prove that:

byte[] pattern = new byte[] {1, 2};
byte[] toBeSearched = new byte[] { 1, 1, 2, 1, 12 };

With i = Array.IndexOf<byte>(buffer, pattern[0], i + pattern.Length);, nothing returned. i = Array.IndexOf<byte>(buffer, pattern[0], i + 1); returns the correct result.