byte[] array pattern search
Solution 1:
May I suggest something that doesn't involve creating strings, copying arrays or unsafe code:
using System;
using System.Collections.Generic;

static class ByteArrayRocks
{
    static readonly int[] Empty = new int[0];

    public static int[] Locate (this byte[] self, byte[] candidate)
    {
        if (IsEmptyLocate(self, candidate))
            return Empty;

        var list = new List<int>();

        for (int i = 0; i < self.Length; i++)
        {
            if (!IsMatch(self, i, candidate))
                continue;

            list.Add(i);
        }

        return list.Count == 0 ? Empty : list.ToArray();
    }

    static bool IsMatch (byte[] array, int position, byte[] candidate)
    {
        if (candidate.Length > (array.Length - position))
            return false;

        for (int i = 0; i < candidate.Length; i++)
            if (array[position + i] != candidate[i])
                return false;

        return true;
    }

    static bool IsEmptyLocate (byte[] array, byte[] candidate)
    {
        return array == null
            || candidate == null
            || array.Length == 0
            || candidate.Length == 0
            || candidate.Length > array.Length;
    }

    static void Main()
    {
        var data = new byte[] { 23, 36, 43, 76, 125, 56, 34, 234, 12, 3, 5, 76, 8, 0, 6, 125, 234, 56, 211, 122, 22, 4, 7, 89, 76, 64, 12, 3, 5, 76, 8, 0, 6, 125 };
        var pattern = new byte[] { 12, 3, 5, 76, 8, 0, 6, 125 };

        foreach (var position in data.Locate(pattern))
            Console.WriteLine(position);
    }
}
Edit (by IAbstract) - moving contents of post here since it is not an answer
Out of curiosity, I've created a small benchmark with different answers.
Here are the results for a million iterations:
solution [Locate]: 00:00:00.7714027
solution [FindAll]: 00:00:03.5404399
solution [SearchBytePattern]: 00:00:01.1105190
solution [MatchBytePattern]: 00:00:03.0658212
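The benchmark code itself is not shown, so the following is only a guess at how such timings could be taken: a minimal Stopwatch-based sketch. The class name BenchmarkSketch and the method Time are made up for illustration and are not part of the original benchmark.

```csharp
using System;
using System.Diagnostics;

static class BenchmarkSketch
{
    // Hypothetical helper: runs 'search' once as a warm-up, then 'iterations'
    // more times under a Stopwatch, and returns the elapsed time of the timed runs.
    public static TimeSpan Time(Action search, int iterations)
    {
        search(); // warm-up call, so JIT compilation is not part of the measurement

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            search();
        watch.Stop();

        return watch.Elapsed;
    }
}
```

With the Locate extension from Solution 1 in scope, a call such as BenchmarkSketch.Time(() => data.Locate(pattern), 1000000) would time a million iterations; writing the returned TimeSpan to the console gives the hh:mm:ss.fffffff format seen in the results above.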
Solution 2:
Use LINQ Methods.
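using System.Collections.Generic;
using System.Linq; // Skip, Take and SequenceEqual are LINQ extension methods

public static IEnumerable<int> PatternAt(byte[] source, byte[] pattern)
{
    for (int i = 0; i < source.Length; i++)
    {
        // compare the window of up to pattern.Length bytes that starts at i
        if (source.Skip(i).Take(pattern.Length).SequenceEqual(pattern))
        {
            yield return i;
        }
    }
}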
Very simple!
Solution 3:
This is my proposal, which is simpler and faster:
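int Search(byte[] src, byte[] pattern)
{
    if (pattern.Length == 0)
        return -1; // nothing to search for; also avoids reading pattern[0]

    int maxFirstCharSlot = src.Length - pattern.Length + 1;
    for (int i = 0; i < maxFirstCharSlot; i++)
    {
        if (src[i] != pattern[0]) // compare only first byte
            continue;

        // found a match on first byte, now try to match rest of the pattern
        int j = pattern.Length - 1;
        while (j >= 1 && src[i + j] == pattern[j])
            j--;

        // j reaches 0 only if every remaining byte matched;
        // this also covers single-byte patterns, which have no remaining bytes
        if (j == 0) return i;
    }
    return -1;
}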
The logic behind this code is this: the outer loop compares ONLY THE FIRST BYTE of the pattern (this is the key improvement), and only when that first byte is found does it try to match the rest of the pattern.
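To make the two phases easier to see, here is a minimal sketch that splits them into a cheap first-byte check and a separate verification helper. The names FirstByteFilter, IndexOf and RestMatches are hypothetical, and treating an empty pattern as "not found" is an assumption, not something the answer specifies.

```csharp
using System;

static class FirstByteFilter
{
    // Hypothetical name. Returns the index of the first match, or -1 if there is none.
    public static int IndexOf(byte[] src, byte[] pattern)
    {
        // Assumption: an empty pattern is reported as "not found".
        if (pattern.Length == 0 || pattern.Length > src.Length)
            return -1;

        int lastStart = src.Length - pattern.Length;
        for (int i = 0; i <= lastStart; i++)
        {
            // Phase 1: cheap filter on the first byte only.
            if (src[i] != pattern[0])
                continue;

            // Phase 2: verify the remaining bytes only for surviving candidates.
            if (RestMatches(src, i, pattern))
                return i;
        }
        return -1;
    }

    // Compares pattern[1..] against src starting at position start + 1.
    static bool RestMatches(byte[] src, int start, byte[] pattern)
    {
        for (int j = 1; j < pattern.Length; j++)
            if (src[start + j] != pattern[j])
                return false;
        return true;
    }
}
```

On the sample data and pattern from Solution 1, FirstByteFilter.IndexOf(data, pattern) returns 8, the position of the first occurrence.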
Solution 4:
Originally I posted some old code I used, but I was curious about Jb Evain's benchmarks. I found that my solution was stupid slow. It appears that bruno conde's SearchBytePattern is the fastest. I could not figure out why, especially since he uses an Array.Copy and an extension method. But there is proof in Jb's tests, so kudos to bruno.
I simplified the bits even further, so hopefully this will be the clearest and simplest solution. (All the hard work was done by bruno conde.) The enhancements are:
- Buffer.BlockCopy
- Array.IndexOf<byte>
- while loop instead of a for loop
- start index parameter
- converted to extension method
public static List<int> IndexOfSequence(this byte[] buffer, byte[] pattern, int startIndex)
{
    List<int> positions = new List<int>();
    int i = Array.IndexOf<byte>(buffer, pattern[0], startIndex);
    while (i >= 0 && i <= buffer.Length - pattern.Length)
    {
        byte[] segment = new byte[pattern.Length];
        Buffer.BlockCopy(buffer, i, segment, 0, pattern.Length);
        if (segment.SequenceEqual<byte>(pattern))
            positions.Add(i);
        i = Array.IndexOf<byte>(buffer, pattern[0], i + 1);
    }
    return positions;
}
Note that the last statement in the while block should be i = Array.IndexOf<byte>(buffer, pattern[0], i + 1); instead of i = Array.IndexOf<byte>(buffer, pattern[0], i + pattern.Length); (look at the comment by Johan). A simple test could prove that:
byte[] pattern = new byte[] {1, 2};
byte[] toBeSearched = new byte[] { 1, 1, 2, 1, 12 };
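With i = Array.IndexOf<byte>(buffer, pattern[0], i + pattern.Length); nothing is returned, because after the failed candidate at index 0 the search resumes at index 2 and skips the match that starts at index 1. With i = Array.IndexOf<byte>(buffer, pattern[0], i + 1); the correct result (index 1) is returned.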
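The difference between the two step sizes can be checked with a small sketch. The class StepDemo and its step parameter are hypothetical, added only so both variants can run side by side. It also compares bytes in a plain loop rather than with Buffer.BlockCopy and SequenceEqual, so it is a simplified stand-in for IndexOfSequence, not the method itself.

```csharp
using System;
using System.Collections.Generic;

static class StepDemo
{
    // Hypothetical variant of IndexOfSequence: 'step' is how far past the current
    // candidate the next first-byte search starts (1, or pattern.Length).
    // Assumes a non-empty pattern and step >= 1.
    public static List<int> Find(byte[] buffer, byte[] pattern, int step)
    {
        var positions = new List<int>();
        int i = Array.IndexOf<byte>(buffer, pattern[0], 0);
        while (i >= 0 && i <= buffer.Length - pattern.Length)
        {
            // Verify the rest of the pattern at candidate position i.
            bool match = true;
            for (int j = 1; j < pattern.Length && match; j++)
                match = buffer[i + j] == pattern[j];

            if (match)
                positions.Add(i);

            // Clamp the start index so a large step cannot run past the end of the array.
            int next = Math.Min(i + step, buffer.Length);
            i = Array.IndexOf<byte>(buffer, pattern[0], next);
        }
        return positions;
    }
}
```

For the test data above, StepDemo.Find(toBeSearched, pattern, pattern.Length) returns an empty list, while StepDemo.Find(toBeSearched, pattern, 1) returns a list containing only index 1.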