Finding all positions of substring in a larger string in C#
Solution 1:
Here's an example extension method for it:
public static List<int> AllIndexesOf(this string str, string value) {
if (String.IsNullOrEmpty(value))
throw new ArgumentException("the string to find may not be empty", "value");
List<int> indexes = new List<int>();
for (int index = 0;; index += value.Length) {
index = str.IndexOf(value, index);
if (index == -1)
return indexes;
indexes.Add(index);
}
}
If you put this into a static class and import the namespace with using
, it appears as a method on any string, and you can just do:
List<int> indexes = "fooStringfooBar".AllIndexesOf("foo");
For more information on extension methods, http://msdn.microsoft.com/en-us/library/bb383977.aspx
Also the same using an iterator:
public static IEnumerable<int> AllIndexesOf(this string str, string value) {
if (String.IsNullOrEmpty(value))
throw new ArgumentException("the string to find may not be empty", "value");
for (int index = 0;; index += value.Length) {
index = str.IndexOf(value, index);
if (index == -1)
break;
yield return index;
}
}
Solution 2:
Why don't you use the built in RegEx class:
public static IEnumerable<int> GetAllIndexes(this string source, string matchString)
{
matchString = Regex.Escape(matchString);
foreach (Match match in Regex.Matches(source, matchString))
{
yield return match.Index;
}
}
If you do need to reuse the expression then compile it and cache it somewhere. Change the matchString param to a Regex matchExpression in another overload for the reuse case.
Solution 3:
using LINQ
public static IEnumerable<int> IndexOfAll(this string sourceString, string subString)
{
return Regex.Matches(sourceString, subString).Cast<Match>().Select(m => m.Index);
}
Solution 4:
Polished version + case ignoring support:
public static int[] AllIndexesOf(string str, string substr, bool ignoreCase = false)
{
if (string.IsNullOrWhiteSpace(str) ||
string.IsNullOrWhiteSpace(substr))
{
throw new ArgumentException("String or substring is not specified.");
}
var indexes = new List<int>();
int index = 0;
while ((index = str.IndexOf(substr, index, ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal)) != -1)
{
indexes.Add(index++);
}
return indexes.ToArray();
}
Solution 5:
It could be done in efficient time complexity using KMP algorithm in O(N + M) where N is the length of text
and M is the length of the pattern
.
This is the implementation and usage:
static class StringExtensions
{
public static IEnumerable<int> AllIndicesOf(this string text, string pattern)
{
if (string.IsNullOrEmpty(pattern))
{
throw new ArgumentNullException(nameof(pattern));
}
return Kmp(text, pattern);
}
private static IEnumerable<int> Kmp(string text, string pattern)
{
int M = pattern.Length;
int N = text.Length;
int[] lps = LongestPrefixSuffix(pattern);
int i = 0, j = 0;
while (i < N)
{
if (pattern[j] == text[i])
{
j++;
i++;
}
if (j == M)
{
yield return i - j;
j = lps[j - 1];
}
else if (i < N && pattern[j] != text[i])
{
if (j != 0)
{
j = lps[j - 1];
}
else
{
i++;
}
}
}
}
private static int[] LongestPrefixSuffix(string pattern)
{
int[] lps = new int[pattern.Length];
int length = 0;
int i = 1;
while (i < pattern.Length)
{
if (pattern[i] == pattern[length])
{
length++;
lps[i] = length;
i++;
}
else
{
if (length != 0)
{
length = lps[length - 1];
}
else
{
lps[i] = length;
i++;
}
}
}
return lps;
}
and this is an example of how to use it:
static void Main(string[] args)
{
string text = "this is a test";
string pattern = "is";
foreach (var index in text.AllIndicesOf(pattern))
{
Console.WriteLine(index); // 2 5
}
}