Finding a subsequence in longer sequence
Solution 1:
Similar to @dlev's but this also handles {1,1,1,2}.ContainsSubsequence({1,1,2})
public static bool ContainsSubsequence<T>(this IEnumerable<T> parent, IEnumerable<T> target)
{
var pattern = target.ToArray();
var source = new LinkedList<T>();
foreach (var element in parent)
{
source.AddLast(element);
if(source.Count == pattern.Length)
{
if(source.SequenceEqual(pattern))
return true;
source.RemoveFirst();
}
}
return false;
}
Solution 2:
This method will find a subsequence within a parent sequence, of any type that can be compared via Equals()
:
public static bool ContainsSubequence<T>(this IEnumerable<T> parent, IEnumerable<T> target)
{
bool foundOneMatch = false;
using (IEnumerator<T> parentEnum = parent.GetEnumerator())
{
using (IEnumerator<T> targetEnum = target.GetEnumerator())
{
// Get the first target instance; empty sequences are trivially contained
if (!targetEnum.MoveNext())
return true;
while (parentEnum.MoveNext())
{
if (targetEnum.Current.Equals(parentEnum.Current))
{
// Match, so move the target enum forward
foundOneMatch = true;
if (!targetEnum.MoveNext())
{
// We went through the entire target, so we have a match
return true;
}
}
else if (foundOneMatch)
{
return false;
}
}
return false;
}
}
}
You can use it like this:
bool match = new[] {1, 2, 3}.ContainsSubsequence(new[] {1, 2}); // match == true
match = new[] {1, 2, 3}.ContainsSubsequence(new[] {1, 3}); // match == false
Note that it assumes the target sequence has no null
elements.
Update: Thanks for the upvotes, everyone, but there is actually a bug in the above code! If a partial match is found, but then doesn't turn into a full match, the process is ended, rather than reset (which is obviously incorrected when applied to something like {1, 2, 1, 2, 3}.ContainsSubsequence({1, 2, 3})
).
The above code works really well for the more common definition of subsequence (i.e. contiguousness is not required) but in order to handle resetting (which most IEnumerators
do not support) the target sequence needs to be enumerated up front. That leads to the following code:
public static bool ContainsSubequence<T>(this IEnumerable<T> parent, IEnumerable<T> target)
{
bool foundOneMatch = false;
var enumeratedTarget = target.ToList();
int enumPos = 0;
using (IEnumerator<T> parentEnum = parent.GetEnumerator())
{
while (parentEnum.MoveNext())
{
if (enumeratedTarget[enumPos].Equals(parentEnum.Current))
{
// Match, so move the target enum forward
foundOneMatch = true;
if (enumPos == enumeratedTarget.Count - 1)
{
// We went through the entire target, so we have a match
return true;
}
enumPos++;
}
else if (foundOneMatch)
{
foundOneMatch = false;
enumPos = 0;
if (enumeratedTarget[enumPos].Equals(parentEnum.Current))
{
foundOneMatch = true;
enumPos++;
}
}
}
return false;
}
}
This code doesn't have any bugs, but won't work well for large (or infinite) sequences.
Solution 3:
This is quite a well studied problem and according to my research there are two algorithms that are optimal for this job, depending on your data.
Namely, they are the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm.
Here, I submit my implementation of the KMP algorithm, originally reviewed here.
Its written to handle a source or parent sequence of indeterminate length up to Int64.MaxValue
.
As we can see, the internal implementation returns a sequence of the indices at which the sub-string or target pattern occurs. You can present these results through your choice of facade.
You could use this simply like this,
var contains = new[] { 1, 3, 2, 3, 4, 3 }.Contains(new[] { 1, 3, 2, 3 });
Here is a working fiddle that shows the code in action.
Below, follows the fully commented code for my answer.
namespace Code
{
using System;
using System.Collections.Generic;
using System.Linq;
/// <summary>
/// A generic implementation of the Knuth-Morris-Pratt algorithm that searches,
/// in a memory efficient way, over a given <see cref="IEnumerable{T}"/>.
/// </summary>
public static class KMP
{
/// <summary>
/// Determines whether a sequence contains the search string.
/// </summary>
/// <typeparam name="T">
/// The type of elements of <paramref name="source"/>
/// </typeparam>
/// <param name="source">
/// A sequence of elements
/// </param>
/// <param name="pattern">The search string.</param>
/// <param name="equalityComparer">
/// Determines whether the sequence contains a specified element.
/// If <c>null</c>
/// <see cref="EqualityComparer{T}.Default"/> will be used.
/// </param>
/// <returns>
/// <c>true</c> if the source contains the specified pattern;
/// otherwise, <c>false</c>.
/// </returns>
/// <exception cref="ArgumentNullException">pattern</exception>
public static bool Contains<T>(
this IEnumerable<T> source,
IEnumerable<T> pattern,
IEqualityComparer<T> equalityComparer = null)
{
if (pattern == null)
{
throw new ArgumentNullException(nameof(pattern));
}
equalityComparer = equalityComparer ?? EqualityComparer<T>.Default;
return SearchImplementation(source, pattern, equalityComparer).Any();
}
public static IEnumerable<long> IndicesOf<T>(
this IEnumerable<T> source,
IEnumerable<T> pattern,
IEqualityComparer<T> equalityComparer = null)
{
if (pattern == null)
{
throw new ArgumentNullException(nameof(pattern));
}
equalityComparer = equalityComparer ?? EqualityComparer<T>.Default;
return SearchImplementation(source, pattern, equalityComparer);
}
/// <summary>
/// Identifies indices of a pattern string in a given sequence.
/// </summary>
/// <typeparam name="T">
/// The type of elements of <paramref name="source"/>
/// </typeparam>
/// <param name="source">
/// The sequence to search.
/// </param>
/// <param name="patternString">
/// The string to find in the sequence.
/// </param>
/// <param name="equalityComparer">
/// Determines whether the sequence contains a specified element.
/// </param>
/// <returns>
/// A sequence of indices where the pattern can be found
/// in the source.
/// </returns>
/// <exception cref="ArgumentOutOfRangeException">
/// patternSequence - The pattern must contain 1 or more elements.
/// </exception>
private static IEnumerable<long> SearchImplementation<T>(
IEnumerable<T> source,
IEnumerable<T> patternString,
IEqualityComparer<T> equalityComparer)
{
// Pre-process the pattern
(var slide, var pattern) = GetSlide(patternString, equalityComparer);
var patternLength = pattern.Count;
if (patternLength == 0)
{
throw new ArgumentOutOfRangeException(
nameof(patternString),
"The pattern must contain 1 or more elements.");
}
var buffer = new Dictionary<long, T>(patternLength);
var more = true;
long sourceIndex = 0; // index for source
int patternIndex = 0; // index for pattern
using(var sourceEnumerator = source.GetEnumerator())
while (more)
{
more = FillBuffer(
buffer,
sourceEnumerator,
sourceIndex,
patternLength,
out T t);
if (equalityComparer.Equals(pattern[patternIndex], t))
{
patternIndex++;
sourceIndex++;
more = FillBuffer(
buffer,
sourceEnumerator,
sourceIndex,
patternLength,
out t);
}
if (patternIndex == patternLength)
{
yield return sourceIndex - patternIndex;
patternIndex = slide[patternIndex - 1];
}
else if (more && !equalityComparer.Equals(pattern[patternIndex], t))
{
if (patternIndex != 0)
{
patternIndex = slide[patternIndex - 1];
}
else
{
sourceIndex = sourceIndex + 1;
}
}
}
}
/// <summary>
/// Services the buffer and retrieves the value.
/// </summary>
/// <remarks>
/// The buffer is used so that it is not necessary to hold the
/// entire source in memory.
/// </remarks>
/// <typeparam name="T">
/// The type of elements of <paramref name="source"/>.
/// </typeparam>
/// <param name="buffer">The buffer.</param>
/// <param name="source">The source enumerator.</param>
/// <param name="sourceIndex">The element index to retrieve.</param>
/// <param name="patternLength">Length of the search string.</param>
/// <param name="value">The element value retrieved from the source.</param>
/// <returns>
/// <c>true</c> if there is potentially more data to process;
/// otherwise <c>false</c>.
/// </returns>
private static bool FillBuffer<T>(
IDictionary<long, T> buffer,
IEnumerator<T> source,
long sourceIndex,
int patternLength,
out T value)
{
bool more = true;
if (!buffer.TryGetValue(sourceIndex, out value))
{
more = source.MoveNext();
if (more)
{
value = source.Current;
buffer.Remove(sourceIndex - patternLength);
buffer.Add(sourceIndex, value);
}
}
return more;
}
/// <summary>
/// Gets the offset array which acts as a slide rule for the KMP algorithm.
/// </summary>
/// <typeparam name="T">
/// The type of elements of <paramref name="source"/>.
/// </typeparam>
/// <param name="pattern">The search string.</param>
/// <param name="equalityComparer">
/// Determines whether the sequence contains a specified element.
/// If <c>null</c>
/// <see cref="EqualityComparer{T}.Default"/> will be used.
/// </param>
/// <returns>A tuple of the offsets and the enumerated pattern.</returns>
private static (IReadOnlyList<int> Slide, IReadOnlyList<T> Pattern) GetSlide<T>(
IEnumerable<T> pattern,
IEqualityComparer<T> equalityComparer)
{
var patternList = pattern.ToList();
var slide = new int[patternList.Count];
int length = 0;
int patternIndex = 1;
while (patternIndex < patternList.Count)
{
if (equalityComparer.Equals(
patternList[patternIndex],
patternList[length]))
{
length++;
slide[patternIndex] = length;
patternIndex++;
}
else
{
if (length != 0)
{
length = slide[length - 1];
}
else
{
slide[patternIndex] = length;
patternIndex++;
}
}
}
return (slide, patternList);
}
}
}
Solution 4:
You can try something like this to get you started. Once you've converted this list to a string, you can find the sequence using the substring:
if (String.Join(",", numericList.ConvertAll<string>(x => x.ToString()).ToArray())
{
//get sequence
}
Solution 5:
This function checks whether List parent
contains List target
using some LINQ:
public static bool ContainsSequence<T>(this List<T> parent, List<T> target)
{
for (int fromElement = parent.IndexOf(target.First());
(fromElement != -1) && (fromElement <= parent.Count - target.Count);
fromElement = parent.FindIndex(fromElement + 1, p => p.Equals(target.First())))
{
var comparedSequence = parent.Skip(fromElement).Take(target.Count);
if (comparedSequence.SequenceEqual(target)) return true;
}
return false;
}