LINQ query — Data aggregation (Group Adjacent)
Let's take a class called Cls:
public class Cls
{
    public int SequenceNumber { get; set; }
    public int Value { get; set; }
}
Now, let's populate a collection with the following elements:
Sequence Number  Value
===============  =====
1                9
2                9
3                15
4                15
5                15
6                30
7                9
What I need to do is enumerate the elements in sequence-number order and check whether the next element has the same value. If it does, the two are merged into one range, so the desired output is as follows:
Sequence Number From  Sequence Number To  Value
====================  ==================  =====
1                     2                   9
3                     5                   15
6                     6                   30
7                     7                   9
How can I perform this operation using a LINQ query?
Solution 1:
You can use a modified version of LINQ's GroupBy that groups items only when they are adjacent. With that in place, the query is straightforward:
var result = classes
    .GroupAdjacent(c => c.Value)
    .Select(g => new {
        SequenceNumFrom = g.Min(c => c.SequenceNumber),
        SequenceNumTo = g.Max(c => c.SequenceNumber),
        Value = g.Key
    });
foreach (var x in result)
    Console.WriteLine("SequenceNumFrom:{0} SequenceNumTo:{1} Value:{2}", x.SequenceNumFrom, x.SequenceNumTo, x.Value);
Result:
SequenceNumFrom:1 SequenceNumTo:2 Value:9
SequenceNumFrom:3 SequenceNumTo:5 Value:15
SequenceNumFrom:6 SequenceNumTo:6 Value:30
SequenceNumFrom:7 SequenceNumTo:7 Value:9
This is the extension method used to group adjacent items:
// Extension methods must live in a static class; the class name here is arbitrary.
public static class EnumerableExtensions
{
    public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        // EqualityComparer copes with null keys, unlike calling k.Equals(last) directly.
        var comparer = EqualityComparer<TKey>.Default;
        TKey last = default(TKey);
        bool haveLast = false;
        List<TSource> list = new List<TSource>();
        foreach (TSource s in source)
        {
            TKey k = keySelector(s);
            // A key change closes the current group and starts a new one.
            if (haveLast && !comparer.Equals(k, last))
            {
                yield return new GroupOfAdjacent<TSource, TKey>(list, last);
                list = new List<TSource>();
            }
            list.Add(s);
            last = k;
            haveLast = true;
        }
        // Emit the final group, unless the source was empty.
        if (haveLast)
            yield return new GroupOfAdjacent<TSource, TKey>(list, last);
    }
}
and the class used:
public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
    public TKey Key { get; set; }
    private List<TSource> GroupList { get; set; }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
    }

    System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
    {
        foreach (var s in GroupList)
            yield return s;
    }

    public GroupOfAdjacent(List<TSource> source, TKey key)
    {
        GroupList = source;
        Key = key;
    }
}
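If only the From/To/Value ranges are needed, and not the grouped items themselves, the grouping class can be skipped and the runs produced directly in a single pass. The following is a minimal sketch of that alternative, not the method from the answer above: ToRuns, RunExtensions and Program are hypothetical names introduced for illustration, and the code assumes the input is already ordered by SequenceNumber and that value tuples (C# 7 or later) are available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Cls
{
    public int SequenceNumber { get; set; }
    public int Value { get; set; }
}

public static class RunExtensions
{
    // Hypothetical helper (not part of LINQ): yields one (From, To, Value)
    // tuple per run of adjacent elements that share the same Value.
    public static IEnumerable<(int From, int To, int Value)> ToRuns(this IEnumerable<Cls> source)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                yield break; // empty input produces no runs

            int start = e.Current.SequenceNumber;
            int end = start;
            int value = e.Current.Value;

            while (e.MoveNext())
            {
                if (e.Current.Value == value)
                {
                    // Same value as the current run: extend it.
                    end = e.Current.SequenceNumber;
                }
                else
                {
                    // Value changed: emit the finished run and start a new one.
                    yield return (start, end, value);
                    start = end = e.Current.SequenceNumber;
                    value = e.Current.Value;
                }
            }

            // Emit the last run.
            yield return (start, end, value);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Same sample data as in the question, with sequence numbers 1..7.
        var items = new[] { 9, 9, 15, 15, 15, 30, 9 }
            .Select((v, i) => new Cls { SequenceNumber = i + 1, Value = v });

        foreach (var run in items.ToRuns())
            Console.WriteLine("{0} - {1} : {2}", run.From, run.To, run.Value);
    }
}
```

Because the iterator keeps only the current run in memory, it also works on sequences that are streamed rather than fully loaded. The trade-off is that it is specific to Cls, whereas GroupAdjacent is generic and returns the grouped elements as well.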
Solution 2:
You can use this LINQ query:
var values = (new[] { 9, 9, 15, 15, 15, 30, 9 }).Select((x, i) => new { x, i });
// Each element is keyed by the index of the first later element with a different value,
// so all members of one run share the same key. The nested Where makes this O(n^2).
var query = from v in values
            let firstNonValue = values.Where(v2 => v2.i >= v.i && v2.x != v.x).FirstOrDefault()
            let grouping = firstNonValue == null ? int.MaxValue : firstNonValue.i
            group v by grouping into v
            select new
            {
                From = v.Min(y => y.i) + 1,
                To = v.Max(y => y.i) + 1,
                Value = v.Min(y => y.x)
            };
Solution 3:
MoreLinq provides this functionality out of the box. It's called GroupAdjacent and is implemented as an extension method on IEnumerable:
Groups the adjacent elements of a sequence according to a specified key selector function.
enumerable.GroupAdjacent(e => e.Key)
There is even a NuGet "source" package that contains only that method, if you don't want to pull in an additional binary NuGet package.
The method returns an IEnumerable<IGrouping<TKey, TSource>>, so its output can be processed in the same way as the output of GroupBy.
Solution 4:
You can do it like this:
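var all = new [] {
    new Cls { SequenceNumber = 1, Value = 9 }
,   new Cls { SequenceNumber = 2, Value = 9 }
,   new Cls { SequenceNumber = 3, Value = 15 }
,   new Cls { SequenceNumber = 4, Value = 15 }
,   new Cls { SequenceNumber = 5, Value = 15 }
,   new Cls { SequenceNumber = 6, Value = 30 }
,   new Cls { SequenceNumber = 7, Value = 9 }
};
var f = all.First();
var res = all.Skip(1).Aggregate(
    new List<Run> {new Run {From = f.SequenceNumber, To = f.SequenceNumber, Value = f.Value} }
,   (p, v) => {
        if (v.Value == p.Last().Value) {
            // Same value as the last run: extend it.
            p.Last().To = v.SequenceNumber;
        } else {
            // Different value: start a new run.
            p.Add(new Run {From = v.SequenceNumber, To = v.SequenceNumber, Value = v.Value});
        }
        return p;
    });
foreach (var r in res) {
    Console.WriteLine("{0} - {1} : {2}", r.From, r.To, r.Value);
}

// Run is not shown in the answer; this is an assumed minimal definition.
// It must be a class (not a struct) so that p.Last().To can be updated in place.
public class Run
{
    public int From { get; set; }
    public int To { get; set; }
    public int Value { get; set; }
}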
The idea is to use Aggregate creatively: starting with a list consisting of a single Run, examine the content of the list we've got so far at each stage of aggregation (the if statement in the lambda). Depending on the last value, either continue the old run or start a new one.