Recursive List Flattening

I could probably write this myself, but the specific way I'm trying to accomplish it is throwing me off. I'm trying to write a generic extension method similar to the others introduced in .NET 3.5 that will take a nested IEnumerable of IEnumerables (and so on) and flatten it into one IEnumerable. Anyone have any ideas?

Specifically, I'm having trouble with the syntax of the extension method itself so that I can work on a flattening algorithm.


Solution 1:

Here's an extension that might help. It will traverse all nodes in your hierarchy of objects and pick out the ones that match a criteria. It assumes that each object in your hierarchy has a collection property that holds its child objects.

Here's the extension:

/// Traverses an object hierarchy and return a flattened list of elements
/// based on a predicate.
/// 
/// TSource: The type of object in your collection.</typeparam>
/// source: The collection of your topmost TSource objects.</param>
/// selectorFunction: A predicate for choosing the objects you want.
/// getChildrenFunction: A function that fetches the child collection from an object.
/// returns: A flattened list of objects which meet the criteria in selectorFunction.
public static IEnumerable<TSource> Map<TSource>(
  this IEnumerable<TSource> source,
  Func<TSource, bool> selectorFunction,
  Func<TSource, IEnumerable<TSource>> getChildrenFunction)
{
  // Add what we have to the stack
  var flattenedList = source.Where(selectorFunction);

  // Go through the input enumerable looking for children,
  // and add those if we have them
  foreach (TSource element in source)
  {
    flattenedList = flattenedList.Concat(
      getChildrenFunction(element).Map(selectorFunction,
                                       getChildrenFunction)
    );
  }
  return flattenedList;
}

Examples (Unit Tests):

First we need an object and a nested object hierarchy.

A simple node class

class Node
{
  public int NodeId { get; set; }
  public int LevelId { get; set; }
  public IEnumerable<Node> Children { get; set; }

  public override string ToString()
  {
    return String.Format("Node {0}, Level {1}", this.NodeId, this.LevelId);
  }
}

And a method to get a 3-level deep hierarchy of nodes

private IEnumerable<Node> GetNodes()
{
  // Create a 3-level deep hierarchy of nodes
  Node[] nodes = new Node[]
    {
      new Node 
      { 
        NodeId = 1, 
        LevelId = 1, 
        Children = new Node[]
        {
          new Node { NodeId = 2, LevelId = 2, Children = new Node[] {} },
          new Node
          {
            NodeId = 3,
            LevelId = 2,
            Children = new Node[]
            {
              new Node { NodeId = 4, LevelId = 3, Children = new Node[] {} },
              new Node { NodeId = 5, LevelId = 3, Children = new Node[] {} }
            }
          }
        }
      },
      new Node { NodeId = 6, LevelId = 1, Children = new Node[] {} }
    };
  return nodes;
}

First Test: flatten the hierarchy, no filtering

[Test]
public void Flatten_Nested_Heirachy()
{
  IEnumerable<Node> nodes = GetNodes();
  var flattenedNodes = nodes.Map(
    p => true, 
    (Node n) => { return n.Children; }
  );
  foreach (Node flatNode in flattenedNodes)
  {
    Console.WriteLine(flatNode.ToString());
  }

  // Make sure we only end up with 6 nodes
  Assert.AreEqual(6, flattenedNodes.Count());
}

This will show:

Node 1, Level 1
Node 6, Level 1
Node 2, Level 2
Node 3, Level 2
Node 4, Level 3
Node 5, Level 3

Second Test: Get a list of nodes that have an even-numbered NodeId

[Test]
public void Only_Return_Nodes_With_Even_Numbered_Node_IDs()
{
  IEnumerable<Node> nodes = GetNodes();
  var flattenedNodes = nodes.Map(
    p => (p.NodeId % 2) == 0, 
    (Node n) => { return n.Children; }
  );
  foreach (Node flatNode in flattenedNodes)
  {
    Console.WriteLine(flatNode.ToString());
  }
  // Make sure we only end up with 3 nodes
  Assert.AreEqual(3, flattenedNodes.Count());
}

This will show:

Node 6, Level 1
Node 2, Level 2
Node 4, Level 3

Solution 2:

Hmm... I'm not sure exactly what you want here, but here's a "one level" option:

public static IEnumerable<TElement> Flatten<TElement,TSequence> (this IEnumerable<TSequence> sequences)
    where TSequence : IEnumerable<TElement> 
{
    foreach (TSequence sequence in sequences)
    {
        foreach(TElement element in sequence)
        {
            yield return element;
        }
    }
}

If that's not what you want, could you provide the signature of what you do want? If you don't need a generic form, and you just want to do the kind of thing that LINQ to XML constructors do, that's reasonably simple - although the recursive use of iterator blocks is relatively inefficient. Something like:

static IEnumerable Flatten(params object[] objects)
{
    // Can't easily get varargs behaviour with IEnumerable
    return Flatten((IEnumerable) objects);
}

static IEnumerable Flatten(IEnumerable enumerable)
{
    foreach (object element in enumerable)
    {
        IEnumerable candidate = element as IEnumerable;
        if (candidate != null)
        {
            foreach (object nested in candidate)
            {
                yield return nested;
            }
        }
        else
        {
            yield return element;
        }
    }
}

Note that that will treat a string as a sequence of chars, however - you may want to special-case strings to be individual elements instead of flattening them, depending on your use case.

Does that help?

Solution 3:

I thought I'd share a complete example with error handling and a single-logic apporoach.

Recursive flattening is as simple as:

LINQ version

public static class IEnumerableExtensions
{
    public static IEnumerable<T> SelectManyRecursive<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (selector == null) throw new ArgumentNullException("selector");

        return !source.Any() ? source :
            source.Concat(
                source
                .SelectMany(i => selector(i).EmptyIfNull())
                .SelectManyRecursive(selector)
            );
    }

    public static IEnumerable<T> EmptyIfNull<T>(this IEnumerable<T> source)
    {
        return source ?? Enumerable.Empty<T>();
    }
}

Non-LINQ version

public static class IEnumerableExtensions
{
    public static IEnumerable<T> SelectManyRecursive<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (selector == null) throw new ArgumentNullException("selector");

        foreach (T item in source)
        {
            yield return item;

            var children = selector(item);
            if (children == null)
                continue;

            foreach (T descendant in children.SelectManyRecursive(selector))
            {
                yield return descendant;
            }
        }
    }
}

Design decisions

I decided to:

  • disallow flattening of a null IEnumerable, this can be changed by removing exception throwing and:
    • adding source = source.EmptyIfNull(); before return in the 1st version
    • adding if (source != null) before foreach in the 2nd version
  • allow returning of a null collection by the selector - this way I'm removing responsibility from the caller to assure the children list isn't empty, this can be changed by:
    • removing .EmptyIfNull() in the first version - note that SelectMany will fail if null is returned by selector
    • removing if (children == null) continue; in the second version - note that foreach will fail on a null IEnumerable parameter
  • allow filtering children with .Where clause on the caller side or within the children selector rather than passing a children filter selector parameter:
    • it won't impact the efficiency because in both versions it is a deferred call
    • it would be mixing another logic with the method and I prefer to keep the logic separated

Sample use

I'm using this extension method in LightSwitch to obtain all controls on the screen:

public static class ScreenObjectExtensions
{
    public static IEnumerable<IContentItemProxy> FindControls(this IScreenObject screen)
    {
        var model = screen.Details.GetModel();

        return model.GetChildItems()
            .SelectManyRecursive(c => c.GetChildItems())
            .OfType<IContentItemDefinition>()
            .Select(c => screen.FindControl(c.Name));
    }
}

Solution 4:

Here is a modified Jon Skeet's answer to allow more than "one level":

static IEnumerable Flatten(IEnumerable enumerable)
{
    foreach (object element in enumerable)
    {
        IEnumerable candidate = element as IEnumerable;
        if (candidate != null)
        {
            foreach (object nested in Flatten(candidate))
            {
                yield return nested;
            }
        }
        else
        {
            yield return element;
        }
    }
}

disclaimer: I don't know C#.

The same in Python:

#!/usr/bin/env python

def flatten(iterable):
    for item in iterable:
        if hasattr(item, '__iter__'):
            for nested in flatten(item):
                yield nested
        else:
            yield item

if __name__ == '__main__':
    for item in flatten([1,[2, 3, [[4], 5]], 6, [[[7]]], [8]]):
        print(item, end=" ")

It prints:

1 2 3 4 5 6 7 8