How to loop through IEnumerable in batches [duplicate]

I am developing a C# program which has an "IEnumerable users" that stores the ids of 4 million users. I need to loop through the IEnumerable and extract a batch of 1000 ids each time to perform some operations in another method.

How do I extract 1000 ids at a time from the start of the IEnumerable, do something else, then fetch the next batch of 1000, and so on?

Is this possible?


Solution 1:

You can use MoreLINQ's Batch operator (available from NuGet):

foreach (IEnumerable<User> batch in users.Batch(1000))
{
    // use batch
}

If adding the library is not an option, you can reuse its implementation:

public static IEnumerable<IEnumerable<T>> Batch<T>(
        this IEnumerable<T> source, int size)
{
    T[] bucket = null;
    var count = 0;

    foreach (var item in source)
    {
        if (bucket == null)
            bucket = new T[size];

        bucket[count++] = item;

        if (count != size)
            continue;

        yield return bucket.Select(x => x);

        bucket = null;
        count = 0;
    }

    // Return the last bucket with all remaining elements
    if (bucket != null && count > 0)
    {
        Array.Resize(ref bucket, count);
        yield return bucket.Select(x => x);
    }
}

BTW, for better performance you can yield the bucket directly instead of calling Select(x => x). Select is optimized for arrays, but the selector delegate is still invoked for every item. So, in your case it's better to use:

yield return bucket;
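
Putting the pieces of this answer together, here is a minimal sketch of the extension method with the bucket yielded directly, plus a small driver. The class names BatchExtensions and Program and the sample ids are assumptions for illustration only; they are not part of MoreLINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; extension methods must live in a static class.
public static class BatchExtensions
{
    public static IEnumerable<IEnumerable<T>> Batch<T>(
        this IEnumerable<T> source, int size)
    {
        T[] bucket = null;
        int count = 0;

        foreach (T item in source)
        {
            if (bucket == null)
                bucket = new T[size];

            bucket[count++] = item;

            if (count != size)
                continue;

            // Yield the array itself; no per-item delegate call.
            yield return bucket;

            // A fresh array is allocated for the next batch.
            bucket = null;
            count = 0;
        }

        // Yield the last, partially filled bucket trimmed to its real length.
        if (bucket != null && count > 0)
        {
            Array.Resize(ref bucket, count);
            yield return bucket;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the real user ids.
        IEnumerable<int> userIds = Enumerable.Range(1, 10);

        foreach (IEnumerable<int> batch in userIds.Batch(4))
            Console.WriteLine(string.Join(",", batch));
        // Prints three lines: 1,2,3,4 then 5,6,7,8 then 9,10
    }
}
```

One trade-off to be aware of: because the array is handed out directly, a caller can cast the batch back to T[] and modify it. That is harmless here, since each batch gets its own array.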

Solution 2:

It sounds like you need the Skip and Take methods. For example:

users.Skip(1000).Take(1000)

This skips the first 1000 items and takes the next 1000. You just need to increase the number skipped on each call.

You can pass an integer variable as the argument to Skip to control how many items are skipped, and wrap the call in a method:

public IEnumerable<User> GetBatch(int pageNumber)
{
    return users.Skip(pageNumber * 1000).Take(1000);
}
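
To walk the whole sequence with this approach, you would call the method with increasing page numbers until a page comes back empty. A rough sketch follows; the ProcessInBatches helper, the generic GetBatch signature, and the sample ids are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipTakePaging
{
    // Returns the zero-based page `pageNumber` of `source`.
    public static IEnumerable<T> GetBatch<T>(
        IEnumerable<T> source, int pageNumber, int batchSize)
    {
        return source.Skip(pageNumber * batchSize).Take(batchSize);
    }

    // Hypothetical driver: invokes `process` once per non-empty page
    // and returns the number of pages processed.
    public static int ProcessInBatches<T>(
        IEnumerable<T> source, int batchSize, Action<List<T>> process)
    {
        int pageNumber = 0;
        while (true)
        {
            // Materialize the page so it is enumerated only once.
            List<T> batch = GetBatch(source, pageNumber, batchSize).ToList();
            if (batch.Count == 0)
                break;

            process(batch);
            pageNumber++;
        }
        return pageNumber;
    }

    public static void Main()
    {
        // Stand-in for the real user ids.
        IEnumerable<int> userIds = Enumerable.Range(1, 10);

        int pages = ProcessInBatches(
            userIds, 4, b => Console.WriteLine(string.Join(",", b)));
        // Prints 1,2,3,4 then 5,6,7,8 then 9,10

        Console.WriteLine(pages); // prints 3
    }
}
```

Keep in mind that for a general IEnumerable, each Skip call starts enumerating from the beginning again, so later pages get progressively more expensive. With 4 million items that can add up, and if the sequence is backed by a query it may be re-executed for every page.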

Solution 3:

The easiest way to do this is probably just to use the GroupBy method in LINQ:

var batches = myEnumerable
    .Select((x, i) => new { x, i })
    .GroupBy(p => p.i / 1000, p => p.x);
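
As a quick check of the index-division idea, here is a small sketch wrapped in a helper. The InBatches name and the sample ids are made up for this example. Each element is paired with its index, and integer division of the index by the batch size becomes the group key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByBatching
{
    // Hypothetical helper: groups elements by (index / size).
    public static IEnumerable<IGrouping<int, T>> InBatches<T>(
        this IEnumerable<T> source, int size)
    {
        return source
            .Select((x, i) => new { x, i })    // attach the position
            .GroupBy(p => p.i / size, p => p.x); // key = batch number
    }

    public static void Main()
    {
        // Stand-in for the real user ids.
        IEnumerable<int> userIds = Enumerable.Range(1, 10);

        foreach (IGrouping<int, int> batch in userIds.InBatches(4))
            Console.WriteLine(batch.Key + ": " + string.Join(",", batch));
        // Prints "0: 1,2,3,4" then "1: 5,6,7,8" then "2: 9,10"
    }
}
```

Note that GroupBy has to read the entire source before it can hand out the first group, so all items are held in memory at once. For 4 million ids that may be acceptable, but it is not streaming.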

For a more sophisticated solution, see this blog post on how to create your own extension method to do this. Duplicated here for posterity:

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
{
    List<T> nextbatch = new List<T>(batchSize);
    foreach (T item in collection)
    {
        nextbatch.Add(item);
        if (nextbatch.Count == batchSize)
        {
            yield return nextbatch;
            // Allocate a new list instead of calling nextbatch.Clear():
            // the caller may still hold a reference to the batch just yielded.
            nextbatch = new List<T>(batchSize);
        }
    }

    if (nextbatch.Count > 0)
        yield return nextbatch;
}
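
To see why a fresh list is allocated for every batch rather than clearing and reusing one, compare the two variants below. This is only an illustrative sketch: BatchFresh mirrors the method above, BatchReusing is a deliberately flawed variant, and both names are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchComparison
{
    // Same logic as the method above: a new list per batch.
    public static IEnumerable<IEnumerable<T>> BatchFresh<T>(
        this IEnumerable<T> collection, int batchSize)
    {
        List<T> nextbatch = new List<T>(batchSize);
        foreach (T item in collection)
        {
            nextbatch.Add(item);
            if (nextbatch.Count == batchSize)
            {
                yield return nextbatch;
                nextbatch = new List<T>(batchSize);
            }
        }

        if (nextbatch.Count > 0)
            yield return nextbatch;
    }

    // Deliberately flawed: every batch is the same List<T> instance.
    public static IEnumerable<IEnumerable<T>> BatchReusing<T>(
        this IEnumerable<T> collection, int batchSize)
    {
        List<T> nextbatch = new List<T>(batchSize);
        foreach (T item in collection)
        {
            nextbatch.Add(item);
            if (nextbatch.Count == batchSize)
            {
                yield return nextbatch;
                nextbatch.Clear(); // wipes the batch the caller just received
            }
        }

        if (nextbatch.Count > 0)
            yield return nextbatch;
    }

    // Formats already-collected batches as "a,b | c,d".
    public static string Describe<T>(IEnumerable<IEnumerable<T>> batches)
    {
        return string.Join(" | ", batches.Select(b => string.Join(",", b)));
    }

    public static void Main()
    {
        IEnumerable<int> ids = Enumerable.Range(1, 10);

        // Collect all batches first, then look at them.
        Console.WriteLine(Describe(ids.BatchFresh(4).ToList()));
        // prints 1,2,3,4 | 5,6,7,8 | 9,10

        Console.WriteLine(Describe(ids.BatchReusing(4).ToList()));
        // prints 9,10 | 9,10 | 9,10
    }
}
```

The reusing variant only works if each batch is fully consumed before the next one is requested. As soon as the caller buffers batches (ToList, passing them to another thread, and so on), every stored batch points at the same list.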