Faster alternative to nested loops?

I have a need to create a list of combinations of numbers. The numbers are quite small so I can use byte rather than int. However it requires many nested loops in order to get every possible combination. I'm wondering if there's a more efficient manner to do what I'm after. Code so far is:

var data = new List<byte[]>();
for (byte a = 0; a < 2; a++)
for (byte b = 0; b < 3; b++)
for (byte c = 0; c < 4; c++)
for (byte d = 0; d < 3; d++)
for (byte e = 0; e < 4; e++)
for (byte f = 0; f < 3; f++)
for (byte g = 0; g < 3; g++)
for (byte h = 0; h < 4; h++)
for (byte i = 0; i < 2; i++)
for (byte j = 0; j < 4; j++)
for (byte k = 0; k < 4; k++)
for (byte l = 0; l < 3; l++)
for (byte m = 0; m < 4; m++)
{
    data.Add(new [] {a, b, c, d, e, f, g, h, i, j, k, l, m});
}

I was considering using something like a BitArray but I'm not sure how I could incorporate it.

Any recommendations would be greatly appreciated. Alternatively, perhaps this is the fastest way of doing what I want?

EDIT Couple of quick points (and apologies I didn't put these in the original post):

  • The numbers and the order of them (2, 3, 4, 3, 4, 3, 3 etc) are very important, so using a solution such as Generating Permutations using LINQ won't help because the maximums in each 'column' are different
  • I'm not a mathematician, so I apologise if I'm not using the technical terms like 'permutations' and 'combinations' correctly :)
  • I do need to populate all of these combinations at once - I can't just grab one or another based on an index
  • Using byte is faster than using int, I guarantee it. It's also a lot better on memory usage to have 67m+ arrays of bytes rather than ints
  • My ultimate goal here is to look for a faster alternative to nested loops.
  • I considered using parallel programming, but due to the iterative nature of what I'm trying to achieve, I couldn't find a way to do it successfully (even with ConcurrentBag) - however I'm happy to be proved wrong :)

CONCLUSION

Caramiriel has provided a good micro-optimisation which shaves some time off the loops, so I've marked that answer as correct. Eric also mentioned that it is faster to pre-allocate the List. But, at this stage it seems that the nested loops are in fact the fastest possible way of doing this (depressing, I know!).

If you want to try exactly what I was trying to benchmark with StopWatch, go with 13 loops counting up to 4 in each loop - that makes about 67m+ lines in the list. On my machine (i5-3320M 2.6GHz) it takes about 2.2s to do the optimised version.


As a reminder: you probably do not need this kind of code while developing your own solution. This can and should only used in very specific situations. Readability is often more important than speed.

You can use the properties of a struct and allocate the structure in advance. I cut off some levels in the sample below, but I'm sure you'll be able to figure out the specifics. Runs about 5-6 times faster than the original (release mode).

The block:

struct ByteBlock
{
    public byte A;
    public byte B;
    public byte C;
    public byte D;
    public byte E;
}

The loop:

var data = new ByteBlock[2*3*4*3*4];
var counter = 0;

var bytes = new ByteBlock();

for (byte a = 0; a < 2; a++)
{
    bytes.A = a;
    for (byte b = 0; b < 3; b++)
    {
        bytes.B = b;
        for (byte c = 0; c < 4; c++)
        {
            bytes.C = c;
            for (byte d = 0; d < 3; d++)
            {
                bytes.D = d;
                for (byte e = 0; e < 4; e++)
                {
                    bytes.E = e;
                    data[counter++] = bytes;
                }
            }
        }
    }
}

It's faster because it doesn't allocate a new list every time you add it to the list. Also since it's creating this list, it needs a reference to every other value (a,b,c,d,e). You can assume each value is only modified once inside the loop, so we can optimize it to do so (data locality).

Also read the comments for side-effects.

Edited the answer to use an T[] instead of a List<T>.


What you are doing is counting (with variable radix, but still counting).

Since you are using C#, I assume you don't want to play with useful memory layout and data structures that let you really optimize your code.

So here I'm posting something different, which may not suit your case, but it's worth noting: In case you actually access the list in a sparse fashion, here a class that let you compute the i-th element in linear time (rather than exponential as the other answers)

class Counter
{
    public int[] Radices;

    public int[] this[int n]
    {
        get 
        { 
            int[] v = new int[Radices.Length];
            int i = Radices.Length - 1;

            while (n != 0 && i >= 0)
            {
                //Hope C# has an IL-opcode for div-and-reminder like x86 do
                v[i] = n % Radices[i];
                n /= Radices[i--];
            }
            return v;
        }
    }
}

You can use this class this way

Counter c = new Counter();
c.Radices = new int[] { 2,3,4,3,4,3,3,4,2,4,4,3,4};

now c[i] is the same as your list, name it l, l[i].

As you can see, you can easily avoid all those loops :) even when you pre compute all the list at whole since you can simply implement a Carry-Ripple counter.

Counters are a very studied subject, I strongly advice to search for some literature if you feel.