Generate a matrix containing all combinations of elements taken from n vectors

This question pops up quite often in one form or another (see for example here or here). So I thought I'd present it in a general form, and provide an answer which might serve for future reference.

Given an arbitrary number n of vectors of possibly different sizes, generate an n-column matrix whose rows describe all combinations of elements taken from those vectors (Cartesian product) .

For example,

vectors = { [1 2], [3 6 9], [10 20] }

should give

combs = [ 1     3    10
          1     3    20
          1     6    10
          1     6    20
          1     9    10
          1     9    20
          2     3    10
          2     3    20
          2     6    10
          2     6    20
          2     9    10
          2     9    20 ]

Solution 1:

The ndgrid function almost gives the answer, but has one caveat: n output variables must be explicitly defined to call it. Since n is arbitrary, the best way is to use a comma-separated list (generated from a cell array with ncells) to serve as output. The resulting n matrices are then concatenated into the desired n-column matrix:

vectors = { [1 2], [3 6 9], [10 20] }; %// input data: cell array of vectors

n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order 
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n); %// reshape to obtain desired matrix

Solution 2:

A little bit simpler ... if you have the Neural Network toolbox you can simply use combvec:

vectors = {[1 2], [3 6 9], [10 20]};
combs = combvec(vectors{:}).' % Use cells as arguments

which returns a matrix in a slightly different order:

combs =

     1     3    10
     2     3    10
     1     6    10
     2     6    10
     1     9    10
     2     9    10
     1     3    20
     2     3    20
     1     6    20
     2     6    20
     1     9    20
     2     9    20

If you want the matrix that is in the question, you can use sortrows:

combs = sortrows(combvec(vectors{:}).')
% Or equivalently as per @LuisMendo in the comments: 
% combs = fliplr(combvec(vectors{end:-1:1}).') 

which gives

combs =

     1     3    10
     1     3    20
     1     6    10
     1     6    20
     1     9    10
     1     9    20
     2     3    10
     2     3    20
     2     6    10
     2     6    20
     2     9    10
     2     9    20

If you look at the internals of combvec (type edit combvec in the command window), you'll see that it uses different code than @LuisMendo's answer. I can't say which is more efficient overall.

If you happen to have a matrix whose rows are akin to the earlier cell array you can use:

vectors = [1 2;3 6;10 20];
vectors = num2cell(vectors,2);
combs = sortrows(combvec(vectors{:}).')

Solution 3:

I've done some benchmarking on the two proposed solutions. The benchmarking code is based on the timeit function, and is included at the end of this post.

I consider two cases: three vectors of size n, and three vectors of sizes n/10, n and n*10 respectively (both cases give the same number of combinations). n is varied up to a maximum of 240 (I choose this value to avoid the use of virtual memory in my laptop computer).

The results are given in the following figure. The ndgrid-based solution is seen to consistently take less time than combvec. It's also interesting to note that the time taken by combvec varies a little less regularly in the different-size case.

enter image description here


Benchmarking code

Function for ndgrid-based solution:

function combs = f1(vectors)
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n);

Function for combvec solution:

function combs = f2(vectors)
combs = combvec(vectors{:}).';

Script to measure time by calling timeit on these functions:

nn = 20:20:240;
t1 = [];
t2 = [];
for n = nn;
    %//vectors = {1:n, 1:n, 1:n};
    vectors = {1:n/10, 1:n, 1:n*10};
    t = timeit(@() f1(vectors));
    t1 = [t1; t];
    t = timeit(@() f2(vectors));
    t2 = [t2; t];
end

Solution 4:

Here's a do-it-yourself method that made me giggle with delight, using nchoosek, although it's not better than @Luis Mendo's accepted solution.

For the example given, after 1,000 runs this solution took my machine on average 0.00065935 s, versus the accepted solution 0.00012877 s. For larger vectors, following @Luis Mendo's benchmarking post, this solution is consistently slower than the accepted answer. Nevertheless, I decided to post it in hopes that maybe you'll find something useful about it:

Code:

tic;
v = {[1 2], [3 6 9], [10 20]};

L = [0 cumsum(cellfun(@length,v))];
V = cell2mat(v);

J = nchoosek(1:L(end),length(v));
J(any(J>repmat(L(2:end),[size(J,1) 1]),2) | ...
  any(J<=repmat(L(1:end-1),[size(J,1) 1]),2),:)  = [];

V(J)
toc

gives

ans =

 1     3    10
 1     3    20
 1     6    10
 1     6    20
 1     9    10
 1     9    20
 2     3    10
 2     3    20
 2     6    10
 2     6    20
 2     9    10
 2     9    20

Elapsed time is 0.018434 seconds.

Explanation:

L gets the lengths of each vector using cellfun. Although cellfun is basically a loop, it's efficient here considering your number of vectors will have to be relatively low for this problem to even be practical.

V concatenates all the vectors for easy access later (this assumes you entered all your vectors as rows. v' would work for column vectors.)

nchoosek gets all the ways to pick n=length(v) elements from the total number of elements L(end). There will be more combinations here than what we need.

J =

 1     2     3
 1     2     4
 1     2     5
 1     2     6
 1     2     7
 1     3     4
 1     3     5
 1     3     6
 1     3     7
 1     4     5
 1     4     6
 1     4     7
 1     5     6
 1     5     7
 1     6     7
 2     3     4
 2     3     5
 2     3     6
 2     3     7
 2     4     5
 2     4     6
 2     4     7
 2     5     6
 2     5     7
 2     6     7
 3     4     5
 3     4     6
 3     4     7
 3     5     6
 3     5     7
 3     6     7
 4     5     6
 4     5     7
 4     6     7
 5     6     7

Since there are only two elements in v(1), we need to throw out any rows where J(:,1)>2. Similarly, where J(:,2)<3, J(:,2)>5, etc... Using L and repmat we can determine whether each element of J is in its appropriate range, and then use any to discard rows that have any bad element.

Finally, these aren't the actual values from v, just indices. V(J) will return the desired matrix.