Generate a matrix containing all combinations of elements taken from n vectors
This question pops up quite often in one form or another (see for example here or here). So I thought I'd present it in a general form, and provide an answer which might serve for future reference.
Given an arbitrary number
n
of vectors of possibly different sizes, generate ann
-column matrix whose rows describe all combinations of elements taken from those vectors (Cartesian product) .
For example,
vectors = { [1 2], [3 6 9], [10 20] }
should give
combs = [ 1 3 10
1 3 20
1 6 10
1 6 20
1 9 10
1 9 20
2 3 10
2 3 20
2 6 10
2 6 20
2 9 10
2 9 20 ]
Solution 1:
The ndgrid
function almost gives the answer, but has one caveat: n
output variables must be explicitly defined to call it. Since n
is arbitrary, the best way is to use a comma-separated list (generated from a cell array with n
cells) to serve as output. The resulting n
matrices are then concatenated into the desired n
-column matrix:
vectors = { [1 2], [3 6 9], [10 20] }; %// input data: cell array of vectors
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n); %// reshape to obtain desired matrix
Solution 2:
A little bit simpler ... if you have the Neural Network toolbox you can simply use combvec
:
vectors = {[1 2], [3 6 9], [10 20]};
combs = combvec(vectors{:}).' % Use cells as arguments
which returns a matrix in a slightly different order:
combs =
1 3 10
2 3 10
1 6 10
2 6 10
1 9 10
2 9 10
1 3 20
2 3 20
1 6 20
2 6 20
1 9 20
2 9 20
If you want the matrix that is in the question, you can use sortrows
:
combs = sortrows(combvec(vectors{:}).')
% Or equivalently as per @LuisMendo in the comments:
% combs = fliplr(combvec(vectors{end:-1:1}).')
which gives
combs =
1 3 10
1 3 20
1 6 10
1 6 20
1 9 10
1 9 20
2 3 10
2 3 20
2 6 10
2 6 20
2 9 10
2 9 20
If you look at the internals of combvec
(type edit combvec
in the command window), you'll see that it uses different code than @LuisMendo's answer. I can't say which is more efficient overall.
If you happen to have a matrix whose rows are akin to the earlier cell array you can use:
vectors = [1 2;3 6;10 20];
vectors = num2cell(vectors,2);
combs = sortrows(combvec(vectors{:}).')
Solution 3:
I've done some benchmarking on the two proposed solutions. The benchmarking code is based on the timeit
function, and is included at the end of this post.
I consider two cases: three vectors of size n
, and three vectors of sizes n/10
, n
and n*10
respectively (both cases give the same number of combinations). n
is varied up to a maximum of 240
(I choose this value to avoid the use of virtual memory in my laptop computer).
The results are given in the following figure. The ndgrid
-based solution is seen to consistently take less time than combvec
. It's also interesting to note that the time taken by combvec
varies a little less regularly in the different-size case.
Benchmarking code
Function for ndgrid
-based solution:
function combs = f1(vectors)
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n);
Function for combvec
solution:
function combs = f2(vectors)
combs = combvec(vectors{:}).';
Script to measure time by calling timeit
on these functions:
nn = 20:20:240;
t1 = [];
t2 = [];
for n = nn;
%//vectors = {1:n, 1:n, 1:n};
vectors = {1:n/10, 1:n, 1:n*10};
t = timeit(@() f1(vectors));
t1 = [t1; t];
t = timeit(@() f2(vectors));
t2 = [t2; t];
end
Solution 4:
Here's a do-it-yourself method that made me giggle with delight, using nchoosek
, although it's not better than @Luis Mendo's accepted solution.
For the example given, after 1,000 runs this solution took my machine on average 0.00065935 s, versus the accepted solution 0.00012877 s. For larger vectors, following @Luis Mendo's benchmarking post, this solution is consistently slower than the accepted answer. Nevertheless, I decided to post it in hopes that maybe you'll find something useful about it:
Code:
tic;
v = {[1 2], [3 6 9], [10 20]};
L = [0 cumsum(cellfun(@length,v))];
V = cell2mat(v);
J = nchoosek(1:L(end),length(v));
J(any(J>repmat(L(2:end),[size(J,1) 1]),2) | ...
any(J<=repmat(L(1:end-1),[size(J,1) 1]),2),:) = [];
V(J)
toc
gives
ans =
1 3 10
1 3 20
1 6 10
1 6 20
1 9 10
1 9 20
2 3 10
2 3 20
2 6 10
2 6 20
2 9 10
2 9 20
Elapsed time is 0.018434 seconds.
Explanation:
L
gets the lengths of each vector using cellfun
. Although cellfun
is basically a loop, it's efficient here considering your number of vectors will have to be relatively low for this problem to even be practical.
V
concatenates all the vectors for easy access later (this assumes you entered all your vectors as rows. v' would work for column vectors.)
nchoosek
gets all the ways to pick n=length(v)
elements from the total number of elements L(end)
. There will be more combinations here than what we need.
J =
1 2 3
1 2 4
1 2 5
1 2 6
1 2 7
1 3 4
1 3 5
1 3 6
1 3 7
1 4 5
1 4 6
1 4 7
1 5 6
1 5 7
1 6 7
2 3 4
2 3 5
2 3 6
2 3 7
2 4 5
2 4 6
2 4 7
2 5 6
2 5 7
2 6 7
3 4 5
3 4 6
3 4 7
3 5 6
3 5 7
3 6 7
4 5 6
4 5 7
4 6 7
5 6 7
Since there are only two elements in v(1)
, we need to throw out any rows where J(:,1)>2
. Similarly, where J(:,2)<3
, J(:,2)>5
, etc... Using L
and repmat
we can determine whether each element of J
is in its appropriate range, and then use any
to discard rows that have any bad element.
Finally, these aren't the actual values from v
, just indices. V(J)
will return the desired matrix.