In MATLAB, when is it optimal to use bsxfun?
I've noticed that a lot of good answers to MATLAB questions on Stack Overflow frequently use the function bsxfun. Why?
Motivation: In the MATLAB documentation for bsxfun, the following example is provided:
A = magic(5);
A = bsxfun(@minus, A, mean(A))
Of course we could do the same operation using:
A = A - (ones(size(A, 1), 1) * mean(A));
And in fact a simple speed test demonstrates the second method is about 20% faster. So why use the first method? I'm guessing there are some circumstances where using bsxfun will be much faster than the "manual" approach. I'd be really interested in seeing an example of such a situation and an explanation as to why it is faster.
Also, one final element to this question, again from the MATLAB documentation for bsxfun: "C = bsxfun(fun,A,B) applies the element-by-element binary operation specified by the function handle fun to arrays A and B, with singleton expansion enabled." What does the phrase "with singleton expansion enabled" mean?
Solution 1:
There are three reasons I use bsxfun (documentation, blog link):
- bsxfun is faster than repmat (see below)
- bsxfun requires less typing
- Using bsxfun, like using accumarray, makes me feel good about my understanding of MATLAB.
bsxfun will replicate the input arrays along their "singleton dimensions", i.e., the dimensions along which the size of the array is 1, so that they match the size of the corresponding dimension of the other array. This is what is called "singleton expansion". As an aside, the singleton dimensions are the ones that will be dropped if you call squeeze.
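To make singleton expansion concrete, here is a minimal sketch. The 3x1 and 1x4 inputs and the variable names are chosen only for illustration. Each input has one singleton dimension, and bsxfun expands both so that the result is 3x4; the repmat version spells out the same replication by hand.

```matlab
% Column vector (3x1): singleton along dimension 2.
a = [1; 2; 3];
% Row vector (1x4): singleton along dimension 1.
r = [10 20 30 40];

% bsxfun expands a across the columns and r down the rows,
% producing a 3x4 result with C(i,j) = a(i) + r(j).
C = bsxfun(@plus, a, r);

% The same result with both inputs replicated explicitly first.
C_manual = repmat(a, 1, numel(r)) + repmat(r, numel(a), 1);

disp(size(C));               % 3 4
disp(isequal(C, C_manual));  % 1 (true)
```

If the non-singleton dimensions of the two inputs do not match (for example a 3x1 array and a 2x4 array), bsxfun raises an error rather than expanding.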
It is possible that for very small problems, the repmat approach is faster - but at that array size, both operations are so fast that it likely won't make any difference in terms of overall performance. There are two important reasons bsxfun is faster: (1) the calculation happens in compiled code, which means that the actual replication of the array never happens, and (2) bsxfun is one of the multithreaded MATLAB functions.
I have run a speed comparison between repmat and bsxfun with MATLAB R2012b on my decently fast laptop.
For me, bsxfun is about three times faster than repmat. The difference becomes more pronounced if the arrays get larger:
The jump in runtime of repmat happens around an array size of 1 MB, which could have something to do with the size of my processor cache - bsxfun doesn't show as large a jump, because it only needs to allocate the output array.
Below is the code I used for timing:
n = 300;
k = 1; %# k=100 for the second graph
a = ones(10,1);
rr = zeros(n,1);   %# median repmat time for each array size
bb = zeros(n,1);   %# median bsxfun time for each array size
ntt = 100;         %# number of timing repetitions per array size
tt = zeros(ntt,1);
for i = 1:n
    r = rand(1,i*k);
    %# time bsxfun
    for it = 1:ntt
        tic
        x = bsxfun(@plus,a,r);
        tt(it) = toc;
    end
    bb(i) = median(tt);
    %# time repmat
    for it = 1:ntt
        tic
        y = repmat(a,1,i*k) + repmat(r,10,1);
        tt(it) = toc;
    end
    rr(i) = median(tt);
end
Solution 2:
In my case, I use bsxfun because it saves me from having to think about row versus column issues.
In order to write your example:
A = A - (ones(size(A, 1), 1) * mean(A));
I have to solve several problems:
- size(A,1) or size(A,2)
- ones(size(A,1),1) or ones(1,size(A,1))
- ones(size(A, 1), 1) * mean(A) or mean(A)*ones(size(A, 1), 1)
- mean(A) or mean(A,2)
When I use bsxfun, I just have to solve the last one:
a) mean(A) or mean(A,2)
You might think it is lazy or something, but when I use bsxfun, I have fewer bugs and I program faster.
Moreover, it is shorter, which improves typing speed and readability.
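As a rough illustration of that point, the sketch below centers a small matrix by columns and then by rows. The 3x4 test matrix and the variable names are assumptions made for the example. With bsxfun, the only thing that changes between the two cases is the dimension argument to mean, whereas the manual versions also need the orientation and length of the ones vector and the order of the product to be right.

```matlab
% A small non-square matrix, so that a wrong orientation would raise an error.
A = reshape(1:12, 3, 4);

% Subtract each column's mean: mean(A) is 1x4 and is expanded down the rows.
A_col = bsxfun(@minus, A, mean(A));

% Subtract each row's mean: mean(A,2) is 3x1 and is expanded across the columns.
A_row = bsxfun(@minus, A, mean(A, 2));

% Manual equivalents: the ones vector must have the right length and
% orientation, and the factors must be multiplied in the right order.
A_col_manual = A - ones(size(A, 1), 1) * mean(A);
A_row_manual = A - mean(A, 2) * ones(1, size(A, 2));

% Largest absolute difference between each pair of results.
disp(max(abs(A_col(:) - A_col_manual(:))));
disp(max(abs(A_row(:) - A_row_manual(:))));
```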