How to broadcast pandas series with different indices?
I have two pandas series, each with a different index:
In [1]: import numpy as np; import pandas as pd
In [2]: a = pd.Series(range(5), index=pd.Index(list('abcde'), name='index'))
In [3]: b = pd.Series(range(4), index=pd.Index(list('ABCD'), name='BIG_INDEX'))
What I would like to do is something along the lines of a.mul(b, axis=1), to instruct pandas to broadcast b along axis 1 before applying a ufunc (e.g. multiply, raise to a power, etc.). Is there a better way of doing this than using apply?
In [4]: a.apply(lambda x: x*b)
Out[4]:
BIG_INDEX A B C D
index
a 0 0 0 0
b 0 1 2 3
c 0 2 4 6
d 0 3 6 9
e 0 4 8 12
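For context, here is a minimal sketch (assuming a reasonably recent pandas) of why plain Series arithmetic does not give this table. a.mul(b) aligns the two Series on their index labels before multiplying, and since 'abcde' and 'ABCD' share no labels, the result is indexed by their union with every value missing.

```python
import numpy as np
import pandas as pd

a = pd.Series(range(5), index=pd.Index(list('abcde'), name='index'))
b = pd.Series(range(4), index=pd.Index(list('ABCD'), name='BIG_INDEX'))

# Series arithmetic aligns on index labels. The two indices share no labels,
# so the result covers their union and every value is NaN.
aligned = a.mul(b)
print(len(aligned))          # 9 labels: A-D plus a-e
print(aligned.isna().all())  # True
```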
You could use numpy.outer to perform the calculation as if a and b were arrays:
In [285]: pd.DataFrame(np.outer(a, b), columns=b.index, index=a.index)
Out[285]:
BIG_INDEX A B C D
index
a 0 0 0 0
b 0 1 2 3
c 0 2 4 6
d 0 3 6 9
e 0 4 8 12
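This is quicker than calling a.apply(lambda x: x*b), because np.outer builds the whole table in one vectorized NumPy call instead of calling the lambda once per element of a.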
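The same table can also be built with ordinary NumPy broadcasting, which is closer to what the question describes as "broadcasting b along axis 1". This is a sketch of an alternative, not a claim about which is fastest: a is reshaped into a column so that NumPy stretches both operands to a common (5, 4) shape, and the result is then checked against the np.outer version.

```python
import numpy as np
import pandas as pd

a = pd.Series(range(5), index=pd.Index(list('abcde'), name='index'))
b = pd.Series(range(4), index=pd.Index(list('ABCD'), name='BIG_INDEX'))

# a becomes a (5, 1) column and b stays a (4,) row; broadcasting expands
# both to (5, 4) before multiplying element-wise.
product = a.to_numpy()[:, np.newaxis] * b.to_numpy()
table = pd.DataFrame(product, index=a.index, columns=b.index)

# Identical to the np.outer construction, including the index names.
outer_table = pd.DataFrame(np.outer(a, b), columns=b.index, index=a.index)
pd.testing.assert_frame_equal(table, outer_table)
```

Swapping `*` for another operator (for example `**` or `+`) gives the corresponding table without any other changes.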
By the way, every NumPy ufunc has 5 methods: outer, accumulate, reduce, reduceat, and at (they are meaningful for binary ufuncs such as np.multiply and np.add). So another way to write the solution above is
In [34]: pd.DataFrame(np.multiply.outer(a, b), columns=b.index, index=a.index)
Out[34]:
BIG_INDEX A B C D
index
a 0 0 0 0
b 0 1 2 3
c 0 2 4 6
d 0 3 6 9
e 0 4 8 12
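To make the five methods listed above concrete, here is a small sketch using np.add on toy arrays (the arrays are made up for illustration and are not from the question):

```python
import numpy as np

x = np.array([1, 2, 3, 4])

total = np.add.reduce(x)               # 10, i.e. 1 + 2 + 3 + 4
running = np.add.accumulate(x)         # running sums: 1, 3, 6, 10
segments = np.add.reduceat(x, [0, 2])  # sums of x[0:2] and x[2:]: 3, 7
pairs = np.add.outer([1, 2], [10, 20]) # every pairwise sum: rows 11 21 / 12 22

# at() works in place and unbuffered, so a repeated index is applied
# once per occurrence (index 0 is incremented twice here).
counts = np.zeros(4, dtype=int)
np.add.at(counts, [0, 0, 2], 1)        # counts becomes 2, 0, 1, 0
```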
And when written this way, it is clear how to apply the same idea to any binary NumPy ufunc. For example, to make an addition table out of a and b, call np.add's outer method:
In [37]: pd.DataFrame(np.add.outer(a, b), columns=b.index, index=a.index)
Out[37]:
BIG_INDEX A B C D
index
a 0 1 2 3
b 1 2 3 4
c 2 3 4 5
d 3 4 5 6
e 4 5 6 7
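Since only the ufunc changes between these examples, the pattern can be wrapped in a small function. `outer_frame` below is a hypothetical helper, not part of pandas or NumPy; it assumes a binary ufunc and applies it to the "raise to the power of" case from the question.

```python
import numpy as np
import pandas as pd


def outer_frame(ufunc, left, right):
    """Hypothetical helper: apply a binary NumPy ufunc to every pair
    (left[i], right[j]) and label the result with both Series' indices."""
    values = ufunc.outer(left.to_numpy(), right.to_numpy())
    return pd.DataFrame(values, index=left.index, columns=right.index)


a = pd.Series(range(5), index=pd.Index(list('abcde'), name='index'))
b = pd.Series(range(4), index=pd.Index(list('ABCD'), name='BIG_INDEX'))

powers = outer_frame(np.power, a, b)  # entry (i, j) is a[i] ** b[j]
print(powers.loc['c', 'D'])           # 8, i.e. 2 ** 3
```

Passing np.multiply or np.add instead reproduces the two tables above.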