Access index in pandas.Series.apply
Let's say I have a MultiIndex Series s:
>>> s
values
a b
1 2 0.1
3 6 0.3
4 4 0.7
and I want to apply a function which uses the index of the row:
def f(x):
    # conditions or computations using the indexes
    if x.index[0] and ...:
        other = sum(x.index) + ...
    return something
How can I do s.apply(f) for such a function? What is the recommended way to perform this kind of operation? I expect to obtain a new Series with the values resulting from this function applied to each row, and with the same MultiIndex.
Solution 1:
I don't believe apply has access to the index; it passes each value to the function as a NumPy scalar, not as a Series, as you can see:
In [27]: s.apply(lambda x: type(x))
Out[27]:
a b
1 2 <type 'numpy.float64'>
3 6 <type 'numpy.float64'>
4 4 <type 'numpy.float64'>
To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.
Series(s.reset_index().apply(f, axis=1).values, index=s.index)
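As a minimal sketch of the workaround above, here is the same pattern run on the example Series from the question. The function f and its computation are made up for illustration, and passing name="value" to reset_index is my own choice so that the value column gets a readable label:

```python
import pandas as pd

# Example Series from the question, with index levels named a and b.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

def f(row):
    # Hypothetical computation: row["a"] and row["b"] are the former index
    # levels, row["value"] is the original Series value.
    return row["a"] + row["b"] + row["value"]

# Promote the index levels to columns, apply f row by row, then rebuild
# a Series on the original MultiIndex.
flat = s.reset_index(name="value")
result = pd.Series(flat.apply(f, axis=1).values, index=s.index)
print(result)
```

The result keeps the original MultiIndex because only the raw values are taken from the intermediate frame.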
Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or iterate over s.items(), which is likely to be slower, perhaps depending on exactly what f does.
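For comparison, a rough sketch of those two alternatives on the question's example data. The computation (adding both index levels to the value) and the two-argument helper f are assumptions chosen only to make the example concrete:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

# Alternative 1: pull each index level out as an array and combine them
# with the values in one vectorized expression.
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()
vectorized = s + a + b

# Alternative 2: iterate over (index label, value) pairs. For a MultiIndex,
# each label is a tuple with one entry per level.
def f(idx, value):
    # Hypothetical helper that receives the index tuple explicitly.
    level_a, level_b = idx
    return level_a + level_b + value

looped = pd.Series([f(idx, value) for idx, value in s.items()], index=s.index)
```

The vectorized form only works when the computation can be written as array operations; the loop handles arbitrary Python logic at the cost of speed.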
Solution 2:
Convert it to a DataFrame, and return scalars if you want (so that the result is a Series).
Setup
In [11]: s = Series([1,2,3],dtype='float64',index=['a','b','c'])
In [12]: s
Out[12]:
a 1
b 2
c 3
dtype: float64
Printing function
In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....:
In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a 1
b 2
c 3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a 1
b 2
c 3
Name: 0, dtype: float64
Out[14]:
0
a 1
b 2
c 3
Since you can return anything here, just return the scalars (access the index via the name attribute):
In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
Out[15]:
a 5
b 2
c 3
dtype: float64
Solution 3:
Convert to a DataFrame and apply along the rows. You can access the index as x.name. x is also a Series now, with 1 value.
s.to_frame(0).apply(f, axis=1)[0]
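Applied to the MultiIndex Series from the question, this might look like the sketch below. The body of f is an assumption made up for illustration. Because this f returns a scalar, the result is already a Series and the trailing [0] is not needed; that selection only applies when f returns a Series.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

def f(x):
    # x is a one-element Series holding the row's value; x.name is the row's
    # index label, which is a tuple (a, b) for a two-level MultiIndex.
    a, b = x.name
    return x.iloc[0] + a + b  # hypothetical computation using the index

result = s.to_frame(0).apply(f, axis=1)
print(result)
```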