how do I calculate a rolling idxmax
consider the pd.Series
s
import pandas as pd
import numpy as np
np.random.seed([3,1415])
s = pd.Series(np.random.randint(0, 10, 10), list('abcdefghij'))
s
a 0
b 2
c 7
d 3
e 8
f 7
g 0
h 6
i 8
j 6
dtype: int64
I want to get the index for the max value for the rolling window of 3
s.rolling(3).max()
a NaN
b NaN
c 7.0
d 7.0
e 8.0
f 8.0
g 8.0
h 7.0
i 8.0
j 8.0
dtype: float64
What I want is
a None
b None
c c
d c
e e
f e
g e
h f
i i
j i
dtype: object
What I've done
s.rolling(3).apply(np.argmax)
a NaN
b NaN
c 2.0
d 1.0
e 2.0
f 1.0
g 0.0
h 0.0
i 2.0
j 1.0
dtype: float64
which is obviously not what I want
There is no simple way to do that, because the argument that is passed to the rolling-applied function is a plain numpy array, not a pandas Series, so it doesn't know about the index. Moreover, the rolling functions must return a float result, so they can't directly return the index values if they're not floats.
Here is one approach:
>>> s.index[s.rolling(3).apply(np.argmax)[2:].astype(int)+np.arange(len(s)-2)]
Index([u'c', u'c', u'e', u'e', u'e', u'f', u'i', u'i'], dtype='object')
The idea is to take the argmax values and align them with the series by adding a value indicating how far along in the series we are. (That is, for the first argmax value we add zero, because it is giving us the index into a subsequence starting at index 0 in the original series; for the second argmax value we add one, because it is giving us the index into a subsequence starting at index 1 in the original series; etc.)
This gives the correct results, but doesn't include the two "None" values at the beginning; you'd have to add those back manually if you wanted them.
There is an open pandas issue to add rolling idxmax.
I used a generator
def idxmax(s, w):
i = 0
while i + w <= len(s):
yield(s.iloc[i:i+w].idxmax())
i += 1
pd.Series(idxmax(s, 3), s.index[2:])
c c
d c
e e
f e
g e
h f
i i
j i
dtype: object
Here's an approach using broadcasting
-
maxidx = (s.values[np.arange(s.size-3+1)[:,None] + np.arange(3)]).argmax(1)
out = s.index[maxidx+np.arange(maxidx.size)]
This generates all the indices corresponding to the rolling windows, indexes into the extracted array version with those and thus gets the max indices for each window. For a more efficient indexing, we can use NumPy strides
, like so -
arr = s.values
n = arr.strides[0]
maxidx = np.lib.stride_tricks.as_strided(arr, \
shape=(s.size-3+1,3), strides=(n,n)).argmax(1)
Just chiming in on how I solved a similar problem that I had. I did not want to find the index exactly, I wanted to find how long ago the max value happened. But this could be used to find the index as well.
I'm basically using the shift strategy, but I'm iterating over several shifts with a configurable length. It's probably slow, but it works good enough for me.
import pandas as pd
length = 5
data = [1, 2, 3, 4, 5, 4, 3, 4, 5, 6, 7, 6, 5, 4, 5, 4, 3]
df = pd.DataFrame(data, columns=['number'])
df['helper_max'] = df.rolling(length).max()
for i in range(length, -1, -1):
# Set the column to what you want. You may grab the index
# if you wish, I wanted number of rows since max happened
df.loc[df['number'].shift(i) == df['helper_max'], 'n_rows_ago_since_max'] = i
print(df)
Output:
number helper_max n_rows_ago_since_max
0 1 NaN NaN
1 2 NaN NaN
2 3 NaN NaN
3 4 NaN NaN
4 5 5.0 0.0
5 4 5.0 1.0
6 3 5.0 2.0
7 4 5.0 3.0
8 5 5.0 0.0
9 6 6.0 0.0
10 7 7.0 0.0
11 6 7.0 1.0
12 5 7.0 2.0
13 4 7.0 3.0
14 5 7.0 4.0
15 4 6.0 4.0
16 3 5.0 2.0
You can also simulate the rolling window by creating a DataFrame
and use idxmax
as follows:
window_values = pd.DataFrame({0: s, 1: s.shift(), 2: s.shift(2)})
s.index[np.arange(len(s)) - window_values.idxmax(1)]
Index(['a', 'b', 'c', 'c', 'e', 'e', 'e', 'f', 'i', 'i'], dtype='object', name=0)
As you can see, the first two terms are the idxmax
as applied to the initial windows of lengths 1 and 2 rather than null values.
It's not as efficient as the accepted answer and probably not a good idea for large windows but just another perspective.