how do I calculate a rolling idxmax

consider the pd.Series s

import pandas as pd
import numpy as np

np.random.seed([3,1415])
s = pd.Series(np.random.randint(0, 10, 10), list('abcdefghij'))
s

a    0
b    2
c    7
d    3
e    8
f    7
g    0
h    6
i    8
j    6
dtype: int64

I want to get the index for the max value for the rolling window of 3

s.rolling(3).max()

a    NaN
b    NaN
c    7.0
d    7.0
e    8.0
f    8.0
g    8.0
h    7.0
i    8.0
j    8.0
dtype: float64

What I want is

a    None
b    None
c       c
d       c
e       e
f       e
g       e
h       f
i       i
j       i
dtype: object

What I've done

s.rolling(3).apply(np.argmax)

a    NaN
b    NaN
c    2.0
d    1.0
e    2.0
f    1.0
g    0.0
h    0.0
i    2.0
j    1.0
dtype: float64

which is obviously not what I want

There is no simple way to do that, because the argument that is passed to the rolling-applied function is a plain numpy array, not a pandas Series, so it doesn't know about the index. Moreover, the rolling functions must return a float result, so they can't directly return the index values if they're not floats.

Here is one approach:

>>> s.index[s.rolling(3).apply(np.argmax)[2:].astype(int)+np.arange(len(s)-2)]
Index([u'c', u'c', u'e', u'e', u'e', u'f', u'i', u'i'], dtype='object')

The idea is to take the argmax values and align them with the series by adding a value indicating how far along in the series we are. (That is, for the first argmax value we add zero, because it is giving us the index into a subsequence starting at index 0 in the original series; for the second argmax value we add one, because it is giving us the index into a subsequence starting at index 1 in the original series; etc.)

This gives the correct results, but doesn't include the two "None" values at the beginning; you'd have to add those back manually if you wanted them.

There is an open pandas issue to add rolling idxmax.

I used a generator

def idxmax(s, w):
    i = 0
    while i + w <= len(s):
        yield(s.iloc[i:i+w].idxmax())
        i += 1

pd.Series(idxmax(s, 3), s.index[2:])

c    c
d    c
e    e
f    e
g    e
h    f
i    i
j    i
dtype: object

Here's an approach using broadcasting -

maxidx = (s.values[np.arange(s.size-3+1)[:,None] + np.arange(3)]).argmax(1)
out = s.index[maxidx+np.arange(maxidx.size)]

This generates all the indices corresponding to the rolling windows, indexes into the extracted array version with those and thus gets the max indices for each window. For a more efficient indexing, we can use NumPy strides, like so -

arr = s.values
n = arr.strides[0]
maxidx = np.lib.stride_tricks.as_strided(arr, \
                   shape=(s.size-3+1,3), strides=(n,n)).argmax(1)

Just chiming in on how I solved a similar problem that I had. I did not want to find the index exactly, I wanted to find how long ago the max value happened. But this could be used to find the index as well.

I'm basically using the shift strategy, but I'm iterating over several shifts with a configurable length. It's probably slow, but it works good enough for me.

import pandas as pd


length = 5

data = [1, 2, 3, 4, 5, 4, 3, 4, 5, 6, 7, 6, 5, 4, 5, 4, 3]
df = pd.DataFrame(data, columns=['number'])
df['helper_max'] = df.rolling(length).max()

for i in range(length, -1, -1):
    # Set the column to what you want. You may grab the index 
    # if you wish, I wanted number of rows since max happened
    df.loc[df['number'].shift(i) == df['helper_max'], 'n_rows_ago_since_max'] = i

print(df)

Output:

    number  helper_max  n_rows_ago_since_max
0        1         NaN                   NaN
1        2         NaN                   NaN
2        3         NaN                   NaN
3        4         NaN                   NaN
4        5         5.0                   0.0
5        4         5.0                   1.0
6        3         5.0                   2.0
7        4         5.0                   3.0
8        5         5.0                   0.0
9        6         6.0                   0.0
10       7         7.0                   0.0
11       6         7.0                   1.0
12       5         7.0                   2.0
13       4         7.0                   3.0
14       5         7.0                   4.0
15       4         6.0                   4.0
16       3         5.0                   2.0

You can also simulate the rolling window by creating a DataFrame and use idxmax as follows:

window_values = pd.DataFrame({0: s, 1: s.shift(), 2: s.shift(2)})
s.index[np.arange(len(s)) - window_values.idxmax(1)]

Index(['a', 'b', 'c', 'c', 'e', 'e', 'e', 'f', 'i', 'i'], dtype='object', name=0)

As you can see, the first two terms are the idxmax as applied to the initial windows of lengths 1 and 2 rather than null values. It's not as efficient as the accepted answer and probably not a good idea for large windows but just another perspective.

how do I calculate a rolling idxmax

Related

Recent Posts