Pandas every nth row

Dataframe.resample() works only with timeseries data. I cannot find a way of getting every nth row from non-timeseries data. What is the best method?

I'd use iloc, which takes a row/column slice, both based on integer position and following normal python syntax. If you want every 5th row:

df.iloc[::5, :]

Though @chrisb's accepted answer does answer the question, I would like to add to it the following.

A simple method I use to get the nth data or drop the nth row is the following:

df1 = df[df.index % 3 != 0]  # Excludes every 3rd row starting from 0
df2 = df[df.index % 3 == 0]  # Selects every 3rd raw starting from 0

This arithmetic based sampling has the ability to enable even more complex row-selections.

This assumes, of course, that you have an index column of ordered, consecutive, integers starting at 0.

There is an even simpler solution to the accepted answer that involves directly invoking df.__getitem__.

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

For example, to get every 2 rows, you can do

df[::2]

   a  b  c
0  x  x  x
2  x  x  x
4  x  x  x

There's also GroupBy.first/GroupBy.head, you group on the index:

df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')

df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

The index is floor-divved by the stride (2, in this case). If the index is non-numeric, instead do

# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

Pandas every nth row

Related

Recent Posts