Pandas efficiently interpolate sections of a larger dataframe

Solution 1:

It seems like it should be quicker to group by the currency prefix and then interpolate. Unfortunately, when I run your code I don't actually get the "filtered interpolated DF" that you list (perhaps you've left out the part of the interpolate call where you specify 15-minute intervals?). You do get a slight speedup if you use str.startswith instead of str[:3]:

%%timeit
for ccy in ccy_prefix:
  df[df.index.str[:3]==ccy] = df[df.index.str[:3]==ccy].interpolate(limit_direction='forward')
# 10 loops, best of 5: 25.9 ms per loop

As opposed to:

%%timeit
for ccy in ccy_prefix:
  df[df.index.str.startswith(ccy)] = df[df.index.str.startswith(ccy)].interpolate(limit_direction='forward')
# 10 loops, best of 5: 24.1 ms per loop
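A quick way to confirm that the two masks timed above select the same rows is to build both on a small stand-in index. The labels below are hypothetical (the question's real index isn't reproduced here), and the helper names mask_slice and mask_startswith are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical index labels: a 3-letter currency prefix followed by a suffix.
idx = pd.Index(["EURUSD_1", "EURUSD_2", "GBPUSD_1", "USDJPY_1", "EURGBP_1"])

def mask_slice(index, ccy):
    # Slice the first three characters of every label, then compare.
    return np.asarray(index.str[:3] == ccy, dtype=bool)

def mask_startswith(index, ccy):
    # Vectorised prefix test; no intermediate Index of substrings is built.
    return np.asarray(index.str.startswith(ccy), dtype=bool)

for ccy in ["EUR", "GBP", "USD"]:
    a = mask_slice(idx, ccy)
    b = mask_startswith(idx, ccy)
    print(ccy, a.tolist(), bool((a == b).all()))
```

The two tests agree here only because every prefix is exactly three characters long; with a shorter or longer ccy, str[:3] == ccy and str.startswith(ccy) would select different rows.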

Perhaps a better solution, based on a comment provided here, is to create a new column holding the currency prefixes, then group by that column and interpolate each group:

df['ccy_prefix'] = df.index.str[:3]

def interpolator(df):
  return df.interpolate(limit_direction='forward')
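To see what interpolator does to a single currency's rows, here is a sketch on a hypothetical one-column slice (the column name rate is an assumption). With the default linear method and limit_direction='forward', the leading NaN is left alone, the interior gap is filled linearly, and the trailing NaN is filled with the last valid value.

```python
import numpy as np
import pandas as pd

def interpolator(df):
    # Same helper as in the answer: linear interpolation, filling forward only.
    return df.interpolate(limit_direction='forward')

# Hypothetical single-currency slice with a leading, an interior and a trailing gap.
one_ccy = pd.DataFrame({"rate": [np.nan, 1.0, np.nan, 3.0, np.nan]})
out = interpolator(one_ccy)
print(out["rate"].tolist())
```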

Then this should be the quickest of them all:

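# group_keys=False keeps the original index; since pandas 2.0 the group key is otherwise prepended as an extra index level
df = df.groupby('ccy_prefix', group_keys=False).apply(interpolator)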
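The point of interpolating per currency rather than over the whole frame is that gaps must not be filled from a neighbouring currency's values. The sketch below checks this on hypothetical data (the labels and the rate column are made up). As a variant of the answer's approach, it passes the prefixes straight to groupby instead of storing them in a ccy_prefix column, so no string column is handed to interpolate; it then compares the result with the per-prefix loop from the question.

```python
import numpy as np
import pandas as pd

def interpolator(df):
    return df.interpolate(limit_direction='forward')

# Hypothetical data: two currencies stacked in one frame, each with gaps.
df = pd.DataFrame(
    {"rate": [1.0, np.nan, 3.0, np.nan, 10.0, np.nan, 30.0]},
    index=["EUR_a", "EUR_b", "EUR_c", "EUR_d", "GBP_a", "GBP_b", "GBP_c"],
)

# Loop approach: interpolate each prefix slice and write it back.
looped = df.copy()
for ccy in ["EUR", "GBP"]:
    mask = np.asarray(looped.index.str.startswith(ccy), dtype=bool)
    looped.loc[mask] = interpolator(looped.loc[mask])

# Groupby approach: group on the 3-character prefix array, keep the original index.
grouped = df.groupby(df.index.str[:3], group_keys=False).apply(interpolator)

# Interpolating the whole frame at once would let EUR_d borrow from GBP_a.
whole = interpolator(df)
print(grouped["rate"].tolist())
print(whole.loc["EUR_d", "rate"])
```

Here EUR_d stays at 3.0 under the grouped version, while the whole-frame call fills it with 6.5, halfway between EUR_c and GBP_a.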
