Pandas efficiently interpolate sections of a larger dataframe
Solution 1:
It seems like it should be quicker to groupby and then interpolate. Unfortunately, when I run your code I don't actually get the "filtered interpolated DF" that you list (perhaps you've left out some part of the interpolate where you specify that it should be 15 minute intervals?). You get a slight speedup if you use str.startswith instead of str[:3]:
%%timeit
for ccy in ccy_prefix:
    df[df.index.str[:3] == ccy] = df[df.index.str[:3] == ccy].interpolate(limit_direction='forward')
# 10 loops, best of 5: 25.9 ms per loop
As opposed to:
%%timeit
for ccy in ccy_prefix:
    df[df.index.str.startswith(ccy)] = df[df.index.str.startswith(ccy)].interpolate(limit_direction='forward')
# 10 loops, best of 5: 24.1 ms per loop
Perhaps a better solution is to create a new column with the currency prefixes and then groupby and interpolate, going from a comment provided here.
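# Tag each row with the first three characters of its index label
df['ccy_prefix'] = df.index.str[:3]

def interpolator(group):
    # Fill gaps within one currency group only, filling forward
    return group.interpolate(limit_direction='forward')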
Then this should be the quickest of them all:
df = df.groupby('ccy_prefix').apply(interpolator)
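To make the groupby approach easier to try out, here is a minimal self-contained sketch. The toy data, the "rate" column name and the index labels are made up for illustration and are not from the question. It also groups by the prefix values directly instead of adding a column, and passes group_keys=False so the original index is kept; both choices are assumptions on my part, since the behaviour of apply with grouping columns and group keys differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: index labels start with a 3-letter currency prefix.
df = pd.DataFrame(
    {"rate": [1.0, np.nan, 3.0, np.nan, 10.0, np.nan, 30.0]},
    index=["USD1", "USD2", "USD3", "EUR1", "EUR2", "EUR3", "EUR4"],
)

# One prefix per row, aligned with the rows by position.
prefix = df.index.str[:3]


def interpolator(group):
    # Linear interpolation inside a single currency group, filling forward only.
    return group.interpolate(limit_direction="forward")


# group_keys=False avoids adding the prefix as an extra index level.
result = df.groupby(prefix, group_keys=False).apply(interpolator)
print(result)
```

Here USD2 is filled from its USD neighbours, while EUR1 stays NaN because it is a leading gap within the EUR group. Interpolating the whole frame at once would instead fill EUR1 using the last USD value, which is the cross-currency bleed that grouping is meant to avoid.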