Difference between asfreq and resample
resample is more general than asfreq. For example, using resample I can pass an arbitrary function to perform binning over a Series or DataFrame object in bins of arbitrary size. asfreq is a concise way of changing the frequency of a DatetimeIndex object. It also provides padding functionality.
As the pandas documentation says, asfreq is a thin wrapper around a call to date_range + a call to reindex. See here for an example.
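To make the "thin wrapper" description concrete, here is a minimal sketch, assuming a reasonably recent pandas. The series s and its values are made up for illustration. It builds the target index with date_range and then reindexes, which should give the same result as calling asfreq directly.

```python
import pandas as pd

# Hypothetical data: three observations, three business days apart.
s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2010-01-01", periods=3, freq="3B"))

# asfreq in a single call ...
via_asfreq = s.asfreq("B")

# ... and the two steps it is described as wrapping.
new_index = pd.date_range(s.index.min(), s.index.max(), freq="B")
via_reindex = s.reindex(new_index)

# Same index, same values, NaN on the newly created dates.
print(via_asfreq.equals(via_reindex))
```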
An example of resample that I use in my daily work is computing the number of spikes of a neuron in 1-second bins by resampling a large boolean array where True means "spike" and False means "no spike". I can do that as easily as large_bool.resample('S').sum() (older pandas versions spelled this large_bool.resample('S', how='sum')). Kind of neat!
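A small runnable version of the spike-counting idea might look like the following. The sampling rate, the spike positions, and the name spikes_per_second are all invented for illustration. A Timedelta is passed as the rule, on the assumption that this avoids differences in how pandas versions spell the one-second alias.

```python
import numpy as np
import pandas as pd

# Hypothetical spike train: one boolean sample every 100 ms for 3 seconds.
index = pd.date_range("2020-01-01", periods=30, freq="100ms")
spikes = np.zeros(30, dtype=bool)
spikes[[2, 5, 7, 14, 25, 26, 29]] = True  # made-up spike positions
large_bool = pd.Series(spikes, index=index)

# Sum within 1-second bins; each True counts as 1.
spikes_per_second = large_bool.resample(pd.Timedelta(seconds=1)).sum()
print(spikes_per_second)
```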
asfreq can be used when you want to change a DatetimeIndex to have a different frequency while retaining the same values at the current index.
Here's an example where they are equivalent:
In [5]: import numpy as np; import pandas as pd
In [6]: dr = pd.date_range('1/1/2010', periods=3, freq='3B')
In [7]: raw = np.random.randn(3)
In [8]: ts = pd.Series(raw, index=dr)
In [9]: ts
Out[9]:
2010-01-01 -1.948
2010-01-06 0.112
2010-01-11 -0.117
Freq: 3B, dtype: float64
In [10]: ts.asfreq('B')
Out[10]:
2010-01-01 -1.948
2010-01-04 NaN
2010-01-05 NaN
2010-01-06 0.112
2010-01-07 NaN
2010-01-08 NaN
2010-01-11 -0.117
Freq: B, dtype: float64
In [11]: ts.resample('B').mean()
Out[11]:
2010-01-01 -1.948
2010-01-04 NaN
2010-01-05 NaN
2010-01-06 0.112
2010-01-07 NaN
2010-01-08 NaN
2010-01-11 -0.117
Freq: B, dtype: float64
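In current pandas, resample returns a Resampler object rather than a Series, so the equivalence needs an explicit method call. The sketch below uses the fixed values from the output above in place of random numbers, and assumes each business-day bin holds at most one observation.

```python
import pandas as pd

ts = pd.Series([-1.948, 0.112, -0.117],
               index=pd.date_range("2010-01-01", periods=3, freq="3B"))

a = ts.asfreq("B")
b = ts.resample("B").asfreq()  # Resampler.asfreq(): reindex only, no aggregation
c = ts.resample("B").mean()    # mean of a one-element bin is the element itself

# All three agree here because no bin contains more than one value.
print(a.equals(b), a.equals(c))
```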
As far as when to use either: it depends on the problem you have in mind...care to share?
Let me use an example to illustrate:
import pandas as pd
# generate a series of 365 days
# index = 20190101, 20190102, ... 20191231
# values = [0,1,...364]
ts = pd.Series(range(365),
               index=pd.date_range(start='20190101', end='20191231', freq='D'))
ts.head()
output:
2019-01-01 0
2019-01-02 1
2019-01-03 2
2019-01-04 3
2019-01-05 4
Freq: D, dtype: int64
Now, convert the data to quarterly frequency with asfreq:
ts.asfreq(freq='Q')
output:
2019-03-31 89
2019-06-30 180
2019-09-30 272
2019-12-31 364
Freq: Q-DEC, dtype: int64
asfreq() returns a Series object containing only the value on the last day of each quarter.
ts.resample('Q')
output:
DatetimeIndexResampler [freq=<QuarterEnd: startingMonth=12>, axis=0, closed=right, label=right, convention=start, base=0]
resample returns a DatetimeIndexResampler and you cannot see what's actually inside. Think of it as the groupby method. It creates a list of bins (groups):
bins = ts.resample('Q')
bins.groups
output:
{Timestamp('2019-03-31 00:00:00', freq='Q-DEC'): 90,
Timestamp('2019-06-30 00:00:00', freq='Q-DEC'): 181,
Timestamp('2019-09-30 00:00:00', freq='Q-DEC'): 273,
Timestamp('2019-12-31 00:00:00', freq='Q-DEC'): 365}
Nothing seems different so far except for the return type. Let's calculate the average of each quarter:
# (89+180+272+364)/4 = 226.25
ts.asfreq(freq='Q').mean()
output:
226.25
When mean() is applied, it outputs the average of all the values. Note that this is not the average of each quarter, but the average of the last day of each quarter.
To calculate the average of each quarter:
ts.resample('Q').mean()
output:
2019-03-31 44.5
2019-06-30 135.0
2019-09-30 226.5
2019-12-31 318.5
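The two results above can be reproduced side by side with the sketch below. It passes a QuarterEnd offset object instead of the 'Q' string, on the assumption that this sidesteps differences in how pandas versions spell the quarter-end alias. The variable names are illustrative.

```python
import pandas as pd

ts = pd.Series(range(365),
               index=pd.date_range(start="20190101", end="20191231", freq="D"))

# Quarter-end offset (quarters ending in March, June, September, December).
quarter_end = pd.offsets.QuarterEnd()

# asfreq keeps one existing value per quarter end.
last_day_values = ts.asfreq(quarter_end)

# resample groups every day of the quarter, then aggregates.
quarterly_means = ts.resample(quarter_end).mean()

print(last_day_values.tolist())   # values on the last day of each quarter
print(last_day_values.mean())     # mean of those four values only
print(quarterly_means.tolist())   # mean over all days in each quarter
```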
You can perform more powerful operations with resample() than asfreq().
Think of resample as groupby + every method that you can call after groupby (e.g. mean, sum, apply, you name it).
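As a rough illustration of the groupby analogy, the sketch below applies several aggregations, an arbitrary function, and an explicit groupby with a time-based pd.Grouper to the same quarterly bins. The variable names are made up, and the QuarterEnd offset object is used in place of the 'Q' string as an assumption about version portability.

```python
import pandas as pd

ts = pd.Series(range(365),
               index=pd.date_range(start="20190101", end="20191231", freq="D"))
quarter_end = pd.offsets.QuarterEnd()

# Several aggregations at once, as with groupby(...).agg(...).
summary = ts.resample(quarter_end).agg(["min", "max", "count"])

# An arbitrary function per bin, as with groupby(...).apply(...).
value_range = ts.resample(quarter_end).apply(lambda x: x.max() - x.min())

# The same bins expressed through groupby with a time-based Grouper.
via_groupby = ts.groupby(pd.Grouper(freq=quarter_end)).mean()
via_resample = ts.resample(quarter_end).mean()

print(summary)
print(via_groupby.tolist() == via_resample.tolist())
```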
Think of asfreq as a filter mechanism with limited fillna() capabilities (in fillna(), you can specify limit, but asfreq() does not support it).
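The filling options can be compared with a short sketch like this one. The data is hypothetical; it uses the method and fill_value parameters of asfreq, and applies the limit in a separate ffill step as one possible workaround.

```python
import pandas as pd

# Hypothetical series with two missing business days between observations.
ts = pd.Series([1.0, 2.0, 3.0],
               index=pd.date_range("2010-01-01", periods=3, freq="3B"))

# asfreq can forward-fill, but always fills every new row.
filled_all = ts.asfreq("B", method="ffill")

# asfreq can also put a constant into the newly created rows.
filled_const = ts.asfreq("B", fill_value=0.0)

# To cap the fill, upsample first and apply the limit afterwards.
filled_limited = ts.asfreq("B").ffill(limit=1)

print(filled_limited)
```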