Does Pandas calculate ewm wrong?
When trying to calculate the exponential moving average (EMA) from financial data in a dataframe it seems that Pandas' ewm approach is incorrect.
The basics are well explained in the following link: http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages
According to the Pandas documentation, the approach taken is as follows (with the "adjust" parameter set to False):
weighted_average[0] = arg[0];
weighted_average[i] = (1-alpha) * weighted_average[i-1] + alpha * arg[i]
This in my view is incorrect. The "arg" should be (for example) the closing values. However, in the usual EMA definition the first value is the first average (i.e. the simple average of the first series of data of the length of the period selected), NOT the first closing value. The seed and arg[i] can therefore never come from the same data. Using the "min_periods" parameter does not seem to resolve this.
Can anyone explain how (or if) Pandas can be used to properly calculate the EMA of data?
There are several ways to initialize an exponential moving average, so I wouldn't say pandas is doing it wrong, just differently.
Here is a way to calculate it the way you want:
In [20]: s.head()
Out[20]:
0 22.27
1 22.19
2 22.08
3 22.17
4 22.18
Name: Price, dtype: float64
In [21]: span = 10
In [22]: sma = s.rolling(window=span, min_periods=span).mean()[:span]
In [24]: rest = s[span:]
In [25]: pd.concat([sma, rest]).ewm(span=span, adjust=False).mean()
Out[25]:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 22.221000
10 22.208091
11 22.241165
12 22.266408
13 22.328879
14 22.516356
15 22.795200
16 22.968800
17 23.125382
18 23.275312
19 23.339801
20 23.427110
21 23.507635
22 23.533520
23 23.471062
24 23.403596
25 23.390215
26 23.261085
27 23.231797
28 23.080561
29 22.915004
Name: Price, dtype: float64
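The steps above can be wrapped into a small helper and cross-checked against a plain loop. This is only a sketch: the names `sma_seeded_ema` and `manual_sma_seeded_ema` are made up for illustration and are not part of pandas, and the prices after the first five are placeholder values.

```python
import numpy as np
import pandas as pd


def sma_seeded_ema(s: pd.Series, span: int) -> pd.Series:
    """EMA whose first value is the SMA of the first `span` observations."""
    # SMA over the first `span` points: NaN everywhere except the last position.
    sma = s.rolling(window=span, min_periods=span).mean().iloc[:span]
    # Remaining raw observations, untouched.
    rest = s.iloc[span:]
    # With adjust=False the recursion starts at the first non-NaN value,
    # which here is the SMA seed.
    return pd.concat([sma, rest]).ewm(span=span, adjust=False).mean()


def manual_sma_seeded_ema(values, span):
    """Reference loop: seed with the SMA, then apply the usual recursion."""
    alpha = 2 / (span + 1)
    out = [np.nan] * (span - 1)
    ema = sum(values[:span]) / span
    out.append(ema)
    for x in values[span:]:
        ema = (1 - alpha) * ema + alpha * x
        out.append(ema)
    return out


# Hypothetical sample data (first five values taken from the output above).
prices = pd.Series([22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43], name="Price")
result = sma_seeded_ema(prices, span=3)
expected = manual_sma_seeded_ema(prices.tolist(), span=3)
print(np.allclose(result.to_numpy(), expected, equal_nan=True))
```

If the two agree, the concat trick is doing exactly what the SMA-seeded definition asks for: the first `span - 1` entries are NaN, the next is the simple average, and everything after follows the normal recursion.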
You can compute EWMA using alpha or coefficient (span) in the Pandas ewm function.
Formula for using alpha: (1 - alpha) * previous_val + alpha * current_val
where alpha = 1 / period
Formula for using coeff: ((current_val - previous_val) * coeff) + previous_val
where coeff = 2 / (period + 1)
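Here is how you can use Pandas to compute the above formulas:
# df is the DataFrame, base the input column, target the output column;
# alpha is assumed to be a boolean flag choosing the smoothing constant.
# Seed with the SMA of the first `period` rows, then append the raw values.
con = pd.concat([df[:period][base].rolling(window=period).mean(), df[period:][base]])
if alpha:
    # smoothing constant 1 / period
    df[target] = con.ewm(alpha=1 / period, adjust=False).mean()
else:
    # smoothing constant 2 / (period + 1)
    df[target] = con.ewm(span=period, adjust=False).mean()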
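Note that the two formulas above are the same recursion written two ways; only the smoothing constant differs (1 / period versus 2 / (period + 1)). A minimal sketch to check this against pandas, using a hypothetical helper `recursive_ema` and arbitrary sample values:

```python
import numpy as np
import pandas as pd


def recursive_ema(values, smoothing):
    """Plain recursion: ema[i] = ema[i-1] + smoothing * (x[i] - ema[i-1])."""
    ema = [values[0]]
    for x in values[1:]:
        prev = ema[-1]
        ema.append((x - prev) * smoothing + prev)
    return np.array(ema)


period = 4
data = pd.Series([1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0])

# alpha=1/period corresponds to the "alpha" formula.
by_alpha = data.ewm(alpha=1 / period, adjust=False).mean()
# span=period is shorthand for alpha = 2 / (period + 1), the "coeff" formula.
by_span = data.ewm(span=period, adjust=False).mean()

print(np.allclose(by_alpha, recursive_ema(data.tolist(), 1 / period)))
print(np.allclose(by_span, recursive_ema(data.tolist(), 2 / (period + 1))))
```

So choosing between `alpha=` and `span=` in `ewm` only changes how fast old values decay, not the shape of the calculation.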
Here's an example of how Pandas calculates both adjusted and non-adjusted ewm:
import numpy as np
import pandas as pd

name = 'closing'
series = pd.Series([1, 2, 3, 5, 8, 13, 21, 34], name=name).to_frame()
period = 4
alpha = 2/(1+period)
# Both variants start from the first observation.
series[name+'_ewma'] = np.nan
series.loc[0, name+'_ewma'] = series[name].iloc[0]
series[name+'_ewma_adjust'] = np.nan
series.loc[0, name+'_ewma_adjust'] = series[name].iloc[0]
for i in range(1, len(series)):
    # adjust=False: simple recursion on the previous EWMA value
    series.loc[i, name+'_ewma'] = (1-alpha) * series.loc[i-1, name+'_ewma'] + alpha * series.loc[i, name]
    # adjust=True: weighted average of all observations so far, weights (1-alpha)**(i-t)
    adjusted_weights = np.array([(1-alpha)**(i-t) for t in range(i+1)])
    series.loc[i, name+'_ewma_adjust'] = np.sum(series.iloc[0:i+1][name].values * adjusted_weights) / adjusted_weights.sum()
print(series)
print("diff adjusted=False -> ", np.sum(series[name+'_ewma'] - series[name].ewm(span=period, adjust=False).mean()))
print("diff adjusted=True -> ", np.sum(series[name+'_ewma_adjust'] - series[name].ewm(span=period, adjust=True).mean()))
The mathematical formula can be found at https://github.com/pandas-dev/pandas/issues/8861