How to deal with statsmodels.api OLS efficiency
Solution 1:
You could define a minimum acceptable sample size (either in total or for one/all regressor columns) in your helper function, and return NaNs instead of fitting whenever the data fall short of it, for instance:
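import numpy as np
import pandas as pd
import statsmodels.api as sm

def func_reg_err(df, yvar, xvar, alpha=True, min_samples=30):
    # Too few observations in total, or in any column (non-null count per
    # column, axis=0): return an all-NaN series instead of fitting
    if len(df) < min_samples or (df.notnull().sum(axis=0) < min_samples).any():
        return pd.Series(np.nan, index=df.index)
    # Carry on with your regression
    y = df[yvar].copy()
    x = pd.DataFrame(df[xvar].copy())
    if alpha:
        # Add a constant column so the model fits an intercept
        x['intercept'] = 1.
    mod = sm.OLS(y, x, missing='drop')
    res = mod.fit()
    # Residuals for every row; rows with a missing y or x value come out as NaN
    err = y - mod.predict(res.params, x)
    return err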