How to deal with statsmodels.api OLS efficiency

Solution 1:

You could define a minimum acceptable sample size in your helper function (either for the frame as a whole or for one or all of the regressor columns) and return NaNs when it is not met. For instance:

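import numpy as np
import pandas as pd
import statsmodels.api as sm

def func_reg_err(df, yvar, xvar, alpha=True, min_samples=30):
    # Columns actually used in the regression (xvar may be a name or a list)
    cols = [yvar] + (list(xvar) if isinstance(xvar, (list, tuple)) else [xvar])

    # Too few rows, or too few non-null values in any column: return NaNs only
    if len(df) < min_samples or (df[cols].notnull().sum() < min_samples).any():
        return pd.Series(np.nan, index=df.index)

    # Carry on with your regression
    y = df[yvar].copy()
    x = pd.DataFrame(df[xvar].copy())
    if alpha:
        x['intercept'] = 1.
    mod = sm.OLS(y, x, missing='drop')
    res = mod.fit()
    # Residuals on the original index (NaN where a row had missing data)
    err = y - mod.predict(res.params, x)

    return err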
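
The two variants of the check (total length versus non-null count per column) can also be looked at in isolation. The following is only a minimal sketch: `enough_samples`, its `per_column` flag, and the synthetic data are hypothetical names made up for illustration, not part of the function above or of any library.

```python
import numpy as np
import pandas as pd

def enough_samples(df, cols, min_samples=30, per_column=True):
    """Hypothetical helper: True if df has enough observations to regress on."""
    # Total number of rows
    if len(df) < min_samples:
        return False
    if per_column:
        # Count non-null values in each column (axis 0), not in each row
        return bool((df[cols].notnull().sum() >= min_samples).all())
    return True

# Synthetic data (assumption): 40 rows, with 15 missing values in "x"
rng = np.random.default_rng(0)
df = pd.DataFrame({"y": rng.normal(size=40), "x": rng.normal(size=40)})
df.loc[:14, "x"] = np.nan  # labels 0..14 inclusive, so "x" keeps 25 values

total_only = enough_samples(df, ["y", "x"], per_column=False)
per_col = enough_samples(df, ["y", "x"], per_column=True)
print(total_only, per_col)  # True False
```

The frame passes the total-length check (40 rows) but fails the per-column one, because "x" has only 25 non-null values. Which variant is appropriate depends on how much missing data you are willing to regress through.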