How to prepare training data (remove boundary values)
To remove outliers, you can use Series.quantile:
Suppose you have the following dataframe:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2022)
df = pd.DataFrame({'A': np.random.normal(5, 2, size=50)})
df.plot.hist(bins=25)
plt.xlim(0, 10)
plt.show()
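Now filter your dataframe, keeping only the rows whose values lie between the 0.25 and 0.75 quantiles (roughly the middle 50% of the data):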
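# Lower and upper bounds: the first and third quartiles of column 'A'
lower, upper = df['A'].quantile([0.25, 0.75])
# between() is inclusive on both ends by default
df1 = df.loc[df['A'].between(lower, upper)]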
df1.plot.hist(bins=10)
plt.xlim(0, 10)
plt.show()
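Filtering between the 0.25 and 0.75 quantiles discards about half of the rows, which may be more than you want if the goal is only to drop extreme values. A common, less aggressive alternative is the 1.5 × IQR rule (Tukey's fences). The sketch below is not part of the approach above; the multiplier `k = 1.5` and the names `df2`, `lower`, and `upper` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(2022)
df = pd.DataFrame({'A': np.random.normal(5, 2, size=50)})

# First and third quartiles, and the interquartile range (IQR)
q1, q3 = df['A'].quantile([0.25, 0.75])
iqr = q3 - q1

# Fences placed k * IQR beyond the quartiles; k = 1.5 is a conventional
# choice (an assumption here), tune it for your data
k = 1.5
lower, upper = q1 - k * iqr, q3 + k * iqr

# Keep only rows inside the fences (inclusive)
df2 = df.loc[df['A'].between(lower, upper)]
print(f"kept {len(df2)} of {len(df)} rows")
```

You can plot `df2` the same way as `df1` to compare how much of the distribution each filter keeps.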