Outlier removal techniques from an array
Solution 1:
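You could use SciPy's median_filter. It replaces each value with the median of its neighbourhood (15 samples here), so isolated spikes are overwritten by nearby values rather than deleted: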
import pandas as pd
from matplotlib import pyplot as plt
from scipy.ndimage import median_filter
b = pd.read_csv("test.csv")
x = b.copy()
# Replace each orig_w value by the median of the 15 samples centred on it
x['orig_w'] = median_filter(b['orig_w'].to_numpy(), size=15)
#Plot
plt.rcParams['figure.figsize'] = [10,8]
#Original
plt.plot(b.t,b.orig_w,'o',label='Original',alpha=0.8)
# After outlier removal
plt.plot(x.t,x.orig_w,'.',c='r',label='Outlier removed',alpha=0.8)
plt.legend()
plt.show()
Sample output: [plot: 'Original' and 'Outlier removed' points]
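median_filter keeps every point and smooths the whole signal. If you would rather drop the outliers and leave the other samples untouched, one option (not part of the original answer) is to use the median-filtered signal as a baseline and discard samples whose residual exceeds a threshold. In this sketch, median_outlier_mask is a hypothetical helper, the absolute threshold of 1.0 and the synthetic sine data with three injected spikes are assumptions chosen for illustration, and mode="nearest" (pad by repeating the edge values) replaces SciPy's default "reflect" boundary handling.

```python
import numpy as np
from scipy.ndimage import median_filter


def median_outlier_mask(values, size=15, threshold=1.0):
    """Return a boolean array that is True for samples kept as inliers.

    Hypothetical helper: a sample counts as an outlier when it differs from
    the median of the `size`-sample window centred on it by more than
    `threshold` (an absolute value, in the same units as the data).
    """
    values = np.asarray(values, dtype=float)
    # Local median of each sample's neighbourhood; edges are padded by
    # repeating the first/last value (mode="nearest")
    baseline = median_filter(values, size=size, mode="nearest")
    residual = values - baseline
    return np.abs(residual) <= threshold


# Synthetic data (assumed): two periods of a sine wave with three spikes
t = np.linspace(0, 4 * np.pi, 200)
w = np.sin(t)
w[[20, 75, 150]] += 5.0

keep = median_outlier_mask(w, size=15, threshold=1.0)
print(np.flatnonzero(~keep).tolist())  # [20, 75, 150]
cleaned = w[keep]  # the 197 remaining samples, unchanged
```

With the question's data this would look like x = b[median_outlier_mask(b['orig_w'])], with threshold chosen in the units of orig_w.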
Solution 2:
Since your data looks sinusoidal, it probably makes sense to remove outliers with a sliding window. For each point, compute the median of its direct neighbourhood and a robust estimate of the local standard deviation (the median absolute deviation, MAD, multiplied by 1.4826), and flag the point as an outlier if it lies more than a chosen number of those standard deviations away from the local median. This method is known as the Hampel filter. Below is a way to implement it with a window of 50 samples on each side and a threshold of 1.25 standard deviations:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df_orig = pd.read_csv('trial_data.csv')
def hampel_filter(df, m=1.25, win=50):
    # Keep the rows whose orig_w lies within m robust standard deviations of the local median
    k = 1.4826  # scales the MAD to a standard-deviation estimate for Gaussian data
    values = df['orig_w'].to_numpy()
    keep = np.zeros(len(values), dtype=bool)
    for i in range(len(values)):
        # win samples on each side of point i, truncated at the edges of the data
        window = values[max(0, i - win):min(len(values), i + win + 1)]
        med = np.median(window)
        mad = np.median(np.abs(window - med))  # median absolute deviation
        sigma = k * mad
        keep[i] = np.abs(values[i] - med) <= m * sigma
    return df[keep]  # same columns as the input: t, orig_w, filt_w, smt_w
x = hampel_filter(df_orig)
#Plot
plt.rcParams['figure.figsize'] = [10,8]
plt.plot(df_orig['t'],df_orig['orig_w'],'o',label='Original',alpha=0.8) # Original
plt.plot(x.t,x.orig_w,'.',c='r',label='Outlier removed',alpha=0.8) # After outlier removal
plt.legend()
plt.show()
The output: [plot: 'Original' and 'Outlier removed' points]
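You can then fine-tune win and m to get a result that works for you: a larger m flags fewer points as outliers, and win sets how many samples on each side form the local neighbourhood.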
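The loop in Solution 2 slices the data once per point. The same Hampel criterion can also be written more compactly with pandas rolling windows. This is a sketch rather than the original answer's code: hampel_mask is a hypothetical helper name, the window is centred and holds 2 * win + 1 samples (fewer at the edges, because of min_periods=1), and the synthetic data and the win=10, m=3.0 values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def hampel_mask(series, win=50, m=1.25, k=1.4826):
    """Return a boolean Series that is True for points kept as inliers.

    Hypothetical helper: uses a centred window of 2 * win + 1 samples
    (truncated at the edges of the series) and flags a point as an outlier
    when it lies more than m * k * MAD from the local median.
    """
    rolling = series.rolling(2 * win + 1, center=True, min_periods=1)
    med = rolling.median()
    # Local median absolute deviation (MAD) of each window
    mad = rolling.apply(
        lambda vals: np.median(np.abs(vals - np.median(vals))), raw=True
    )
    return (series - med).abs() <= m * k * mad


# Synthetic data (assumed), using the same column names as the question's CSV
t = np.linspace(0, 4 * np.pi, 200)
w = np.sin(t)
w[[20, 75, 150]] += 5.0
df = pd.DataFrame({"t": t, "orig_w": w})

mask = hampel_mask(df["orig_w"], win=10, m=3.0)
filtered = df[mask]
print(df.index[~mask.to_numpy()].tolist())
```

For the question's data, df_orig[hampel_mask(df_orig['orig_w'])] uses the same win and m defaults as hampel_filter. The rolling median is computed efficiently by pandas, while the MAD still calls a Python function once per window.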