Outlier removal techniques from an array

Solution 1:

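You could use SciPy's median_filter. It replaces each sample with the median of a sliding window (15 samples here), so isolated spikes are smoothed away rather than dropped: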

import pandas as pd
from matplotlib import pyplot as plt
from scipy.ndimage import median_filter

b = pd.read_csv("test.csv")

# Copy the data and replace orig_w with its 15-sample running median
x = b.copy()
x['orig_w'] = median_filter(b['orig_w'], size=15)

#Plot
plt.rcParams['figure.figsize'] = [10,8]
#Original
plt.plot(b.t,b.orig_w,'o',label='Original',alpha=0.8) 
# After outlier removal
plt.plot(x.t,x.orig_w,'.',c='r',label='Outlier removed',alpha=0.8) 
plt.legend()
plt.show()

Sample output: (plot of the original data and the median-filtered data)
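To see what median_filter does without the CSV, here is a minimal sketch on synthetic data. The sine wave, the spike positions and the spike height are assumptions standing in for the real orig_w column. The filter keeps every sample but replaces each one with the median of the 15 samples around it, so a single spike is outvoted by its neighbours.

```python
import numpy as np
from scipy.ndimage import median_filter

# Synthetic stand-in for the 'orig_w' column (assumption: the real data
# is not available here): a slow sine wave with a few injected spikes.
t = np.linspace(0, 4 * np.pi, 400)
clean = np.sin(t)
noisy = clean.copy()
spike_idx = [50, 180, 300]
noisy[spike_idx] += 5.0  # isolated outliers

# Each output sample is the median of the 15 samples centred on it
# (edges are handled by SciPy's default 'reflect' mode).
smoothed = median_filter(noisy, size=15)

print("max error before:", np.max(np.abs(noisy - clean)))
print("max error after: ", np.max(np.abs(smoothed - clean)))
```

The remaining error after filtering is small and concentrated at the two ends of the series, where the window runs past the data and has to be filled by reflection.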

Solution 2:

Since your data looks sinusoidal, it probably makes sense to remove outliers with a sliding window. For each point, compute the median and a robust estimate of the standard deviation over its direct neighborhood. That estimate is the median absolute deviation (MAD) scaled by 1.4826. Flag the point as an outlier if it lies more than a specified number of standard deviations from that median. This method is known as the Hampel filter. Below is a way to implement it with a window of 50 samples on each side and a threshold of 1.25 standard deviations:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

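df_orig = pd.read_csv('trial_data.csv')

def hampel_filter(df, m=1.25, win=50):
    """Return the rows of df whose 'orig_w' value lies within m robust
    standard deviations of the median of a window of win samples on each side."""
    values = df['orig_w'].to_numpy()
    n = len(values)
    k = 1.4826  # scales the MAD to the standard deviation of Gaussian data
    keep = np.zeros(n, dtype=bool)
    for i in range(n):
        window = values[max(0, i - win):min(n, i + win + 1)]
        med = np.median(window)
        mad = np.median(np.abs(window - med))  # median absolute deviation
        sigma = k * mad
        # <= so that points in a perfectly flat window (sigma == 0) are kept
        keep[i] = np.abs(values[i] - med) <= m * sigma
    return df[keep]

# Boolean indexing keeps all of df_orig's columns, so no renaming is needed
x = hampel_filter(df_orig)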

#Plot
plt.rcParams['figure.figsize'] = [10,8]
plt.plot(df_orig['t'], df_orig['orig_w'], 'o', label='Original', alpha=0.8)  # Original
plt.plot(x.t,x.orig_w,'.',c='r',label='Outlier removed',alpha=0.8) # After outlier removal
plt.legend()
plt.show()

And the output gives:

(plot of the original data and the data after Hampel-filter outlier removal)
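For reference, the standard Hampel rule can be written as follows. Here $w$ plays the role of `win` and $t$ the role of `m`, and $\tilde{x}_i$ is the window median. This notation is introduced here for illustration and does not come from the original answer.

```latex
W_i = \{\, x_j : |j - i| \le w \,\}, \qquad
\tilde{x}_i = \operatorname{median}(W_i),

\mathrm{MAD}_i = \operatorname{median}_{x_j \in W_i} \lvert x_j - \tilde{x}_i \rvert, \qquad
\hat{\sigma}_i = 1.4826\,\mathrm{MAD}_i,

x_i \text{ is flagged as an outlier if } \lvert x_i - \tilde{x}_i \rvert > t\,\hat{\sigma}_i .
```

The constant $1.4826 \approx 1/\Phi^{-1}(3/4)$ makes $\hat{\sigma}_i$ a consistent estimate of the standard deviation when the data in the window are normally distributed.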

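You can then fine-tune win (the number of samples on each side of the window) and m (the threshold in robust standard deviations) to get a result that works for your data.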
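One way to do that tuning is to compute each sample's deviation and robust sigma once per window size, then count how many points each threshold would remove. The sketch below swaps the explicit loop for pandas rolling windows. It runs on synthetic data (a noisy sine with three injected spikes), which is an assumption standing in for trial_data.csv. The helper name hampel_scores is hypothetical.

```python
import numpy as np
import pandas as pd

def hampel_scores(s: pd.Series, win: int = 50):
    """Hypothetical helper: return |x_i - window median| and the robust
    sigma for every sample, using a centred rolling window of 2*win + 1
    samples that is truncated at the edges (min_periods=1)."""
    roll = s.rolling(window=2 * win + 1, center=True, min_periods=1)
    med = roll.median()
    # Rolling MAD: median of |x_j - median(window)| within each window
    mad = roll.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    return (s - med).abs(), 1.4826 * mad

# Synthetic stand-in for trial_data.csv (assumption): two sine periods,
# light Gaussian noise and three injected spikes.
t = np.linspace(0, 4 * np.pi, 2000)
rng = np.random.default_rng(0)
w = np.sin(t) + 0.05 * rng.standard_normal(t.size)
w[[300, 1000, 1666]] += 3.0
df = pd.DataFrame({"t": t, "orig_w": w})

removed = {}
for win in (25, 50, 100):
    dev, sigma = hampel_scores(df["orig_w"], win)
    # A sample is kept when dev <= m * sigma, so a larger m removes fewer points
    removed[win] = [int((dev > m * sigma).sum()) for m in (1.25, 2.0, 3.0)]
    print(f"win={win:3d}  points removed for m = 1.25, 2, 3: {removed[win]}")

# Keep the rows that pass with the answer's defaults (win=50, m=1.25)
dev, sigma = hampel_scores(df["orig_w"], 50)
cleaned = df[dev <= 1.25 * sigma]
```

The median and sigma do not depend on m, so computing them once per window and comparing several thresholds afterwards avoids recomputing the rolling statistics for every combination.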
