How to remove blanks/NA's from dataframe and shift the values up

Solution 1:

You can use apply with dropna:

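import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randn(5,4))
# np.nan is used because the np.NaN alias was removed in NumPy 2.0
df.iloc[1,2] = np.nan
df.iloc[0,1] = np.nan
df.iloc[2,1] = np.nan
df.iloc[2,0] = np.nan
print(df)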
          0         1         2         3
0 -1.749765       NaN  1.153036 -0.252436
1  0.981321  0.514219       NaN -1.070043
2       NaN       NaN -0.458027  0.435163
3 -0.583595  0.816847  0.672721 -0.104411
4 -0.531280  1.029733 -0.438136 -1.118318

df1 = df.apply(lambda x: pd.Series(x.dropna().values))
print(df1)
          0         1         2         3
0 -1.749765  0.514219  1.153036 -0.252436
1  0.981321  0.816847 -0.458027 -1.070043
2 -0.583595  1.029733  0.672721  0.435163
3 -0.531280       NaN -0.438136 -0.104411
4       NaN       NaN       NaN -1.118318

If you then need to replace the remaining NaN with empty strings, be aware that this creates columns with mixed values (strings alongside numbers), so some numeric functions may break:

df1 = df.apply(lambda x: pd.Series(x.dropna().values)).fillna('')
print(df1)
          0         1         2         3
0  -1.74977  0.514219   1.15304 -0.252436
1  0.981321  0.816847 -0.458027 -1.070043
2 -0.583595   1.02973  0.672721  0.435163
3  -0.53128           -0.438136 -0.104411
4                               -1.118318
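
To make the mixed-values warning concrete, here is a minimal sketch on a small made-up frame (the column names and values are assumptions for illustration only). It compares the dtype of a column before and after fillna(''):

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are made up for this sketch
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

# Shift the non-missing values up, keeping NaN as the filler
shifted = df.apply(lambda x: pd.Series(x.dropna().values))
print(shifted.dtypes)        # columns are still float64 here

# Replacing NaN with '' turns a column that had NaN into object dtype
filled = shifted.fillna('')
print(filled["b"].dtype)

# Numeric reductions are safe on the NaN version (NaN is skipped)
total_b = shifted["b"].sum()
print(total_b)
```

If the blanks are only needed for display, it may be simpler to keep the NaN version for computation and apply fillna('') only when printing or exporting.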

Solution 2:

A NumPy approach
The idea is to argsort each column by np.isnan so that the NaN values are put last. I use kind='mergesort' because it is a stable sort, which preserves the original order of the non-NaN values. Finally, I index the array with the sorted positions and assign the result back, followed by a fillna.

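v = df.values
i = np.arange(v.shape[1])
a = np.isnan(v).argsort(0, kind='mergesort')
# assign through iloc: writing into df.values directly relies on it being
# a writable view, which pandas does not guarantee (e.g. with Copy-on-Write)
df.iloc[:, :] = v[a, i]
print(df.fillna(''))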

          0         1         2         3
0   1.85748 -0.540645 -0.462941 -0.600606
1  0.000267  0.036393 -0.803889  0.492480
2  0.566922 -0.221294  -1.58493  0.527973
3 -0.243182             1.40348  2.278294
4                                1.574097
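
To see what each step produces, here is a small sketch on a made-up 3x2 array (the values are assumptions chosen so the result is easy to follow by eye):

```python
import numpy as np

# Tiny made-up array; each column is shifted independently
v = np.array([[np.nan, 1.0],
              [2.0, np.nan],
              [3.0, 4.0]])

mask = np.isnan(v)                      # True where a value is missing
# Stable argsort down each column: False (present) rows first, True (NaN) last,
# with ties kept in their original order
a = mask.argsort(0, kind='mergesort')
i = np.arange(v.shape[1])               # column indices, broadcast against a
shifted = v[a, i]                       # shifted[r, c] == v[a[r, c], c]

print(a)
print(shifted)
```

Here a holds, for every column, the row positions in the order they should appear, so the first column is read in the order 1, 2, 0 and the second in the order 0, 2, 1.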

If you don't want to alter the dataframe in place:

v = df.values
i = np.arange(v.shape[1])
a = np.isnan(v).argsort(0, kind='mergesort')
pd.DataFrame(v[a, i], df.index, df.columns).fillna('')
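
The non-mutating version can be wrapped in a small helper. This is only a sketch: the name shift_up is hypothetical, and it assumes every column is numeric so that np.isnan applies.

```python
import numpy as np
import pandas as pd

def shift_up(df):
    """Return a new frame with non-NaN values moved to the top of each column.

    Hypothetical helper for illustration; assumes all columns are numeric.
    """
    v = df.to_numpy(dtype=float)
    order = np.isnan(v).argsort(0, kind='mergesort')   # stable: keeps value order
    cols = np.arange(v.shape[1])
    return pd.DataFrame(v[order, cols], index=df.index, columns=df.columns)

# Same sample frame as in Solution 1
np.random.seed(100)
df = pd.DataFrame(np.random.randn(5, 4))
df.iloc[1, 2] = np.nan
df.iloc[0, 1] = np.nan
df.iloc[2, 1] = np.nan
df.iloc[2, 0] = np.nan

original = df.copy()
result = shift_up(df)

# Cross-check against the apply/dropna approach from Solution 1
expected = df.apply(lambda x: pd.Series(x.dropna().values))
print(np.allclose(result.to_numpy(), expected.to_numpy(), equal_nan=True))
```

Call .fillna('') on the result only if the blank strings are really needed, since that changes the column dtypes.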

The point of this is to leverage NumPy's speed.

Naive time test (the timing image is not reproduced here).
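
A naive timing comparison of the two solutions could be sketched as follows. The function names, frame size, and share of missing values are assumptions picked for illustration, and the numbers will vary by machine and library version, so no result is claimed here.

```python
import timeit

import numpy as np
import pandas as pd

def shift_up_apply(df):
    # Solution 1: per-column dropna via apply
    return df.apply(lambda x: pd.Series(x.dropna().values))

def shift_up_numpy(df):
    # Solution 2: stable argsort on the NaN mask (non-mutating variant)
    v = df.to_numpy(dtype=float)
    a = np.isnan(v).argsort(0, kind='mergesort')
    i = np.arange(v.shape[1])
    return pd.DataFrame(v[a, i], index=df.index, columns=df.columns)

# Made-up test data: 1000 x 50 with roughly 20% of the values missing
rng = np.random.default_rng(0)
values = rng.standard_normal((1000, 50))
values[rng.random(values.shape) < 0.2] = np.nan
df = pd.DataFrame(values)

t_apply = timeit.timeit(lambda: shift_up_apply(df), number=10)
t_numpy = timeit.timeit(lambda: shift_up_numpy(df), number=10)
print(f"apply/dropna: {t_apply:.4f} s for 10 runs")
print(f"numpy argsort: {t_numpy:.4f} s for 10 runs")
```

Note that the apply version returns only as many rows as the longest column has non-NaN values, so reindex it to the original index before comparing the two results.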