Move non-empty cells to the left in pandas DataFrame

Suppose I have data of the form

Name    h1    h2    h3    h4
A       1     nan   2     3
B       nan   nan   1     3
C       1     3     2     nan

I want to move all non-NaN cells to the left (or collect all non-NaN data in new columns) while preserving the left-to-right order, getting

Name    h1    h2    h3    h4
A       1     2     3     nan
B       1     3     nan   nan
C       1     3     2     nan

I can of course do this row by row, but I'd like to know whether there are other approaches with better performance.
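
For reference, the row-by-row version I have in mind looks roughly like this (a minimal sketch, assuming Name is the index; shift_left is just an illustrative name):

import numpy as np
import pandas as pd

def shift_left(row):
    # keep the non-NaN values in their original left-to-right order,
    # then pad the row back to its original width with NaN
    vals = row.dropna().tolist()
    return pd.Series(vals + [np.nan] * (len(row) - len(vals)), index=row.index)

result = df.apply(shift_left, axis=1)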


First, define a function.

        import numpy as np

        def squeeze_nan(x):
            original_columns = x.index.tolist()

            # drop the NaNs, then re-label the survivors with the left-most column names
            squeezed = x.dropna()
            squeezed.index = original_columns[:len(squeezed)]

            # reindex back to the full set of columns, filling the right side with NaN
            return squeezed.reindex(original_columns, fill_value=np.nan)

Second, apply the function.

df.apply(squeeze_nan, axis=1)

You can also use axis=0 and/or [::-1] to squeeze the NaNs in any other direction; see the sketch below.
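
For example, a minimal sketch of the other directions (my reading of the [::-1] hint, assuming Name is the index and squeeze_nan as defined above):

# squeeze NaNs to the left of each row (as above)
df.apply(squeeze_nan, axis=1)

# squeeze NaNs to the right of each row: reverse the columns, squeeze, reverse back
df[df.columns[::-1]].apply(squeeze_nan, axis=1)[df.columns]

# squeeze NaNs to the top of each column
df.apply(squeeze_nan, axis=0)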

[EDIT]

@Mxracer888, is this what you want? Rows whose name appears in hold are left untouched:

def squeeze_nan(x, hold):
    # rows whose name appears in `hold` are returned unchanged
    if x.name not in hold:
        original_columns = x.index.tolist()

        squeezed = x.dropna()
        squeezed.index = original_columns[:len(squeezed)]

        return squeezed.reindex(original_columns, fill_value=np.nan)
    else:
        return x

df.apply(lambda x: squeeze_nan(x, ['B']), axis=1)



Here's what I did:

I stacked your dataframe into a longer format, then grouped by the Name level. Within each group I drop the NaNs and re-label the remaining values with the left-most column names; reindexing to the full Name × h1-through-h4 index afterwards re-creates your NaNs on the right.

from io import StringIO
import pandas

def defragment(x):
    # drop the NaNs and re-label what's left with the left-most column names
    values = x.dropna().values
    return pandas.Series(values, index=df.columns[:len(values)])

datastring = StringIO("""\
Name    h1    h2    h3    h4
A       1     nan   2     3
B       nan   nan   1     3
C       1     3     2     nan""")

df = pandas.read_table(datastring, sep=r'\s+').set_index('Name')
long_index = pandas.MultiIndex.from_product([df.index, df.columns])

print(
    df.stack()
      .groupby(level='Name')
      .apply(defragment)
      .reindex(long_index)
      .unstack()
)

And so I get:

   h1  h2  h3  h4
A   1   2   3 NaN
B   1   3 NaN NaN
C   1   3   2 NaN

Here's how you could do it with a regex (possibly not recommended: it round-trips through CSV, so dtypes are re-inferred and it assumes no value contains a comma):

import re

pd.read_csv(StringIO(re.sub(',+', ',', df.to_csv())))
Out[20]: 
  Name  h1  h2  h3  h4
0    A   1   2   3 NaN
1    B   1   3 NaN NaN
2    C   1   3   2 NaN

First, create a boolean array using np.isnan; it marks NaN as True and non-NaN values as False. Then argsort each row with a stable sort: the non-NaN values keep their original left-to-right order and the NaNs are pushed to the right.

idx = np.isnan(df.values).argsort(axis=1, kind='stable')  # stable sort keeps the non-NaN order
df = pd.DataFrame(
    df.values[np.arange(df.shape[0])[:, None], idx],
    index=df.index,
    columns=df.columns,
)

       h1   h2   h3  h4
Name
A     1.0  2.0  3.0 NaN
B     1.0  3.0  NaN NaN
C     1.0  3.0  2.0 NaN

Details

np.isnan(df.values)
# array([[False,  True, False, False],
#        [ True,  True, False, False],
#        [False, False, False,  True]])

# False ⟶ 0, True ⟶ 1
# When sorted (with a stable sort), all the True values, i.e. the NaNs,
# are pushed to the right while the non-NaN values keep their original order.

idx = np.isnan(df.values).argsort(axis=1, kind='stable')
# array([[0, 2, 3, 1],
#        [2, 3, 0, 1],
#        [0, 1, 2, 3]], dtype=int64)

# Now, indexing `df.values` using `idx`
df.values[np.arange(df.shape[0])[:, None], idx]
# array([[ 1.,  2.,  3., nan],
#        [ 1.,  3., nan, nan],
#        [ 1.,  3.,  2., nan]])

# Make that as a DataFrame
df = pd.DataFrame(
    df.values[np.arange(df.shape[0])[:, None], idx],
    index=df.index,
    columns=df.columns,
)

#        h1   h2   h3  h4
# Name
# A     1.0  2.0  3.0 NaN
# B     1.0  3.0  NaN NaN
# C     1.0  3.0  2.0 NaN
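
As a side note, the fancy-indexing step can also be written with np.take_along_axis, which performs the same row-wise gather (a sketch, equivalent to the indexing above):

import numpy as np
import pandas as pd

# np.take_along_axis picks df.values[i, idx[i, j]] for every (i, j),
# which is exactly what the manual fancy indexing above does
idx = np.isnan(df.values).argsort(axis=1, kind='stable')
out = pd.DataFrame(
    np.take_along_axis(df.values, idx, axis=1),
    index=df.index,
    columns=df.columns,
)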