Replacing values greater 1 in a large pandas dataframe

Solution 1:

You can try this.

Using boolean mask

df.set_index('Report No',inplace=True)
df[df > 1] = 1
df.reset_index()

Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

Or use this if you have some non numeric columns. No need to use set_index and reset_index. This is equivalent to df.select_dtypes('number')

val = df._get_numeric_data()
val[val > 1] = 1
df
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

df.mask

df.set_index('Report No',inplace=True)
df.mask(df>1,1).reset_index()
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

np.where

df[df.columns[1:]] = df.iloc[:,1:].where(df.iloc[:,1:] >1 ,1)

np.select

This can be helpful when dealing with multiple conditions. If you want to convert values less than 0 to 0 and values greater than 1 to 1.

df.set_index('Report No', inplace=True)
condlist = [df >= 1, df <= 0] #you can have more conditions and add choices accordingly.
choice = [1, 0] #len(condlist) should be equal to len(choice).
df.loc[:] = np.select(condlist, choice)

Like Jan mentioned use df.clip


Not recommended but you can try this for fun. Using df.astype.

df.set_index('Report No',inplace=True)
df.astype('bool').astype('int')

NOTE: This will only convert falsy values to False and truthy values to True i.e. this will convert 0 to False and anything other than 0 is True even negative numbers.

s = pd.Series([1,-1,0])
s.astype('bool')
0     True
1     True
2    False
dtype: bool

s.astype('bool').astype('int')
0    1
1    1
2    0
dtype: int32

np.sign

When values present are between [0, n] i.e no negative values.

df.loc[:] = np.sign(df)

Solution 2:

Use pandas.DataFrame.clip:

new_df = df.clip(0, 1)

EDIT: To exclude the first column by name (this will edit the DataFrame in-place)

mask = df.columns != "Report No"
df.loc[:, mask] = df.loc[:, mask].clip(0, 1)