How to make new pandas column based on multiple conditionals including 'isnull', 'or' and if colB 'isin' colA -like statements?
Solution 1:
When using axis=1, the applied function receives a single row as an argument. Indexing into the row gives you string objects, in most cases -- except where a NaN is encountered.
Numpy NaNs are actually floats. So when you attempt to perform string operations on the company column, like checking whether the unit_desc contains the company, this throws errors for rows that contain NaNs.
Numpy has a function isnan
, but calling this function on a string also throws an error. So any rows that have an actual company value will cause problems for that check.
You could check the type of the data with isinstance
, or you could just remove the NaNs from your data ahead of time.
This example removes the NaNs ahead of time.
badlist=["Generic", "Name"]
def merge(row):
if row['company'] in badlist:
return row['unit_desc']
if row['company'] in row['unit_desc']:
return row['unit_desc']
return f"{row['company']} {row['unit_desc']}".strip()
df['company'] = df['company'].fillna('')
df['comp_unit'] = df.apply(merge, axis=1)
print(df)
Here's an online runnable version.
Here's an alternative that safely detects the NaNs:
badlist=["Generic", "Name"]
def merge(row):
if isinstance(row['company'], float) and np.isnan(row['company']):
return row['unit_desc']
if row['company'] in badlist:
return row['unit_desc']
if row['company'] in row['unit_desc']:
return row['unit_desc']
return f"{row['company']} {row['unit_desc']}".strip()
df['comp_unit'] = df.apply(merge, axis=1)
print(df)
Here's an online runnable version.
Solution 2:
def my_func(dataframe, unit, comp, bad_info_list):
df = dataframe.copy()
df['comp_new'] = df[comp].apply(lambda x: x if x not in bad_info_list else np.nan)
idx = df[df.apply(lambda x: str(x['comp_new']) in str(x[unit]), axis=1) | df['comp_new'].isnull()].index
df['comp_unit'] = np.where(
df.index.isin(idx),
df[unit],
df['comp_new'] + ' ' + df[unit]
)
return df
new_df = my_func(df, 'unit_desc', 'company', ['Generic', 'Name'])