How to make new pandas column based on multiple conditionals including 'isnull', 'or' and if colB 'isin' colA -like statements?

Solution 1:

When using axis=1, the applied function receives a single row as an argument. Indexing into the row gives you string objects, in most cases -- except where a NaN is encountered.

Numpy NaNs are actually floats. So when you attempt to perform string operations on the company column, like checking whether the unit_desc contains the company, this throws errors for rows that contain NaNs.

Numpy has a function isnan, but calling this function on a string also throws an error. So any rows that have an actual company value will cause problems for that check.

You could check the type of the data with isinstance, or you could just remove the NaNs from your data ahead of time.


This example removes the NaNs ahead of time.

badlist=["Generic", "Name"]

def merge(row):
    if row['company'] in badlist:
        return row['unit_desc']
    if row['company'] in row['unit_desc']:
        return row['unit_desc']
    return f"{row['company']} {row['unit_desc']}".strip()

df['company'] = df['company'].fillna('')
df['comp_unit'] = df.apply(merge, axis=1)
print(df)

Here's an online runnable version.


Here's an alternative that safely detects the NaNs:

badlist=["Generic", "Name"]

def merge(row):
    if isinstance(row['company'], float) and np.isnan(row['company']):
        return row['unit_desc']
    if row['company'] in badlist:
        return row['unit_desc']
    if row['company'] in row['unit_desc']:
        return row['unit_desc']
    return f"{row['company']} {row['unit_desc']}".strip()

df['comp_unit'] = df.apply(merge, axis=1)
print(df)

Here's an online runnable version.

Solution 2:

def my_func(dataframe, unit, comp, bad_info_list):

    df = dataframe.copy()

    df['comp_new'] = df[comp].apply(lambda x: x if x not in bad_info_list else np.nan)

    idx = df[df.apply(lambda x: str(x['comp_new']) in str(x[unit]), axis=1) | df['comp_new'].isnull()].index

    df['comp_unit'] = np.where(
        df.index.isin(idx),
        df[unit],
        df['comp_new'] + ' ' + df[unit]
    )

    return df

new_df = my_func(df, 'unit_desc', 'company', ['Generic', 'Name'])