Fill in series to mirror a column if another column contains missing values in Python

I have a dataset where I would like to fill in a series to mirror a column if another column contains missing values in Python

Data

index1  id  stat    id_A            id_B    
1       aa  y       aa_Q1.22_1      aa_Q1.22_1
1       aa  y       aa_Q2.22_2      aa_Q2.22_2
1       bb  n       bb_Q3.22_3      bb_Q3.22_3
12      bb  n       bb_Q4.22_4      bb_Q4.22_4
2                   cc_Q3.22_4
2                   cc_Q4.22_5
2                   dd_Q3.22_5
2                   ee_Q4.22_6
2                   ee_Q3.22_6
2                   ee_Q4.22_7

Desired

index1  id  stat    id_A            id_B    
1       aa  y       aa_Q1.22_1      aa_Q1.22_1
1       aa  y       aa_Q2.22_2      aa_Q2.22_2
1       bb  n       bb_Q3.22_3      bb_Q3.22_3
12      bb  n       bb_Q4.22_4      bb_Q4.22_4
2                   cc_Q3.22_4      cc_Q3.22_4
2                   cc_Q4.22_5      cc_Q4.22_5
2                   dd_Q3.22_5      dd_Q3.22_5
2                   ee_Q4.22_6      ee_Q4.22_6
2                   ee_Q3.22_6      ee_Q3.22_6
2                   ee_Q4.22_7      ee_Q4.22_7

I would like to fill in column 'id_B' with the exact values from 'id_A' if 'id_B' is blank.

Doing

I am thinking I should you the .apply method-

df['id_A'] = df.apply(
    lambda row: row[''].row[''] if np.isnan,
    axis=1
)

Any suggestion is appreciated. Still researching.

Another option is to use np.where. Simply choose from "id_A" or "id_B" depending on if df['id_B']== '' is True or False:

df['id_B'] = np.where(df['id_B']== '', df['id_A'], df['id_B'])

Output:

   index1  id stat        id_A        id_B
0       1  aa    y  aa_Q1.22_1  aa_Q1.22_1
1       1  aa    y  aa_Q2.22_2  aa_Q2.22_2
2       1  bb    n  bb_Q3.22_3  bb_Q3.22_3
3      12  bb    n  bb_Q4.22_4  bb_Q4.22_4
4       2           cc_Q3.22_4  cc_Q3.22_4
5       2           cc_Q4.22_5  cc_Q4.22_5
6       2           dd_Q3.22_5  dd_Q3.22_5
7       2           ee_Q4.22_6  ee_Q4.22_6
8       2           ee_Q3.22_6  ee_Q3.22_6
9       2           ee_Q4.22_7  ee_Q4.22_7

An alternative is to use mask and combine_first:

df['id_B'] = df['id_B'].mask(df['id_B'] == '').combine_first(df['id_A'])
print(df)

# Output
   index1  id stat        id_A        id_B
0       1  aa    y  aa_Q1.22_1  aa_Q1.22_1
1       1  aa    y  aa_Q2.22_2  aa_Q2.22_2
2       1  bb    n  bb_Q3.22_3  bb_Q3.22_3
3      12  bb    n  bb_Q4.22_4  bb_Q4.22_4
4       2           cc_Q3.22_4  cc_Q3.22_4
5       2           cc_Q4.22_5  cc_Q4.22_5
6       2           dd_Q3.22_5  dd_Q3.22_5
7       2           ee_Q4.22_6  ee_Q4.22_6
8       2           ee_Q3.22_6  ee_Q3.22_6
9       2           ee_Q4.22_7  ee_Q4.22_7

Setup:

data = {'index1': [1, 1, 1, 12, 2, 2, 2, 2, 2, 2],
        'id': ['aa', 'aa', 'bb', 'bb', '', '', '', '', '', ''],
        'stat': ['y', 'y', 'n', 'n', '', '', '', '', '', ''],
        'id_A': ['aa_Q1.22_1', 'aa_Q2.22_2', 'bb_Q3.22_3', 'bb_Q4.22_4',
                 'cc_Q3.22_4', 'cc_Q4.22_5', 'dd_Q3.22_5', 'ee_Q4.22_6', 
                 'ee_Q3.22_6', 'ee_Q4.22_7'],
        'id_B': ['aa_Q1.22_1', 'aa_Q2.22_2', 'bb_Q3.22_3', 'bb_Q4.22_4',
                 '', '', '', '', '', '']}
df = pd.DataFrame(df)
print(df)

# Output
   index1  id stat        id_A        id_B
0       1  aa    y  aa_Q1.22_1  aa_Q1.22_1
1       1  aa    y  aa_Q2.22_2  aa_Q2.22_2
2       1  bb    n  bb_Q3.22_3  bb_Q3.22_3
3      12  bb    n  bb_Q4.22_4  bb_Q4.22_4
4       2           cc_Q3.22_4            
5       2           cc_Q4.22_5            
6       2           dd_Q3.22_5            
7       2           ee_Q4.22_6            
8       2           ee_Q3.22_6            
9       2           ee_Q4.22_7

Replace empty strings to missing values and then use Series.fillna:

df['id_B'] = df['id_B'].replace('',np.nan).fillna(df['id_A'])

Or use DataFrame.loc:

df.loc[df['id_B'].eq(''), 'id_B'] = df['id_A']

If empty strings are in real data missing values or None like Nonetype:

df['id_B'] = df['id_B'].fillna(df['id_A'])

Or:

df.loc[df['id_B'].isna(), 'id_B'] = df['id_A']

Fill in series to mirror a column if another column contains missing values in Python

Related

Recent Posts