Map values and update a Pandas DataFrame

I have a dataframe that looks like this

df1 = pd.DataFrame({'casting_id': ['V4CA', 'V4CA', 'V4CB', 'V4CL', 'V4CD'],
                    'ele_1': [3.232, 3.208, 3.178, 3.187, 3.344],
                    'ele_2': [2.06, 1.926, 1.87, 1.947, 1.816],
                    'ele_3': [0.717, 0.714, 0.7, 0.72, 0.666],
                    'ele_4': [0.089, 0.087, 0.084, 0.085, 0.069],
                    'ele_5': [0.03, 0.042, 0.043, 0.039, 0.072],
                    'ele_6': [0.958, 0.959, 0.964, 0.968, 0.132],
                    'ele_7': [0.257, 0.266, 0.272, 0.301, 0.186],
                    'ele_8': [0.025, 0.033, 0.032, 0.028, 0.062]})
casting_id ele_1 ele_2 ele_3 ele_4 ele_5 ele_6 ele_7 ele_8
0 V4CA 3.232 2.06 0.717 0.089 0.03 0.958 0.257 0.025
1 V4CA 3.208 1.926 0.714 0.087 0.042 0.959 0.266 0.033
2 V4CB 3.178 1.87 0.7 0.084 0.043 0.964 0.272 0.032
3 V4CL 3.187 1.947 0.72 0.085 0.039 0.968 0.301 0.028
4 V4CD 3.344 1.816 0.666 0.069 0.072 0.132 0.186 0.062

There is another dataframe which has a standard for each element against the casting_id

df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'ele_id': ['ele_1', 'ele_2', 'ele_3', 'ele_4', 'ele_5'],
                    'max': [0.8, 0.8, 0.5, 0.3, 0.2],
                    'casting_id': ['V4CA', 'V4CB', 'V4CC', 'V4CD', 'V4CE'],
                    'min': [0.5, 0.3, 0.1, 0.1, 0.0]})
id ele_id max casting_id min
1 ele_1 0.8 V4CA 0.5
2 ele_2 0.8 V4CB 0.3
3 ele_3 0.5 V4CC 0.1
4 ele_4 0.3 V4CD 0.1
5 ele_5 0.2 V4CE 0

I want to form a dataframe that uses the mapping and add the color value in each cell.

For example: For V4CA the min value is 0.5 and max is 0.8 for ele_1. In main Dataframe, the first row has ele_1 above max, so its value should change to [3.232, 'Red']

Value above max = red, below min = green, between min and max = black.

I have tried something like this:

def func(num, casting_id,ele_id):  
    try:
   
        # print(ele_id)
        if casting_id in [None]:
            return [num, "black"]
        rec = min_max_df.loc[(min_max_df["casting_id"]==casting_id) & (min_max_df["ele_id"]==ele_id)]
        if rec.empty:
            return [num, "black"]
        _min = rec["min"][0]
        _max = rec["max"][0]
        color = "black" if _min<num<_max else "green" if num>_max else "red"
        return [num, color]
    except Exception as e:
        return num

And tried df.apply(lambda x: x.apply(func,args=(x.casting_id, x.name, list(x.index))), axis=1)

But I am not getting values as expected.


Solution 1:

Use melt to flatten your first dataframe then merge with the second one. Finally, process color with select and pivot your new dataframe:

# Step 1: flat and merge
out = df1.melt('casting_id', var_name='ele_id', ignore_index=False).reset_index() \
         .merge(df2, on=['casting_id', 'ele_id'], how='left')

# Step 2a: compute color
out['color'] = np.select([out['value'] < out['min'], out['value'] > out['max']], 
                         choicelist=['green', 'red'], default='black')

# Step 2b: create a list from value and color
out['value'] = out[['value', 'color']].apply(list, axis=1)

# Step 3: reshape to your original dataframe
out = out.pivot(['index', 'casting_id'], 'ele_id', 'value') \
         .reset_index('casting_id').rename_axis(index=None, columns=None)

Output:

>>> out
  casting_id           ele_1           ele_2           ele_3           ele_4           ele_5           ele_6           ele_7           ele_8
0       V4CA    [3.232, red]   [2.06, black]  [0.717, black]  [0.089, black]   [0.03, black]  [0.958, black]  [0.257, black]  [0.025, black]
1       V4CA    [3.208, red]  [1.926, black]  [0.714, black]  [0.087, black]  [0.042, black]  [0.959, black]  [0.266, black]  [0.033, black]
2       V4CB  [3.178, black]     [1.87, red]    [0.7, black]  [0.084, black]  [0.043, black]  [0.964, black]  [0.272, black]  [0.032, black]
3       V4CL  [3.187, black]  [1.947, black]   [0.72, black]  [0.085, black]  [0.039, black]  [0.968, black]  [0.301, black]  [0.028, black]
4       V4CD  [3.344, black]  [1.816, black]  [0.666, black]  [0.069, green]  [0.072, black]  [0.132, black]  [0.186, black]  [0.062, black]

Setup:

df1 = pd.DataFrame({'casting_id': ['V4CA', 'V4CA', 'V4CB', 'V4CL', 'V4CD'],
                    'ele_1': [3.232, 3.208, 3.178, 3.187, 3.344],
                    'ele_2': [2.06, 1.926, 1.87, 1.947, 1.816],
                    'ele_3': [0.717, 0.714, 0.7, 0.72, 0.666],
                    'ele_4': [0.089, 0.087, 0.084, 0.085, 0.069],
                    'ele_5': [0.03, 0.042, 0.043, 0.039, 0.072],
                    'ele_6': [0.958, 0.959, 0.964, 0.968, 0.132],
                    'ele_7': [0.257, 0.266, 0.272, 0.301, 0.186],
                    'ele_8': [0.025, 0.033, 0.032, 0.028, 0.062]})

df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'ele_id': ['ele_1', 'ele_2', 'ele_3', 'ele_4', 'ele_5'],
                    'max': [0.8, 0.8, 0.5, 0.3, 0.2],
                    'casting_id': ['V4CA', 'V4CB', 'V4CC', 'V4CD', 'V4CE'],
                    'min': [0.5, 0.3, 0.1, 0.1, 0.0]})

Solution 2:

Given:

import pandas as pd

df = pd.DataFrame(
    columns=['casting_id', 'ele_1', 'ele_2', 'ele_3', 'ele_4', 'ele_5', 'ele_6', 'ele_7', 'ele_8'],
    data=[
        ['V4CA', 3.232, 2.06, 0.717, 0.089, 0.03, 0.958, 0.257, 0.025],
        ['V4CA', 3.208, 1.926, 0.714, 0.087, 0.042, 0.959, 0.266, 0.033],
        ['V4CB', 3.178, 1.87, 0.7, 0.084, 0.043, 0.964, 0.272, 0.032],
        ['V4CL', 3.187, 1.947, 0.72, 0.085, 0.039, 0.968, 0.301, 0.028],
        ['V4CD', 3.344, 1.816, 0.666, 0.069, 0.072, 0.132, 0.186, 0.062],
    ]
)

min_max_df = pd.DataFrame(
    columns=['id', 'ele_id', 'max', 'casting_id', 'min'],
    data=[
        [1, 'ele_1', 0.8, 'V4CA', 0.5],
        [2, 'ele_2', 0.8, 'V4CB', 0.3],
        [3, 'ele_3', 0.5, 'V4CC', 0.1],
        [4, 'ele_4', 0.3, 'V4CD', 0.1],
        [5, 'ele_5', 0.2, 'V4CE', 0],
        ]
)

you can solve with a one-liner:

pd.DataFrame(
    data=[
        list(row[:3]) + [[row[3], "black" if pd.isna(row[4]) or row[6] <= row[3] <= row[5] else ['green', 'red'][row[3] > row[5]]]]
        for row in df.reset_index().melt(
            id_vars=["casting_id", 'index'],
            var_name="ele_id",
            value_name="value"
        ).merge(
            min_max_df, left_on=["casting_id", "ele_id"], right_on=["casting_id", "ele_id"], how='left',
        ).itertuples(name=None, index=False)
    ],
    columns=["casting_id", 'index', "ele_id", 'value'],
).pivot(
    values='value', index=["casting_id", 'index'], columns='ele_id'
).reset_index().sort_values('index').drop(columns=['index'])

This is what you get:

       casting_id           ele_1           ele_2           ele_3           ele_4           ele_5           ele_6           ele_7           ele_8
0            V4CA    [3.232, red]   [2.06, black]  [0.717, black]  [0.089, black]   [0.03, black]  [0.958, black]  [0.257, black]  [0.025, black]
1            V4CA    [3.208, red]  [1.926, black]  [0.714, black]  [0.087, black]  [0.042, black]  [0.959, black]  [0.266, black]  [0.033, black]
2            V4CB  [3.178, black]     [1.87, red]    [0.7, black]  [0.084, black]  [0.043, black]  [0.964, black]  [0.272, black]  [0.032, black]
4            V4CL  [3.187, black]  [1.947, black]   [0.72, black]  [0.085, black]  [0.039, black]  [0.968, black]  [0.301, black]  [0.028, black]
3            V4CD  [3.344, black]  [1.816, black]  [0.666, black]  [0.069, green]  [0.072, black]  [0.132, black]  [0.186, black]  [0.062, black]