How to create new column in dataframe based on condition from other dataframe?
df1 = pd.DataFrame({"DEPTH":[0.5, 1, 1.5, 2, 2.5],
"POROSITY":[10, 22, 15, 30, 20],
"WELL":"well 1"})
df2 = pd.DataFrame({"Well":"well 1",
"Marker":["Fm 1","Fm 2"],
"Depth":[0.7, 1.7]})
Hello everyone. I have two dataframes and I would like to create a new column on df1, for example: df1["FORMATIONS"], with information from df2["Marker"] values based on depth limits from df2["Depth"] and df1["DEPTH"].
So, for example, if df2["Depth"] = 1.7, then all samples in df1 with df1["DEPTH"] > 1.7 should be labelled as "Fm 2" in this new column df1["FORMATIONS"].
And the final dataframe df1 should look like this:
DEPTH POROSITY WELL FORMATIONS
0.5 10 well 1 nan
1 22 well 1 Fm 1
1.5 15 well 1 Fm 1
2 30 well 1 Fm 2
2.5 20 well 1 Fm 2
Anyone could help me?
Solution 1:
What you're doing here is transforming continuous data into categorical data. There are many ways to do this with pandas
, but one of the better known ways is using pandas.cut
.
When specifying the bins
argument, you need to add float(inf)
to the end of the list, to represent that the last bin goes to infinity.
df1["FORMATIONS"] = pd.cut(df1.DEPTH, list(df2.Depth) + [float('inf')], labels=df2.Marker)
df1
will now be:
Solution 2:
Use pandas.merge_asof
:
NB. the columns used for the merge need to be sorted first
pd.merge_asof(df1,
df2[['Marker', 'Depth']].rename(columns={'Marker': 'Formations'}),
left_on='DEPTH', right_on='Depth')
output:
DEPTH POROSITY WELL Formations Depth
0 0.5 10 well 1 NaN NaN
1 1.0 22 well 1 Fm 1 0.7
2 1.5 15 well 1 Fm 1 0.7
3 2.0 30 well 1 Fm 2 1.7
4 2.5 20 well 1 Fm 2 1.7