Apply two different formulas based on the cell content of a dataframe
i have this dataframe named optiondf:
strike type settle openInterest volume
0 10000 Call 24'24 0 0
1 15000 Call 0 0 0
2 20000 Call 23'56 0 0
3 25000 Call 1 0 0
4 30000 Call 23'24 0 0
I would like to be able to do some operations on the "settle" column. I would like to filter based on whether or not the symbol " ' " is present in the value in the column.
In case contains " ' " putting this calculation as the value of the cell, which I have already verified, has no problems in calculating itself
pd.to_numeric(optiondf['settle'].str.split(pat="'").str[0])+pd.to_numeric(optiondf['settle'].str.split(pat="'").str[-1])*(1/64)
In case not contains " ' " I would just like to convert the value to a number, like this:
optiondf['settle']=pd.to_numeric(optiondf['settle'])
I have tried with this pandas feature but without success, I probably am doing something wrong:
optiondf['settle'].str.contains("'")=....
Solution 1:
You could construct a lambda that applies to different functions:
def func_a(x):
v = x.split("'")
return v[0] + v[1] / 64
optiondf['settle'] = optiondf['settle'].apply(lambda x: func_a(x) if "'" in x else pd.to_numeric(x))
Solution 2:
You could try to use apply()
.
It would be something like this:
def some_function(row):
value = row["settle"]
if "'" in row["settle"]:
row["settle"] = pd.to_numeric(row['settle'].str.split(pat="'").str[0])+pd.to_numeric(row['settle'].str.split(pat="'").str[-1])*(1/64)
else:
row["settle"] = pd.to_numeric(row['settle'])
return row
updated_df = optiondf.apply(somefunction, axis=1)
Solution 3:
You can use str.contains
to create a boolean mask and use it to filter the rows where there is '
and apply your calculation to them and finally convert the entire column to numeric type.
msk = df['settle'].str.contains("'")
temp = df.loc[msk, 'settle'].str.split("'")
df.loc[msk, 'settle'] = pd.to_numeric(temp.str[0]) + pd.to_numeric(temp.str[-1])*(1/64)
df['settle'] = pd.to_numeric(df['settle'])
or use np.where
instead of the last two lines in the above code. np.where
lets us assign different values depending on a condition (in our case it's whether a string contains '
or not). If True, manipulate the number, else do nothing. Then convert the entire column to numeric type.
msk = df['settle'].str.contains("'")
temp = df.loc[msk, 'settle'].str.split("'")
df['settle'] = pd.to_numeric(np.where(msk, pd.to_numeric(temp.str[0]) + pd.to_numeric(temp.str[-1])*(1/64), df['settle']))
Output:
strike type settle openInterest volume
0 10000 Call 24.375 0 0
1 15000 Call 0.000 0 0
2 20000 Call 23.875 0 0
3 25000 Call 1.000 0 0
4 30000 Call 23.375 0 0