Apply two different formulas based on the cell content of a dataframe

i have this dataframe named optiondf:

     strike  type settle openInterest volume
0     10000  Call  24'24        0      0
1     15000  Call  0            0      0
2     20000  Call  23'56        0      0
3     25000  Call  1            0      0
4     30000  Call  23'24        0      0

I would like to be able to do some operations on the "settle" column. I would like to filter based on whether or not the symbol " ' " is present in the value in the column.

In case contains " ' " putting this calculation as the value of the cell, which I have already verified, has no problems in calculating itself

pd.to_numeric(optiondf['settle'].str.split(pat="'").str[0])+pd.to_numeric(optiondf['settle'].str.split(pat="'").str[-1])*(1/64)

In case not contains " ' " I would just like to convert the value to a number, like this:

optiondf['settle']=pd.to_numeric(optiondf['settle'])

I have tried with this pandas feature but without success, I probably am doing something wrong:

optiondf['settle'].str.contains("'")=....

Solution 1:

You could construct a lambda that applies to different functions:

def func_a(x):
    v = x.split("'")
    return v[0] + v[1] / 64

optiondf['settle'] = optiondf['settle'].apply(lambda x: func_a(x) if "'" in x else pd.to_numeric(x))

Solution 2:

You could try to use apply().

It would be something like this:

def some_function(row):
   value = row["settle"]
   if "'" in row["settle"]:
       row["settle"] = pd.to_numeric(row['settle'].str.split(pat="'").str[0])+pd.to_numeric(row['settle'].str.split(pat="'").str[-1])*(1/64)
   else:
       row["settle"] = pd.to_numeric(row['settle'])
   return row


updated_df = optiondf.apply(somefunction, axis=1)

Solution 3:

You can use str.contains to create a boolean mask and use it to filter the rows where there is ' and apply your calculation to them and finally convert the entire column to numeric type.

msk = df['settle'].str.contains("'")
temp = df.loc[msk, 'settle'].str.split("'")
df.loc[msk, 'settle'] = pd.to_numeric(temp.str[0]) + pd.to_numeric(temp.str[-1])*(1/64)
df['settle'] = pd.to_numeric(df['settle'])

or use np.where instead of the last two lines in the above code. np.where lets us assign different values depending on a condition (in our case it's whether a string contains ' or not). If True, manipulate the number, else do nothing. Then convert the entire column to numeric type.

msk = df['settle'].str.contains("'")
temp = df.loc[msk, 'settle'].str.split("'")
df['settle'] = pd.to_numeric(np.where(msk, pd.to_numeric(temp.str[0]) + pd.to_numeric(temp.str[-1])*(1/64), df['settle']))

Output:

   strike  type  settle  openInterest  volume
0   10000  Call  24.375             0       0
1   15000  Call   0.000             0       0
2   20000  Call  23.875             0       0
3   25000  Call   1.000             0       0
4   30000  Call  23.375             0       0