compare number of days with value
I work with dataset below named "emails_visits"
Territory Account: External ID Date Clicked Opened Sent Call Method date_after date_before days_before_visit days_after_visit
40582 PsAPS4 WNLN03239383 2021-02-16 13:46:00 0.0 0.0 1.0 RTE NaT NaT NaT NaT
19726 CardioPS5 WNLN00441144 2021-09-17 13:33:00 0.0 0.0 1.0 RTE NaT NaT NaT NaT
3532 ASPS4 WNLN00026136 2021-10-25 17:02:00 0.0 0.0 1.0 RTE 2021-10-21 NaT NaT 4 days 17:02:00
22371 CardioPS6 WNLN04438596 2021-06-15 13:44:00 0.0 1.0 1.0 RTE NaT NaT NaT NaT
35930 PSOPS5 WNLN02913837 2021-08-19 09:59:00 0.0 1.0 1.0 RTE NaT NaT NaT NaT
40099 PsAPS3 WNLN09365001 2021-02-25 16:18:00 0.0 0.0 1.0 RTE 2020-05-12 NaT NaT 289 days 16:18:00
25013 CardioPS7 WNLN04585438 2021-05-31 14:45:00 0.0 1.0 1.0 RTE NaT 2021-06-22 21 days 09:15:00 NaT
60381 MEDRESP6 WNLN00000715 2021-03-02 00:00:00 NaN NaN NaN Virtual MS Teams 2021-03-02 2021-03-02 0 days 00:00:00 0 days 00:00:00
I want to create new column with time brackets, for example if value <3 days time bracket is [3]
I used
emails_visits["before_bracket"]=emails_visits.apply(lambda x:"[3]"if x[10]<3 else "[10]" if x[10]<10 days else "[10+]")
I receive an error TypeError: '<' not supported between instances of 'str' and 'int'
I`ve also tried to convert column to numeric using emails_visits["days_before_visit"]=pd.to_numeric(emails_visits["days_before_visit"])
but got some weird numbers like -9223372036854775808 or 1433520000000000
Solution 1:
A more cleaner and faster way would be to use np.select
with conditions and values
import numpy as np
days_after_visit = emails_visits["days_after_visit"].dt.days
conditions = [
days_after_visit < 3,
days_after_visit < 10,
days_after_visit >= 10
]
values = [
"[3]",
"[10]",
"[10+]"
]
emails_visits["before_bracket"] = np.select(conditions, values)