Getting next element in time series pandas
I have a dataframe like so, I want to create a new column next_domain
.
It's calculated by looking for the next domain for an IP by the timestamp. It's N/A if the domain is the last domain for a certain IP. How can i do this in pandas?
Input:
domain ip timestamp
0 Google 101 2020-04-01 23:01:41
1 Google 101 2020-04-01 23:01:59
2 Google 101 2020-04-02 12:01:41
3 Facebook 101 2020-04-02 13:11:33
4 Facebook 101 2020-04-02 13:11:35
5 Youtube 103 2020-04-21 13:01:41
6 Youtube 103 2020-04-21 13:11:46
7 Youtube 103 2020-04-22 01:01:01
8 Google 103 2020-04-22 02:11:23
9 Facebook 103 2020-04-23 14:11:13
10 Youtube 103 2020-04-23 14:11:55
Expected output:
In the following table , row 2 has switch = 1 because it switches to Facebook right after (as seen in the timestamp) for the same IP.
Row 7 is a switch because Youtube changes to Google for IP 103, Row 8 is a switch because Google changes to Facebook for IP 103, and Row 10 is not a switch because there is no domain after Youtube.
domain ip timestamp next_domain
0 Google 101 2020-04-01 23:01:41 Facebook
1 Google 101 2020-04-01 23:01:59 Facebook
2 Google 101 2020-04-02 12:01:41 Facebook
3 Facebook 101 2020-04-02 13:11:33 N/A
4 Facebook 101 2020-04-02 13:11:35 N/A
5 Youtube 103 2020-04-21 13:01:41 Google
6 Youtube 103 2020-04-21 13:11:46 Google
7 Youtube 103 2020-04-22 01:01:01 Google
8 Google 103 2020-04-22 02:11:23 Facebook
9 Facebook 103 2020-04-23 14:11:13 Youtube
10 Youtube 103 2020-04-23 14:11:55 N/A
Solution 1:
You can keep the first domain of each stretch, bfill
per group and shift
:
s = df['domain']
df['next_domain'] = (s.where(s.ne(s.shift())) # keep only first domain of each stretch
.groupby(df['ip']) # per group
.apply(lambda s: s.bfill().shift(-1)) # bfill and shift up
)
output:
domain ip timestamp next_domain
0 Google 101 2020-04-01 23:01:41 Facebook
1 Google 101 2020-04-01 23:01:59 Facebook
2 Google 101 2020-04-02 12:01:41 Facebook
3 Facebook 101 2020-04-02 13:11:33 NaN
4 Facebook 101 2020-04-02 13:11:35 NaN
5 Youtube 103 2020-04-21 13:01:41 Google
6 Youtube 103 2020-04-21 13:11:46 Google
7 Youtube 103 2020-04-22 01:01:01 Google
8 Google 103 2020-04-22 02:11:23 Facebook
9 Facebook 103 2020-04-23 14:11:13 Youtube
10 Youtube 103 2020-04-23 14:11:55 NaN