Iterating within groups until a column changes in pandas
I have the following input df:
domain ip timestamp
0 Google 101 2020-04-01 23:01:41
1 Google 101 2020-04-01 23:01:59
2 Google 101 2020-04-02 12:01:41
3 Facebook 101 2020-04-02 13:11:33
4 Facebook 101 2020-04-02 13:11:35
5 Youtube 103 2020-04-21 13:01:41
6 Youtube 103 2020-04-21 13:11:46
7 Youtube 103 2020-04-22 01:01:01
8 Google 103 2020-04-22 02:11:23
9 Facebook 103 2020-04-23 14:11:13
10 Youtube 103 2020-04-23 14:11:55
How can I get the output below, where domain_num is a counter that increases every time the domain changes within an ip?
domain ip timestamp domain_num
0 Google 101 2020-04-01 23:01:41 1
1 Google 101 2020-04-01 23:01:59 1
2 Google 101 2020-04-02 12:01:41 1
3 Facebook 101 2020-04-02 13:11:33 2
4 Facebook 101 2020-04-02 13:11:35 2
5 Youtube 103 2020-04-21 13:01:41 1
6 Youtube 103 2020-04-21 13:11:46 1
7 Youtube 103 2020-04-22 01:01:01 1
8 Google 103 2020-04-22 02:11:23 2
9 Facebook 103 2020-04-23 14:11:13 3
10 Youtube 103 2020-04-23 14:11:55 4
I tried something like this, which gets the counts, but I need to group it by ip:
df['domain'].ne(df['domain'].shift()).cumsum()
The code below raises an error:
df.groupby('ip').apply(lambda x : x[x.domain.ne(x.domain.shift().cumsum())])
Data
import pandas as pd
data = {'domain':['Google', 'Google', 'Google', 'Facebook', 'Facebook', 'Youtube', 'Youtube', 'Youtube', 'Google', 'Facebook', 'Youtube'],
'ip':[101, 101, 101, 101, 101, 103, 103, 103, 103, 103, 103],
'timestamp' : ['2020-04-01 23:01:41', '2020-04-01 23:01:59', '2020-04-02 12:01:41', '2020-04-02 13:11:33',
'2020-04-02 13:11:35', '2020-04-21 13:01:41', '2020-04-21 13:11:46',
'2020-04-22 01:01:01', '2020-04-22 02:11:23','2020-04-23 14:11:13', '2020-04-23 14:11:55' ]}
df = pd.DataFrame(data)
df['timestamp']= pd.to_datetime(df['timestamp'])
Assuming your dataframe is sorted by the timestamp column within each ip:
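# Flag rows whose domain differs from the previous row, then take a running count of the flags
inc_domain_num = lambda x: x.ne(x.shift()).cumsum()
# transform keeps the original row index, so the result aligns with df on assignment
df['domain_num'] = df.groupby('ip')['domain'].transform(inc_domain_num)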
print(df)
# Output
domain ip timestamp domain_num
0 Google 101 2020-04-01 23:01:41 1
1 Google 101 2020-04-01 23:01:59 1
2 Google 101 2020-04-02 12:01:41 1
3 Facebook 101 2020-04-02 13:11:33 2
4 Facebook 101 2020-04-02 13:11:35 2
5 Youtube 103 2020-04-21 13:01:41 1
6 Youtube 103 2020-04-21 13:11:46 1
7 Youtube 103 2020-04-22 01:01:01 1
8 Google 103 2020-04-22 02:11:23 2
9 Facebook 103 2020-04-23 14:11:13 3
10 Youtube 103 2020-04-23 14:11:55 4
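If the rows are not guaranteed to be sorted, or you would rather avoid a Python-level lambda, the same idea can be sketched with a grouped shift followed by a grouped cumulative sum. This is only a sketch under the assumption that sorting by ip and then timestamp is acceptable for your data; the intermediate name `changed` is illustrative and not part of the original code.

```python
import pandas as pd

data = {
    'domain': ['Google', 'Google', 'Google', 'Facebook', 'Facebook', 'Youtube',
               'Youtube', 'Youtube', 'Google', 'Facebook', 'Youtube'],
    'ip': [101, 101, 101, 101, 101, 103, 103, 103, 103, 103, 103],
    'timestamp': ['2020-04-01 23:01:41', '2020-04-01 23:01:59', '2020-04-02 12:01:41',
                  '2020-04-02 13:11:33', '2020-04-02 13:11:35', '2020-04-21 13:01:41',
                  '2020-04-21 13:11:46', '2020-04-22 01:01:01', '2020-04-22 02:11:23',
                  '2020-04-23 14:11:13', '2020-04-23 14:11:55'],
}
df = pd.DataFrame(data)
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Assumption: ordering by ip, then timestamp, is the order in which domain switches should be counted
df = df.sort_values(['ip', 'timestamp'])

# Compare each domain with the previous row of the same ip;
# the first row of every ip compares against NaN and is therefore flagged as a change
changed = df['domain'].ne(df.groupby('ip')['domain'].shift())

# Running count of the change flags, restarted for every ip
df['domain_num'] = changed.astype(int).groupby(df['ip']).cumsum()

print(df)
```

Because the shift happens inside each ip group, the first row of every ip always starts a new count at 1, which matches the expected output in the question.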