Compute difference for each combination of dates for a given level

I have a dataframe such as the one below:

example_df = pd.DataFrame({"group_id": ["12356", "12356", "12359", "12356", "12359"], "date": ["2021-12-03", "2021-12-05", "2021-05-06", "2021-11-04", "2021-06-05"]})

I need to find the differences between the dates in the date column for each group_id. For example, for group_id=12356 there are 3 dates available:

["2021-12-03", "2021-12-05", "2021-11-04"]

I need the difference between these dates in days. These 3 dates give 3 pairwise combinations. I wrote some code, but it does not work the way I want and is very slow because I am using iterrows. Is there a shorter and easier way to achieve this?

My code:

%%time
date_diff_dict = {}
for index, row in example_df.iterrows():
    group_id = row.group_id
    group_id_df = example_df[example_df.group_id == group_id]
    date_diff_list = []
    for idx, rw in group_id_df.iterrows():
        if (row.order_id != rw.order_id) & (row.order_id >= rw.order_id):
            date_diff = np.abs((row.date - rw.date).days)
            print(row.date, rw.date)
            print(date_diff)
            date_diff_list.append(date_diff)
            print(date_diff_list)
    date_diff_dict[str(group_id)] = date_diff_list

This code gives a partially correct answer but misses a day.

Expected output:

{'12356': [2, 29, 31], '12359': [30]}

Solution 1:

Here's one way:

(i) convert "date" from strings to datetime objects

(ii) groupby "group_id"

(iii) for each group, use itertools.combinations to find pairs of dates

(iv) find the absolute difference in days between the pairs of dates

from itertools import combinations
example_df['date'] = pd.to_datetime(example_df['date'])  # strings -> datetime
# for each group_id, take every pair of dates and keep the absolute gap in days
out = example_df.groupby('group_id')['date'].apply(
    lambda date: [abs((y - x).days) for x, y in combinations(date, 2)]
).to_dict()

Output:

{'12356': [2, 29, 31], '12359': [30]}
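
If you want to reuse this, the same steps could be wrapped in a small helper. The sketch below is self-contained and uses the data from the question. The function name `pairwise_day_diffs` and its parameters are made up for illustration. It converts the dates on a copy, so the original column is left as strings.

```python
from itertools import combinations

import pandas as pd

example_df = pd.DataFrame({
    "group_id": ["12356", "12356", "12359", "12356", "12359"],
    "date": ["2021-12-03", "2021-12-05", "2021-05-06", "2021-11-04", "2021-06-05"],
})


def pairwise_day_diffs(df, group_col="group_id", date_col="date"):
    """Hypothetical helper: absolute day gaps for every pair of dates per group."""
    # Parse the dates without modifying the caller's dataframe.
    dates = pd.to_datetime(df[date_col])
    return (
        dates.groupby(df[group_col])
        # combinations(s, 2) yields each unordered pair of dates exactly once
        .apply(lambda s: [abs((b - a).days) for a, b in combinations(s, 2)])
        .to_dict()
    )


result = pairwise_day_diffs(example_df)
print(result)
```

A group with a single date ends up with an empty list, since there are no pairs to compare. The differences appear in the order the rows occur in the dataframe, so sort each list if you need a fixed order.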