Iterating over groups (Python pandas dataframe)

Solution 1:

The object returned by .groupby() has a .groups attribute: a dict that maps each group key to the index labels of the rows in that group. In this case:

In [26]: df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
   ....:                    'B': ['me', 'you', 'me'] * 2,
   ....:                    'C': [5, 2, 3, 4, 6, 9]})

In [27]: groups = df.groupby('A')

In [28]: groups.groups
Out[28]: {'bar': [1L, 3L, 5L], 'foo': [0L, 2L, 4L]}

You can iterate over this as follows:

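# list() is needed in Python 3, where dict.keys() cannot be indexed
keys = list(groups.groups.keys())
for index in range(len(keys) - 1):
    # .loc selects rows by index label (.ix was removed in pandas 1.0)
    g1 = df.loc[groups.groups[keys[index]]]
    g2 = df.loc[groups.groups[keys[index + 1]]]
    # Do something with g1, g2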
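
As a sketch of an alternative, the GroupBy object can also be iterated directly, which yields (key, sub-DataFrame) pairs and avoids the manual index lookups. Pairing consecutive groups with zip is just one way to reproduce the loop above; the names pairs and consecutive are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': ['me', 'you', 'me'] * 2,
                   'C': [5, 2, 3, 4, 6, 9]})

# Iterating a GroupBy yields (key, sub-DataFrame) tuples,
# sorted by key by default (sort=True).
pairs = list(df.groupby('A'))

# Walk over consecutive pairs of groups.
consecutive = []
for (k1, g1), (k2, g2) in zip(pairs, pairs[1:]):
    # Do something with g1, g2; here we only record keys and row counts.
    consecutive.append((k1, len(g1), k2, len(g2)))
```

With only two groups, as in this data, the loop body runs once, for the pair ('bar', 'foo').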

However, keep in mind that iterating over pandas objects with for loops is generally slower than using vectorized operations. If speed matters, check whether a built-in groupby method can do what you need before falling back to a loop.
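
To illustrate the difference, here is a minimal sketch that assumes the per-group work is summing column C. That goal is an assumption made for the example; the right replacement depends on what you actually do with each group.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': ['me', 'you', 'me'] * 2,
                   'C': [5, 2, 3, 4, 6, 9]})

# Loop version: compute the sum of C one group at a time.
loop_sums = {key: group['C'].sum() for key, group in df.groupby('A')}

# Vectorized version: pandas computes every group's sum in one call.
vector_sums = df.groupby('A')['C'].sum()
```

Both give the same totals per group, but the second form leaves the iteration to pandas.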