How to iterate over Pandas Series generated from groupby().size()
How do you iterate over a Pandas Series generated from a .groupby('...').size()
command and get both the group name and the count?
As an example, if I have:
foo
-1 7
0 85
1 14
2 5
how can I loop over them so that in each iteration I would have -1 & 7, 0 & 85, 1 & 14 and 2 & 5 in variables?
I tried the enumerate option but it doesn't quite work. Example:
for i, row in enumerate(df.groupby(['foo']).size()):
    print(i, row)
It doesn't return -1, 0, 1, and 2 for i, but rather 0, 1, 2, 3.
Update:
Given a pandas Series:
s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
s
#a 1
#b 2
#c 3
#d 4
#dtype: int64
You can loop through it directly, which yields one value from the series in each iteration:
for i in s:
    print(i)
#1
#2
#3
#4
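If you want to access the index at the same time, you can use either the items or the iteritems method, each of which produces a generator that yields both the index and the value. Note that iteritems was deprecated in pandas 1.5 and removed in pandas 2.0, so prefer items on current versions: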
for i, v in s.items():
    print('index: ', i, 'value: ', v)
#index: a value: 1
#index: b value: 2
#index: c value: 3
#index: d value: 4
for i, v in s.iteritems():
    print('index: ', i, 'value: ', v)
#index: a value: 1
#index: b value: 2
#index: c value: 3
#index: d value: 4
Old Answer:
You can call the iteritems() method on the Series:
for i, row in df.groupby('a').size().iteritems():
    print(i, row)
# 12 4
# 14 2
According to the docs:
Series.iteritems()
Lazily iterate over (index, value) tuples
Note: This is not the same data as in the question, just a demo.
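For the data in the question, the same idea can be sketched as follows. The DataFrame here is hypothetical, built only so that the group sizes match the counts shown in the question, and it uses items() rather than iteritems() on the assumption that a recent pandas version is installed:

```python
import pandas as pd

# Hypothetical data: one column 'foo' whose value counts match the question
# (-1 appears 7 times, 0 appears 85 times, 1 appears 14 times, 2 appears 5 times).
df = pd.DataFrame({"foo": [-1] * 7 + [0] * 85 + [1] * 14 + [2] * 5})

# groupby(...).size() returns a Series indexed by group name.
sizes = df.groupby("foo").size()

pairs = []
for name, count in sizes.items():
    # name is the group key, count is the number of rows in that group
    pairs.append((int(name), int(count)))
    print(name, count)
```

Each iteration gives the group name and its count as two separate variables, which is what enumerate() could not provide because it only numbers the positions.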
To expand upon the answer of Psidom, there are three useful ways to unpack data from pd.Series. Having the same Series as Psidom:
s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
- A direct loop over s yields the value of each row.
- A loop over s.iteritems() or s.items() yields a tuple with the (index, value) pairs of each row.
- Using enumerate() on s.iteritems() yields a nested tuple in the form of: (rownum, (index, value)).
The last way is useful when your index contains information other than the row number itself (e.g. in the case of a time series, where the index is time).
s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
for rownum, (indx, val) in enumerate(s.iteritems()):
    print('row number: ', rownum, 'index: ', indx, 'value: ', val)
will output:
row number: 0 index: a value: 1
row number: 1 index: b value: 2
row number: 2 index: c value: 3
row number: 3 index: d value: 4
You can read more on unpacking nested tuples here.
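As a rough illustration of the time series case mentioned above, here is a minimal sketch with a made-up daily series. The values and dates are assumptions for the demo, and items() is used in place of iteritems() since the latter is no longer available in pandas 2.0 and later:

```python
import pandas as pd

# Hypothetical time series: three daily observations starting 2021-01-01.
ts = pd.Series([10, 20, 30],
               index=pd.date_range("2021-01-01", periods=3, freq="D"))

rows = []
for rownum, (stamp, val) in enumerate(ts.items()):
    # rownum is the position, stamp is the Timestamp index, val is the value
    rows.append((rownum, stamp.strftime("%Y-%m-%d"), int(val)))
    print('row number: ', rownum, 'index: ', stamp.date(), 'value: ', val)
```

Here the row number and the timestamp carry different information, so unpacking both in one loop avoids keeping a separate counter.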