Pandas: Incrementally count occurrences in a column
I have a DataFrame (df) which contains a 'Name' column. In a column labeled 'Occ_Number' I would like to keep a running tally on the number of appearances of each value in 'Name'.
For example:
Name Occ_Number
abc 1
def 1
ghi 1
abc 2
abc 3
def 2
jkl 1
jkl 2
I've been trying to come up with a method using
>df['Name'].value_counts()
but can't quite figure out how to tie it all together. I can only get the grand total from value_counts(). My process thus far involves creating a list of the 'Name' column string values which contain counts greater than 1 with the following code:
>things = df['Name'].value_counts()
>things = things[things > 1]
>queries = things.index.values
I was hoping to then somehow cycle through 'Name' and conditionally add to Occ_Number by checking against queries, but this is where I'm getting stuck. Does anybody know of a way to do this? I would appreciate any help. Thank you!
You can use cumcount
to avoid a dummy column:
>>> df["Occ_Number"] = df.groupby("Name").cumcount()+1
>>> df
Name Occ_Number
0 abc 1
1 def 1
2 ghi 1
3 abc 2
4 abc 3
5 def 2
6 jkl 1
7 jkl 2
You can add a helper column and then use cumsum
:
df =pd.DataFrame({'Name':['abc', 'def', 'ghi', 'abc', 'abc', 'def', 'jkl', 'jkl']})
add count:
df['counts'] =1
group by name:
cs =df.groupby('Name')['counts'].cumsum()
# set series name
cs.name = 'Occ_number'
join series back to dataframe:
# remove helper column
del df['counts']
df.join(cs)
returns:
Name Occ_number
0 abc 1
1 def 1
2 ghi 1
3 abc 2
4 abc 3
5 def 2
6 jkl 1
7 jkl 2