Use Pandas groupby() + apply() with arguments
Solution 1:
pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does. GroupBy.apply instead forwards any extra positional and keyword arguments straight to the function.
So try this:
df.groupby('columnName').apply(lambda x: myFunction(x, arg1))
or, as suggested by @Zero, pass the extra argument positionally:
df.groupby('columnName').apply(myFunction, arg1)
Demo:
In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))
In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2  3
4  3  4  1
In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:
In [85]: df.apply(f, args=(10,))
Out[85]:
a    40
b    40
c    40
dtype: int64
When using GroupBy.apply, you can pass either a named argument:
In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
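or a positional argument (extra positional arguments are forwarded to the function as-is, not packed into a tuple; note that (10) is just the integer 10):
In [87]: df.groupby('a').apply(f, 10)
Out[87]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30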
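The three calling styles above can be compared side by side in a small script. This is only a sketch: the data mirrors the demo frame, the helper name scaled_max is made up for illustration, and the value columns are selected explicitly so that the grouping column is not passed to the function.

```python
import pandas as pd

# Hypothetical data (same values as the demo above) and helper name.
df = pd.DataFrame({"a": [0, 0, 3, 4, 3],
                   "b": [3, 3, 0, 2, 4],
                   "c": [1, 4, 4, 3, 1]})


def scaled_max(group, n):
    """Return the column-wise maximum of `group`, multiplied by `n`."""
    return group.max() * n


# Select the value columns explicitly so only they reach the function.
grouped = df.groupby("a")[["b", "c"]]

via_lambda = grouped.apply(lambda g: scaled_max(g, 10))  # argument bound in the lambda
via_positional = grouped.apply(scaled_max, 10)           # forwarded as a positional argument
via_keyword = grouped.apply(scaled_max, n=10)            # forwarded as a keyword argument

print(via_keyword)
```

All three calls should produce the same frame, indexed by the values of a.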
Solution 2:
Some confusion here over why using an args parameter throws an error might stem from the fact that pandas.DataFrame.apply does have an args parameter (a tuple), while pandas.core.groupby.GroupBy.apply does not.
So, when you call .apply on a DataFrame itself, you can use this argument; when you call .apply on a groupby object, you cannot. There, args=... is treated as just another keyword argument and forwarded to the function.
In @MaxU's answer, the expression lambda x: myFunction(x, arg1) is passed to func (the first parameter); there is no need to specify additional *args/**kwargs because arg1 is specified in the lambda.
An example:
import numpy as np
import pandas as pd
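# `df` is assumed to be an existing DataFrame with a column named 'col1'
# Called on DataFrame - `axis` is a keyword of DataFrame.apply itself;
# extra positional arguments for the function would go in `args` (a tuple)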
df.apply(np.sum, axis=0) # equiv to df.sum(0)
df.apply(np.sum, axis=1) # equiv to df.sum(1)
# Called on groupby object of the DataFrame - will throw TypeError
print(df.groupby('col1').apply(np.sum, args=(0,)))
# TypeError: sum() got an unexpected keyword argument 'args'
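To see the difference with a function that really needs an extra argument, here is a minimal sketch. The frame, its column names, and the helper scaled_max are all hypothetical; the exact wording of the TypeError message may vary between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# Hypothetical data and helper, for illustration only.
df = pd.DataFrame({"col1": ["x", "x", "y"],
                   "col2": [1, 2, 3],
                   "col3": [4, 5, 6]})


def scaled_max(obj, n):
    """Return the maximum of `obj` multiplied by `n`."""
    return obj.max() * n


# DataFrame.apply accepts `args`: the tuple is unpacked after each column.
col_result = df[["col2", "col3"]].apply(scaled_max, args=(10,))

# GroupBy.apply has no `args` parameter: `args=(10,)` is forwarded to
# scaled_max as a keyword argument, which it does not accept.
raised = False
try:
    df.groupby("col1")[["col2", "col3"]].apply(scaled_max, args=(10,))
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Passing the value positionally works on the groupby object.
group_result = df.groupby("col1")[["col2", "col3"]].apply(scaled_max, 10)
print(group_result)
```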
Solution 3:
For me, this worked:
df2 = df.groupby('columnName').apply(lambda x: my_function(x, arg1, arg2))
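A runnable version of this, with made-up data and a hypothetical my_function, might look like the sketch below. It also shows functools.partial as an alternative way to bind the extra arguments; that alternative is not part of the answer above, just a standard-library equivalent of the lambda.

```python
from functools import partial

import pandas as pd

# Hypothetical data; 'columnName' matches the placeholder used above.
df = pd.DataFrame({"columnName": ["x", "x", "y"], "value": [1, 2, 5]})


def my_function(group, arg1, arg2):
    """Hypothetical helper: scale the group's column sums by arg1, then add arg2."""
    return group.sum() * arg1 + arg2


arg1, arg2 = 10, 1
grouped = df.groupby("columnName")[["value"]]

# Bind the extra arguments inside a lambda...
with_lambda = grouped.apply(lambda x: my_function(x, arg1, arg2))

# ...or bind them up front with functools.partial.
with_partial = grouped.apply(partial(my_function, arg1=arg1, arg2=arg2))

print(with_lambda)
```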