pandas apply function that returns multiple values to rows in pandas dataframe

I have a dataframe with a timeindex and 3 columns containing the coordinates of a 3D vector:

                         x             y             z
ts
2014-05-15 10:38         0.120117      0.987305      0.116211
2014-05-15 10:39         0.117188      0.984375      0.122070
2014-05-15 10:40         0.119141      0.987305      0.119141
2014-05-15 10:41         0.116211      0.984375      0.120117
2014-05-15 10:42         0.119141      0.983398      0.118164

I would like to apply a transformation to each row that also returns a vector

def myfunc(a, b, c):
    # do something
    return e, f, g

but if I do:

df.apply(myfunc, axis=1)

I end up with a pandas Series whose elements are tuples. This is because apply takes the result of myfunc without unpacking it. How can I change myfunc so that I obtain a new DataFrame with 3 columns?

Edit:

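All solutions below work. The Series solution allows for column names; the list solution seems to execute faster. (In pandas 0.23 and later, the list version needs result_type='expand' to produce columns rather than a Series of lists.)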

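import pandas as pd

def myfunc1(args):
    # args is one row (a Series); .iloc selects by position
    e = args.iloc[0] + 2 * args.iloc[1]
    f = args.iloc[1] * args.iloc[2] + 1
    g = args.iloc[2] + args.iloc[0] * args.iloc[1]
    return pd.Series([e, f, g], index=['a', 'b', 'c'])

def myfunc2(args):
    # same computation, but returns a plain list
    e = args.iloc[0] + 2 * args.iloc[1]
    f = args.iloc[1] * args.iloc[2] + 1
    g = args.iloc[2] + args.iloc[0] * args.iloc[1]
    return [e, f, g]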

%timeit df.apply(myfunc1, axis=1)

100 loops, best of 3: 4.51 ms per loop

%timeit df.apply(myfunc2, axis=1)

100 loops, best of 3: 2.75 ms per loop

Return a Series from the function, and apply will assemble the results into a DataFrame.

def myfunc(row):
    # with axis=1, apply passes each row as a Series;
    # compute e, f, g from row['x'], row['y'], row['z']
    return pd.Series([e, f, g])

This has the bonus that you can label each of the resulting columns by passing index=[...] to the Series. If you return a DataFrame, it just inserts multiple rows for the group.
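As a minimal sketch of the Series approach, assuming the first two rows of the question's data and borrowing the arithmetic from myfunc1 in the question's edit (the name transform and the labels 'a', 'b', 'c' are only illustrative):

```python
import pandas as pd

# First two rows of the question's data; timestamps kept as plain strings.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188],
        "y": [0.987305, 0.984375],
        "z": [0.116211, 0.122070],
    },
    index=["2014-05-15 10:38", "2014-05-15 10:39"],
)

def transform(row):
    # Hypothetical transformation; with axis=1 each row arrives as a Series.
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    # The Series index becomes the column names of the result.
    return pd.Series([e, f, g], index=["a", "b", "c"])

result = df.apply(transform, axis=1)
print(result)
```

The result keeps the original row index and has one column per entry in the returned Series.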

Based on the excellent answer by @U2EF1, I've created a handy function that applies a specified function that returns tuples to a dataframe field, and expands the result back to the dataframe.

def apply_and_concat(dataframe, field, func, column_names):
    return pd.concat((
        dataframe,
        dataframe[field].apply(
            lambda cell: pd.Series(func(cell), index=column_names))), axis=1)

Usage:

df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'], columns=['A'])
print(df)
   A
a  1
b  2
c  3

def func(x):
    return x*x, x*x*x

print(apply_and_concat(df, 'A', func, ['x^2', 'x^3']))

   A  x^2  x^3
a  1    1    1
b  2    4    8
c  3    9   27

Hope it helps someone.

I tried returning a tuple (I was using functions like scipy.stats.pearsonr, which return that kind of structure), but apply returned a 1D Series instead of the DataFrame I expected. Creating a Series manually was slower, so I fixed it using result_type, as explained in the official API documentation:

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

So you could edit your code this way:

def myfunc(row):
    # with axis=1, each row is passed as a Series; compute e, f, g from it
    return (e, f, g)

df.apply(myfunc, axis=1, result_type='expand')
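A small runnable sketch of the difference, assuming made-up data and a hypothetical function as_tuple (neither is from the question). With result_type='expand' and a tuple return, the new columns are numbered 0, 1, 2, so they are renamed afterwards here:

```python
import pandas as pd

# Made-up data, only for illustration.
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})

def as_tuple(row):
    # Hypothetical transformation that returns a plain tuple.
    return row["x"] + row["y"], row["y"] * row["z"], row["z"] - row["x"]

# Default behaviour: a Series whose elements are tuples.
plain = df.apply(as_tuple, axis=1)

# With result_type='expand': each tuple element becomes its own column.
expanded = df.apply(as_tuple, axis=1, result_type="expand")
expanded.columns = ["e", "f", "g"]  # tuples carry no labels, so name them here

print(plain)
print(expanded)
```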
