Difference between map, applymap and apply methods in Pandas
Solution 1:
Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommended this book):
Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:
In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [117]: frame
Out[117]:
b d e
Utah -0.029638 1.081563 1.280300
Ohio 0.647747 0.831136 -1.549481
Texas 0.513416 -0.884417 0.195343
Oregon -0.485454 -0.477388 -0.309548
In [118]: f = lambda x: x.max() - x.min()
In [119]: frame.apply(f)
Out[119]:
b 1.133201
d 1.965980
e 2.829781
dtype: float64
Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:
In [120]: format = lambda x: '%.2f' % x
In [121]: frame.applymap(format)
Out[121]:
b d e
Utah -0.03 1.08 1.28
Ohio 0.65 0.83 -1.55
Texas 0.51 -0.88 0.20
Oregon -0.49 -0.48 -0.31
The reason for the name applymap is that Series has a map method for applying an element-wise function:
In [122]: frame['e'].map(format)
Out[122]:
Utah 1.28
Ohio -1.55
Texas 0.20
Oregon -0.31
Name: e, dtype: object
Summing up, apply
works on a row / column basis of a DataFrame, applymap
works element-wise on a DataFrame, and map
works element-wise on a Series.
Solution 2:
Comparing map
, applymap
and apply
: Context Matters
First major difference: DEFINITION
-
map
is defined on Series ONLY -
applymap
is defined on DataFrames ONLY -
apply
is defined on BOTH
Second major difference: INPUT ARGUMENT
-
map
acceptsdict
s,Series
, or callable -
applymap
andapply
accept callables only
Third major difference: BEHAVIOR
-
map
is elementwise for Series -
applymap
is elementwise for DataFrames -
apply
also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depends on the function.
Fourth major difference (the most important one): USE CASE
-
map
is meant for mapping values from one domain to another, so is optimised for performance (e.g.,df['A'].map({1:'a', 2:'b', 3:'c'})
) -
applymap
is good for elementwise transformations across multiple rows/columns (e.g.,df[['A', 'B', 'C']].applymap(str.strip)
) -
apply
is for applying any function that cannot be vectorised (e.g.,df['sentences'].apply(nltk.sent_tokenize)
).
Also see When should I (not) want to use pandas apply() in my code? for a writeup I made a while back on the most appropriate scenarios for using apply
(note that there aren't many, but there are a few -- apply is generally slow`).
#Summarising
Footnotes
map
when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
applymap
in more recent versions has been optimised for some operations. You will findapplymap
slightly faster thanapply
in some cases. My suggestion is to test them both and use whatever works better.
map
is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
Series.apply
returns a scalar for aggregating operations, Series otherwise. Similarly forDataFrame.apply
. Note thatapply
also has fastpaths when called with certain NumPy functions such asmean
,sum
, etc.
Solution 3:
Quick Summary
-
DataFrame.apply
operates on entire rows or columns at a time. -
DataFrame.applymap
,Series.apply
, andSeries.map
operate on one element at time.
Series.apply
and Series.map
are similar and often interchangeable. Some of their slight differences are discussed in osa's answer below.