When should I use a Map instead of a For Loop?
This relates to the following (in Python code):
for i in object:
    doSomething(i)
versus
map(doSomething, object)
Both are short and easy to understand, but is there any speed difference? Now, if doSomething had a return value we needed to check, map would return the values as a list, whereas in the for loop we could either build our own list or check them one at a time.
for i in object:
    returnValue = doSomething(i)
    doSomethingWithReturnValue(returnValue)
versus
returnValue = map(doSomething, object)
map(doSomethingWithReturnValue, returnValue)
Now I feel the two diverge a little. The two doSomethingWithReturnValue functions may need to differ, because checking the values on the fly as we go through the loop and checking them all at once at the end can produce different results. Also, it seems the for loop would always work, though maybe more slowly, whereas map would only work in certain scenarios. Of course, we could contort either one to make it work, but the whole point is to avoid that kind of work.
What I'm looking for is a scenario where the mapping function truly shines in comparison to a well done for loop in performance, readability, maintainability, or speed of implementation. If the answer is there really isn't a big difference then I'd like to know when in practice people use one or the other or if it's really completely arbitrary and set by coding standards depending on your institution.
Thanks!
map is useful when you want to apply the function to every item of an iterable and return a list of the results. This is simpler and more concise than using a for loop and constructing a list.
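As a small illustration of that case, here is a sketch in which `double` is a made-up stand-in for `doSomething`; both forms build the same list. On Python 3, `map` returns a lazy iterator, so it is wrapped in `list()` here.

```python
def double(x):
    # Hypothetical stand-in for doSomething
    return x * 2

items = [1, 2, 3]

# for loop: create the list and append to it by hand
loop_result = []
for i in items:
    loop_result.append(double(i))

# map: one expression; list() is needed on Python 3, where map is lazy
map_result = list(map(double, items))

print(loop_result == map_result)
```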
for is often more readable for other situations, and in Lisp there were lots of iteration constructs that were written basically using macros and map. So, in cases where map doesn't fit, use a for loop.
In theory, if we had a compiler/interpreter that was smart enough to make use of multiple CPUs/processors, then map could be implemented faster, as the different operations on each item could be done in parallel. I don't think this is the case at present, however.
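The builtin map does not parallelize anything on its own, but the standard library does offer an explicit, opt-in parallel map. The following is only a sketch using `concurrent.futures`, with `slow_square` as a made-up task. Threads are used for simplicity; in CPython they mainly help with I/O-bound work, so a CPU-bound function would more likely want `ProcessPoolExecutor`, which exposes the same `map` method.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_square(x):
    # Hypothetical task; the sleep stands in for waiting on I/O
    time.sleep(0.01)
    return x * x

items = list(range(8))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map yields results in input order, like the builtin map
    parallel_result = list(pool.map(slow_square, items))

serial_result = list(map(slow_square, items))
print(parallel_result == serial_result)
```

This only works cleanly because each call is independent of the others, which is the same property that makes a loop a good fit for map in the first place.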
Are you familiar with the timeit module? Below are some timings. -s performs a one-time setup, and then the command is looped and the best time recorded.
1> python -m timeit -s "L=[]; M=range(1000)" "for m in M: L.append(m*2)"
1000 loops, best of 3: 432 usec per loop
2> python -m timeit -s "M=range(1000);f=lambda x: x*2" "L=map(f,M)"
1000 loops, best of 3: 449 usec per loop
3> python -m timeit -s "M=range(1000);f=lambda x:x*2" "L=[f(m) for m in M]"
1000 loops, best of 3: 483 usec per loop
4> python -m timeit -s "L=[]; A=L.append; M=range(1000)" "for m in M: A(m*2)"
1000 loops, best of 3: 287 usec per loop
5> python -m timeit -s "M=range(1000)" "L=[m*2 for m in M]"
1000 loops, best of 3: 174 usec per loop
Note they are all similar except for the last two. It is the function calls (L.append, or f(x)) that severely affect the timing. In #4 the L.append lookup has been done once in setup. In #5 a list comp with no function calls is used.
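The same comparison can be run from a script with the `timeit` module's Python API. This is a rough sketch: the function names are made up for illustration, and the absolute and relative numbers will vary by machine and interpreter version, so no expected timings are shown.

```python
import timeit

data = range(1000)

def with_lookup():
    out = []
    for m in data:
        out.append(m * 2)   # out.append is looked up on every iteration
    return out

def with_hoisted_append():
    out = []
    append = out.append     # look the method up once, as in #4
    for m in data:
        append(m * 2)
    return out

def with_comprehension():
    return [m * 2 for m in data]   # no explicit method calls, as in #5

for fn in (with_lookup, with_hoisted_append, with_comprehension):
    # Best of 3 runs, each run calling the function 200 times
    best = min(timeit.repeat(fn, number=200, repeat=3))
    print(fn.__name__, best)
```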
Just use list comprehensions: they're more Pythonic. They also have syntax similar to generator expressions, which makes it easy to switch from one to the other. With a list comprehension you don't need to change anything when converting your code to py3k, whereas map returns an iterator in py3k and you'll have to adjust your code.
If you don't care about return values, just don't name the new list. If you need to use the return values only once in your code, you might switch to generator expressions and a single list comprehension at the end.
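A minimal sketch of that last suggestion, assuming two made-up helpers (`parse` standing in for `doSomething`, and `is_even` for a check on its return value):

```python
def parse(s):
    # Hypothetical stand-in for doSomething
    return int(s)

def is_even(n):
    # Hypothetical check on each return value
    return n % 2 == 0

raw = ["1", "2", "3", "4"]

# Generator expressions are lazy: nothing runs until the results are consumed
parsed = (parse(s) for s in raw)
evens = (n for n in parsed if is_even(n))

# A single list comprehension at the end drives the whole pipeline
result = [n * 10 for n in evens]
print(result)

# Same syntax with square brackets when the intermediate list is needed
parsed_list = [parse(s) for s in raw]
```

Each value passes through every stage before the next one is read, so no intermediate lists are built. The generators can only be consumed once, which is why this fits the case where the return values are used a single time.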