Swift performance: map() and reduce() vs for loops
Shouldn't the built-in Array methods be faster than the naive approach for performing such operations? Maybe somebody with more low-level knowledge than I can shed some light on the situation.
I want to address this part of the question at a conceptual level, and my answer is "not necessarily". It comes from a background in compiler design and computer architecture rather than from detailed knowledge of Swift's optimizer.
Calling Overhead
Because functions like map and reduce accept functions as inputs, they place a greater strain on the optimizer. Short of some very aggressive optimization, the natural outcome is code that constantly branches back and forth between the implementation of, say, map and the closure you provided, and that transmits data across these disparate branches of code (typically through registers and the stack).
That kind of branching/calling overhead is very difficult for the optimizer to eliminate, especially given the flexibility of Swift's closures (not impossible, but conceptually quite difficult). C++ optimizers can inline function object calls, but with far more restrictions and code generation required to do so: the compiler effectively has to generate a whole new set of instructions for map for each type of function object you pass in (and with the explicit aid of the programmer, who has to write it as a function template).
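To make the calling overhead concrete, here is a minimal sketch of what a map-like function has to do. The name simpleMap is hypothetical and this is not the standard library's implementation; it only illustrates that, unless the optimizer can inline or specialize the function for the closure passed in, every element costs one call through transform.

```swift
// Hypothetical, simplified stand-in for Array.map (not the standard library's code).
func simpleMap<T, U>(_ input: [T], _ transform: (T) -> U) -> [U] {
    var output: [U] = []
    output.reserveCapacity(input.count)
    for element in input {
        // One call into the caller-supplied closure per element.
        output.append(transform(element))
    }
    return output
}

// Hand-rolled equivalent: the doubling is written directly in the loop body,
// so there is no closure call for the optimizer to see through.
func doubledByLoop(_ input: [Float]) -> [Float] {
    var output: [Float] = []
    output.reserveCapacity(input.count)
    for element in input {
        output.append(element * 2)
    }
    return output
}

let values: [Float] = [1, 2, 3]
let viaClosure = simpleMap(values) { $0 * 2 }
let viaLoop = doubledByLoop(values)
print(viaClosure == viaLoop) // prints "true"
```

Both versions compute the same result; the difference is only in how much work the optimizer has to do to make the first one as cheap as the second.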
So it shouldn't be a great surprise to find that your hand-rolled loops can perform faster: they put a great deal less strain on the optimizer. I have seen some people argue that these higher-order functions should be able to go faster because the vendor can do things like parallelize the loop. But effectively parallelizing the loop would first require the kind of information that would typically allow the optimizer to inline the nested function calls to the point where they become as cheap as the hand-rolled loops. Otherwise the function/closure you pass in is effectively opaque to functions like map and reduce: they can only call it and pay the overhead of doing so, and they cannot parallelize it, since they cannot assume anything about its side effects or thread safety.
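As a small illustration of why an opaque closure blocks parallelization, consider a closure that mutates captured state. The names below are made up for the example. From map's point of view the closure is just something to call, and here the result depends on the calls happening one at a time, in order:

```swift
var runningTotal = 0
let numbers = [1, 2, 3, 4]

// The closure mutates captured state, so each result depends on
// every call that came before it.
let prefixSums = numbers.map { (n: Int) -> Int in
    runningTotal += n
    return runningTotal
}

print(prefixSums) // prints "[1, 3, 6, 10]"
```

If the calls were reordered or run concurrently, the output would change (and the unsynchronized writes to runningTotal would be a data race), which is why a library function cannot assume that parallelizing an arbitrary closure is safe.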
Of course this is all conceptual. Swift may be able to optimize these cases in the future, or it may already be able to do so now (see -Ofast, renamed -Ounchecked in later Swift releases, as a commonly cited way to make Swift go faster at the cost of some safety). But using these kinds of functions instead of hand-rolled loops does, at the very least, place a heavier strain on the optimizer, and the time differences you're seeing in the first benchmark look like what one might expect from this additional calling overhead. The best way to find out is to look at the assembly and try various optimization flags.
Standard Functions
That's not to discourage the use of such functions. They express intent more concisely, and they can boost productivity. Relying on them could also allow your codebase to get faster in future versions of Swift without any involvement on your part. But they aren't necessarily always going to be faster. It is a good general rule to assume that a higher-level library function that more directly expresses what you want to do is going to be faster, but there are always exceptions to the rule. Those exceptions are best discovered in hindsight with a profiler in hand, since it's far better to err on the side of trust than distrust here.
Artificial Benchmarks
As for your second benchmark, it is almost certainly the result of the compiler optimizing away code whose results never affect user output. Artificial benchmarks are notoriously misleading for exactly this reason: optimizers eliminate work that has no observable effect. So when a benchmark produces times that seem too good to be true, check that they aren't the result of the optimizer simply skipping all the work you actually wanted to measure. At the very least, you want your tests to output some final result gathered from the computation.
I cannot say much about your first test (map() vs append() in a loop), but I can confirm your results. The append loop becomes even faster if you add output.reserveCapacity(array.count) after the array creation. It seems that Apple can improve things here, and you might file a bug report.
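A minimal sketch of that suggestion, for readers who want to try it. The names array and output follow the text above; the input data and the doubling transformation are assumptions made for illustration, since the original test body is not reproduced here.

```swift
// Assumed input for illustration.
let array: [Float] = (0..<1_000).map { Float($0) }

// Append loop with the capacity reserved up front, so the buffer is
// allocated once instead of being regrown as elements are appended.
var output: [Float] = []
output.reserveCapacity(array.count)
for element in array {
    output.append(element * 2)
}

// map() equivalent for comparison.
let mapped = array.map { $0 * 2 }
print(output == mapped) // prints "true"
```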
In
for _ in 0..<100_000 {
    var sum: Float = 0
    for element in array {
        sum += element
    }
}
the compiler (probably) removes the entire loop because the computed results are not used at all. I can only speculate why a similar optimization does not happen in
for _ in 0..<100_000 {
    let sum = array.reduce(0, combine: {$0 + $1})
}
but it would be more difficult for the compiler to decide whether calling reduce() with the closure has any side effects.
If the test code is changed slightly to calculate and print a total sum
do {
    var total = Float(0.0)
    let start = NSDate()
    for _ in 0..<100_000 {
        total += array.reduce(0, combine: {$0 + $1})
    }
    let elapsed = NSDate().timeIntervalSinceDate(start)
    print("sum with reduce:", elapsed)
    print(total)
}
do {
    var total = Float(0.0)
    let start = NSDate()
    for _ in 0..<100_000 {
        var sum = Float(0.0)
        for element in array {
            sum += element
        }
        total += sum
    }
    let elapsed = NSDate().timeIntervalSinceDate(start)
    print("sum with loop:", elapsed)
    print(total)
}
then both variants take about 10 seconds in my test.
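The snippets above use the Swift 2 spellings (reduce(_:combine:), NSDate). For anyone repeating the experiment today, the same idea might look roughly like this in current Swift. The array contents and the reduced iteration count are assumptions chosen to keep the run short, so the timings will not match the figures quoted here.

```swift
import Foundation

// Assumed input; the original array's contents and size are not shown here.
let array = [Float](repeating: 1, count: 1_000)
let iterations = 1_000

var totalReduce: Float = 0
let startReduce = Date()
for _ in 0..<iterations {
    totalReduce += array.reduce(0, +)
}
print("sum with reduce:", Date().timeIntervalSince(startReduce))
print(totalReduce) // printing the result keeps the work observable

var totalLoop: Float = 0
let startLoop = Date()
for _ in 0..<iterations {
    var sum: Float = 0
    for element in array {
        sum += element
    }
    totalLoop += sum
}
print("sum with loop:", Date().timeIntervalSince(startLoop))
print(totalLoop)
```

Build with optimizations enabled before comparing the two timings; unoptimized builds tend to exaggerate the cost of closure calls.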
I ran a quick set of performance tests measuring repeated transformations on an Array of Strings, and it showed that .map was much more performant than a for loop, by a factor of about 10.
The results in the screenshot below show that chained transformations in a single map block outperform multiple maps with a single transformation in each, and that any use of map outperforms for loops.
Code I used in a Playground:
import Foundation
import XCTest
class MapPerfTests: XCTestCase {
    // 85 identical entries, as in the original literal.
    var array = Array(repeating: "MyString", count: 85)
    func testForLoopAllInOnePerf() {
        measure {
            var newArray: [String] = []
            for item in array {
                newArray.append(item.uppercased().lowercased().uppercased().lowercased())
            }
        }
    }

    func testForLoopMultipleStagesPerf() {
        measure {
            var newArray: [String] = []
            for item in array {
                // Each stage feeds the next, matching the chained version above.
                let t1 = item.uppercased()
                let t2 = t1.lowercased()
                let t3 = t2.uppercased()
                let t4 = t3.lowercased()
                newArray.append(t4)
            }
        }
    }

    func testMultipleMapPerf() {
        measure {
            // Four separate passes over the data.
            let newArray = array
                .map( { $0.uppercased() } )
                .map( { $0.lowercased() } )
                .map( { $0.uppercased() } )
                .map( { $0.lowercased() } )
        }
    }

    func testSingleMapPerf() {
        measure {
            // One pass with all four transformations chained in the closure.
            let newArray = array
                .map( { $0.uppercased().lowercased().uppercased().lowercased() } )
        }
    }
}
MapPerfTests.defaultTestSuite.run()
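One plausible contributor to the gap between the multiple-map and single-map tests (an assumption, not something measured here) is that each eager .map call returns a complete new array, so the chained form builds three intermediate arrays before the final one. A small sketch, using a made-up three-element input, shows that the two forms are otherwise equivalent:

```swift
let words = Array(repeating: "MyString", count: 3)

// Four passes: each .map returns a new [String], so three intermediate
// arrays are built and discarded before the final one.
let chained = words
    .map { $0.uppercased() }
    .map { $0.lowercased() }
    .map { $0.uppercased() }
    .map { $0.lowercased() }

// One pass: the same four transformations, one result array.
let single = words.map { $0.uppercased().lowercased().uppercased().lowercased() }

print(chained == single) // prints "true"
print(single)            // prints ["mystring", "mystring", "mystring"]
```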