Format strings vs concatenation
I see many people using format strings like this:
root = "sample"
output = "output"
path = "{}/{}".format(root, output)
Instead of simply concatenating strings like this:
path = root + '/' + output
Do format strings have better performance or is this just for looks?
It's just for looks: you can see the format at a glance. Many of us prefer readability to micro-optimization.
Let's see what IPython's %timeit says:
Python 3.7.2 (default, Jan 3 2019, 02:55:40)
IPython 5.8.0
Intel(R) Core(TM) i5-4590T CPU @ 2.00GHz
In [1]: %timeit root = "sample"; output = "output"; path = "{}/{}".format(root, output)
The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 223 ns per loop
In [2]: %timeit root = "sample"; output = "output"; path = root + '/' + output
The slowest run took 13.82 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 101 ns per loop
In [3]: %timeit root = "sample"; output = "output"; path = "%s/%s" % (root, output)
The slowest run took 27.97 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 155 ns per loop
In [4]: %timeit root = "sample"; output = "output"; path = f"{root}/{output}"
The slowest run took 19.52 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 77.8 ns per loop
I agree that the formatting is mostly used for readability, but since the release of f-strings in 3.6, the tables have turned in terms of performance. It is also my opinion that the f-strings are more readable/maintainable since 1) they can be read left-right like most regular text and 2) the spacing-related disadvantages of concatenation are avoided since the variables are in-string.
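To make the spacing point concrete, here is a small sketch (the variable names are only for illustration, reusing the greeting/loc values from the timing code). With concatenation every separator is its own string literal, so a dropped space is easy to miss; with an f-string the separator sits in the template where you can see it.

```python
greeting = "hello"
loc = "world"

# Concatenation: each separator is a separate literal, so it is easy
# to forget the space after the comma.
concat_missing_space = greeting + "," + loc
concat_correct = greeting + ", " + loc

# f-string: the whole template reads left to right, separators included.
fstring = f"{greeting}, {loc}"

# f-strings also format non-string values without an explicit str() call.
count = 3
summary = f"{greeting}, {loc} x{count}"

print(concat_missing_space)
print(concat_correct)
print(fstring)
print(summary)
```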
Running this code:
from timeit import timeit

runs = 1000000

def print_results(time, start_string):
    print(f'{start_string}\n'
          f'Total: {time:.4f}s\n'
          f'Avg: {(time/runs)*1000000000:.4f}ns\n')

t1 = timeit('"%s, %s" % (greeting, loc)',
            setup='greeting="hello";loc="world"',
            number=runs)
t2 = timeit('f"{greeting}, {loc}"',
            setup='greeting="hello";loc="world"',
            number=runs)
t3 = timeit('greeting + ", " + loc',
            setup='greeting="hello";loc="world"',
            number=runs)
t4 = timeit('"{}, {}".format(greeting, loc)',
            setup='greeting="hello";loc="world"',
            number=runs)

print_results(t1, '% replacement')
print_results(t2, 'f strings')
print_results(t3, 'concatenation')
print_results(t4, '.format method')
yields this result on my machine:
% replacement
Total: 0.3044s
Avg: 304.3638ns
f strings
Total: 0.0991s
Avg: 99.0777ns
concatenation
Total: 0.1252s
Avg: 125.2442ns
.format method
Total: 0.3483s
Avg: 348.2690ns
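Single timing runs at this scale are noisy (the %timeit output above warns about exactly that), so if you want to reproduce the comparison, one option is to repeat each measurement and keep the fastest run. This is only a sketch: the helper name best_ns_per_call and the run counts are my own choices, and the numbers will differ from machine to machine.

```python
from timeit import repeat

SETUP = 'greeting = "hello"; loc = "world"'

# Same four statements as in the timing code above.
STATEMENTS = {
    "% replacement": '"%s, %s" % (greeting, loc)',
    "f-string": 'f"{greeting}, {loc}"',
    "concatenation": 'greeting + ", " + loc',
    ".format method": '"{}, {}".format(greeting, loc)',
}

def best_ns_per_call(stmt, runs=100_000, repeats=5):
    """Return the fastest per-call time in nanoseconds over several repeats."""
    totals = repeat(stmt, setup=SETUP, number=runs, repeat=repeats)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(totals) / runs * 1e9

results = {label: best_ns_per_call(stmt) for label, stmt in STATEMENTS.items()}

# Print fastest first.
for label, ns in sorted(results.items(), key=lambda item: item[1]):
    print(f"{label:<15} {ns:8.1f} ns")
```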
A similar comparison is given in this answer to a different question.
As with most things, there will be a performance difference, but ask yourself: "Does it really matter if this is a few nanoseconds faster?" The root + '/' + output method is quick and easy to type out, but it gets hard to read quickly when you have multiple variables to print out:
foo = "X = " + myX + " | Y = " + someY + " | Z = " + str(Z)
vs
foo = "X = {} | Y = {} | Z = {}".format(myX, someY, str(Z))
Which one makes it easier to understand what is going on? Unless you really need to eke out performance, choose the way that will be easiest for people to read and understand.
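Here is a runnable version of that comparison, as a rough sketch with made-up sample values (my_x, some_y and z are hypothetical). One practical difference it shows: concatenation needs an explicit str() around every non-string value, while .format() and f-strings do the conversion for you.

```python
# Hypothetical sample values of different types.
my_x = 1.5
some_y = 2
z = [3, 4]

# Concatenation: each non-string value must be converted by hand,
# otherwise Python raises a TypeError.
concat = "X = " + str(my_x) + " | Y = " + str(some_y) + " | Z = " + str(z)

# .format(): the template is visible in one piece, conversion is implicit.
formatted = "X = {} | Y = {} | Z = {}".format(my_x, some_y, z)

# f-string (Python 3.6+): same template, with the names inline.
fstring = f"X = {my_x} | Y = {some_y} | Z = {z}"

print(concat)
print(formatted)
print(fstring)
```

All three produce the same string; the difference is only in how easy the template is to read and edit.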