What is the most efficient string concatenation method in python?

Is there any efficient mass string concatenation method in Python (like StringBuilder in C# or StringBuffer in Java)? I found following methods here:

  • Simple concatenation using +
  • Using string list and join method
  • Using UserString from MutableString module
  • Using character array and the array module
  • Using cStringIO from StringIO module

But what do you experts use or suggest, and why?

[A related question here]


Solution 1:

You may be interested in this: An optimization anecdote by Guido. Although it is worth remembering also that this is an old article and it predates the existence of things like ''.join (although I guess string.joinfields is more-or-less the same)

On the strength of that, the array module may be fastest if you can shoehorn your problem into it. But ''.join is probably fast enough and has the benefit of being idiomatic and thus easier for other python programmers to understand.

Finally, the golden rule of optimization: don't optimize unless you know you need to, and measure rather than guessing.

You can measure different methods using the timeit module. That can tell you which is fastest, instead of random strangers on the internet making guesses.

Solution 2:

Python 3.6 changed the game for string concatenation of known components with Literal String Interpolation.

Given the test case from mkoistinen's answer, having strings

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

The contenders are

  • f'http://{domain}/{lang}/{path}' - 0.151 µs

  • 'http://%s/%s/%s' % (domain, lang, path) - 0.321 µs

  • 'http://' + domain + '/' + lang + '/' + path - 0.356 µs

  • ''.join(('http://', domain, '/', lang, '/', path)) - 0.249 µs (notice that building a constant-length tuple is slightly faster than building a constant-length list).

Thus currently the shortest and the most beautiful code possible is also fastest.

In alpha versions of Python 3.6 the implementation of f'' strings was the slowest possible - actually the generated byte code is pretty much equivalent to the ''.join() case with unnecessary calls to str.__format__ which without arguments would just return self unchanged. These inefficiencies were addressed before 3.6 final.

The speed can be contrasted with the fastest method for Python 2, which is + concatenation on my computer; and that takes 0.203 µs with 8-bit strings, and 0.259 µs if the strings are all Unicode.

Solution 3:

''.join(sequenceofstrings) is what usually works best -- simplest and fastest.