List vs generator comprehension speed with join function [duplicate]
The str.join method converts its iterable parameter to a list if it's not a list or tuple already. This lets the joining logic iterate over the items multiple times (it makes one pass to calculate the size of the result string, then a second pass to actually copy the data).
You can see this in the CPython source code:
PyObject *
PyUnicode_Join(PyObject *separator, PyObject *seq)
{
    /* lots of variable declarations at the start of the function omitted */
    fseq = PySequence_Fast(seq, "can only join an iterable");
    /* ... */
}
The PySequence_Fast function in the C API does just what I described. It converts an arbitrary iterable into a list (essentially by calling list on it), unless it already is a list or tuple.
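To make the two passes concrete, here is a rough pure-Python model of the behaviour described above. It is only a sketch: join_sketch is a hypothetical name I made up, and the real work happens in C with a preallocated string buffer rather than a Python list of characters.

```python
def join_sketch(separator, iterable):
    """Illustrative model of str.join's two-pass approach (not CPython's code)."""
    # Mirror PySequence_Fast: keep lists and tuples as-is, copy anything else
    # (for example a generator) into a new list so it can be traversed twice.
    items = iterable if isinstance(iterable, (list, tuple)) else list(iterable)

    # Pass 1: work out the total length of the result.
    total = sum(len(s) for s in items) + len(separator) * max(len(items) - 1, 0)

    # Pass 2: copy the pieces into a buffer allocated up front.
    buf = [""] * total
    pos = 0
    for index, s in enumerate(items):
        if index:  # a separator goes before every item except the first
            buf[pos:pos + len(separator)] = separator
            pos += len(separator)
        buf[pos:pos + len(s)] = s
        pos += len(s)

    # The sketch uses the built-in join only to turn the buffer back into a str.
    return "".join(buf)


print(join_sketch(", ", (str(i) for i in range(5))))
```

The point to notice is the first line of the function body: a generator argument is fully materialised before any joining starts.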
The conversion of the generator expression to a list means that the usual benefits of generators (a smaller memory footprint and the potential for short-circuiting) don't apply to str.join, and so the (small) additional overhead that the generator has makes its performance worse.
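If you want to check this on your own machine, a minimal timing sketch with timeit could look like the following. The function names, the input size and the repeat counts are arbitrary choices of mine, and the actual numbers will depend on your Python version and hardware, so treat the output as indicative only.

```python
import timeit


def join_list(n=1000):
    # List comprehension: str.join receives a list and uses it directly.
    return "".join([str(i) for i in range(n)])


def join_gen(n=1000):
    # Generator expression: str.join first has to copy it into a list.
    return "".join(str(i) for i in range(n))


if __name__ == "__main__":
    for func in (join_list, join_gen):
        # Take the best of several runs to reduce noise from other processes.
        best = min(timeit.repeat(func, number=200, repeat=5))
        print(f"{func.__name__}: {best:.4f} s for 200 calls")
```

Both functions build the same string; only the kind of iterable handed to str.join differs.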