Add a string prefix to each value in a string column using Pandas
I would like to prepend a string to each value in a given column of a pandas DataFrame (elegantly). I have already figured out a way to kind of do this, and I am currently using:
df.loc[df['col'] != False, 'col'] = 'str' + df.loc[df['col'] != False, 'col']
This seems one hell of an inelegant thing to do - do you know any other way (which maybe also adds the character to rows where that column is 0 or NaN)?
In case this is still unclear, I would like to turn:
col
1 a
2 0
into:
col
1 stra
2 str0
Solution 1:
df['col'] = 'str' + df['col'].astype(str)
Example:
>>> import pandas as pd
>>> df = pd.DataFrame({'col':['a',0]})
>>> df
col
0 a
1 0
>>> df['col'] = 'str' + df['col'].astype(str)
>>> df
col
0 stra
1 str0
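The question also mentions NaN rows. How astype(str) renders a missing value can depend on the pandas version and the column dtype, so it may be safer to decide explicitly what those rows should become. The following is a minimal sketch under that assumption; the column names prefixed_all and prefixed_keep_nan are made up for illustration and are not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['a', 0, np.nan]})

# Remember which rows are missing before any string conversion.
missing = df['col'].isna()

# Replace missing values with an empty string, then prefix every row.
prefixed = 'str' + df['col'].fillna('').astype(str)

# Variant 1: missing rows receive just the prefix.
df['prefixed_all'] = prefixed

# Variant 2: missing rows stay missing.
df['prefixed_keep_nan'] = prefixed.mask(missing)

print(df)
```

Which variant is appropriate depends on whether a missing value should end up as the bare prefix or remain missing.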
Solution 2:
As an alternative, you can also use an apply combined with format (or better, with f-strings), which I find slightly more readable if one also wants to, e.g., add a suffix or manipulate the element itself:
df = pd.DataFrame({'col':['a', 0]})
df['col'] = df['col'].apply(lambda x: "{}{}".format('str', x))
which also yields the desired output:
col
0 stra
1 str0
If you are using Python 3.6+, you can also use f-strings:
df['col'] = df['col'].apply(lambda x: f"str{x}")
yielding the same output.
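To illustrate the point about suffixes and manipulating the element, here is a small sketch that upper-cases each value and wraps it in a prefix and a suffix. The strings pre_ and _post are arbitrary placeholders chosen for this example:

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 0]})

# Convert each element to a string, upper-case it, then add a
# hypothetical prefix ('pre_') and suffix ('_post') in one f-string.
df['col'] = df['col'].apply(lambda x: f"pre_{str(x).upper()}_post")

print(df['col'].tolist())  # ['pre_A_post', 'pre_0_post']
```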
The f-string version is almost as fast as @RomanPekar's solution (Python 3.6.4):
df = pd.DataFrame({'col':['a', 0]*200000})
%timeit df['col'].apply(lambda x: f"str{x}")
117 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit 'str' + df['col'].astype(str)
112 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using format, however, is noticeably slower (roughly 1.6 times in this benchmark):
%timeit df['col'].apply(lambda x: "{}{}".format('str', x))
185 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
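The %timeit lines above only work inside IPython or Jupyter. Outside of those, a comparison along the same lines could be sketched with the standard timeit module. The row count and repeat settings below are arbitrary choices for a quick run, and the absolute numbers will differ by machine and by pandas and Python version:

```python
import timeit

import pandas as pd

# Fewer rows than the benchmark above so the sketch finishes quickly.
df = pd.DataFrame({'col': ['a', 0] * 20000})

candidates = {
    'astype + concat': lambda: 'str' + df['col'].astype(str),
    'apply + f-string': lambda: df['col'].apply(lambda x: f"str{x}"),
    'apply + format': lambda: df['col'].apply(lambda x: "{}{}".format('str', x)),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats with 5 calls each, converted to seconds per call.
    timings[name] = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{name}: {timings[name] * 1000:.2f} ms per call")
```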
Solution 3:
You can use pandas.Series.map:
df['col'] = df['col'].map('str{}'.format)
In this example, it prepends the string str to every value in the column.