Python pandas equivalent for replace
Solution 1:
pandas has a replace method too:
In [24]: import pandas as pd
In [25]: df = pd.DataFrame({1: [2, 3, 4], 2: [3, 4, 5]})
In [26]: df
Out[26]:
1 2
0 2 3
1 3 4
2 4 5
In [27]: df[2]
Out[27]:
0 3
1 4
2 5
Name: 2
In [28]: df[2].replace(4, 17)
Out[28]:
0 3
1 17
2 5
Name: 2
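replace returns a new Series, so assign the result back to the column to change the frame. (Calling df[2].replace(4, 17, inplace=True) on the selected column is chained assignment, and in recent pandas it does not reliably modify df.)
In [29]: df[2] = df[2].replace(4, 17)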
In [30]: df
Out[30]:
1 2
0 2 3
1 3 17
2 4 5
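If you want to call replace on the whole DataFrame but only touch one column, the pandas documentation describes a nested-dictionary form, {column: {old: new}}. A minimal sketch reusing the frame from the session above (the integer column labels 1 and 2 are just that example's choice):

```python
import pandas as pd

# Same frame as in the session above: integer column labels 1 and 2.
df = pd.DataFrame({1: [2, 3, 4], 2: [3, 4, 5]})

# Nested dict form: {column label: {old value: new value}}.
# Only column 2 is searched, so the 4 in column 1 is left untouched.
result = df.replace({2: {4: 17}})
print(result)
```

Because replace returns a new DataFrame here, df itself stays unchanged unless the result is assigned back.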
or you could use NumPy-style boolean (advanced) indexing:
In [47]: df[1]
Out[47]:
0 2
1 3
2 4
Name: 1
In [48]: df[1] == 4
Out[48]:
0 False
1 False
2 True
Name: 1
In [49]: df[1][df[1] == 4]
Out[49]:
2 4
Name: 1
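To assign, pass the mask and the column label to .loc in one call; chained assignment such as df[1][df[1] == 4] = 19 triggers a warning and does not write back to df in recent pandas.
In [50]: df.loc[df[1] == 4, 1] = 19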
In [51]: df
Out[51]:
1 2
0 2 3
1 3 17
2 19 5
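The same boolean-mask idea can also be written without assigning into a selection, using Series.mask, which keeps values where the condition is False and substitutes the given value where it is True. A small sketch on the same example frame:

```python
import pandas as pd

df = pd.DataFrame({1: [2, 3, 4], 2: [3, 4, 5]})

# Series.mask(cond, other): where cond is True, use `other`;
# elsewhere keep the original value. Returns a new Series.
new_col = df[1].mask(df[1] == 4, 19)

# Assign the result back to the column explicitly.
df[1] = new_col
print(df)
```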
Solution 2:
At the time of writing, the pandas documentation for replace did not have any examples, so I will give some here. For those coming from an R perspective (like me), replace is basically an all-purpose replacement function that combines the functionality of the R functions plyr::mapvalues, plyr::revalue and stringr::str_replace_all. Since DSM covered the case of single values, I will cover the multi-value case.
Example series
In [10]: x = pd.Series([1, 2, 3, 4])
In [11]: x
Out[11]:
0 1
1 2
2 3
3 4
dtype: int64
We want to replace the positive integers with negative integers (without simply multiplying by -1).
Two lists of values
One way to do this is to have one list (or pandas Series) of the values we want to replace and a second list of the values we want to replace them with.
In [14]: x.replace([1, 2, 3, 4], [-1, -2, -3, -4])
Out[14]:
0 -1
1 -2
2 -3
3 -4
dtype: int64
This corresponds to plyr::mapvalues.
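Two details of the two-list form are worth seeing directly: values that do not appear in the first list are kept as they are (as with plyr::mapvalues), and the lists are paired by position, so pandas raises a ValueError when their lengths differ. A short sketch with the example series:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])

# Values not listed in the first list are left unchanged.
partial = x.replace([1, 2], [-1, -2])
print(partial.tolist())  # [-1, -2, 3, 4]

# Lists are paired position by position; mismatched lengths are rejected.
mismatch_error = None
try:
    x.replace([1, 2, 3], [-1, -2])
except ValueError as err:
    mismatch_error = err
print("ValueError:", mismatch_error)
```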
Dictionary of value pairs
Sometimes it's more convenient to have a dictionary of value pairs. Each key is a value we want to replace, and the corresponding dictionary value is what we replace it with.
In [15]: x.replace({1: -1, 2: -2, 3: -3, 4: -4})
Out[15]:
0 -1
1 -2
2 -3
3 -4
dtype: int64
This corresponds to plyr::revalue.
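A common alternative for dictionary lookups is Series.map, but it behaves differently for values missing from the dictionary: replace keeps them, while map turns them into NaN. The sketch below also builds the dictionary from two lists with dict(zip(...)), which links the two forms above; the partial mapping is chosen only to expose the difference.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])

# Build the old -> new dictionary from two lists; deliberately partial.
mapping = dict(zip([1, 2], [-1, -2]))

# replace: values missing from the dict are kept unchanged.
replaced = x.replace(mapping)

# map: values missing from the dict become NaN (the dtype becomes float).
mapped = x.map(mapping)

print(replaced.tolist())  # [-1, -2, 3, 4]
print(mapped.tolist())    # [-1.0, -2.0, nan, nan]
```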
Strings
It works similarly for strings, except that we also have the option of using regex patterns.
If we simply want to replace strings with other strings, it works exactly the same as before:
In [18]: s = pd.Series(["ape", "monkey", "seagull"])
In [22]: s
Out[22]:
0 ape
1 monkey
2 seagull
dtype: object
Two lists
In [25]: s.replace(["ape", "monkey"], ["lion", "panda"])
Out[25]:
0 lion
1 panda
2 seagull
dtype: object
Dictionary
In [26]: s.replace({"ape": "lion", "monkey": "panda"})
Out[26]:
0 lion
1 panda
2 seagull
dtype: object
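Without regex=True, replace only matches whole cell values, not substrings. The sketch below contrasts that with the .str.replace accessor, which does substring replacement; regex=False is passed to make the pattern literal.

```python
import pandas as pd

s = pd.Series(["ape", "monkey", "seagull"])

# Whole-value matching: no cell is exactly "a", so nothing changes.
whole = s.replace("a", "x")

# Substring replacement through the string accessor.
sub = s.str.replace("a", "x", regex=False)

print(whole.tolist())  # ['ape', 'monkey', 'seagull']
print(sub.tolist())    # ['xpe', 'monkey', 'sexgull']
```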
Regex
Replace all a's with x's.
In [27]: s.replace("a", "x", regex=True)
Out[27]:
0 xpe
1 monkey
2 sexgull
dtype: object
Replace all l's with x's.
In [28]: s.replace("l", "x", regex=True)
Out[28]:
0 ape
1 monkey
2 seaguxx
dtype: object
Note that both l's in seagull were replaced.
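Since the pattern is a full regular expression, quantifiers and anchors also work. In this sketch "l+" matches the run "ll" once, so it becomes a single x, and "^a" only matches an a at the start of a value:

```python
import pandas as pd

s = pd.Series(["ape", "monkey", "seagull"])

# "l+" matches one or more consecutive l's as a single match.
collapsed = s.replace(r"l+", "x", regex=True)

# "^a" only matches an "a" at the beginning of the string.
leading = s.replace(r"^a", "x", regex=True)

print(collapsed.tolist())  # ['ape', 'monkey', 'seagux']
print(leading.tolist())    # ['xpe', 'monkey', 'seagull']
```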
Replace a's with x's and l's with p's:
In [29]: s.replace(["a", "l"], ["x", "p"], regex=True)
Out[29]:
0 xpe
1 monkey
2 sexgupp
dtype: object
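The dictionary form shown earlier combines with regex=True as well; each key is then treated as a pattern. A sketch that should give the same result as the two-list call above:

```python
import pandas as pd

s = pd.Series(["ape", "monkey", "seagull"])

# {pattern: replacement}; with regex=True each key is a regular expression.
result = s.replace({"a": "x", "l": "p"}, regex=True)
print(result.tolist())  # ['xpe', 'monkey', 'sexgupp']
```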
In the special case where one wants to replace multiple different values with the same value, one can simply pass a single string as the replacement. It must not be inside a list (a one-element list such as ["p"] raises a ValueError because its length does not match the list of patterns). Replace a's and l's with p's:
In [30]: s.replace(["a", "l"], "p", regex=True)
Out[30]:
0 ppe
1 monkey
2 sepgupp
dtype: object
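The many-to-one form also works without regex, where it matches whole values. In this sketch the label "primate" is just an illustrative replacement:

```python
import pandas as pd

s = pd.Series(["ape", "monkey", "seagull"])

# Every listed whole value becomes the single scalar; unlisted values are kept.
result = s.replace(["ape", "monkey"], "primate")
print(result.tolist())  # ['primate', 'primate', 'seagull']
```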
(Credit to DaveL17 in the comments)