Replace string/value in entire DataFrame
I have a very large dataset were I want to replace strings with numbers. I would like to operate on the dataset without typing a mapping function for each key (column) in the dataset. (similar to the fillna method, but replace specific string with assosiated value). Is there anyway to do this?
Here is an example of my dataset
data
resp A B C
0 1 poor poor good
1 2 good poor good
2 3 very good very good very good
3 4 bad poor bad
4 5 very bad very bad very bad
5 6 poor good very bad
6 7 good good good
7 8 very good very good very good
8 9 bad bad very bad
9 10 very bad very bad very bad
The desired result:
data
resp A B C
0 1 3 3 4
1 2 4 3 4
2 3 5 5 5
3 4 2 3 2
4 5 1 1 1
5 6 3 4 1
6 7 4 4 4
7 8 5 5 5
8 9 2 2 1
9 10 1 1 1
very bad=1, bad=2, poor=3, good=4, very good=5
//Jonas
Solution 1:
Use replace
In [126]: df.replace(['very bad', 'bad', 'poor', 'good', 'very good'],
[1, 2, 3, 4, 5])
Out[126]:
resp A B C
0 1 3 3 4
1 2 4 3 4
2 3 5 5 5
3 4 2 3 2
4 5 1 1 1
5 6 3 4 1
6 7 4 4 4
7 8 5 5 5
8 9 2 2 1
9 10 1 1 1
Solution 2:
Considering data
is your pandas DataFrame
you can also use:
data.replace({'very bad': 1, 'bad': 2, 'poor': 3, 'good': 4, 'very good': 5}, inplace=True)
Solution 3:
data = data.replace(['very bad', 'bad', 'poor', 'good', 'very good'],
[1, 2, 3, 4, 5])
You must state where the result should be saved. If you say only data.replace(...) it is only shown as a change in preview, not in the envirable itself.