Replace value for a selected cell in pandas DataFrame without using index
this is a rather similar question to this question but with one key difference: I'm selecting the data I want to change not by its index but by some criteria.
If the criteria I apply return a single row, I'd expect to be able to set the value of a certain column in that row in an easy way, but my first attempt doesn't work:
>>> d = pd.DataFrame({'year':[2008,2008,2008,2008,2009,2009,2009,2009],
... 'flavour':['strawberry','strawberry','banana','banana',
... 'strawberry','strawberry','banana','banana'],
... 'day':['sat','sun','sat','sun','sat','sun','sat','sun'],
... 'sales':[10,12,22,23,11,13,23,24]})
>>> d
day flavour sales year
0 sat strawberry 10 2008
1 sun strawberry 12 2008
2 sat banana 22 2008
3 sun banana 23 2008
4 sat strawberry 11 2009
5 sun strawberry 13 2009
6 sat banana 23 2009
7 sun banana 24 2009
>>> d[d.sales==24]
day flavour sales year
7 sun banana 24 2009
>>> d[d.sales==24].sales = 100
>>> d
day flavour sales year
0 sat strawberry 10 2008
1 sun strawberry 12 2008
2 sat banana 22 2008
3 sun banana 23 2008
4 sat strawberry 11 2009
5 sun strawberry 13 2009
6 sat banana 23 2009
7 sun banana 24 2009
So rather than setting 2009 Sunday's Banana sales to 100, nothing happens! What's the nicest way to do this? Ideally the solution should use the row number, as you normally don't know that in advance!
Many ways to do that
1
In [7]: d.sales[d.sales==24] = 100
In [8]: d
Out[8]:
day flavour sales year
0 sat strawberry 10 2008
1 sun strawberry 12 2008
2 sat banana 22 2008
3 sun banana 23 2008
4 sat strawberry 11 2009
5 sun strawberry 13 2009
6 sat banana 23 2009
7 sun banana 100 2009
2
In [26]: d.loc[d.sales == 12, 'sales'] = 99
In [27]: d
Out[27]:
day flavour sales year
0 sat strawberry 10 2008
1 sun strawberry 99 2008
2 sat banana 22 2008
3 sun banana 23 2008
4 sat strawberry 11 2009
5 sun strawberry 13 2009
6 sat banana 23 2009
7 sun banana 100 2009
3
In [28]: d.sales = d.sales.replace(23, 24)
In [29]: d
Out[29]:
day flavour sales year
0 sat strawberry 10 2008
1 sun strawberry 99 2008
2 sat banana 22 2008
3 sun banana 24 2008
4 sat strawberry 11 2009
5 sun strawberry 13 2009
6 sat banana 24 2009
7 sun banana 100 2009
Not sure about older version of pandas, but in 0.16 the value of a particular cell can be set based on multiple column values.
Extending the answer provided by @waitingkuo, the same operation can also be done based on values of multiple columns.
d.loc[(d.day== 'sun') & (d.flavour== 'banana') & (d.year== 2009),'sales'] = 100
Old question, but I'm surprised nobody mentioned numpy's .where()
functionality (which can be called directly from the pandas module).
In this case the code would be:
d.sales = pd.np.where(d.sales == 24, 100, d.sales)
To my knowledge, this is one of the fastest ways to conditionally change data across a series.