Conditional replacement of values in a data.frame
Since you are conditionally indexing df$est, you also need to conditionally index the replacement vector df$a:
index <- df$b == 0
df$est[index] <- (df$a[index] - 5)/2.533
Of course, the variable index is just temporary; I use it to make the code a bit more readable. You can also write it in one step:
df$est[df$b == 0] <- (df$a[df$b == 0] - 5)/2.533
For even better readability, you can use within:
df <- within(df, est[b==0] <- (a[b==0]-5)/2.533)
The results, regardless of which method you choose:
df
a b est
1 11.77000 2 0.000000
2 10.90000 3 0.000000
3 10.32000 2 0.000000
4 10.96000 0 2.352941
5 9.90600 0 1.936834
6 10.70000 0 2.250296
7 11.43000 1 0.000000
8 11.41000 2 0.000000
9 10.48512 4 0.000000
10 11.19000 0 2.443743
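To check that the three variants above really agree, here is a minimal reproducible sketch. It assumes the data frame looks like the printed output (the values are copied from it, with est starting at 0); the names df1, df2 and df3 are only there to keep the variants apart.

```r
# Rebuild the example data from the printed output above (est starts at 0).
df <- data.frame(
  a = c(11.77, 10.90, 10.32, 10.96, 9.906, 10.70, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

# Variant 1: temporary logical index, applied to both sides of the assignment
df1 <- df
index <- df1$b == 0
df1$est[index] <- (df1$a[index] - 5) / 2.533

# Variant 2: the same thing in one step
df2 <- df
df2$est[df2$b == 0] <- (df2$a[df2$b == 0] - 5) / 2.533

# Variant 3: within() evaluates the assignment inside the data frame
df3 <- within(df, est[b == 0] <- (a[b == 0] - 5) / 2.533)

# All three variants should produce the same est column
stopifnot(isTRUE(all.equal(df1$est, df2$est)),
          isTRUE(all.equal(df1$est, df3$est)))
```

The key point in every variant is that the left-hand side and the right-hand side are subset by the same condition, so both have the same length.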
As others have pointed out, an alternative solution in your example is to use ifelse.
Try data.table's := operator:
library(data.table)
DT <- as.data.table(df)
# Update est by reference, only in the rows where b == 0
DT[b == 0, est := (a - 5)/2.533]
It's fast and short. See these linked questions for more information on :=:
Why has data.table defined :=
When should I use the := operator in data.table
How do you remove columns from a data.frame
R self reference
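A small self-contained sketch of the := update, assuming the data.table package is installed and using a three-row stand-in for the original data. It also illustrates that := changes DT in place, while the data frame it was created from is left alone because as.data.table() makes a copy.

```r
library(data.table)

# Three-row stand-in for the example data (values taken from the output above)
df <- data.frame(a = c(11.77, 10.96, 9.906), b = c(2, 0, 0), est = 0)

DT <- as.data.table(df)             # a copy of df as a data.table
DT[b == 0, est := (a - 5) / 2.533]  # updates DT by reference; no reassignment needed

# df itself is unchanged; only DT carries the new est values
print(DT)
```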
Here is one approach. ifelse is vectorized: it checks every row for a zero value of b and replaces est with (a - 5)/2.533 where that is the case, leaving est unchanged everywhere else.
df <- transform(df, est = ifelse(b == 0, (a - 5)/2.533, est))
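To see why ifelse needs no extra subsetting, it can help to spell out its three arguments. The sketch below uses a three-row stand-in for the data and the divisor 2.533 from the printed results; the names test and candidate are just illustrative.

```r
# Three-row stand-in for the example data
df <- data.frame(a = c(11.77, 10.96, 9.906), b = c(2, 0, 0), est = 0)

test      <- df$b == 0            # logical vector, one element per row
candidate <- (df$a - 5) / 2.533   # computed for every row, not only where b == 0

# ifelse() picks element-wise: candidate where test is TRUE, the old est otherwise
df$est <- ifelse(test, candidate, df$est)

# All three inputs have the full length, so there is no length mismatch
stopifnot(length(test) == nrow(df), length(candidate) == nrow(df))
```

The trade-off is that the replacement expression is evaluated for all rows, even those whose result is discarded.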
Another option would be to use case_when:
library(dplyr)
df <- mutate(df, est = case_when(
b == 0 ~ (a - 5)/2.533,
TRUE ~ est
))
This solution becomes even handier when more than two cases need to be distinguished, since it avoids nested if_else constructs.
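As a sketch of the multi-case situation, assuming dplyr is installed: the second rule below (b == 1 ~ a / 2) is purely hypothetical and not part of the original question; it is only there to show how an additional case slots in without nesting.

```r
library(dplyr)

# Three-row stand-in for the example data
df <- data.frame(a = c(11.77, 10.96, 11.43), b = c(2, 0, 1), est = 0)

out <- mutate(df, est = case_when(
  b == 0 ~ (a - 5) / 2.533,  # the rule from the question
  b == 1 ~ a / 2,            # hypothetical extra rule, for illustration only
  TRUE   ~ est               # fallback: keep the existing value
))

print(out)
```

Conditions are checked in order and the first match wins, so the TRUE fallback belongs last.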
The R Inferno, or the basic R documentation, will explain why using df$* is not the best approach here. From the help page for "[":
"Indexing by [ is similar to atomic vectors and selects a list of the specified element(s). Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument. "
I recommend using the [row,col] notation instead. Example:
Rgames: foo
x y z
[1,] 1e+00 1 0
[2,] 2e+00 2 0
[3,] 3e+00 1 0
[4,] 4e+00 2 0
[5,] 5e+00 1 0
[6,] 6e+00 2 0
[7,] 7e+00 1 0
[8,] 8e+00 2 0
[9,] 9e+00 1 0
[10,] 1e+01 2 0
Rgames: foo<-as.data.frame(foo)
Rgames: foo[foo$y==2,3]<-foo[foo$y==2,1]
Rgames: foo
x y z
1 1e+00 1 0e+00
2 2e+00 2 2e+00
3 3e+00 1 0e+00
4 4e+00 2 4e+00
5 5e+00 1 0e+00
6 6e+00 2 6e+00
7 7e+00 1 0e+00
8 8e+00 2 8e+00
9 9e+00 1 0e+00
10 1e+01 2 1e+01
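The session above can be reproduced with the sketch below, which rebuilds foo directly as a data frame and uses column names instead of positions. The second half applies the same [row,col] pattern to a three-row stand-in for the question's data, with the column name held in a variable (target is an illustrative name); this is the "computed index" case that $ does not support.

```r
# Rebuild foo as in the session above
foo <- data.frame(x = as.numeric(1:10), y = rep(1:2, times = 5), z = 0)

rows <- foo$y == 2                 # logical row selector
foo[rows, "z"] <- foo[rows, "x"]   # same rows on both sides of the assignment

# The same pattern with a computed column name
df <- data.frame(a = c(11.77, 10.96, 9.906), b = c(2, 0, 0), est = 0)
target <- "est"                    # column name stored in a variable
zero_b <- df$b == 0
df[zero_b, target] <- (df[zero_b, "a"] - 5) / 2.533

# df$target looks for a column literally called "target" and finds none
stopifnot(is.null(df$target))
```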