dplyr if_else() vs base R ifelse()

I am fairly proficient within the Tidyverse, but have always used ifelse() instead of dplyr if_else(). I want to switch this behavior and default to always using dplyr::if_else() and deprecating ifelse() from my code.

Is there any reason not to do this? Would this likely get me into trouble? I'll spare you the details, but recently, not using if_else() screwed me up, when I unknowingly created a column of character matrices in my data analysis. If I switch to always using if_else() I hope to avoid this issue in the future.


Solution 1:

if_else is more strict. It checks that both alternatives are of the same type and otherwise throws an error, while ifelse will promote types as necessary. This may be a benefit in some circumstances, but may otherwise break scripts if you don't check for errors or explicitly force type conversion. For example:

ifelse(c(TRUE,TRUE,FALSE),"a",3)
[1] "a" "a" "3"
if_else(c(TRUE,TRUE,FALSE),"a",3)
Error: `false` must be type character, not double

Solution 2:

Another reason to choose if_else over ifelse is that ifelse turns Date into numeric objects

Dates <- as.Date(c('2018-10-01', '2018-10-02', '2018-10-03'))
new_Dates <- ifelse(Dates == '2018-10-02', Dates + 1, Dates)
str(new_Dates)

#>  num [1:3] 17805 17807 17807

if_else is also faster than ifelse.

Note that when testing multiple conditions, the code would be more readable and less error-prone if we use case_when.

library(dplyr)

case_when(
  Dates == '2018-10-01' ~ Dates - 1,
  Dates == '2018-10-02' ~ Dates + 1,
  Dates == '2018-10-03' ~ Dates + 2,
  TRUE ~ Dates
)

#> [1] "2018-09-30" "2018-10-03" "2018-10-05"

Created on 2018-06-01 by the reprex package (v0.2.0).

Solution 3:

I'd also add that if_else() can attribute a value in case of NA, which is a handy way of adding an extra condition.

df <- data_frame(val = c(80, 90, NA, 110))
df %>% mutate(category = if_else(val < 100, 1, 2, missing = 9))

#     val category
#   <dbl>    <dbl>
# 1    80        1
# 2    90        1
# 3    NA        9
# 4   110        2

Solution 4:

Another important reason for preferring if_else() to ifelse() is checking for consistency in lengths. See this dangerous gotcha:

> tibble(x = 1:3, y = ifelse(TRUE, x, 4:6))
# A tibble: 3 x 2
      x     y
  <int> <int>
1     1     1
2     2     1
3     3     1

Compare with

> tibble(x = 1:3, y = if_else(TRUE, x, 4:6))
    Error: `true` must be length 1 (length of `condition`), not 3.

The intention in both cases is clearly for column y to equal x or to equal 4:6 acording to the value of a single (scalar) logical variable; ifelse() silently truncates its output to length 1, which is then silently recycled. if_else() catches what is almost certainly an error at source.