Create categories by comparing a numeric column with a fixed value
Solution 1:
Try
iris$Regulation <- ifelse(iris$Sepal.Length >=5, "UP", "DOWN")
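A quick way to check what this produces, including how ifelse treats missing values. This is only a sketch: the toy vector x is made up for illustration and is not part of the question.

```r
# Hypothetical toy vector (not from the question) with one missing value
x <- c(4.9, 5.0, 6.3, NA)

# ifelse() is vectorised: one result per element, and NA in the test stays NA
regulation <- ifelse(x >= 5, "UP", "DOWN")
print(regulation)

# Same call on iris, followed by a count per category
iris$Regulation <- ifelse(iris$Sepal.Length >= 5, "UP", "DOWN")
print(table(iris$Regulation, useNA = "ifany"))
```

If the column can contain NA, decide up front whether NA should stay NA or fall into one of the two categories.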
Solution 2:
In the interest of updating a possible canonical, the package dplyr has the function mutate, which lets you create a new column in a data.frame in a vectorized fashion:
library(dplyr)
iris_new <- iris %>%
  mutate(Regulation = if_else(Sepal.Length >= 5, 'UP', 'DOWN'))
This makes a new column called Regulation, which consists of either 'UP' or 'DOWN' based on applying the condition to the Sepal.Length column.
The case_when function (also from dplyr) provides an easy-to-read way to chain together multiple conditions:
iris %>%
  mutate(Regulation = case_when(Sepal.Length >= 5 ~ 'High',
                                Sepal.Length >= 4.5 ~ 'Mid',
                                TRUE ~ 'Low'))
This works just like if_else, except that instead of one condition with a return value for TRUE and FALSE, each line has a condition (left side of ~) and a return value (right side of ~) that it returns if the condition is TRUE. If the condition is FALSE, it moves on to the next condition.
In this case, rows where Sepal.Length >= 5 will return 'High', rows where Sepal.Length < 5 (since the first condition had to fail) and Sepal.Length >= 4.5 will return 'Mid', and all other rows will return 'Low'. Since TRUE is always TRUE, it is used to provide a default value.
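For comparison, the same three-way split can be sketched in base R with cut. This is a swapped-in technique, not something the answer above uses, and the breaks and labels are simply chosen to mirror the thresholds 4.5 and 5 from the case_when call.

```r
# Base R sketch of the same three categories; cut() is an alternative, not the dplyr method
breaks <- c(-Inf, 4.5, 5, Inf)
labels <- c("Low", "Mid", "High")

# right = FALSE makes intervals closed on the left:
# [-Inf, 4.5) is "Low", [4.5, 5) is "Mid", [5, Inf) is "High"
regulation <- cut(iris$Sepal.Length, breaks = breaks, labels = labels, right = FALSE)

# cut() returns a factor; convert if a character column is wanted
iris$Regulation <- as.character(regulation)
print(head(iris[, c("Sepal.Length", "Regulation")]))
```

Unlike case_when, cut only handles ranges of a single numeric variable, so it is a fit for thresholds like these but not for conditions that mix several columns.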
Solution 3:
Without ifelse:
iris$Regulation <- c("DOWN", "UP")[ (iris$Sepal.Length >= 5) + 1 ]
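Why this works: adding 1 to a logical vector coerces FALSE to 1 and TRUE to 2, and those numbers are then used as positions in the two-element lookup vector. A small sketch, using a made-up toy vector x that is not from the question:

```r
# Hypothetical toy values (not from the question), including a missing value
x <- c(4.9, 5.0, 6.3, NA)

# Logical + 1 gives index positions: FALSE -> 1, TRUE -> 2, NA -> NA
idx <- (x >= 5) + 1
print(idx)

# Position 1 is the FALSE label, position 2 is the TRUE label
lookup <- c("DOWN", "UP")
by_index <- lookup[idx]
by_ifelse <- ifelse(x >= 5, "UP", "DOWN")

# Both approaches should agree element by element, NA included
print(identical(by_index, by_ifelse))
```

Note that the label order in the lookup vector is reversed relative to the ifelse arguments: the FALSE label comes first.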
Benchmark on one million values: by median time, about 14x faster than ifelse and about 2.6x faster than dplyr::if_else:
bigX <- runif(10^6, 0, 10)
bench::mark(
x1 = c("DOWN", "UP")[ (bigX >= 5) + 1 ],
x2 = ifelse(bigX >=5, "UP", "DOWN"),
x3 = dplyr::if_else(bigX >= 5, "UP", "DOWN")
)
# # A tibble: 3 x 14
#   expression     min    mean  median     max `itr/sec` mem_alloc  n_gc n_itr total_time result memory
#   <chr>      <bch:t> <bch:t> <bch:t> <bch:t>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list> <list>
#   x1          19.1ms  23.9ms  20.5ms  31.6ms     41.9     22.9MB     9    22      525ms <chr ~ <Rpro~
#   x2         278.9ms 280.2ms 280.2ms 281.5ms      3.57   118.3MB     4     2      560ms <chr ~ <Rpro~
#   x3          47.8ms  64.2ms  54.1ms 138.8ms     15.6     68.7MB    11     8      514ms <chr ~ <Rpro~