Create new variables with mutate_at while keeping the original ones
Consider this simple example:
library(dplyr)
dataframe <- data_frame(helloo = c(1,2,3,4,5,6),
ooooHH = c(1,1,1,2,2,2),
ahaaa = c(200,400,120,300,100,100))
# A tibble: 6 x 3
helloo ooooHH ahaaa
<dbl> <dbl> <dbl>
1 1 1 200
2 2 1 400
3 3 1 120
4 4 2 300
5 5 2 100
6 6 2 100
Here I want to apply the function ntile
to all the columns that contains oo
, but I would like these new columns to be called cat
+ the corresponding column.
I know I can do this
dataframe %>% mutate_at(vars(contains('oo')), .funs = funs(ntile(., 2)))
# A tibble: 6 x 3
helloo ooooHH ahaaa
<int> <int> <dbl>
1 1 1 200
2 1 1 400
3 1 1 120
4 2 2 300
5 2 2 100
6 2 2 100
But what I need is this
# A tibble: 8 x 5
helloo ooooHH ahaaa cat_helloo cat_ooooHH
<dbl> <dbl> <dbl> <int> <int>
1 1 1 200 1 1
2 2 1 400 1 1
3 3 1 120 1 1
4 4 2 300 2 2
5 5 2 100 2 2
6 5 2 100 2 2
7 6 2 100 2 2
8 6 2 100 2 2
Is there a solution that does NOT require to store the intermediate data, and merge back to the original dataframe?
Update 2020-06 for dplyr 1.0.0
Starting in dplyr 1.0.0, the across()
function supersedes the "scoped variants" of functions such as mutate_at()
. The code should look pretty familiar within across()
, which is nested inside mutate()
.
Adding a name to the function(s) you give in the list adds the function name as a suffix.
dataframe %>%
mutate( across(contains('oo'),
.fns = list(cat = ~ntile(., 2))) )
# A tibble: 6 x 5
helloo ooooHH ahaaa helloo_cat ooooHH_cat
<dbl> <dbl> <dbl> <int> <int>
1 1 1 200 1 1
2 2 1 400 1 1
3 3 1 120 1 1
4 4 2 300 2 2
5 5 2 100 2 2
6 6 2 100 2 2
Changing the new columns names is a little easier in 1.0.0 with the .names
argument in across()
. Here is an example of adding the function name as a prefix instead of a suffix. This uses glue syntax.
dataframe %>%
mutate( across(contains('oo'),
.fns = list(cat = ~ntile(., 2)),
.names = "{fn}_{col}" ) )
# A tibble: 6 x 5
helloo ooooHH ahaaa cat_helloo cat_ooooHH
<dbl> <dbl> <dbl> <int> <int>
1 1 1 200 1 1
2 2 1 400 1 1
3 3 1 120 1 1
4 4 2 300 2 2
5 5 2 100 2 2
6 6 2 100 2 2
Original answer with mutate_at()
Edited to reflect changes in dplyr. As of dplyr 0.8.0, funs()
is deprecated and list()
with ~
should be used instead.
You can give names to the functions to the list you pass to .funs
to make new variables with the names as suffixes attached.
dataframe %>% mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2)))
# A tibble: 6 x 5
helloo ooooHH ahaaa helloo_cat ooooHH_cat
<dbl> <dbl> <dbl> <int> <int>
1 1 1 200 1 1
2 2 1 400 1 1
3 3 1 120 1 1
4 4 2 300 2 2
5 5 2 100 2 2
6 6 2 100 2 2
If you want it as a prefix instead, you could then use rename_at
to change the names.
dataframe %>%
mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2))) %>%
rename_at( vars( contains( "_cat") ), list( ~paste("cat", gsub("_cat", "", .), sep = "_") ) )
# A tibble: 6 x 5
helloo ooooHH ahaaa cat_helloo cat_ooooHH
<dbl> <dbl> <dbl> <int> <int>
1 1 1 200 1 1
2 2 1 400 1 1
3 3 1 120 1 1
4 4 2 300 2 2
5 5 2 100 2 2
6 6 2 100 2 2
Previous code with funs()
from earlier versions of dplyr:
dataframe %>%
mutate_at(vars(contains('oo')), .funs = funs(cat = ntile(., 2))) %>%
rename_at( vars( contains( "_cat") ), funs( paste("cat", gsub("_cat", "", .), sep = "_") ) )