Add max value to a new column in R [duplicate]
This is the data:
a <- c(1,1,2,2,3)
b <- c(1,3,5,9,4)
df1 <- data.frame(a,b)
df1
a b
1 1
1 3
2 5
2 9
3 4
I want something like this:
a b max
1 1 3
1 3 3
2 5 9
2 9 9
3 4 4
My question: How do I compute the "max" column grouping by "a" column with value in "b" column?
I did some research with aggregate function but I still can not find the solution.
Please help
Solution 1:
We can use one of the group by functions which does allow to create a column instead of just summarise
the output. The data.table
option is succint as the assignment (:=
) will create a new column. We convert the 'data.frame' to 'data.table', grouped by 'a', get the maximum value of 'b' (max(b)
) and assign (:=
) as a new column. If we need a summary, wrap it with list
instead of :=
.
library(data.table)
setDT(df1)[, Max:= max(b), a]
or use ave
from base R
. This is one of the rare base functions which will give the output as the same length of the number of rows of the original dataset. As the first argument, we use the variable to get the max
of, 2nd, 3rd, etc (if any) will be the grouping column. Specify the FUN
. By default, if we don't specify any FUN
, it will take the mean
by group.
df1$Max <- with(df1, ave(b, a, FUN=max)
or with dplyr
. The dplyr
and data.table
solutions are flexible in ways to get either the summary
or get a new column on the original dataset. With dplyr
, after we group by 'a', use mutate
to create a new column. If, instead, we need a summarised version, the summarise
function can replace the mutate
(which will give similar output as aggregate
).
library(dplyr)
df1 %>%
group_by(a) %>%
mutate(Max = max(b))
# a b Max
# (dbl) (dbl) (dbl)
#1 1 1 3
#2 1 3 3
#3 2 5 9
#4 2 9 9
#5 3 4 4