Add max value to a new column in R [duplicate]

This is the data:

a <- c(1,1,2,2,3)  
b <- c(1,3,5,9,4)
df1 <- data.frame(a,b)
df1

a b
1 1
1 3
2 5
2 9
3 4

I want something like this:

a b max
1 1 3
1 3 3
2 5 9
2 9 9
3 4 4

My question: How do I compute the "max" column grouping by "a" column with value in "b" column?

I did some research with aggregate function but I still can not find the solution.

Please help


Solution 1:

We can use one of the group by functions which does allow to create a column instead of just summarise the output. The data.table option is succint as the assignment (:=) will create a new column. We convert the 'data.frame' to 'data.table', grouped by 'a', get the maximum value of 'b' (max(b)) and assign (:=) as a new column. If we need a summary, wrap it with list instead of :=.

library(data.table)
setDT(df1)[, Max:= max(b), a]

or use ave from base R. This is one of the rare base functions which will give the output as the same length of the number of rows of the original dataset. As the first argument, we use the variable to get the max of, 2nd, 3rd, etc (if any) will be the grouping column. Specify the FUN. By default, if we don't specify any FUN, it will take the mean by group.

df1$Max <- with(df1, ave(b, a, FUN=max)

or with dplyr. The dplyr and data.table solutions are flexible in ways to get either the summary or get a new column on the original dataset. With dplyr, after we group by 'a', use mutate to create a new column. If, instead, we need a summarised version, the summarise function can replace the mutate (which will give similar output as aggregate).

library(dplyr)
df1 %>%
   group_by(a) %>%
   mutate(Max = max(b))
#      a     b   Max
#   (dbl) (dbl) (dbl)
#1     1     1     3
#2     1     3     3
#3     2     5     9
#4     2     9     9
#5     3     4     4