Remove duplicates keeping entry with largest absolute value
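The example data frame a is not reproduced in this excerpt; a definition consistent with the outputs shown below would be:

a <- data.frame(id    = c(1, 1, 2, 2, 3, 4),
                value = c(1, 2, 3, -4, -5, 6))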
First, sort so that the less desired items come last within each id group:
aa <- a[order(a$id, -abs(a$value)), ]  # sort by id, then by decreasing abs(value)
Then, remove all but the first item within each id group:
aa[ !duplicated(aa$id), ] # take the first row within each id
  id value
2  1     2
4  2    -4
5  3    -5
6  4     6
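As a usage sketch, the two steps can be wrapped in a small helper so the sorted intermediate never needs its own name (the function name keep_max_abs and its column-name arguments are hypothetical):

keep_max_abs <- function(df, id_col = "id", value_col = "value") {
  # sort so the largest absolute value comes first within each id group
  df <- df[order(df[[id_col]], -abs(df[[value_col]])), ]
  # keep only the first (largest absolute value) row per id
  df[!duplicated(df[[id_col]]), ]
}
keep_max_abs(a)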
A data.table approach might be in order if your data set is very large:
library(data.table)
aDT <- as.data.table(a)
setkey(aDT,"id")
aDT[J(unique(id)), list(value = value[which.max(abs(value))]), by = .EACHI]  # by = .EACHI groups within each joined id (required in recent data.table versions)
Or a not-quite-as-fast, but still fast, alternative:
library(data.table)
as.data.table(a)[, .SD[which.max(abs(value))], by=id]
This version returns all the columns of a, in case there are more in the real dataset.
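Another commonly used data.table idiom (not shown in the answers above, included here as a sketch) computes the winning row index per group with .I and then subsets once, which also keeps every column:

library(data.table)
aDT <- as.data.table(a)
# .I[which.max(abs(value))] gives the global row number of the winner in each id group
aDT[aDT[, .I[which.max(abs(value))], by = id]$V1]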
Here is a dplyr approach:
library(dplyr)
a %>%
  group_by(id) %>%
  top_n(1, abs(value))
# A tibble: 4 x 2
# Groups:   id [4]
#      id value
#   <dbl> <dbl>
#1     1     2
#2     2    -4
#3     3    -5
#4     4     6
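In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), so an equivalent (and arguably more readable) version is:

library(dplyr)
a %>%
  group_by(id) %>%
  slice_max(abs(value), n = 1)

Both top_n() and slice_max() keep all tied rows by default; pass with_ties = FALSE to slice_max() if exactly one row per id is required.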