Combining duplicated rows in R and adding new column containing IDs of duplicates
Some call to aggregate()
should do the trick.
Here's an option that collects the ID's in a list object:
(df1 <- aggregate(df[7], df[-7], unique))
# Chr start stop ref alt Hom.het ID
# 1 chr1 5179574 5183384 ref Del Het 719
# 2 chr1 5179574 5184738 ref Del Het 915, 951
# 3 chr1 5336806 5358384 ref Del Het 376
# 4 chr1 5347979 5358384 ref Del Het 228
And here's one that collects them in a character vector:
df2 <- aggregate(df[7], df[-7],
FUN = function(X) paste(unique(X), collapse=", "))
Comparing the results of the two options:
str(df1$ID)
# List of 4
# $ 0: int 719
# $ 3: int [1:2] 915 951
# $ 7: int 376
# $ 8: int 228
str(df2$ID)
# chr [1:4] "719" "915, 951" "376" "228"