Unique on a dataframe with only selected columns
Solution 1:
Ok, if it doesn't matter which value in the non-duplicated column you select, this should be pretty easy:
dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
> dat[!duplicated(dat[,c('id','id2')]),]
id id2 somevalue
1 1 1 x
3 3 4 z
Inside the duplicated
call, I'm simply passing only those columns from dat
that I don't want duplicates of. This code will automatically always select the first of any ambiguous values. (In this case, x.)
Solution 2:
Here are a couple dplyr
options that keep non-duplicate rows based on columns id and id2:
library(dplyr)
df %>% distinct(id, id2, .keep_all = TRUE)
df %>% group_by(id, id2) %>% filter(row_number() == 1)
df %>% group_by(id, id2) %>% slice(1)
Solution 3:
Using unique()
:
dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
dat[row.names(unique(dat[,c("id", "id2")])),]
Solution 4:
Minor update in @Joran's code.
Using the code below, you can avoid the ambiguity and only get the unique of two columns:
dat <- data.frame(id=c(1,1,3), id2=c(1,1,4) ,somevalue=c("x","y","z"))
dat[row.names(unique(dat[,c("id", "id2")])), c("id", "id2")]