Remove duplicated rows

r r-faq duplicates

Solution 1:

If you came here looking for a general way to remove duplicate rows, index the data frame with !duplicated():

a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

> duplicated(df)
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE

> df[duplicated(df), ]
  a b
2 A 1
6 B 1
8 C 2

> df[!duplicated(df), ]
  a b
1 A 1
3 A 2
4 B 4
5 B 1
7 C 2

Answer from: Removing duplicated rows from R data frame
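
As a small extension of Solution 1 (not part of the quoted answer): duplicated() flags every occurrence of a row after the first, so !duplicated() keeps the first copy. If you would rather keep the last copy, base R's fromLast argument reverses the direction of the scan. A minimal sketch, reusing the same df as above; the name kept.last is just for illustration:

```r
# Same example data as in Solution 1.
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# Scan from the bottom up, so the earlier copies are the ones
# flagged as duplicates and the last copy of each row is kept.
kept.last <- df[!duplicated(df, fromLast = TRUE), ]
print(kept.last)
```

The surviving rows have the same contents as before; only the row names differ, because a different copy of each duplicate is kept.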

Solution 2:

Subset your data frame to the columns you need, then call unique() on the result. Note that the result contains only the columns you selected:

# In the question's example, only the first three columns are needed.
deduped.data <- unique(yourdata[, 1:3])
# Without the fourth column, nothing distinguishes those rows any more,
# so they count as duplicates and are dropped.
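
If you want to deduplicate on a subset of columns but still keep the remaining columns, one option is to combine the two solutions: compute duplicated() on the subset and use it to index the full data frame. A minimal sketch follows. The contents and column names of yourdata here are made up for illustration and are not from the original question:

```r
# Hypothetical four-column data frame (names and values are assumptions).
yourdata <- data.frame(
  id    = c(1, 1, 2, 2),
  name  = c("x", "x", "y", "z"),
  group = c("g1", "g1", "g2", "g2"),
  note  = c("first", "second", "third", "fourth"),
  stringsAsFactors = FALSE
)

# Solution 2: unique() on the subset returns only the three selected columns.
deduped.subset <- unique(yourdata[, 1:3])

# Alternative: test for duplicates on the first three columns only,
# but index the full data frame so the fourth column is kept.
# The first row of each group of duplicates is the one retained.
deduped.full <- yourdata[!duplicated(yourdata[, 1:3]), ]

print(deduped.subset)
print(deduped.full)
```

Both results have the same rows. The difference is that deduped.full still carries the note column, taken from whichever row appeared first.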
