Get Rows based on distinct values from Column 2
I am new to pandas and have searched Google without any luck. How can I get the rows with distinct values in COL2?
For example, I have the dataframe below:
>>> df
COL1 COL2
a.com 22
b.com 45
c.com 34
e.com 45
f.com 56
g.com 22
h.com 45
I want to keep only the first row for each unique value in COL2:
>>> df
COL1 COL2
a.com 22
b.com 45
c.com 34
f.com 56
So, how can I get that? I would appreciate it very much if anyone can provide any help.
Solution 1:
Use drop_duplicates and specify column COL2 as the column to check for duplicates:
df = df.drop_duplicates('COL2')
#same as
#df = df.drop_duplicates('COL2', keep='first')
print (df)
COL1 COL2
0 a.com 22
1 b.com 45
2 c.com 34
4 f.com 56
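To reproduce the result above, here is a minimal self-contained sketch. It rebuilds the frame from the question by hand; the variable names first and first_reset are illustrative and not part of the original answer. It also shows that drop_duplicates keeps the original index labels, which is why the output jumps from 2 to 4.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Keep the first row for each distinct COL2 value (keep='first' is the default).
# The result is stored under a new name so the original df stays intact.
first = df.drop_duplicates(subset="COL2")
print(first)

# Optional: renumber the surviving rows 0..n-1 instead of keeping the old labels.
first_reset = first.reset_index(drop=True)
print(first_reset)
```

Assigning to a new name rather than back to df is a deliberate choice here: it lets you try the other keep options on the unmodified data.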
You can also keep only the last row for each value (applied to the original df, not to the result above):
df = df.drop_duplicates('COL2', keep='last')
print (df)
COL1 COL2
2 c.com 34
4 f.com 56
5 g.com 22
6 h.com 45
Or drop every row whose COL2 value occurs more than once (again applied to the original df):
df = df.drop_duplicates('COL2', keep=False)
print (df)
COL1 COL2
2 c.com 34
4 f.com 56
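The keep options can also be compared side by side on the same unmodified frame. The sketch below assumes the example data from the question; the names last, unique_only and mask are illustrative. It also uses DataFrame.duplicated, which returns the boolean mask that drop_duplicates acts on, in case you want to inspect the duplicated rows rather than drop them.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Each call starts from the original df, so the results are independent.
last = df.drop_duplicates(subset="COL2", keep="last")
unique_only = df.drop_duplicates(subset="COL2", keep=False)

# duplicated(keep=False) marks every row whose COL2 value appears more than once.
mask = df.duplicated(subset="COL2", keep=False)

# Negating the mask selects the same rows as keep=False above.
same_as_unique_only = df[~mask]

# The mask itself selects the rows that were dropped.
dropped = df[mask]
print(dropped)
```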