Pandas ; Is there a way to see all the data in one column? (So that duplicate data can only show one value)

By applying a filter to 'df_merge2', I try to draw a 'scatter plot' only on the information of USA and China in the 'country' column. But I don't know if there's USA and China in the "country" column. Is there a way?

url_1 = 'https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/gapminder/data1.csv'
url_2 = 'https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/gapminder/data2.csv'
url_3 = 'https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/gapminder/data3.csv'


import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.read_csv(url_1,index_col = 0)
df2 = pd.read_csv(url_2,index_col = 0)
df3 = pd.read_csv(url_3,index_col = 0)

df_merge1 = df1.merge(df2)
df_merge2 = df_merge1.merge(df3)
df_merge2


# condition = ((df_merge2['country'] == 'USA') & (df_merge2['country'] == 'china'))
# df_merge2[condition]

Solution 1:

IIUC:

>>> df_merge2[df_merge2['geo'].isin(['usa', 'chn'])]

# OR

>>> df_merge2[df_merge2['country'].isin(['United States', 'China'])]

      geo  time  cell_phones_total  population        country
1627  chn  1960                0.0   660408056          China
1628  chn  1965                0.0   724218968          China
1629  chn  1970                0.0   827601394          China
1630  chn  1975                0.0   926240885          China
1631  chn  1976                0.0   942685411          China
...   ...   ...                ...         ...            ...
9033  usa  2014        355500000.0   318673411  United States
9034  usa  2015        382307000.0   320878310  United States
9035  usa  2016        396000000.0   323015995  United States
9036  usa  2017        400000000.0   325084756  United States
9037  usa  2018        422000000.0   327096265  United States

[92 rows x 5 columns]

So you can use:

out = df1.merge(df2, on=['geo', 'time']).query("geo.isin(['usa', 'chn'])")