Pandas ; Is there a way to see all the data in one column? (So that duplicate data can only show one value)
By applying a filter to 'df_merge2', I try to draw a 'scatter plot' only on the information of USA and China in the 'country' column. But I don't know if there's USA and China in the "country" column. Is there a way?
url_1 = 'https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/gapminder/data1.csv'
url_2 = 'https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/gapminder/data2.csv'
url_3 = 'https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/gapminder/data3.csv'
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv(url_1,index_col = 0)
df2 = pd.read_csv(url_2,index_col = 0)
df3 = pd.read_csv(url_3,index_col = 0)
df_merge1 = df1.merge(df2)
df_merge2 = df_merge1.merge(df3)
df_merge2
# condition = ((df_merge2['country'] == 'USA') & (df_merge2['country'] == 'china'))
# df_merge2[condition]
Solution 1:
IIUC:
>>> df_merge2[df_merge2['geo'].isin(['usa', 'chn'])]
# OR
>>> df_merge2[df_merge2['country'].isin(['United States', 'China'])]
geo time cell_phones_total population country
1627 chn 1960 0.0 660408056 China
1628 chn 1965 0.0 724218968 China
1629 chn 1970 0.0 827601394 China
1630 chn 1975 0.0 926240885 China
1631 chn 1976 0.0 942685411 China
... ... ... ... ... ...
9033 usa 2014 355500000.0 318673411 United States
9034 usa 2015 382307000.0 320878310 United States
9035 usa 2016 396000000.0 323015995 United States
9036 usa 2017 400000000.0 325084756 United States
9037 usa 2018 422000000.0 327096265 United States
[92 rows x 5 columns]
So you can use:
out = df1.merge(df2, on=['geo', 'time']).query("geo.isin(['usa', 'chn'])")