Pandas - concatenating multi-indexed DataFrames keeps duplicate indices
Solution 1:
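It's probably because your index is not sorted. pd.concat simply appends the rows of df2 after those of df1, so each state shows up in two separate places. Sorting the MultiIndex groups both years of each state together:
import pandas as pd
test = pd.concat([df1, df2]).set_index(['state', 'year']).sort_index()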
print(test)
# Output
                  population  violent_crime    theft     gta
state       year
ALABAMA     2223     4903185         510.81  1886.06  256.51
            2224     4903185         510.81  1886.06  256.51
ALASKA      2223      731545         867.07  2066.04  357.74
            2224      731545         867.07  2066.04  357.74
ARIZONA     2223     7278717         455.31  1796.86  249.37
            2224     7278717         455.31  1796.86  249.37
ARKANSAS    2223     3017804         584.63  2012.56  245.87
            2224     3017804         584.63  2012.56  245.87
CALIFORNIA  2223    39512223         441.21  1586.35  358.77
            2224    39512223         441.21  1586.35  358.77
COLORADO    2223     5758736         380.95  1858.26  383.99
            2224     5758736         380.95  1858.26  383.99
CONNECTICUT 2223     3565287         183.60  1078.65  167.28
            2224     3565287         183.60  1078.65  167.28
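A minimal, self-contained sketch of the same idea, assuming df1 and df2 each hold one year with 'state' and 'year' as ordinary columns. The two small frames below are hypothetical stand-ins built from a couple of rows of the output above; they only show how the row order changes before and after sort_index.

```python
import pandas as pd

# Hypothetical inputs: one frame per year, 'state' and 'year' as regular columns
df1 = pd.DataFrame({"state": ["ALASKA", "ALABAMA"], "year": [2223, 2223],
                    "population": [731545, 4903185]})
df2 = pd.DataFrame({"state": ["ALASKA", "ALABAMA"], "year": [2224, 2224],
                    "population": [731545, 4903185]})

# concat just stacks df2 below df1, so each state appears in two separate places
unsorted = pd.concat([df1, df2]).set_index(["state", "year"])

# sort_index orders by state first, then year, so both years of a state sit together
test = unsorted.sort_index()
print(test)
```

Nothing is merged or removed here: the (state, year) keys were already unique, and sorting only changes where the rows are displayed.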
Solution 2:
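You can check for duplicates while concatenating: with verify_integrity=True, pd.concat raises a ValueError if the combined index contains duplicate entries:
import pandas as pd
# Set the (state, year) MultiIndex first; with the default RangeIndex the
# row labels 0, 1, ... of both frames would overlap and always fail the check
test = pd.concat([df1.set_index(['state', 'year']),
                  df2.set_index(['state', 'year'])],
                 verify_integrity=True)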
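A small sketch of what verify_integrity does, assuming both frames are already indexed by ['state', 'year']. The frames are hypothetical: the key ("ALABAMA", 2223) appears in both df1 and df2, so the first concat fails, while df3 adds a new year and passes.

```python
import pandas as pd

idx_cols = ["state", "year"]

# Hypothetical frames already indexed by (state, year)
df1 = pd.DataFrame({"state": ["ALABAMA", "ALASKA"], "year": [2223, 2223],
                    "population": [4903185, 731545]}).set_index(idx_cols)
df2 = pd.DataFrame({"state": ["ALABAMA"], "year": [2223],
                    "population": [4903185]}).set_index(idx_cols)  # repeats a df1 key
df3 = pd.DataFrame({"state": ["ALABAMA"], "year": [2224],
                    "population": [4903185]}).set_index(idx_cols)  # new key only

try:
    pd.concat([df1, df2], verify_integrity=True)
    duplicate_found = False
except ValueError as err:
    # Raised because the combined index would contain ("ALABAMA", 2223) twice
    duplicate_found = True
    print("concat refused:", err)

# No overlapping keys, so the check passes and all rows are kept
combined = pd.concat([df1, df3], verify_integrity=True)
print(combined)
```

The pandas documentation notes that this check can be expensive relative to the concatenation itself, so it is mostly useful while debugging where the duplicates come from.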
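Or you can drop duplicates afterwards. Note that set_index(..., inplace=True) returns None, so it can't be chained, and drop_duplicates compares column values only (the index is ignored). Drop the fully duplicated rows while state and year are still columns, then set the index:
test = pd.concat([df1, df2])
test = test.drop_duplicates().set_index(['state', 'year'])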
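If the goal is instead to keep exactly one row per (state, year) key, even when the other columns differ, a common alternative (not part of the original answer) is to filter on Index.duplicated. A minimal sketch with a hypothetical frame in which ("ALABAMA", 2223) appears twice with different population values:

```python
import pandas as pd

# Hypothetical concatenated result: the ALABAMA/2223 key is repeated
test = pd.DataFrame({
    "state": ["ALABAMA", "ALASKA", "ALABAMA"],
    "year": [2223, 2223, 2223],
    "population": [4903185, 731545, 4903000],
}).set_index(["state", "year"])

# index.duplicated flags every repeat of a key after its first occurrence;
# keep="first" means the first row seen for each (state, year) survives
deduped = test[~test.index.duplicated(keep="first")]
print(deduped)

# For contrast: drop_duplicates only compares columns, so all three rows stay here
by_columns = test.drop_duplicates()
```

Which row survives depends on the order of the frames passed to concat, so list the frame whose values should win first.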