Pandas - concatenating multi-indexed dataframes keeps duplicate indices

Solution 1:

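It's probably because your index is not sorted. pandas only hides a repeated outer-level label when the rows that share it are adjacent, so an unsorted MultiIndex prints the same state several times even though each (state, year) pair occurs once. Sorting the index groups them: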

test = pd.concat([df1, df2]).set_index(['state', 'year']).sort_index()
print(test)

# Output
                  population  violent_crime    theft     gta
state       year                                            
ALABAMA     2223     4903185         510.81  1886.06  256.51
            2224     4903185         510.81  1886.06  256.51
ALASKA      2223      731545         867.07  2066.04  357.74
            2224      731545         867.07  2066.04  357.74
ARIZONA     2223     7278717         455.31  1796.86  249.37
            2224     7278717         455.31  1796.86  249.37
ARKANSAS    2223     3017804         584.63  2012.56  245.87
            2224     3017804         584.63  2012.56  245.87
CALIFORNIA  2223    39512223         441.21  1586.35  358.77
            2224    39512223         441.21  1586.35  358.77
COLORADO    2223     5758736         380.95  1858.26  383.99
            2224     5758736         380.95  1858.26  383.99
CONNECTICUT 2223     3565287         183.60  1078.65  167.28
            2224     3565287         183.60  1078.65  167.28
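To see the effect on a small scale, here is a minimal sketch. The frames `df1` and `df2`, their years, and their population values are made up for illustration and are not the asker's data. Each frame holds one year, so after `concat` the rows for a given state are not adjacent until the index is sorted.

```python
import pandas as pd

# Hypothetical sample data: one frame per year.
df1 = pd.DataFrame({"state": ["ALABAMA", "ALASKA"], "year": [2019, 2019], "population": [100, 200]})
df2 = pd.DataFrame({"state": ["ALABAMA", "ALASKA"], "year": [2020, 2020], "population": [110, 210]})

unsorted = pd.concat([df1, df2]).set_index(["state", "year"])
sorted_df = unsorted.sort_index()

# Before sorting, the outer level alternates between states,
# so every state label is printed again.
print(unsorted.index.get_level_values("state").tolist())

# After sorting, rows for the same state sit next to each other.
print(sorted_df.index.get_level_values("state").tolist())

# Neither index contains a repeated (state, year) pair.
print(unsorted.index.is_unique, sorted_df.index.is_unique)
```

In both cases the index is unique; only the row order, and therefore the printed layout, differs.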

Solution 2:

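You can have concat check for duplicates. With verify_integrity=True it raises a ValueError if the combined index contains duplicate entries. The check looks only at the index, so set the key columns as the index before concatenating:

keys = ['state', 'year']
test = pd.concat([df1.set_index(keys), df2.set_index(keys)], verify_integrity=True)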
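As a rough illustration of what the check does, the sketch below builds two hypothetical frames that share one (state, year) key and shows that `concat` refuses to combine them. The data is invented, and the frames are assumed to be indexed by `state` and `year` already.

```python
import pandas as pd

keys = ["state", "year"]

# Hypothetical frames that both contain the (ALABAMA, 2019) key.
df1 = pd.DataFrame(
    {"state": ["ALABAMA"], "year": [2019], "population": [100]}
).set_index(keys)
df2 = pd.DataFrame(
    {"state": ["ALABAMA", "ALASKA"], "year": [2019, 2019], "population": [100, 200]}
).set_index(keys)

try:
    pd.concat([df1, df2], verify_integrity=True)
    raised = False
except ValueError as exc:
    # pandas reports which index values overlap.
    raised = True
    print(f"concat refused: {exc}")
```

This only detects the problem. It does not remove the duplicate rows.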

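Or you can concatenate without the check and drop the duplicates afterwards. Do not chain calls after inplace=True, because the method then returns None:

# keep the first row for each (state, year) key, then index by that key
test = pd.concat([df1, df2]).drop_duplicates(subset=['state', 'year']).set_index(['state', 'year'])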
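If the frames are already indexed by `state` and `year`, `drop_duplicates` will not help, since it compares column values and ignores the index. One alternative, which is not part of the original answer, is to filter on `Index.duplicated`. The sketch below uses invented data.

```python
import pandas as pd

keys = ["state", "year"]

# Hypothetical frames, already indexed, sharing the (ALABAMA, 2019) key.
df1 = pd.DataFrame(
    {"state": ["ALABAMA"], "year": [2019], "population": [100]}
).set_index(keys)
df2 = pd.DataFrame(
    {"state": ["ALABAMA", "ALASKA"], "year": [2019, 2019], "population": [100, 200]}
).set_index(keys)

test = pd.concat([df1, df2])

# index.duplicated marks every repeat of an index entry after the first;
# negating the mask keeps the first occurrence of each key.
deduped = test[~test.index.duplicated(keep="first")]

print(len(test), len(deduped))
print(deduped.index.is_unique)
```

Passing `keep="last"` instead would keep the row from the later frame when the two disagree.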