Pandas "Can only compare identically-labeled DataFrame objects" error
I'm using Pandas to compare the outputs of two files loaded into two data frames (uat, prod): ...
uat = uat[['Customer Number','Product']]
prod = prod[['Customer Number','Product']]
print uat['Customer Number'] == prod['Customer Number']
print uat['Product'] == prod['Product']
print uat == prod
The first two match exactly:
74357 True
74356 True
Name: Customer Number, dtype: bool
74357 True
74356 True
Name: Product, dtype: bool
For the third print, I get an error: Can only compare identically-labeled DataFrame objects. If the first two compared fine, what's wrong with the 3rd?
Thanks
Here's a small example to demonstrate this (which only applied to DataFrames, not Series, until Pandas 0.19 where it applies to both):
In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])
In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
In [3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects
One solution is to sort the index first (Note: some functions require sorted indexes):
In [4]: df2.sort_index(inplace=True)
In [5]: df1 == df2
Out[5]:
0 1
0 True True
1 True True
Note: ==
is also sensitive to the order of columns, so you may have to use sort_index(axis=1)
:
In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
Out[11]:
0 1
0 True True
1 True True
Note: This can still raise (if the index/columns aren't identically labelled after sorting).
You can also try dropping the index column if it is not needed to compare:
print(df1.reset_index(drop=True) == df2.reset_index(drop=True))
I have used this same technique in a unit test like so:
from pandas.util.testing import assert_frame_equal
assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))