Equality in Pandas DataFrames - Column Order Matters?


As part of a unit test, I need to test two DataFrames for equality. The order of the columns in the DataFrames is not important to me. However, it seems to matter to Pandas:

import pandas
df1 = pandas.DataFrame(index = [1,2,3,4])
df2 = pandas.DataFrame(index = [1,2,3,4])
df1['A'] = [1,2,3,4]
df1['B'] = [2,3,4,5]
df2['B'] = [2,3,4,5]
df2['A'] = [1,2,3,4]
df1 == df2

Results in:

Exception: Can only compare identically-labeled DataFrame objects

I believe the expression df1 == df2 should evaluate to a DataFrame containing all True values. Obviously it's debatable what the correct functionality of == should be in this context. My question is: Is there a Pandas method that does what I want? That is, is there a way to do equality comparison that ignores column order?

1/8/2013 9:16:05 PM

Accepted Answer

You could sort the columns using sort_index:

df1.sort_index(axis=1) == df2.sort_index(axis=1)

This will evaluate to a DataFrame of all True values.
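For a unit test you usually want a single boolean rather than a frame of booleans. The sketch below mirrors the frames from the question and reduces the element-wise result with `.all().all()`. The NaN frames (`n1`, `n2`) are hypothetical additions, included only to show where `==` breaks down and how `DataFrame.equals` behaves on the same data.

```python
import numpy as np
import pandas as pd

# Same data as in the question, with columns added in a different order.
df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5]}, index=[1, 2, 3, 4])
df2 = pd.DataFrame({"B": [2, 3, 4, 5], "A": [1, 2, 3, 4]}, index=[1, 2, 3, 4])

# Sorting the column labels gives both frames identical labels, so == works.
elementwise = df1.sort_index(axis=1) == df2.sort_index(axis=1)
same = bool(elementwise.all().all())  # reduce per column, then across columns

# Illustrative NaN case: NaN != NaN under ==, so matching NaNs compare unequal.
n1 = pd.DataFrame({"A": [1.0, np.nan], "B": [2.0, 3.0]})
n2 = pd.DataFrame({"B": [2.0, 3.0], "A": [1.0, np.nan]})
nan_same = bool((n1.sort_index(axis=1) == n2.sort_index(axis=1)).all().all())

# DataFrame.equals treats NaNs in the same location as equal.
nan_equals = n1.sort_index(axis=1).equals(n2.sort_index(axis=1))
```

Note that `DataFrame.equals` also requires matching dtypes, so it is stricter than `==` in that respect.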

As @osa comments, this fails for NaNs and isn't particularly robust either. In practice, something similar to @quant's answer is probably preferable (note: here we want a bool returned rather than an exception raised if the frames differ):

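def my_equal(df1, df2):
    # pandas.testing is the public import path; older pandas used pandas.util.testing
    from pandas.testing import assert_frame_equal
    try:
        # Sort columns on both sides so column order does not affect the result
        assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_names=True)
        return True
    except (AssertionError, ValueError, TypeError):  # perhaps something else?
        return False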
7/9/2018 3:44:40 PM

The most common intent is handled like this:

def assertFrameEqual(df1, df2, **kwds):
    """Assert that two dataframes are equal, ignoring ordering of columns."""
    # pandas.testing is the public import path; older pandas used pandas.util.testing
    from pandas.testing import assert_frame_equal
    return assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_names=True, **kwds)

See pandas.testing.assert_frame_equal (pandas.util.testing.assert_frame_equal in older pandas releases) for the other parameters you can pass.
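One of those parameters is `check_like`, which tells `assert_frame_equal` to ignore the order of both the index and the columns, so the explicit `sort_index` call may not be needed. Below is a minimal sketch, assuming a pandas version that exposes `pandas.testing`; `frames_match` is a hypothetical helper name, not part of pandas. Unlike sorting with `axis=1`, `check_like=True` also ignores row order, which may or may not be what you want.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(left, right, **kwargs):
    """Hypothetical helper: True if the frames match, ignoring label order."""
    try:
        # check_like=True ignores the order of index and columns.
        assert_frame_equal(left, right, check_like=True, **kwargs)
    except AssertionError:
        return False
    return True


df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5]}, index=[1, 2, 3, 4])
df2 = pd.DataFrame({"B": [2, 3, 4, 5], "A": [1, 2, 3, 4]}, index=[1, 2, 3, 4])
df3 = pd.DataFrame({"B": [2, 3, 4, 99], "A": [1, 2, 3, 4]}, index=[1, 2, 3, 4])

reordered_ok = frames_match(df1, df2)    # same data, different column order
different_data = frames_match(df1, df3)  # one value differs
```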

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow