How do you Unit Test Python DataFrames

How do i unit test python dataframes?

I have functions that have an input and output as dataframes. Almost every function I have does this. Now if i want to unit test this what is the best method of doing it? It seems a bit of an effort to create a new dataframe (with values populated) for every function?

Are there any materials you can refer me to? Should you write unit tests for these functions?


Solution 1:

While Pandas' test functions are primarily used for internal testing, NumPy includes a very useful set of testing functions that are documented here: NumPy Test Support.

These functions compare NumPy arrays, but you can get the array that underlies a Pandas DataFrame using the values property. You can define a simple DataFrame and compare what your function returns to what you expect.

One technique you can use is to define one set of test data for a number of functions. That way, you can use Pytest Fixtures to define that DataFrame once, and use it in multiple tests.

In terms of resources, I found this article on Testing with NumPy and Pandas to be very useful. I also did a short presentation about data analysis testing at PyCon Canada 2016: Automate Your Data Analysis Testing.

Solution 2:

you can use pandas testing functions:

It will give more flexbile to compare your result with computed result in different ways.

For example:

df1=pd.DataFrame({'a':[1,2,3,4,5]})
df2=pd.DataFrame({'a':[6,7,8,9,10]})

expected_res=pd.Series([7,9,11,13,15])
pd.testing.assert_series_equal((df1['a']+df2['a']),expected_res,check_names=False)

For more details refer this link

Solution 3:

I don't think it's hard to create small DataFrames for unit testing?

import pandas as pd
from nose.tools import assert_dict_equal

input = pd.DataFrame.from_dict({
    'field_1': [some, values],
    'field_2': [other, values]
})
expected = {
    'result': [...]
}
assert_dict_equal(expected, my_func(input).to_dict(), "oops, there's a bug...")