Pandas: drop a level from a multi-level column index?

If I've got a multi-level column index:

>>> cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
>>> pd.DataFrame([[1,2], [3,4]], columns=cols)

    a
   ---+--
    b | c
--+---+--
0 | 1 | 2
1 | 3 | 4

How can I drop the "a" level of that index, so I end up with:

    b | c
--+---+--
0 | 1 | 2
1 | 3 | 4

Solution 1:

You can use MultiIndex.droplevel:

>>> cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
>>> df = pd.DataFrame([[1,2], [3,4]], columns=cols)
>>> df
   a   
   b  c
0  1  2
1  3  4

[2 rows x 2 columns]
>>> df.columns = df.columns.droplevel()
>>> df
   b  c
0  1  2
1  3  4

[2 rows x 2 columns]

Solution 2:

Another way to drop the index is to use a list comprehension:

df.columns = [col[1] for col in df.columns]

   b  c
0  1  2
1  3  4

This strategy is also useful if you want to combine the names from both levels like in the example below where the bottom level contains two 'y's:

cols = pd.MultiIndex.from_tuples([("A", "x"), ("A", "y"), ("B", "y")])
df = pd.DataFrame([[1,2, 8 ], [3,4, 9]], columns=cols)

   A     B
   x  y  y
0  1  2  8
1  3  4  9

Dropping the top level would leave two columns with the index 'y'. That can be avoided by joining the names with the list comprehension.

df.columns = ['_'.join(col) for col in df.columns]

    A_x A_y B_y
0   1   2   8
1   3   4   9

That's a problem I had after doing a groupby and it took a while to find this other question that solved it. I adapted that solution to the specific case here.

Solution 3:

As of Pandas 0.24.0, we can now use DataFrame.droplevel():

cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
df = pd.DataFrame([[1,2], [3,4]], columns=cols)

df.droplevel(0, axis=1) 

#   b  c
#0  1  2
#1  3  4

This is very useful if you want to keep your DataFrame method-chain rolling.

Solution 4:

Another way to do this is to reassign df based on a cross section of df, using the .xs method.

>>> df

    a
    b   c
0   1   2
1   3   4

>>> df = df.xs('a', axis=1, drop_level=True)

    # 'a' : key on which to get cross section
    # axis=1 : get cross section of column
    # drop_level=True : returns cross section without the multilevel index

>>> df

    b   c
0   1   2
1   3   4

Solution 5:

You could also achieve that by renaming the columns:

df.columns = ['a', 'b']

This involves a manual step but could be an option especially if you would eventually rename your data frame.