Pandas: Replacement for .ix

Given the update to pandas 0.20.0 and the deprecation of .ix, I am wondering what the most efficient way to get the same result using the remaining .loc and .iloc. I just answered this question, but the second option (not using .ix) seems inefficient and verbose.

Snippet:

print df.iloc[df.loc[df['cap'].astype(float) > 35].index, :-1]

Is this the proper way to go when using both conditional and index position filtering?


Solution 1:

You can stay in the world of a single loc by getting at the index values you need by slicing that particular index with positions.

df.loc[
    df['cap'].astype(float) > 35,
    df.columns[:-1]
]

Solution 2:

Generally, you would prefer to avoid chained indexing in pandas (though, strictly speaking, you're actually using two different indexing methods). You can't modify your dataframe this way (details in the docs), and the docs cite performance as another reason (indexing once vs. twice).

For the latter, it's usually insignificant (or rather, unlikely to be a bottleneck in your code), and actually seems to not be the case (at least in the following example):

df = pd.DataFrame(np.random.uniform(size=(100000,10)),columns = list('abcdefghij'))
# Get columns number 2:5 where value in 'a' is greater than 0.5 
# (i.e. Boolean mask along axis 0, position slice of axis 1)

# Deprecated .ix method
%timeit df.ix[df['a'] > 0.5,2:5]
100 loops, best of 3: 2.14 ms per loop

# Boolean, then position
%timeit df.loc[df['a'] > 0.5,].iloc[:,2:5]
100 loops, best of 3: 2.14 ms per loop

# Position, then Boolean
%timeit df.iloc[:,2:5].loc[df['a'] > 0.5,]
1000 loops, best of 3: 1.75 ms per loop

# .loc
%timeit df.loc[df['a'] > 0.5, df.columns[2:5]]
100 loops, best of 3: 2.64 ms per loop

# .iloc
%timeit df.iloc[np.where(df['a'] > 0.5)[0],2:5]
100 loops, best of 3: 9.91 ms per loop

Bottom line: If you really want to avoid .ix, and you're not intending to modify values in your dataframe, just go with chained indexing. On the other hand (the 'proper' but arguably messier way), if you do need to modify values, either do .iloc with np.where() or .loc with integer slices of df.index or df.columns.

Solution 3:

How about breaking this into a two-step indexing:

df[df['cap'].astype(float) > 35].iloc[:,:-1]

or even:

df[df['cap'].astype(float) > 35].drop('cap',1)