What are the pros/cons in using pd.Index vs df.loc
What is the difference between using pd.Index vs df.loc? Is it effectively the same thing?
import pandas as pd

idx = pd.Index(('a', 'b'))
df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})
print(df.loc[:, ('a', 'b')])
print(df[idx])
| | a | b |
|---|---|---|
| 0 | 0 | 2 |
| 1 | 1 | 3 |
The documentation describes why loc is the preferred method. Chained indexing (taking a subset with df[idx] and then assigning into it) can lead to a SettingWithCopyWarning:
idx = ['a', 'b']
d = df[idx]
d.iloc[0,0] = 9
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In contrast, using loc doesn't trigger the SettingWithCopyWarning:
idx = ['a', 'b']
d = df.loc[:,idx]
d.iloc[0,0] = 9
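If the goal is a subset you can modify freely, one option is to make the copy explicit. The following is only a minimal sketch reusing the example frame from above; the conditions under which the warning appears have changed between pandas versions, so an explicit .copy() is a version-independent way to state the intent.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})
idx = ['a', 'b']

# Select the columns with loc and copy explicitly, so d is independent of df.
d = df.loc[:, idx].copy()
d.iloc[0, 0] = 9

print(d.iloc[0, 0])    # 9, the subset was modified
print(df.loc[0, 'a'])  # 0, the original frame is unchanged
```

If the intent is instead to change df itself, assign through a single loc call on df, for example df.loc[0, 'a'] = 9.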
Of note, loc also lets you pass a specific axis as a parameter:
df.loc(axis=1)[idx]
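As a quick check, here is a small sketch (same example frame as above; the variable names are just for illustration) showing that the axis form selects the same columns as the explicit row slice:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})
idx = ['a', 'b']

by_axis = df.loc(axis=1)[idx]  # axis=1: the key refers to column labels
by_slice = df.loc[:, idx]      # same selection with an explicit row slice
same = by_axis.equals(by_slice)
print(same)

first_row = df.loc(axis=0)[[0]]  # axis=0: the key refers to row labels
print(first_row)
```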
With loc, you can select by row labels, by column labels, or both at once, whereas indexing with a pd.Index (df[idx]) can only select columns.
df.loc[[0]]
   a  b  c
0  0  2  0
df.loc[[0],['a','b']]
   a  b
0  0  2
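To put the two approaches side by side, here is a short sketch on the frame from the question (the names cols, via_index, via_loc and row0 are made up for this example):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})
cols = pd.Index(['a', 'b'])

via_index = df[cols]       # plain indexing: selects columns only
via_loc = df.loc[:, cols]  # the same columns through loc
print(via_index.equals(via_loc))

# Rows and columns in one call, which plain df[...] indexing cannot express.
row0 = df.loc[[0], cols]
print(row0.to_dict('list'))
```

So for a pure column selection the two give the same result; the difference only shows up once rows are involved.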
IMO, loc is more flexible to use, and I would choose loc because it is clearer in the long run, for example when you come back to check the code later.