Pandas selecting by label sometimes returns Series, sometimes returns DataFrame
In Pandas, when I select a label that has only one entry in the index I get back a Series, but when I select a label that has more than one entry I get back a DataFrame.
Why is that? Is there a way to ensure I always get back a data frame?
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(data=range(5), index=[1, 2, 3, 3, 3])
In [3]: type(df.loc[3])
Out[3]: pandas.core.frame.DataFrame
In [4]: type(df.loc[1])
Out[4]: pandas.core.series.Series
Solution 1:
Granted, the behavior is inconsistent, but I think it's easy to imagine cases where it is convenient. Anyway, to get a DataFrame every time, just pass a list to loc. There are other ways, but in my opinion this is the cleanest.
In [2]: type(df.loc[[3]])
Out[2]: pandas.core.frame.DataFrame
In [3]: type(df.loc[[1]])
Out[3]: pandas.core.frame.DataFrame
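As a minimal sketch of the list-based approach, it can be wrapped in a small helper. The name rows_as_frame is hypothetical and not part of pandas; the frame is the one from the question.

```python
import pandas as pd

# Same frame as in the question: label 3 appears three times.
df = pd.DataFrame(data=range(5), index=[1, 2, 3, 3, 3])


def rows_as_frame(frame, label):
    """Hypothetical helper: select one label but always get a DataFrame.

    Wrapping the label in a list makes loc treat it as a list of
    labels, so the result keeps two dimensions even for a single row.
    """
    return frame.loc[[label]]


single = rows_as_frame(df, 1)    # label occurs once
repeated = rows_as_frame(df, 3)  # label occurs three times

print(type(single).__name__, single.shape)
print(type(repeated).__name__, repeated.shape)
```

Both calls give a DataFrame; only the number of rows differs.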
Solution 2:
Your index contains the label 3 three times. For this reason df.loc[3] returns a DataFrame.
The reason is that you don't specify the column. So df.loc[3] selects the three matching rows across all columns (here only column 0), while df.loc[3,0] returns a Series. Similarly, df.loc[1:2] also returns a DataFrame, because you slice the rows.
Selecting a single row (as in df.loc[1]) returns a Series with the column names as the index.
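The cases described above can be checked side by side. This is only an illustrative sketch using the frame from the question; the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(data=range(5), index=[1, 2, 3, 3, 3])

row = df.loc[1]          # unique label, no column: Series indexed by column names
block = df.loc[3]        # repeated label, no column: DataFrame with three rows
col_vals = df.loc[3, 0]  # repeated label, one column: Series indexed by row labels
scalar = df.loc[1, 0]    # unique label, one column: a single value

print(type(row).__name__, list(row.index))
print(type(block).__name__, block.shape)
print(type(col_vals).__name__, list(col_vals.index))
print(scalar)
```

In each case loc drops one dimension for every axis on which the selection resolves to exactly one entry, which is why the result type depends on how often the label occurs.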
If you want to be sure to always get a DataFrame, you can slice, as in df.loc[1:1]. Another option is boolean indexing (df.loc[df.index==1]) or the take method (df.take([0]), but note that this uses positions, not labels!).
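A short sketch comparing these alternatives on the frame from the question; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(data=range(5), index=[1, 2, 3, 3, 3])

by_slice = df.loc[1:1]           # label slice, both ends inclusive
by_mask = df.loc[df.index == 1]  # boolean mask built from the index
by_position = df.take([0])       # position 0, which happens to hold label 1

for name, result in [("slice", by_slice),
                     ("mask", by_mask),
                     ("take", by_position)]:
    print(name, type(result).__name__, result.shape)
```

All three return a one-row DataFrame here. Note that the label slice relies on the index being sorted; as far as I know, slicing by a repeated label on an unsorted index raises a KeyError, so the boolean mask is the safer choice in that situation.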