What is the difference between using square brackets or a dot to access a column?
Solution 1:
The "dot notation", i.e. df.col2, is the attribute access that's exposed as a convenience.
You may access an index on a Series, column on a DataFrame, and an item on a Panel directly as an attribute.
df['col2'] does the same: it returns a pd.Series of the column.
A few caveats about attribute access:
- you cannot add a column (df.new_col = x won't work; worse, it will silently create a new attribute rather than a column; think monkey-patching here)
- it won't work if you have spaces in the column name or if the column name is an integer.
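The caveats above can be checked with a small sketch. The DataFrame and its column names ("my col", the integer column 0, new_col) are made up for illustration, and the warning filter is there only because recent pandas versions may emit a UserWarning on the attribute assignment.

```python
import warnings

import pandas as pd

# Illustrative data; the column names are assumptions chosen to hit each caveat.
df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6], "my col": [7, 8, 9], 0: [10, 11, 12]}
)

# For a simple column name, both forms return an equal Series.
same = df.col2.equals(df["col2"])

# Assigning to a new attribute name sets a plain Python attribute, not a column.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer pandas may warn about this pattern
    df.new_col = [0, 0, 0]
created_column = "new_col" in df.columns

# Names with spaces and integer names are only reachable with brackets.
spaced = df["my col"]
numbered = df[0]
```

Here created_column ends up False: the list lives on the object as an ordinary attribute and never shows up in df.columns.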
Solution 2:
They are the same as long as you're accessing a single column with a simple name, but you can do more with the bracket notation. You can only use df.col if the column name is a valid Python identifier (e.g., it does not contain spaces or other such characters). Also, you may encounter surprises if your column name clashes with a pandas method name (like sum). With brackets you can select multiple columns (e.g., df[['col1', 'col2']]) or add a new column (df['newcol'] = ...), which can't be done with dot access.
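A short sketch of those points, using a made-up DataFrame that deliberately has a column named sum to trigger the method-name clash:

```python
import pandas as pd

# Illustrative data; the "sum" column is chosen to collide with DataFrame.sum.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "sum": [5, 6]})

clash = df.sum        # the DataFrame.sum method, not the column
column = df["sum"]    # the column named "sum"

# A list of names selects several columns and returns a DataFrame.
subset = df[["col1", "col2"]]

# Bracket assignment adds a real column.
df["newcol"] = df["col1"] + df["col2"]
```

Dot access resolves to the method first, so the column named sum is only reachable through brackets.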
The other question you linked to applies, but that is a much more general question. Python objects get to define how the . and [] operators apply to them. Pandas DataFrames have chosen to make them the same for this limited case of accessing single columns, with the caveats described above.
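To illustrate how an object can define both operators, here is a toy class. TinyFrame is hypothetical and is not how pandas is implemented; it only shows the general Python mechanism (__getitem__ for [], __getattr__ as a fallback for .).

```python
class TinyFrame:
    """Hypothetical toy container, for illustration only."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        # Called for obj[key].
        return self._columns[key]

    def __getattr__(self, name):
        # Called for obj.name only when normal attribute lookup fails,
        # so real attributes and methods always take precedence.
        try:
            return self.__dict__["_columns"][name]
        except KeyError:
            raise AttributeError(name) from None

    def sum(self):
        return "method, not a column"


tf = TinyFrame({"col2": [4, 5, 6], "sum": [1, 2]})

via_dot = tf.col2         # falls through to __getattr__
via_brackets = tf["col2"]  # goes through __getitem__
shadowed = tf.sum          # the method wins over the "sum" column
```

Because __getattr__ is only a fallback, a name that is already a method shadows the column, which mirrors the sum clash described above.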