Beginner question: Why do I need to type "pd." for read_csv, but after I create a DataFrame, I don't need to type "pd." anymore?
Solution 1:
Everything in your Python code, whether it is a variable, something you have imported, or anything else, is an object. Objects have attributes, which can be referenced using <object>.<attribute name>.
When you use import pandas as pd, you create a new object, stored in the variable called pd. This object has a lot of attributes, one of them being the function read_csv.
When you call the function pd.read_csv, you create a new object, which you are storing in a variable called df. info and sort_values are attributes of the object stored in the df variable, not the pd variable, so you don't need pd. to use them. All you need is the df. prefix.
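To see the same split between module attributes and object attributes without needing a CSV file, here is a minimal sketch using the standard library as a stand-in. The names col and counts are only for illustration: col plays the role of pd, and counts plays the role of df.

```python
import collections as col  # 'col' plays the role of 'pd' (stand-in, not pandas)

# Counter is an attribute of the module, so it is reached through 'col.'
counts = col.Counter("banana")  # 'counts' plays the role of 'df'

# most_common is an attribute of the Counter object, not of the module,
# so it is reached through 'counts.' and never through 'col.'
top = counts.most_common(1)
print(top)  # [('a', 3)]

module_has_it = hasattr(col, "most_common")
object_has_it = hasattr(counts, "most_common")
print(module_has_it, object_has_it)  # False True
```

The same reasoning applies to pd and df: read_csv lives on the module, while info and sort_values live on the object that read_csv returned.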
Solution 2:
pandas is a module: basically a file or set of files that contain functions, classes, and variables that you can use in your code.
You import the contents of that module into your file and give it the name pd.
import pandas as pd
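This is essentially shorthand for the following (the only difference is that the two-line form also leaves the name pandas defined):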
import pandas
pd = pandas
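If you want to check this yourself, the following sketch uses the standard-library math module in place of pandas (an assumption made only so it runs anywhere). Both names end up pointing at the very same module object.

```python
import math as m   # the one-line form

import math        # the two-line form
m2 = math

# Both names refer to the same module object
same = m is m2
print(same)        # True

# Either name can be used with dot notation
root = m2.sqrt(16)
print(root)        # 4.0
```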
When you want to reference the variables and functions in that module, you use dot notation.
<module name>.<variable, class, or function in module>
In your case, you are using the function read_csv inside the pandas module and assigning the result to the variable name df.
df = pd.read_csv("name.csv")
df is a pandas.DataFrame object with info and sort_values attributes, which can also be accessed by dot notation.
df.info
df.sort_values
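One thing worth noting about lines like these: writing the name alone only looks up the method, and you need parentheses to actually run it. A small sketch with a plain string (used here as a stand-in for df, so no pandas is required):

```python
text = "pandas"

# Dot notation alone just fetches the method object; nothing is executed yet
method = text.upper
print(callable(method))  # True

# Adding parentheses calls the method and returns its result
result = text.upper()
print(result)            # PANDAS

# The fetched method can still be called later
later = method()
print(later)             # PANDAS
```

In the same way, df.info refers to the method, while df.info() runs it.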
Dot notation for an object follows the same pattern:
<object name>.<attribute(variable) or method(function)>
As an aside, every variable name used in your Python file must be defined somehow: by assignment (=), import, or definition (def or class).
By default, Python includes some built-in constants and functions that you can reference, but everything else should follow those rules.
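A quick way to see this rule in action is to use a name that was never defined. The sketch below is only an illustration; undefined_name is a made-up name chosen because nothing defines it.

```python
import builtins

# Using a name that was never assigned, imported, or defined raises NameError
try:
    undefined_name
except NameError as err:
    message = str(err)

print(message)  # name 'undefined_name' is not defined

# Built-ins such as len work without any import because Python provides them
has_len = hasattr(builtins, "len")
total = len([1, 2, 3])
print(has_len, total)  # True 3
```

This is the same error you would get from calling read_csv("name.csv") without the pd. prefix: the bare name read_csv was never defined in your file, only pd was.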