Beginner question: Why do I need to type "pd." for read_csv, but after I create a DataFrame, I don't need to type "pd." anymore?

Solution 1:

Everything in your Python code, whether it is the value of a variable, something you have imported, or anything else, is an object. Objects have attributes, which can be referenced using <object>.<attribute name>.

When you run import pandas as pd, Python loads the pandas module as an object and stores it in the variable called pd. This object has a lot of attributes, one of them being the function read_csv.

When you call the function pd.read_csv, it returns a new object (a DataFrame), which you store in a variable called df. info and sort_values are attributes of the object stored in df, not of the module stored in pd, so you don't need pd. to use them. All you need is the df. prefix.
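To see the same pattern without needing a CSV file, here is a minimal sketch that uses the standard-library collections module as a stand-in for pandas. The alias col and the variable names are illustrative choices, not anything from the question: col plays the role of pd, and counts plays the role of df.

```python
import collections as col          # stand-in for: import pandas as pd

counts = col.Counter("banana")     # like df = pd.read_csv(...): a module function creates an object
top = counts.most_common(1)        # like df.sort_values(...): the method lives on the object

print(type(col).__name__)              # module
print(type(counts).__name__)           # Counter
print(hasattr(col, "Counter"))         # True: Counter is an attribute of the module
print(hasattr(col, "most_common"))     # False: the module has no such attribute
print(hasattr(counts, "most_common"))  # True: it belongs to the Counter object
print(top)                             # [('a', 3)]
```

The prefix you type is always the name of whichever object owns the attribute.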

Solution 2:

pandas is a module: basically a file or set of files containing functions, classes, and variables that you can use in your code.

You import that module into your file and give it the name pd.

import pandas as pd

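This is roughly shorthand for the following (the only difference is that the short form does not also leave the name pandas defined):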

import pandas
pd = pandas
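A quick way to convince yourself of that equivalence is to try both forms on a small module. This sketch uses the standard-library math module in place of pandas so it runs anywhere; the names m and m2 are made up for the example.

```python
import math as m      # one step: load the module and bind it to the name m

import math           # two steps: load the module under its own name...
m2 = math             # ...then bind a second name to the same object

print(m is m2)        # True: both names refer to one and the same module object
print(m.sqrt(16))     # 4.0
print(m2.sqrt(16))    # 4.0
```

So pd is not a special keyword. It is an ordinary variable that happens to hold the pandas module.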

When you want to reference the variables, classes, and functions in that module, you use dot notation:

<module name>.<variable, class, or function in module>

In your case, you are using the function read_csv inside the pandas module and assigning the result to the variable name df.

df = pd.read_csv("name.csv")

df is a pandas.DataFrame object with info and sort_values attributes (both are methods, so you add parentheses to actually call them), which can also be accessed by dot notation.

df.info
df.sort_values

Dot notation for an object follows the same pattern:

<object name>.<attribute(variable) or method(function)>
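Putting both kinds of dot notation side by side, here is a small sketch assuming pandas is installed. The CSV text, the column names, and the variable ranked are hypothetical; an in-memory string stands in for "name.csv" so the example does not depend on a file.

```python
import io
import pandas as pd

# Hypothetical in-memory data standing in for "name.csv"
csv_text = "name,score\nAda,90\nBob,75\nCy,82\n"

df = pd.read_csv(io.StringIO(csv_text))             # read_csv belongs to the module, so: pd.
ranked = df.sort_values("score", ascending=False)   # sort_values belongs to the DataFrame, so: df.

print(type(df).__name__)             # DataFrame
print(ranked["name"].tolist())       # ['Ada', 'Cy', 'Bob']
print(callable(df.info))             # True: df.info is the method itself, df.info() calls it
print(hasattr(pd, "sort_values"))    # False: the module has no sort_values function
```

Writing pd.sort_values(...) would fail with an AttributeError, which is the mirror image of the original question.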

As an aside, every variable name used in your Python file must be defined somehow: by assignment (=), import, or definition (def or class).

By default, Python includes some built-in constants and functions (such as True, None, print, and len) that you can reference without defining them, but every other name must follow those rules.
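The rule is easy to see in action. In this sketch the names message, radius, and area are invented for illustration, and math again stands in for any imported module. Using pd before anything has defined it raises a NameError, while built-ins such as len work with no definition at all.

```python
try:
    pd                      # not assigned, imported, or defined anywhere in this snippet
except NameError as err:
    message = str(err)
print(message)              # name 'pd' is not defined

import math as m            # defined by import
radius = 2                  # defined by assignment

def area(r):                # defined by definition
    return m.pi * r ** 2

print(len("abc"))               # 3: len is a built-in, always available
print(round(area(radius), 2))   # 12.57
```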