How do I create variable names in a loop based on values in a list?

I have a list of five heights and want to loop over it to create five separate DataFrames, one per height. For each height, this means building a list of column names that includes the height, reading a CSV file with those column names, and finally dropping the unused columns. I currently have multiple copies of the same block of code to do this, but I want to learn how to do it with a loop so I can clean up my script.

I get a NameError: name 'colNames' is not defined.

    i = 0
    height = ['0', '5', '15', '25', '50']
    while i < len(height):
        colNames["height{}".format(i)] = ["A", "B_%s" % height, "C", "D"]
        df["height{}".format(i)] = pd.read_csv("test%s.csv" % height, names = colNames["height{}".format(i)])
        df["height{}".format(i)].drop(labels = ["A", "C"],axis = 1, inplace = True)

        i += 1

Expected results

    colNames0 = ["A", "B_0", "C", "D"]
    df0 = pd.read_csv("test0.csv", names = colNames0)
    df0.drop(labels = ["A", "C"], axis = 1, inplace = True)

    ...

    colNames50 = ["A", "B_50", "C", "D"]
    df50 = pd.read_csv("test50.csv", names = colNames50)
    df50.drop(labels = ["A", "C"], axis = 1, inplace = True)


Solution 1:

Trying to name separate DataFrames in this way is a bit unwieldy in Python, but here is how I might go about writing a loop for the problem you pose:

    import pandas as pd

    dflist = []

    for height in ['0', '5', '15', '25', '50']:
        b_col = 'B_{}'.format(height)  # e.g. 'B_0', 'B_5', ...
        df = pd.read_csv('test{}.csv'.format(height), names=['A', b_col, 'C', 'D'])
        dflist.append(df[[b_col, 'D']])  # keep only the columns of interest

This gives you a list of DataFrames rather than separate variables named df0, df5, and so on: dflist[0] holds the data for height '0', dflist[1] the data for height '5', and so forth. Unless there is a reason to save the various column names, you can just pass them directly in the call to pd.read_csv. Additionally, selecting only the columns you want to keep is a little more streamlined than dropping the others in a separate command. As a side note,

    df['newname'] = value

is a way to make a new column in an existing DataFrame, not a way to define a DataFrame.
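To make that side note concrete, here is a minimal sketch using a made-up two-row DataFrame (the data and the column name "D" are only for illustration). It shows that item assignment on an existing DataFrame adds a column rather than creating a new DataFrame:

```python
import pandas as pd

# Made-up data, purely for illustration.
df = pd.DataFrame({"D": [1.0, 2.0]})

# Item assignment on an existing DataFrame adds (or overwrites) a column.
df["newname"] = 0

columns = list(df.columns)
print(columns)  # ['D', 'newname']
```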

You are getting a NameError because the syntax

    colNames[x] = value

assumes you are assigning the value into a pre-existing object named "colNames". No such object was defined before the loop, so Python raises the error.
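If you do want to look each DataFrame up by its height, one option is a dictionary that is created before the loop, so that the item assignment has an object to act on. The following is a sketch of that idea, not the only way to do it. The helper name `load_heights` and its `directory` parameter are hypothetical, and it assumes headerless files named test<height>.csv with four columns, as in the question:

```python
import os

import pandas as pd


def load_heights(heights, directory="."):
    """Return a dict mapping each height to its trimmed DataFrame.

    Hypothetical helper: assumes headerless CSV files named
    test<height>.csv with four columns, as described in the question.
    """
    dfs = {}  # must exist before the loop; otherwise dfs[...] = ... raises NameError
    for height in heights:
        b_col = "B_{}".format(height)
        path = os.path.join(directory, "test{}.csv".format(height))
        df = pd.read_csv(path, names=["A", b_col, "C", "D"])
        dfs[height] = df[[b_col, "D"]]  # keep only the columns of interest
    return dfs
```

With the CSV files in place, `dfs = load_heights(['0', '5', '15', '25', '50'])` would let you write `dfs['50']` where you wanted `df50`.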