How do I create variable names in a loop based on values in a list?
I have a list of five heights and I want to loop over it to create five separate DataFrames, one per height. For each height, the loop should build a list of column names that includes the height, read the matching CSV file with those column names, and finally drop the unused columns. I currently have multiple copies of the same block of code to do this, but I want to learn how to do it with a loop so I can clean up my script.
I get a NameError: name 'colNames' is not defined.
i = 0
height = ['0', '5', '15', '25', '50']
while i < len(height):
    colNames["height{}".format(i)] = ["A", "B_%s" % height, "C", "D"]
    df["height{}".format(i)] = pd.read_csv("test%s.csv" % height, names = colNames["height{}".format(i)])
    df["height{}".format(i)].drop(labels = ["A", "C"],axis = 1, inplace = True)
    i += 1
Expected results
colNames0 = ["A", "B_0", "C", "D"]
df0 = pd.read_csv("test0.csv", names = colNames0)
df0.drop(labels = ["A", "C"], axis = 1, inplace = True)
...
colNames50 = ["A", "B_50", "C", "D"]
df50 = pd.read_csv("test50.csv", names = colNames50)
df50.drop(labels = ["A", "C"], axis = 1, inplace = True)
Solution 1:
Trying to name separate DataFrames in this way is a bit unwieldy in Python, but here is how I might go about writing a loop for the problem you pose:
import pandas as pd

dflist = []
for height in ['0', '5', '15', '25', '50']:
    # Name the columns while reading, then keep only the two columns of interest
    dflist.append(pd.read_csv('test{}.csv'.format(height), names=['A', 'B_{}'.format(height), 'C', 'D'])[['B_{}'.format(height), 'D']])
This gives you a list of DataFrames rather than variables named df0, df5, and so on: the DataFrame for height '0' is dflist[0], the one for height '5' is dflist[1], and so forth. Unless there is a reason to save the various column names, you can just name your columns directly in the call to pd.read_csv. Additionally, selecting only the columns you want to keep at the end of the line is a little more streamlined than dropping the others in a separate command. As a side note,
df['newname'] = value
is a way to make a new column in an existing DataFrame, not a way to define a DataFrame.
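To illustrate the side note, here is a minimal sketch. The DataFrame contents and the column name D_doubled are made up for illustration and are not taken from your data:

```python
import pandas as pd

# A small stand-in DataFrame (hypothetical values)
df = pd.DataFrame({'D': [1.0, 2.0, 3.0]})

# Square-bracket assignment adds a column to the DataFrame that already exists
df['D_doubled'] = df['D'] * 2

print(df.columns.tolist())  # ['D', 'D_doubled']
```

Note that this only works because df was created first; the assignment never creates the DataFrame itself.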
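You are getting a NameError because the syntax
colNames[x] = value
is item assignment: it stores the value under key x in a pre-existing object named "colNames". Your script never creates such an object before the loop, so Python cannot find the name and raises the error.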
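If you would like to look up each DataFrame by its height rather than by its position in a list, one option is to create dictionaries before the loop and fill them inside it. The following is only a sketch under some assumptions: the names col_names, dfs and data_dir are my own, and the small CSV files written to a temporary directory are made-up stand-ins for your test0.csv, test5.csv, etc., so that the example runs on its own.

```python
import os
import tempfile

import pandas as pd

heights = ['0', '5', '15', '25', '50']

# Hypothetical stand-in data so the sketch is self-contained:
# one small headerless CSV per height, written to a temporary directory.
data_dir = tempfile.mkdtemp()
for height in heights:
    with open(os.path.join(data_dir, 'test{}.csv'.format(height)), 'w') as f:
        f.write('1,2,3,4\n5,6,7,8\n')

# Create the containers first; item assignment needs an existing object.
col_names = {}
dfs = {}

for height in heights:
    col_names[height] = ['A', 'B_{}'.format(height), 'C', 'D']
    path = os.path.join(data_dir, 'test{}.csv'.format(height))
    df = pd.read_csv(path, names=col_names[height])
    # Drop the unused columns and store the result under its height
    dfs[height] = df.drop(columns=['A', 'C'])

print(dfs['50'].columns.tolist())  # ['B_50', 'D']
```

With this layout, dfs['0'] plays the role of df0 and dfs['50'] the role of df50, without having to generate variable names.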