How to build and fill pandas dataframe from for loop? [duplicate]
Here is a simple example of the code I am running, and I would like the results put into a pandas dataframe (unless there is a better option):
for p in game.players.passing():
print p, p.team, p.passing_att, p.passer_rating()
R.Wilson SEA 29 55.7
J.Ryan SEA 1 158.3
A.Rodgers GB 34 55.8
Using this code:
d = []
for p in game.players.passing():
d = [{'Player': p, 'Team': p.team, 'Passer Rating':
p.passer_rating()}]
pd.DataFrame(d)
I can get:
Passer Rating Player Team
0 55.8 A.Rodgers GB
Which is a 1x3 dataframe, and I understand why it is only one row but I can't figure out how to make it multi-row with the columns in the correct order. Ideally the solution would be able to deal with n number of rows (based on p) and it would be wonderful (although not essential) if the number of columns would be set by the number of stats requested. Any suggestions? Thanks in advance!
The simplest answer is what Paul H said:
d = []
for p in game.players.passing():
d.append(
{
'Player': p,
'Team': p.team,
'Passer Rating': p.passer_rating()
}
)
pd.DataFrame(d)
But if you really want to "build and fill a dataframe from a loop", (which, btw, I wouldn't recommend), here's how you'd do it.
d = pd.DataFrame()
for p in game.players.passing():
temp = pd.DataFrame(
{
'Player': p,
'Team': p.team,
'Passer Rating': p.passer_rating()
}
)
d = pd.concat([d, temp])
Try this using list comprehension:
import pandas as pd
df = pd.DataFrame(
[p, p.team, p.passing_att, p.passer_rating()] for p in game.players.passing()
)
Make a list of tuples with your data and then create a DataFrame with it:
d = []
for p in game.players.passing():
d.append((p, p.team, p.passer_rating()))
pd.DataFrame(d, columns=('Player', 'Team', 'Passer Rating'))
A list of tuples should have less overhead than a list dictionaries. I tested this below, but please remember to prioritize ease of code understanding over performance in most cases.
Testing functions:
def with_tuples(loop_size=1e5):
res = []
for x in range(int(loop_size)):
res.append((x-1, x, x+1))
return pd.DataFrame(res, columns=("a", "b", "c"))
def with_dict(loop_size=1e5):
res = []
for x in range(int(loop_size)):
res.append({"a":x-1, "b":x, "c":x+1})
return pd.DataFrame(res)
Results:
%timeit -n 10 with_tuples()
# 10 loops, best of 3: 55.2 ms per loop
%timeit -n 10 with_dict()
# 10 loops, best of 3: 130 ms per loop
I may be wrong, but I think the accepted answer by @amit has a bug.
from pandas import DataFrame as df
x = [1,2,3]
y = [7,8,9,10]
# this gives me a syntax error at 'for' (Python 3.7)
d1 = df[[a, "A", b, "B"] for a in x for b in y]
# this works
d2 = df([a, "A", b, "B"] for a in x for b in y)
# and if you want to add the column names on the fly
# note the additional parentheses
d3 = df(([a, "A", b, "B"] for a in x for b in y), columns = ("l","m","n","o"))