Create a Pandas Dataframe by appending one row at a time
I understand that Pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows one by one.
What is the best way to do this?
I successfully created an empty DataFrame with:
res = DataFrame(columns=('lib', 'qty1', 'qty2'))
Then I can add a new row and fill a field with:
res = res.set_value(len(res), 'qty1', 10.0)
It works, but it seems very odd :-/ (It fails for adding a string value.)
How can I add a new row to my DataFrame when the columns have different types?
Solution 1:
You can assign to df.loc[i], which sets the row with index label i to whatever you specify. If no row with that label exists yet, a new one is added.
>>> import pandas as pd
>>> from numpy.random import randint
>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))
...
>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6
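Since the question mentions that adding a string value failed, here is a minimal sketch of the same df.loc approach with a string and numbers in one row. The sample rows are made up for illustration, and using len(df) as the next label assumes the index is the default 0, 1, 2, ... sequence with no gaps.

```python
import pandas as pd

df = pd.DataFrame(columns=["lib", "qty1", "qty2"])

# Made-up sample data: one string column and two numeric columns.
sample_rows = [("name0", 1.5, 2), ("name1", 3.0, 4)]

for lib, qty1, qty2 in sample_rows:
    # len(df) is the next unused label only while the index is 0..n-1.
    df.loc[len(df)] = [lib, qty1, qty2]

print(df)
```

Each assignment enlarges the frame by one row, so this is convenient for a handful of rows but is likely to get slow as the frame grows.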
Solution 2:
In case you can get all data for the data frame upfront, there is a much faster approach than appending to a data frame:
- Create a list of dictionaries in which each dictionary corresponds to an input data row.
- Create a data frame from this list.
I had a similar task for which appending to a data frame row by row took 30 min, and creating a data frame from a list of dictionaries completed within seconds.
rows_list = []
for row in input_rows:
    dict1 = {}
    # get input row in dictionary format
    # key = col_name
    dict1.update(blah..)
    rows_list.append(dict1)

df = pd.DataFrame(rows_list)
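For a version that runs as-is, here is a small sketch of the same idea. The contents of input_rows and the column names are assumptions borrowed from the question, not part of the original answer; substitute whatever your real input looks like.

```python
import pandas as pd

# Hypothetical input: each item holds the values for one row.
input_rows = [("name0", 3, 3), ("name1", 2, 4), ("name2", 2, 8)]

rows_list = []
for lib, qty1, qty2 in input_rows:
    # One dictionary per row, keyed by column name.
    rows_list.append({"lib": lib, "qty1": qty1, "qty2": qty2})

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame(rows_list, columns=["lib", "qty1", "qty2"])
print(df)
```

Appending to a Python list is cheap, and the DataFrame is built in a single step at the end, which is where the speedup over row-by-row insertion comes from.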