Create a Pandas DataFrame by appending one row at a time

I understand that Pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame then add rows, one by one. What is the best way to do this?

I successfully created an empty DataFrame with:

res = DataFrame(columns=('lib', 'qty1', 'qty2'))

Then I can add a new row and fill a field with:

res = res.set_value(len(res), 'qty1', 10.0)

It works, but it seems very odd :-/ (It fails for adding a string value.)

How can I add a new row to my DataFrame when the columns have different types?


Solution 1:

You can assign to df.loc[i]: the row with index label i is set to the values you provide, and it is created if that label does not exist yet.

>>> import pandas as pd
>>> from numpy.random import randint

>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))
...

>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6
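
When the row label is not known in advance, the same idea can be sketched with len(df) as the next label. This is a minimal example rather than the only way to do it: the sample values are made up, and it assumes the index is the default 0, 1, 2, ... so that len(df) is always an unused label.

```python
import pandas as pd

df = pd.DataFrame(columns=["lib", "qty1", "qty2"])

# Hypothetical sample rows mixing a string, an int and a float.
samples = [("name0", 3, 3.5), ("name1", 2, 4.0), ("name2", 2, 8.25)]

for lib, qty1, qty2 in samples:
    # len(df) is the next free label only while the index is 0, 1, 2, ...
    df.loc[len(df)] = [lib, qty1, qty2]

print(df)
```

Each assignment enlarges the DataFrame, so this stays convenient for a handful of rows but gets slow for large inputs.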

Solution 2:

If you can collect all the data for the DataFrame up front, there is a much faster approach than appending to it row by row:

  1. Create a list of dictionaries in which each dictionary corresponds to an input data row.
  2. Create a data frame from this list.

I had a similar task for which appending to a data frame row by row took 30 min, and creating a data frame from a list of dictionaries completed within seconds.

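import pandas as pd

rows_list = []
for row in input_rows:  # input_rows: whatever iterable holds your raw input
    dict1 = {}
    # Convert the input row to dictionary format:
    # key = column name, value = cell value.
    dict1.update(blah..)  # placeholder: fill in the values for this row
    rows_list.append(dict1)

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame(rows_list)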
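
For a version that runs as written, here is a small sketch of the same pattern. The input_rows tuples and their values are invented for illustration; in practice they would come from your own data source.

```python
import pandas as pd

# Hypothetical raw input: (lib, qty1, qty2) tuples.
input_rows = [("name0", 3, 3.5), ("name1", 2, 4.0), ("name2", 2, 8.25)]

rows_list = []
for lib, qty1, qty2 in input_rows:
    # One dictionary per row, keyed by column name.
    rows_list.append({"lib": lib, "qty1": qty1, "qty2": qty2})

# A single constructor call builds the whole DataFrame.
df = pd.DataFrame(rows_list)
print(df.dtypes)
```

Because the DataFrame is built in one call, pandas infers a dtype per column (integer for qty1 and float for qty2 here) instead of growing the frame one row at a time.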