Populating a data frame in R in a loop

I am trying to populate a data frame from within a for loop in R. The names of the columns are generated dynamically within the loop and the value of some of the loop variables is used as the values while populating the data frame. For instance the name of the current column could be some variable name as a string in the loop, and the column can take the value of the current iterator as its value in the data frame.

I tried to create an empty data frame outside the loop, like this

d = data.frame()

But I cant really do anything with it, the moment I try to populate it, I run into an error

 d[1] = c(1,2)
Error in `[<-.data.frame`(`*tmp*`, 1, value = c(1, 2)) : 
  replacement has 2 rows, data has 0

What may be a good way to achieve what I am looking to do. Please let me know if I wasnt clear.


Solution 1:

You could do it like this:

 iterations = 10
 variables = 2

 output <- matrix(ncol=variables, nrow=iterations)

 for(i in 1:iterations){
  output[i,] <- runif(2)

 }

 output

and then turn it into a data.frame

 output <- data.frame(output)
 class(output)

what this does:

  1. create a matrix with rows and columns according to the expected growth
  2. insert 2 random numbers into the matrix
  3. convert this into a dataframe after the loop has finished.

Solution 2:

It is often preferable to avoid loops and use vectorized functions. If that is not possible there are two approaches:

  1. Preallocate your data.frame. This is not recommended because indexing is slow for data.frames.
  2. Use another data structure in the loop and transform into a data.frame afterwards. A list is very useful here.

Example to illustrate the general approach:

mylist <- list() #create an empty list

for (i in 1:5) {
  vec <- numeric(5) #preallocate a numeric vector
  for (j in 1:5) { #fill the vector
    vec[j] <- i^j 
  }
  mylist[[i]] <- vec #put all vectors in the list
}
df <- do.call("rbind",mylist) #combine all vectors into a matrix

In this example it is not necessary to use a list, you could preallocate a matrix. However, if you do not know how many iterations your loop will need, you should use a list.

Finally here is a vectorized alternative to the example loop:

outer(1:5,1:5,function(i,j) i^j)

As you see it's simpler and also more efficient.

Solution 3:

this works too.

df = NULL
for (k in 1:10)
    {
       x = 1
       y = 2
       z = 3
       df = rbind(df, data.frame(x,y,z))
     }

output will look like this

df #enter

x y z #col names
1 2 3

Solution 4:

Thanks Notable1, works for me with the tidytextr Create a dataframe with the name of files in one column and content in other.

    diretorio <- "D:/base"
    arquivos <- list.files(diretorio, pattern = "*.PDF")
    quantidade <- length(arquivos)

#
df = NULL
for (k in 1:quantidade) {

      nome = arquivos[k]
      print(nome)
      Sys.sleep(1) 
      dados = read_pdf(arquivos[k],ocr = T)
      print(dados)
      Sys.sleep(1)
      df = rbind(df, data.frame(nome,dados))
      Sys.sleep(1)
}
Encoding(df$text) <- "UTF-8"