Transpose multiple sub-headers into factor column in R

Here is one possibility (assuming that all sub-headers begin with Fuente) using tidyverse. Here, I create a grouping column (idx) by gathering all rows until Fuente appears in a subsequent row. Then, I split those into individual dataframes and put into a list. Then, I use map to apply a function to that list. I extract the text after Fuente, then copy that to all rows in that dataframe. Finally, I bind the list of dataframes back together.

Tidyverse

library(tidyverse)

df %>%
  group_by(idx = cumsum(str_detect(fecha, "Fuente"))) %>%
  group_split(., .keep = FALSE) %>%
  map(., function(x)
    x %>%
      mutate(Fuente = sub('.*:\\s*', "", fecha)[1]) %>%
      slice(-1)) %>%
  bind_rows()

Or if you do have other sub-headers besides Fuente, then you could use "[a-z]" in group_by(idx = cumsum(str_detect(fecha, "[a-z]")), rather than "Fuente".

Data.table

Another option using data.table:

setDT(dt)[, Fuente := ifelse(grepl(':', df$fecha, fixed = TRUE),
                             sub('.*:\\s*', "", df$fecha), NA)]
dt[, Fuente := Fuente[nafill(replace(.I, is.na(Fuente), NA), "locf")]]
dt <- dt[!grepl("Fuente", dt$fecha),]

Output

   fecha      producto          origen     pmin   pmax   pfrec  obs   category Fuente               
   <chr>      <chr>             <chr>      <chr>  <chr>  <chr>  <chr> <chr>    <chr>                
 1 20/02/2020 Aleta de raya     Tabasco    23.00  27.00  25.00  ""    pescado  La Nueva Viga, DF    
 2 20/02/2020 Bandera           Campeche   35.00  39.00  37.00  ""    pescado  La Nueva Viga, DF    
 3 20/02/2020 Besugo            Veracruz   15.00  19.00  17.00  ""    pescado  La Nueva Viga, DF    
 4 20/02/2020 Cazón con cabeza  Veracruz   60.00  65.00  63.00  ""    pescado  La Nueva Viga, DF    
 5 20/02/2020 Huachinango Golfo Tamaulipas 165.00 200.00 190.00 "OBS" pescado  Monterrey, Nuevo León
 6 20/02/2020 Pampano           Tamaulipas --     --     195.00 "OBS" pescado  Monterrey, Nuevo León
 7 20/02/2020 Sargo             Tamaulipas --     --     84.00  "OBS" pescado  Monterrey, Nuevo León
 8 20/02/2020 Trucha marina     Tamaulipas --     --     98.00  "OBS" pescado  Monterrey, Nuevo León
 9 17/02/2020 Huachinango Golfo Tamaulipas 210.00 220.00 215.00 "OBS" pescado  Monterrey, Nuevo León
10 17/02/2020 Pampano           Tamaulipas --     --     195.00 "OBS" pescado  Monterrey, Nuevo León

Benchmark

data.table is faster than any of the tidyverse options

enter image description here

Here is an alternative tidyverse approach, main points are adding a new column with add_column from tibble package and data wrangling:

filter all rows that contain Fuente
bind to original df to get equal column lengths!
add a new column by getting long form of df filtering and wrangling
remove with filter ! rows containing Fuente:

library(tidyverse)

df %>% 
  filter(if_any(everything(), ~str_detect(., "Fuente"))) %>% 
  bind_rows(df) %>% 
  add_column(df %>% 
               pivot_longer(everything(), values_to = "Fuente") %>% 
               filter(str_detect(Fuente, "Fuente")) %>% 
               mutate(Fuente = sub('.*:', '', Fuente)) %>% 
               select(-name)
             )%>% 
  filter(!if_any(everything(), ~str_detect(fecha, "Fuente:")))

        fecha          producto     origen   pmin   pmax  pfrec obs category                Fuente
1  20/02/2020     Aleta de raya    Tabasco  23.00  27.00  25.00      pescado     La Nueva Viga, DF
2  20/02/2020           Bandera   Campeche  35.00  39.00  37.00      pescado     La Nueva Viga, DF
3  20/02/2020            Besugo   Veracruz  15.00  19.00  17.00      pescado     La Nueva Viga, DF
4  20/02/2020  Cazón con cabeza   Veracruz  60.00  65.00  63.00      pescado     La Nueva Viga, DF
5  20/02/2020 Huachinango Golfo Tamaulipas 165.00 200.00 190.00 OBS  pescado Monterrey, Nuevo León
6  20/02/2020           Pampano Tamaulipas     --     -- 195.00 OBS  pescado Monterrey, Nuevo León
7  20/02/2020             Sargo Tamaulipas     --     --  84.00 OBS  pescado Monterrey, Nuevo León
8  20/02/2020     Trucha marina Tamaulipas     --     --  98.00 OBS  pescado Monterrey, Nuevo León
9  17/02/2020 Huachinango Golfo Tamaulipas 210.00 220.00 215.00 OBS  pescado Monterrey, Nuevo León
10 17/02/2020           Pampano Tamaulipas     --     -- 195.00 OBS  pescado Monterrey, Nuevo León

Transpose multiple sub-headers into factor column in R

Related

Recent Posts