pivot_longer a grouped summary statistics data frame

There are multiple instances of _ in the column name so, we may use names_pattern to capture characters as a group i.e. the 'metric' column should get all the characters before the _ (.*) and the value columns (.value) would be one or more characters that are not a _ ([^_]+) till the end ($) of the string in column name

library(dplyr)
library(tidyr)
summary_statistics %>% 
   pivot_longer(cols = -major, names_to = c("metric", ".value"), 
         names_pattern = "(.*)_([^_]+)$")

-output

# A tibble: 32 × 6
   major                         metric                               median     mean       sd skewness
   <chr>                         <chr>                                 <dbl>    <dbl>    <dbl>    <dbl>
 1 Agricultural Production Crops K_int                               34.9    760.     2681.       5.39 
 2 Agricultural Production Crops K_phys                             107.     776.     1781.       3.98 
 3 Agricultural Production Crops Intangibles_intensity                0.325    0.346     0.255    0.382
 4 Agricultural Production Crops g_k_it_to_K_int                      0        0.132     0.217    1.56 
 5 Agricultural Production Crops total_q                              0.764    1.40      1.83     2.70 
 6 Agricultural Production Crops i_phys                               0.0471   0.169     1.63    18.5  
 7 Agricultural Production Crops i_int                                0.0543   0.0885    0.106    2.99 
 8 Agricultural Production Crops i_tot                                0.124    0.258     1.63    18.3  
 9 Agricultural Production Crops c_tot                                0.0907   0.123     0.506   10.8  
10 Agricultural Production Crops operating_activities_net_cash_flow   9.35   144.      500.       4.72 
# … with 22 more rows

With names_sep we can use _ with a regex lookaround to suggest that there are no more _ till the end ($) of the string that follows the _

summary_statistics %>%
  pivot_longer(
    cols = -major,
    names_to = c("metric", ".value"),
    names_sep = "_(?=[^_]+$)"
  )
# A tibble: 32 × 6
   major                         metric                               median     mean       sd skewness
   <chr>                         <chr>                                 <dbl>    <dbl>    <dbl>    <dbl>
 1 Agricultural Production Crops K_int                               34.9    760.     2681.       5.39 
 2 Agricultural Production Crops K_phys                             107.     776.     1781.       3.98 
 3 Agricultural Production Crops Intangibles_intensity                0.325    0.346     0.255    0.382
 4 Agricultural Production Crops g_k_it_to_K_int                      0        0.132     0.217    1.56 
 5 Agricultural Production Crops total_q                              0.764    1.40      1.83     2.70 
 6 Agricultural Production Crops i_phys                               0.0471   0.169     1.63    18.5  
 7 Agricultural Production Crops i_int                                0.0543   0.0885    0.106    2.99 
 8 Agricultural Production Crops i_tot                                0.124    0.258     1.63    18.3  
 9 Agricultural Production Crops c_tot                                0.0907   0.123     0.506   10.8  
10 Agricultural Production Crops operating_activities_net_cash_flow   9.35   144.      500.       4.72 
# … with 22 more rows

pivot_longer a grouped summary statistics data frame

Related

Recent Posts