Dynamically generate subset column names for a dataframe using for loop

Solution 1:

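You can do this in base R, with a bit of help from the lubridate package, and without an explicit for loop: for each target month, build the column names with format() and subset df inside mapply().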

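year_months <- c('2021-12', '2021-11', '2021-10')
curr <- lubridate::ym(year_months)  # "2021-12" -> 2021-12-01 as a Date
prev <- curr - months(2L)           # two months earlier; months() on a number is lubridate's period
mapply(function(x, y) {
  df[c(
    "id",
    # actual columns from two months before up to the current month
    format(seq.Date(y, x, by = "month"), "%Y-%m(actual)"),
    # pred and error columns for the current month only
    format(x, "%Y-%m(pred)"),
    format(x, "%Y-%m(error)")
  )]
}, curr, prev, SIMPLIFY = FALSE)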

Output

[[1]]
        id 2021-10(actual) 2021-11(actual) 2021-12(actual) 2021-12(pred) 2021-12(error)
1 M0000607             8.9             7.3             6.1      6.113632      0.7198461
2 M0000609            15.7            14.8            14.2     14.162432      0.1544640
3 M0000612             5.3             3.1             3.5      3.288373      1.2259926

[[2]]
        id 2021-09(actual) 2021-10(actual) 2021-11(actual) 2021-11(pred) 2021-11(error)
1 M0000607            10.3             8.9             7.3      8.352098      1.9981091
2 M0000609            17.3            15.7            14.8     13.973182      0.4143733
3 M0000612             6.4             5.3             3.1      3.164683      0.3420726

[[3]]
        id 2021-08(actual) 2021-09(actual) 2021-10(actual) 2021-10(pred) 2021-10(error)
1 M0000607            12.6            10.3             8.9      9.619846      0.9455678
2 M0000609            19.2            17.3            15.7     15.545536      4.8832500
3 M0000612             8.3             6.4             5.3      6.525993      1.2158196
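If you want to try the idea without lubridate or the original df, here is a base-R-only sketch. Everything in it is an assumption for illustration: the toy df (all values zero), the hypothetical helper select_window(), and the swap of lubridate::ym()/months() for as.Date() plus seq() with a negative by = "-1 month". It also uses lapply() over the month strings instead of mapply() over Dates.

```r
# Hypothetical toy data: an "id" column plus <YYYY-MM>(actual|pred|error) columns for 2021
month_labels <- format(seq(as.Date("2021-01-01"), as.Date("2021-12-01"), by = "month"), "%Y-%m")
value_cols <- as.vector(outer(month_labels, c("(actual)", "(pred)", "(error)"), paste0))
df <- data.frame(id = c("M1", "M2"), matrix(0, nrow = 2, ncol = length(value_cols)))
names(df) <- c("id", value_cols)

# Hypothetical helper: id, the actual values for the current month and the
# n_prev months before it, plus the current month's pred and error columns
select_window <- function(data, ym, n_prev = 2L) {
  curr <- as.Date(paste0(ym, "-01"))  # "2021-12" -> 2021-12-01
  # curr, curr - 1 month, ..., curr - n_prev months; rev() puts the oldest first
  window <- rev(seq(curr, by = "-1 month", length.out = n_prev + 1L))
  data[c(
    "id",
    format(window, "%Y-%m(actual)"),  # literal text in the format string is kept as-is
    format(curr, "%Y-%m(pred)"),
    format(curr, "%Y-%m(error)")
  )]
}

year_months <- c("2021-12", "2021-11", "2021-10")
subsets <- lapply(year_months, select_window, data = df)
names(subsets) <- year_months
names(subsets[["2021-11"]])
# "id" "2021-09(actual)" "2021-10(actual)" "2021-11(actual)" "2021-11(pred)" "2021-11(error)"
```

Naming the list by month makes each subset easy to pull out later, e.g. subsets[["2021-10"]].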

If you want to apply a plotting function (plot_fun below stands for your own function) to each selected data frame, then

year_months <- c('2021-12', '2021-11', '2021-10')  
curr <- lubridate::ym(year_months)
prev <- curr - months(2L)
plots <- mapply(function(x, y) {
  plot_fun(df[c(
    "id", 
    format(seq.Date(y, x, by = "month"), "%Y-%m(actual)"), 
    format(x, "%Y-%m(pred)"), 
    format(x, "%Y-%m(error)")
  )])
}, curr, prev, SIMPLIFY = FALSE)

gives you a list of (gg)plots, one for each month in year_months.
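plot_fun is not defined in the answer. As one hypothetical version, the sketch below assumes ggplot2 is installed and draws the actual values per id. Because the column names change from month to month, it finds the actual columns by their "(actual)" suffix and reshapes them to long format with base R; to_long() is a hypothetical helper, and d stands in for one element of the mapply() result, with made-up values.

```r
# Hypothetical stand-in for one selected data frame (values are made up)
d <- data.frame(
  id = c("M1", "M2"),
  `2021-10(actual)` = c(3, 5),
  `2021-11(actual)` = c(2, 6),
  `2021-12(actual)` = c(4, 4),
  `2021-12(pred)` = c(4.2, 3.9),
  `2021-12(error)` = c(0.2, 0.1),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Reshape the "(actual)" columns to long format: one row per id and month
to_long <- function(d) {
  actual_cols <- grep("(actual)", names(d), fixed = TRUE, value = TRUE)
  data.frame(
    id = rep(d$id, times = length(actual_cols)),
    month = rep(sub("(actual)", "", actual_cols, fixed = TRUE), each = nrow(d)),
    actual = unlist(d[actual_cols], use.names = FALSE),  # column by column
    stringsAsFactors = FALSE
  )
}

# Hypothetical plot_fun: one line per id over the selected months (needs ggplot2)
plot_fun <- function(d) {
  ggplot2::ggplot(to_long(d), ggplot2::aes(x = month, y = actual, group = id, colour = id)) +
    ggplot2::geom_line() +
    ggplot2::geom_point()
}

long <- to_long(d)
long
```

With plot_fun defined like this, the mapply() call above returns one ggplot object per month; print(plots[[1]]) draws the first one.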


Update (to also select last year of the current month). However, you need to ensure that the columns you want to select exist in the dataframe; otherwise, you will get an error.

year_months <- c('2021-12', '2021-11', '2021-10')  
curr <- lubridate::ym(year_months)
prev <- curr - months(2L)
mapply(function(x, y) {
  df[c(
    "id", 
    # the same month one year earlier, followed by the three-month window
    format(c(x - lubridate::years(1L), seq.Date(y, x, by = "month")), "%Y-%m(actual)"),
    format(x, "%Y-%m(pred)"), 
    format(x, "%Y-%m(error)")
  )]
}, curr, prev, SIMPLIFY = FALSE)
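One way to avoid that error is to check which requested columns are missing before subsetting. The sketch below is base R only and uses hypothetical names (wanted_cols(), safe_select()) and toy data that covers only 2021, so the year-ago column for 2021-12 is absent. It takes the year-ago month with seq() and by = "-1 year" in place of lubridate::years(1L).

```r
# Hypothetical toy data that only covers 2021, so "2020-12(actual)" does not exist
month_labels <- format(seq(as.Date("2021-01-01"), as.Date("2021-12-01"), by = "month"), "%Y-%m")
value_cols <- as.vector(outer(month_labels, c("(actual)", "(pred)", "(error)"), paste0))
df <- data.frame(id = c("M1", "M2"), matrix(0, nrow = 2, ncol = length(value_cols)))
names(df) <- c("id", value_cols)

# Columns for one month: same month a year earlier, the 3-month window, pred, error
wanted_cols <- function(ym) {
  curr <- as.Date(paste0(ym, "-01"))
  year_ago <- seq(curr, by = "-1 year", length.out = 2L)[2L]
  window <- rev(seq(curr, by = "-1 month", length.out = 3L))
  c("id",
    format(c(year_ago, window), "%Y-%m(actual)"),
    format(curr, "%Y-%m(pred)"),
    format(curr, "%Y-%m(error)"))
}

# Keep only the columns that exist; warn about the rest instead of failing
safe_select <- function(data, cols) {
  missing <- setdiff(cols, names(data))
  if (length(missing) > 0L) {
    warning("Skipping missing columns: ", paste(missing, collapse = ", "))
  }
  data[intersect(cols, names(data))]  # intersect() keeps the order of cols
}

res <- safe_select(df, wanted_cols("2021-12"))  # warns about 2020-12(actual)
names(res)
```

Dropping missing columns with a warning keeps the loop running over all months; if you would rather fail fast, replace warning() with stop(), which still gives a clearer message than the default one.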