How can I extract model summary from multiple tidymodels objects using purrr::map functions in R?

I want to use purrr::map_* functions to extract info from multiple models involving linear regression method. I am first creating some random dataset. The dataset has three dependent variables, and one independent variable.

set.seed(123)
library(olsrr)
#;-) 
#;-) Attaching package: 'olsrr'
#;-) The following object is masked from 'package:datasets':
#;-) 
#;-)     rivers
y1 <- rnorm(100) + runif(100)
y2 <- rnorm(100) + runif(100)
y3 <- rnorm(100) + runif(100)
x <- rnorm(100)
df1 <- data.frame(y1, y2, y3, x)

After creating data, now I am creating three models. This time, I am using the base function lm() to create the model. I am using lm() function to demonstrate that it works without using tidymodels framework.

m1 <- lm(y1 ~ x, data = df1)
m2 <- lm(y2 ~ x, data = df1)
m3 <- lm(y3 ~ x, data = df1)

Now I attempt to fetch summary of these models through purrr::map function. I am first creating a list of models, and then use map function.

list_of_models <- list(m1, m2, m3)
model_summaries <- purrr::map(list_of_models, summary)
model_normality <- purrr::map(list_of_models, olsrr::ols_test_normality)
model_normality
#;-) [[1]]
#;-) -----------------------------------------------
#;-)        Test             Statistic       pvalue  
#;-) -----------------------------------------------
#;-) Shapiro-Wilk              0.9872         0.4517 
#;-) Kolmogorov-Smirnov        0.0865         0.4423 
#;-) Cramer-von Mises          8.9238         0.0000 
#;-) Anderson-Darling          0.5807         0.1276 
#;-) -----------------------------------------------
#;-) 
#;-) [[2]]
#;-) -----------------------------------------------
#;-)        Test             Statistic       pvalue  
#;-) -----------------------------------------------
#;-) Shapiro-Wilk              0.9499          8e-04 
#;-) Kolmogorov-Smirnov        0.1252         0.0869 
#;-) Cramer-von Mises          9.115          0.0000 
#;-) Anderson-Darling          1.6102          4e-04 
#;-) -----------------------------------------------
#;-) 
#;-) [[3]]
#;-) -----------------------------------------------
#;-)        Test             Statistic       pvalue  
#;-) -----------------------------------------------
#;-) Shapiro-Wilk               0.99          0.6685 
#;-) Kolmogorov-Smirnov        0.0551         0.9213 
#;-) Cramer-von Mises          5.8198         0.0000 
#;-) Anderson-Darling          0.3632         0.4347 
#;-) -----------------------------------------------

So, I can see that it is possible to use map function to fetch model summary, and the test of residual normality when we use base function lm(). But I want to use tidymodels framework for my analysis.

library(tidymodels)
#;-) Registered S3 method overwritten by 'tune':
#;-)   method                   from   
#;-)   required_pkgs.model_spec parsnip
#;-) -- Attaching packages --------------------------------- tidymodels 0.1.4.9000 --
#;-) v broom        0.7.11     v recipes      0.1.17
#;-) v dials        0.0.10     v rsample      0.1.1 
#;-) v dplyr        1.0.7      v tibble       3.1.6 
#;-) v ggplot2      3.3.5      v tidyr        1.1.4 
#;-) v infer        1.0.0      v tune         0.1.6 
#;-) v modeldata    0.1.1      v workflows    0.2.4 
#;-) v parsnip      0.1.7      v workflowsets 0.1.0 
#;-) v purrr        0.3.4      v yardstick    0.0.9
#;-) -- Conflicts ----------------------------------------- tidymodels_conflicts() --
#;-) x purrr::discard() masks scales::discard()
#;-) x dplyr::filter()  masks stats::filter()
#;-) x dplyr::lag()     masks stats::lag()
#;-) x recipes::step()  masks stats::step()
#;-) * Dig deeper into tidy modeling with R at https://www.tmwr.org
lm_spec <- linear_reg()
lm_fit1 <- fit(lm_spec, y1 ~ x, data = df1)
lm_fit2 <- fit(lm_spec, y2 ~ x, data = df1)
lm_fit3 <- fit(lm_spec, y3 ~ x, data = df1)
list_tidymodels <- c(lm_fit1, lm_fit2, lm_fit3)
get_lm_coefs <- function(x) {
  x %>%
    # get the lm model object
    extract_fit_engine() %>%
    # transform its format
    tidy()
}
map(list_tidymodels, get_lm_coefs)
#;-) Error in UseMethod("extract_fit_engine"): no applicable method for 'extract_fit_engine' applied to an object of class "NULL"

The above command gives error. Now finally, I am trying to find summary of each model individually without using map function.

lm_fit1 %>%
  extract_fit_engine() %>%
  summary()
#;-) 
#;-) Call:
#;-) stats::lm(formula = y1 ~ x, data = data)
#;-) 
#;-) Residuals:
#;-)      Min       1Q   Median       3Q      Max 
#;-) -2.90637 -0.57659  0.02682  0.46190  2.45272 
#;-) 
#;-) Coefficients:
#;-)             Estimate Std. Error t value Pr(>|t|)    
#;-) (Intercept)  0.59226    0.09883   5.993 3.44e-08 ***
#;-) x           -0.10387    0.10697  -0.971    0.334    
#;-) ---
#;-) Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#;-) 
#;-) Residual standard error: 0.9748 on 98 degrees of freedom
#;-) Multiple R-squared:  0.00953,  Adjusted R-squared:  -0.0005771 
#;-) F-statistic: 0.9429 on 1 and 98 DF,  p-value: 0.3339
lm_fit1 %>%
  extract_fit_engine() %>%
  olsrr::ols_test_normality()
#;-) -----------------------------------------------
#;-)        Test             Statistic       pvalue  
#;-) -----------------------------------------------
#;-) Shapiro-Wilk              0.9872         0.4517 
#;-) Kolmogorov-Smirnov        0.0865         0.4423 
#;-) Cramer-von Mises          8.9238         0.0000 
#;-) Anderson-Darling          0.5807         0.1276 
#;-) -----------------------------------------------

So, it is clear that the errors come only when I combine tidymodels with map function. What am I doing wrong here?


Solution 1:

The list_tidymodels needs to be created with list() and not with c().

set.seed(123)

library(olsrr)
library(tidymodels)

y1 <- rnorm(100) + runif(100)
y2 <- rnorm(100) + runif(100)
y3 <- rnorm(100) + runif(100)
x <- rnorm(100)
df1 <- data.frame(y1, y2, y3, x)

lm_spec <- linear_reg()
lm_fit1 <- fit(lm_spec, y1 ~ x, data = df1)
lm_fit2 <- fit(lm_spec, y2 ~ x, data = df1)
lm_fit3 <- fit(lm_spec, y3 ~ x, data = df1)

list_tidymodels <- list(lm_fit1, lm_fit2, lm_fit3) # this line used `c(lm_fit1, ...)`.

get_lm_coefs <- function(x) {
  x %>%
    # get the lm model object
    extract_fit_engine() %>%
    # transform its format
    tidy()
}

map(list_tidymodels, get_lm_coefs)

#> [[1]]
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic      p.value
#>   <chr>          <dbl>     <dbl>     <dbl>        <dbl>
#> 1 (Intercept)    0.592    0.0988     5.99  0.0000000344
#> 2 x             -0.104    0.107     -0.971 0.334       
#> 
#> [[2]]
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic    p.value
#>   <chr>          <dbl>     <dbl>     <dbl>      <dbl>
#> 1 (Intercept)   0.523      0.101     5.16  0.00000129
#> 2 x             0.0705     0.110     0.642 0.522     
#> 
#> [[3]]
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)   0.443      0.114     3.87  0.000197
#> 2 x             0.0437     0.124     0.353 0.725

Created on 2022-01-20 by the reprex package (v2.0.1)