How can I extract model summary from multiple tidymodels objects using purrr::map functions in R?
I want to use purrr::map_* functions to extract info from multiple models involving linear regression method. I am first creating some random dataset. The dataset has three dependent variables, and one independent variable.
set.seed(123)
library(olsrr)
#;-)
#;-) Attaching package: 'olsrr'
#;-) The following object is masked from 'package:datasets':
#;-)
#;-) rivers
y1 <- rnorm(100) + runif(100)
y2 <- rnorm(100) + runif(100)
y3 <- rnorm(100) + runif(100)
x <- rnorm(100)
df1 <- data.frame(y1, y2, y3, x)
After creating data, now I am creating three models. This time, I am using the base function lm() to create the model. I am using lm() function to demonstrate that it works without using tidymodels framework.
m1 <- lm(y1 ~ x, data = df1)
m2 <- lm(y2 ~ x, data = df1)
m3 <- lm(y3 ~ x, data = df1)
Now I attempt to fetch summary of these models through purrr::map function. I am first creating a list of models, and then use map function.
list_of_models <- list(m1, m2, m3)
model_summaries <- purrr::map(list_of_models, summary)
model_normality <- purrr::map(list_of_models, olsrr::ols_test_normality)
model_normality
#;-) [[1]]
#;-) -----------------------------------------------
#;-) Test Statistic pvalue
#;-) -----------------------------------------------
#;-) Shapiro-Wilk 0.9872 0.4517
#;-) Kolmogorov-Smirnov 0.0865 0.4423
#;-) Cramer-von Mises 8.9238 0.0000
#;-) Anderson-Darling 0.5807 0.1276
#;-) -----------------------------------------------
#;-)
#;-) [[2]]
#;-) -----------------------------------------------
#;-) Test Statistic pvalue
#;-) -----------------------------------------------
#;-) Shapiro-Wilk 0.9499 8e-04
#;-) Kolmogorov-Smirnov 0.1252 0.0869
#;-) Cramer-von Mises 9.115 0.0000
#;-) Anderson-Darling 1.6102 4e-04
#;-) -----------------------------------------------
#;-)
#;-) [[3]]
#;-) -----------------------------------------------
#;-) Test Statistic pvalue
#;-) -----------------------------------------------
#;-) Shapiro-Wilk 0.99 0.6685
#;-) Kolmogorov-Smirnov 0.0551 0.9213
#;-) Cramer-von Mises 5.8198 0.0000
#;-) Anderson-Darling 0.3632 0.4347
#;-) -----------------------------------------------
So, I can see that it is possible to use map function to fetch model summary, and the test of residual normality when we use base function lm(). But I want to use tidymodels framework for my analysis.
library(tidymodels)
#;-) Registered S3 method overwritten by 'tune':
#;-) method from
#;-) required_pkgs.model_spec parsnip
#;-) -- Attaching packages --------------------------------- tidymodels 0.1.4.9000 --
#;-) v broom 0.7.11 v recipes 0.1.17
#;-) v dials 0.0.10 v rsample 0.1.1
#;-) v dplyr 1.0.7 v tibble 3.1.6
#;-) v ggplot2 3.3.5 v tidyr 1.1.4
#;-) v infer 1.0.0 v tune 0.1.6
#;-) v modeldata 0.1.1 v workflows 0.2.4
#;-) v parsnip 0.1.7 v workflowsets 0.1.0
#;-) v purrr 0.3.4 v yardstick 0.0.9
#;-) -- Conflicts ----------------------------------------- tidymodels_conflicts() --
#;-) x purrr::discard() masks scales::discard()
#;-) x dplyr::filter() masks stats::filter()
#;-) x dplyr::lag() masks stats::lag()
#;-) x recipes::step() masks stats::step()
#;-) * Dig deeper into tidy modeling with R at https://www.tmwr.org
lm_spec <- linear_reg()
lm_fit1 <- fit(lm_spec, y1 ~ x, data = df1)
lm_fit2 <- fit(lm_spec, y2 ~ x, data = df1)
lm_fit3 <- fit(lm_spec, y3 ~ x, data = df1)
list_tidymodels <- c(lm_fit1, lm_fit2, lm_fit3)
get_lm_coefs <- function(x) {
x %>%
# get the lm model object
extract_fit_engine() %>%
# transform its format
tidy()
}
map(list_tidymodels, get_lm_coefs)
#;-) Error in UseMethod("extract_fit_engine"): no applicable method for 'extract_fit_engine' applied to an object of class "NULL"
The above command gives error. Now finally, I am trying to find summary of each model individually without using map function.
lm_fit1 %>%
extract_fit_engine() %>%
summary()
#;-)
#;-) Call:
#;-) stats::lm(formula = y1 ~ x, data = data)
#;-)
#;-) Residuals:
#;-) Min 1Q Median 3Q Max
#;-) -2.90637 -0.57659 0.02682 0.46190 2.45272
#;-)
#;-) Coefficients:
#;-) Estimate Std. Error t value Pr(>|t|)
#;-) (Intercept) 0.59226 0.09883 5.993 3.44e-08 ***
#;-) x -0.10387 0.10697 -0.971 0.334
#;-) ---
#;-) Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#;-)
#;-) Residual standard error: 0.9748 on 98 degrees of freedom
#;-) Multiple R-squared: 0.00953, Adjusted R-squared: -0.0005771
#;-) F-statistic: 0.9429 on 1 and 98 DF, p-value: 0.3339
lm_fit1 %>%
extract_fit_engine() %>%
olsrr::ols_test_normality()
#;-) -----------------------------------------------
#;-) Test Statistic pvalue
#;-) -----------------------------------------------
#;-) Shapiro-Wilk 0.9872 0.4517
#;-) Kolmogorov-Smirnov 0.0865 0.4423
#;-) Cramer-von Mises 8.9238 0.0000
#;-) Anderson-Darling 0.5807 0.1276
#;-) -----------------------------------------------
So, it is clear that the errors come only when I combine tidymodels with map function. What am I doing wrong here?
Solution 1:
The list_tidymodels
needs to be created with list()
and not with c()
.
set.seed(123)
library(olsrr)
library(tidymodels)
y1 <- rnorm(100) + runif(100)
y2 <- rnorm(100) + runif(100)
y3 <- rnorm(100) + runif(100)
x <- rnorm(100)
df1 <- data.frame(y1, y2, y3, x)
lm_spec <- linear_reg()
lm_fit1 <- fit(lm_spec, y1 ~ x, data = df1)
lm_fit2 <- fit(lm_spec, y2 ~ x, data = df1)
lm_fit3 <- fit(lm_spec, y3 ~ x, data = df1)
list_tidymodels <- list(lm_fit1, lm_fit2, lm_fit3) # this line used `c(lm_fit1, ...)`.
get_lm_coefs <- function(x) {
x %>%
# get the lm model object
extract_fit_engine() %>%
# transform its format
tidy()
}
map(list_tidymodels, get_lm_coefs)
#> [[1]]
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.592 0.0988 5.99 0.0000000344
#> 2 x -0.104 0.107 -0.971 0.334
#>
#> [[2]]
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.523 0.101 5.16 0.00000129
#> 2 x 0.0705 0.110 0.642 0.522
#>
#> [[3]]
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.443 0.114 3.87 0.000197
#> 2 x 0.0437 0.124 0.353 0.725
Created on 2022-01-20 by the reprex package (v2.0.1)