R: two code chunks for the same p-value evaluate differently, and I can't figure out the difference

r p-value

I'm trying to figure out why these two code chunks give me different p-values for Welch's t-test. I tried to write a tidy version of the base R code and build a table with both statistics, but the tidy version returns a very small p-value and I can't see why.

t.test(mpg ~ vs, data = mtcars) # p-value = 0.0001098
t.test(mpg ~ am, data = mtcars) # p-value = 0.001374

library(tidyverse)
library(broom)

options(scipen = 999)
mtcars %>%
  dplyr::select(mpg, vs, am) %>%
  pivot_longer(cols = 2:3, names_to = 'names', values_to = 'values') %>%
  nest(data = -names) %>%
  mutate(
    test = map(data, ~ t.test(.x$mpg, .x$values)), # S3 list-col
    tidied = map(test, tidy)
  ) %>%
  unnest(tidied) # vs = 0.000000000000000010038009 and am = 0.000000000000000009611758

Solution 1:

If you simply run:

t.test(mtcars$mpg, mtcars$vs)

You'll get the same values as in your nested data example.

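So the issue is not the nesting; you're performing a different kind of t-test. The formula version, t.test(mpg ~ vs, data = mtcars), treats vs (or am) as a grouping variable with two levels (0 and 1) and compares the mean mpg between those two groups. The two-vector version, t.test(x, y), treats the 0/1 values themselves as a second sample, so it compares the mean of mpg against the mean of vs, which is why the p-value comes out so small.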
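One possible way to keep the nested, tidy layout is to call the formula interface inside map(), so that each nested data frame is split by its 0/1 column. This is only a sketch: the explicit library() calls, the cols = c(vs, am) argument, and the results name are my own choices, not part of the original question. If it behaves as expected, the p-values should match the two base R calls in the question.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

results <- mtcars %>%
  select(mpg, vs, am) %>%
  # Stack vs and am into one 0/1 column, labelled by variable name
  pivot_longer(cols = c(vs, am), names_to = "names", values_to = "values") %>%
  nest(data = -names) %>%
  mutate(
    # Formula interface: compare mpg between the two levels of `values`
    test = map(data, ~ t.test(mpg ~ values, data = .x)),
    tidied = map(test, tidy)
  ) %>%
  unnest(tidied)

# One row per grouping variable, with the Welch test statistics
results %>% select(names, estimate, statistic, p.value)
```

The only substantive change from the question's pipeline is t.test(mpg ~ values, data = .x) in place of t.test(.x$mpg, .x$values).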
