How do I calculate cronbach's alpha on multiply imputed data?

I have run a multiple imputation (m=45, 10 iterations) using the MICE package, and want to calculate the cronbach's alpha for a number of ordinal scales in the data. Is there a function in r that could assist me in calculating the alpha coefficient across the imputed datasets in a manner that would satisfy Rubin's rules for pooling estimates?


We may exploit pool.scalar from the mice package, which performs pooling of univariate estimates according to Rubin's rules.

Since you have not provided a reproducible example yourself, I will provide one.

set.seed(123)

# sample survey responses
df <- data.frame(
  x1 = c(1,2,2,3,2,2,3,3,2,3,
         1,2,2,3,2,2,3,3,2,3,
         1,2,2,3,2,2,3,3,2,3),
  x2 = c(1,1,1,2,3,3,2,3,3,3,
         1,1,1,2,3,3,2,3,3,3,
         1,2,2,3,2,2,3,3,2,3),
  x3 = c(1,1,2,1,2,3,3,3,2,3,
         1,1,2,1,2,3,3,3,2,3,
         1,2,2,3,2,2,3,3,2,3)
)

# function to column-wise generate missing values (MCAR)
create_missings <- function(data, prob) {
  x <- replicate(ncol(data),rbinom(nrow(data), 1, prob))
  for(k in 1:ncol(data)) {
    data[, k] <- ifelse(x[, k] == 1, NA, data[,k])
  }
  data
}
df <- create_missings(df, prob = 0.2)

# multiple imputation ----------------------------------

library(mice)
imp <- mice(df, m = 10, maxit = 20)

# extract the completed data in long format
implong <- complete(imp, 'long')

We need a function to compute cronbach's alpha and obtain an estimate of the standard error of alpha, which can be used in a call to pool.scalar() later on. Since there is no available formula with which we can analytically estimate the standard error of alpha, we also need to deploy a bootstrapping procedure to estimate this standard error.

The function cronbach_fun() takes the following arguments:

  • list_compl_data: a character string specifying the list of completed data from a mids object.
  • boot: a logical indicating whether a non-parametrical bootstrap should be conducted.
  • B: an integer specifying the number of bootstrap samples to be taken.
  • ci: a logical indicating whether a confidence interval around alpha should be estimated.
cronbach_fun <- function(list_compl_data, boot = TRUE, B = 1e4, ci = FALSE) {
  n <- nrow(list_compl_data); p <- ncol(list_compl_data)
  total_variance <- var(rowSums(list_compl_data))
  item_variance <- sum(apply(list_compl_data, 2, sd)^2)
  alpha <- (p/(p - 1)) * (1 - (item_variance/total_variance))
  out <- list(alpha = alpha)
  boot_alpha <- numeric(B)
  if (boot) {
    for (i in seq_len(B)) {
      boot_dat <- list_compl_data[sample(seq_len(n), replace = TRUE), ]
      total_variance <- var(rowSums(boot_dat))
      item_variance <- sum(apply(boot_dat, 2, sd)^2)
      boot_alpha[i] <- (p/(p - 1)) * (1 - (item_variance/total_variance))
    }
    out$var <- var(boot_alpha)
  }
  if (ci){
    out$ci <- quantile(boot_alpha, c(.025,.975))
  }
  return(out)
}

Now that we have our function to do the 'heavy lifting', we can run it on all m completed data sets, after which we can obtain Q and U (which are required for the pooling of the estimates). Consult ?pool.scalar for more information.

m <- length(unique(implong$.imp))
boot_alpha <- rep(list(NA), m)
for (i in seq_len(m)) {
  set.seed(i) # fix random number generator
  sub <- implong[implong$.imp == i, -c(1,2)]
  boot_alpha[[i]] <- cronbach_fun(sub)
}

# obtain Q and U (see ?pool.scalar)
Q <- sapply(boot_alpha, function(x) x$alpha)
U <- sapply(boot_alpha, function(x) x$var)

# pooled estimates
pool_estimates <- function(x) {
  out <- c(
    alpha = x$qbar,
    lwr = x$qbar - qt(0.975, x$df) * sqrt(x$t),
    upr = x$qbar + qt(0.975, x$df) * sqrt(x$t)
  )
  return(out)
}

Output

# Pooled estimate of alpha (95% CI)
> pool_estimates(pool.scalar(Q, U))
    alpha       lwr       upr 
0.7809977 0.5776041 0.9843913