How to best create a table to display demographics for multiple outcomes?

I am trying to present a table showing demographic information for multiple dichotomous outcomes collected in a survey.

This is an example of what I am starting with:

df1 <- data.frame(ID=c(1,2,3,4,5,6),
                  blondehair=c(0,1,1,0,0,1),
                  ateapple=c(1,1,1,0,1,1),
                  righthanded=c(0,1,1,1,1,0),
                  agecategory=c(1,1,2,2,1,1),
                  educationcategory=c(1,1,2,2,1,1))
df1

library(dplyr)
library(gtsummary)

table1 <- df1 %>% select(ateapple, agecategory, educationcategory)
colnames(table1) <- c("Percentage that ate apple", "Age Category", "Education Level")

table1 %>% tbl_summary(by = `Percentage that ate apple`,
                       statistic = list(all_categorical() ~ "{p}%"),
                       missing_text = "Missing") %>%
  add_overall(last = TRUE) %>%
  modify_header(label ~ "Demographics")
table1

I was thinking of making a tbl_summary for each outcome I'm interested in and then merging them all together. However, I do not want the table to display both the "0 (No)" and "1 (Yes)" categories for the outcomes. I would only like to show the percentage that do have the outcome (i.e., only the percentage that did eat an apple). My real data has 10 dichotomous outcomes and 7 categorical demographic variables, so I'm a little hesitant to merge so many individual tbl_summary tables.

This is what I am trying to get:

                   Percentage that ate an apple   Percentage blonde   Percentage right-handed
Age Category
  1                67%                            33%                 33%
  2                17%                            17%                 33%

Education
  1                33%                            6%                  33%
  2                17%                            3%                  33%

Does R have packages that would be able to assist with this? I was thinking of using tbl_summary but I don't think that will give me what I'm looking for.


Solution 1:

You can use the table1 package (disclaimer: I am the package author). Here is an example using your data:

library(table1)

df1 <- data.frame(ID=c(1,2,3,4,5,6),
                  blondehair=c(0,1,1,0,0,1),
                  ateapple=c(1,1,1,0,1,1),
                  righthanded=c(0,1,1,1,1,0),
                  agecategory=c(1,1,2,2,1,1),
                  educationcategory=c(1,1,2,2,1,1))

# For dichotomous variables, transform to logical
df1$blondehair  <- as.logical(df1$blondehair)
df1$ateapple    <- as.logical(df1$ateapple)
df1$righthanded <- as.logical(df1$righthanded)

# For categorical variables, transform to factor
df1$agecategory       <- factor(df1$agecategory)
df1$educationcategory <- factor(df1$educationcategory)

# Add labels
label(df1$blondehair)        <- "Percentage with blond hair"
label(df1$ateapple)          <- "Percentage that ate an apple"
label(df1$righthanded)       <- "Percentage right-handed"
label(df1$agecategory)       <- "Age Category"
label(df1$educationcategory) <- "Education Level"

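# Custom renderer: show only the percentage for each variable
rndr <- function(x, ...) {
    y <- stats.apply.rounding(stats.default(x, ...), ...)
    y <- lapply(y, getElement, "PCT")  # Only percent
    # Logical: a single "Yes" percentage; factor: a blank header row, then one percentage per level
    if (is.logical(x)) y$Yes else c("", y)
}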

table1(~ blondehair + ateapple + righthanded + agecategory + educationcategory, 
    data=df1, render=rndr, overall="Response, %")


There are many more options to control the output; I am still trying to understand exactly what you want.

EDIT: fixed typo.
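
If the layout sketched in the question (demographic levels as rows, outcomes as columns) is what is needed, the numbers themselves can be computed in base R before any formatting. The following is only a sketch under one assumption that the question does not pin down: each cell is taken to be the percentage of respondents within that demographic level who have the outcome. The names outcomes, demographics and pct_by_level are made up for this example.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                  blondehair = c(0, 1, 1, 0, 0, 1),
                  ateapple = c(1, 1, 1, 0, 1, 1),
                  righthanded = c(0, 1, 1, 1, 1, 0),
                  agecategory = c(1, 1, 2, 2, 1, 1),
                  educationcategory = c(1, 1, 2, 2, 1, 1))

# Hypothetical helper vectors naming the outcome and demographic columns
outcomes     <- c("ateapple", "blondehair", "righthanded")
demographics <- c("agecategory", "educationcategory")

# For each demographic variable, build a matrix with one row per level and
# one column per outcome. The mean of a 0/1 variable times 100 is the
# percentage of respondents in that level who have the outcome
# (assumed interpretation of the cells).
pct_by_level <- lapply(demographics, function(d) {
  m <- sapply(outcomes, function(o) tapply(df1[[o]], df1[[d]], mean) * 100)
  round(m, 1)
})
names(pct_by_level) <- demographics

pct_by_level
```

The result is a list with one matrix per demographic variable, which scales to 10 outcomes and 7 demographic variables by extending the two name vectors. Note that sapply() returns a plain vector rather than a matrix if a demographic variable has only one level, so that case would need separate handling.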