How to best create a table to display demographics for multiple outcomes?
I am trying to build a table that shows demographic information for multiple dichotomous outcomes collected in a survey.
This is an example of what I am starting with:
library(dplyr)      # provides %>% and select()
library(gtsummary)  # provides tbl_summary(), add_overall(), modify_header()

df1 <- data.frame(ID=c(1,2,3,4,5,6),
                  blondehair=c(0,1,1,0,0,1),
                  ateapple=c(1,1,1,0,1,1),
                  righthanded=c(0,1,1,1,1,0),
                  agecategory=c(1,1,2,2,1,1),
                  educationcategory=c(1,1,2,2,1,1))
df1
table1 <- df1 %>% select(ateapple,agecategory,educationcategory)
colnames(table1) <- c("Percentage that ate apple", "Age Category","Education Level")
table1 %>% tbl_summary(by=`Percentage that ate apple`,
                       statistic = list(all_categorical() ~ "{p}%"),
                       missing_text = "Missing") %>%
  add_overall(last=TRUE) %>%
  modify_header(label ~ "Demographics")
table1
I was thinking of making a tbl_summary for each of the outcomes I'm interested in and then merging them all together. However, I do not want the table to display both the "0 (No)" and "1 (Yes)" categories for the outcomes. I would only like to show the percentage that do have the outcome (i.e., only the percentage that did eat an apple). My real data has 10 dichotomous outcomes and 7 categorical demographic variables, so I'm a little hesitant to merge so many individual tbl_summary tables.
This is what I am trying to get:
                  Percentage that ate an apple   Percentage blonde   Percentage right-handed
Age Category
  1               67%                            33%                 33%
  2               17%                            17%                 33%
Education
  1               33%                            6%                  33%
  2               17%                            3%                  33%
Does R have packages that would be able to assist with this? I was thinking of using tbl_summary but I don't think that will give me what I'm looking for.
Solution 1:
You can use the table1 package (disclaimer: I am the package author). Here is an example using your data:
library(table1)
df1 <- data.frame(ID=c(1,2,3,4,5,6),
                  blondehair=c(0,1,1,0,0,1),
                  ateapple=c(1,1,1,0,1,1),
                  righthanded=c(0,1,1,1,1,0),
                  agecategory=c(1,1,2,2,1,1),
                  educationcategory=c(1,1,2,2,1,1))
# For dichotomous variables, transform to logical
df1$blondehair <- as.logical(df1$blondehair)
df1$ateapple <- as.logical(df1$ateapple)
df1$righthanded <- as.logical(df1$righthanded)
# For categorical variables, transform to factor
df1$agecategory <- factor(df1$agecategory)
df1$educationcategory <- factor(df1$educationcategory)
# Add labels
label(df1$blondehair) <- "Percentage with blond hair"
label(df1$ateapple) <- "Percentage that ate an apple"
label(df1$righthanded) <- "Percentage of right handed"
label(df1$agecategory) <- "Age Category"
label(df1$educationcategory) <- "Education Level"
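# Custom render function: show only percentages
rndr <- function(x, ...) {
  # Compute the default statistics and format them as rounded strings
  y <- stats.apply.rounding(stats.default(x, ...), ...)
  y <- lapply(y, getElement, "PCT")  # Only percent
  # Logical variables: keep only the "Yes" percentage.
  # Factors: blank cell for the variable's own row, then one percentage per level.
  if (is.logical(x)) y$Yes else c("", y)
}
table1(~ blondehair + ateapple + righthanded + agecategory + educationcategory,
       data=df1, render=rndr, overall="Response, %")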
There are many more options for controlling the output; I am still trying to understand exactly what you want.
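With 10 outcomes and 7 demographic variables, as in the real data described in the question, the column-by-column conversions become repetitive. The following is a minimal base R sketch of the same conversions done by name. The vectors `outcome_cols` and `demo_cols` and the object `tbl_formula` are hypothetical names introduced here for illustration, not part of the table1 package.

```r
# Same example data as in the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                  blondehair = c(0, 1, 1, 0, 0, 1),
                  ateapple = c(1, 1, 1, 0, 1, 1),
                  righthanded = c(0, 1, 1, 1, 1, 0),
                  agecategory = c(1, 1, 2, 2, 1, 1),
                  educationcategory = c(1, 1, 2, 2, 1, 1))

# Hypothetical helper vectors: list the column names once and reuse them
outcome_cols <- c("blondehair", "ateapple", "righthanded")
demo_cols <- c("agecategory", "educationcategory")

# Convert every outcome to logical and every demographic variable to factor
df1[outcome_cols] <- lapply(df1[outcome_cols], as.logical)
df1[demo_cols] <- lapply(df1[demo_cols], factor)

# Build the one-sided formula ~ blondehair + ateapple + ... from the names
tbl_formula <- reformulate(c(outcome_cols, demo_cols))
print(tbl_formula)
```

The resulting formula could then be passed as the first argument of table1() in place of the hand-written one. Labels would still need to be assigned separately.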
EDIT: fixed typo.
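If the outcomes are wanted as columns and the demographic levels as rows, as in the target layout in the question, the numbers can also be computed directly in base R. This is only a sketch under stated assumptions: `pct_by_level` and its `within_level` argument are hypothetical names, the outcomes are assumed to be coded 0/1, and missing values are not handled. The age rows of the target table appear to use the whole sample as the denominator (4 of 6 respondents are in age category 1 and ate an apple, about 67%), so the helper supports both that and a within-level denominator.

```r
# Same example data as in the question (outcomes coded 0/1)
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                  blondehair = c(0, 1, 1, 0, 0, 1),
                  ateapple = c(1, 1, 1, 0, 1, 1),
                  righthanded = c(0, 1, 1, 1, 1, 0),
                  agecategory = c(1, 1, 2, 2, 1, 1),
                  educationcategory = c(1, 1, 2, 2, 1, 1))

outcomes <- c("ateapple", "blondehair", "righthanded")
demographics <- c("agecategory", "educationcategory")

# Hypothetical helper: percentage with each outcome, per level of one
# demographic variable.
#   within_level = TRUE  -> denominator is the respondents in that level
#   within_level = FALSE -> denominator is the whole sample
pct_by_level <- function(data, demo, outcomes, within_level = TRUE) {
  n_total <- nrow(data)
  pct <- function(v) {
    if (within_level) 100 * mean(v) else 100 * sum(v) / n_total
  }
  # One row per level of `demo`, one column per outcome
  res <- aggregate(data[outcomes], by = list(level = data[[demo]]), FUN = pct)
  res[outcomes] <- round(res[outcomes])
  cbind(variable = demo, res)
}

# Stack the per-variable blocks into one long table
pct_table <- do.call(rbind, lapply(demographics, function(d) {
  pct_by_level(df1, d, outcomes)
}))
print(pct_table)
```

Calling `pct_by_level(df1, "agecategory", outcomes, within_level = FALSE)` switches to the whole-sample denominator. The result is a plain data frame, so any table-formatting package can be used to present it.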