Structure of variables not recognised when dataframe is a tibble
I have written a function that checks the type of an input variable and then computes the appropriate descriptive statistics: means and SDs for numeric variables, and frequencies and proportions for factors.
However, when the data frame is a tibble, the method I use to identify the type of the variable doesn't seem to work. Here is some toy data:
library(dplyr)  # provides tibble() and glimpse()
set.seed(123)
df <- tibble(a = round(rnorm(5), 1),
             b = factor(letters[1:5]))
glimpse(df)
# Output
# Rows: 5
# Columns: 2
# $ a <dbl> -0.6, -0.2, 1.6, 0.1, 0.1
# $ b <fct> a, b, c, d, e
Now if we ask R what type of variable each column is, using the is.x() suite of functions, both checks return FALSE:
is.numeric(df[,"a"])
# [1] FALSE
is.factor(df[,"b"])
# [1] FALSE
But if we convert the tibble to a data.frame object, the columns are identified correctly:
df <- as.data.frame(df)
is.numeric(df[,"a"])
# [1] TRUE
is.factor(df[,"b"])
# [1] TRUE
Of course, I could just convert the tibble to a data.frame inside my function, but I was curious how to get the same result with the tibble itself, or whether there is an equivalent workaround.
The answer is to use [[ to extract the column, which gives consistent results for both tibbles and data frames. To tell the two apart, let's call the tibble df_tib and the data frame df_dat.
df_tib <- df  # assumes df is still the original tibble, before any as.data.frame() call
df_dat <- data.frame(df)
is.numeric(df_tib[['a']])
#[1] TRUE
is.numeric(df_dat[['a']])
#[1] TRUE
is.factor(df_tib[['b']])
#[1] TRUE
is.factor(df_dat[['b']])
#[1] TRUE
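As a rough illustration of how this could fit into the kind of function described in the question, here is a minimal sketch. The name describe_column and its return format are hypothetical, not taken from the original post; the only point is that extracting the column with [[ makes the type checks behave the same for tibbles and data frames.

```r
# Hypothetical helper (not from the original post): summarise one column,
# choosing the statistics from the column's type.
describe_column <- function(data, col) {
  x <- data[[col]]  # [[ extracts the column itself from a tibble or a data.frame
  if (is.numeric(x)) {
    c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
  } else if (is.factor(x)) {
    counts <- table(x)
    data.frame(level = names(counts),
               n = as.vector(counts),
               prop = as.vector(counts) / sum(counts))
  } else {
    stop("Unsupported column type: ", class(x)[1])
  }
}

# Toy data matching the values shown in the question
df_dat <- data.frame(a = c(-0.6, -0.2, 1.6, 0.1, 0.1),
                     b = factor(letters[1:5]))
print(describe_column(df_dat, "a"))
print(describe_column(df_dat, "b"))

# The same call works on a tibble, if the tibble package is installed
if (requireNamespace("tibble", quietly = TRUE)) {
  df_tib <- tibble::as_tibble(df_dat)
  print(describe_column(df_tib, "a"))
}
```

Stopping on unsupported types is just one possible design choice; returning NULL or a warning may suit your function better.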
The issue arises from how data frames and tibbles differ when subsetting with [.
df_tib[, 'a']
# A tibble: 5 x 1
# a
# <dbl>
#1 -0.6
#2 -0.2
#3 1.6
#4 0.1
#5 0.1
df_dat[, 'a']
#[1] -0.6 -0.2 1.6 0.1 0.1
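df_tib always returns a tibble when you subset with [, whereas df_dat returns a vector here because only a single column was selected and [ on a data.frame uses drop = TRUE by default. is.factor and is.numeric always return FALSE when given a whole data frame or tibble rather than a column vector.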
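To see the same effect without a tibble, the sketch below uses only base R: passing drop = FALSE to [ keeps a one-column data.frame, which is what a tibble does by default. It also shows one way to check every column's type at once with vapply(); the variable names are illustrative, and the same vapply() calls should work on a tibble too, since both objects are lists of columns.

```r
# Toy data matching the values shown in the question
df_dat <- data.frame(a = c(-0.6, -0.2, 1.6, 0.1, 0.1),
                     b = factor(letters[1:5]))

# drop = FALSE keeps the result a one-column data.frame,
# which is how a tibble behaves by default
one_col <- df_dat[, "a", drop = FALSE]
is.data.frame(one_col)  # TRUE
is.numeric(one_col)     # FALSE: the object is a data.frame, not a numeric vector

# The default drop = TRUE simplifies a single column to a vector
is.numeric(df_dat[, "a"])  # TRUE

# Check every column at once; each result is a named logical vector
col_is_numeric <- vapply(df_dat, is.numeric, logical(1))
col_is_factor  <- vapply(df_dat, is.factor, logical(1))
print(col_is_numeric)
print(col_is_factor)
```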