How do I subset data table columns based on column contents rather than name in R?
I am trying to use RStudio to generate lists of species presence in several regions, where each region contains >10 separate survey plots in which presence/absence of >50 species was recorded. Lists will be character vectors of species names.
Here is a dummy data table in the format I'm using, where presence at a site is indicated by 1 and absence by 0:
library(data.table)
dummy_dt <- data.table(region=c("north","north","south","south"),
                       site=c("a","b","a","b"),
                       species_1=c(1,0,0,0),
                       species_2=c(0,1,0,0),
                       species_3=c(0,1,1,1),
                       species_4=c(0,0,1,1))
Species 1, 2, and 3 are present in at least one "north" region site and species 3 and 4 are present in at least one "south" region site. I am interested only in presence/absence data at the regional level and not number or fraction of occupied sites within a region (site codes "a" and "b" are included in dummy_dt to make it clear that each region contains >1 site).
I assume that I will need to subset dummy_dt by region as below before proceeding:
north_dt <- dummy_dt[region == "north"]
south_dt <- dummy_dt[region == "south"]
By hand I can easily generate a species list for each region as a character vector conducive to calculation of a Jaccard similarity coefficient:
north_list <- c("species_1","species_2","species_3")
south_list <- c("species_3","species_4")
Is it possible to automate the generation of character vectors like those above, where elements of the vector are names of columns which contain one or more 1 (either using the subsetted data tables north_dt and south_dt or the original data table dummy_dt)?
Solution 1:
# Reshape to long format (one row per region/site/species), then keep only presences
tmp <- melt(dummy_dt, id.vars = c("region", "site"), variable.factor = FALSE)[value > 0]
tmp
#    region   site  variable value
#    <char> <char>    <char> <num>
# 1:  north      a species_1     1
# 2:  north      b species_2     1
# 3:  north      b species_3     1
# 4:  south      a species_3     1
# 5:  south      b species_3     1
# 6:  south      a species_4     1
# 7:  south      b species_4     1
# Group the species names by region and drop duplicates across sites
lapply(split(tmp$variable, tmp$region), unique)
# $north
# [1] "species_1" "species_2" "species_3"
# $south
# [1] "species_3" "species_4"
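If you would rather not reshape the data, a rough alternative is to split the original table by region and keep the species columns whose column sum is above zero. This is only a sketch: it assumes every column other than region and site is a 0/1 species column, and the names species_cols, species_by_region and jaccard are made up here for illustration. The last lines show how the resulting vectors could feed the Jaccard coefficient mentioned in the question (size of the intersection divided by size of the union).

```r
library(data.table)

dummy_dt <- data.table(region = c("north", "north", "south", "south"),
                       site = c("a", "b", "a", "b"),
                       species_1 = c(1, 0, 0, 0),
                       species_2 = c(0, 1, 0, 0),
                       species_3 = c(0, 1, 1, 1),
                       species_4 = c(0, 0, 1, 1))

# Assumption: everything except region and site is a 0/1 species column
species_cols <- setdiff(names(dummy_dt), c("region", "site"))

# For each region, keep the species columns that contain at least one 1
species_by_region <- lapply(
  split(dummy_dt, by = "region"),
  function(d) species_cols[colSums(d[, species_cols, with = FALSE]) > 0]
)

# Hypothetical helper: Jaccard similarity of two character vectors
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

jaccard(species_by_region$north, species_by_region$south)
# [1] 0.25
```

With the dummy data, the two regions share one species (species_3) out of four distinct species overall, hence 0.25.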