R: Transform dataframe column using dictionary/list?
Solution 1:
As is often the case, base R has a function designed to do exactly this. levels<- is what you want:
df$new.weather <- `levels<-`(df$weather, weather.levels)
df
# weather new.weather
#1 Clear dry
#2 Snow wet
#3 Clear dry
#4 Rain wet
#5 Rain wet
#6 Other other
#7 Hail/sleet wet
#8 Unknown other
In a slightly longer but easier-to-read form, this is equivalent to:
df$new.weather <- df$weather
levels(df$new.weather) <- weather.levels
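One caveat worth spelling out: levels<- recodes values only when the column is a factor, and since R 4.0 data.frame() no longer converts strings to factors by default. Below is a minimal sketch, assuming the question's df and weather.levels look roughly like this (the "Other" entry under other is inferred from the output above, not copied from the question):

```r
# Assumed reconstruction of the question's data; not copied from the original post.
df <- data.frame(
  weather = c("Clear", "Snow", "Clear", "Rain", "Rain", "Other", "Hail/sleet", "Unknown"),
  stringsAsFactors = FALSE
)
weather.levels <- list(
  dry   = c("Clear", "Cloudy"),
  wet   = c("Snow", "Rain", "Hail/sleet"),
  other = c("Other", "Unknown")
)

# Convert explicitly so the factor method of levels<- is the one that runs.
df$new.weather <- factor(df$weather)

# Each list name becomes a new level; the old levels listed under it are merged.
levels(df$new.weather) <- weather.levels
```

Old levels that are not listed anywhere in weather.levels end up as NA, so it is worth checking anyNA(df$new.weather) afterwards.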
Solution 2:
Here's one way using dplyr:
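library(dplyr)
weather.levels %>%
  unlist() %>%  # flattens the list; names become dry1, dry2, wet1, ...
  # data_frame() is deprecated; tibble() is its replacement (re-exported by dplyr)
  tibble(new.weather = gsub("[0-9]+$", "", names(.)), old.weather = .) %>%
  left_join(df, ., by = c("weather" = "old.weather"))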
weather new.weather
1 Clear dry
2 Snow wet
3 Clear dry
4 Rain wet
5 Rain wet
6 Other other
7 Hail/sleet wet
8 Unknown other
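The unlist() step names the flattened values dry1, dry2, wet1 and so on, which is why the digits have to be stripped with gsub(). As a rough alternative sketch, base R's stack() builds the same two-column lookup table without that name clean-up. The shape of weather.levels here is inferred from the output above, and the names lookup and weather are only for illustration:

```r
# Assumed shape of the question's list (inferred, not copied from the post).
weather.levels <- list(
  dry   = c("Clear", "Cloudy"),
  wet   = c("Snow", "Rain", "Hail/sleet"),
  other = c("Other", "Unknown")
)

# stack() returns a data frame with columns `values` (the list elements)
# and `ind` (the name of the list entry each element came from).
lookup <- stack(weather.levels)
names(lookup) <- c("old.weather", "new.weather")
lookup$old.weather <- as.character(lookup$old.weather)
lookup$new.weather <- as.character(lookup$new.weather)

# Look up each observed value; match() keeps the original row order.
weather <- c("Clear", "Snow", "Clear", "Rain", "Rain", "Other", "Hail/sleet", "Unknown")
new.weather <- lookup$new.weather[match(weather, lookup$old.weather)]
```

The lookup table can equally be passed to left_join() in place of the tibble built in the pipeline.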
Solution 3:
There are three easy methods. Up front, I'm going to modify the data slightly (removing "Other" from weather.levels) to highlight a strength of one of the methods.
df <- data.frame(weather = c('Clear','Snow','Clear','Rain','Rain','Other','Hail/sleet','Unknown'))
weather.levels <- list(
dry = c('Clear', 'Cloudy'),
wet = c('Snow', 'Rain', 'Hail/sleet'),
other = c('Unknown'))
Simple Lookup
levels1 <- c(Unknown="other",Snow="wet",Rain="wet","Hail/sleet"="wet",Clear="dry",Cloudy="dry")
### levels1 <- setNames(rep(names(weather.levels), lengths(weather.levels)), unlist(weather.levels))
transform(df, newwx = levels1[as.character(weather)])
# weather newwx
# 1 Clear dry
# 2 Snow wet
# 3 Clear dry
# 4 Rain wet
# 5 Rain wet
# 6 Other <NA>
# 7 Hail/sleet wet
# 8 Unknown other
(I'm using transform, which is base R, but you can easily use dplyr and such if you're more comfortable with it.)
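The commented-out line above hints that the lookup vector does not have to be typed by hand. A small sketch of that idea, using the weather.levels list defined earlier in this answer (the name weather is only for illustration):

```r
weather.levels <- list(
  dry   = c("Clear", "Cloudy"),
  wet   = c("Snow", "Rain", "Hail/sleet"),
  other = c("Unknown")
)

# Repeat each group name once per member, then name those values by the members,
# giving a vector like c(Clear = "dry", Cloudy = "dry", Snow = "wet", ...).
levels1 <- setNames(
  rep(names(weather.levels), lengths(weather.levels)),
  unlist(weather.levels, use.names = FALSE)
)

# Index by name: values missing from the lookup ("Other" here) come back as NA.
weather <- c("Clear", "Snow", "Clear", "Rain", "Rain", "Other", "Hail/sleet", "Unknown")
newwx <- unname(levels1[weather])
```

The order of the elements differs from the hand-written levels1, which does not matter for lookup by name.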
Table Merge
This is essentially what Shree's answer does (though the concept is not limited to dplyr and friends).
df2 <- data.frame(wxfrom = names(levels1), wxto = levels1, stringsAsFactors=FALSE, row.names=NULL)
merge(df, df2, by.x="weather", by.y="wxfrom", all.x=TRUE)
# weather wxto
# 1 Clear dry
# 2 Clear dry
# 3 Hail/sleet wet
# 4 Other <NA>
# 5 Rain wet
# 6 Rain wet
# 7 Snow wet
# 8 Unknown other
Similar to:
dplyr::left_join(df, df2, by=c("weather"="wxfrom"))
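Note that the merge() output above comes back sorted by the key rather than in the original row order. If the order matters, one possible workaround is to carry a row index through the merge and sort on it afterwards. This is only a sketch; row.id is a hypothetical helper column, and df2 is written out by hand so the block runs on its own:

```r
df <- data.frame(
  weather = c("Clear", "Snow", "Clear", "Rain", "Rain", "Other", "Hail/sleet", "Unknown"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  wxfrom = c("Unknown", "Snow", "Rain", "Hail/sleet", "Clear", "Cloudy"),
  wxto   = c("other", "wet", "wet", "wet", "dry", "dry"),
  stringsAsFactors = FALSE
)

# Remember the original position of every row (hypothetical helper column).
df$row.id <- seq_len(nrow(df))

out <- merge(df, df2, by.x = "weather", by.y = "wxfrom", all.x = TRUE)

# Restore the original order and drop the helper column.
out <- out[order(out$row.id), c("weather", "wxto")]
rownames(out) <- NULL
```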
Lookup With Default
transform(df, newwx = levels1[ match(as.character(weather), names(levels1), nomatch=1L) ])
# weather newwx
# 1 Clear dry
# 2 Snow wet
# 3 Clear dry
# 4 Rain wet
# 5 Rain wet
# 6 Other other
# 7 Hail/sleet wet
# 8 Unknown other
This last one has the innate ability to assign a default to any non-matches: nomatch=1L points at the first element of levels1, which here is Unknown="other". With the other methods, it is as simple as following up with ifelse(is.na(newwx), "unk", newwx), so it doesn't add a whole lot.
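To make the two ways of handling non-matches concrete, here is a short sketch that runs both on the same input. It uses "other" as the fill value in both cases so the results can be compared directly; the names via.match and via.fill are only for illustration:

```r
levels1 <- c(Unknown = "other", Snow = "wet", Rain = "wet",
             "Hail/sleet" = "wet", Clear = "dry", Cloudy = "dry")
weather <- c("Clear", "Snow", "Clear", "Rain", "Rain", "Other", "Hail/sleet", "Unknown")

# Default via match(): non-matches are sent to position 1 of levels1,
# so this relies on the default being the first element.
via.match <- unname(levels1[match(weather, names(levels1), nomatch = 1L)])

# Default via post-processing: look up first, then fill the NAs explicitly.
via.fill <- unname(levels1[weather])
via.fill[is.na(via.fill)] <- "other"
```

The second form does not depend on the ordering of levels1, which may make it the safer choice if the lookup vector is generated rather than typed by hand.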