R: Transform dataframe column using dictionary/list?

Solution 1:

As is often the case, base R has a function designed to do exactly this: levels<-. Given a factor and a named list, it maps each old level to the name of the list element that contains it:

df$new.weather <- `levels<-`(df$weather, weather.levels)
df
#     weather new.weather
#1      Clear         dry
#2       Snow         wet
#3      Clear         dry
#4       Rain         wet
#5       Rain         wet
#6      Other       other
#7 Hail/sleet         wet
#8    Unknown       other

This is equivalent to the slightly longer but easier-to-read form:

df$new.weather <- df$weather
levels(df$new.weather) <- weather.levels
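
For anyone who wants to run this end to end, here is a self-contained sketch. It assumes weather.levels is a named list that also covers "Other" (as the output above implies) and that the column is stored as a factor. Since R 4.0, data.frame() no longer converts strings to factors by default, so the conversion is requested explicitly here.

```r
# Assumed sample data; the weather column must be a factor for levels<- to remap it.
df <- data.frame(
  weather = c("Clear", "Snow", "Clear", "Rain", "Rain", "Other", "Hail/sleet", "Unknown"),
  stringsAsFactors = TRUE
)

# Assumed mapping: list names are the new levels, list values are the old levels.
weather.levels <- list(
  dry   = c("Clear", "Cloudy"),
  wet   = c("Snow", "Rain", "Hail/sleet"),
  other = c("Other", "Unknown")
)

# Copy the column, then collapse its levels according to the list.
df$new.weather <- df$weather
levels(df$new.weather) <- weather.levels

levels(df$new.weather)
# [1] "dry"   "wet"   "other"
```

Old levels listed in the mapping but absent from the data (here "Cloudy") are simply ignored.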

Solution 2:

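Here's one way using dplyr. It flattens the list into a lookup table and joins that table to df (tibble() replaces the deprecated data_frame()):

library(dplyr)

weather.levels %>%
  unlist() %>%
  tibble(new.weather = gsub("[0-9]", "", names(.)), old.weather = .) %>%
  left_join(df, ., by = c("weather" = "old.weather"))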

     weather new.weather
1      Clear         dry
2       Snow         wet
3      Clear         dry
4       Rain         wet
5       Rain         wet
6      Other       other
7 Hail/sleet         wet
8    Unknown       other
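
To see why the gsub() call is needed, it may help to look at what unlist() does to the list names. The following base R sketch builds the same lookup table step by step; the weather.levels definition is an assumption matching the output above, and flat and lookup are illustrative names.

```r
# Assumed mapping, including "Other" so that every value in the output above is covered.
weather.levels <- list(
  dry   = c("Clear", "Cloudy"),
  wet   = c("Snow", "Rain", "Hail/sleet"),
  other = c("Other", "Unknown")
)

# unlist() flattens the list and appends a running index to each list name.
flat <- unlist(weather.levels)
names(flat)
# [1] "dry1"   "dry2"   "wet1"   "wet2"   "wet3"   "other1" "other2"

# Stripping the digits recovers the new level for each old level.
lookup <- data.frame(
  old.weather = unname(flat),
  new.weather = gsub("[0-9]", "", names(flat)),
  stringsAsFactors = FALSE
)
```

One caveat with this approach: the pattern removes every digit, so it would also alter new level names that legitimately contain digits.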

Solution 3:

There are three easy methods. Up front, I'm going to modify the lookup list slightly (removing "Other" from it, while keeping it in the data) to highlight a strength of one of the methods.

df <- data.frame(weather = c('Clear','Snow','Clear','Rain','Rain','Other','Hail/sleet','Unknown'))
weather.levels <- list(
  dry = c('Clear', 'Cloudy'),
  wet = c('Snow', 'Rain', 'Hail/sleet'),
  other = c('Unknown'))

Simple Lookup

levels1 <- c(Unknown="other",Snow="wet",Rain="wet","Hail/sleet"="wet",Clear="dry",Cloudy="dry")
# Alternatively, derive the lookup from weather.levels (same mapping, different element order):
# levels1 <- setNames(rep(names(weather.levels), lengths(weather.levels)), unlist(weather.levels))
transform(df, newwx = levels1[as.character(weather)])
#      weather newwx
# 1      Clear   dry
# 2       Snow   wet
# 3      Clear   dry
# 4       Rain   wet
# 5       Rain   wet
# 6      Other  <NA>
# 7 Hail/sleet   wet
# 8    Unknown other

(I'm using transform, which is base R, but dplyr::mutate works just as well if you're more comfortable with it.)
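
As a runnable sketch of the same idea, the lookup vector can be derived from weather.levels rather than typed by hand. This assumes the df and weather.levels defined at the top of this solution, with strings kept as plain character vectors.

```r
df <- data.frame(
  weather = c("Clear", "Snow", "Clear", "Rain", "Rain", "Other", "Hail/sleet", "Unknown"),
  stringsAsFactors = FALSE
)
weather.levels <- list(
  dry   = c("Clear", "Cloudy"),
  wet   = c("Snow", "Rain", "Hail/sleet"),
  other = c("Unknown")
)

# Named character vector: names are the old values, elements are the new values.
levels1 <- setNames(
  rep(names(weather.levels), lengths(weather.levels)),
  unlist(weather.levels)
)

# Index by name; values missing from the lookup ("Other") come back as NA.
df$newwx <- unname(levels1[as.character(df$weather)])
```

The as.character() call matters when the column is a factor: indexing with a factor would use its integer codes instead of its labels.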

Table Merge

This is essentially what Solution 2 (Shree's answer) does, though the concept is not specific to dplyr and friends.

df2 <- data.frame(wxfrom = names(levels1), wxto = levels1, stringsAsFactors=FALSE, row.names=NULL)
merge(df, df2, by.x="weather", by.y="wxfrom", all.x=TRUE)
#      weather  wxto
# 1      Clear   dry
# 2      Clear   dry
# 3 Hail/sleet   wet
# 4      Other  <NA>
# 5       Rain   wet
# 6       Rain   wet
# 7       Snow   wet
# 8    Unknown other

Similar to:

dplyr::left_join(df, df2, by=c("weather"="wxfrom"))
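
Note that the merge() output above comes back sorted by the key rather than in the original row order. If the order matters, one possible workaround is to carry a row index through the merge and sort on it afterwards. This is only a sketch; row.id is a hypothetical helper column, not something from the original answer.

```r
df <- data.frame(
  weather = c("Clear", "Snow", "Clear", "Rain", "Rain", "Other", "Hail/sleet", "Unknown"),
  stringsAsFactors = FALSE
)
levels1 <- c(Unknown = "other", Snow = "wet", Rain = "wet",
             "Hail/sleet" = "wet", Clear = "dry", Cloudy = "dry")
df2 <- data.frame(wxfrom = names(levels1), wxto = levels1,
                  stringsAsFactors = FALSE, row.names = NULL)

# Remember the original position of each row (hypothetical helper column).
df$row.id <- seq_len(nrow(df))

# Left join, then restore the original order.
merged <- merge(df, df2, by.x = "weather", by.y = "wxfrom", all.x = TRUE)
merged <- merged[order(merged$row.id), ]
```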

Lookup With Default

transform(df, newwx = levels1[ match(as.character(weather), names(levels1), nomatch=1L) ])
#      weather newwx
# 1      Clear   dry
# 2       Snow   wet
# 3      Clear   dry
# 4       Rain   wet
# 5       Rain   wet
# 6      Other other
# 7 Hail/sleet   wet
# 8    Unknown other

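This last method has the built-in ability to assign a default to any non-match: nomatch=1L makes unmatched values fall back to the first element of levels1, which here is "other". With the other two methods, a follow-up ifelse(is.na(newwx), "unk", newwx) achieves the same thing, so the built-in default doesn't add a whole lot.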
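
Because nomatch=1L depends on which element happens to come first in levels1, a variant that names the default explicitly may be less fragile. The following is a sketch under that assumption; default.wx and idx are illustrative names.

```r
df <- data.frame(
  weather = c("Clear", "Snow", "Clear", "Rain", "Rain", "Other", "Hail/sleet", "Unknown"),
  stringsAsFactors = FALSE
)
levels1 <- c(Unknown = "other", Snow = "wet", Rain = "wet",
             "Hail/sleet" = "wet", Clear = "dry", Cloudy = "dry")

default.wx <- "other"  # hypothetical name for the fallback category

# match() returns NA for values that are not in the lookup.
idx <- match(as.character(df$weather), names(levels1))

# Look up the matches, then fill the gaps with the explicit default.
df$newwx <- unname(levels1[idx])
df$newwx[is.na(idx)] <- default.wx
```

This gives the same result as the nomatch=1L version above, but it keeps working if levels1 is reordered or rebuilt from weather.levels.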