Creating a new variable from a lookup table
Here is how to use a named vector for the lookup:
Define test data:
dat <- data.frame(
  presult = c(rep("I", 4), "SS", "ZZ"),
  aresult = c("single", "double", "triple", "home run", "strikeout", "home run"),
  stringsAsFactors = FALSE
)
Define a named numeric vector with the scores:
score <- c(single=1, double=2, triple=3, `home run`=4, strikeout=0)
Use vector indexing to match the scores against results:
dat$base <- score[dat$aresult]
dat
  presult   aresult base
1       I    single    1
2       I    double    2
3       I    triple    3
4       I  home run    4
5      SS strikeout    0
6      ZZ  home run    4
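One thing worth knowing about this indexing approach is what happens to a result that has no entry in the lookup vector. The sketch below rebuilds the same score vector and adds a hypothetical result, "walk", which is not in the original data. Indexing by a missing name gives NA rather than an error, and unname() drops the names that the lookup carries over.

```r
# Named lookup vector, as defined in the answer
score <- c(single = 1, double = 2, triple = 3, `home run` = 4, strikeout = 0)

# "walk" is a hypothetical result with no entry in the lookup (assumption)
results <- c("single", "walk", "home run")

# Character indexing returns NA for names that are not present;
# unname() strips the names copied over from the lookup vector
base <- unname(score[results])

# Flag the rows that found no match so they can be inspected
unmatched <- results[is.na(base)]
print(base)
print(unmatched)
```

Checking is.na() on the new column after the lookup is a cheap way to catch misspelled or unexpected categories.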
Additional information:
If you don't want to type out the named vector by hand, for example because the lookup has many entries, build the values first and then assign the names:
score <- c(1:4, 0)
names(score) <- c("single", "double", "triple", "home run", "strikeout")
(Or read the values and names from existing data. The point is to construct a numeric vector and then assign names.)
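As a minimal sketch of reading the values and names from existing data, assume the scores already sit in a two-column data frame. The data frame name lookup and its column names are assumptions made for illustration. setNames() builds the named vector in one step.

```r
# Hypothetical lookup data, e.g. as it might arrive from a file (assumption)
lookup <- data.frame(
  aresult = c("single", "double", "triple", "home run", "strikeout"),
  base = c(1, 2, 3, 4, 0),
  stringsAsFactors = FALSE
)

# Build the named numeric vector: values from one column, names from the other
score <- setNames(lookup$base, lookup$aresult)

# Look up a single value by name
home_run_score <- score[["home run"]]
print(score)
```

This is equivalent to creating the numeric vector and then calling names(score) <- ... on it.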
Define your lookup table:
lookup <- data.frame(
  base = c(0, 1, 2, 3, 4),
  aresult = c("strikeout", "single", "double", "triple", "home run"),
  stringsAsFactors = FALSE
)
Then use join() from the plyr package:
library(plyr)
dataset <- join(dataset, lookup, by = "aresult")
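If plyr is not available, the same lookup-table idea can be sketched in base R with match(). This swaps the join for a different technique and is not what the answer itself uses. The small dataset below is a stand-in built from the example values in this thread.

```r
# Lookup table, as in the answer
lookup <- data.frame(
  base = c(0, 1, 2, 3, 4),
  aresult = c("strikeout", "single", "double", "triple", "home run"),
  stringsAsFactors = FALSE
)

# Stand-in for the real dataset (assumption: only the key column matters here)
dataset <- data.frame(
  aresult = c("single", "double", "triple", "home run", "strikeout", "home run"),
  stringsAsFactors = FALSE
)

# match() gives, for each row of dataset, the row position in lookup;
# indexing lookup$base by those positions pulls out the score
dataset$base <- lookup$base[match(dataset$aresult, lookup$aresult)]
print(dataset)
```

Like the join, this keeps the row order of dataset and yields NA for any aresult value missing from the lookup table.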
An alternative to Dieter's answer:
dat <- data.frame(
  presult = c(rep("I", 4), "SS", "ZZ"),
  aresult = c("single", "double", "triple", "home run", "strikeout", "home run"),
  stringsAsFactors = FALSE
)
dat$base <- as.integer(factor(dat$aresult,
  levels = c("strikeout", "single", "double", "triple", "home run"))) - 1
dataset$base <- as.integer(as.factor(dataset$aresult))
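Depending on your data, as.factor() can be omitted when the column is already a factor, e.g. when it was read with read.table in an R version before 4.0.0, where stringsAsFactors defaulted to TRUE. Note that as.factor() orders the levels alphabetically, so the integer codes it produces are arbitrary category identifiers, not the base scores. To get the scores, give the levels explicitly in score order and subtract 1.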
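The difference between default and explicit factor levels can be checked directly. The sketch below uses the example values from this thread; the variable names are chosen for illustration only.

```r
aresult <- c("single", "double", "triple", "home run", "strikeout", "home run")

# Default levels are the sorted unique values, so the codes follow
# alphabetical order rather than the number of bases
default_levels <- levels(factor(aresult))
default_codes <- as.integer(factor(aresult))

# Explicit levels in score order: strikeout is level 1, home run is level 5,
# so subtracting 1 maps the codes onto the scores 0 to 4
score_levels <- c("strikeout", "single", "double", "triple", "home run")
base <- as.integer(factor(aresult, levels = score_levels)) - 1L

print(default_levels)
print(default_codes)
print(base)
```

Only the version with explicit levels reproduces the scores 1, 2, 3, 4, 0, 4 for this data.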