R self reference

Try package data.table and its := operator. It's very fast and very short.

DT[col1==something, col2:=col3+1]

The first part col1==something is the subset. You can put anything here and use the column names as if they are variables; i.e., no need to use $. Then the second part col2:=col3+1 assigns the RHS to the LHS within that subset, where the column names can be assigned to as if they are variables. := is assignment by reference. No copies of any object are taken, so is faster than <-, =, within and transform.

Also, soon to be implemented in v1.8.1, one end goal of j's syntax allowing := in j like that is combining it with by, see question: when should I use the := operator in data.table.

UDPDATE : That was indeed released (:= by group) in July 2012.

You should be paying more attention to Gabor Grothendeick (and not just in this instance.) The cited inc function on Matt Asher's blog does all of what you are asking:

(And the obvious extension works as well.)

add <- function(x, inc=1) {
   eval.parent(substitute(x <- x + inc))
 }
# Testing the `inc` function behavior

EDIT: After my temporary annoyance at the lack of approval in the first comment, I took the challenge of adding yet a further function argument. Supplied with one argument of a portion of a dataframe, it would still increment the range of values by one. Up to this point has only been very lightly tested on infix dyadic operators, but I see no reason it wouldn't work with any function which accepts only two arguments:

transfn <- function(x, func="+", inc=1) {
   eval.parent(substitute(x <- do.call(func, list(x , inc)))) }

(Guilty admission: This somehow "feels wrong" from the traditional R perspective of returning values for assignment.) The earlier testing on the inc function is below:

df <- data.frame(a1 =1:10, a2=21:30, b=1:2)
 inc <- function(x) {
   eval.parent(substitute(x <- x + 1))
 }

#---- examples===============>

> inc(df$a1)  # works on whole columns
> df
   a1 a2 b
1   2 21 1
2   3 22 2
3   4 23 1
4   5 24 2
5   6 25 1
6   7 26 2
7   8 27 1
8   9 28 2
9  10 29 1
10 11 30 2
> inc(df$a1[df$a1>5]) # testing on a restricted range of one column
> df
   a1 a2 b
1   2 21 1
2   3 22 2
3   4 23 1
4   5 24 2
5   7 25 1
6   8 26 2
7   9 27 1
8  10 28 2
9  11 29 1
10 12 30 2

> inc(df[ df$a1>5, ])  #testing on a range of rows for all columns being transformed
> df
   a1 a2 b
1   2 21 1
2   3 22 2
3   4 23 1
4   5 24 2
5   8 26 2
6   9 27 3
7  10 28 2
8  11 29 3
9  12 30 2
10 13 31 3
# and even in selected rows and grepped names of columns meeting a criterion
> inc(df[ df$a1 <= 3, grep("a", names(df)) ])
> df
   a1 a2 b
1   3 22 1
2   4 23 2
3   4 23 1
4   5 24 2
5   8 26 2
6   9 27 3
7  10 28 2
8  11 29 3
9  12 30 2
10 13 31 3

Here is what you can do. Let us say you have a dataframe

df = data.frame(x = 1:10, y = rnorm(10))

And you want to increment all the y by 1. You can do this easily by using transform

df = transform(df, y = y + 1)

I'd be partial to (presumably the subset is on rows)

ridx <- adataframe$col==something
adataframe[ridx,] <- adataframe[ridx,] + 1

which doesn't rely on any fancy / fragile parsing, is reasonably expressive about the operation being performed, and is not too verbose. Also tends to break lines into nicely human-parse-able units, and there is something appealing about using standard idioms -- R's vocabulary and idiosyncrasies are already large enough for my taste.

R self reference

Related

Recent Posts