Remove rows conditionally from a data.table in R

I have a data.table with fields {id, menuitem, amount}.

This is transaction data - so, ids are unique, but menuitem repeats. Now, I want to remove all entries where menuitem == 'coffee'.

Also, want to delete all rows where amount <= 0;

What is the right way to do this in data.table?

I can use data$menuitem!='coffee' and then index int into data[] - but that is not necessarily efficient and does not take advantage of data.table.

Any pointers in the right direction are appreciated.

In this scenario it is not so different than data.frame

data <- data[ menuitem != 'coffee' | amount > 0]

Delete/add row by reference it is to be implemented. You find more info in this question

Regarding speed:

1 You can benefit from keys by doing something like:

setkey(data, menuitem)
data <- data[!"coffee"]

which will be faster than data <- data[ menuitem != 'coffee']. However to apply the same filters you asked in the question you'll need a rolling join (I've finished my lunch break I can add something later :-)).

2 Even without key data.table is much faster for relatively big table (similar speed for handful amount of rows)

dt<-data.table(id=sample(letters,1000000,T),var=rnorm(1000000))
df<-data.frame(id=sample(letters,1000000,T),var=rnorm(1000000))
library(microbenchmark)
> microbenchmark(dt[ id == "a"], df[ df$id == "a",])
Unit: milliseconds
               expr       min        lq    median        uq       max neval
      dt[id == "a"]  24.42193  25.74296  26.00996  26.35778  27.36355   100
 df[df$id == "a", ] 138.17500 146.46729 147.38646 149.06766 154.10051   100

try this:

data <- data[ !(menuitem == 'coffee' | amount <= 0),]

Generally:

dt <- data.table(a=c(1,1,1,2,2,2,3,3,3),b=c(4,2,3,1,5,3,4,7,6))
dt
#>    a b
#> 1: 1 4
#> 2: 1 2
#> 3: 1 3
#> 4: 2 1
#> 5: 2 5
#> 6: 2 3
#> 7: 3 4
#> 8: 3 7
#> 9: 3 6
dt[a!=1,]
#>    a b
#> 1: 2 1
#> 2: 2 5
#> 3: 2 3
#> 4: 3 4
#> 5: 3 7
#> 6: 3 6

Remove rows conditionally from a data.table in R

Related

Recent Posts