DT[!(x == .)] and DT[x != .] treat NA in x inconsistently

I think it is documented and consistent behaviour.

The main thing to note is that the prefix ! within the i argument is a flag for a not join, so x != 0 and !(x == 0) are no longer the same logical operation once data.table's documented handling of NA is taken into account.

The section from the NEWS regarding the not join:

A new "!" prefix on i signals 'not-join' (a.k.a. 'not-where'), #1384i.
            DT[-DT["a", which=TRUE, nomatch=0]]   # old not-join idiom, still works
            DT[!"a"]                              # same result, now preferred.
            DT[!J(6),...]                         # !J == not-join
            DT[!2:3,...]                          # ! on all types of i
            DT[colA!=6L | colB!=23L,...]          # multiple vector scanning approach (slow)
            DT[!J(6L,23L)]                        # same result, faster binary search
        '!' has been used rather than '-' :
            * to match the 'not-join'/'not-where' nomenclature
            * with '-', DT[-0] would return DT rather than DT[0] and not be backwards
              compatible. With '!', DT[!0] returns DT both before (since !0 is TRUE in
              base R) and after this new feature.
            * to leave DT[+J...] and DT[-J...] available for future use

And from ?data.table:

All types of 'i' may be prefixed with !. This signals a not-join or not-select should be performed. Throughout data.table documentation, where we refer to the type of 'i', we mean the type of 'i' after the '!', if present. See examples.


Why is it consistent with the documented handling of NA within data.table?

NA values are considered FALSE. Think of it like doing isTRUE on each element.

So, with x = c(1, 0, NA), DT[x != 0] is indexed with TRUE FALSE NA, which becomes TRUE FALSE FALSE due to the documented NA handling.

You are subsetting to the rows where the condition is TRUE.

This means you get those rows where x != 0 is TRUE (and not NA).
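
A minimal base-R sketch of the rule above, assuming x holds the values 1, 0 and NA. The names raw and keep are illustrative, and the code only emulates the documented NA handling; it is not how data.table implements it.

```r
# Assumed example data: one non-zero value, one zero, one missing value.
x <- c(1, 0, NA)

# The raw comparison keeps the NA.
raw <- x != 0               # TRUE FALSE NA

# Treat NA as FALSE, like applying isTRUE() to each element.
keep <- !is.na(raw) & raw   # TRUE FALSE FALSE

# Only the first element survives the subset.
x[keep]
```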

DT[!(x==0)] uses the not join, which states that you want everything that is not selected by x == 0 (and this can and will include the NA values).
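
The difference can be emulated in base R. This is only a sketch of the behaviour described here for versions where the prefix ! triggers a not join; selected and not_join are hypothetical names, not data.table internals.

```r
# Assumed example data, as in the thread: 1, 0 and NA.
x <- c(1, 0, NA)

# Rows that the condition x == 0 selects once NA counts as FALSE.
selected <- which(x == 0)

# A not join keeps every row that was not selected, so the NA row stays.
not_join <- setdiff(seq_along(x), selected)

x[not_join]
```

Because the NA row was never selected by x == 0, it ends up in the complement, which is why the two spellings diverge.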


Follow-up queries / further examples

DT[!(x!=0)]

## returns
    x y
1:  0 2
2: NA 3

x != 0 is TRUE for one value, so the not join returns the rows where it isn't TRUE, i.e. where it was FALSE (x actually == 0) or NA.

DT[!!(x==0)]

## returns
    x y
1:  0 2
2: NA 3

This is parsed as !(!(x==0)). The prefix ! denotes a not join, and the inner !(x==0) evaluates to the same result as x!=0, so the reasoning from the case immediately above applies.
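
Both points can be checked in plain R, outside of data.table. A small sketch, with x assumed to be the same three values; same, e, outer_fun and inner are illustrative names.

```r
x <- c(1, 0, NA)

# Outside of i, negating x == 0 gives exactly x != 0, NA included.
same <- identical(!(x == 0), x != 0)

# The outermost call in !!(x == 0) is "!", and its argument is !(x == 0).
e <- quote(!!(x == 0))
outer_fun <- as.character(e[[1]])
inner <- deparse(e[[2]])
```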


As of version 1.8.11 the ! does not trigger a not-join for logical expressions and the results for the two expressions are the same:

library(data.table)
DT <- data.table(x=c(1,0,NA), y=1:3)
DT[x != 0]
#   x y
#1: 1 1
DT[!(x == 0)]
#   x y
#1: 1 1

A couple of other expressions mentioned in @mnel's answer also behave in a more predictable fashion now:

DT[!(x != 0)]
#   x y
#1: 0 2
DT[!!(x == 0)]
#   x y
#1: 0 2

I'm a month late to this discussion, but with fresh eyes and reading all the comments ... yes I reckon DT[x != .] would be better if it included any rows with NA in x in the result, and we should change it to do that.
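
Whatever a given version does by default, the NA rows can be requested explicitly. The sketch below uses a plain data.frame so that it runs without data.table; the names keep and df are illustrative, and the same logical expression could be passed as i in DT[...].

```r
x <- c(1, 0, NA)
y <- 1:3

# TRUE | NA is TRUE, so rows with NA in x are kept explicitly.
keep <- is.na(x) | x != 0

df <- data.frame(x = x, y = y)
df[keep, ]
```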

New answer added to the linked question with further background from a different angle:

https://stackoverflow.com/a/17008872/403310