Subsetting R data frame results in mysterious NA rows

Solution 1:

Wrap the condition in which:

df[which(df$number1 < df$number2), ]

How it works:

It returns the row numbers where the condition matches (where the condition is TRUE) and subsets the data frame on those rows accordingly.

Say that:

which(df$number1 < df$number2)

returns row numbers 1, 2, 3, 4 and 5.

As such, writing:

df[which(df$number1 < df$number2), ]

is the same as writing:

df[c(1, 2, 3, 4, 5), ]

Or an even simpler version is:

df[1:5, ]

Solution 2:

I see this was already answered by the OP, but since his comment is buried deep within the comment section, here's my attempt to fix this issue (at least with my data, which was behaving the same way).

First of all, some sample data:

> df <- data.frame(name = LETTERS[1:10], number1 = 1:10, number2 = c(10:3, NA, NA))
> df
   name number1 number2
1     A       1      10
2     B       2       9
3     C       3       8
4     D       4       7
5     E       5       6
6     F       6       5
7     G       7       4
8     H       8       3
9     I       9      NA
10    J      10      NA

Now for a simple filter:

> df[df$number1 < df$number2, ]
     name number1 number2
1       A       1      10
2       B       2       9
3       C       3       8
4       D       4       7
5       E       5       6
NA   <NA>      NA      NA
NA.1 <NA>      NA      NA

The problem here is that the presence of NAs in the third column causes R to rewrite the whole row as NA. Nonetheless, the data frame dimensions are maintained. Here's my fix, which requires knowledge of which column contains the NAs:

> df[df$number1 < df$number2 & !is.na(df$number2), ]
  name number1 number2
1    A       1      10
2    B       2       9
3    C       3       8
4    D       4       7
5    E       5       6

Solution 3:

I get the same problem when using code similar to what you posted. Using the function subset()

subset(example,example$var1=="A")

the NA row instead gets excluded.

Solution 4:

Using dplyr:

library(dplyr)
filter(df, number1 < number2)