Subset of rows containing NA (missing) values in a chosen column of a data frame
We have a data frame from a CSV file. The data frame DF has columns that contain observed values and a column (Var2) that contains the date on which a measurement was taken. If the date was not recorded, the CSV file contains the value NA for missing data.
Var1 Var2
10 2010/01/01
20 NA
30 2010/03/01
We would like to use the subset command to define a new data frame new_DF that contains only the rows with an NA value in the column Var2. In the example given, only row 2 would be contained in new_DF.
The command
new_DF <- subset(DF, DF$Var2 == "NA")
does not work; the resulting data frame has no rows.
If the NA values in the original CSV file are replaced with NULL, the same command produces the desired result:
new_DF <- subset(DF, DF$Var2 == "NULL")
How can I get this method working when the original CSV file contains the value NA?
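The behaviour described in the question can be reproduced in a few lines. This is a minimal sketch assuming R's default read.csv() settings; the inline text= argument stands in for the original CSV file, and DF_null is a hypothetical second copy in which the missing date is written as NULL.

```r
# Inline CSV standing in for the original file (assumption for illustration)
DF <- read.csv(text = "Var1,Var2
10,2010/01/01
20,NA
30,2010/03/01")

# read.csv() treats the field NA as missing by default (na.strings = "NA")
is.na(DF$Var2)                         # FALSE  TRUE FALSE

# Comparing a missing value with anything returns NA, not TRUE
DF$Var2 == "NA"                        # FALSE    NA FALSE

# subset() treats an NA condition as FALSE, so no rows are kept
nrow(subset(DF, Var2 == "NA"))         # 0

# NULL is not in na.strings, so it stays an ordinary string and matches
DF_null <- read.csv(text = "Var1,Var2
10,2010/01/01
20,NULL
30,2010/03/01")
nrow(subset(DF_null, Var2 == "NULL"))  # 1
```

This is why the NULL version appears to work: NULL never becomes a missing value, while NA does.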
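Never use == "NA" to test for missing values. read.csv() already turns the field NA into a real missing value (its default is na.strings = "NA"), and any comparison with a missing value returns NA rather than TRUE, so subset() keeps no rows. Use is.na() instead. This keeps every row that has a missing value in any column:
new_DF <- DF[rowSums(is.na(DF)) > 0, ]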
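To see what the one-liner does step by step, here is a sketch on a hand-built copy of the question's data; the data.frame() call is an assumption standing in for the CSV import, and na_cells and na_per_row are illustrative names.

```r
# Hand-built copy of the example data (stands in for the CSV import)
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", NA, "2010/03/01")
)

# is.na() on a data frame gives a logical matrix: one TRUE/FALSE per cell
na_cells <- is.na(DF)

# rowSums() counts the TRUE cells in each row
na_per_row <- rowSums(na_cells)   # 0 1 0

# Keep the rows with at least one missing value in any column
new_DF <- DF[na_per_row > 0, ]
new_DF
```

Because every column is checked, a row with an NA only in Var1 would also be kept.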
or, if you want to check a particular column only, use
new_DF <- DF[is.na(DF$Var2), ]
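Checking a single column matters when other columns can also be missing. In this sketch the NA in Var1 of row 3 is a hypothetical addition to the question's data; it shows that the column-specific version ignores that row, and that the same is.na() test also works inside subset(), which the question originally asked about.

```r
# Example data with an extra, hypothetical NA in Var1 (row 3)
DF <- data.frame(
  Var1 = c(10, 20, NA),
  Var2 = c("2010/01/01", NA, "2010/03/01")
)

# Missing in any column: rows 2 and 3
any_na <- DF[rowSums(is.na(DF)) > 0, ]

# Missing in Var2 only: row 2
var2_na <- DF[is.na(DF$Var2), ]

# Same selection with subset(); inside it, columns can be named directly
new_DF <- subset(DF, is.na(Var2))
```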
If the column holds the character string "NA" rather than a real missing value, first run
DF[DF == "NA"] <- NA
to replace those strings with missing values.
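A sketch of that conversion, assuming the data were read in such a way that the missing entry is the two-character string "NA" rather than a real NA (built here with data.frame() for illustration; stringsAsFactors = FALSE only keeps the behaviour the same on older R versions).

```r
# Here "NA" is an ordinary string, so is.na() does not see it
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", "NA", "2010/03/01"),
  stringsAsFactors = FALSE
)
is.na(DF$Var2)                 # FALSE FALSE FALSE

# DF == "NA" yields a logical matrix; assigning NA through it
# turns every "NA" string into a real missing value
DF[DF == "NA"] <- NA
is.na(DF$Var2)                 # FALSE  TRUE FALSE

new_DF <- DF[is.na(DF$Var2), ]
```

When reading the file, the na.strings argument of read.csv() controls which strings become NA, which can make this step unnecessary.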
complete.cases() returns TRUE for rows that contain no NA values, so its negation selects the rows that contain at least one:
DF[!complete.cases(DF), ]
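A short sketch on a hand-built copy of the example data. Passing a single column to complete.cases() is a variation not in the original answer; it restricts the check to that column.

```r
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", NA, "2010/03/01")
)

# TRUE for rows with no missing values in any column
complete.cases(DF)                   # TRUE FALSE TRUE

# Negate it to keep the incomplete rows
new_DF <- DF[!complete.cases(DF), ]

# complete.cases() also accepts a single vector, so this checks Var2 only
var2_DF <- DF[!complete.cases(DF$Var2), ]
```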