Removing NA in dplyr pipe [duplicate]
I tried to remove NAs from the subset using dplyr piping. Does my output indicate a missed step? I'm trying to learn how to write functions using dplyr:
outcome.df %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath, na.rm = TRUE)) %>%
  head()
Source: local data frame [6 x 5]
Groups: Hospital, State
                           Hospital State HeartAttackDeath
1     ABBEVILLE AREA MEDICAL CENTER    SC               NA
2        ABBEVILLE GENERAL HOSPITAL    LA               NA
3      ABBOTT NORTHWESTERN HOSPITAL    MN             12.3
4   ABILENE REGIONAL MEDICAL CENTER    TX             17.2
5        ABINGTON MEMORIAL HOSPITAL    PA             14.3
6 ABRAHAM LINCOLN MEMORIAL HOSPITAL    IL               NA
Variables not shown: HeartFailureDeath (dbl), PneumoniaDeath (dbl)
I don't think desc takes an na.rm argument... I'm actually surprised it doesn't throw an error when you give it one. If you just want to remove NAs, use na.omit (base) or tidyr::drop_na:
library(dplyr)
outcome.df %>%
  na.omit() %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
library(tidyr)
outcome.df %>%
  drop_na() %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
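To make the difference concrete, here is a minimal sketch on a small made-up data frame. The name toy and all of its values are hypothetical (only the column names are borrowed from outcome.df). It shows that na.omit and drop_na called with no column arguments both discard every row that has an NA in any column:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: made-up values, column names borrowed from outcome.df.
toy <- data.frame(
  Hospital          = c("A", "B", "C", "D"),
  State             = c("SC", "LA", "MN", "TX"),
  HeartAttackDeath  = c(NA, 15.1, 12.3, 17.2),
  HeartFailureDeath = c(10.0, NA, 11.5, 9.8),
  stringsAsFactors  = FALSE
)

# With no column arguments, both approaches drop every row with an NA in
# *any* column: A (missing HeartAttackDeath) and B (missing
# HeartFailureDeath) are removed, C and D remain.
by_na_omit <- toy %>% na.omit()
by_drop_na <- toy %>% drop_na()

print(by_na_omit$Hospital)  # [1] "C" "D"
print(by_drop_na$Hospital)  # [1] "C" "D"
```

One small difference: na.omit keeps the original row names of the surviving rows and attaches an na.action attribute recording which rows were dropped.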
If you only want to drop the rows where HeartAttackDeath is NA, filter with !is.na, or pass that column to tidyr::drop_na:
outcome.df %>%
  filter(!is.na(HeartAttackDeath)) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
outcome.df %>%
  drop_na(HeartAttackDeath) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
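The column-specific versions keep more rows. A minimal sketch on hypothetical toy data (made-up values, not taken from outcome.df) shows that a row with an NA only in HeartFailureDeath survives, because only HeartAttackDeath is checked:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: made-up values, column names borrowed from outcome.df.
toy <- data.frame(
  Hospital          = c("A", "B", "C", "D"),
  State             = c("SC", "LA", "MN", "TX"),
  HeartAttackDeath  = c(NA, 15.1, 12.3, 17.2),
  HeartFailureDeath = c(10.0, NA, 11.5, 9.8),
  stringsAsFactors  = FALSE
)

# Only HeartAttackDeath is checked, so B is kept even though its
# HeartFailureDeath is NA; only A is dropped.
kept_filter  <- toy %>% filter(!is.na(HeartAttackDeath))
kept_drop_na <- toy %>% drop_na(HeartAttackDeath)

print(kept_filter$Hospital)   # [1] "B" "C" "D"
print(kept_drop_na$Hospital)  # [1] "B" "C" "D"
```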
As pointed out at the dupe, complete.cases can also be used, but it's a bit trickier to put in a chain because it takes a data frame as an argument but returns a logical vector (one TRUE/FALSE per row) rather than a data frame. So you could use it inside filter, with . standing for the piped-in data frame:
outcome.df %>%
  filter(complete.cases(.)) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
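A minimal sketch on hypothetical toy data (made-up values, not taken from outcome.df) showing what complete.cases returns, and that the filter call keeps the same rows as the na.omit approach:

```r
library(dplyr)

# Hypothetical toy data: made-up values, column names borrowed from outcome.df.
toy <- data.frame(
  Hospital          = c("A", "B", "C", "D"),
  State             = c("SC", "LA", "MN", "TX"),
  HeartAttackDeath  = c(NA, 15.1, 12.3, 17.2),
  HeartFailureDeath = c(10.0, NA, 11.5, 9.8),
  stringsAsFactors  = FALSE
)

# complete.cases() takes the whole data frame and returns a logical
# vector with one element per row (TRUE = no NA anywhere in that row).
cc <- complete.cases(toy)
print(cc)  # [1] FALSE FALSE  TRUE  TRUE

# In the %>% chain, `.` is the incoming data frame, so this evaluates
# as filter(toy, complete.cases(toy)).
complete_rows <- toy %>% filter(complete.cases(.))

print(complete_rows$Hospital)  # [1] "C" "D"
```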