There is pmin and pmax each taking na.rm, why no psum?
It seems that R might be missing an obvious simple function: psum. Does it exist under a different name, or is it in a package somewhere?
x = c(1,3,NA,5)
y = c(2,NA,4,1)
min(x,y,na.rm=TRUE) # ok
[1] 1
max(x,y,na.rm=TRUE) # ok
[1] 5
sum(x,y,na.rm=TRUE) # ok
[1] 16
pmin(x,y,na.rm=TRUE) # ok
[1] 1 3 4 1
pmax(x,y,na.rm=TRUE) # ok
[1] 2 3 4 5
psum(x,y,na.rm=TRUE)
[1] 3 3 4 6 # expected result
Error: could not find function "psum" # actual result
I realise that + is already like psum, but what about NA?
x+y
[1] 3 NA NA 6 # can't supply `na.rm=TRUE` to `+`
Is there a case for adding psum? Or have I missed something?
This question is a follow-up to this question:
Using := in data.table to sum the values of two columns in R, ignoring NAs
Following @JoshUlrich's comment on the previous question,
psum <- function(...,na.rm=FALSE) {
    rowSums(do.call(cbind,list(...)),na.rm=na.rm)
}
edit: from Sven Hohenstein:
psum2 <- function(...,na.rm=FALSE) {
    dat <- do.call(cbind,list(...))
    res <- rowSums(dat, na.rm=na.rm)
    idx_na <- !rowSums(!is.na(dat))
    res[idx_na] <- NA
    res
}
x = c(1,3,NA,5,NA)
y = c(2,NA,4,1,NA)
z = c(1,2,3,4,NA)
psum(x,y,na.rm=TRUE)
## [1] 3 3 4 6 0
psum2(x,y,na.rm=TRUE)
## [1] 3 3 4 6 NA
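To see where the NA in the last position of psum2's result comes from, it may help to step through the intermediate values. This is only a walkthrough of the same steps on the x and y above; the name n_present is mine and not part of the original function.

```r
x <- c(1, 3, NA, 5, NA)
y <- c(2, NA, 4, 1, NA)

dat <- cbind(x, y)

# Number of non-missing values in each row (n_present is a name used here for illustration)
n_present <- rowSums(!is.na(dat))
n_present
## [1] 2 1 1 2 0

# A count of 0 negates to TRUE, so this flags rows where every value is NA
idx_na <- !n_present
idx_na
## [1] FALSE FALSE FALSE FALSE  TRUE

# rowSums with na.rm=TRUE gives 0 for an all-NA row; overwrite those rows with NA
res <- rowSums(dat, na.rm = TRUE)
res[idx_na] <- NA
res
## [1]  3  3  4  6 NA
```

The second call to rowSums is an extra pass over the data that psum does not make, which is presumably part of the cost difference.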
n = 1e7
x = sample(c(1:10,NA),n,replace=TRUE)
y = sample(c(1:10,NA),n,replace=TRUE)
z = sample(c(1:10,NA),n,replace=TRUE)
library(rbenchmark)
benchmark(psum(x,y,z,na.rm=TRUE),
          psum2(x,y,z,na.rm=TRUE),
          pmin(x,y,z,na.rm=TRUE),
          pmax(x,y,z,na.rm=TRUE), replications=20)
## test replications elapsed relative
## 4 pmax(x, y, z, na.rm = TRUE) 20 26.114 1.019
## 3 pmin(x, y, z, na.rm = TRUE) 20 25.632 1.000
## 2 psum2(x, y, z, na.rm = TRUE) 20 164.476 6.417
## 1 psum(x, y, z, na.rm = TRUE) 20 63.719 2.486
Sven's version (which arguably is the correct one) is roughly 2.6 times slower than psum in this benchmark (about 164 vs. 64 seconds over 20 replications), although whether that matters obviously depends on the application. Anyone want to hack up an inline/Rcpp version?
As for why this doesn't exist: don't know, but good luck getting R-core to make additions like this ... I can't offhand think of a sufficiently widespread *misc package into which this could go ...
The follow-up thread by Matthew on r-devel (which seems to confirm this) is here:
r-devel: There is pmin and pmax each taking na.rm, how about psum?
A quick search on CRAN turns up at least three packages that have a psum function: rccmisc, incadata and kit. kit seems to be the fastest. Below, I reproduce Ben Bolker's benchmark with them added.
benchmark(
rccmisc::psum(x,y,z,na.rm=TRUE),
incadata::psum(x,y,z,na.rm=TRUE),
kit::psum(x,y,z,na.rm=TRUE),
psum(x,y,z,na.rm=TRUE),
psum2(x,y,z,na.rm=TRUE),
replications=20
)
# test replications elapsed relative
# 2 incadata::psum(x, y, z, na.rm = TRUE) 20 20.05 14.220
# 3 kit::psum(x, y, z, na.rm = TRUE) 20 1.41 1.000
# 4 psum(x, y, z, na.rm = TRUE) 20 8.04 5.702
# 5 psum2(x, y, z, na.rm = TRUE) 20 20.44 14.496
# 1 rccmisc::psum(x, y, z, na.rm = TRUE) 20 23.24 16.482
Another approach, whose advantage is that it also works with matrices, just like pmin and pmax.
psum <- function(..., na.rm = FALSE) {
  plus_na_rm <- function(x, y) ifelse(is.na(x), 0, x) + ifelse(is.na(y), 0, y)
  Reduce(if(na.rm) plus_na_rm else `+`, list(...))
}
x = c(1,3,NA,5)
y = c(2,NA,4,1)
psum(x, y)
#> [1] 3 NA NA 6
psum(x, y, na.rm = TRUE)
#> [1] 3 3 4 6
# With matrices
A <- matrix(1:9, nrow = 3)
B <- matrix(c(NA, 2:8, NA), nrow = 3)
psum(A, B)
#> [,1] [,2] [,3]
#> [1,] NA 8 14
#> [2,] 4 10 16
#> [3,] 6 12 NA
psum(A, B, na.rm = TRUE)
#> [,1] [,2] [,3]
#> [1,] 1 8 14
#> [2,] 4 10 16
#> [3,] 6 12 9
Created on 2020-03-09 by the reprex package (v0.3.0)
One caveat: if an element is NA across all the summed objects and na.rm = TRUE, the result will be 0 (and not NA).
For example:
psum(NA, NA, na.rm = TRUE)
#> [1] 0
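If NA is preferred in that case, one possible variant is sketched below. It is not from any of the packages mentioned in this thread; the name psum_na is made up for illustration. It assumes all inputs have the same shape, zero-fills the missing values before summing, and then restores NA wherever every input was missing.

```r
# Hypothetical variant: like the Reduce-based psum, but all-NA positions stay NA
psum_na <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (!na.rm) return(Reduce(`+`, args))

  # Replace NA by 0 in each input; dim attributes of matrices are preserved
  zero_filled <- lapply(args, function(a) { a[is.na(a)] <- 0; a })
  total <- Reduce(`+`, zero_filled)

  # TRUE only where every input is NA at that position
  all_na <- Reduce(`&`, lapply(args, is.na))
  total[all_na] <- NA
  total
}

psum_na(NA, NA, na.rm = TRUE)
#> [1] NA
psum_na(c(1, NA), c(2, NA), na.rm = TRUE)
#> [1]  3 NA
```

Because it relies only on elementwise operations, it should still work with matrices, at the cost of one extra pass over the inputs to compute the all-NA mask.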