Why do R objects not print in a function or a "for" loop?

I have an R matrix named ddd. When I enter this, everything works fine:

y <- 1
shapiro.test(ddd[,y])
ad.test(ddd[,y]) 
stem(ddd[,y]) 
print(y) 

The calls to Shapiro Wilk, Anderson Darling, and stem all work, and extract the same column.

If I put this code in a "for" loop, the calls to Shapiro-Wilk and Anderson-Darling stop working, while the stem-and-leaf call and the print call continue to work.

for (y in 7:10) {
    shapiro.test(ddd[,y])
    ad.test(ddd[,y]) 
    stem(ddd[,y]) 
    print(y)
}

The decimal point is 1 digit(s) to the right of the |

  0 | 0
  0 | 899999
  1 | 0

[1] 7

The same thing happens if I try to write a function. SW and AD do not work; the other calls do.

> D <- function (y) {
+ shapiro.test(ddd[,y])
+ ad.test(ddd[,y]) 
+ stem(ddd[,y]) 
+ print(y)  }

> D(9)

  The decimal point is at the |

   9 | 000
   9 | 
  10 | 00000

[1] 9

Why don't all the calls behave the same way?


Automatic printing is turned off inside a loop, just as it is inside a function. In both cases you need to print a result explicitly if you want to see it. The [1] 9 you are getting appears because you explicitly print the value of y. stem() still shows its display because it writes the plot to the console itself rather than returning an object to be printed.
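A minimal sketch of why the calls differ, using made-up data (x here is hypothetical, not the asker's matrix): shapiro.test() returns an object that is only displayed when something prints it, while stem() produces its output as a side effect.

```r
set.seed(42)               # made-up data, only for illustration
x <- rnorm(20)

res <- shapiro.test(x)     # returns an object; nothing is shown here
class(res)                 # "htest": a list that has its own print method

s <- stem(x)               # the display appears anyway: stem() writes it itself
is.null(s)                 # stem() returns NULL invisibly

print(res)                 # explicit printing works in loops and functions too
```

The same applies to any function whose result you normally see only because the top-level prompt prints it for you.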

Here is an example of how you might go about this.

> DF <- data.frame(A = rnorm(100), B = rlnorm(100))
> y <- 1
> shapiro.test(DF[,y])

    Shapiro-Wilk normality test

data:  DF[, y] 
W = 0.9891, p-value = 0.5895

So we have automatic printing. In the loop we would have to do this:

for(y in 1:2) {
    print(shapiro.test(DF[,y]))
}

If you want to print more tests out, then just add them as extra lines in the loop:

## ad.test() is not in base R; here it is assumed to come from the nortest package
library(nortest)
for(y in 1:2) {
    writeLines(paste("Shapiro-Wilk test for column", y))
    print(shapiro.test(DF[,y]))
    writeLines(paste("Anderson-Darling test for column", y))
    print(ad.test(DF[,y]))
}

But that isn't very appealing unless you like reading through reams of output. Instead, why not save the fitted test objects and then you can print them and investigate them, maybe even process them to aggregate the test statistics and p-values into a table? You can do that using a loop:

## object to save fitted objects in
obj <- vector(mode = "list", length = 2)
## loop
for(y in seq_along(obj)) {
    obj[[y]] <- shapiro.test(DF[,y])
}

We can then look at the test results using

> obj[[1]]

    Shapiro-Wilk normality test

data:  DF[, y] 
W = 0.9891, p-value = 0.5895

for example, or using lapply, which takes care of setting up the object we use to store the results for us:

> obj2 <- lapply(DF, shapiro.test)
> obj2[[1]]

    Shapiro-Wilk normality test

data:  X[[1L]] 
W = 0.9891, p-value = 0.5895

Say we now want to extract the W and p-value data. We can process the object storing all the results to pull out the bits we want, e.g.:

> tab <- t(sapply(obj2, function(x) c(x$statistic, x$p.value)))
> colnames(tab) <- c("W", "p.value")
> tab
          W      p.value
A 0.9890621 5.894563e-01
B 0.4589731 1.754559e-17
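As a variant of the same table, here is a sketch that uses vapply() instead of sapply(), so that the shape of each extracted result is checked. The data are regenerated with an arbitrary seed, so the numbers will not match the output shown above.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility
DF <- data.frame(A = rnorm(100), B = rlnorm(100))
obj2 <- lapply(DF, shapiro.test)

## vapply() stops with an error if any result is not a numeric vector of length 2
tab <- t(vapply(obj2,
                function(x) c(W = unname(x$statistic), p.value = x$p.value),
                FUN.VALUE = c(W = 0, p.value = 0)))
tab
```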

Or for those with a penchant for significance stars:

> tab2 <- lapply(obj2, function(x) c(W = unname(x$statistic), 
+                                    `p.value` = x$p.value))
> tab2 <- data.frame(do.call(rbind, tab2))
> printCoefmat(tab2, has.Pvalue = TRUE)
       W p.value    
A 0.9891  0.5895    
B 0.4590  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This has got to be better than firing output to the screen that you then have to pore through?


Not a new answer, but in addition to the above: flush.console() may be needed to force printing to take place DURING the loop rather than after it, for example in the Windows and macOS GUIs when output buffering is on. The only reason I use print() during a loop is to show progress, e.g., when reading many files.

for (i in 1:10) {
  print(i)
  flush.console()
  for(j in 1:100000)
    k <- 0
}
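If the goal is only to show progress, one alternative is the text progress bar in the utils package. This is a sketch, not part of the answer above; the Sys.sleep() call is a stand-in for real work such as reading a file.

```r
n <- 10
pb <- txtProgressBar(min = 0, max = n, style = 3)   # style 3 shows a percentage
for (i in seq_len(n)) {
    Sys.sleep(0.05)            # placeholder for the real work in each iteration
    setTxtProgressBar(pb, i)   # update the bar to the current iteration
}
close(pb)
```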

Fantastic answer from Gavin Simpson. I took the last bit of magic and turned it into a function.

sw.df <- function ( data ) { 
   obj <- lapply(data, shapiro.test)
   tab <- lapply(obj, function(x) c(W = unname(x$statistic), `p.value` = x$p.value))
   tab <- data.frame(do.call(rbind, tab))
   printCoefmat(tab, has.Pvalue = TRUE)
}

Then you can just call it with your data frame: sw.df(df)

And if you want to try a transformation (assuming every column is numeric and positive): sw.df(log(df))
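shapiro.test() fails on non-numeric columns, so a data frame with a factor or character column will stop the function above. Here is a sketch of a variant that keeps only the numeric columns and also returns the table invisibly so it can be reused. The name sw.numeric and the example data are made up for illustration.

```r
sw.numeric <- function(data) {          # hypothetical name, not from the answer above
    ## keep only the numeric columns
    num <- data[vapply(data, is.numeric, logical(1))]
    obj <- lapply(num, shapiro.test)
    tab <- lapply(obj, function(x) c(W = unname(x$statistic), p.value = x$p.value))
    tab <- data.frame(do.call(rbind, tab))
    printCoefmat(tab, has.Pvalue = TRUE)
    invisible(tab)                      # return the table without printing it again
}

set.seed(1)                             # made-up data
dat <- data.frame(A = rnorm(100), B = rlnorm(100), g = rep(c("a", "b"), 50))
res <- sw.numeric(dat)                  # column g is skipped
```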