How can I print when using %dopar%?
I have a foreach loop that uses %dopar% with doSNOW as the back-end. How can I have the loop print something on each iteration?
My code below is what I'm currently using, but it's not printing anything.
foreach(ntree=rep(25,2), .combine=combine, .packages='randomForest',
        .inorder=FALSE) %dopar% {
  print("RANDOM FOREST")
  randomForest(classForm, data=data, na.action=na.action, do.trace=do.trace, ntree=ntree, mtry=mtry)
}
Output produced by the snow workers gets thrown away by default, but you can use the makeCluster "outfile" option to change that. Setting outfile to the empty string ("") will prevent snow from redirecting the output, often resulting in the output from your print messages showing up on the terminal of the master process.
Just create and register your cluster with something like:
library(doSNOW)
cl <- makeCluster(4, outfile="")
registerDoSNOW(cl)
Your foreach loop doesn't need to change at all.
This works for me with both SOCK clusters and MPI clusters using Rmpi built with Open MPI. On Windows, you won't see any output if you're using Rgui. If you use Rterm.exe instead, you will.
Note that in addition to your own output, you'll see messages produced by snow which can also be useful.
For a progress bar, doSNOW (as of version 1.0.14) has a progress option. Here is a complete example:
library(doSNOW)
library(tcltk)
library(randomForest)
cl <- makeSOCKcluster(3)
registerDoSNOW(cl)
ntasks <- 100
pb <- tkProgressBar(max=ntasks)
progress <- function(n) setTkProgressBar(pb, n)
opts <- list(progress=progress)
x <- matrix(runif(500), 100)
y <- gl(2, 50)
rf <- foreach(ntree=rep(25, ntasks), .combine=combine,
              .multicombine=TRUE, .packages='randomForest',
              .options.snow=opts) %dopar% {
  randomForest(x, y, ntree=ntree)
}
# Clean up the progress bar window and the worker processes
close(pb)
stopCluster(cl)
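If tcltk isn't available (for example, on a headless server), the same pattern should work with the text progress bar from the utils package. This is a minimal sketch under that assumption; the workload (a short sleep followed by sqrt) is only a placeholder and not part of the original example:

```r
library(doSNOW)

cl <- makeSOCKcluster(2)
registerDoSNOW(cl)

ntasks <- 20
# txtProgressBar() and setTxtProgressBar() come from utils,
# which is attached by default in a normal R session
pb <- txtProgressBar(max = ntasks, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

# Placeholder workload: each task sleeps briefly and returns sqrt(i)
res <- foreach(i = seq_len(ntasks), .combine = c,
               .options.snow = opts) %dopar% {
  Sys.sleep(0.1)
  sqrt(i)
}

close(pb)
stopCluster(cl)
```

The bar is drawn in the console of the master process, so it does not depend on the outfile setting of the cluster.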
The progress option is fairly general, so you could simply print a message using a function such as:
progress <- function(n) cat(sprintf("task %d is complete\n", n))
The function can take 0, 1, or 2 arguments. The first supplied argument is the total number of completed tasks, and the second is the task number of the task that just finished.
The simplest example just prints a period (.) each time a task completes:
progress <- function() cat('.')
This example displays both arguments and can be used to demonstrate that tasks aren't always completed in order:
progress <- function(nfin, tag) {
  cat(sprintf('tasks completed: %d; tag: %d\n', nfin, tag))
}
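Since the callback appears to run in the master process (the tkProgressBar example relies on this), it can also accumulate state rather than just print. The sketch below records the order in which tasks finish; make_tracker is a hypothetical helper introduced here, not part of doSNOW, and the callback is invoked by hand so the snippet runs without a cluster:

```r
# Hypothetical helper: returns a two-argument progress callback plus an
# accessor for the task tags it has seen so far.
make_tracker <- function() {
  finished <- integer(0)
  list(
    progress = function(nfin, tag) {
      # Append the tag of the task that just completed
      finished <<- c(finished, as.integer(tag))
      invisible(nfin)
    },
    order = function() finished
  )
}

tracker <- make_tracker()

# With a registered doSNOW cluster this would be passed as:
#   opts <- list(progress = tracker$progress)
#   foreach(..., .options.snow = opts) %dopar% { ... }
# Here the callback is called manually to show what it records.
tracker$progress(1, 3)
tracker$progress(2, 1)
tracker$progress(3, 2)
tracker$order()
```

After the three manual calls, tracker$order() holds the tags in completion order (3, 1, 2), which is the kind of out-of-order record you would expect to see from a real run.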
There are a number of good solutions posted here, but I find it easiest to log to a socket and use a separate process to output the log calls in a console.
I use the following function:
log.socket <- make.socket(port=4000)
Log <- function(text, ...) {
  msg <- sprintf(paste0(as.character(Sys.time()), ": ", text, "\n"), ...)
  cat(msg)
  write.socket(log.socket, msg)
}
You can then place log statements in the code such as:
Log("Processing block %d of %d", i, n.blocks)
Log output can be viewed in real time using any simple socket listening tool. For example, using netcat on Linux:
nc -l 4000
The above log statement would display in the netcat terminal as:
2014-06-25 12:30:45: Processing block 2 of 13
This method has the advantage of working remotely and provides as detailed output as you care to log.
p.s. For those on Windows, see Jon Craton's netcat port.
p.p.s. I'm guessing the write.socket R function probably isn't thread-safe, but unless you're logging at high frequency, you're unlikely to run into any issue. Something to be aware of though.
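One detail worth checking: with a doSNOW cluster each worker is a separate R process, so a socket opened on the master is probably not usable from inside %dopar%; opening it inside the worker is the safer assumption. The sketch below separates message formatting from the transport so either can be swapped. The names make_logger and write_fn are hypothetical and introduced only for illustration:

```r
# Hypothetical helper: builds a Log() function around any writer that
# accepts a single string (cat, a socket writer, a file appender, ...).
make_logger <- function(write_fn = cat) {
  function(text, ...) {
    stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
    msg <- paste0(stamp, ": ", sprintf(text, ...), "\n")
    write_fn(msg)
    invisible(msg)
  }
}

# Console logger, handy for trying the format without a listener
Log <- make_logger()
Log("Processing block %d of %d", 2, 13)

# Inside a worker, the writer could wrap a socket opened in that process.
# This needs something listening on port 4000, so it is left commented out:
#   sock <- make.socket(port = 4000)
#   Log <- make_logger(function(m) write.socket(sock, m))
```

Keeping the writer as an argument also makes it easy to fall back to plain cat when no listener is running.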
A way I've kept track of progress on nodes during long operations is to create a progress bar using tkProgressBar from the tcltk package. It's not quite what you asked for, but it should let you see something from the nodes. At least it does when the cluster is a socket cluster running on the local host (a Windows machine, in my case). The potential problem is that the progress bar either remains and clutters your monitor, or it gets closed and the printed info is gone. For me, that wasn't a problem, since I just wanted to know what the current status was.
library(parallel)
library(doSNOW)
cl<-makeCluster(detectCores(),type="SOCK")
registerDoSNOW(cl)
Using your code,
foreach(ntree=rep(25,2), .combine=combine, .packages=c('randomForest','tcltk'),
        .inorder=FALSE) %dopar% {
  mypb <- tkProgressBar(title = "R progress bar", label = "",
                        min = 0, max = 1, initial = 0, width = 300)
  setTkProgressBar(mypb, 1, title = "RANDOM FOREST", label = NULL)
  ans <- randomForest(classForm, data=data, na.action=na.action, do.trace=do.trace, ntree=ntree, mtry=mtry)
  close(mypb)
  ans
}
Here's a more general use example:
jSeq <- seq_len(30)
foreach(i = seq_len(2), .packages = c('tcltk', 'foreach')) %dopar% {
  mypb <- tkProgressBar(title = "R progress bar", label = "",
                        min = 0, max = max(jSeq), initial = 0, width = 300)
  foreach(j = jSeq) %do% {
    Sys.sleep(.1)
    setTkProgressBar(mypb, j, title = "RANDOM FOREST", label = NULL)
  }
  NULL
}