H2O single node vs. cluster

I have recently started learning about H2O AutoML, and I am wondering which of the following options works better: a single node with 6 GB of memory, or a cluster of three nodes with 2 GB of memory each?

  1. java -Xmx6g -jar h2o.jar -name MyCluster
  2. java -Xmx2g -jar h2o.jar & java -Xmx2g -jar h2o.jar & java -Xmx2g -jar h2o.jar &

If there are drawbacks to a single-node deployment, can you recommend any methods to optimize its performance? Thanks in advance!


Solution 1:

My guess is that the first approach will perform better because it involves less context switching. I'm not too familiar with H2O, but I assume it starts one thread per core. If so, running 3 H2O instances on the same machine, as your second command does, gives you 3 threads per core, which increases the number of context switches and therefore reduces performance.
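To make that arithmetic concrete, here is a minimal Python sketch. It rests on the same assumption as above (one worker thread per core per instance, which I have not verified for H2O) and on all instances sharing one machine. `threads_per_core` is a hypothetical helper for illustration, not part of H2O.

```python
import os
from typing import Optional


def threads_per_core(instances: int, cores: int,
                     threads_per_instance: Optional[int] = None) -> float:
    """Return how many worker threads compete for each core.

    Assumption (not verified for H2O): unless told otherwise, every
    instance starts one worker thread per core.
    """
    if threads_per_instance is None:
        threads_per_instance = cores
    return instances * threads_per_instance / cores


cores = os.cpu_count() or 1

# Option 1: one 6 GB instance on this machine.
print("single node:", threads_per_core(1, cores))

# Option 2: three 2 GB instances on the same machine.
print("three nodes:", threads_per_core(3, cores))
```

Under that assumption the ratio goes from 1 to 3 when you move from option 1 to option 2. If each instance's thread count were capped so that the total matched the number of cores, for example `threads_per_core(3, 6, 2)`, the ratio would drop back to 1.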

I'm also pretty sure that H2O can work with large amounts of memory. It can pool the arrays it creates, so the actual data should not need much garbage collection.