Run validation on 1 GPU while training on multiple GPUs in PyTorch Lightning
Is there any way to execute the validation_step method on a single GPU while running training_step on multiple GPUs using DDP?
The reason I want to do this is that several of the metrics I want to implement require access to the complete data, and running validation on a single GPU would ensure that. I have tried the validation_step_end method, but I am still only getting part of the data. That post is here: Stack Overflow Post
I am afraid this is not possible. However, the TorchMetrics package was developed with multi-GPU support in mind, so if your custom metric is derived from the TorchMetrics Metric base class, it should work in your multi-GPU setting as well.