PyTorch model gradients are printed correctly but copied incorrectly
I want to copy the gradients of the loss with respect to the weights for different data samples using PyTorch. In the code below, I iterate over one sample at a time from the data loader (batch size = 1) and collect the gradients for the first fully connected (fc1) layer. The gradients should differ between samples. The print statement shows the correct gradients, which are indeed different for different samples, but when I store them in a list I get the same gradients repeated. Any suggestions would be much appreciated. Thanks in advance!
grad_list = []
for data in test_loader:
    inputs, labels = data[0], data[1]
    inputs = torch.autograd.Variable(inputs)
    labels = torch.autograd.Variable(labels)
    # zero the parameter gradients
    optimizer.zero_grad()
    # forward + backward
    output = target_model(inputs)
    loss = criterion(output, labels)
    loss.backward()
    grad_list.append(target_model.fc1.weight.grad.data)
    print(target_model.fc1.weight.grad.data)
Try using clone and detach instead:
grad_list.append(target_model.fc1.weight.grad.clone().detach())
The .data property you are appending to your list is a mutable reference to the storage of the gradient tensor (i.e. the actual memory address and the values contained within). Every entry in your list therefore points at the same tensor, which is overwritten on each backward pass, so the list ends up showing the last gradient repeated. What you need to do is create a copy of the gradient tensor (with clone) and remove it from the computational graph (with detach) so that it no longer aliases that storage and cannot interfere with gradient computation.
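If it helps to see the difference in isolation, here is a minimal sketch; the toy nn.Linear model, MSELoss, and random samples are just placeholders for your own setup, not your actual code:

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)        # stands in for target_model.fc1
criterion = nn.MSELoss()
samples = [(torch.randn(1, 3), torch.randn(1, 1)) for _ in range(3)]

aliased, copied = [], []
for x, y in samples:
    # zero in place (set_to_none=False) so the same grad tensor is reused,
    # which is the situation the question runs into
    model.zero_grad(set_to_none=False)
    loss = criterion(model(x), y)
    loss.backward()
    aliased.append(model.weight.grad.data)             # reference to the same tensor
    copied.append(model.weight.grad.clone().detach())  # independent snapshot

print(aliased[0].data_ptr() == aliased[1].data_ptr())  # True: same underlying storage
print(torch.equal(aliased[0], aliased[1]))             # True: both show the last gradient
print(torch.equal(copied[0], copied[1]))               # False (in general): one gradient per sample

The clone().detach() entries stay fixed because each one owns its own memory, while the .data entries keep tracking whatever is currently stored in the parameter's grad tensor.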