How to parallelize this matrix operation

In the calculation of the resistance distance matrix there is a bottleneck in the final step, which is as follows:

matrix1[i, j] = matrix2[i, i] - 2*matrix2[i, j] + matrix[j, j]

Every algorithm I find parallelizes every step except this one, so right now I just have a double for loop. The matrices are 16000x16000 so this is quite time costly. Is there a way to speed up such an operation?


How about:

matrix1 = matrix2.diagonal()[:,None] - 2*matrix2 + matrix.diagonal()[None,:]