Why does corrcoef return a matrix?
It seems strange to me that np.corrcoef returns a matrix.
correlation1 = corrcoef(Strategy1Returns,Strategy2Returns)
[[ 1. -0.99598935]
[-0.99598935 1. ]]
Does anyone know why this is the case and whether it is possible to return just one value in the classical sense?
Solution 1:
It allows you to compute correlation coefficients of >2 data sets, e.g.
>>> from numpy import *
>>> a = array([1,2,3,4,6,7,8,9])
>>> b = array([2,4,6,8,10,12,13,15])
>>> c = array([-1,-2,-2,-3,-4,-6,-7,-8])
>>> corrcoef([a,b,c])
array([[ 1. , 0.99535001, -0.9805214 ],
[ 0.99535001, 1. , -0.97172394],
[-0.9805214 , -0.97172394, 1. ]])
Here we can get the correlation coefficient of a,b (0.995), a,c (-0.981) and b,c (-0.972) at once. The two-data-set case is just a special case of N-data-set class. And probably it's better to keep the same return type. Since the "one value" can be obtained simply with
>>> corrcoef(a,b)[1,0]
0.99535001355530017
there's no big reason to create the special case.
Solution 2:
corrcoef
returns the normalised covariance matrix.
The covariance matrix is the matrix
Cov( X, X ) Cov( X, Y )
Cov( Y, X ) Cov( Y, Y )
Normalised, this will yield the matrix:
Corr( X, X ) Corr( X, Y )
Corr( Y, X ) Corr( Y, Y )
correlation1[0, 0 ]
is the correlation between Strategy1Returns
and itself, which must be 1. You just want correlation1[ 0, 1 ]
.