Calculating Covariance with Python and Numpy
Solution 1:
When a and b are 1-dimensional sequences, numpy.cov(a,b)[0][1] is equivalent to your cov(a,b).
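A quick way to check this equivalence is to compare a hand-written covariance function against NumPy. The cov function below is a hypothetical stand-in for the one defined in the question; it assumes that function computes the sample covariance (dividing by N - 1), which is what numpy.cov does by default. The data values are made up for illustration.

```python
import numpy as np

def cov(a, b):
    """Hypothetical hand-written sample covariance (divides by N - 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

# Illustrative data only
a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]

manual = cov(a, b)               # 1.5 for this data
from_numpy = np.cov(a, b)[0][1]  # off-diagonal entry of the 2x2 matrix, also 1.5
print(manual, from_numpy)
```

If your own function divides by N instead of N - 1, the two numbers will differ by a factor of (N - 1) / N.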
The 2x2 array returned by np.cov(a,b) has elements equal to
cov(a,a) cov(a,b)
cov(a,b) cov(b,b)
(where, again, cov is the function you defined above).
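To see that layout directly, the sketch below builds the full matrix for some illustrative data and picks out its entries. The diagonal holds cov(a,a) and cov(b,b), which are just the sample variances, so they should agree with np.var using ddof=1; the off-diagonal entries are both cov(a,b).

```python
import numpy as np

# Illustrative data only
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

m = np.cov(a, b)  # 2x2 matrix, default sample normalisation (N - 1)

# Diagonal: cov(a, a) and cov(b, b), i.e. the sample variance of each input
var_a, var_b = m[0, 0], m[1, 1]
# Off-diagonal: cov(a, b); the matrix is symmetric, so m[0, 1] == m[1, 0]
cov_ab = m[0, 1]

print(m)
```

For this data the matrix is [[2.5, 1.5], [1.5, 1.5]], so m[0, 1] (or equivalently m[0][1]) is the single covariance value.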
Solution 2:
Thanks to unutbu for the explanation. By default numpy.cov calculates the sample covariance, normalising by N - 1. To obtain the population covariance, normalise by the total number of samples N instead, like this:
numpy.cov(a, b, bias=True)[0][1]
or like this:
numpy.cov(a, b, ddof=0)[0][1]
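The two population forms above should give the same number, and both relate to the default (sample) result by a factor of (N - 1) / N. A small check with illustrative data:

```python
import numpy as np

# Illustrative data only
a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]
n = len(a)

sample = np.cov(a, b)[0][1]                # default: divides by N - 1
pop_bias = np.cov(a, b, bias=True)[0][1]   # divides by N
pop_ddof = np.cov(a, b, ddof=0)[0][1]      # also divides by N (ddof overrides bias when given)

# Rescaling the sample covariance by (N - 1) / N gives the population covariance
rescaled = sample * (n - 1) / n
print(sample, pop_bias, pop_ddof, rescaled)
```

Here the sample covariance is 1.5 and the population covariance is 1.2.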
Solution 3:
Note that starting in Python 3.10, one can obtain the covariance directly from the standard library.
Use statistics.covariance, which returns the sample covariance, a measure of the joint variability of two inputs (the number you're looking for):
from statistics import covariance
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
covariance(x, y)
# 0.75