Cosine Similarity between 2 Number Lists
I want to calculate the cosine similarity between two lists, for example list 1, dataSetI, and list 2, dataSetII. Let's say dataSetI is [3, 45, 7, 2] and dataSetII is [2, 54, 13, 15]. The lists are always of equal length. I want to report the cosine similarity as a number between 0 and 1.
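For reference, the quantity being asked for is the standard cosine similarity of two equal-length vectors A and B with components a_i and b_i:

```latex
\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
           = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
```

In general the result lies between -1 and 1. It is only guaranteed to lie between 0 and 1 when all entries are non-negative, as they are in dataSetI and dataSetII.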
dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]

def cosine_similarity(list1, list2):
    # How to?
    pass

print(cosine_similarity(dataSetI, dataSetII))
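A minimal sketch that fills in the stub using only the standard library. It assumes both lists are numeric and non-empty. The length check and the all-zero check are additions of this sketch, not part of the question:

```python
import math

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]

def cosine_similarity(list1, list2):
    """Cosine similarity of two equal-length numeric lists (standard library only)."""
    if len(list1) != len(list2):
        raise ValueError("lists must have the same length")
    # Dot product: sum of the element-wise products
    dot = sum(x * y for x, y in zip(list1, list2))
    # Euclidean (L2) norm of each list
    norm1 = math.sqrt(sum(x * x for x in list1))
    norm2 = math.sqrt(sum(y * y for y in list2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for an all-zero list")
    return dot / (norm1 * norm2)

print(cosine_similarity(dataSetI, dataSetII))
```

For the example data this prints a value of about 0.9723.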
Solution 1:
You should try SciPy. It has a bunch of useful scientific routines, for example "routines for computing integrals numerically, solving differential equations, optimization, and sparse matrices." It builds on NumPy's fast, optimized array operations for its number crunching. See the SciPy documentation for installation instructions.
Note that spatial.distance.cosine computes the distance, and not the similarity. So, you must subtract the value from 1 to get the similarity.
from scipy import spatial
dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
result = 1 - spatial.distance.cosine(dataSetI, dataSetII)
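To make the distance/similarity relationship concrete, here is a standard-library sketch that defines cosine distance as 1 minus cosine similarity, which is how SciPy documents spatial.distance.cosine. The helper names cosine_sim and cosine_dist are hypothetical and not part of SciPy:

```python
import math

def cosine_sim(u, v):
    # Dot product divided by the product of the Euclidean norms
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cosine_dist(u, v):
    # Cosine distance as 1 - similarity (mirrors SciPy's documented definition)
    return 1 - cosine_sim(u, v)

# Parallel vectors: similarity close to 1, distance close to 0
print(cosine_sim([1, 2], [2, 4]), cosine_dist([1, 2], [2, 4]))
# Opposite vectors: similarity -1, distance 2
print(cosine_sim([1, 0], [-1, 0]), cosine_dist([1, 0], [-1, 0]))
```

So subtracting the distance from 1 recovers the similarity. A distance of 0 means the vectors point the same way, and it can reach 2 for vectors pointing in opposite directions.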
Solution 2:
Another version, based on NumPy only:
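from numpy import dot
from numpy.linalg import norm
a = [3, 45, 7, 2]
b = [2, 54, 13, 15]
# Cosine similarity: dot product divided by the product of the Euclidean norms
cos_sim = dot(a, b) / (norm(a) * norm(b))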
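The one-liner divides by zero when either vector is all zeros; NumPy then returns nan and emits a RuntimeWarning. Below is a sketch that converts the inputs to float arrays and raises an error instead, assuming an exception is preferable to nan in your code. The function name cos_sim is only illustrative:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity with an explicit check for all-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        # The plain one-liner would produce nan here (with a RuntimeWarning)
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return float(np.dot(a, b) / denom)

print(cos_sim([3, 45, 7, 2], [2, 54, 13, 15]))
```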
Solution 3:
You can use the cosine_similarity function from sklearn.metrics.pairwise (see the scikit-learn documentation):
In [23]: from sklearn.metrics.pairwise import cosine_similarity
In [24]: cosine_similarity([[1, 0, -1]], [[-1,-1, 0]])
Out[24]: array([[-0.5]])
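scikit-learn's cosine_similarity takes 2D inputs of shape (n_samples, n_features) and compares every row of the first argument with every row of the second. That is why the call above wraps each vector in an extra pair of brackets and gets a 1x1 array back. The NumPy sketch below illustrates that pairwise idea; it is not scikit-learn's implementation, pairwise_cosine is a hypothetical name, and it assumes no row is all zeros:

```python
import numpy as np

def pairwise_cosine(X, Y):
    """Cosine similarity between every row of X and every row of Y."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    # Scale each row to unit length (assumes no all-zero rows)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Entry [i, j] is the dot product of unit row i of X with unit row j of Y
    return Xn @ Yn.T

print(pairwise_cosine([[1, 0, -1]], [[-1, -1, 0]]))
print(pairwise_cosine([3, 45, 7, 2], [2, 54, 13, 15])[0, 0])
```

With scikit-learn itself, the question's lists can be compared the same way by wrapping each one in a list: cosine_similarity([dataSetI], [dataSetII])[0, 0].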