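Cosine distance and Euclidean distance are both commonly used distance metrics.
There is plenty of reference material on computing the distance between two individual vectors, so I won't repeat it here.
In a project I needed the pairwise distances between the row vectors of two matrices, so I am sharing how I did it.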
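1. Cosine distance
- Straight to the code. Note that, despite its name, the function returns cosine similarity (1 means the two rows point in the same direction); the cosine distance is 1 minus this value: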
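import numpy as np

def cosine_distance(matrix1, matrix2):
    # Dot product of every row of matrix1 with every row of matrix2: shape (m, n)
    matrix1_matrix2 = np.dot(matrix1, matrix2.transpose())
    # L2 norm of each row, reshaped into a column vector
    matrix1_norm = np.sqrt(np.multiply(matrix1, matrix1).sum(axis=1))
    matrix1_norm = matrix1_norm[:, np.newaxis]
    matrix2_norm = np.sqrt(np.multiply(matrix2, matrix2).sum(axis=1))
    matrix2_norm = matrix2_norm[:, np.newaxis]
    # Divide by the outer product of the norms; the result is cosine similarity
    cosine_sim = np.divide(matrix1_matrix2, np.dot(matrix1_norm, matrix2_norm.transpose()))
    return cosine_sim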
- Verification run:
matrix1 = np.array([[1, 1], [1, 2]])
matrix2 = np.array([[2, 1], [2, 2], [2, 3]])
cosine_dis = cosine_distance(matrix1, matrix2)
print(cosine_dis)
Update, 2019-03-07
It turns out there is a ready-made function for this too; I just had not found it earlier:
from sklearn.metrics.pairwise import cosine_similarity
cosine_dis2 = cosine_similarity(matrix1,matrix2)
- Verification:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine_distance(matrix1, matrix2):
    # Dot product of every row of matrix1 with every row of matrix2: shape (m, n)
    matrix1_matrix2 = np.dot(matrix1, matrix2.transpose())
    # L2 norm of each row, reshaped into a column vector
    matrix1_norm = np.sqrt(np.multiply(matrix1, matrix1).sum(axis=1))
    matrix1_norm = matrix1_norm[:, np.newaxis]
    matrix2_norm = np.sqrt(np.multiply(matrix2, matrix2).sum(axis=1))
    matrix2_norm = matrix2_norm[:, np.newaxis]
    # Divide by the outer product of the norms; the result is cosine similarity
    cosine_sim = np.divide(matrix1_matrix2, np.dot(matrix1_norm, matrix2_norm.transpose()))
    return cosine_sim

matrix1 = np.array([[1, 1], [1, 2]])
matrix2 = np.array([[2, 1], [2, 2], [2, 3]])
# Hand-written version
cosine_dis = cosine_distance(matrix1, matrix2)
print('cosine_dis:', cosine_dis)
# scikit-learn version, for comparison
cosine_dis2 = cosine_similarity(matrix1, matrix2)
print('cosine_dis2:', cosine_dis2)
- Result:
[[0.9486833  1.         0.98058068]
 [0.8        0.9486833  0.99227788]]
[[0.9486833  1.         0.98058068]
 [0.8        0.9486833  0.99227788]]
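Both functions above return similarity values, so a true distance still needs the conversion 1 - similarity. The following is a minimal sketch of that conversion using NumPy only. The function name `cosine_distance_from_similarity` and the `eps` parameter are my own assumptions, not part of the original code; `eps` is there only so that an all-zero row does not cause a division by zero.

```python
import numpy as np

def cosine_distance_from_similarity(matrix1, matrix2, eps=1e-12):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    a = np.asarray(matrix1, dtype=float)
    b = np.asarray(matrix2, dtype=float)
    # Row norms, kept as column vectors so they broadcast across each row.
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    # Assumption: clamp zero norms to eps so all-zero rows do not divide by zero.
    a_unit = a / np.maximum(a_norm, eps)
    b_unit = b / np.maximum(b_norm, eps)
    # Dot products of unit vectors are the cosine similarities.
    similarity = a_unit @ b_unit.T
    return 1.0 - similarity

matrix1 = np.array([[1, 1], [1, 2]])
matrix2 = np.array([[2, 1], [2, 2], [2, 3]])
cosine_dist = cosine_distance_from_similarity(matrix1, matrix2)
print(cosine_dist)
```

With this convention, rows pointing in the same direction get a distance of about 0, and orthogonal rows get 1. `scipy.spatial.distance.cdist` with `metric='cosine'` uses the same 1 - similarity definition, so it can serve as a cross-check.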
2. Euclidean distance
- Code:
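import numpy as np

def EuclideanDistances(A, B):
    BT = B.transpose()
    # Dot product of every row of A with every row of B: shape (m, n)
    vecProd = np.dot(A, BT)
    # Squared norm of each row of A, as an (m, 1) column tiled to (m, n)
    SqA = A**2
    sumSqA = np.sum(SqA, axis=1, keepdims=True)
    sumSqAEx = np.tile(sumSqA, (1, vecProd.shape[1]))
    # Squared norm of each row of B, as a length-n row tiled to (m, n)
    SqB = B**2
    sumSqB = np.sum(SqB, axis=1)
    sumSqBEx = np.tile(sumSqB, (vecProd.shape[0], 1))
    SqED = sumSqBEx + sumSqAEx - 2*vecProd
    # Clamp small negative values caused by rounding before taking the root
    SqED[SqED < 0] = 0.0
    ED = np.sqrt(SqED)
    return ED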
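The function above never forms the differences between rows explicitly. It relies on expanding the squared distance, which can be written as follows, where a_i and b_j (my notation, not the original's) denote row i of A and row j of B:

```latex
\[
\lVert a_i - b_j \rVert^2
  = \lVert a_i \rVert^2 + \lVert b_j \rVert^2 - 2\, a_i \cdot b_j
\]
```

In the code, `sumSqAEx` holds the first term, `sumSqBEx` the second, and `vecProd` the dot products, so `SqED` is the left-hand side for every pair (i, j).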
- Verification run:
matrix1 = np.array([[1, 1], [1, 2]])
matrix2 = np.array([[2, 1], [2, 2], [2, 3]])
Euclidean_dis = EuclideanDistances(matrix1, matrix2)
print(Euclidean_dis)
Update, 2019-02-23
It turns out there is already a ready-made function for this, ha. I also used it to double-check the code above:
from scipy.spatial.distance import cdist
dis = cdist(matrix1,matrix2,metric='euclidean')
- Verification code:
from scipy.spatial.distance import cdist

matrix1 = np.array([[1, 1], [1, 2]])
matrix2 = np.array([[2, 1], [2, 2], [2, 3]])
# Hand-written version
Euclidean_dis = EuclideanDistances(matrix1, matrix2)
print(Euclidean_dis)
# SciPy version, for comparison
dis = cdist(matrix1, matrix2, metric='euclidean')
print(dis)
# Element-wise comparison with a floating-point tolerance
print(np.isclose(Euclidean_dis, dis))
- Result:
[[1.         1.41421356 2.23606798]
 [1.41421356 1.         1.41421356]]
[[1.         1.41421356 2.23606798]
 [1.41421356 1.         1.41421356]]
[[ True  True  True]
 [ True  True  True]]
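For small inputs, the same pairwise distances can also be computed by forming every row difference directly with NumPy broadcasting. This is a rough sketch rather than a replacement for `cdist`; the function name `euclidean_distances_broadcast` is my own, and the approach allocates an intermediate array of shape (m, n, d), so it is only reasonable when that fits in memory.

```python
import numpy as np

def euclidean_distances_broadcast(A, B):
    """Pairwise Euclidean distance between rows of A (m, d) and B (n, d)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # (m, 1, d) - (1, n, d) broadcasts to (m, n, d): one difference per row pair.
    diff = A[:, np.newaxis, :] - B[np.newaxis, :, :]
    # Sum the squared differences over the feature axis, then take the root.
    return np.sqrt((diff ** 2).sum(axis=2))

matrix1 = np.array([[1, 1], [1, 2]])
matrix2 = np.array([[2, 1], [2, 2], [2, 3]])
dis_broadcast = euclidean_distances_broadcast(matrix1, matrix2)
print(dis_broadcast)
```

Because it subtracts before squaring, this version cannot produce negative squared distances, so it needs no clamping step. The trade-off is the larger intermediate array.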

