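Similarity metrics are an important component of personalized recommendations: they let us quantify how similar two items are (or, as we will see later, how similar the preferences of two users are).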

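The Jaccard index is a number between 0 and 1 that indicates how similar two sets are.
The Jaccard index of two identical sets is 1.
If two sets have no elements in common, the Jaccard index is 0.
It is calculated by dividing the size of the intersection of the two sets by the size of their union.
We can compute the Jaccard index of the genre sets of two movies to determine how similar the movies are.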
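As a quick check of the definition, here is a minimal sketch in plain Python, independent of the graph. The function name `jaccard` and the sample genre sets are made up for illustration and are not taken from the dataset; returning 0.0 for two empty sets is a convention assumed here, since the ratio is undefined in that case.

```python
def jaccard(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Both sets are empty: the ratio is undefined, so 0.0 is an assumed convention.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical genre sets, for illustration only.
genres_a = {"Action", "Sci-Fi", "Thriller"}
genres_b = {"Action", "Thriller", "Drama", "Crime"}

print(jaccard(genres_a, genres_a))    # identical sets
print(jaccard(genres_a, {"Comedy"}))  # no shared elements
print(jaccard(genres_a, genres_b))    # 2 shared genres out of 5 distinct genres
```

The three calls cover the two boundary cases described above (identical sets and disjoint sets) and one in-between case.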
Which movies are most similar to Inception according to the Jaccard index?
// Find movies that share at least one genre with Inception
MATCH (m:Movie {title: "Inception"})-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(other:Movie)
WITH m, other, COUNT(g) AS intersection
// Collect the full genre list of each movie
MATCH (m)-[:IN_GENRE]->(mg:Genre)
WITH m, other, intersection, COLLECT(mg.name) AS s1
MATCH (other)-[:IN_GENRE]->(og:Genre)
WITH m, other, intersection, s1, COLLECT(og.name) AS s2
// Union: s1 plus the elements of s2 not in s1 (list comprehension; filter() was removed in Neo4j 4.0)
WITH m, other, intersection, s1 + [x IN s2 WHERE NOT x IN s1] AS union, s1, s2
RETURN m.title, other.title, s1, s2, ((1.0 * intersection) / SIZE(union)) AS jaccard
ORDER BY jaccard DESC LIMIT 100
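Analysis:
1. The first MATCH finds Inception and the set of movies (other) that share at least one genre with it.
2. COUNT(g) is the size of the intersection of the genre sets of Inception and each movie in other, that is, the number of genres they share.
3. The expression aliased AS union is the union of s1 and s2: s1 plus the elements of s2 that are not already in s1.
4. ((1.0*intersection)/SIZE(union)) AS jaccard is the index computed with the Jaccard formula above. Multiplying by 1.0 forces floating-point division; otherwise dividing two integers would truncate the result.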
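The steps of the query can be mirrored on ordinary lists to see what each one produces. This is only a sketch: the helper name `union_list` and the two genre lists are hypothetical, and it assumes, as the query does, that neither list contains duplicates.

```python
def union_list(s1, s2):
    """Mirror the query's union step: s1 plus the elements of s2 not already in s1."""
    return s1 + [x for x in s2 if x not in s1]


# Hypothetical genre lists for two movies, for illustration only.
s1 = ["Action", "Sci-Fi", "Thriller"]
s2 = ["Action", "Thriller", "Drama"]

# Plays the role of COUNT(g): the number of genres present in both lists.
intersection = len([x for x in s1 if x in s2])

union = union_list(s1, s2)

# Same arithmetic as ((1.0*intersection)/SIZE(union)) in the query.
jaccard = (1.0 * intersection) / len(union)

print(union)
print(jaccard)
```

With these sample lists, two shared genres out of four distinct ones give an index of 0.5.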
The query produces the following result:

We can apply the same approach to all of a movie's features (genres, actors, directors, and so on):
// Find movies that share at least one genre, actor, or director with Inception
MATCH (m:Movie {title: "Inception"})-[:IN_GENRE|ACTED_IN|DIRECTED]-(t)<-[:IN_GENRE|ACTED_IN|DIRECTED]-(other:Movie)
WITH m, other, COUNT(t) AS intersection
// Collect the full feature list of each movie
MATCH (m)-[:IN_GENRE|ACTED_IN|DIRECTED]-(mt)
WITH m, other, intersection, COLLECT(mt.name) AS s1
MATCH (other)-[:IN_GENRE|ACTED_IN|DIRECTED]-(ot)
WITH m, other, intersection, s1, COLLECT(ot.name) AS s2
// Union: s1 plus the elements of s2 not in s1 (list comprehension; filter() was removed in Neo4j 4.0)
WITH m, other, intersection, s1 + [x IN s2 WHERE NOT x IN s1] AS union, s1, s2
RETURN m.title, other.title, s1, s2, ((1.0 * intersection) / SIZE(union)) AS jaccard
ORDER BY jaccard DESC LIMIT 100
