IR-chapter7: computing scores in a complete search system

efficient scoring and ranking

FastCosineScore
  • constructing a heap to pick out the top K scores instead of sorting all of them
  • Inexact top K document retrieval
  • index elimination
    only consider documents containing query terms whose idf exceeds a preset threshold
    only consider documents containing many (or even all) of the query terms
  • champion list
    for each term, precompute the r documents with the highest weights for that term.
    r does not need to be the same for every term (a rarer term can get a larger r).
  • static quality scoring and ordering
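    net-score(q, d) = g(d) + cosine similarity of q and d, where g(d) is a query-independent static quality score in [0, 1]
    global champion list: for each term, the r documents with the highest g(d) + tf-idf values
    extension: keep two postings lists per term (high and low), each sorted by g(d) value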
  • impact ordering
    postings sorted by a common ordering: document-at-a-time scoring
    postings not sorted by a common ordering: term-at-a-time scoring
    postings ordered by decreasing tf value; two ideas reduce the work:
    1. stop after considering a prefix of each postings list
    2. consider query terms in decreasing order of idf
  • cluster pruning
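    preprocessing: pick √N documents at random as leaders; for every other document (a follower), compute its nearest leader and attach it to that leader's cluster
    query time: compute the cosine similarity from q to each leader, then score only the closest leader L and its followers
    variation: attach each follower to its b1 closest leaders; at query time consider the b2 leaders closest to q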
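
The cluster pruning steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: documents and queries are sparse term-weight dicts, the names `cosine`, `build_clusters` and `search` are made up for this note, the `leaders` argument is a convenience for fixing the leaders by hand, and only the basic case b1 = b2 = 1 is covered. The final top K step uses a heap, as in the first bullet of the list.

```python
import heapq
import math
import random

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_clusters(docs, leaders=None, seed=0):
    """Preprocessing: choose about sqrt(N) random leaders (unless given),
    then attach every other document to its nearest leader."""
    ids = sorted(docs)
    if leaders is None:
        rng = random.Random(seed)
        leaders = rng.sample(ids, max(1, round(math.sqrt(len(ids)))))
    clusters = {leader: [] for leader in leaders}
    for d in ids:
        if d not in clusters:
            nearest = max(leaders, key=lambda l: cosine(docs[d], docs[l]))
            clusters[nearest].append(d)
    return clusters

def search(query, docs, clusters, k):
    """Query time: find the closest leader, score only that leader and its
    followers, and return the top k (doc_id, score) pairs via a heap."""
    best = max(clusters, key=lambda l: cosine(query, docs[l]))
    candidates = [best] + clusters[best]
    scored = [(d, cosine(query, docs[d])) for d in candidates]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

Because only one cluster is scored, the result is an inexact top K: a good document attached to a different leader is simply never seen, which is the gap the b1/b2 variation tries to narrow.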

components of an information retrieval system

  • tiered indexes
    motivation: index elimination may leave a candidate set A with fewer than K documents
    solution (example): set a tf threshold of 20 for tier 1 and 10 for tier 2, so the tier 1 index only has postings entries with tf values exceeding 20, and the tier 2 index only has postings entries with tf values exceeding 10; if tier 1 yields fewer than K results, query processing falls back to tier 2, and so on.
  • designing parsing and scoring function
    query parser - translate the user-specified keywords into a query with various operators
    scoring function - manual configuration or machine-learned scoring
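
The tiered index fallback described above can be sketched as follows. This is a rough illustration, not the textbook's implementation: the function names `build_tiers` and `tiered_candidates` are hypothetical, postings are assumed to be `{term: {doc_id: tf}}`, the tiers are made disjoint here (tier 2 holds entries with tf above 10 but not above 20), and a last tier for all remaining postings is added as an assumption.

```python
def build_tiers(postings, thresholds=(20, 10)):
    """Split postings {term: {doc_id: tf}} into tiers by tf threshold.

    Tier i receives entries whose tf exceeds thresholds[i] but did not
    qualify for an earlier tier; a final tier holds everything else.
    """
    tiers = [dict() for _ in range(len(thresholds) + 1)]
    for term, plist in postings.items():
        for doc_id, tf in plist.items():
            level = len(thresholds)  # default: the catch-all last tier
            for i, threshold in enumerate(thresholds):
                if tf > threshold:
                    level = i
                    break
            tiers[level].setdefault(term, {})[doc_id] = tf
    return tiers

def tiered_candidates(query_terms, tiers, k):
    """Collect candidate docs tier by tier; fall back to the next tier
    only while fewer than k candidates have been found."""
    candidates, seen = [], set()
    for tier in tiers:
        for term in query_terms:
            for doc_id in tier.get(term, {}):
                if doc_id not in seen:
                    seen.add(doc_id)
                    candidates.append(doc_id)
        if len(candidates) >= k:
            break
    return candidates
```

The candidates returned here would still have to be scored (for example with cosine similarity); the sketch only shows how lower tiers are consulted when the higher ones do not supply K documents.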

  • putting it all together

a complete search system

results snippets: snippets of text accompanying each document in the results list for a query.

Vector space scoring and query operator interaction

Google: free-text queries are given near-conjunctive semantics, so only documents containing all or most of the query terms are retrieved.

  • Boolean retrieval
  • wildcard queries
  • phrase queries