Notes on an Information Retrieval Talk

Preface

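I recently attended a talk on IR by Maarten, a leading figure in the field. If I remember correctly, it was the same talk I heard at ESSIR last year, but I learn something new every time I hear it, so I have written up my notes below.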

Query Improvement (online)

  1. Main goals: give users shortcuts and handle errors in queries
  2. Main method: log analysis (AOL dataset)
  3. Main approaches:
    • Query Auto-Completion (QAC): predict the intent the user has in mind but has not yet clearly expressed
    • Query Suggestion: recommendation, ranking & diversity
    • Query Expansion
    • Query Correction
  4. The key is to combine query signals (clicks, time, news, personal, general, location, etc.) with query logs
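
As a rough illustration of log-based QAC, here is a minimal sketch of the simplest baseline I know of: suggest the logged queries that start with the typed prefix, most frequent first. The toy log, the function names and the `k` parameter are all my own assumptions for illustration; nothing here comes from the talk or from the AOL dataset.

```python
from collections import Counter


def build_query_counts(log):
    """Count how often each normalized (lowercased, trimmed) query occurs."""
    return Counter(q.strip().lower() for q in log if q.strip())


def complete(prefix, counts, k=3):
    """Return up to k logged queries starting with prefix, most frequent first."""
    prefix = prefix.lower().lstrip()
    candidates = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically so ties are stable.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in candidates[:k]]


# Hypothetical toy log, made up for this example.
toy_log = [
    "information retrieval", "information retrieval", "information theory",
    "inverted index", "information retrieval", "inverted index",
    "learning to rank",
]
counts = build_query_counts(toy_log)
suggestions = complete("in", counts)
print(suggestions)
```

A real system would presumably re-rank these candidates with the other signals listed above (time, location, personal history) rather than raw frequency alone.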

Getting Content (offline)

  1. Common problems in crawling:
    • Scale
    • Content selection
    • URL filtering
    • Remove duplicate URLs: exact and near duplicates (near duplicates are found by comparing word sequences, e.g. word n-grams)
    • Spam detection: meaningful expressions, sentiment analysis & supervised learning
    • Aggregation: considering anchor text on the web & information among entities
    • Inverted index construction: collect -> tokenize -> remove stopwords -> stem/lemmatize -> index
    • Temporal IR: info can be images, songs, books, news, web pages, videos and apps
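
Two of the items above are easy to sketch in code: near-duplicate detection by comparing word n-grams, and the tokenize -> stopwords -> stem -> index pipeline. The following is a minimal sketch under my own assumptions: Jaccard similarity over word 3-grams is one common way to compare word sequences (the talk did not name a specific measure), the stopword list is a tiny illustrative one, and `crude_stem` is a placeholder that only strips a trailing "s", not a real stemmer or lemmatizer.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text, n=3):
    """Return the set of word n-grams (as tuples) in the text."""
    tokens = tokenize(text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def crude_stem(token):
    """Placeholder for a real stemmer: strips one trailing 's' only."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def build_inverted_index(docs):
    """Map each stemmed, non-stopword term to the sorted doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token in STOPWORDS:
                continue
            index[crude_stem(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical toy collection.
docs = {
    1: "The crawler fetches pages and the indexer builds the index",
    2: "The crawler fetches pages and the indexer builds an index",
    3: "Rankers order documents for a query",
}
sim_12 = jaccard(shingles(docs[1]), shingles(docs[2]))  # near duplicates
sim_13 = jaccard(shingles(docs[1]), shingles(docs[3]))  # unrelated
index = build_inverted_index(docs)
print(sim_12, sim_13, index["crawler"])
```

At web scale, pairwise comparison like this would not be feasible; hashing the shingles into compact sketches is the usual workaround, but that is beyond these notes.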

Query Understanding (online)

  1. The result of query understanding can be presented on the search engine results page (SERP). Some context should be considered:
    • Search goals? search tasks?
    • Semantic topics?
    • Time-sensitive? location-sensitive?
  2. Classifying queries into pre-defined intents is difficult (queries are short & ambiguous); click-through data & session data help.
  3. Intent Discovery (Non-predefined)
    • Shifting intents: intents change with time (Radinsky, 2013)
    • Learning to detect intent shifting (Lefortier, 2014)
      • Queries whose intent shifts from non-fresh to fresh
      • More clicks to some links?
  4. Diversity
    • Extrinsic: query with uncertainty
    • Intrinsic: diversity is part of info needs
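
To make the "more clicks to some links?" signal concrete, here is a minimal sketch of one way to flag a possible intent shift: compare the click distribution for the same query in two time windows and flag it when the distributions differ a lot. This is my own simplification and not the method from the cited work; the total variation distance, the 0.5 threshold and the URLs are all assumptions for illustration.

```python
from collections import Counter


def click_distribution(clicked_urls):
    """Turn a list of clicked URLs into a probability distribution over URLs."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {url: n / total for url, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions (0 = same, 1 = disjoint)."""
    urls = set(p) | set(q)
    return 0.5 * sum(abs(p.get(u, 0.0) - q.get(u, 0.0)) for u in urls)


def intent_shifted(old_clicks, new_clicks, threshold=0.5):
    """Flag a shift when the click distributions differ by more than threshold."""
    distance = total_variation(
        click_distribution(old_clicks), click_distribution(new_clicks)
    )
    return distance > threshold


# Hypothetical clicks for one query in two time windows: users move from a
# reference page to a fresh news page.
old_window = ["wiki/earthquake"] * 8 + ["news/quake-today"] * 2
new_window = ["wiki/earthquake"] * 2 + ["news/quake-today"] * 8
shifted = intent_shifted(old_window, new_window)
print(shifted)
```

A learned detector would use many more features than click shares, but the underlying idea of comparing behavior before and after is the same.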

Ranker (learning to rank)

  1. content-based
  2. structure-based (title, content, tags, time)
  3. interaction-based (click-through, scanning behavior)
  4. documents are represented as feature vectors
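
As a small illustration of how feature vectors feed a learned ranker, here is a minimal sketch of a pairwise linear ranker trained with perceptron-style updates. This is just one simple learning-to-rank variant that I picked for brevity, not necessarily what the talk covered. The three features (content score, title match, click-through rate), the document names and the relevance labels are hypothetical.

```python
def dot(w, x):
    """Dot product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(examples, epochs=10):
    """Learn weights so that more relevant docs score higher than less relevant ones.

    examples: list of (feature_vector, relevance_label) for a single query.
    """
    dim = len(examples[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for xi, yi in examples:
            for xj, yj in examples:
                if yi <= yj:
                    continue  # only consider pairs where doc i is more relevant
                diff = [a - b for a, b in zip(xi, xj)]
                # Perceptron update when the pair is ordered wrongly (or tied).
                if dot(w, diff) <= 0:
                    w = [wk + dk for wk, dk in zip(w, diff)]
    return w


def rank(docs, w):
    """Return doc names sorted by descending score."""
    return sorted(docs, key=lambda name: dot(w, docs[name]), reverse=True)


# Hypothetical features: [content score, title match, click-through rate].
docs = {
    "doc_a": [0.9, 1.0, 0.30],
    "doc_b": [0.5, 0.0, 0.10],
    "doc_c": [0.1, 0.0, 0.02],
}
labels = {"doc_a": 2, "doc_b": 1, "doc_c": 0}
weights = train_pairwise([(docs[name], labels[name]) for name in docs])
ranking = rank(docs, weights)
print(ranking)
```

The point of the sketch is only that content, structure and interaction signals end up as numbers in one vector per document, and the ranker learns how to weigh them.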

Responsible IR

Privacy, Fairness, Accuracy, Transparency (let the system explain why)
