Assignment Progress: Natural Language Processing

Participants:

  余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩

Items checked each week: blog posts (reading notes), answers to the after-class exercises, code, and answers to the guiding questions.

Text Mining and Analytics (12.13)
https://www.coursera.org/learn/text-mining

Week1:

Guiding Questions

Develop your answers to the following guiding questions while watching the video lectures throughout the week.

  1. What does a computer have to do in order to understand a natural language sentence?
  2. What is ambiguity?
  3. Why is natural language processing (NLP) difficult for computers?
  4. What is bag-of-words representation?
  5. Why is this word-based representation more robust than representations derived from syntactic and semantic analysis of text?
  6. What is a paradigmatic relation?
  7. What is a syntagmatic relation?
  8. What is the general idea for discovering paradigmatic relations from text?
  9. What is the general idea for discovering syntagmatic relations from text?
  10. Why do we want to do Term Frequency Transformation when computing similarity of context?
  11. How does BM25 Term Frequency transformation work?
  12. Why do we want to do Inverse Document Frequency (IDF) weighting when computing similarity of context?
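Questions 10 to 12 can be made concrete with a small sketch of a BM25-style term-frequency transformation combined with IDF weighting. This is a minimal illustration, not the full retrieval formula: document-length normalization is omitted, and `k` is the usual saturation parameter with an assumed default of 1.2.

```python
import math

def bm25_tf(count, k=1.2):
    # BM25-style term-frequency transformation: sublinear in the raw
    # count and upper-bounded by k + 1, so no single repeated term
    # can dominate the context similarity.
    return (k + 1) * count / (count + k)

def idf(doc_freq, total_docs):
    # Inverse document frequency: terms that appear in fewer documents
    # get a higher weight when comparing contexts.
    return math.log((total_docs + 1) / doc_freq)

def weight(count, doc_freq, total_docs, k=1.2):
    # Combined weight of a term in a context vector.
    return bm25_tf(count, k) * idf(doc_freq, total_docs)
```

Note how `bm25_tf(1)` is already close to the upper bound `k + 1`, which is exactly why the transformation dampens the effect of very frequent words.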

Incomplete:

Completed:

黃莉婷
http://blog.csdn.net/weixin_40962955/article/details/78828721
梁清源
http://blog.csdn.net/qq_33414271/article/details/78802272
http://www.itdecent.cn/u/337e85e2a284
曾偉
http://www.itdecent.cn/p/9e520d5ccdaa
程會(huì)林
http://blog.csdn.net/qq_35159009/article/details/78836340
余艾鍶
http://blog.csdn.net/xy773545778/article/details/78829053
陳南浩
http://blog.csdn.net/DranGoo/article/details/78850788

Week2:

Guiding Questions
Develop your answers to the following guiding questions while watching the video lectures throughout the week.

  1. What is entropy? For what kind of random variables does the entropy function reach its minimum and maximum, respectively?
  2. What is conditional entropy?
  3. What is the relation between conditional entropy H(X|Y) and entropy H(X)? Which is larger?
  4. How can conditional entropy be used for discovering syntagmatic relations?
  5. What is mutual information I(X;Y)? How is it related to entropy H(X) and conditional entropy H(X|Y)?
  6. What's the minimum value of I(X;Y)? Is it symmetric?
  7. For what kind of X and Y does mutual information I(X;Y) reach its minimum? For a given X, for what Y does I(X;Y) reach its maximum?
  8. Why is mutual information sometimes more useful for discovering syntagmatic relations than conditional entropy?
  9. What is a topic?
  10. How can we define the task of topic mining and analysis computationally? What's the input? What's the output?
  11. How can we heuristically solve the problem of topic mining and analysis by treating a term as a topic? What are the main problems of such an approach?
  12. What are the benefits of representing a topic by a word distribution?
  13. What is a statistical language model? What is a unigram language model? How can we compute the probability of a sequence of words given a unigram language model?
  14. What is the Maximum Likelihood estimate of a unigram language model given a text article?
  15. What is the basic idea of Bayesian estimation? What is a prior distribution? What is a posterior distribution? How are they related to each other? What is Bayes' rule?
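The entropy and mutual-information questions above can be checked numerically. Here is a small sketch for binary word-occurrence variables; the 2x2 joint probability tables in the test are made-up illustrations, not data from the course.

```python
import math

def entropy(p):
    # H(X) for a binary variable with P(X = 1) = p.  Minimum (0) when
    # p is 0 or 1 (no uncertainty); maximum (1 bit) when p = 0.5.
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def mutual_information(joint):
    # I(X;Y) over a 2x2 joint table joint[x][y] = P(X = x, Y = y).
    # I(X;Y) = H(X) - H(X|Y) >= 0, and it is 0 exactly when X and Y
    # are independent; it is symmetric in X and Y.
    px = [joint[0][0] + joint[0][1], joint[1][0] + joint[1][1]]
    py = [joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]]
    mi = 0.0
    for x in range(2):
        for y in range(2):
            if joint[x][y] > 0:
                mi += joint[x][y] * math.log2(joint[x][y] / (px[x] * py[y]))
    return mi
```

An independent table yields zero mutual information, while a perfectly correlated one yields the full bit of H(X), matching the relation I(X;Y) = H(X) - H(X|Y).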

Incomplete: 陳南浩

Completed:
梁清源
http://blog.csdn.net/qq_33414271/article/details/78871154
程會(huì)林
http://www.itdecent.cn/p/61614d406b0f
黃莉婷
http://blog.csdn.net/weixin_40962955/article/details/78877103
余艾鍶
http://blog.csdn.net/xy773545778/article/details/78848613
曾偉
http://blog.csdn.net/qq_39759159/article/details/78882651

Week3:

Guiding Questions
Develop your answers to the following guiding questions while watching the video lectures throughout the week.

  1. What is a mixture model? In general, how do you compute the probability of observing a particular word from a mixture model? What is the general form of the expression for this probability?
  2. What does the maximum likelihood estimate of the component word distributions of a mixture model behave like? In what sense do they "collaborate" and/or "compete"?
  3. Why can we use a fixed background word distribution to force a discovered topic word distribution to reduce its probability on the common (often non-content) words?
  4. What is the basic idea of the EM algorithm? What does the E-step typically do? What does the M-step typically do? In which of the two steps do we typically apply the Bayes rule? Does EM converge to a global maximum?
  5. What is PLSA? How many parameters does a PLSA model have? How is this number affected by the size of our data set to be mined? How can we adjust the standard PLSA to incorporate a prior on a topic word distribution?
  6. How is LDA different from PLSA? What is shared by the two models?
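Questions 3 and 4 can be sketched together: a two-component mixture with a fixed background distribution and one unknown topic distribution, estimated with EM. This is a minimal illustration; `lam` (the background weight), the word counts, and the background probabilities in the test are invented values, and a real run would use a large background corpus.

```python
def em_topic_background(counts, background, lam=0.5, iters=50):
    """Estimate a topic word distribution with EM, mixing a fixed
    background distribution (weight lam) with an unknown topic
    distribution (weight 1 - lam).

    counts: {word: count in the document}
    background: {word: background probability}"""
    words = list(counts)
    # Initialize the topic distribution uniformly.
    topic = {w: 1.0 / len(words) for w in words}
    for _ in range(iters):
        # E-step: probability that an occurrence of w was generated by
        # the topic rather than the background -- an application of
        # Bayes' rule to the hidden component indicator.
        z = {w: (1 - lam) * topic[w]
                / ((1 - lam) * topic[w] + lam * background[w])
             for w in words}
        # M-step: re-estimate the topic distribution from the expected
        # counts assigned to the topic component.
        total = sum(counts[w] * z[w] for w in words)
        topic = {w: counts[w] * z[w] / total for w in words}
    return topic
```

Because the background already explains common words like "the", EM shifts the topic's probability mass toward content words, which is exactly the effect question 3 asks about.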

Incomplete: 余艾鍶
Completed:
程會(huì)林 (why do the normalizations in the formulas differ?)
http://www.itdecent.cn/p/bcef1ad7a530
曾偉
http://www.cnblogs.com/Negan-ZW/p/8179076.html
梁清源
http://blog.csdn.net/qq_33414271/article/details/78938301
黃莉婷 (how LDA works)
http://blog.csdn.net/weixin_40962955/article/details/78941383#t10
陳南浩
http://blog.csdn.net/DranGoo/article/details/78968749

Week4:

Guiding Questions
Develop your answers to the following guiding questions while watching the video lectures throughout the week.

  1. What is clustering? What are some applications of clustering in text mining and analysis?
  2. How can we use a mixture model to do document clustering? How many parameters are there in such a model?
  3. How is the mixture model for document clustering related to a topic model such as PLSA? In what way are they similar? Where are they different?
  4. How do we determine the cluster for each document after estimating all the parameters of a mixture model?
  5. How does hierarchical agglomerative clustering work? How do single-link, complete-link, and average-link work for computing group similarity? Which of these three ways of computing group similarity is least sensitive to outliers in the data?
  6. How do we evaluate clustering results?
  7. What is text categorization? What are some applications of text categorization?
  8. What does the training data for categorization look like?
  9. How does the Naïve Bayes classifier work?
  10. Why do we often use logarithm in the scoring function for Naïve Bayes?
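Questions 9 and 10 can be answered with a minimal multinomial Naïve Bayes sketch using add-one (Laplace) smoothing and scoring in log space. The toy training labels and documents in the test are invented for illustration.

```python
import math
from collections import Counter

def train_nb(docs_by_class):
    # docs_by_class: {label: list of documents, each a list of tokens}.
    # Multinomial Naive Bayes with add-one (Laplace) smoothing.
    vocab = {w for docs in docs_by_class.values() for doc in docs for w in doc}
    n_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for label, docs in docs_by_class.items():
        counts = Counter(w for doc in docs for w in doc)
        denom = sum(counts.values()) + len(vocab)
        log_prior = math.log(len(docs) / n_docs)
        log_lik = {w: math.log((counts[w] + 1) / denom) for w in vocab}
        model[label] = (log_prior, log_lik)
    return model

def classify(model, doc):
    # Score in log space: summing log probabilities avoids the
    # floating-point underflow that multiplying many small
    # probabilities would cause on long documents.  Words outside the
    # training vocabulary are skipped in this simplified sketch.
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[w] for w in doc if w in log_lik)
    return max(model, key=score)
```

The logarithm matters because a 1,000-word document multiplies 1,000 probabilities, each typically far below 1; the product underflows to 0.0 in floating point, while the sum of logs stays well within range and preserves the ranking of classes.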

Incomplete: 余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩
Completed:

Week5:

Incomplete: 余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩
Completed:

Week6:

Incomplete: 余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩
Completed:

Text Retrieval and Search Engines (12.13)

https://www.coursera.org/learn/text-retrieval

Week1:

Incomplete: 余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩
Completed:

Week2:

Incomplete: 余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩
Completed:

Week3:

Incomplete: 余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩
Completed:

Week4:

Incomplete: 余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩
Completed:

Week5:

Incomplete: 余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩
Completed:

Week6:

Incomplete: 余艾鍶, 程會(huì)林, 黃莉婷, 梁清源, 曾偉, 陳南浩
Completed:
