Python nltk: Part-of-Speech Tagging for English

NLP tasks often need to know the part of speech of each word. The pos_tag function in the nltk library handles this reasonably well.

Here is an example:

import nltk

# word_tokenize and pos_tag need NLTK's tokenizer and tagger data,
# installed once via nltk.download(); resource names vary by NLTK version.
line = 'i love this world which was beloved by all the people here'
tokens = nltk.word_tokenize(line)
# ['i', 'love', 'this', 'world', 'which', 'was', 'beloved', 'by',
# 'all', 'the', 'people', 'here']
pos_tags = nltk.pos_tag(tokens)
# [('i', 'RB'), ('love', 'VBP'), ('this', 'DT'), ('world', 'NN'), ('which', 'WDT'),
# ('was', 'VBD'), ('beloved', 'VBN'), ('by', 'IN'), ('all', 'PDT'), ('the', 'DT'),
# ('people', 'NNS'), ('here', 'RB')]
# Print only the nouns (common or proper, singular or plural).
for word, pos in pos_tags:
    if pos in ('NN', 'NNP', 'NNS', 'NNPS'):
        print(word, pos)
# world NN
# people NNS
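
The noun filter above lists all four noun tags explicitly. Since every noun tag in the tag set begins with "NN", a prefix test is a slightly shorter alternative. The sketch below is only an illustration: filter_nouns is a hypothetical helper, not an nltk function, and the tagged list is copied from the output shown above so that it runs without any NLTK models.

```python
def filter_nouns(tagged_tokens):
    """Return the (word, tag) pairs whose tag starts with 'NN'.

    Hypothetical helper for illustration; not part of nltk.
    """
    return [(word, tag) for word, tag in tagged_tokens if tag.startswith("NN")]


# Tagged output copied from the example above (an assumption standing in
# for a live nltk.pos_tag call).
tagged = [
    ("i", "RB"), ("love", "VBP"), ("this", "DT"), ("world", "NN"),
    ("which", "WDT"), ("was", "VBD"), ("beloved", "VBN"), ("by", "IN"),
    ("all", "PDT"), ("the", "DT"), ("people", "NNS"), ("here", "RB"),
]

nouns = filter_nouns(tagged)
for word, tag in nouns:
    print(word, tag)
# world NN
# people NNS
```

The same pattern works for other families, for example tag.startswith("VB") for verbs.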

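As an alternative to nltk, the TextBlob library goes one step further and extracts noun phrases: "computer science", for example, is returned as a single unit rather than as "computer" and "science".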

from textblob import TextBlob

txt = """Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages."""
blob = TextBlob(txt)
# noun_phrases holds the noun phrases TextBlob found in the text.
print(blob.noun_phrases)
# ['natural language processing', 'nlp', 'computer science', 'artificial intelligence', 'computational linguistics']

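For more examples, see Chapter 5 of the official NLTK book.
The tags returned by pos_tag follow the Penn Treebank (University of Pennsylvania) tag set, listed below.

Tag Meaning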
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential there
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
NNPS Proper noun, plural
PDT Predeterminer
POS Possessive ending
PRP Personal pronoun
PRP$ Possessive pronoun
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
RP Particle
SYM Symbol
TO to
UH Interjection
VB Verb, base form
VBD Verb, past tense
VBG Verb, gerund or present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
WDT Wh-determiner
WP Wh-pronoun
WP$ Possessive wh-pronoun
WRB Wh-adverb
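
When the fine-grained distinctions in the table are not needed, the tags can be collapsed into coarse classes by prefix. The following is a minimal sketch under that assumption: coarse_category and the class names "noun", "verb", "adjective", "adverb", and "other" are hypothetical choices for illustration, not part of nltk, and the tagged sentence is copied from the earlier example output.

```python
from collections import Counter

# Prefix-to-class mapping derived from the table above (NN*, VB*, JJ*, RB*).
# The class names are illustrative assumptions.
COARSE_PREFIXES = (
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
)


def coarse_category(tag):
    """Map a Penn Treebank tag to a coarse class; anything else is 'other'."""
    for prefix, name in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return name
    return "other"


# Tagged sentence copied from the pos_tag example earlier in the article.
tagged = [
    ("i", "RB"), ("love", "VBP"), ("this", "DT"), ("world", "NN"),
    ("which", "WDT"), ("was", "VBD"), ("beloved", "VBN"), ("by", "IN"),
    ("all", "PDT"), ("the", "DT"), ("people", "NNS"), ("here", "RB"),
]

# Count how many tokens fall into each coarse class.
counts = Counter(coarse_category(tag) for _, tag in tagged)
print(counts["noun"], counts["verb"], counts["adverb"], counts["other"])
# 2 3 2 5
```

Note that RP (particle) and PRP (personal pronoun) do not match the "RB" prefix, so they fall into "other" rather than being counted as adverbs.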