Code source

From Professor Song Tian's MOOC at Beijing Institute of Technology: find the 15 characters who appear most often in Romance of the Three Kingdoms.

#CalThreeKingdomsV2.py
import jieba

# Frequent words that are not character names, collected over several trial runs
excludes = {"將軍","卻說","荊州","二人","不可","不能","如此","商議","如何","軍士"}
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)                 # segment the text into a list of words
counts = {}
for word in words:
    if len(word) == 1:                  # skip single characters, including punctuation
        continue
    elif word == "諸葛亮" or word == "孔明曰":   # merge different names for one person
        rword = "孔明"
    elif word == "關公" or word == "云長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)              # drop non-names; no KeyError if a word never occurred
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)   # sort by count, largest first
for word, count in items[:15]:
    print("{0:<10}{1:>5}".format(word, count))
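The if/elif chain above can also be written as a lookup table. The sketch below is an alternative, not the course's code: it assumes the text has already been segmented (for example by jieba.lcut) into a list of words, and uses a hypothetical ALIASES dictionary together with collections.Counter, whose most_common(n) returns the n highest counts.

```python
from collections import Counter

# Hypothetical alias table: each alternative name maps to one canonical name
ALIASES = {
    "諸葛亮": "孔明", "孔明曰": "孔明",
    "關公": "關羽", "云長": "關羽",
    "玄德": "劉備", "玄德曰": "劉備",
    "孟德": "曹操", "丞相": "曹操",
}
# Non-name words, the same set as excludes in the code above
EXCLUDES = {"將軍", "卻說", "荊州", "二人", "不可", "不能", "如此", "商議", "如何", "軍士"}


def top_names(words, n=15):
    """Count already-segmented words and return the n most frequent (word, count) pairs."""
    counts = Counter()
    for word in words:
        if len(word) == 1:                    # skip single characters and punctuation
            continue
        counts[ALIASES.get(word, word)] += 1  # fall back to the word itself if it has no alias
    for word in EXCLUDES:
        counts.pop(word, None)                # Counter is a dict subclass, so pop works
    return counts.most_common(n)


print(top_names(["孔明", "諸葛亮", "丞相", "曹操", "孟德", "，", "將軍"], n=2))
# → [('曹操', 3), ('孔明', 2)]
```

Adding a new nickname then means adding one dictionary entry instead of another elif branch.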

Approach

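Segment the text into words with the jieba library. Punctuation marks and line breaks mostly come out as single-character tokens, which the len(word) == 1 check discards, so there is no separate step for removing punctuation or spaces.
Merge the different names used for the same person into one.
Record how often each word appears in a dictionary, {word: count}.
Delete words that are obviously not names from the counts dictionary. Building the excludes set is iterative: run the code, add the non-names that show up to excludes, run it again, and extend the set further.
Turn the counts dictionary into key-value pairs with .items(), then convert them into a list with list(). See below: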

>>> a={"2d":"哈",23:"s"}
>>> print(a)
{'2d': '哈', 23: 's'}
>>> c=a.items()
>>> print(c)
dict_items([('2d', '哈'), (23, 's')])
>>> list(c)
[('2d', '哈'), (23, 's')]
>>> print(c)
dict_items([('2d', '哈'), (23, 's')])
>>> print(list(c))
[('2d', '哈'), (23, 's')]
>>>
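The last two prints in the session show that a.items() gives a view, not a list: list(c) builds a separate list and leaves c as dict_items. A view also stays linked to its dictionary, as this small sketch with made-up counts shows.

```python
counts = {"孔明": 3, "曹操": 5}
view = counts.items()    # a live view of the dictionary, not a copy
snapshot = list(view)    # a list of (key, value) tuples, fixed at this moment

counts["劉備"] = 4         # add a key after taking both

print(len(view))          # → 3, the view sees the new key
print(len(snapshot))      # → 2, the list does not
print(snapshot)           # → [('孔明', 3), ('曹操', 5)]
```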

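Sort with .sort(). The sort key is an anonymous lambda function: lambda x: x[1] takes each (word, count) tuple and returns its second element, the count, so the list is ordered by count. reverse=True orders it from largest to smallest. Note that .sort() reorders items in place and returns None; it does not create a new list.
A for loop then prints the first 15 entries.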
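A small sketch of the sorting step with made-up counts: key=lambda x: x[1] and operator.itemgetter(1) both pick the count out of each (word, count) tuple. sorted() returns a new list, list.sort() reorders in place and returns None, and heapq.nlargest is another standard-library way to take the top n directly.

```python
import heapq
from operator import itemgetter

items = [("孔明", 3), ("曹操", 5), ("劉備", 4)]

# sorted() leaves items alone and returns a new, sorted list
by_count = sorted(items, key=lambda x: x[1], reverse=True)
print(by_count)           # → [('曹操', 5), ('劉備', 4), ('孔明', 3)]

# list.sort() changes items itself and returns None
result = items.sort(key=itemgetter(1), reverse=True)
print(result)             # → None
print(items == by_count)  # → True

# heapq.nlargest picks the top n without sorting the whole list
print(heapq.nlargest(2, items, key=itemgetter(1)))  # → [('曹操', 5), ('劉備', 4)]
```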

Also

The teacher also covered ranking the words in Shakespeare's Hamlet by how often they appear.

#CalHamletV1.py
def getText():
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    txt = txt.lower()                      # normalize case so "The" and "the" count together
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_‘{|}~':
        txt = txt.replace(ch, " ")         # replace special characters with spaces
    return txt

hamletTxt = getText()
words = hamletTxt.split()                  # split on any run of whitespace
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)   # sort by count, largest first
for word, count in items[:10]:
    print("{0:<10}{1:>5}".format(word, count))
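A compact variant of the Hamlet counter, sketched with the standard library rather than taken from the course: str.maketrans and str.translate replace every character in string.punctuation with a space in one pass, and collections.Counter does the counting. Unlike the hand-written list above, string.punctuation also contains the apostrophe, so a word like king's splits into king and s.

```python
import string
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text, ignoring case and ASCII punctuation."""
    # Map each ASCII punctuation character to a space
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    return Counter(words).most_common(n)


print(top_words("The king, the QUEEN; the king!", n=2))  # → [('the', 3), ('king', 2)]
```

To run it on the play, something like with open("hamlet.txt") as f: print(top_words(f.read())) should work, assuming the same hamlet.txt file.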

So

Could we count the characters in Hamlet and in Romance of the Three Kingdoms, leaving out punctuation and spaces?

def getText():
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_‘{|}~':
        txt = txt.strip(ch)   # meant to remove special characters, but strip() only trims both ends
    return txt
l=getText()
count=len(l)
print("Total letters in text: {}".format(count))
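If the goal is only the number of letters, one alternative (not the course's approach) is to count the characters for which str.isalpha() is True, so no punctuation list is needed at all. Note that isalpha() is also True for Chinese characters, so on the Three Kingdoms text this would count Chinese characters and Latin letters together.

```python
def count_letters(text):
    """Count characters for which str.isalpha() is True; punctuation, digits and whitespace are skipped."""
    return sum(1 for ch in text if ch.isalpha())


print(count_letters("To be, or not to be!"))  # → 13
print(count_letters("孔明曰：可。"))            # → 4
```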

For Hamlet, this is roughly how the teacher handled it too, so I'll leave it for now.
But...

#CalThreeKingdomsV2.py
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
for ch in '!"#$%&()*+,-。/:;<=>?@[\\]^_‘{|}~':
    txt = txt.strip(ch)   # meant to remove special characters, but strip() only trims both ends
count=len(txt)
print(count)

Output: 602415
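I added a space to that long string of characters, expecting the count to drop, but it did not move at all. Then I deleted a few characters from it, expecting the count to rise, and again it did not move. So something is wrong. The cause is that str.strip(ch) only removes ch from the beginning and the end of the whole text, never from the middle, so the punctuation inside the novel is never touched.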
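A quick way to see the difference, using a short made-up string: strip() removes the given characters only from the two ends, while replace() removes them everywhere.

```python
text = "，曹操，孔明。"

# Same idea as the loop above: call strip() once per punctuation character
stripped = text
for ch in "，。":
    stripped = stripped.strip(ch)
print(stripped)   # → 曹操，孔明  (the comma in the middle survives)

# replace() removes every occurrence, wherever it is
replaced = text
for ch in "，。":
    replaced = replaced.replace(ch, "")
print(replaced)   # → 曹操孔明
```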
Let me write it again.

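#CalThreeKingdomsV2.py
import re
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
clear = '[!"#$%&()*+,-。/:;<=>?@[\\] ^_‘{|}~]'   # a regex character class: [...]
cleaned = re.sub(clear, "", txt)   # delete every character the class matches (not named str, which is a built-in)
count = len(cleaned)
print(count)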

Output: 555212
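This uses the re library (which I don't understand at all yet), and the number looks more plausible. It is still not an exact count of Chinese characters, though: inside [...], ",-。" is read as a range from "," (U+002C) to "。" (U+3002), so ASCII letters and digits are deleted too, while full-width punctuation such as "，" and "：" and the line breaks are not in the class and are still counted. I also found that re.sub has to be called the right way: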

>>> str=re.sub('[a]',"d",txt)
>>> print(str)
dddd

Without the quotes, [a] is a Python list rather than a pattern string, and the call fails:

>>> str=re.sub([a],"d",txt)
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    str=re.sub([a],"d",txt)
  File "D:\download\python\lib\re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "D:\download\python\lib\re.py", line 294, in _compile
    return _cache[type(pattern), pattern, flags]
TypeError: unhashable type: 'list'

Double quotes work too:

>>> txt="adda"
>>> str=re.sub("[a]","d",txt)
>>> print(str)
dddd
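Building on this, a sketch of two fixes, assuming the goal is the number of Chinese characters named in the title. First, passing the punctuation string through re.escape() before putting it inside [...] escapes "-", "]" and "\", so a hyphen in the list is matched literally instead of forming a range. Second, and more directly, re.findall with the class [\u4e00-\u9fff] counts characters in the CJK Unified Ideographs block; treating that block as "Chinese characters" is an assumption, and rarer characters in the extension blocks (for example U+3400 to U+4DBF) would be missed. The helper names remove_chars and count_hanzi are hypothetical.

```python
import re

PUNCT = '!"#$%&()*+,-。/:;<=>?@[\\]^_‘{|}~'   # the punctuation list from the strip() version above


def remove_chars(text, chars):
    """Delete every character in chars; re.escape stops '-', ']' and '\\' from acting as regex syntax."""
    pattern = "[" + re.escape(chars) + "]"
    return re.sub(pattern, "", text)


def count_hanzi(text):
    """Count characters in the CJK Unified Ideographs block, U+4E00 to U+9FFF."""
    return len(re.findall(r"[\u4e00-\u9fff]", text))


sample = "第一回：宴桃園豪傑三結義, abc 123\n"
print(remove_chars("a-b,c", PUNCT))  # → abc
print(count_hanzi(sample))          # → 11
```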


2022-04-04 Counting the Chinese characters in Romance of the Three Kingdoms
