Learning Python: Day 3

Drawing a word cloud

Steps:

1. Load the shape (mask) of the word cloud

from wordcloud import WordCloud
import jieba
import imageio

mask = imageio.imread('./china.jpg')    # image whose shape the word cloud will take (the mask)

2. Read the novel text

with open('./novel/threekingdom.txt', 'r', encoding='utf-8') as f:
    words = f.read()

    counts = {}  # word -> number of occurrences, e.g. {'曹操': 234, '回寨': 56}
    # Frequent words that are not character names, plus the aliases merged in step 4
    excludes = {"将军", "却说", "丞相", "二人", "不可", "荆州", "不能", "如此", "商议",
                "如何", "主公", "军士", "军马", "左右", "次日", "引兵", "大喜", "天下",
                "东吴", "于是", "今日", "不敢", "魏兵", "陛下", "都督", "人马", "不知",
                "孔明曰","玄德曰","刘备","云长"}

3. Word segmentation

    words_list = jieba.lcut(words)
    # print(words_list)
    for word in words_list:
        if len(word) <= 1:
            continue
        else:
            # Update the count stored in the dictionary.
            # counts[word] = counts[word] + 1 would raise KeyError the first time a word
            # appears, because counts[word] does not exist yet.
            # dict.get(k) returns None for a missing key; dict.get(k, 0) returns 0 instead.
            counts[word] = counts.get(word, 0) + 1
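
The counting loop above can also be written with collections.Counter from the standard library. This is only a minimal sketch: the `tokens` list is made-up sample data that stands in for the output of jieba.lcut, so the example runs without the novel file.

```python
from collections import Counter

# Hypothetical sample tokens standing in for jieba.lcut(words)
tokens = ['曹操', '曰', '孔明', '曹操', '，', '孔明', '曹操', '回寨']

# Count only tokens longer than one character, as the loop above does
counts = Counter(word for word in tokens if len(word) > 1)

print(counts['曹操'])   # 3
print(counts['张飞'])   # 0: a Counter returns 0 for a missing key instead of raising KeyError
```

A Counter is a dict subclass, so the later steps (merging, deleting, sorting) work on it unchanged.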

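4. Filter the words: merge aliases of the same character and delete irrelevant words

    # Merge aliases; .get(..., 0) avoids a KeyError when a word never occurs in the text
    counts['孔明'] = counts.get('孔明', 0) + counts.get('孔明曰', 0)
    counts['玄德'] = counts.get('玄德', 0) + counts.get('玄德曰', 0) + counts.get('刘备', 0)
    counts['关公'] = counts.get('关公', 0) + counts.get('云长', 0)
    for word in excludes:
        counts.pop(word, None)  # unlike del, pop with a default ignores missing keys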
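
The merge-then-delete step can be collected into one small helper. The sketch below is an illustration under assumptions: `merge_aliases` is a hypothetical helper name, and the sample counts are invented so the example is self-contained.

```python
def merge_aliases(counts, aliases):
    """Add each alias count to its canonical name and remove the alias key."""
    for alias, name in aliases.items():
        # pop(alias, 0) returns 0 when the alias never occurred, so no KeyError
        counts[name] = counts.get(name, 0) + counts.pop(alias, 0)
    return counts

# Hypothetical sample counts for illustration
counts = {'孔明': 10, '孔明曰': 5, '玄德': 7, '刘备': 3, '将军': 20}
aliases = {'孔明曰': '孔明', '玄德曰': '玄德', '刘备': '玄德'}
excludes = {'将军', '却说'}

merge_aliases(counts, aliases)
for word in excludes:
    counts.pop(word, None)   # ignore words that are not present

print(counts)   # {'孔明': 15, '玄德': 10}
```

Because the aliases are removed while they are merged, they no longer need to be listed in the exclusion set.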

5. Sorting

    items = list(counts.items())
    print(items)

    def sort_by_count(x):
        return x[1]
    items.sort(key=sort_by_count, reverse=True)
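
Two standard-library alternatives to a hand-written key function are operator.itemgetter and Counter.most_common. The sketch below uses invented sample counts and is meant only to show that both give the same ranking as the sort above.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample counts for illustration
counts = {'曹操': 234, '孔明': 180, '回寨': 56, '玄德': 200}

# sorted() returns a new list; itemgetter(1) plays the role of sort_by_count
items = sorted(counts.items(), key=itemgetter(1), reverse=True)
print(items[:3])   # [('曹操', 234), ('玄德', 200), ('孔明', 180)]

# Counter.most_common(n) returns the n most frequent entries directly
top3 = Counter(counts).most_common(3)
print(top3)        # [('曹操', 234), ('玄德', 200), ('孔明', 180)]
```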

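6. Sequence unpacking

    li=[]
    for i in range(10):
        # Sequence unpacking: split the (word, count) tuple into two variables
        role, count = items[i]
        print(role, count)
        for _ in range(count):      # _ tells the reader that the loop variable is not used
            li.append(role)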

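7. Result

    text=' '.join(li)
    WordCloud(
        font_path='msyh.ttc',  # a font that contains Chinese glyphs (Microsoft YaHei)
        background_color='white',
        width=800,
        height=600,
        mask=mask,  # with a mask, the cloud takes the mask's shape and width/height are ignored
        # do not count adjacent word pairs (collocations) as entries of their own
        collocations=False
    ).generate(text).to_file('Top10.png')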
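
Repeating every name `count` times and joining the result into one string works, but the wordcloud library can also take a word-to-frequency mapping directly through WordCloud.generate_from_frequencies. The following is a sketch under assumptions: `top_frequencies` and `render_cloud` are hypothetical helper names, the sample data is invented, and the font and output file names are the ones used above.

```python
def top_frequencies(items, n=10):
    """Turn the first n (word, count) pairs of a sorted list into a dict."""
    return dict(items[:n])

def render_cloud(frequencies, out_path, font_path='msyh.ttc', mask=None):
    """Draw a word cloud from a {word: count} mapping and save it as an image."""
    from wordcloud import WordCloud   # imported here so only this function needs the library
    WordCloud(
        font_path=font_path,
        background_color='white',
        mask=mask,
    ).generate_from_frequencies(frequencies).to_file(out_path)

# Hypothetical sample data for illustration
items = [('曹操', 234), ('孔明', 180), ('玄德', 120)]
freqs = top_frequencies(items, n=2)
print(freqs)   # {'曹操': 234, '孔明': 180}
# render_cloud(freqs, 'Top10.png', mask=mask) would then write the image
```

With this approach the intermediate list of repeated names is not needed.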

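Anonymous functions

An anonymous function (lambda) is a quick way to define a minimal one-line function, and it can be used anywhere a function is expected.
A lambda expression evaluates to a function object, which can be assigned to a variable. For example, x = lambda x,y:x*y makes x a function that multiplies two numbers; it can be called as x(3,5).

Advantages of anonymous functions:

-They keep code concise
-A function that is used only once does not need a name
-A short function is easier to read at the place where it is used

Format:

lambda parameters: return value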

Example

add=lambda x1,x2:x1+x2   # named add rather than sum, so the built-in sum() is not shadowed
print(add(2,3))

A lambda can take any number of parameters, but its body must be a single expression.
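
A few parameter forms, as a small sketch (the names `answer`, `power` and `largest` are made up for illustration):

```python
# No parameters
answer = lambda: 42

# A default value for the second parameter
power = lambda base, exp=2: base ** exp

# Any number of positional arguments
largest = lambda *nums: max(nums)

print(answer())          # 42
print(power(3))          # 9
print(power(2, 10))      # 1024
print(largest(1, 5, 3))  # 5
```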
The key function in the example below is then rewritten as an anonymous function.

name_info_list = [
    ('张三',4500),
    ('李四',9900),
    ('王五',2000),
    ('赵六',5500),
]
def sort_by_gz(x):
    return x[1]   # sort key: the salary, the second element of each tuple
name_info_list.sort(key = sort_by_gz)
print('After sorting:', name_info_list)

After the change

name_info_list = [
    ('张三',4500),
    ('李四',9900),
    ('王五',2000),
    ('赵六',5500),
]
# The lambda replaces sort_by_gz; reverse=True additionally sorts in descending order
name_info_list.sort(key=lambda x:x[1],reverse=True)
print(name_info_list)

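Lambda functions are a feature of the Python language: when a function is too small to be worth a separate def statement, an anonymous function can be used instead.

Limitation of anonymous functions:

The body can only be a single expression. There is no return statement; the return value is the result of that expression.

Benefit of anonymous functions:

Because the function has no name, there is no risk of a name clash. An anonymous function is also a function object, so it can be assigned to a variable and then called through that variable.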
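
Lambdas are most often passed as the key or callback argument of built-in functions. A small sketch with invented sample data:

```python
# Hypothetical sample data: (name, salary) pairs
staff = [('Ann', 4500), ('Bob', 9900), ('Cid', 2000), ('Dee', 4500)]

# Entry with the highest salary
best = max(staff, key=lambda p: p[1])
print(best)     # ('Bob', 9900)

# Sort by salary descending, then by name ascending for ties
ranked = sorted(staff, key=lambda p: (-p[1], p[0]))
print(ranked)   # [('Bob', 9900), ('Ann', 4500), ('Dee', 4500), ('Cid', 2000)]

# map applies the lambda to every element
names = list(map(lambda p: p[0], staff))
print(names)    # ['Ann', 'Bob', 'Cid', 'Dee']
```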

List comprehensions, filtering with comprehensions, and dict comprehensions

Comprehensions are a characteristic feature of Python. A comprehension is a construct that builds a new data sequence from an existing one.

1. List comprehensions

Until now we have created lists with a for loop

li=[]
for i in range(10):
    li.append(i)
print(li)

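A list comprehension builds a list that meets a given requirement in a very concise way, and the result is usually easy to read. It also tends to run somewhat faster than the equivalent for loop, because it avoids looking up and calling li.append on every iteration.

Format: [expression for variable in iterable if condition], where the if condition is optional

Using a list comprehension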

print([i for i in range(10)])

With a list comprehension, the list is created in a single statement.
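
The expression part does not have to be the loop variable itself, and more than one for clause is allowed. A short sketch:

```python
# The expression part can transform each element
squares = [i * i for i in range(5)]
print(squares)   # [0, 1, 4, 9, 16]

# Two for clauses behave like nested loops, with the first clause as the outer loop
pairs = [(x, y) for x in range(2) for y in 'ab']
print(pairs)     # [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]
```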

Filtering with a list comprehension

Select all the even numbers in a list

li=[]
for i in range(10):
    if i % 2 ==0:
        li.append(i)
print(li)

Using a list comprehension

print([i for i in range(10) if i%2==0])

Select the numbers greater than 0 from a list of 10 random integers

from  random import  randint
num_list=[randint(-10,10) for _ in range(10)]
print(num_list)
print([i for i in num_list if i>0])
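
A trailing if filters elements, while a conditional expression placed before for keeps every element and only changes its value. The sketch below uses a fixed sample list instead of random numbers so the output is repeatable:

```python
nums = [3, -7, 0, 5, -2]   # fixed sample instead of random numbers

# A trailing if filters: elements that fail the test are dropped
positives = [n for n in nums if n > 0]
print(positives)   # [3, 5]

# A conditional expression before for keeps every element and maps it
clipped = [n if n > 0 else 0 for n in nums]
print(clipped)     # [3, 0, 0, 5, 0]
```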
Dict comprehensions

Generate grades for 100 students

stu_grades={'student{}'.format(i):randint(50,100) for i in range (1,101)}
print(stu_grades)

Select all students who scored above 60

print({k:v for k,v in stu_grades.items() if v>60})
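
A dict comprehension can also transform the values, and the same syntax with a single expression gives a set comprehension. A sketch with a small invented grade table:

```python
grades = {'student1': 55, 'student2': 78, 'student3': 92, 'student4': 60}

# Map every student to a pass/fail label (pass means above 60, as in the filter above)
result = {name: ('pass' if score > 60 else 'fail') for name, score in grades.items()}
print(result)
# {'student1': 'fail', 'student2': 'pass', 'student3': 'pass', 'student4': 'fail'}

# A set comprehension: the distinct ten-point bands that occur
bands = {score // 10 * 10 for score in grades.values()}
print(sorted(bands))   # [50, 60, 70, 90]
```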

The matplotlib library

1. Line plot

from matplotlib import pyplot as plt

Plot the sine curve on [0, 2π] using 100 points

import numpy as np

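plt.rcParams["font.sans-serif"] = ['SimHei']   # a font with Chinese glyphs, so the labels below render
plt.rcParams['axes.unicode_minus'] = False     # draw the minus sign correctly with that font

x=np.linspace(0,2*np.pi,num=100)   # 100 evenly spaced points from 0 to 2π, both ends included
print(x)
y=np.sin(x)
plt.plot(x,y,color='g',linestyle='--',label='sin(x)')

cosy=np.cos(x)

plt.plot(x,cosy, color='r',label='cos(x)')
plt.xlabel('时间(s)')    # "Time (s)"
plt.ylabel('电压(v)')    # "Voltage (V)"
plt.title('欢迎来到python世界')   # "Welcome to the world of Python"

plt.legend()
plt.show()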
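
To make clear what np.linspace produces, here is a pure-Python sketch of the same idea. The `linspace` function below is a hypothetical stand-in written for illustration, not NumPy's implementation, and it assumes num is at least 2.

```python
import math

def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, both ends included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

x = linspace(0, 2 * math.pi, 100)
y = [math.sin(v) for v in x]

print(len(x))   # 100
print(x[0])     # 0.0
```

Because both end points are included, 100 points divide the interval into 99 equal steps.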

2. Bar chart

import string
from random import randint
# string.ascii_uppercase[0:5] is the string 'ABCDE'
x = ['口红{}'.format(c) for c in string.ascii_uppercase[0:5]]   # "lipstick A" ... "lipstick E"

y = [randint(200, 500) for _ in range(5)]
print(x)
print(y)
plt.xlabel('口红品牌')    # "Lipstick brand"
plt.ylabel('价格(元)')   # "Price (yuan)"
plt.bar(x, y)
plt.show()

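3. Pie chart

Randomly generate the salaries of 6 employees in the range 3500 to 9000 and draw them as a pie chart

from random import randint
import string
counts = [randint(3500, 9000) for _ in range(6)]
labels = ['员工{}'.format(x) for x in string.ascii_lowercase[:6] ]   # "employee a" ... "employee f"
# Offset of each wedge from the center, as a fraction of the radius; only the first wedge is pulled out
explode = [0.1,0,0, 0, 0,0]
colors = ['red', 'purple','blue', 'yellow','gray','green']
plt.pie(counts,explode = explode,shadow=True, labels=labels, autopct = '%1.1f%%',colors=colors)
plt.legend(loc=2)   # loc=2 places the legend in the upper left corner
plt.axis('equal')   # equal axis scaling keeps the pie circular
plt.show()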
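
The autopct format string is applied to each wedge's share of the total, expressed in percent. The sketch below reproduces that arithmetic by hand with invented sample values; it illustrates the calculation and does not call matplotlib.

```python
counts = [4000, 6000, 5000, 5000]   # hypothetical sample salaries

total = sum(counts)
# '%1.1f%%' formats the share with one decimal place; %% is a literal percent sign
shares = ['%1.1f%%' % (value / total * 100) for value in counts]
print(shares)   # ['20.0%', '30.0%', '25.0%', '25.0%']
```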

4. Scatter plot

Data drawn from a normal distribution with mean 0 and standard deviation 1

x=np.random.normal(0,1,100)
y=np.random.normal(0,1,100)
plt.scatter(x,y,alpha=0.5)
plt.show()
x=np.random.normal(0,1,100000)
y=np.random.normal(0,1,100000)
plt.scatter(x,y,alpha=0.1)
plt.show()
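
The shape of the scatter plots follows from the properties of the normal distribution: the points cluster around the mean, and roughly 68% of the values lie within one standard deviation. A standard-library sketch that checks this numerically, assuming a sample size of 20000 and a fixed seed chosen only to make the run repeatable:

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

# gauss(mu, sigma) draws from a normal distribution with mean mu and standard deviation sigma
samples = [rng.gauss(0, 1) for _ in range(20000)]

mean = statistics.fmean(samples)
std = statistics.pstdev(samples)
print(round(mean, 2), round(std, 2))

# Fraction of the values within one standard deviation of the mean
within_one = sum(1 for s in samples if abs(s) <= 1) / len(samples)
print(round(within_one, 2))
```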

Show the top 10 Three Kingdoms characters from the earlier example as a pie chart

    li=[]
    peo_li=[]
    for i in range(10):
        # Sequence unpacking
        role, count = items[i]
        # Also store each character's name and count in the peo_li list
        peo_li.append({'name': role, 'count': count})
        print(role, count)
        for _ in range(count):      # _ tells the reader that the loop variable is not used
            li.append(role)
    counts = []
    labels = []
    for i in range(len(peo_li)):
        counts.append(peo_li[i]['count'])
        labels.append(peo_li[i]['name'])
    # Offset of each wedge from the center; only the first wedge is pulled out
    explode = [0.1, 0, 0, 0, 0, 0,0,0,0,0]
    #colors = ['red', 'purple', 'blue', 'yellow', 'gray', 'green']
    plt.pie(counts, explode=explode, shadow=True, labels=labels, autopct = '%1.1f%%')
    plt.legend(loc=2)
    plt.axis('equal')
    plt.show()
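
The labels and counts for the pie chart can also be taken straight from the sorted pairs with zip, without the intermediate list of dictionaries. A sketch with invented sample data standing in for `items`:

```python
# Hypothetical sorted (name, count) pairs standing in for items
items = [('曹操', 900), ('孔明', 800), ('玄德', 700), ('关公', 500)]

top = items[:3]
# zip(*top) transposes the pairs into one tuple of names and one tuple of counts
labels, counts = zip(*top)
# Pull out only the first wedge, whatever the number of wedges
explode = [0.1] + [0] * (len(top) - 1)

print(labels)    # ('曹操', '孔明', '玄德')
print(counts)    # (900, 800, 700)
print(explode)   # [0.1, 0, 0]
```

Building `explode` from the length of the data avoids a mismatch when fewer than ten entries are available.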

Complete code:

from wordcloud import WordCloud
import jieba
import imageio

mask = imageio.imread('./china.jpg')
# 1. Read the novel text
with open('./novel/threekingdom.txt', 'r', encoding='utf-8') as f:
    words = f.read()

    counts = {}  # word -> number of occurrences, e.g. {'曹操': 234, '回寨': 56}
    excludes = {"将军", "却说", "丞相", "二人", "不可", "荆州", "不能", "如此", "商议",
                "如何", "主公", "军士", "军马", "左右", "次日", "引兵", "大喜", "天下",
                "东吴", "于是", "今日", "不敢", "魏兵", "陛下", "都督", "人马", "不知",
                "孔明曰","玄德曰","刘备","云长"}
    # 2. Word segmentation
    words_list = jieba.lcut(words)
    # print(words_list)
    for word in words_list:
        if len(word) <= 1:
            continue
        else:
            # Update the count stored in the dictionary.
            # counts[word] = counts[word] + 1 would raise KeyError the first time a word
            # appears, because counts[word] does not exist yet.
            # dict.get(k) returns None for a missing key; dict.get(k, 0) returns 0 instead.
            counts[word] = counts.get(word, 0) + 1

    print(len(counts))
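    # 3. Filter the words: merge aliases of the same character and delete irrelevant words
    # .get(..., 0) avoids a KeyError when a word never occurs in the text
    counts['孔明'] = counts.get('孔明', 0) + counts.get('孔明曰', 0)
    counts['玄德'] = counts.get('玄德', 0) + counts.get('玄德曰', 0) + counts.get('刘备', 0)
    counts['关公'] = counts.get('关公', 0) + counts.get('云长', 0)
    for word in excludes:
        counts.pop(word, None)  # unlike del, pop with a default ignores missing keys
    # 4. Sorting: items is a list of (word, count) tuples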
    items = list(counts.items())
    print(items)

    # def sort_by_count(x):
    #     return x[1]
    # items.sort(key=sort_by_count, reverse=True)
    items.sort(key=lambda i: i[1], reverse=True)
    li=[]
    peo_li=[]
    for i in range(10):
        # Sequence unpacking
        role, count = items[i]
        peo_li.append({'name': role, 'count': count})
        print(role, count)
        for _ in range(count):      # _ tells the reader that the loop variable is not used
            li.append(role)
    # 5. Result
    text=' '.join(li)
    WordCloud(
        font_path='msyh.ttc',
        background_color='white',
        width=800,
        height=600,
        mask=mask,
        # do not count adjacent word pairs (collocations) as entries of their own
        collocations=False
    ).generate(text).to_file('Top10.png')

    # Show the characters in a pie chart
    from matplotlib import pyplot as plt

    plt.rcParams["font.sans-serif"] = ['SimHei']
    plt.rcParams['axes.unicode_minus'] = False

    counts = []
    labels = []
    for i in range(len(peo_li)):
        counts.append(peo_li[i]['count'])
        labels.append(peo_li[i]['name'])
    # Offset of each wedge from the center; only the first wedge is pulled out
    explode = [0.1, 0, 0, 0, 0, 0,0,0,0,0]
    #colors = ['red', 'purple', 'blue', 'yellow', 'gray', 'green']
    plt.pie(counts, explode=explode, shadow=True, labels=labels, autopct = '%1.1f%%')
    plt.legend(loc=2)
    plt.axis('equal')
    plt.show()

Exercise: draw a pie chart of the top 10 characters in Dream of the Red Chamber

Complete code

from wordcloud import WordCloud
import jieba
import imageio

mask = imageio.imread('./china.jpg')
# 1. Read the novel text
with open('./novel/all.txt', 'r', encoding='utf-8') as f:
    words = f.read()
    #print(words)
    counts = {}
    excludes = {"什么", "一个", "我们", "你们", "如今", "说道", "知道", "起来", "这里",
               "出来", "众人", "那里", "自己", "一面", "只见", "太太", "两个", "没有",
               "怎么", "不是", "不知", "这个", "听见", "这样", "进来", "咱们", "就是",
               "老太太", "东西", "告诉", "回来", "只是", "大家", "姑娘", "奶奶", "凤姐儿","分节"}
    # 2. Word segmentation
    words_list = jieba.lcut(words)
    for word in words_list:
        if len(word) <= 1:
            continue
        else:
            # Update the count stored in the dictionary.
            # counts[word] = counts[word] + 1 would raise KeyError the first time a word
            # appears, because counts[word] does not exist yet.
            # dict.get(k) returns None for a missing key; dict.get(k, 0) returns 0 instead.
            counts[word] = counts.get(word, 0) + 1

    print(len(counts))
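    # 3. Filter the words: merge aliases of the same character and delete irrelevant words
    # .get(..., 0) avoids a KeyError when a word never occurs in the text;
    # .pop(..., 0) also removes an alias that is not in excludes, so it is not ranked twice
    counts['贾母'] = counts.get('贾母', 0) + counts.get('老太太', 0)
    counts['宝钗'] = counts.get('宝钗', 0) + counts.pop('薛宝钗', 0)
    counts['凤姐'] = counts.get('凤姐', 0) + counts.get('凤姐儿', 0) + counts.pop('王熙凤', 0)
    counts['宝玉'] = counts.get('宝玉', 0) + counts.pop('贾宝玉', 0)
    counts['王夫人'] = counts.get('王夫人', 0) + counts.get('太太', 0)
    counts['黛玉'] = counts.get('黛玉', 0) + counts.pop('林黛玉', 0)
    counts['贾政'] = counts.get('贾政', 0) + counts.pop('老爷', 0)
    for word in excludes:
        counts.pop(word, None)  # unlike del, pop with a default ignores missing keys
    # 4. Sorting: items is a list of (word, count) tuples
    items = list(counts.items())
    #print(items)
    items.sort(key=lambda i: i[1], reverse=True)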
    li=[]
    peo_li=[]
    for i in range(10):
        # Sequence unpacking
        role, count = items[i]
        peo_li.append({'name': role, 'count': count})
        print(role, count)
        for _ in range(count):      # _ tells the reader that the loop variable is not used
            li.append(role)
    # 5. Result
    text=' '.join(li)
    WordCloud(
        font_path='msyh.ttc',
        background_color='white',
        width=800,
        height=600,
        mask=mask,
        # do not count adjacent word pairs (collocations) as entries of their own
        collocations=False
    ).generate(text).to_file('红楼Top10.png')

    # Show the characters in a pie chart
    from matplotlib import pyplot as plt

    plt.rcParams["font.sans-serif"] = ['SimHei']
    plt.rcParams['axes.unicode_minus'] = False

    counts = []
    labels = []
    for i in range(len(peo_li)):
        counts.append(peo_li[i]['count'])
        labels.append(peo_li[i]['name'])
    # Offset of each wedge from the center; only the first wedge is pulled out
    explode = [0.1, 0, 0, 0, 0, 0,0,0,0,0]
    #colors = ['red', 'purple', 'blue', 'yellow', 'gray', 'green']
    plt.pie(counts, explode=explode, shadow=True, labels=labels, autopct = '%1.1f%%')
    plt.legend(loc=2)
    plt.axis('equal')
    plt.show()
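
The two complete scripts differ only in the input file, the alias table and the exclusion set, so the counting part could be shared. The sketch below is one possible refactoring, not the original code: `top_characters` is a hypothetical helper, and the sample tokens are invented so the example runs without jieba or the novel file.

```python
from collections import Counter

def top_characters(tokens, aliases, excludes, n=10):
    """Return the n most frequent names as (name, count) pairs.

    tokens   -- list of words, e.g. the output of jieba.lcut
    aliases  -- {alias: canonical name}
    excludes -- words to drop from the ranking
    """
    counts = Counter()
    for word in tokens:
        if len(word) <= 1:
            continue
        counts[aliases.get(word, word)] += 1   # count an alias under its canonical name
    for word in excludes:
        counts.pop(word, None)
    return counts.most_common(n)

# Hypothetical sample tokens for illustration
tokens = ['宝玉', '贾宝玉', '黛玉', '说道', '宝玉', '林黛玉', '的', '贾母']
aliases = {'贾宝玉': '宝玉', '林黛玉': '黛玉'}
excludes = {'说道'}

top = top_characters(tokens, aliases, excludes, n=2)
print(top)   # [('宝玉', 3), ('黛玉', 2)]
```

The returned pairs can be passed on to the word cloud and pie chart steps unchanged.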