
1. NumPy

The name NumPy is short for "Numerical Python".

1.1 The concept of an array

In the code below, a is a list and b is an array.

import numpy as np
a = [1, 2, 3, 4, 5]  # define a list
b = np.array([1, 2, 3, 4, 5])  # define an array

image.png

1.2 Differences between a list and an ndarray array

image.png

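Summary: arrays and lists share the same indexing mechanism, but when printed, the elements of an array are separated by spaces while the elements of a list are separated by commas.

Why does data analysis use the ndarray rather than the list? Let's look at two examples.

1.2.1 Multiplying both the array and the list by 2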

image.png

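Summary: multiplying a list by 2 only repeats its contents, whereas multiplying an ndarray by 2 multiplies every element by 2. In practice, the ndarray therefore fits our needs for mathematical operations much better.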
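
The screenshot is not available in text form, so here is a minimal sketch of the comparison, reusing the a and b defined in section 1.1. The names doubled_list and doubled_array are only illustrative.

```python
import numpy as np

a = [1, 2, 3, 4, 5]            # a plain Python list
b = np.array([1, 2, 3, 4, 5])  # a NumPy array

doubled_list = a * 2    # repeats the list: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
doubled_array = b * 2   # multiplies each element: 2, 4, 6, 8, 10

print(doubled_list)
print(doubled_array)
```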

1.2.2 Storing three small lists in both the list and the array

a = [[1,2,3], [4,5,6], [7,8,9]]
b = np.array([[1,2,3], [4,5,6], [7,8,9]])

image.png

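Summary: although the list contains three small lists, its own structure is still one-dimensional: it is a sequence of three elements. The ndarray, by contrast, is a two-dimensional structure with 3 rows and 3 columns. A list is inherently one-dimensional, while an ndarray can represent two-dimensional, three-dimensional, and even higher-dimensional structures.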
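
One way to check the difference is to look at the array's ndim and shape attributes. The sketch below reuses the a and b from the code above.

```python
import numpy as np

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(len(a))    # 3: the outer list simply holds three inner lists
print(b.ndim)    # 2: the array has two dimensions
print(b.shape)   # (3, 3): 3 rows and 3 columns
print(b[1, 2])   # 6: row 1, column 2, selected in a single indexing step
```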

These two points are the main reasons why data analysis uses the ndarray rather than the list.

1.3 Ways to create an array
Creating a one-dimensional array

Creating a two-dimensional array

image.png

Creating an array with np.arange

image.png

Creating an array from random numbers

image.png

Note that the random numbers generated by np.random.rand() all fall in the range from 0 (inclusive) to 1 (exclusive).
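
As a rough sketch of the one-dimensional creation methods mentioned so far (the variable names and values here are illustrative, not taken from the screenshots):

```python
import numpy as np

a = np.array([1, 2, 3, 4])   # from a Python list
b = np.arange(5)             # 0, 1, 2, 3, 4
c = np.arange(5, 10)         # 5, 6, 7, 8, 9 (the stop value is excluded)
d = np.arange(5, 10, 2)      # 5, 7, 9 (step of 2)
e = np.random.rand(3)        # three random floats in [0, 1)

print(a, b, c, d, e)
```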

Creating an array with arange()

image.png

Creating a two-dimensional array of random integers

image.png
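
For two-dimensional arrays, a sketch along the same lines might look like this. It assumes the arange() example combines arange() with reshape(), which is a common pattern but may not be exactly what the screenshots show.

```python
import numpy as np

m = np.array([[1, 2], [3, 4], [5, 6]])   # 2-D array from nested lists
n = np.arange(12).reshape(3, 4)          # 0..11 arranged as 3 rows x 4 columns
r = np.random.randint(0, 10, (4, 4))     # 4 x 4 random integers in [0, 10)

print(m.shape)   # (3, 2)
print(n)
print(r)
```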

2. pandas

pandas is an open-source Python library built on top of NumPy. Its name comes from "panel data". pandas provides very intuitive data structures and powerful data processing features.

2.1 Creating the two-dimensional table DataFrame
2.1.1 Creating from a list

image.png

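Summary: a pandas DataFrame is much like a two-dimensional table in Excel, with a row index and a column index. Note that by default both indexes start from 0.

Defining the row index and the column index: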

image.png

Creating from an empty DataFrame:

image.png
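
The three approaches in this subsection can be sketched as follows. The column names date and score and the row labels A, B, C are made up for illustration.

```python
import pandas as pd

# From a nested list; row and column labels default to 0, 1, 2, ...
a = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

# Same data with explicit column and row labels
b = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                 columns=['date', 'score'], index=['A', 'B', 'C'])

# Start from an empty DataFrame and add columns one at a time
c = pd.DataFrame()
c['date'] = [1, 3, 5]
c['score'] = [2, 4, 6]

print(a)
print(b)
print(c)
```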

2.1.2 Creating a DataFrame from a dictionary

image.png

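Creating with the from_dict() function

image.png

As shown, from_dict() turned the dictionary keys into the row index. The orient parameter specifies whether the dictionary keys become column labels or row labels. Its default value is 'columns', meaning the keys become column labels. Setting it to 'index' makes the keys the row labels.

Transposing with .T: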

image.png

Summary: .T (transpose) has the same effect as orient='index'.
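
A small sketch of the three dictionary-based variants, using a made-up dictionary named data:

```python
import pandas as pd

data = {'a': [1, 3, 5], 'b': [2, 4, 6]}

by_columns = pd.DataFrame(data)                           # keys become column labels
by_index = pd.DataFrame.from_dict(data, orient='index')   # keys become row labels
transposed = pd.DataFrame(data).T                         # same layout via transpose

print(by_columns)
print(by_index)
print(transposed)
```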

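2.1.3 Creating a DataFrame from a two-dimensional array:

image.png

Modifying the row index and the column index:

image.png

Viewing the values of the index:

image.png

Setting the row index to the contents of a column:

image.png

Resetting to a numeric index: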

image.png
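
The index operations above might be sketched like this. It assumes rename(), set_index(), and reset_index() are the methods shown in the screenshots; the labels are invented for the example.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape(3, 4),
                 index=[1, 2, 3], columns=['A', 'B', 'C', 'D'])

# Rename selected row and column labels (rename returns a new DataFrame)
a = a.rename(index={1: 'r1', 2: 'r2', 3: 'r3'}, columns={'A': 'a'})
print(a.index.values)   # the row labels as a NumPy array

b = a.set_index('a')    # use the values of column 'a' as the row index
c = b.reset_index()     # go back to a default 0..n-1 numeric index
print(b)
print(c)
```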

2.2 Reading and writing Excel files
2.2.1 Reading a file:

import pandas as pd
data = pd.read_excel('Customer_value.xlsx')
data.head()

image.png

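Setting parameters:

data = pd.read_excel('Customer_value.xlsx', sheet_name=0)

sheet_name: which worksheet to read; 0 is the first sheet
encoding and delimiter: the text encoding and the field separator. These are parameters of pd.read_csv() for text files such as CSV. read_excel() in current pandas versions does not accept them, so they are omitted from the call above.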

2.2.2 Writing a file

a = pd.DataFrame(np.arange(15).reshape(3,5), index=['張三', '李四', '王五'], columns=['A', 'B', 'C', 'D', 'E'])
a.to_excel('data_test.xlsx')
# or
a.to_csv('data_test.csv')

Setting parameters:

a.to_excel('data_test.xlsx', columns=['B', 'C'], index=False)

columns: the columns to write
index=False: do not write the row index
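
to_csv() accepts the same columns and index parameters, so the effect can be checked without creating an Excel file. The sketch below writes to an in-memory buffer instead of a file on disk, which is a substitution made only to keep the example self-contained; the row labels r1 to r3 are placeholders.

```python
import io

import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(15).reshape(3, 5),
                 index=['r1', 'r2', 'r3'], columns=['A', 'B', 'C', 'D', 'E'])

buffer = io.StringIO()                              # stands in for a file on disk
a.to_csv(buffer, columns=['B', 'C'], index=False)   # write only B and C, no row index
buffer.seek(0)
restored = pd.read_csv(buffer)                      # read the text back

print(restored)
```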

2.3 Selecting and processing data

image.png

2.3.1 Selecting data
Selecting by column

image.png

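Summary: d['col2'] returns a one-dimensional Series, which has no column header, whereas d[['col2']] returns a two-dimensional table (a DataFrame). To select several columns, you must pass a list of column names, that is, d[['col1', 'col2']].

Selecting several columns: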

image.png

Selecting by row

image.png

Selecting rows 1 to 2: inside the square brackets, the start number is included and the end number is excluded.

image.png

Summary: d[1:3] and d.iloc[1:3] have the same effect. pandas recommends the iloc method because it is more explicit.
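
The selections described so far can be tried on a small table. The DataFrame d below is a stand-in with assumed labels (r1 to r3, col1 to col3), since the original definition is only available as an image.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                 index=['r1', 'r2', 'r3'], columns=['col1', 'col2', 'col3'])

s = d['col2']             # one column -> one-dimensional Series
t = d[['col2']]           # one column inside a list -> DataFrame with 1 column
u = d[['col1', 'col3']]   # several columns -> pass a list of names

rows_a = d[1:3]           # rows at positions 1 and 2 (position 3 is excluded)
rows_b = d.iloc[1:3]      # the same rows, explicitly positional

print(type(s).__name__, type(t).__name__)
print(rows_a)
```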

Selecting the first row:

image.png

Selecting the last row:

image.png

Note: if no number is given in the parentheses of head() or tail(), the default is 5.
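
A quick sketch of head() and tail() on a made-up ten-row table:

```python
import pandas as pd

d = pd.DataFrame({'col1': list(range(10))})

first_row = d.head(1)    # the first row
last_row = d.tail(1)     # the last row
first_five = d.head()    # no argument: the first 5 rows

print(first_row)
print(last_row)
print(len(first_five))
```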

Selecting a block of data

image.png

Summary: d[['col1', 'col2']][1:3], d[1:3][['col1', 'col2']], and d.iloc[1:3][['col1', 'col2']] all select the same block. pandas recommends the explicit indexers, so d.iloc[1:3][['col1', 'col2']] is the preferred form of the three.
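
The sketch below checks that the chained forms agree, again on a stand-in d with assumed labels. The last two lines go beyond the original text: they show iloc and loc selecting rows and columns in a single step, which avoids chaining two selections.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                 index=['r1', 'r2', 'r3'], columns=['col1', 'col2', 'col3'])

block_a = d[['col1', 'col2']][1:3]
block_b = d[1:3][['col1', 'col2']]
block_c = d.iloc[1:3][['col1', 'col2']]

# Not in the original text: one-step alternatives
block_d = d.iloc[1:3, 0:2]                        # rows and columns by position
block_e = d.loc[['r2', 'r3'], ['col1', 'col2']]   # rows and columns by label

print(block_c)
```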

2.3.2 Filtering data
Filtering on a single condition:

image.png

Filtering on several conditions:

image.png
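
Filtering is done with boolean conditions inside the square brackets. In this sketch the table and the thresholds are invented; note that each condition needs its own parentheses when combined with & (and) or | (or).

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                 index=['r1', 'r2', 'r3'], columns=['col1', 'col2', 'col3'])

single = d[d['col1'] > 1]                        # rows where col1 is greater than 1
multi = d[(d['col1'] > 1) & (d['col2'] == 5)]    # both conditions must hold

print(single)
print(multi)
```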

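2.3.3 Inspecting the data as a whole
Viewing the number of rows and columns:

image.png

Viewing descriptive statistics:

image.png

Viewing the distribution of the data at the 1%, 10%, 25%, 50%, 75%, 90%, and 99% percentiles:

image.png

Viewing an overview of the data:

image.png

Counting how often each value occurs in a column: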

image.png
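
These inspections most likely correspond to shape, describe(), info(), and value_counts(). That mapping is an assumption, since the screenshots are not available, and the data below is made up.

```python
import pandas as pd

d = pd.DataFrame({'col1': [1, 1, 2, 3, 3, 3],
                  'col2': [10, 20, 30, 40, 50, 60]})

print(d.shape)               # (number of rows, number of columns)

stats = d.describe()         # count, mean, std, min, quartiles, max
pct = d.describe(percentiles=[0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99])
print(stats)
print(pct)

d.info()                     # column names, non-null counts, dtypes

counts = d['col1'].value_counts()   # frequency of each value in col1
print(counts)
```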

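2.3.4 Computing, sorting, and deleting data
Computing on data:
The columns of a two-dimensional table support element-wise (broadcast) operations, so addition, subtraction, multiplication, and division can be applied to them directly.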

image.png
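
For instance, new columns can be computed from existing ones. The column names col4 and col5 are hypothetical.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                 index=['r1', 'r2', 'r3'], columns=['col1', 'col2', 'col3'])

d['col4'] = d['col3'] - d['col1']   # column minus column, row by row
d['col5'] = d['col1'] * 2           # the scalar 2 is applied to every row

print(d)
```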

Sorting data:

image.png

Parameters: by specifies the column to sort on. ascending defaults to True, which sorts in ascending order; set it to False to sort in descending order.

image.png

The sort_index() function sorts by the index.
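
A short sketch of both kinds of sorting, assuming sort_values() is the function the by and ascending parameters belong to; the data is invented.

```python
import pandas as pd

d = pd.DataFrame({'col1': [2, 3, 1], 'col2': [30, 10, 20]},
                 index=['b', 'c', 'a'])

by_value = d.sort_values(by='col2', ascending=False)   # largest col2 first
by_index = d.sort_index()                              # rows ordered a, b, c

print(by_value)
print(by_index)
```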

Deleting data:
Deleting by column label:

image.png

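Note that drop() returns a new DataFrame, so you need to assign the result back with a = a.drop(...) for the deletion to take effect. If you call a.drop() on its own and then look at a, nothing has actually been deleted. If you prefer not to reassign, add the parameter inplace=True, which deletes from a itself. This is equivalent to the a = assignment. The code is as follows:

a.drop(columns=['col2', 'col3'], inplace=True)
# or, with the same effect (use one of the two statements, not both in a row):
# a = a.drop(columns=['col2', 'col3'])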

Deleting by row label:

image.png

inplace works the same way when deleting rows, so it is not explained again.
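
The behavior can be verified on a small stand-in table (labels assumed):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                 index=['r1', 'r2', 'r3'], columns=['col1', 'col2', 'col3'])

a.drop(columns='col1')        # result discarded, so a is unchanged
print(list(a.columns))        # still col1, col2, col3

b = a.drop(columns=['col2', 'col3'])   # new DataFrame without col2 and col3
c = a.drop(index=['r1', 'r3'])         # new DataFrame without rows r1 and r3

a.drop(index='r2', inplace=True)       # modifies a itself
print(a)
```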

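2.4 Joining tables
pandas provides the merge(), join(), concat(), and append() functions for joining tables; each is introduced below. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat() takes its place.
First, define two DataFrames, df1 and df2: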

image.png

2.4.1 The merge() function

image.png

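Called without any extra parameters, merge() keeps only the intersection of the two tables: the default is how='inner', an inner join.

Setting the how parameter so that the left table is the primary table and the right table is joined to it: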

image.png

Using the right table as the primary table and joining the left table to it:

image.png

Note that when the right table is the primary table, values that do not exist in the left table are filled with NaN.

Merging on the index:

image.png
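
The merge variants above can be sketched with two small tables. The contents of df1 and df2 here are invented, since the original definitions are only available as an image.

```python
import pandas as pd

df1 = pd.DataFrame({'company': ['A', 'B', 'C'], 'score': [90, 95, 85]})
df2 = pd.DataFrame({'company': ['A', 'B', 'D'], 'price': [20, 180, 30]})

inner = pd.merge(df1, df2)                # default how='inner' on the shared column
left = pd.merge(df1, df2, how='left')     # keep every row of df1
right = pd.merge(df1, df2, how='right')   # keep every row of df2

# Merge on the row index instead of a column; the shared column name
# 'company' gets the suffixes _x and _y
by_index = pd.merge(df1, df2, left_index=True, right_index=True)

print(inner)
print(left)
print(right)
print(by_index)
```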

2.4.2 The join() function:

image.png
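
join() combines tables on their row index. When both tables contain a column with the same name, suffixes must be supplied; the tables and suffixes below are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'company': ['A', 'B', 'C'], 'score': [90, 95, 85]})
df2 = pd.DataFrame({'company': ['A', 'B', 'D'], 'price': [20, 180, 30]})

# Rows are matched by index (0, 1, 2); the duplicated column name is suffixed
joined = df1.join(df2, lsuffix='_x', rsuffix='_y')

print(joined)
```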

2.4.3 The concat() function

image.png

By default, concat() concatenates vertically, and missing values are filled with NaN.
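
A minimal sketch of vertical and horizontal concatenation, with made-up tables:

```python
import pandas as pd

df1 = pd.DataFrame({'company': ['A', 'B', 'C'], 'score': [90, 95, 85]})
df2 = pd.DataFrame({'company': ['A', 'B', 'D'], 'price': [20, 180, 30]})

vertical = pd.concat([df1, df2])                       # axis=0: stack rows
renumbered = pd.concat([df1, df2], ignore_index=True)  # rebuild the index as 0..5
horizontal = pd.concat([df1, df2], axis=1)             # axis=1: side by side

print(vertical)
print(renumbered)
print(horizontal)
```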

2.4.4 The append() function

image.png
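
DataFrame.append() was removed in pandas 2.0, so the screenshot's code only runs on older versions. On current versions, the same result can be obtained with pd.concat(), as in this sketch (the tables and the new row are invented):

```python
import pandas as pd

df1 = pd.DataFrame({'company': ['A', 'B'], 'score': [90, 95]})
df2 = pd.DataFrame({'company': ['C'], 'score': [85]})

# Equivalent of the old df1.append(df2, ignore_index=True)
combined = pd.concat([df1, df2], ignore_index=True)

# Appending a single row: wrap it in a one-row DataFrame first
new_row = pd.DataFrame([{'company': 'D', 'score': 70}])
combined = pd.concat([combined, new_row], ignore_index=True)

print(combined)
```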

3. Matplotlib

3.1 Drawing a line chart

import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")  # suppress warnings
x = [1, 2, 3]
y = [3, 6, 9]
plt.plot(x, y)  # draw the line chart
plt.show()  # display the figure

image.png

import matplotlib.pyplot as plt
import numpy as np
x = [1, 2, 3]
y = [3, 6, 9]
y2 = np.array(y) + 1  # convert to an array so that 1 is added to every element
plt.plot(x, y2, linestyle='--')
plt.plot(x, y)
plt.show()

image.png

3.2 Drawing a bar chart

import matplotlib.pyplot as plt
import numpy as np
x = np.arange(5)
y = [5, 4, 3, 2, 1]
plt.bar(x, y)
plt.show()

image.png

3.3 Drawing a scatter plot

import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(20)
y = np.random.rand(20)
plt.scatter(x, y)
plt.show()

image.png

3.4 Drawing a histogram

import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)  # generate 1000 random values from the standard normal distribution
plt.hist(data, bins=20, edgecolor='blue')
plt.show()

image.png

3.5 Drawing charts with pandas

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = np.random.randn(1000)  # generate 1000 random values from the standard normal distribution
df = pd.DataFrame(data)
df.hist(bins=20, edgecolor='black')
plt.show()

image.png

import matplotlib.pyplot as plt
import pandas as pd
plt.rcParams['font.sans-serif'] = ['SimHei']  # use the SimHei font so Chinese labels are not garbled (the font must be installed)
# Columns: 姓名 = name, 語文 = Chinese, 數學 = math, 外語 = foreign language
df = pd.DataFrame({'姓名':['張三', '李四', '王五'], '語文':[88, 80, 70], '數學':[100, 95, 90], '外語':[95, 98, 99]})
df.set_index(keys='姓名', inplace=True)
df['數學'].plot(kind='line')
df['數學'].plot(kind='bar')

image.png

df['數學'].plot(kind='pie')

image.png

df['數學'].plot(kind='box')

image.png

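3.6 Common data visualization techniques
3.6.1 Adding text labels:

import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")  # suppress warnings
x = [1, 2, 3]
y = [3, 6, 9]
plt.title('這里是標題')  # the chart title ("this is the title")
plt.xlabel('這里是x軸')  # the x-axis label ("this is the x-axis")
plt.ylabel('這里是y軸')  # the y-axis label ("this is the y-axis")
plt.plot(x, y)  # draw the line chart
plt.show()  # display the figure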

image.png

3.6.2 Adding a legend:

import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings("ignore")  # suppress warnings
x = [1, 2, 3]
y = [3, 6, 9]
y2 = np.array(y) + 1
plt.plot(x, y2, linestyle='--', label='y2=y+1')
plt.plot(x, y, label='y=3x')
plt.legend()  # show the labels defined above
plt.show()

image.png

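3.6.3 Setting up dual y-axes:

import matplotlib.pyplot as plt
import numpy as np
x = [1, 2, 3]
y = [3, 6, 9]
y2 = np.array(y) + 1
plt.plot(x, y2, linestyle='--', label='y2=y+1')
plt.legend(loc='upper left')
plt.twinx()  # create a second y-axis; plots made after this call use it
plt.plot(x, y, color='red', label='y=3x')
plt.legend(loc='upper right')
plt.show()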

image.png

3.6.4 Setting the figure size:

import matplotlib.pyplot as plt
import numpy as np
x = [1, 2, 3]
y = [3, 6, 9]
y2 = np.array(y) + 1
plt.figure(figsize=(10,5))  # set the figure size (width, height in inches)
# or use plt.rcParams['figure.figsize'] = [10, 5]
plt.plot(x, y2, linestyle='--', label='y2=y+1')
plt.plot(x, y, label='y=3x')
plt.show()

image.png

3.6.5 Setting the angle of the x-axis tick labels:

import matplotlib.pyplot as plt
import numpy as np
x = [1, 2, 3]
y = [3, 6, 9]
y2 = np.array(y) + 1
plt.xticks(rotation=45)  # rotate the x-axis tick labels by 45 degrees
plt.plot(x, y2, linestyle='--', label='y2=y+1')
plt.plot(x, y, label='y=3x')
plt.show()

image.png

3.6.6 Fixing the display of Chinese text:

import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # use the SimHei font so Chinese labels are not garbled (the font must be installed)

3.6.7 Drawing several charts in one figure:
Method 1:

import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'姓名':['張三', '李四', '王五'], '語文':[88, 80, 70], '數學':[100, 95, 90], '外語':[95, 98, 99]})
df.set_index(keys='姓名', inplace=True)
plt.rcParams['figure.figsize'] = [15, 8]

ax1 = plt.subplot(221)  # 221 means 2 rows, 2 columns, subplot number 1
df['數學'].plot(kind='line')
df['數學'].plot(kind='bar')

ax2 = plt.subplot(222)
df['數學'].plot(kind='box')

ax3 = plt.subplot(223)
df['數學'].plot(kind='pie')

ax4 = plt.subplot(224)
df['數學'].plot(kind='area')
plt.show()

image.png

Method 2:

import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'姓名':['張三', '李四', '王五'], '語文':[88, 80, 70], '數學':[100, 95, 90], '外語':[95, 98, 99]})
df.set_index(keys='姓名', inplace=True)

fig, axes = plt.subplots(2,2,figsize=(15,8))  # a 2 x 2 grid of subplots
ax1, ax2, ax3, ax4 = axes.flatten()
ax1.plot(df['數學'])
ax2.bar([1,2,3], df['數學'])
ax3.scatter([1,2,3], df['數學'])
ax4.pie(df['數學'])
plt.show()

image.png

Python Machine Learning (7): Data Analysis Tools NumPy, pandas, and Matplotlib
