Python Pandas: Data Operations with [ ]

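This article covers some of the operations that "[ ]" supports in Pandas, namely selecting data and modifying it.

"[ ]" is arguably the most basic way to select data. It accepts the following kinds of key:

  • a single column label;
  • a list of column labels;
  • a slice;
  • a boolean index.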
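
As a quick orientation, the four kinds of key listed above can be sketched side by side. The frame `demo` and its values are hypothetical and chosen only so the results are easy to predict; they are not the article's random data.

```python
import numpy as np
import pandas as pd

# Small deterministic frame (hypothetical data, for illustration only).
demo = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.date_range("2020-01-01", periods=4),
    columns=list("ABC"),
)

single_column = demo["A"]          # one column label -> Series
column_list = demo[["A", "C"]]     # list of labels   -> DataFrame
row_slice = demo[1:3]              # slice            -> rows by position
bool_rows = demo[demo["A"] > 3]    # boolean Series   -> rows where True

print(type(single_column).__name__)  # Series
print(column_list.shape)             # (4, 2)
print(row_slice.shape)               # (2, 3)
print(bool_rows.shape)               # (2, 3)
```

Note that a label or a list of labels selects columns, while a slice or a boolean Series selects rows.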

Loading the data

import pandas as pd
import numpy as np

# Eight consecutive days as the row index, four columns A-D of random values
dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))
df

out:
    A   B   C   D
2020-01-01  0.336131    -0.086456   0.096903    -1.230599
2020-01-02  -0.106293   0.111821    1.165342    -1.378462
2020-01-03  -0.933779   0.898738    0.013194    -0.593243
2020-01-04  0.190229    -1.108908   0.597650    2.759475
2020-01-05  -0.647080   1.573537    1.357191    -0.536916
2020-01-06  -0.455373   1.342904    -0.316548   0.145119
2020-01-07  -1.350214   -0.044642   0.501508    1.969973
2020-01-08  -0.474602   -0.384916   1.829222    0.853519

Passing a list

Passing a list of column labels selects those columns in the order given by the list and returns a DataFrame.

df[['C','D']]

    C   D
2020-01-01  0.096903    -1.230599
2020-01-02  1.165342    -1.378462
2020-01-03  0.013194    -0.593243
2020-01-04  0.597650    2.759475
2020-01-05  1.357191    -0.536916
2020-01-06  -0.316548   0.145119
2020-01-07  0.501508    1.969973
2020-01-08  1.829222    0.853519

Passing a single column

Passing a single column label returns a Series; passing a list returns a DataFrame, even when the list has length 1.

df['C']

out:
2020-01-01    0.096903
2020-01-02    1.165342
2020-01-03    0.013194
2020-01-04    0.597650
2020-01-05    1.357191
2020-01-06   -0.316548
2020-01-07    0.501508
2020-01-08    1.829222
Freq: D, Name: C, dtype: float64
df[['C']]

out:
    C
2020-01-01  0.096903
2020-01-02  1.165342
2020-01-03  0.013194
2020-01-04  0.597650
2020-01-05  1.357191
2020-01-06  -0.316548
2020-01-07  0.501508
2020-01-08  1.829222
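
The difference between the two return types can be checked directly. This is a minimal sketch on a hypothetical two-row frame named `frame`; it also confirms that a list key returns columns in list order.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
frame = pd.DataFrame({"C": [1.0, 2.0], "D": [3.0, 4.0]})

as_series = frame["C"]       # scalar label -> 1-D Series
as_frame = frame[["C"]]      # one-element list -> 2-D DataFrame
reordered = frame[["D", "C"]]  # columns come back in list order

print(type(as_series).__name__, as_series.ndim)  # Series 1
print(type(as_frame).__name__, as_frame.ndim)    # DataFrame 2
print(list(reordered.columns))                   # ['D', 'C']
```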

This can be used to swap the values of two columns.

df[['A','B']] = df[['B','A']]
df

out:
    A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-08  -0.384916   -0.474602   1.829222    0.853519

The following assignment through .loc might look like another way to swap a subset of columns.

df.loc[:, ['A', 'B']] = df[['B', 'A']]
df

out:
    A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-08  -0.384916   -0.474602   1.829222    0.853519

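The assignment above does not swap the column values: when the right-hand side is a DataFrame, .loc aligns it on column labels before assigning, so each column receives its own values back. To swap, assign the raw values instead.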

df.loc[:, ['A', 'B']] = df[['B', 'A']].values
df

out:
A   B   C   D
2020-01-01  0.336131    -0.086456   0.096903    -1.230599
2020-01-02  -0.106293   0.111821    1.165342    -1.378462
2020-01-03  -0.933779   0.898738    0.013194    -0.593243
2020-01-04  0.190229    -1.108908   0.597650    2.759475
2020-01-05  -0.647080   1.573537    1.357191    -0.536916
2020-01-06  -0.455373   1.342904    -0.316548   0.145119
2020-01-07  -1.350214   -0.044642   0.501508    1.969973
2020-01-08  -0.474602   -0.384916   1.829222    0.853519

to_numpy() can also be used for the swap.

df.loc[:, ['A', 'B']] = df[['B', 'A']].to_numpy()
df

out:
A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-08  -0.384916   -0.474602   1.829222    0.853519
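
The contrast between a labelled and an unlabelled right-hand side can be reproduced on a small frame. This is a sketch with hypothetical integer data (`frame`, `aligned`, `swapped` are illustrative names), assuming the label-alignment behavior of .loc assignment described in the pandas indexing guide.

```python
import pandas as pd

# Hypothetical frame, for illustration only.
frame = pd.DataFrame({"A": [1, 2], "B": [10, 20]})

# Right-hand side is a DataFrame: .loc aligns on column labels,
# so A receives A and B receives B, and nothing changes.
aligned = frame.copy()
aligned.loc[:, ["A", "B"]] = aligned[["B", "A"]]

# Right-hand side is a bare NumPy array: there are no labels to align on,
# so values are assigned by position and the columns swap.
swapped = frame.copy()
swapped.loc[:, ["A", "B"]] = swapped[["B", "A"]].to_numpy()

print(aligned["A"].tolist(), aligned["B"].tolist())  # [1, 2] [10, 20]
print(swapped["A"].tolist(), swapped["B"].tolist())  # [10, 20] [1, 2]
```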

Using slices

Get the first two rows

df[:2]

out:
A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-02  0.111821    -0.106293   1.165342    -1.378462

Setting a step

df[::2]

out:
A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
df[1::2]

out:
A   B   C   D
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-08  -0.384916   -0.474602   1.829222    0.853519

Reversing the row order

df[::-1]

out:
A   B   C   D
2020-01-08  -0.384916   -0.474602   1.829222    0.853519
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-01  -0.086456   0.336131    0.096903    -1.230599

Assigning through a slice

df[:2] = np.arange(8).reshape(2,4)
df

out:
A   B   C   D
2020-01-01  0.000000    1.000000    2.000000    3.000000
2020-01-02  4.000000    5.000000    6.000000    7.000000
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-08  -0.384916   -0.474602   1.829222    0.853519
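
The slicing examples above all select rows by position, which makes them comparable to .iloc with the same slice. The sketch below checks this on a hypothetical frame of zeros, then repeats the slice assignment; the assigned array has to match the shape of the selected rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame of zeros, for illustration only.
frame = pd.DataFrame(
    np.zeros((4, 2)),
    index=pd.date_range("2020-01-01", periods=4),
    columns=["A", "B"],
)

# An integer slice inside [] picks rows by position, like .iloc.
rows_match = frame[:2].equals(frame.iloc[:2])

# A negative step reverses the row order.
first_label_reversed = frame[::-1].index[0]

# Assign a 2x2 array to the first two rows; the other rows are untouched.
frame[:2] = np.arange(4).reshape(2, 2)

print(rows_match)  # True
print(frame.to_numpy().tolist())
# [[0.0, 1.0], [2.0, 3.0], [0.0, 0.0], [0.0, 0.0]]
```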

Using boolean indexing

df = pd.DataFrame(np.random.randn(8,4),index=dates,columns=list('abcd'))
df

out:
a   b   c   d
2020-01-01  -1.749988   -0.249398   -1.165277   -0.806687
2020-01-02  0.026334    0.158118    0.341183    -1.042534
2020-01-03  0.513027    -0.127235   -0.454433   -0.162600
2020-01-04  1.719313    -1.417885   0.267647    -0.960537
2020-01-05  -0.259797   -0.851702   -0.873451   -0.476420
2020-01-06  -0.048619   -0.690095   0.759120    1.184295
2020-01-07  -0.748535   -1.252718   0.386220    -0.415996
2020-01-08  -0.497471   -0.550428   -0.867333   -0.109223
mask = df['a'] > 0
mask

out:
2020-01-01    False
2020-01-02     True
2020-01-03     True
2020-01-04     True
2020-01-05    False
2020-01-06    False
2020-01-07    False
2020-01-08    False
Freq: D, Name: a, dtype: bool
df[mask]

out:
a   b   c   d
2020-01-02  0.026334    0.158118    0.341183    -1.042534
2020-01-03  0.513027    -0.127235   -0.454433   -0.162600
2020-01-04  1.719313    -1.417885   0.267647    -0.960537

Multiple conditions

# Define the second mask first, then combine both masks with &
mask2 = df['b'] < 0
df[mask & mask2]

out:
a   b   c   d
2020-01-03  0.513027    -0.127235   -0.454433   -0.162600
2020-01-04  1.719313    -1.417885   0.267647    -0.960537
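
Conditions can also be combined inline instead of through named masks. In that case each comparison needs its own parentheses, because & and | bind more tightly than > and < in Python. A minimal sketch on a hypothetical frame, also showing | (or) and ~ (not):

```python
import pandas as pd

# Hypothetical frame, for illustration only.
frame = pd.DataFrame(
    {"a": [-1.0, 0.5, 2.0, 3.0], "b": [1.0, -2.0, -3.0, 4.0]}
)

both = frame[(frame["a"] > 0) & (frame["b"] < 0)]    # AND: both conditions hold
either = frame[(frame["a"] > 0) | (frame["b"] < 0)]  # OR: at least one holds
negated = frame[~(frame["a"] > 0)]                   # NOT: invert the mask

print(both.index.tolist())     # [1, 2]
print(either.index.tolist())   # [1, 2, 3]
print(negated.index.tolist())  # [0]
```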

Modifying data with boolean indexing

df[mask & mask2] = np.arange(8).reshape(2,4)
df

out:
a   b   c   d
2020-01-01  -1.749988   -0.249398   -1.165277   -0.806687
2020-01-02  0.026334    0.158118    0.341183    -1.042534
2020-01-03  0.000000    1.000000    2.000000    3.000000
2020-01-04  4.000000    5.000000    6.000000    7.000000
2020-01-05  -0.259797   -0.851702   -0.873451   -0.476420
2020-01-06  -0.048619   -0.690095   0.759120    1.184295
2020-01-07  -0.748535   -1.252718   0.386220    -0.415996
2020-01-08  -0.497471   -0.550428   -0.867333   -0.109223
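
The array assignment above works because exactly two rows match the combined mask, so the 2x4 array fits. When the number of matching rows is not known in advance, assigning a scalar is a simpler option, and .loc can restrict the change to one column. A sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame, for illustration only.
frame = pd.DataFrame({"a": [-1.0, 0.5, 2.0], "b": [1.0, -2.0, -3.0]})
mask = frame["a"] > 0

# A scalar is broadcast to every column of the matching rows.
rows_reset = frame.copy()
rows_reset[mask] = 0.0

# .loc with a row mask and a column label changes only that column.
one_column = frame.copy()
one_column.loc[mask, "b"] = 0.0

print(rows_reset.to_numpy().tolist())  # [[-1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
print(one_column.to_numpy().tolist())  # [[-1.0, 1.0], [0.5, 0.0], [2.0, 0.0]]
```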