Understanding and Using the Parameters of pandas describe()

percentiles: an optional parameter. It is a list-like of numbers, each of which should be between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

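include: also optional. A list of data types to include when describing a DataFrame. The default is None, which describes only the numeric columns; passing 'all' describes every column.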

exclude: also optional. A list of data types to leave out when describing a DataFrame. The default is None, which excludes nothing.

Usage: DataFrame.describe(percentiles=None, include=None, exclude=None)

import numpy as np
import pandas as pd

info = pd.DataFrame({'categorical': pd.Categorical(['s', 't', 'u']),
                   'numeric': [1, 2, 3], 'object': ['p', 'q', 'r']})
print(info.describe(),'\n')
          numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

print(info.describe(include='all'),'\n')       
           categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              u      NaN      p
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

print(info.numeric.describe(),'\n')
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

print(info.describe(include=[np.number]),'\n')       
          numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

print(info.describe(include=[object]),'\n')  # np.object was removed in NumPy 1.24; use the builtin object
         object
count       3
unique      3
top         p
freq        1
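
In the table above every value occurs exactly once, so `top` is simply one of the tied values and `freq` is 1. The following is a minimal sketch, using a made-up `colors` Series that is not part of the original data, to show what the four rows mean when one value repeats:

```python
import pandas as pd

# Hypothetical data: 'red' appears twice, the other values once.
colors = pd.Series(['red', 'blue', 'red', 'green'], name='color')

summary = colors.describe()
print(summary)
# count  : number of non-missing values
# unique : number of distinct values
# top    : the most frequent value
# freq   : how many times the top value occurs
```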

print(info.describe(include=['category']),'\n')       
            categorical
count            3
unique           3
top              u
freq             1
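
The include list can also name several data types at once. As a small sketch on the same `info` frame (rebuilt here so the snippet runs on its own), numeric and categorical columns can be described together while the string column is left out:

```python
import numpy as np
import pandas as pd

info = pd.DataFrame({'categorical': pd.Categorical(['s', 't', 'u']),
                     'numeric': [1, 2, 3], 'object': ['p', 'q', 'r']})

# Describe numeric and categorical columns in one call.
mixed = info.describe(include=[np.number, 'category'])
print(mixed)
```

Statistics that do not apply to a column's type (for example `mean` for the categorical column) show up as NaN, just as with include='all'.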

print(info.describe(exclude=[np.number]),'\n')       
           categorical object
count            3      3
unique           3      3
top              u      p
freq             1      1

print(info.describe(exclude=[object]),'\n')  # np.object was removed in NumPy 1.24; use the builtin object
           categorical  numeric
count            3      3.0
unique           3      NaN
top              u      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
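
None of the calls above pass percentiles, so they all report the default 25%, 50%, and 75% rows. Here is a minimal sketch of a custom list, using a made-up five-value Series; the 10th and 90th percentiles are chosen only for illustration:

```python
import pandas as pd

# Hypothetical data, not from the original examples.
s = pd.Series([1, 2, 3, 4, 5], name='numeric')

# Each entry must lie between 0 and 1; 0.5 is listed explicitly
# so the median row is requested rather than assumed.
summary = s.describe(percentiles=[.1, .5, .9])
print(summary)
```

The percentile rows are labelled from the fractions passed in, so this result has 10%, 50%, and 90% rows in place of the default ones.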

Understanding and Using pandas DataFrame.loc

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=['cobra', 'viper', 'sidewinder'],
...      columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8


Single label. Note this returns the row as a Series.
This selects one row by its index label (not a column).

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64


List of labels. Note using ``[[]]`` returns a DataFrame.
Passing a list inside the brackets keeps the result a DataFrame, even when the list holds a single label.


>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8


Single label for row and column
A row label plus a column label returns a single element.


>>> df.loc['cobra', 'shield']
2


Slice with labels for rows and a single label for the column. Note that
both the start and the stop of the slice are included: a label slice is
a closed interval, unlike an ordinary Python slice.

>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64
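
To make the closed interval concrete, the sketch below puts the label slice next to a positional slice with `.iloc`, which follows the usual Python rule and leaves out the stop position. The variable names `by_label` and `by_position` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Label slice: 'viper' (the stop) is included.
by_label = df.loc['cobra':'viper', 'max_speed']

# Positional slice: position 1 (the stop) is excluded.
by_position = df.iloc[0:1, 0]

print(by_label)
print(by_position)
```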


Boolean list with the same length as the row axis
Each Boolean, one per row, says whether that row is selected.

>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8


Conditional that returns a boolean Series
Only the rows for which the condition is True are returned.

>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8


Conditional that returns a boolean Series with column labels specified


>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7
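
Conditions can also be combined before they are handed to `.loc`. A minimal sketch, with thresholds picked only for illustration: each comparison is wrapped in parentheses and joined with `&` (and) or `|` (or), because Python's `and`/`or` keywords do not work element-wise on a Series.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Rows with shield above 2 AND max_speed below 7.
mask = (df['shield'] > 2) & (df['max_speed'] < 7)
result = df.loc[mask, ['max_speed']]
print(result)
```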


Callable that returns a boolean Series
The callable receives the DataFrame, and the Boolean Series it returns selects the rows.

>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8
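
A callable is probably most useful in a method chain, where the intermediate DataFrame has no variable name to refer to. The sketch below assumes a derived column called `total`, which is not in the original example:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# 'total' exists only on the intermediate frame produced by assign(),
# so the filter has to be written as a callable.
result = (
    df.assign(total=lambda d: d['max_speed'] + d['shield'])
      .loc[lambda d: d['total'] > 8]
)
print(result)
```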


**Setting values**


Set value for all items matching the list of labels
Every cell addressed by the given row and column labels receives the value.

>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50


Set value for an entire row
The scalar is assigned to every column of that row.

>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50


Set value for an entire column
Note that the column label goes after the comma; the part before the comma (here ``:``) selects the range of rows to set.

>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50


Set value for rows matching a boolean condition
Every column of the rows that satisfy the condition receives the value.

>>> df.loc[df['shield'] > 35] = 0
>>> df
            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0
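
The assignment above overwrites whole rows. When only one column should change for the matching rows, the condition and a column label can be given together. A minimal sketch on a fresh copy of the original frame, with the threshold 4 chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Rows are chosen by the condition, the column by its label:
# only max_speed changes, shield keeps its values.
df.loc[df['shield'] > 4, 'max_speed'] = 0
print(df)
```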


**Getting values on a DataFrame with an index that has integer labels**


Another example using integers for the index
Here the integers are index labels, not positions.

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
   max_speed  shield
7          1       2
8          4       5
9          7       8


Slice with integer labels for rows. As mentioned above, note that both
the start and stop of the slice are included.


>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8
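
With an integer index it is easy to confuse labels with positions. As a short sketch of the difference: `.loc[7]` looks up the label 7, `.iloc[0]` looks up the first position, and `.loc[0]` fails because no row carries the label 0.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[7, 8, 9], columns=['max_speed', 'shield'])

row_by_label = df.loc[7]      # the row whose label is 7
row_by_position = df.iloc[0]  # the first row, whatever its label

try:
    df.loc[0]                 # there is no label 0 in this index
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(row_by_label)
print(label_zero_found)
```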


**Getting values with a MultiIndex**

A number of examples using a DataFrame with a MultiIndex


>>> tuples = [
...    ('cobra', 'mark i'), ('cobra', 'mark ii'),
...    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
...    ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
...         [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36


Single label. Note this returns a DataFrame with a single index.


>>> df.loc['cobra']
         max_speed  shield
mark i          12       2
mark ii          0       4


Single index tuple. Note this returns a Series.
A full index tuple names exactly one row, so the result is a Series.

>>> df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64


Single label for row and column. Similar to passing in a tuple, this
returns a Series.
Here both labels address levels of the row index, so the call is equivalent to ``df.loc[('cobra', 'mark i')]``.

>>> df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64


Single tuple. Note using ``[[]]`` returns a DataFrame.
Wrapping the tuple in a list returns a DataFrame instead of a Series.

>>> df.loc[[('cobra', 'mark ii')]]
               max_speed  shield
cobra mark ii          0       4


Single tuple for the index with a single label for the column
An index tuple plus a column label returns a single element.

>>> df.loc[('cobra', 'mark i'), 'shield']
2


Slice from index tuple to single label
An index slice returns a DataFrame. Because the stop is the bare label 'viper', every 'viper' row is included.

>>> df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36


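Slice from index tuple to index tuple
Both ends are full tuples, so the slice stops exactly at ('viper', 'mark ii'). Unlike the previous example, the ('viper', 'mark iii') row is not returned.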

>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1
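
All of the MultiIndex examples above start from the first index level. To select on the second level instead, one possible approach is to pass `slice(None)` for the first level, meaning "every label". This sketch is not part of the original examples; it sorts the index first, since this kind of selection expects a sorted MultiIndex:

```python
import pandas as pd

tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20],
          [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
df = df.sort_index()  # already sorted here; kept as a safeguard

# Every first-level label, but only 'mark ii' on the second level.
mark_ii = df.loc[(slice(None), 'mark ii'), :]
print(mark_ii)
```

The trailing `, :` makes it explicit that the tuple addresses the rows and that all columns are kept.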

The data and explanations are taken from the official pandas documentation.
