The difference between groupby("x").count() and groupby("x").size() in pandas

1. Official documentation
ndarray.size
Number of elements in the array.

>>> import pandas as pd
>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.size
3

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.size
4
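
The size attribute counts every element, whether or not it holds a value. As a minimal sketch (the data below is made up for illustration), comparing size with count() on a Series and a DataFrame that contain one missing value makes this visible:

```python
import numpy as np
import pandas as pd

# Illustrative data: four entries, one of them missing.
s = pd.Series([1.0, np.nan, 3.0, 4.0])

n_total = s.size      # every element, NaN included
n_valid = s.count()   # non-NA elements only
print(n_total, n_valid)

# For a DataFrame, size is rows * columns, while count() works per column.
df = pd.DataFrame({"col1": [1.0, np.nan], "col2": [3.0, 4.0]})
print(df.size)
print(df.count())
```

Note that size is an attribute on Series and DataFrame, while on a groupby object it is a method, size().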

2. size includes NaN values, count does not:

In [46]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[0,0,1,2,2,2], 'b':[1,2,3,4,np.nan,4], 'c':np.random.randn(6)})
df

Out[46]:
   a   b         c
0  0   1  1.067627
1  0   2  0.554691
2  1   3  0.458084
3  2   4  0.426635
4  2 NaN -2.238091
5  2   4  1.256943

In [48]:
print(df.groupby(['a'])['b'].count())
print(df.groupby(['a'])['b'].size())

a
0    2
1    1
2    2
Name: b, dtype: int64

a
0    2
1    1
2    3
dtype: int64 
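
The comparison above can be reproduced without the random column. The following sketch reuses the same a and b values and additionally subtracts the two results, which gives the number of missing b values in each group (the variable names are chosen here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 1, 2, 2, 2],
    "b": [1, 2, 3, 4, np.nan, 4],
})

counts = df.groupby("a")["b"].count()  # non-NA values of b per group
sizes = df.groupby("a")["b"].size()    # rows per group, NaN included

# The gap between the two is the number of missing b values per group.
missing = sizes - counts
print(missing)
```

Only group 2 shows a gap, because it is the only group whose b column contains a NaN.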

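3. Even when the data has no NA values, the result of count() is more verbose
count() reports a separate non-NA count for every remaining column, while size() returns a single number per group.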

In [114]:
grouped = fec_mrbo.groupby(['cand_nm',labels])
grouped.size().unstack(0)
Out[114]:
cand_nm              Obama, Barack  Romney, Mitt
contb_receipt_amt
(0, 1]                       493.0          77.0
(1, 10]                    40070.0        3681.0
(10, 100]                 372280.0       31853.0
(100, 1000]               153991.0       43357.0
(1000, 10000]              22284.0       26186.0
(10000, 100000]                2.0           1.0
(100000, 1000000]              3.0           NaN
(1000000, 10000000]            4.0           NaN

In [115]:
grouped = fec_mrbo.groupby(['cand_nm',labels])
grouped.count().unstack(0)
Out[115]:

cmte_id cand_id contbr_nm   contbr_city contbr_st   ... memo_cd memo_text   form_tp file_num    parties
cand_nm Obama, Barack   Romney, Mitt    Obama, Barack   Romney, Mitt    Obama, Barack   Romney, Mitt    Obama, Barack   Romney, Mitt    Obama, Barack   Romney, Mitt    ... Obama, Barack   Romney, Mitt    Obama, Barack   Romney, Mitt    Obama, Barack   Romney, Mitt    Obama, Barack   Romney, Mitt    Obama, Barack   Romney, Mitt
contb_receipt_amt                                                                                   
(0, 1]  493.0   77.0    493.0   77.0    493.0   77.0    493.0   77.0    493.0   77.0    ... 31.0    1.0 138.0   10.0    493.0   77.0    493.0   77.0    493.0   77.0
(1, 10] 40070.0 3681.0  40070.0 3681.0  40070.0 3681.0  40070.0 3681.0  40070.0 3681.0  ... 4645.0  14.0    4781.0  53.0    40070.0 3681.0  40070.0 3681.0  40070.0 3681.0
(10, 100]   372280.0    31853.0 372280.0    31853.0 372280.0    31853.0 372276.0    31853.0 372280.0    31853.0 ... 33331.0 74.0    33789.0 236.0   372280.0    31853.0 372280.0    31853.0 372280.0    31853.0
(100, 1000] 153991.0    43357.0 153991.0    43357.0 153991.0    43357.0 153991.0    43355.0 153987.0    43357.0 ... 31674.0 347.0   31897.0 849.0   153991.0    43357.0 153991.0    43357.0 153991.0    43357.0
(1000, 10000]   22284.0 26186.0 22284.0 26186.0 22284.0 26186.0 22284.0 26185.0 22284.0 26186.0 ... 16622.0 640.0   16693.0 2217.0  22284.0 26186.0 22284.0 26186.0 22284.0 26186.0
(10000, 100000] 2.0 1.0 2.0 1.0 2.0 1.0 2.0 1.0 2.0 1.0 ... 0.0 1.0 1.0 1.0 2.0 1.0 2.0 1.0 2.0 1.0
(100000, 1000000]   3.0 NaN 3.0 NaN 3.0 NaN 3.0 NaN 3.0 NaN ... 3.0 NaN 3.0 NaN 3.0 NaN 3.0 NaN 3.0 NaN
(1000000, 10000000] 4.0 NaN 4.0 NaN 4.0 NaN 4.0 NaN 4.0 NaN ... 4.0 NaN 4.0 NaN 4.0 NaN 4.0 NaN 4.0 NaN
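
Since fec_mrbo and labels are not defined in this post, the shape difference can be checked on a small stand-in. The sketch below uses an invented three-row table (toy, with made-up values) and bins built with pd.cut. It is only meant to show that size() yields a Series while count() yields a DataFrame with one column per remaining column:

```python
import pandas as pd

# Hypothetical miniature stand-in for the contributions table; values are invented.
toy = pd.DataFrame({
    "cand_nm": ["Obama, Barack", "Obama, Barack", "Romney, Mitt"],
    "contbr_city": ["Chicago", None, "Boston"],
    "contb_receipt_amt": [50.0, 500.0, 250.0],
})

# Bin the amounts into (0, 100] and (100, 1000].
labels = pd.cut(toy["contb_receipt_amt"], [0, 100, 1000])
grouped = toy.groupby(["cand_nm", labels], observed=True)

by_size = grouped.size()    # Series: one number per group
by_count = grouped.count()  # DataFrame: non-NA counts for each remaining column

print(by_size.unstack(0))
print(by_count.unstack(0))
```

When only the number of rows per group is needed, size() gives the compact answer; count() is the better fit when per-column completeness matters.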