percentiles: an optional list-like of numbers, each between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
include: an optional list of data types to include when describing the DataFrame. The default is None, in which case only numeric columns are described.
exclude: an optional list of data types to omit when describing the DataFrame. The default is None.
Usage: DataFrame.describe(percentiles=None, include=None, exclude=None)
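As a minimal sketch of the percentiles parameter, the snippet below asks for the 10th and 90th percentiles instead of the defaults. The scores DataFrame and its column name are made up for illustration and do not come from the original examples.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the percentiles argument.
scores = pd.DataFrame({'score': [10, 20, 30, 40, 50]})

# Request the 10th and 90th percentiles; each value must lie between 0 and 1.
summary = scores.describe(percentiles=[0.1, 0.9])
print(summary)
```

The requested percentiles show up as rows labelled 10% and 90% in the result, alongside count, mean, std, min, and max.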
import numpy as np
import pandas as pd
info = pd.DataFrame({'categorical': pd.Categorical(['s', 't', 'u']),
                     'numeric': [1, 2, 3], 'object': ['p', 'q', 'r']})
print(info.describe(),'\n')
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
print(info.describe(include='all'),'\n')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              u      NaN      p
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN
print(info.numeric.describe(),'\n')
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Name: numeric, dtype: float64
print(info.describe(include=[np.number]),'\n')
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
print(info.describe(include=[object]),'\n')  # np.object was removed from NumPy; use the built-in object
object
count 3
unique 3
top p
freq 1
print(info.describe(include=['category']),'\n')
categorical
count 3
unique 3
top u
freq 1
print(info.describe(exclude=[np.number]),'\n')
       categorical object
count            3      3
unique           3      3
top              u      p
freq             1      1
print(info.describe(exclude=[object]),'\n')  # np.object was removed from NumPy; use the built-in object
       categorical  numeric
count            3      3.0
unique           3      NaN
top              u      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
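To make the effect of include and exclude easier to compare, the sketch below rebuilds the same info DataFrame and records only which columns each call describes. The variable names on the left are illustrative.

```python
import numpy as np
import pandas as pd

info = pd.DataFrame({'categorical': pd.Categorical(['s', 't', 'u']),
                     'numeric': [1, 2, 3], 'object': ['p', 'q', 'r']})

# Default: only numeric columns are summarised.
default_cols = list(info.describe().columns)

# include='all' summarises every column, filling NaN where a statistic does not apply.
all_cols = list(info.describe(include='all').columns)

# Restrict the summary to categorical columns.
category_cols = list(info.describe(include=['category']).columns)

# Drop numeric columns and summarise whatever is left.
non_numeric_cols = list(info.describe(exclude=[np.number]).columns)

print(default_cols, all_cols, category_cols, non_numeric_cols)
```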
Understanding and using pandas .loc
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=['cobra', 'viper', 'sidewinder'],
... columns=['max_speed', 'shield'])
>>> df
max_speed shield
cobra 1 2
viper 4 5
sidewinder 7 8
Single label. Note that this returns the row (not a column) as a Series.
>>> df.loc['viper']
max_speed 4
shield 5
Name: viper, dtype: int64
List of labels. Note that using double brackets ``[[]]`` returns a DataFrame.
>>> df.loc[['viper', 'sidewinder']]
max_speed shield
viper 4 5
sidewinder 7 8
Single label for row and column. This returns a single element.
>>> df.loc['cobra', 'shield']
2
Slice with labels for rows and a single label for the column. Note that
both the start and the stop of the slice are included (a closed interval).
>>> df.loc['cobra':'viper', 'max_speed']
cobra 1
viper 4
Name: max_speed, dtype: int64
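The closed-interval behaviour is easiest to see next to positional slicing. The sketch below reuses the same df and compares a label slice with .loc against a position slice with .iloc, which follows the usual half-open Python convention. The names by_label and by_position are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Label-based slice: both 'cobra' and 'viper' are included.
by_label = df.loc['cobra':'viper', 'max_speed']

# Position-based slice: position 1 (the stop) is excluded.
by_position = df.iloc[0:1, 0]

print(list(by_label.index))
print(list(by_position.index))
```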
Boolean list with the same length as the row axis. Each value says whether the corresponding row is selected.
>>> df.loc[[False, False, True]]
max_speed shield
sidewinder 7 8
Conditional that returns a boolean Series. Only the rows where the condition holds are returned.
>>> df.loc[df['shield'] > 6]
max_speed shield
sidewinder 7 8
Conditional that returns a boolean Series with column labels specified
>>> df.loc[df['shield'] > 6, ['max_speed']]
max_speed
sidewinder 7
Callable that returns a boolean Series. The callable receives the DataFrame, and the boolean Series it returns selects the rows.
>>> df.loc[lambda df: df['shield'] == 8]
max_speed shield
sidewinder 7 8
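Boolean conditions can also be combined before being passed to .loc. The following sketch, using the same df, joins two conditions with & and negates one with ~; each comparison needs its own parentheses because & and ~ bind more tightly than the comparison operators. The result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Rows where both conditions hold.
both = df.loc[(df['shield'] > 2) & (df['max_speed'] < 7)]

# Rows where the condition does NOT hold.
not_strong = df.loc[~(df['shield'] > 6)]

print(both)
print(not_strong)
```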
**Setting values**
Set value for all items matching the lists of row and column labels
>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
max_speed shield
cobra 1 2
viper 4 50
sidewinder 7 50
Set value for an entire row
>>> df.loc['cobra'] = 10
>>> df
max_speed shield
cobra 10 10
viper 4 50
sidewinder 7 50
Set value for an entire column. The column label goes after the comma, because the part before the comma selects the rows.
>>> df.loc[:, 'max_speed'] = 30
>>> df
max_speed shield
cobra 30 10
viper 30 50
sidewinder 30 50
Set value for rows matching a boolean condition
>>> df.loc[df['shield'] > 35] = 0
>>> df
max_speed shield
cobra 30 10
viper 0 0
sidewinder 0 0
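The last example overwrites every column of the matching rows. When only one column should change, the condition and the column label can be given together. The sketch below starts again from the original df values, so its numbers differ from the running example above.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Rows are chosen by the condition, the column by its label:
# only max_speed changes, shield is left untouched.
df.loc[df['shield'] > 4, 'max_speed'] = 0

print(df)
```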
**Getting values on a DataFrame with an index that has integer labels**
Another example, using integers for the index labels
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
max_speed shield
7 1 2
8 4 5
9 7 8
Slice with integer labels for rows. As mentioned above, note that both
the start and stop of the slice are included.
>>> df.loc[7:9]
max_speed shield
7 1 2
8 4 5
9 7 8
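With an integer index it is easy to confuse labels with positions. The sketch below, using the same df, shows that .loc always interprets integers as labels, while .iloc interprets them as positions. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[7, 8, 9], columns=['max_speed', 'shield'])

# .loc looks up the label 7; .iloc looks up position 0. Here they are the same row.
row_by_label = df.loc[7]
row_by_position = df.iloc[0]

# Label slice 7:8 includes both ends; position slice 0:2 excludes position 2.
label_slice = df.loc[7:8]
position_slice = df.iloc[0:2]

# There is no label 0, so .loc raises KeyError instead of returning the first row.
try:
    df.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(missing_label_raises)
```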
**Getting values with a MultiIndex**
A number of examples using a DataFrame with a MultiIndex
>>> tuples = [
... ('cobra', 'mark i'), ('cobra', 'mark ii'),
... ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
... ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
... [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36
Single label. Note this returns a DataFrame with a single index.
>>> df.loc['cobra']
max_speed shield
mark i 12 2
mark ii 0 4
Single index tuple. Note that this returns a Series.
>>> df.loc[('cobra', 'mark ii')]
max_speed 0
shield 4
Name: (cobra, mark ii), dtype: int64
Single label for each index level. Similar to passing in a tuple, this
returns a Series.
>>> df.loc['cobra', 'mark i']
max_speed 12
shield 2
Name: (cobra, mark i), dtype: int64
Single tuple. Note that using ``[[]]`` returns a DataFrame.
>>> df.loc[[('cobra', 'mark ii')]]
max_speed shield
cobra mark ii 0 4
Single tuple for the index with a single label for the column. This returns a single element.
>>> df.loc[('cobra', 'mark i'), 'shield']
2
Slice from index tuple to single label. This returns a DataFrame.
>>> df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36
Slice from index tuple to index tuple. The slice stops at ('viper', 'mark ii'), so the ('viper', 'mark iii') row is not included.
>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1
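The examples above all start from the first index level. As a supplementary sketch that is not part of the examples quoted here, rows can also be selected by the second level alone, either with pd.IndexSlice inside .loc or with DataFrame.xs. The index is sorted first, since slicing a MultiIndex this way expects a sorted index.

```python
import pandas as pd

tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii'),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)

# Every first-level label, but only 'mark ii' on the second level.
# The result keeps both index levels.
mark_ii = df.sort_index().loc[pd.IndexSlice[:, 'mark ii'], :]

# xs selects the same rows but drops the level that was matched.
mark_ii_xs = df.xs('mark ii', level=1)

print(mark_ii)
print(mark_ii_xs)
```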
The data and explanations are taken from the official documentation.