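transform: Transformation

Unlike an aggregation, which reduces each group to a single value, a transformation returns as many values as it received. Standardization is one example: the number of rows stays the same, and the values are simply standardized within each group.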

import pandas as pd
df = pd.read_csv('data/gapminder.tsv', sep='\t')

def my_zscore(x):
    return ((x - x.mean())/x.std())

transform_z = df.groupby('year').lifeExp.transform(my_zscore)
print(transform_z.shape) # (1704,)
print(df.shape) # (1704, 6)
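
To make the contrast with aggregation concrete, here is a minimal sketch on a small made-up frame. The `toy` data and its column names are assumptions for illustration and are not part of the gapminder dataset.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the shapes involved
toy = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'b'],
    'value': [1.0, 3.0, 10.0, 20.0, 30.0],
})

# Aggregation: one value per group
group_means = toy.groupby('group')['value'].mean()

# Transformation: one value per original row, aligned to the original index
centered = toy.groupby('group')['value'].transform(lambda x: x - x.mean())

print(group_means.shape)  # (2,)
print(centered.tolist())  # [-1.0, 1.0, -10.0, 0.0, 10.0]
```

Because the transformed result keeps the original index, it can be assigned straight back to the frame as a new column.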

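Compare grouped and ungrouped standardization. The two grouped results are nearly identical; they differ slightly because pandas' std() defaults to the sample standard deviation (ddof=1), while scipy.stats.zscore defaults to the population standard deviation (ddof=0). The ungrouped result differs considerably, because it uses the mean and standard deviation of the whole column.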

from scipy.stats import zscore

sp_z_grouped = df.groupby('year').lifeExp.transform(zscore)
sp_z_nogroup = zscore(df.lifeExp)

print(transform_z.head())
'''
0   -1.656854
1   -1.731249
2   -1.786543
3   -1.848157
4   -1.894173
Name: lifeExp, dtype: float64
'''

print(sp_z_grouped.head())
'''
0   -1.662719
1   -1.737377
2   -1.792867
3   -1.854699
4   -1.900878
Name: lifeExp, dtype: float64
'''

print(sp_z_nogroup[:5])
# [-2.37533395 -2.25677417 -2.1278375  -1.97117751 -1.81103275]
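
The small gap between the two grouped results is consistent with a difference in degrees of freedom. The sketch below, on made-up numbers, shows that z-scores computed with ddof=1 and ddof=0 differ only by the constant factor sqrt(n / (n - 1)). The series `x` is an assumption for illustration, and the ddof=0 case is computed with pandas rather than scipy to keep the example dependency-light.

```python
import numpy as np
import pandas as pd

# Hypothetical values: mean 5, population standard deviation 2
x = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

# Sample standard deviation (pandas default, ddof=1)
z_sample = (x - x.mean()) / x.std()

# Population standard deviation (ddof=0, the default of scipy.stats.zscore)
z_population = (x - x.mean()) / x.std(ddof=0)

# The two sets of z-scores differ by the constant factor sqrt(n / (n - 1))
factor = np.sqrt(n / (n - 1))
print(np.allclose(z_sample * factor, z_population))  # True
```

With 142 rows per year (1704 rows over 12 years), that factor is about 1.0035, which matches the ratio between the two grouped outputs printed above.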

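Filling missing values is another example: replace each missing value with the mean of its group rather than the mean of the whole dataset. For instance, if men and women tend to spend differently, it is more reasonable to fill a missing value with the mean computed separately for each sex.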

import seaborn as sns
import numpy as np

np.random.seed(42)
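# Draw 10 sample rows
tips_10 = sns.load_dataset('tips').sample(10)
# Randomly set 'total_bill' to missing in four of the sampled rows
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
tips_10.loc[np.random.permutation(tips_10.index)[:4], 'total_bill'] = np.nan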
print(tips_10)
'''
     total_bill   tip     sex smoker   day    time  size
24        19.82  3.18    Male     No   Sat  Dinner     2
6          8.77  2.00    Male     No   Sun  Dinner     2
153         NaN  2.00    Male     No   Sun  Dinner     4
211         NaN  5.16    Male    Yes   Sat  Dinner     4
198         NaN  2.00  Female    Yes  Thur   Lunch     2
176         NaN  2.00    Male    Yes   Sun  Dinner     2
192       28.44  2.56    Male    Yes  Thur   Lunch     2
124       12.48  2.52  Female     No  Thur   Lunch     2
9         14.78  3.23    Male     No   Sun  Dinner     2
101       15.38  3.00  Female    Yes   Fri  Dinner     2
'''

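# count() tallies the non-missing values per sex: total_bill has 4 of 7 for Male
# and 2 of 3 for Female, so 3 values are missing for Male and 1 for Female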
count_sex = tips_10.groupby('sex').count()
print(count_sex)
'''
        total_bill  tip  smoker  day  time  size
sex                                             
Male             4    7       7    7     7     7
Female           2    3       3    3     3     3
'''

# Fill the missing values of the given vector with that vector's own mean
def fill_na_mean(x):
    avg = x.mean()
    return (x.fillna(avg))

total_bill_group_mean = tips_10.groupby('sex').total_bill.transform(fill_na_mean)
tips_10['fill_total_bill'] = total_bill_group_mean
print(tips_10)
'''
     total_bill   tip     sex smoker   day    time  size  fill_total_bill
24        19.82  3.18    Male     No   Sat  Dinner     2          19.8200
6          8.77  2.00    Male     No   Sun  Dinner     2           8.7700
153         NaN  2.00    Male     No   Sun  Dinner     4          17.9525
211         NaN  5.16    Male    Yes   Sat  Dinner     4          17.9525
198         NaN  2.00  Female    Yes  Thur   Lunch     2          13.9300
176         NaN  2.00    Male    Yes   Sun  Dinner     2          17.9525
192       28.44  2.56    Male    Yes  Thur   Lunch     2          28.4400
124       12.48  2.52  Female     No  Thur   Lunch     2          12.4800
9         14.78  3.23    Male     No   Sun  Dinner     2          14.7800
101       15.38  3.00  Female    Yes   Fri  Dinner     2          15.3800
'''
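
The same group-wise fill can also be written without a custom function. The sketch below is one possible variant, not the approach used above: it broadcasts each group's mean with `transform('mean')` and passes the result to `fillna`. The `toy` frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing bill per sex
toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Male', 'Female', 'Female'],
    'total_bill': [10.0, np.nan, 20.0, 8.0, np.nan],
})

# Mean of each group, repeated for every row of that group (NaN is skipped)
group_mean = toy.groupby('sex')['total_bill'].transform('mean')

# Only the missing entries are replaced; existing values are left untouched
toy['fill_total_bill'] = toy['total_bill'].fillna(group_mean)

print(toy['fill_total_bill'].tolist())  # [10.0, 15.0, 20.0, 8.0, 8.0]
```

Filling with the overall mean would instead put roughly 12.67 into both gaps, which ignores the difference between the two groups.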

filter: Filtering

import pandas as pd
import seaborn as sns

tips = sns.load_dataset('tips')
print(tips.shape) # (244, 7)

print(tips['size'].value_counts())
'''
2    156
3     38
4     37
5      5
6      4
1      4
Name: size, dtype: int64
'''

The output shows that parties of 1, 5, and 6 are uncommon. To drop them, keep only the groups that contain at least 30 rows.

tips_filtered = tips.groupby('size').filter(lambda x: x['size'].count() >= 30)
print(tips_filtered.shape) # (231, 7)
print(tips_filtered['size'].value_counts())
'''
(231, 7)
2    156
3     38
4     37
Name: size, dtype: int64
'''
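
As a minimal sketch of how filter behaves, the example below applies a group-size threshold to a made-up frame and then reproduces the result with a boolean mask built from `transform('size')`. The `toy` data and the threshold of 2 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data: party sizes 2 and 3 occur more than once, 6 only once
toy = pd.DataFrame({
    'size': [2, 2, 2, 3, 3, 6],
    'tip': [1.0, 2.0, 1.5, 3.0, 2.5, 5.0],
})

# filter() calls the function once per group and keeps every row of the
# groups for which it returns True
kept = toy.groupby('size').filter(lambda g: len(g) >= 2)

# Equivalent mask: broadcast each group's row count to its rows, then compare
group_sizes = toy.groupby('size')['tip'].transform('size')
kept_mask = toy[group_sizes >= 2]

print(kept['size'].tolist())  # [2, 2, 2, 3, 3]
```

The mask form avoids calling a Python function per group, which may matter when there are many groups.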

Pandas - 10.2 Transformation and Filtering
