Binary Classification Case Study for a Discrete Random Variable

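Case goal: use logistic regression to perform binary classification of credit card fraud.

Design workflow:

Read the credit card usage dataset from a CSV file, plot charts, and get a high-level view of the data

Standardize any columns whose value range differs greatly from the other columns

Based on the analysis, undersample the data

Split the undersampled data into a training set and a test set

Split the training set further into training and validation sets

Train and predict with the logistic regression model; after cross-validation, find the penalty strength and model with the highest mean recall

Use a confusion matrix to analyze the model's predictions on the test set

Extension: oversampling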


Preparation

python >= 3.7
matplotlib >= 3.3.2
numpy >= 1.19.2
sklearn >= 0.23.2
pandas >= 1.1.3

First we read the data and use matplotlib to draw a bar chart of the Class feature (0: no fraud) to get a feel for what the data looks like:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('creditcard.csv')
# Count how many rows fall into each value of 'Class'
count_classes = data['Class'].value_counts().sort_index()
# Draw a bar chart of the counts
count_classes.plot(kind='bar')
plt.title('Fraud class histogram')
plt.xlabel('Class')
plt.ylabel('Frequency')
plt.show()

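The result:

Figure: Distribution of normal and fraudulent credit card records

Seriously?? There is no fraud in this data at all?? Then what is the point!
But once I zoomed in on the bar for label 1:

Figure: Zoomed-in view

Turns out I was overthinking it, so let's get back to work.
The bar chart makes one thing obvious:
The share of normal records (Class == 0) and fraudulent records (Class == 1) differs enormously. Feeding data like this straight into a model without preprocessing is just asking for trouble, so we need to balance the classes. Here are two ways to do that: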

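Undersampling:

Randomly sample from the over-represented class until it has the same number of samples as the minority class, which balances the dataset

Oversampling:

Use some data-generation strategy to grow the minority class until it has as many samples as the other class

We use undersampling first. (Oversampling is covered separately later.)
Looking at the data, we also notice that the values of the Amount feature span a very different range from the other features, so we standardize that column

Figure: The Amount column is the ugly one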

from sklearn.preprocessing import StandardScaler

# Standardize Amount so its scale is comparable to the other features
data['normAmout'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1,1))
# Drop the columns we no longer need (axis=0 drops rows, axis=1 drops columns)
data = data.drop(['Time','Amount'],axis=1)
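
To make the scaling step concrete, here is a minimal sketch on made-up amounts (the values are illustrative and not taken from creditcard.csv). It checks that StandardScaler produces the same numbers as subtracting the mean and dividing by the population standard deviation by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical transaction amounts, for illustration only
amounts = np.array([1.0, 2.5, 10.0, 250.0, 4000.0])

# StandardScaler expects a 2-D array: reshape(-1, 1) gives one column
scaled = StandardScaler().fit_transform(amounts.reshape(-1, 1)).ravel()

# Same computation by hand: (x - mean) / std, using the population std (ddof=0)
manual = (amounts - amounts.mean()) / amounts.std()

print(np.allclose(scaled, manual))
print(scaled)
```

After scaling, the column has mean 0 and standard deviation 1, which puts it on a scale comparable to the other features.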

Build the features and labels:

X = data.loc[:,data.columns != 'Class'] # every column except Class
y = data.loc[:,data.columns == 'Class'] # only the Class column

Undersampling:

# Count the rows where Class is 1 (fraud). data.Class == 1 is a boolean mask that is True for fraud rows; indexing with it selects those rows
number_records_fraud = len(data[data.Class == 1]) 
# Index labels of the fraud rows
fraud_indices = np.array(data[data.Class == 1].index) 

# Index labels of the normal rows
normal_indices = data[data.Class == 0].index 

# Randomly draw as many normal rows as there are fraud rows (replace=False: sampling without replacement)
random_normal_indices = np.random.choice(normal_indices,number_records_fraud,replace=False) 
random_normal_indices = np.array(random_normal_indices)

# Combine the fraud index labels with the sampled normal index labels
under_sample_indices = np.concatenate([fraud_indices,random_normal_indices])

# Select the combined rows; these values are index labels, so use .loc rather than .iloc
under_sample_data = data.loc[under_sample_indices,:]

# Every column except Class
X_undersample = under_sample_data.loc[:,under_sample_data.columns != 'Class'] 
# Only the Class column
y_undersample = under_sample_data.loc[:,under_sample_data.columns == 'Class']
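
As an alternative to the manual index bookkeeping, the same idea can be sketched with pandas' grouped sampling. This is only an illustration on synthetic data; the `toy` frame and its `V1` column are made up, and `groupby(...).sample` requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Synthetic, imbalanced toy data (not the real dataset): 95 normal rows, 5 fraud rows
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "V1": rng.normal(size=100),
    "Class": [0] * 95 + [1] * 5,
})

n_fraud = int((toy["Class"] == 1).sum())

# Draw n_fraud rows from each class without replacement:
# the minority class is kept whole, the majority class is subsampled
balanced = toy.groupby("Class").sample(n=n_fraud, random_state=0)

print(balanced["Class"].value_counts().to_dict())
```

Both classes end up with the same number of rows, which is the goal of undersampling.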

Next, we split both the original dataset and the undersampled dataset into test and training sets at a 3:7 ratio

# 30% test set, 70% training set; random_state=0 makes the shuffle reproducible
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=0)
X_train_undersample,X_test_undersample,y_train_undersample,y_test_undersample = train_test_split(X_undersample,y_undersample,test_size=0.3,random_state=0)
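
The split sizes can be checked on a toy array. The sketch below also passes `stratify`, which the article's code does not use; it is shown only as an option that keeps the class ratio the same in both parts, and all data here is made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 10 of them positive (illustrative only)
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 90 + [1] * 10)

# test_size=0.3 -> 30 test rows and 70 training rows; random_state fixes the shuffle.
# stratify=y_toy is an addition for this sketch, not part of the article's code.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0, stratify=y_toy
)

print(len(X_tr), len(X_te))
print(int(y_tr.sum()), int(y_te.sum()))
```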

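Using KFold, we split the training data into 5 folds to create validation sets

# Build the splitter: 5 folds, giving a 4:1 training:validation split, with the validation role rotating 5 times
fold = KFold(5, shuffle=False) # shuffle: whether to shuffle before splitting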
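
To see what the splitter actually yields, here is a small sketch on ten dummy samples (the `samples` array is only a stand-in for the training set). Each round returns a pair of index arrays: the training positions and the validation positions.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten dummy samples stand in for the training set
samples = np.arange(10)

fold = KFold(5, shuffle=False)
splits = list(fold.split(samples))
for i, (train_idx, val_idx) in enumerate(splits, start=1):
    # Each round holds out a different fifth of the data for validation
    print(i, train_idx, val_idx)
```

Every sample lands in the validation part exactly once across the five rounds.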

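Next, we apply L1 regularization to the binary classifier with different penalty strengths. Note that in scikit-learn, C is the inverse of the regularization strength, so a smaller C means a stronger penalty. The values of C we try are: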

c_param_range = [0.01, 0.1, 1, 10, 100]
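
A quick way to get a feel for these values is to count how many coefficients survive the L1 penalty at each C. The sketch below uses synthetic data from `make_classification`, so the exact counts say nothing about the credit card dataset; it only illustrates the general trend that a smaller C tends to leave fewer non-zero coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, for illustration only
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=5, random_state=0)

nonzero = {}
for c in [0.01, 0.1, 1, 10, 100]:
    model = LogisticRegression(C=c, penalty='l1', solver='liblinear')
    model.fit(X_demo, y_demo)
    # Count the coefficients that the L1 penalty has not driven to zero
    nonzero[c] = int(np.count_nonzero(model.coef_))

print(nonzero)
```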

Building the logistic regression model, training, predicting, and computing recall all happen in one go inside a cross-validation for loop:

# Excerpt from the full code below: c_param, j, results_table, x_train_data and y_train_data come from the enclosing function
recall_accs = []
# Split the data into folds
index = fold.split(y_train_data)
for iteration, indices in enumerate(index, start=1):

    # Build the model with L1 regularization
    lr = LogisticRegression(C = c_param, penalty = 'l1',solver='liblinear')
    # Train on the training folds (indices[0])
    lr.fit(x_train_data.iloc[indices[0], :], y_train_data.iloc[indices[0], :].values.ravel())
    # Predict on the validation fold (indices[1])
    y_pred_undersample = lr.predict(x_train_data.iloc[indices[1], :].values)

    # Compute recall on the validation fold
    recall_acc = recall_score(y_train_data.iloc[indices[1], :].values, y_pred_undersample)
    recall_accs.append(recall_acc)
    print('Iteration ', iteration, ': recall score = ', recall_acc)

# Mean recall over the 5 cross-validation rounds for this penalty strength
results_table.loc[j, 'Mean recall score'] = np.mean(recall_accs)
j += 1
print('')
print('Mean recall score ', np.mean(recall_accs))
print('')

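TIPS

1. Recall: for example, if a group contains 10 patients and I detect only 2 of them, recall = 2/10. We evaluate the model by recall rather than accuracy, because with heavily imbalanced classes a model can score high accuracy while missing most of the fraud
2. TP (true positive): the model labels a patient as a patient; e.g. 10 patients correctly identified gives TP = 10 (a hit)
3. FP (false positive): the model labels a healthy person as a patient; e.g. 90 healthy people flagged as patients gives FP = 90 (hit something it should not have)
4. FN (false negative): the model labels a patient as healthy; e.g. 2 patients labeled healthy gives FN = 2 (missed something it should have hit)
5. TN (true negative): the model labels a healthy person as healthy; e.g. 4 healthy people correctly identified gives TN = 4 (correctly left alone)
6. Therefore recall = TP/(TP+FN); all four of TP, FP, FN and TN can be read off the confusion matrix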
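
The definitions above can be written out as two tiny helper functions. The counts below are a hypothetical screening result chosen so the arithmetic is easy to follow; `recall` and `precision` are illustrative helpers, not functions from the article's code.

```python
def recall(tp, fn):
    """Fraction of actual positives that the model caught."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)


# Hypothetical result: 10 real patients, 8 caught (TP), 2 missed (FN),
# and 4 healthy people flagged by mistake (FP)
tp, fn, fp = 8, 2, 4

print(recall(tp, fn))
print(precision(tp, fp))
```

Recall only looks at the row of actual positives, which is why it ignores false positives entirely.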

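At this point we have the mean recall over 5 cross-validation rounds for one penalty strength.
But there are four more penalty strengths to go. An outer for loop takes care of that, and when it finishes we have 5 mean recall scores.
Then we find the penalty strength with the highest mean recall (note that this approach ignores false positives, FP)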

best_c = results_table.loc[results_table['Mean recall score'].astype('float64').idxmax()]['C_parameter']
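
For comparison, the same search can be sketched more compactly with `cross_val_score` and `scoring='recall'`. This is an alternative formulation, not the article's code; the data is synthetic and `best_c_demo` is a name made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic balanced data stands in for the undersampled training set
X_cv, y_cv = make_classification(n_samples=400, n_features=10, random_state=0)

c_param_range = [0.01, 0.1, 1, 10, 100]
mean_recalls = []
for c in c_param_range:
    model = LogisticRegression(C=c, penalty='l1', solver='liblinear')
    # scoring='recall' computes recall on each of the 5 validation folds
    scores = cross_val_score(model, X_cv, y_cv,
                             cv=KFold(5, shuffle=False), scoring='recall')
    mean_recalls.append(scores.mean())

# Pick the C with the highest mean recall (ties go to the first one)
best_c_demo = c_param_range[int(np.argmax(mean_recalls))]
print(best_c_demo, mean_recalls)
```

Because only recall is maximized, a model that flags nearly everything as fraud can win this comparison, which is the false-positive caveat mentioned above.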

We now have the best penalty strength and model.
So let's predict on the undersampled test set! We'll also draw a confusion matrix; the plotting code is:

def plot_confusion_matrix(cm, classes,
                          title='Confusion matrix',
                          cmap='Blues'):
    """
    This function prints and plots the confusion matrix.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=0)
    plt.yticks(tick_marks, classes)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

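Display the predictions with the confusion matrix:

# Refit on the undersampled training set with the best C (as in the full code below)
lr = LogisticRegression(C = best_c, penalty = 'l1',solver='liblinear')
lr.fit(X_train_undersample,y_train_undersample.values.ravel())
# Predict on the test set with the trained model
y_pred_undersample = lr.predict(X_test_undersample.values)

# Build the confusion matrix from the true and predicted labels
cnf_matrix = confusion_matrix(y_test_undersample,y_pred_undersample)
np.set_printoptions(precision=2) # set float print precision
# Compute recall from the confusion matrix: TP / (FN + TP)
print("Recall metric in the testing dataset: ", cnf_matrix[1,1]/(cnf_matrix[1,0]+cnf_matrix[1,1]))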

# Plot non-normalized confusion matrix
class_names = [0,1]
plt.figure()
plot_confusion_matrix(cnf_matrix
                      , classes=class_names
                      , title='Confusion matrix')
plt.show()

The result:

Figure: Confusion matrix

The result looks pretty good: only 10 fraudulent cards went undetected, and another 16 cards with normal spending were wrongly flagged as fraud.
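
To read such counts off programmatically rather than from the plot, the matrix can be unpacked cell by cell. The labels below are made up for illustration; for binary labels 0 and 1, scikit-learn orders the cells as TN, FP, FN, TP when the matrix is flattened.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels, for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred)
# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = cm.ravel()

print(cm)
print("missed positives (FN):", fn, "false alarms (FP):", fp)
print("recall:", tp / (tp + fn))
```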

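Adjusting the sigmoid threshold

As you may know, binary classifiers like this one use the sigmoid function to separate the data into two classes. By default the cutoff (the threshold) is 0.5: a probability above 0.5 is classified as 1, and one below 0.5 as 0. By changing this threshold we can shift how readily samples are assigned to each class, for example classifying a sample as 1 only when its probability exceeds 0.7. So how do we do that?

Figure: The sigmoid function

Earlier we called lr.predict(X_test_undersample.values) to get the final labels directly, a strictly black-or-white answer. To adjust the sigmoid threshold we need the actual probability of each outcome, which lr.predict_proba(X_test_undersample.values) provides. For a binary problem this method returns an n-by-2 matrix with one row per sample: the first column is the probability that the event is false (class 0) and the second column is the probability that it is true (class 1), so the two columns of every row sum to one.
Then: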

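y_pred_undersample_proba = lr.predict_proba(X_test_undersample.values)
threshold = 0.7 # example value from the text above
# Take the second column (probability of class 1) for every row; the comparison yields a boolean array
y_test_predictions_high_recall = y_pred_undersample_proba[:,1] > threshold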
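
The effect of moving the threshold can be sketched by sweeping a few values and recording recall at each one. The data and the threshold list below are made up for illustration; the point is only that raising the threshold can never increase recall, because fewer samples get labeled 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Synthetic data, for illustration only
X_th, y_th = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LogisticRegression(C=1, penalty='l1', solver='liblinear').fit(X_th, y_th)

# Column 1 holds the estimated probability of the positive class
proba_pos = clf.predict_proba(X_th)[:, 1]

recalls = []
for threshold in [0.1, 0.3, 0.5, 0.7, 0.9]:
    # Label a sample 1 only if its probability exceeds the threshold
    y_hat = (proba_pos > threshold).astype(int)
    recalls.append(recall_score(y_th, y_hat))
    print(threshold, recalls[-1])
```

A lower threshold buys recall at the cost of more false positives, so the choice depends on which kind of error is more expensive.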

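That covers prediction with undersampled data. Next, let's look at oversampling.

Extension: oversampling tricks

This technique generates synthetic data by randomly perturbing the features of samples from the minority class

For each sample x in the minority class, compute its Euclidean distance to every other sample in the minority class to find its k nearest neighbors.

Set a sampling ratio according to the class imbalance to determine the sampling multiplier N. For each minority sample x, randomly pick several samples from its k nearest neighbors; call a chosen neighbor xn.

For each randomly chosen neighbor xn, construct a new sample from it and the original sample x using the following formula.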

SMOTE algorithm: x_new = x + rand(0, 1) * (xn - x)
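
The steps above can be sketched in a few lines of NumPy. This is a simplified illustration of the interpolation rule x_new = x + rand(0, 1) * (xn - x), not the imbalanced-learn implementation; the function name `smote_like_samples`, its parameters, and the toy points are all made up for this example.

```python
import numpy as np


def smote_like_samples(minority, k=3, n_new_per_point=2, seed=0):
    """Generate synthetic points by interpolating toward random k-nearest neighbors."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    new_points = []
    for i, x in enumerate(minority):
        # Euclidean distance from x to every minority sample
        dists = np.linalg.norm(minority - x, axis=1)
        # Skip x itself, then keep the k closest neighbors
        neighbors = [j for j in np.argsort(dists) if j != i][:k]
        for _ in range(n_new_per_point):
            xn = minority[rng.choice(neighbors)]
            # A random point on the segment between x and its neighbor xn
            new_points.append(x + rng.random() * (xn - x))
    return np.array(new_points)


minority_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
synthetic = smote_like_samples(minority_toy)
print(synthetic.shape)
print(synthetic)
```

Every synthetic point lies between two real minority samples, so the new data stays inside the region the minority class already occupies.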

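Code implementation:

from imblearn.over_sampling import SMOTE
oversampler=SMOTE(random_state=0) # random_state=0: generate the same data on every run
# fit_resample replaces fit_sample, which newer imbalanced-learn releases no longer provide
os_X,os_y=oversampler.fit_resample(X_train,y_train)
os_features = pd.DataFrame(os_X)
os_labels = pd.DataFrame(os_y)

Now os_features and os_labels contain both the original data and the newly generated data!
Data generation done!

Full code (excluding the threshold change and oversampling)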

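'''
Credit card fraud analysis
1. When the class counts in the csv file differ too much and the samples are imbalanced, use "oversampling" or "undersampling"
2. Undersampling: keep sampling from the larger class until it has as many samples as the smaller class, which balances the data
3. Oversampling: use a data-generation strategy to grow the smaller class until it has as many samples as the other class
4. In NumPy, calling reshape(-1,2) on a [2,3] matrix means: keep the total element count (2x3=6), fix the second dimension at 2, and let the program fill in the -1 (here the first dimension); the result has shape [3,2]
5. Recall: for example, if a group contains 10 patients and I detect only 2 of them, recall = 2/10. Evaluate the model by recall rather than accuracy
6. TP (true positive): the model labels a patient as a patient; e.g. 10 patients correctly identified gives TP = 10 (a hit)
7. FP (false positive): the model labels a healthy person as a patient; e.g. 90 healthy people flagged as patients gives FP = 90 (hit something it should not have)
8. FN (false negative): the model labels a patient as healthy; e.g. 2 patients labeled healthy gives FN = 2 (missed something it should have hit)
9. TN (true negative): the model labels a healthy person as healthy; e.g. 4 healthy people correctly identified gives TN = 4 (correctly left alone)
10. Therefore recall = TP/(TP+FN); all four of TP, FP, FN and TN can be read off the confusion matrix
11. Read up on L1 and L2 regularization yourself
12. You can also trade off recall against the false-positive rate by hand, by adjusting the sigmoid threshold
13. The SMOTE algorithm implements oversampling
14. Oversampling gives a low false-positive rate, but recall is also somewhat lower
15. loc vs iloc: loc takes index labels, iloc takes integer row positions
'''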
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold,cross_val_score
from sklearn.metrics import confusion_matrix,recall_score,classification_report
import itertools


data = pd.read_csv('creditcard.csv')
# # Count how many rows fall into each value of 'Class'
# count_classes = data['Class'].value_counts().sort_index()
# # Draw a bar chart of the counts
# count_classes.plot(kind='bar')
# plt.title('Fraud class histogram')
# plt.xlabel('Class')
# plt.ylabel('Frequency')
# plt.show()

# The analysis shows that the counts of Class 1 and Class 0 are extremely imbalanced, so we use undersampling here

# Standardize Amount so its scale is comparable to the other features
data['normAmout'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1,1))
data = data.drop(['Time','Amount'],axis=1) # drop unused columns (axis=0 drops rows, axis=1 drops columns)


X = data.loc[:,data.columns != 'Class'] # every column except Class
y = data.loc[:,data.columns == 'Class'] # only the Class column

number_records_fraud = len(data[data.Class == 1]) # count the fraud rows; data.Class == 1 is a boolean mask that selects them
fraud_indices = np.array(data[data.Class == 1].index) # index labels of the fraud rows

normal_indices = data[data.Class == 0].index # index labels of the normal rows

random_normal_indices = np.random.choice(normal_indices,number_records_fraud,replace=False) # draw as many normal rows as fraud rows (replace=False: without replacement)
random_normal_indices = np.array(random_normal_indices)

# Combine the fraud index labels with the sampled normal index labels
under_sample_indices = np.concatenate([fraud_indices,random_normal_indices])

# Select the combined rows; these values are index labels, so use .loc rather than .iloc
under_sample_data = data.loc[under_sample_indices,:]


X_undersample = under_sample_data.loc[:,under_sample_data.columns != 'Class'] # every column except Class
y_undersample = under_sample_data.loc[:,under_sample_data.columns == 'Class'] # only the Class column

# Cross-validation: split the data into train and test, then split train into equal parts (three shown here) ①②③: ①+②->③, ①<-②+③, ①+③->②

# Split into training and test sets
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=0) # 30% test, 70% training; random_state=0 makes the shuffle reproducible
X_train_undersample,X_test_undersample,y_train_undersample,y_test_undersample = train_test_split(X_undersample,y_undersample,test_size=0.3,random_state=0)


def printing_Kfold_scores(x_train_data, y_train_data):
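    # Build the splitter: 5 folds, giving a 4:1 training:validation split, with the validation role rotating 5 times
    fold = KFold(5, shuffle=False) # shuffle: whether to shuffle before splitting

    print(fold)

    # 5 penalty strengths (C is the inverse of the regularization strength)
    c_param_range = [0.01, 0.1, 1, 10, 100]
    # Build the results table with one row per C value
    results_table = pd.DataFrame(index=range(len(c_param_range)), columns=['C_parameter', 'Mean recall score'])
    # Fill the C values into the results table
    results_table['C_parameter'] = c_param_range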

    # the k-fold will give 2 lists: train_indices = indices[0], test_indices = indices[1]
    j = 0
    for c_param in c_param_range: # try one penalty strength at a time
        print('-------------------------------------------')
        print('C parameter: ', c_param)
        print('-------------------------------------------')
        print('')

        recall_accs = []
        # Split the data into folds
        index = fold.split(y_train_data)
        for iteration, indices in enumerate(index, start=1):

            # Build the model with L1 regularization
            lr = LogisticRegression(C = c_param, penalty = 'l1',solver='liblinear')
            # Train on the training folds (indices[0])
            lr.fit(x_train_data.iloc[indices[0], :], y_train_data.iloc[indices[0], :].values.ravel())
            # Predict on the validation fold (indices[1])
            y_pred_undersample = lr.predict(x_train_data.iloc[indices[1], :].values)

            # Compute recall on the validation fold
            recall_acc = recall_score(y_train_data.iloc[indices[1], :].values, y_pred_undersample)
            recall_accs.append(recall_acc)
            print('Iteration ', iteration, ': recall score = ', recall_acc)

        # Mean recall over the 5 cross-validation rounds for this penalty strength
        results_table.loc[j, 'Mean recall score'] = np.mean(recall_accs)
        j += 1
        print('')
        print('Mean recall score ', np.mean(recall_accs))
        print('')



    print(results_table)
    best_c = results_table.loc[results_table['Mean recall score'].astype('float64').idxmax()]['C_parameter']

    # Report the penalty strength with the highest mean recall among the 5 candidates
    print('*********************************************************************************')
    print('Best model to choose from cross validation is with C parameter = ', best_c)
    print('*********************************************************************************')

    return best_c


best_c = printing_Kfold_scores(X_train_undersample,y_train_undersample)

def plot_confusion_matrix(cm, classes,
                          title='Confusion matrix',
                          cmap='Blues'):
    """
    This function prints and plots the confusion matrix.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=0)
    plt.yticks(tick_marks, classes)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')



lr = LogisticRegression(C = best_c, penalty = 'l1',solver='liblinear')
lr.fit(X_train_undersample,y_train_undersample.values.ravel())
# Predict on the test set with the trained model
y_pred_undersample = lr.predict(X_test_undersample.values)

# Build the confusion matrix from the true and predicted labels
cnf_matrix = confusion_matrix(y_test_undersample,y_pred_undersample)
np.set_printoptions(precision=2) # set float print precision
# Compute recall from the confusion matrix: TP / (FN + TP)
print("Recall metric in the testing dataset: ", cnf_matrix[1,1]/(cnf_matrix[1,0]+cnf_matrix[1,1]))

# Plot non-normalized confusion matrix
class_names = [0,1]
plt.figure()
plot_confusion_matrix(cnf_matrix
                      , classes=class_names
                      , title='Confusion matrix')
plt.show()
