Softmax Classifier

The softmax classifier looks a lot like logistic regression; in fact, softmax was developed from logistic regression. Because we now have multiple classes, we need a probability for each class. The softmax formula: P(y = j \mid x) = \frac{e^{\theta_j^Tx}}{\sum_ie^{\theta_i^Tx}}
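As a quick sanity check, here is a minimal sketch of the formula in numpy (the score vector z stands in for the \theta_j^Tx values and is made up for illustration):

import numpy as np

z = np.array([2.0, 1.0, 0.1])        # hypothetical scores theta_j^T x for 3 classes
p = np.exp(z) / np.sum(np.exp(z))    # softmax normalization
print(p)                             # approx [0.659 0.242 0.099]
print(p.sum())                       # 1.0, a valid probability distribution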
A natural question: why not take the max directly, instead of going around in such a big circle only to pick the largest value in the end? ① What we want is indeed a max, but max has one drawback: it is not differentiable. So we need a function that approximates it. exp is an exponential function, so larger values grow much faster, which separates out the largest one while staying differentiable; this design also makes the effect of each feature on the probability multiplicative. ② Since softmax was developed from logistic regression, it naturally uses the cross-entropy loss, L = -\sum_kt_k\log P(y=k), where the target class has t_k=1 and all others are 0. Differentiating with respect to the score z_i = \theta_i^Tx gives \frac{\partial L}{\partial z_i} = P(y=i)-t_i. This form is remarkably simple, and it matches the gradient forms of linear regression (with a mean-squared-error objective) and two-class classification (with a cross-entropy objective).
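That simple gradient is easy to verify numerically. Below is a minimal sketch (the scores z and the target index are made up): it compares P(y=i)-t_i against a central finite-difference estimate of the cross-entropy loss with respect to the scores.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(z, target):
    return -np.log(softmax(z)[target])   # L = -sum_k t_k log P(y=k)

z = np.array([0.5, -1.2, 0.3])           # hypothetical scores
target = 2                               # hypothetical true class
analytic = softmax(z).copy()
analytic[target] -= 1.0                  # dL/dz_i = P(y=i) - t_i

eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (cross_entropy(zp, target) - cross_entropy(zm, target)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # True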
Main implementation steps:
First comes the exp normalization, which yields the probability that the current sample belongs to each class: P(y = j \mid x) = \frac{e^{\theta_j^Tx}}{\sum_ie^{\theta_i^Tx}}
Then take the logarithm to obtain the cost function: L = -\sum_kt_k\log P(y=k)
Taking the gradient:
\nabla_{\theta_j}J(\theta) = -\frac{1}{m}\sum_{i=1}^m\nabla_{\theta_j}\left[\sum_{l=1}^kI\{y^i=l\}\log\frac{e^{\theta_l^Tx^i}}{\sum_{s=1}^ke^{\theta_s^Tx^i}}\right]
=-\frac{1}{m}\sum_{i=1}^m\nabla_{\theta_j}\left[\theta_{y^i}^Tx^i-\log\sum_{s=1}^ke^{\theta_s^Tx^i}\right]
=-\frac{1}{m}\sum_{i=1}^m\left[x^i\left(I\{y^i=j\}-P(y^i=j|x^i;\theta)\right)\right]
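Stacking all classes together, this gradient is \nabla J = -\frac{1}{m}X^T(T-P) with one-hot targets T, which is exactly what the training loop later in this post computes. A minimal vectorized sketch (the data X, y here are random placeholders):

import numpy as np

m, n, k = 5, 4, 3                        # hypothetical: 5 samples, 4 features, 3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))              # feature matrix
y = rng.integers(0, k, size=m)           # integer labels
theta = np.zeros((n, k))

scores = X @ theta
P = np.exp(scores)
P /= P.sum(axis=1, keepdims=True)        # row-wise softmax probabilities
T = np.eye(k)[y]                         # one-hot targets
grad = -(X.T @ (T - P)) / m              # nabla_theta J from the derivation above
theta -= 0.1 * grad                      # one gradient-descent step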

Properties of the Softmax parameters

P(y^i=j|x^i;\theta)=\frac{e^{(\theta_j-φ)^Tx^i}}{\sum_{l=1}^ke^{(\theta_l-φ)^Tx^i}}
=\frac{e^{\theta_j^Tx^i}\cdot e^{-φ^Tx^i}}{\sum_{l=1}^ke^{\theta_l^Tx^i}\cdot e^{-φ^Tx^i}}
=\frac{e^{\theta_j^Tx^i}}{\sum_{l=1}^ke^{\theta_l^Tx^i}}
So we can see that subtracting some vector φ from the optimal parameters \theta has no effect on the predictions. In other words, the model has multiple sets of optimal parameters: each choice of φ corresponds to a different solution, yet the outputs are identical, so the optimum is not unique.
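This invariance is easy to confirm numerically: shifting every score by the same amount (equivalently, subtracting the same φ from every \theta_j) leaves the softmax output unchanged. A minimal sketch with made-up numbers:

import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])            # hypothetical scores theta_j^T x
shift = 2.5                              # plays the role of phi^T x
print(np.allclose(softmax(z), softmax(z - shift)))   # True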

The relationship between Softmax and logistic regression

h_{\theta}(x) = \frac{1}{e^{(\theta_1-φ)^Tx}+e^{(\theta_2-φ)^Tx}}\left[e^{(\theta_1-φ)^Tx},e^{(\theta_2-φ)^Tx}\right]^T
if\quad φ=\theta_1,\ \text{and defining}\ \theta=\theta_2-\theta_1:
=\left[\frac{1}{1+e^{\theta^Tx}},\,1-\frac{1}{1+e^{\theta^Tx}}\right]^T
So softmax is an extension of logistic regression: going back to two classes, softmax gives exactly the same model, and both use the cross-entropy loss.
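To see the reduction concretely: with k = 2 and \theta = \theta_2-\theta_1, the first softmax output equals the logistic value \frac{1}{1+e^{\theta^Tx}}. A minimal sketch (the weights and input are made up):

import numpy as np

theta1 = np.array([0.4, -0.7])           # hypothetical class-1 weights
theta2 = np.array([-0.1, 0.9])           # hypothetical class-2 weights
x = np.array([1.0, 2.0])

scores = np.array([theta1 @ x, theta2 @ x])
p_softmax = np.exp(scores) / np.exp(scores).sum()

u = (theta2 - theta1) @ x                # theta^T x in the reduction above
p_logistic = 1.0 / (1.0 + np.exp(u))
print(np.allclose(p_softmax[0], p_logistic))   # True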

Code implementation

We use the MNIST handwritten-digit dataset:

import numpy as np
from keras.datasets import mnist

class DataPrecessing(object):
    def loadFile(self):
        # load MNIST, scale pixels to [0, 1], and flatten each 28x28 image to 784 features
        (x_train, x_target_tarin), (x_test, x_target_test) = mnist.load_data()
        x_train = x_train.astype('float32')/255.0
        x_test = x_test.astype('float32')/255.0
        x_train = x_train.reshape(len(x_train), np.prod(x_train.shape[1:]))
        x_test = x_test.reshape(len(x_test), np.prod(x_test.shape[1:]))
        x_train = np.mat(x_train)
        x_test = np.mat(x_test)
        x_target_tarin = np.mat(x_target_tarin)
        x_target_test = np.mat(x_target_test)
        return x_train, x_target_tarin, x_test, x_target_test

    def Calculate_accuracy(self, target, prediction):
        # fraction of samples whose predicted class matches the label
        score = 0
        for i in range(len(target)):
            if target[i] == prediction[i]:
                score += 1
        return score/len(target)

    def predict(self, test, weights):
        # class scores are test * weights; predict the class with the largest score
        h = test * weights
        return h.argmax(axis=1)

This loads the dataset and converts the formats.
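As a quick sanity check on the preprocessing (MNIST has 60000 training and 10000 test images, each flattened to 28×28 = 784 features):

Dataprecessing = DataPrecessing()
x_train, x_target_tarin, x_test, x_target_test = Dataprecessing.loadFile()
print(x_train.shape, x_test.shape)   # (60000, 784) (10000, 784)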


def gradientAscent(feature_data, label_data, k, maxCycle, alpha):
    '''train a softmax model by gradient ascent
    input:  feature_data(mat) features
            label_data(mat) targets
            k(int) number of classes
            maxCycle(int) maximum number of iterations
            alpha(float) learning rate
    '''
    Dataprecessing = DataPrecessing()
    x_train, x_target_tarin, x_test, x_target_test = Dataprecessing.loadFile()
    x_target_tarin = x_target_tarin.tolist()[0]
    x_target_test = x_target_test.tolist()[0]
    m, n = np.shape(feature_data)
    weights = np.mat(np.ones((n, k)))
    i = 0
    while i <= maxCycle:
        err = np.exp(feature_data*weights)   # unnormalized scores exp(theta_j^T x)
        if i % 100 == 0:
            print('cost score : ', cost(err, label_data))
            train_predict = Dataprecessing.predict(x_train, weights)
            test_predict = Dataprecessing.predict(x_test, weights)
            print('Train_accuracy : ', Dataprecessing.Calculate_accuracy(x_target_tarin, train_predict))
            print('Test_accuracy : ', Dataprecessing.Calculate_accuracy(x_target_test, test_predict))
        rowsum = -err.sum(axis = 1)          # negative normalizer, so err becomes -P(y=j|x)
        rowsum = rowsum.repeat(k, axis = 1)
        err = err / rowsum
        for x in range(m):
            err[x, label_data[x]] += 1       # err is now I{y=j} - P(y=j|x), the gradient term
        weights = weights + (alpha/m) * feature_data.T * err
        i += 1
    return weights

def cost(err, label_data):
    # average cross-entropy: -(1/m) * sum_i log P(y_i | x_i)
    m = np.shape(err)[0]
    sum_cost = 0.0
    for i in range(m):
        p = err[i, label_data[i]] / np.sum(err[i, :])   # probability of the true class
        if p > 0:                                       # guard against log(0)
            sum_cost -= np.log(p)
    return sum_cost/m
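One practical caveat with this implementation: np.exp(feature_data*weights) can overflow once the scores grow large. A common remedy, sketched below (stable_softmax is a hypothetical helper, not part of the original code), is to subtract each row's maximum score before exponentiating; by the shift-invariance shown earlier, the probabilities are unchanged:

import numpy as np

def stable_softmax(scores):
    # hypothetical helper: subtract the row max before exp;
    # shift-invariance of softmax keeps the probabilities identical
    scores = np.asarray(scores)                           # also accepts np.mat input
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)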

The implementation is actually fairly straightforward.

    Dataprecessing = DataPrecessing()
    x_train, x_target_tarin, x_test, x_target_test = Dataprecessing.loadFile()
    x_target_tarin = x_target_tarin.tolist()[0]
    gradientAscent(x_train, x_target_tarin, 10, 100000, 0.001)

Running the function.

GitHub code: https://github.com/GreenArrow2017/MachineLearning/tree/master/MachineLearning/Linear%20Model/LogosticRegression
