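Preface
- I have recently been taking an NLP course. Most of the code below comes from my course assignments, and much of it closely follows what the instructor wrote. Everything is in Python; if you are interested, you can find it on my GitHub: https://github.com/LiuPineapple/Learning-NLP/tree/master/Assignments/lesson-02
- My knowledge is limited, so if you spot a mistake in this article, corrections are welcome. If anything here infringes on your rights, please contact me and I will remove it.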
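Machine Learning--Gradient Descent
What is machine learning? Different people may give different definitions. My understanding is that we use algorithms to let a machine learn from data, so that it arrives at a better model than one designed by hand, and then use that model for tasks such as classification and prediction.
Here we work through the Boston house price prediction problem as a simple hands-on exercise in machine learning.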
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release.
from sklearn.datasets import load_boston
data = load_boston()
X, y = data['data'], data['target']
X[1]
array([2.7310e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,
6.4210e+00, 7.8900e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,
1.7800e+01, 3.9690e+02, 9.1400e+00])
len(y)
506
len(X[:, 0])
506
X_rm = X[:, 5]
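Points to note in the code above:
- y holds the prices of the individual houses, and X holds the variables describing each house, such as size and crime rate. As the output shows, we use data for 506 houses in total.
- For simplicity, we study only the relationship between the sixth variable in X (RM, the average number of rooms per dwelling) and the house price, so we extract that variable's value for every house into X_rm.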
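We assume that the independent variable and the dependent variable are linearly related, that is, $y = kx + b$, where $k$ and $b$ are unknown parameters. We define a price() function that computes $y$ from a given input and given parameter values. Our task is to find parameter values such that, for a given $x$, the prediction from the formula above is as close as possible to the true value. If we can find suitable values of $k$ and $b$, the predictions are likely to be reasonably accurate.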
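How, then, do we measure the gap between the predicted values and the true values? We use the mean squared error, where $\hat{y}_i$ is the prediction for the $i$-th house and $n$ is the number of houses:

$$loss = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
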
def price(rm, k, b):
    """f(x) = k * x + b"""
    return k * rm + b

def loss(y, y_hat):  # to evaluate the performance
    return sum((y_i - y_hat_i)**2 for y_i, y_hat_i in zip(list(y), list(y_hat))) / len(list(y))

# The loss can also be defined more concisely with numpy
import numpy as np

def loss(y, y_hat):
    e = np.array(y) - np.array(y_hat)
    # For a 1-D array, e @ e.T is the dot product of e with itself
    return (e @ e.T) / len(y)
Points to note in the code above:
- Python3 zip() function: https://www.runoob.com/python3/python3-func-zip.html
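As a quick sanity check of the mean squared error defined above, here is a minimal sketch on three made-up data points. The values are illustrative assumptions, not Boston data, and the pure-Python loss is restated so the snippet runs on its own.

```python
def loss(y, y_hat):
    """Mean squared error between true values y and predictions y_hat."""
    return sum((y_i - y_hat_i) ** 2 for y_i, y_hat_i in zip(y, y_hat)) / len(y)


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]

# The squared errors are 0, 0 and 4, so the mean is 4 / 3.
toy_loss = loss(y_true, y_pred)
print(toy_loss)
```

A perfect prediction gives a loss of exactly 0, and the loss grows quadratically as any single prediction moves away from its true value.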
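Our task is to find parameter values that make loss as small as possible. Following the machine learning approach, we first generate random values for $k$ and $b$, and then let the program adjust $k$ and $b$ automatically from the data until it has run a set number of iterations or the loss falls below some threshold.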
Gradient Descent
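As we can see, $x$ and $y$ are fixed values, so loss is really a function of $k$ and $b$. With $\hat{y}_i = kx_i + b$, the partial derivatives of loss with respect to $k$ and $b$, followed by the corresponding code, are:

$$\frac{\partial loss}{\partial k} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)x_i$$
$$\frac{\partial loss}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)$$
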
def partial_k(x, y, y_hat):
    n = len(y)
    gradient = 0
    for x_i, y_i, y_hat_i in zip(list(x), list(y), list(y_hat)):
        gradient += (y_i - y_hat_i) * x_i
    return -2 / n * gradient

def partial_b(x, y, y_hat):
    n = len(y)
    gradient = 0
    for y_i, y_hat_i in zip(list(y), list(y_hat)):
        gradient += (y_i - y_hat_i)
    return -2 / n * gradient
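One way to gain confidence in these derivative formulas is to compare them with a central finite-difference estimate of the loss. The sketch below does that on a few made-up points. The data, the test point (k, b), the step size eps, and the helper loss_at are all assumptions introduced for illustration, and the functions are restated compactly so the snippet is self-contained.

```python
def price(x, k, b):
    return k * x + b


def loss(y, y_hat):
    return sum((y_i - y_hat_i) ** 2 for y_i, y_hat_i in zip(y, y_hat)) / len(y)


def partial_k(x, y, y_hat):
    n = len(y)
    return -2 / n * sum((y_i - y_hat_i) * x_i for x_i, y_i, y_hat_i in zip(x, y, y_hat))


def partial_b(x, y, y_hat):
    n = len(y)
    return -2 / n * sum(y_i - y_hat_i for y_i, y_hat_i in zip(y, y_hat))


def loss_at(k, b, xs, ys):
    """Hypothetical helper: the loss as a function of the parameters only."""
    return loss(ys, [price(x, k, b) for x in xs])


# Made-up data and an arbitrary point (k, b) at which to check the gradient.
xs = [4.0, 5.0, 6.0, 7.0]
ys = [10.0, 14.0, 15.0, 21.0]
k, b, eps = 1.5, -2.0, 1e-6

y_hat = [price(x, k, b) for x in xs]
analytic_k = partial_k(xs, ys, y_hat)
analytic_b = partial_b(xs, ys, y_hat)

# Central differences: (f(p + eps) - f(p - eps)) / (2 * eps)
numeric_k = (loss_at(k + eps, b, xs, ys) - loss_at(k - eps, b, xs, ys)) / (2 * eps)
numeric_b = (loss_at(k, b + eps, xs, ys) - loss_at(k, b - eps, xs, ys)) / (2 * eps)

print(analytic_k, numeric_k)
print(analytic_b, numeric_b)
```

Because the loss is quadratic in k and b, the two estimates should agree up to floating-point rounding.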
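After randomly choosing $k$ and $b$, we compute loss and its partial derivatives with respect to $k$ and $b$. Random values of $k$ and $b$ generally give a fairly large loss, so how should we change them to make loss keep decreasing? The partial derivatives tell us the direction of change. We define a positive learning rate $\alpha$ (learning_rate in the code) and, after computing the partial derivatives, update $k$ and $b$ as follows:

$$k \leftarrow k - \alpha\frac{\partial loss}{\partial k}, \qquad b \leftarrow b - \alpha\frac{\partial loss}{\partial b}$$
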
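After obtaining the new $k$ and $b$, we plug them back in to compute loss. If the new loss is smaller than the previous one, it becomes the smallest loss so far, and the new $k$ and $b$ are better values than the previous ones. We then repeat the process until a set number of iterations is reached or the loss falls below some threshold. Note that $k$ and $b$ must be updated simultaneously: do not update $k$ first and then use the updated $k$ to compute the partial derivative with respect to $b$ when updating $b$. The code is as follows: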
import random

trying_times = 2000
min_loss = float('inf')
best_k, best_b = None, None
# Random starting point, each parameter drawn from [-100, 100)
current_k = random.random() * 200 - 100
current_b = random.random() * 200 - 100
learning_rate = 1e-04

for i in range(trying_times):
    price_by_k_and_b = [price(r, current_k, current_b) for r in X_rm]
    current_loss = loss(y, price_by_k_and_b)
    if current_loss < min_loss:  # performance became better
        min_loss = current_loss
        # Record the parameters that produced this loss (best_k and best_b were never assigned before)
        best_k, best_b = current_k, current_b
    if i % 50 == 0:
        print('When time is : {}, get best_k: {} best_b: {}, and the loss is: {}'.format(i, best_k, best_b, min_loss))
    # Both gradients come from the same predictions, so k and b are updated simultaneously
    k_gradient = partial_k(X_rm, y, price_by_k_and_b)
    b_gradient = partial_b(X_rm, y, price_by_k_and_b)
    current_k = current_k + (-1 * k_gradient) * learning_rate
    current_b = current_b + (-1 * b_gradient) * learning_rate
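Points to note in the code above:
- In Python, positive and negative infinity can be written as float("inf") and float("-inf"). Adding a finite number to inf, or multiplying inf by a positive number, still gives inf, and any finite number divided by inf gives 0.
- Python random() function: https://www.runoob.com/python/func-number-random.html. Be careful to distinguish random in the random module from random in numpy.
- Python format string-formatting function: https://www.runoob.com/python/att-string-format.html
- 1e-04 means $1 \times 10^{-4}$, i.e. 0.0001.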
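The behaviour of float("inf") described in the notes can be checked directly. The short sketch below also shows why it makes a convenient starting value for a running minimum such as min_loss; the candidate values are made up for illustration.

```python
import math

pos_inf = float("inf")
neg_inf = float("-inf")

print(pos_inf + 1)        # inf
print(pos_inf * 2)        # inf
print(1 / pos_inf)        # 0.0
print(pos_inf * 0)        # nan: not every operation on inf yields inf
print(pos_inf + neg_inf)  # nan

# Every finite number compares as smaller than inf, so the first
# candidate always replaces the initial value of the running minimum.
min_loss = float("inf")
for candidate in [9.0, 4.0, 6.0]:  # hypothetical loss values
    if candidate < min_loss:
        min_loss = candidate
print(min_loss)  # 4.0
```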
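Running the training loop produced the output below. The best_k and best_b values are identical in every row, which suggests they were not being updated in the run that produced this log, so only the loss column reflects the training progress: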
When time is : 0, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 575.5349822522099
When time is : 50, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 277.9378161169662
When time is : 100, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 147.24895628021088
When time is : 150, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 89.8572545975801
When time is : 200, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 64.65372567052019
When time is : 250, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 53.58551239815359
When time is : 300, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 48.72477014152337
When time is : 350, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 46.59001559478237
When time is : 400, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 45.65236839246802
When time is : 450, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 45.24042644341104
When time is : 500, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 45.059346031766644
When time is : 550, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.97964764306714
When time is : 600, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.94447083305862
When time is : 650, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.928845550418174
When time is : 700, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.921806290539294
When time is : 750, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.918537593098634
When time is : 800, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91692476670531
When time is : 850, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91603915253814
When time is : 900, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91547293354079
When time is : 950, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91504701836891
When time is : 1000, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.914682759718445
When time is : 1050, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91434561990997
When time is : 1100, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9140204318406
When time is : 1150, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91370053492356
When time is : 1200, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91338300417686
When time is : 1250, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91306655509527
When time is : 1300, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.912750623583214
When time is : 1350, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.912434961909526
When time is : 1400, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91211946127419
When time is : 1450, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91180407388745
When time is : 1500, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9114887787528
When time is : 1550, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9111735666393
When time is : 1600, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91085843348287
When time is : 1650, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.91054337748873
When time is : 1700, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.910228397858496
When time is : 1750, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.9099134942312
When time is : 1800, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.909598666438264
When time is : 1850, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.90928391439542
When time is : 1900, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.90896923805536
When time is : 1950, get best_k: 11.431551629413757 best_b: -49.52403584539048, and the loss is: 44.908654637387244
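To see the whole procedure recover known parameters, here is a minimal sketch on synthetic, noise-free data generated from a line with k = 2 and b = 1. These values, the data points, the starting point, the learning rate of 0.01, the iteration count, and the combined gradients helper are all assumptions chosen for illustration; this is not the Boston run above. Both gradients are computed from the same predictions before either parameter changes, so the update is simultaneous.

```python
def price(x, k, b):
    return k * x + b


def loss(y, y_hat):
    return sum((y_i - y_hat_i) ** 2 for y_i, y_hat_i in zip(y, y_hat)) / len(y)


def gradients(xs, ys, k, b):
    """Hypothetical helper: both partial derivatives from one set of predictions."""
    n = len(ys)
    y_hat = [price(x, k, b) for x in xs]
    grad_k = -2 / n * sum((y_i - y_hat_i) * x_i for x_i, y_i, y_hat_i in zip(xs, ys, y_hat))
    grad_b = -2 / n * sum(y_i - y_hat_i for y_i, y_hat_i in zip(ys, y_hat))
    return grad_k, grad_b


# Synthetic data lying exactly on y = 2x + 1 (assumed values).
true_k, true_b = 2.0, 1.0
xs = [4.0, 5.0, 6.0, 7.0, 8.0]
ys = [price(x, true_k, true_b) for x in xs]

k, b = 0.0, 0.0
learning_rate = 0.01
initial_loss = loss(ys, [price(x, k, b) for x in xs])

for step in range(20000):
    grad_k, grad_b = gradients(xs, ys, k, b)
    # Tuple assignment applies both updates at once, using the old k and b.
    k, b = k - learning_rate * grad_k, b - learning_rate * grad_b

final_loss = loss(ys, [price(x, k, b) for x in xs])
print(k, b, final_loss)
```

Because the x values are not centered around zero, the intercept converges much more slowly than the slope, which is why this sketch runs for so many iterations.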
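That completes a simple machine learning model trained with gradient descent. Of course, many questions remain, such as how to choose the initial values and the learning rate; these are topics for later posts.
Finally, you are welcome to visit my GitHub for more code: https://github.com/LiuPineapple
You are also welcome to visit my Jianshu homepage for more articles: http://www.itdecent.cn/u/31e8349bd083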