torch.optim is a package implementing various optimization algorithms. Most commonly used methods are already supported, and the interface is general enough that more sophisticated ones can also be easily integrated in the future.
For more details, see the official documentation:
> PyTorch official site
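As a quick sketch of that interface (the names model, inputs and targets below are placeholders, not part of this tutorial's code), a typical training step with any torch.optim optimizer looks like this:

import torch

model = torch.nn.Linear(1, 1)                            # any nn.Module works
optimizer = torch.optim.SGD(model.parameters(), lr=0.01) # optimizer over the model's parameters
loss_func = torch.nn.MSELoss()
inputs, targets = torch.randn(8, 1), torch.randn(8, 1)   # dummy batch
optimizer.zero_grad()                                    # clear old gradients
loss = loss_func(model(inputs), targets)                 # forward pass + loss
loss.backward()                                          # backpropagation, compute gradients
optimizer.step()                                         # update the parameters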
Loading the data
We make up some fake data for this experiment.
import torch
import torch.utils.data as Data
import torch.nn.functional as F
from torch.autograd import Variable
import matplotlib.pyplot as plt
torch.manual_seed(1) # reproducible
LR = 0.01
BATCH_SIZE = 32
EPOCH = 12
# fake dataset
x = torch.unsqueeze(torch.linspace(-1, 1, 1000), dim=1)
y = x.pow(2) + 0.1*torch.normal(torch.zeros(*x.size()))
# plot dataset
plt.scatter(x.numpy(), y.numpy())
plt.show()
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(dataset=torch_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2,)
Networks to optimize
To compare the optimizers, we create a separate neural network for each of them, all built from the same Net class.
# default network architecture
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(1, 20)   # hidden layer
        self.predict = torch.nn.Linear(20, 1)  # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.predict(x)          # linear output
        return x
# create a separate net for each optimizer
net_SGD = Net()
net_Momentum = Net()
net_RMSprop = Net()
net_Adam = Net()
nets = [net_SGD, net_Momentum, net_RMSprop, net_Adam]
Optimizer
Next, we create a different optimizer to train each network, along with a loss_func to compute the error.
We use several common optimizers: SGD, Momentum, RMSprop, and Adam.
# different optimizers
opt_SGD = torch.optim.SGD(net_SGD.parameters(), lr=LR)
opt_Momentum = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)
opt_RMSprop = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)
opt_Adam = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))
optimizers = [opt_SGD, opt_Momentum, opt_RMSprop, opt_Adam]
loss_func = torch.nn.MSELoss()
losses_his = [[], [], [], []]   # record the loss of each network during training
Results
Train the networks and plot their losses.
for epoch in range(EPOCH):
    print('Epoch: ', epoch)
    for step, (batch_x, batch_y) in enumerate(loader):
        b_x = Variable(batch_x)   # wrapping in Variable is only needed on older PyTorch versions
        b_y = Variable(batch_y)
        # for each optimizer, train the network that belongs to it
        for net, opt, l_his in zip(nets, optimizers, losses_his):
            output = net(b_x)               # get output for every net
            loss = loss_func(output, b_y)   # compute loss for every net
            opt.zero_grad()                 # clear gradients for next train
            loss.backward()                 # backpropagation, compute gradients
            opt.step()                      # apply gradients
            l_his.append(loss.item())       # record loss (loss.data[0] on older PyTorch)
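The text above only mentions plotting the losses, so here is a minimal sketch that draws one curve per optimizer from losses_his (the label order is assumed to match the nets list above):

labels = ['SGD', 'Momentum', 'RMSprop', 'Adam']
for i, l_his in enumerate(losses_his):
    plt.plot(l_his, label=labels[i])   # one loss curve per optimizer
plt.legend(loc='best')
plt.xlabel('Steps')
plt.ylabel('Loss')
plt.ylim((0, 0.2))   # zoom into the low-loss region; adjust as needed
plt.show()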
SGD is the plainest optimizer; one could say it has no acceleration at all, while Momentum is an improved version of SGD that adds the momentum principle.
RMSprop, in turn, is an upgrade of Momentum, and Adam is an upgrade of RMSprop.
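For reference, one common way to write the update rules behind these four optimizers (a standard textbook formulation, not taken from the code above; PyTorch's exact implementation differs in minor details) is:

SGD: $\theta_{t+1} = \theta_t - \eta\, g_t$

Momentum: $v_{t+1} = \mu\, v_t + g_t,\qquad \theta_{t+1} = \theta_t - \eta\, v_{t+1}$

RMSprop: $s_{t+1} = \alpha\, s_t + (1-\alpha)\, g_t^2,\qquad \theta_{t+1} = \theta_t - \dfrac{\eta}{\sqrt{s_{t+1}} + \epsilon}\, g_t$

Adam: $m_{t+1} = \beta_1 m_t + (1-\beta_1)\, g_t,\quad v_{t+1} = \beta_2 v_t + (1-\beta_2)\, g_t^2,\quad \theta_{t+1} = \theta_t - \dfrac{\eta\, \hat m_{t+1}}{\sqrt{\hat v_{t+1}} + \epsilon}$, with $\hat m, \hat v$ the bias-corrected moments.

Here $\theta$ are the parameters, $g_t$ the gradient, $\eta$ the learning rate, and $\epsilon$ a small constant for numerical stability.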
Yet in this result, Adam actually seems to do slightly worse than RMSprop, so a more advanced optimizer does not necessarily give a better result.
In your own experiments, try different optimizers to find the one that best fits your data and network.