layout: post
title: PyTorch Learning 2
subtitle: LEARNING PYTORCH WITH EXAMPLES
date: 2020-08-10
author: Zhuoran Li
catalog: true
tags:
- PyTorch
這篇向?qū)Ы榻B了PyTorch的基本概念
PyTorch包含兩個(gè)主要的特征:
一個(gè)n維的Tensor,類(lèi)似于Numpy,但是能夠運(yùn)行在GPU
自動(dòng)求導(dǎo),用于建立和訓(xùn)練神經(jīng)網(wǎng)絡(luò)
我們將用一個(gè)全聯(lián)接的ReLU網(wǎng)絡(luò)作為我們的運(yùn)行示例。這個(gè)網(wǎng)絡(luò)有一個(gè)隱藏層,通過(guò)梯度下降法訓(xùn)練,通過(guò)最小化網(wǎng)絡(luò)輸出和真實(shí)輸出的歐式距離擬合數(shù)據(jù)。
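In symbols, a sketch of the computation implemented below (writing $W_1$ and $W_2$ for the weights of the two layers):

$$h = x W_1, \qquad h_{\text{relu}} = \max(0, h), \qquad y_{\text{pred}} = h_{\text{relu}} W_2$$

$$\text{loss} = \sum_{i,j} \big( y_{\text{pred}} - y \big)_{ij}^{2}$$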
Tensors
Warm-up: numpy
We first implement the network using numpy.
numpy provides an n-dimensional array object and many functions for manipulating these arrays. numpy is a generic framework for scientific computing; it does not know anything about computation graphs, deep learning, or gradients. However, we can still fit a two-layer network to random data by manually implementing the forward and backward passes through the network using numpy operations:
# -*- coding: utf-8 -*-
import numpy as np
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
PyTorch: Tensors
Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so numpy alone is not enough for modern deep learning.
Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they are also useful as a generic tool for scientific computing.
Unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on GPU, you simply need to cast it to a new datatype.
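For example, a minimal sketch of casting a Tensor to the GPU (this assumes a CUDA-capable device; the variable names are illustrative):

import torch

cpu_tensor = torch.randn(3, 3)                           # stored in CPU memory
if torch.cuda.is_available():                            # only move if a GPU exists
    gpu_tensor = cpu_tensor.to(torch.device("cuda:0"))   # copy to the first GPU
    print(gpu_tensor.device)                             # prints: cuda:0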
Here we use PyTorch Tensors to fit a two-layer network to random data. Like the numpy example above, we need to manually implement the forward and backward passes through the network:
# -*- coding: utf-8 -*-
import torch
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)
learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
Autograd
PyTorch: Tensors and autograd
In the examples above we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but can quickly get very hairy for large, complex networks.
Thankfully, automatic differentiation can automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then lets you compute gradients easily.
This sounds complicated, but it is pretty simple to use in practice. Each Tensor represents a node in a computational graph. If x is a Tensor with x.requires_grad=True, then x.grad is another Tensor holding the gradient of x with respect to some scalar value.
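A minimal sketch of this behavior (the names here are purely illustrative):

import torch

x = torch.tensor(2.0, requires_grad=True)
out = x ** 2 + 3 * x   # the forward pass builds the computational graph
out.backward()         # backpropagate through the graph
print(x.grad)          # tensor(7.), since d/dx (x^2 + 3x) = 2x + 3 = 7 at x = 2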
Here we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass through the network:
# -*- coding: utf-8 -*-
import torch
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    # Now loss is a Tensor of shape (1,)
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # You can also use torch.optim.SGD to achieve this.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
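As the comments above note, an alternative is to operate on weight.data and weight.grad.data, which share storage with the weights but are not tracked by autograd. A sketch of that variant of the update step:

# Equivalent update without torch.no_grad(), using .data:
w1.data -= learning_rate * w1.grad.data
w2.data -= learning_rate * w2.grad.data
w1.grad.data.zero_()
w2.grad.data.zero_()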
PyTorch: Defining new autograd functions
Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.
In PyTorch we can easily define our own autograd operator by defining a subclass of torch.autograd.Function and implementing the forward and backward functions. We can then use our new autograd operator by calling its apply method on Tensors containing input data.
In this example we define our own custom autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network:
# -*- coding: utf-8 -*-
import torch
class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
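A custom backward like this can be sanity-checked with torch.autograd.gradcheck, which compares the analytical gradients against finite-difference estimates. A minimal sketch (gradcheck requires double-precision inputs with requires_grad=True):

from torch.autograd import gradcheck

test_input = torch.randn(20, 20, dtype=torch.double, requires_grad=True)
print(gradcheck(MyReLU.apply, (test_input,), eps=1e-6, atol=1e-4))  # True if the gradients match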
nn module
PyTorch: nn
Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.
When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that will be optimized during learning.
In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.
In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as learnable parameters. The nn package also defines a set of useful loss functions commonly used when training neural networks.
In this example we use the nn package to implement our two-layer network:
# -*- coding: utf-8 -*-
import torch
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
PyTorch: optim
Up to this point we have updated the weights of our models by manually mutating the Tensors holding learnable parameters (with torch.no_grad() or .data to avoid tracking history in autograd). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers like AdaGrad, RMSProp, Adam, etc.
The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.
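Because every optimizer exposes the same constructor pattern and the same step()/zero_grad() interface, switching algorithms is usually a one-line change. A sketch (the learning rates here are only illustrative):

import torch

params = [torch.randn(3, 3, requires_grad=True)]  # stand-in for model.parameters()
optimizer = torch.optim.SGD(params, lr=1e-4)      # vanilla stochastic gradient descent
optimizer = torch.optim.RMSprop(params, lr=1e-4)  # RMSProp
optimizer = torch.optim.Adam(params, lr=1e-4)     # Adam, used in the example below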
In this example we will use the nn package to define our model as before, but we will optimize the model with the Adam algorithm provided by the optim package:
# -*- coding: utf-8 -*-
import torch
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()
PyTorch: Custom nn Modules
Sometimes you will want to specify models that are more complex than a sequence of existing Modules; for these cases you can define your own Modules by subclassing nn.Module and defining a forward which receives input Tensors and produces output Tensors, using other modules or other autograd operations on Tensors.
In this example we implement our two-layer network as a custom Module subclass:
# -*- coding: utf-8 -*-
import torch
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
PyTorch: Control Flow + Weight Sharing
As an example of dynamic graphs and weight sharing, we implement a very strange model: a fully-connected ReLU network that on each forward pass chooses a random number between 0 and 3 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.
For this model we can use normal Python flow control to implement the loop, and we can implement weight sharing among the innermost layers by simply reusing the same Module multiple times when defining the forward pass.
We can easily implement this model as a Module subclass:
# -*- coding: utf-8 -*-
import random
import torch
class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)
# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()