PyTorch Deep Learning (II) - A Simple Regression

1. Background

A simple and familiar problem: a linear regression with a single feature.

Simple linear regression model:


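    y = b + w * x + epsilon

Here b is the intercept (bias), w is the slope (weight), x is the single feature, and epsilon is the noise term.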

2. Import Libraries and Define Helpers

  • Libraries we need for the demo
import numpy as np
from sklearn.linear_model import LinearRegression

import torch
import torch.optim as optim
import torch.nn as nn
from torchviz import make_dot
import matplotlib.pyplot as plt
  • Preparation: a custom helper function for plotting
# Plots the train and validation sets side by side
def figure1(x_train, y_train, x_val, y_val):
    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    
    ax[0].scatter(x_train, y_train)
    ax[0].set_xlabel('x')
    ax[0].set_ylabel('y')
    ax[0].set_ylim([0, 3.1])
    ax[0].set_title('Generated Data - Train')

    ax[1].scatter(x_val, y_val, c='r')
    ax[1].set_xlabel('x')
    ax[1].set_ylabel('y')
    ax[1].set_ylim([0, 3.1])
    ax[1].set_title('Generated Data - Validation')
    fig.tight_layout()
    
    return fig, ax


3. Data Generation

  • 3-1) Let’s start by generating some synthetic data
    We start with a vector of 100 (N) points for our feature (x) and create our labels (y) using b = 1, w = 2,
    and some Gaussian noise (epsilon).
# Synthetic Data Generation
true_b = 1
true_w = 2
N = 100

# Data Generation
np.random.seed(42)
x = np.random.rand(N, 1)
epsilon = (.1 * np.random.randn(N, 1))
y = true_b + true_w * x + epsilon


  • 3-2) Split data into train and validation sets
    Next, let’s split our synthetic data into train and validation sets by shuffling the array of indices, using the first 80 shuffled points for training and the remaining 20 for validation.
# Shuffles the indices
idx = np.arange(N)
np.random.shuffle(idx)

# Uses first 80 random indices for train
train_idx = idx[:int(N*.8)]
# Uses the remaining indices for validation
val_idx = idx[int(N*.8):]

# Generates train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

# Plots the train and validation data
figure1(x_train, y_train, x_val, y_val)

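Result:

[Figure: two scatter plots of y against x, "Generated Data - Train" (left) and "Generated Data - Validation" (right)]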
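Before running gradient descent, it can help to know roughly what answer to expect. The sketch below is not part of the original walkthrough: it repeats the data generation and split from above and fits the line in closed form with `np.polyfit`, which solves the same least-squares problem. The names `b_ls` and `w_ls` are made up for this example.

```python
import numpy as np

# Recreate the synthetic data and the train split exactly as above
true_b, true_w, N = 1, 2, 100
np.random.seed(42)
x = np.random.rand(N, 1)
epsilon = 0.1 * np.random.randn(N, 1)
y = true_b + true_w * x + epsilon

idx = np.arange(N)
np.random.shuffle(idx)
train_idx = idx[:int(N * 0.8)]
x_train, y_train = x[train_idx], y[train_idx]

# Closed-form least-squares fit of a straight line.
# np.polyfit returns coefficients from the highest degree down: [slope, intercept]
w_ls, b_ls = np.polyfit(x_train.ravel(), y_train.ravel(), deg=1)
print(b_ls, w_ls)
```

The printed values should land close to b = 1 and w = 2, though not exactly on them because of the noise. Gradient descent on the same training set is expected to approach these least-squares estimates rather than the true values.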


4. Gradient Descent

  • 4-1) Random Initialization

To train a model, you need to randomly initialize its parameters / weights (in this example we have only two, b and w).

# Step 0 - Initializes parameters "b" and "w" randomly
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

print(b, w)

Output:
[0.49671415] [-0.1382643]


  • 4-2) Compute Model’s Predictions

This is the forward pass; it simply computes the model’s predictions using the current values of the parameters / weights. At the very beginning, we will be producing really bad predictions, as we started with random values.

# Step 1 - Computes our model's predicted output - forward pass
yhat = b + w * x_train


  • 4-3) Compute the Loss

For a regression problem, the loss is given by the mean squared error (MSE), that is, the average of all squared differences between labels (y) and predictions (b + wx).
In the code below, we use all data points of the training set to compute the loss, so n = 80 (the size of the training set), meaning we are performing batch gradient descent.

# Step 2 - Computing the loss
# We are using ALL data points, so this is BATCH gradient
# descent. How wrong is our model? That's the error!
error = (yhat - y_train)

# It is a regression, so it computes mean squared error (MSE)
loss = (error ** 2).mean()

print(loss)

Output:
2.720278897826747


  • 4-4) Compute the Gradients

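A derivative tells you how much a given quantity changes when you slightly vary some other quantity.
A gradient is made of partial derivatives: each one is computed with respect to a single parameter while the others are held fixed. We have two parameters, b and w, so we compute two partial derivatives of the loss.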

Gradient = how much the loss changes if ONE parameter changes a little bit
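For reference, the two partial derivatives can be worked out by applying the chain rule to the MSE. The derivation below is a sketch using the symbols already introduced (b, w, x, y, and n = 80 training points); the subscript i for a single data point is notation added here.

```latex
\begin{aligned}
\text{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2
            = \frac{1}{n}\sum_{i=1}^{n}\left(b + w x_i - y_i\right)^2 \\
\frac{\partial\,\text{MSE}}{\partial b}
           &= \frac{2}{n}\sum_{i=1}^{n}\left(b + w x_i - y_i\right)
            = \frac{2}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right) \\
\frac{\partial\,\text{MSE}}{\partial w}
           &= \frac{2}{n}\sum_{i=1}^{n} x_i\left(b + w x_i - y_i\right)
            = \frac{2}{n}\sum_{i=1}^{n} x_i\left(\hat{y}_i - y_i\right)
\end{aligned}
```

In code, `error` holds the differences between predictions and labels, so the two expressions become `2 * error.mean()` and `2 * (x_train * error).mean()`.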

# Step 3 - Computes gradients for both "b" and "w" parameters
b_grad = 2 * error.mean()
w_grad = 2 * (x_train * error).mean()
print(b_grad, w_grad)

Output:
-3.044811379650508 -1.8337537171510832
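One way to gain confidence in these formulas is to compare them with a numerical approximation. The sketch below is not from the original post: it recreates the data and the initial parameters, then estimates each partial derivative with central finite differences. The helper `mse`, the step size `h`, and the `*_num` names are assumptions made for this example.

```python
import numpy as np

# Recreate the training set as in the data generation section
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + 0.1 * np.random.randn(100, 1)
idx = np.arange(100)
np.random.shuffle(idx)
x_train, y_train = x[idx[:80]], y[idx[:80]]

# Same random initialization as Step 0
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

def mse(b, w):
    # Hypothetical helper: training loss for given parameter values
    return ((b + w * x_train - y_train) ** 2).mean()

# Analytic gradients, as in Step 3
error = (b + w * x_train) - y_train
b_grad = 2 * error.mean()
w_grad = 2 * (x_train * error).mean()

# Central finite differences: nudge ONE parameter at a time
h = 1e-6
b_grad_num = (mse(b + h, w) - mse(b - h, w)) / (2 * h)
w_grad_num = (mse(b, w + h) - mse(b, w - h)) / (2 * h)

print(b_grad, b_grad_num)
print(w_grad, w_grad_num)
```

Each pair of printed numbers should agree to several decimal places. Since the MSE is quadratic in b and w, central differences are exact here apart from floating-point rounding.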


  • 4-5) Update the Parameters

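In the final step, we use the gradients to update the parameters.
Since we are trying to minimize the loss, we reverse the sign of the gradient for the update: each parameter moves a small step, scaled by the learning rate, in the direction that decreases the loss.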

# Sets learning rate - this is "eta", the Greek letter that looks like an "n"
lr = 0.1
print(b, w)

# Step 4 - Updates parameters using gradients and 
# the learning rate
b = b - lr * b_grad
w = w - lr * w_grad

print(b, w)

Output:
[0.49671415] [-0.1382643]
[0.80119529] [0.04511107]

Note: we use a learning rate of 0.1 here, which is a relatively high value as far as learning rates are concerned.
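To get a feel for what the learning rate does, the sketch below repeats Steps 1 to 4 for a fixed number of epochs with two different learning rates and prints the final training loss for each. It is an illustration only: the helpers `make_train_data` and `loss_after`, the choice of 100 epochs, and the second learning rate of 0.01 are assumptions, not values from the original post.

```python
import numpy as np

def make_train_data(N=100, seed=42):
    # Hypothetical helper: repeats the data generation and split from Section 3
    np.random.seed(seed)
    x = np.random.rand(N, 1)
    y = 1 + 2 * x + 0.1 * np.random.randn(N, 1)
    idx = np.arange(N)
    np.random.shuffle(idx)
    train_idx = idx[:int(N * 0.8)]
    return x[train_idx], y[train_idx]

def loss_after(lr, n_epochs=100):
    # Hypothetical helper: runs Steps 1-4 n_epochs times
    # and returns the final training MSE
    x_train, y_train = make_train_data()
    np.random.seed(42)
    b = np.random.randn(1)
    w = np.random.randn(1)
    for _ in range(n_epochs):
        error = (b + w * x_train) - y_train
        b = b - lr * 2 * error.mean()
        w = w - lr * 2 * (x_train * error).mean()
    return ((b + w * x_train - y_train) ** 2).mean()

for lr in (0.01, 0.1):
    print(lr, loss_after(lr))
```

With the same number of epochs, the smaller learning rate should leave a noticeably higher loss, because each update moves the parameters less. A larger learning rate is not always better, though: past a problem-dependent threshold the updates overshoot and the loss can grow instead of shrinking.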


  • 4-6) Rinse and Repeat

We use the updated parameters to go back to Step 1 and restart the process.
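Putting the steps together, a complete training loop might look like the sketch below. It assumes 1000 epochs, a number not stated in this section, and reuses the data generation, initialization, and learning rate from above. The name `n_epochs` and the closing comparison against `np.polyfit` are additions for this example.

```python
import numpy as np

# Data generation and split, as in Section 3
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + 0.1 * np.random.randn(100, 1)
idx = np.arange(100)
np.random.shuffle(idx)
x_train, y_train = x[idx[:80]], y[idx[:80]]

# Step 0 - random initialization
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

lr = 0.1
n_epochs = 1000  # assumed value for this sketch

for epoch in range(n_epochs):
    # Step 1 - forward pass
    yhat = b + w * x_train
    # Step 2 - loss (MSE over the whole training set)
    error = yhat - y_train
    loss = (error ** 2).mean()
    # Step 3 - gradients
    b_grad = 2 * error.mean()
    w_grad = 2 * (x_train * error).mean()
    # Step 4 - parameter update
    b = b - lr * b_grad
    w = w - lr * w_grad

print(b, w)

# Reference: closed-form least-squares fit on the same training set
w_ls, b_ls = np.polyfit(x_train.ravel(), y_train.ravel(), deg=1)
print(b_ls, w_ls)
```

After enough epochs, b and w should settle near the least-squares estimates for the training set, which sit close to the true values b = 1 and w = 2 but differ slightly because of the noise.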
