http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17_2.html
Deep Learning (DL)
Each "logistic regression" unit can be viewed as a "Neuron"; many neurons connected together form a Neural Network.
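A minimal sketch of one such neuron, assuming the usual form sigmoid(w·x + b). The weights, bias and input below are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """One logistic-regression unit: weighted sum plus bias, then sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical input, weights and bias (illustration only).
out = neuron(x=[1.0, -1.0], w=[1.0, -2.0], b=1.0)  # z = 1 + 2 + 1 = 4
```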
1958: Perceptron (linear model)
1969: Perceptron has limitations
1980s: Multi-layer perceptron
[Not significantly different from today's Deep Neural Networks (DNN)]
1986: Backpropagation
[Usually more than 3 hidden layers is not helpful]
1989: 1 hidden layer is "good enough". Why deep?
[Breakthrough: a change of name to "deep learning".]
2006: RBM (Restricted Boltzmann Machine) initialization, Geoffrey E. Hinton
[Later shown to help little, but importantly it rekindled research interest]
2009: GPU
2011: Starts to become popular in speech recognition
2012: Wins the ILSVRC image competition
sigmoid function -> Activation Function
Fully connected feedforward network: the whole network is a function (input vector in, output vector out).
Matrix Operation
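The "network as a function" and "matrix operation" points can be sketched as follows: each layer computes sigmoid(W x + b), and the network is the composition of its layers. This is a pure-Python sketch with hypothetical weights; a real implementation would use a matrix library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(W, b, x):
    """One fully connected layer: sigmoid(W x + b).
    W is a list of rows (one row per neuron); b and x are lists."""
    return [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def feedforward(layers, x):
    """Apply the layers in order: input vector in, output vector out."""
    a = x
    for W, b in layers:
        a = dense(W, b, a)
    return a

# Hypothetical 2 -> 2 -> 1 network (values for illustration only).
layers = [
    ([[1.0, -2.0], [-1.0, 1.0]], [1.0, 0.0]),  # hidden layer
    ([[2.0, -1.0]], [0.0]),                    # output layer
]
hidden = dense(layers[0][0], layers[0][1], [1.0, -1.0])
y = feedforward(layers, [1.0, -1.0])
```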
You need to decide the network structure so that a good function is contained in your function set (number of layers, number of neurons per layer, and which activation function to use).
special structure:
- Convolutional Neural Network (CNN)
Backpropagation
Chain Rule
Forward pass:
Backward pass:
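A sketch of the two passes on a tiny network, under assumptions not stated in the notes: one scalar input, two sigmoid hidden neurons, one sigmoid output, and a squared-error loss. The forward pass stores the activations; the backward pass applies the chain rule from the output back to the first layer. All parameter values are hypothetical, and the finite-difference check at the end is only a sanity test.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass: compute and keep every activation."""
    w1, b1, w2, b2 = params
    h = [sigmoid(w * x + b) for w, b in zip(w1, b1)]        # hidden layer
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)   # output
    return h, y

def backward(params, x, t):
    """Backward pass for the loss 0.5 * (y - t)^2 via the chain rule."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    # dL/dz at the output; sigmoid'(z) = y * (1 - y)
    delta_out = (y - t) * y * (1.0 - y)
    grad_w2 = [delta_out * hi for hi in h]   # dz/dw is the incoming activation
    grad_b2 = delta_out
    # Propagate delta back through w2 and the hidden sigmoids
    delta_h = [delta_out * w * hi * (1.0 - hi) for w, hi in zip(w2, h)]
    grad_w1 = [d * x for d in delta_h]
    grad_b1 = list(delta_h)
    return grad_w1, grad_b1, grad_w2, grad_b2

# Hypothetical parameters and one training example.
params = ([0.5, -0.3], [0.1, 0.2], [0.7, -0.4], 0.05)
x, t = 1.5, 1.0
grads = backward(params, x, t)

def loss_with_first_weight(value):
    """Loss as a function of the first hidden weight only."""
    w1, b1, w2, b2 = params
    _, y = forward(([value, w1[1]], b1, w2, b2), x)
    return 0.5 * (y - t) ** 2

eps = 1e-6
numeric = (loss_with_first_weight(0.5 + eps)
           - loss_with_first_weight(0.5 - eps)) / (2 * eps)
analytic = grads[0][0]
```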
Keras
Using Keras is like building with toy blocks.
Tips in DNN
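More layers do not necessarily give better results on the training data, so first make sure you get good results on the training data.
Do not always blame overfitting
First work out whether the results need to improve on the training data or on the testing data.
Good Results on Training Data? No
1. New activation function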
Vanishing Gradient Problem
[sigmoid ->
ReLU (Rectified Linear Unit), ReLU variants (Leaky ReLU, Parametric ReLU, ELU),
Maxout [learnable activation function; ReLU is a special case of Maxout]]
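A small sketch of these activations for scalar inputs. The sigmoid derivative never exceeds 0.25, which is one reason gradients shrink across many sigmoid layers, while ReLU passes a gradient of 1 for positive inputs. The maxout function here is a simplified one-dimensional version (a max over linear pieces); choosing the pieces `z` and `0` reproduces ReLU. The `alpha` value and piece coefficients are illustrative assumptions.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu(z):
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but with a small slope alpha for negative inputs."""
    return z if z > 0 else alpha * z

def maxout(z, pieces):
    """Simplified scalar maxout: the max over linear pieces w * z + b.
    In a real network the (w, b) pairs would be learned."""
    return max(w * z + b for w, b in pieces)

# ReLU as a special case of maxout: the pieces are z and 0.
relu_as_maxout = [(1.0, 0.0), (0.0, 0.0)]
samples = [-2.0, -0.5, 0.0, 1.0, 3.0]
same = all(maxout(z, relu_as_maxout) == relu(z) for z in samples)
```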
2. Adaptive learning rate
[Adagrad -> RMSProp,
Momentum
Adam(RMSProp + Momentum)]
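A sketch of the "RMSProp + Momentum" idea for a single scalar parameter: `m` is the momentum-style running mean of the gradient and `v` is the RMSProp-style running mean of the squared gradient. The hyperparameter values are common defaults used here as assumptions, and the quadratic objective is a toy example.

```python
import math

def adam_minimize(grad, w, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function given its gradient, using Adam-style updates."""
    m = 0.0  # running mean of gradients (momentum part)
    v = 0.0  # running mean of squared gradients (RMSProp part)
    for step in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** step)  # bias correction for early steps
        v_hat = v / (1.0 - beta2 ** step)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
```

Dividing by the root of `v_hat` gives each parameter its own effective learning rate, while `m_hat` smooths the update direction.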
Good Results on Testing Data? No
1. Early Stopping
2. Regularization
3. Dropout [Dropout is a kind of ensemble]
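A sketch of dropout on a list of activations. Note the assumption: this is the "inverted" variant, which rescales the surviving activations during training so that nothing needs to change at test time. The alternative is to leave training values unscaled and multiply the weights by the keep probability at test time. The helper name and arguments are hypothetical.

```python
import random

def dropout(activations, p_drop, training, rng):
    """Inverted dropout: during training, zero each activation with
    probability p_drop and scale survivors by 1 / (1 - p_drop);
    at test time, return the activations unchanged."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

train_out = dropout([1.0] * 1000, 0.5, training=True, rng=random.Random(0))
test_out = dropout([1.0, 2.0], 0.5, training=False, rng=random.Random(0))
```

Each training step samples a different mask, so a different thinned network is trained each time, which is where the ensemble view comes from.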
CNN for computer vision
The architecture of a network can be designed.
A CNN is a simplified version of a fully connected network (with fewer parameters).
[Three reasons to use a CNN for images:
1. A neuron does not have to see the whole image to discover a pattern. Connecting to a small region takes fewer parameters.
2. The same patterns appear in different regions.
3. Subsampling the pixels will not change the object.]
1, 2 -> Convolution
3 -> Max Pooling
Input, Convolution(layer), Max Pooling(layer), Convolution(layer), Max Pooling(layer),...,Convolution(layer), Max Pooling(layer), Flatten, Fully Connected Feedforward network, Output.
Property 1: translation invariance
Property 2: spatial hierarchies of patterns
CNN – Convolution
Filter (another, smaller image)
[size, values;
stride;
Feature Map]
Convolution vs. Fully Connected
Note: Each filter is a channel.
Note: the output width and height may differ from the input width and height [border effects and padding; stride]
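The effect of filter size, stride and padding on the output size can be sketched as below. The formula assumes square filters and symmetric zero padding; `conv2d` is a plain-Python sliding-window product (cross-correlation, which is what deep learning libraries usually call convolution) without padding. The 6×6 image and 3×3 filter are hypothetical.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis for input length n and filter size k."""
    return (n + 2 * padding - k) // stride + 1

def conv2d(image, kernel, stride=1):
    """Slide a square kernel over the image (no padding) and return the
    feature map of elementwise-product sums."""
    k = len(kernel)
    out_rows = conv_output_size(len(image), k, stride)
    out_cols = conv_output_size(len(image[0]), k, stride)
    fmap = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        fmap.append(row)
    return fmap

# Hypothetical 6x6 binary image and a 3x3 filter that detects a diagonal.
image = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
kernel = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
fmap_stride1 = conv2d(image, kernel, stride=1)  # 4x4 feature map
fmap_stride2 = conv2d(image, kernel, stride=2)  # 2x2 feature map
```

With one pixel of zero padding on each side, `conv_output_size(6, 3, stride=1, padding=1)` gives 6, so the output keeps the input size.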
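CNN – Max Pooling
Group the values, then take the maximum of each group (giving another, even smaller image)
Note: max pooling is not the only way to do this downsampling; it can also be done with a stride in the preceding convolution layer, or with average pooling in place of max pooling.
Note: convolution usually uses a 3×3 window with stride 1; max pooling usually uses a 2×2 window with stride 2.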
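A sketch of max pooling with the typical 2×2 window and stride 2, in plain Python. The 4×4 feature map is a hypothetical input.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Take the maximum over each size x size window of the feature map."""
    out_rows = (len(fmap) - size) // stride + 1
    out_cols = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(fmap[i * stride + di][j * stride + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]

# Hypothetical 4x4 feature map; pooling halves each dimension.
feature_map = [
    [3, -1, -3, -1],
    [-3, 1, 0, -3],
    [-3, -3, 0, 1],
    [3, -2, -2, -1],
]
pooled = max_pool2d(feature_map)
```

Replacing `max` with a mean over the same window would give average pooling.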
Analyzing what the CNN (its filters) has learned:
1. First convolution layer [typical-looking filters on the trained first layer]; how about higher layers? [Which images make a specific neuron activate]
2. What does the CNN learn? [Degree of activation of the k-th filter]
Image classification on small datasets
- Train a small model from scratch
- Use a pretrained network for feature extraction
- Fine-tune a pretrained network
RNN for text and sequences
Generative deep learning