Full course videos (Bilibili link): MIT 6.S191
1. Introduction to deep learning
The perceptron - the structural building block of deep learning
Perceptron and neuron refer to the same thing in a neural network: the smallest unit from which the network is built.
Forward Propagation: the forward pass through the network, from inputs to outputs.
Common Activation Functions:Sigmoid Function, Hyperbolic Tangent, Rectified Linear Unit (ReLU)。
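The three activation functions listed above can be written directly in NumPy (a minimal sketch; the function names are my own):

```python
import numpy as np

def sigmoid(z):
    # squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes any input into (-1, 1)
    return np.tanh(z)

def relu(z):
    # zero for negative input, identity for positive input
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```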
Building Neural Networks with perceptrons
Dense Layers: if inputs are densely connected to all outputs, these layers are called dense layers.
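A dense layer is just a matrix multiply plus bias followed by an activation. A sketch of a single forward-propagation step (shapes and names are illustrative, not from the lecture):

```python
import numpy as np

def dense(x, W, b, activation):
    # dense layer: every input is connected to every output
    # y = activation(W @ x + b)
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # 3 inputs
W = rng.normal(size=(2, 3))      # 2 outputs, densely connected to all 3 inputs
b = np.zeros(2)
y = dense(x, W, b, lambda z: np.maximum(0.0, z))  # ReLU activation
print(y.shape)  # (2,)
```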
Quantifying Loss: the loss measures the cost incurred by incorrect predictions.
Types of loss: Empirical Loss, Binary Cross Entropy Loss, Mean Squared Error Loss
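The two task-specific losses can be sketched in a few lines of NumPy (the epsilon clipping is a standard numerical-stability detail, not from the lecture):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # penalizes confident wrong probability predictions (classification)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mean_squared_error(y_true, y_pred):
    # average squared difference (regression)
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.1, 0.8])
print(binary_cross_entropy(y_true, y_pred), mean_squared_error(y_true, y_pred))
```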
Training Neural Networks
Loss Optimization: Gradient Descent
Computing Gradients: Backpropagation (propagates the error backwards to compute the gradient of the loss with respect to each weight)
Learning Rate: determines the step size of each gradient descent update. Too large and training fails to converge; too small and optimization is too slow.
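A minimal gradient descent loop on a one-weight model makes the learning rate concrete; the gradient is worked out by hand with the chain rule (the same computation backpropagation automates). All values here are illustrative:

```python
import numpy as np

# fit y = w * x to data by gradient descent (sketch; lr chosen by hand)
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x                      # the true w is 2
w, lr = 0.0, 0.1                 # too-large lr diverges, too-small lr crawls

for _ in range(100):
    y_pred = w * x
    grad = np.mean(2 * (y_pred - y) * x)  # dL/dw via the chain rule
    w -= lr * grad                        # one gradient descent step

print(round(w, 3))  # approaches 2.0
```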
Adaptive Learning Rate: the learning rate is no longer a fixed value during training but can change as training progresses. Common optimization algorithms: SGD, Adam, Adadelta, Adagrad, RMSProp
Batch Size: computing the loss over every item in the dataset is too expensive, so each step instead uses a randomly selected subset of data items. The batch size is the number of items used to compute the loss. A smaller batch size makes each step faster to compute, but the loss is a noisier estimate of performance on the whole dataset; a larger batch size makes each step slower, but each loss value better represents overall performance.
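Selecting the random subset for one step can be sketched as follows (dataset size and batch size are made-up values):

```python
import numpy as np

# sample one mini-batch instead of computing the loss over the full dataset
rng = np.random.default_rng(0)
dataset_size, batch_size = 10_000, 32

x_full = rng.normal(size=(dataset_size, 4))          # the whole dataset
idx = rng.choice(dataset_size, size=batch_size, replace=False)
x_batch = x_full[idx]                                # only these rows enter the loss
print(x_batch.shape)  # (32, 4)
```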
Overfitting: during training, the network may memorize features that exist only in the training set. These features do not generalize, which leads to overfitting. Ways to address this:
- Dropout: randomly set each neuron's activation in each layer to 0 with some probability, i.e. drop it from processing the data, which simplifies the network.
- Early Stopping: stop training early when the loss curve on the test data starts to diverge from the loss curve on the training data. This divergence means the network keeps improving on the training set while getting worse on the test set, which is a sign of overfitting.
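The dropout technique above amounts to multiplying activations by a random binary mask. A sketch (the rescaling by 1/(1-p), known as inverted dropout, is a common implementation detail not mentioned in the lecture; it keeps the expected activation unchanged):

```python
import numpy as np

def dropout(activations, p_drop, rng):
    # zero each activation with probability p_drop (applied at training time only)
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
a = np.ones(8)
print(dropout(a, 0.5, rng))  # roughly half the entries are zeroed
```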
2. Recurrent Neural Networks
Problem addressed: sequence modeling.
Key challenges in sequence modeling:
- Handle variable-length sequences
- Track long-term dependencies
- Maintain information about order
- Share parameters across the sequence
Problems that can arise when training RNNs: 1. exploding gradients; 2. vanishing gradients. LSTMs (Long Short-Term Memory networks) were introduced to address them.
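One step of a plain RNN cell shows how a single set of weights is shared across every time step, and why repeated multiplication by W_hh during backpropagation can make gradients explode or vanish. Dimensions and names are illustrative:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # one recurrent step: the same weights are reused at every time step
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for x_t in rng.normal(size=(5, input_dim)):   # a length-5 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)     # order is carried through h
print(h.shape)  # (3,)
```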
Applications of RNNs
- Music generation
- Sentiment classification
- Machine translation (attention mechanism -- from Google)
- Trajectory prediction (self-driving)
- Environmental modeling
3. Convolutional Neural Networks
CNNs are often used for feature extraction. The network is generally built from two kinds of layers: convolutional layers and pooling layers, with ReLU as the activation function after the convolutional layers.
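The convolution + ReLU + pooling pipeline can be sketched with plain NumPy loops (slow but explicit; a real implementation would be vectorized, and the toy image/kernel values are my own):

```python
import numpy as np

def conv2d(image, kernel):
    # valid 2-D convolution (really cross-correlation, as in most DL libraries)
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    # downsample by taking the max over non-overlapping size x size windows
    h, w = feature_map.shape
    return feature_map[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0                       # a simple averaging filter
features = np.maximum(0.0, conv2d(image, kernel))    # ReLU after convolution
print(max_pool2d(features).shape)  # (1, 1)
```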
Common CNN applications
- Detection
- Semantic segmentation
- End-to-end robotic control
4. Deep Generative Modeling
VAEs (Variational Autoencoders)
Learn a lower-dimensional latent space, then sample from it to generate reconstructions of the input.
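Sampling from the latent space is usually done with the reparameterization trick, z = mu + sigma * eps, which keeps the sampling step differentiable. A sketch with placeholder encoder outputs (mu and log_var would come from a trained encoder, not from zeros as assumed here):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 2
mu = np.zeros(latent_dim)        # assumed encoder output: latent mean
log_var = np.zeros(latent_dim)   # assumed encoder output: latent log-variance

eps = rng.standard_normal(latent_dim)
z = mu + np.exp(0.5 * log_var) * eps   # a sample from the latent space
print(z.shape)  # (2,)
```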
GANs (Generative Adversarial Networks)
A generator network and a discriminator network trained in competition with each other.
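The competition can be summarized by the two cross-entropy objectives: the discriminator wants D(real) near 1 and D(fake) near 0, while the generator wants D(fake) near 1. A sketch with made-up discriminator scores in place of real networks:

```python
import numpy as np

def bce(target, prob, eps=1e-12):
    # binary cross entropy for a single probability
    prob = np.clip(prob, eps, 1 - eps)
    return -(target * np.log(prob) + (1 - target) * np.log(1 - prob))

d_real = 0.9   # discriminator's score on a real sample (assumed value)
d_fake = 0.2   # discriminator's score on a generated sample (assumed value)

disc_loss = bce(1.0, d_real) + bce(0.0, d_fake)  # discriminator minimizes this
gen_loss = bce(1.0, d_fake)                      # generator minimizes this
print(disc_loss, gen_loss)
```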