DL Theory

Generalization

Predicting the Generalization Gap in Deep Networks with Margin Distributions (paper)

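DNNs can fit randomly assigned labels perfectly, yet such networks perform poorly on the test set. This shows that training losses such as cross-entropy are not reliable indicators of generalization, which raises a key question: how can the generalization gap be predicted from the training set and the network parameters?
This paper proposes a measure based on the margin distribution, i.e. the distances from training points to the decision boundary. On CIFAR-10 and CIFAR-100 the measure correlates strongly with the generalization gap.
The measure is easy to apply to feedforward networks and may point toward new loss functions that generalize better.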
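
To make "distance from a training point to the decision boundary" concrete, here is a minimal sketch for the one case where that distance has a closed form: a linear multiclass classifier f(x) = Wx + b. The function name `linear_margins` and the linear model are assumptions made for illustration; the paper works with deep networks and is not reproduced here.

```python
import math

def linear_margins(xs, ys, W, b):
    """Signed distance of each sample to its closest pairwise decision boundary.

    Assumes a linear classifier with scores f_c(x) = W[c] . x + b[c].
    Positive values: the sample is on the correct side of every boundary.
    Negative values: the sample is misclassified.
    """
    margins = []
    for x, y in zip(xs, ys):
        scores = [sum(wk * xk for wk, xk in zip(w, x)) + bc
                  for w, bc in zip(W, b)]
        closest = None
        for c in range(len(W)):
            if c == y:
                continue
            # The boundary between class y and class c is f_y(x) = f_c(x).
            diff_w = [wy - wc for wy, wc in zip(W[y], W[c])]
            norm = math.sqrt(sum(v * v for v in diff_w))
            dist = (scores[y] - scores[c]) / norm
            if closest is None or dist < closest:
                closest = dist
        margins.append(closest)
    return margins

# Two classes separated by the line x1 = 0.
W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
print(linear_margins([[2.0, 0.0], [-0.5, 3.0]], [0, 0], W, b))
```

The list of margins over the whole training set is the "margin distribution" in this toy setting; summary statistics of that list are what one would then relate to the generalization gap.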

Convergence

20180927-On the loss landscape of a class of deep neural networks with no bad local valleys (paper)

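This paper identifies a class of over-parameterized DNNs with standard activation functions and cross-entropy (CE) loss that provably has no bad local valleys. From any point in parameter space there is a continuous path along which the CE loss does not increase and gets arbitrarily close to 0. This implies that the network has no sub-optimal strict local minima.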

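Consider a class of DNNs with d input units, H hidden units, and m output units that satisfies the following conditions:

Each hidden unit in the first layer may take any subset of the input units as input
Each hidden unit in a higher layer may take as input any subset of units from any subset of the preceding hidden layers
Any group of hidden units in the same layer may have shared or non-shared weights; if the weights are shared, the units must have the same number of incoming units
There are N hidden units connected to the output units with independent weights, where N is the number of training samples
The output of every unit in the network passes through a nonlinear activation that is real-valued and strictly increasing
The distinctive requirement in this setting is that at least N neurons must be connected to the output layer.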
Theorem
There exist infinitely many solutions with zero training error
The loss landscape has no bad local valleys
There is no sub-optimal strict local minimum
There is no local maximum

20181109-Gradient Descent Finds Global Minima of Deep Neural Networks (paper, CMU-Simon S. Du (杜少雷), Reddit discussion)

Gradient descent can find a global minimum when training deep neural networks even though the objective is non-convex. This paper proves that, for over-parameterized deep neural networks with residual connections (ResNet), gradient descent achieves zero training loss in polynomial time. The analysis relies on the particular structure of the Gram matrix induced by the network architecture. This structure shows that the Gram matrix stays stable throughout training, and that stability implies the global optimality of gradient descent. ResNet has an advantage over fully connected feedforward architectures: for feedforward networks the bound requires the number of neurons per layer to grow exponentially with depth, whereas for ResNet the number of neurons per layer only needs to scale polynomially with depth. The authors further extend the analysis to deep residual convolutional neural networks and obtain a similar convergence result.
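
The flavor of the result can be seen on a toy problem. The sketch below runs full-batch gradient descent on a wide two-layer ReLU network with a handful of training points and records the training loss. Everything in it is an assumption chosen for illustration: the two-layer model, the width of 200, the step size, and the function name `train_two_layer_relu`. It is not the deep ResNet setting analyzed in the paper.

```python
import math
import random

def train_two_layer_relu(xs, ys, width=200, lr=0.1, steps=300, seed=0):
    """Full-batch gradient descent on a wide two-layer ReLU network.

    Model: f(x) = (1 / sqrt(width)) * sum_r a[r] * relu(W[r] . x)
    Only the first-layer weights W are trained; a[r] in {-1, +1} stays fixed.
    Returns the squared training loss recorded before every update.
    """
    rng = random.Random(seed)
    d = len(xs[0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(width)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(width)]
    scale = 1.0 / math.sqrt(width)

    losses = []
    for _ in range(steps + 1):
        grads = [[0.0] * d for _ in range(width)]
        loss = 0.0
        for x, y in zip(xs, ys):
            pre = [sum(wk * xk for wk, xk in zip(w, x)) for w in W]
            out = scale * sum(ar * max(p, 0.0) for ar, p in zip(a, pre))
            err = out - y
            loss += 0.5 * err * err
            for r in range(width):
                if pre[r] > 0.0:  # gradient flows only through active ReLUs
                    coef = err * scale * a[r]
                    for k in range(d):
                        grads[r][k] += coef * x[k]
        losses.append(loss)
        for r in range(width):
            for k in range(d):
                W[r][k] -= lr * grads[r][k]
    return losses

# Four unit-norm inputs, no two of them parallel, with arbitrary targets.
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
ys = [1.0, -1.0, 0.5, -0.5]
losses = train_two_layer_relu(xs, ys)
print(losses[0], losses[-1])
```

With far more hidden units than training points, the activation patterns change little during training, which is the intuition behind the "stable Gram matrix" argument summarized above.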

Background needed to follow the paper

Algorithm complexity analysis and asymptotic analysis
Convex optimization
Lipschitz conditions
Matrix calculus

20181109-A Convergence Theory for Deep Learning via Over-Parameterization (paper, MIT-Zeyuan Allen-Zhu (朱澤園))

The theory of multi-layer networks remains somewhat unsettled.
In this work, we prove why simple algorithms such as stochastic gradient descent (SGD) can find global minima on the training objective of DNNs. We only make two assumptions: the inputs do not degenerate and the network is over-parameterized. The latter means the number of hidden neurons is sufficiently large: polynomial in L, the number of DNN layers and in n, the number of training samples.
As concrete examples, on the training set and starting from randomly initialized weights, we show that SGD attains 100% accuracy in classification tasks, or minimizes regression loss at a linear convergence speed $\varepsilon \propto e^{-\Omega(T)}$, with a number of iterations that scales only polynomially in n and L. Our theory applies to the widely-used but non-smooth ReLU activation, and to any smooth and possibly non-convex loss functions. In terms of network architectures, our theory at least applies to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).
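
As a reading aid, the "linear convergence" statement can be unpacked as follows. The contraction rate $c$ and the symbol $\mathcal{L}_t$ for the training loss after $t$ iterations are notation assumed here, not taken from the paper; the paper's actual rate depends on n and L.

```latex
% Assume the loss contracts by a fixed factor each iteration, with 0 < c < 1:
\mathcal{L}_{t+1} \le (1 - c)\,\mathcal{L}_t
\quad\Longrightarrow\quad
\mathcal{L}_T \le (1 - c)^T \mathcal{L}_0 \le e^{-cT}\,\mathcal{L}_0 .
% Reaching error \varepsilon therefore needs only logarithmically many steps:
e^{-cT}\,\mathcal{L}_0 \le \varepsilon
\quad\Longleftrightarrow\quad
T \ge \frac{1}{c}\,\ln\frac{\mathcal{L}_0}{\varepsilon} .
```

This is what $\varepsilon \propto e^{-\Omega(T)}$ expresses: the error shrinks exponentially in the number of iterations T.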

Practical tricks

20181105-How deep is deep enough? - Optimizing deep neural network architecture (paper)

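This paper introduces a new measure, called the generalized discrimination value (GDV), which quantifies how well different object classes separate in each layer. By definition, the GDV is invariant to translation and scaling of the input data, independent of the number of features, and independent of permutations of the neurons within a layer. The authors compute the GDV in each layer of a Deep Belief Network (DBN) trained unsupervised on MNIST and find that the GDV first improves and then gets worse beyond layer 30, which indicates the optimal network depth for this classification task. Further investigation suggests that the GDV can serve as a universal tool for determining the optimal number of layers in various deep neural networks.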
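
The invariances listed above are easy to check on a simplified class-separation score. The sketch below is an assumption made for illustration and should not be read as the paper's exact GDV definition: it z-scores each feature, then takes the mean intra-class distance minus the mean inter-class distance, divided by the square root of the number of features. The names `zscore_columns`, `mean_distance`, and `separation_score` are hypothetical. It expects at least two classes with at least two points each.

```python
import math

def zscore_columns(points):
    """Standardize every feature to zero mean and unit variance."""
    n, d = len(points), len(points[0])
    cols = []
    for k in range(d):
        col = [p[k] for p in points]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / std if std > 0 else 0.0 for v in col])
    return [[cols[k][i] for k in range(d)] for i in range(n)]

def mean_distance(group_a, group_b, same_group):
    """Mean Euclidean distance between two groups (pairs i < j if same group)."""
    total, count = 0.0, 0
    for i, p in enumerate(group_a):
        for j, q in enumerate(group_b):
            if same_group and j <= i:
                continue
            total += math.dist(p, q)
            count += 1
    return total / count

def separation_score(points, labels):
    """Mean intra-class minus mean inter-class distance on z-scored features.

    Lower (more negative) values mean the classes are better separated.
    """
    z = zscore_columns(points)
    classes = sorted(set(labels))
    groups = [[p for p, l in zip(z, labels) if l == c] for c in classes]
    intra = sum(mean_distance(g, g, True) for g in groups) / len(groups)
    pairs = [(g, h) for i, g in enumerate(groups) for h in groups[i + 1:]]
    inter = sum(mean_distance(g, h, False) for g, h in pairs) / len(pairs)
    return (intra - inter) / math.sqrt(len(points[0]))

points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = [0, 0, 1, 1]
shifted = [[3.0 * x + 7.0, 0.5 * y - 2.0] for x, y in points]
print(separation_score(points, labels), separation_score(shifted, labels))
```

Because of the z-scoring step, shifting or rescaling any feature leaves the score unchanged, and because Euclidean distance ignores feature order, so does permuting the features. Evaluating such a score on the activations of each layer is the kind of per-layer comparison the paper describes.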

CNNs for learning shape representations

20181017-A Convolutional Autoencoder Approach to Learn Volumetric Shape Representations for Brain Structures (paper, code)

Domain literature summary, focused on conclusions (updated 2018-11-17)
