On Few-Shot Generation

- paper 1: Matching Networks for One Shot Learning (from Google DeepMind)
- paper 2: Data Augmentation Generative Adversarial Networks (DAGAN)
- paper 3: MetaGAN: An Adversarial Approach to Few-Shot Learning (NIPS 2018)

Matching Networks for One Shot Learning

Attached is a link to a write-up that I think summarizes this paper well.

Abstract

In this work, we employ ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories. Our framework learns a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.

Matching Networks classifies under the few-shot regime, so that a trained model can classify classes that never appeared during training, without any fine-tuning.

Introduction

  1. Deep learning: learning is slow and relies on large datasets (many weight updates using stochastic gradient descent). This is mostly due to the parametric aspect of the model, in which training examples need to be slowly learnt by the model into its parameters; samples are discarded once consumed.
  2. Non-parametric models: the authors contrast this with a nearest-neighbour classifier. For NN, samples are stored exactly as they come in and require no training, which enables rapid learning.
  3. The goal of this paper is to combine the parametric and non-parametric models.
  4. Two contributions: the Matching Net itself (learn a representation of the samples, i.e. encode them) and a new task-based training and testing regime.
    Main novelty:We propose Matching Nets (MN), a neural network which uses recent advances in attention and memory that enable rapid learning.

Model

Model Architecture

Contribution: one-shot learning within the set-to-set framework
Simplest form of model:
\hat{y}=\sum_{i=1}^ka(\hat{x},x_i)y_i\tag{1}

where (x_i, y_i) are drawn from the support set S=\{(x_i,y_i)\}_{i=1}^k, and a can be viewed as an attention kernel.
As a function, the model can be written as prediction=f(support\_set, test\_example); probabilistically, as P(\hat{y}|\hat{x},S). For a given unseen input example \hat{x}, the predicted output class is \arg \max_yP(y|\hat{x},S), where P is parameterised by a neural network.

Attention Kernel

use the softmax over the cosine distance c as the attention weights: a(\hat{x},x_i)=e^{c(f(\hat{x}),g(x_i))}/\sum_{j=1}^k e^{c(f(\hat{x}),g(x_j))}

where f and g are two embedding (encoding) functions, as shown in Figure 1.
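Equation (1) combined with the cosine-softmax attention kernel can be sketched in a few lines of NumPy (the embeddings f(\hat{x}) and g(x_i) are assumed precomputed; the function names here are mine, not the paper's):

```python
import numpy as np

def cosine_softmax_attention(x_hat_emb, support_embs):
    """Attention kernel a(x_hat, x_i): softmax over cosine similarities."""
    x_norm = x_hat_emb / np.linalg.norm(x_hat_emb)
    s_norm = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    sims = s_norm @ x_norm                 # cosine similarity to each support point
    e = np.exp(sims - sims.max())          # numerically stable softmax
    return e / e.sum()

def matching_net_predict(x_hat_emb, support_embs, support_labels_onehot):
    """Equation (1): y_hat = sum_i a(x_hat, x_i) * y_i."""
    a = cosine_softmax_attention(x_hat_emb, support_embs)
    return a @ support_labels_onehot       # soft distribution over labels

# Toy support set: 3 embedded points, 2 classes
support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y_hat = matching_net_predict(np.array([0.95, 0.05]), support, labels)
print(y_hat.argmax())  # → 0, the class nearest in embedding space
```

Because prediction is just a weighted vote over the support set, adding a new class only requires adding its examples to S; no parameter update is needed.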

Full Context Embeddings

(I did not fully understand the internal structure; it requires a deep familiarity with LSTMs, so what follows is a high-level understanding.)
Full Context Embeddings f

These equations are best understood alongside the model diagram. K is the number of LSTM timesteps, equal to the number of images in the support set: each image produces an embedding, and a cosine distance must be computed against each of them, giving K steps (the test example itself is iteratively re-encoded K times). The remaining parameters are internal LSTM functions.
Full Context Embeddings g
On fully-conditional embeddings: the support-set samples should be able to optimise not only the g network but also the f network that encodes the test example; conditioning both embedding functions on the support set and the test set removes the variance caused by random sampling. Concretely, to let the support-set samples share information:

  • Bidirectional LSTM: learn embeddings of the support set such that each sample's embedding is a function of the other samples;
  • Attention-LSTM: embed the test example such that its embedding is a function of the support-set embeddings.
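The two directions above can be sketched loosely as follows; note that this replaces the paper's biLSTM and attLSTM cells with trivial context sums and a tanh update, purely to show the data flow (every name here is mine, not the paper's):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def fce_g(g_prime):
    """g(x_i, S): each support embedding becomes a function of the others.
    Cumulative forward/backward sums stand in for the biLSTM states."""
    fwd = np.cumsum(g_prime, axis=0)               # "forward" context per row
    bwd = np.cumsum(g_prime[::-1], axis=0)[::-1]   # "backward" context per row
    return g_prime + fwd + bwd                     # g'(x_i) + h_fwd_i + h_bwd_i

def fce_f(f_prime, g_embs, K=5):
    """f(x_hat, S): K attention read steps over the support embeddings,
    re-encoding the test example at each step (tanh stands in for the LSTM cell)."""
    h = np.zeros_like(f_prime)
    for _ in range(K):
        a = softmax(g_embs @ (h + f_prime))   # attend over the support set
        r = a @ g_embs                        # readout vector r_k
        h = np.tanh(f_prime + h + r)          # simplified recurrent update
    return h + f_prime
```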

Training Strategy

  • a task T is a distribution over possible label sets L (all possible combinations of labels)
  • L \sim T: a label set L sampled from a task T
  • S \sim L ,B \sim L: use L to sample the support set S and a batch B
  • Each training episode has one support set S and one batch B of test examples. For one-shot learning, the support set contains exactly one sample of the same class as the test example.
    The Matching Net is then trained to minimise the error predicting the labels in the batch B conditioned on the support set S. In other words, this is a form of meta-learning: learning to learn from a given support set to minimise a loss over a batch.
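The sampling scheme L ~ T, S ~ L, B ~ L can be sketched as below (the `dataset` layout, mapping class label → list of examples, is my assumption):

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, query_per_class=2):
    """One training episode: sample a label set L, then a support set S and
    a batch B of held-out test examples over those same labels."""
    L = random.sample(sorted(dataset), n_way)        # L ~ T
    support, batch = [], []
    for cls in L:
        exs = random.sample(dataset[cls], k_shot + query_per_class)
        support += [(x, cls) for x in exs[:k_shot]]  # S ~ L
        batch += [(x, cls) for x in exs[k_shot:]]    # B ~ L, disjoint from S
    random.shuffle(batch)
    return support, batch

# Toy dataset: 5 classes with 10 unique examples each
data = {c: list(range(c * 10, c * 10 + 10)) for c in range(5)}
S, B = sample_episode(data, n_way=3, k_shot=1, query_per_class=2)
```

Training then minimises the loss on B conditioned on S, so the conditions at test time exactly match the conditions seen during training.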

DAGAN

Introduction

Figure 1 roughly says: a manifold learned on the source domain can be used to realise, and effectively improve, matching networks on the few-shot target domain. DAGAN can augment the data used by matching networks and related models (by generating, for each class, the most relevant comparison points), which involves the notion of tangent distance. Learning the distances between manifolds is key to DAGAN's objective.
Figure 2 roughly introduces the concept of dataset shift: how covariate shift varies across multiple domains. (For one-shot learning, the class distributions shift in an extreme way: the two distributions have no common support. We must therefore assume the class-conditional distributions share some commonality so that information can transfer from the source domain to the one-shot target domain to generate new data.)
It then reviews the idea behind typical data augmentation techniques: exploiting known invariances by transforming within data classes. This leads to DAGAN's idea: train a GAN across different source domains, thereby learning a model of a larger invariance space. The trained DAGAN does not depend on the classes themselves; it captures cross-class transformations that move a data point to other points of the same class.

Contributions

  1. Using a GAN to learn a representation and a process for data
    augmentation.
  2. Data-augmented samples can be generated from a single novel data point.
  3. Task generalisation is preserved even in low-data settings.
  4. Applied in the meta-learning space, DAGAN outperforms all previous general meta-learning models.
    To our knowledge, this is the first paper to demonstrate state-of-the-art performance on meta-learning via novel data augmentation strategies.

Background

Transfer Learning and Dataset Shift: the term dataset shift (Storkey, 2009) generalises the concept of covariate shift (this section reviews covariate shift).
Data Augmentation: almost all cases of data augmentation come from a priori known invariances.

Models

Here g is the generative model and f is a neural network that takes the representation r and the random noise z as inputs. (The description in the text seems slightly at odds with the model figure; I would follow the figure: treat g as an encoder and f as a decoder.) Given an arbitrary new data point, we can then generate augmented samples of its class.
The main model is shown in the paper's architecture figure.

Learning

The paper stresses the importance of also feeding the original data to D, to prevent the GAN from simply auto-encoding the current data point.
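A toy sketch of the generator's data flow and of what D sees; the random linear maps merely stand in for the paper's UResNet, and the dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(16, 8)) * 0.1   # stand-in for g: x (8-dim) -> r (16-dim)
W_dec = rng.normal(size=(8, 20)) * 0.1   # stand-in for f: (r, z) (16+4) -> x_gen

def dagan_generate(x):
    """x_gen = f(g(x), z): one new within-class sample from one data point."""
    r = W_enc @ x                          # encode the source sample
    z = rng.normal(size=4)                 # noise injects the variation
    return W_dec @ np.concatenate([r, z])  # decode representation + noise

def discriminator_pairs(x_real, x_same_class, x_gen):
    """D always receives the original x paired with a second image: a genuine
    same-class partner (real pair) or a generated one (fake pair). Showing D
    the original x is what stops G from merely auto-encoding it."""
    return (np.concatenate([x_real, x_same_class]),
            np.concatenate([x_real, x_gen]))
```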

Architecture

G: a combination of a UNet and ResNet (UResNet)

D: a DenseNet discriminator, using layer normalization instead of batch normalization (the latter would break the assumptions of the WGAN objective function).
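A minimal sketch of layer normalization; the point is that its statistics are computed per sample rather than across the batch, so it does not couple samples together the way batch normalization does (which is what violates the WGAN critic's per-sample assumptions):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each sample over its own features (per-row statistics).
    No cross-sample dependence, unlike batch normalization."""
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

batch = np.array([[1.0, 2.0, 3.0],
                  [10.0, 20.0, 30.0]])
out = layer_norm(batch)
# each row is normalised independently to zero mean, unit variance
```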

Conclusions

  1. DAGANs improve the performance of classifiers even after standard data augmentation.
  2. The generality of data augmentation across all models and methods means a DAGAN could be a valuable addition to any low-data setting.

MetaGAN

I could not find any analysis of this paper online. In my own reading, the paper is fairly theoretical: it proposes applying GANs to meta-learning. It borrows the meta-learning training regime and, taken as a whole, looks very much like a semi-supervised GAN.

Core Idea

Adversarial training pushes the discriminator to learn a sharper decision boundary.

Introduction

Problem: adapt to new tasks within a few steps and with scarce data.
Solution: meta-learning: train an adaptation strategy over a distribution of similar tasks, trying to extract transferable patterns useful for many tasks.
For an overview of current few-shot learning methods, I recommend reading 當(dāng)小樣本遇上機(jī)器學(xué)習(xí) fewshot learning.
Many existing few-shot learning models consider how to do supervised learning with few samples; the MetaGAN framework combines supervised and semi-supervised learning: through adversarial training, the fake data produced by G helps the model learn a sharper decision boundary, at both the sample level and the task level.
The following figure from the paper helps in understanding the sharper decision boundary:

BACKGROUND

Few-Shot Learning Definition

Approach

Increase the dimension of the classifier output from N to N + 1, to model the probability that the input data is fake. (An extra output is added to the classifier; as noted above, the idea is similar to semi-supervised GANs.)
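The N+1-way head can be sketched as follows (the helper names and the toy loss are mine; the paper's exact objective has more terms):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def metagan_probs(class_logits, fake_logit):
    """Extend an N-way classifier with one extra output: index N = P(fake | x)."""
    return softmax(np.append(class_logits, fake_logit))

def discriminator_loss(class_logits, fake_logit, label, is_fake):
    """Cross-entropy on the N+1-way head: real samples target their true class,
    generated samples target the extra index N."""
    p = metagan_probs(class_logits, fake_logit)
    target = len(class_logits) if is_fake else label
    return -np.log(p[target] + 1e-12)

# Real sample of class 1 among N = 3 classes, and one generated sample
real_loss = discriminator_loss(np.array([0.1, 2.0, 0.3]), -1.0, label=1, is_fake=False)
fake_loss = discriminator_loss(np.array([0.2, 0.1, 0.0]), 3.0, label=None, is_fake=True)
```

To classify a real input, one takes the arg max over the first N outputs only; the pressure to separate every real class from the generated data is what sharpens the boundary.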

Basic Algorithm


Choice of Discriminator

In theory the choice is unrestricted; this paper uses:

  • MAML: representing learning-to-fast-fine-tune based models
  • Relation Networks: representing shared-embedding and metric based models

The appendix also gives pseudocode for the MAML-based variant.

Choice of Generator

A conditional generative model. See the paper for the concrete choices of G and D; I will not analyse them further.

WHY DOES METAGAN WORK?

Finally, the authors analyse why MetaGAN works. The intuition is captured by the figure above; the authors, of course, are more rigorous and support it with substantial mathematics, which I found hard to follow, so I will not attempt to retell the proofs here.

Experiments

  • Sample-level
  • Task-level
    Both show good results.

Special thanks to @ewanlee.
