Re-reading the Fast Gradient Sign Method (FGSM)

2022-02-04

Sec. 3: THE LINEAR EXPLANATION OF ADVERSARIAL EXAMPLES

Consider the dot product between a weight vector w and an adversarial example \tilde{x}: w^T\tilde{x} = w^T x + w^T\eta. The adversarial perturbation causes the activation to grow by w^T\eta. We can maximize this increase subject to the max norm constraint on \eta by assigning \eta=\text{sign}(w).

  1. Here, if \eta=\text{sign}(w), how can \Vert\eta\Vert_{\infty}<\epsilon be satisfied? Judging from the text that follows, it should be \eta=\epsilon \cdot \text{sign}(w).
    If w has n dimensions and the average magnitude of an element of the weight vector is m, then the activation will grow by \epsilon m n.
  2. The optimization objective here should be \max_{\eta}{w^T\eta} \ \ \text{s.t.} \ \ \Vert\eta\Vert_{\infty}<\epsilon. Since \Vert\eta\Vert_{\infty}<\epsilon must hold, each element of \eta can be at most \epsilon in magnitude; at the same time, to maximize w^T\eta, the sign of each element of \eta must agree with that of w. Therefore \eta=\epsilon \cdot \text{sign}(w).
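The two points above can be checked numerically. A minimal numpy sketch (the dimensionality n, budget \epsilon, and random weights are illustrative values, not from the paper):

```python
import numpy as np

# Under the constraint ||eta||_inf <= eps, the choice
# eta = eps * sign(w) maximizes the dot product w^T eta.
rng = np.random.default_rng(0)
n = 1000                 # dimensionality (illustrative)
eps = 0.1                # max-norm budget (illustrative)
w = rng.normal(size=n)   # a random weight vector

eta = eps * np.sign(w)   # the optimal perturbation
growth = w @ eta         # increase in activation, w^T eta

# The growth equals eps * sum(|w_i|) = eps * m * n, where m is the
# average magnitude of a weight element, as the paper states.
m = np.abs(w).mean()
assert np.isclose(growth, eps * m * n)

# Any other perturbation within the budget does no better:
other = rng.uniform(-eps, eps, size=n)
assert w @ other <= growth
```

Note how the growth scales linearly with n: in high dimensions, many tiny per-coordinate changes add up to a large change in activation, which is the paper's linear explanation of adversarial examples.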

Note: what is a max norm constraint? Below is the answer from the CS231n course:
Max norm constraints. Another form of regularization is to enforce an absolute upper bound on the magnitude of the weight vector for every neuron and use projected gradient descent to enforce the constraint. In practice, this corresponds to performing the parameter update as normal, and then enforcing the constraint by clamping the weight vector \vec{w} of every neuron to satisfy \Vert\vec{w}\Vert_{2} < c. Typical values of c are on orders of 3 or 4. Some people report improvements when using this form of regularization. One of its appealing properties is that the network cannot “explode” even when the learning rates are set too high because the updates are always bounded.
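The clamping step described above can be sketched as a projection applied after each normal parameter update. A minimal numpy illustration, assuming each row of W holds one neuron's incoming weights (the function name and values are hypothetical):

```python
import numpy as np

def project_max_norm(W, c=3.0):
    """Clamp each row (one neuron's weight vector) to L2 norm at most c."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Rows within the ball are left unchanged; rows outside are rescaled.
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

W = np.array([[3.0, 4.0],    # norm 5 -> projected down to norm 3
              [0.6, 0.8]])   # norm 1 -> unchanged
W = project_max_norm(W, c=3.0)
assert np.allclose(np.linalg.norm(W, axis=1), [3.0, 1.0])
```

This projection is what bounds the updates: no matter how large a gradient step is, the weights are pulled back inside the norm ball.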

Sec 4: LINEAR PERTURBATION OF NON-LINEAR MODELS

Let \theta be the parameters of a model, x the input to the model, y the targets associated with x (for machine learning tasks that have targets) and J(\theta; x; y) be the cost used to train the neural network. We can linearize the cost function around the current value of \theta, obtaining an optimal max-norm constrained perturbation of
\eta=\epsilon \cdot \text{sign} \left( \nabla_{x} J(\theta; x; y) \right). A few questions arise here:

  1. Why differentiate with respect to x? Because x is what we are perturbing.
  2. In Sec. 3 we take \text{sign}(w); why take \text{sign}\left(\nabla_{x} J(\theta; x; y)\right) here? A simple way to see it: for a linear classifier, \nabla_{x} J(\theta; x; y) is just the weight vector w (up to a scalar factor).
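Both points can be verified on a concrete linear model. A hedged sketch using a hand-written logistic-regression loss (the loss choice, label convention y ∈ {−1, +1}, and all values are illustrative, not from the paper):

```python
import numpy as np

def loss(w, x, y):
    """Logistic loss J(w; x, y) with label y in {-1, +1}."""
    return np.log1p(np.exp(-y * (w @ x)))

def grad_x(w, x, y):
    """Gradient of the loss w.r.t. the input x (not the weights)."""
    s = 1.0 / (1.0 + np.exp(y * (w @ x)))   # sigmoid(-y * w^T x), always > 0
    return -y * s * w                        # proportional to w, as noted above

rng = np.random.default_rng(0)
w = rng.normal(size=10)
x = rng.normal(size=10)
y = 1.0
eps = 0.25

# FGSM: perturb x in the direction of the sign of the input gradient.
x_adv = x + eps * np.sign(grad_x(w, x, y))

# For this linear model, sign(grad_x J) = -y * sign(w), so the FGSM step
# recovers the Sec. 3 perturbation (with the sign chosen to hurt label y).
assert np.array_equal(np.sign(grad_x(w, x, y)), -y * np.sign(w))
assert loss(w, x_adv, y) > loss(w, x, y)
```

For a deep network the gradient is no longer literally w, but the same first-order linearization of J around x motivates the identical sign step.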