The attention mechanism originated in NLP. This paper mainly uses the squeeze-and-excitation (SE) module.

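Advantages: it learns to use global information to selectively emphasize informative features and suppress less useful ones.
The SE structure is simple and lightweight, can be dropped directly into state-of-the-art architectures, and only slightly increases the computational cost.
The weights are learned automatically rather than designed by hand.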

0. Background

0.1 Feedforward structure

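What matters most in the feedforward structure is the activation function, which gives the model its nonlinear learning capacity. A typical network wraps a linear map in an activation function to increase what it can learn. Self-attention applies no element-wise activation to the features (the only nonlinearity is the softmax used to compute the attention weights), so a feedforward layer is added after self-attention to improve nonlinear learning capacity.
The feedforward structure in a Transformer usually has two layers. The data dimension is unchanged after passing through both layers: the first layer raises the dimension and the second lowers it back.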
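The expand-then-project behavior described above can be sketched in PyTorch as follows. This is only a minimal illustration: the class name FeedForward and the sizes d_model=8 and d_hidden=32 are assumptions chosen for the example, not values taken from these notes.

```python
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    """Two-layer feedforward block: expand, apply ReLU, project back."""

    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),  # raise the feature dimension
            nn.ReLU(),                     # the nonlinearity
            nn.Linear(d_hidden, d_model),  # lower it back to d_model
        )

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
ffn = FeedForward(d_model=8, d_hidden=32)  # illustrative sizes (assumption)
x = torch.randn(2, 5, 8)                   # (batch, sequence length, d_model)
y = ffn(x)
print(x.shape, y.shape)                    # both shapes are the same
```

The linear layers act on the last dimension only, so the batch and sequence dimensions pass through untouched.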

1. Squeeze

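First comes the Squeeze operation: features are compressed along the spatial dimensions, turning each two-dimensional feature channel into a single real number. This number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels. It characterizes the global distribution of responses over the feature channels and lets layers close to the input also obtain a global receptive field, which is useful in many tasks. Squeeze uses global average pooling: add up all the values in a channel and divide by the number of elements.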
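As a small check of the description above, the sketch below computes the squeeze step by hand (sum divided by element count) and compares it with nn.AdaptiveAvgPool2d(1). The tensor sizes are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
u = torch.randn(2, 4, 3, 3)  # (batch, channels, height, width); illustrative sizes

# Manual squeeze: sum over the spatial dimensions, divide by H * W.
z_manual = u.sum(dim=(2, 3)) / (u.size(2) * u.size(3))

# The same operation via global average pooling.
z_pool = nn.AdaptiveAvgPool2d(1)(u).view(u.size(0), u.size(1))

print(z_manual.shape)                               # one real number per channel
print(torch.allclose(z_manual, z_pool, atol=1e-6))  # the two results agree
```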

# Matches the paper's description and is very compact: global pooling, a linear layer that reduces the dimension, ReLU, a linear layer that restores it, then sigmoid.
import torch.nn as nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling (squeeze)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),  # // is integer division
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        # Expand y to the shape of x, then multiply, so every feature value
        # is scaled by the attention weight of its channel.
        return x * y.expand_as(x)

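def conv3x3(in_planes, out_planes, stride=1):
    # 3x3 convolution with padding 1 and no bias. The snippet below calls this
    # helper without defining it; this definition is assumed to follow
    # torchvision's ResNet.
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class SEBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None,
                 *, reduction=16):
        # planes is the number of channels; reduction is an empirical value, fixed here
        super(SEBasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, 1)
        self.bn2 = nn.BatchNorm2d(planes)
        self.se = SELayer(planes, reduction)  # the lightweight SE module is inserted with one line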
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x  # keep a copy of the input for the skip connection
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)  # apply the channel attention weights

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

class SEBottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None,
                 *, reduction=16):
        super(SEBottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.se = SELayer(planes * 4, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)
        out = self.se(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

2. Excitation

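To limit model complexity and aid generalization, the paper introduces two fully connected (FC) layers (both can be written as 1×1 conv layers): a dimension-reduction layer with parameters W1 and reduction ratio r (set to 16 in the paper), followed by a ReLU, then a dimension-increasing layer with parameters W2, followed by a sigmoid. The result is a 1×1×C vector of real numbers that is combined with U (the original feature map). The parameters W generate a weight for each feature channel and are learned to explicitly model the correlations between feature channels.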
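To make the complexity argument concrete, here is a short arithmetic sketch that counts the weights in the two bias-free FC layers of one SE block. The function name se_fc_params and the channel counts in the loop are assumptions picked for illustration.

```python
def se_fc_params(channels, reduction=16):
    """Number of weights in the two bias-free FC layers of one SE block."""
    hidden = channels // reduction
    w1 = channels * hidden  # reduction layer W1: C -> C/r
    w2 = hidden * channels  # expansion layer W2: C/r -> C
    return w1 + w2


for c in (64, 256, 2048):  # illustrative channel counts (assumption)
    # Compare against a single full C x C layer without the bottleneck.
    print(c, se_fc_params(c), c * c)
```

With r = 16 and C divisible by r, the bottleneck costs 2·C²/r weights, which is one eighth of a single full C×C layer.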

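Finally comes the Reweight operation: the output of Excitation is treated as the importance of each feature channel after feature selection, and it is applied to the earlier features by channel-wise multiplication.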

Experimental results:

2.CBAM: Convolutional Block Attention Module

2.1 Background

The overall structure of the SE block is: global pooling ---> fully connected ---> ReLU ---> fully connected ---> sigmoid

Depth, width, and cardinality.
A major benefit of a deeper network is layer-by-layer abstraction: knowledge is refined step by step, and deeper layers can learn more complex representations.

Width plays a different role: it lets each layer learn richer features, such as texture features with different orientations and frequencies.

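The ResNeXt authors introduced a hyperparameter called "cardinality", the number of independent paths, as a new way to adjust model capacity. Their experiments show that increasing cardinality improves accuracy more effectively than making the network deeper or wider. The cardinality here is 32; my impression is that it behaves like the width of the residual block, counted in parallel paths.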
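One common way to realize a fixed number of independent paths in PyTorch is a grouped convolution, via the groups argument of nn.Conv2d. The sketch below is an assumption-laden illustration (128 channels and a 3×3 kernel are arbitrary choices); it only compares the weight counts of a dense and a grouped convolution with 32 groups.

```python
import torch.nn as nn

channels, cardinality = 128, 32  # illustrative sizes; 32 is the cardinality mentioned above

# Dense 3x3 convolution: every output channel sees every input channel.
dense = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

# Grouped 3x3 convolution: 32 independent paths, each over channels / 32 inputs.
grouped = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                    groups=cardinality, bias=False)

dense_params = dense.weight.numel()
grouped_params = grouped.weight.numel()
print(dense_params, grouped_params)
```

At the same channel count, the grouped layer has 32 times fewer weights, which is what lets a block with high cardinality be made wider at a similar cost.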

2.2 CBAM

2.2.1 Attention mechanism

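The authors of this paper argue that the channel attention proposed in SENet yields only suboptimal features, so they also use spatial attention, and they use max pooling in addition to global average pooling.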

2.2.2 Convolutional block attention module

Feature map

Channel attention

Spatial attention

The whole attention process can be expressed as follows:

As in SENet, there are two steps: first extract the attention weights, then multiply the weights onto the feature map before continuing the forward pass.
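In the notation commonly used for CBAM, the two steps can be written as below. Here F is the input feature map of shape C×H×W, M_c(F) is the channel attention map of shape C×1×1, M_s(F') is the spatial attention map of shape 1×H×W, and ⊗ is element-wise multiplication with broadcasting. The symbols are supplied here for illustration, since the notes do not preserve the original formula.

```latex
\begin{aligned}
F'  &= M_c(F) \otimes F, \\
F'' &= M_s(F') \otimes F'.
\end{aligned}
```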

2.2.2.1 New channel attention

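Max pooling and average pooling compress the feature map along the spatial dimensions, giving two different spatial context descriptors. A shared network, a multilayer perceptron with a single hidden layer (so it can be written as two matrix multiplications), processes both descriptors; the two outputs are added and passed through a sigmoid to produce the channel attention map. Each pooled descriptor has shape 1×1×C. This step answers the question of "what".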
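A minimal PyTorch sketch of the channel attention described above might look like this. The class name ChannelAttention, the reduction ratio of 16, and the input sizes are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Shared MLP over average-pooled and max-pooled descriptors."""

    def __init__(self, channels, reduction=16):  # reduction=16 is an assumption
        super().__init__()
        # One hidden layer, shared by both pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        avg = x.mean(dim=(2, 3))  # average pooling over H and W -> (b, c)
        mx = x.amax(dim=(2, 3))   # max pooling over H and W -> (b, c)
        # Add the two MLP outputs, then apply the sigmoid.
        weights = torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        return weights.view(b, c, 1, 1)


torch.manual_seed(0)
x = torch.randn(2, 32, 8, 8)             # illustrative input (assumption)
ca = ChannelAttention(32, reduction=16)
m_c = ca(x)                              # channel attention map, shape (2, 32, 1, 1)
out = x * m_c                            # broadcast multiply over H and W
print(m_c.shape, out.shape)
```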

2.2.2.2 Spatial attention

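The input to this step is the output of the previous one, that is, the feature map already weighted by channel attention. Pooling along the channel axis gives maps of shape 1×H×W, which are concatenated along the channel dimension. A convolution then reduces the result to a single channel, and a sigmoid produces the spatial attention map. Finally, this map is multiplied with the input feature of this module to give the final feature. This step answers the question of "where".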
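The spatial attention step can be sketched in the same way. The class name SpatialAttention and the 7×7 kernel are assumptions for this example (the notes do not state a kernel size), and the input stands in for a feature map that has already been weighted by channel attention.

```python
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Pool along the channel axis, concatenate, convolve down to one channel."""

    def __init__(self, kernel_size=7):  # kernel size is an assumption
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)     # (b, 1, h, w)
        mx = x.amax(dim=1, keepdim=True)      # (b, 1, h, w)
        pooled = torch.cat([avg, mx], dim=1)  # concat on channels -> (b, 2, h, w)
        return torch.sigmoid(self.conv(pooled))


torch.manual_seed(0)
f_prime = torch.randn(2, 32, 8, 8)  # stands in for the channel-refined feature map
sa = SpatialAttention()
m_s = sa(f_prime)                   # spatial attention map, shape (2, 1, 8, 8)
f_out = f_prime * m_s               # broadcast multiply over the channel axis
print(m_s.shape, f_out.shape)
```

The padding of kernel_size // 2 keeps H and W unchanged for odd kernel sizes, so the attention map lines up with the input position by position.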