SSD: Single Shot MultiBox Detector: Part 2 - Code and Implementation Details


Author: 心有寶寶人自圓

Notice: You are welcome to repost the images or text in this article; please credit the source.

Preface

Inspired by those who came before me, I decided it was time to start writing articles to document what I learn.

I have read some papers and written some code before; I will fill in those gaps gradually ??

For now, let me share what I have been reading recently.

I am sharing my own understanding and takeaways here; if anything is wrong or poorly explained, please point it out ??

This article is the follow-up to SSD: Single Shot Multibox Detector: Part 1 - Paper Reading; time to fill in that gap......

Paper: SSD: Single Shot MultiBox Detector

Our goal: implement SSD in PyTorch ??

I used python-3.6 + pytorch-1.3.0 + torchvision-0.4.1

Training set: VOC2007 trainval + VOC2012 trainval

Test set: VOC2007 test

The object categories are listed below: 20 classes + 1 (background)

('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 
'chair', 'cow', 'diningtable','dog', 'horse', 'motorbike', 'person',
 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor')
  • The images below show detect results after 45 training epochs, well short of the author's 200+ epochs, but the results look decent (training is quite time-consuming ??); a few randomly chosen test-set images are shown to give a feel for the detection quality ??


0. Review of key concepts from the paper

  • single-shot vs two-stage: a typical two-stage model (the R-CNN family) follows the pipeline the SSD paper describes: large numbers of multi-scale region proposals, a CNN to extract features, a high-quality classifier for classification, regression to predict box locations, and so on. In short, it suffers from an accuracy-speed trade-off, and its heavy computational cost makes it unsuitable for real-time detection in the real world. SSD removes the most time-consuming stages, proposal generation and resampling, and instead uses fixed anchor boxes built into the model, letting us detect objects both quickly and accurately
  • Fixed anchor boxes (fixed default boxes, priors): in my earlier paper-reading post, most of the preparation revolved around anchor boxes. Their design is crucial for training, because they are turned into ground-truth targets (offset + label). The anchors are fixed inside the SSD model in advance (hence "priors"), identified by (aspect ratio, scale). Since anchors are tied to feature maps at different levels, higher-level maps use larger scales and lower-level maps smaller ones (predictions are made per prior)
  • Multi-scale feature maps and predictors: SSD predicts on feature maps at several levels, appended after the truncated base net. Lower levels mainly detect smaller objects and higher levels larger ones; the predictor at each scale learns to detect objects at that scale. Because a single pixel's receptive field is larger in the higher-level maps, the predictors can all use small convolution kernels of a fixed size
  • Hard negative mining: training typically produces a huge number of negatives, severely unbalancing the positive/negative ratio of the training data, so instead of using all negatives we explicitly pick a fixed proportion of the negative predictions with the highest confidence loss
  • Non-maximum suppression: keep only the most confident predicted boxes and remove overlapping, redundant ones
The overall workload is considerable; I'll try to keep the comments clear ??

Remember to define a global variable:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

1. Starting from the anchor boxes (the paper's fixed default boxes; "Priors" from here on)

import matplotlib.pyplot as plt

def show_box(box, color):
    """
    Draw a bounding box with matplotlib
    :param box: bounding box, (xmin, ymin, xmax, ymax)
    :return: matplotlib.patches.Rectangle
    """
    return plt.Rectangle(xy=(box[0], box[1]), width=box[2] - box[0], height=box[3] - box[1],
                         fill=False, edgecolor=color, linewidth=2)

Generally speaking, objects (of whatever class) are scattered across an image at all kinds of positions and sizes. Probabilistically, an object can appear anywhere, so the best we can do is discretize this probability space so that we can at least assign probabilities......?? We therefore spread anchor boxes as widely as possible over the whole feature map (a discretized probability space?).

Anchor boxes are prior, fixed boxes; together they approximately represent the probability space of class possibilities and box shapes. To emphasize this prior nature, we give them the English name: Prior.

1.1 Fine, Priors then

  • These anchor boxes are chosen by hand, and their sizes and scales should match the training data; for the priors to represent the probability space, they are generated around every pixel position
  • As in the paper-reading post, lower levels use smaller scales (to detect smaller objects) and higher levels larger scales (to detect larger objects). Because scales are expressed as fractions, they map back to the original image consistently from any feature map
    The 6 priors at one location of a 10x10 feature map (the others are not drawn to avoid clutter)

(See the paper or my previous post for the full procedure; only the key steps are marked here)

def create_prior_boxes(widths: list, heights: list, scales: list, aspect_ratios: list) -> torch.Tensor:
    """
    Create prior boxes at each pixel, following the authors' method in the paper
    :param widths: widths list of all feature maps using for create priors
    :param heights: heights list of all feature maps using for create priors
    :param scales: scales list of all feature maps use for create priors.
                Note that each feature map has a specific scale
    :param aspect_ratios: aspect-ratio lists of all feature maps used to create priors.
                Note that each feature map has its own set of ratios
    :return: priors' location in center coordinates , a tensor in shape of(8732, 4)
    """
    prior_boxes = []
    for i, (width, height, scale, ratios) in enumerate(zip(widths, heights, scales, aspect_ratios)):
        for y in range(height):
            for x in range(width):
                # change cxcy to the center of pixel
                # change cxcy in range 0 to 1
                cx = (x + 0.5) / width
                cy = (y + 0.5) / height
                for ratio in ratios:
                    # all those params are proportional form(percent coordinates)
                    prior_width = scale * math.sqrt(ratio)
                    prior_height = scale / math.sqrt(ratio)
                    prior_boxes.append([cx, cy, prior_width, prior_height])

                    # For the aspect ratio of 1, we also add a default box whose scale is sqrt(s(k)*(sk+1))
                    if ratio == 1:
                        try:
                            additional_scale = math.sqrt(scales[i] * scales[i + 1])
                        # for the last feature map there is no next scale, so fall back to 1
                        except IndexError:
                            additional_scale = 1

                        # ratio of 1 means scale is width and height
                        prior_boxes.append([cx, cy, additional_scale, additional_scale])

    return torch.FloatTensor(prior_boxes).clamp_(0, 1).to(device) # (8732, 4) Note that they are percent coordinates
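As a usage sketch, the SSD300 configuration below reproduces the paper's 8732 priors. The feature-map sizes, scales, and aspect ratios are the ones commonly quoted for SSD300; treat them as an assumption on my part, since they are not listed in this post:

# SSD300 prior configuration (assumed values following the paper's setup)
fmap_dims = [38, 19, 10, 5, 3, 1]                      # conv4_3 ... conv11_2
scales = [0.1, 0.2, 0.375, 0.55, 0.725, 0.9]           # one scale per feature map
aspect_ratios = [[1., 2., 0.5],
                 [1., 2., 3., 0.5, 1. / 3.],
                 [1., 2., 3., 0.5, 1. / 3.],
                 [1., 2., 3., 0.5, 1. / 3.],
                 [1., 2., 0.5],
                 [1., 2., 0.5]]
priors_cxcy = create_prior_boxes(fmap_dims, fmap_dims, scales, aspect_ratios)
print(priors_cxcy.shape)  # torch.Size([8732, 4])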

1.2 How Priors are represented

The paper represents a Prior as (cx, cy, w, h): the center form. For coding convenience we sometimes also use the boundary form (xmin, ymin, xmax, ymax), so we need conversions between the two

def xy_to_cxcy(xy: torch.Tensor) -> torch.Tensor:
    """
    Convert boxes from the boundary form (xmin, ymin, xmax, ymax) to the center form (cx, cy, w, h)
    :param xy: boxes in (xmin, ymin, xmax, ymax) form, a tensor of size (num_boxes, 4)
    :return: boxes in (cx, cy, w, h) form, a tensor of size (num_boxes, 4)
    """
    return torch.cat([(xy[:, 2:] + xy[:, :2]) / 2, xy[:, 2:] - xy[:, :2]], dim=1)

def cxcy_to_xy(cxcy: torch.Tensor) -> torch.Tensor:
    """
    Convert boxes from the center form (cx, cy, w, h) to the boundary form (xmin, ymin, xmax, ymax)
    :param cxcy: boxes in (cx, cy, w, h) form, a tensor of size (n_boxes, 4)
    :return: boxes in (xmin, ymin, xmax, ymax) form
    """
    return torch.cat([cxcy[:, :2] - (cxcy[:, 2:] / 2), cxcy[:, :2] + (cxcy[:, 2:] / 2)], 1)
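A quick round-trip check of the two conversions (a sanity test I added, not from the original post):

box_xy = torch.FloatTensor([[0.1, 0.2, 0.5, 0.6]])   # (xmin, ymin, xmax, ymax)
box_cxcy = xy_to_cxcy(box_xy)                        # tensor([[0.3, 0.4, 0.4, 0.4]])
assert torch.allclose(cxcy_to_xy(box_cxcy), box_xy)  # the two conversions are inverses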

Note: as argued in the paper-reading post, for several reasons Priors should be expressed in relative lengths (relative coordinates, i.e., normalized to [0, 1])

1.3 Prior to ground truth

Obviously the priors are not actual ground truth (they deviate from the true boundaries, carry no class label, and each prior's ground-truth assignment is uncertain; we need to quantify all of this). We must convert each prior's information into ground-truth targets to compute the loss (and we must also understand what exactly we predict and how the predictions convert back into actual bounding boxes)

1.3.1 Offsets

The offsets are (\Delta cx, \Delta cy, \Delta w, \Delta h); as noted in the paper-reading post, they are encoded as:

\hat{cx}=\frac{cx-cx_{anchor}}{w_{anchor}},\quad \hat{cy}=\frac{cy-cy_{anchor}}{h_{anchor}},\quad \hat{w}=\log\left(\frac{w}{w_{anchor}}\right),\quad \hat{h}=\log\left(\frac{h}{h_{anchor}}\right) \qquad (1)

where (cx, cy, w, h) is the ground-truth box and (cx_{anchor}, cy_{anchor}, w_{anchor}, h_{anchor}) is the prior

In practice the encoding is usually standardized further with empirical parameters:

\hat{cx}=\frac{(cx-cx_{anchor})/w_{anchor}-\mu_x}{\sigma_x},\quad \hat{cy}=\frac{(cy-cy_{anchor})/h_{anchor}-\mu_y}{\sigma_y},\quad \hat{w}=\frac{\log(w/w_{anchor})-\mu_w}{\sigma_w},\quad \hat{h}=\frac{\log(h/h_{anchor})-\mu_h}{\sigma_h} \qquad (2)

where the empirical parameters are \mu_x=\mu_y=\mu_w=\mu_h=0, \sigma_x=\sigma_y=0.1, and \sigma_w=\sigma_h=0.2 (matching the factors of 10 and 5 in the code below)

def cxcy_to_gcxgcy(cxcy: torch.Tensor, priors_cxcy: torch.Tensor) -> torch.Tensor:
    """
    Encode center-form boxes as offsets w.r.t. their priors, following Eq. (2)
    The center-form target boxes and the priors correspond one to one
    :param cxcy: boxes in center form, a tensor of size (n_priors, 4)
    :param priors_cxcy: prior boxes in center form, a tensor of size (n_priors, 4)
    :return: encoded bounding boxes, a tensor of size (n_priors, 4)
    """
    return torch.cat([(cxcy[:, :2] - priors_cxcy[:, :2]) / priors_cxcy[:, 2:] * 10,  # sigma = 0.1
                      torch.log(cxcy[:, 2:] / priors_cxcy[:, 2:]) * 5], 1)  # sigma = 0.2
    

To recover the actual predicted bounding boxes we must invert (decode) the encoding above (note: what the predictors actually output are these encoded offsets)

def gcxgcy_to_cxcy(gcxgcy: torch.Tensor, priors_cxcy: torch.Tensor) -> torch.Tensor:
    """
    Decode predicted offsets (one per prior) back into center-form bounding boxes
    :param gcxgcy: encoded boxes (i.e. offsets), e.g. the model's output, a tensor of size (n_priors, 4)
    :param priors_cxcy: prior boxes in center form, a tensor of size (n_priors, 4)
    :return: decoded bounding boxes in center-size form, a tensor of size (n_priors, 4)
    """
    return torch.cat([gcxgcy[:, :2] / 10 * priors_cxcy[:, 2:] + priors_cxcy[:, :2],
                      torch.exp(gcxgcy[:, 2:] / 5) * priors_cxcy[:, 2:]], dim=1)
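A round-trip sanity check (my addition): encoding a box against a prior and then decoding it should recover the original box.

boxes_cxcy = torch.FloatTensor([[0.50, 0.50, 0.20, 0.30]])  # a center-form box
prior = torch.FloatTensor([[0.48, 0.52, 0.25, 0.25]])       # a center-form prior
offsets = cxcy_to_gcxgcy(boxes_cxcy, prior)
assert torch.allclose(gcxgcy_to_cxcy(offsets, prior), boxes_cxcy)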

For the ground-truth offsets we only need cxcy to be the ground-truth boxes, but each cxcy must correspond to a prior one to one; establishing this correspondence is what we discuss next

1.3.2 Object class

Class 0 is the background; classes 1..n_classes are the object categories. Images differ in how many objects they contain and of which classes, so we first assign each prior an object, and that object's class determines the prior's class

1.3.3 Criterion

To assign classes to priors we need a metric for how well a prior matches a ground-truth box

The paper uses the Jaccard overlap (intersection over union, IoU)


IoU

The IoU functions are defined below; note that the inputs are boxes in boundary form

def find_intersection(set_1, set_2):
    """
    Find the intersection of every box combination between two sets of boxes that are in boundary coordinates.
    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: intersection of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """

    # PyTorch auto-broadcasts singleton dimensions
    lower_bound = torch.max(set_1[:, :2].unsqueeze(1), set_2[:, :2].unsqueeze(0))  # (n1,n2,2)
    upper_bound = torch.min(set_1[:, 2:].unsqueeze(1), set_2[:, 2:].unsqueeze(0))  # (n1,n2,2)
    intersection_dims = torch.clamp(upper_bound - lower_bound, 0)  # (n1, n2, 2)
    return intersection_dims[:, :, 0] * intersection_dims[:, :, 1]  # (n1, n2)


def find_jaccard_overlap(set_1, set_2):
    """
    Find the Jaccard Overlap (IoU) of every box combination between two sets of boxes that are in boundary coordinates.
    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: Jaccard Overlap of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """
    # Find intersections
    intersection = find_intersection(set_1, set_2)

    # Find areas of each box in both sets
    areas_set_1 = (set_1[:, 2] - set_1[:, 0]) * (set_1[:, 3] - set_1[:, 1])  # (n1)
    areas_set_2 = (set_2[:, 2] - set_2[:, 0]) * (set_2[:, 3] - set_2[:, 1])  # (n2)

    # Find the union
    # PyTorch auto-broadcasts singleton dimensions
    union = areas_set_1.unsqueeze(1) + areas_set_2.unsqueeze(0) - intersection  # (n1, n2)
    return intersection / union  # (n1, n2)

If set_1 is the priors (8732, 4) and set_2 the ground-truth boxes (n_objects_per_image, 4), we end up with a (8732, n_objects_per_image) tensor: the IoU between every prior and every object box in the image
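For example (my addition; this assumes priors_cxcy holds the (8732, 4) priors from 1.1):

priors_xy = cxcy_to_xy(priors_cxcy)                       # priors in boundary form
objects_xy = torch.FloatTensor([[0.1, 0.1, 0.4, 0.5],
                                [0.6, 0.2, 0.9, 0.8]])    # two toy ground-truth boxes
print(find_jaccard_overlap(priors_xy, objects_xy).shape)  # torch.Size([8732, 2])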

1.3.4 Priors to ground truth

def label_prior(priors_cxcy, boxes, classes):
    """
    Assign a ground-truth label to every prior. Note that we do this for each image in a batch.
    The priors are fixed beforehand; boxes and classes come from the dataloader.
    :param priors_cxcy: the priors we created, in shape (8732, 4); note they are center-form, fractional coordinates
    :param boxes: a tensor of the true objects' bounding boxes in the image; note they are fractional coordinates
    :param classes: a tensor of the true objects' class labels in the image
    :return: per-prior class labels and encoded offsets
    """

    n_objects = boxes.size(0)
    # center form to boundary form, for the IoU computation
    priors_xy = cxcy_to_xy(priors_cxcy)
    overlaps = find_jaccard_overlap(boxes, priors_xy)  # (n_objects, 8732)

    # for each prior, find the object with the largest overlap and assign that object (not yet its class)
    overlap_per_prior, object_per_prior = overlaps.max(dim=0)  # (8732)

    # assigning classes purely by maximum IoU creates two problems:
    # 1. if an object is not the best match of any prior, it never gets assigned to a prior at all
    # 2. priors whose best overlap is below the threshold (0.5) must be assigned to the background (class 0)

    # fix the first problem:
    _, prior_per_object = overlaps.max(dim=1)  # (n_objects), each value is an index in [0, 8731]

    object_per_prior[prior_per_object] = torch.LongTensor(range(n_objects)).to(device)  # force each object onto the prior it overlaps most
    overlap_per_prior[prior_per_object] = 1.  # and make sure those priors pass the threshold below

    # fix the second problem:
    class_per_prior = classes[object_per_prior]  # look up each prior's class via its assigned object
    class_per_prior[overlap_per_prior < 0.5] = 0  # (8732)

    # encode, for every prior, the offset to its assigned object's box
    offset_per_prior = cxcy_to_gcxgcy(xy_to_cxcy(boxes[object_per_prior]), priors_cxcy)  # (8732, 4)

    return class_per_prior, offset_per_prior

Note that every prior now corresponds to its own ground truth; together they detect objects at different scales and positions

label_prior() handles one image of a batch together with its object boxes and class labels (from the xml annotations, via the dataloader); a simple for loop over the batch gives the priors-to-ground-truth assignment for every image, used in the loss computation (see 5.1).

2. Network architecture

SSD takes VGG-16 truncated before its FC layers as the base net, modifies some of the base net's details, appends Conv6 and Conv7, and then stacks extra convolutional layers on top

(Note: for code readability, the SSD network is split into BaseNet and AuxiliaryConvolutions)

vgg-16
The author's detail changes + the auxiliary structure

Because of its fully connected layers, the full VGG-16 expects inputs of size (3, 224, 224); the author reworks the network to accept 300x300 inputs (the SSD300 model)

2.0 Conv4_3:

Following the original VGG-16 forward pass, the 300 x 300 input would reach Conv4_3 downsampled to 37 x 37, yet the figure shows 38 x 38. In VGG-16 only the pooling layers downsample, so the change comes from modifying maxpool3: its output-size computation switches from rounding down (floor) to rounding up (ceiling)

self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)

2.1 Maxpool5

Instead of keeping the same layer as the original VGG-16, we use a maxpool with size=(3, 3), stride=1, padding=1

self.pool5 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

2.2 Conv6 and Conv7: I hope I can explain this clearly enough ??

fc6-fc7: the image goes (512, 7, 7).flatten() \Rightarrow (fc6) \Rightarrow 4096 \Rightarrow (fc7) \Rightarrow 4096; the author wants to build the Conv6 and Conv7 kernels directly from the weights of fc6 and fc7

2.2.1 First, let's sort out how convolutional and fully connected layers convert into each other
  • Convolutional layer -> fully connected layer:


    Conv to FC
From the figure above it is easy to see that the weights of the converted fc layer form a sparse matrix built from the kernel weights. Each output-channel pixel of a feature map is the sum, over all in_channels, of convolutions at the same input position (i.e., the shaded red box is the sum of the results of convolving the stacked shaded blue boxes (assuming several input channels) with the corresponding kernels), so out_channels controls the number of feature maps while in_channels and out_channels determine the height and width of the fc weight matrix
  • Fully connected layer -> convolutional layer: consider flattening the (512, 7, 7) input pixels into 4096 outputs; the fc weight matrix is then (512*7*7, 4096)

    If the kernel is made as large as the image, i.e. (4096, 512, 7, 7), then by the definition of convolution each output value (within one output channel) is every pixel of every channel multiplied by its kernel weight and summed, exactly the fully connected computation; the channel dimension now plays the role of the old feature dimension

  • So conv6's kernel should be (4096, 512, 7, 7) and conv7's (4096, 4096, 1, 1); a toy-size numeric check follows below
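The promised toy-size check of this fc-to-conv equivalence (my addition, with sizes shrunk from 512/4096 so it runs instantly):

import torch
import torch.nn as nn

x = torch.randn(1, 8, 7, 7)                 # a small stand-in for the (512, 7, 7) input
fc = nn.Linear(8 * 7 * 7, 16)               # stand-in for fc6
conv = nn.Conv2d(8, 16, kernel_size=7)      # kernel as large as the image
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(16, 8, 7, 7))  # reshape fc weights into the kernel
    conv.bias.copy_(fc.bias)
out_fc = fc(x.flatten(1))                   # (1, 16)
out_conv = conv(x).flatten(1)               # (1, 16, 1, 1) -> (1, 16)
print(torch.allclose(out_fc, out_conv, atol=1e-6))  # True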

However, this still won't do ??: these filters are numerous, huge, and computationally expensive, so the author downsamples (decimates) the kernels

2.2.2 Kernel decimation

The process is actually very simple: just subsample the kernel parameters along the out_channels, height, and width dims.......

from collections.abc import Iterable  # `from collections import Iterable` is deprecated

def decimate(tensor: torch.Tensor, m: Iterable) -> torch.Tensor:
    """
    Decimate (subsample) some dimensions of a tensor; m lists the sampling step for each dimension
    :param tensor: the tensor to decimate
    :param m: list of per-dimension sampling steps; None means that dimension is left untouched
    :return: the decimated tensor
    """
    assert tensor.dim() == len(m)
    for d in range(tensor.dim()):
        if m[d] is not None:
            tensor = tensor.index_select(dim=d, index=torch.arange(start=0, end=tensor.size(d), step=m[d]))
    return tensor
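A tiny demo of decimate (my addition): keep every 2nd value along dim 1 and every 3rd along dim 2.

t = torch.arange(24.).view(2, 3, 4)
print(decimate(t, m=[None, 2, 3]).shape)  # torch.Size([2, 2, 2])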

The author sets the sampling step along both the height and width dims to 3 (keep every third value) and the step along out_channels to 4, keeping \frac{1}{4} of the original kernels

So we finally obtain the Conv6 and Conv7 kernels: (1024, 512, 3, 3) and (1024, 1024, 1, 1) respectively
2.2.3 Atrous convolution

Atrous convolution (dilated convolution, also known as convolution with holes......) really targets neighboring pixels (neighboring pixels usually carry highly redundant information). To enlarge the receptive field without pooling (pooling means losing image information), we insert holes into the convolution's input window. Atrous convolution loses no image information: one output pixel simply skips its immediate neighbors in the input, while those skipped pixels are still convolved with the kernel when computing other output pixels......enough words, the picture below says it better \downarrow??

Image from vdumoulin/conv_arithmetic (you have probably seen this series of figures; the shaded area is where the convolution is applied ??)

DILATED CONVOLUTIONS with kernel size 3x3, dilation=2

It is easy to see that every input pixel is indeed used (nothing is discarded as in pooling) while the receptive field grows

2.2.4 Atrous convolution and kernel decimation

In the paper, conv6's output stays 19x19 and uses atrous convolution.

After the decimation described above, a feature map that should have met a 7x7 kernel now meets a 3x3 one with values missing in between (the holes are in the kernel), so the natural choice is a convolution that skips every 3 pixels (dilation=3). The author's repository, however, uses dilation=6, presumably because the modified maxpool5 no longer halves the output size, so the dilation needs to be doubled

self.conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)  # atrous convolution
self.conv7 = nn.Conv2d(1024, 1024, kernel_size=1)
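A quick shape check (my addition): with the modified pool5, the input to conv6 is (N, 512, 19, 19), and dilation=6 with padding=6 keeps the 19x19 size.

x = torch.randn(1, 512, 19, 19)  # output of the modified pool5
conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)
conv7 = nn.Conv2d(1024, 1024, kernel_size=1)
print(conv7(torch.relu(conv6(x))).shape)  # torch.Size([1, 1024, 19, 19])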

Next, initialize base_net with the weights and biases of the original fully connected layers:

# this part can be defined in class BaseNet as a function for init.
# get state_dict, which only contains params
state_dict = base_net.state_dict()  # base_net is an instance of BaseNet
pretrained_state_dict = torchvision.models.vgg16(pretrained=True).state_dict()

# fc6
conv_fc6_weight = pretrained_state_dict['classifier.0.weight'].view(4096, 512, 7, 7)  # (4096, 512, 7, 7)
conv_fc6_bias = pretrained_state_dict['classifier.0.bias']  # (4096)
state_dict['conv6.weight'] = decimate(conv_fc6_weight, m=[4, None, 3, 3])  # (1024, 512, 3, 3)
state_dict['conv6.bias'] = decimate(conv_fc6_bias, m=[4])  # (1024)
# fc7: in the pretrained model, fc7 is named classifier.3
conv_fc7_weight = pretrained_state_dict['classifier.3.weight'].view(4096, 4096, 1, 1)  # (4096, 4096, 1, 1)
conv_fc7_bias = pretrained_state_dict['classifier.3.bias']  # (4096)
state_dict['conv7.weight'] = decimate(conv_fc7_weight, m=[4, 4, None, None])  # (1024, 1024, 1, 1)
state_dict['conv7.bias'] = decimate(conv_fc7_bias, m=[4])  # (1024)

base_net.load_state_dict(state_dict)

......this headache-inducing part is finally over ??

2.3 The remaining auxiliary convolutional layers:

These are all layers the author adds to extract larger-scale features; they are easy to follow, and the 1x1 convolutions are a nice trick (further distilling the feature maps?) ??

class AuxiliaryConvolutions(nn.Module):
    """
    Additional convolutions to produce higher-level feature maps.
    """

    def __init__(self):
        super(AuxiliaryConvolutions, self).__init__()

        # Auxiliary convolutions on top of the VGG base
        self.conv8_1 = nn.Conv2d(1024, 256, kernel_size=1, padding=0)
        self.conv8_2 = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1)  # spatial size: 19 -> 10

        self.conv9_1 = nn.Conv2d(512, 128, kernel_size=1, padding=0)
        self.conv9_2 = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)  # spatial size: 10 -> 5

        self.conv10_1 = nn.Conv2d(256, 128, kernel_size=1, padding=0)
        self.conv10_2 = nn.Conv2d(128, 256, kernel_size=3, padding=0)  # spatial size: 5 -> 3

        self.conv11_1 = nn.Conv2d(256, 128, kernel_size=1, padding=0)
        self.conv11_2 = nn.Conv2d(128, 256, kernel_size=3, padding=0)  # spatial size: 3 -> 1
        
        # Initialize convolutions' parameters
        for c in self.children():
            if isinstance(c, nn.Conv2d):
                nn.init.xavier_normal_(c.weight)
                nn.init.constant_(c.bias, 0.)

2.4 Multi-level feature maps:

As the figure shows, the feature maps chosen for multi-scale prediction are conv4_3, conv7, conv8_2, conv9_2, conv10_2, and conv11_2 (both low-level and high-level maps); we simply return them from forward

BaseNet: forward returns conv4_3_features, conv7_features

AuxiliaryConvolutions: forward returns conv8_2_features, conv9_2_features, conv10_2_features, conv11_2_features (a sketch of such a forward is given below)
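A possible forward for AuxiliaryConvolutions (the original post leaves it out, so this is a sketch; it assumes `import torch.nn.functional as F`):

import torch.nn.functional as F

# inside AuxiliaryConvolutions:
def forward(self, conv7_features):
    """
    :param conv7_features: output of conv7, a tensor of (N, 1024, 19, 19)
    :return: the four higher-level feature maps
    """
    out = F.relu(self.conv8_1(conv7_features))      # (N, 256, 19, 19)
    conv8_2_features = F.relu(self.conv8_2(out))    # (N, 512, 10, 10)

    out = F.relu(self.conv9_1(conv8_2_features))    # (N, 128, 10, 10)
    conv9_2_features = F.relu(self.conv9_2(out))    # (N, 256, 5, 5)

    out = F.relu(self.conv10_1(conv9_2_features))   # (N, 128, 5, 5)
    conv10_2_features = F.relu(self.conv10_2(out))  # (N, 256, 3, 3)

    out = F.relu(self.conv11_1(conv10_2_features))  # (N, 128, 3, 3)
    conv11_2_features = F.relu(self.conv11_2(out))  # (N, 256, 1, 1)

    return conv8_2_features, conv9_2_features, conv10_2_features, conv11_2_features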

2.5 Predictors

Each prediction feature map feeds its own predictors, one for offsets and one for classes; the predictors at every level share a similar structure: kernel_size=3, padding=1

Note that the predicted offsets are the encoded offsets relative to that feature map's priors (see 1.3), and the class predictor must produce a score for every class

def loc_predictor(in_channels, num_priors):
    """
    Box-prediction layer: predicts 4 offsets for each prior at every pixel of the input
    :param in_channels: number of input channels
    :param num_priors: number of priors generated around each position
    :return: the convolutional layer that predicts offsets
    """
    return nn.Conv2d(in_channels, num_priors * 4, kernel_size=3, padding=1)


def cls_predictor(in_channels, num_priors, num_classes):
    """
    Class-prediction layer: predicts a score per class for each prior at every pixel of the input
    It uses a convolution that preserves the input's height and width, so output and input
    positions on the feature map correspond one to one
    :param in_channels: number of input channels
    :param num_priors: number of priors generated around each position
    :param num_classes: number of object classes
    :return: the convolutional layer that predicts class scores
    """
    return nn.Conv2d(in_channels, num_priors * num_classes, kernel_size=3, padding=1)

Priors are generated at every pixel of a feature map, and a predictor's output keeps the w and h of its input, so every output position corresponds to an input position; naturally the predicted offsets are the encoded offsets of the corresponding priors, with out_channels now acting as the feature dimension. Because w, h and num_priors differ from one feature map to another, we must flatten the spatial dims of each output before concatenating them all. Class prediction follows the same idea as offset prediction, differing only in the size of the final feature dimension (the output channels)

  • For training, the number of prediction elements taken from the feature maps must exactly match the number of priors (a one-to-one correspondence)

Finally, the predictions of all feature maps are concatenated together

class PredictionConvolution(nn.Module):
    """
    Convolutions to predict class scores and bounding boxes
    """

    def __init__(self, n_classes):
        """
        :param n_classes: number of different types of objects
        """
        super(PredictionConvolution, self).__init__()
        self.n_classes = n_classes
        # Number of priors, as shown before, at each position of every feature map
        n_boxes = {'conv4_3': 4,
                   'conv7': 6,
                   'conv8_2': 6,
                   'conv9_2': 6,
                   'conv10_2': 4,
                   'conv11_2': 4}
        self.convs = ['conv4_3', 'conv7', 'conv8_2', 'conv9_2', 'conv10_2', 'conv11_2']
        for name, ic in zip(self.convs, [512, 1024, 512, 256, 256, 256]):
            setattr(self, 'cls_%s' % name, cls_predictor(ic, n_boxes[name], n_classes))
            setattr(self, 'loc_%s' % name, loc_predictor(ic, n_boxes[name]))      

        # Initialize convolutions' parameters
        for c in self.children():
            if isinstance(c, nn.Conv2d):
                nn.init.xavier_normal_(c.weight)
                nn.init.constant_(c.bias, 0.)

    def _apply(self, x: torch.Tensor, conv: nn.Conv2d, num_features: int):
        """
        Apply forward calculation for each conv2d with respect to specific feature map
        :param x: input tensor
        :param conv: conv
        :param num_features: number of output features per prior: 4 for loc_pred, n_classes for label_pred
        :return: locations and class scores
        """
        x = conv(x).permute(0, 2, 3, 1).contiguous()
        return x.view(x.size(0), -1, num_features)

    def forward(self, *args):
        # args are feature maps needed for prediction
        assert len(args) == len(self.convs)
        locs = []
        classes_scores = []

        for name, x in zip(self.convs, args):
            classes_scores.append(self._apply(x, getattr(self, 'cls_%s' % name), self.n_classes))
            locs.append(self._apply(x, getattr(self, 'loc_%s' % name), 4))

        locs = torch.cat(locs, dim=1)  # (N, 8732, 4)
        classes_scores = torch.cat(classes_scores, dim=1)  # (N, 8732, n_classes)

        return locs, classes_scores

2.6 SSD300

Combining BaseNet, AuxiliaryConvolutions, and PredictionConvolution gives the SSD300 model; a sketch of the glue code follows
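A minimal sketch of that glue code (my wording; BaseNet is assumed to return the conv4_3 and conv7 feature maps as described in 2.4):

class SSD300(nn.Module):
    def __init__(self, n_classes):
        super(SSD300, self).__init__()
        self.base = BaseNet()
        self.aux_convs = AuxiliaryConvolutions()
        self.pred_convs = PredictionConvolution(n_classes)

    def forward(self, image):
        # image: (N, 3, 300, 300)
        conv4_3, conv7 = self.base(image)
        conv8_2, conv9_2, conv10_2, conv11_2 = self.aux_convs(conv7)
        locs, classes_scores = self.pred_convs(conv4_3, conv7, conv8_2,
                                               conv9_2, conv10_2, conv11_2)
        return locs, classes_scores  # (N, 8732, 4), (N, 8732, n_classes)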

3. Processing the training data

Data augmentation here involves transforming not only the image but also the ground-truth boxes, so we cannot directly use the classes packaged in torchvision.transform; we have to write it by hand ??

The data augmentation used by the author

For the probability of 0.5 mentioned in the paper, just test whether random.random() is below 0.5 before applying an augmentation, for example:
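import random

# e.g. apply the horizontal flip from 3.2 below half of the time
if random.random() < 0.5:
    image, boxes = flip(image, boxes)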

3.1 Random crop

Random cropping is the heart of the paper's data augmentation

def random_crop(image: torch.Tensor, boxes: torch.Tensor, labels: torch.Tensor):
    """
    Random crop: helps the network learn objects at larger scales, but some objects may be cut away entirely
    :param image: image, a tensor of dimensions (3, original_h, original_w)
    :param boxes: ground-truth boxes in boundary form, a tensor of dimensions (n_objects, 4)
    :param labels: ground-truth class labels, a tensor of dimensions (n_objects)
    :return: randomly cropped image, boxes, labels
    """
    original_width = image.size(2)
    original_height = image.size(1)

    while True:
        # 'None' means no cropping; 0 means no IoU constraint; [.1, .3, .5, .7, .9] are the paper's minimum overlaps
        min_overlap = random.choice([0., .1, .3, .5, .7, .9, None])
        if min_overlap is None:
            return image, boxes, labels

        # try up to 50 times for the chosen minimum overlap (not in the paper, but used in the author's repo);
        # if nothing qualifies, loop again and pick a new minimum overlap
        for _ in range(50):
            min_scale = 0.3
            # the paper samples scales in [.1, 1], but the author's repo uses [.3, 1]
            # random.uniform(a, b) -> the closed interval [a, b]
            new_width = int(original_width * random.uniform(min_scale, 1))
            new_height = int(original_height * random.uniform(min_scale, 1))

            # the paper requires the sampled aspect ratio to stay in [0.5, 2]
            if not .5 <= new_height / new_width <= 2:
                continue

            # pick the crop position
            # random.randint(a, b) -> the closed interval [a, b]
            left = random.randint(0, original_width - new_width)
            top = random.randint(0, original_height - new_height)
            right = left + new_width
            bottom = top + new_height

            crop_bounding = torch.FloatTensor([left, top, right, bottom])

            # IoU between the crop and the ground-truth boxes
            over_lap = find_jaccard_overlap(crop_bounding.unsqueeze(0), boxes).squeeze(0)  # (n_objects)

            # the paper requires the overlap with the objects to be > min_overlap
            if over_lap.max().item() < min_overlap:
                continue

            cropped_image = image[:, top:bottom, left:right]

            # criterion for whether an object stays in the image: is the true box's center inside the crop?
            box_centers = (boxes[:, :2] + boxes[:, 2:]) / 2.  # (n_objects, 2)
            center_in_cropped_image = (box_centers[:, 0] > left) * (box_centers[:, 0] < right) * (
                    box_centers[:, 1] > top) * (box_centers[:, 1] < bottom)  # (n_objects)

            # if no object's center lies inside the crop, try again
            if not center_in_cropped_image.any():
                continue

            # drop the objects that fail the criterion
            new_boxes = boxes[center_in_cropped_image]
            new_labels = labels[center_in_cropped_image]

            # compute the box coordinates inside the cropped image
            # clip left/top to the larger of the true box's and the crop's left/top
            new_boxes[:, :2] = torch.max(new_boxes[:, :2], crop_bounding[:2])
            new_boxes[:, :2] -= crop_bounding[:2]
            # clip right/bottom to the smaller of the true box's and the crop's right/bottom
            new_boxes[:, 2:] = torch.min(new_boxes[:, 2:], crop_bounding[2:])
            new_boxes[:, 2:] -= crop_bounding[:2]

            return cropped_image, new_boxes, new_labels

3.2 Horizontal flip

This one is easy: flipping the image itself is trivial, but the ground-truth boxes need extra handling

def flip(image, boxes):
    """
    Flip image horizontally.
    :param image: a PIL Image (torchvision's functional API is used, so a PIL Image is required)
    :param boxes: ground-truth boxes in boundary form, a tensor of dimensions (n_objects, 4)
    :return: flipped image, updated boxes
    """

    # Flip image
    new_image = torchvision.transforms.functional.hflip(image)

    # Flip boxes
    new_boxes = boxes.clone()  # avoid mutating the caller's tensor
    new_boxes[:, 0] = image.width - (boxes[:, 0] + 1)
    new_boxes[:, 2] = image.width - (boxes[:, 2] + 1)
    new_boxes = new_boxes[:, [2, 1, 0, 3]]  # swap xmin/xmax so that xmin <= xmax again

    return new_image, new_boxes

3.3 Resize

SSD300 needs the training images resized to 300 x 300; we also convert the ground-truth boxes to fractional form (\in [0, 1]) here

def resize(image, boxes, size=(300, 300), return_percent_coords=True):
    """
    Resize image. For the SSD300, resize to (300, 300).

    Since percent/fractional coordinates are calculated for the bounding boxes (w.r.t image dimensions) in this process,
    you may choose to retain them.
    :param image: image, a PIL Image
    :param boxes: bounding boxes in boundary coordinates, a tensor of dimensions (n_objects, 4)
    :param size: resize to specific size
    :param return_percent_coords: whether to return new bounding box coordinates in form of percent coordinates
    :return: resized image, updated bounding box coordinates (or fractional coordinates, in which case they remain the same)
    """
    # Resize image
    new_image = transforms.functional.resize(image, size)

    # Resize bounding boxes
    old_size = torch.FloatTensor([image.width, image.height, image.width, image.height]).unsqueeze(0)
    # a pure resize does not change fractional coordinates
    new_boxes = boxes / old_size  # fractional coordinates are invariant to resizing

    if not return_percent_coords:
        new_size = torch.FloatTensor([size[0], size[1], size[0], size[1]]).unsqueeze(0)
        new_boxes = new_boxes * new_size

    return new_image, new_boxes

3.4 Expand

Because the model detects small objects poorly, we zoom the training image out to strengthen small-object detection

The steps closely mirror resize, except that a larger canvas is created, the original image is placed somewhere inside it, and the remaining blank area is filled

The recommended fill value is the per-channel mean of the three channels (see 3.5)

Since the new canvas is larger than the original image, the ground-truth boxes just shift by [offset to the left edge, offset to the top edge, offset to the left edge, offset to the top edge]; a sketch following these steps is given below
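A sketch of expand following those steps (my addition, so the exact sampling range is an assumption):

def expand(image, boxes, filler):
    """
    Place the image at a random position inside a larger canvas filled with `filler`
    :param image: image, a tensor of (3, original_h, original_w)
    :param boxes: boundary-form boxes in pixels, a tensor of (n_objects, 4)
    :param filler: per-channel fill values, e.g. the RGB means from 3.5
    :return: expanded image, updated boxes
    """
    original_h = image.size(1)
    original_w = image.size(2)
    scale = random.uniform(1, 4)  # enlarge the canvas by up to 4x
    new_h = int(scale * original_h)
    new_w = int(scale * original_w)

    # a new canvas filled with the channel means
    filler = torch.FloatTensor(filler)  # (3)
    new_image = torch.ones((3, new_h, new_w), dtype=torch.float) * filler.unsqueeze(1).unsqueeze(1)

    # drop the original image at a random position
    left = random.randint(0, new_w - original_w)
    top = random.randint(0, new_h - original_h)
    new_image[:, top:top + original_h, left:left + original_w] = image

    # shift the boxes by [left, top, left, top]
    new_boxes = boxes + torch.FloatTensor([left, top, left, top]).unsqueeze(0)
    return new_image, new_boxes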

3.5 Normalization

Inputs are first scaled to [0, 1]; the pretrained model additionally expects these scaled inputs to be standardized. This page shows the exact preprocessing used by torchvision.model's pretrained models

mean = [0.485, 0.456, 0.406] # RGB channels
std = [0.229, 0.224, 0.225]  # RGB channels
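Applied with torchvision's functional API (a short sketch):

import torchvision.transforms.functional as FT

image = FT.to_tensor(image)                      # PIL Image -> float tensor in [0, 1]
image = FT.normalize(image, mean=mean, std=std)  # standardize with the statistics above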

4. Dataset and DataLoader

For the Dataset, subclass torch.utils.data.Dataset by hand and apply the Section 3 processing to the images, ground-truth boxes, and labels inside it

The Dataset returns the image, the ground-truth boxes, and the labels

However, reading batches with a DataLoader raises a problem:

the number of objects differs between images, so boxes and labels have different lengths per image and cannot be stacked into batches

We therefore pass a function to the DataLoader's collate_fn= parameter (just the function name) so batches are assembled accordingly

def collate_fn(batch):
    """
    Describes how to combine tensors of different sizes into a batch. We use lists.

    :param batch: an iterable of N sets from __getitem__()
    :return: a tensor of images and lists of varying-size tensors of bounding boxes and labels
    """

    images = list()
    boxes = list()
    labels = list()

    for b in batch:
        images.append(b[0])
        boxes.append(b[1])
        labels.append(b[2])

    images = torch.stack(images, dim=0)

    return images, boxes, labels  # tensor (N, 3, 300, 300), 2 lists of N tensors each
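Hooking it up (a sketch; `train_dataset` stands for the Dataset subclass described above):

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8, shuffle=True,
                                           collate_fn=collate_fn,  # pass the function itself
                                           num_workers=4, pin_memory=True)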

5. Training

5.1 Loss Function

location_loss = nn.L1Loss()
confidence_loss = nn.CrossEntropyLoss(reduction='none')

5.2 Hard negative mining

Because negatives (background) vastly outnumber positives in the training data, the classes are severely unbalanced; hence hard negative mining: keep only the negatives with the largest loss, at a negative:positive ratio of 3:1

def calculate_loss(priors_cxcy, pred_locs, pred_scores, boxes, labels, loc_loss, conf_loss, alpha=1):
    """
    Compute the loss with hard negative mining
    :param priors_cxcy: priors in center form
    :param pred_locs: predicted offsets for a batch
    :param pred_scores: predicted class scores for a batch
    :param boxes: ground-truth boxes, from a batch of the dataloader
    :param labels: ground-truth class labels, from a batch of the dataloader
    :param loc_loss: nn.L1Loss()
    :param conf_loss: nn.CrossEntropyLoss(reduction='none')
    :param alpha: weight of the location loss from the paper, default 1
    :return: the total loss
    """
    n_priors = priors_cxcy.size(0)
    batch_size = pred_locs.size(0)
    n_classes = pred_scores.size(2)

    assert n_priors == pred_scores.size(1) == pred_locs.size(1)
    true_locs = torch.zeros((batch_size, n_priors, 4), dtype=torch.float).to(device)  # (N, 8732, 4)
    true_classes = torch.zeros((batch_size, n_priors), dtype=torch.long).to(device)  # (N, 8732)

    # assign ground truth to every prior, image by image
    for i in range(batch_size):
        cls, loc = label_prior(priors_cxcy, boxes[i], labels[i])
        true_locs[i] = loc
        true_classes[i] = cls

    positive_priors = (true_classes != 0)  # (N, 8732)

    # location loss: computed over positive (non-background) priors only
    loss_of_loc = loc_loss(pred_locs[positive_priors], true_locs[positive_priors])

    # confidence loss

    # pick negatives at negative:positive = 3:1, as in the paper
    n_hard_negative = 3 * positive_priors.sum(dim=1)  # (N)

    # first compute the confidence loss over all priors, positive and negative,
    # which saves us from tracking positions across the different images
    # CrossEntropyLoss(reduction='none') keeps the per-element losses instead of summing or averaging

    loss_of_conf_all = conf_loss(pred_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 8732)
    loss_of_conf_all = loss_of_conf_all.view(batch_size, n_priors)  # (N, 8732)

    # we already know the loss of every positive prior
    loss_of_conf_pos = loss_of_conf_all[positive_priors]  # (sum(n_positives))

    loss_of_conf_neg = loss_of_conf_all.clone()  # (N, 8732)
    loss_of_conf_neg[positive_priors] = 0  # (N, 8732), so positives can never rank among the top n_hard_negative
    loss_of_conf_neg, _ = loss_of_conf_neg.sort(dim=1, descending=True)  # sort negatives by descending loss
    neg_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(loss_of_conf_neg).to(device)  # (N, 8732), rank within each row
    hard_negatives = (neg_ranks < n_hard_negative.unsqueeze(1))  # (N, 8732)
    loss_of_conf_hard_neg = loss_of_conf_neg[hard_negatives]  # (sum(n_hard_negatives))

    # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
    loss_of_conf = (loss_of_conf_pos.sum() + loss_of_conf_hard_neg.sum()) / positive_priors.sum().float()  # (), scalar

    # TOTAL LOSS

    return loss_of_conf + alpha * loss_of_loc
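Putting sections 2-5 together, a minimal training-loop sketch (the hyper-parameters are my assumptions, not the paper's exact schedule; priors_cxcy is the tensor from 1.1):

model = SSD300(n_classes=21).to(device)  # 20 VOC classes + background
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4)
location_loss = nn.L1Loss()
confidence_loss = nn.CrossEntropyLoss(reduction='none')

n_epochs = 45
for epoch in range(n_epochs):
    for images, boxes, labels in train_loader:
        images = images.to(device)
        boxes = [b.to(device) for b in boxes]
        labels = [l.to(device) for l in labels]

        pred_locs, pred_scores = model(images)  # (N, 8732, 4), (N, 8732, 21)
        loss = calculate_loss(priors_cxcy, pred_locs, pred_scores,
                              boxes, labels, location_loss, confidence_loss)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()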

6. Detection

6.1 Non-maximum suppression

At detection time we don't want to output too many predicted boxes (at this point they overlap heavily), so we apply non-maximum suppression: boxes considered duplicates (IoU between predictions above a given threshold) are removed, and only the most confident box is kept

def none_max_suppress(priors_cxcy, pred_locs, pred_scores, min_score, max_overlap, top_k):
    """
    Perform non-maximum suppression
    :param priors_cxcy: priors in center form
    :param pred_locs: predicted offsets, output of the predictors
    :param pred_scores: predicted scores, output of the predictors
    :param min_score: minimum score to accept a detection
    :param max_overlap: maximum IoU above which a box is suppressed
    :param top_k: keep at most top_k detections
    :return: suppressed boundary-form boxes, class labels, and scores
    """
    batch_size = pred_locs.size(0)
    n_priors = priors_cxcy.size(0)
    n_classes = pred_scores.size(2)

    pred_scores = torch.softmax(pred_scores, dim=2)  # (batch_size, n_priors, n_classes)

    assert n_priors == pred_scores.size(1) == pred_locs.size(1)

    boxes_all_image = []
    scores_all_image = []
    labels_all_image = []

    for i in range(batch_size):
        # decode the predicted offsets into boundary-form boxes
        boxes = cxcy_to_xy(gcxgcy_to_cxcy(pred_locs[i], priors_cxcy))  # (n_priors, 4)

        boxes_per_image = []
        scores_per_image = []
        labels_per_image = []

        for c in range(1, n_classes):
            class_scores = pred_scores[i, :, c]  # (8732)
            score_above_min = class_scores > min_score
            n_score_above_min = score_above_min.sum().item()

            if n_score_above_min == 0:
                continue

            # keep only predictions with score > min_score
            class_scores = class_scores[score_above_min]
            class_boxes = boxes[score_above_min]

            # sort by detection confidence
            class_scores, sorted_ind = class_scores.sort(dim=0, descending=True)  # (n_score_above_min)
            class_boxes = class_boxes[sorted_ind]  # (n_score_above_min, 4)

            # non-maximum suppression by IoU
            overlap = find_jaccard_overlap(class_boxes, class_boxes)  # (n_score_above_min, n_score_above_min)

            # mask recording which boxes are suppressed; True means suppressed
            suppress = torch.zeros((n_score_above_min), dtype=torch.bool).to(device)

            for b_id in range(n_score_above_min):
                # skip boxes already marked as suppressed
                if suppress[b_id]:
                    continue
                # suppress boxes whose IoU with the current box is > max_overlap, keeping earlier suppressions
                suppress = suppress | (overlap[b_id] > max_overlap)
                # never suppress the current box itself (its IoU with itself is 1)
                suppress[b_id] = False

            # for each class, store only the unsuppressed predictions
            boxes_per_image.append(class_boxes[~suppress])
            scores_per_image.append(class_scores[~suppress])
            labels_per_image.append(torch.LongTensor([c] * (~suppress).sum().item()).to(device))

        # if no class was detected in the image, label the whole image as background
        if len(labels_per_image) == 0:
            boxes_per_image.append(torch.FloatTensor([[0., 0., 1., 1.]]).to(device))
            labels_per_image.append(torch.LongTensor([0]).to(device))
            scores_per_image.append(torch.FloatTensor([0.]).to(device))

        boxes_per_image = torch.cat(boxes_per_image, dim=0)  # (n_objects, 4)
        scores_per_image = torch.cat(scores_per_image, dim=0)  # (n_objects)
        labels_per_image = torch.cat(labels_per_image, dim=0)  # (n_objects)
        n_object = boxes_per_image.size(0)

        # keep only the top_k most confident detections
        if n_object > top_k:
            scores_per_image, sorted_ind = scores_per_image.sort(dim=0, descending=True)
            scores_per_image = scores_per_image[:top_k]
            boxes_per_image = boxes_per_image[sorted_ind][:top_k]
            labels_per_image = labels_per_image[sorted_ind][:top_k]

        boxes_all_image.append(boxes_per_image)
        scores_all_image.append(scores_per_image)
        labels_all_image.append(labels_per_image)

    return boxes_all_image, labels_all_image, scores_all_image  # lists of length batch_size
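Running detection on a single image (a sketch; the thresholds min_score=0.2, max_overlap=0.45, top_k=200 are common choices of mine, not values from this post):

model.eval()
with torch.no_grad():
    pred_locs, pred_scores = model(image.unsqueeze(0).to(device))  # image: (3, 300, 300), normalized
    boxes, labels, scores = none_max_suppress(priors_cxcy, pred_locs, pred_scores,
                                              min_score=0.2, max_overlap=0.45, top_k=200)
# boxes[0] holds the first image's fractional boxes; scale them back to pixel
# coordinates before drawing them with show_box() from Section 1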

Extras: a few points to note

  • We concatenate the outputs of all prediction feature maps into one tensor, but the conv4_3 feature map sits low in the network and its feature values are much larger than those of the higher layers (downsampling shrinks the feature responses), so we can choose to normalize its feature map (e.g., L2 normalization) and then rescale the responses by a factor the network learns itself. I think Batch Normalization would work here as well.

  • Indexing a multi-dimensional tensor with a dtype=torch.bool mask (or torch.uint8, whose indexing is deprecated since at least 1.3.0) returns a flattened result (note: this holds when the bool mask matches the original tensor element for element; if it only covers the leading dimensions, the remaining dimensions are kept (even when only one sub-array remains), whereas slicing squeezes a dimension that is down to a single sub-array), e.g.

    x = torch.rand((2, 3, 4))  # suppose half of the values are > 0.5
    y = x > 0.5  # y has shape (2, 3, 4); half True, half False
    print(x[y].shape)  # torch.Size([12])
    
    
  • Some tricks to speed up training:

    torch.backends.cudnn.benchmark = True

    set the dataloader's pin_memory=True to use page-locked (pinned) host memory, which is never swapped out and so speeds up transfers to the GPU; it requires enough memory. More details: https://blog.csdn.net/tfcy694/article/details/83270701

  • No eval function is used here to measure the model's actual quality; mAP is a good choice. When saving the best model, consider keeping the parameters whenever the eval metric improves, and the same metric can also drive early stopping of the epochs

I'm new at this, please follow along ??; writing it all by hand isn't easy, and discussion is welcome

Please credit the source when reposting.


