Author: 心有寶寶人自圓
Notice: you are welcome to repost the images or text in this article, but please credit the source.
Preface
Inspired by those who came before me, I decided it was time to start writing articles to document what I learn.
I have read some papers and written some code before; I will gradually fill in those gaps later.
For now, let me share what I have been studying recently,
along with my own understanding and takeaways. If anything is wrong or poorly understood, please point it out.
This article is the follow-up to "SSD: Single Shot Multibox Detector: Part 1 - Paper Reading"; time to deliver on that promise...
Paper: SSD: Single Shot MultiBox Detector
Our goal: implement SSD in PyTorch
I am using python-3.6 + pytorch-1.3.0 + torchvision-0.4.1
Training set: VOC2007 trainval + VOC2012 trainval
Test set: VOC2007 test
The object categories are as follows: 20 classes + 1 (background)
('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable','dog', 'horse', 'motorbike', 'person',
'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor')
The images below are detection results after training for 45 epochs. That falls well short of the authors' 200+ epochs, but the results look decent (training is quite time-consuming). These are randomly chosen images from the test set; see for yourself.
0. Review of key concepts from the paper
- single-shot vs two-stage: a typical two-stage model (the R-CNN family) follows the pipeline the SSD paper describes: generate many multi-scale region proposals, extract features with a CNN, classify them with a high-quality classifier, regress the bounding-box positions, and so on. In short, it suffers from an accuracy-speed trade-off, and its heavy computational cost makes it unsuitable for real-time object detection in the real world. SSD removes the most time-consuming steps, region-proposal selection and resampling, and instead uses fixed anchor boxes built into the model, letting us detect objects both fast and accurately.
- Fixed anchor boxes (fixed default boxes, priors): in my earlier paper-reading post, much of the preparation revolved around anchor boxes. Their design is crucial to training, because they are turned into the ground-truth targets (offset + label). The anchors are fixed inside the SSD model ahead of time (hence "priors"), identified by (aspect ratio, scale). Since anchors are tied to feature maps at different levels, higher layers use larger scales and lower layers smaller ones (predictions are made per prior).
- Multi-scale feature maps and predictors: SSD predicts from feature maps at several levels, appended after the truncated base net. Lower layers mainly detect smaller objects, higher layers larger ones, and the predictor at each scale learns to predict objects at that scale. Because one pixel's receptive field grows larger in higher layers, the prediction kernels can be small convolutions of a fixed size.
- Hard negative mining: during training SSD produces a huge number of negatives, which severely unbalances the positive/negative ratio in the training data. We therefore explicitly select only a fixed proportion of the negatives, those with the highest-confidence (largest-loss) predictions, rather than using them all.
- Non-maximum suppression: keep only the highest-confidence predicted boxes and remove overlapping, redundant ones.
The overall workload is substantial; I will try to keep the comments clear.
Remember to define the global variable:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
1. Starting from anchor boxes (the paper's fixed default boxes, hereafter "Priors")
import matplotlib.pyplot as plt
def show_box(box, color):
    """
    Display a bounding box with matplotlib
    :param box: bounding box, (xmin, ymin, xmax, ymax)
    :param color: edge color of the rectangle
    :return: matplotlib.patches.Rectangle
    """
    return plt.Rectangle(xy=(box[0], box[1]), width=box[2] - box[0], height=box[3] - box[1],
                         fill=False, edgecolor=color, linewidth=2)
Generally speaking, objects (of any class) are scattered all over an image, with all kinds of sizes. Probabilistically, an object can appear anywhere, so the best we can do is discretize that probability space; then at least we can assign probabilities. We therefore spread anchor boxes as densely as possible over the whole feature map (a discretized probability space, if you like).
Anchor boxes are prior, fixed boxes; together they represent the probability space of class likelihoods and approximate box locations. To emphasize their prior nature, we give them an English name: Priors.
1.1 Fine, Priors then
- The anchors are chosen manually, with sizes and scales matching the characteristics of the training data. For the Priors to represent the probability space, they must be generated at every pixel location.
- As discussed in the paper-reading post, lower layers use smaller scales (to detect smaller objects) and higher layers larger scales (for larger objects). Because scale is expressed as a fraction, mapping from a feature map back to the original image keeps scales consistent.
The 6 priors at one location of the 10x10 feature map (the rest are omitted; too many to draw)
(For the detailed procedure see the paper or my earlier article; only the key steps are marked here.)
import math
import torch

def create_prior_boxes(widths: list, heights: list, scales: list, aspect_ratios: list) -> torch.Tensor:
    """
    Create prior boxes at each pixel, following the authors' method in the paper
    :param widths: widths of all feature maps used to create priors
    :param heights: heights of all feature maps used to create priors
    :param scales: scales of all feature maps used to create priors.
                   Note that each feature map has a specific scale
    :param aspect_ratios: aspect ratios of all feature maps used to create priors.
                          Note that each feature map has a different number of ratios
    :return: priors' locations in center-size coordinates, a tensor of shape (8732, 4)
    """
    prior_boxes = []
    for i, (width, height, scale, ratios) in enumerate(zip(widths, heights, scales, aspect_ratios)):
        for y in range(height):
            for x in range(width):
                # move (cx, cy) to the center of the pixel,
                # expressed in the range 0 to 1
                cx = (x + 0.5) / width
                cy = (y + 0.5) / height
                for ratio in ratios:
                    # all these params are fractional (percent coordinates)
                    prior_width = scale * math.sqrt(ratio)
                    prior_height = scale / math.sqrt(ratio)
                    prior_boxes.append([cx, cy, prior_width, prior_height])
                    # for an aspect ratio of 1, we also add a default box whose scale is sqrt(s_k * s_(k+1))
                    if ratio == 1:
                        try:
                            additional_scale = math.sqrt(scales[i] * scales[i + 1])
                        # for the last feature map there is no next scale
                        except IndexError:
                            additional_scale = 1.
                        # for ratio 1, the scale is both width and height
                        prior_boxes.append([cx, cy, additional_scale, additional_scale])
    return torch.FloatTensor(prior_boxes).clamp_(0, 1).to(device)  # (8732, 4); note these are percent coordinates
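As a quick sanity check on the 8732 figure, the prior count can be tallied from the feature-map sizes alone. The sizes and boxes-per-location below are the SSD300 values assumed from the paper (in practice the lists are passed in by the caller of create_prior_boxes):

```python
# SSD300 prior-count tally (feature-map sizes and boxes-per-location assumed from the paper)
fmap_dims = [38, 19, 10, 5, 3, 1]   # conv4_3, conv7, conv8_2, conv9_2, conv10_2, conv11_2
boxes_per_loc = [4, 6, 6, 6, 4, 4]  # aspect-ratio boxes plus the extra ratio-1 box
n_priors = sum(d * d * b for d, b in zip(fmap_dims, boxes_per_loc))
print(n_priors)  # 8732
```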
1.2 How Priors are represented
In the paper a Prior is written as (cx, cy, w, h): the center-size form. For programming convenience we sometimes also use the boundary form (xmin, ymin, xmax, ymax), so we need conversions between the two.
def xy_to_cxcy(xy: torch.Tensor) -> torch.Tensor:
    """
    Convert bounding boxes from boundary form (xmin, ymin, xmax, ymax) to center-size form (cx, cy, w, h)
    :param xy: boxes in (xmin, ymin, xmax, ymax) form, a tensor of size (n_boxes, 4)
    :return: boxes in (cx, cy, w, h) form, a tensor of size (n_boxes, 4)
    """
    return torch.cat([(xy[:, 2:] + xy[:, :2]) / 2, xy[:, 2:] - xy[:, :2]], dim=1)
def cxcy_to_xy(cxcy: torch.Tensor) -> torch.Tensor:
    """
    Convert bounding boxes from center-size form (cx, cy, w, h) to boundary form (xmin, ymin, xmax, ymax)
    :param cxcy: boxes in (cx, cy, w, h) form, a tensor of size (n_boxes, 4)
    :return: boxes in (xmin, ymin, xmax, ymax) form, a tensor of size (n_boxes, 4)
    """
    return torch.cat([cxcy[:, :2] - (cxcy[:, 2:] / 2), cxcy[:, :2] + (cxcy[:, 2:] / 2)], dim=1)
Note: as argued in the paper-reading post, for several reasons Priors should be expressed in relative lengths (relative, i.e. normalized, coordinates).
1.3 Prior to ground truth
Obviously the priors are not real ground truth: they deviate from the true boundaries, carry no class label, and each prior's ground truth is uncertain until we quantify it. We need to turn each prior's information into ground-truth targets to compute the loss (and, conversely, we must understand what exactly we are predicting and how to convert predictions back into real bounding boxes).
1.3.1 offset
The offsets are encoded as follows (see the paper-reading post):

g_cx = (cx - cx_p) / w_p,    g_cy = (cy - cy_p) / h_p,
g_w = log(w / w_p),          g_h = log(h / h_p)        (1)

where (cx, cy, w, h) is the ground-truth box in center-size form and (cx_p, cy_p, w_p, h_p) is the prior's position.
In practice the encoded values are further normalized by empirical "variance" parameters:

g_cx = (cx - cx_p) / (σ_cx * w_p),    g_cy = (cy - cy_p) / (σ_cy * h_p),
g_w = log(w / w_p) / σ_w,             g_h = log(h / h_p) / σ_h        (2)

where the empirical parameters are σ_cx = σ_cy = 0.1 and σ_w = σ_h = 0.2 (hence the factors of 10 and 5 in the code below).
def cxcy_to_gcxgcy(cxcy: torch.Tensor, priors_cxcy: torch.Tensor) -> torch.Tensor:
    """
    Encode boxes (in center-size form) as offsets w.r.t. their priors, following Eq. (2)
    The target boxes and the priors correspond one-to-one
    :param cxcy: boxes in center-size form, a tensor of size (n_priors, 4)
    :param priors_cxcy: prior boxes, a tensor of size (n_priors, 4)
    :return: encoded bounding boxes, a tensor of size (n_priors, 4)
    """
    return torch.cat([(cxcy[:, :2] - priors_cxcy[:, :2]) / (priors_cxcy[:, 2:]) * 10,
                      torch.log(cxcy[:, 2:] / priors_cxcy[:, 2:]) * 5], dim=1)
To recover the actual predicted bounding boxes, we decode the above (note: what the predictor actually predicts are these encoded offsets).
def gcxgcy_to_cxcy(gcxgcy: torch.Tensor, priors_cxcy: torch.Tensor) -> torch.Tensor:
    """
    Decode predicted offsets together with their priors (one-to-one) into center-size bounding boxes
    :param gcxgcy: encoded boxes (i.e. offsets), such as the model's output, a tensor of size (n_priors, 4)
    :param priors_cxcy: prior boxes, a tensor of size (n_priors, 4)
    :return: decoded bounding boxes in center-size form, a tensor of size (n_priors, 4)
    """
    return torch.cat([gcxgcy[:, :2] / 10 * priors_cxcy[:, 2:] + priors_cxcy[:, :2],
                      torch.exp(gcxgcy[:, 2:] / 5) * priors_cxcy[:, 2:]], dim=1)
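To convince yourself that the encoder and decoder invert each other, here is a scalar round-trip of the same formulas on plain floats (a sketch with made-up box values, not the tensor versions above):

```python
import math

def encode(box, prior):
    # box and prior in (cx, cy, w, h) form; variances 0.1 / 0.2 give the factors 10 and 5
    cx, cy, w, h = box
    pcx, pcy, pw, ph = prior
    return [(cx - pcx) / pw * 10, (cy - pcy) / ph * 10,
            math.log(w / pw) * 5, math.log(h / ph) * 5]

def decode(g, prior):
    gcx, gcy, gw, gh = g
    pcx, pcy, pw, ph = prior
    return [gcx / 10 * pw + pcx, gcy / 10 * ph + pcy,
            math.exp(gw / 5) * pw, math.exp(gh / 5) * ph]

box = [0.52, 0.48, 0.20, 0.30]    # made-up ground-truth box
prior = [0.50, 0.50, 0.25, 0.25]  # made-up prior
assert all(abs(a - b) < 1e-9 for a, b in zip(box, decode(encode(box, prior), prior)))
```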
For the ground-truth offsets here, cxcy simply needs to hold the ground-truth boxes; but cxcy must correspond one-to-one with the priors, and establishing that correspondence is exactly what we discuss next.
1.3.2 object class
0 is the background class; 1 to n_classes are the object classes. The number and classes of objects differ from image to image, so we first assign an object to each prior, and that object's class then determines the prior's class.
1.3.3 criterion
To assign classes to priors we need a metric for how well a prior matches a true bounding box.
The paper uses the Jaccard overlap (intersection-over-union, IoU).
The functions below compute IoU; note that the inputs are boxes in boundary form.
def find_intersection(set_1, set_2):
    """
    Find the intersection of every box combination between two sets of boxes that are in boundary coordinates.
    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: intersection of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """
    # PyTorch auto-broadcasts singleton dimensions
    lower_bound = torch.max(set_1[:, :2].unsqueeze(1), set_2[:, :2].unsqueeze(0))  # (n1, n2, 2)
    upper_bound = torch.min(set_1[:, 2:].unsqueeze(1), set_2[:, 2:].unsqueeze(0))  # (n1, n2, 2)
    intersection_dims = torch.clamp(upper_bound - lower_bound, 0)  # (n1, n2, 2)
    return intersection_dims[:, :, 0] * intersection_dims[:, :, 1]  # (n1, n2)
def find_jaccard_overlap(set_1, set_2):
    """
    Find the Jaccard Overlap (IoU) of every box combination between two sets of boxes that are in boundary coordinates.
    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: Jaccard Overlap of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """
    # Find intersections
    intersection = find_intersection(set_1, set_2)
    # Find areas of each box in both sets
    areas_set_1 = (set_1[:, 2] - set_1[:, 0]) * (set_1[:, 3] - set_1[:, 1])  # (n1)
    areas_set_2 = (set_2[:, 2] - set_2[:, 0]) * (set_2[:, 3] - set_2[:, 1])  # (n2)
    # Find the union
    # PyTorch auto-broadcasts singleton dimensions
    union = areas_set_1.unsqueeze(1) + areas_set_2.unsqueeze(0) - intersection  # (n1, n2)
    return intersection / union  # (n1, n2)
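A hand-checkable scalar version of the same computation (a sketch; the tensor functions above broadcast exactly this over all pairs):

```python
def iou(a, b):
    # a, b in boundary form (xmin, ymin, xmax, ymax)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7
print(iou([0, 0, 2, 2], [1, 1, 3, 3]))  # 0.14285714285714285
```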
If set_1 holds the priors (8732, 4) and set_2 the true boxes (n_objects_per_image, 4), we end up with a (8732, n_objects_per_image) tensor: the IoU of every prior with every object box in that image.
1.3.4 priors to ground truth
def label_prior(priors_cxcy, boxes, classes):
    """
    Assign a ground-truth label to each prior. Note that we do this for each image in a batch.
    Priors are fixed beforehand; boxes and classes come from the dataloader.
    :param priors_cxcy: the priors we created, of shape (8732, 4); note they are in center-size form and percent coordinates
    :param boxes: tensor of the true objects' bounding boxes in the image (boundary form); note they are percent coordinates
    :param classes: tensor of the true objects' class labels in the image
    :return: per-prior class labels (8732) and encoded offsets (8732, 4)
    """
    n_objects = boxes.size(0)
    # priors in boundary form, for computing overlaps
    priors_xy = cxcy_to_xy(priors_cxcy)
    overlaps = find_jaccard_overlap(boxes, priors_xy)
    # For each prior, find the object with the largest overlap and assign it (note: the object, not yet the class)
    overlap_per_prior, object_per_prior = overlaps.max(dim=0)  # (8732)
    # Assigning classes purely by largest overlap raises two problems:
    # 1. If an object is not the best match of any prior, the object is assigned to no prior at all
    # 2. Priors whose overlap falls below the threshold (0.5) must be assigned to the background class (class 0)
    # Fix the first problem:
    _, prior_per_object = overlaps.max(dim=1)  # (n_objects), each value an index in [0, 8731]
    object_per_prior[prior_per_object] = torch.LongTensor(range(n_objects)).to(device)  # force-assign each object to its best-overlapping prior
    overlap_per_prior[prior_per_object] = 1.  # so these priors can never fall below the threshold
    # Fix the second problem:
    class_per_prior = classes[object_per_prior]  # look up each assigned object's true class label
    class_per_prior[overlap_per_prior < 0.5] = 0  # (8732)
    # Encode, for each prior, the offset to its assigned object's box (both in center-size form)
    offset_per_prior = cxcy_to_gcxgcy(xy_to_cxcy(boxes[object_per_prior]), priors_cxcy)  # (8732, 4)
    return class_per_prior, offset_per_prior
Notice that each prior is now paired with one ground truth; together they detect objects of different scales at different positions.
label_prior() handles one image of a batch together with its object boxes and class labels (annotated in the xml files, from the dataloader); a simple for loop over the batch then yields the priors-to-ground-truth assignment for every image, used in the loss computation (see 5.1).
2. Network architecture
SSD truncates VGG-16 just before the FC layers to use as the base net, tweaks some of the base net's details and appends Conv6 and Conv7, then adds extra convolutional layers after the base net.
(Note: for code readability, the SSD network is split into BaseNet and AuxiliaryConvolutions.)
Because of its fully-connected layers, the full VGG-16 model requires inputs of size (3, 224, 224); the authors modify the network to accept 300x300 inputs (the SSD300 model).
2.0 Conv4_3:
Forward-propagating through vgg-16, the 300 x 300 input would be downsampled to 37 x 37 by Conv4_3, yet the size stated here is 38 x 38. In vgg-16 only the pooling layers downsample, so the change comes from modifying maxpool3: its output-size computation switches from rounding down (floor) to rounding up (ceiling).
self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
2.1 Maxpool5
Instead of the layer used in the original vgg-16, we use a maxpool with size=(3, 3), stride=1, padding=1:
self.pool5 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
2.2 Conv6 and Conv7: I hope I can explain this clearly enough
fc6-fc7: the feature tensor is (512, 7, 7), flattened; fc6 maps it to 4096 features, and fc7 maps those 4096 to another 4096. The authors want to build the kernels of Conv6 and Conv7 directly from the weights of fc6 and fc7.
2.2.1 First, let us sort out how convolutional and fully-connected layers convert into each other
- Convolutional layer -> fully-connected layer:
Conv to FC
From the figure it is easy to see that the converted fc layer's weights form a sparse matrix drawn from the kernel weights. Also, each pixel of an output channel of the feature map is the sum, over all in_channels, of the convolutions at the same position of the input space (i.e. the shaded red box is the sum of the multi-layer results of convolving the several shaded blue layers, assuming there are several, with several kernels). So out_channels controls the number of feature maps, while in_channels and out_channels control the height and width of the fc weight matrix.
- Fully-connected layer -> convolutional layer: consider an input of (512, 7, 7).flatten() -> 4096 units; the fc weight is then (512*7*7, 4096).
Suppose the kernel is the same size as the image, i.e. (4096, 512, 7, 7). Following the convolution arithmetic, the result (within one output channel) is each pixel of each input channel multiplied by the corresponding kernel weight and summed, exactly the fully-connected computation; the channel dimension now plays the role of the old feature dimension.
So conv6's kernel should be (4096, 512, 7, 7) and conv7's kernel (4096, 4096, 1, 1).
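The equivalence above can be verified on a toy case: a fully-connected layer over a flattened input produces exactly the same outputs as a convolution whose kernel covers the whole input, with the weights merely reshaped between the two views. A pure-Python sketch (made-up 2x2 input, 3 output units):

```python
# Toy check: fc over a flattened (1, 2, 2) input == conv with a full-size (2, 2) kernel
import random
random.seed(0)
x = [[random.random() for _ in range(2)] for _ in range(2)]  # (2, 2) input "image"
w = [[random.random() for _ in range(4)] for _ in range(3)]  # fc weight (3, 4) over the flattened input

flat = [x[i][j] for i in range(2) for j in range(2)]
fc_out = [sum(wi * xi for wi, xi in zip(row, flat)) for row in w]

# conv view: 3 kernels of shape (2, 2), applied at the single valid position
kernels = [[[row[2 * i + j] for j in range(2)] for i in range(2)] for row in w]
conv_out = [sum(k[i][j] * x[i][j] for i in range(2) for j in range(2)) for k in kernels]

assert all(abs(a - b) < 1e-12 for a, b in zip(fc_out, conv_out))
```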
However, that is still not good enough: these filters are numerous, huge, and computationally expensive, so the authors downsample the kernels.
2.2.2 Kernel decimation
The process is actually very simple: subsample the kernel parameters along the out_channels, height, and width dimensions.
from collections.abc import Iterable  # importing from collections directly is deprecated

def decimate(tensor: torch.Tensor, m: Iterable) -> torch.Tensor:
    """
    Downsample some dimensions of a tensor; m lists the sampling step per dimension
    :param tensor: the tensor to be downsampled
    :param m: list of sampling steps, one per dimension; None means that dimension is not downsampled
    :return: the downsampled tensor
    """
    assert tensor.dim() == len(m)
    for d in range(tensor.dim()):
        if m[d] is not None:
            tensor = tensor.index_select(dim=d, index=torch.arange(start=0, end=tensor.size(d), step=m[d]))
    return tensor
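The index arithmetic behind decimate can be previewed without torch: torch.arange(0, size, step) keeps every step-th index, so the sampling rates below shrink the reshaped fc6 weight (4096, 512, 7, 7) down to (1024, 512, 3, 3):

```python
# indices kept by decimation with step m along one dimension, mirroring torch.arange(0, size, m)
def kept(size, m):
    return list(range(0, size, m))

assert len(kept(4096, 4)) == 1024  # out_channels: keep every 4th filter
assert kept(7, 3) == [0, 3, 6]     # kernel height/width: 7 taps -> 3 taps
```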
The authors set the sampling step along both the height and width dims to 3 (keep every third tap) and along out_channels to 4, decimating the original kernels.
At last we obtain the Conv6 and Conv7 kernels: (1024, 512, 3, 3) and (1024, 1024, 1, 1) respectively.
2.2.3 Atrous convolution
Atrous convolution (dilated convolution, also known as convolution with holes) exploits the fact that adjacent pixels usually carry highly redundant information. To enlarge the receptive field without downsampling via pooling, we insert holes into the convolution's input window (pooling implies a loss of image information, whereas atrous convolution loses none: a given feature-map pixel simply does not draw on the input's neighboring pixels, but those "skipped" neighbors are still convolved with the kernel when other feature-map pixels are computed). Enough words; the picture makes it clearer.
This image comes from vdumoulin/conv_arithmetic (you have probably seen this series of figures before; the shaded area is where the convolution is applied)
Notice that every input pixel is indeed used (nothing is discarded as in pooling) while the receptive field still grows.
2.2.4 The atrous algorithm and kernel decimation
In the paper, conv6's output stays 19x19 and uses atrous convolution.
After the decimation described above, the feature map should in principle still be convolved with the 7x7 kernel, but decimation has left holes in the kernel, so the appropriate thing is to skip pixels with a stride of 3 while convolving (dilation=3). The authors' repository, however, uses dilation=6, probably because the modified maxpool5 no longer halves the output size, so the dilation is doubled accordingly.
self.conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6) # atrous convolution
self.conv7 = nn.Conv2d(1024, 1024, kernel_size=1)
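The choice of padding=6 with dilation=6 can be checked against the standard convolution output-size formula; with a 19x19 input it preserves the spatial size, as would the hole-matching alternative of dilation=3 with padding=3 (a sketch of the arithmetic):

```python
def conv_out(size, k, s=1, p=0, d=1):
    # standard convolution output-size formula
    return (size + 2 * p - d * (k - 1) - 1) // s + 1

assert conv_out(19, k=3, p=6, d=6) == 19  # the repo's conv6 settings
assert conv_out(19, k=3, p=3, d=3) == 19  # dilation matching the decimated kernel's holes
```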
Next, update base_net with the original fully-connected layers' weights and biases:
# this part can be defined inside class BaseNet as an init helper
# get state_dict which only contains params
state_dict = base_net.state_dict() # base net is instance of BaseNet
pretrained_state_dict = torchvision.models.vgg16(pretrained=True).state_dict()
# fc6
conv_fc6_weight = pretrained_state_dict['classifier.0.weight'].view(4096, 512, 7, 7)  # (4096, 512, 7, 7)
conv_fc6_bias = pretrained_state_dict['classifier.0.bias']  # (4096)
state_dict['conv6.weight'] = decimate(conv_fc6_weight, m=[4, None, 3, 3])  # (1024, 512, 3, 3)
state_dict['conv6.bias'] = decimate(conv_fc6_bias, m=[4])  # (1024)
# fc7: in the pretrained model, fc7 is simply named classifier.3
conv_fc7_weight = pretrained_state_dict['classifier.3.weight'].view(4096, 4096, 1, 1) # (4096, 4096, 1, 1)
conv_fc7_bias = pretrained_state_dict['classifier.3.bias'] # (4096)
state_dict['conv7.weight'] = decimate(conv_fc7_weight, m=[4, 4, None, None]) # (1024, 1024, 1, 1)
state_dict['conv7.bias'] = decimate(conv_fc7_bias, m=[4]) # (1024)
base_net.load_state_dict(state_dict)
...this headache-inducing part is finally over
2.3 The remaining auxiliary convolutions:
These are all layers the authors add to extract larger-scale features, and they are fairly easy to follow; the 1x1 convolutions have their uses (something like further condensing features from the feature maps).
class AuxiliaryConvolutions(nn.Module):
    """
    Additional convolutions to produce higher-level feature maps.
    """
    def __init__(self):
        super(AuxiliaryConvolutions, self).__init__()
        # Auxiliary convolutions on top of the VGG base
        self.conv8_1 = nn.Conv2d(1024, 256, kernel_size=1, padding=0)
        self.conv8_2 = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1)
        self.conv9_1 = nn.Conv2d(512, 128, kernel_size=1, padding=0)
        self.conv9_2 = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)
        self.conv10_1 = nn.Conv2d(256, 128, kernel_size=1, padding=0)
        self.conv10_2 = nn.Conv2d(128, 256, kernel_size=3, padding=0)
        self.conv11_1 = nn.Conv2d(256, 128, kernel_size=1, padding=0)
        self.conv11_2 = nn.Conv2d(128, 256, kernel_size=3, padding=0)
        # Initialize convolutions' parameters
        for c in self.children():
            if isinstance(c, nn.Conv2d):
                nn.init.xavier_normal_(c.weight)
                nn.init.constant_(c.bias, 0.)
2.4 multi-level feature maps:
As the figure shows, the feature maps chosen for multi-scale feature extraction are conv4_3, conv7, conv8_2, conv9_2, conv10_2, and conv11_2 (both low-level and high-level maps); we simply return them from forward:
BaseNet: forward returns conv4_3_features, conv7_features
AuxiliaryConvolutions: forward returns conv8_2_features, conv9_2_features, conv10_2_features, conv11_2_features
2.5 predictor
Each level's feature map goes into its own predictor, which separately predicts offsets and classes; the predictors share a similar structure: kernel_size=3, padding=1
Note that the predicted offsets are the encoded values relative to that layer's priors (see 1.3), and the class predictor must produce a score for every class.
def loc_predictor(in_channels, num_priors):
    """
    Box-prediction layer: predicts 4 offsets for the priors at each pixel of the input space
    :param in_channels: number of input channels
    :param num_priors: num_priors priors are generated around each unit
    :return: a convolutional layer that predicts offsets
    """
    return nn.Conv2d(in_channels, num_priors * 4, kernel_size=3, padding=1)

def cls_predictor(in_channels, num_priors, num_classes):
    """
    Class-prediction layer: predicts a score for every class, for the priors at each pixel of the input space
    It uses a convolution that preserves the input's height and width, so the output's spatial
    coordinates correspond one-to-one with the input's on the feature map
    :param in_channels: number of input channels
    :param num_priors: num_priors priors are generated around each unit
    :param num_classes: number of object classes
    :return: a convolutional layer that predicts class scores
    """
    return nn.Conv2d(in_channels, num_priors * num_classes, kernel_size=3, padding=1)
Priors are generated at every pixel of a feature map, and the predictor's output keeps the input's w and h, so every output pixel corresponds to an input pixel; naturally each offset prediction is the encoded offset of the corresponding prior, with out_channels now serving as the feature dimension. Because w, h and num_priors differ between feature maps, we flatten the spatial dimensions before concatenating all the outputs. Class prediction follows the same idea as offset prediction; only the final feature dimension (number of output channels) differs.
- For training, the flattened prediction elements must match the priors in number exactly (a one-to-one correspondence).
Finally, concatenate the predictions from all feature maps.
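A pure-Python sketch of why the permute comes before the view: iterating channels last keeps the 4 offsets of each prior at each location contiguous, which is what view(N, -1, 4) then groups into rows (toy map with 2 priors x 4 offsets = 8 channels, entries tagged with their (channel, y, x) origin):

```python
# emulate a (C=8, H=2, W=2) prediction map; permute(0, 2, 3, 1) means iterating y, x, then c
C, H, W = 8, 2, 2
pred = [[[(c, y, x) for x in range(W)] for y in range(H)] for c in range(C)]  # pred[c][y][x]

channels_last = [pred[c][y][x] for y in range(H) for x in range(W) for c in range(C)]
rows = [channels_last[i:i + 4] for i in range(0, len(channels_last), 4)]

# first row = offsets from channels 0..3, i.e. prior 0 at location (0, 0)
assert rows[0] == [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
```

Had we viewed without permuting, each row of 4 would instead mix values from different spatial locations of one channel, breaking the prior-to-row correspondence.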
class PredictionConvolution(nn.Module):
    """
    Convolutions to predict class scores and bounding boxes
    """
    def __init__(self, n_classes):
        """
        :param n_classes: number of different types of objects
        """
        super(PredictionConvolution, self).__init__()
        self.n_classes = n_classes
        # Number of priors, as shown before, at each position of each feature map
        n_boxes = {'conv4_3': 4,
                   'conv7': 6,
                   'conv8_2': 6,
                   'conv9_2': 6,
                   'conv10_2': 4,
                   'conv11_2': 4}
        self.convs = ['conv4_3', 'conv7', 'conv8_2', 'conv9_2', 'conv10_2', 'conv11_2']
        for name, ic in zip(self.convs, [512, 1024, 512, 256, 256, 256]):
            setattr(self, 'cls_%s' % name, cls_predictor(ic, n_boxes[name], n_classes))
            setattr(self, 'loc_%s' % name, loc_predictor(ic, n_boxes[name]))
        # Initialize convolutions' parameters
        for c in self.children():
            if isinstance(c, nn.Conv2d):
                nn.init.xavier_normal_(c.weight)
                nn.init.constant_(c.bias, 0.)

    def _predict(self, x: torch.Tensor, conv: nn.Conv2d, num_features: int):
        # renamed from _apply: nn.Module already defines _apply, and overriding it breaks .to()/.cuda()
        """
        Run one prediction convolution on a feature map and flatten its output
        :param x: input tensor
        :param conv: the prediction convolution
        :param num_features: output features; 4 for loc prediction, n_classes for class prediction
        :return: locations or class scores, a tensor of shape (N, n_priors_of_this_map, num_features)
        """
        x = conv(x).permute(0, 2, 3, 1).contiguous()
        return x.view(x.size(0), -1, num_features)

    def forward(self, *args):
        # args are the feature maps used for prediction
        assert len(args) == len(self.convs)
        locs = []
        classes_scores = []
        for name, x in zip(self.convs, args):
            classes_scores.append(self._predict(x, getattr(self, 'cls_%s' % name), self.n_classes))
            locs.append(self._predict(x, getattr(self, 'loc_%s' % name), 4))
        locs = torch.cat(locs, dim=1)  # (N, 8732, 4)
        classes_scores = torch.cat(classes_scores, dim=1)  # (N, 8732, n_classes)
        return locs, classes_scores
2.6 SSD300
Combine BaseNet, AuxiliaryConvolutions and PredictionConvolution to obtain the SSD300 model.
3. Training-data processing
Data augmentation involves transforming not only the image but also the true bounding boxes, so we cannot directly use the classes packaged in torchvision.transform; we have to write it by hand.
For the 0.5 augmentation probability mentioned in the paper, just test whether random.random() is less than 0.5.
3.1 Random crop
The random crop is the core of the paper's data augmentation.
def random_crop(image: torch.Tensor, boxes: torch.Tensor, labels: torch.Tensor):
    """
    Random crop: helps the network learn larger-scale objects, but some objects may be cut out entirely
    :param image: image, a tensor of dimensions (3, original_h, original_w)
    :param boxes: true bounding boxes in boundary form, a tensor of dimensions (n_objects, 4)
    :param labels: true object classes, a tensor of dimensions (n_objects)
    :return: randomly cropped image, updated boxes, updated labels
    """
    original_width = image.size(2)
    original_height = image.size(1)
    while True:
        # 'None' means no cropping; 0 means a random patch; [.1, .3, .5, .7, .9] are the minimum overlaps described in the paper
        min_overlap = random.choice([0., .1, .3, .5, .7, .9, None])
        if min_overlap is None:
            return image, boxes, labels
        # Try up to 50 times for the chosen minimum overlap (not in the paper, but used in the authors' repo);
        # if no trial qualifies, loop again and pick a new minimum overlap
        for _ in range(50):
            min_scale = 0.3
            # the paper samples the patch scale from [.1, 1], but the authors' repo uses [.3, 1]
            # random.uniform(a, b) -> the closed interval [a, b]
            new_width = int(original_width * random.uniform(min_scale, 1))
            new_height = int(original_height * random.uniform(min_scale, 1))
            # the paper requires the sampled aspect ratio to lie in [0.5, 2]
            if not .5 <= new_height / new_width <= 2:
                continue
            # pick the crop position
            # random.randint(a, b) -> the closed interval [a, b]
            left = random.randint(0, original_width - new_width)
            top = random.randint(0, original_height - new_height)
            right = left + new_width
            bottom = top + new_height
            crop_bounding = torch.FloatTensor([left, top, right, bottom])
            # IoU between the crop and the true bounding boxes
            over_lap = find_jaccard_overlap(crop_bounding.unsqueeze(0), boxes).squeeze(0)  # (n_objects)
            # the paper requires the overlap with the objects to exceed min_overlap
            if over_lap.max().item() < min_overlap:
                continue
            cropped_image = image[:, top:bottom, left:right]
            # criterion for an object remaining in the image: is its true box's center inside the crop?
            box_centers = (boxes[:, :2] + boxes[:, 2:]) / 2.  # (n_objects, 2)
            center_in_cropped_image = (box_centers[:, 0] > left) * (box_centers[:, 0] < right) * \
                                      (box_centers[:, 1] > top) * (box_centers[:, 1] < bottom)  # (n_objects)
            # if not a single object's center lies inside the crop, try again
            if not center_in_cropped_image.any():
                continue
            # discard objects that fail the criterion
            new_boxes = boxes[center_in_cropped_image]
            new_labels = labels[center_in_cropped_image]
            # compute the box positions inside the cropped image
            # take the larger of the true left/top edge and the crop's left/top edge
            new_boxes[:, :2] = torch.max(new_boxes[:, :2], crop_bounding[:2])
            new_boxes[:, :2] -= crop_bounding[:2]
            # take the smaller of the true right/bottom edge and the crop's right/bottom edge
            new_boxes[:, 2:] = torch.min(new_boxes[:, 2:], crop_bounding[2:])
            new_boxes[:, 2:] -= crop_bounding[:2]
            return cropped_image, new_boxes, new_labels
3.2 Horizontal flip
This one is easy; it is just that, unlike the image itself, the true bounding boxes need extra handling.
def flip(image, boxes):
    """
    Flip the image horizontally.
    :param image: a PIL Image (required, since we call a torchvision functional)
    :param boxes: true bounding boxes in boundary form, a tensor of dimensions (n_objects, 4)
    :return: flipped image, updated boxes
    """
    # Flip image
    new_image = torchvision.transforms.functional.hflip(image)
    # Flip boxes (clone, so the caller's boxes are not modified in place)
    new_boxes = boxes.clone()
    new_boxes[:, 0] = image.width - (boxes[:, 0] + 1)
    new_boxes[:, 2] = image.width - (boxes[:, 2] + 1)
    new_boxes = new_boxes[:, [2, 1, 0, 3]]
    return new_image, new_boxes
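A scalar version of the box flip (pixel coordinates, image width assumed 300): mirroring maps each x to width - x - 1, and the two x-columns must then swap so that xmin < xmax again (a sketch of the tensor ops above):

```python
def flip_box(box, width):
    # box in boundary form (xmin, ymin, xmax, ymax), integer pixel coordinates
    xmin, ymin, xmax, ymax = box
    return [width - xmax - 1, ymin, width - xmin - 1, ymax]

print(flip_box([50, 100, 150, 200], 300))  # [149, 100, 249, 200]
```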
3.3 Resize
The SSD300 model requires resizing the training images to 300 x 300; we also take this opportunity to convert the true bounding boxes to fractional (percent) form.
def resize(image, boxes, size=(300, 300), return_percent_coords=True):
    """
    Resize image. For the SSD300, resize to (300, 300).
    Since percent/fractional coordinates are calculated for the bounding boxes (w.r.t image dimensions) in this process,
    you may choose to retain them.
    :param image: image, a PIL Image
    :param boxes: bounding boxes in boundary coordinates, a tensor of dimensions (n_objects, 4)
    :param size: resize to specific size
    :param return_percent_coords: whether to return new bounding box coordinates in form of percent coordinates
    :return: resized image, updated bounding box coordinates (or fractional coordinates, in which case they remain the same)
    """
    # Resize image
    new_image = transforms.functional.resize(image, size)
    # Resize bounding boxes
    old_size = torch.FloatTensor([image.width, image.height, image.width, image.height]).unsqueeze(0)
    # percent coordinates are invariant to resizing, which only stretches or shrinks the image
    new_boxes = boxes / old_size
    if not return_percent_coords:
        new_size = torch.FloatTensor([size[0], size[1], size[0], size[1]]).unsqueeze(0)
        new_boxes = new_boxes * new_size
    return new_image, new_boxes
3.5 Expand
Since the model detects smaller objects poorly, we "zoom out" the training data to strengthen small-object detection.
The procedure is very similar to resize, except that we enlarge the canvas, place the original image inside it, and fill the remaining blank area.
The recommended fill value is the per-channel mean (see 3.6).
Since the new canvas is larger than the original image, the true boxes only need to be shifted by [horizontal offset, vertical offset, horizontal offset, vertical offset] of the image's placement.
3.6 Normalization
The input is first scaled to [0, 1]; the pretrained model then also expects the scaled input to be standardized. This page shows the exact processing the torchvision.model pretrained models expect:
mean = [0.485, 0.456, 0.406] # RGB channels
std = [0.229, 0.224, 0.225] # RGB channels
4. Dataset and DataLoader
For the Dataset, subclass torch.utils.data.Dataset yourself and apply the processing from Section 3 to the images, true bounding boxes, and object labels.
The Dataset returns the image, true bounding boxes, and object labels.
However, a problem appears when the DataLoader assembles batches:
the number of objects differs between images, so boxes and labels have different lengths per image and cannot be stacked into batches.
We therefore pass a function to the DataLoader's collate_fn= parameter (just the function name), which describes how to collate the outputs.
def collate_fn(batch):
    """
    Describes how to combine tensors of different sizes. We use lists.
    :param batch: an iterable of N sets from __getitem__()
    :return: a tensor of images and lists of varying-size tensors of bounding boxes and labels
    """
    images = list()
    boxes = list()
    labels = list()
    for b in batch:
        images.append(b[0])
        boxes.append(b[1])
        labels.append(b[2])
    images = torch.stack(images, dim=0)
    return images, boxes, labels  # tensor (N, 3, 300, 300), 2 lists of N tensors each
5. Training
5.1 Loss Function
location_loss = torch.nn.L1Loss()
confidence_loss = nn.CrossEntropyLoss(reduction='none')
5.2 Hard negative mining
The negatives (background class) in the training data far outnumber the positives, leaving the classes severely unbalanced, so we apply hard negative mining: select the negatives with the largest loss so that the positive:negative ratio becomes 1:3.
def calculate_loss(priors_cxcy, pred_locs, pred_scores, boxes, labels, loc_loss, conf_loss, alpha=1):
    """
    Compute the loss, with hard negative mining
    :param priors_cxcy: priors in center-size form
    :param pred_locs: predicted offsets, one batch of predictions
    :param pred_scores: predicted class scores, one batch of predictions
    :param boxes: true bounding boxes, from one batch of the dataloader
    :param labels: true class labels, from one batch of the dataloader
    :param loc_loss: nn.L1Loss()
    :param conf_loss: nn.CrossEntropyLoss(reduction='none')
    :param alpha: weight of the localization loss in the paper, default 1
    :return: total loss
    """
    n_priors = priors_cxcy.size(0)
    batch_size = pred_locs.size(0)
    n_classes = pred_scores.size(2)
    assert n_priors == pred_scores.size(1) == pred_locs.size(1)
    true_locs = torch.zeros((batch_size, n_priors, 4), dtype=torch.float).to(device)  # (N, 8732, 4)
    true_classes = torch.zeros((batch_size, n_priors), dtype=torch.long).to(device)  # (N, 8732)
    # assign ground truth to every prior, image by image
    for i in range(batch_size):
        cls, loc = label_prior(priors_cxcy, boxes[i], labels[i])
        true_locs[i] = loc
        true_classes[i] = cls
    positive_priors = (true_classes != 0)  # (N, 8732)
    # localization loss: computed over positive (non-background) priors only
    loss_of_loc = loc_loss(pred_locs[positive_priors], true_locs[positive_priors])
    # confidence loss
    # select negatives at negative:positive = 3:1, as in the paper
    n_hard_negative = 3 * positive_priors.sum(dim=1)  # (N)
    # first compute the confidence loss over all positives and negatives at once,
    # which saves us tracking positions across different images
    # CrossEntropyLoss(reduction='none') lays the losses out along dim 0 instead of summing or averaging
    loss_of_conf_all = conf_loss(pred_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 8732)
    loss_of_conf_all = loss_of_conf_all.view(batch_size, n_priors)  # (N, 8732)
    # we already know the loss of all positives
    loss_of_conf_pos = loss_of_conf_all[positive_priors]  # (sum(n_positives))
    loss_of_conf_neg = loss_of_conf_all.clone()  # (N, 8732)
    loss_of_conf_neg[positive_priors] = 0  # (N, 8732), so positives can never rank among the top n_hard_negatives
    loss_of_conf_neg, _ = loss_of_conf_neg.sort(dim=1, descending=True)  # sort the negative losses in descending order
    neg_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(loss_of_conf_neg).to(device)  # (N, 8732), rank of each entry in its row
    hard_negatives = (neg_ranks < n_hard_negative.unsqueeze(1))  # (N, 8732)
    loss_of_conf_hard_neg = loss_of_conf_neg[hard_negatives]  # (sum(n_hard_negatives))
    # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
    loss_of_conf = (loss_of_conf_pos.sum() + loss_of_conf_hard_neg.sum()) / positive_priors.sum().float()  # (), scalar
    # TOTAL LOSS
    return loss_of_conf + alpha * loss_of_loc
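The selection logic boils down to: keep all positives, zero them out of the negative pool, sort, and take the 3 * n_positives largest losses. A toy sketch on one image's per-prior confidence losses (made-up numbers):

```python
# toy hard-negative mining over 8 priors, 1 positive -> keep the 3 hardest negatives
conf_loss_all = [2.0, 0.1, 0.9, 0.05, 1.5, 0.3, 0.2, 0.7]
positive = [True, False, False, False, False, False, False, False]

n_hard = 3 * sum(positive)
neg_losses = sorted((l for l, p in zip(conf_loss_all, positive) if not p), reverse=True)
hard_negatives = neg_losses[:n_hard]
print(hard_negatives)  # [1.5, 0.9, 0.7]
```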
6. Object detection
6.1 Non-maximum suppression
At detection time we do not want to output too many predicted boxes (many of them heavily overlapping), so we apply non-maximum suppression: boxes deemed duplicates (pairwise IoU above a given threshold) are removed, keeping only the highest-confidence box.
def non_max_suppress(priors_cxcy, pred_locs, pred_scores, min_score, max_overlap, top_k):
    """
    Perform non-maximum suppression
    :param priors_cxcy: priors in center-size form
    :param pred_locs: predicted offsets, output of the predictor
    :param pred_scores: predicted scores, output of the predictor
    :param min_score: minimum score for a detection to be accepted
    :param max_overlap: maximum IoU above which a detection is suppressed
    :param top_k: keep at most top_k detections
    :return: suppressed boxes in boundary form, their classes, and their scores
    """
    batch_size = pred_locs.size(0)
    n_priors = priors_cxcy.size(0)
    n_classes = pred_scores.size(2)
    pred_scores = torch.softmax(pred_scores, dim=2)  # (batch_size, n_priors, n_classes)
    assert n_priors == pred_scores.size(1) == pred_locs.size(1)
    boxes_all_image = []
    scores_all_image = []
    labels_all_image = []
    for i in range(batch_size):
        # decode the predicted offsets into boundary-form boxes
        boxes = cxcy_to_xy(gcxgcy_to_cxcy(pred_locs[i], priors_cxcy))  # (n_priors, 4)
        boxes_per_image = []
        scores_per_image = []
        labels_per_image = []
        for c in range(1, n_classes):
            class_scores = pred_scores[i, :, c]  # (8732)
            score_above_min = class_scores > min_score
            n_score_above_min = score_above_min.sum().item()
            if n_score_above_min == 0:
                continue
            # keep only predictions with score > min_score
            class_scores = class_scores[score_above_min]
            class_boxes = boxes[score_above_min]
            # sort by detection confidence
            class_scores, sorted_ind = class_scores.sort(dim=0, descending=True)  # (n_score_above_min)
            class_boxes = class_boxes[sorted_ind]  # (n_score_above_min, 4)
            # non-maximum suppression by IoU
            overlap = find_jaccard_overlap(class_boxes, class_boxes)  # (n_score_above_min, n_score_above_min)
            # mask recording which boxes are suppressed; True means suppressed
            suppress = torch.zeros((n_score_above_min), dtype=torch.bool).to(device)
            for b_id in range(n_score_above_min):
                # skip boxes already marked as suppressed
                if suppress[b_id]:
                    continue
                # suppress boxes whose IoU with this box exceeds max_overlap, keeping earlier suppressions
                suppress = suppress | (overlap[b_id] > max_overlap)
                # never suppress the current box itself
                suppress[b_id] = False
            # store only the unsuppressed predictions for this class
            boxes_per_image.append(class_boxes[~suppress])
            scores_per_image.append(class_scores[~suppress])
            labels_per_image.append(torch.LongTensor([c] * (~suppress).sum().item()).to(device))
        # if no class was detected in this image, label the whole image as background
        if len(labels_per_image) == 0:
            boxes_per_image.append(torch.FloatTensor([[0, 0, 1, 1]]).to(device))
            labels_per_image.append(torch.LongTensor([0]).to(device))
            scores_per_image.append(torch.FloatTensor([0]).to(device))
        boxes_per_image = torch.cat(boxes_per_image, dim=0)  # (n_objects, 4)
        scores_per_image = torch.cat(scores_per_image, dim=0)  # (n_objects)
        labels_per_image = torch.cat(labels_per_image, dim=0)  # (n_objects)
        n_object = boxes_per_image.size(0)
        # keep only the top_k objects by confidence
        if n_object > top_k:
            scores_per_image, sorted_ind = scores_per_image.sort(dim=0, descending=True)
            scores_per_image = scores_per_image[:top_k]
            boxes_per_image = boxes_per_image[sorted_ind][:top_k]
            labels_per_image = labels_per_image[sorted_ind][:top_k]
        boxes_all_image.append(boxes_per_image)
        scores_all_image.append(scores_per_image)
        labels_all_image.append(labels_per_image)
    return boxes_all_image, labels_all_image, scores_all_image  # three lists of length batch_size
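The loop above is greedy NMS; stripped of the batching and per-class bookkeeping, it fits in a few lines (pure-Python sketch with made-up boxes):

```python
def iou(a, b):
    # boundary-form IoU, as before
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def nms(boxes, scores, max_overlap):
    # visit boxes by descending score; keep a box only if it overlaps no already-kept box too much
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= max_overlap for j in keep):
            keep.append(i)
    return keep

boxes = [[0, 0, 2, 2], [0.1, 0, 2.1, 2], [5, 5, 6, 6]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5))  # [0, 2] -- the near-duplicate of box 0 is suppressed
```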
Extras: a few points to note
- We concatenate the outputs of all feature maps into one tensor. conv4_3's feature maps sit low in the network, so their feature values are much larger than the higher layers' (downsampling shrinks the feature responses). We can therefore normalize those feature maps (e.g. L2 normalization) and then rescale their responses by a factor the network learns itself. I believe Batch Normalization would work here as well.
- Indexing a multi-dimensional tensor with dtype=torch.bool (or torch.uint8, though uint8 indexing is deprecated from 1.3.0 on) produces a flattened result. (Note: this holds when the bool tensor matches the original tensor's shape position-by-position; if it covers only leading dims, the remaining dims are kept (even if only one subarray remains), whereas slicing would squeeze a dimension with a single remaining subarray.) For example:
x = torch.rand((2, 3, 4))
y = x > 0.5  # y has shape (2, 3, 4); suppose half the entries are True
print(x[y].shape)  # a flat tensor, e.g. torch.Size([12])
- Some tricks to speed up training:
torch.backends.cudnn.benchmark = True
Set the dataloader's pin_memory=True to use page-locked memory (never swapped to virtual memory, which speeds up transfers); this requires sufficient memory. For details see: https://blog.csdn.net/tfcy694/article/details/83270701
- I did not use an eval function to measure the model's actual quality; mAP is a good choice. When saving the best model, consider keeping the parameters whenever the eval metric improves; the same metric can also drive early stopping of the epochs.
I am new at this, so please bear with me; writing everything by hand is not easy, and discussion is welcome.
Please credit the source when reposting.