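I. Introduction

I have recently been studying how deep learning is applied to object detection. Having read the YOLOv2 paper and worked through the source code of YAD2K (a YOLOv2 implementation), I am summarizing what I learned here to deepen my own understanding.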

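II. About Object Detection

Object detection can be roughly split into two tasks: classification and determining bounding boxes. Current deep-learning approaches to object detection fall into two families: two-stage detectors and one-stage detectors.

- A two-stage detector first generates a set of candidate boxes (region proposals) and then classifies them with a convolutional neural network.
- A one-stage detector generates no proposals. It treats box localization directly as a regression problem.

Because of this difference, the two families perform differently. Two-stage methods lead in detection accuracy and localization precision, while one-stage methods lead in speed.

YOLO (You Only Look Once) is a one-stage detector. Three versions have been released so far: YOLOv1, YOLOv2 and YOLOv3. This article focuses on YOLOv2.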

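III. Improvements in YOLOv2

In the paper, the authors group YOLOv2's improvements under three headings: Better, Faster and Stronger. This is not the main thing I want to share here, because many other bloggers have already covered it thoroughly. So I only sketch the authors' ideas briefly and mostly skip the formulas, which are awkward to typeset. The one-line summary under each item gives the gist. For more detail, other blog posts are worth a search.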

[Figure: Overview of the YOLOv2 improvements]

1. Better

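(1) Batch Normalization
Batch normalization after every convolutional layer
Batch normalization speeds up convergence and also acts as a regularizer, which reduces overfitting. YOLOv2 adds a batch normalization layer after every convolutional layer and no longer uses dropout. With batch normalization, YOLOv2's mAP improves by 2.4 points.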

[Figure: Batch Normalization]
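(2) High Resolution Classifier
Pretraining the classifier on higher-resolution images
Like most detection algorithms, YOLOv1 first pretrains the backbone on the ImageNet classification dataset at 224x224 to get good classification performance. It then raises the input resolution from 224x224 to 448x448 when training the detector.

Switching resolution abruptly, however, makes it hard for the detector to adapt to high-resolution input quickly. YOLOv2 therefore adds an intermediate step: it fine-tunes the classification network on ImageNet at 448x448 for 10 epochs. As a result, the network is already adapted to high-resolution input before it is fine-tuned on the detection dataset.

The high-resolution classifier raises YOLOv2's mAP by almost 4 points.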

[Figure: The three training stages of YOLOv2]
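(3) Convolutional With Anchor Boxes
Anchor boxes are used to predict bounding boxes; the final fully connected layers are removed, leaving only convolutional and pooling layers
In YOLOv1 the input image is divided into a 7x7 grid, and each grid cell predicts 2 bounding boxes. YOLOv1 predicts the boxes directly with fully connected layers, and box width and height are expressed relative to the whole image. Images contain objects of many different scales and aspect ratios, so it is hard for YOLOv1 to learn to fit all these shapes during training. This is why its localization is relatively poor.

YOLOv2 introduces anchor boxes, mainly to increase recall. YOLOv1 predicts only 98 boxes, whereas YOLOv2 predicts more than a thousand (845 in the setup discussed here, 13x13x5).

YOLOv2 also removes the fully connected layers, which preserves some spatial information, so the network consists only of convolutional and pooling layers. The input changes from 448x448 to 416x416 and is downsampled by a factor of 32. The output is 13x13x5x25: 5 anchors per cell, each with 4 box coordinates, 1 confidence and 20 class scores.

An odd grid size is used because large objects tend to occupy the center of the image. With an odd grid, a single grid cell sits at the center and predicts such an object, rather than four neighboring cells sharing it.

With anchor boxes, mAP drops slightly from 69.5 to 69.2 (0.3 points), while recall rises from 81% to 88% (7 points). Even though mAP drops, the higher recall means the model has more room for improvement.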
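The arithmetic sketch below makes these numbers concrete. It reproduces YOLOv1's 98 boxes and YOLOv2's 13x13x5x25 output for a 416x416 input, assuming 5 anchors and the 20 PASCAL VOC classes. The function names are hypothetical and are not part of YAD2K.

```python
def yolov1_box_count(grid=7, boxes_per_cell=2):
    # YOLOv1: a 7x7 grid with 2 boxes per cell
    return grid * grid * boxes_per_cell


def yolov2_output_shape(input_size=416, stride=32, num_anchors=5, num_classes=20):
    grid = input_size // stride        # 416 / 32 = 13 cells per side (odd, so there is a center cell)
    box_params = 4 + 1 + num_classes   # x, y, w, h + confidence + class scores
    return (grid, grid, num_anchors, box_params)


shape = yolov2_output_shape()
num_boxes = shape[0] * shape[1] * shape[2]

print(yolov1_box_count())   # 98
print(shape)                # (13, 13, 5, 25)
print(num_boxes)            # 845
print(shape[2] * shape[3])  # 125 channels in the last convolutional layer
```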
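(4) Dimension Clusters (the first question about anchor boxes: how to choose their sizes)
K-means clustering solves the problem of choosing anchor box sizes
In Faster R-CNN and SSD, the dimensions (width and height) of the prior boxes are picked by hand, which is somewhat subjective. If the priors fit the data better, the model has an easier time learning to make good predictions. YOLOv2 therefore runs k-means clustering on the bounding boxes in the training set. After weighing model complexity against recall, the authors chose K = 5.

The point of the priors is to give high IOU with the ground truth. For that reason the clustering measures distance by the IOU between a box and a cluster-center box, rather than by Euclidean distance: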

$$d(\text{box}, \text{centroid}) = 1 - \mathrm{IOU}(\text{box}, \text{centroid})$$

[Figure: Dimension clusters (k-means on the box dimensions of the training set)]
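Below is a minimal sketch of dimension clustering with the 1 - IOU distance. It assumes boxes are given as (w, h) pairs and compares them as if they shared a center, which is the same trick preprocess_true_boxes() uses. Two details are assumptions of this sketch rather than something the paper spells out:

- The function names are hypothetical.
- Each centroid is updated to the mean of its cluster.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IOU between (w, h) pairs, with every box centered at the origin.
    boxes: (N, 2), centroids: (K, 2) -> (N, K)"""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """k-means on (w, h) pairs using d = 1 - IOU as the distance."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        dist = 1.0 - iou_wh(boxes, centroids)      # d(box, centroid) = 1 - IOU
        assign = np.argmin(dist, axis=1)           # nearest centroid for every box
        new_centroids = np.array([
            boxes[assign == c].mean(axis=0) if np.any(assign == c) else centroids[c]
            for c in range(k)                      # keep a centroid that lost all its boxes
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids
```

On real data you would pass the (w, h) of every training box, expressed in grid-cell units, with k=5. The result would then play the role of the five anchors listed in the YAD2K code below.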
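(5) Direct location prediction (the second question about anchor boxes: how to predict their positions)
A sigmoid is applied to the predicted offsets, which solves the instability of anchor box locations; the loss is changed accordingly
Borrowing from the RPN, the authors first used anchor boxes to predict box offsets, with the position given by (x, y, w, h). However, x and y turned out to be very unstable in early iterations. In the RPN the offsets are unconstrained, and in YOLO all 169 (13x13) grid cells predict at once. As a result, a box predicted by grid cell A could drift into grid cell B and wander all over the image, which makes training unstable.

The authors neatly solve this by passing the x, y offsets through a sigmoid, which bounds them to (0, 1), so each box center stays inside the cell that predicts it. For w and h, YOLOv2 drops YOLOv1's squared difference of square roots and adopts the log-space parameterization from the RPN.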
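The paper writes the decoding as follows:

- b_x = σ(t_x) + c_x
- b_y = σ(t_y) + c_y
- b_w = p_w·e^{t_w}
- b_h = p_h·e^{t_h}

Here (c_x, c_y) is the grid cell's offset and (p_w, p_h) is the anchor size. Below is a scalar sketch in grid-cell units. decode_box is a hypothetical helper; YAD2K's yolo_head() does the same thing on whole tensors and additionally divides by the grid size.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t, cell, anchor):
    """Decode raw outputs t = (tx, ty, tw, th) for grid cell (cx, cy) and an
    anchor of size (pw, ph); all results are in grid-cell units."""
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = sigmoid(tx) + cx   # sigmoid keeps the center inside cell (cx, cy)
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # log-space scaling of the anchor width
    bh = ph * math.exp(th)  # log-space scaling of the anchor height
    return bx, by, bw, bh


print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(6, 4), anchor=(3.42, 4.41)))
# (6.5, 4.5, 3.42, 4.41)

# Even large raw offsets cannot push the center out of its cell:
bx, by, _, _ = decode_box((5.0, -5.0, 0.0, 0.0), cell=(6, 4), anchor=(3.42, 4.41))
print(6 < bx < 7, 4 < by < 5)  # True True
```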
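(6) Fine-Grained Features
A passthrough layer captures finer-grained features
YOLOv2 adds a passthrough layer that brings features from an earlier, higher-resolution layer (26x26) into the final 13x13 feature map, which helps with small objects. With these fine-grained features, YOLOv2's performance improves by about 1%.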
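The passthrough layer can be sketched as a space-to-depth reshape. Every 2x2 spatial block of a 26x26x512 map is stacked into the channel dimension, producing 13x13x2048, and the result is concatenated with the coarse 13x13 features. The NumPy code below illustrates this under two assumptions:

- Its channel ordering follows TensorFlow's space_to_depth and may differ from Darknet's original reorg layer.
- The 1024-channel coarse map is only an example size.

```python
import numpy as np


def space_to_depth(x, block=2):
    """Rearrange an (H, W, C) map into (H/block, W/block, C*block*block) by
    moving each block x block spatial patch into the channel dimension."""
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)  # -> (H/b, W/b, row-in-block, col-in-block, C)
    return x.reshape(h // block, w // block, block * block * c)


rng = np.random.default_rng(0)
fine = rng.random((26, 26, 512), dtype=np.float32)     # earlier, higher-resolution features
coarse = rng.random((13, 13, 1024), dtype=np.float32)  # final 13x13 features (example size)

stacked = space_to_depth(fine)                         # (13, 13, 2048)
merged = np.concatenate([stacked, coarse], axis=-1)    # (13, 13, 3072)
print(stacked.shape, merged.shape)
```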
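(7) Multi-Scale Training
Training on images of different sizes improves robustness
Because the YOLOv2 model has only convolutional and pooling layers, its input is not restricted to 416x416. To make the model more robust, YOLOv2 uses multi-scale training: it changes the input image size at regular intervals during training.

Since the total downsampling stride is 32, the input sizes are multiples of 32: {320, 352, 384, ..., 608}. The smallest input, 320x320, gives a 10x10 feature map, which is no longer odd and admittedly a bit awkward. The largest, 608x608, gives 19x19.

During training a new input size is chosen at random every 10 iterations (batches). Only the handling of the final detection layer needs to change for training to continue. With multi-scale training, YOLOv2 adapts to images of different sizes and still predicts well.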
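Below is a sketch of the size schedule described above. It assumes the size is re-drawn uniformly at random every 10 iterations, and the helper names are hypothetical.

```python
import random

STRIDE = 32
SCALES = list(range(320, 608 + 1, STRIDE))  # 320, 352, ..., 608 (10 sizes)


def grid_size(input_size, stride=STRIDE):
    # Side length of the final feature map for a given input size
    return input_size // stride


def multiscale_schedule(num_iters, every=10, seed=0):
    """Input size used at each training iteration, redrawn every `every` iterations."""
    rng = random.Random(seed)
    sizes = []
    for it in range(num_iters):
        if it % every == 0:
            size = rng.choice(SCALES)
        sizes.append(size)
    return sizes


print(len(SCALES))                                  # 10
print(grid_size(SCALES[0]), grid_size(SCALES[-1]))  # 10 19
```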

2. Faster

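Most detection frameworks rely on VGG-16 as the base feature extractor. VGG-16 is a powerful, accurate classification network, but it is needlessly complex. Its convolutional layers need 30.69 billion floating-point operations for a single forward pass on one 224x224 image.

YOLO (v1) instead uses a custom network based on the GoogLeNet architecture. It is faster than VGG-16, needing only 8.52 billion operations per forward pass, but slightly less accurate: single-crop top-5 accuracy on ImageNet at 224x224 is 88.0% for YOLO's custom model versus 90.0% for VGG-16.

YOLOv2 uses Darknet-19, which has 19 convolutional layers and 5 max-pooling layers. This is a leaner network than YOLOv1's 24 convolutional layers plus 2 fully connected layers.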

[Figure: YOLOv2 network architecture]

3. Stronger

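The authors' idea here is also novel. They wanted to train on a classification dataset and a detection dataset jointly, but the labels of different datasets are not mutually exclusive. For example, "Norfolk terrier" in ImageNet is also a "dog" in COCO. A single flat softmax assumes mutually exclusive classes, so it does not work here.

Their solution is WordTree, a hierarchical tree of labels that resolves this conflict between datasets. Classification proceeds hierarchically down the tree and stops either when the probability falls below a threshold or when a leaf node is reached. The figure below helps explain the WordTree concept.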

[Figure: WordTree]
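Below is a toy sketch of hierarchical prediction on a WordTree. The tree, labels and conditional probabilities are all made up for illustration. Each node's absolute probability is the product of the conditional probabilities along its path from the root. The search follows the most likely child until that product drops below a threshold or a leaf is reached. The root probability is fixed at 1.0 by default here; for detection, the paper uses the objectness score as Pr(physical object).

```python
# A hypothetical, tiny WordTree: parent -> children
TREE = {
    "physical object": ["animal", "artifact"],
    "animal": ["dog", "cat"],
    "dog": ["terrier", "hound"],
}
# Made-up conditional probabilities P(node | parent) for one predicted box;
# in the real model they come from a softmax over each group of siblings.
COND_PROB = {
    "animal": 0.9, "artifact": 0.1,
    "dog": 0.8, "cat": 0.2,
    "terrier": 0.55, "hound": 0.45,
}


def predict(root="physical object", root_prob=1.0, threshold=0.5):
    """Greedily descend the tree; stop at a leaf or when P(node) < threshold."""
    node, prob = root, root_prob
    path = [(node, prob)]
    while node in TREE:  # leaves have no entry in TREE
        child = max(TREE[node], key=lambda c: COND_PROB[c])
        child_prob = prob * COND_PROB[child]  # P(child) = P(child | parent) * P(parent)
        if child_prob < threshold:
            break
        node, prob = child, child_prob
        path.append((node, prob))
    return path


print(predict(threshold=0.5)[-1][0])  # dog (P(terrier) = 0.72 * 0.55 falls below 0.5)
print(predict(threshold=0.3)[-1][0])  # terrier (a leaf)
```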

IV. YAD2K Code Walkthrough

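YAD2K implements YOLOv2 in roughly 90% Keras and 10% TensorFlow. Below I mainly analyze the code in /yad2k/models/keras_yolo.py.

Note on coordinate order: the ground-truth boxes fed to preprocess_true_boxes() are [x, y, w, h, class]. However, the final boxes produced by yolo_boxes_to_corners() (and therefore by yolo_eval()) are ordered [y_min, x_min, y_max, x_max], not [x_min, y_min, x_max, y_max].

Pipeline:
- Training: the ground-truth data are processed by preprocess_true_boxes(), then prepared further and fed into the model. The loss function is yolo_loss().
- Inference: the output of the network's last convolutional layer goes into yolo_head(), and yolo_eval() then produces the final results.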

1. preprocess_true_boxes()

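This function produces two arrays:
- detectors_mask marks the best-matching anchor box for each ground-truth box. Every true box is assigned exactly one anchor box.
- matching_true_boxes holds the encoded targets that are later compared with pred_boxes to compute the loss.

The code below is commented in detail.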

# Imports used by the keras_yolo.py functions discussed below
# (YAD2K targets standalone Keras on TensorFlow 1.x)
import numpy as np
import tensorflow as tf
from keras import backend as K


def preprocess_true_boxes(true_boxes, anchors, image_size):
    """
    Parameters
    --------------
    true_boxes : ground-truth box positions and classes (our input). Two dimensions:
        dim 0: the ground-truth boxes in one image
        dim 1: [x, y, w, h, class]; x, y is the box center and w, h its width and height.
               x, y, w, h are all divided by the image size, so they lie in [0, 1].

    anchors : the anchor box sizes (a NumPy array); the paper uses five. Two dimensions:
        dim 0: number of anchor boxes, 5 here
        dim 1: [w, h], both in units of grid cells:
               [1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]

    image_size : the image size (height, width), 416x416 here.

    Returns
    --------------
    detectors_mask : values are 0 or 1, shape [13, 13, 5, 1]. Four dimensions:
        dim 0: row of the grid cell containing the true box center (y direction)
        dim 1: column of the grid cell containing the true box center (x direction)
        dim 2: which anchor box
        dim 3: 0/1; 1 marks the anchor box responsible for predicting that true box

    matching_true_boxes : shape [13, 13, 5, 5]. Four dimensions:
        dim 0: row of the grid cell containing the true box center (y direction)
        dim 1: column of the grid cell containing the true box center (x direction)
        dim 2: which anchor box
        dim 3: [x, y, w, h, class]. x, y are offsets relative to the grid cell,
               w, h are log-encoded, and class is the class index (see the code below).
    """
    height, width = image_size
    num_anchors = len(anchors)

    # The input height and width must be multiples of 32, otherwise the grid does not fit.
    assert height % 32 == 0, 'Image height must be a multiple of 32.'
    assert width % 32 == 0, 'Image width must be a multiple of 32.'

    # Number of grid cells along each axis (13x13 for a 416x416 input)
    conv_height = height // 32
    conv_width = width // 32

    num_box_params = true_boxes.shape[1]
    # Zero-initialized outputs with the shapes described above
    detectors_mask = np.zeros(
        (conv_height, conv_width, num_anchors, 1), dtype=np.float32)
    matching_true_boxes = np.zeros(
        (conv_height, conv_width, num_anchors, num_box_params),
        dtype=np.float32)

    for box in true_boxes:  # iterate over the ground-truth boxes
        # Class index as a scalar (box[4:5] would make the np.array() below
        # ragged, which recent NumPy versions reject)
        box_class = box[4]

        # Convert x, y, w, h from image fractions to grid-cell units
        box = box[0:4] * np.array(
            [conv_width, conv_height, conv_width, conv_height])

        i = np.floor(box[1]).astype('int')  # row of the grid cell (y direction)
        j = np.floor(box[0]).astype('int')  # column of the grid cell (x direction)
        best_iou = 0
        best_anchor = 0

        # Compute the IOU between each anchor box and the true box to find the best anchor
        for k, anchor in enumerate(anchors):
            # Find IOU between box shifted to origin and anchor box.
            box_maxes = box[2:4] / 2.
            box_mins = -box_maxes
            anchor_maxes = (anchor / 2.)
            anchor_mins = -anchor_maxes

            intersect_mins = np.maximum(box_mins, anchor_mins)
            intersect_maxes = np.minimum(box_maxes, anchor_maxes)
            intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
            intersect_area = intersect_wh[0] * intersect_wh[1]
            box_area = box[2] * box[3]
            anchor_area = anchor[0] * anchor[1]
            iou = intersect_area / (box_area + anchor_area - intersect_area)
            if iou > best_iou:
                best_iou = iou
                best_anchor = k

        if best_iou > 0:
            detectors_mask[i, j, best_anchor] = 1  # mark the best anchor box
            adjusted_box = np.array(
                [
                    # x, y relative to the grid cell: top-left [0, 0], bottom-right [1, 1]
                    box[0] - j, box[1] - i,
                    # log of the ratio between the true w, h and the anchor's w, h
                    np.log(box[2] / anchors[best_anchor][0]),
                    np.log(box[3] / anchors[best_anchor][1]),
                    box_class  # class index of the object in the true box
                ],
                dtype=np.float32)
            matching_true_boxes[i, j, best_anchor] = adjusted_box
    return detectors_mask, matching_true_boxes

2. yolo_head()

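This function takes the features from YOLO's output layer and converts them into four outputs:
- box centers x, y, normalized to the whole image (the grid-cell offset is already added);
- box sizes w, h, normalized to the whole image (the anchor size is already applied);
- pred_confidence, the probability that a box contains an object;
- pred_class_prob, the per-class probabilities after softmax.

In the loss function, the returned x, y, w, h are used to compute IOUs with the true boxes. Those IOUs, together with pred_confidence, give confidence_loss, while pred_class_prob is used for classification_loss.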

def yolo_head(feats, anchors, num_classes):
    """Convert final layer features to bounding box parameters.

    Parameters
    ----------
    feats : output of the network's last layer, shape [-1, 13, 13, 125]

    anchors : the anchor box sizes; the paper uses five. Two dimensions:
        dim 0: number of anchor boxes, 5 here
        dim 1: [w, h], both in units of grid cells:
        [1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]

    num_classes : number of classes

    Returns
    -------
    box_xy : center x, y of every predicted box (every anchor in every grid cell of
        every image), normalized to the whole image: top-left [0, 0], bottom-right [1, 1].
        Five dimensions, shape [-1, 13, 13, 5, 2]:
        dim 0: image index in the batch
        dim 1: grid row (y direction)
        dim 2: grid column (x direction)
        dim 3: which anchor box
        dim 4: [x, y], the box center normalized to the image

    box_wh : w, h of every predicted box, normalized to the image size.
        Five dimensions, shape [-1, 13, 13, 5, 2], indexed like box_xy;
        dim 4: [w, h]

    box_confidence : probability that each predicted box contains an object.
        Shape [-1, 13, 13, 5, 1], indexed as above.

    box_class_probs : softmax class probabilities for each predicted box.
        Shape [-1, 13, 13, 5, 20].
    """
    num_anchors = len(anchors)
    # Reshape to batch, height, width, num_anchors, box_params.
    anchors_tensor = K.reshape(K.variable(anchors), [1, 1, 1, num_anchors, 2])

    conv_dims = K.shape(feats)[1:3]  # grid size, 13x13 here
    # In YOLO the height index is the inner most iteration.
    conv_height_index = K.arange(0, stop=conv_dims[0])
    conv_width_index = K.arange(0, stop=conv_dims[1])
    conv_height_index = K.tile(conv_height_index, [conv_dims[1]])

    conv_width_index = K.tile(
        K.expand_dims(conv_width_index, 0), [conv_dims[0], 1])
    conv_width_index = K.flatten(K.transpose(conv_width_index))
    conv_index = K.transpose(K.stack([conv_height_index, conv_width_index]))
    conv_index = K.reshape(conv_index, [1, conv_dims[0], conv_dims[1], 1, 2])  # shape [1, 13, 13, 1, 2]
    conv_index = K.cast(conv_index, K.dtype(feats))

    # tile(): repeat a tensor along its axes
    # expand_dims(): add a dimension
    # transpose(): transpose
    # flatten(): flatten to one dimension
    # stack(): stack tensors along a new dimension
    # Result (for a square grid): conv_index[0, r, c, 0] == [c, r], i.e. the
    # (x, y) = (column, row) offset of each grid cell.

    feats = K.reshape(
        feats, [-1, conv_dims[0], conv_dims[1], num_anchors, num_classes + 5])
    conv_dims = K.cast(K.reshape(conv_dims, [1, 1, 1, 1, 2]), K.dtype(feats))

    box_xy = K.sigmoid(feats[..., :2])   # offsets inside the cell, in (0, 1)
    box_wh = K.exp(feats[..., 2:4])      # scale factors applied to the anchors
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.softmax(feats[..., 5:])

    # Adjust predictions to each spatial grid point and anchor size.
    # Note: YOLO iterates over height index before width index.
    # conv_dims is (height, width), so this normalization is exact for square grids such as 13x13.
    box_xy = (box_xy + conv_index) / conv_dims
    box_wh = box_wh * anchors_tensor / conv_dims

    return box_xy, box_wh, box_confidence, box_class_probs
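To check what yolo_head() returns, here is a NumPy re-implementation sketch; decode_yolo_output is a hypothetical name and is not part of YAD2K. It builds the grid offsets with meshgrid so that offsets[r, c] = (c, r): x comes from the column index and y from the row index. It then divides by (W, H), so box_xy and box_wh come out normalized to the whole image.

```python
import numpy as np


def decode_yolo_output(feats, anchors, num_classes):
    """NumPy sketch of yolo_head(): feats is (batch, H, W, A * (5 + C)),
    anchors is (A, 2) in grid-cell units."""
    b, h, w, _ = feats.shape
    anchors = np.asarray(anchors, dtype=feats.dtype)
    a = len(anchors)
    feats = feats.reshape(b, h, w, a, 5 + num_classes)

    # offsets[0, r, c, 0] == (c, r): x is the column index, y the row index
    cols, rows = np.meshgrid(np.arange(w), np.arange(h))  # both (H, W)
    offsets = np.stack([cols, rows], axis=-1).reshape(1, h, w, 1, 2)
    grid_wh = np.array([w, h], dtype=feats.dtype).reshape(1, 1, 1, 1, 2)

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    box_xy = (sigmoid(feats[..., 0:2]) + offsets) / grid_wh  # normalized to the image
    box_wh = np.exp(feats[..., 2:4]) * anchors.reshape(1, 1, 1, a, 2) / grid_wh
    box_confidence = sigmoid(feats[..., 4:5])

    logits = feats[..., 5:]
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable softmax
    box_class_probs = e / e.sum(axis=-1, keepdims=True)
    return box_xy, box_wh, box_confidence, box_class_probs


ANCHORS = [[1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]]
xy, wh, conf, probs = decode_yolo_output(np.zeros((1, 13, 13, 125)), ANCHORS, 20)
print(xy[0, 4, 6, 0])  # center of the cell in row 4, column 6: [6.5/13, 4.5/13]
```

An all-zero output makes an easy sanity check. Every box center should then sit in the middle of its cell, and every box should have exactly its anchor's size.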

3. yolo_loss()

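The YOLOv2 loss function also differs considerably from YOLOv1's. It has three main parts: the confidence (IOU) loss, the classification loss and the coordinate loss. The confidence loss is further split into no_objects_loss and objects_loss, and objects_loss is penalized more heavily. Below I briefly describe the differences from YOLOv1.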

3.1 confidence_loss:

YOLOv2 has 845 anchor boxes in total (13x13x5). Anchor boxes matched to a true box are responsible for predicting that object (pred_boxes). Unmatched anchor boxes are responsible for predicting background.

objects_loss (anchor boxes matched to a true box)
An anchor box matched to a true box computes objects_loss from the box it predicts.
no_objects_loss (anchor boxes not matched to any true box)
1. If the box predicted at an unmatched anchor has IOU > 0.6 with some true box, no loss is computed for it.
2. If the box predicted at an unmatched anchor has IOU < 0.6 with every true box, no_objects_loss is computed.

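This part has many confusing points and is rather convoluted; I misunderstood it at first myself. Here is how I understand it now.

The confidence measures how sure an anchor box is that it contains an object. Two cases are straightforward, because each anchor simply does its own job:
- Anchor boxes responsible for the foreground (pred_boxes) always incur objects_loss.
- Anchor boxes responsible for the background incur no_objects_loss if their IOU with every true box is below 0.6.

But when a background anchor box has IOU > 0.6 with some true box, no no_objects_loss is computed. Why? Because it has pleasantly surprised us, and we can't bear to punish it. An anchor box that was only supposed to predict background overlaps a true box with IOU > 0.6, perhaps even more accurately than the anchors actually responsible for the foreground. It eats grass and gives milk, as the saying goes, so how could we penalize it?

Joking aside, I think the reason is this. A true box's center may lie in a neighboring grid cell, but if the box is large, it can still have a large IOU with anchor boxes in nearby cells. The loss from these anchors can be skipped, since they really do frame the object well. This is like the anchors with 0.3 < IOU < 0.7 in Faster R-CNN, which contribute no loss because they are not the main targets of optimization.

Another difference from YOLOv1 lies in the weighting coefficients. In YOLOv1, no_objects_loss and objects_loss are weighted 0.5 and 1; in YOLOv2 (as implemented in YAD2K) they are weighted 1 and 5.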
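The masking logic of the confidence loss can be shown in a few lines of NumPy. This sketch mirrors the rescore_confidence=False branch of yolo_loss(), uses a hypothetical function name, and assumes all inputs already share one shape. The three example anchors cover the three cases above:

- matched to a true box;
- unmatched but overlapping a true box with IOU 0.7 (ignored);
- unmatched with IOU 0.1 (pushed towards 0).

```python
import numpy as np


def confidence_loss(pred_conf, best_ious, detectors_mask,
                    object_scale=5.0, no_object_scale=1.0, ignore_thresh=0.6):
    """Per-anchor confidence loss mirroring yolo_loss() with rescore_confidence=False.
    All inputs must have the same shape."""
    # Predictions that already overlap some true box well are not pushed towards 0
    ignore = (best_ious > ignore_thresh).astype(pred_conf.dtype)
    no_object_weights = no_object_scale * (1 - ignore) * (1 - detectors_mask)
    no_objects_loss = no_object_weights * np.square(0.0 - pred_conf)
    # Matched anchors are pushed towards confidence 1 with a larger weight
    objects_loss = object_scale * detectors_mask * np.square(1.0 - pred_conf)
    return objects_loss + no_objects_loss


# Three anchors: matched; unmatched but IOU 0.7; unmatched with IOU 0.1
pred_conf = np.array([0.8, 0.9, 0.3])
best_ious = np.array([0.9, 0.7, 0.1])
mask = np.array([1.0, 0.0, 0.0])
print(confidence_loss(pred_conf, best_ious, mask))  # approximately [0.2, 0.0, 0.09]
```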

3.2 classification_loss:

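This part is essentially the same as in YOLOv1. It is the squared error between the 20-dimensional class probability vector after softmax() (the dataset has 20 classes) and the one-hot ground-truth class.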

3.3 coordinates_loss:

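This part changes more relative to YOLOv1.

For x, y, the loss is the squared error of the offsets relative to the grid cell. The predicted offset is passed through a sigmoid, so it lies in (0, 1). YOLOv1 also used cell-relative x, y, but without this constraint. The weighting coefficient changes from 5 to 1.

For w, h, YOLOv1 used the squared error of square roots. YOLOv2 instead uses the squared error of log(w, h divided by the w, h of the anchor box matched to the true box). The intent is the same as in YOLOv1: for the same absolute error, large objects are penalized less and small objects more. The weighting coefficient here likewise changes from 5 to 1.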
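A quick numeric check of the log-space w, h error, with sizes in grid-cell units (the helper names are hypothetical). The same absolute error of 0.5 cells costs far more on a small box than on a large one. The anchor size cancels out of the difference, so it shifts the target but does not change the error itself.

```python
import math


def log_wh_error(true_w, pred_w, anchor_w):
    """Squared error on the log-encoded width used by YAD2K (sizes in grid cells)."""
    target = math.log(true_w / anchor_w)
    pred = math.log(pred_w / anchor_w)
    return (target - pred) ** 2


def sqrt_wh_error(true_w, pred_w):
    """YOLOv1-style squared error on square roots, for comparison."""
    return (math.sqrt(true_w) - math.sqrt(pred_w)) ** 2


# The same absolute error of 0.5 grid cells on a small and on a large box
small = log_wh_error(1.0, 1.5, anchor_w=1.08)
large = log_wh_error(10.0, 10.5, anchor_w=9.42)
print(round(small, 4), round(large, 4))                                         # 0.1644 0.0024
print(round(sqrt_wh_error(1.0, 1.5), 4), round(sqrt_wh_error(10.0, 10.5), 4))  # 0.0505 0.0061
```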

def yolo_loss(args,
              anchors,
              num_classes,
              rescore_confidence=False,
              print_loss=False):
    """
    Parameters
    ----------
    args : tuple (yolo_output, true_boxes, detectors_mask, matching_true_boxes):

    yolo_output : output of the network's last layer, shape [batch_size, 13, 13, 125]

    true_boxes : ground-truth box positions and classes (our input). Three dimensions:
        dim 0: image index in the batch
        dim 1: the ground-truth boxes in one image
        dim 2: [x, y, w, h, class]; x, y is the box center and w, h its width and height,
               all divided by the image size so they lie in [0, 1].

    detectors_mask : values 0 or 1, shape [batch_size, 13, 13, 5, 1]; see the output
        of preprocess_true_boxes(). Five dimensions:
        dim 0: image index in the batch
        dim 1: row of the grid cell containing the true box center (y direction)
        dim 2: column of the grid cell containing the true box center (x direction)
        dim 3: which anchor box
        dim 4: 0/1; 1 marks the anchor box responsible for predicting that true box

    matching_true_boxes : shape [-1, 13, 13, 5, 5]; see the output of
        preprocess_true_boxes(). Five dimensions:
        dim 0: image index in the batch
        dim 1: row of the grid cell containing the true box center (y direction)
        dim 2: column of the grid cell containing the true box center (x direction)
        dim 3: which anchor box
        dim 4: [x, y, w, h, class]. x, y are offsets relative to the grid cell,
               w, h are log-encoded, and class is the class index.

    anchors : the anchor box sizes; the paper uses five. Two dimensions:
        dim 0: number of anchor boxes, 5 here
        dim 1: [w, h], both in units of grid cells:
        [1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]

    num_classes : number of classes

    rescore_confidence : bool; changes how objects_loss is computed (see the code below).

    print_loss : bool; whether to print the losses (total, confidence,
        classification and coordinate loss).

    Returns
    -------
    total_loss : float, the total loss
    """
    (yolo_output, true_boxes, detectors_mask, matching_true_boxes) = args
    num_anchors = len(anchors)
    object_scale = 5       # confidence-loss weight for anchors responsible for an object
    no_object_scale = 1    # confidence-loss weight for anchors with no object (background)
    class_scale = 1        # weight of the classification loss
    coordinates_scale = 1  # weight of the coordinate loss

    pred_xy, pred_wh, pred_confidence, pred_class_prob = yolo_head(
        yolo_output, anchors, num_classes)

    yolo_output_shape = K.shape(yolo_output)
    feats = K.reshape(yolo_output, [
        -1, yolo_output_shape[1], yolo_output_shape[2], num_anchors,
        num_classes + 5])  # shape [-1, 13, 13, 5, 25]

    # Raw prediction as x, y (sigmoid offsets) and w, h (log-space), compared with
    # matching_true_boxes in the coordinate loss; shape [-1, 13, 13, 5, 4]
    pred_boxes = K.concatenate(
        (K.sigmoid(feats[..., 0:2]), feats[..., 2:4]), axis=-1)

    # Expand pred x,y,w,h to allow comparison with ground truth.
    # batch, conv_height, conv_width, num_anchors, num_true_boxes, box_params
    pred_xy = K.expand_dims(pred_xy, 4)  # [-1, 13, 13, 5, 2] -> [-1, 13, 13, 5, 1, 2]
    pred_wh = K.expand_dims(pred_wh, 4)  # [-1, 13, 13, 5, 2] -> [-1, 13, 13, 5, 1, 2]

    # Top-left and bottom-right corners of the predicted boxes
    pred_wh_half = pred_wh / 2.
    pred_mins = pred_xy - pred_wh_half
    pred_maxes = pred_xy + pred_wh_half

    true_boxes_shape = K.shape(true_boxes)

    # shape [-1, 1, 1, 1, num_true_boxes, 5]:
    # batch, conv_height, conv_width, num_anchors, num_true_boxes, box_params
    true_boxes = K.reshape(true_boxes, [true_boxes_shape[0], 1, 1, 1, true_boxes_shape[1], true_boxes_shape[2]])

    true_xy = true_boxes[..., 0:2]
    true_wh = true_boxes[..., 2:4]

    # Top-left and bottom-right corners of the ground-truth boxes
    true_wh_half = true_wh / 2.
    true_mins = true_xy - true_wh_half
    true_maxes = true_xy + true_wh_half

    intersect_mins = K.maximum(pred_mins, true_mins)
    intersect_maxes = K.minimum(pred_maxes, true_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]

    pred_areas = pred_wh[..., 0] * pred_wh[..., 1]
    true_areas = true_wh[..., 0] * true_wh[..., 1]

    # IOU of every predicted box (845 per image here) with every true box,
    # shape [-1, 13, 13, 5, num_true_boxes]
    union_areas = pred_areas + true_areas - intersect_areas
    iou_scores = intersect_areas / union_areas

    # For each predicted box keep only its highest IOU over all true boxes in the image;
    # which true box it comes from does not matter here.
    best_ious = K.max(iou_scores, axis=4)
    best_ious = K.expand_dims(best_ious)  # shape [-1, 13, 13, 5, 1]

    # 1 where the best IOU exceeds 0.6: those predictions are exempt from no_objects_loss.
    # cast() turns the boolean tensor into 0/1 values of the given dtype.
    object_detections = K.cast(best_ious > 0.6, K.dtype(best_ious))

    no_object_weights = (no_object_scale * (1 - object_detections) *
                         (1 - detectors_mask))
    no_objects_loss = no_object_weights * K.square(-pred_confidence)

    if rescore_confidence:
        objects_loss = (object_scale * detectors_mask *
                        K.square(best_ious - pred_confidence))
    else:
        objects_loss = (object_scale * detectors_mask *
                        K.square(1 - pred_confidence))
    # confidence_loss: no_objects_loss is the error on background anchors, objects_loss the
    # error on anchors matched to a true box; the latter is weighted more heavily (5 vs 1).
    confidence_loss = objects_loss + no_objects_loss

    # classification_loss: squared error between the one-hot class vector and the
    # 20-dimensional predicted class probabilities
    matching_classes = K.cast(matching_true_boxes[..., 4], 'int32')
    matching_classes = K.one_hot(matching_classes, num_classes)
    classification_loss = (class_scale * detectors_mask *
                           K.square(matching_classes - pred_class_prob))

    # coordinates_loss: squared error of the x, y offsets and of the log-encoded w, h;
    # similar in spirit to YOLOv1's squared difference of square roots, and in my
    # experience slightly better
    matching_boxes = matching_true_boxes[..., 0:4]
    coordinates_loss = (coordinates_scale * detectors_mask *
                        K.square(matching_boxes - pred_boxes))

    confidence_loss_sum = K.sum(confidence_loss)
    classification_loss_sum = K.sum(classification_loss)
    coordinates_loss_sum = K.sum(coordinates_loss)
    total_loss = 0.5 * (
        confidence_loss_sum + classification_loss_sum + coordinates_loss_sum)
    if print_loss:
        total_loss = tf.Print(
            total_loss, [
                total_loss, confidence_loss_sum, classification_loss_sum,
                coordinates_loss_sum
            ],
            message='yolo_loss, conf_loss, class_loss, box_coord_loss:')

    return total_loss

4. yolo_boxes_to_corners()

This function is simple. It takes the box_xy and box_wh output by yolo_head() and computes each box's top-left and bottom-right corners. The result is passed to yolo_filter_boxes() and can also be used to draw the bounding boxes.

def yolo_boxes_to_corners(box_xy, box_wh):

    box_mins = box_xy - (box_wh / 2.)
    box_maxes = box_xy + (box_wh / 2.)

    return K.concatenate([
        box_mins[..., 1:2],  # y_min
        box_mins[..., 0:1],  # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1]  # x_max
    ])

5. yolo_filter_boxes()

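From the 845 pred_boxes, this function keeps those whose score (confidence × class probability) exceeds 0.6 as the final predicted bounding boxes; in my actual runs the threshold was set to 0.3. It returns their corner coordinates (top-left and bottom-right), their scores and their classes.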

def yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold=.6):
    # box_scores: confidence x class probability for each box and class,
    # shape [-1, 13, 13, 5, 20]
    box_scores = box_confidence * box_class_probs
    box_classes = K.argmax(box_scores, axis=-1)    # index of the highest score, i.e. the class
    box_class_scores = K.max(box_scores, axis=-1)  # the highest score, used as the box's score
    # Boolean mask of boxes whose score reaches the threshold; used with tf.boolean_mask()
    prediction_mask = box_class_scores >= threshold

    boxes = tf.boolean_mask(boxes, prediction_mask)              # boxes that pass the threshold
    scores = tf.boolean_mask(box_class_scores, prediction_mask)  # their scores
    classes = tf.boolean_mask(box_classes, prediction_mask)      # their classes

    return boxes, scores, classes

6. yolo_eval()

yolo_eval() calls yolo_boxes_to_corners() and yolo_filter_boxes(), then applies non-maximum suppression to the score-filtered boxes. It outputs boxes, scores and classes: the top-left and bottom-right corner coordinates of the bounding boxes, their scores and their classes.

def yolo_eval(yolo_outputs,
              image_shape,
              max_boxes=10,
              score_threshold=.6,
              iou_threshold=.5):
    box_xy, box_wh, box_confidence, box_class_probs = yolo_outputs  # yolo_outputs is the output of yolo_head()
    boxes = yolo_boxes_to_corners(box_xy, box_wh)
    boxes, scores, classes = yolo_filter_boxes(
        boxes, box_confidence, box_class_probs, threshold=score_threshold)

    # image_shape is (height, width), e.g. (416, 416)
    height = image_shape[0]
    width = image_shape[1]
    image_dims = K.stack([height, width, height, width])
    image_dims = K.reshape(image_dims, [1, 4])
    # Scale the normalized [y_min, x_min, y_max, x_max] corners to pixel coordinates
    boxes = boxes * image_dims

    # Non-maximum suppression. iou_threshold defaults to 0.5 (0.9 in my actual runs).
    # NMS here is applied to all classes together rather than per class, which I suspect
    # is why a higher threshold was used.
    max_boxes_tensor = K.variable(max_boxes, dtype='int32')
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))
    nms_index = tf.image.non_max_suppression(
        boxes, scores, max_boxes_tensor, iou_threshold=iou_threshold)
    boxes = K.gather(boxes, nms_index)  # gather(): pick the values at the given indices
    scores = K.gather(scores, nms_index)
    classes = K.gather(classes, nms_index)
    return boxes, scores, classes
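yolo_eval() relies on tf.image.non_max_suppression, which runs class-agnostic here. For intuition, here is a greedy NMS sketch in NumPy over [y_min, x_min, y_max, x_max] boxes. It is a simplified illustration with hypothetical names, not TensorFlow's implementation.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    # Boxes are [y_min, x_min, y_max, x_max], like the output of yolo_boxes_to_corners()
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it by more
    than iou_threshold, repeat. Returns the indices of the kept boxes."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0 and len(keep) < max_boxes:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]
    return np.array(keep, dtype=np.int64)


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0 2]; box 1 overlaps box 0 with IOU of about 0.68
```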

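V. Strengths and Weaknesses of YOLO

I can't help admiring the authors' creativity in giving us such a good detector. YOLO's strengths are obvious. As the name "you only look once" suggests, it is light on compute and runs fast while keeping accuracy reasonable.

Its weaknesses are equally clear: bounding-box localization is not precise enough, detection of small and densely packed objects is poor, and recall is relatively low. These are exactly the areas YOLOv2 mainly sets out to improve.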

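VI. An Open Question

I'd also like to raise a question about YOLO. Suppose an image contains two true boxes whose centers fall in the same grid cell and whose best match is the same anchor box. In YAD2K, it seems only one of them can be predicted. preprocess_true_boxes() writes both into the same slot of matching_true_boxes, so whichever box is processed last overwrites the other. I don't know whether the original YOLOv2 (Darknet) code has the same problem, and I'd welcome pointers from anyone more experienced.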
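The collision can be reproduced with a simplified version of the assignment loop in preprocess_true_boxes(). This sketch uses hypothetical names and keeps only the indexing logic. In the example, two boxes whose centers fall in cell (6, 6) both prefer anchor 1, so they end up in the same slot and the later one wins.

```python
import numpy as np

ANCHORS = np.array([[1.08, 1.19], [3.42, 4.41], [6.63, 11.38],
                    [9.42, 5.11], [16.62, 10.52]])


def best_anchor(w, h, anchors=ANCHORS):
    # IOU between a box and each anchor, both centered at the origin (grid-cell units)
    inter = np.minimum(w, anchors[:, 0]) * np.minimum(h, anchors[:, 1])
    iou = inter / (w * h + anchors[:, 0] * anchors[:, 1] - inter)
    return int(np.argmax(iou))


def assign_slots(true_boxes, grid=13):
    """Map each normalized [x, y, w, h, class] box to its (row, col, anchor) slot,
    following the indexing of preprocess_true_boxes()."""
    slots = {}
    for x, y, w, h, cls in true_boxes:
        i, j = int(np.floor(y * grid)), int(np.floor(x * grid))  # row from y, column from x
        k = best_anchor(w * grid, h * grid)
        slots[(i, j, k)] = cls  # a later box silently overwrites an earlier one
    return slots


boxes = [(0.50, 0.50, 0.25, 0.30, 14),  # e.g. class 14
         (0.52, 0.48, 0.26, 0.32, 11)]  # e.g. class 11: same cell, same best anchor
print(assign_slots(boxes))  # {(6, 6, 1): 11}
```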

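VII. Summary

Writing this article was also a way to summarize things for myself. Working through the paper and the code was genuinely hard going. In return, it gave me a much deeper understanding of where YOLO comes from and how it works, and an appreciation of the authors' thinking. Next I plan to look into YOLOv3, and if I have time I will share that learning process as well. Onward!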
