Faster R-CNN Explained in Detail (Based on Keras Code), Part 1

This article explains Faster R-CNN mainly by walking through a Keras implementation: https://github.com/Jeozhao/Keras-FasterRCNN
Original paper: http://www.ee.bgu.ac.il/~rrtammy/DNN/reading/FastSun.pdf

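1. Overall pipeline of Faster R-CNN

Figure 1: Faster R-CNN pipeline

Object detectors in the R-CNN family work in roughly two stages: first, obtain candidate regions (region proposals, or RoIs); second, classify each candidate region and regress its bounding box. Faster R-CNN follows the same two stages, except that it uses an RPN (Region Proposal Network) to extract the candidate boxes; the subsequent classification and box regression are much the same as in R-CNN. For this reason Faster R-CNN is sometimes viewed as an RPN part plus an R-CNN part.
As Figure 1 shows, Faster R-CNN consists of the following four important parts: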

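1. Conv layers
This should be understood as the base convolutional network (base net). It extracts feature maps from the original image, and these features are then fed to both the RPN and the R-CNN part. One point to note: the feature map that is actually fed to the RPN is not the feature map produced from the entire image; how the selection is made is explained in detail later. This article uses two base nets: VGG16 and ResNet50.

2. RPN
The RPN generates region proposals (also called RoIs, regions of interest). It uses a sigmoid to decide whether each anchor is foreground or background. This is simply a binary classification: the paper's Caffe code uses a softmax that outputs two values, the foreground and background probabilities, whereas the Keras version used here outputs a single value, the foreground probability. It then applies bounding-box regression to the anchors to obtain the refined RoIs.

3. RoI Pooling
This layer takes the feature maps and the RoIs as input: the feature maps come from the base net and the RoIs come from the RPN. Pooling extracts a feature map for each RoI, which is passed to the subsequent fully connected layers to determine the object class.

4. Classifier
This is the classification part, that is, the part that actually runs detection on the candidate regions. It uses the RoI feature maps to compute the class of each RoI and applies bounding-box regression a second time to obtain the final position of the detection box.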
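The RoI pooling step in item 3 can be illustrated with a minimal pure-Python sketch. It is not the repository's `RoiPoolingConv` layer: the function name `roi_max_pool`, the `(x, y, w, h)` RoI convention, and the single-channel list-of-lists feature map are assumptions made for illustration. The point it shows is that RoIs of different sizes are all reduced to the same fixed output grid.

```python
def roi_max_pool(feature_map, roi, out_size):
    """Max-pool one RoI of a single-channel feature map into an out_size x out_size grid.

    feature_map: list of rows (H x W).
    roi: (x, y, w, h) in feature-map coordinates (assumed convention);
         the RoI is assumed to lie inside the map with w, h >= out_size.
    """
    x, y, w, h = roi
    pooled = []
    for i in range(out_size):
        # Row range covered by the i-th output bin (at least one row).
        r0 = y + (i * h) // out_size
        r1 = max(r0 + 1, y + ((i + 1) * h) // out_size)
        row = []
        for j in range(out_size):
            # Column range covered by the j-th output bin (at least one column).
            c0 = x + (j * w) // out_size
            c1 = max(c0 + 1, x + ((j + 1) * w) // out_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 6 x 6 toy feature map whose value at (r, c) is r * 6 + c.
fmap = [[r * 6 + c for c in range(6)] for r in range(6)]

square = roi_max_pool(fmap, roi=(1, 1, 4, 4), out_size=2)
wide = roi_max_pool(fmap, roi=(0, 0, 6, 3), out_size=2)
print(square)  # [[14, 16], [26, 28]]
print(wide)    # [[2, 5], [14, 17]]
```

Both RoIs produce a 2 x 2 output even though their sizes differ, which is what lets the fully connected layers that follow accept proposals of any size.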

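2. Defining the networks

2.1 VGG16 version of the base net

This is the "Conv layers" part mentioned earlier. The network defined here is the standard VGG16 with the trailing fully connected layers and softmax layer removed. Note that the layer names must not be chosen arbitrarily; they have to match the standard VGG16 names, because when the network is initialized for training the pretrained weights are loaded by layer name.
The whole network consists of 5 blocks:

Block1 and Block2:
Each consists of two (3 * 3) convolutional layers and one (2 * 2) max-pooling layer. Because the convolutions pad the border by 1 (padding='same' in the code) and the stride defaults to (1, 1), a (3 * 3) convolution does not change the height or width of the feature map; it only changes the number of channels. The max-pooling layer has a (2, 2) kernel and a (2, 2) stride, so after pooling the height and width of the feature map are both halved.
Block3 and Block4:
Each consists of three (3 * 3) convolutional layers and one (2 * 2) max-pooling layer, configured the same way as in Block1 and Block2. Again, the convolutional layers leave the feature-map size unchanged and only the pooling layer shrinks it.
Block5:
Only three (3 * 3) convolutional layers.

From the network as a whole we get:
If the input image has shape (600 * 600 * 3),
the output feature map has shape (600/16 * 600/16 * 512) = (37 * 37 * 512), since 600/16 = 37.5 is rounded down by the pooling layers.
Note: the batch dimension is ignored here.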
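The shape arithmetic above can be checked with a short sketch that traces the spatial size through the five blocks. The helper names (`conv_out`, `pool_out`, `vgg16_feature_shape`) and the `VGG16_BLOCKS` table are illustrative, not part of the repository; the formulas are the standard output-size formulas for a padded convolution and an unpadded pooling layer.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output length of an unpadded max-pooling layer along one spatial axis."""
    return (size - kernel) // stride + 1


# (number of 3x3 convs, output channels, followed by a 2x2 pool?)
VGG16_BLOCKS = [
    (2, 64, True),
    (2, 128, True),
    (3, 256, True),
    (3, 512, True),
    (3, 512, False),  # Block5 has no pooling layer in this base net
]


def vgg16_feature_shape(height, width):
    """Trace (height, width, channels) through the truncated VGG16."""
    channels = 3
    for n_convs, out_channels, has_pool in VGG16_BLOCKS:
        for _ in range(n_convs):
            height, width = conv_out(height), conv_out(width)
        channels = out_channels
        if has_pool:
            height, width = pool_out(height), pool_out(width)
    return height, width, channels


shape = vgg16_feature_shape(600, 600)
print(shape)  # (37, 37, 512)
```

The trace is 600 -> 300 -> 150 -> 75 -> 37: the last pooling step floors 37.5 to 37, which is why the total stride of 16 does not divide the input exactly.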

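from keras import backend as K
from keras.layers import Input, Conv2D, MaxPooling2D


def nn_base(input_tensor=None, trainable=False):
    # Determine the input shape from the backend's channel ordering.
    # K.image_data_format() replaces the older K.image_dim_ordering().
    if K.image_data_format() == 'channels_first':
        input_shape = (3, None, None)
    else:
        input_shape = (None, None, 3)

    # Reuse the caller's tensor if one is given; otherwise create a new Input.
    if input_tensor is None:
        img_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            img_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor

    # Channel axis; not used by the VGG16 layers below.
    if K.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1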

    # Block 1
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

    # Block 2
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

    # Block 3
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

    # Block 4
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

    # Block 5
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)

    return x

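Suppose the network is given the following input image:

Original input image

The features output by several VGG16 layers (only a few layers are shown) are:

Feature map of block1_conv2 (16 of 64 channels shown)

Feature map of block2_conv2 (16 of 128 channels shown)

Feature map of block3_conv3 (16 of 256 channels shown)

Feature map of block4_conv3 (16 of 512 channels shown)

Feature map of block5_conv3 (16 of 512 channels shown)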

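2.2 Defining the RPN

Figure 2: Structure of the RPN

The network is very simple: on top of the base net defined above it adds one (3 * 3) convolutional layer, followed by two output layers, each a single (1 * 1) convolutional layer. One output decides foreground versus background; the other performs bounding-box regression. None of these convolutional layers changes the spatial size of the feature map; they only change the number of channels.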

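Network inputs:
base_layers: the final output of the VGG base net above. Assuming the image fed to the base net has shape (600 * 600 * 3), the feature map fed to the RPN has shape (37 * 37 * 512).
num_anchors: the number of anchors (candidate RoIs) generated at each anchor point. For example, the paper uses 3 anchor scales (128, 256, and 512 pixels) and 3 aspect ratios [1:1, 1:2, 2:1], so num_anchors = 3 * 3 = 9.
(These values are not fixed; they may need to be adjusted for the specific experimental data and application.)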
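How num_anchors comes out of the scale and ratio lists can be sketched as follows. This is a simplified illustration, not the repository's anchor code: `make_anchor_sizes` and `anchor_boxes_at` are hypothetical helpers, the scale values passed in are only illustrative, and each ratio is applied as a plain multiplier on the scale.

```python
from itertools import product


def make_anchor_sizes(scales, ratios):
    """Return one (width, height) pair per scale/ratio combination."""
    return [(scale * rw, scale * rh) for scale, (rw, rh) in product(scales, ratios)]


def anchor_boxes_at(cx, cy, sizes):
    """Center every anchor size on (cx, cy); boxes are (x1, y1, x2, y2)."""
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2) for w, h in sizes]


sizes = make_anchor_sizes(scales=(128, 256, 512),
                          ratios=((1, 1), (1, 2), (2, 1)))
num_anchors = len(sizes)                     # 3 scales * 3 ratios
boxes = anchor_boxes_at(cx=8, cy=8, sizes=sizes)
total_anchors = 37 * 37 * num_anchors        # one set per feature-map position

print(num_anchors)    # 9
print(boxes[0])       # (-56.0, -56.0, 72.0, 72.0)
print(total_anchors)  # 12321
```

Note that anchors centered near the image border extend outside the image, as the first box shows; how such anchors are handled is up to the implementation.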

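Network outputs:
x_class: given the input above, the output shape is (37 * 37 * 9). Note that in the paper this output has 2 * 9 = 18 dimensions, because a softmax outputs separate foreground and background probabilities, whereas here only the foreground probability is output, giving 1 * 9 = 9 dimensions. The effect is the same.
x_regr: the bounding-box regression layer, with output shape (37 * 37 * 36), i.e. 4 values per anchor. Because bounding-box regression is a core part of the R-CNN family it deserves a separate explanation; see Chapter 5 of Part 2.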
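The remark that one sigmoid output has the same effect as a two-way softmax can be verified numerically: a softmax over a (background, foreground) pair of logits gives the same foreground probability as a sigmoid applied to the difference of the two logits. The sketch below uses only the standard library; the logit values are arbitrary test inputs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax2(z_bg, z_fg):
    """Two-class softmax, returned as (p_background, p_foreground)."""
    m = max(z_bg, z_fg)  # subtract the max for numerical stability
    e_bg, e_fg = math.exp(z_bg - m), math.exp(z_fg - m)
    return e_bg / (e_bg + e_fg), e_fg / (e_bg + e_fg)


max_gap = 0.0
for z_bg, z_fg in [(0.3, 1.7), (2.0, -1.0), (0.0, 0.0), (-4.0, 3.5)]:
    p_fg_softmax = softmax2(z_bg, z_fg)[1]
    p_fg_sigmoid = sigmoid(z_fg - z_bg)
    max_gap = max(max_gap, abs(p_fg_softmax - p_fg_sigmoid))

print(max_gap < 1e-12)  # True
```

So the 9-channel sigmoid head carries the same information as the paper's 18-channel softmax head, with half as many output channels.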

def rpn(base_layers, num_anchors):

    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)

    x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
    x_regr = Conv2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer='zero', name='rpn_out_regress')(x)

    return [x_class, x_regr, base_layers]

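2.3 Defining the final classifier network

The final classifier uses the RoIs extracted by the RPN as its training data and outputs a class and a bounding box for each RoI. Like the RPN, it therefore has two outputs: a classification of each RoI over 21 classes, and a bounding-box regression that refines the box.

Network inputs:
base_layers: the output of the VGG base net described above, again with shape (37 * 37 * 512).
input_rois: the RoIs extracted by the RPN.
num_rois: R-CNN and Fast R-CNN extract about 2000 RoIs with Selective Search, but because the RPN's proposals are more targeted, no more than 300 of them are needed. In this Keras code the default is 32; this parameter can be adjusted as needed.
nb_classes: the number of classes in the dataset: 20 foreground classes plus one background class, 21 in total.

Network outputs:
out_class: for each RoI, a vector of 21 class probabilities.
out_regr: for each RoI, 4 box-correction parameters for every foreground class (4 * 20 = 80 values; the background class has no regression target).

Note: the output shape after each layer is annotated in the code. The network definition relies on one very handy component, TimeDistributed: it applies the wrapped layer independently to each slice along the first dimension after the batch axis (here the num_rois dimension), so that dimension is preserved and only the remaining dimensions are transformed.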
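What TimeDistributed does can be mimicked in a few lines of plain Python. This is only a conceptual sketch using nested lists, not Keras code: `dense` and `time_distributed` are hypothetical stand-ins, and the weights are arbitrary toy values. It shows one shared layer applied to every RoI slice while the batch and RoI dimensions stay intact.

```python
def dense(vector, weights, bias):
    """A plain fully connected layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def time_distributed(layer_fn, batch):
    """Apply layer_fn to every slice along axis 1, keeping axes 0 and 1 intact."""
    return [[layer_fn(item) for item in sample] for sample in batch]


# One image, three RoIs, each RoI already flattened to 4 features.
rois = [[[1, 0, 0, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 1]]]

weights = [[1, 1, 1, 1],    # output unit 0
           [1, -1, 0, 0]]   # output unit 1
bias = [0, 0.5]

out = time_distributed(lambda v: dense(v, weights, bias), rois)
print(out)  # [[[1, 1.5], [1, -0.5], [4, 0.5]]]
```

The input has shape (1, 3, 4) and the output has shape (1, 3, 2): the same weights are used for all three RoIs, and only the last dimension changes.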

from keras import backend as K
from keras.layers import Flatten, Dense, Dropout, TimeDistributed
# RoiPoolingConv is the custom RoI pooling layer defined in the repository; import it from there.


def classifier(base_layers, input_rois, num_rois, nb_classes = 21, trainable=False):

    # compile times on theano tend to be very high, so we use smaller ROI pooling regions to workaround

    if K.backend() == 'tensorflow':
        pooling_regions = 7
        input_shape = (num_rois,7,7,512)
    elif K.backend() == 'theano':
        pooling_regions = 7
        input_shape = (num_rois,512,7,7)

    # (batch, num_rois, 7, 7, 512) with the TensorFlow backend
    out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])

    # Each layer below is applied to every RoI independently via TimeDistributed.
    out = TimeDistributed(Flatten(name='flatten'))(out_roi_pool)              # (batch, num_rois, 25088)
    out = TimeDistributed(Dense(4096, activation='relu', name='fc1'))(out)    # (batch, num_rois, 4096)
    out = TimeDistributed(Dropout(0.5))(out)
    out = TimeDistributed(Dense(4096, activation='relu', name='fc2'))(out)    # (batch, num_rois, 4096)
    out = TimeDistributed(Dropout(0.5))(out)

    # Class probabilities: (batch, num_rois, nb_classes)
    out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)
    # note: no regression target for bg class
    # Box corrections: (batch, num_rois, 4 * (nb_classes - 1))
    out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)

    return [out_class, out_regr]

3. Next part: Faster R-CNN Explained in Detail (Based on Keras Code), Part 2

[References]:

https://zhuanlan.zhihu.com/p/31426458
http://geyao1995.com/Faster_rcnn%E4%BB%A3%E7%A0%81%E7%AC%94%E8%AE%B0_test_2_roi_helpers/
https://dongjk.github.io/code/object+detection/keras/2018/05/21/Faster_R-CNN_step_by_step,_Part_I.html
