Faster R-CNN paper notes and code analysis

  • Contents

    • Faster R-CNN paper notes
    • Overview of the Caffe code framework
    • Faster R-CNN code analysis
    • Afterword
  • Faster R-CNN paper notes

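    • Introduction
      The Faster R-CNN paper (Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun) speeds up detection by generating region proposals with a CNN. The gains come mainly from reusing the feature map produced by the earlier convolutions and from predicting many boxes in a single pass: one branch on the feature map generates boxes, and another branch classifies them. At test time in particular, computing proposals costs very little ("By sharing convolutions at test-time, the marginal cost for computing proposals is small (e.g., 10ms per image)").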
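    • Main components

      This figure is taken from the Faster R-CNN paper.
      The main components are:
      1. Input layer, used only during training. Each iteration takes one batch of images from the current epoch according to the configuration and rescales each image so that its shorter side is 600 pixels. The image order is reshuffled after every epoch.
      2. CNN layers. They take the resized image through convolutions and poolings. Padding keeps the size unchanged after each convolution, and each pooling halves it, so the final feature map is proportional to the input image. It is reused by the RPN (Region Proposal Network) and the ROI layer that follow.
      3. RPN (Region Proposal Network) layer. Its input is an n×n sliding window over the feature map (n = 3 in the paper), and its output is a set of boxes with a score for each. With the VGG16 architecture, one sliding window has an effective receptive field of 228 pixels on the input image; combined with anchors, each window position maps to 9 candidate regions. This layer contributes 2 losses and passes its boxes to the ROI layer.
      4. ROI layer. It takes the proposals from the RPN and the feature map from the CNN, extracts the feature map of each proposal, and feeds it to the classifier.
      5. Classification layer. It takes the features from the ROI layer and outputs the classification result. This layer has two losses, one for classification and one for the box.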
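The rescaling in item 1 can be sketched as follows. This is a minimal illustration, not the project's code: the function name `resize_scale` is hypothetical, and the `max_size=1000` cap on the longer side is an assumption that does not appear in the text above.

```python
def resize_scale(height, width, target_size=600, max_size=1000):
    """Scale factor that maps the shorter image side to target_size.

    max_size is an assumed cap on the longer side; it only matters for
    very elongated images.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    scale = float(target_size) / short_side
    # If the longer side would exceed the cap, scale by the longer side instead.
    if round(scale * long_side) > max_size:
        scale = float(max_size) / long_side
    return scale


scale = resize_scale(375, 500)
print(scale, round(375 * scale), round(500 * scale))
```

For a 375x500 image the shorter side is scaled to 600 and the longer side follows proportionally.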
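    • CNN layers. The structure of the convolutional network is as follows:
      Faster R-CNN convolutional layers

      There are 13 convolutional layers, each followed by a ReLU activation, and 4 pooling layers. This is the Caffe prototxt for the CNN part: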

      layer {
        name: "conv1_1"
        type: "Convolution"
        bottom: "data"
        top: "conv1_1"
        param {
          lr_mult: 0
          decay_mult: 0
        }
        param {
          lr_mult: 0
          decay_mult: 0
        }
        convolution_param {
          num_output: 64
          pad: 1
          kernel_size: 3
        }
      }
      layer {
        name: "relu1_1"
        type: "ReLU"
        bottom: "conv1_1"
        top: "conv1_1"
      }
      layer {
        name: "conv1_2"
        type: "Convolution"
        bottom: "conv1_1"
        top: "conv1_2"
        param {
          lr_mult: 0
          decay_mult: 0
        }
        param {
          lr_mult: 0
          decay_mult: 0
        }
        convolution_param {
          num_output: 64
          pad: 1
          kernel_size: 3
        }
      }
      layer {
        name: "relu1_2"
        type: "ReLU"
        bottom: "conv1_2"
        top: "conv1_2"
      }
      layer {
        name: "pool1"
        type: "Pooling"
        bottom: "conv1_2"
        top: "pool1"
        pooling_param {
          pool: MAX
          kernel_size: 2
          stride: 2
        }
      }
      # intermediate layers omitted here #
      layer {
        name: "conv5_3"
        type: "Convolution"
        bottom: "conv5_2"
        top: "conv5_3"
        param {
          lr_mult: 1
        }
        param {
          lr_mult: 2
        }
        convolution_param {
          num_output: 512
          pad: 1
          kernel_size: 3
        }
      }
      layer {
        name: "relu5_3"
        type: "ReLU"
        bottom: "conv5_3"
        top: "conv5_3"
      }
      
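      As shown, every convolution uses a kernel size of 3 and a pad of 1. From the cs231n notes on convolutional layers, the output size is (W - 3 + 2)/1 + 1 = W, so the width and height of a convolution's output equal those of its input. The pooling layers use kernel size = 2 and stride = 2 with max pooling, so each pooling halves the width and height.
      With 4 pooling layers in total, the final convolution outputs 512 channels (VGG16), and the feature map is 1/16 the size of the rescaled input image. The final output of the convolutional layers is 'conv5_3'. It is fed both to the RPN, which computes the boxes, and to the ROI layer, which extracts the corresponding feature maps for classification.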
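The size arithmetic above can be checked with a short sketch. The helper names are hypothetical, and the layer counts per block (2, 2, 3, 3, 3) are the standard VGG16 layout, assumed here because the prototxt above omits the middle layers. Pooling rounds up, as Caffe's pooling layer does, so the result is only approximately 1/16 when a size is not divisible by 16.

```python
def conv_out(size, kernel=3, pad=1, stride=1):
    """Output size of a convolution: (W - F + 2P) / S + 1."""
    return (size - kernel + 2 * pad) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of a pooling layer, rounding up like Caffe's pooling."""
    return -(-(size - kernel) // stride) + 1


# Number of 3x3 convolutions in each VGG16 block (13 in total).
VGG16_BLOCKS = [2, 2, 3, 3, 3]


def feature_map_size(size):
    """Size of conv5_3 along one dimension for an input of the given size."""
    for block_index, num_convs in enumerate(VGG16_BLOCKS):
        for _ in range(num_convs):
            size = conv_out(size)      # unchanged by 3x3, pad 1, stride 1
        if block_index < 4:            # only 4 poolings precede conv5_3
            size = pool_out(size)      # halved by each pooling
    return size


print(feature_map_size(600), feature_map_size(800))
```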
    • Region Proposal Networks(RPN)

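      This is the network in the model that generates the 'boxes'. Its input is an n×n sliding window over the CNN feature map, and its output is the boxes believed to contain objects together with their scores. One sliding window has an effective receptive field of 228x228 pixels. After mapping through the anchors (by default 3 scales, with box areas of 128², 256², and 512² pixels, and 3 aspect ratios of 1:2, 1:1, and 2:1) it becomes 9 boxes. The figure below is taken from the paper and applies to VGG.
      It shows how the anchor boxes adapt to object size and to horizontal or vertical extent. For a typical image the feature map has about 2,400 sliding-window positions, so there are roughly 20k anchors in total ("For a convolutional feature map of a size W × H (typically ~2,400), there are WHk anchors in total."). The anchor design is a key point: the image does not have to be resized to several scales with features recomputed each time, because every anchor's prediction is based on the same features ("The design of multi-scale anchors is a key component for sharing features without extra cost for addressing scales.").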
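The 9 anchors at one sliding-window position can be sketched like this. It is a simplified illustration: `make_anchors` is a hypothetical name, the base cell size of 16 and the scale multipliers (8, 16, 32) are assumptions chosen to give box sides of 128, 256, and 512 pixels, and no rounding to integer coordinates is applied.

```python
import numpy as np


def make_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return a (9, 4) array of (x1, y1, x2, y2) anchors centred on one cell.

    ratio is height / width; base_size * scale is the side of the square
    anchor with the same area.
    """
    center = (base_size - 1) / 2.0
    anchors = []
    for ratio in ratios:
        for scale in scales:
            area = float(base_size * scale) ** 2
            w = np.sqrt(area / ratio)   # keep the area fixed, vary the shape
            h = w * ratio
            anchors.append([center - (w - 1) / 2, center - (h - 1) / 2,
                            center + (w - 1) / 2, center + (h - 1) / 2])
    return np.array(anchors)


anchors = make_anchors()
print(anchors.shape)          # one row per (ratio, scale) pair
print(2400 * len(anchors))    # total anchors for ~2,400 window positions
```

Shifting these 9 boxes to every feature-map position (stride 16 in image coordinates) gives the full anchor set.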

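      The RPN takes a 512xHxW feature map and, after one convolution, splits into 2 branches: one produces the scores of the k boxes (2 cls values per box, the FG and BG scores), and the other produces the boxes themselves (4 bbox values per box describing the rectangle). The network structure is as follows: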

      layer {
        name: "rpn_conv/3x3"
        type: "Convolution"
        bottom: "conv5_3"
        top: "rpn/output"
        param { lr_mult: 1.0 }
        param { lr_mult: 2.0 }
        convolution_param {
          num_output: 512
          kernel_size: 3 pad: 1 stride: 1
          weight_filler { type: "gaussian" std: 0.01 }
          bias_filler { type: "constant" value: 0 }
        }
      }
      layer {
        name: "rpn_relu/3x3"
        type: "ReLU"
        bottom: "rpn/output"
        top: "rpn/output"
      }
      

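      Suppose an original training image has shape (3, h_original, w_original) and each batch holds one image. After resizing it becomes (1, 3, h_resized, w_resized), and after the CNN's convolutions and poolings it becomes (1, 512, h_conv, w_conv), where w_resized/16 = w_conv and h_resized/16 = h_conv. After 'rpn_conv/3x3' (F = 3, P = 1, S = 1) the size is still (1, 512, h_conv, w_conv), but the content has been transformed from the image feature map into the base features for the RPN (one extra transformation that adapts the CNN feature map to the RPN losses). The number of sliding windows equals w_conv×h_conv, and the total number of anchors is w_conv×h_conv×k (k = 9). In other words, a single RPN convolution covers proposal generation for the feature map of the whole image, which is very fast thanks to the GPU's parallelism.
      The output of 'rpn_conv/3x3' is the input of both 'rpn_bbox_pred' and 'rpn_cls_score'. 'rpn_cls_score' outputs shape (1, 18, h_conv, w_conv), where 18 is the 2 scores for each of the 9 anchors. For an input blob of shape (N, C, H, W), N×H×W must equal the number of predictions/labels, so a reshape is needed here (with parameter shape { dim: 0 dim: 2 dim: -1 dim: 0 }); before the cls loss and softmax are computed the shape becomes (1, 2, 9×h_conv, w_conv). See the explanation in softmax_loss_layer.cpp:

      All anchor scores for the image go one way into the loss computation and the other way through softmax to get the FG and BG probabilities. 'rpn_cls_prob' outputs (1, 2, 9*h_conv, w_conv); reshaped back to (1, 18, h_conv, w_conv), it gives the probabilities of the 9 anchors at every window, which are sent to the proposal layer together with the corresponding boxes. The other output path of 'rpn_conv/3x3' feeds 'rpn_bbox_pred', which computes the corresponding boxes (1, 36, h_conv, w_conv); 'rpn_bbox_pred' goes one way into the box loss and the other way into the proposal layer. The proposal layer combines the incoming probabilities and boxes to generate proposals, which are sent to the ROI layer. The overall flow is as follows: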
      RPN network

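      This part is easy to get lost in, especially because the layers involved mix Python layers with losses implemented in C++. Reading it alongside a diagram of the prototxt makes it much easier to follow.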
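The reshape round trip on the score blob can be reproduced with NumPy. This is a sketch under assumptions: `caffe_reshape` is a hypothetical helper that mimics the Reshape layer's `dim: 0` (copy the input dimension) and `dim: -1` (infer) conventions, and the feature-map size 38x50 is just an example.

```python
import numpy as np


def caffe_reshape(blob, dims):
    """Mimic Caffe's Reshape layer: 0 copies the input dim, -1 is inferred."""
    shape = [blob.shape[i] if d == 0 else d for i, d in enumerate(dims)]
    return blob.reshape(shape)


h_conv, w_conv, k = 38, 50, 9
rpn_cls_score = np.random.randn(1, 2 * k, h_conv, w_conv).astype(np.float32)

# (1, 18, h, w) -> (1, 2, 9*h, w): axis 1 now holds the BG/FG pair.
scores = caffe_reshape(rpn_cls_score, (0, 2, -1, 0))

# Softmax over axis 1, i.e. over the two classes of every anchor.
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
prob = exp / exp.sum(axis=1, keepdims=True)

# (1, 2, 9*h, w) -> (1, 18, h, w): back to one channel per anchor score.
rpn_cls_prob = caffe_reshape(prob, (0, 18, -1, 0))
print(scores.shape, rpn_cls_prob.shape)
```

With this layout the first 9 channels hold the background scores and the last 9 the foreground scores of the 9 anchors.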
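    • Loss computation and training
      The RPN loss has two parts, the score loss and the bbox loss. Quoting the paper:
      $L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$
      Here $i$ is the index of an anchor in the mini-batch and $p_i$ is the predicted probability that anchor $i$ is an object; $p_i^* = 1$ if anchor $i$ is positive (matched to a ground-truth box) and 0 otherwise. $t_i$ holds the 4 parameters of the predicted box and $t_i^*$ those of the ground-truth box associated with a positive anchor. $L_{cls}$ is the two-class log loss, and $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$, where $R$ is the robust loss (smooth L1). $\lambda$ balances the two losses and defaults to 10.
      The box regression uses the following parameterization. For the predicted box: $t_x = (x - x_a)/w_a$, $t_y = (y - y_a)/h_a$, $t_w = \log(w/w_a)$, $t_h = \log(h/h_a)$. For the ground-truth box: $t_x^* = (x^* - x_a)/w_a$, $t_y^* = (y^* - y_a)/h_a$, $t_w^* = \log(w^*/w_a)$, $t_h^* = \log(h^*/h_a)$. Here x, y are the coordinates of the box center and w, h its width and height; $x$ refers to the predicted box, $x_a$ to the anchor, and $x^*$ to the ground-truth box (likewise for y, w, h). As the paper puts it, 'This can be thought of as bounding-box regression from an anchor box to a nearby ground-truth box.'
      The bounding-box regression is based on a single shared feature map. The regressors for the different scales and aspect ratios do not share parameters; each one independently regresses its own box. Starting from anchors of different sizes and aspect ratios, the k (9) regressors applied as convolutions each produce a box that approximates the ground truth.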
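The box parameterization and the robust loss can be sketched in NumPy as follows. The function names are hypothetical, boxes are given as (x1, y1, x2, y2), and widths are computed as x2 - x1 without the "+1" pixel convention some implementations use; `smooth_l1` is the standard smooth L1 form with the switch point at 1.

```python
import numpy as np


def to_center(box):
    """Convert (x1, y1, x2, y2) to (center x, center y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def encode(box, anchor):
    """Regression targets (tx, ty, tw, th) of a box relative to an anchor."""
    x, y, w, h = to_center(box)
    xa, ya, wa, ha = to_center(anchor)
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])


def decode(t, anchor):
    """Inverse of encode: apply (tx, ty, tw, th) to an anchor."""
    xa, ya, wa, ha = to_center(anchor)
    x, y = t[0] * wa + xa, t[1] * ha + ya
    w, h = np.exp(t[2]) * wa, np.exp(t[3]) * ha
    return np.array([x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h])


def smooth_l1(diff):
    """Robust loss R: quadratic near zero, linear for large errors."""
    diff = np.abs(diff)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)


anchor = np.array([0.0, 0.0, 15.0, 15.0])
gt_box = np.array([4.0, 2.0, 27.0, 17.0])
targets = encode(gt_box, anchor)
print(targets, decode(targets, anchor))
```

Encoding a box against its anchor and decoding the result recovers the original box, which is what the proposal layer relies on when it turns predicted deltas back into boxes.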

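      Most anchors in an image are negatives, which makes the data unbalanced, and about 20k anchors is also too many. So 128 positive and 128 negative anchors are sampled at random, and if there are fewer than 128 positives the batch is filled with negatives. Both the paper and the code train on one image per iteration, using SGD. Training can either alternate between the RPN and the R-CNN detector over several rounds, or merge them into one large network in which each part computes its own loss. The author's experiments show that training the joint network is roughly 1 to 1.5 times as fast at similar accuracy.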
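The sampling rule can be sketched as below. This is an illustration only: `sample_anchors` is a hypothetical name, and the label convention (1 = positive, 0 = negative, -1 = ignored) is an assumption made for the example.

```python
import numpy as np


def sample_anchors(labels, batch_size=256, fg_fraction=0.5, rng=None):
    """Keep at most batch_size anchors, at most half of them positive.

    labels: 1 = positive, 0 = negative, -1 = ignored.
    Returns a copy in which anchors that were not sampled are set to -1.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    labels = labels.copy()

    # Cap the positives at batch_size * fg_fraction (128 by default).
    max_fg = int(batch_size * fg_fraction)
    fg = np.flatnonzero(labels == 1)
    if len(fg) > max_fg:
        drop = rng.choice(fg, size=len(fg) - max_fg, replace=False)
        labels[drop] = -1

    # Fill the rest of the batch with negatives.
    max_bg = batch_size - int(np.sum(labels == 1))
    bg = np.flatnonzero(labels == 0)
    if len(bg) > max_bg:
        drop = rng.choice(bg, size=len(bg) - max_bg, replace=False)
        labels[drop] = -1
    return labels


labels = np.concatenate([np.ones(30), np.zeros(5000), -np.ones(100)]).astype(int)
sampled = sample_anchors(labels)
print(int(np.sum(sampled == 1)), int(np.sum(sampled == 0)))
```

When only 30 positives exist, the remaining 226 slots are taken by negatives, matching the "fill with negatives" rule in the text.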
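      In addition, anchors that cross the image boundary are dropped during training. Many proposals overlap heavily around the same ground-truth region, so non-maximum suppression (NMS) is applied with an IoU threshold of 0.7, which leaves about 2k proposals per image. The authors note that NMS does not significantly affect accuracy while substantially improving efficiency. The authors then present ablation experiments that show the effect of each design point, for example a comparison of whether the RPN and the R-CNN detector share convolutional layers, or a check of the RPN's contribution by replacing the RPN with SS (Selective Search) in front of ZF/VGG16 and comparing accuracy. This kind of plug-and-play experimental setup is a very good idea, because it verifies what each piece actually contributes, although implementing such ablations is often costly. The paper only gives the ideas and key points; for the real engineering details you still have to read the code.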
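A plain NumPy version of greedy NMS with the 0.7 IoU threshold looks roughly like this. It is a CPU sketch for illustration and not the project's implementation; box areas use the "+1" pixel convention, which is an assumption.

```python
import numpy as np


def nms(boxes, scores, thresh=0.7):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) array.
    Returns the indices of the kept boxes, highest score first.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]      # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the top box with every remaining box.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the kept box too much.
        order = rest[iou <= thresh]
    return keep


boxes = np.array([[0, 0, 99, 99], [5, 5, 104, 104], [200, 200, 299, 299]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))
```

In the example the second box overlaps the first with an IoU above 0.7 and is suppressed, while the distant third box survives.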
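  • Overview of the Caffe code framework

    • Overall structure of Caffe

      To understand the implementation details of Faster R-CNN you need to understand Caffe's structure and how to write your own layers.
      Source layout

      The main directories are:
      • include holds the exposed C++ interfaces and classes
      • python is the Python interface; wrapper Python code plus Boost.Python translate Python calls into C++ calls
      • matlab is the MATLAB interface layer
      • src is Caffe's implementation

        The structure is as follows:
    • Construction of Solver and Net
      Solver is a base class that wraps the training and testing operations Caffe exposes, similar to an optimizer in TensorFlow. Concrete solvers such as SGD and Adam are built on top of it and differ in how they update parameters after back-propagation. Besides constructing the SGDSolver class directly, you can also create one from Python: self.solver = caffe.SGDSolver(solver_prototxt). The common base operations are maintained in the Solver class.
      Let's follow the construction of an SGDSolver to see the structure and operations involved. The SGDSolver constructors are implemented directly in the header; they mainly reset the history, update, and temporary parameter buffers (in PreSolve), and most of the work is done in Solver.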

      template <typename Dtype>
      class SGDSolver : public Solver<Dtype> {
       public:
        explicit SGDSolver(const SolverParameter& param)
            : Solver<Dtype>(param) { PreSolve(); }
        explicit SGDSolver(const string& param_file)
            : Solver<Dtype>(param_file) { PreSolve(); }
        virtual inline const char* type() const { return "SGD"; }
        // ... remaining members omitted ...
      };

      template <typename Dtype>
      void SGDSolver<Dtype>::PreSolve() {
           // Initialize the history
           const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params();
           history_.clear();
           update_.clear();
           temp_.clear();
           for (int i = 0; i < net_params.size(); ++i) {
             const vector<int>& shape = net_params[i]->shape();
             history_.push_back(shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
             update_.push_back(shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
             temp_.push_back(shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
           }
         }
      
      // history maintains the historical momentum data.
      // update maintains update related data and is not needed in snapshots.
      // temp maintains other information that might be needed in computation
      //   of gradients/updates and is not needed in snapshots
      vector<shared_ptr<Blob<Dtype> > > history_, update_, temp_;
      

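      Next, the Solver constructor. By default root_solver is null. void ReadSolverParamsFromTextFileOrDie(const string& param_file, SolverParameter* param) mainly deserializes the proto text file into a SolverParameter object and handles compatibility with older versions. The main code is in Init: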

      Solver<Dtype>::Solver(const string& param_file, const Solver* root_solver)
      : net_(), callbacks_(), root_solver_(root_solver),
        requested_early_exit_(false) {
      SolverParameter param;
       ReadSolverParamsFromTextFileOrDie(param_file, &param);
        Init(param);
      }
      

      Init() performs the necessary initialization and checks, for example of iter_ and current_step_. The two are related by this->current_step_ = this->iter_ / this->param_.stepsize(); stepsize is specified in solver.prototxt and controls when the learning rate changes.

          void Solver<Dtype>::Init(const SolverParameter& param) {
            CHECK(Caffe::root_solver() || root_solver_)
                << "root_solver_ needs to be set for all non-root solvers";
            LOG_IF(INFO, Caffe::root_solver()) << "Initializing solver from parameters: "
              << std::endl << param.DebugString();
            param_ = param;
            CHECK_GE(param_.average_loss(), 1) << "average_loss should be non-negative.";
            CheckSnapshotWritePermissions();
            if (Caffe::root_solver() && param_.random_seed() >= 0) {
              Caffe::set_random_seed(param_.random_seed());
            }
            // Scaffolding code
            InitTrainNet();
            if (Caffe::root_solver()) {
              InitTestNets();
              LOG(INFO) << "Solver scaffolding done.";
            }
            iter_ = 0;
            current_step_ = 0;
          }
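The relation between iter_, stepsize, and the learning rate under Caffe's "step" learning-rate policy can be sketched as follows. The function name is hypothetical and the numeric values are example settings, not values taken from the text.

```python
def step_learning_rate(iteration, base_lr, gamma, stepsize):
    """Learning rate under the "step" policy: base_lr * gamma ^ floor(iter / stepsize)."""
    current_step = iteration // stepsize   # mirrors current_step_ = iter_ / stepsize
    return base_lr * gamma ** current_step


# Example values (assumed): the rate drops by 10x every 50,000 iterations.
for iteration in (0, 49999, 50000, 100000):
    print(iteration, step_learning_rate(iteration, base_lr=0.001, gamma=0.1, stepsize=50000))
```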
      

      Next, the InitTrainNet() function. Pseudocode is used here to highlight the key points and the flow; the following log line also shows where the code goes:

      solver.cpp:81] Creating training net from train_net file: models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt

      void Solver<Dtype>::InitTrainNet() {
        // Check the training parameters: whether a training net is given,
        // whether a training file is specified, and so on.
        // Pseudocode: deserialize the train net file into net_param.
        net_.reset(new Net<Dtype>(net_param));
      }
      

      The important part is the initialization of Net. The extracted pseudocode is as follows:

      // Pseudocode excerpt: declarations such as available_blobs, blob_name_to_idx,
      // num_top, num_param_blobs and param_need_backward are elided.
      void Net<Dtype>::Init(const NetParameter& in_param) {
        // Filter the parameters.
        NetParameter filtered_param;
        FilterNet(in_param, &filtered_param);
        // Create a copy of filtered_param with splits added where necessary.
        NetParameter param;
        InsertSplits(filtered_param, &param);
        memory_used_ = 0;
        // Set the input blobs.
        for (int input_id = 0; input_id < param.input_size(); ++input_id) {
          // Inputs have the fake layer ID -1; tops have layer_id >= 0.
          const int layer_id = -1;
          // AppendTop adds a new input or top blob to the net and fills the key
          // members: blobs_ (the blobs storing intermediate results between the
          // layers), blob_names_, blob_need_backward_, net_input_blob_indices_,
          // net_input_blobs_, and so on.
          AppendTop(param, layer_id, input_id, &available_blobs, &blob_name_to_idx);
        }
        for (int layer_id = 0; layer_id < param.layer_size(); ++layer_id) {
          const LayerParameter& layer_param = param.layer(layer_id);
          // Construct each layer through a class factory: macros register the
          // constructors in a registry. The layer sets up its own blobs_, which
          // the net later references from several places.
          layers_.push_back(LayerRegistry<Dtype>::CreateLayer(layer_param));
          // Figure out this layer's input and output.
          bool need_backward = false;
          for (int bottom_id = 0; bottom_id < layer_param.bottom_size(); ++bottom_id) {
            // Build each input blob; bottom_vecs_ and blobs_ share the Blob
            // objects through pointers.
            const int blob_id = AppendBottom(param, layer_id, bottom_id,
                                             &available_blobs, &blob_name_to_idx);
            // If a blob needs backward, this layer should provide it.
            need_backward |= blob_need_backward_[blob_id];
          }
          // Set each layer's outputs; top_vecs_ and blobs_ share the Blob
          // objects through pointers.
          for (int top_id = 0; top_id < num_top; ++top_id) {
            AppendTop(param, layer_id, top_id, &available_blobs, &blob_name_to_idx);
          }
          // If the layer sets AutoTopBlobs(), create the automatic top blobs.
          // Call the layer's own initialization.
          layers_[layer_id]->SetUp(bottom_vecs_[layer_id], top_vecs_[layer_id]);
          // Set the back-propagation flag of each parameter depending on whether
          // it has a learning rate, and register the layer's parameters.
          for (int param_id = 0; param_id < num_param_blobs; ++param_id) {
            layers_[layer_id]->set_param_propagate_down(param_id, param_need_backward);
            AppendParam(param, layer_id, param_id);
          }
        }
        // Handle force_backward if needed.
        for (int layer_id = layers_.size() - 1; layer_id >= 0; --layer_id) {
          // Pseudocode: set the layer_contributes_loss flag
          // and layer_need_backward_.
        }
        // In the end, all remaining blobs are considered output blobs.
        for (set<string>::iterator it = available_blobs.begin();
             it != available_blobs.end(); ++it) {
          net_output_blobs_.push_back(blobs_[blob_name_to_idx[*it]].get());
          net_output_blob_indices_.push_back(blob_name_to_idx[*it]);
        }
        LOG_IF(INFO, Caffe::root_solver()) << "Network initialization done.";
      }
      

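      This completes the initialization chain solver -> net -> layer. How each layer's custom implementation (convolution, pooling, custom layers) is hooked into the framework is analyzed later. The whole process is illustrated below:
      SGDSolver construction
    • One training step
      Once the network is constructed, training can begin. The usual training procedure is: read a batch of data -> forward propagation -> compute the loss against the ground truth -> back-propagate the partial derivatives to every trainable layer and update the parameters according to the training strategy.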

        while (cur < max_repeat){
          data, result_group_true = read_data()
          result_calc = front_propagation(data);
          loss = calc_loss(result_calc, result_group_true);
          dws = compute_partial_derivative_4w(loss)
          update_w_by_strategy()
        }
      

      Caffe wraps one training iteration as a step, and SGDSolver calls Solver's Step directly. With the key parts extracted, the code is as follows:

      void Solver<Dtype>::Step(int iters) {
          end_iter  = cur + iters
          while (cur < end_iter){
              clear_up()
              insert_test_if_need()
              hookup_before()
              Dtype loss = 0;
              for (int i = 0; i < param_.iter_size(); ++i) {
                  loss += net_->ForwardBackward(bottom_vec);
              }
              loss /= param_.iter_size();
              // average the loss across iterations for smoothed reporting: if average_loss is n, the losses_ container stores the previous n loss values and smoothed_loss_ is their average
              UpdateSmoothedLoss(loss, start_iter, average_loss);
              hookup_after()
              ApplyUpdate();
              take_snapshot_if_necessary()
          }
      }
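The two kinds of averaging in Step, over iter_size passes and over the last average_loss iterations, can be sketched like this. It is a simplified stand-in for the reporting logic, with hypothetical names; Caffe's own bookkeeping may differ in detail.

```python
from collections import deque


class SmoothedLoss:
    """Moving average over the most recent `average_loss` iteration losses."""

    def __init__(self, average_loss):
        self.window = deque(maxlen=average_loss)

    def update(self, loss):
        self.window.append(loss)   # the oldest value is dropped automatically
        return sum(self.window) / len(self.window)


def iteration_loss(pass_losses):
    """Average the loss over the iter_size forward/backward passes of one step."""
    return sum(pass_losses) / len(pass_losses)


smoother = SmoothedLoss(average_loss=3)
for pass_losses in ([1.0, 1.0], [2.5, 1.5], [3.0, 3.0], [4.0, 4.0]):
    loss = iteration_loss(pass_losses)
    print(loss, smoother.update(loss))
```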
      

    Clearly the key pieces are net_'s ForwardBackward(const vector<Blob<Dtype>* > & bottom) and ApplyUpdate().
    First, Net's ForwardBackward(const vector<Blob<Dtype>* > & bottom); the code is very simple:

     Dtype ForwardBackward(const vector<Blob<Dtype>* > & bottom) {
        Dtype loss;
        Forward(bottom, &loss);
        Backward();
        return loss;
     }
    

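    One thing looks odd here: the vector<Blob<Dtype>*> bottom_vec declared in Step(int iters) is passed straight into forward propagation without being filled, so following the code, empty data is copied into the network's input blobs 'net_input_blobs_'. Taking the Faster R-CNN training network as an example, the network contains its own data input layer (which wraps reading the dataset, shuffling, and so on), and in all the test cases ForwardBackward() is called without any extra initialization either.
    In effect nothing is put into net_input_blobs_.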

    const vector<Blob<Dtype>*>& Net<Dtype>::Forward(
    const vector<Blob<Dtype>*> & bottom, Dtype* loss) {
      // Copy bottom to internal bottom
      for (int i = 0; i < bottom.size(); ++i) {
          net_input_blobs_[i]->CopyFrom(*bottom[i]);
      }
      return ForwardPrefilled(loss);
    } 
    

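    ForwardPrefilled(Dtype* loss) calls ForwardFromTo(int start, int end). Here the whole network is propagated forward, so the call is *loss = ForwardFromTo(0, layers_.size() - 1);. With the redundant checks and debug output removed, the code is very compact: this is where the layers are chained in order for the forward pass and their losses are summed, so each layer only has to implement its own Forward function.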

    Dtype Net<Dtype>::ForwardFromTo(int start, int end) {
      Dtype loss = 0;
      for (int i = start; i <= end; ++i) {
        // LOG(ERROR) << "Forwarding " << layer_names_[i];
        Dtype layer_loss = layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
        loss += layer_loss;
      }
      return loss;
    }
    

    After Forward(bottom, &loss); completes, back-propagation follows with Backward(). Apart from printing debug information, Backward() just calls BackwardFromTo(layers_.size() - 1, 0);

    void Net<Dtype>::BackwardFromTo(int start, int end) {
      for (int i = start; i >= end; --i) {
          if (layer_need_backward_[i]) {
            layers_[i]->Backward(top_vecs_[i], bottom_need_backward_[i], bottom_vecs_[i]);
           }
      }
    }
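The forward and backward sweeps can be mimicked with a toy network. This is a deliberately tiny sketch with hypothetical classes (`Scale`, `SquaredLoss`); unlike Caffe, gradients are passed as return values instead of being stored in blob diffs.

```python
class Scale:
    """Layer y = w * x with one learnable weight; contributes no loss."""

    def __init__(self, w):
        self.w = w
        self.dw = 0.0

    def forward(self, x):
        self.x = x
        return x * self.w, 0.0

    def backward(self, dy):
        self.dw = dy * self.x      # gradient with respect to the weight
        return dy * self.w         # gradient with respect to the input


class SquaredLoss:
    """Loss layer 0.5 * (x - target)^2."""

    def __init__(self, target):
        self.target = target

    def forward(self, x):
        self.x = x
        loss = 0.5 * (x - self.target) ** 2
        return loss, loss

    def backward(self, dy):
        return dy * (self.x - self.target)


def forward_from_to(layers, x):
    """Run the layers in order and sum the losses they report."""
    loss = 0.0
    for layer in layers:
        x, layer_loss = layer.forward(x)
        loss += layer_loss
    return loss


def backward_from_to(layers):
    """Propagate the error gradient from the last layer back to the first."""
    grad = 1.0
    for layer in reversed(layers):
        grad = layer.backward(grad)


layers = [Scale(2.0), Scale(3.0), SquaredLoss(5.0)]
loss = forward_from_to(layers, 1.0)
backward_from_to(layers)
print(loss, layers[0].dw, layers[1].dw)
```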
    

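    Every layer implements this function prototype. When you write a custom Caffe layer you implement its Backward function, which takes the loss partial derivatives (error gradients) from above and computes the partial derivatives with respect to this layer's inputs. propagate_down indicates whether the loss gradient should be computed for the corresponding 'bottom'. The prototype is as follows: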

    /**
     * @brief Given the top blob error gradients, compute the bottom blob error
     *        gradients.
     *
     * @param top
     *     the output blobs, whose diff fields store the gradient of the error
     *     with respect to themselves
     * @param propagate_down
     *     a vector with equal length to bottom, with each index indicating
     *     whether to propagate the error gradients down to the bottom blob at
     *     the corresponding index
     * @param bottom
     *     the input blobs, whose diff fields will store the gradient of the error
     *     with respect to themselves after Backward is run
     *
     * The Backward wrapper calls the relevant device wrapper function
     * (Backward_cpu or Backward_gpu) to compute the bottom blob diffs given the
     * top blob diffs.
     *
     * Your layer should implement Backward_cpu and (optionally) Backward_gpu.
     */
    inline void Backward(const   vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);
    

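    After this backward sweep, bottom_vecs_ holds the gradient information. One point worth noting: net_ contains everything (gradients, parameters, intermediate inputs and outputs), and bottom_vecs_ points to some of the blobs in blobs_.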

     /// @brief the blobs storing intermediate results between the layer. 
    vector<shared_ptr<Blob<Dtype> > > blobs_;
    
      /// bottom_vecs stores the vectors containing the input for each layer.
     /// They don't actually host the blobs (blobs_ does), so we simply store
     /// pointers.
     vector<vector<Blob<Dtype>*> > bottom_vecs_;    
     bottom_vecs_[layer_id].push_back(blobs_[blob_id].get());
    

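    This completes one forward pass to compute the loss and one backward pass to compute the error gradients. What remains is how the parameters are updated, taking plain SGD as an example: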

    void SGDSolver<Dtype>::ApplyUpdate() {
        Dtype rate = GetLearningRate();
        ClipGradients();
        for (int param_id = 0; param_id < this->net_->learnable_params().size();
       ++param_id) {
            Normalize(param_id);
            Regularize(param_id);
            ComputeUpdateValue(param_id, rate);
        }
        this->net_->Update();
    }
    

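    For what clip gradient means here, see the discussion "What does clip gradient mean in Caffe?"; roughly, it limits the size of the gradient, and it does not affect the main flow.
    Every learnable parameter goes through Normalize and Regularize, and is then updated. Earlier, during Init, each layer called AppendParam(net_param, layer_id, param_id); to set up the mapping: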

    params_.push_back(layers_[layer_id]->blobs()[param_id]);
    if (xx condition){
        ...
        const int learnable_param_id = learnable_params_.size();
        learnable_params_.push_back(params_[net_param_id].get());
        ...
    }
    

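    The parameter update is an axpy operation on the learnable blobs. In CPU mode it generally calls the BLAS routine cblas_daxpy(N, alpha, X, 1, Y, 1), and in GPU mode cublasSaxpy(Caffe::cublas_handle(), N, &alpha, X, 1, Y, 1). The operation is data = alpha*diff + data, which completes the parameter update:
    Blob parameter update based on the error gradient

    At this point one iteration, forward -> loss & backward -> update, should be broadly clear.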
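The update path Regularize -> ComputeUpdateValue -> Update can be sketched in NumPy as follows. This is a simplified view of SGD with momentum and L2 weight decay; the function name and the default hyperparameters are assumptions, and gradient clipping and normalization are left out.

```python
import numpy as np


def sgd_update(data, diff, history, lr, momentum=0.9, weight_decay=0.0):
    """One SGD-with-momentum update of a parameter blob, in place.

    data: parameter values; diff: error gradient; history: momentum buffer.
    """
    diff = diff + weight_decay * data                  # Regularize (L2)
    history[...] = momentum * history + lr * diff      # ComputeUpdateValue
    data[...] = -1.0 * history + data                  # axpy: data = alpha*diff + data
    return data


data = np.array([1.0, 2.0])
diff = np.array([0.5, -0.5])
history = np.zeros_like(data)
print(sgd_update(data, diff, history, lr=0.1))
print(sgd_update(data, diff, history, lr=0.1))
```

The second call moves further than the first because the momentum buffer carries over part of the previous update.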

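    • Customizing your own layers in Caffe
      • Plugging in a C++ layer
        As mentioned in the discussion of Solver Init, layers are instantiated through a class factory in which each layer registers a function pointer to its constructor. During net initialization a single line, layers_.push_back(LayerRegistry<Dtype>::CreateLayer(layer_param));, does the job.
        A quick look at the structure of LayerRegistry: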
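      template <typename Dtype>
      class LayerRegistry {
       public:
        // Function pointer type for a layer creator.
        typedef shared_ptr<Layer<Dtype> > (*Creator)(const LayerParameter&);
        typedef std::map<string, Creator> CreatorRegistry;
      
        static CreatorRegistry& Registry() {
          // Global map from a type name to the function pointer that builds the layer.
          static CreatorRegistry* g_registry_ = new CreatorRegistry();
          return *g_registry_;
        }
      
        // Adds a creator, i.e. registers a layer type.
        static void AddCreator(const string& type, Creator creator) {
          CreatorRegistry& registry = Registry();
          // check that the type is not registered yet ...
          registry[type] = creator;
        }
      
        // Get a layer using a LayerParameter, i.e. construct a new layer object.
        static shared_ptr<Layer<Dtype> > CreateLayer(const LayerParameter& param) {
          const string& type = param.type();
          CreatorRegistry& registry = Registry();
          // routine checks ...
          return registry[type](param);
        }
      
       private:
        // Private constructor: the registry is never instantiated, everything is static.
        LayerRegistry() {}
      };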
      
      LayerRegistry holds the registered entries, and entries are added through LayerRegisterer. The code is as follows:
       class LayerRegisterer {
       public:
         LayerRegisterer(const string& type,
                    shared_ptr<Layer<Dtype> > (*creator)(const LayerParameter&)) {
             LayerRegistry<Dtype>::AddCreator(type, creator);
        }
      };
      #define REGISTER_LAYER_CREATOR(type, creator)                                  \
      static LayerRegisterer<float>     g_creator_f_##type(#type, creator<float>);     \
      static LayerRegisterer<double>   g_creator_d_##type(#type, creator<double>)    \
      
      #define REGISTER_LAYER_CLASS(type)                                             \
      template <typename Dtype>                                                    \
      shared_ptr<Layer<Dtype> >   Creator_##type##Layer(const LayerParameter& param) \
      {                                                                            \
        return shared_ptr<Layer<Dtype> >(new type##Layer<Dtype>(param));           \
       }                                                                              \
      REGISTER_LAYER_CREATOR(type,   Creator_##type##Layer)
      

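      Once the LayerRegisterer constructor runs, the creator is placed in the LayerRegistry factory and objects can be instantiated later. Caffe uses macros to generate this code and thereby adds custom layers to the framework; see the comments in layer_factory.hpp.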
      layer_factory.hpp

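      In other words, adding the REGISTER_LAYER_CLASS macro to the layer's .cpp file is all that is needed. Nginx builds that pull in custom plug-ins, where a plug-in declares which processing phases it covers, control the compiled code through a similar macro technique.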
      roi_pooling_layer.cp

      REGISTER_LAYER_CLASS(ROIPooling); expands to:
      template <typename Dtype>                                                      
      shared_ptr<Layer<Dtype> > Creator_ROIPoolingLayer(const LayerParameter& param) 
      {                                                                            
          return shared_ptr<Layer<Dtype> >(new ROIPoolingLayer<Dtype>(param));           
      }                 
      // These definitions invoke the LayerRegisterer constructor, which adds the creator to LayerRegistry; one registerer is created for float and one for double.
      static LayerRegisterer<float> g_creator_f_ROIPooling("ROIPooling", Creator_ROIPoolingLayer<float>);
      static LayerRegisterer<double> g_creator_d_ROIPooling("ROIPooling", Creator_ROIPoolingLayer<double>);
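The same registry idea can be expressed compactly in Python, where a decorator plays the role of the REGISTER_LAYER_CLASS macro. All names here (`LAYER_REGISTRY`, `register_layer`, `create_layer`) are hypothetical and only illustrate the pattern; a plain dict stands in for LayerParameter.

```python
LAYER_REGISTRY = {}


def register_layer(type_name):
    """Decorator that records a layer class under its type name."""
    def decorator(cls):
        if type_name in LAYER_REGISTRY:
            raise KeyError("layer type %s already registered" % type_name)
        LAYER_REGISTRY[type_name] = cls
        return cls
    return decorator


@register_layer("ROIPooling")
class ROIPoolingLayer:
    def __init__(self, param):
        self.param = param


def create_layer(param):
    """Look up the creator by the 'type' field and build the layer."""
    return LAYER_REGISTRY[param["type"]](param)


layer = create_layer({"type": "ROIPooling", "pooled_w": 7, "pooled_h": 7})
print(type(layer).__name__)
```

As in the C++ version, registration happens as a side effect of loading the file that defines the layer, so the code that builds the net never names concrete layer classes.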
      
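      • Custom Python layers. Caffe natively has a layer type 'Python', which makes it convenient for Python programmers to write their own layers. It is implemented in PythonLayer, on top of Boost.Python. First, here is the definition of a simple Python layer in Faster R-CNN: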
      layer {
            name: 'input-data'
            # layer type
            type: 'Python'
            top: 'data'
            top: 'im_info'
            top: 'gt_boxes'
            python_param {
              # Python module (file)
              module: 'roi_data_layer.layer'
              # class inside the module
              layer: 'RoIDataLayer'
              # parameters passed to the Python code
              param_str: "'num_classes': 21"
           }
      }
      
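      That is the definition of a simple Python layer. Leaving the specific meaning aside, look at the interface first: like a C++ layer, it must implement forward, backward, setup, and reshape.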
      class RoIDataLayer(caffe.Layer):
          def setup(self, bottom, top):
              """Setup the RoIDataLayer."""
              layer_params = yaml.load(self.param_str_)
              # Parameters defined in the prototxt are passed into the code here.
              self._num_classes = layer_params['num_classes']
              ...

          def forward(self, bottom, top):
              """Get blobs and copy them into this layer's top blob vector."""
              blobs = self._get_next_minibatch()

              for blob_name, blob in blobs.iteritems():
                  top_ind = self._name_to_top_map[blob_name]
                  # Reshape net's input blobs
                  top[top_ind].reshape(*(blob.shape))
                  # Copy data into net's input blobs
                  top[top_ind].data[...] = blob.astype(np.float32, copy=False)

          def backward(self, top, propagate_down, bottom):
              """This layer does not propagate gradients."""
              pass

          def reshape(self, bottom, top):
              """Reshaping happens during the call to forward."""
              pass
      
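      Of course, a Python layer runs only on the CPU and cannot use the GPU efficiently, so in practice a suitable trade-off is needed.
  • Faster R-CNN code analysis

    • Training
      With the convolutional layers collapsed, the training network has the following structure:
      The total loss has 4 parts. The RPN part corresponds to the paper's $L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$. Besides the built-in convolution, pooling, and ReLU layers, there are custom Python layers and C++ layers.
      Data starts from the input layer (implemented in Python), which reads a batch of images from the dataset. After the convolutions produce the feature map, one path goes to the RPN and one to the ROI layer (a custom C++ implementation). The ROI layer uses the proposals sent by the RPN to extract each proposal's feature map for classification, yielding the classification loss and the loss of the second bbox regression.
    • Code analysis with Python as the entry point
      Faster R-CNN can be trained either in alternating stages or as one unified network. Because the two reach similar accuracy and the latter is roughly 1 to 1.5 times as fast, this article analyzes the unified network throughout. The training and testing procedure was described in the earlier post on training and detection with Faster R-CNN based on Python + Caffe. First, let's see how training enters Caffe's internals. The training entry point is the faster_rcnn_end2end.sh script, whose main code is: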
      time ./tools/train_net.py --gpu ${GPU_ID} \
      --solver models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt \
      --weights data/imagenet_models/${NET}.v2.caffemodel \
      --imdb ${TRAIN_IMDB} \
      --iters ${ITERS} \
      --cfg experiments/cfgs/faster_rcnn_end2end.yml \
      ${EXTRA_ARGS}
      
      time ./tools/test_net.py --gpu ${GPU_ID} \
      --def models/${PT_DIR}/${NET}/faster_rcnn_end2end/test.prototxt \
      --net ${NET_FINAL} \
      --imdb ${TEST_IMDB} \
      --cfg experiments/cfgs/faster_rcnn_end2end.yml \
      ${EXTRA_ARGS}
      
      The training entry point is train_net.py and the test entry point is test_net.py. The important logic extracted from train_net.py is:
      import caffe
      self.solver = caffe.SGDSolver(solver_prototxt)
      while self.solver.iter < max_iters:
              # Make one SGD update
              self.solver.step(1)
              take_snapshot_if_necessary()
      return model_paths
      
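      The SGDSolver initialization and the Step flow were covered earlier. The line import caffe already brings in everything needed, yet there is no caffe.py file anywhere under Caffe's python directory. In fact import can load not only a .py file but also a directory, as long as the directory contains __init__.py (I would not have learned this Python feature without studying Caffe; see what-is-init-py-for).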
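The package mechanism can be tried in isolation. The sketch below builds a throwaway package in a temporary directory and imports it; the names `demo_pkg`, `_impl`, and `step` are made up for the illustration and have nothing to do with Caffe's real files.

```python
import importlib
import os
import sys
import tempfile


def import_demo_package():
    """Create a directory with __init__.py and import it as a package."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg")
    os.makedirs(pkg_dir)
    # An implementation module inside the package.
    with open(os.path.join(pkg_dir, "_impl.py"), "w") as f:
        f.write("def step(n):\n    return n + 1\n")
    # __init__.py re-exports names, like python/caffe/__init__.py does.
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("from ._impl import step\n")
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        return importlib.import_module("demo_pkg")
    finally:
        sys.path.remove(root)


pkg = import_demo_package()
print(pkg.step(1), os.path.basename(pkg.__file__))
```

Importing the directory executes its __init__.py, which is why `import caffe` can expose SGDSolver without any caffe.py file.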
      Directory structure of python/caffe
      python/caffe
      Here is __init__.py:
      from .pycaffe import Net, SGDSolver,   NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
      from ._caffe import set_mode_cpu, set_mode_gpu, set_device, Layer, get_solver, layer_type_list, set_random_seed
      from ._caffe import __version__
      from .proto.caffe_pb2 import TRAIN, TEST
      from .classifier import Classifier
      from .detector import Detector
      from . import io
      from .net_spec import layers, params,     NetSpec, to_proto
      
      As shown, SGDSolver comes from pycaffe, which in turn imports it from _caffe:
      from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
          RMSPropSolver, AdaDeltaSolver, AdamSolver
      
      _caffe.so is compiled from _caffe.cpp. The code in _caffe.cpp uses Boost.Python to build a Python module that maps Python functions and classes to C++ functions and classes. The key parts are:
      namespace bp = boost::python;
      // Selecting mode.
      void set_mode_gpu() { Caffe::set_mode(Caffe::GPU); }
      // The module is named _caffe, so it compiles to the Python module _caffe.so.
      BOOST_PYTHON_MODULE(_caffe) {
        // Attribute of the imported caffe module.
        bp::scope().attr("__version__") = AS_STRING(CAFFE_VERSION);
        // Function mapping.
        bp::def("set_mode_gpu", &set_mode_gpu);
        // Class mapping; bp::no_init means Solver itself cannot be constructed from Python.
        bp::class_<Solver<Dtype>, shared_ptr<Solver<Dtype> >, boost::noncopyable>(
        "Solver", bp::no_init)
          // Property mapping.
          .add_property("net", &Solver<Dtype>::net)
          .add_property("test_nets", bp::make_function(&Solver<Dtype>::test_nets,
            bp::return_internal_reference<>()))
          .add_property("iter", &Solver<Dtype>::iter)
          .def("solve", static_cast<void (Solver<Dtype>::*)(const char*)>(
            &Solver<Dtype>::Solve), SolveOverloads())
          // The key function.
          .def("step", &Solver<Dtype>::Step)
          .def("restore", &Solver<Dtype>::Restore)
          .def("snapshot", &Solver<Dtype>::Snapshot);
        // SGDSolver inherits from Solver and needs a constructor taking a string:
        // explicit SGDSolver(const string& param_file) : Solver<Dtype>(param_file) { PreSolve(); }
        bp::class_<SGDSolver<Dtype>, bp::bases<Solver<Dtype> >,
          shared_ptr<SGDSolver<Dtype> >, boost::noncopyable>(
            "SGDSolver", bp::init<string>());
      }
      
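      This completes the chain from Python down to C++.
      For how to use Boost.Python, see boost_python_tutorial.
    • The Python layers
      • input-data layer

        This layer reads the data and preprocesses it. Its outputs are: the image content (index: 0, name: 'data'); the image width and height and the scale factor (index: 1, name: 'im_info'); and the labels and ground-truth boxes (index: 2, name: 'gt_boxes'), as shown in the figure.
        input-data outputs
        data, im_info, and gt_boxes are sent to the 'rpn-data' layer, which feeds the score loss; data is sent to the convolutional layers; gt_boxes is sent to the 'roi-data' layer (which combines it with the proposals to output ROIs); and im_info is sent to the 'proposal' layer, which generates the proposals.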
      layer {
        name: 'input-data'
        type: 'Python'
        top: 'data'
        top: 'im_info'
        top: 'gt_boxes'
        python_param {
        module: 'roi_data_layer.layer'
        layer: 'RoIDataLayer'
        param_str: "'num_classes': N"
        }  
      }
      
      The code is in roi_data_layer/layer.py:
      def forward(self, bottom, top):
          """Get blobs and copy them into this layer's top blob vector."""
          # Get the blob data as key-value pairs; the name determines which top each blob is written to.
          blobs = self._get_next_minibatch()
      
          for blob_name, blob in blobs.iteritems():
              top_ind = self._name_to_top_map[blob_name]
              # Reshape net's input blobs
              top[top_ind].reshape(*(blob.shape))
              # Copy data into net's input blobs
              top[top_ind].data[...] = blob.astype(np.float32, copy=False)
      
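      In _get_next_minibatch, USE_PREFETCH is off by default; the author found it of little use ('So far I haven't found this useful; likely more engineering work is required'). The code checks whether the current batch starts a new epoch and, if so, shuffles; for better performance the shuffle groups landscape and portrait images. What it gets are roidb entries, and get_minibatch in minibatch.py turns them into the complete data. One point needs attention: the effective configuration is a merge of config.py and the experiments/cfgs/faster_rcnn_end2end.yml file given in the script, so check the log for the values actually in effect (for example 'IMS_PER_BATCH': 1).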
          def _get_next_minibatch_inds(self):
          """Return the roidb indices for the next minibatch."""
          if self._cur + cfg.TRAIN.IMS_PER_BATCH >= len(self._roidb):
              self._shuffle_roidb_inds()
          #_perm保存的是排序的索引
          db_inds = self._perm[self._cur:self._cur + cfg.TRAIN.IMS_PER_BATCH]
          self._cur += cfg.TRAIN.IMS_PER_BATCH
          return db_inds
      
      def _get_next_minibatch(self):
          """Return the blobs to be used for the next minibatch.
      
          If cfg.TRAIN.USE_PREFETCH is True, then blobs will be computed in a
          separate process and made available through self._blob_queue.
          """
          if cfg.TRAIN.USE_PREFETCH:
              return self._blob_queue.get()
          else:
              # roidb indices for this minibatch
              db_inds = self._get_next_minibatch_inds()
              # the matching roidb records
              minibatch_db = [self._roidb[i] for i in db_inds]
              # turn the records into image data, boxes, labels, image size and scale
              return get_minibatch(minibatch_db, self._num_classes)
          
      def _shuffle_roidb_inds(self):
          """Randomly permute the training roidb."""
          # Make minibatches from images that have similar aspect ratios (i.e. both
          # tall and thin or both short and wide) in order to avoid wasting
          # computation on zero-padding.
          if cfg.TRAIN.ASPECT_GROUPING:
              widths = np.array([r['width'] for r in self._roidb])
              heights = np.array([r['height'] for r in self._roidb])
              horz = (widths >= heights)
              vert = np.logical_not(horz)
              # landscape images
              horz_inds = np.where(horz)[0]
              # portrait images
              vert_inds = np.where(vert)[0]
              inds = np.hstack((
                  np.random.permutation(horz_inds),
                  np.random.permutation(vert_inds)))
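              # pair them up; in almost every pair both images share an orientation
              inds = np.reshape(inds, (-1, 2))
              row_perm = np.random.permutation(np.arange(inds.shape[0]))
              # shuffle the pairs as units, then flatten so neighbours share an orientation; pairs of two presumably because the default is __C.TRAIN.IMS_PER_BATCH = 2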
              inds = np.reshape(inds[row_perm, :], (-1,))
              self._perm = inds
          else:
              self._perm = np.random.permutation(np.arange(len(self._roidb)))
          self._cur = 0
      
      Some logged values that help when following the code:
          horz = [ True  True  True ...,  True  True  True], vert = [False False False ..., False False False]
          horz_inds = [     0      1      2 ..., 186205 186206 186207], vert_inds  = [     6     43     65 ..., 186176 186186 186194]
          inds = [163257  59770  49424 ...,  56475  31817 126653]
          inds = [[163257  59770]
           [ 49424  41168]
           [156295   1803]
           ...,
           [ 99367  20315]
           [142904  56475]
           [ 31817 126653]]
          row_perm  = [77629 51661 58201 ..., 91810 47169 48787]
          inds = [118195 143322 121405 ...,  19415  18933  26468]
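
To make the grouping easier to see, here is a minimal standalone sketch of the same idea on six made-up images. The function name `aspect_grouped_permutation`, the image sizes and the seed are all assumptions for illustration; the real layer works on `self._roidb` and stores the result in `self._perm`.

```python
import numpy as np

def aspect_grouped_permutation(widths, heights, rng):
    """Shuffled index order in which consecutive pairs mostly share an orientation."""
    widths = np.asarray(widths)
    heights = np.asarray(heights)
    horz = widths >= heights
    horz_inds = np.where(horz)[0]
    vert_inds = np.where(~horz)[0]
    # Shuffle inside each orientation: landscape images first, then portrait.
    inds = np.hstack((rng.permutation(horz_inds), rng.permutation(vert_inds)))
    # Pair up neighbours; this reshape needs an even number of images.
    pairs = inds.reshape(-1, 2)
    # Shuffle whole pairs, then flatten back into a 1-D order.
    return pairs[rng.permutation(pairs.shape[0])].reshape(-1)

rng = np.random.default_rng(0)  # hypothetical seed
# Four landscape and two portrait images (made-up sizes).
widths = [640, 500, 480, 333, 800, 1024]
heights = [480, 375, 640, 500, 600, 768]
perm = aspect_grouped_permutation(widths, heights, rng)
print(perm.reshape(-1, 2))
```

With an even count in each orientation every pair is pure; with odd counts exactly one pair straddles the landscape/portrait boundary, which is why the source comment only says "almost every pair".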
      
      This returns the indices of one batch of roidb records; the records are looked up in _roidb and get_minibatch reads them. An abridged version follows:
      def get_minibatch(roidb, num_classes):
              """Given a roidb, construct a minibatch sampled from it."""
              num_images = len(roidb)
              # Sample random scales to use for each image in this batch
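              # SCALES really holds a single value, 600; it is written this way to support several scales
              random_scale_inds = npr.randint(0, high=len(cfg.TRAIN.SCALES),
                                              size=num_images)
              # num_images is IMS_PER_BATCH (set to 1 in the yml); BATCH_SIZE is the number of RoIs per minibatch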
              rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
              fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)
      
              # Get the input image blob, formatted for caffe
              # pass in the roidb records and the scale indices
              im_blob, im_scales = _get_image_blob(roidb, random_scale_inds)
              # data blob layout: batch index : C : H : W
              blobs = {'data': im_blob}
              # Faster R-CNN is mainly about using the RPN
              if cfg.TRAIN.HAS_RPN:
                  gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
                  gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
                  # gt boxes times the scale factor = boxes in the resized input's coordinates
                  gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0]
                  # store the class alongside each box
                  gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
                  blobs['gt_boxes'] = gt_boxes
                  # 'im_info' = (H, W, im_scale)
                  blobs['im_info'] = np.array(
                      [[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],
                      dtype=np.float32)
              return blobs
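
As a small worked example of the HAS_RPN branch, the sketch below builds `gt_boxes` and `im_info` for one made-up image. The helper name `build_gt_blobs`, the box coordinates, the scale of 1.5 and the blob size are assumptions; it also assumes class 0 means background, which is what the `!= 0` filter suggests.

```python
import numpy as np

def build_gt_blobs(boxes, gt_classes, im_scale, blob_hw):
    """Scale ground-truth boxes into the resized image and attach their classes."""
    boxes = np.asarray(boxes, dtype=np.float32)
    gt_classes = np.asarray(gt_classes)
    # Keep only real objects (assumption: class 0 is background).
    keep = np.where(gt_classes != 0)[0]
    gt_boxes = np.empty((len(keep), 5), dtype=np.float32)
    # Columns 0..3: (x1, y1, x2, y2) in the coordinates of the resized image.
    gt_boxes[:, 0:4] = boxes[keep] * im_scale
    # Column 4: the class label.
    gt_boxes[:, 4] = gt_classes[keep]
    # im_info is a single row: (blob height, blob width, scale).
    im_info = np.array([[blob_hw[0], blob_hw[1], im_scale]], dtype=np.float32)
    return gt_boxes, im_info

gt_boxes, im_info = build_gt_blobs(
    boxes=[[10, 20, 110, 220], [0, 0, 50, 50]],  # made-up boxes
    gt_classes=[3, 0],                           # the second entry is dropped
    im_scale=1.5,
    blob_hw=(600, 900),
)
print(gt_boxes)
print(im_info)
```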
      
      _get_image_blob, also in minibatch.py, handles the scaling and converts the image data read by OpenCV's imread into a blob:
          def _get_image_blob(roidb, scale_inds):
              """Builds an input blob from the images in the roidb at the specified
              scales.
              """
              num_images = len(roidb)
              processed_ims = []
              im_scales = []
              for i in xrange(num_images):
                  im = cv2.imread(roidb[i]['image'])
                  #target_size = 600
                  target_size = cfg.TRAIN.SCALES[scale_inds[i]]
                  # resize; returns the image and the scale factor
                  im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, target_size,
                                                  cfg.TRAIN.MAX_SIZE)
                  im_scales.append(im_scale)
                  processed_ims.append(im)
      
              # Create a blob to hold the input images
              # convert the list of images into a single blob
              blob = im_list_to_blob(processed_ims)
              return blob, im_scales
      
      prep_im_for_blob and im_list_to_blob are both functions in the blob module under utils:
          def im_list_to_blob(ims):
              """Convert a list of images into a network input.
      
              Assumes images are already prepared (means subtracted, BGR order, ...).
              """
              # Each image has shape H x W x channels; take the element-wise maximum shape, e.g.
              # np.array([(100, 5, 3), (110, 4, 3)]).max(axis=0) --> array([110,   5,   3])
              max_shape = np.array([im.shape for im in ims]).max(axis=0)
              num_images = len(ims)
              blob = np.zeros((num_images, max_shape[0], max_shape[1], 3),
                              dtype=np.float32)
              for i in xrange(num_images):
                  im = ims[i]
                  # layout here: index : H : W : C
                  blob[i, 0:im.shape[0], 0:im.shape[1], :] = im
              # Move channels (axis 3) to axis 1
              # Axis order will become: (batch elem, channel, height, width)
              channel_swap = (0, 3, 1, 2)
              # permute the axes accordingly
              blob = blob.transpose(channel_swap)
              return blob
      
          def prep_im_for_blob(im, pixel_means, target_size, max_size):
              """Mean subtract and scale an image for use in a blob."""
              # type(im) = numpy array, uint8 -> float
              im = im.astype(np.float32, copy=False)
              # mean subtraction
              im -= pixel_means
              im_shape = im.shape
              im_size_min = np.min(im_shape[0:2])
              im_size_max = np.max(im_shape[0:2])
              # scale factor: original W/H * scale = target size; the shorter side is scaled to 600
              im_scale = float(target_size) / float(im_size_min)
              # Prevent the biggest axis from being more than MAX_SIZE
              # (default 1000): if the scale above would push the longer side past the cap, scale by the cap instead
              if np.round(im_scale * im_size_max) > max_size:
                  im_scale = float(max_size) / float(im_size_max)
              im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
                              interpolation=cv2.INTER_LINEAR)
      
              return im, im_scale
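
The two helpers boil down to a bit of arithmetic plus zero-padding, which the sketch below reproduces without OpenCV. The names `compute_im_scale` and `stack_to_blob`, and the tiny all-ones "images", are assumptions made for illustration; the defaults 600 and 1000 are the values mentioned in the comments above.

```python
import numpy as np

def compute_im_scale(height, width, target_size=600, max_size=1000):
    """Scale that brings the shorter side to target_size, capped by max_size."""
    size_min = min(height, width)
    size_max = max(height, width)
    scale = float(target_size) / size_min
    # If the longer side would exceed the cap, scale by the cap instead.
    if np.round(scale * size_max) > max_size:
        scale = float(max_size) / size_max
    return scale

def stack_to_blob(ims):
    """Zero-pad H x W x 3 images to a common size and return an N x 3 x H x W blob."""
    max_shape = np.array([im.shape for im in ims]).max(axis=0)
    blob = np.zeros((len(ims), max_shape[0], max_shape[1], 3), dtype=np.float32)
    for i, im in enumerate(ims):
        # Each image sits in the top-left corner; the rest stays zero.
        blob[i, :im.shape[0], :im.shape[1], :] = im
    # (N, H, W, C) -> (N, C, H, W)
    return blob.transpose((0, 3, 1, 2))

scale_a = compute_im_scale(375, 500)    # 600 / 375 = 1.6, longer side becomes 800
scale_b = compute_im_scale(300, 1000)   # 2.0 would give 2000 > 1000, so 1000 / 1000
blob = stack_to_blob([np.ones((6, 8, 3)), np.ones((6, 10, 3))])
print(scale_a, scale_b, blob.shape)
```

The padding is what aspect grouping tries to minimise: images with similar shapes leave fewer zero columns in the shared blob.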
      
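      That makes the input layer mostly clear. It also explains why the forward pass seen earlier (Dtype ForwardBackward(const vector<Blob<Dtype>* > & bottom)) never assigned the input blobs: the Python input layer has already done the read + shuffle + blob conversion + scaling + box info work.
      • rpn-data layer
        The rpn-data layer receives rpn_cls_score (from the rpn_cls_score layer, the box scores), gt_boxes (the annotated boxes from the input layer), im_info (from the input layer: H and W plus the scale relative to the original image) and data. The prototxt and the data-flow diagram follow: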
          layer {
            name: 'rpn-data'
            type: 'Python'
            bottom: 'rpn_cls_score'
            bottom: 'gt_boxes'
            bottom: 'im_info'
            bottom: 'data'
            top: 'rpn_labels'
            top: 'rpn_bbox_targets'
            top: 'rpn_bbox_inside_weights'
            top: 'rpn_bbox_outside_weights'
            python_param {
              module: 'rpn.anchor_target_layer'
              layer: 'AnchorTargetLayer'
              param_str: "'feat_stride': 16"
            }
          }
      
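      (Figure: data flow of the rpn-data layer)

      The only parameter is the stride. The class is AnchorTargetLayer, which implements setup and forward. The layer outputs the box targets and labels used to compute the losses further down; it has nothing to train, so backward and reshape are empty implementations. Starting with setup: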

          def setup(self, bottom, top):
              layer_params = yaml.load(self.param_str_)
              # not set in the prototxt, so the default anchor scales are used
              anchor_scales = layer_params.get('scales', (8, 16, 32))
              # the K (= 9) boxes for one sliding-window position, as (top-left, bottom-right) coordinates
              self._anchors = generate_anchors(scales=np.array(anchor_scales))
              self._num_anchors = self._anchors.shape[0]
              self._feat_stride = layer_params['feat_stride']
      
              # allow boxes to sit over the edge by a small amount
              self._allowed_border = layer_params.get('allowed_border', 0)
      
              height, width = bottom[0].data.shape[-2:]
      
              A = self._num_anchors
              # labels
              top[0].reshape(1, 1, A * height, width)
              # bbox_targets
              top[1].reshape(1, A * 4, height, width)
              # bbox_inside_weights
              top[2].reshape(1, A * 4, height, width)
              # bbox_outside_weights
              top[3].reshape(1, A * 4, height, width)
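
To get a feel for the sizes involved, the sketch below just repeats the four reshape calls as plain tuples. The function name `anchor_target_top_shapes` is made up, and the 38 x 63 score map is an assumed example (roughly what a 600 x 1000 input gives at stride 16), not a value taken from the source.

```python
def anchor_target_top_shapes(height, width, num_anchors=9):
    """Shapes of the four top blobs, mirroring the reshape calls in setup."""
    A = num_anchors
    return {
        'rpn_labels': (1, 1, A * height, width),
        'rpn_bbox_targets': (1, A * 4, height, width),
        'rpn_bbox_inside_weights': (1, A * 4, height, width),
        'rpn_bbox_outside_weights': (1, A * 4, height, width),
    }

# Assumed example size of the rpn_cls_score map.
shapes = anchor_target_top_shapes(height=38, width=63)
total_anchors = 38 * 63 * 9  # one label per anchor per position
for name, shape in shapes.items():
    print(name, shape)
```

Every anchor gets exactly one label and four regression targets, so the label blob holds `total_anchors` values and each of the other three holds four times as many.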
      
      generate_anchors lives in generate_anchors.py and is built on NumPy:
          def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                               scales=2**np.arange(3, 6)):
              """
              Generate anchor (reference) windows by enumerating aspect ratios X
              scales wrt a reference (0, 0, 15, 15) window.
              """
              # base anchor :np array [0,0, 15, 15]
              base_anchor = np.array([1, 1, base_size, base_size]) - 1
              # enumerate aspect ratios (h / w = 0.5, 1, 2): wide, square and tall boxes
              ratio_anchors = _ratio_enum(base_anchor, ratios)
              # then enumerate sizes on top of each ratio anchor: x8, x16, x32
              anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                                   for i in xrange(ratio_anchors.shape[0])])
              return anchors
          def _ratio_enum(anchor, ratios):
              """
              Enumerate a set of anchors for each aspect ratio wrt an anchor.
              """
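              # convert to width, height and centre coordinates
              w, h, x_ctr, y_ctr = _whctrs(anchor)
              # original area
              size = w * h
              # The base anchor is a square; call its side n. Then new w = n / sqrt(ratio)
              # and new h = n * sqrt(ratio), so the area stays roughly the same (ignoring
              # rounding) while h / w = ratio. In other words the box is stretched or
              # squashed to the requested ratio at roughly constant area.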
              size_ratios = size / ratios
              ws = np.round(np.sqrt(size_ratios))
              hs = np.round(ws * ratios)
              # back to corner coordinates, the inverse of _whctrs
              anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
              return anchors
          # enumerate scales: each side is multiplied by an element of scales, so the area grows by its square
          def _scale_enum(anchor, scales):
              """
              Enumerate a set of anchors for each scale wrt an anchor.
              """
      
              w, h, x_ctr, y_ctr = _whctrs(anchor)
              ws = w * scales
              hs = h * scales
              anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
              return anchors
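
The listing above calls `_whctrs` and `_mkanchors` without showing them, so here is a self-contained sketch that reproduces the ratio and scale steps end to end. The helpers `whctrs`, `mkanchors` and `make_anchors` are my own reconstructions under the pixel-inclusive convention (width = x2 - x1 + 1), not the repository's code.

```python
import numpy as np

def whctrs(anchor):
    """(x1, y1, x2, y2) -> width, height, centre x, centre y (pixel-inclusive)."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)

def mkanchors(ws, hs, x_ctr, y_ctr):
    """Inverse of whctrs for arrays of widths and heights around one centre."""
    ws = np.asarray(ws, dtype=np.float64)[:, np.newaxis]
    hs = np.asarray(hs, dtype=np.float64)[:, np.newaxis]
    return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                      x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))

def make_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    ratios = np.asarray(ratios, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64)
    base = np.array([0, 0, base_size - 1, base_size - 1], dtype=np.float64)
    w, h, x_ctr, y_ctr = whctrs(base)
    # Ratio step: keep the area roughly constant while forcing h / w = ratio.
    ws = np.round(np.sqrt(w * h / ratios))
    hs = np.round(ws * ratios)
    ratio_anchors = mkanchors(ws, hs, x_ctr, y_ctr)
    # Scale step: multiply both sides of every ratio anchor by each scale.
    out = []
    for ra in ratio_anchors:
        w, h, x_ctr, y_ctr = whctrs(ra)
        out.append(mkanchors(w * scales, h * scales, x_ctr, y_ctr))
    return np.vstack(out)

anchors = make_anchors()
widths = anchors[:, 2] - anchors[:, 0] + 1
heights = anchors[:, 3] - anchors[:, 1] + 1
print(anchors)
print(np.column_stack((widths, heights)))
```

The first three rows are the wide boxes (ratio 0.5), the middle three the squares, the last three the tall boxes, and all nine share the centre (7.5, 7.5) of the 16 x 16 base window.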
      
      Next comes forward. The code is fairly involved, so here is an abridged version that shows the approach:
          def forward(self, bottom, top):
              # Algorithm:
              #
              # for each (H, W) location i
              #   generate 9 anchor boxes centered on cell i
              #   apply predicted bbox deltas at cell i to each of the 9 anchors
              # filter out-of-image anchors
              # measure GT overlap
      
              assert bottom[0].data.shape[0] == 1, \
                  'Only single item batches are supported'
      
              # map of shape (..., H, W); here it holds the box scores, shape (1, 18, H, W)
              height, width = bottom[0].data.shape[-2:]
              # GT boxes (x1, y1, x2, y2, label)
              gt_boxes = bottom[1].data
              # im_info
              im_info = bottom[2].data[0, :]
              
              # 1. Generate proposals from bbox deltas and shifted anchors
              # The idea: build a grid of shifts, then add every shift to the 9 anchors to get the 9 boxes at each position
              shift_x = np.arange(0, width) * self._feat_stride
              shift_y = np.arange(0, height) * self._feat_stride
              shift_x, shift_y = np.meshgrid(shift_x, shift_y)
              # after meshgrid, shift_x = [[  0  16  32 ..., 560 576 592] [  0  16  32 ..., 560 576 592] [  0  16  32 ..., 560 576 592] ..., [  0  16  32 ..., 560 576 592] [  0  16  32 ..., 560 576 592] [  0  16  32 ..., 560 576 592]]
              #shift_y = [[  0   0   0 ...,   0   0   0] [ 16  16  16 ...,  16  16  16] [ 32  32  32 ...,  32  32  32]  ..., [560 560 560 ..., 560 560 560] [576 576 576 ..., 576 576 576] [592 592 592 ..., 592 592 592]]
              shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                                  shift_x.ravel(), shift_y.ravel())).transpose()
              # after the transpose, each row is one shift (x, y, x, y)
              # add A anchors (1, A, 4) to
              # cell K shifts (K, 1, 4) to get
              # shift anchors (K, A, 4)
              # reshape to (K*A, 4) shifted anchors
              A = self._num_anchors
              K = shifts.shape[0]
              # NumPy broadcasting adds every anchor in _anchors to every shift
              all_anchors = (self._anchors.reshape((1, A, 4)) +
                             shifts.reshape((1, K, 4)).transpose((1, 0, 2)))
              # K shifts, A boxes per shift
              all_anchors = all_anchors.reshape((K * A, 4))
              total_anchors = int(K * A)
      
              # only keep anchors inside the image
              inds_inside = np.where(
                  (all_anchors[:, 0] >= -self._allowed_border) & 
                  (all_anchors[:, 1] >= -self._allowed_border) &
                  (all_anchors[:, 2] < im_info[1] + self._allowed_border) &  # width
                  (all_anchors[:, 3] < im_info[0] + self._allowed_border)    # height
              )[0]
      
              # keep only inside anchors
              anchors = all_anchors[inds_inside, :]
      
              # label: 1 is positive, 0 is negative, -1 is dont care
              labels = np.empty((len(inds_inside), ), dtype=np.float32)
              labels.fill(-1)
      
              # overlaps between the anchors and the gt boxes
              # overlaps (ex, gt): IoU of every anchor with every gt box, shape [num anchors, num gt boxes]
              overlaps = bbox_overlaps(
                  np.ascontiguousarray(anchors, dtype=np.float),
                  np.ascontiguousarray(gt_boxes, dtype=np.float))
              # for each anchor, the index of the gt box it overlaps most
              argmax_overlaps = overlaps.argmax(axis=1)
              # look up that value: each anchor's best overlap
              max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]
              # for each gt box, the index of the anchor that overlaps it most
              gt_argmax_overlaps = overlaps.argmax(axis=0)
              # the best overlap value of each gt box
              gt_max_overlaps = overlaps[gt_argmax_overlaps,
                                         np.arange(overlaps.shape[1])]
              # all anchors that reach some gt box's best overlap (ties included)
              gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
      
              if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
                  # assign bg labels first so that positive labels can clobber them
                  labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
      
              # fg label: for each gt, anchor with highest overlap
              labels[gt_argmax_overlaps] = 1
      
              # fg label: above threshold IOU
              labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
      
              if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
                  # assign bg labels last so that negative labels can clobber positives
                  labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
      
              # subsample positive labels if we have too many
              # ideally FG and BG each take half; if FG falls short, BG fills the rest
              num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
              fg_inds = np.where(labels == 1)[0]
              if len(fg_inds) > num_fg:
                  disable_inds = npr.choice(
                      fg_inds, size=(len(fg_inds) - num_fg), replace=False)
                  labels[disable_inds] = -1
      
              # subsample negative labels if we have too many
              num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
              bg_inds = np.where(labels == 0)[0]
              if len(bg_inds) > num_bg:
                  disable_inds = npr.choice(
                      bg_inds, size=(len(bg_inds) - num_bg), replace=False)
                  labels[disable_inds] = -1
           
              # compute the dx, dy, dw, dh offsets between each anchor and its ground-truth box
              bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
      
              bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
              bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)
      
              bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
              if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
                  # uniform weighting of examples (given non-uniform sampling)
                  num_examples = np.sum(labels >= 0)
                  positive_weights = np.ones((1, 4)) * 1.0 / num_examples
                  negative_weights = np.ones((1, 4)) * 1.0 / num_examples
              else:
                  assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                          (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
                  positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                                      np.sum(labels == 1))
                  negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                                      np.sum(labels == 0))
              bbox_outside_weights[labels == 1, :] = positive_weights
              bbox_outside_weights[labels == 0, :] = negative_weights
      
              # map up to original set of anchors
              labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
              bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
              bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
              bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)
      
              # labels
              labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
              labels = labels.reshape((1, 1, A * height, width))
              top[0].reshape(*labels.shape)
              top[0].data[...] = labels
      
              # bbox_targets
              bbox_targets = bbox_targets \
                  .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
              top[1].reshape(*bbox_targets.shape)
              top[1].data[...] = bbox_targets
      
              # bbox_inside_weights
              bbox_inside_weights = bbox_inside_weights \
                  .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
              assert bbox_inside_weights.shape[2] == height
              assert bbox_inside_weights.shape[3] == width
              top[2].reshape(*bbox_inside_weights.shape)
              top[2].data[...] = bbox_inside_weights
      
              # bbox_outside_weights
              bbox_outside_weights = bbox_outside_weights \
                  .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
              assert bbox_outside_weights.shape[2] == height
              assert bbox_outside_weights.shape[3] == width
              top[3].reshape(*bbox_outside_weights.shape)
              top[3].data[...] = bbox_outside_weights
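
The first half of forward (shift the anchors, drop those outside the image, label by overlap) can be tried on a toy problem. Everything in the sketch below is an assumption for illustration: the two tiny base anchors, the 2 x 3 feature map, the 32 x 48 image, the two ground-truth boxes, the thresholds 0.7 / 0.3, and the helper names; `iou_matrix` is a plain NumPy stand-in for `bbox_overlaps`. Subsampling, regression targets and the unmap step are left out.

```python
import numpy as np

def shifted_anchors(anchors, height, width, stride):
    """Place every base anchor at every feature-map cell."""
    shift_x, shift_y = np.meshgrid(np.arange(width) * stride,
                                   np.arange(height) * stride)
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()
    A, K = anchors.shape[0], shifts.shape[0]
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4), then flatten to (K * A, 4).
    return (anchors.reshape((1, A, 4)) + shifts.reshape((K, 1, 4))).reshape((K * A, 4))

def iou_matrix(boxes, gts):
    """IoU of every box with every gt box (pixel-inclusive, hence the + 1)."""
    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_g = (gts[:, 2] - gts[:, 0] + 1) * (gts[:, 3] - gts[:, 1] + 1)
    iw = (np.minimum(boxes[:, None, 2], gts[None, :, 2]) -
          np.maximum(boxes[:, None, 0], gts[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], gts[None, :, 3]) -
          np.maximum(boxes[:, None, 1], gts[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    return inter / (area_b[:, None] + area_g[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """1 = foreground, 0 = background, -1 = ignored."""
    overlaps = iou_matrix(anchors, gt_boxes)
    max_overlaps = overlaps.max(axis=1)      # best IoU per anchor
    gt_max_overlaps = overlaps.max(axis=0)   # best IoU per gt box
    labels = np.full(len(anchors), -1.0, dtype=np.float32)
    labels[max_overlaps < neg_thresh] = 0                  # background first
    labels[np.where(overlaps == gt_max_overlaps)[0]] = 1   # best anchor of each gt
    labels[max_overlaps >= pos_thresh] = 1                 # above the threshold
    return labels

# Toy setup: a small anchor that fits and a larger one that always sticks out.
base = np.array([[0, 0, 15, 15], [-8, -8, 23, 23]], dtype=np.float64)
im_h, im_w = 32, 48
all_anchors = shifted_anchors(base, height=2, width=3, stride=16)
inside = np.where((all_anchors[:, 0] >= 0) & (all_anchors[:, 1] >= 0) &
                  (all_anchors[:, 2] < im_w) & (all_anchors[:, 3] < im_h))[0]
gt = np.array([[16, 0, 31, 15], [28, 12, 47, 31]], dtype=np.float64)
labels = label_anchors(all_anchors[inside], gt)
print(inside, labels)
```

The second ground-truth box only reaches IoU 0.64 with its best anchor, below the positive threshold, yet that anchor is still labelled foreground. That is the purpose of the "for each gt, anchor with highest overlap" rule: no ground-truth box is left without a positive anchor.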
      
      • proposal layer
      • roi-data layer
    • C++ layers & losses (to be continued...)
      • SmoothL1LossLayer
      • ROIPoolingLayer
    • Testing
  • Afterword

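    Every article explaining Faster R-CNN seems to pay homage to Ross Girshick, so I will too: truly impressive work. The paper is written with real depth.
    The algorithm was not built in one go; it went through R-CNN -> Fast R-CNN -> Faster R-CNN. Faster R-CNN's defining feature is the anchor design: without any resizing, regressors over the same feature map produce boxes of different sizes, so a single pass yields all the proposals.
    When I was learning RL I was already a little surprised: whatever comes out of the CNN is declared to be whatever you want it to be, then it is shaped with a loss and acquires a sensible interpretation. Here the network is split up, different parts carry different meanings, and again different losses shape each of them.
    Starting from the input image, the feature map's W and H keep halving while the channel count keeps doubling; the map is then projected back onto the input pixel grid. Predicting boxes on the input image this way feels a bit magical, since an arbitrary image can contain anything, yet give it a sensible loss and the output becomes sensible.
    Finally, the authors also provide ablation experiments, swapping out components of the algorithm to verify that each one is necessary, which is really rigorous.
    Studying Faster R-CNN also gave me a glimpse into Caffe's structure. The framework is clearly organised and clean, does not try to be all-encompassing, and the code is concise, so anyone with a deep learning background can pick it up easily; that is probably why Ross Girshick wrote the Faster R-CNN demo on top of Caffe. When first learning an unfamiliar framework, look at the whole picture rather than fussing over one local detail: once the overall flow makes sense you have more information, and more information in turn helps with the details.
    Speeding up Caffe on CPU is also an interesting and worthwhile problem, because GPU hardware is often too big and too expensive to suit many environments.
    Notes on the remaining custom Python and C++ layers of Faster R-CNN are still unwritten; I will add them when I find the time.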