Reading the RFCN-tensorflow Source Code

Source code:
https://github.com/xdever/RFCN-tensorflow

A quick overview of the architecture:


  • The k^2(C+1) conv: ResNet-101's final output is WxHx1024. Convolving it with k^2(C+1) 1x1 kernels yields k^2(C+1) position-sensitive score maps, each of size WxH; this convolution is the prediction step. Here k=3, i.e. each ROI is divided into a 3x3 grid whose nine cells are: top-left, top-center, top-right, middle-left, center, middle-right, bottom-left, bottom-center, bottom-right.

  • Meaning of the k^2(C+1) feature maps: in the paper's figure there are k x k = 9 colors; each colored slab (WxHx(C+1)) encodes the probability that the corresponding part of an object is present at each location (the first yellow slab represents the top-left part, the last light-blue slab the bottom-right part).

  • The pooling formula

z(i,j,c) is the c-th map of the (i + k*(j-1))-th slab (1 <= i, j <= 3). The pair (i,j) selects one of the nine positions, say top-left (i=j=1), and c selects the class, say 'person'. If a pixel in the feature map z(i,j,c) sits at position (x,y) with value `value`, then `value` is the probability that position (x,y) in the original image belongs to a person (c='person') and, specifically, to the top-left part of that person (i=j=1).
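Written out (this matches Eq. (1) of the R-FCN paper; the formula image in the original post did not survive):

```latex
r_c(i,j \mid \Theta) \;=\; \frac{1}{n}\sum_{(x,y)\,\in\,\mathrm{bin}(i,j)} z_{i,j,c}\left(x + x_0,\; y + y_0 \mid \Theta\right)
```

where (x_0, y_0) is the top-left corner of the ROI, bin(i,j) is the set of pixels belonging to grid cell (i,j), n is the number of pixels in that bin, and Θ denotes the network parameters.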

  • ROI pooling input and output: for C+1 classes, the input to ROI pooling is the k^2(C+1) x W' x H' slab (W' and H' being the ROI's width and height) corresponding to a given ROI, cropped from the score maps at the box coordinates predicted by the RPN. From it a new slab is assembled: from each colored slab (C+1 channels) only the bin at its designated position is extracted, and these k x k bins are recombined into a (C+1) x W' x H' slab. For example, in the paper's figure the first yellow slab contributes only its top-left bin and the last light-blue slab only its bottom-right bin. Recombining all the bins gives the thin slab on the right of that figure (which is the pooled output: each bin on each face has already been reduced to a single value; before pooling, each bin covers a region of several pixels). The output of ROI pooling is thus a (C+1) x k x k slab.
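To make the bin selection concrete, here is a minimal NumPy sketch of position-sensitive ROI pooling. The function name, bin layout, and the choice of average pooling are my own illustration, not the repo's positionSensitiveRoiPooling:

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3):
    """Minimal position-sensitive ROI pooling sketch (NumPy).

    score_maps: array of shape [H, W, k*k*C1], where C1 = C+1; channel
                block b (b = j*k + i) holds the C1 maps for grid cell (j, i).
    roi:        (y0, x0, y1, x1) in feature-map coordinates.
    Returns an array of shape [k, k, C1]: one pooled value per cell and class.
    """
    y0, x0, y1, x1 = roi
    C1 = score_maps.shape[2] // (k * k)
    out = np.zeros((k, k, C1))
    bin_h = (y1 - y0) / k
    bin_w = (x1 - x0) / k
    for j in range(k):          # grid row
        for i in range(k):      # grid column
            ya = int(y0 + j * bin_h)
            xa = int(x0 + i * bin_w)
            ys = slice(ya, max(int(y0 + (j + 1) * bin_h), ya + 1))
            xs = slice(xa, max(int(x0 + (i + 1) * bin_w), xa + 1))
            b = j * k + i       # channel block dedicated to this cell
            cell = score_maps[ys, xs, b * C1:(b + 1) * C1]
            out[j, i] = cell.mean(axis=(0, 1))   # average-pool within the bin
    return out
```

Averaging the resulting k x k x (C+1) block over the grid (as roiMean does later in this walkthrough) yields one score per class for the ROI.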

That concludes the introduction; now for the code.

Note that this is an unofficial implementation. I read this code only to learn the R-FCN detection pipeline; I have not debugged it myself.

Code layout

After Googling quite a few blog posts, it seems that people don't pay much attention to the R-FCN model, so let's dig into this code ourselves.

  • main.py is the training entry point; modify it if you want to train.
  • testCheckpoint.py inspects the shapes of the parameters saved in a ckpt file. It is independent of the rest of the model and only checks checkpoints.
  • test.py runs the detection model for testing.

test.py

Most detection code follows the same overall pipeline, so let's start with test.py.

parser = argparse.ArgumentParser(description="RFCN tester")
parser.add_argument('-gpu', type=str, default="0", help='Train on this GPU(s)')
parser.add_argument('-n', type=str, help='Network checkpoint file')
parser.add_argument('-i', type=str, help='Input file.')
parser.add_argument('-o', type=str, default="", help='Write output here.')
parser.add_argument('-p', type=int, default=1, help='Show preview')
parser.add_argument('-threshold', type=float, default=0.5, help='Detection threshold')
parser.add_argument('-delay', type=int, default=-1, help='Delay between frames in visualization. -1 for automatic, 0 for wait for keypress.')

The hyperparameter settings here are fairly simple.
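The parser can be exercised standalone to see how the defaults behave (the checkpoint and video file names below are made up):

```python
import argparse

# Rebuild the same parser as in test.py.
parser = argparse.ArgumentParser(description="RFCN tester")
parser.add_argument('-gpu', type=str, default="0", help='Train on this GPU(s)')
parser.add_argument('-n', type=str, help='Network checkpoint file')
parser.add_argument('-i', type=str, help='Input file.')
parser.add_argument('-o', type=str, default="", help='Write output here.')
parser.add_argument('-p', type=int, default=1, help='Show preview')
parser.add_argument('-threshold', type=float, default=0.5, help='Detection threshold')
parser.add_argument('-delay', type=int, default=-1, help='Delay between frames.')

# Hypothetical invocation: checkpoint "model.ckpt", input video "in.mp4".
opt = parser.parse_args(['-n', 'model.ckpt', '-i', 'in.mp4', '-threshold', '0.6'])
print(opt.threshold)   # 0.6; unset options fall back to their defaults
```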

palette = Visualize.Palette(len(categories))

image = tf.placeholder(tf.float32, [None, None, None, 3])
net = BoxInceptionResnet(image, len(categories), name="boxnet")

boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)


input = PreviewIO.PreviewInput(opt.i)
output = PreviewIO.PreviewOutput(opt.o, input.getFps())

BoxInceptionResnet builds the R-FCN model, and PreviewIO handles file input and output.

boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)

net.getBoxes directly returns the tensors holding the model's detection outputs.

Looking only at the main structure of test.py:

with tf.Session() as sess:
    .........
    # Load and verify the checkpoint
    if not CheckpointLoader.loadCheckpoint(sess, None, opt.n, ignoreVarsInFileNotInSess=True):
        ..........
    # Preprocess the image
    img = preprocessInput(img)
    ...........
    # Run the image through the detection model
    rBoxes, rScores, rClasses = sess.run([boxes, scores, classes], feed_dict={image: np.expand_dims(img, 0)})
    ...........
    # Visualize the results
    res = Visualize.drawBoxes(img, rBoxes, rClasses, [categories[i] for i in rClasses.tolist()], palette, scores=rScores)

main.py

main.py is the training script. To keep this short, only the important parts of the training procedure are listed below; we focus on the loss.

# Load the dataset
dataset = BoxLoader()
dataset.add(CocoDataset(opt.dataset, randomZoom=opt.randZoom==1, set="train"+opt.cocoVariant))
if opt.mergeValidationSet==1:
    dataset.add(CocoDataset(opt.dataset, set="val"+opt.cocoVariant))

.....................
# Get augmented images and their box/class labels
images, boxes, classes = Augment.augment(*dataset.get())

# Build the detection model
net = BoxInceptionResnet(images, dataset.categoryCount(), name="boxnet", trainFrom=opt.trainFrom, hardMining=opt.hardMining==1, freezeBatchNorm=opt.freezeBatchNorm==1)

# Add the model's loss
tf.losses.add_loss(net.getLoss(boxes, classes))

# Build the training op from the loss
def createUpdateOp(gradClip=1):
    with tf.name_scope("optimizer"):
        optimizer=tf.train.AdamOptimizer(learning_rate=opt.learningRate, epsilon=opt.adamEps)
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        totalLoss = tf.losses.get_total_loss()
        grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
        if gradClip is not None:
            cGrads = []
            for g, v in grads:
                if g is None:
                    print("WARNING: no grad for variable "+v.op.name)
                    continue
                cGrads.append((tf.clip_by_value(g, -float(gradClip), float(gradClip)), v))
            grads = cGrads

        update_ops.append(optimizer.apply_gradients(grads))
        return control_flow_ops.with_dependencies([tf.group(*update_ops)], totalLoss, name='train_op')

# Create the update op
trainOp=createUpdateOp()

main.py contains an infinite while True loop and uses RunManager to manage training, as follows:

    runManager = RunManager(sess, options=runOptions, run_metadata=runMetadata)
    runManager.add("train", [globalStepInc,trainOp], modRun=1)

The optimization happens inside the while loop:

    ......
    # Training loop
    while True:
        #run various parts of the network
        res = runManager.modRun(i)
        .....
        # Visualize the training results
        visualizer.draw(res)

Now let's look at the detection model, BoxInceptionResnet.
test.py uses the net.getBoxes method:

net = BoxInceptionResnet(image, len(categories), name="boxnet")

boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)

main.py uses BoxInceptionResnet's net.getLoss and net.getVariables:

net = BoxInceptionResnet(images, dataset.categoryCount(), name="boxnet", trainFrom=opt.trainFrom, hardMining=opt.hardMining==1, freezeBatchNorm=opt.freezeBatchNorm==1)
tf.losses.add_loss(net.getLoss(boxes, classes))
......
grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())

So these are the functions to focus on when reading the source.
BoxInceptionResnet turns out to be a subclass of BoxNetwork.
BoxNetwork.py

# BoxNetwork.py

class BoxNetwork:
    def __init__(self, nCategories, rpnLayer, rpnDownscale, rpnOffset, featureLayer=None, featureDownsample=None, featureOffset=None, weightDecay=1e-6, hardMining=True):
        '''
        Typical call (from BoxInceptionResnet):
        featureInput = slim.conv2d(net, 1536, 1)
        BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)

        nCategories: number of categories to detect
        rpnLayer: feature map fed to the RPN
        rpnDownscale: downsampling factor of the RPN input
        rpnOffset: pixel offset of the RPN input relative to the image
        featureLayer: encoded feature map used for box refinement
        featureDownsample: downsampling factor of the feature input
        featureOffset: pixel offset of the feature input
        '''
        
        if featureLayer is None:
            featureLayer=rpnLayer

        if featureDownsample is None:
            featureDownsample=rpnDownscale
            
        if featureOffset is None:
            featureOffset = rpnOffset  # the repo writes "rpnOffset=featureOffset" here, which looks like a bug

        with tf.name_scope("BoxNetwork"):
            self.rpn = RPN(rpnLayer, immediateSize=512, weightDecay=weightDecay, inputDownscale=rpnDownscale, offset=rpnOffset)
            self.boxRefiner = BoxRefinementNetwork(featureLayer, nCategories, downsample=featureDownsample, offset=featureOffset, hardMining=hardMining)

            self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)

    # Filter the proposals, keeping only boxes whose score exceeds a threshold
    def getProposals(self, threshold=None):
        if threshold is not None and threshold>0:
            s = tf.cast(tf.where(self.proposalScores > threshold), tf.int32)
            return tf.gather_nd(self.proposals, s), tf.gather_nd(self.proposalScores, s)
        else:
            return self.proposals, self.proposalScores
    
    # Use boxRefiner to predict classes and refined box coordinates from the feature map
    def getBoxes(self, nmsThreshold=0.3, scoreThreshold=0.8):
        return self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)
    
    # Given ground-truth boxes and classes, compute the combined RPN + detection loss
    def getLoss(self, refBoxes, refClasses):
        return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)

Read the comments carefully; since this is the R-FCN base class, the code here is fairly simple.

Roughly: the final output feature map of InceptionResnetV2 is fed into the RPN, the feature map goes through a few more conv2d operations, and the computation continues in BoxRefinementNetwork.

Note the following:

def getBoxes(self, nmsThreshold=0.3, scoreThreshold=0.8):
        return self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)

In test.py the outputs are:

boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)

Here the RPN first generates candidate boxes over the whole feature map; self.boxRefiner.getBoxes then extracts the corresponding ROIs from the feature map and produces boxes, scores, and classes.
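The thresholding done in getProposals (a tf.where followed by tf.gather_nd) is ordinary boolean filtering; a NumPy sketch with made-up boxes and scores:

```python
import numpy as np

def get_proposals(proposals, proposal_scores, threshold=None):
    """NumPy analogue of BoxNetwork.getProposals: keep boxes whose
    objectness score exceeds the threshold (keep everything if None)."""
    if threshold is not None and threshold > 0:
        keep = proposal_scores > threshold   # plays the role of tf.where
        return proposals[keep], proposal_scores[keep]
    return proposals, proposal_scores

boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [2, 2, 4, 4]], dtype=float)
scores = np.array([0.9, 0.4, 0.8])
kept, kept_scores = get_proposals(boxes, scores, threshold=0.5)
print(len(kept))   # 2: only the boxes scoring above 0.5 survive
```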

BoxInceptionResnet.py

# BoxInceptionResnet.py
class BoxInceptionResnet(BoxNetwork):
    LAYER_NAMES = ['Conv2d_1a_3x3','Conv2d_2a_3x3','Conv2d_2b_3x3','MaxPool_3a_3x3','Conv2d_3b_1x1','Conv2d_4a_3x3',
              'MaxPool_5a_3x3','Mixed_5b','Repeat','Mixed_6a','Repeat_1','Mixed_7a','Repeat_2','Block8','Conv2d_7b_1x1']

    def __init__(self, inputs, nCategories, name="BoxNetwork", weightDecay=0.00004, freezeBatchNorm=False, reuse=False, isTraining=True, trainFrom=None, hardMining=True):
        self.boxThreshold = 0.5

        try:
            trainFrom = int(trainFrom)
        except:
            pass

        if isinstance(trainFrom, int):
            trainFrom = self.LAYER_NAMES[trainFrom]


        print("Training network from "+(trainFrom if trainFrom is not None else "end"))

        with tf.variable_scope(name, reuse=reuse) as scope:
            # Build the base feature-extraction model
            self.googleNet = InceptionResnetV2("features", inputs, trainFrom=trainFrom, freezeBatchNorm=freezeBatchNorm)
            self.scope=scope
        
            with tf.variable_scope("Box"):
                #Repeat_1 - last 1/16 layer, Mixed_6a - first 1/16 layer
                # Take the Repeat_1 feature map
                scale_16 = self.googleNet.getOutput("Repeat_1")[:,1:-1,1:-1,:]

                # Alternative: take the Mixed_6a feature map instead
                #scale_16 = self.googleNet.getOutput("Mixed_6a")[:,1:-1,1:-1,:]
                scale_32 = self.googleNet.getOutput("PrePool")

                with slim.arg_scope([slim.conv2d],
                        weights_regularizer=slim.l2_regularizer(weightDecay),
                        biases_regularizer=slim.l2_regularizer(weightDecay),
                        padding='SAME',
                        activation_fn = tf.nn.relu):
                    
                    # Merge the Repeat_1 and PrePool feature maps
                    # From here on, net splits into two outputs: rpnInput and featureInput
                    net = tf.concat([ tf.image.resize_bilinear(scale_32, tf.shape(scale_16)[1:3]), scale_16], 3)
                    
                    
                    rpnInput = slim.conv2d(net, 1024, 1)
                    
                    #BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], scale_32, 32, [32,32], weightDecay=weightDecay, hardMining=hardMining)
                    
                    # The featureInput branch
                    featureInput = slim.conv2d(net, 1536, 1)
                    BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)
    
    def getVariables(self, includeFeatures=False):
        if includeFeatures:
            return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.scope.name)
        else:
            vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.scope.name+"/Box/")
            vars += self.googleNet.getTrainableVars()

            print("Training variables: ", [v.op.name for v in vars])
            return vars

    def importWeights(self, sess, filename):
        self.googleNet.importWeights(sess, filename, includeTraining=True)


Note the comments above: net splits into two branches, rpnInput and featureInput.

# Merge the Repeat_1 and PrePool feature maps
# From here on, net splits into two outputs: rpnInput and featureInput
net = tf.concat([ tf.image.resize_bilinear(scale_32, tf.shape(scale_16)[1:3]), scale_16], 3)

rpnInput = slim.conv2d(net, 1024, 1)

# BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], scale_32, 32, [32,32], weightDecay=weightDecay, hardMining=hardMining)

# The featureInput branch
featureInput = slim.conv2d(net, 1536, 1)
BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)

[Figure from the original post: net splitting into the rpnInput and featureInput branches.]

The two important components here are the RPN and the BoxRefinementNetwork.

Next, let's see how the RPN is computed.

RPN.py

To keep the RPN from getting too complicated, we only look at the functions that are actually called.

BoxNetwork.py uses the RPN as follows:

self.rpn = RPN(rpnLayer, immediateSize=512, weightDecay=weightDecay, inputDownscale=rpnDownscale, offset=rpnOffset)
......
self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)

Only rpn.getPositiveOutputs is used.

# RPN.py

class RPN:
    def __init__(self, input, anchors=None, immediateSize=512, weightDecay=1e-5, inputDownscale=16, offset=[32,32]):
        self.input = input
        self.anchors = anchors
        self.inputDownscale = inputDownscale
        self.offset = offset
        self.anchors = anchors if anchors is not None else self.makeAnchors([64,128,256,512])
        print("Anchors: ", self.anchors)
        self.tfAnchors = tf.constant(self.anchors, dtype=tf.float32)

        self.hA=tf.reshape(self.tfAnchors[:,0],[-1])
        self.wA=tf.reshape(self.tfAnchors[:,1],[-1])

        self.nAnchors = len(self.anchors)

        self.positiveIouThreshold=0.7
        self.negativeIouThreshold=0.3
        self.regressionWeight=1.0
        
        self.nBoxLosses=256
        self.nPositiveLosses=128

        #dimensions
        with tf.name_scope('dimension_info'):
            s = tf.shape(self.input)
            self.hIn = s[1]
            self.wIn = s[2]

        
        self.imageH = tf.cast(self.hIn*self.inputDownscale+self.offset[0]*2, tf.float32)
        self.imageW = tf.cast(self.wIn*self.inputDownscale+self.offset[1]*2, tf.float32)

        self.define(immediateSize, weightDecay)


    def define(self, immediateSize, weightDecay):
        with tf.name_scope('RPN'):
            with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(weightDecay), padding='SAME'):
                #box prediction layers
                with tf.name_scope('NN'):
                    net = slim.conv2d(self.input, immediateSize, 3, activation_fn=tf.nn.relu)
                    scores = slim.conv2d(net, 2*self.nAnchors, 1, activation_fn=None)
                    boxRelativeCoordinates = slim.conv2d(net, 4*self.nAnchors, 1, activation_fn=None)

                #split coordinates
                x_raw, y_raw, w_raw, h_raw = tf.split(boxRelativeCoordinates, 4, axis=3)

                #Save raw box sizes for loss
                self.rawSizes = BoxUtils.mergeBoxData([w_raw, h_raw])
                            
                #Convert NN outputs to BBox coordinates
                self.boxes = BoxUtils.nnToImageBoxes(x_raw, y_raw, w_raw, h_raw, self.wA, self.hA, self.inputDownscale, self.offset)

                #store the size of every box
                with tf.name_scope('box_sizes'):
                    boxSizes = tf.reshape(self.tfAnchors, [1,1,1,-1,2])
                    boxSizes = tf.tile(boxSizes, tf.stack([1,self.hIn,self.wIn,1,1]))
                    self.boxSizes = tf.reshape(boxSizes, [-1,2])

                #scores
                self.scores = tf.reshape(scores, [-1,2])

define produces self.boxes and self.scores, following the same idea as the Faster R-CNN RPN: from rpnInput it predicts, for every feature-map location and anchor, a box (self.boxes) and a two-class score (self.scores) that acts as a binary classifier judging whether the box contains an object.
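BoxUtils.nnToImageBoxes is not reproduced here, but the usual Faster R-CNN way of decoding such raw RPN outputs into image-space boxes looks roughly like this (a sketch of the standard parameterization; the repo's exact formulas may differ):

```python
import numpy as np

def decode_rpn_output(x_raw, y_raw, w_raw, h_raw, wA, hA, cx, cy):
    """Decode raw RPN regression outputs into an image-space box.

    (cx, cy) is the anchor center in image coordinates (derived from the
    feature-map position, the downscale factor, and the offset); (wA, hA)
    is the anchor size. Standard Faster R-CNN box encoding."""
    x = x_raw * wA + cx           # shift the center by a fraction of the anchor size
    y = y_raw * hA + cy
    w = wA * np.exp(w_raw)        # scale the anchor exponentially
    h = hA * np.exp(h_raw)
    return np.array([x - w / 2, y - h / 2, x + w / 2, y + h / 2])
```

With all-zero raw outputs the decoded box is simply the anchor itself, centered at (cx, cy).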

getPositiveOutputs, called from BoxNetwork, delegates to rpn.filterOutputBoxes:

# RPN.py
def filterOutputBoxes(self, boxes, scores, others=[], preNmsCount=6000, maxOutSize=300, nmsThreshold=0.7): 
        with tf.name_scope("filter_output_boxes"):
            scores = tf.nn.softmax(scores)[:,1]
            scores = tf.reshape(scores,[-1])

            #Clip boxes to edge
            boxes = self.clipBoxesToEdge(boxes)

            #Remove empty boxes
            boxes, scores = BoxUtils.filterSmallBoxes(boxes, [scores])
            scores, boxes = tf.cond(tf.shape(scores)[0] > preNmsCount , lambda: tf.tuple(MultiGather.gatherTopK(scores, preNmsCount, [boxes])), lambda: tf.tuple([scores, boxes]))

            #NMS filter
            nmsIndices = tf.image.non_max_suppression(boxes, scores, iou_threshold=nmsThreshold, max_output_size=maxOutSize)
            nmsIndices = tf.expand_dims(nmsIndices, axis=-1)

            return MultiGather.gather([boxes, scores]+others, nmsIndices)
        
    def getPositiveOutputs(self, preNmsCount=6000, maxOutSize=300, nmsThreshold=0.7):
        boxes, scores = self.filterOutputBoxes(self.boxes, self.scores, preNmsCount=preNmsCount, nmsThreshold=nmsThreshold, maxOutSize=maxOutSize)
        return boxes, scores

filterOutputBoxes filters the boxes and scores, using clipping, a size filter, and NMS to discard unsuitable boxes:

  1. Clip boxes that extend beyond the image edge.
  2. Remove boxes too small to contain an object.
  3. Filter the remainder with NMS.
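The NMS step (tf.image.non_max_suppression in the code) can be sketched in plain NumPy, with boxes given as [y1, x1, y2, x2]:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [y1, x1, y2, x2]."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, y2 - y1) * max(0.0, x2 - x1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, iou_threshold=0.7, max_output_size=300):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box that overlaps it too much, repeat."""
    order = list(np.argsort(scores)[::-1])   # indices, best score first
    keep = []
    while order and len(keep) < max_output_size:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

For example, two heavily overlapping boxes collapse to the higher-scoring one, while a distant box survives untouched.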

BoxRefinementNetwork.py

BoxNetwork uses BoxRefinementNetwork as follows:

self.boxRefiner = BoxRefinementNetwork(featureLayer, nCategories, downsample=featureDownsample, offset=featureOffset, hardMining=hardMining)

................

self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)

So the entry point into BoxRefinementNetwork is self.boxRefiner.getBoxes.

Let's look directly at BoxRefinementNetwork.

# BoxRefinementNetwork.py

class BoxRefinementNetwork:
    POOL_SIZE=3

    def __init__(self, input, nCategories, downsample=16, offset=[32,32], hardMining=True):
        self.downsample = downsample
        self.offset = offset
        
        # Number of detection categories
        self.nCategories = nCategories
        
        # Predict (POOL_SIZE^2)*(1+nCategories) position-sensitive class-score maps
        self.classMaps = slim.conv2d(input, (self.POOL_SIZE**2)*(1+nCategories), 3, activation_fn=None, scope='classMaps')
        
        # Predict (POOL_SIZE^2)*4 position-sensitive box-regression maps
        self.regressionMap = slim.conv2d(input, (self.POOL_SIZE**2)*4, 3, activation_fn=None, scope='regressionMaps')

        self.hardMining=hardMining

        #Magic parameters.
        self.posIouTheshold = 0.5
        self.negIouThesholdHi = 0.5
        self.negIouThesholdLo = 0.1
        self.nTrainBoxes = 128
        self.nTrainPositives = 32
        self.falseValue = 0.0002

The trickiest part is these three functions, which call each other from bottom to top:

    def roiPooling(self, layer, boxes):
        return positionSensitiveRoiPooling(layer, boxes, offset=self.offset, downsample=self.downsample, roiSize=self.POOL_SIZE)

    def roiMean(self, layer, boxes):
        with tf.name_scope("roiMean"):
            return tf.reduce_mean(self.roiPooling(layer, boxes), axis=[1,2])

    def getBoxScores(self, boxes):
        with tf.name_scope("getBoxScores"):
            return self.roiMean(self.classMaps, boxes)

[Figure from the original post: the position-sensitive ROI pooling process.]

The ROI here is the input box coordinates: each box is cropped out of self.classMaps, giving the ROI's feature maps, on which the position-sensitive roi_pooling is performed to score the classes. (I have not analyzed every detail of this step.)

For the detailed theory, refer to the R-FCN paper.

Remember self.classMaps and self.regressionMap?

self.classMaps is used for scoring: getBoxScores has already consumed it and selected suitable boxes. refineBoxes then performs the analogous cropping on self.regressionMap to obtain the final refined positive boxes.


def getBoxes(self, proposals, proposal_scores, maxOutputs=30, nmsThreshold=0.3, scoreThreshold=0.8):
        if scoreThreshold is None:
            scoreThreshold = 0

        with tf.name_scope("getBoxes"):
            scores = tf.nn.softmax(self.getBoxScores(proposals))
            
            classes = tf.argmax(scores, 1)
            scores = tf.reduce_max(scores, axis=1)
            posIndices = tf.cast(tf.where(tf.logical_and(classes > 0, scores>scoreThreshold)), tf.int32)

            positives, scores, classes = MultiGather.gather([proposals, scores, classes], posIndices)
            positives = self.refineBoxes(positives, False)

            #Final NMS
            posIndices = tf.image.non_max_suppression(positives, scores, iou_threshold=nmsThreshold, max_output_size=maxOutputs)
            posIndices = tf.expand_dims(posIndices, axis=-1)
            positives, scores, classes = MultiGather.gather([positives, scores, classes], posIndices)   
            
            classes = tf.cast(tf.cast(classes,tf.int32) - 1, tf.uint8)

            return positives, scores, classes

At this point, the detection pipeline is essentially complete.
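The score handling in getBoxes (softmax, per-box argmax, thresholding, and the final classes − 1 shift that removes the background class) can be mimicked in NumPy with made-up logits:

```python
import numpy as np

def postprocess(box_scores, score_threshold=0.8):
    """Mirror the score handling in getBoxes: softmax over classes,
    pick the best class per box, keep non-background boxes above the
    threshold, and shift class ids so 0 means the first real category."""
    e = np.exp(box_scores - box_scores.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)           # softmax
    classes = probs.argmax(axis=1)                     # class 0 = background
    scores = probs.max(axis=1)
    keep = (classes > 0) & (scores > score_threshold)  # drop background / low scores
    return np.flatnonzero(keep), classes[keep] - 1, scores[keep]

# Three proposals, background + 2 categories (made-up logits):
raw = np.array([[5.0, 0.0, 0.0],    # confident background -> dropped
                [0.0, 6.0, 0.0],    # confident category 0  -> kept
                [0.0, 1.0, 1.2]])   # low-confidence        -> dropped
idx, cls, sc = postprocess(raw)
```

Only the middle proposal survives, and its class id comes out as 0 (the first non-background category), matching the `classes - 1` cast in getBoxes.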

loss

def loss(self, proposals, refBoxes, refClasses):
        with tf.name_scope("BoxRefinementNetworkLoss"):
            proposals = tf.stop_gradient(proposals)
            
            # Loss for positive boxes
            def getPosLoss(positiveBoxes, positiveRefIndices, nPositive):
                with tf.name_scope("getPosLoss"):
                    positiveRefIndices =  tf.reshape(positiveRefIndices,[-1,1])

                    positiveClasses, positiveRefBoxes = MultiGather.gather([refClasses, refBoxes], positiveRefIndices)
                    positiveClasses = tf.cast(tf.cast(positiveClasses,tf.int8) + 1, tf.uint8)

                    if not self.hardMining:
                        selected = Utils.RandomSelect.randomSelectIndex(tf.shape(positiveBoxes)[0], nPositive)
                        positiveBoxes, positiveClasses, positiveRefBoxes = MultiGather.gather([positiveBoxes, positiveClasses, positiveRefBoxes], selected)

                    return tf.tuple([self.classRefinementLoss(positiveBoxes, positiveClasses) + self.boxRefinementLoss(positiveBoxes, positiveRefBoxes), tf.shape(positiveBoxes)[0]])
            # Loss for negative boxes
            def getNegLoss(negativeBoxes, nNegative):
                with tf.name_scope("getNetLoss"):
                    if not self.hardMining:
                        negativeIndices = Utils.RandomSelect.randomSelectIndex(tf.shape(negativeBoxes)[0], nNegative)
                        negativeBoxes = tf.gather_nd(negativeBoxes, negativeIndices)

                    return self.classRefinementLoss(negativeBoxes, tf.zeros(tf.stack([tf.shape(negativeBoxes)[0],1]), dtype=tf.uint8))
            
            def getRefinementLoss():
                with tf.name_scope("getRefinementLoss"):
                    iou = BoxUtils.iou(proposals, refBoxes)
                    
                    maxIou = tf.reduce_max(iou, axis=1)
                    bestIou = tf.expand_dims(tf.cast(tf.argmax(iou, axis=1), tf.int32), axis=-1)

                    #Find positive and negative indices based on their IOU
                    posBoxIndices = tf.cast(tf.where(maxIou > self.posIouTheshold), tf.int32)
                    negBoxIndices = tf.cast(tf.where(tf.logical_and(maxIou < self.negIouThesholdHi, maxIou > self.negIouThesholdLo)), tf.int32)

                    #Split the boxes and references
                    posBoxes, posRefIndices = MultiGather.gather([proposals, bestIou], posBoxIndices)
                    negBoxes = tf.gather_nd(proposals, negBoxIndices)

                    #Add GT boxes
                    posBoxes = tf.concat([posBoxes,refBoxes], 0)
                    posRefIndices = tf.concat([posRefIndices, tf.reshape(tf.range(tf.shape(refClasses)[0]), [-1,1])], 0)

                    #Call the loss if the box collection is not empty
                    nPositive = tf.shape(posBoxes)[0]
                    nNegative = tf.shape(negBoxes)[0]

                    if self.hardMining:
                        posLoss = tf.cond(nPositive > 0, lambda: getPosLoss(posBoxes, posRefIndices, 0)[0], lambda: tf.zeros((0,), tf.float32))
                        negLoss = tf.cond(nNegative > 0, lambda: getNegLoss(negBoxes, 0), lambda: tf.zeros((0,), tf.float32))

                        allLoss = tf.concat([posLoss, negLoss], 0)
                        return tf.cond(tf.shape(allLoss)[0]>0, lambda: tf.reduce_mean(Utils.MultiGather.gatherTopK(allLoss, self.nTrainBoxes)), lambda: tf.constant(0.0))
                    else:
                        posLoss, posCount = tf.cond(nPositive > 0, lambda: getPosLoss(posBoxes, posRefIndices, self.nTrainPositives), lambda: tf.tuple([tf.constant(0.0), tf.constant(0,tf.int32)]))
                        negLoss = tf.cond(nNegative > 0, lambda: getNegLoss(negBoxes, self.nTrainBoxes-posCount), lambda: tf.constant(0.0))

                        nPositive = tf.cast(tf.shape(posLoss)[0], tf.float32)
                        nNegative = tf.cond(nNegative > 0, lambda: tf.cast(tf.shape(negLoss)[0], tf.float32), lambda: tf.constant(0.0))
                        
                        return (tf.reduce_mean(posLoss)*nPositive + tf.reduce_mean(negLoss)*nNegative)/(nNegative+nPositive)
    

        return tf.cond(tf.logical_and(tf.shape(proposals)[0] > 0, tf.shape(refBoxes)[0] > 0), lambda: getRefinementLoss(), lambda:tf.constant(0.0))

This is only the R-FCN (box refinement) loss.

The total loss

Joint training requires the combined RPN and R-FCN loss:

def getLoss(self, refBoxes, refClasses):
        return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)

