Source code:
https://github.com/xdever/RFCN-tensorflow

簡(jiǎn)單結(jié)構(gòu):

k^2(C+1)的conv: ResNet101的最后的輸出是WxHx1024,用K^2(C+1)個(gè)1x1的卷積核去卷積,即可得到K^2(C+1)個(gè)大小為WxH的position sensitive的score map。這步的卷積操作就是在做prediction。這里的k=3,表示把一個(gè)ROI劃分成3*3,對(duì)應(yīng)的9個(gè)位置分別是:上左(左上角),上中,上右,中左,中中,中右,下左,下中,下右(右下角)
k^2(C+1)個(gè)feature map的物理意義: 共有k x k = 9個(gè)顏色,每個(gè)顏色的立體塊(WxHx(C+1))表示的是不同位置存在目標(biāo)的概率值(第一塊黃色表示的是左上角位置,最后一塊淡藍(lán)色表示的是右下角位置)。
pooling公式

z(i,j,c)是第i+k*(j-1)個(gè)立體塊上的第c個(gè)map(1<= i,j <=3)。(i,j)決定了9種位置的某一種位置,假設(shè)為左上角位置(i=j=1),c決定了哪一類,假設(shè)為person類。在z(i,j,c)這個(gè)feature map上的某一個(gè)像素的位置是(x,y),像素值是value,則value表示的是原圖對(duì)應(yīng)的(x,y)這個(gè)位置上可能是人(c=‘person’)且是人的左上部位(i=j=1)的概率值
- ROI pooling的輸入和輸出:ROI pooling操作的輸入(對(duì)于C+1個(gè)類)是k^2(C+1)W' H'(W'和H'是ROI的寬度和高度)的score map上某ROI對(duì)應(yīng)的那個(gè)立體塊(由RPN預(yù)測(cè)的box坐標(biāo),在feature map上進(jìn)行裁剪),且該立體塊組成一個(gè)新的k^2(C+1)W' H'的立體塊:每個(gè)顏色的立體塊(C+1)都只摳出對(duì)應(yīng)位置的一個(gè)bin,把這kk個(gè)bin組成新的立體塊,大小為(C+1)W'H'。例如,下圖中的第一塊黃色只取左上角的bin,最后一塊淡藍(lán)色只取右下角的bin。所有的bin重新組合后就變成了類似右圖的那個(gè)薄的立體塊(圖中的這個(gè)是池化后的輸出,即每個(gè)面上的每個(gè)bin上已經(jīng)是一個(gè)像素。池化前這個(gè)bin對(duì)應(yīng)的是一個(gè)區(qū)域,是多個(gè)像素)。ROI pooling的輸出為為一個(gè)(C+1)k*k的立體塊
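As an illustration of the scheme above, here is a minimal NumPy sketch of position-sensitive ROI pooling for a single ROI. The function name, data layout, and integer bin rounding are my own assumptions for illustration; this is not the repo's positionSensitiveRoiPooling.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3, n_classes=2):
    """Position-sensitive ROI pooling sketch (single ROI, NumPy).

    score_maps: (H, W, k*k*(n_classes+1)) position-sensitive score maps.
    roi: (y0, x0, y1, x1) in feature-map coordinates.
    Returns a (k, k, n_classes+1) pooled block plus the per-class vote.
    Bin (i, j) is average-pooled only from the (i, j)-th group of
    (n_classes+1) channels -- that is the "position sensitive" part.
    """
    y0, x0, y1, x1 = roi
    C1 = n_classes + 1
    out = np.zeros((k, k, C1), dtype=np.float64)
    bin_h = (y1 - y0) / k
    bin_w = (x1 - x0) / k
    for i in range(k):          # row of the k*k grid
        for j in range(k):      # column of the k*k grid
            ys = slice(int(y0 + i * bin_h), int(np.ceil(y0 + (i + 1) * bin_h)))
            xs = slice(int(x0 + j * bin_w), int(np.ceil(x0 + (j + 1) * bin_w)))
            # pick only the channel group dedicated to position (i, j)
            group = score_maps[ys, xs, (i * k + j) * C1:(i * k + j + 1) * C1]
            out[i, j] = group.mean(axis=(0, 1))
    # per-class vote: average the k*k bins -> (n_classes+1,) scores
    return out, out.mean(axis=(0, 1))
```

Each bin (i, j) reads only its own (C+1)-channel group, so a high value in bin (0, 0) for class c means "the top-left part of a class-c object is here"; averaging the k*k bins gives the per-class vote used for classification.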

介紹結(jié)束,下面開始代碼。。。
注意這是非官方版本的代碼。僅僅是為了學(xué)習(xí)關(guān)于R-FCN的檢測(cè)流程而去讀的這份代碼,代碼本人并沒有調(diào)試。
代碼目錄

Google搜索了很多博客,發(fā)現(xiàn)大家其實(shí)并不是特別關(guān)注R-FCN模型。就把這分代碼搗鼓一下吧。
- main.py是進(jìn)行train的文件,如果你想進(jìn)行訓(xùn)練可以修改這里的代碼。
- testCheckpoint.py是進(jìn)行test的文件,就是測(cè)試ckpt文件中保存參數(shù)的shape的一個(gè)文件,這是與整個(gè)模型獨(dú)立的一個(gè)文件,僅僅是對(duì)ckpt進(jìn)行檢查的文件。
- test.py 這個(gè)文件是進(jìn)行檢測(cè)模型測(cè)試的文件。
test.py
所有的檢測(cè)代碼大部分都是相同的流程,所以首先看test.py模型代碼。
parser = argparse.ArgumentParser(description="RFCN tester")
parser.add_argument('-gpu', type=str, default="0", help='Train on this GPU(s)')
parser.add_argument('-n', type=str, help='Network checkpoint file')
parser.add_argument('-i', type=str, help='Input file.')
parser.add_argument('-o', type=str, default="", help='Write output here.')
parser.add_argument('-p', type=int, default=1, help='Show preview')
parser.add_argument('-threshold', type=float, default=0.5, help='Detection threshold')
parser.add_argument('-delay', type=int, default=-1, help='Delay between frames in visualization. -1 for automatic, 0 for wait for keypress.')
The hyperparameter setup here is quite simple.
palette = Visualize.Palette(len(categories))
image = tf.placeholder(tf.float32, [None, None, None, 3])
net = BoxInceptionResnet(image, len(categories), name="boxnet")
boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)
input = PreviewIO.PreviewInput(opt.i)
output = PreviewIO.PreviewOutput(opt.o, input.getFps())
BoxInceptionResnet builds the R-FCN model.
PreviewIO handles reading the input and writing the output.
boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)
net.getBoxes returns the tensors that hold the model's detection outputs.
Here is the main structure of test.py.
with tf.Session() as sess:
    .........
    # load the model checkpoint
    if not CheckpointLoader.loadCheckpoint(sess, None, opt.n, ignoreVarsInFileNotInSess=True):
    ..........
    # preprocess the image
    img = preprocessInput(img)
    ...........
    # run the image through the detection model
    rBoxes, rScores, rClasses = sess.run([boxes, scores, classes], feed_dict={image: np.expand_dims(img, 0)})
    ...........
    # visualize the detection results
    res = Visualize.drawBoxes(img, rBoxes, rClasses, [categories[i] for i in rClasses.tolist()], palette, scores=rScores)
main.py
main.py is the training script. To keep things brief, only the important parts of the training procedure are listed below, focusing on the loss.
# data loading
dataset = BoxLoader()
dataset.add(CocoDataset(opt.dataset, randomZoom=opt.randZoom==1, set="train"+opt.cocoVariant))
if opt.mergeValidationSet==1:
    dataset.add(CocoDataset(opt.dataset, set="val"+opt.cocoVariant))
.....................
# get the images and their labels
images, boxes, classes = Augment.augment(*dataset.get())
# build the detection model
net = BoxInceptionResnet(images, dataset.categoryCount(), name="boxnet", trainFrom=opt.trainFrom, hardMining=opt.hardMining==1, freezeBatchNorm=opt.freezeBatchNorm==1)
# register the model's loss
tf.losses.add_loss(net.getLoss(boxes, classes))
# build the training op
def createUpdateOp(gradClip=1):
    with tf.name_scope("optimizer"):
        optimizer = tf.train.AdamOptimizer(learning_rate=opt.learningRate, epsilon=opt.adamEps)
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        totalLoss = tf.losses.get_total_loss()
        grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
        if gradClip is not None:
            cGrads = []
            for g, v in grads:
                if g is None:
                    print("WARNING: no grad for variable "+v.op.name)
                    continue
                cGrads.append((tf.clip_by_value(g, -float(gradClip), float(gradClip)), v))
            grads = cGrads
        update_ops.append(optimizer.apply_gradients(grads))
        return control_flow_ops.with_dependencies([tf.group(*update_ops)], totalLoss, name='train_op')
# create the optimization op
trainOp = createUpdateOp()
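The gradient clipping in createUpdateOp is plain element-wise clipping to [-gradClip, gradClip]. A small NumPy sketch of just that step (illustrative only; the function name is mine):

```python
import numpy as np

def clip_grads(grads, clip=1.0):
    """Element-wise clipping, mirroring tf.clip_by_value in createUpdateOp.
    grads: list of gradient arrays; None entries (variables with no gradient)
    are dropped, just like the warning branch in the original code."""
    return [np.clip(g, -clip, clip) for g in grads if g is not None]
```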
main.py runs an infinite `while True:` loop and uses RunManager to manage training, as follows:
runManager = RunManager(sess, options=runOptions, run_metadata=runMetadata)
runManager.add("train", [globalStepInc, trainOp], modRun=1)
The optimization step runs inside the while loop:
......
# training loop
while True:
    # run various parts of the network
    res = runManager.modRun(i)
    .....
    # visualize the training results
    visualizer.draw(res)
Now let's look at the detection model, BoxInceptionResnet.
test.py uses the net.getBoxes method:
net = BoxInceptionResnet(image, len(categories), name="boxnet")
boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)
main.py uses net.getLoss and net.getVariables:
net = BoxInceptionResnet(images, dataset.categoryCount(), name="boxnet", trainFrom=opt.trainFrom, hardMining=opt.hardMining==1, freezeBatchNorm=opt.freezeBatchNorm==1)
tf.losses.add_loss(net.getLoss(boxes, classes))
......
grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
So these are the functions to focus on when reading the source.
It turns out BoxInceptionResnet is a subclass of BoxNetwork.
BoxNetwork.py
# BoxNetwork.py
class BoxNetwork:
    def __init__(self, nCategories, rpnLayer, rpnDownscale, rpnOffset, featureLayer=None, featureDownsample=None, featureOffset=None, weightDecay=1e-6, hardMining=True):
        '''
        Called from BoxInceptionResnet as:
          featureInput = slim.conv2d(net, 1536, 1)
          BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)

        nCategories: number of categories to detect
        rpnLayer: feature map fed into the RPN
        rpnDownscale: downsampling factor of the RPN input
        rpnOffset: pixel offset of the RPN input relative to the image
        featureLayer: encoded feature map fed into the box refinement network
        featureDownsample: downsampling factor of the feature input
        featureOffset: pixel offset of the feature input
        '''
        if featureLayer is None:
            featureLayer = rpnLayer
        if featureDownsample is None:
            featureDownsample = rpnDownscale
        if featureOffset is None:
            featureOffset = rpnOffset  # note: the original code has this assignment reversed (rpnOffset = featureOffset), which looks like a bug
        with tf.name_scope("BoxNetwork"):
            self.rpn = RPN(rpnLayer, immediateSize=512, weightDecay=weightDecay, inputDownscale=rpnDownscale, offset=rpnOffset)
            self.boxRefiner = BoxRefinementNetwork(featureLayer, nCategories, downsample=featureDownsample, offset=featureOffset, hardMining=hardMining)
            self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)

    # filter out proposal boxes below the score threshold
    def getProposals(self, threshold=None):
        if threshold is not None and threshold > 0:
            s = tf.cast(tf.where(self.proposalScores > threshold), tf.int32)
            return tf.gather_nd(self.proposals, s), tf.gather_nd(self.proposalScores, s)
        else:
            return self.proposals, self.proposalScores

    # use boxRefiner to predict classes and box coordinates from the feature map
    def getBoxes(self, nmsThreshold=0.3, scoreThreshold=0.8):
        return self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)

    # given ground-truth boxes and classes, compute the combined RPN + detection loss
    def getLoss(self, refBoxes, refClasses):
        return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)
Read the comments carefully. Since this is the R-FCN base class, the code here is fairly simple.
Roughly: the final output feature map of InceptionResnetV2 is fed into the RPN, the feature map also goes through a few more conv2d ops, and the rest of the computation happens in BoxRefinementNetwork.
Note that
def getBoxes(self, nmsThreshold=0.3, scoreThreshold=0.8):
    return self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)
is called in test.py as
boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)
So the RPN first produces boxes over the whole feature map; self.boxRefiner.getBoxes then extracts ROIs on the feature map and produces the final boxes, scores, and classes.
BoxInceptionResnet.py
# BoxInceptionResnet.py
class BoxInceptionResnet(BoxNetwork):
    LAYER_NAMES = ['Conv2d_1a_3x3','Conv2d_2a_3x3','Conv2d_2b_3x3','MaxPool_3a_3x3','Conv2d_3b_1x1','Conv2d_4a_3x3',
                   'MaxPool_5a_3x3','Mixed_5b','Repeat','Mixed_6a','Repeat_1','Mixed_7a','Repeat_2','Block8','Conv2d_7b_1x1']

    def __init__(self, inputs, nCategories, name="BoxNetwork", weightDecay=0.00004, freezeBatchNorm=False, reuse=False, isTraining=True, trainFrom=None, hardMining=True):
        self.boxThreshold = 0.5
        try:
            trainFrom = int(trainFrom)
        except:
            pass
        if isinstance(trainFrom, int):
            trainFrom = self.LAYER_NAMES[trainFrom]
        print("Training network from "+(trainFrom if trainFrom is not None else "end"))
        with tf.variable_scope(name, reuse=reuse) as scope:
            # build the base feature-extraction network
            self.googleNet = InceptionResnetV2("features", inputs, trainFrom=trainFrom, freezeBatchNorm=freezeBatchNorm)
            self.scope = scope
            with tf.variable_scope("Box"):
                #Repeat_1 - last 1/16 layer, Mixed_6a - first 1/16 layer
                # take the Repeat_1 feature map
                scale_16 = self.googleNet.getOutput("Repeat_1")[:,1:-1,1:-1,:]
                # alternative: take the Mixed_6a feature map
                #scale_16 = self.googleNet.getOutput("Mixed_6a")[:,1:-1,1:-1,:]
                scale_32 = self.googleNet.getOutput("PrePool")
                with slim.arg_scope([slim.conv2d],
                        weights_regularizer=slim.l2_regularizer(weightDecay),
                        biases_regularizer=slim.l2_regularizer(weightDecay),
                        padding='SAME',
                        activation_fn=tf.nn.relu):
                    # merge the Repeat_1 and PrePool feature maps
                    # from here, net splits into two outputs: rpnInput and featureInput
                    net = tf.concat([tf.image.resize_bilinear(scale_32, tf.shape(scale_16)[1:3]), scale_16], 3)
                    rpnInput = slim.conv2d(net, 1024, 1)
                    #BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], scale_32, 32, [32,32], weightDecay=weightDecay, hardMining=hardMining)
                    # featureInput
                    featureInput = slim.conv2d(net, 1536, 1)
                    BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)

    def getVariables(self, includeFeatures=False):
        if includeFeatures:
            return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.scope.name)
        else:
            vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.scope.name+"/Box/")
            vars += self.googleNet.getTrainableVars()
            print("Training variables: ", [v.op.name for v in vars])
            return vars

    def importWeights(self, sess, filename):
        self.googleNet.importWeights(sess, filename, includeTraining=True)
Note the comments above:
net splits into two branches, rpnInput and featureInput:
# merge the Repeat_1 and PrePool feature maps
# from here, net splits into two outputs: rpnInput and featureInput
net = tf.concat([tf.image.resize_bilinear(scale_32, tf.shape(scale_16)[1:3]), scale_16], 3)
rpnInput = slim.conv2d(net, 1024, 1)
#BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], scale_32, 32, [32,32], weightDecay=weightDecay, hardMining=hardMining)
# featureInput
featureInput = slim.conv2d(net, 1536, 1)
BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)
As shown in the figure below:

The important pieces here are the RPN and the BoxRefinementNetwork.
Next, let's see how the RPN is computed.
RPN.py
To keep the RPN from getting too complicated, we only look at the functions that are actually called.
BoxNetwork.py uses the RPN as follows:
self.rpn = RPN(rpnLayer, immediateSize=512, weightDecay=weightDecay, inputDownscale=rpnDownscale, offset=rpnOffset)
......
self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)
Only rpn.getPositiveOutputs is used.
# RPN.py
class RPN:
    def __init__(self, input, anchors=None, immediateSize=512, weightDecay=1e-5, inputDownscale=16, offset=[32,32]):
        self.input = input
        self.anchors = anchors
        self.inputDownscale = inputDownscale
        self.offset = offset
        self.anchors = anchors if anchors is not None else self.makeAnchors([64,128,256,512])
        print("Anchors: ", self.anchors)
        self.tfAnchors = tf.constant(self.anchors, dtype=tf.float32)
        self.hA = tf.reshape(self.tfAnchors[:,0], [-1])
        self.wA = tf.reshape(self.tfAnchors[:,1], [-1])
        self.nAnchors = len(self.anchors)
        self.positiveIouThreshold = 0.7
        self.negativeIouThreshold = 0.3
        self.regressionWeight = 1.0
        self.nBoxLosses = 256
        self.nPositiveLosses = 128
        #dimensions
        with tf.name_scope('dimension_info'):
            s = tf.shape(self.input)
            self.hIn = s[1]
            self.wIn = s[2]
        self.imageH = tf.cast(self.hIn*self.inputDownscale+self.offset[0]*2, tf.float32)
        self.imageW = tf.cast(self.wIn*self.inputDownscale+self.offset[1]*2, tf.float32)
        self.define(immediateSize, weightDecay)

    def define(self, immediateSize, weightDecay):
        with tf.name_scope('RPN'):
            with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(weightDecay), padding='SAME'):
                #box prediction layers
                with tf.name_scope('NN'):
                    net = slim.conv2d(self.input, immediateSize, 3, activation_fn=tf.nn.relu)
                    scores = slim.conv2d(net, 2*self.nAnchors, 1, activation_fn=None)
                    boxRelativeCoordinates = slim.conv2d(net, 4*self.nAnchors, 1, activation_fn=None)
                #split coordinates
                x_raw, y_raw, w_raw, h_raw = tf.split(boxRelativeCoordinates, 4, axis=3)
                #Save raw box sizes for loss
                self.rawSizes = BoxUtils.mergeBoxData([w_raw, h_raw])
                #Convert NN outputs to BBox coordinates
                self.boxes = BoxUtils.nnToImageBoxes(x_raw, y_raw, w_raw, h_raw, self.wA, self.hA, self.inputDownscale, self.offset)
                #store the size of every box
                with tf.name_scope('box_sizes'):
                    boxSizes = tf.reshape(self.tfAnchors, [1,1,1,-1,2])
                    boxSizes = tf.tile(boxSizes, tf.stack([1,self.hIn,self.wIn,1,1]))
                    self.boxSizes = tf.reshape(boxSizes, [-1,2])
                #scores
                self.scores = tf.reshape(scores, [-1,2])
define produces self.boxes and self.scores, following the same principle as Faster R-CNN. From rpnInput it outputs self.boxes, the box coordinates predicted at every position of rpnInput, and self.scores, a two-class score that judges whether each box contains an object.
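For intuition, here is a sketch of the usual Faster R-CNN style box decoding that a function like BoxUtils.nnToImageBoxes performs. This is the standard parameterization; I have not verified that the repo uses exactly this transform, so treat the details as assumptions.

```python
import numpy as np

def decode_boxes(x_raw, y_raw, w_raw, h_raw, wA, hA, downscale, offset):
    """Faster R-CNN style box decoding sketch (assumption: the repo's
    BoxUtils.nnToImageBoxes may use a different parameterization).

    x_raw etc.: (H, W, nAnchors) relative predictions per grid cell.
    wA, hA: anchor widths/heights, shape (nAnchors,).
    Returns (N, 4) boxes as (x0, y0, x1, y1) in image coordinates."""
    H, W, A = x_raw.shape
    gy, gx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # anchor centers in image coordinates
    cx = gx[..., None] * downscale + offset[1]
    cy = gy[..., None] * downscale + offset[0]
    # apply the standard (t_x, t_y, t_w, t_h) transform
    bw = np.exp(w_raw) * wA
    bh = np.exp(h_raw) * hA
    bx = x_raw * wA + cx
    by = y_raw * hA + cy
    boxes = np.stack([bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2], -1)
    return boxes.reshape(-1, 4)
```

With all-zero raw predictions, each output box is simply the anchor itself centered on its grid cell, which is the sanity check one would expect from this parameterization.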
BoxNetwork calls rpn.getPositiveOutputs, which in turn calls rpn.filterOutputBoxes:
# RPN.py
def filterOutputBoxes(self, boxes, scores, others=[], preNmsCount=6000, maxOutSize=300, nmsThreshold=0.7):
    with tf.name_scope("filter_output_boxes"):
        scores = tf.nn.softmax(scores)[:,1]
        scores = tf.reshape(scores, [-1])
        #Clip boxes to edge
        boxes = self.clipBoxesToEdge(boxes)
        #Remove empty boxes
        boxes, scores = BoxUtils.filterSmallBoxes(boxes, [scores])
        scores, boxes = tf.cond(tf.shape(scores)[0] > preNmsCount, lambda: tf.tuple(MultiGather.gatherTopK(scores, preNmsCount, [boxes])), lambda: tf.tuple([scores, boxes]))
        #NMS filter
        nmsIndices = tf.image.non_max_suppression(boxes, scores, iou_threshold=nmsThreshold, max_output_size=maxOutSize)
        nmsIndices = tf.expand_dims(nmsIndices, axis=-1)
        return MultiGather.gather([boxes, scores]+others, nmsIndices)

def getPositiveOutputs(self, preNmsCount=6000, maxOutSize=300, nmsThreshold=0.7):
    boxes, scores = self.filterOutputBoxes(self.boxes, self.scores, preNmsCount=preNmsCount, nmsThreshold=nmsThreshold, maxOutSize=maxOutSize)
    return boxes, scores
filterOutputBoxes filters unsuitable boxes out of (boxes, scores) using clipping, thresholding, and NMS:
- clip boxes that extend beyond the image edge
- remove empty boxes that contain no object
- filter the remainder with NMS
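The NMS step in the list above can be illustrated with a small NumPy sketch of greedy non-max suppression (same contract as tf.image.non_max_suppression, written from scratch here for illustration):

```python
import numpy as np

def iou(a, b):
    """IoU between one box a and an array of boxes b; boxes are (x0, y0, x1, y1)."""
    x0 = np.maximum(a[0], b[:, 0]); y0 = np.maximum(a[1], b[:, 1])
    x1 = np.minimum(a[2], b[:, 2]); y1 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.7, max_out=300):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    every remaining box that overlaps it by more than iou_threshold."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0 and len(keep) < max_out:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_threshold]
    return np.array(keep)
```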
BoxRefinementNetwork.py
BoxNetwork uses BoxRefinementNetwork as follows:
self.boxRefiner = BoxRefinementNetwork(featureLayer, nCategories, downsample=featureDownsample, offset=featureOffset, hardMining=hardMining)
................
self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)
That is, self.boxRefiner.getBoxes is what gets called on BoxRefinementNetwork.
Let's look at BoxRefinementNetwork directly.
# BoxRefinementNetwork.py
class BoxRefinementNetwork:
    POOL_SIZE = 3

    def __init__(self, input, nCategories, downsample=16, offset=[32,32], hardMining=True):
        self.downsample = downsample
        self.offset = offset
        # number of categories
        self.nCategories = nCategories
        # (self.POOL_SIZE**2)*(1+nCategories) position-sensitive score maps for the classes
        self.classMaps = slim.conv2d(input, (self.POOL_SIZE**2)*(1+nCategories), 3, activation_fn=None, scope='classMaps')
        # (self.POOL_SIZE**2)*4 maps for the box coordinates
        self.regressionMap = slim.conv2d(input, (self.POOL_SIZE**2)*4, 3, activation_fn=None, scope='regressionMaps')
        self.hardMining = hardMining
        #Magic parameters.
        self.posIouTheshold = 0.5
        self.negIouThesholdHi = 0.5
        self.negIouThesholdLo = 0.1
        self.nTrainBoxes = 128
        self.nTrainPositives = 32
        self.falseValue = 0.0002
The trickiest part is these three functions, which call each other from the bottom up:
def roiPooling(self, layer, boxes):
    return positionSensitiveRoiPooling(layer, boxes, offset=self.offset, downsample=self.downsample, roiSize=self.POOL_SIZE)

def roiMean(self, layer, boxes):
    with tf.name_scope("roiMean"):
        return tf.reduce_mean(self.roiPooling(layer, boxes), axis=[1,2])

def getBoxScores(self, boxes):
    with tf.name_scope("getBoxScores"):
        return self.roiMean(self.classMaps, boxes)
The figure below shows the process:

The ROI discussed earlier is the box passed in here. The box coordinates crop self.classMaps, producing the ROI's feature map, on which the position-sensitive roi_pooling is applied to score and select a class. (I have not analyzed the exact details of this step.)
Remember self.classMaps and self.regressionMap?
self.classMaps is used to compute the class scores; getBoxScores has already consumed self.classMaps and picked out the suitable boxes. Next, refineBoxes crops self.regressionMap to obtain the final positive boxes.
def getBoxes(self, proposals, proposal_scores, maxOutputs=30, nmsThreshold=0.3, scoreThreshold=0.8):
    if scoreThreshold is None:
        scoreThreshold = 0
    with tf.name_scope("getBoxes"):
        scores = tf.nn.softmax(self.getBoxScores(proposals))
        classes = tf.argmax(scores, 1)
        scores = tf.reduce_max(scores, axis=1)
        posIndices = tf.cast(tf.where(tf.logical_and(classes > 0, scores > scoreThreshold)), tf.int32)
        positives, scores, classes = MultiGather.gather([proposals, scores, classes], posIndices)
        positives = self.refineBoxes(positives, False)
        #Final NMS
        posIndices = tf.image.non_max_suppression(positives, scores, iou_threshold=nmsThreshold, max_output_size=maxOutputs)
        posIndices = tf.expand_dims(posIndices, axis=-1)
        positives, scores, classes = MultiGather.gather([positives, scores, classes], posIndices)
        classes = tf.cast(tf.cast(classes, tf.int32) - 1, tf.uint8)
        return positives, scores, classes
With that, the detection pipeline is essentially complete.
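The class/score selection in getBoxes (softmax, drop background, threshold, shift class indices by one) can be sketched in NumPy. This is a toy mock-up of the logic, not the repo's code:

```python
import numpy as np

def pick_detections(box_scores, score_threshold=0.8):
    """Mimic the selection in getBoxes: softmax over (background + C classes),
    keep non-background boxes whose top probability exceeds the threshold.

    box_scores: (N, C+1) raw per-ROI class scores; column 0 is background.
    Returns (kept ROI indices, 0-based class ids, their scores)."""
    e = np.exp(box_scores - box_scores.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    classes = probs.argmax(axis=1)
    scores = probs.max(axis=1)
    keep = (classes > 0) & (scores > score_threshold)
    # shift so class 0 is the first real category, as in the final cast
    return np.where(keep)[0], classes[keep] - 1, scores[keep]
```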
loss
def loss(self, proposals, refBoxes, refClasses):
    with tf.name_scope("BoxRefinementNetworkLoss"):
        proposals = tf.stop_gradient(proposals)

        # loss for positive boxes
        def getPosLoss(positiveBoxes, positiveRefIndices, nPositive):
            with tf.name_scope("getPosLoss"):
                positiveRefIndices = tf.reshape(positiveRefIndices, [-1,1])
                positiveClasses, positiveRefBoxes = MultiGather.gather([refClasses, refBoxes], positiveRefIndices)
                positiveClasses = tf.cast(tf.cast(positiveClasses, tf.int8) + 1, tf.uint8)
                if not self.hardMining:
                    selected = Utils.RandomSelect.randomSelectIndex(tf.shape(positiveBoxes)[0], nPositive)
                    positiveBoxes, positiveClasses, positiveRefBoxes = MultiGather.gather([positiveBoxes, positiveClasses, positiveRefBoxes], selected)
                return tf.tuple([self.classRefinementLoss(positiveBoxes, positiveClasses) + self.boxRefinementLoss(positiveBoxes, positiveRefBoxes), tf.shape(positiveBoxes)[0]])

        # loss for negative boxes
        def getNegLoss(negativeBoxes, nNegative):
            with tf.name_scope("getNetLoss"):
                if not self.hardMining:
                    negativeIndices = Utils.RandomSelect.randomSelectIndex(tf.shape(negativeBoxes)[0], nNegative)
                    negativeBoxes = tf.gather_nd(negativeBoxes, negativeIndices)
                return self.classRefinementLoss(negativeBoxes, tf.zeros(tf.stack([tf.shape(negativeBoxes)[0],1]), dtype=tf.uint8))

        def getRefinementLoss():
            with tf.name_scope("getRefinementLoss"):
                iou = BoxUtils.iou(proposals, refBoxes)
                maxIou = tf.reduce_max(iou, axis=1)
                bestIou = tf.expand_dims(tf.cast(tf.argmax(iou, axis=1), tf.int32), axis=-1)
                #Find positive and negative indices based on their IOU
                posBoxIndices = tf.cast(tf.where(maxIou > self.posIouTheshold), tf.int32)
                negBoxIndices = tf.cast(tf.where(tf.logical_and(maxIou < self.negIouThesholdHi, maxIou > self.negIouThesholdLo)), tf.int32)
                #Split the boxes and references
                posBoxes, posRefIndices = MultiGather.gather([proposals, bestIou], posBoxIndices)
                negBoxes = tf.gather_nd(proposals, negBoxIndices)
                #Add GT boxes
                posBoxes = tf.concat([posBoxes, refBoxes], 0)
                posRefIndices = tf.concat([posRefIndices, tf.reshape(tf.range(tf.shape(refClasses)[0]), [-1,1])], 0)
                #Call the loss if the box collection is not empty
                nPositive = tf.shape(posBoxes)[0]
                nNegative = tf.shape(negBoxes)[0]
                if self.hardMining:
                    posLoss = tf.cond(nPositive > 0, lambda: getPosLoss(posBoxes, posRefIndices, 0)[0], lambda: tf.zeros((0,), tf.float32))
                    negLoss = tf.cond(nNegative > 0, lambda: getNegLoss(negBoxes, 0), lambda: tf.zeros((0,), tf.float32))
                    allLoss = tf.concat([posLoss, negLoss], 0)
                    return tf.cond(tf.shape(allLoss)[0] > 0, lambda: tf.reduce_mean(Utils.MultiGather.gatherTopK(allLoss, self.nTrainBoxes)), lambda: tf.constant(0.0))
                else:
                    posLoss, posCount = tf.cond(nPositive > 0, lambda: getPosLoss(posBoxes, posRefIndices, self.nTrainPositives), lambda: tf.tuple([tf.constant(0.0), tf.constant(0, tf.int32)]))
                    negLoss = tf.cond(nNegative > 0, lambda: getNegLoss(negBoxes, self.nTrainBoxes - posCount), lambda: tf.constant(0.0))
                    nPositive = tf.cast(tf.shape(posLoss)[0], tf.float32)
                    nNegative = tf.cond(nNegative > 0, lambda: tf.cast(tf.shape(negLoss)[0], tf.float32), lambda: tf.constant(0.0))
                    return (tf.reduce_mean(posLoss)*nPositive + tf.reduce_mean(negLoss)*nNegative)/(nNegative + nPositive)

        return tf.cond(tf.logical_and(tf.shape(proposals)[0] > 0, tf.shape(refBoxes)[0] > 0), lambda: getRefinementLoss(), lambda: tf.constant(0.0))
This is only the loss of the R-FCN head.
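The IoU-based labelling of proposals inside getRefinementLoss can be sketched in NumPy (illustrative only; the thresholds match the "magic parameters" listed earlier):

```python
import numpy as np

def split_pos_neg(iou_matrix, pos_thr=0.5, neg_hi=0.5, neg_lo=0.1):
    """Mimic the IoU-based proposal labelling in getRefinementLoss.

    iou_matrix: (nProposals, nGT) IoUs between proposals and ground-truth boxes.
    A proposal is positive if its best IoU exceeds pos_thr, negative if the
    best IoU falls in (neg_lo, neg_hi); everything else is ignored.
    Returns (positive indices, best-matching GT per positive, negative indices)."""
    max_iou = iou_matrix.max(axis=1)
    best_gt = iou_matrix.argmax(axis=1)
    pos = np.where(max_iou > pos_thr)[0]
    neg = np.where((max_iou < neg_hi) & (max_iou > neg_lo))[0]
    return pos, best_gt[pos], neg
```

Note that proposals with best IoU below neg_lo (0.1) are dropped entirely, just as in the TensorFlow code above.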

The total loss
Joint training needs the combined RPN and R-FCN loss:
def getLoss(self, refBoxes, refClasses):
    return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)