
This post walks through the model structure of CTPN.
Without further ado, here is the code first; it uses the slim library from TensorFlow 1.

def model(image):
    image = mean_image_subtraction(image)
    
    with slim.arg_scope(vgg.vgg_arg_scope()):
        conv5_3 = vgg.vgg_16(image)

    rpn_conv = slim.conv2d(conv5_3, 512, 3)

    lstm_output = Bilstm(rpn_conv, 512, 128, 512, scope_name='BiLSTM')

    bbox_pred = lstm_fc(lstm_output, 512, 10 * 4, scope_name="bbox_pred")
    cls_pred = lstm_fc(lstm_output, 512, 10 * 2, scope_name="cls_pred")

    cls_pred_shape = tf.shape(cls_pred)
    cls_pred_reshape = tf.reshape(cls_pred, [cls_pred_shape[0], cls_pred_shape[1], -1, 2])

    cls_pred_reshape_shape = tf.shape(cls_pred_reshape)
    cls_prob = tf.reshape(tf.nn.softmax(tf.reshape(cls_pred_reshape, [-1, cls_pred_reshape_shape[3]])),
                          [-1, cls_pred_reshape_shape[1], cls_pred_reshape_shape[2], cls_pred_reshape_shape[3]],
                          name="cls_prob")

    return bbox_pred, cls_pred, cls_prob

Now let's go through it line by line.

image = mean_image_subtraction(image)

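This line is needed because VGG is normally used with weights pretrained on ImageNet, so the mean of the ImageNet images has to be subtracted from the input. This normalizes the input features (here, mean-centering) the same way as during pretraining.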
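
To make the mean subtraction concrete, here is a minimal pure-Python sketch. The function name `subtract_channel_means` and the mean values are assumptions for illustration: the values are the RGB means commonly used with slim's VGG preprocessing, and the post does not show what `mean_image_subtraction` actually uses.

```python
# Per-channel means in RGB order. These are the values commonly used with
# slim's VGG preprocessing; they are an assumption, not taken from the post.
VGG_MEANS_RGB = (123.68, 116.78, 103.94)


def subtract_channel_means(image, means=VGG_MEANS_RGB):
    """Subtract a per-channel mean from an H x W x C image given as nested lists."""
    return [[[value - mean for value, mean in zip(pixel, means)]
             for pixel in row]
            for row in image]


# A tiny 1 x 2 RGB "image" for illustration.
image = [[[123.68, 116.78, 103.94], [255.0, 0.0, 103.94]]]
centered = subtract_channel_means(image)
```

A pixel equal to the mean ends up at zero in every channel, which is the whole point of the step.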

with slim.arg_scope(vgg.vgg_arg_scope()):
    conv5_3 = vgg.vgg_16(image)

This step produces the feature map extracted by VGG16, taken from the conv5_3 layer.

rpn_conv = slim.conv2d(conv5_3, 512, 3)

This line applies a convolution with kernel size 3 and stride 1 (the slim.conv2d default) to extract features further.
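
As a small sanity check on the shapes, the sketch below uses TensorFlow's documented output-size rules for 'SAME' and 'VALID' padding. slim.conv2d defaults to 'SAME', so this 3x3 convolution leaves the height and width of conv5_3 unchanged. The helper names and the example size 38 are made up for illustration.

```python
import math


def same_output_size(input_size, stride=1):
    """Output size along one axis for padding='SAME'."""
    return math.ceil(input_size / stride)


def valid_output_size(input_size, kernel_size, stride=1):
    """Output size along one axis for padding='VALID'."""
    return math.ceil((input_size - kernel_size + 1) / stride)


feature_height = 38  # hypothetical conv5_3 height
same_height = same_output_size(feature_height)        # stays 38
valid_height = valid_output_size(feature_height, 3)   # would shrink to 36
```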

lstm_output = Bilstm(rpn_conv, 512, 128, 512, scope_name='BiLSTM')

This step feeds the features obtained above into a bidirectional LSTM. The Bilstm function is analyzed further below.

bbox_pred = lstm_fc(lstm_output, 512, 10 * 4, scope_name="bbox_pred")
cls_pred = lstm_fc(lstm_output, 512, 10 * 2, scope_name="cls_pred")

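These two lines feed the LSTM output into fully connected layers to get the final outputs.
bbox_pred holds the offsets of the box center coordinates x and y plus the offsets of the width and height, 4 values for each of the 10 anchors. This actually differs from what the original paper describes, because this code was adapted from Faster RCNN. In the original paper, the offsets of x and w do not need to be predicted: CTPN fixes w at 16 pixels by design (because the feature map extracted by VGG is 1/16 the size of the original image), and x is simply the x coordinate of the center of the region that a feature-map point maps to in the original image. So the original CTPN only needs to predict the offsets of y and h. Doing it the way this code does has an advantage too, mentioned later: it acts as side-refinement.
cls_pred holds the text/non-text scores (confidence) for each box, that is, how likely it is to be text and how likely it is not, 2 values for each of the 10 anchors.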

cls_pred_shape = tf.shape(cls_pred)
cls_pred_reshape = tf.reshape(cls_pred, [cls_pred_shape[0], cls_pred_shape[1], -1, 2])

cls_pred_reshape_shape = tf.shape(cls_pred_reshape)
cls_prob = tf.reshape(tf.nn.softmax(tf.reshape(cls_pred_reshape, [-1, cls_pred_reshape_shape[3]])),
                      [-1, cls_pred_reshape_shape[1], cls_pred_reshape_shape[2], cls_pred_reshape_shape[3]],
                      name="cls_prob")

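These operations compute the text/non-text probabilities. They exist mainly to apply softmax, so cls_pred is first reshaped and then reshaped back. The resulting cls_prob is the text/non-text probability.
Next comes the Bilstm function; see the comments for details.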

def Bilstm(net, input_channel, hidden_unit_num, output_channel, scope_name):
    # width--->time step
    with tf.variable_scope(scope_name) as scope:
        # reshape the input: each row of the feature map becomes one sequence
        shape = tf.shape(net)
        N, H, W, C = shape[0], shape[1], shape[2], shape[3]
        net = tf.reshape(net, [N * H, W, C])
        net.set_shape([None, None, input_channel])
        # declare the forward LSTM lstm_fw_cell and the backward LSTM lstm_bw_cell
        lstm_fw_cell = tf.contrib.rnn.LSTMCell(hidden_unit_num, state_is_tuple=True)
        lstm_bw_cell = tf.contrib.rnn.LSTMCell(hidden_unit_num, state_is_tuple=True)
        # build the bidirectional LSTM
        lstm_out, last_state = tf.nn.bidirectional_dynamic_rnn(lstm_fw_cell, lstm_bw_cell, net, dtype=tf.float32)
        # concat the forward and backward LSTM outputs along the channel axis
        lstm_out = tf.concat(lstm_out, axis=-1)
        # reshape the LSTM output so it can be fed to the fully connected layer
        lstm_out = tf.reshape(lstm_out, [N * H * W, 2 * hidden_unit_num])
        
        # build the fully connected layer here: initialize weights and biases
        init_weights = tf.contrib.layers.variance_scaling_initializer(factor=0.01, mode='FAN_AVG', uniform=False)
        init_biases = tf.constant_initializer(0.0)
        weights = make_var('weights', [2 * hidden_unit_num, output_channel], init_weights)
        biases = make_var('biases', [output_channel], init_biases)

        outputs = tf.matmul(lstm_out, weights) + biases
        # fully connected layer done; reshape back
        outputs = tf.reshape(outputs, [N, H, W, output_channel])
        return outputs
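
The reshapes inside Bilstm are easier to follow with the shapes written out. The sketch below only does shape bookkeeping, no TensorFlow; `bilstm_shapes` and the feature-map size (1, 38, 50, 512) are made up for illustration, while 128 and 512 are the hidden_unit_num and output_channel values passed in model().

```python
def bilstm_shapes(n, h, w, c, hidden_unit_num, output_channel):
    """Trace the tensor shape through each step of the Bilstm function."""
    return {
        # every row of the feature map is one sequence, width is the time axis
        "rnn_input": (n * h, w, c),
        # forward and backward outputs are concatenated on the last axis
        "rnn_output": (n * h, w, 2 * hidden_unit_num),
        # flattened so a single matmul acts as the fully connected layer
        "fc_input": (n * h * w, 2 * hidden_unit_num),
        "fc_output": (n * h * w, output_channel),
        # reshaped back to a feature map
        "final": (n, h, w, output_channel),
    }


shapes = bilstm_shapes(1, 38, 50, 512, hidden_unit_num=128, output_channel=512)
for step, shape in shapes.items():
    print(step, shape)
```

The batch size seen by the LSTM is N * H, so text rows are processed independently and context only flows horizontally.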

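The paper also mentions side-refinement, which is used to correct the boundaries of a text line. It is not implemented here, and the Caffe code released by the authors does not implement it either (it does not even include training code). However, because the horizontal position and width of each box are also refined when predicting boxes here, this plays the role of side-refinement as well.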

That completes the walkthrough of the model. What follows are the details of the reshape operations; skip them if you are not interested.

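First, a quick introduction to tf.reshape(): tf.reshape(input, [dimensions after the reshape]).
Image inputs are usually four-dimensional, N×H×W×C: N is the number of images, i.e. batch_size, H is the image height, W is the image width, and C is the number of channels. Different models may order these dimensions differently.
One more point: if one of the dimensions is given as -1, that dimension is computed automatically. Also, because TensorFlow 1 builds a static graph, a shape that is only known at run time has to be obtained with tf.shape().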
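
Here is a small pure-Python sketch of how a -1 dimension gets filled in: the total number of elements must stay the same, so the missing size is the total divided by the product of the known sizes. `resolve_reshape` is a hypothetical helper, not a TensorFlow function, and the shape [1, 38, 50, 20] is a made-up cls_pred shape (20 = 10 * 2 channels).

```python
def resolve_reshape(input_shape, target_shape):
    """Fill in a single -1 in target_shape so the element count is preserved."""
    if target_shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    total = 1
    for dim in input_shape:
        total *= dim
    known = 1
    for dim in target_shape:
        if dim != -1:
            known *= dim
    if total % known != 0:
        raise ValueError("shapes are not compatible")
    return [total // known if dim == -1 else dim for dim in target_shape]


cls_pred_shape = [1, 38, 50, 20]
reshaped = resolve_reshape(cls_pred_shape, [1, 38, -1, 2])
print(reshaped)  # [1, 38, 500, 2]
```

The third dimension becomes W * 10, one entry per anchor at each horizontal position.
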
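The cls_pred obtained earlier holds text/non-text scores, while what we need are text/non-text probabilities, so softmax is the natural choice. For example, if the text/non-text scores are [3, 1], softmax gives approximately [0.88, 0.12]. Softmax itself is not covered here; for details see http://www.itdecent.cn/p/4666213d20c7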
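
As a quick check of the softmax arithmetic for the scores [3, 1], here is a minimal pure-Python softmax. It is only a sketch for verifying the numbers, not the TensorFlow implementation.

```python
import math


def softmax(scores):
    """Softmax over a flat list of scores."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([3.0, 1.0])
print([round(p, 4) for p in probs])  # [0.8808, 0.1192]
```

Only equal scores, such as [1, 1], give [0.5, 0.5].
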
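tf.nn.softmax operates on the last dimension of its input tensor by default. So the cls_pred obtained earlier has to be reshaped so that its last dimension holds the text/non-text scores, i.e. has size 2.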

cls_pred_reshape = tf.reshape(cls_pred, [cls_pred_shape[0], cls_pred_shape[1], -1, 2])
tf.reshape(cls_pred_reshape, [-1, cls_pred_reshape_shape[3]])

These two statements do that.

tf.nn.softmax(tf.reshape(cls_pred_reshape, [-1, cls_pred_reshape_shape[3]]))

The code above applies softmax to the reshaped tensor.

tf.reshape(tf.nn.softmax(tf.reshape(cls_pred_reshape, [-1, cls_pred_reshape_shape[3]])),
           [-1, cls_pred_reshape_shape[1], cls_pred_reshape_shape[2], cls_pred_reshape_shape[3]],
           name="cls_prob")

Finally, it is reshaped back.
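
To see why the reshape matters, the sketch below compares softmax over the whole channel vector with softmax over consecutive pairs, which is what reshaping to a last dimension of 2 achieves. It is pure Python rather than TensorFlow. The scores are hypothetical and use 2 anchors instead of 10 to keep it short, and which slot of a pair means "text" is not stated in the post, so the slots are just called a and b.

```python
import math


def softmax(scores):
    """Softmax over a flat list of scores."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_over_pairs(channel_scores):
    """Mimic reshape to [-1, 2] followed by softmax for one position."""
    pairs = [channel_scores[i:i + 2] for i in range(0, len(channel_scores), 2)]
    return [softmax(pair) for pair in pairs]


# Scores at one feature-map position:
# [anchor0_a, anchor0_b, anchor1_a, anchor1_b]
channel_scores = [3.0, 1.0, 0.0, 0.0]

without_reshape = softmax(channel_scores)           # mixes both anchors together
with_reshape = softmax_over_pairs(channel_scores)   # one distribution per anchor
```

Without the reshape, the four values share a single distribution, so one anchor's scores change another anchor's probabilities. With it, each anchor gets its own pair of probabilities that sums to 1.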
