VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition

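Abstract

The VGG network placed first in localization and second in classification at the ILSVRC 2014 challenge. Its authors are from the Visual Geometry Group at the University of Oxford (presumably the source of the name VGG). The main contribution is a study of how depth affects network performance: large convolution kernels are replaced with stacks of small ones so that the network can be made deeper, yielding 16-layer and 19-layer networks (VGG16 and VGG19). VGG is now widely used in classification, detection, and keypoint localization. Object detectors such as SSD and S3FD, and face landmark localizers such as DAN, use VGG16 as the feature extraction network; the YOLO paper also reports a VGG16-based variant.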

Network performance

[Figures: network performance results]

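Origins of the network

  • Receptive field
    The receptive field is the size of the region in the input that determines one element of a given layer's output in a convolutional neural network (CNN). For example, with a 7x7 convolutional layer, each element of the output feature map corresponds to a 7x7 region of that layer's input, and that region is its receptive field. Receptive fields also accumulate: the receptive field of every layer is measured relative to the input of the first layer. Two points to keep in mind when computing it:
      (1) For the first convolutional layer, the receptive field of each output pixel equals the filter size.
      (2) For a deeper convolutional layer, the receptive field depends on the filter sizes and strides of all preceding layers.
    For the detailed calculation, see the reference links at the end of this article.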


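  • Small vs. large kernels and the receptive field
    A single 7x7 first-layer kernel (as in ZFNet; AlexNet's first layer is 11x11) has a 7x7 receptive field. By the receptive field calculation, with stride 1, two stacked 3x3 convolutions have the same receptive field as one 5x5 convolution, and three stacked 3x3 convolutions the same as one 7x7 convolution. The stack also has fewer parameters: assuming C input and C output channels, three 3x3 layers have 3 x (3 x 3 x C x C) = 27C^2 weights versus 1 x (7 x 7 x C x C) = 49C^2 for one 7x7 layer. Concretely, the receptive fields of the first three VGG convolutional layers are:
    First convolution: 3x3
    Second convolution: (3-1) x 1 + 3 = 5
    Third convolution: (5-1) x 1 + 3 = 7
    So three 3x3 kernels and one 7x7 kernel cover the same receptive field, while the 3x3 kernels allow the network to be made deeper. A drawback of VGGNet is that it consumes more compute and has a large number of parameters, which leads to higher memory usage.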
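The receptive field arithmetic and the parameter comparison above can be checked with a few lines of Python. This is a minimal sketch, not code from the original article: the function names `receptive_field` and `conv_weights` are made up for illustration, layers are described as (kernel, stride) pairs, and biases are ignored in the weight count.

```python
def receptive_field(layers):
    """Receptive field of a stack of layers, given as (kernel, stride)
    pairs ordered from the input side to the output side."""
    rf = 1      # a single input pixel sees only itself
    jump = 1    # distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) of num_layers kernel x kernel
    convolutions, each with `channels` input and output channels."""
    return num_layers * kernel * kernel * channels * channels


# Three stacked 3x3 convolutions vs. one 7x7 convolution, stride 1.
print(receptive_field([(3, 1)] * 3), receptive_field([(7, 1)]))  # 7 7
# Weight counts for C = 64: 27 * C^2 vs. 49 * C^2.
print(conv_weights(3, 64, num_layers=3), conv_weights(7, 64))
```

With stride 1 everywhere the loop reduces to the (previous - 1) x 1 + 3 steps listed above; the `jump` term only matters once a strided layer such as a pooling layer is included.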



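Network architecture

The VGG architecture is very simple. The implementations in this article use conv2d-relu-BatchNorm as the basic unit (the original paper has no batch normalization; its unit is conv-relu). Several (2, 3, or 4) of these units form a group (vgg_block), each group is followed by a 2x2 max pool for downsampling, and a sequence of groups forms VGG networks of different depths.
To limit the parameter count of a deep network, every convolution in the network uses kernel=3x3 and stride=1.
For classification, the output of the last group's max pool is flattened and passed through three fully connected layers (with dropout) and a softmax, which yields the class and its confidence. The original ImageNet pretrained models were trained with Caffe; TensorFlow, MXNet, PyTorch, and others later provided their own pretrained VGG models, so the pretrained weights of the key layers can be transferred and fine-tuned for a new vision task. VGG16 and VGG19 both contain five vgg_blocks and differ only in the number of basic units per block: 2+2+3+3+3 for VGG16 and 2+2+4+4+4 for VGG19. Adding the three fully connected layers gives 16 and 19 weight layers respectively. This article focuses on VGG16, whose detailed structure is as follows: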

--------------------------------------------------
layer          | kh x kw, out, s | out size
--------------------------------------------------
         input image (224 x 224 x 3)
--------------------------------------------------
conv1_1        | 3x3, 64, 1      | 224x224x64
conv1_2        | 3x3, 64, 1      | 224x224x64
--------------------------------------------------
max_pool       | 2x2, 64, 2      | 112x112x64
--------------------------------------------------
conv2_1        | 3x3, 128, 1     | 112x112x128
conv2_2        | 3x3, 128, 1     | 112x112x128
--------------------------------------------------
max_pool       | 2x2, 128, 2     | 56x56x128
--------------------------------------------------
conv3_1        | 3x3, 256, 1     | 56x56x256
conv3_2        | 3x3, 256, 1     | 56x56x256
conv3_3        | 3x3, 256, 1     | 56x56x256
--------------------------------------------------
max_pool       | 2x2, 256, 2     | 28x28x256
--------------------------------------------------
conv4_1        | 3x3, 512, 1     | 28x28x512
conv4_2        | 3x3, 512, 1     | 28x28x512
conv4_3        | 3x3, 512, 1     | 28x28x512
--------------------------------------------------
max_pool       | 2x2, 512, 2     | 14x14x512
--------------------------------------------------
conv5_1        | 3x3, 512, 1     | 14x14x512
conv5_2        | 3x3, 512, 1     | 14x14x512
conv5_3        | 3x3, 512, 1     | 14x14x512
--------------------------------------------------
max_pool       | 2x2, 512, 2     | 7x7x512
--------------------------------------------------
fc6            | 4096            | 1x1x4096
fc7            | 4096            | 1x1x4096
fc8            | 1000            | 1x1x1000
softmax        | classifier      | 1x1x1000
--------------------------------------------------
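The output sizes in the table follow mechanically from two rules: a stride-1 SAME 3x3 convolution keeps the spatial size and sets the channel count, and a 2x2 max pool with stride 2 halves the spatial size. The sketch below traces those rules in plain Python. The names `VGG16_CFG` and `trace_shapes` and the list encoding (an int for a convolution, "M" for a max pool) are illustrative assumptions, not part of the original code.

```python
# VGG16 layer configuration (illustrative encoding): an int is a 3x3
# convolution with that many output channels (stride 1, SAME padding);
# "M" is a 2x2 max pool with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def trace_shapes(cfg, height=224, width=224, channels=3):
    """Return the (height, width, channels) output shape after each layer."""
    shapes = []
    for item in cfg:
        if item == "M":
            # 2x2 pooling with stride 2 halves the spatial size
            # (exact for the even sizes that occur with a 224x224 input).
            height, width = height // 2, width // 2
        else:
            # A stride-1 SAME convolution keeps the spatial size and
            # only changes the channel count.
            channels = item
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes(VGG16_CFG)
num_conv = sum(1 for item in VGG16_CFG if item != "M")
print(shapes[-1])    # shape that is flattened and fed to fc6
print(num_conv + 3)  # convolutional layers plus the three fc layers
```

Flattening the final 7x7x512 volume gives the 25088-dimensional input of fc6, and the 13 convolutional layers plus 3 fully connected layers account for the "16" in VGG16.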

Code implementation

TensorFlow (tf below) offers three main Python APIs for building CNNs:

  • tf.nn
  • tf.layers
  • tf.contrib.layers
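    The level of encapsulation increases from one to the next: defining a convolutional layer with tf.nn takes the most work, while tf.contrib.layers is considerably simpler and more convenient. Notably, the tf.contrib modules combine well with higher-level Python features, so network definitions become much shorter, more readable, and more pythonic. tf.contrib.slim (TF-Slim) is popular for this reason, and many mature network architectures have been implemented with it. The TensorFlow website documents these modules in detail. Because different users build on different tf APIs, their VGG implementations differ as well; they are collected and compared here for reference.
  • Method 0
    The tf.nn neural network module is TensorFlow's core module for deep learning computation. It provides operators for convolution, pooling, activation, and so on, such as conv2d, pool, and relu. Taking 2D image convolution as an example, tf.nn.conv2d takes a 4D input ([batch, h, w, channel]), a 4D kernel ([kh, kw, in_channel, out_channel]), and strides (an int or a list), and performs the 2D convolution. The kernel and the bias must be constructed beforehand from their shapes, so the usual practice is to wrap this in a Python function that builds a convolutional layer from parameters such as kw, kh, and out_channel (conv_op in the code below). Fully connected layers and max pooling are built in a similar way. The full code is as follows: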

# -------------------------- Method 0 --------------------------------------------
# Requires TensorFlow 1.x (uses tf.contrib)
import tensorflow as tf


# Create a convolutional layer and append its parameters to the parameter list
def conv_op(input_op, name, kh, kw, n_out, dh, dw, p):
    """
    define conv operator with tf.nn
    :param input_op: input tensor
    :param name: name of the layer
    :param kh: kernel height
    :param kw: kernel width
    :param n_out: number of output channels
    :param dh: stride height
    :param dw: stride width
    :param p: parameter list
    :return: activation tensor after conv + bias + relu
    """
    # number of input channels
    n_in = input_op.get_shape()[-1].value
    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope + "w", shape=[kh, kw, n_in, n_out], dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer_conv2d())
        conv = tf.nn.conv2d(input_op, kernel, (1, dh, dw, 1), padding='SAME')
        bias_init_val = tf.constant(0.0, shape=[n_out], dtype=tf.float32)
        biases = tf.Variable(bias_init_val, trainable=True, name='b')
        z = tf.nn.bias_add(conv, biases)
        activation = tf.nn.relu(z, name=scope)
        p += [kernel, biases]
        return activation


# Define the fully connected layer
def fc_op(input_op, name, n_out, p):
    """
    define fully connected operator with tf.nn
    :param input_op: input tensor
    :param name: name of the layer
    :param n_out: number of output units
    :param p: parameter list
    :return: activation tensor after matmul + bias + relu
    """
    n_in = input_op.get_shape()[-1].value
    with tf.name_scope(name) as scope:
        # A dense weight matrix uses the plain Xavier initializer (not the conv2d variant)
        kernel = tf.get_variable(scope + 'w', shape=[n_in, n_out], dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer())
        biases = tf.Variable(tf.constant(0.1, shape=[n_out], dtype=tf.float32), name='b')
        # tf.nn.relu_layer() multiplies input_op by kernel, adds the bias, and applies ReLU
        activation = tf.nn.relu_layer(input_op, kernel, biases, name=scope)
        p += [kernel, biases]
        return activation


# Define the max pooling layer
def mpool_op(input_op, name, kh, kw, dh, dw):
    return tf.nn.max_pool(input_op, ksize=[1, kh, kw, 1], strides=[1, dh, dw, 1], padding='SAME', name=name)


# Define the network structure, Method 0
def vgg16_op(input_op, keep_prob):
    p = []
    conv1_1 = conv_op(input_op, name='conv1_1', kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
    conv1_2 = conv_op(conv1_1, name='conv1_2', kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
    pool1 = mpool_op(conv1_2, name='pool1', kh=2, kw=2, dw=2, dh=2)

    conv2_1 = conv_op(pool1, name='conv2_1', kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
    conv2_2 = conv_op(conv2_1, name='conv2_2', kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
    pool2 = mpool_op(conv2_2, name='pool2', kh=2, kw=2, dw=2, dh=2)

    conv3_1 = conv_op(pool2, name='conv3_1', kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    conv3_2 = conv_op(conv3_1, name='conv3_2', kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    conv3_3 = conv_op(conv3_2, name='conv3_3', kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    pool3 = mpool_op(conv3_3, name='pool3', kh=2, kw=2, dw=2, dh=2)

    conv4_1 = conv_op(pool3, name='conv4_1', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv4_2 = conv_op(conv4_1, name='conv4_2', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv4_3 = conv_op(conv4_2, name='conv4_3', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    pool4 = mpool_op(conv4_3, name='pool4', kh=2, kw=2, dw=2, dh=2)

    conv5_1 = conv_op(pool4, name='conv5_1', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv5_2 = conv_op(conv5_1, name='conv5_2', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv5_3 = conv_op(conv5_2, name='conv5_3', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    pool5 = mpool_op(conv5_3, name='pool5', kh=2, kw=2, dw=2, dh=2)

    shp = pool5.get_shape()
    print("pool5 shape ", shp)

    flattened_shape = shp[1].value * shp[2].value * shp[3].value
    resh1 = tf.reshape(pool5, [-1, flattened_shape], name="resh1")

    fc6 = fc_op(resh1, name="fc6", n_out=4096, p=p)
    fc6_drop = tf.nn.dropout(fc6, keep_prob, name='fc6_drop')
    fc7 = fc_op(fc6_drop, name="fc7", n_out=4096, p=p)
    fc7_drop = tf.nn.dropout(fc7, keep_prob, name="fc7_drop")
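    # Note: fc_op applies ReLU, so fc8 is non-negative here; the original
    # VGG has no ReLU on the final layer before softmax.
    fc8 = fc_op(fc7_drop, name="fc8", n_out=1000, p=p)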
    softmax = tf.nn.softmax(fc8)
    predictions = tf.argmax(softmax, 1)
    return predictions, softmax, fc8, p

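  • Method 1
    The tf.layers module is a stable mid-level TensorFlow API and can be seen as an abstraction over tf.nn. It wraps classes such as Conv2D, Dense, BatchNormalization, and Conv2DTranspose, as well as functions such as conv2d, which greatly speeds up model construction. For example, a convolutional layer can be built in a single line, conv = tf.layers.conv2d(x, filters=32, kernel_size=3, padding="same", strides=1, activation=tf.nn.relu), which also specifies the activation applied after the convolution. The VGG16 implementation based on this module is as follows: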
# -------------------------- Method 1 --------------------------------------------
import tensorflow as tf
import tensorflow.contrib.layers as tcl  # used below for flatten and dropout


class VGG1:
    """
    define with tf.layers
    """
    def __init__(self, resolution_inp=224, channel=3, name='vgg'):
        """
        construct function
        :param resolution_inp: int, size of input image. default 224 of ImageNet
        :param channel: int, channel of input image. 1 or 3
        :param name: 
        """
        self.name = name
        self.channel = channel
        self.resolution_inp = resolution_inp

    def __call__(self, x, dropout=0.5, is_training=True):
        with tf.variable_scope(self.name) as scope:
            size = 64
            se = self.vgg_block(x, 2, size, is_training=is_training)
            se = self.vgg_block(se, 2, size * 2, is_training=is_training)
            se = self.vgg_block(se, 3, size * 4, is_training=is_training)
            se = self.vgg_block(se, 3, size * 8, is_training=is_training)
            se = self.vgg_block(se, 3, size * 8, is_training=is_training)

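            flatten = tcl.flatten(se)
            # fc6 and fc7 use ReLU, as in the original VGG
            fc6 = tf.layers.dense(flatten, 4096, activation=tf.nn.relu)
            # tcl.dropout takes the keep probability as its second argument
            fc6_drop = tcl.dropout(fc6, dropout, is_training=is_training)
            fc7 = tf.layers.dense(fc6_drop, 4096, activation=tf.nn.relu)
            fc7_drop = tcl.dropout(fc7, dropout, is_training=is_training)
            # fc8 outputs raw logits (no activation)
            self.fc_out = tf.layers.dense(fc7_drop, 1000)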

            # predict for classify
            softmax = tf.nn.softmax(self.fc_out)
            self.predictions = tf.argmax(softmax, 1)
            return self.predictions

    def vgg_block(self, x, num_convs, num_channels, scope=None, is_training=True):
        """
        define the basic repeat unit in vgg: n x (conv-relu-batchnorm)-maxpool
        :param x: tensor or numpy.array, input
        :param num_convs: int, number of conv-relu-batchnorm 
        :param num_channels: int, number of conv filters
        :param scope: name space or scope
        :param is_training: bool, is training or not
        :return: 
        """
        with tf.variable_scope(scope, "conv"):
            se = x
            # conv-relu-batchnorm group
            for i in range(num_convs):
                se = tf.layers.conv2d(se,
                                      filters=num_channels,
                                      kernel_size=3,
                                      padding="same",
                                      strides=1,
                                      activation=tf.nn.relu)
                se = tf.layers.batch_normalization(se,
                                                   training=is_training,
                                                   scale=True)

            se = tf.layers.max_pooling2d(se, 2, 2, padding="same")

        return se

    @property
    def trainable_vars(self):
        return [var for var in tf.trainable_variables() if self.name in var.name]
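  • Method 2
    tf.contrib.layers is a further layer of encapsulation on top of tf.layers; for example, tf.contrib.layers.conv2d adds a normalizer_fn parameter that accepts batch_norm. tf.contrib's framework module implements many pythonic operations, such as the arg_scope context manager, which applies the same argument settings (activation type, batch_norm, and so on) to many convolutional layers and makes the code more concise. The popular slim module adds some further features on top of this; its usage is broadly similar and is not covered here. Because many of these modules are not natively supported by tf, the announcement for the then-upcoming TensorFlow 2.0 stated explicitly that many modules under tf.contrib might be moved elsewhere or deprecated (tf.contrib was in fact removed in 2.0), so the code here applies to TensorFlow 1.x. The VGG16 implementation based on this module is as follows: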
# -------------------------- Method 2 --------------------------------------------
import tensorflow as tf
import tensorflow.contrib.layers as tcl
from tensorflow.contrib.framework import arg_scope


class VGG2:
    """
    define with tf.contrib.layers
    """
    def __init__(self, resolution_inp=224, channel=3, name='vgg'):
        self.name = name
        self.channel = channel
        self.resolution_inp = resolution_inp

    def __call__(self, x, dropout=0.5, is_training=True):
        with tf.variable_scope(self.name) as scope:
            with arg_scope([tcl.batch_norm], is_training=is_training, scale=True):
                with arg_scope([tcl.conv2d],
                               padding="SAME",
                               normalizer_fn=tcl.batch_norm,
                               activation_fn=tf.nn.relu, ):
                    size = 64
                    se = self.vgg_block(x, 2, size, is_training=is_training)
                    se = self.vgg_block(se, 2, size * 2, is_training=is_training)
                    se = self.vgg_block(se, 3, size * 4, is_training=is_training)
                    se = self.vgg_block(se, 3, size * 8, is_training=is_training)
                    se = self.vgg_block(se, 3, size * 8, is_training=is_training)

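                    flatten = tcl.flatten(se)
                    # The arg_scope above only covers tcl.conv2d, so the dense
                    # layers need their ReLU set explicitly
                    fc6 = tf.layers.dense(flatten, 4096, activation=tf.nn.relu)
                    fc6_drop = tcl.dropout(fc6, dropout, is_training=is_training)
                    print("dropout ", fc6, fc6_drop)

                    fc7 = tf.layers.dense(fc6_drop, 4096, activation=tf.nn.relu)
                    fc7_drop = tcl.dropout(fc7, dropout, is_training=is_training)
                    # fc8 outputs raw logits (no activation)
                    self.fc_out = tf.layers.dense(fc7_drop, 1000)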

                    # predict for classify
                    softmax = tf.nn.softmax(self.fc_out)
                    self.predictions = tf.argmax(softmax, 1)
                    return self.predictions

    def vgg_block(self, x, num_convs, num_channels, scope=None, is_training=True):
        """
        define the basic repeat unit in vgg: n x (conv-batchnorm-relu)-maxpool
        (tcl.conv2d applies normalizer_fn before activation_fn)
        :param x: tensor or numpy.array, input
        :param num_convs: int, number of conv-batchnorm-relu units
        :param num_channels: int, number of conv filters
        :param scope: name space or scope
        :param is_training: bool, is training or not
        :return: 
        """
        with tf.variable_scope(scope, "conv"):
            se = x
            for i in range(num_convs):
                se = tcl.conv2d(se, num_outputs=num_channels, kernel_size=3, stride=1)
            se = tf.layers.max_pooling2d(se, 2, 2, padding="same")

        print("layer ", self.name, "in ", x, "out ", se)

        return se

    @property
    def trainable_vars(self):
        return [var for var in tf.trainable_variables() if self.name in var.name]

    @property
    def vars(self):
        return [var for var in tf.global_variables() if self.name in var.name]

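Running

The code in this section has two parts. The timing function time_tensorflow_run takes a tf.Session, the tensor to evaluate, the corresponding feed dictionary, and a message to print, and measures the time needed to run the tensor 100 times (reporting the mean and standard deviation). The main function run_benchmark builds the VGG16 network using one of the three approaches (the others are left commented out) and measures the time spent on inference (predict) and on gradient computation (the backward pass). The full code is as follows: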

# -------------------------- Demo and Test -------------------------------------------
from datetime import datetime
import tensorflow as tf
import math
import time

batch_size = 16
num_batches = 100


def time_tensorflow_run(session, target, feed, info_string):
    """
    calculate time for each session run
    :param session: tf.Session
    :param target: operator or tensor to run with the session
    :param feed: feed dict for session
    :param info_string: info message for print
    :return:
    """
    num_steps_burn_in = 10  # number of warm-up steps
    total_duration = 0.0  # total time
    total_duration_squared = 0.0  # sum of squared durations, used for the variance
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target, feed_dict=feed)
        duration = time.time() - start_time

        if i >= num_steps_burn_in:  # only count steps after the warm-up
            if not i % 10:
                print('[%s] step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration

    mn = total_duration / num_batches  # mean time per batch
    vr = total_duration_squared / num_batches - mn * mn  # variance
    sd = math.sqrt(max(vr, 0.0))  # standard deviation (clamped against rounding error)
    print('[%s] %s across %d steps, %.3f +/- %.3f sec/batch' % (datetime.now(), info_string, num_batches, mn, sd))

# test demo
def run_benchmark():
    """
    main function for test or demo
    :return:
    """
    with tf.Graph().as_default():
        image_size = 224  # input image size
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
        # keep_prob is only consumed by method 0; methods 1 and 2 take dropout as an argument
        keep_prob = tf.placeholder(tf.float32)

        # method 0
        # prediction, softmax, fc8, p = vgg16_op(images, keep_prob)

        # method 1 and method 2
        # vgg16 = VGG1(resolution_inp=image_size, name="vgg16")
        vgg16 = VGG2(resolution_inp=image_size, name="vgg16")
        prediction = vgg16(images, 0.5, True)
        fc8 = vgg16.fc_out
        p = vgg16.trainable_vars

        for v in p:
            print(v)
        init = tf.global_variables_initializer()

        # for var in tf.global_variables():
        #     print("param ", var.name)
        sess = tf.Session()
        print("init...")
        sess.run(init)

        print("predict..")
        writer = tf.summary.FileWriter("./logs")
        writer.add_graph(sess.graph)
        time_tensorflow_run(sess, prediction, {keep_prob: 1.0}, "Forward")

        # simulate the training process
        objective = tf.nn.l2_loss(fc8)  # define a stand-in loss
        grad = tf.gradients(objective, p)  # gradients of the loss w.r.t. all model parameters

        print('grad backward')
        time_tensorflow_run(sess, grad, {keep_prob: 0.5}, "Forward-backward")
        writer.close()
        sess.close()


if __name__ == '__main__':
    run_benchmark()

Note: the complete code is available in the author's GitHub project.
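The timing logic in time_tensorflow_run does not depend on TensorFlow: it discards a few warm-up runs, then derives the mean and standard deviation from a running sum and a running sum of squares. The sketch below isolates that logic so it can be tried without a session. The names `summarize` and `time_callable` are hypothetical helpers introduced here, and `time.perf_counter` is swapped in for `time.time` because it is better suited to measuring short intervals.

```python
import math
import time


def summarize(durations):
    """Mean and standard deviation computed from running sums."""
    n = len(durations)
    total = sum(durations)
    total_squared = sum(d * d for d in durations)
    mean = total / n
    # E[x^2] - E[x]^2 can dip slightly below zero from rounding; clamp it.
    variance = max(total_squared / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def time_callable(fn, num_batches=100, burn_in=10):
    """Call fn() burn_in times untimed, then time num_batches further calls."""
    durations = []
    for i in range(num_batches + burn_in):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if i >= burn_in:  # only count calls after the warm-up
            durations.append(elapsed)
    return summarize(durations)


# Usage: time any zero-argument callable.
mean, std = time_callable(lambda: sum(range(10000)), num_batches=20, burn_in=5)
print("%.6f +/- %.6f sec/call" % (mean, std))
```

To time a session run with this helper, one could pass something like `lambda: sess.run(target, feed_dict=feed)` as `fn`, which mirrors what the benchmark above does inline.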

Parameter count

The total number of parameters is approximately 138M.
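Fully connected layer parameters: about 123.6M (fc6: 7x7x512x4096, fc7: 4096x4096, fc8: 4096x1000, plus biases), roughly 89% of the total.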
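The 138M figure can be reproduced by summing weights and biases layer by layer. The sketch below does this for the paper's VGG16 layout as listed in the architecture table (3x3 convolutions with biases, a 7x7x512 input to fc6, no batch normalization). The names `VGG16_CONV`, `VGG16_FC`, and `count_params` are illustrative, and the batch-normalized variants in Methods 1 and 2 would give slightly different totals.

```python
# Output channels of the 13 convolutional layers and units of the 3 fc layers.
VGG16_CONV = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
VGG16_FC = [4096, 4096, 1000]


def count_params(conv_channels, fc_units, in_channels=3, final_hw=7, kernel=3):
    """Return (conv_total, fc_total), counting weights and biases."""
    conv_total = 0
    n_in = in_channels
    for n_out in conv_channels:
        # kernel x kernel x n_in x n_out weights plus n_out biases
        conv_total += kernel * kernel * n_in * n_out + n_out
        n_in = n_out

    fc_total = 0
    # The last feature map (final_hw x final_hw x n_in) is flattened for fc6.
    n_in = final_hw * final_hw * n_in
    for n_out in fc_units:
        fc_total += n_in * n_out + n_out
        n_in = n_out
    return conv_total, fc_total


conv_total, fc_total = count_params(VGG16_CONV, VGG16_FC)
print(conv_total, fc_total, conv_total + fc_total)
```

Under these assumptions the fully connected layers dominate the total, with fc6 alone contributing over 100M parameters because of its 25088-dimensional input.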

Time efficiency

References

Project homepage
https://blog.csdn.net/wcy12341189/article/details/56281618
https://blog.csdn.net/App_12062011/article/details/60962978
https://blog.csdn.net/zhangwei15hh/article/details/78417789
Receptive field
Receptive field calculation
