Implementing a Deep Convolutional Generative Adversarial Network (DCGAN) from Scratch in TensorFlow

This article implements a deep convolutional generative adversarial network (DCGAN) in TensorFlow and trains it to generate portraits of young women. The training images come from the project 用DCGAN生成女朋友 and consist entirely of female face shots, roughly like the following:

Figure 1: Face images used to train the DCGAN

Generative adversarial networks have been one of the hottest research directions in deep learning in recent years, spawning variant after variant: GAN, DCGAN, InfoGAN, WGAN, CycleGAN, and so on. Drawing on the GAN and DCGAN papers and on parts of TensorFlow's GAN source code, this article implements a simple DCGAN, runs a fair number of experiments, and generates some reasonably realistic images.

There are already many DCGAN projects on GitHub; the most-starred is DCGAN-tensorflow, but after skimming its code I found it hard to read, so I decided to implement the network from scratch to deepen my own understanding of adversarial networks. The DCGAN architecture itself is simple and easy to implement; the real difficulty is hyperparameter tuning. If the parameters are even slightly off, training easily collapses and the generated images are pure noise.

1. Defining the DCGAN Network

According to Goodfellow, the inventor of the generative adversarial network (GAN), a GAN consists of two parts: a generator G and a discriminator D. The generator is like a counterfeiter trying to produce banknotes convincing enough to pass for real, while the discriminator is like a banknote detector that tells real bills from fake ones. This contest between forging and detecting creates the adversarial dynamic. When both the generator and the discriminator are sufficiently powerful, the contest tends toward an equilibrium in which the discriminator can no longer distinguish generated samples from real ones and assigns every sample a probability of 0.5 of being real. That, of course, is the ideal case. In practice such an equilibrium is hard to reach; at best one obtains a fragile, dynamic balance in which the generator produces samples realistic enough that the discriminator struggles to tell real from fake. A slightly too large parameter update can break this balance: the generator suddenly collapses, its samples become noisier and noisier, the discriminator gains the upper hand and distinguishes real from fake with ease, and its loss quickly drops toward 0. But since this is an adversarial game, the generator may bounce back, mount a new wave of forgeries, and throw the discriminator back into confusion. In general, training a GAN is a cycle of reaching equilibrium, breaking it, reaching it again, and breaking it again, so its loss curves look like a roller coaster.

Most deep learning models we encounter fall into two broad classes: discriminative models and generative models. A discriminative model is trained on labeled data, e.g., for classification, where a given sample must be assigned to a class; a generative model must generate samples from the training data, or estimate the distribution of the training data. Generative modeling is usually the harder problem, because the normalization constant of the distribution, the partition function, is intractable.

The authors of GAN creatively combined a discriminative model with a generative model, greatly simplifying the generative problem, though at the cost of unstable training. Below, using the task of generating images with a particular property, namely portraits of young women, we briefly explain the principles and implementation of the deep convolutional generative adversarial network (DCGAN).

Suppose we have many face images and want to design a network that generates realistic new ones. A natural first question is: what is the network's input? Goodfellow's idea is simple: the input is a sample from some random distribution (normal, uniform, etc.), typically a vector of fixed length, say 100 or 64, drawn from that distribution. To generate face images, we need to build a 3-dimensional image with 3 color channels out of this one-dimensional vector. This is done with a technique called transposed convolution (also written "transpose convolution" or "deconvolution"). Recall the overall structure of a convolutional network: starting from a 3-channel image, convolution and pooling layers gradually reduce it to a one-dimensional output. Generating a 3-dimensional image from a one-dimensional vector can be seen as the inverse of this process, which is why transposed convolution is also called deconvolution. See the figure below:

Figure 2: The generator network of the DCGAN

Let the randomly sampled vector be [x1, ..., xn] (n = 64 or 100, say). To feed it into a (transposed) convolutional network, expand it into a 4-dimensional tensor of shape 1 x 1 x 1 x n. Note that a transposed convolution with 'VALID' padding maps spatial size s to stride * (s - 1) + kernel_size, so the first (transposed) convolution layer (kernel_size = 4, stride = 2, padding = 'VALID') produces a tensor of shape 1 x 4 x 4 x 1024 (opposite to ordinary convolution, the spatial size grows). After the second (transposed) convolution layer (kernel_size = 4, stride = 2, padding = 'SAME') the shape becomes 1 x 8 x 8 x 512, ..., and after the fifth transposed convolution layer (kernel_size = 4, stride = 2, padding = 'SAME') we obtain a tensor of shape 1 x 64 x 64 x 64. At this point, to get a 3-channel image we only need one more convolution layer (kernel_size = 1, stride = 1, num_outputs = 3, padding = 'SAME'), after which the output tensor has shape 1 x 64 x 64 x 3; squeezing out dimension 0 leaves a 64 x 64 color image.
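
The shape bookkeeping above is easy to verify in code. Below is a minimal sketch (assuming TensorFlow 1.x with tf.contrib.slim, the stack used throughout this article); the layer widths mirror the generator in model.py below:

import tensorflow as tf

slim = tf.contrib.slim

# Minimal shape check: upsample one length-64 vector to a 64 x 64 x 3 image
# with transposed convolutions.
z = tf.random_uniform([1, 64], minval=-1.0, maxval=1.0)
net = tf.expand_dims(tf.expand_dims(z, 1), 1)             # 1 x 1 x 1 x 64
net = slim.conv2d_transpose(net, 1024, kernel_size=4,
                            stride=1, padding='VALID')    # 1 x 4 x 4 x 1024
# (From a 1 x 1 input, stride 1 and stride 2 both give 4 x 4 with 'VALID'.)
for depth in (512, 256, 128, 64):
    net = slim.conv2d_transpose(net, depth, kernel_size=4,
                                stride=2, padding='SAME')  # spatial size doubles
image = slim.conv2d(net, 3, kernel_size=1, stride=1,
                    activation_fn=tf.nn.tanh)             # 1 x 64 x 64 x 3
print(image.get_shape())                                  # (1, 64, 64, 3)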

That is the generator's architecture. Passing a randomly sampled one-dimensional vector through this network turns it into an image, but note that this image is itself random and may be pure noise. To make the generated images exhibit the facial features we want, we need some supervisory signal to train the generator's parameters. That is the discriminator's job. The discriminator's architecture is essentially the generator's architecture reversed (roughly, the figure above read from right to left), except that the final output is a single score, i.e., the discriminator is a binary classifier that decides whether an image is a real training image or a generated fake. Because the discriminator is a deep network with strong fitting capacity, it readily extracts the facial features of the training data, effectively providing a weak supervisory signal (the extracted facial features). The key question that follows is how to exploit this weak supervision fully.
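
To make the "mirror image" concrete, here is the matching shape progression for the discriminator (again a sketch assuming TF 1.x + slim; the channel widths follow the model code below):

import tensorflow as tf

slim = tf.contrib.slim

# Strided 4 x 4 convolutions halve the spatial size of a 64 x 64 input
# until one logit per image remains.
images = tf.random_uniform([1, 64, 64, 3], minval=-1.0, maxval=1.0)
net = images
for depth in (64, 128, 256, 512, 1024, 2048):
    net = slim.conv2d(net, depth, kernel_size=4, stride=2,
                      activation_fn=tf.nn.leaky_relu)  # spatial size halves
logits = slim.conv2d(net, 1, kernel_size=1, stride=1, padding='VALID',
                     activation_fn=None)               # 1 x 1 x 1 x 1
logits = tf.reshape(logits, [-1, 1])                   # one logit per image
print(logits.get_shape())                              # (1, 1)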

As noted earlier, the adversarial game in a generative adversarial network is: the generator tries to produce fakes realistic enough that the discriminator cannot tell them from real samples, while the discriminator tries to sharpen its judgment and expose the generator's fakes. For the generator, then, the closer its samples are to the training data, the better. For our face generation task, the stronger the female facial features of the generated samples, the better; and those facial features are supplied by the discriminator. The resulting weakly supervised objective is: the discriminator's output on generated images should resemble its output on real training images. In other words, the generator should treat its own outputs as real training images, while the discriminator should treat the generator's outputs as fakes. This yields the loss function of the generative adversarial network, the minimax objective from the original GAN paper:

min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p_z(z)}[log(1 - D(G(z)))]

A more intuitive way to read it is:

Given a randomly sampled vector z, the generator produces an image G(z), which the discriminator D scores with a single logit D(G(z)) (its sigmoid is the predicted probability that the image is real). The generator G aims to produce images indistinguishable from the real training samples x, so it wants every image it generates to be treated as real; the generator's loss is therefore:
generator_loss = sigmoid_cross_entropy(logits=D(G(z)), labels=1)
The discriminator D, for its part, wants its judgment to be as sharp as possible, so it should treat the same image as fake, which gives the discriminator's loss on generated images:
discriminator_loss_on_generated = sigmoid_cross_entropy(logits=D(G(z)), labels=0)
On the other hand, to exploit the (weakly supervised) information in the real images, the discriminator's loss on all real training samples x is:
discriminator_loss_on_real = sigmoid_cross_entropy(logits=D(x), labels=1)
where D(x) is the discriminator's logit on a real training image x.
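
These three losses translate almost line for line into TensorFlow. A minimal sketch (the function name and argument names are mine; d_real and d_gen stand for the discriminator's raw logits on a batch of real and generated images, respectively):

import tensorflow as tf

def gan_losses(d_real, d_gen):
    # The generator wants its fakes labeled real (label 1).
    gen_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.ones_like(d_gen), logits=d_gen))
    # The discriminator wants fakes labeled fake (label 0) ...
    dis_loss_on_generated = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.zeros_like(d_gen), logits=d_gen))
    # ... and real images labeled real (label 1).
    dis_loss_on_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.ones_like(d_real), logits=d_real))
    return gen_loss, dis_loss_on_generated, dis_loss_on_real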

At this point the two most important parts of the generative adversarial network have been covered: the generator's and discriminator's architectures, and their corresponding losses. In practice, the losses above are usually smoothed a little (see the source code below, or the paper Improved Techniques for Training GANs); besides that, the discriminator is optimized with the sum of its two losses:

discriminator_loss = discriminator_loss_on_generated + discriminator_loss_on_real
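
As an aside, here is how label smoothing enters the real-image loss. This is a standalone illustration, not part of model.py: with label_smoothing = s, TensorFlow's sigmoid cross entropy replaces the positive label 1.0 by 1.0 * (1 - s) + 0.5 * s (0.875 for s = 0.25), so the discriminator never becomes fully confident on real images:

import tensorflow as tf

# Illustration only: smoothed positive labels for the loss on real images.
logits = tf.constant([[2.0], [-1.0]])  # hypothetical real-image logits
loss_on_real = tf.losses.sigmoid_cross_entropy(
    multi_class_labels=tf.ones_like(logits),  # internally becomes 0.875
    logits=logits,
    label_smoothing=0.25)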

We thus have 4 losses in total, of which the two used to backpropagate and optimize the network parameters are generator_loss and discriminator_loss. Implementing the ideas above in TensorFlow gives the DCGAN model (named model.py):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat May 26 20:03:48 2018

@author: shirhe-lyh
"""

"""Implementation of DCGAN.

This work was first described in:
    Unsupervised representation learning with deep convolutional generative 
    adversarial networks, Alec Radford et al., arXiv: 1511.06434v2
   
This module is based on:
    TensorFlow models/research/slim/nets/dcgan.py
    TensorFlow tensorflow/contrib/gan
"""

import math
import tensorflow as tf

slim = tf.contrib.slim


class DCGAN(object):
    """Implementation of DCGAN."""
    
    def __init__(self, 
                 is_training,
                 generator_depth=64,
                 discriminator_depth=64,
                 final_size=32,
                 num_outputs=3,
                 fused_batch_norm=False):
        """Constructor.
        
        Args:
            is_training: Whether the network is for training or not.
            generator_depth: Number of channels in last deconvolution layer of
                the generator network.
            discriminator_depth: Number of channels in first convolution layer
                of the discriminator network.
            final_size: The spatial size (height = width) of the final output.
            num_outputs: Number of output features. For images, this is the
                number of channels.
            fused_batch_norm: If 'True', use a faster, fused implementation
                of batch normalization.
        """
        self._is_training = is_training
        self._generator_depth = generator_depth
        self._discriminator_depth = discriminator_depth
        self._final_size = final_size
        self._num_outputs = num_outputs
        self._fused_batch_norm = fused_batch_norm
        
    def _validate_image_inputs(self, inputs):
        """Check the inputs whether is valid or not.
        
        Copy from:
            https://github.com/tensorflow/models/blob/master/research/
            slim/nets/dcgan.py
            
        Args:
            inputs: A float32 tensor with shape [batch_size, height, width, 
                channels].
            
        Raises:
            ValueError: If the input image shape is not 4-dimensional, if the 
                spatial dimensions aren't defined at graph construction time, 
                if the spatial dimensions aren't square, or if the spatial 
                dimensions aren't a power of two.
        """
        inputs.get_shape().assert_has_rank(4)
        inputs.get_shape()[1:3].assert_is_fully_defined()
        if inputs.get_shape()[1] != inputs.get_shape()[2]:
            raise ValueError('Input tensor does not have equal width and '
                             'height: ', inputs.get_shape()[1:3])
        width = inputs.get_shape().as_list()[2]
        if math.log(width, 2) != int(math.log(width, 2)):
            raise ValueError("Input tensor 'width' is not a power of 2: ",
                             width)
            
    def discriminator(self, 
                      inputs,
                      depth=64,
                      is_training=True,
                      reuse=None,
                      scope='Discriminator',
                      fused_batch_norm=False):
        """Discriminator network for DCGAN.
        
        Construct discriminator network from inputs to the final endpoint.
        
        Copy from:
            https://github.com/tensorflow/models/blob/master/research/
            slim/nets/dcgan.py
        
        Args:
            inputs: A float32 tensor with shape [batch_size, height, width, 
                channels].
            depth: Number of channels in first convolution layer.
            is_training: Whether the network is for training or not.
            reuse: Whether or not the network variables should be reused.
                'scope' must be given to be reused.
            scope: Optional variable_scope. Default value is 'Discriminator'.
            fused_batch_norm: If 'True', use a faster, fused implementation
                of batch normalization.
                
        Returns:
            logits: The pre-sigmoid activations (one raw score per image), a
                float32 tensor with shape [batch_size, 1].
            end_points: A dictionary from components of the network to their
                activation.
                
        Raises:
            ValueError: If the input image shape is not 4-dimensional, if the 
                spatial dimensions aren't defined at graph construction time, 
                if the spatial dimensions aren't square, or if the spatial 
                dimensions aren't a power of two.
        """
        normalizer_fn = slim.batch_norm
        normalizer_fn_args = {
            'is_training': is_training,
            'zero_debias_moving_mean': True,
            'fused': fused_batch_norm}
        
        self._validate_image_inputs(inputs)
        height = inputs.get_shape().as_list()[1]
        
        end_points = {}
        with tf.variable_scope(scope, values=[inputs], reuse=reuse) as scope:
            with slim.arg_scope([normalizer_fn], **normalizer_fn_args):
                with slim.arg_scope([slim.conv2d], stride=2, kernel_size=4,
                                    activation_fn=tf.nn.leaky_relu):
                    net = inputs
                    for i in range(int(math.log(height, 2))):
                        scope = 'conv%i' % (i+1)
                        current_depth = depth * 2**i
                        normalizer_fn_ = None if i == 0 else normalizer_fn
                        net = slim.conv2d(net, num_outputs=current_depth, 
                                          normalizer_fn=normalizer_fn_,
                                          scope=scope)
                        end_points[scope] = net
                    
                    logits = slim.conv2d(net, 1, kernel_size=1, stride=1,
                                         padding='VALID', normalizer_fn=None,
                                         activation_fn=None)
                    logits = tf.reshape(logits, [-1, 1])
                    end_points['logits'] = logits
                    
                    return logits, end_points
                
    def generator(self,
                  inputs,
                  depth=64,
                  final_size=32,
                  num_outputs=3,
                  is_training=True,
                  reuse=None,
                  scope='Generator',
                  fused_batch_norm=False):
        """Generator network for DCGAN.
        
        Construct generator network from inputs to the final endpoint.
        
        Copy from:
            https://github.com/tensorflow/models/blob/master/research/
            slim/nets/dcgan.py
        
        Args:
            inputs: A float32 tensor with shape [batch_size, N] for any size N.
            depth: Number of channels in last deconvolution layer.
            final_size: The shape of the final output.
            num_outputs: Number of output features. For images, this is the
                number of channels.
            is_training: Whether the network is for training or not.
            reuse: Whether or not the network variables should be reused.
                'scope' must be given to be reused.
            scope: Optional variable_scope. Default value is 'Generator'.
            fused_batch_norm: If 'True', use a faster, fused implementation
                of batch normalization.
                
        Returns:
            logits: The generated images, a float32 tensor with shape
                [batch_size, final_size, final_size, num_outputs], with
                values in (-1, 1) from the final tanh.
            end_points: A dictionary from components of the network to their
                activation.
            
        Raises:
            ValueError: If 'inputs' is not 2-dimensional, or if 'final_size'
                is not a power of 2 or is less than 8.
        """
        normalizer_fn = slim.batch_norm
        normalizer_fn_args = {
            'is_training': is_training,
            'zero_debias_moving_mean': True,
            'fused': fused_batch_norm}
        
        inputs.get_shape().assert_has_rank(2)
        if math.log(final_size, 2) != int(math.log(final_size, 2)):
            raise ValueError("'final_size' (%i) must be a power of 2."
                             % final_size)
        if final_size < 8:
            raise ValueError("'final_size' (%i) must be greater than 8."
                             % final_size)
            
        end_points = {}
        num_layers = int(math.log(final_size, 2)) - 1
        with tf.variable_scope(scope, values=[inputs], reuse=reuse) as scope:
            with slim.arg_scope([normalizer_fn], **normalizer_fn_args):
                with slim.arg_scope([slim.conv2d_transpose],
                                    normalizer_fn=normalizer_fn,
                                    stride=2, kernel_size=4):
                    net = tf.expand_dims(tf.expand_dims(inputs, 1), 1)
                    
                    # First upscaling is different because it takes the input
                    # vector.
                    current_depth = depth * 2 ** (num_layers - 1)
                    scope = 'deconv1'
                    net = slim.conv2d_transpose(net, current_depth, stride=1, 
                                                padding='VALID', scope=scope)
                    end_points[scope] = net
                    
                    for i in range(2, num_layers):
                        scope = 'deconv%i' % i
                        current_depth = depth * 2 ** (num_layers - i)
                        net = slim.conv2d_transpose(net, current_depth, 
                                                    scope=scope)
                        end_points[scope] = net
                        
                    # Last layer has different normalizer and activation.
                    scope = 'deconv%i' % num_layers
                    net = slim.conv2d_transpose(net, depth, normalizer_fn=None,
                                                activation_fn=None, scope=scope)
                    end_points[scope] = net
                    
                    # Convert to proper channels
                    scope = 'logits'
                    logits = slim.conv2d(
                        net,
                        num_outputs,
                        normalizer_fn=None,
                        activation_fn=tf.nn.tanh,
                        kernel_size=1,
                        stride=1,
                        padding='VALID',
                        scope=scope)
                    end_points[scope] = logits
                    
                    logits.get_shape().assert_has_rank(4)
                    logits.get_shape().assert_is_compatible_with(
                        [None, final_size, final_size, num_outputs])
                    
                    return logits, end_points
                
    def dcgan_model(self,
                    real_data,
                    generator_inputs,
                    generator_scope='Generator',
                    discriminator_scope='Discriminator',
                    check_shapes=True):
        """Returns DCGAN model outputs and variables.
        
        Modified from:
            https://github.com/tensorflow/tensorflow/blob/master/tensorflow/
            contrib/gan/python/train.py
            
        Args:
            real_data: A float32 tensor with shape [batch_size, height, width, 
                channels].
            generator_inputs: A float32 tensor with shape [batch_size, N] for 
                any size N.
            generator_scope: Optional generator variable scope. Useful if you
                want to reuse a subgraph that has already been created.
            discriminator_scope: Optional discriminator variable scope. Useful
                if you want to reuse a subgraph that has already been created.
            check_shapes: If 'True', check that generator produces Tensors
                that are the same shape as real data. Otherwise, skip this
                check.
                
        Returns:
            A dictionary containing output tensors.
            
        Raises:
            ValueError: If the generator outputs a tensor that isn't the same
                shape as 'real_data'.
        """
        # Create models
        with tf.variable_scope(generator_scope) as gen_scope:
            generated_data, _ = self.generator(
                generator_inputs, self._generator_depth, self._final_size,
                self._num_outputs, self._is_training)
        with tf.variable_scope(discriminator_scope) as dis_scope:
            discriminator_gen_outputs, _ = self.discriminator(
                generated_data, self._discriminator_depth, self._is_training)
        with tf.variable_scope(dis_scope, reuse=True):
            discriminator_real_outputs, _ = self.discriminator(
                real_data, self._discriminator_depth, self._is_training)
        
        if check_shapes:
            if not generated_data.shape.is_compatible_with(real_data.shape):
                raise ValueError('Generator output shape (%s) must be the '
                                 'same shape as real data (%s).'
                                 % (generated_data.shape, real_data.shape))
                
        # Get model-specific variables
        generator_variables = slim.get_trainable_variables(gen_scope)
        discriminator_variables = slim.get_trainable_variables(dis_scope)
        
        return {'generated_data': generated_data,
                'discriminator_gen_outputs': discriminator_gen_outputs,
                'discriminator_real_outputs': discriminator_real_outputs,
                'generator_variables': generator_variables,
                'discriminator_variables': discriminator_variables}
        
    def predict(self, generator_inputs):
        """Return the generated results by generator network.
        
        Args:
            generator_inputs: A float32 tensor with shape [batch_size, N] for 
                any size N.
                
        Returns:
            logits: The generated images, a float32 tensor with shape
                [batch_size, final_size, final_size, num_outputs].
        """
        logits, _ = self.generator(generator_inputs, self._generator_depth,
                                   self._final_size, self._num_outputs,
                                   is_training=False)
        return logits
        
    def discriminator_loss(self, 
                           discriminator_real_outputs,
                           discriminator_gen_outputs,
                           label_smoothing=0.25):
        """Original minmax discriminator loss for GANs, with label smoothing.
        
        Modified from:
            https://github.com/tensorflow/tensorflow/blob/master/tensorflow/
            contrib/gan/python/losses/python/losses_impl.py
        
        Args:
            discriminator_real_outputs: Discriminator output on real data.
            discriminator_gen_outputs: Discriminator output on generated data.
                Expected to be in the range of (-inf, inf).
            label_smoothing: The amount of smoothing for positive labels. This
                technique is taken from `Improved Techniques for Training GANs`
                (https://arxiv.org/abs/1606.03498). `0.0` means no smoothing.
                
        Returns:
            loss_dict: A dictionary containing three scalar tensors.
        """
        # -(1 - s) * log(sigmoid(D(x))) - s * log(1 - sigmoid(D(x))), s = label_smoothing
        losses_on_real = slim.losses.sigmoid_cross_entropy(
            logits=discriminator_real_outputs,
            multi_class_labels=tf.ones_like(discriminator_real_outputs),
            label_smoothing=label_smoothing)
        loss_on_real = tf.reduce_mean(losses_on_real)
        # -log(1 - sigmoid(D(G(z))))
        losses_on_generated = slim.losses.sigmoid_cross_entropy(
            logits=discriminator_gen_outputs,
            multi_class_labels=tf.zeros_like(discriminator_gen_outputs))
        loss_on_generated = tf.reduce_mean(losses_on_generated)
        
        loss = loss_on_real + loss_on_generated
        return {'dis_loss': loss,
                'dis_loss_on_real': loss_on_real,
                'dis_loss_on_generated': loss_on_generated}
        
    def generator_loss(self, discriminator_gen_outputs, label_smoothing=0.0):
        """Modified generator loss for DCGAN.
        
        Modified from:
            https://github.com/tensorflow/tensorflow/blob/master/tensorflow/
            contrib/gan/python/losses/python/losses_impl.py
        
        Args:
            discriminator_gen_outputs: Discriminator output on generated data.
                Expected to be in the range of (-inf, inf).
            label_smoothing: The amount of smoothing for positive labels.
                `0.0` (the default) means no smoothing.
                
        Returns:
            loss: A scalar tensor.
        """
        losses = slim.losses.sigmoid_cross_entropy(
            logits=discriminator_gen_outputs, 
            multi_class_labels=tf.ones_like(discriminator_gen_outputs),
            label_smoothing=label_smoothing)
        loss = tf.reduce_mean(losses)
        return loss
    
    def loss(self, discriminator_real_outputs, discriminator_gen_outputs):
        """Computes the loss of DCGAN.
        
        Args:
            discriminator_real_outputs: Discriminator output on real data.
            discriminator_gen_outputs: Discriminator output on generated data.
                Expected to be in the range of (-inf, inf).
                
        Returns:
            A dictionary containing 4 scalar tensors.
        """
        dis_loss_dict = self.discriminator_loss(discriminator_real_outputs,
                                                discriminator_gen_outputs)
        gen_loss = self.generator_loss(discriminator_gen_outputs)
        dis_loss_dict.update({'gen_loss': gen_loss})
        return dis_loss_dict

2. Training and Generating Images

The authors of the DCGAN paper summarized the techniques that let them train generative adversarial networks stably for unsupervised image generation. In brief, the paper's architecture guidelines are:

  1. Replace pooling layers with strided convolutions in the discriminator and transposed convolutions in the generator;
  2. Use batch normalization in both the generator and the discriminator;
  3. Remove fully connected hidden layers for deeper architectures;
  4. Use ReLU activations in the generator for all layers except the output, which uses tanh;
  5. Use LeakyReLU activations in the discriminator for all layers.

The code above (model.py) follows these techniques essentially faithfully. A few small differences:

  1. The vector sampled from the random distribution has length 64, not 100 as in the paper;
  2. The real training images must have resolution n x n, where n is a power of 2;
  3. The generated images likewise have resolution m x m, where m is a power of 2;
  4. The losses use the label smoothing technique from Improved Techniques for Training GANs.
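
Before turning to training, a quick smoke test of model.py can confirm the wiring (this snippet is mine, not one of the project files):

import tensorflow as tf

import model

# Build the graph once and check shapes for 64 x 64 RGB images.
dcgan = model.DCGAN(is_training=True, final_size=64)
real_images = tf.placeholder(tf.float32, [None, 64, 64, 3])
noise = tf.placeholder(tf.float32, [None, 64])
outputs = dcgan.dcgan_model(real_images, noise)
print(outputs['generated_data'].get_shape())             # (?, 64, 64, 3)
print(outputs['discriminator_gen_outputs'].get_shape())  # (?, 1)
losses = dcgan.loss(outputs['discriminator_real_outputs'],
                    outputs['discriminator_gen_outputs'])
print(sorted(losses.keys()))  # ['dis_loss', 'dis_loss_on_generated', ...]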

This section deals with training the DCGAN. First, the training file (named train.py) is listed in full:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun May 27 16:55:12 2018

@author: shirhe-lyh
"""

"""Train a DCGAN to generating fake images.

Example Usage:
---------------
python3 train.py \
    --images_dir path/to/real/images/directory \
    --images_pattern '*.jpg' \
    --generated_images_save_dir path/to/save/generated/images \
    --logdir path/to/log/directory \
    --num_steps 20000
"""

import cv2
import glob
import numpy as np
import os
import tensorflow as tf

import model

flags = tf.flags

flags.DEFINE_string('images_dir', None, 'Path to real images directory.')
flags.DEFINE_string('images_pattern', '*.jpg', 'The pattern of input images.')
flags.DEFINE_string('generated_images_save_dir', None, 'Path to directory '
                    'where to write generated images.')
flags.DEFINE_string('logdir', './training', 'Path to log directory.')
flags.DEFINE_integer('num_steps', 20000, 'Number of steps.')

FLAGS = flags.FLAGS


def get_next_batch(batch_size=64):
    """Get a batch set of real images and random generated inputs."""
    if not os.path.exists(FLAGS.images_dir):
        raise ValueError('images_dir does not exist.')
       
    images_path = os.path.join(FLAGS.images_dir, FLAGS.images_pattern)
    image_files_list = glob.glob(images_path)
    image_files_arr = np.array(image_files_list)
    selected_indices = np.random.choice(len(image_files_list), batch_size)
    selected_image_files = image_files_arr[selected_indices]
    images = read_images(selected_image_files)
    
#    generated_inputs = np.random.normal(size=[batch_size, 64])
    generated_inputs = np.random.uniform(
        low=-1, high=1.0, size=[batch_size, 64])
    return images, generated_inputs
    
    
def read_images(image_files):
    """Read images by OpenCV."""
    images = []
    for image_path in image_files:
        image = cv2.imread(image_path)
        image = cv2.resize(image, (64, 64))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = (image - 127.5) / 127.5
        images.append(image)
    return np.array(images)


def write_images(generated_images, images_save_dir, num_step):
    """Write images to a given directory."""
    # Scale images from [-1, 1] to [0, 255].
    generated_images = ((generated_images + 1) * 127.5).astype(np.uint8)
    for j, image in enumerate(generated_images):
        image_name = 'generated_step{}_{}.jpg'.format(num_step+1, j+1)
        image_path = os.path.join(images_save_dir, image_name)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        cv2.imwrite(image_path, image)


def main(_):
    # Define placeholder
    real_data = tf.placeholder(
        tf.float32, shape=[None, 64, 64, 3], name='real_data')
    generated_inputs = tf.placeholder(
        tf.float32, [None, 64], name='generated_inputs')
    
    # Create DCGAN model
    dcgan_model = model.DCGAN(is_training=True, final_size=64)
    outputs_dict = dcgan_model.dcgan_model(real_data, generated_inputs)
    generated_data = outputs_dict['generated_data']
    generated_data_ = tf.identity(generated_data, name='generated_data')
    discriminator_gen_outputs = outputs_dict['discriminator_gen_outputs']
    discriminator_real_outputs = outputs_dict['discriminator_real_outputs']
    generator_variables = outputs_dict['generator_variables']
    discriminator_variables = outputs_dict['discriminator_variables']
    loss_dict = dcgan_model.loss(discriminator_real_outputs,
                                 discriminator_gen_outputs)
    discriminator_loss = loss_dict['dis_loss']
    discriminator_loss_on_real = loss_dict['dis_loss_on_real']
    discriminator_loss_on_generated = loss_dict['dis_loss_on_generated']
    generator_loss = loss_dict['gen_loss']

    # Write loss values to logdir (tensorboard)
    tf.summary.scalar('discriminator_loss', discriminator_loss)
    tf.summary.scalar('discriminator_loss_on_real', discriminator_loss_on_real)
    tf.summary.scalar('discriminator_loss_on_generated',
                      discriminator_loss_on_generated)
    tf.summary.scalar('generator_loss', generator_loss)
    merged_summary = tf.summary.merge_all(key=tf.GraphKeys.SUMMARIES)
    
    # Create optimizer
    discriminator_optimizer = tf.train.AdamOptimizer(learning_rate=0.0004,  # 0.0005
                                                     beta1=0.5)
    discriminator_train_step = discriminator_optimizer.minimize(
        discriminator_loss, var_list=discriminator_variables)
    generator_optimizer = tf.train.AdamOptimizer(learning_rate=0.0001,
                                                 beta1=0.5)
    generator_train_step = generator_optimizer.minimize(
        generator_loss, var_list=generator_variables)
    
    saver = tf.train.Saver(var_list=tf.global_variables())
    
    init = tf.global_variables_initializer()
    
    with tf.Session() as sess:
        sess.run(init)
        
        # Write model graph to tensorboard
        if not FLAGS.logdir:
            raise ValueError('logdir is not specified.')
        if not os.path.exists(FLAGS.logdir):
            os.makedirs(FLAGS.logdir)
        writer = tf.summary.FileWriter(FLAGS.logdir, sess.graph)
        
        fixed_images, fixed_generated_inputs = get_next_batch()
        
        for i in range(FLAGS.num_steps):
            if (i+1) % 500 == 0:
                batch_images = fixed_images
                batch_generated_inputs = fixed_generated_inputs
            else:
                batch_images, batch_generated_inputs = get_next_batch()
            train_dict = {real_data: batch_images,
                          generated_inputs: batch_generated_inputs}
                
            # Update discriminator network
            sess.run(discriminator_train_step, feed_dict=train_dict)
            
            # Update generator network five times per discriminator update
            for _ in range(5):
                sess.run(generator_train_step, feed_dict=train_dict)
            
            summary, generated_images = sess.run(
                [merged_summary, generated_data], feed_dict=train_dict)
            
            # Write loss values to tensorboard
            writer.add_summary(summary, i+1)
            
            if (i+1) % 500 == 0:
                # Save model
                model_save_path = os.path.join(FLAGS.logdir, 'model.ckpt')
                saver.save(sess, save_path=model_save_path, global_step=i+1)
                
                # Save generated images
                if not FLAGS.generated_images_save_dir:
                    FLAGS.generated_images_save_dir = './generated_images'
                if not os.path.exists(FLAGS.generated_images_save_dir):
                    os.makedirs(FLAGS.generated_images_save_dir)
                write_images(
                    generated_images, FLAGS.generated_images_save_dir, i)
            
        writer.close()
        
        
if __name__ == '__main__':
    tf.app.run()

This file defines 4 functions, top to bottom: get_next_batch, which randomly samples a batch of training data; read_images, which reads training images from a local directory; write_images, which saves the generator's images to a directory; and main, which trains the whole DCGAN. The first 3 are short and simple, so we skip straight to main. The main function first defines two placeholders that serve as the data entry points. Next it instantiates a DCGAN object and applies it to the placeholders, obtaining the model outputs and the 4 losses; the 5 tf.summary statements that follow write the losses to the log directory so that their evolution can be visualized in a browser with tensorboard. Then two optimizers are defined, discriminator_optimizer and generator_optimizer, to optimize the discriminator's and generator's losses respectively. Finally, after creating the model saver saver and writing the model graph to the log directory, we reach the training loop (the for loop):

  1. Randomly select one batch of training samples from the training images;
  2. For every single discriminator update, run 5 consecutive generator updates;
  3. Every 500 steps, save the generated images and the model checkpoint.

In addition, to make the evolution of the generated images easy to follow, the same fixed inputs are fed every 500 steps.

As for the procedure for training generative adversarial networks, the original GAN paper states it clearly in its Algorithm 1: in each training iteration, update the discriminator for k steps, then update the generator for 1 step.

The point to focus on is that ratio: the discriminator is trained k times for every 1 generator update. By my own understanding (which may be wrong), it should be the reverse: the generator trained k times for every 1 discriminator update. The reason is that early in training the generator's samples differ wildly from the real training samples and the discriminator exposes them effortlessly, so the loss discriminator_loss_on_generated plunges rapidly toward 0. To slow that descent, and to give the generator enough training to produce reasonably good samples as soon as possible, we choose to optimize the generator k times in a row.

Returning to our face generation problem: after many experiments I finally settled on k = 5, i.e., the discriminator is optimized once for every 5 generator updates. Trained this way, the losses discriminator_loss_on_generated and generator_loss stay in adversarial balance for a fairly long stretch, which gives the generator a long window of optimization and, in turn, higher-quality images.
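
The schedule itself is easy to factor out into a helper; a sketch (sess and the two train-step ops are the objects created in train.py above; feed_fn is a hypothetical callable returning a fresh feed_dict each step):

def train_k_to_one(sess, dis_step, gen_step, feed_fn, num_steps, k=5):
    """Run 1 discriminator update followed by k generator updates per step."""
    for _ in range(num_steps):
        feed_dict = feed_fn()
        sess.run(dis_step, feed_dict=feed_dict)
        for _ in range(k):
            sess.run(gen_step, feed_dict=feed_dict)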

In a terminal at the project's root directory, run:

python3 train.py --images_dir path/to/images/directory

This creates a new folder, training, in the current directory; it stores the data produced during training, such as the model parameters. Then launch tensorboard:

tensorboard --logdir ./training

Open the link the terminal prints in a browser; on the SCALARS page you will see the four loss curves. To understand generative adversarial networks more deeply, I recommend observing carefully how these loss curves evolve and thinking about how to adjust the parameters so that the network generates more realistic images.

The optimizer parameters in the main function of train.py were fixed after many rounds of trial and error. They are still not entirely satisfactory, but they already produce some fairly good images; for example, here are images generated after 15500 training steps (all generated images are saved in the folder generated_images):

Figure 3: Images produced by the generator after 15500 training steps

As you can see, the overall quality of the generated images is already quite good. Cherry-picking the more convincing ones, the following generated faces could probably pass for real:

Figure 4: Some of the best images produced by the generator

Of course, the sharpness still needs improvement.

3. Some Training Details

When training a generative adversarial network, the key knobs are the two optimizers' learning rates and the number k of generator updates per discriminator update. To make the learning rates easier to choose, use the adaptive optimizer Adam with a typical initial learning rate of 0.0001; when tuning, fix one of the two rates and concentrate on adjusting the other. While tuning, make sure the loss discriminator_loss_on_generated does not keep falling, or equivalently that generator_loss does not keep rising; ideally both oscillate steadily around some value. As a rule of thumb, if after 500 training steps the images in the folder generated_images are all blurry smears, the current learning rates are poor and you should abort training and readjust them; if by then the images already show faint facial features, keep training. Below are the loss curves from one of my runs (all parameters as in train.py):

Figure 5: Loss curves over 20000 training steps

According to the figure above, the losses discriminator_loss_on_generated and generator_loss are in balance before step 5000, during which the generated images get steadily sharper. After step 5500, however, generator_loss starts to climb rapidly and the generated images all degenerate into noise (see below); then, after step 8000, generator_loss drops sharply back to a low level and the generation quality recovers. After step 16000, generator_loss grows again and the images blur once more. The images generated over the course of training are shown below for comparison (images after step 16000 are omitted since they are all blurred):

Figure 6: Images generated during training; read them against the generator_loss curve

Finally, one more point worth noting: when choosing the random distribution for the generator's input, if you use a normal distribution (see the commented-out line in the function get_next_batch):

generated_inputs = np.random.normal(size=[batch_size, 64])

then many of the generated images come out nearly identical. For example, among the 64 images generated at training step 17500:

Figure 7: Images generated when the generator input is drawn from a standard normal distribution

images 12, 14, 21, 26, 27, 39, 46 and 49 are nearly identical, as are images 9, 22, 28, 33, 42, 47, 56 and 61 (which suggests that samples drawn from a standard normal distribution are themselves quite similar, making that choice better suited to conditional GANs). Sampling from a uniform distribution instead:

generated_inputs = np.random.uniform(low=-1, high=1.0, size=[batch_size, 64])

largely alleviates the problem; see Figure 3.
