This article implements a deep convolutional generative adversarial network (DCGAN) with TensorFlow and trains it to generate images of young women. The training images come from the project 用DCGAN生成女朋友 ("generating a girlfriend with DCGAN"); the dataset consists entirely of portrait photos of young women, roughly like this:
Generative adversarial networks have been a hot research direction in deep learning in recent years, with new variants appearing constantly, including GAN, DCGAN, InfoGAN, WGAN, CycleGAN, and so on. Based on the GAN and DCGAN papers and on part of the TensorFlow GAN source code, this article gives a simple implementation of DCGAN and reports a fair number of experiments that produced some reasonably realistic images.
There are already many DCGAN projects on GitHub; the most-starred is DCGAN-tensorflow, but after skimming its code I found its readability lacking, so I decided to implement DCGAN from scratch to deepen my understanding of adversarial networks. The network structure of a DCGAN is simple and easy to implement; the real difficulty is tuning. If the hyperparameters are even slightly off, training easily collapses and the generated images are pure noise.
1. Defining the DCGAN Network
According to Goodfellow, the inventor of generative adversarial networks (GANs), a GAN consists of two parts: a generator G and a discriminator D. The generator is like a counterfeiter trying to produce fake banknotes that pass for real, while the discriminator is like a banknote detector that can tell real bills from fakes. This contest between forgery and detection creates the adversarial dynamic. When both the generator and the discriminator are sufficiently strong, the contest tends toward an equilibrium in which the discriminator can no longer distinguish the generator's samples from real ones and assigns every sample a probability of 0.5 of being real. That, of course, is the ideal case. In practice such an equilibrium is hard to reach; at best you get a fragile, dynamic balance in which the generator produces reasonably realistic samples and the discriminator struggles to tell real from fake. But a slightly larger parameter update can break the balance: the generator suddenly collapses, its samples become noisier and noisier, and the discriminator gains the upper hand, distinguishing real from fake with ease, so its loss quickly drops toward 0. Since this is an adversarial game, however, the generator may bounce back, launch a new wave of forgeries, and once again leave the discriminator unable to tell real from fake. In general, training a GAN is a cycle of reaching equilibrium, breaking it, reaching it again, and breaking it again, so its loss curve is a wavy line like a roller coaster.
The deep learning models we usually encounter fall roughly into two classes: discriminative models and generative models. A discriminative model trains on labeled data; in classification, for example, given a sample we must determine which class it belongs to. A generative model, by contrast, must generate samples based on the training data, or determine the distribution of the training data. Generative modeling is usually the harder problem, because the normalizing constant of the distribution, the partition function, is intractable.
The authors of GAN creatively combined a discriminative model with a generative model, greatly simplifying the generative modeling problem, at the cost of unstable training. Below, taking the generation of images with a particular property as our example, namely generating portraits of young women, we briefly describe the principle and implementation of a deep convolutional GAN (DCGAN).
Suppose we have many portrait photos of young women, and our goal is to design a network that generates realistic images of such faces. A natural question is: what is the network's input? Goodfellow's idea is simple: the input is a sample from a random distribution (normal, uniform, etc.), typically a vector of fixed length drawn from that distribution, say of length 100 or 64. To generate a portrait, we must build a 3-dimensional image with 3 color channels out of this 1-dimensional vector. This requires a technique called transposed convolution (transpose convolution, or deconvolution). Recall the overall structure of a convolutional network: starting from an image with 3 color channels, convolutions and pooling eventually yield a 1-dimensional output. Our task can clearly be seen as the inverse of that process, generating a 3-dimensional image from a 1-dimensional vector, which is why transposed convolution is also called deconvolution. See the figure below:
Let the randomly sampled vector be [x1, ..., xn] (n = 64 or 100, say). To feed it into a (transposed) convolutional network, expand it into a 4-dimensional tensor of shape 1 x 1 x 1 x n. After the first (transposed) convolution layer (kernel_size = 4, stride = 1, padding = 'VALID') this becomes a tensor of shape 1 x 4 x 4 x 1024 (opposite to ordinary convolution, the spatial size grows). After the second (transposed) convolution layer (kernel_size = 4, stride = 2, padding = 'SAME') the shape becomes 1 x 8 x 8 x 512, and so on, until after the fifth transposed convolution layer (kernel_size = 4, stride = 2, padding = 'SAME') we get a tensor of shape 1 x 64 x 64 x 64. At this point, to obtain a 3-channel image we only need one more convolution layer (kernel_size = 1, stride = 1, num_outputs = 3, padding = 'SAME'), after which the output tensor has shape 1 x 64 x 64 x 3; squeezing out the 0th dimension gives a color image of resolution 64 x 64.
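The shape arithmetic above can be checked with a few lines of plain Python (no TensorFlow needed). This is only a sketch of the spatial-size bookkeeping; the helper names are illustrative, and the layer settings mirror the generator described in this section.

```python
import math


def transpose_conv_out_size(size, kernel_size, stride, padding):
    """Output spatial size of a single transposed convolution layer."""
    if padding == 'VALID':
        return (size - 1) * stride + kernel_size
    # 'SAME' padding: the output spatial size is input size times stride.
    return size * stride


def generator_shapes(final_size=64, depth=64):
    """Trace (height, width, channels) through the generator's deconv stack."""
    num_layers = int(math.log(final_size, 2)) - 1
    shapes = []
    # First layer: 1 x 1 input, kernel 4, stride 1, 'VALID' -> 4 x 4.
    size = transpose_conv_out_size(1, kernel_size=4, stride=1, padding='VALID')
    shapes.append((size, size, depth * 2 ** (num_layers - 1)))
    # Remaining layers: kernel 4, stride 2, 'SAME' -> spatial size doubles,
    # channel count halves.
    for i in range(2, num_layers + 1):
        size = transpose_conv_out_size(size, kernel_size=4, stride=2,
                                       padding='SAME')
        shapes.append((size, size, depth * 2 ** (num_layers - i)))
    return shapes
```

For final_size = 64 this yields (4, 4, 1024), (8, 8, 512), (16, 16, 256), (32, 32, 128), (64, 64, 64), after which the 1 x 1 convolution maps 64 channels to 3.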
That is the generator's network structure. After this network acts, a randomly sampled 1-dimensional vector becomes an image; but, crucially, that image is also random and may be pure noise. To make the generated images exhibit the facial features of young women, we need some supervisory information with which to train the generator's parameters. That is the discriminator's job. The discriminator's network structure is essentially the generator's structure inverted (roughly the figure above read from right to left), except that the final output is a single score, i.e. the discriminator is a binary classifier that decides whether an image is a real training image or a generated fake one. Because the discriminator is a deep network with strong fitting capacity, it easily extracts facial features from the training data, effectively providing a weak form of supervision (namely, the extracted facial features). The key question now is how to exploit this weak supervision fully.
As mentioned above, the adversarial process of a generative adversarial network is: the generator tries to produce fakes realistic enough that the discriminator cannot tell them from real samples, while the discriminator tries to improve its ability to spot the generator's fakes. For the generator, then, the closer its samples are to the training data, the better. For our portrait-generation task, the stronger the female facial features in the generated samples, the better; and those features can be supplied by the discriminator. The resulting weakly supervised objective is: the discriminator's output on generated images should resemble its output on real training images. In other words, the generator should treat its own generated images as real training images, while the discriminator should treat the generated images as fakes. This yields the GAN loss functions:
A more intuitive way to put it:
Given a randomly sampled vector z, the generator produces an image G(z), which is fed to the discriminator D to produce a score (logit) D(G(z)). The generator G aims to produce images indistinguishable from the real training samples x, so it wants every image it generates to be judged real; the generator's loss is therefore:
generator_loss = sigmoid_cross_entropy(logits=D(G(z)), labels=1)
The discriminator D, on the other hand, wants its recognition ability to be as strong as possible, so it should judge this image to be fake, which gives the discriminator's loss on generated images:
discriminator_loss_on_generated = sigmoid_cross_entropy(logits=D(G(z)), labels=0)
Meanwhile, to exploit the (weakly supervisory) information in the real images, the discriminator's loss on the real training samples x is:
discriminator_loss_on_real = sigmoid_cross_entropy(logits=D(x), labels=1)
where D(x) is the score the discriminator outputs for a real training image x.
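As a sanity check, the three losses can be sketched in NumPy, assuming the discriminator emits one unbounded logit per image (as in the implementation below). The function names here are illustrative, not a library API.

```python
import numpy as np


def sigmoid_cross_entropy(logits, labels):
    """Numerically stable -labels*log(p) - (1-labels)*log(1-p), p = sigmoid(logits)."""
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))


def gan_losses(d_on_real, d_on_fake):
    """d_on_real = D(x), d_on_fake = D(G(z)); both arrays of logits."""
    generator_loss = sigmoid_cross_entropy(d_on_fake, 1.0).mean()       # fakes treated as real
    d_loss_on_generated = sigmoid_cross_entropy(d_on_fake, 0.0).mean()  # fakes treated as fake
    d_loss_on_real = sigmoid_cross_entropy(d_on_real, 1.0).mean()       # real treated as real
    return generator_loss, d_loss_on_generated, d_loss_on_real
```

When the discriminator is confidently correct (large positive logits on real data, large negative logits on fakes), both discriminator losses are tiny while the generator loss is large, which is exactly the early-training situation discussed later.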
This covers the two most important parts of the whole GAN: the network structures of the generator and discriminator, and their corresponding losses. In practice, the losses above are usually smoothed a little (see the source code below, or the paper Improved Techniques for Training GANs), and when optimizing the discriminator the two losses are summed:
discriminator_loss = discriminator_loss_on_generated + discriminator_loss_on_real
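The label smoothing mentioned above is applied inside TensorFlow's sigmoid cross-entropy by shrinking the targets toward 0.5; a minimal sketch of that transformation (with a hypothetical helper name) is:

```python
def smooth_labels(labels, label_smoothing):
    """Shrink binary targets toward 0.5, as tf.losses.sigmoid_cross_entropy
    does when label_smoothing > 0: new = labels * (1 - s) + 0.5 * s."""
    return labels * (1.0 - label_smoothing) + 0.5 * label_smoothing
```

With label_smoothing = 0.25, a "real" label of 1 becomes 0.875, so the discriminator is never pushed to be perfectly confident on real data. Note that the implementation below smooths only the real-data labels (one-sided smoothing).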
In total we have 4 losses, of which the ones used for backpropagation are generator_loss and discriminator_loss. Implementing the above in TensorFlow gives the DCGAN model (named model.py):
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat May 26 20:03:48 2018
@author: shirhe-lyh
"""
"""Implementation of DCGAN.
This work was first described in:
Unsupervised representation learning with deep convolutional generative
adversarial networks, Alec Radford et al., arXiv: 1511.06434v2
This module is based on:
TensorFlow models/research/slim/nets/dcgan.py
TensorFlow tensorflow/contrib/gan
"""
import math
import tensorflow as tf
slim = tf.contrib.slim
class DCGAN(object):
"""Implementation of DCGAN."""
def __init__(self,
is_training,
generator_depth=64,
discriminator_depth=64,
final_size=32,
num_outputs=3,
fused_batch_norm=False):
"""Constructor.
Args:
            is_training: Whether the network is for training or not.
generator_depth: Number of channels in last deconvolution layer of
the generator network.
discriminator_depth: Number of channels in first convolution layer
                of the discriminator network.
final_size: The shape of the final output.
            num_outputs: Number of output features. For images, this is the
number of channels.
fused_batch_norm: If 'True', use a faster, fused implementation
of batch normalization.
"""
self._is_training = is_training
self._generator_depth = generator_depth
self._discirminator_depth = discriminator_depth
self._final_size = final_size
self._num_outputs = num_outputs
self._fused_batch_norm = fused_batch_norm
def _validate_image_inputs(self, inputs):
"""Check the inputs whether is valid or not.
Copy from:
https://github.com/tensorflow/models/blob/master/research/
slim/nets/dcgan.py
Args:
inputs: A float32 tensor with shape [batch_size, height, width,
channels].
Raises:
ValueError: If the input image shape is not 4-dimensional, if the
spatial dimensions aren't defined at graph construction time,
if the spatial dimensions aren't square, or if the spatial
dimensions aren't a power of two.
"""
inputs.get_shape().assert_has_rank(4)
inputs.get_shape()[1:3].assert_is_fully_defined()
if inputs.get_shape()[1] != inputs.get_shape()[2]:
raise ValueError('Input tensor does not have equal width and '
'height: ', inputs.get_shape()[1:3])
width = inputs.get_shape().as_list()[2]
if math.log(width, 2) != int(math.log(width, 2)):
raise ValueError("Input tensor 'width' is not a power of 2: ",
width)
def discriminator(self,
inputs,
depth=64,
is_training=True,
reuse=None,
scope='Discriminator',
fused_batch_norm=False):
"""Discriminator network for DCGAN.
Construct discriminator network from inputs to the final endpoint.
Copy from:
https://github.com/tensorflow/models/blob/master/research/
slim/nets/dcgan.py
Args:
inputs: A float32 tensor with shape [batch_size, height, width,
channels].
depth: Number of channels in first convolution layer.
is_training: Whether the network is for training or not.
reuse: Whether or not the network variables should be reused.
'scope' must be given to be reused.
scope: Optional variable_scope. Default value is 'Discriminator'.
fused_batch_norm: If 'True', use a faster, fused implementation
of batch normalization.
Returns:
logits: The pre-softmax activations, a float32 tensor with shape
[batch_size, 1].
end_points: A dictionary from components of the network to their
activation.
Raises:
ValueError: If the input image shape is not 4-dimensional, if the
spatial dimensions aren't defined at graph construction time,
if the spatial dimensions aren't square, or if the spatial
dimensions aren't a power of two.
"""
normalizer_fn = slim.batch_norm
normalizer_fn_args = {
'is_training': is_training,
'zero_debias_moving_mean': True,
'fused': fused_batch_norm}
self._validate_image_inputs(inputs)
height = inputs.get_shape().as_list()[1]
end_points = {}
with tf.variable_scope(scope, values=[inputs], reuse=reuse) as scope:
with slim.arg_scope([normalizer_fn], **normalizer_fn_args):
with slim.arg_scope([slim.conv2d], stride=2, kernel_size=4,
activation_fn=tf.nn.leaky_relu):
net = inputs
for i in range(int(math.log(height, 2))):
scope = 'conv%i' % (i+1)
current_depth = depth * 2**i
normalizer_fn_ = None if i == 0 else normalizer_fn
net = slim.conv2d(net, num_outputs=current_depth,
normalizer_fn=normalizer_fn_,
scope=scope)
end_points[scope] = net
logits = slim.conv2d(net, 1, kernel_size=1, stride=1,
padding='VALID', normalizer_fn=None,
activation_fn=None)
logits = tf.reshape(logits, [-1, 1])
end_points['logits'] = logits
return logits, end_points
def generator(self,
inputs,
depth=64,
final_size=32,
num_outputs=3,
is_training=True,
reuse=None,
scope='Generator',
fused_batch_norm=False):
"""Generator network for DCGAN.
Construct generator network from inputs to the final endpoint.
Copy from:
https://github.com/tensorflow/models/blob/master/research/
slim/nets/dcgan.py
Args:
inputs: A float32 tensor with shape [batch_size, N] for any size N.
depth: Number of channels in last deconvolution layer.
final_size: The shape of the final output.
            num_outputs: Number of output features. For images, this is the
number of channels.
            is_training: Whether the network is for training or not.
            reuse: Whether or not the network variables should be reused.
                'scope' must be given to be reused.
scope: Optional variable_scope. Default value is 'Generator'.
fused_batch_norm: If 'True', use a faster, fused implementation
of batch normalization.
Returns:
            logits: The pre-softmax activations, a float32 tensor with shape
[batch_size, final_size, final_size, num_outputs].
end_points: A dictionary from components of the network to their
activation.
Raises:
ValueError: If 'inputs' is not 2-dimensional, or if 'final_size'
is not a power of 2 or is less than 8.
"""
normalizer_fn = slim.batch_norm
normalizer_fn_args = {
'is_training': is_training,
'zero_debias_moving_mean': True,
'fused': fused_batch_norm}
inputs.get_shape().assert_has_rank(2)
if math.log(final_size, 2) != int(math.log(final_size, 2)):
raise ValueError("'final_size' (%i) must be a power of 2."
% final_size)
if final_size < 8:
            raise ValueError("'final_size' (%i) must be greater than or "
                             "equal to 8." % final_size)
end_points = {}
num_layers = int(math.log(final_size, 2)) - 1
with tf.variable_scope(scope, values=[inputs], reuse=reuse) as scope:
with slim.arg_scope([normalizer_fn], **normalizer_fn_args):
with slim.arg_scope([slim.conv2d_transpose],
normalizer_fn=normalizer_fn,
stride=2, kernel_size=4):
net = tf.expand_dims(tf.expand_dims(inputs, 1), 1)
# First upscaling is different because it takes the input
# vector.
current_depth = depth * 2 ** (num_layers - 1)
scope = 'deconv1'
net = slim.conv2d_transpose(net, current_depth, stride=1,
padding='VALID', scope=scope)
end_points[scope] = net
for i in range(2, num_layers):
scope = 'deconv%i' % i
                        current_depth = depth * 2 ** (num_layers - i)
net = slim.conv2d_transpose(net, current_depth,
scope=scope)
end_points[scope] = net
# Last layer has different normalizer and activation.
scope = 'deconv%i' % num_layers
net = slim.conv2d_transpose(net, depth, normalizer_fn=None,
activation_fn=None, scope=scope)
end_points[scope] = net
# Convert to proper channels
scope = 'logits'
logits = slim.conv2d(
net,
num_outputs,
normalizer_fn=None,
activation_fn=tf.nn.tanh,
kernel_size=1,
stride=1,
padding='VALID',
scope=scope)
end_points[scope] = logits
logits.get_shape().assert_has_rank(4)
logits.get_shape().assert_is_compatible_with(
[None, final_size, final_size, num_outputs])
return logits, end_points
def dcgan_model(self,
real_data,
generator_inputs,
generator_scope='Generator',
discirminator_scope='Discriminator',
check_shapes=True):
"""Returns DCGAN model outputs and variables.
Modified from:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/
contrib/gan/python/train.py
Args:
real_data: A float32 tensor with shape [batch_size, height, width,
channels].
generator_inputs: A float32 tensor with shape [batch_size, N] for
any size N.
            generator_scope: Optional generator variable scope. Useful if you
want to reuse a subgraph that has already been created.
discriminator_scope: Optional discriminator variable scope. Useful
if you want to reuse a subgraph that has already been created.
check_shapes: If 'True', check that generator produces Tensors
that are the same shape as real data. Otherwise, skip this
check.
Returns:
A dictionary containing output tensors.
Raises:
ValueError: If the generator outputs a tensor that isn't the same
shape as 'real_data'.
"""
# Create models
with tf.variable_scope(generator_scope) as gen_scope:
generated_data, _ = self.generator(
generator_inputs, self._generator_depth, self._final_size,
self._num_outputs, self._is_training)
with tf.variable_scope(discirminator_scope) as dis_scope:
discriminator_gen_outputs, _ = self.discriminator(
generated_data, self._discirminator_depth, self._is_training)
with tf.variable_scope(dis_scope, reuse=True):
discriminator_real_outputs, _ = self.discriminator(
real_data, self._discirminator_depth, self._is_training)
if check_shapes:
if not generated_data.shape.is_compatible_with(real_data.shape):
                raise ValueError('Generator output shape (%s) must be the '
                                 'same shape as real data (%s).'
% (generated_data.shape, real_data.shape))
# Get model-specific variables
generator_variables = slim.get_trainable_variables(gen_scope)
discriminator_variables = slim.get_trainable_variables(dis_scope)
return {'generated_data': generated_data,
'discriminator_gen_outputs': discriminator_gen_outputs,
'discriminator_real_outputs': discriminator_real_outputs,
'generator_variables': generator_variables,
'discriminator_variables': discriminator_variables}
def predict(self, generator_inputs):
"""Return the generated results by generator network.
Args:
generator_inputs: A float32 tensor with shape [batch_size, N] for
any size N.
Returns:
            logits: The pre-softmax activations, a float32 tensor with shape
[batch_size, final_size, final_size, num_outputs].
"""
logits, _ = self.generator(generator_inputs, self._generator_depth,
self._final_size, self._num_outputs,
is_training=False)
return logits
def discriminator_loss(self,
discriminator_real_outputs,
discriminator_gen_outputs,
label_smoothing=0.25):
"""Original minmax discriminator loss for GANs, with label smoothing.
Modified from:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/
contrib/gan/python/losses/python/losses_impl.py
Args:
discriminator_real_outputs: Discriminator output on real data.
discriminator_gen_outputs: Discriminator output on generated data.
Expected to be in the range of (-inf, inf).
label_smoothing: The amount of smoothing for positive labels. This
technique is taken from `Improved Techniques for Training GANs`
(https://arxiv.org/abs/1606.03498). `0.0` means no smoothing.
Returns:
loss_dict: A dictionary containing three scalar tensors.
"""
        # -log(sigmoid(D(x))), with real labels smoothed toward
        # 1 - 0.5 * label_smoothing
losses_on_real = slim.losses.sigmoid_cross_entropy(
logits=discriminator_real_outputs,
multi_class_labels=tf.ones_like(discriminator_real_outputs),
label_smoothing=label_smoothing)
loss_on_real = tf.reduce_mean(losses_on_real)
        # -log(1 - sigmoid(D(G(z))))
losses_on_generated = slim.losses.sigmoid_cross_entropy(
logits=discriminator_gen_outputs,
multi_class_labels=tf.zeros_like(discriminator_gen_outputs))
loss_on_generated = tf.reduce_mean(losses_on_generated)
loss = loss_on_real + loss_on_generated
return {'dis_loss': loss,
'dis_loss_on_real': loss_on_real,
'dis_loss_on_generated': loss_on_generated}
def generator_loss(self, discriminator_gen_outputs, label_smoothing=0.0):
"""Modified generator loss for DCGAN.
Modified from:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/
contrib/gan/python/losses/python/losses_impl.py
Args:
discriminator_gen_outputs: Discriminator output on generated data.
Expected to be in the range of (-inf, inf).
Returns:
loss: A scalar tensor.
"""
losses = slim.losses.sigmoid_cross_entropy(
logits=discriminator_gen_outputs,
multi_class_labels=tf.ones_like(discriminator_gen_outputs),
label_smoothing=label_smoothing)
loss = tf.reduce_mean(losses)
return loss
def loss(self, discriminator_real_outputs, discriminator_gen_outputs):
"""Computes the loss of DCGAN.
Args:
discriminator_real_outputs: Discriminator output on real data.
discriminator_gen_outputs: Discriminator output on generated data.
Expected to be in the range of (-inf, inf).
Returns:
            A dictionary containing 4 scalar tensors.
"""
dis_loss_dict = self.discriminator_loss(discriminator_real_outputs,
discriminator_gen_outputs)
gen_loss = self.generator_loss(discriminator_gen_outputs)
dis_loss_dict.update({'gen_loss': gen_loss})
return dis_loss_dict
2. Training and Generating Images
The authors of the DCGAN paper summarize the techniques that allowed them to use generative adversarial networks for unsupervised, stable image generation:
The code above (model.py) follows these techniques essentially faithfully. A few minor differences:
- The vector sampled from the random distribution has length 64, rather than 100 as in the paper;
- The real training images must have resolution n x n, where n is a power of 2;
- The generated images likewise have resolution m x m, where m is a power of 2;
- The losses use the smoothing technique from Improved Techniques for Training GANs.
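The power-of-two restriction in the second and third points is what `_validate_image_inputs` enforces via `math.log`; a sketch of the same check, plus an equivalent integer-only bit trick, looks like this:

```python
import math


def is_power_of_two(n):
    """True iff n is a positive power of two (integer bit trick)."""
    return n > 0 and (n & (n - 1)) == 0


# The log-based check used in model.py is equivalent for valid sizes:
assert is_power_of_two(64) and math.log(64, 2) == int(math.log(64, 2))
assert not is_power_of_two(96)
```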
This section is about training the DCGAN. First, the training file (named train.py) is listed below:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun May 27 16:55:12 2018
@author: shirhe-lyh
"""
"""Train a DCGAN to generating fake images.
Example Usage:
---------------
python3 train.py \
--images_dir: Path to real images directory.
--images_pattern: The pattern of input images.
--generated_images_save_dir: Path to directory where to write gen images.
--logdir: Path to log directory.
--num_steps: Number of steps.
"""
import cv2
import glob
import numpy as np
import os
import tensorflow as tf
import model
flags = tf.flags
flags.DEFINE_string('images_dir', None, 'Path to real images directory.')
flags.DEFINE_string('images_pattern', '*.jpg', 'The pattern of input images.')
flags.DEFINE_string('generated_images_save_dir', None, 'Path to directory '
'where to write generated images.')
flags.DEFINE_string('logdir', './training', 'Path to log directory.')
flags.DEFINE_integer('num_steps', 20000, 'Number of steps.')
FLAGS = flags.FLAGS
def get_next_batch(batch_size=64):
"""Get a batch set of real images and random generated inputs."""
if not os.path.exists(FLAGS.images_dir):
        raise ValueError('images_dir does not exist.')
images_path = os.path.join(FLAGS.images_dir, FLAGS.images_pattern)
image_files_list = glob.glob(images_path)
image_files_arr = np.array(image_files_list)
selected_indices = np.random.choice(len(image_files_list), batch_size)
selected_image_files = image_files_arr[selected_indices]
images = read_images(selected_image_files)
# generated_inputs = np.random.normal(size=[batch_size, 64])
generated_inputs = np.random.uniform(
low=-1, high=1.0, size=[batch_size, 64])
return images, generated_inputs
def read_images(image_files):
"""Read images by OpenCV."""
images = []
for image_path in image_files:
image = cv2.imread(image_path)
image = cv2.resize(image, (64, 64))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = (image - 127.5) / 127.5
images.append(image)
return np.array(images)
def write_images(generated_images, images_save_dir, num_step):
"""Write images to a given directory."""
    # Scale images from [-1, 1] to [0, 255].
generated_images = ((generated_images + 1) * 127.5).astype(np.uint8)
for j, image in enumerate(generated_images):
image_name = 'generated_step{}_{}.jpg'.format(num_step+1, j+1)
image_path = os.path.join(FLAGS.generated_images_save_dir,
image_name)
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
cv2.imwrite(image_path, image)
def main(_):
# Define placeholder
real_data = tf.placeholder(
tf.float32, shape=[None, 64, 64, 3], name='real_data')
generated_inputs = tf.placeholder(
tf.float32, [None, 64], name='generated_inputs')
# Create DCGAN model
dcgan_model = model.DCGAN(is_training=True, final_size=64)
outputs_dict = dcgan_model.dcgan_model(real_data, generated_inputs)
generated_data = outputs_dict['generated_data']
generated_data_ = tf.identity(generated_data, name='generated_data')
discriminator_gen_outputs = outputs_dict['discriminator_gen_outputs']
discriminator_real_outputs = outputs_dict['discriminator_real_outputs']
generator_variables = outputs_dict['generator_variables']
discriminator_variables = outputs_dict['discriminator_variables']
loss_dict = dcgan_model.loss(discriminator_real_outputs,
discriminator_gen_outputs)
discriminator_loss = loss_dict['dis_loss']
discriminator_loss_on_real = loss_dict['dis_loss_on_real']
discriminator_loss_on_generated = loss_dict['dis_loss_on_generated']
generator_loss = loss_dict['gen_loss']
# Write loss values to logdir (tensorboard)
tf.summary.scalar('discriminator_loss', discriminator_loss)
tf.summary.scalar('discriminator_loss_on_real', discriminator_loss_on_real)
tf.summary.scalar('discriminator_loss_on_generated',
discriminator_loss_on_generated)
tf.summary.scalar('generator_loss', generator_loss)
merged_summary = tf.summary.merge_all(key=tf.GraphKeys.SUMMARIES)
# Create optimizer
discriminator_optimizer = tf.train.AdamOptimizer(learning_rate=0.0004, # 0.0005
beta1=0.5)
discriminator_train_step = discriminator_optimizer.minimize(
discriminator_loss, var_list=discriminator_variables)
generator_optimizer = tf.train.AdamOptimizer(learning_rate=0.0001,
beta1=0.5)
generator_train_step = generator_optimizer.minimize(
generator_loss, var_list=generator_variables)
saver = tf.train.Saver(var_list=tf.global_variables())
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
# Write model graph to tensorboard
if not FLAGS.logdir:
raise ValueError('logdir is not specified.')
if not os.path.exists(FLAGS.logdir):
os.makedirs(FLAGS.logdir)
writer = tf.summary.FileWriter(FLAGS.logdir, sess.graph)
fixed_images, fixed_generated_inputs = get_next_batch()
for i in range(FLAGS.num_steps):
if (i+1) % 500 == 0:
batch_images = fixed_images
batch_generated_inputs = fixed_generated_inputs
else:
batch_images, batch_generated_inputs = get_next_batch()
train_dict = {real_data: batch_images,
generated_inputs: batch_generated_inputs}
# Update discriminator network
sess.run(discriminator_train_step, feed_dict=train_dict)
# Update generator network five times
sess.run(generator_train_step, feed_dict=train_dict)
sess.run(generator_train_step, feed_dict=train_dict)
sess.run(generator_train_step, feed_dict=train_dict)
sess.run(generator_train_step, feed_dict=train_dict)
sess.run(generator_train_step, feed_dict=train_dict)
summary, generated_images = sess.run(
[merged_summary, generated_data], feed_dict=train_dict)
# Write loss values to tensorboard
writer.add_summary(summary, i+1)
if (i+1) % 500 == 0:
# Save model
model_save_path = os.path.join(FLAGS.logdir, 'model.ckpt')
saver.save(sess, save_path=model_save_path, global_step=i+1)
# Save generated images
if not FLAGS.generated_images_save_dir:
FLAGS.generated_images_save_dir = './generated_images'
if not os.path.exists(FLAGS.generated_images_save_dir):
os.makedirs(FLAGS.generated_images_save_dir)
write_images(
generated_images, FLAGS.generated_images_save_dir, i)
writer.close()
if __name__ == '__main__':
tf.app.run()
This file defines 4 functions, from top to bottom: get_next_batch, which randomly samples a batch of training data; read_images, which reads the training images from a local directory; write_images, which saves the generator's images to a given directory; and main, which trains the whole DCGAN. The first 3 are short and simple, so we skip them and look only at main. The main function first defines two placeholders as data entry points. Next, it instantiates a DCGAN object and applies it to the placeholders to obtain the model outputs and the 4 losses; the 5 tf.summary statements that follow write the losses to the log files, so that tensorboard can visualize how the losses evolve in the browser. Then two optimizers are defined, discriminator_optimizer and generator_optimizer, to optimize the discriminator and generator losses respectively. Finally, after defining the model saver and writing the model graph to the log directory, we reach the training loop (the for loop):
- Randomly select a batch of training samples from the training images;
- For every discriminator update, update the generator 5 times in succession;
- Every 500 steps, save the generated images and the model.
In addition, so that the evolution of the generated images is easy to follow, the same fixed inputs are fed in at every 500th step.
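The update schedule above can be sketched without TensorFlow; this hypothetical helper only enumerates what each iteration does (which batch to use and how many generator steps to run), mirroring the for loop in train.py.

```python
def training_schedule(num_steps, gen_steps_per_dis_step=5, snapshot_every=500):
    """Yield (step, use_fixed_batch, num_generator_updates) per iteration.

    Every iteration runs 1 discriminator update followed by
    gen_steps_per_dis_step generator updates; every snapshot_every-th
    iteration reuses the fixed batch so snapshots are comparable.
    """
    for i in range(num_steps):
        use_fixed_batch = (i + 1) % snapshot_every == 0
        yield i + 1, use_fixed_batch, gen_steps_per_dis_step
```

Separating the schedule from the session calls like this makes it easy to experiment with other ratios of generator to discriminator updates.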
On the method for training generative adversarial networks, the GAN paper is fairly clear:
A key point to notice: there, the discriminator is trained k times for every 1 generator update. But in my own understanding (which may be wrong), it should be the other way around: train the generator k times for every 1 discriminator update. The reason is that early in training, the generator's samples differ greatly from the real training samples, and the discriminator can spot them trivially, so the loss discriminator_loss_on_generated drops rapidly toward 0. To slow that drop, and to give the generator enough training to produce higher-quality samples as soon as possible, we optimize the generator k times in a row.
Back to our portrait-generation problem: after many experiments I settled on k = 5, i.e. the discriminator is updated once for every 5 generator updates. Trained this way, the losses discriminator_loss_on_generated and generator_loss stay in adversarial balance for quite a long stretch, so the generator gets optimized for a long time and can produce higher-quality images.
Run the following in a terminal in the project's current directory:
python3 train.py --images_dir path/to/images/directory
This creates a new folder, training, in the current directory to hold the data produced during training, such as the model parameters. Then start tensorboard:
tensorboard --logdir ./training
Open the browser link the terminal prints, and on the SCALARS page you will see the four loss curves. To understand generative adversarial networks more deeply, I recommend watching closely how these loss curves evolve and thinking about how to adjust the parameters so that the network generates more realistic images.
The optimizer parameters in train.py's main function were settled after many trials; they are still not entirely satisfactory, but they can already produce some fairly good images. For example, the images generated after 15,500 training steps (all generated images are saved in the folder generated_images):
As you can see, the overall quality of the generated images is already quite good. If we pick out some of the more satisfying ones, the generated faces below could probably pass for real:
Of course, the sharpness still needs further improvement.
3. Some Training Details
When training a generative adversarial network, the key things to tune are the learning rates of the two optimizers and the number k of generator updates per discriminator update. To make choosing learning rates simpler, use the adaptive Adam optimizer; a typical initial learning rate is 0.0001, and when tuning you can fix one rate and focus on adjusting the other. While tuning, make sure the loss discriminator_loss_on_generated does not keep decreasing, or equivalently that generator_loss does not keep increasing; ideally both oscillate steadily around some value. As a rule of thumb, if after 500 steps the images in the generated_images folder are all blurry, the current learning rates are poorly chosen and you should interrupt training and re-tune; if the images at that point already show faint facial features, it is worth continuing. Below are the loss curves from one of my training runs (all parameters the same as in train.py):


According to the figure above, the losses discriminator_loss_on_generated and generator_loss are in balance before step 5000, and over that period the generated images get sharper and sharper. After step 5500, however, generator_loss starts rising rapidly and the generated images all turn into noise (see the figure below); then, after step 8000, generator_loss drops sharply back to a low level and the quality of the generated images improves again. After step 16000, as generator_loss grows once more, the generated images blur again. The corresponding images generated over the course of this run are shown below (the images after step 16000 are all blurry, so they are omitted):
Finally, one more point: when choosing the random input distribution for the generator, if you use the normal distribution (see the commented-out line in the function get_next_batch):
generated_inputs = np.random.normal(size=[batch_size, 64])
then many of the generated images will be similar to one another. For example, among the 64 images generated at training step 17,500:
images 12, 14, 21, 26, 27, 39, 46, 49, as well as images 9, 22, 28, 33, 42, 47, 56, 61, are all very similar (which suggests that samples drawn from the standard normal distribution are themselves quite similar, making it better suited to conditional GANs). Using the uniform distribution instead:
generated_inputs = np.random.uniform(low=-1, high=1.0, size=[batch_size, 64])
greatly alleviates this problem; see Figure 3.
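The two sampling choices from get_next_batch can be isolated in a small helper for side-by-side comparison (the function name and seed handling here are illustrative, not part of train.py):

```python
import numpy as np


def sample_inputs(batch_size=64, dim=64, distribution='uniform', seed=None):
    """Sample generator inputs from either distribution used in the article.

    'uniform' keeps every coordinate in [-1, 1); 'normal' draws from the
    standard normal distribution and is unbounded.
    """
    rng = np.random.RandomState(seed)
    if distribution == 'uniform':
        return rng.uniform(low=-1.0, high=1.0, size=[batch_size, dim])
    return rng.normal(size=[batch_size, dim])
```

Swapping the `distribution` argument is then enough to reproduce the comparison described above.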