In this article, I will show you how to generate new human faces that may not exist in real life. I will use a Generative Adversarial Network (GAN) for the task, trained on the CelebA dataset, which contains about 200,000 images of celebrities. I assume you already have a theoretical understanding of GANs. I will use the TensorFlow framework throughout this tutorial.
Here is what our pipeline looks like:
1. Normalize the images.
2. Create the Generator and Discriminator networks.
3. Train the networks and generate new faces.

Below are a few images from our dataset.
The CelebA dataset
Normalizing the Images
As a first step, we import the Python libraries we will use. We load all the images with PIL, cropping each one around the face and resizing it to (64, 64, 3). The pixel values lie in the range (0, 255); we rescale them to (-1, 1), which matches the output range of the tanh activation used by the generator. The Python code is as follows:
# note: these are TensorFlow 1.x APIs (tf.layers, tf.train, placeholders)
import glob
import numpy as np
from PIL import Image
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow import reduce_mean
from tensorflow.train import AdamOptimizer as adam
from tensorflow.nn import sigmoid_cross_entropy_with_logits as loss
from tensorflow.layers import dense, batch_normalization, conv2d_transpose, conv2d
image_ids = glob.glob('../input/data/*')
crop = (30, 55, 150, 175)  # (left, upper, right, lower) box around the face
images = [np.array((Image.open(i).crop(crop)).resize((64,64))) for i in image_ids]
for i in range(len(images)):
    # scale each image to (0, 1), then shift to (-1, 1) to match tanh
    images[i] = ((images[i] - images[i].min())/(255 - images[i].min()))
    images[i] = images[i]*2 - 1
images = np.array(images)
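A quick sanity check (not part of the original code) confirms that preprocessing produced what we expect:

# expected: (num_images, 64, 64, 3), with values spanning roughly -1.0 to 1.0
print(images.shape)
print(images.min(), images.max())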
Helper Function
def view_samples(epoch, samples, nrows, ncols, figsize=(10,10)):
    fig, axes = plt.subplots(figsize=figsize, nrows=nrows, ncols=ncols)
    for ax, img in zip(axes.flatten(), samples[epoch]):
        ax.axis('off')
        # undo the (-1, 1) normalization to get displayable (0, 255) pixels
        img = (((img+1)/2)*255).astype(np.uint8)
        im = ax.imshow(img)
    plt.subplots_adjust(wspace=0, hspace=0)
    return fig, axes
Creating the Networks
Specifically, I am implementing a DCGAN here, short for Deep Convolutional Generative Adversarial Network. It is a variant of the standard GAN introduced by Ian Goodfellow in 2014: instead of relying only on fully connected layers, a DCGAN uses convolutional layers. The idea behind a GAN is that it consists of two networks, a Generator and a Discriminator. The Generator's job is to produce realistic images from noise and fool the Discriminator; the Discriminator's job is to distinguish real images from fake ones. The two networks are trained separately. At the start, both perform poorly, but as training progresses the Discriminator gets better at telling real images from fake ones, and the Generator gets better at producing realistic images that can fool the Discriminator. That said, training a GAN is not an easy task. The architecture and hyperparameter choices here closely follow those used in the DCGAN paper. The Python implementation of the Generator is as follows:
def generator(noise, reuse=False, alpha=0.2, training=True):
    with tf.variable_scope('generator', reuse=reuse):
        # project the noise vector and reshape it into a 4x4x512 feature map
        x = dense(noise, 4*4*512)
        x = tf.reshape(x, (-1, 4, 4, 512))
        x = batch_normalization(x, training=training)
        x = tf.maximum(0., x)  # ReLU, as in the DCGAN paper's generator
        # each strided conv2d_transpose doubles the spatial size
        x = conv2d_transpose(x, 256, 5, 2, padding='same')   # 8x8x256
        x = batch_normalization(x, training=training)
        x = tf.maximum(0., x)
        x = conv2d_transpose(x, 128, 5, 2, padding='same')   # 16x16x128
        x = batch_normalization(x, training=training)
        x = tf.maximum(0., x)
        x = conv2d_transpose(x, 64, 5, 2, padding='same')    # 32x32x64
        x = batch_normalization(x, training=training)
        x = tf.maximum(0., x)
        logits = conv2d_transpose(x, 3, 5, 2, padding='same')  # 64x64x3
        out = tf.tanh(logits)  # squash pixels into (-1, 1) to match the data
        return out, logits
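As a quick check (my addition, not in the original article), you can build the generator on a throwaway graph and confirm that it maps a 100-dimensional noise vector to a 64x64x3 image:

# hypothetical shape check; the real graph is built later in the article
tf.reset_default_graph()
z = tf.placeholder(tf.float32, (None, 100))
out, logits = generator(z)
print(out.get_shape().as_list())  # [None, 64, 64, 3]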
The Python implementation of the Discriminator is as follows:
def discriminator(x, reuse=False, alpha=0.2, training=True):
    with tf.variable_scope('discriminator', reuse=reuse):
        # each strided conv2d halves the spatial size: 64 -> 32 -> 16 -> 8 -> 4
        x = conv2d(x, 32, 5, 2, padding='same')
        x = tf.maximum(alpha*x, x)  # LeakyReLU with slope alpha
        x = conv2d(x, 64, 5, 2, padding='same')
        x = batch_normalization(x, training=training)
        x = tf.maximum(alpha*x, x)
        x = conv2d(x, 128, 5, 2, padding='same')
        x = batch_normalization(x, training=training)
        x = tf.maximum(alpha*x, x)
        x = conv2d(x, 256, 5, 2, padding='same')
        x = batch_normalization(x, training=training)
        x = tf.maximum(alpha*x, x)
        # flatten the final 4x4x256 feature map and score it with a single logit
        flatten = tf.reshape(x, (-1, 4*4*256))
        logits = dense(flatten, 1)
        out = tf.sigmoid(logits)
        return out, logits
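A side note on the activation used above: tf.maximum(alpha*x, x) is a hand-written LeakyReLU, returning x for non-negative inputs and alpha*x otherwise. If you are on TensorFlow 1.4 or later, the built-in tf.nn.leaky_relu is equivalent:

# the two lines below compute the same thing for alpha = 0.2
x = tf.placeholder(tf.float32, (None,))
y_manual = tf.maximum(0.2 * x, x)
y_builtin = tf.nn.leaky_relu(x, alpha=0.2)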
def inputs(real_dim, noise_dim):
    inputs_real = tf.placeholder(tf.float32, (None, *real_dim), name='input_real')
    inputs_noise = tf.placeholder(tf.float32, (None, noise_dim), name='input_noise')
    return inputs_real, inputs_noise
# building the graph
# note: the hyperparameters used below (input_shape, noise_size, smooth,
# learning_rate, beta1) are defined in the next section and must be set
# before running this block
tf.reset_default_graph()
input_real, input_noise = inputs(input_shape, noise_size)
gen_out, gen_logits = generator(input_noise)
dis_out_real, dis_logits_real = discriminator(input_real)
dis_out_fake, dis_logits_fake = discriminator(gen_out, reuse=True)
# defining losses, with one-sided label smoothing: real labels become
# smooth (0.9) instead of 1.0, keeping the discriminator from growing
# over-confident
dis_loss_real = reduce_mean(loss(logits=dis_logits_real, labels=tf.ones_like(dis_logits_real) * smooth))
dis_loss_fake = reduce_mean(loss(logits=dis_logits_fake, labels=tf.zeros_like(dis_logits_fake)))
gen_loss = reduce_mean(loss(logits=dis_logits_fake, labels=tf.ones_like(dis_logits_fake)))
dis_loss = dis_loss_real + dis_loss_fake
# defining optimizers
total_vars = tf.trainable_variables()
# variable names start with their scope, 'discriminator' or 'generator'
dis_vars = [var for var in total_vars if var.name.startswith('discriminator')]
gen_vars = [var for var in total_vars if var.name.startswith('generator')]
# batch_normalization creates update ops for its moving averages;
# they must run together with the training steps
with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
    dis_opt = adam(learning_rate=learning_rate, beta1=beta1).minimize(dis_loss, var_list=dis_vars)
    gen_opt = adam(learning_rate=learning_rate, beta1=beta1).minimize(gen_loss, var_list=gen_vars)
Training the Networks
Below are the hyperparameter choices: a learning rate of 0.0002, a noise vector of size 100, a label-smoothing factor of 0.9, a leak parameter of 0.2 for the LeakyReLU, and a beta1 of 0.5 for the Adam optimizer.

# hyperparameters
beta1 = 0.5
alpha = 0.2
smooth = 0.9
noise_size = 100
learning_rate = 0.0002
input_shape = (64,64,3)
batch_size = 128
epochs = 100
saver = tf.train.Saver(var_list = gen_vars)
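The training loop below draws batches through a get_images function that the article does not define. Here is a minimal sketch of what it is assumed to do, namely sample a random batch from the preprocessed images array:

def get_images(batch_size):
    # hypothetical helper: draw a random batch from the preprocessed images
    idx = np.random.randint(0, images.shape[0], size=batch_size)
    return images[idx]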
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # build the sampling op once, reusing the trained generator weights
    sample_out, _ = generator(input_noise, reuse=True, alpha=alpha)
    for e in range(epochs):
        iters = 10000//batch_size
        for i in range(iters):
            batch_images = get_images(batch_size)
            batch_noise = np.random.uniform(-1, 1, size=(batch_size, noise_size))
            # alternate one discriminator step and one generator step
            sess.run(dis_opt, feed_dict={input_real: batch_images, input_noise: batch_noise})
            sess.run(gen_opt, feed_dict={input_real: batch_images, input_noise: batch_noise})
        loss_dis = sess.run(dis_loss, {input_noise: batch_noise, input_real: batch_images})
        loss_gen = gen_loss.eval({input_real: batch_images, input_noise: batch_noise})
        print("Epoch {}/{}...".format(e+1, epochs),
              "Discriminator Loss: {:.4f}...".format(loss_dis),
              "Generator Loss: {:.4f}".format(loss_gen))
        # sample a few faces from the generator to monitor progress
        sample_noise = np.random.uniform(-1, 1, size=(8, noise_size))
        gen_samples = sess.run(sample_out, feed_dict={input_noise: sample_noise})
        view_samples(-1, [gen_samples], 2, 4, (10,5))
        plt.show()
    saver.save(sess, './checkpoints/generator.ckpt')
Faces generated after training
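As a follow-up (my addition, not part of the original article), the saved checkpoint can be restored later to sample new faces without retraining. A minimal sketch, assuming the generator, view_samples, and noise_size definitions from above:

# restore the generator from './checkpoints/generator.ckpt' and sample faces
tf.reset_default_graph()
input_noise = tf.placeholder(tf.float32, (None, noise_size), name='input_noise')
# training=True because only the trainable generator variables were saved,
# so batch normalization has no stored moving statistics to fall back on
sample_out, _ = generator(input_noise, reuse=False, training=True)
gen_vars = [v for v in tf.trainable_variables() if v.name.startswith('generator')]
saver = tf.train.Saver(var_list=gen_vars)
with tf.Session() as sess:
    saver.restore(sess, './checkpoints/generator.ckpt')
    noise = np.random.uniform(-1, 1, size=(8, noise_size))
    faces = sess.run(sample_out, feed_dict={input_noise: noise})
    view_samples(-1, [faces], 2, 4, (10, 5))
    plt.show()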