[He Zhiyuan – 21 Projects to Play with Deep Learning] – Chapter 4, Section 4.2.1: Deep Dream in TensorFlow (1)


First, the book's own introduction to the concept of Deep Dream is quoted below:

Deep Dream is a fascinating technique announced by Google in 2015. Given a trained convolutional neural network, only a few parameters need to be set to generate an image with this technique. The resulting images are not only striking, but also help us understand how convolutional neural networks work. This chapter introduces the basic principles of Deep Dream and uses TensorFlow to implement a Deep Dream generation model.

The technical principle of Deep Dream

In a convolutional network, the input is generally an image, the middle layers perform a series of convolution operations, and the output is the category of the image. In the training phase, a large number of training images are used to compute gradients, and the network continually adjusts its parameters according to those gradients. Several questions naturally arise here: (1) What exactly does a convolutional layer learn? (2) What do the parameters of a convolutional layer represent? (3) What is the difference between what shallow and deep convolutional layers learn?

Deep Dream can answer the above questions.
Let the image input to the network be x, and let the probabilities of each category output by the network be t (for ImageNet with its 1000 categories, t is a 1000-dimensional vector holding the probability of each category). Take the banana category as an example, and suppose its output probability is t[100]; in other words, t[100] represents the probability that the neural network thinks the image is a banana. Set t[100] as the optimization target, and let the network continually adjust the pixel values of the input image x so that t[100] becomes as large as possible.

Similarly, suppose you want to figure out what a convolutional layer in the middle of the network has learned. Just maximize the output of one channel of that layer. Again let the input image be x, and let the output of some intermediate convolutional layer be y. The shape of y is h×w×c, where h is the height of y, w is the width, and c is the number of channels. The original image has three channels (R, G, and B), while in most convolutional layers the number of channels is far more than three. One channel of a convolutional layer can be seen as representing one kind of learned "information". Taking the mean value of some channel as the optimization target, we can find out what that channel has learned; this is the basic principle of Deep Dream.
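The idea can be sketched with a toy numpy example, independent of TensorFlow and the real Inception network: below, a fixed random linear map stands in for "one channel of a convolutional layer", and gradient ascent is applied to the input (not the weights) to maximize the channel's mean activation. All names here are illustrative, not part of the book's code.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))      # stand-in "layer" weights (fixed, as in a trained net)
x = rng.normal(size=8)            # stand-in "input image" x

def channel_mean(x):
    return (W @ x).mean()         # the optimization target: mean activation

grad = W.mean(axis=0)             # d(channel_mean)/dx; constant for a linear map

before = channel_mean(x)
for _ in range(20):
    x += 1.0 * grad               # gradient ascent on the *input*, not the weights
after = channel_mean(x)
print(after > before)             # True: the mean activation has increased
```

The real Deep Dream code later in this post does exactly this, except that the gradient comes from `tf.gradients` on the Inception graph instead of a hand-derived formula.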

Step 1: Import the model

The following example imports the Inception model; the code is:

# coding:utf-8
# Import the basic modules to be used.
from __future__ import print_function
import numpy as np
import tensorflow as tf

# Create graph and Session
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)

# The tensorflow_inception_graph.pb file stores both the network structure
# of Inception and the corresponding trained weights.
# Import it with the following statements:
model_fn = 'tensorflow_inception_graph.pb'
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Define t_input as our input image
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 117.0
# The input image needs to be preprocessed before it is fed to the network:
# expand_dims adds one dimension, turning [height, width, channel]
# into [1, height, width, channel];
# t_input - imagenet_mean subtracts the pixel mean
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input': t_preprocessed})

# Find all convolutional layers
layers = [op.name for op in graph.get_operations() if op.type == 'Conv2D' and 'import/' in op.name]

# Output the number of convolutional layers
print('Number of layers', len(layers))

# In particular, output the shape of mixed4d_3x3_bottleneck_pre_relu
name = 'mixed4d_3x3_bottleneck_pre_relu'
print('shape of %s: %s' % (name, str(graph.get_tensor_by_name('import/' + name + ':0').get_shape())))

The model file tensorflow_inception_graph.pb is obtained by downloading and extracting https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip.

Here layers is a list containing the names of the operations in the graph that pass the filter. For the specific use of graph.get_operations(), refer to the TensorFlow API documentation.

Printing the contents of layers gives:

['import/conv2d0_pre_relu/conv',
 'import/conv2d1_pre_relu/conv',
 'import/conv2d2_pre_relu/conv',
 'import/mixed3a_pool_reduce_pre_relu/conv',
 'import/mixed3a_5x5_bottleneck_pre_relu/conv',
 'import/mixed3a_5x5_pre_relu/conv',
 'import/mixed3a_3x3_bottleneck_pre_relu/conv',
 'import/mixed3a_3x3_pre_relu/conv',
 'import/mixed3a_1x1_pre_relu/conv',
 'import/mixed3b_pool_reduce_pre_relu/conv',
 'import/mixed3b_5x5_bottleneck_pre_relu/conv',
 'import/mixed3b_5x5_pre_relu/conv',
 'import/mixed3b_3x3_bottleneck_pre_relu/conv',
 'import/mixed3b_3x3_pre_relu/conv',
 'import/mixed3b_1x1_pre_relu/conv',
 'import/mixed4a_pool_reduce_pre_relu/conv',
 'import/mixed4a_5x5_bottleneck_pre_relu/conv',
 'import/mixed4a_5x5_pre_relu/conv',
 'import/mixed4a_3x3_bottleneck_pre_relu/conv',
 'import/mixed4a_3x3_pre_relu/conv',
 'import/mixed4a_1x1_pre_relu/conv',
 'import/head0_bottleneck_pre_relu/conv',
 'import/mixed4b_pool_reduce_pre_relu/conv',
 'import/mixed4b_5x5_bottleneck_pre_relu/conv',
 'import/mixed4b_5x5_pre_relu/conv',
 'import/mixed4b_3x3_bottleneck_pre_relu/conv',
 'import/mixed4b_3x3_pre_relu/conv',
 'import/mixed4b_1x1_pre_relu/conv',
 'import/mixed4c_pool_reduce_pre_relu/conv',
 'import/mixed4c_5x5_bottleneck_pre_relu/conv',
 'import/mixed4c_5x5_pre_relu/conv',
 'import/mixed4c_3x3_bottleneck_pre_relu/conv',
 'import/mixed4c_3x3_pre_relu/conv',
 'import/mixed4c_1x1_pre_relu/conv',
 'import/mixed4d_pool_reduce_pre_relu/conv',
 'import/mixed4d_5x5_bottleneck_pre_relu/conv',
 'import/mixed4d_5x5_pre_relu/conv',
 'import/mixed4d_3x3_bottleneck_pre_relu/conv',
 'import/mixed4d_3x3_pre_relu/conv',
 'import/mixed4d_1x1_pre_relu/conv',
 'import/head1_bottleneck_pre_relu/conv',
 'import/mixed4e_pool_reduce_pre_relu/conv',
 'import/mixed4e_5x5_bottleneck_pre_relu/conv',
 'import/mixed4e_5x5_pre_relu/conv',
 'import/mixed4e_3x3_bottleneck_pre_relu/conv',
 'import/mixed4e_3x3_pre_relu/conv',
 'import/mixed4e_1x1_pre_relu/conv',
 'import/mixed5a_pool_reduce_pre_relu/conv',
 'import/mixed5a_5x5_bottleneck_pre_relu/conv',
 'import/mixed5a_5x5_pre_relu/conv',
 'import/mixed5a_3x3_bottleneck_pre_relu/conv',
 'import/mixed5a_3x3_pre_relu/conv',
 'import/mixed5a_1x1_pre_relu/conv',
 'import/mixed5b_pool_reduce_pre_relu/conv',
 'import/mixed5b_5x5_bottleneck_pre_relu/conv',
 'import/mixed5b_5x5_pre_relu/conv',
 'import/mixed5b_3x3_bottleneck_pre_relu/conv',
 'import/mixed5b_3x3_pre_relu/conv',
 'import/mixed5b_1x1_pre_relu/conv']

The above is the result of evaluating layers in a Jupyter notebook. There are 59 operation names that meet the filter conditions. We can also specify a particular operation name to inspect its shape, such as name = 'mixed4d_3x3_bottleneck_pre_relu' in the code.

Printed result:  shape of mixed4d_3x3_bottleneck_pre_relu: (?, ?, ?, 144)    

The first three dimensions are question marks because the number, height, and width of the input images are not yet known; only the number of channels (144) is fixed by the network.

(One thing to note: the pixel mean must also be subtracted from the input image. This is because mean subtraction was applied as preprocessing when the Inception model was trained, so the same preprocessing must be used here to keep the input consistent. The Inception model used in this chapter subtracts a fixed mean of 117, which is why imagenet_mean = 117.0 is defined in the program and subtracted from t_input.)
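The preprocessing performed by the graph can be reproduced with plain numpy, which makes the shape change easy to see. The image below is hypothetical random data, used only for illustration:

```python
import numpy as np

imagenet_mean = 117.0

# Hypothetical 224x224 RGB image with float pixel values in [0, 255].
img = np.random.default_rng(0).uniform(0, 255, size=(224, 224, 3)).astype(np.float32)

# Subtract the fixed mean, then add a batch dimension:
# [height, width, channel] -> [1, height, width, channel]
preprocessed = np.expand_dims(img - imagenet_mean, 0)
print(preprocessed.shape)  # (1, 224, 224, 3)
```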

Step 2: Generate the original Deep Dream image

The complete code for this part is as follows:

# coding: utf-8
from __future__ import print_function
import os
from io import BytesIO
import numpy as np
from functools import partial
import PIL.Image
import scipy.misc
import tensorflow as tf


graph = tf.Graph()
model_fn = 'tensorflow_inception_graph.pb'
sess = tf.InteractiveSession(graph=graph)
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 117.0
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input': t_preprocessed})


def savearray(img_array, img_name):
    # Note: scipy.misc.toimage was removed in SciPy 1.2;
    # with a newer SciPy, PIL.Image.fromarray can be used instead.
    scipy.misc.toimage(img_array).save(img_name)
    print('img saved: %s' % img_name)


def render_naive(t_obj, img0, iter_n=20, step=1.0):
    # t_score is the optimization target: the mean value of t_obj.
    # Combined with the call site, it is actually the mean value of
    # layer_output[:, :, :, channel].
    t_score = tf.reduce_mean(t_obj)
    # Compute the gradient of t_score with respect to t_input
    t_grad = tf.gradients(t_score, t_input)[0]

    # Copy the initial image so the original is not modified
    img = img0.copy()
    for i in range(iter_n):
        # Compute the gradient and the current score in sess
        g, score = sess.run([t_grad, t_score], {t_input: img})
        # Apply the gradient to img; step can be seen as the "learning rate"
        g /= g.std() + 1e-8
        img += g * step
        print('score(mean)=%f' % (score))
    # Save the image
    savearray(img, 'naive.jpg')


# Specify the convolutional layer and the channel, and take out the corresponding tensor
name = 'mixed4d_3x3_bottleneck_pre_relu'
channel = 139
layer_output = graph.get_tensor_by_name("import/%s:0" % name)

# Define the initial image as random noise
img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0
# Call the render_naive function to render
render_naive(layer_output[:, :, :, channel], img_noise, iter_n=20)

The results are as follows:

score(mean)=-20.088280
score(mean)=-30.066317
score(mean)=12.236075
score(mean)=85.736198
score(mean)=155.088699
score(mean)=213.084106
score(mean)=273.888580
score(mean)=323.319122
score(mean)=378.042328
score(mean)=416.296783
score(mean)=455.985138
score(mean)=500.792450
score(mean)=532.272156
score(mean)=569.163086
score(mean)=596.386108
score(mean)=627.763367
score(mean)=650.017944
score(mean)=684.536133
score(mean)=698.245605
score(mean)=729.858521
img saved: naive.jpg

The saved image naive.jpg looks like this (figure omitted here):

The parameter t_obj of the function is actually layer_output[:, :, :, channel], i.e., the values of one channel of the convolutional layer. t_score = tf.reduce_mean(t_obj) means t_score is the mean of t_obj; the larger t_score is, the larger the average activation of that channel of the convolutional layer.
The goal of this section is to make t_score as large as possible by adjusting the input image t_input. This is done with gradient ascent: the gradient is defined as t_grad = tf.gradients(t_score, t_input)[0], and in the loop the computed gradient is applied to the input image.
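Note that render_naive does g /= g.std() + 1e-8 before the update. This rescales the gradient to roughly unit standard deviation, so the effective step size stays stable across iterations regardless of the raw gradient magnitude (the 1e-8 only guards against division by zero). A small numpy sketch of the effect:

```python
import numpy as np

# Two gradients with very different raw magnitudes...
rng = np.random.default_rng(1)
g_small = rng.normal(scale=0.001, size=(224, 224, 3))
g_large = rng.normal(scale=500.0, size=(224, 224, 3))

# ...end up on the same scale after the normalization used in render_naive,
# so img += g * step moves the image by a comparable amount either way.
for g in (g_small, g_large):
    g = g / (g.std() + 1e-8)
    print(round(float(g.std()), 3))  # ~1.0 in both cases
```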
When channel=110 is selected instead, the corresponding generated image is as follows (figure omitted here):
