
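0. Introduction to the Google Inception model
Inception is a CNN model open-sourced by Google. Four versions have been published so far, each trained on data from the large-scale ImageNet image database, so Google's pretrained Inception model can be used directly for image classification. This article is based on Inception_v3. The Inception v3 model has about 25 million parameters, and classifying a single image takes about 5 billion multiply-add operations. Even so, a modern PC without a GPU can classify an image almost instantly.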
1. Evolution of the Google Inception model
The papers for the four Inception versions are listed below, each followed by its top-5 error rate on ILSVRC:
- [v1] Going Deeper with Convolutions: 6.67% test error
- [v2] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift: 4.8% test error
- [v3] Rethinking the Inception Architecture for Computer Vision: 3.5% test error
- [v4] Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning: 3.08% test error
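The top-5 error rate quoted above counts a prediction as wrong only when the true class is not among the five highest-scoring classes. A minimal sketch of that metric follows; the function name `top_k_error` and the toy scores are made up for illustration and are not taken from the papers.

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of rows whose true label is NOT among the k highest scores."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row (order inside the k is irrelevant).
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Toy data: 3 samples, 6 classes (illustrative numbers, no ties).
scores = np.array([
    [0.10, 0.20, 0.30, 0.25, 0.11, 0.04],
    [0.50, 0.12, 0.11, 0.10, 0.09, 0.08],
    [0.05, 0.06, 0.10, 0.20, 0.25, 0.34],
])
labels = np.array([2, 5, 1])

top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
print(top1, top5)
```

On this toy data the top-1 error is 2/3 and the top-5 error is 1/3, because the third sample's true class is not the best guess but is still within the five best.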
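2. Downloading the Inception_v3 model
- Inception_v3 model source code download
You can of course train an Inception_v3 model from scratch, but that takes a lot of time and effort and is unnecessary. Retraining on top of the already-trained Inception_v3 model is also fine; it will be covered in a later article.
- models/imagenet/classify_image.py
The official example has been released, so you can also read the official code directly.
- [Download] Pretrained Inception_v3 model (Baidu Netdisk)
- [Download] Pretrained Inception_v3 model (official)
Either of the two links above can be used.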

- classify_image_graph_def.pb is the Inception_v3 model itself.
- imagenet_2012_challenge_label_map_proto.pbtxt contains target_class and target_class_string. The former is the class code, from 1 to 1000 (1k classes), referred to here as Node_ID; the latter is an ID string of the form "n********", which can be thought of as an "address" or a "bridge", referred to here as UID.
- imagenet_synset_to_human_label_map.txt contains the mapping from UID to class name; this text label is referred to here as human_string.
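To make the two-step lookup concrete (Node_ID to UID, then UID to human_string), here is a small sketch that parses in-memory samples laid out like the two files. The sample entries, including the node IDs, are illustrative assumptions and are not copied from the downloaded files; the helper names `parse_label_map` and `parse_synsets` are hypothetical.

```python
import re

# Illustrative samples in the layout of the two files (values are assumptions).
LABEL_MAP_SAMPLE = '''entry {
  target_class: 449
  target_class_string: "n01440764"
}
entry {
  target_class: 3
  target_class_string: "n02110185"
}
'''
SYNSET_SAMPLE = "n01440764\ttench, Tinca tinca\nn02110185\tSiberian husky\n"

def parse_label_map(text):
    """Return {node_id: uid} from 'entry { ... }' style label-map text."""
    pattern = re.compile(
        r'target_class:\s*(\d+)\s*target_class_string:\s*"(n\d+)"')
    return {int(node_id): uid for node_id, uid in pattern.findall(text)}

def parse_synsets(text):
    """Return {uid: human_string} from tab-separated 'uid<TAB>name' lines."""
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            uid, human_string = line.split('\t', 1)
            mapping[uid] = human_string
    return mapping

node_id_to_uid = parse_label_map(LABEL_MAP_SAMPLE)
uid_to_human = parse_synsets(SYNSET_SAMPLE)
# Chain the two dictionaries: Node_ID -> UID -> human_string.
node_id_to_name = {nid: uid_to_human[uid] for nid, uid in node_id_to_uid.items()}
print(node_id_to_name)
```

The UID acts purely as a join key between the two files, which is why the article calls it a "bridge".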
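3. Preparation
Download any picture from the web and name it husky.jpg:

The code below uses the Inception_v3 model to classify this husky picture.
4. Code
First create a class NodeLookup that maps softmax probabilities to labels; then create a function create_graph() that reads the model file and builds the graph; finally read the husky picture and classify it. The code uses the TensorFlow 1.x API (tf.Session, tf.gfile):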
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
#import re
import os

model_dir='C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/Inception_model'
image = 'C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/Images/husky.jpg'

# Converts class IDs into human-readable labels
class NodeLookup(object):
    def __init__(self, label_lookup_path=None, uid_lookup_path=None):
        if not label_lookup_path:
            # Default label_lookup_path file:
            # maps each class in the dataset (1-1000) to a target_class_string,
            # an ID of the form "n********" where each asterisk is a digit
            label_lookup_path = os.path.join(
                model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
        if not uid_lookup_path:
            # Default uid_lookup_path file:
            # maps each "n********" UID to the class name
            uid_lookup_path = os.path.join(
                model_dir, 'imagenet_synset_to_human_label_map.txt')
        self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

    def load(self, label_lookup_path, uid_lookup_path):
        if not tf.gfile.Exists(uid_lookup_path):
            # Check up front that the path exists
            tf.logging.fatal('File does not exist %s', uid_lookup_path)
        if not tf.gfile.Exists(label_lookup_path):
            # Check up front that the path exists
            tf.logging.fatal('File does not exist %s', label_lookup_path)
        # Loads mapping from string UID to human-readable string
        # (the dictionary uid_to_human: "n********" UID -> class name).
        # readlines() returns all lines of the file as a list
        # and leaves the '\n' at the end of each line.
        proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()
        # Empty dictionary that will hold the mapping
        uid_to_human = {}
        # =============================================================================
        # # Alternative: parse the file with a regular expression:
        # p = re.compile(r'[n\d]*[ \S,]*')
        # for line in proto_as_ascii_lines:
        #     parsed_items = p.findall(line)
        #     uid = parsed_items[0]
        #     human_string = parsed_items[2]
        #     uid_to_human[uid] = human_string
        # =============================================================================
        # Simple approach: process the data line by line
        for line in proto_as_ascii_lines:
            # Strip the newline
            line = line.strip('\n')
            # Split on the tab character, giving two parts
            parse_items = line.split('\t')
            # The class code, i.e. the UID
            uid = parse_items[0]
            # The class name
            human_string = parse_items[1]
            # Record the UID -> class name mapping
            uid_to_human[uid] = human_string
        # Loads mapping from string UID to integer node ID.
        # This file pairs each "n********" UID with a class number (1-1000)
        proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
        # Empty dictionary that will hold node ID -> UID
        node_id_to_uid = {}
        for line in proto_as_ascii:
            # Mind the two leading spaces: the entries in the file are indented
            if line.startswith('  target_class:'):
                # The class number
                target_class = int(line.split(': ')[1])
            if line.startswith('  target_class_string:'):
                # The UID, still wrapped in double quotes, e.g. "n01484850"
                target_class_string = line.split(': ')[1]
                # Drop the leading quote and the trailing quote plus newline
                node_id_to_uid[target_class] = target_class_string[1:-2]
        # Loads the final mapping of integer node ID to human-readable string
        node_id_to_name = {}
        for key, val in node_id_to_uid.items():
            # Log an error if the UID is missing from uid_to_human
            if val not in uid_to_human:
                tf.logging.fatal('Failed to locate: %s', val)
            # The class name
            name = uid_to_human[val]
            # Mapping from class number to class name: key is node_id, value is name
            node_id_to_name[key] = name
        return node_id_to_name

    # Takes a class number (1-1000) and returns the class name
    def id_to_string(self, node_id):
        # Return an empty string if the ID is unknown
        if node_id not in self.node_lookup:
            return ''
        return self.node_lookup[node_id]

# Reads Google's trained Inception_v3 model and imports it into the default graph
def create_graph():
    with tf.gfile.FastGFile(os.path.join(
            model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

# Read the image
image_data = tf.gfile.FastGFile(image, 'rb').read()
# Build the graph
create_graph()
# Create a session; the graph is restored from the existing Inception_v3 model,
# so no variable initialization is needed
with tf.Session() as sess:
    # Output of the final softmax layer of Inception_v3.
    # A name like 'conv1' is a node name, while 'conv1:0' is a tensor name:
    # the first output tensor of that node
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    # Feed the raw JPEG bytes and get the softmax probabilities (shape=(1,1008))
    predictions = sess.run(softmax_tensor,{'DecodeJpeg/contents:0': image_data})
    # Flatten the result to 1-D
    predictions = np.squeeze(predictions)
    # Lookup object: ID --> English string label.
    node_lookup = NodeLookup()
    # Take the indices of the 5 largest probabilities (top-5).
    # argsort() returns the indices that sort the array in ascending order
    top_5 = predictions.argsort()[-5:][::-1]
    for node_id in top_5:
        # The class name
        human_string = node_lookup.id_to_string(node_id)
        # The confidence score for that class
        score = predictions[node_id]
        print('%s (score = %.5f)' % (human_string, score))
Output:
runfile('C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/test.py', wdir='C:/Users/Dexter/Documents/ML_files/171003_Inception_v3')
Siberian husky (score = 0.51033)
Eskimo dog, husky (score = 0.41048)
malamute, malemute, Alaskan malamute (score = 0.00653)
kelpie (score = 0.00136)
dogsled, dog sled, dog sleigh (score = 0.00133)
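The expression predictions.argsort()[-5:][::-1] used above can be hard to read at first. The sketch below shows the same idiom on a tiny made-up probability vector, together with np.argpartition as an optional alternative that avoids sorting the whole vector; the alternative is an addition for illustration and is not part of the original script.

```python
import numpy as np

# Made-up probabilities for five classes (no ties).
probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])

# argsort() gives the indices that sort the values in ascending order.
order = probs.argsort()
# Keep the last three (the largest) and reverse so the best comes first.
top_3 = order[-3:][::-1]
print(top_3)

# Alternative: argpartition finds the three largest without a full sort,
# then only those three are sorted in descending order.
part = np.argpartition(probs, -3)[-3:]
top_3_fast = part[np.argsort(probs[part])[::-1]]
print(top_3_fast)
```

Both variants yield the class indices ordered from most to least probable; for a 1008-element vector the difference in speed is negligible, so the simpler idiom in the script is a reasonable choice.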
With a small change to the code, the input becomes multiple images and the output becomes image path + image + prediction:
# -*- coding: utf-8 -*-
"""
Created on Fri Oct 6 19:32:04 2017
test2: changes test from taking a single image to taking a folder of images, and makes the output visible
@author: Dexter
"""
import tensorflow as tf
import numpy as np
#import re
import os
from PIL import Image
import matplotlib.pyplot as plt

model_dir='C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/Inception_model'
# Image folder, relative to the working directory (.../171003_Inception_v3)
image_dir = 'images/'

# Converts class IDs into human-readable labels
class NodeLookup(object):
    def __init__(self, label_lookup_path=None, uid_lookup_path=None):
        if not label_lookup_path:
            # Default label_lookup_path file:
            # maps each class in the dataset (1-1000) to a target_class_string,
            # an ID of the form "n********" where each asterisk is a digit
            label_lookup_path = os.path.join(
                model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
        if not uid_lookup_path:
            # Default uid_lookup_path file:
            # maps each "n********" UID to the class name
            uid_lookup_path = os.path.join(
                model_dir, 'imagenet_synset_to_human_label_map.txt')
        self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

    def load(self, label_lookup_path, uid_lookup_path):
        if not tf.gfile.Exists(uid_lookup_path):
            # Check up front that the path exists
            tf.logging.fatal('File does not exist %s', uid_lookup_path)
        if not tf.gfile.Exists(label_lookup_path):
            # Check up front that the path exists
            tf.logging.fatal('File does not exist %s', label_lookup_path)
        # Loads mapping from string UID to human-readable string
        # (the dictionary uid_to_human: "n********" UID -> class name).
        # readlines() returns all lines of the file as a list
        # and leaves the '\n' at the end of each line.
        proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()
        # Empty dictionary that will hold the mapping
        uid_to_human = {}
        # =============================================================================
        # # Alternative: parse the file with a regular expression:
        # p = re.compile(r'[n\d]*[ \S,]*')
        # for line in proto_as_ascii_lines:
        #     parsed_items = p.findall(line)
        #     uid = parsed_items[0]
        #     human_string = parsed_items[2]
        #     uid_to_human[uid] = human_string
        # =============================================================================
        # Simple approach: process the data line by line
        for line in proto_as_ascii_lines:
            # Strip the newline
            line = line.strip('\n')
            # Split on the tab character, giving two parts
            parse_items = line.split('\t')
            # The class code, i.e. the UID
            uid = parse_items[0]
            # The class name
            human_string = parse_items[1]
            # Record the UID -> class name mapping
            uid_to_human[uid] = human_string
        # Loads mapping from string UID to integer node ID.
        # This file pairs each "n********" UID with a class number (1-1000)
        proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
        # Empty dictionary that will hold node ID -> UID
        node_id_to_uid = {}
        for line in proto_as_ascii:
            # Mind the two leading spaces: the entries in the file are indented
            if line.startswith('  target_class:'):
                # The class number
                target_class = int(line.split(': ')[1])
            if line.startswith('  target_class_string:'):
                # The UID, still wrapped in double quotes, e.g. "n01484850"
                target_class_string = line.split(': ')[1]
                # Drop the leading quote and the trailing quote plus newline
                node_id_to_uid[target_class] = target_class_string[1:-2]
        # Loads the final mapping of integer node ID to human-readable string
        node_id_to_name = {}
        for key, val in node_id_to_uid.items():
            # Log an error if the UID is missing from uid_to_human
            if val not in uid_to_human:
                tf.logging.fatal('Failed to locate: %s', val)
            # The class name
            name = uid_to_human[val]
            # Mapping from class number to class name: key is node_id, value is name
            node_id_to_name[key] = name
        return node_id_to_name

    # Takes a class number (1-1000) and returns the class name
    def id_to_string(self, node_id):
        # Return an empty string if the ID is unknown
        if node_id not in self.node_lookup:
            return ''
        return self.node_lookup[node_id]

# Reads Google's trained Inception_v3 model and imports it into the default graph
def create_graph():
    with tf.gfile.FastGFile(os.path.join(
            model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

# Build the graph
create_graph()
# Lookup object: ID --> English string label. Built once and reused for every image
node_lookup = NodeLookup()
# Create a session; the graph is restored from the existing Inception_v3 model,
# so no variable initialization is needed
with tf.Session() as sess:
    # Output of the final softmax layer of Inception_v3
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    # Walk the directory
    for root, dirs, files in os.walk(image_dir):
        for file in files:
            image_path = os.path.join(root, file)
            # Load the image
            image_data = tf.gfile.FastGFile(image_path, 'rb').read()
            # Feed the raw JPEG bytes and get the softmax probabilities (shape=(1,1008))
            predictions = sess.run(softmax_tensor,{'DecodeJpeg/contents:0': image_data})
            # Flatten the result to 1-D
            predictions = np.squeeze(predictions)
            # Print the image path and file name
            print(image_path)
            # Display the image
            img = Image.open(image_path)
            plt.imshow(img)
            plt.axis('off')
            plt.show()
            # Take the indices of the 5 largest probabilities (top-5).
            # argsort() returns the indices that sort the array in ascending order
            top_5 = predictions.argsort()[-5:][::-1]
            for node_id in top_5:
                # The class name
                human_string = node_lookup.id_to_string(node_id)
                # The confidence score for that class
                score = predictions[node_id]
                print('%s (score = %.5f)' % (human_string, score))
            print()
Output:
runfile('C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/test2.py', wdir='C:/Users/Dexter/Documents/ML_files/171003_Inception_v3')
images/dog.jpg

dingo, warrigal, warragal, Canis dingo (score = 0.46103)
Chihuahua (score = 0.05741)
Eskimo dog, husky (score = 0.04384)
dhole, Cuon alpinus (score = 0.04106)
Pembroke, Pembroke Welsh corgi (score = 0.02823)
images/husky.jpg

Siberian husky (score = 0.51033)
Eskimo dog, husky (score = 0.41048)
malamute, malemute, Alaskan malamute (score = 0.00653)
kelpie (score = 0.00136)
dogsled, dog sled, dog sleigh (score = 0.00133)
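The folder script above feeds every file it finds to 'DecodeJpeg/contents:0', so a stray non-JPEG file in the folder would make the run fail. One possible guard, sketched here under the assumption that JPEG files can be recognized by their extension, is to collect the paths first; the helper name `list_jpeg_files` is hypothetical and not part of the original script.

```python
import os
import tempfile

# Extensions treated as JPEG (an assumption; adjust as needed).
JPEG_EXTENSIONS = ('.jpg', '.jpeg')

def list_jpeg_files(root_dir):
    """Return sorted paths of files under root_dir with a JPEG-like extension."""
    paths = []
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            # str.endswith accepts a tuple of suffixes.
            if name.lower().endswith(JPEG_EXTENSIONS):
                paths.append(os.path.join(root, name))
    return sorted(paths)

# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('husky.jpg', 'dog.JPEG', 'notes.txt', 'logo.png'):
        open(os.path.join(demo_dir, name), 'wb').close()
    found = [os.path.basename(p) for p in list_jpeg_files(demo_dir)]

print(found)
```

In the script, the nested os.walk loops could then be replaced by a single loop over list_jpeg_files('images/').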
5. Notes on related functions
- tf.get_default_graph()
Returns the default graph of the current thread (it can be set with Graph.as_default()).
Returns the default graph for the current thread.
The returned graph will be the innermost graph on which a Graph.as_default() context has been entered, or a global default graph if none has been explicitly created.
NOTE: The default graph is a property of the current thread. If you create a new thread, and wish to use the default graph in that thread, you must explicitly add a with g.as_default(): in that thread's function.
Returns:
The default Graph being used in the current thread.
- tf.Graph.as_default()
Makes this Graph the default graph.
Returns a context manager that makes this Graph the default graph.
- tf.Graph.get_tensor_by_name()
All tensors have string names. You can list the node names in the default graph as follows:
[node.name for node in tf.get_default_graph().as_graph_def().node]
Once you know the name you can fetch the Tensor using <name>:0 (0 refers to endpoint which is somewhat redundant)
import tensorflow as tf
c = tf.constant([[1.0, 2.0], [3.0, 4.0]])
d = tf.constant([[1.0, 1.0], [0.0, 1.0]])
e = tf.matmul(c, d, name='example')
with tf.Session() as sess:
    test = sess.run(e)
    print(e.name)
    # example:0
    # <name>:0 (0 refers to endpoint which is somewhat redundant)
    test = tf.get_default_graph().get_tensor_by_name("example:0")
    print(test)
    # Tensor("example:0", shape=(2, 2), dtype=float32)
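The "<name>:0" convention above means "output number 0 of the op called <name>". The plain-Python sketch below only illustrates how such a string splits into its two parts; `split_tensor_name` is a hypothetical helper written for this explanation, not a TensorFlow function.

```python
def split_tensor_name(tensor_name):
    """Split '<op_name>:<output_index>' into (op_name, output_index)."""
    # rpartition splits on the LAST colon, so op names containing '/' are fine.
    op_name, sep, index = tensor_name.rpartition(':')
    if not sep or not op_name or not index.isdigit():
        raise ValueError(
            'expected "<op_name>:<output_index>", got %r' % tensor_name)
    return op_name, int(index)

print(split_tensor_name('example:0'))
print(split_tensor_name('DecodeJpeg/contents:0'))
```

This is also why 'DecodeJpeg/contents:0' and 'DecodeJpeg:0' in the scripts refer to different tensors: they are the first outputs of two different ops, 'DecodeJpeg/contents' and 'DecodeJpeg'.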
6. Some improvements
6.1 Using PNG or other image formats instead of JPG as input
The shipped InceptionV3 graph used in classify_image.py only supports JPEG images out-of-the-box. There are two ways you could use this graph with PNG images:
- Convert the PNG image to a height x width x 3 (channels) Numpy array, for example using PIL, then feed the 'DecodeJpeg:0' tensor:
import numpy as np
from PIL import Image
# ...
image = Image.open("example.png")
image_array = np.array(image)[:, :, 0:3] # Select RGB channels only.
prediction = sess.run(softmax_tensor, {'DecodeJpeg:0': image_array})
Perhaps confusingly, 'DecodeJpeg:0' is the output of the DecodeJpeg op, so by feeding this tensor, you are able to feed raw image data.
- Add a tf.image.decode_png() op to the imported graph. Simply switching the name of the fed tensor from 'DecodeJpeg/contents:0' to 'DecodePng/contents:0' does not work because there is no 'DecodePng' op in the shipped graph. You can add such a node to the graph by using the input_map argument to tf.import_graph_def():
png_data = tf.placeholder(tf.string, shape=[])
decoded_png = tf.image.decode_png(png_data, channels=3)
# ...
graph_def = ...
# return_elements is a list, so softmax_tensor is a one-element list here
softmax_tensor = tf.import_graph_def(
    graph_def,
    input_map={'DecodeJpeg:0': decoded_png},
    return_elements=['softmax:0'])
sess.run(softmax_tensor, {png_data: ...})
- The following code should handle both cases.
import numpy as np
import tensorflow as tf
from PIL import Image
image_file = 'test.jpeg'
with tf.Session() as sess:
    # 'final_result:0' is the output tensor name of a retrained graph;
    # with the shipped graph used earlier in this article it would be 'softmax:0'
    # softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    if image_file.lower().endswith(('.jpg', '.jpeg')):
        # JPEG: feed the raw file bytes
        image_data = tf.gfile.FastGFile(image_file, 'rb').read()
        prediction = sess.run('final_result:0', {'DecodeJpeg/contents:0': image_data})
    elif image_file.lower().endswith('.png'):
        # PNG: decode with PIL and feed the RGB array
        image = Image.open(image_file)
        image_array = np.array(image)[:, :, 0:3]
        prediction = sess.run('final_result:0', {'DecodeJpeg:0': image_array})
    prediction = prediction[0]
    print(prediction)
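or shorter version with direct strings:
image_file = 'test.png' # or 'test.jpeg'
image_data = tf.gfile.FastGFile(image_file, 'rb').read()
ph = tf.placeholder(tf.string, shape=[])
# Note: as written, ph is not yet connected to the imported graph and
# output_layer_name is not defined in this snippet. ph first has to be wired in,
# for example through a decode op and input_map as in the second option above.
with tf.Session() as sess:
    predictions = sess.run(output_layer_name, {ph: image_data} )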
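The first option above slices the array with [:, :, 0:3], which works for RGB and RGBA images but fails for a grayscale image, whose array is two-dimensional. The sketch below is one way to normalize such arrays before feeding 'DecodeJpeg:0'. It assumes 8-bit image data, and the helper name `to_rgb_array` is hypothetical. For palette-based PNGs, converting in PIL with Image.convert('RGB') before calling np.array is likely the safer route.

```python
import numpy as np

def to_rgb_array(image_array):
    """Return an H x W x 3 uint8 array from a grayscale, RGB, or RGBA array."""
    arr = np.asarray(image_array)
    if arr.ndim == 2:
        # Grayscale: repeat the single channel three times.
        arr = np.stack([arr, arr, arr], axis=-1)
    elif arr.ndim == 3 and arr.shape[2] == 1:
        # Grayscale with an explicit channel axis.
        arr = np.repeat(arr, 3, axis=2)
    elif arr.ndim == 3 and arr.shape[2] >= 3:
        # RGB or RGBA: keep the first three channels, dropping alpha.
        arr = arr[:, :, :3]
    else:
        raise ValueError('unsupported image array shape: %r' % (arr.shape,))
    return arr.astype(np.uint8)

# Made-up arrays standing in for decoded images.
rgba = np.zeros((4, 5, 4), dtype=np.uint8)
gray = np.full((4, 5), 200, dtype=np.uint8)
print(to_rgb_array(rgba).shape, to_rgb_array(gray).shape)
```

The result could then be passed in place of image_array in the sess.run call of the first option.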

