Keras Tutorial by Example (3): Object Recognition with VGG-16

Keras is one of today's most popular deep learning frameworks, and it is remarkably easy to use: its friendly, flexible APIs make it far easier for newcomers to pick up than TensorFlow. Better still, Keras ships with a number of popular, pre-trained neural network models:

Model Size Top-1 Accuracy Top-5 Accuracy Parameters Depth
Xception 88 MB 0.790 0.945 22,910,480 126
VGG16 528 MB 0.713 0.901 138,357,544 23
VGG19 549 MB 0.713 0.900 143,667,240 26
ResNet50 99 MB 0.749 0.921 25,636,712 168
InceptionV3 92 MB 0.779 0.937 23,851,784 159
InceptionResNetV2 215 MB 0.803 0.953 55,873,736 572
MobileNet 16 MB 0.704 0.895 4,253,864 88
MobileNetV2 14 MB 0.713 0.901 3,538,984 88
DenseNet121 33 MB 0.750 0.923 8,062,504 121
DenseNet169 57 MB 0.762 0.932 14,307,880 169
DenseNet201 80 MB 0.773 0.936 20,242,984 201
NASNetMobile 23 MB 0.744 0.919 5,326,716 -
NASNetLarge 343 MB 0.825 0.960 88,949,818 -
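The table is handy when choosing a backbone under a download-size or memory budget. As a small illustration, the sketch below encodes a subset of the rows above as plain data and picks the most accurate model that fits a budget; the helper name and the model subset are ours, with the numbers copied from the table:

```python
# A few rows of the table above: (name, size_mb, top1, top5, parameters)
PRETRAINED = [
    ("Xception", 88, 0.790, 0.945, 22_910_480),
    ("VGG16", 528, 0.713, 0.901, 138_357_544),
    ("MobileNetV2", 14, 0.713, 0.901, 3_538_984),
    ("NASNetLarge", 343, 0.825, 0.960, 88_949_818),
]

def best_under_size(models, max_mb):
    """Return the name of the highest top-1-accuracy model within a size budget."""
    candidates = [m for m in models if m[1] <= max_mb]
    return max(candidates, key=lambda m: m[2])[0] if candidates else None

print(best_under_size(PRETRAINED, 100))  # Xception
print(best_under_size(PRETRAINED, 50))   # MobileNetV2
```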

A Brief Look at the VGG Architecture

With these, you can conveniently stand on the shoulders of others to solve your own problems. This article uses VGG16 to demonstrate how to perform object recognition in Keras. VGG16 is a CNN-based deep learning network designed and implemented by a research team at the University of Oxford; its depth as counted in the table above is 23 layers (16 of which carry weights), and its weights total more than 500 MB. The figure below shows its basic structure (see column D):

(figure: VGG network configurations from the original paper; column D is VGG-16)

The following diagram makes the structure clearer:

(figure: VGG-16 layer-by-layer architecture diagram)

Its structure can be summarized briefly:
VGG-16 takes a 224x224x3 input and passes it through two identical convolution layers (3x3 filters, stride 1, 64 filters each), followed by a pooling layer. The same pattern then repeats, with the width and height shrinking while the channel count doubles at each stage up to 512. Finally there are two identical fully connected layers and a softmax output. As a flow diagram:

(figure: VGG-16 flow diagram)

A clearer VGG architecture diagram is available here.
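The shrinking width/height and doubling channel counts described above can be traced with simple arithmetic: each 3x3, stride-1 convolution with "same" padding preserves width and height, and each 2x2 max-pool halves them. This sketch only mirrors the block layout; it does not build the network:

```python
# VGG-16 block layout from the summary above: (number of convs, channels).
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size, shapes = 224, []
for convs, channels in blocks:
    size //= 2                        # the pool after each conv stack halves H and W
    shapes.append((size, size, channels))

print(shapes)                         # last entry is (7, 7, 512)
```

That final 7x7x512 feature map is what gets flattened and fed to the fully connected layers.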

Using VGG-16

The pre-trained VGG16 network can be imported directly with the commands below. Note that because the full set of weights exceeds 500 MB, the first time you run this Keras has to download them from the internet, which can take a while.

from keras.applications.vgg16 import VGG16
model = VGG16()
print(model.summary())

The last line prints the layer structure of the VGG16 network. Beyond that, the VGG16() constructor accepts several arguments that make it easy to customize the network, which is especially useful for transfer learning. Some of them are:

  • include_top (True): whether to include the output layers of the model. You don't need these if you are fitting the model to your own problem.
  • weights ('imagenet'): which weights to load. Specify None to skip the pre-trained weights if you want to train the model from scratch yourself.
  • input_tensor (None): a new input layer, if you intend to fit the model on data of a different size.
  • input_shape (None): the size of the images the model expects, if you change the input layer.
  • pooling (None): the type of pooling to use when you are training a new set of output layers.
  • classes (1000): the number of classes (i.e. the size of the output vector) for the model.

To have VGG-16 output recognition results directly, keep include_top=True so that the output layers are included.
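Conversely, for transfer learning the constructor is typically called with include_top=False. The sketch below shows how the convolutional base alone ends in a 7x7x512 feature map, ready for a classifier head of your own; weights=None is used here only so that nothing is downloaded, and the 224x224x3 input shape is an assumption matching this article:

```python
from keras.applications.vgg16 import VGG16

# Convolutional base only: no fully connected output layers, and no
# pre-trained weights downloaded (weights=None keeps this example offline).
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))

print(base.output_shape)  # (None, 7, 7, 512)
```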

Loading and Preprocessing the Image

Prepare an image to recognize; ours shows a golden retriever (golden_retriever):


(figure: the golden retriever test image)
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing.image import load_img, img_to_array
import numpy as np

image = load_img('C:/Pictures/test_imgs/golden.jpg', target_size=(224, 224))
image_data = img_to_array(image)

# reshape it into the specific format
image_data = image_data.reshape((1,) + image_data.shape)
print(image_data.shape)

# prepare the image data for VGG
image_data = preprocess_input(image_data)

Notes:

  • the input image must be 224x224;
  • it must be reshaped to (samples, height, width, channels), i.e. a batch of samples each of shape (224, 224, 3);
  • finally, preprocess_input() converts the data into the input VGG-16 expects; in practice this subtracts the mean pixel value from each pixel, as the original paper describes:
The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.
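That step can be mirrored in plain NumPy. Keras's VGG implementation additionally reorders channels from RGB to BGR before subtracting the per-channel ImageNet mean; the mean values below are the ones Keras uses, and vgg_preprocess is our own illustrative name, not a Keras function:

```python
import numpy as np

# Per-channel ImageNet mean in BGR order, as used by Keras's VGG preprocessing.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def vgg_preprocess(batch_rgb):
    bgr = batch_rgb[..., ::-1].astype("float64")  # flip RGB -> BGR channels
    return bgr - IMAGENET_MEAN_BGR                # zero-center each channel

x = np.full((1, 224, 224, 3), 128.0)              # a dummy mid-gray "image"
y = vgg_preprocess(x)
print(y.shape)                                    # (1, 224, 224, 3)
```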

Predicting and Decoding the Results

# using the pre-trained model to predict
prediction = model.predict(image_data)

# decode the prediction results
results = decode_predictions(prediction, top=3)

print(results)

We get the three most likely labels:

[[('n02099601', 'golden_retriever', 0.9698627), ('n04409515', 'tennis_ball', 0.008626293), ('n02100877', 'Irish_setter', 0.004562445)]]

This matches expectations: the model is about 97% confident that the image shows a golden retriever.
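decode_predictions returns one list per input image, each entry a (WordNet id, human-readable label, probability) tuple sorted by descending probability, so the top hit can be pulled out directly. The results literal below is copied from the output above:

```python
# Output of decode_predictions(prediction, top=3) for our single image.
results = [[('n02099601', 'golden_retriever', 0.9698627),
            ('n04409515', 'tennis_ball', 0.008626293),
            ('n02100877', 'Irish_setter', 0.004562445)]]

wn_id, label, prob = results[0][0]   # top entry for the first (only) image
print(f"{label}: {prob:.1%}")        # golden_retriever: 97.0%
```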

Complete code:

from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing.image import load_img, img_to_array
import numpy as np
# VGG-16 instance
model = VGG16(weights='imagenet', include_top=True)

image = load_img('C:/Pictures/test_imgs/golden.jpg', target_size=(224, 224))
image_data = img_to_array(image)

# reshape it into the specific format
image_data = image_data.reshape((1,) + image_data.shape)
print(image_data.shape)

# prepare the image data for VGG
image_data = preprocess_input(image_data)

# using the pre-trained model to predict
prediction = model.predict(image_data)

# decode the prediction results
results = decode_predictions(prediction, top=3)

print(results)