Keras is one of the most popular deep learning frameworks today, and it is remarkably easy to use: its friendly, flexible APIs make it far easier for newcomers to pick up than TensorFlow. Even better, Keras ships with a number of popular neural network models that have already been trained:
| Model | Size | Top-1 Accuracy | Top-5 Accuracy | Parameters | Depth |
|---|---|---|---|---|---|
| Xception | 88 MB | 0.790 | 0.945 | 22,910,480 | 126 |
| VGG16 | 528 MB | 0.713 | 0.901 | 138,357,544 | 23 |
| VGG19 | 549 MB | 0.713 | 0.900 | 143,667,240 | 26 |
| ResNet50 | 99 MB | 0.749 | 0.921 | 25,636,712 | 168 |
| InceptionV3 | 92 MB | 0.779 | 0.937 | 23,851,784 | 159 |
| InceptionResNetV2 | 215 MB | 0.803 | 0.953 | 55,873,736 | 572 |
| MobileNet | 16 MB | 0.704 | 0.895 | 4,253,864 | 88 |
| MobileNetV2 | 14 MB | 0.713 | 0.901 | 3,538,984 | 88 |
| DenseNet121 | 33 MB | 0.750 | 0.923 | 8,062,504 | 121 |
| DenseNet169 | 57 MB | 0.762 | 0.932 | 14,307,880 | 169 |
| DenseNet201 | 80 MB | 0.773 | 0.936 | 20,242,984 | 201 |
| NASNetMobile | 23 MB | 0.744 | 0.919 | 5,326,716 | - |
| NASNetLarge | 343 MB | 0.825 | 0.960 | 88,949,818 | - |
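Any model in the table can be instantiated the same way as VGG16 is below. A minimal sketch using MobileNet (weights=None is passed here so nothing is downloaded; pass weights='imagenet' to load the pre-trained parameters):

```python
from keras.applications.mobilenet import MobileNet

# Build the MobileNet architecture without downloading pre-trained weights;
# pass weights='imagenet' instead to fetch the trained parameters.
model = MobileNet(weights=None)
print(model.count_params())  # should be close to the 4,253,864 listed above
```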
A Brief Look at the VGG Architecture
Users can conveniently build on this prior work to solve their own problems. This article uses VGG16 to demonstrate how to perform object recognition in Keras. VGG16 is a CNN-based deep learning network designed and implemented by a research team at the University of Oxford; its depth is 23 layers (16 of which carry weights), and its weights total more than 500 MB. The figure below shows its basic structure (see column D):
The following figure makes this even clearer:
Its structure can be summarized briefly as follows:
VGG-16 takes a 224x224x3 input and passes it through two identical convolutional layers (3x3 filters, stride 1, 64 filters each), followed by a pooling layer. The same pattern then repeats: width and height keep shrinking while the number of channels doubles stage by stage, up to 512. Finally there are two identical fully connected layers plus a softmax output. As a flow chart:
A clearer diagram of the VGG architecture is available here.
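The shape progression described above can also be traced without building the model; a small sketch of how width and height halve while the channel count grows through VGG-16's five convolutional blocks:

```python
# Trace spatial size and channel count through VGG-16's five conv blocks.
h = w = 224
for block, channels in enumerate([64, 128, 256, 512, 512], start=1):
    print('block%d output: %dx%dx%d' % (block, h, w, channels))
    h //= 2
    w //= 2  # each block ends with a 2x2 max-pooling layer
print('flatten %d values -> fc 4096 -> fc 4096 -> softmax 1000' % (h * w * 512))
```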
Using VGG-16
The pre-trained VGG16 network can be imported directly with the commands below. Note that since its parameters total more than 500 MB, Keras has to download them from the internet the first time you run these commands, which may take a while.
```python
from keras.applications.vgg16 import VGG16
model = VGG16()
print(model.summary())
```
The last line prints the layer structure of the VGG16 network. Beyond that, the VGG16() constructor accepts several parameters that make it easy to customize the network, which is especially useful in transfer learning. Some of them:
- include_top (True): Whether or not to include the output layers for the model. You don’t need these if you are fitting the model on your own problem.
- weights ('imagenet'): What weights to load. You can specify None to not load pre-trained weights if you are interested in training the model yourself from scratch.
- input_tensor (None): A new input layer if you intend to fit the model on new data of a different size.
- input_shape (None): The size of images that the model is expected to take if you change the input layer.
- pooling (None): The type of pooling to use when you are training a new set of output layers.
- classes (1000): The number of classes (e.g. size of output vector) for the model.
When you want VGG-16 to output recognition results directly, include_top must be enabled so that the output layers are included.
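Conversely, for transfer learning a common pattern is to drop the output layers with include_top=False, freeze the convolutional base, and attach a new classifier head. A minimal sketch for a hypothetical 10-class problem (weights=None is used here to avoid the download; use weights='imagenet' in practice):

```python
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

# Convolutional base without VGG16's original output layers
base = VGG16(include_top=False, weights=None, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pre-trained features

# New classifier head for a hypothetical 10-class task
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy')
print(model.output_shape)
```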
Loading and Preprocessing the Image
Prepare an image to be recognized; ours shows a golden retriever (golden_retriever):
```python
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing.image import load_img, img_to_array
import numpy as np

image = load_img('C:/Pictures/test_imgs/golden.jpg', target_size=(224, 224))
image_data = img_to_array(image)
# reshape it into the specific format
image_data = image_data.reshape((1,) + image_data.shape)
print(image_data.shape)
# prepare the image data for VGG
image_data = preprocess_input(image_data)
```
Points to note:
- the input image dimensions must be 224x224;
- the array must be reshaped to (samples, dims), i.e. a batch of samples each of shape (224, 224, 3);
- finally, preprocess_input() converts the data into the input VGG-16 expects; in practice this subtracts the mean value from each pixel (as the original paper describes):
The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.
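What preprocess_input does for VGG can be sketched by hand: it reorders channels from RGB to BGR and subtracts the per-channel ImageNet means (the BGR mean values below are the ones used by the Keras 'caffe' preprocessing convention; this is an illustration, not the library's own code):

```python
import numpy as np

def vgg_preprocess(batch):
    # Reorder channels RGB -> BGR, then subtract the training-set means.
    batch = batch[..., ::-1].astype('float64')
    mean = np.array([103.939, 116.779, 123.68])  # BGR channel means
    return batch - mean

x = np.full((1, 224, 224, 3), 128.0)
print(vgg_preprocess(x)[0, 0, 0])  # each channel shifted by its mean
```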
Prediction and Decoding
```python
# using the pre-trained model to predict
prediction = model.predict(image_data)
# decode the prediction results
results = decode_predictions(prediction, top=3)
print(results)
```
This gives us the three most likely recognition results:
```
[[('n02099601', 'golden_retriever', 0.9698627), ('n04409515', 'tennis_ball', 0.008626293), ('n02100877', 'Irish_setter', 0.004562445)]]
```
This matches the image: the model is about 97% confident that it is a golden retriever.
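decode_predictions returns, for each input sample, a list of (class_id, label, probability) tuples, so output like the above can be turned into a readable report, for example:

```python
# Format decode_predictions-style output: one list per sample of
# (WordNet id, human-readable label, probability) tuples.
results = [[('n02099601', 'golden_retriever', 0.9698627),
            ('n04409515', 'tennis_ball', 0.008626293),
            ('n02100877', 'Irish_setter', 0.004562445)]]
for class_id, label, prob in results[0]:
    print('%s (%s): %.2f%%' % (label, class_id, prob * 100))
```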
The complete code:
```python
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing.image import load_img, img_to_array
import numpy as np

# VGG-16 instance
model = VGG16(weights='imagenet', include_top=True)

image = load_img('C:/Pictures/Pictures/test_imgs/golden.jpg', target_size=(224, 224))
image_data = img_to_array(image)
# reshape it into the specific format
image_data = image_data.reshape((1,) + image_data.shape)
print(image_data.shape)
# prepare the image data for VGG
image_data = preprocess_input(image_data)

# using the pre-trained model to predict
prediction = model.predict(image_data)
# decode the prediction results
results = decode_predictions(prediction, top=3)
print(results)
```