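Deep Learning Notes: 3D Image Classification and 3D Convolutional Neural Networks

Introduction

The MNIST handwritten-digit dataset is the "Hello world" of machine learning, and most people meet it when they first study the field. Recently, to explore how deep learning applies to spatio-temporal sequence data, I wanted to understand 3D convolutional neural networks. While getting started I came across the 3D MNIST dataset, and I worked through sample code published by researchers abroad to understand the basic structure of a 3D CNN.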

Dataset: 3D MNIST

2D vs 3D MNIST

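The 3D MNIST dataset is available on Kaggle under the name "3D MNIST".
The data is stored in .h5 (HDF5) format and is split into the following arrays:

X_train (10000, 4096)
y_train (10000,)
X_test (2000, 4096)
y_test (2000,)

The training set contains 10,000 images and the test set 2,000. Each image is flattened into a 4096-dimensional vector (16 long × 16 wide × 16 high = 4096).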
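As a minimal sketch of that flattening, the snippet below uses a synthetic array (not the real dataset) to show how a 4096-long vector maps back to a 16 × 16 × 16 volume. It assumes NumPy's default row-major ordering; which axis corresponds to which physical direction in the actual file is not stated here.

```python
import numpy as np

# Synthetic stand-in for two flattened samples (values are arbitrary).
flat = np.arange(2 * 4096, dtype=np.float32).reshape(2, 4096)

# Restore the cubic grid: one 16 x 16 x 16 volume per sample.
volumes = flat.reshape(-1, 16, 16, 16)

# With row-major order, voxel (i, j, k) sits at flat index i*256 + j*16 + k.
i, j, k = 3, 5, 7
flat_index = i * 16 * 16 + j * 16 + k
same_voxel = bool(volumes[0, i, j, k] == flat[0, flat_index])
print(volumes.shape, same_voxel)
```

The same reshape call is what turns each row of X_train into a volume later in this post.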

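Sample code for reading the dataset:

import h5py

# full_dataset_vectors.h5 is the file that holds the flattened X_*/y_* arrays
with h5py.File("../input/full_dataset_vectors.h5", "r") as hf:
    X_train = hf["X_train"][:]
    y_train = hf["y_train"][:]
    X_test = hf["X_test"][:]
    y_test = hf["y_test"][:]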

Since the dataset is three-dimensional, will a 3D CNN outperform a 2D CNN at recognizing which digit each image shows? Let's run an experiment.

2D Convolutional Neural Network

This experiment uses the Keras framework. First, import the required modules.

from __future__ import division, print_function, absolute_import

from keras.models import Sequential, model_from_json
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical
from keras.callbacks import ReduceLROnPlateau, TensorBoard

import h5py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')

from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split

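Set the hyperparameters

# Set up hyperparameters
batch_size = 64
epochs = 20
# TensorBoard callback, referenced later by train()
tensorboard = TensorBoard(batch_size=batch_size)

Read the dataset from local disk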

with h5py.File("/Users/apple/pydata/3d_mnist/full_dataset_vectors.h5","r") as h5:
    X_train, y_train = h5["X_train"][:], h5["y_train"][:]
    X_test, y_test = h5["X_test"][:], h5["y_test"][:]

Convert the training labels to one-hot arrays (the validation labels are split off from them afterwards; the test labels stay as integers)

y_train = to_categorical(y_train, num_classes=10)
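For readers unfamiliar with one-hot encoding, here is a NumPy-only sketch of what the conversion produces. The helper name one_hot is hypothetical and is not part of Keras; it is only meant to mirror the result of to_categorical for integer labels between 0 and 9.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Hypothetical helper: turn integer labels into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Set a single 1.0 per row, in the column given by the label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

demo = one_hot([3, 0, 9])
print(demo.shape)           # (3, 10)
print(demo.argmax(axis=1))  # [3 0 9]
```

Taking argmax along axis 1 recovers the integer label, which is how the predictions are compared with y_test during evaluation.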

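This time we use a 2D CNN, which expects a 3D array per sample (height, width, channels). The 16 slices along the last axis therefore act as the channels, and no RGB color channel is added.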

X_train = X_train.reshape(-1, 16, 16, 16)
X_test = X_test.reshape(-1, 16, 16, 16)
X_train,X_val,y_train,y_val = train_test_split(X_train, y_train,
                                              test_size=0.25,
                                              random_state=42)
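The sizes implied by this split can be checked with a little arithmetic. The sketch below assumes the 10,000 training samples listed earlier and the batch_size of 64 set above; the variable names are illustrative only.

```python
n_samples = 10000   # rows in X_train before the split
test_size = 0.25    # fraction held out for validation
batch_size = 64

n_val = int(n_samples * test_size)        # samples used for validation
n_train = n_samples - n_val               # samples left for training
steps_per_epoch = n_train // batch_size   # full batches per epoch

print(n_train, n_val, steps_per_epoch)    # 7500 2500 117
```

So the 2D model trains on 7,500 samples and validates on 2,500, with 117 generator steps per epoch.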

Define the 2D convolution layer

# Conv2D layer
def Conv(filters=16, kernel_size=(3,3), activation='relu', input_shape=None):
    # input_shape is passed only for the first layer of the model
    if input_shape:
        return Conv2D(filters=filters, kernel_size=kernel_size, padding='same',
                      activation=activation, input_shape=input_shape)
    else:
        return Conv2D(filters=filters, kernel_size=kernel_size, padding='same',
                      activation=activation)

Define the model architecture

# Define model
def CNN(input_dim, num_classes):
    model = Sequential()
    
    model.add((Conv(8, (3,3), input_shape=input_dim)))
    model.add((Conv(16,(3,3))))
    # model.add(BatchNormalization())
    model.add(MaxPool2D(pool_size=(2,2)))
    model.add(Dropout(0.25))
    
    model.add(Conv(32,(3,3)))
    model.add(Conv(64, (3,3)))
    model.add(BatchNormalization())
    model.add(MaxPool2D())
    model.add(Dropout(0.25))
    
    model.add(Flatten())
    
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))
    
    model.add(Dense(num_classes, activation='softmax'))
    
    return model
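To see how large this model gets, the sketch below works out the feature-map size and a few parameter counts by hand. It assumes 'same' padding (spatial size unchanged by each convolution) and the default 2 × 2 pooling; the helper conv_params is illustrative, not a Keras function.

```python
def conv_params(kernel_elems, in_channels, filters):
    # One weight per kernel element, input channel and filter, plus one bias per filter.
    return kernel_elems * in_channels * filters + filters

side, channels = 16, 16               # 16 x 16 input with the 16 slices as channels
params = []
for block in [(8, 16), (32, 64)]:     # filters of the two convolutions in each block
    for filters in block:
        params.append(conv_params(3 * 3, channels, filters))
        channels = filters            # 'same' padding keeps side unchanged
    side //= 2                        # each MaxPool2D halves height and width

flat_features = side * side * channels        # size of the Flatten output
dense1_params = flat_features * 4096 + 4096   # first Dense layer
print(params, flat_features, dense1_params)
# [1160, 1168, 4640, 18496] 1024 4198400
```

Most of the weights sit in the dense layers rather than in the convolutions.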

Define the training, evaluation, model-saving, and model-loading functions

# Train Model

def train(optimizer, scheduler, gen):
    global model

    print("Training...Please wait")
    # Note: the optimizer argument is not used here; the model is compiled with 'adam'
    model.compile(optimizer='adam', loss = "categorical_crossentropy", metrics=["accuracy"])

    model.fit_generator(gen.flow(X_train, y_train, batch_size=batch_size),
                    epochs=epochs, validation_data=(X_val, y_val),
                    verbose=2, steps_per_epoch=X_train.shape[0]//batch_size,
                    callbacks=[scheduler, tensorboard])

def evaluate():
    global model
    
    pred = model.predict(X_test)
    pred = np.argmax(pred, axis=1)
    
    # accuracy_score expects (y_true, y_pred)
    print(accuracy_score(y_test, pred))
    
    # Heat map
    
    array = confusion_matrix(y_test, pred)
    cm = pd.DataFrame(array, index = range(10), columns = range(10))
    plt.figure(figsize=(20,20))
    sns.heatmap(cm, annot=True)
    plt.show()

def save_model():
    global model
    
    model_json = model.to_json()
    with open('/Users/apple/pydata/3d_mnist/model/model_2D.json','w') as f:
        f.write(model_json)
        
    model.save_weights('/Users/apple/pydata/3d_mnist/model/model_2D.h5')
    
    print("Model Saved")

def load_model():
    # Read the architecture from JSON, then restore the weights
    with open("/Users/apple/pydata/3d_mnist/model/model_2D.json", "r") as f:
        model_json = f.read()

    loaded_model = model_from_json(model_json)
    loaded_model.load_weights('/Users/apple/pydata/3d_mnist/model/model_2D.h5')

    print("Model Loaded.")

    return loaded_model

if __name__ == '__main__':

    optimizer = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
    scheduler = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.5, min_lr=1e-5)

    model = CNN((16,16,16), 10)

    gen = ImageDataGenerator(rotation_range=10, zoom_range = 0.1, width_shift_range=0.1, height_shift_range=0.1)
    gen.fit(X_train)

    train(optimizer, scheduler, gen)
    evaluate()
    save_model()

2D CNN result: best validation accuracy 68.56%, test accuracy 68.45%

Training...Please wait
Epoch 1/20
 - 40s - loss: 2.2051 - acc: 0.2574 - val_loss: 1.4624 - val_acc: 0.4936
Epoch 2/20
 - 42s - loss: 1.4804 - acc: 0.4842 - val_loss: 1.2500 - val_acc: 0.5528
Epoch 3/20
 - 33s - loss: 1.3187 - acc: 0.5341 - val_loss: 1.2400 - val_acc: 0.5648
Epoch 4/20
 - 31s - loss: 1.2488 - acc: 0.5604 - val_loss: 1.0896 - val_acc: 0.6132
Epoch 5/20
 - 31s - loss: 1.2123 - acc: 0.5740 - val_loss: 1.1378 - val_acc: 0.5868
Epoch 6/20
 - 31s - loss: 1.1782 - acc: 0.5833 - val_loss: 1.0483 - val_acc: 0.6284
Epoch 7/20
 - 31s - loss: 1.1431 - acc: 0.5967 - val_loss: 1.0335 - val_acc: 0.6328
Epoch 8/20
 - 31s - loss: 1.1129 - acc: 0.6054 - val_loss: 1.0082 - val_acc: 0.6412
Epoch 9/20
 - 30s - loss: 1.1071 - acc: 0.6059 - val_loss: 1.0608 - val_acc: 0.6224
Epoch 10/20
 - 31s - loss: 1.0878 - acc: 0.6127 - val_loss: 0.9602 - val_acc: 0.6580
Epoch 11/20
 - 31s - loss: 1.0756 - acc: 0.6169 - val_loss: 1.0182 - val_acc: 0.6424
Epoch 12/20
 - 31s - loss: 1.0649 - acc: 0.6221 - val_loss: 0.9905 - val_acc: 0.6560
Epoch 13/20
 - 30s - loss: 1.0508 - acc: 0.6321 - val_loss: 0.9642 - val_acc: 0.6628
Epoch 14/20
 - 32s - loss: 1.0567 - acc: 0.6289 - val_loss: 0.9452 - val_acc: 0.6696
Epoch 15/20
 - 35s - loss: 1.0271 - acc: 0.6346 - val_loss: 0.9287 - val_acc: 0.6748
Epoch 16/20
 - 36s - loss: 1.0169 - acc: 0.6386 - val_loss: 0.9542 - val_acc: 0.6668
Epoch 17/20
 - 38s - loss: 0.9975 - acc: 0.6456 - val_loss: 0.9509 - val_acc: 0.6656
Epoch 18/20
 - 35s - loss: 1.0139 - acc: 0.6456 - val_loss: 0.9452 - val_acc: 0.6716

Epoch 00018: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 19/20
 - 36s - loss: 0.9616 - acc: 0.6586 - val_loss: 0.9114 - val_acc: 0.6856
Epoch 20/20
 - 31s - loss: 0.9359 - acc: 0.6652 - val_loss: 0.9137 - val_acc: 0.6832
0.6845
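The learning-rate drop at epoch 18 in the log above follows from the patience=3 plateau rule. The sketch below is a simplified re-implementation for illustration, not the Keras source: it assumes a starting rate of 0.001, a minimum improvement of 1e-4, and no cooldown, and it replays the val_acc values copied from the log.

```python
def simulate_plateau(val_acc, lr=0.001, factor=0.5, patience=3,
                     min_lr=1e-5, min_delta=1e-4):
    """Simplified plateau rule; returns a list of (epoch, new_lr)."""
    best = float("-inf")
    wait = 0
    reductions = []
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best + min_delta:
            best, wait = acc, 0          # improvement: reset the counter
        else:
            wait += 1                    # another epoch without improvement
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                reductions.append((epoch, lr))
                wait = 0
    return reductions

val_acc_2d = [0.4936, 0.5528, 0.5648, 0.6132, 0.5868, 0.6284, 0.6328,
              0.6412, 0.6224, 0.6580, 0.6424, 0.6560, 0.6628, 0.6696,
              0.6748, 0.6668, 0.6656, 0.6716, 0.6856, 0.6832]
reductions = simulate_plateau(val_acc_2d)
print(reductions)   # [(18, 0.0005)]
```

Epochs 16 to 18 all fall short of the best value reached at epoch 15 (0.6748), so the rate is halved once, at epoch 18.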

Confusion Matrix

[Figure: confusion matrix of the 2D CNN on the test set]
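For reference, a confusion matrix simply counts how often each true label is predicted as each class. The sketch below builds one with NumPy on a tiny made-up example; confusion_counts is a hypothetical helper, and it follows the same convention as sklearn's confusion_matrix (rows are true labels, columns are predictions).

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes=10):
    """Hypothetical helper: rows are true labels, columns are predictions."""
    counts = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 at every (true, predicted) pair, including repeated pairs.
    np.add.at(counts, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return counts

y_true = [0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 2, 2, 0]
cm = confusion_counts(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal
print(cm)
```

Off-diagonal cells show which digits are mistaken for which, which the single accuracy figure hides.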

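3D Convolutional Neural Network in Keras

Compared with ordinary 2D convolution, far less material is available on 3D convolution. The figure below illustrates a 3D convolution: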


3D CNN

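A 3D convolution applies a three-dimensional filter that moves along three directions (x, y, z) to compute low-level feature representations; its output is a 3D volume. It is very useful for event detection in video, 3D medical imaging, and similar tasks. Its use is not limited to 3D space: it can also be applied to 2D inputs such as images.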
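To make the sliding-filter idea concrete, here is a deliberately naive NumPy sketch for a single channel with no padding. It is an illustration only, not how Keras implements Conv3D; like most deep learning layers, it computes a cross-correlation (the kernel is not flipped). The function name conv3d_valid is made up for this example.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation with no padding."""
    d, h, w = volume.shape
    kd, kh, kw = kernel.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Multiply the kernel with the patch under it and sum up.
                patch = volume[z:z + kd, y:y + kh, x:x + kw]
                out[z, y, x] = np.sum(patch * kernel)
    return out

volume = np.ones((16, 16, 16))
kernel = np.ones((3, 3, 3))
result = conv3d_valid(volume, kernel)
print(result.shape)      # (14, 14, 14)
print(result[0, 0, 0])   # 27.0
```

Without padding, a 3 × 3 × 3 kernel shrinks each side from 16 to 14; with 'same' padding, as used in the model below, the output keeps the 16 × 16 × 16 size.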

The implementation follows.

First, import the required modules


from __future__ import division, print_function, absolute_import

from keras.models import Sequential, model_from_json
from keras.layers import Dense, Dropout, Flatten, Conv3D, MaxPool3D, BatchNormalization, Input
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical
from keras.callbacks import ReduceLROnPlateau, TensorBoard

import h5py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')

from sklearn.metrics import confusion_matrix, accuracy_score
# Hyper Parameter
batch_size = 86
epochs = 20
# Set up TensorBoard
tensorboard = TensorBoard(batch_size=batch_size)

Read the data

with h5py.File("/Users/apple/pydata/3d_mnist/full_dataset_vectors.h5", 'r') as h5:
    X_train, y_train = h5["X_train"][:], h5["y_train"][:]
    X_test, y_test = h5["X_test"][:], h5["y_test"][:]

Add an RGB channel dimension to the images (using the first function in the plot3D.py file provided on the Kaggle data page)

# Translate data to color
def array_to_color(array, cmap="Oranges"):
    s_m = plt.cm.ScalarMappable(cmap=cmap)
    return s_m.to_rgba(array)[:,:-1]

def translate(x):
    xx = np.ndarray((x.shape[0], 4096, 3))
    for i in range(x.shape[0]):
        xx[i] = array_to_color(x[i])
        if i % 1000 == 0:
            print(i)
    # Drop the local reference only; the caller's array is not freed by this
    del x

    return xx

Convert the labels to one-hot vectors and reshape the data

y_train = to_categorical(y_train, num_classes=10)
# y_test = to_categorical(y_test, num_classes=10)

X_train = translate(X_train).reshape(-1, 16, 16, 16, 3)
X_test  = translate(X_test).reshape(-1, 16, 16, 16, 3)

Define the model structure

# Conv3D layer
def Conv(filters=16, kernel_size=(3,3,3), activation='relu', input_shape=None):
    # input_shape is passed only for the first layer of the model
    if input_shape:
        return Conv3D(filters=filters, kernel_size=kernel_size, padding='same',
                      activation=activation, input_shape=input_shape)
    else:
        return Conv3D(filters=filters, kernel_size=kernel_size, padding='same',
                      activation=activation)

# Define Model
def CNN(input_dim, num_classes):
    model = Sequential()

    model.add(Conv(8, (3,3,3), input_shape=input_dim))
    model.add(Conv(16, (3,3,3)))
    # model.add(BatchNormalization())
    model.add(MaxPool3D())
    # model.add(Dropout(0.25))

    model.add(Conv(32, (3,3,3)))
    model.add(Conv(64, (3,3,3)))
    model.add(BatchNormalization())
    model.add(MaxPool3D())
    model.add(Dropout(0.25))

    model.add(Flatten())

    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))

    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))

    model.add(Dense(num_classes, activation='softmax'))

    return model
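The 3D model is noticeably heavier than its 2D counterpart, which the following hand calculation makes visible. It assumes an input of shape (16, 16, 16, 3), 'same' padding, and the default 2 × 2 × 2 pooling; conv_params is an illustrative helper, not a Keras function.

```python
def conv_params(kernel_elems, in_channels, filters):
    # One weight per kernel element, input channel and filter, plus one bias per filter.
    return kernel_elems * in_channels * filters + filters

side, channels = 16, 3                # 16 x 16 x 16 volume with 3 color channels
params = []
for block in [(8, 16), (32, 64)]:     # filters of the two convolutions in each block
    for filters in block:
        params.append(conv_params(3 * 3 * 3, channels, filters))
        channels = filters            # 'same' padding keeps side unchanged
    side //= 2                        # each MaxPool3D halves all three axes

flat_features = side ** 3 * channels          # size of the Flatten output
dense1_params = flat_features * 4096 + 4096   # first Dense layer
print(params, flat_features, dense1_params)
# [656, 3472, 13856, 55360] 4096 16781312
```

The Flatten output has 4096 features, so the first dense layer alone holds about 16.8 million weights.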

Define the training, evaluation, model-saving, and model-loading functions

# Train Model
def train(optimizer, scheduler):
    global model

    print("Training...")
    # Note: the optimizer argument is not used here; the model is compiled with 'adam'
    model.compile(optimizer = 'adam' , loss = "categorical_crossentropy", metrics=["accuracy"])

    model.fit(x=X_train, y=y_train, batch_size=batch_size, epochs=epochs, validation_split=0.15,
                    verbose=2, callbacks=[scheduler, tensorboard])

def evaluate():
    global model

    pred = model.predict(X_test)
    pred = np.argmax(pred, axis=1)

    # accuracy_score expects (y_true, y_pred)
    print(accuracy_score(y_test, pred))
    # Heat Map
    array = confusion_matrix(y_test, pred)
    cm = pd.DataFrame(array, index = range(10), columns = range(10))
    plt.figure(figsize=(20,20))
    sns.heatmap(cm, annot=True)
    plt.show()

def save_model():
    global model

    model_json = model.to_json()
    with open('/Users/apple/pydata/3d_mnist/model/model_3D.json', 'w') as f:
        f.write(model_json)

    model.save_weights('/Users/apple/pydata/3d_mnist/model/model_3D.h5')

    print('Model Saved.')

def load_model():
    # Read the architecture from the same location that save_model() writes to
    with open('/Users/apple/pydata/3d_mnist/model/model_3D.json', 'r') as f:
        model_json = f.read()

    loaded_model = model_from_json(model_json)
    loaded_model.load_weights('/Users/apple/pydata/3d_mnist/model/model_3D.h5')

    print("Model Loaded.")
    return loaded_model

if __name__ == '__main__':

    optimizer = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
    scheduler = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.5, min_lr=1e-5)

    model = CNN((16,16,16,3), 10)

    train(optimizer, scheduler)
    evaluate()
    save_model()

3D CNN result: best validation accuracy 75.07%, test accuracy 75.3%

Training...
Train on 8500 samples, validate on 1500 samples
Epoch 1/20
 - 696s - loss: 3.1408 - acc: 0.1760 - val_loss: 7.5856 - val_acc: 0.1973
Epoch 2/20
 - 703s - loss: 1.6178 - acc: 0.4213 - val_loss: 7.9127 - val_acc: 0.2127
Epoch 3/20
 - 798s - loss: 1.2917 - acc: 0.5452 - val_loss: 6.1975 - val_acc: 0.2987
Epoch 4/20
 - 757s - loss: 1.1254 - acc: 0.6035 - val_loss: 1.0294 - val_acc: 0.6527
Epoch 5/20
 - 691s - loss: 1.0346 - acc: 0.6421 - val_loss: 1.0982 - val_acc: 0.6247
Epoch 6/20
 - 707s - loss: 0.9758 - acc: 0.6581 - val_loss: 0.9593 - val_acc: 0.6673
Epoch 7/20
 - 791s - loss: 0.9062 - acc: 0.6854 - val_loss: 0.9851 - val_acc: 0.6520
Epoch 8/20
 - 776s - loss: 0.8520 - acc: 0.7064 - val_loss: 1.1886 - val_acc: 0.6320
Epoch 9/20
 - 771s - loss: 0.7860 - acc: 0.7273 - val_loss: 3.0187 - val_acc: 0.5213

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 10/20
 - 767s - loss: 0.6525 - acc: 0.7728 - val_loss: 1.0288 - val_acc: 0.6793
Epoch 11/20
 - 728s - loss: 0.5816 - acc: 0.7995 - val_loss: 1.0606 - val_acc: 0.6760
Epoch 12/20
 - 688s - loss: 0.5443 - acc: 0.8114 - val_loss: 0.8698 - val_acc: 0.7247
Epoch 13/20
 - 696s - loss: 0.4823 - acc: 0.8326 - val_loss: 0.9301 - val_acc: 0.7007
Epoch 14/20
 - 740s - loss: 0.4209 - acc: 0.8561 - val_loss: 0.9847 - val_acc: 0.7100
Epoch 15/20
 - 730s - loss: 0.3656 - acc: 0.8746 - val_loss: 0.9250 - val_acc: 0.7260
Epoch 16/20
 - 804s - loss: 0.3150 - acc: 0.8928 - val_loss: 0.9000 - val_acc: 0.7387
Epoch 17/20
 - 759s - loss: 0.2949 - acc: 0.8999 - val_loss: 0.8230 - val_acc: 0.7387
Epoch 18/20
 - 778s - loss: 0.2401 - acc: 0.9180 - val_loss: 0.9853 - val_acc: 0.7460
Epoch 19/20
 - 759s - loss: 0.1829 - acc: 0.9365 - val_loss: 1.0410 - val_acc: 0.7493
Epoch 20/20
 - 695s - loss: 0.1827 - acc: 0.9392 - val_loss: 0.9528 - val_acc: 0.7507
0.753

Confusion Matrix

[Figure: confusion matrix of the 3D CNN on the test set]

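Discussion

  • Conclusion: judging from the results reproduced on my own machine, the 3D CNN is clearly more accurate than the 2D CNN on the 3D MNIST dataset. Test accuracy rises from 68.45% to 75.3%, a gain of roughly 7 percentage points.
  • Limitations: I only reused open-source code and changed batch_size and epochs, and the recognition accuracy is still not high enough.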

To-do

  • Tune the hyperparameters and modify the model structure to try to raise accuracy
    • More layers and a deeper architecture
    • Learning rate, other gradient-descent optimizers, different batch sizes (batch_size), and so on
  • Try 3D CNNs on other 3D datasets

References

3D-MNIST Image Classification
3D Convolutions : Understanding and Implementation
