Real 3D / Volumetric CNN for medical image classification

Author: Zongwei Zhou | 周縱葦
Weibo: @MrGiovanni
Email: zongweiz@asu.edu
Original link: http://zongwei.leanote.com/post/3D


Reviews

[1] Automatic Detection of Cerebral Microbleeds From MR Images via 3D Convolutional Neural Networks. paper

  • Application: Cerebral microbleeds (CMB) detection.
  • Dataset: SWI-CMB
  • Preprocessing: normalized the volume intensities to the range of [0,1].
  • Evaluation: sensitivity (S), precision (P) and the average number of false positives per subject ($FP_{avg}$).
  • System Implementation: implemented with the Theano library, running on an NVIDIA GeForce GTX TITAN Z GPU.
  • Method

1. Screening strategy > conventional sliding-window strategy. This is essentially a 3D fully convolutional network: it takes a 3D volume as input and outputs a 3D score map, from which an initial set of candidate coordinates (regions of interest, ROIs) is obtained. Many false positives are included, but this is still far more efficient than exhaustive scanning. One puzzle: judging from TABLE 1, the architecture does not look like a fully convolutional network at all; it looks more like an ordinary classification network, so it is unclear how the authors obtain the score map.

THE ARCHITECTURE OF 3D FCN SCREENING MODEL

2. Discrimination stage removes a large number of false-positive candidates. This is essentially a 3D CNN that classifies 3D patches. ReLU is used in the convolutional and fully connected layers.
3D CNN architecture details: the 3D convolution kernels are randomly initialized from a Gaussian distribution (learning from scratch), the optimizer is SGD, and the loss function is cross-entropy. Dropout is also used. lr=0.03, momentum=0.9, dropout rate=0.3, batch size=100.

512 $\times$ 512 $\times$ 150 image $\longrightarrow$ 3D FCN $\longrightarrow$ 512 $\times$ 512 $\times$ 150 score map $\longrightarrow$ threshold ($\mathcal{T}$ = 0.64) $\longrightarrow$ 20 $\times$ 20 $\times$ 16 patch $\longrightarrow$ 3D CNN $\longrightarrow$ labeled.
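The two-stage pipeline above can be sketched with a toy thresholding-and-cropping routine. The function names and the toy volume below are my own; only the threshold $\mathcal{T}$ = 0.64 and the 20 $\times$ 20 $\times$ 16 patch size come from the paper:

```python
import numpy as np

def screen_candidates(score_map, threshold=0.64):
    """Stage 1: threshold the 3D FCN score map to get candidate centers.

    threshold=0.64 is the value T reported in the paper; score_map is
    any 3D array of per-voxel probabilities.
    """
    return np.argwhere(score_map > threshold)

def extract_patch(volume, center, size=(20, 20, 16)):
    """Stage 2 input: cut a patch of `size` around `center` for the 3D CNN,
    clamping the window so it stays fully inside the volume."""
    starts = [min(max(c - s // 2, 0), dim - s)
              for c, s, dim in zip(center, size, volume.shape)]
    z, y, x = starts
    sz, sy, sx = size
    return volume[z:z+sz, y:y+sy, x:x+sx]

# Toy example: a 64^3 "score map" with one high-scoring voxel.
score_map = np.zeros((64, 64, 64))
score_map[30, 40, 20] = 0.9
centers = screen_candidates(score_map)
patch = extract_patch(score_map, centers[0])
print(centers.tolist(), patch.shape)   # [[30, 40, 20]] (20, 20, 16)
```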

  • Results and Conclusions:

1. The 3D FCN screening outperforms the two baseline methods, Barnes et al. and Chen et al.

COMPARISON OF DIFFERENT SCREENING METHODS

2. Good detection performance

EVALUATION OF DETECTION RESULTS

FROC COMPARISON

The baselines compared against are Barnes et al., a random forest, and a 2D-CNN-SVM.

3. The intermediate feature representations are more discriminative.

FEATURE REPRESENTATION

This comparison is quite novel; the tool used is the t-SNE toolbox.

[2] Multi-level Contextual 3D CNNs for False Positive Reduction in Pulmonary Nodule Detection. paper

  • Application: reduce false positive for pulmonary nodule detection in volumetric CT scans.
  • Dataset: LUNA16 challenge, held in conjunction with ISBI 2016.
  • Preprocessing: 1) Data augmentation - each sample was translated by 1 voxel along each axis and rotated by 90, 180, and 270 degrees within the transverse plane; in total, 0.65 million samples were generated for training, to match the larger parameter scale of 3D CNNs. 2) Normalization - the intensities were clipped to the interval (-1000, 400) HU and normalized to the range (0, 1).
  • 3D CNN architecture details: learning from scratch, lr=0.3, decayed by 5% every 5000 iterations; batch size=200, momentum=0.9, and dropout (rate=0.2) is used in the convolutional and fully connected layers.
  • Evaluation: FROC, Sensitivity
  • System Implementation: implemented with the Theano library, running on an NVIDIA GeForce GTX TITAN Z GPU.
  • Method

1. Multi-level contextual receptive field.

FUSION OF THREE 3D CNNs

In essence, the method fuses the predictions of three different 3D CNNs, each trained on input patches of a different size - in other words, a "multi-scale" CNN. The theoretical advantage is that it exploits both local detail features and global context. We had considered this idea ourselves, and many researchers have applied it in 2D. A multi-scale method has to define what the "scales" are, so the authors performed a statistical analysis of the dataset, shown below.
DISTRIBUTION ANALYSIS OF THE SIZES OF PULMONARY NODULES FOR DETERMINING RECEPTIVE FIELDS.

This way of choosing the scales feels rather primitive and of limited use in practice: it requires a statistical analysis of the dataset, and it is uncertain whether the chosen samples are statistically representative or whether the scales still apply when new data arrives. The authors specify sizes in voxels; my first suggestion would be to switch to an absolute scale (mm).

2. Multi-model fusion
Next, consider how the three 3D networks are fused; their architectures are listed in the table below.

THE ARCHITECTURE OF DIFFERENT RECEPTIVE FIELD 3D CNN

Fuse the softmax regression outputs (probabilities) from all networks. The fused posterior probability $P_{fusion}$ is estimated by a weighted linear combination:
$$P_{fusion}=\sum_{i\in\{1,2,3\}}\gamma_i\cdot P_i$$
The constant weights $\gamma_i$ were determined by grid search on a small subset of the training data ($\gamma_1=0.3$, $\gamma_2=0.4$, $\gamma_3=0.3$).
This is not a fusion inside the network; it merely combines the output probabilities, which is fusion only on the surface. There are deeper ways to fuse, such as concatenating the fully connected layers of the three CNNs, the idea being to place the back-propagation mechanism inside the fusion itself - that is the kind of fusion I find more convincing.
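The weighted linear combination itself is a few lines of numpy. The function name and toy probabilities below are mine; the weights are the grid-searched values reported in the paper:

```python
import numpy as np

# Weighted linear fusion of the softmax outputs of the three 3D CNNs,
# using the grid-searched weights from the paper.
GAMMA = (0.3, 0.4, 0.3)   # gamma_1, gamma_2, gamma_3

def fuse(probs, gamma=GAMMA):
    """probs: per-network candidate probabilities, shape (3, n_candidates)."""
    probs = np.asarray(probs, dtype=float)
    return np.einsum('i,ij->j', np.asarray(gamma), probs)

# Three networks scoring two candidates:
p = [[0.9, 0.2],   # small receptive field
     [0.8, 0.1],   # medium
     [0.7, 0.3]]   # large
fused = fuse(p)    # 0.3*0.9 + 0.4*0.8 + 0.3*0.7 = 0.80 for candidate 1
print(fused)
```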

  • Evaluation Metrics

I find this part particularly worth referencing:
The challenge evaluated detection results by measuring the detection sensitivity and average false positive rate per scan. A predicted candidate location was counted as a true positive if it was located within the radius of a true nodule center. (The definition of a true positive is crucial for plotting the FROC.) Detections of irrelevant findings were ignored (i.e., considered as neither false positives nor true positives) in the evaluation. The challenge organizers performed the free-response receiver operating characteristic (FROC) analysis by setting different thresholds on the raw prediction probabilities submitted by the participating teams. The evaluation also computed the 95% confidence interval using bootstrapping [36]. A competition performance metric (CPM) score [37], calculated as the average sensitivity at seven predefined false positive rates (1/8, 1/4, 1/2, 1, 2, 4 and 8 false positives per scan), was produced for each algorithm. Ten-fold cross-validation on the dataset was specified.

  • Results and Conclusions:

1. 3D > 2D

3D vs 2D CNN detection

2. Multi-level fusion > single level

FROC ANALYSIS FOR DIFFERENT LEVEL

At the end of the paper the authors show a visualization of the 3D convolution kernels; I am not sure what purpose it serves or what conclusion it supports.

[3] 3D Deeply Supervised Network for Automatic Liver Segmentation from CT Volumes. paper

My impression of this paper is that it is a 3D version of HED (paper), or equivalently a 3D fully convolutional network (paper). Compare their architectures:

3D DSN

HED

FCN

All of them combine the output maps of intermediate layers to make the final segmentation prediction. My questions about this structure at the time were: how is back-propagation designed, how are the intermediate layers combined, and are the combination weights also learned through back-propagation?

  • Application: Liver segmentation.
  • Dataset: MICCAI-SLiver07 dataset. The dataset totally consists of 30 contrast-enhanced CT scans (20 training and 10 testing).
  • 3D DSN architecture details: The mainstream network consists of 11 layers: 6 convolutional layers, 2 max-pooling layers, 2 deconvolutional layers, and 1 softmax layer. (One question here: I notice that the kernel, stride, and pooling sizes differ in every one of the authors' papers; are they chosen by feel? A more dependable convolution size would be 3$\times$3$\times$3, as in VGG.) Learning from scratch, lr=0.1, divided by 10 every fifty epochs. The deep supervision balancing weights ($\eta_h$?) were initialized as 0.3 and 0.4, and decayed by 5% every ten epochs.
  • Evaluation: Volumetric overlap error (VOE[%]), relative volume difference (VD[%]), average symmetric surface distance (AvgD[mm]), root mean square symmetric surface distance (RMSD[mm]) and maximum symmetric surface distance (MaxD[mm]). Details of these metrics can be found in Comparison and Evaluation of Methods for Liver Segmentation From CT Datasets
  • System Implementation: implemented with the Theano library, running on an NVIDIA GeForce GTX TITAN Z GPU.
  • Method

1. Vanishing-gradient problem
The paper notes that vanishing gradients may be even more severe in 3D networks. The proposed remedy is to build the loss from the prediction outputs of several intermediate layers,
$$\mathcal{L}=\mathcal{L}_{o}(\mathcal{X};W)+\sum_{h}\eta_h\cdot\mathcal{L}_{h}(\mathcal{X};W_h,w_h)+[\text{regularization}]$$
using the weights $\eta_h$ to control the importance of each hidden layer and thereby counteract vanishing gradients in the early layers. Personally I find this argument shaky: once a gradient has vanished it is essentially zero, so only a very large weight could pull the value back up, and even then the root cause is not fixed. Moreover, ReLU was proposed precisely to address this problem, so I am not sure whether vanishing gradients still need special handling in a 3D network that uses ReLU.
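A minimal numpy sketch of this deeply supervised loss and the 5%-every-ten-epochs weight decay. The cross-entropy stand-in and the toy predictions are my own; the paper's exact per-layer losses and the regularization term are omitted:

```python
import numpy as np

def cross_entropy(p, y, eps=1e-7):
    """Mean binary cross-entropy over voxels (stand-in for L_o and L_h)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def deeply_supervised_loss(main_pred, branch_preds, y, etas):
    """L = L_o + sum_h eta_h * L_h (regularization omitted in this sketch)."""
    loss = cross_entropy(main_pred, y)
    for eta, pred in zip(etas, branch_preds):
        loss += eta * cross_entropy(pred, y)
    return loss

def decay_etas(etas, epoch, rate=0.05, every=10):
    """Decay the supervision weights by 5% every ten epochs, as in the paper."""
    return [eta * (1 - rate) ** (epoch // every) for eta in etas]

# Toy volume of 3 voxels, one main output and two branch outputs:
y = np.array([1.0, 0.0, 1.0])
main = np.array([0.9, 0.1, 0.8])
branches = [np.array([0.7, 0.3, 0.6]), np.array([0.6, 0.4, 0.5])]
etas = decay_etas([0.3, 0.4], epoch=20)   # initial values from the paper
total = deeply_supervised_loss(main, branches, y, etas)
print(etas, round(total, 4))
```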

2. Conditional random field (CRF) model
This part really tests one's academic foundation, and it is a major reason I feel my undergraduate training falls short: under normal circumstances it would never occur to me to use this model to refine the results. The paper devotes little space to it, so it needs further study. What I do know is that the authors introduce several parameters ($\mu_1$, $\mu_2$, $\theta_{\alpha}$, $\theta_{\beta}$, $\theta_{\gamma}$) to solve an energy function, again tuned by grid search.
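The grid-search pattern used for these parameters is generic. A minimal sketch, with a hypothetical scoring function standing in for validation performance of the CRF (the parameter names echo the paper; the grid and score are invented for illustration):

```python
import itertools

def grid_search(param_grid, score_fn):
    """Exhaustively score every combination in param_grid (a dict of
    name -> candidate values) and return the best setting and its score."""
    names = list(param_grid)
    best, best_score = None, float('-inf')
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score_fn(params)
        if s > best_score:
            best, best_score = params, s
    return best, best_score

# Hypothetical score standing in for validation performance:
score = lambda p: -(p['mu1'] - 2) ** 2 - (p['theta_alpha'] - 10) ** 2
grid = {'mu1': [1, 2, 3], 'mu2': [0.5], 'theta_alpha': [5, 10, 20]}
best, s = grid_search(grid, score)
print(best, s)   # {'mu1': 2, 'mu2': 0.5, 'theta_alpha': 10} 0
```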

  • Results and Conclusions:

1. 3D DSN > 3D CNN | CRF works well

EVALUATION

VISUALIZATION

2. Shorter runtime - 5s for 3D DSN and 87s for CRF.

COMPARISON WITH OTHER TEAM

As the table shows, the 3D network itself runs very quickly, while the conditional-random-field post-processing is time-consuming.

[4] 3D Fully Convolutional Networks for Intervertebral Disc Localization and Segmentation. paper

Algorithmically, this paper does nothing more than turn a 2D FCN into a 3D FCN; there is no other improvement. It is applied to an intervertebral disc segmentation dataset.

  • Application: Intervertebral discs (IVDs) in volumetric data.
  • Dataset: MICCAI 2015 Challenge on Automatic Intervertebral Disc Localization and Segmentation.
  • Preprocessing: subtracting the mean value before inputting into the network.
  • System Implementation: the 3D FCN was implemented with the Theano library, running on an NVIDIA GeForce GTX X GPU. The 2D FCN was implemented with Matlab and C++.
  • Comparison: 2D FCN - the input is the adjacent slices (3 slices input and the output is the binary mask of the middle slice).
  • Evaluation: For IVD localization - mean localization distance (MLD) with standard deviation (SD), successful detection rate $P$. For IVD segmentation - mean dice overlap coefficients (MD) with SD, mean average absolute distance (MAAD) with SD.
  • Results and Conclusions:

1. 3D FCN > 2D FCN

TEST1

TEST2

Overall, the paper's thesis is simple, the method is novel (2D $\longrightarrow$ 3D) but fairly conventional, and the conclusions are simple as well. From my perspective, though, it is well worth studying: getting work like this published really tests one's writing ability. For example, if I were writing up the experimental results, I would produce a single sentence - "3D FCN performs better than 2D FCN both in IVD localization and segmentation" - and call it done. :-)

[5] VoxResNet: Deep Voxelwise Residual Networks for Volumetric Brain Segmentation. H Chen, Q Dou, L Yu, P Heng [CUHK] (2016). paper.

  • Propose a deep voxelwise residual network, referred to as VoxResNet (a 3D residual network).
  • An auto-context version of VoxResNet is also proposed.
The architecture of VoxResNet

auto-context

Comparison of VoxResNet, Auto-context VoxResNet and Ground truth

[6] Evaluation and comparison of 3D intervertebral disc localization and segmentation methods for 3D T2 MR data: A grand challenge. paper

This journal paper is a fairly detailed elaboration of the intervertebral disc localization and segmentation work in [Review 4]. It also gave me a direct feel for the difference between conference and journal papers: the journal version reads as if every point of the conference paper had been expanded. Now that the CVPR, IPMI, and MICCAI submissions are done, we should start submitting to journals too, fleshing out the conference material into full journal papers. No time to read this one in detail - the reviews end here!


Related works

[1] V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi [Technical University of Munich]. paper.

  • Propose an approach to 3D image segmentation based on a volumetric, fully convolutional neural network (3D FCN).
  • Introduce a novel objective function optimized on the Dice coefficient. In this way they can deal with situations where there is a strong imbalance between the number of foreground and background voxels.
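A common soft-Dice formulation shows why such an objective tolerates class imbalance (the paper's exact variant may differ slightly, e.g. in using squared sums in the denominator; this is a sketch, not V-Net's implementation):

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2|P∩G| / (|P|+|G|), on probability maps.
    Because the overlap is normalized by the foreground volume, the loss
    is far less dominated by the (huge) background than cross-entropy."""
    pred, target = pred.ravel(), target.ravel()
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

# 1000 voxels, only 10 foreground: a perfect prediction gives loss ~ 0
# even though foreground is only 1% of the volume.
target = np.zeros(1000)
target[:10] = 1
print(round(soft_dice_loss(target.copy(), target), 6))   # 0.0
```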
The architecture of V-Net



[2] 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox [University of Freiburg, Google Deepmind]. paper, code (Caffe).

  • NVIDIA TitanX GPU
2D U-Net Architecture

3D U-Net Architecture

[3] Deep MRI brain extraction: A 3D convolutional neural network for skull stripping. Jens Kleesiek, Gregor Urban, Alexander Hubert [Heidelberg University Hospital]. paper.

CNN architecture details

[4] Integrating Online and Offline 3D Deep Learning for Automated Polyp Detection in Colonoscopy Videos. Lequan Yu, Hao Chen, Qi Dou [CUHK] (2016). paper

Offline 3D FCN 1

Offline 3D FCN 2

Offline 3D FCN 3

Comparison
  • The authors compared three different CNN architectures. Frankly, this comparison has limited reference value, since the outcome depends largely on experience and trial-and-error.

Discussions online

1. Are there any deep learning libraries that have 3D volumetric/spatial convolutions running on a CPU or a GPU?

A recent addition, but Keras now supports 3D convolution. It should work for voxels and video sequences.
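Whatever the library, the operation itself is easy to state. A naive numpy sketch of a "valid" 3D cross-correlation (the operation CNN libraries call convolution); real libraries of course implement this far more efficiently:

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D convolution (really cross-correlation, as in CNNs):
    slide the kernel over the volume and take elementwise dot products."""
    vz, vy, vx = volume.shape
    kz, ky, kx = kernel.shape
    out = np.empty((vz - kz + 1, vy - ky + 1, vx - kx + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[z, y, x] = np.sum(volume[z:z+kz, y:y+ky, x:x+kx] * kernel)
    return out

vol = np.arange(4 ** 3, dtype=float).reshape(4, 4, 4)
k = np.ones((3, 3, 3)) / 27.0          # 3x3x3 mean filter
out = conv3d_valid(vol, k)
print(out.shape)                       # (2, 2, 2)
```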

2. 3D CNN in Keras - Action Recognition

3. Software: https://github.com/facebook/C3D


Separable 3D CNN

1. Reference papers

[1] Learning Separable Filters. Amos Sironi, Bugra Tekin, Roberto Rigamonti [EPFL] 2014. paper -- check Section 5.5.

2. Try on

Examine the separability of the kernels in the pre-trained CNNs, check http://www.mathworks.com/matlabcentral/fileexchange/28238-kernel-decomposition
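For a 2D kernel, separability is exactly a rank-1 condition, which the SVD checks directly; for a 3D kernel the same test can be applied to each mode unfolding. A minimal numpy sketch for the 2D case (function name and tolerance are my own):

```python
import numpy as np

def separate_kernel_2d(K, tol=1e-8):
    """If K is (numerically) rank-1, return 1D filters (v, h) with
    K ≈ outer(v, h); otherwise return None."""
    U, s, Vt = np.linalg.svd(K)
    if len(s) > 1 and s[1] > tol * s[0]:
        return None                      # not separable
    v = U[:, 0] * np.sqrt(s[0])
    h = Vt[0] * np.sqrt(s[0])
    return v, h

# A Sobel x-kernel is separable: [1, 2, 1]^T outer [1, 0, -1].
sobel = np.outer([1, 2, 1], [1, 0, -1]).astype(float)
v, h = separate_kernel_2d(sobel)
print(np.allclose(np.outer(v, h), sobel))   # True
```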


Some Questions

  • At the end of the paper [1] the authors show a visualization of the 3D convolution kernels; what purpose does it serve, and what does it demonstrate?
  • I notice that the kernel, stride, and pooling sizes differ in every one of the authors' papers; are these chosen by feel?
  • For paper [3], how are the multi-layer fusion and the training of the per-layer weights $\eta_h$ implemented in code?
  • For paper [4], is the 3D FCN code open-sourced? The paper concludes 3D > 2D; were there other detailed improvements to the 3D FCN? In my own experiments the accuracies are about the same.
  • Which framework does the team use now - their own code or open-source code? And how mature is the open-source 3D support in Lasagne nowadays?

Best wishes!
