Introduction
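U-Net is a CNN segmentation model published in 2015 that proved very successful at segmenting cells in microscopy images of tissue sections. It builds on the then-recent FCN, makes more effective use of the information carried by context feature maps at every scale, and relies heavily on data augmentation. On biological tissue-section images, which demand high precision and contain densely packed objects, it achieved good results.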
For details on FCN, see my earlier post: Semantic segmentation series, part 1: FCN.
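U-Net is essentially an FCN, but it has some commendable touches of its own. For example, its convolutions use valid padding (that is, no padding), so every 3x3 conv layer trims one pixel from each edge of the feature map, two pixels per dimension. After repeated trimming, downsampling, and upsampling, the final feature map has exactly the size of the mask map we want. The detailed network structure is shown in the figure below.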
[Figure: U-Net architecture]
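The shrinking effect of valid padding is easy to check with a little arithmetic. The sketch below is illustrative and not taken from the authors' code: the helper names are my own, and the 572-pixel input is the tile size shown in the U-Net paper's architecture figure. It assumes 3x3 valid convolutions, 2x2 max pooling with stride 2, and 2x2 stride-2 up-convolutions, which is what the prototxt in the code analysis section uses.

```python
def conv_valid(size, kernel=3):
    """Side length after an unpadded (valid) convolution with stride 1."""
    return size - (kernel - 1)


def max_pool(size, kernel=2, stride=2):
    """Side length after max pooling; reject sizes that do not divide evenly."""
    if (size - kernel) % stride != 0:
        raise ValueError(f"side length {size} does not fit {kernel}x{kernel} pooling")
    return (size - kernel) // stride + 1


def up_conv(size, kernel=2, stride=2):
    """Side length after an unpadded transposed convolution."""
    return (size - 1) * stride + kernel


def unet_output_size(size, depth=4):
    """Trace a square input through a U-Net with `depth` pooling steps."""
    for _ in range(depth):                    # contracting path
        size = conv_valid(conv_valid(size))
        size = max_pool(size)
    size = conv_valid(conv_valid(size))       # bottom of the "U"
    for _ in range(depth):                    # expansive path
        size = up_conv(size)
        size = conv_valid(conv_valid(size))
    return size                               # the final 1x1 conv keeps the size


print(unet_output_size(572))  # 388
```

So a 572x572 input tile yields a 388x388 mask: 92 pixels of context are consumed on every side.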
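Perhaps what is most worth discussing in U-Net is the effective way it handles data such as biological tissue sections, together with its somewhat distinctive loss function.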
Data input
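I once worked with a certain type of human microscopy tissue-section data for a client project. The requirement was to find a good way to locate the regions of abnormal cells on a section, give a preliminary judgment, and label the section as negative or positive.
At first this does not sound hard: it looks like a typical object detection problem or, at a slightly more complex level, a semantic segmentation problem. But the data were nothing like the 'well-behaved' datasets we usually play with, such as ImageNet, CIFAR-10/100, or COCO/VOC. The dataset the client gave us even had annotation problems: some images were incompletely annotated (missed labels) and others were annotated incorrectly (wrong labels), perhaps because the senior experts who had to stare at the sections to label them were not paid enough. Each image also had millions of pixels, and a single image file could be several GB in size.
How would you handle a requirement like this? Without discussing it at length, let us first look at what the U-Net authors did; it may offer some ideas.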
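First, they cut each large image into many patches using a tiling scheme and feed the patches to the network. Because segmentation is extremely strict about the position of objects in the image, the feature maps are convolved with valid padding.
This means each input patch must carry extra surrounding context, so that after the borders are stripped away by layer after layer of valid convolutions, the output that remains still corresponds to the correct position in the image.
The figure below shows how U-Net handles the input image with overlapping patches.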
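The tiling described above can be sketched with NumPy as follows. This is my own minimal illustration, not the authors' implementation: the function name is hypothetical, the 572/388 tile sizes are the ones from the paper's architecture figure, and mirroring the image at its borders (`mode="reflect"`) is how the paper describes supplying the missing context. It assumes a single-channel 2-D image.

```python
import numpy as np


def overlap_tiles(image, out_size=388, in_size=572):
    """Yield (top, left, input_tile) so that the output tiles cover the image.

    Each input tile is `in_size` wide; the network is assumed to predict only
    its central `out_size` x `out_size` region.
    """
    margin = (in_size - out_size) // 2        # context lost on each side
    height, width = image.shape
    n_rows = -(-height // out_size)           # ceiling division
    n_cols = -(-width // out_size)
    extra_h = n_rows * out_size - height      # round up to whole output tiles
    extra_w = n_cols * out_size - width
    padded = np.pad(
        image,
        ((margin, margin + extra_h), (margin, margin + extra_w)),
        mode="reflect",                       # mirror the image at the border
    )
    for r in range(n_rows):
        for c in range(n_cols):
            top, left = r * out_size, c * out_size
            yield top, left, padded[top:top + in_size, left:left + in_size]


image = np.arange(500 * 700, dtype=np.float32).reshape(500, 700)
tiles = list(overlap_tiles(image))
print(len(tiles), tiles[0][2].shape)  # 4 (572, 572)
```

The `top` and `left` values tell you where each predicted 388x388 block belongs in the full-size mask; predictions that fall outside the original image bounds are simply discarded.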
Data preprocessing
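Biological section datasets like this are usually expensive to build (privacy, annotation cost, and so on), so they tend to be small. Heavy data augmentation is therefore needed for the model to learn well from so little data.
The augmentation methods the authors use include shifts and rotations (for example, rotations by a variety of angles) and smooth, elastic deformations of the image, which proved extremely effective when the dataset is small.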
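The one rule that matters for segmentation is that the image and its mask must receive exactly the same geometric transform. Here is a minimal NumPy sketch of that idea. It is a simplification under my own assumptions: the function name and `max_shift` parameter are hypothetical, rotations are limited to multiples of 90 degrees, the shift wraps around cyclically, and the elastic deformations used in the paper are left out.

```python
import numpy as np


def augment(image, mask, rng, max_shift=10):
    """Apply one random rotation and shift identically to image and mask."""
    k = int(rng.integers(0, 4))               # number of 90-degree turns
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))

    def transform(array):
        array = np.rot90(array, k)
        return np.roll(array, (dy, dx), axis=(0, 1))  # cyclic shift

    return transform(image), transform(mask)


rng = np.random.default_rng(0)
image = np.arange(64 * 64).reshape(64, 64)
mask = image % 2                              # toy mask derived from the image
aug_image, aug_mask = augment(image, mask, rng)
print(np.array_equal(aug_mask, aug_image % 2))  # True: pair is still aligned
```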
Network architecture
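As the figure at the top of this post shows, U-Net lives up to its name. It consists of two paths. The contracting path (left side) is a typical CNN feature-extraction network that refines and extracts features level by level. The expansive path (right side) takes the features of different scales and levels produced by the contracting path and upsamples, transforms, and converts them into a high-resolution mask map.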
One can already glimpse here the structures later used in the DSSD and FPN models.
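The link between the two paths is a crop followed by a channel-wise concatenation: because of valid padding the contracting-path feature map is larger than the upsampled one, so it has to be cropped first. The NumPy sketch below shows the shapes involved. The function names are my own, and cropping around the centre is an assumption on my part; the blob names and sizes (`d3c` at 64x64, `u3a` at 56x56 for a 572x572 input) follow the prototxt later in this post.

```python
import numpy as np


def center_crop(feature, target_hw):
    """Crop a (C, H, W) feature map around its centre to (height, width)."""
    _, height, width = feature.shape
    target_h, target_w = target_hw
    if target_h > height or target_w > width:
        raise ValueError("target is larger than the feature map")
    top = (height - target_h) // 2
    left = (width - target_w) // 2
    return feature[:, top:top + target_h, left:left + target_w]


def skip_concat(upsampled, encoder_feature):
    """Crop the encoder feature to the upsampled size and stack the channels."""
    cropped = center_crop(encoder_feature, upsampled.shape[1:])
    return np.concatenate([upsampled, cropped], axis=0)


d3c = np.zeros((512, 64, 64), dtype=np.float32)   # contracting-path feature
u3a = np.zeros((512, 56, 56), dtype=np.float32)   # upsampled feature
u3b = skip_concat(u3a, d3c)
print(u3b.shape)  # (1024, 56, 56)
```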
Model training
During training, the loss is obtained by summing the element-wise cross-entropy between the input label mask map and the mask map output by the network, and it is then backpropagated.
The loss function is computed as follows.
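For reference, this is the form given in the U-Net paper, written here as a quantity to minimise (the paper's own sign convention may differ). The notation is the paper's as I recall it: $a_k(x)$ is the activation of channel $k$ at pixel $x$, $K$ is the number of classes, $\ell(x)$ is the true label of pixel $x$, $\Omega$ is the set of pixels, and $w(x)$ is a per-pixel weight.

```latex
p_k(x) = \frac{\exp\bigl(a_k(x)\bigr)}{\sum_{k'=1}^{K} \exp\bigl(a_{k'}(x)\bigr)},
\qquad
E = -\sum_{x \in \Omega} w(x)\,\log p_{\ell(x)}(x)
```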
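Interestingly, to better learn the thin separating regions between cells (and so represent cell borders better), the authors add a position-dependent weight map w(x) to the loss, where x ranges over the pixels of the image.
The value of w(x) is determined from the class distribution of the segmentation maps in the training set and from where each pixel lies on them. It is computed as follows: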
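For reference, the weight map in the U-Net paper has the following form, where $w_c(x)$ balances the class frequencies, $d_1(x)$ is the distance from pixel $x$ to the border of the nearest cell, and $d_2(x)$ is the distance to the border of the second nearest cell. As far as I recall, the paper's experiments set $w_0 = 10$ and $\sigma \approx 5$ pixels.

```latex
w(x) = w_c(x) + w_0 \cdot \exp\!\left( -\frac{\bigl(d_1(x) + d_2(x)\bigr)^2}{2\sigma^2} \right)
```

The exponential term is large only where two cells are both very close, which is exactly the thin gap between touching cells.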
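At first glance this feels like using prior annotation information, a bit like cheating. On reflection, though, it fits common machine-learning practice: YOLO v2 does much the same when it clusters all ground truth boxes in the training set to obtain reasonable widths and heights for its prior boxes.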
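To make the weighted loss concrete, here is a small NumPy sketch. It is my own illustration, not the authors' Caffe code: the function names are hypothetical, the weight follows the form in the U-Net paper, w(x) = w_c(x) + w0 * exp(-(d1(x) + d2(x))^2 / (2 * sigma^2)), and the distance maps `d1` and `d2` are assumed to have been computed beforehand from the label masks.

```python
import numpy as np


def border_weight(class_weight, d1, d2, w0=10.0, sigma=5.0):
    """Weight map: class balance term plus a boost between close cells."""
    return class_weight + w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))


def weighted_cross_entropy(scores, labels, weights):
    """Sum of -w(x) * log p_label(x) over all pixels.

    scores:  (K, H, W) raw network activations
    labels:  (H, W) integer class indices
    weights: (H, W) per-pixel weights
    """
    shifted = scores - scores.max(axis=0, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    picked = log_probs[labels, rows, cols]                 # log p of true class
    return float(-(weights * picked).sum())


scores = np.zeros((2, 4, 4))                  # both classes equally likely
labels = np.zeros((4, 4), dtype=np.int64)
d1 = np.full((4, 4), 50.0)                    # far from any cell border
d2 = np.full((4, 4), 60.0)
weights = border_weight(np.ones((4, 4)), d1, d2)
print(weighted_cross_entropy(scores, labels, weights))
```

Far from any border the exponential term vanishes and the loss reduces to ordinary class-weighted cross-entropy; at a pixel squeezed between two cells (d1 and d2 both near zero) the weight rises to roughly w_c + w0.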
Experimental results
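Below are its results on cell tissue sections, which give an intuitive sense of its quality.
Finally, the table below compares it with other models in the ISBI 2015 cell tracking challenge.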
Code analysis
Below is the U-Net prototxt. As you can see, it uses valid convolutions (pad: 0), and the upsampling in the expansive path is done with deconvolution layers.
name: 'phseg_v5'
force_backward: true
layers { top: 'data' top: 'label' name: 'loaddata' type: HDF5_DATA hdf5_data_param { source: 'aug_deformed_phseg_v5.txt' batch_size: 1 } include: { phase: TRAIN }}
layers { bottom: 'data' top: 'd0b' name: 'conv_d0a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd0b' top: 'd0b' name: 'relu_d0b' type: RELU }
layers { bottom: 'd0b' top: 'd0c' name: 'conv_d0b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd0c' top: 'd0c' name: 'relu_d0c' type: RELU }
layers { bottom: 'd0c' top: 'd1a' name: 'pool_d0c-1a' type: POOLING pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layers { bottom: 'd1a' top: 'd1b' name: 'conv_d1a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd1b' top: 'd1b' name: 'relu_d1b' type: RELU }
layers { bottom: 'd1b' top: 'd1c' name: 'conv_d1b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd1c' top: 'd1c' name: 'relu_d1c' type: RELU }
layers { bottom: 'd1c' top: 'd2a' name: 'pool_d1c-2a' type: POOLING pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layers { bottom: 'd2a' top: 'd2b' name: 'conv_d2a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd2b' top: 'd2b' name: 'relu_d2b' type: RELU }
layers { bottom: 'd2b' top: 'd2c' name: 'conv_d2b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd2c' top: 'd2c' name: 'relu_d2c' type: RELU }
layers { bottom: 'd2c' top: 'd3a' name: 'pool_d2c-3a' type: POOLING pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layers { bottom: 'd3a' top: 'd3b' name: 'conv_d3a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd3b' top: 'd3b' name: 'relu_d3b' type: RELU }
layers { bottom: 'd3b' top: 'd3c' name: 'conv_d3b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd3c' top: 'd3c' name: 'relu_d3c' type: RELU }
layers { bottom: 'd3c' top: 'd3c' name: 'dropout_d3c' type: DROPOUT dropout_param { dropout_ratio: 0.5 } include: { phase: TRAIN }}
layers { bottom: 'd3c' top: 'd4a' name: 'pool_d3c-4a' type: POOLING pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layers { bottom: 'd4a' top: 'd4b' name: 'conv_d4a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 1024 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd4b' top: 'd4b' name: 'relu_d4b' type: RELU }
layers { bottom: 'd4b' top: 'd4c' name: 'conv_d4b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 1024 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd4c' top: 'd4c' name: 'relu_d4c' type: RELU }
layers { bottom: 'd4c' top: 'd4c' name: 'dropout_d4c' type: DROPOUT dropout_param { dropout_ratio: 0.5 } include: { phase: TRAIN }}
layers { bottom: 'd4c' top: 'u3a' name: 'upconv_d4c_u3a' type: DECONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 2 stride: 2 weight_filler { type: 'xavier' }} }
layers { bottom: 'u3a' top: 'u3a' name: 'relu_u3a' type: RELU }
layers { bottom: 'd3c' bottom: 'u3a' top: 'd3cc' name: 'crop_d3c-d3cc' type: CROP }
layers { bottom: 'u3a' bottom: 'd3cc' top: 'u3b' name: 'concat_d3cc_u3a-b' type: CONCAT }
layers { bottom: 'u3b' top: 'u3c' name: 'conv_u3b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u3c' top: 'u3c' name: 'relu_u3c' type: RELU }
layers { bottom: 'u3c' top: 'u3d' name: 'conv_u3c-d' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u3d' top: 'u3d' name: 'relu_u3d' type: RELU }
layers { bottom: 'u3d' top: 'u2a' name: 'upconv_u3d_u2a' type: DECONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 2 stride: 2 weight_filler { type: 'xavier' }} }
layers { bottom: 'u2a' top: 'u2a' name: 'relu_u2a' type: RELU }
layers { bottom: 'd2c' bottom: 'u2a' top: 'd2cc' name: 'crop_d2c-d2cc' type: CROP }
layers { bottom: 'u2a' bottom: 'd2cc' top: 'u2b' name: 'concat_d2cc_u2a-b' type: CONCAT }
layers { bottom: 'u2b' top: 'u2c' name: 'conv_u2b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u2c' top: 'u2c' name: 'relu_u2c' type: RELU }
layers { bottom: 'u2c' top: 'u2d' name: 'conv_u2c-d' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u2d' top: 'u2d' name: 'relu_u2d' type: RELU }
layers { bottom: 'u2d' top: 'u1a' name: 'upconv_u2d_u1a' type: DECONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 2 stride: 2 weight_filler { type: 'xavier' }} }
layers { bottom: 'u1a' top: 'u1a' name: 'relu_u1a' type: RELU }
layers { bottom: 'd1c' bottom: 'u1a' top: 'd1cc' name: 'crop_d1c-d1cc' type: CROP }
layers { bottom: 'u1a' bottom: 'd1cc' top: 'u1b' name: 'concat_d1cc_u1a-b' type: CONCAT }
layers { bottom: 'u1b' top: 'u1c' name: 'conv_u1b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u1c' top: 'u1c' name: 'relu_u1c' type: RELU }
layers { bottom: 'u1c' top: 'u1d' name: 'conv_u1c-d' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u1d' top: 'u1d' name: 'relu_u1d' type: RELU }
layers { bottom: 'u1d' top: 'u0a' name: 'upconv_u1d_u0a' type: DECONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 2 stride: 2 weight_filler { type: 'xavier' }} }
layers { bottom: 'u0a' top: 'u0a' name: 'relu_u0a' type: RELU }
layers { bottom: 'd0c' bottom: 'u0a' top: 'd0cc' name: 'crop_d0c-d0cc' type: CROP }
layers { bottom: 'u0a' bottom: 'd0cc' top: 'u0b' name: 'concat_d0cc_u0a-b' type: CONCAT }
layers { bottom: 'u0b' top: 'u0c' name: 'conv_u0b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u0c' top: 'u0c' name: 'relu_u0c' type: RELU }
layers { bottom: 'u0c' top: 'u0d' name: 'conv_u0c-d' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u0d' top: 'u0d' name: 'relu_u0d' type: RELU }
layers { bottom: 'u0d' top: 'score' name: 'conv_u0d-score' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { engine: CAFFE num_output: 2 pad: 0 kernel_size: 1 weight_filler { type: 'xavier' }} }
layers { bottom: 'score' bottom: 'label' top: 'loss' name: 'loss' type: SOFTMAX_LOSS loss_param { ignore_label: 2 }include: { phase: TRAIN }}
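One detail in the last layer is worth a note: `score` has two channels (labels 0 and 1), while `ignore_label: 2` tells the loss layer to skip any pixel labelled 2. The NumPy sketch below shows roughly what that means. It is my own illustration, not Caffe's implementation: the function name is hypothetical, and averaging over the non-ignored pixels is an assumption about how the loss is normalised.

```python
import numpy as np


def softmax_loss_with_ignore(scores, labels, ignore_label=2):
    """Mean cross-entropy over the pixels whose label is not `ignore_label`.

    scores: (K, H, W) raw activations; labels: (H, W) integer labels.
    """
    shifted = scores - scores.max(axis=0, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    valid = labels != ignore_label
    safe_labels = np.where(valid, labels, 0)   # placeholder index for ignored
    rows, cols = np.indices(labels.shape)
    picked = log_probs[safe_labels, rows, cols]
    count = int(valid.sum())
    if count == 0:
        return 0.0
    return float(-picked[valid].sum() / count)  # assumed normalisation


scores = np.zeros((2, 2, 2))
labels = np.array([[0, 1], [2, 2]])            # bottom row is ignored
print(softmax_loss_with_ignore(scores, labels))
```

Ignored pixels contribute nothing to the loss and therefore no gradient, which is a convenient way to mask out regions that should not be trained on.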
References
- U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, Thomas Brox, 2015
- https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/