Task 7 FCN README

FCN8s tensorflow ADE20k

1. introduction

This is a fully convolutional network (FCN-8s, i.e. with stride-8 upsampling) implementation on the ADE20K dataset, using TensorFlow.
The implementation is largely based on the paper "Fully Convolutional Networks for Semantic Segmentation" (arXiv) and on two implementations from other GitHub repositories: FCN.tensorflow and semantic-segmentation-pytorch.

| net | dataset | competition | framework | arXiv paper |
|-----|---------|-------------|-----------|-------------|
| FCN8s | ADE20K | MIT Scene Parsing Benchmark (SceneParse150) | TensorFlow 1.4, Python 3.6 | Fully Convolutional Networks for Semantic Segmentation |

2. how to run

  1. Download and extract the dataset.
    1. Download the .zip dataset. link: http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip
    2. Put the .zip dataset under ./Data_zoo/MIT_SceneParsing/ and extract.
  2. Start training. Simply run FCN_train.py
  3. To test the model, simply run FCN_test.py.
    It will test the first 100 validation images by default. The validation set contains 2000 images, so if you want to test more images, simply modify the variable TEST_NUM (e.g. to 1000).
  4. To use the model to infer images,
    1. Put the .jpg images you want to infer in the folder ./infer
    2. Make sure there is a folder ./output (to store the result).
    3. Run FCN_infer.py; it will process all .jpg images under ./infer and write the predicted annotations to ./output.
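Before running the scripts, the folder layout the steps above expect can be prepared with a small sketch like this (the paths come from the steps above; the snippet itself is not part of the repo):

```python
import os

# Create the directories the scripts above expect:
#   Data_zoo/MIT_SceneParsing  - where the extracted ADE20K dataset lives
#   infer                      - input .jpg images for FCN_infer.py
#   output                     - where predicted annotations are written
for d in ("Data_zoo/MIT_SceneParsing", "infer", "output"):
    os.makedirs(d, exist_ok=True)
```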

3. code logic

  • Each batch contains 2 images.
    If the two images/annotations differ in size, enlarge the smaller image/annotation so that both have the same size. The common size must be rounded up so that it is divisible by 32, because the feature maps are downsampled by a factor of up to 32 inside the FCN network.
  • Resizing is done with the function scipy.misc.imresize.
    The interp (interpolation) parameter is "bilinear" when resizing images and "nearest" when resizing annotations, so that class labels are not blended.
  • The optimizer is Adam Optimizer with learning rate 1e-5.
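The size-rounding step described above can be sketched as follows (the helper names here are mine, not the repo's):

```python
import numpy as np

def round_up_to_32(x):
    # FCN-8s downsamples feature maps by a factor of up to 32,
    # so each side of the input must be a multiple of 32.
    return int(np.ceil(x / 32.0)) * 32

def common_batch_size(shape_a, shape_b):
    # Take the larger height/width of the two images in the batch,
    # then round each side up to the next multiple of 32.
    h = round_up_to_32(max(shape_a[0], shape_b[0]))
    w = round_up_to_32(max(shape_a[1], shape_b[1]))
    return h, w
```

For example, pairing a 375x500 image with a 480x640 one yields a common batch size of 480x640, since both sides are already multiples of 32.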

4. running

4.1 train

When you run FCN_train.py, you will see:

  • Training takes about 7~8 hours on an Nvidia GeForce GTX 1080 (11 GB).
  • The training loss is hard to improve further once it is around 0.8 ~ 1.5. This may be a limitation of FCN.

4.2 test

When you run FCN_test.py, you will see:

After processing all validation images, it will print the metrics.

You can uncomment lines 130~143 to see some well-segmented results.

4.3 infer

Just put the .jpg images in ./infer, run ./FCN_infer.py and the predicted annotations will be put in ./output.

5. results

The metrics over the first 100 validation images are:

| pixel accuracy | mean accuracy | mean IU (mean IoU) | frequency weighted IU |
|----------------|---------------|--------------------|-----------------------|
| 0.6739 | 0.4332 | 0.3644 | 0.5024 |

(There are 151 classes, where class index 0 is "others"; only the accuracy on the remaining 150 classes matters.)
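The four metrics above can all be computed from a class confusion matrix; here is a minimal NumPy sketch (the function name is mine, not from the repo):

```python
import numpy as np

def segmentation_metrics(conf):
    # conf[i, j]: number of pixels whose true class is i and predicted class is j
    conf = conf.astype(float)
    total = conf.sum()
    tp = np.diag(conf)                 # correctly classified pixels per class
    gt_pixels = conf.sum(axis=1)       # ground-truth pixels per class
    pred_pixels = conf.sum(axis=0)     # predicted pixels per class
    valid = gt_pixels > 0              # ignore classes absent from the ground truth

    pixel_acc = tp.sum() / total
    mean_acc = (tp[valid] / gt_pixels[valid]).mean()
    union = gt_pixels + pred_pixels - tp
    iou = np.where(union > 0, tp / np.maximum(union, 1e-12), 0.0)
    mean_iou = iou[valid].mean()
    freq = gt_pixels / total
    fw_iou = (freq[valid] * iou[valid]).sum()
    return pixel_acc, mean_acc, mean_iou, fw_iou
```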

Here are some examples:

6. other links

  • A simple Chinese README is here: 簡書
  • Another blog I wrote earlier in Chinese is here: task7 FCN分析. Its discussion of the code may no longer be useful, because I have changed the code a lot since then.

The two links above are study reports on FCN that I wrote.
The second report was written earlier, so read its code-related parts selectively, since much of the code has been heavily revised since then.
The first report was written based on this code and is the better reference.
