[Paper Notes] A guide to convolution arithmetic for deep learning


Chapter 1. Introduction

1.1 Discrete convolutions

N: number of dimensions of the convolution (N-D)
n: number of output feature maps
m: number of input feature maps
k_j: kernel size along axis j
i_j: input size along axis j
o_j: output size along axis j
s_j: stride (distance between two consecutive positions of the kernel) along axis j
p_j: zero padding (number of zeros concatenated at the beginning and at the end of an axis) along axis j
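To make these parameters concrete, here is a minimal pure-Python sketch of a direct 2-D convolution. It assumes a single input and a single output feature map (m = n = 1), a square input and kernel, and the same stride and padding on both axes. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped); the name `conv2d` is only an illustrative choice.

```python
def conv2d(x, w, s=1, p=0):
    """Direct 2-D convolution (cross-correlation) of a square input x with a
    square kernel w, stride s and zero padding p on both axes (m = n = 1)."""
    i, k = len(x), len(w)
    size = i + 2 * p
    # Zero padding: p zeros at the beginning and at the end of each axis.
    xp = [[0] * size for _ in range(size)]
    for r in range(i):
        for c in range(i):
            xp[r + p][c + p] = x[r][c]
    # Kernel positions are s apart and must fit entirely inside the padded input.
    return [
        [sum(xp[r + a][c + b] * w[a][b] for a in range(k) for b in range(k))
         for c in range(0, size - k + 1, s)]
        for r in range(0, size - k + 1, s)
    ]


x = [[1] * 5 for _ in range(5)]   # 5 x 5 input of ones
w = [[1] * 3 for _ in range(3)]   # 3 x 3 kernel of ones
print(conv2d(x, w, s=2, p=1))     # [[4, 6, 4], [6, 9, 6], [4, 6, 4]]
```

Each output value counts how many real (non-padding) input cells the kernel covers, which is why the corners are smaller than the center.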

1.2 Pooling

Pooling operations reduce the size of feature maps by using some function to summarize subregions, such as taking the average or the maximum value.
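A minimal pure-Python sketch of pooling, assuming a square input, a k x k window, stride s and no padding. The helper names `pool2d` and `mean` are illustrative; passing `max` gives max pooling and `mean` gives average pooling.

```python
def pool2d(x, k, s, fn=max):
    """Summarize each k x k window of the square input x (stride s, no
    padding) with fn, e.g. max for max pooling."""
    i = len(x)
    return [
        [fn([x[r + a][c + b] for a in range(k) for b in range(k)])
         for c in range(0, i - k + 1, s)]
        for r in range(0, i - k + 1, s)
    ]


def mean(values):
    """Average of a list, for average pooling."""
    return sum(values) / len(values)


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(pool2d(x, k=2, s=2))           # [[6, 8], [14, 16]]
print(pool2d(x, k=2, s=2, fn=mean))  # [[3.5, 5.5], [11.5, 13.5]]
```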

Chapter 2. Convolution arithmetic

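The analysis of the relationship between convolutional layer properties is eased by the fact that they don't interact across axes: the choice of kernel size, stride and zero padding along axis j only affects the output size along axis j. Because of that, this chapter will focus on the following simplified setting: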

  • 2-D discrete convolutions (N = 2),
  • square inputs (i_1 = i_2 = i),
  • square kernel size (k_1 = k_2 = k),
  • same strides along both axes (s_1 = s_2 = s),
  • same zero padding along both axes (p_1 = p_2 = p).

Note: the results outlined here also generalize to the N-D and non-square cases.

2.1 No zero padding, unit strides (p=0, s=1)

Relationship 1. For any i and k, and for s = 1 and p = 0, o = (i - k) + 1.

2.2 Zero padding, unit strides (p>0, s=1)

Relationship 2. For any i, k and p, and for s = 1,
o = (i - k) + 2p + 1.

2.2.1 Half (same) padding (p=\frac{k-1}{2})

Relationship 3. For any i and for k odd (k = 2n + 1, n\in\mathbb{N}), s = 1 and p = \lfloor k/2\rfloor = n, o = i, since o = (i + 2p) - k + 1 = (i + 2n) - (2n + 1) + 1 = i.
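A quick numerical check of Relationship 3, as a small Python sketch; `conv_out` is an illustrative helper implementing the general output-size formula o = \lfloor (i + 2p - k)/s \rfloor + 1. It also shows why k has to be odd: with an even kernel, symmetric padding either overshoots or undershoots the input size by one.

```python
def conv_out(i, k, s=1, p=0):
    """Output size o = floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1


# Odd kernels with p = n = (k - 1) // 2 keep the spatial size unchanged.
for k in (1, 3, 5, 7):
    assert all(conv_out(i, k, p=(k - 1) // 2) == i for i in range(k, 50))

# Even kernels cannot: p = k // 2 gives i + 1, and p = k // 2 - 1 gives i - 1.
print([conv_out(8, k, p=k // 2) for k in (2, 4)])      # [9, 9]
print([conv_out(8, k, p=k // 2 - 1) for k in (2, 4)])  # [7, 7]
```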

2.2.2 Full padding (p=k-1)

Relationship 4. For any i and k, and for p = k - 1 and s = 1, o = (i + 2p) - k + 1 = i + 2(k - 1) - (k - 1) = i + (k - 1).

2.3 No zero padding, non-unit strides (p=0, s>1)

Relationship 5. For any i, k and s, and for p = 0,
o =\lfloor\frac{i - k}{s}\rfloor+1.
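The floor in Relationship 5 comes from counting how many kernel positions, spaced s apart, still fit inside the input. The pure-Python sketch below (helper names are illustrative) compares the formula with a brute-force count, and shows a consequence of the floor: different input sizes, here i = 5 and i = 6 with k = 3 and s = 2, can give the same output size.

```python
def conv_out(i, k, s):
    """Relationship 5 (p = 0): o = floor((i - k) / s) + 1."""
    return (i - k) // s + 1


def count_positions(i, k, s):
    """Brute force: count offsets 0, s, 2s, ... at which the kernel still fits."""
    return sum(1 for start in range(0, i, s) if start + k <= i)


for i in range(1, 30):
    for k in range(1, i + 1):
        for s in range(1, 6):
            assert conv_out(i, k, s) == count_positions(i, k, s)

# The floor makes the map from i to o many-to-one.
print(conv_out(5, 3, 2), conv_out(6, 3, 2))  # 2 2
```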

2.4 Zero padding, non-unit strides (p>0, s>1)

Relationship 6. For any i, k, p and s,
o =\lfloor\frac{i+2p-k}{s}\rfloor+1.
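Relationship 6 is the general case, and Relationships 1 to 5 are its special cases. A minimal Python sketch checking this over a range of sizes (the helper name `conv_out` is illustrative):

```python
def conv_out(i, k, s=1, p=0):
    """Relationship 6: o = floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1


for i in range(7, 30):
    for k in (1, 3, 5, 7):
        n = (k - 1) // 2
        assert conv_out(i, k) == (i - k) + 1                 # Relationship 1
        assert conv_out(i, k, p=2) == (i - k) + 2 * 2 + 1    # Relationship 2
        assert conv_out(i, k, p=n) == i                      # Relationship 3
        assert conv_out(i, k, p=k - 1) == i + (k - 1)        # Relationship 4
        assert conv_out(i, k, s=3) == (i - k) // 3 + 1       # Relationship 5

print(conv_out(5, 3, s=2, p=1))  # 3
```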

Chapter 3. Pooling arithmetic

Pooling is analyzed here without zero padding (p = 0).

Relationship 7. For any i, k and s,
o =\lfloor\frac{i - k}{s}\rfloor+1.

Chapter 4. Transposed convolution arithmetic

The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e. one that maps something with the shape of a convolution's output back to something with the shape of its input.
Note: transposed convolution properties don't interact across axes either.
The following sections use the same simplified setting as Chapter 2.

4.1 Convolution as a matrix operation
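A convolution is a linear map, so it can be written as a sparse matrix C that multiplies the input flattened into a vector. The pure-Python sketch below assumes s = 1, p = 0 and row-major flattening; `conv_matrix` and `conv2d_valid` are hypothetical helper names. For a 3 x 3 kernel on a 4 x 4 input, C has shape 4 x 16, and C times the flattened input reproduces the direct convolution. Multiplying by the transpose C^T goes the other way, from 4 values back to 16.

```python
def conv2d_valid(x, w):
    """Direct 2-D cross-correlation with s = 1, p = 0."""
    i, k = len(x), len(w)
    o = i - k + 1
    return [[sum(x[r + a][c + b] * w[a][b] for a in range(k) for b in range(k))
             for c in range(o)] for r in range(o)]


def conv_matrix(w, i):
    """Unroll kernel w into C of shape (o*o) x (i*i), acting on the row-major
    flattened input (s = 1, p = 0)."""
    k = len(w)
    o = i - k + 1
    C = [[0] * (i * i) for _ in range(o * o)]
    for r in range(o):
        for c in range(o):
            for a in range(k):
                for b in range(k):
                    C[r * o + c][(r + a) * i + (c + b)] = w[a][b]
    return C


x = [[4 * r + c for c in range(4)] for r in range(4)]   # 4 x 4 input
w = [[0, 1, 2], [2, 2, 0], [0, 1, 2]]                    # 3 x 3 kernel
C = conv_matrix(w, 4)
x_flat = [v for row in x for v in row]
y_flat = [sum(cij * xj for cij, xj in zip(row, x_flat)) for row in C]
y = conv2d_valid(x, w)
print(len(C), len(C[0]))                        # 4 16
print(y_flat == [v for row in y for v in row])  # True

# C^T maps the 4 output values back onto the 16 input positions.
back = [sum(C[j][m] * y_flat[j] for j in range(len(C))) for m in range(len(C[0]))]
print(len(back))                                # 16
```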

4.2 Transposed convolution

4.3 No zero padding, unit strides, transposed (p=0, s=1, C^T)

Relationship 8. A convolution described by s = 1, p = 0 and k has an associated transposed convolution described by k' = k, s' = s and p' = k - 1, and its output size is o' = i' + (k - 1):

i\xrightarrow[]{\quad k, s=1, p=0\quad} o=i-k+1
i'=o=i-k+1\xrightarrow[]{\; k'=k, s'=s, p'=k-1\;} o'=i'+2p'-k'+1=i
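Relationship 8 can be checked numerically in 1-D. In the pure-Python sketch below (helper names are illustrative), multiplying by C^T gives the same result as a direct unit-stride convolution with p' = k - 1 zeros of padding, provided that direct convolution uses the kernel reversed (rotated by 180 degrees in 2-D). The relationship itself is about sizes; the kernel reversal is simply what makes the values match in this sketch.

```python
def conv1d(x, w, p=0):
    """Unit-stride 1-D cross-correlation with p zeros added at each end."""
    k = len(w)
    xp = [0] * p + list(x) + [0] * p
    return [sum(xp[j + t] * w[t] for t in range(k)) for j in range(len(xp) - k + 1)]


def conv_matrix_1d(w, i):
    """Matrix C such that C @ x == conv1d(x, w) for inputs of size i (s = 1, p = 0)."""
    k = len(w)
    C = [[0] * i for _ in range(i - k + 1)]
    for j in range(i - k + 1):
        for t in range(k):
            C[j][j + t] = w[t]
    return C


w, i = [1, 2, 3], 5
C = conv_matrix_1d(w, i)
y = [4, -1, 2]                                   # i' = o = i - k + 1 = 3
via_matrix = [sum(C[j][m] * y[j] for j in range(len(y))) for m in range(i)]
via_conv = conv1d(y, w[::-1], p=len(w) - 1)      # k' = k, s' = 1, p' = k - 1
print(via_matrix == via_conv, len(via_conv))     # True 5
```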

4.4 Zero padding, unit strides, transposed (p>0, s=1, C^T)

Relationship 9. A convolution described by s = 1, k and p has an associated transposed convolution described by k' = k, s' = s and p' = k - p - 1, and its output size is
o' = i' + (k - 1) - 2p

i\xrightarrow[]{\quad k, s=1, p\quad} o=i+2p-k+1
i'=o\xrightarrow[]{\; k'=k, s'=s, p'\;} o'=i'+2p'-k'+1=i\implies p'=k-p-1

4.4.1 Half (same) padding, transposed (p=\frac{k-1}{2}, C^T)

Relationship 10. A convolution described by k = 2n + 1 (n\in\mathbb{N}), s = 1 and p = n has an associated transposed convolution described by k' = k, s' = s and p' = k - p - 1 = (2p + 1) - p - 1 = p, and its output size is o' = i'.

4.4.2 Full padding, transposed (p=k-1, C^T)

Relationship 11. A convolution described by s = 1, k and p = k - 1 has an associated transposed convolution described by k' = k, s' = s and p' = k - p - 1 = 0, and its output size is o' = i' - (k - 1).
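Relationships 9 to 11 can be checked together with a small Python sketch: for every unit-stride convolution with 0 <= p <= k - 1 (so that p' >= 0), running the output size through the transposed convolution should give back i. The helper `conv_out` and the variable names `i_prime`, `p_prime` are illustrative.

```python
def conv_out(i, k, s=1, p=0):
    """Relationship 6: o = floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1


for k in range(1, 8):
    for p in range(k):                          # 0 <= p <= k - 1, so p' >= 0
        p_prime = k - p - 1                     # Relationship 9
        for i in range(max(k - 2 * p, 1), 30):
            i_prime = conv_out(i, k, p=p)       # i' = o
            assert conv_out(i_prime, k, p=p_prime) == i
            if k % 2 == 1 and p == (k - 1) // 2:    # Relationship 10: p' = p, o' = i'
                assert p_prime == p and conv_out(i_prime, k, p=p_prime) == i_prime
            if p == k - 1:                          # Relationship 11: p' = 0
                assert p_prime == 0 and conv_out(i_prime, k) == i_prime - (k - 1)

print(conv_out(conv_out(5, 3, p=1), 3, p=3 - 1 - 1))  # 5
```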

4.5 No zero padding, non-unit strides, transposed (p=0, s>1, C^T)

Relationship 12. A convolution described by p = 0, k and s, and whose input size is such that i - k is a multiple of s, has an associated transposed convolution described by \tilde{i'}, k' = k, s' = 1 and p' = k - 1, where \tilde{i'} is the size of the stretched input obtained by adding s - 1 zeros between each input unit, and its output size is o' = s(i' - 1) + k.

i\xrightarrow[]{\quad k, s, p=0\quad} o=\lfloor\frac{i-k}{s}\rfloor+1
\tilde{i'}=i'+(s-1)(i'-1)=s(i'-1)+1\xrightarrow[]{\; k'=k, s'=1, p'=k-1\;} o'=\lfloor\frac{\tilde{i'}+2p'-k'}{s'}\rfloor+1=s(i'-1)+k

4.6 Zero padding, non-unit strides, transposed (p>0, s>1, C^T)

Relationship 13. A convolution described by p, k and s, and whose input size is such that i + 2p - k is a multiple of s, has an associated transposed convolution described by \tilde{i'}, k' = k, s' = 1 and p' = k - p - 1, where \tilde{i'} is the size of the stretched input obtained by adding s - 1 zeros between each input unit, and its output size is o' = s(i' - 1) + k - 2p.

i\xrightarrow[]{\quad k, s, p\quad} o=\lfloor\frac{i+2p-k}{s}\rfloor+1
\tilde{i'}=i'+(s-1)(i'-1)=s(i'-1)+1\xrightarrow[]{\; k'=k, s'=1, p'=k-p-1\;} o'=\lfloor\frac{\tilde{i'}+2p'-k'}{s'}\rfloor+1=s(i'-1)+k-2p
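Relationships 12 and 13 can be checked in 1-D with the pure-Python sketch below; all helper names are illustrative. It builds the matrix C of a strided, padded convolution, then compares C^T y with the recipe above: stretch y by inserting s - 1 zeros, pad with p' = k - p - 1 zeros and convolve with unit stride. As in the unit-stride case, the direct convolution uses the reversed kernel for the values to match. Both cases satisfy the assumption that i + 2p - k is a multiple of s.

```python
def conv1d(x, w, s=1, p=0):
    """1-D cross-correlation with stride s and p zeros added at each end."""
    k = len(w)
    xp = [0] * p + list(x) + [0] * p
    return [sum(xp[j + t] * w[t] for t in range(k))
            for j in range(0, len(xp) - k + 1, s)]


def conv_matrix_1d(w, i, s, p):
    """Matrix C such that C @ x == conv1d(x, w, s, p) for inputs of size i."""
    k = len(w)
    o = (i + 2 * p - k) // s + 1
    C = [[0] * i for _ in range(o)]
    for j in range(o):
        for t in range(k):
            col = j * s + t - p  # input index under tap t; padding has no column
            if 0 <= col < i:
                C[j][col] = w[t]
    return C


def stretch(y, s):
    """Insert s - 1 zeros between consecutive entries of y."""
    out = []
    for idx, v in enumerate(y):
        if idx:
            out.extend([0] * (s - 1))
        out.append(v)
    return out


def transposed_conv1d(y, w, s, p):
    """Stretch y, then convolve with k' = k, s' = 1, p' = k - p - 1 and the
    reversed kernel."""
    return conv1d(stretch(y, s), w[::-1], s=1, p=len(w) - p - 1)


w, s = [1, 2, 3], 2
for i, p in ((7, 0), (5, 1)):          # i + 2p - k = 4 is a multiple of s = 2
    C = conv_matrix_1d(w, i, s, p)
    y = list(range(1, len(C) + 1))     # any vector of size i' = o
    via_matrix = [sum(C[j][m] * y[j] for j in range(len(C))) for m in range(i)]
    print(via_matrix == transposed_conv1d(y, w, s, p), len(via_matrix))
    # True 7, then True 5
```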

Relationship 14. A convolution described by p, k and s has an associated transposed convolution described by a, \tilde{i'}, k' = k, s' = 1 and p' = k - p - 1, where \tilde{i'} is the size of the stretched input obtained by adding s - 1 zeros between each input unit, and a = (i + 2p - k) \bmod s represents the number of zeros added to the bottom and right edges of the input. Its output size is o' = s(i' - 1) + a + k - 2p. Unlike Relationship 13, this does not require i + 2p - k to be a multiple of s; when it is, a = 0 and the two coincide.
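Because of the floor in the forward pass, several input sizes share the same output size, and a is the extra information needed to pick the right one on the way back. A small Python sketch under illustrative helper names (`conv_out`, `transposed_out`):

```python
def conv_out(i, k, s, p):
    """Relationship 6: o = floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1


def transposed_out(i_prime, k, s, p, a):
    """Relationship 14: o' = s(i' - 1) + a + k - 2p."""
    return s * (i_prime - 1) + a + k - 2 * p


# a = (i + 2p - k) mod s recovers the original input size in every case.
for k in range(1, 6):
    for s in range(1, 5):
        for p in range(k):
            for i in range(max(k - 2 * p, 1), 30):
                a = (i + 2 * p - k) % s
                assert transposed_out(conv_out(i, k, s, p), k, s, p, a) == i

# i = 5 and i = 6 share o = 3 (k = 3, s = 2, p = 1); a tells them apart.
for i in (5, 6):
    a = (i + 2 * 1 - 3) % 2
    print(i, conv_out(i, 3, 2, 1), a)  # 5 3 0, then 6 3 1
```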

Chapter 5. Miscellaneous convolutions

5.1 Dilated convolutions

Dilated convolutions cheaply increase the receptive field of output units without increasing the kernel size. A dilation rate d inserts d - 1 spaces between consecutive kernel elements, so d = 1 corresponds to a regular convolution.
A kernel of size k dilated by a factor d has an effective size \hat{k} = k + (k - 1)(d - 1).

Relationship 15. For any i, k, p and s, and for a dilation rate d, o=\lfloor\frac{i+2p-\hat{k}}{s}\rfloor+1=\lfloor\frac{i+2p-k-(k-1)(d-1)}{s}\rfloor+1.
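A 1-D pure-Python sketch of Relationship 15, with illustrative helper names: `dilate` inserts d - 1 zeros between kernel taps, so its length is the effective size \hat{k}, and an ordinary convolution with the dilated kernel produces exactly the output size predicted by the formula.

```python
def dilate(w, d):
    """Insert d - 1 zeros between consecutive kernel taps (1-D)."""
    out = []
    for idx, v in enumerate(w):
        if idx:
            out.extend([0] * (d - 1))
        out.append(v)
    return out


def conv1d(x, w, s=1, p=0):
    """1-D cross-correlation with stride s and p zeros added at each end."""
    k = len(w)
    xp = [0] * p + list(x) + [0] * p
    return [sum(xp[j + t] * w[t] for t in range(k))
            for j in range(0, len(xp) - k + 1, s)]


def dilated_out(i, k, s=1, p=0, d=1):
    """Relationship 15, with effective kernel size k_hat = k + (k - 1)(d - 1)."""
    k_hat = k + (k - 1) * (d - 1)
    return (i + 2 * p - k_hat) // s + 1


w = [1, 2, 3]
print(dilate(w, 2))                                          # [1, 0, 2, 0, 3]
x = list(range(7))
print(len(conv1d(x, dilate(w, 2))), dilated_out(7, 3, d=2))  # 3 3
```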

Copyright belongs to the author. For reposting or content collaboration, please contact the author.
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。