Paper link: https://arxiv.org/abs/1708.05237
Contributions:
1) proposing a scale-equitable face detection framework to handle different scales of faces well.
2) improving the recall rate of small faces by a scale compensation anchor matching strategy.
3) reducing the false positive rate of small faces via a max-out background label.
The authors essentially build on the SSD network architecture and make several improvements tailored to the characteristics of face detection datasets.
Shortcomings of traditional anchor-based detection methods:
Comparing with other methods, anchor-based detection methods are more robust in complicated scenes and their speed is invariant to object numbers. However, as indicated in [12], the performance of anchor-based detectors drop dramatically as the objects becoming smaller.
Anchor-based methods are not scale-invariant: they detect large objects well but perform poorly on small ones.
Reasons for the lack of scale invariance:


1. Biased framework (an ill-suited network architecture)
(1) Firstly, the stride size of the lowest anchor-associated layer is too large (e.g., 8 pixels in [26] and 16 pixels in [38]), therefore small and medium faces have been highly squeezed on these layers and have few features for detection. See Fig. 1(a).
The strides of the later convolutional layers become very large; for example, conv5_3 has a stride of 16. As a result, some small objects are missed.
(2) Secondly, small face, anchor scale and receptive field are mutually mismatched: anchor scale mismatches receptive field and both are too large to fit small face. See Fig. 1(b).
The anchor scales are not well designed for small targets.
2. Anchor matching strategy.
Those faces whose scale distribute away from anchor scales can not match enough anchors, such as tiny and outer face in Fig. 1(c), leading to their low recall rate.
Because of how the anchors are designed, some small faces do not have enough anchors matched to them, which lowers the detection rate.
3. Background from small anchors.
As illustrated in Fig. 1(d), these small anchors lead to a sharp increase in the number of negative anchors on the background, bringing about many false positive faces.
Reducing the anchor scale (e.g., adding small-scale anchors at conv3_3) greatly increases the number of negative samples.
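A rough count shows why small anchors flood the background. The sketch below assumes a 640x640 input and one anchor per feature-map location on detection layers with strides 4 to 128; the input size, the stride list, and the helper name `anchors_per_layer` are assumptions made for illustration and are not taken from these notes.

```python
# Assumed for illustration: 640x640 input, one anchor per feature-map location.
INPUT_SIZE = 640
STRIDES = [4, 8, 16, 32, 64, 128]  # assumed strides of the detection layers


def anchors_per_layer(input_size, strides):
    """Return how many anchor positions each stride tiles over a square image."""
    return [(input_size // s) ** 2 for s in strides]


counts = anchors_per_layer(INPUT_SIZE, STRIDES)
total = sum(counts)
for stride, count in zip(STRIDES, counts):
    # Share of all anchors contributed by this layer.
    print(f"stride {stride:>3}: {count:>6} anchors ({count / total:.1%})")
```

Under these assumptions the stride-4 layer alone contributes roughly three quarters of all anchors, and almost all of them land on background.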
To address these problems with traditional methods, the paper proposes the following:
1. Scale-equitable face detection framework:


Fig. 3(a) shows that the effective receptive field is much smaller than the theoretical receptive field. According to this theory, the anchor should be significantly smaller than theoretical receptive field in order to match the effective receptive field (see the specific example in Fig. 3(b)).
As shown in the second and third column in Tab. 1, the scales of our anchors are 4 times its interval. We call it equal-proportion interval principle (illustrated in Fig. 3(c)), which guarantees that different scales of anchor have the same density on the image, so that various scales face can approximately match the same number of anchors.
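The network still follows the SSD architecture. Because the anchor scales in the original network are somewhat large, the authors reset them. They also argue that the stride determines the interval between anchors, so each layer's anchor scale is set to 4 times that layer's stride (equivalently, the stride is 1/4 of the anchor scale). They call this the equal-proportion interval principle.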
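The equal-proportion interval principle can be checked with a few lines of arithmetic. In this sketch the only rule taken from the text is that an anchor's scale is 4 times its interval (the stride); the stride list and the function names are assumptions for illustration.

```python
ANCHOR_TO_STRIDE = 4  # from the text: anchor scale is 4 times its interval


def anchor_scales(strides, ratio=ANCHOR_TO_STRIDE):
    """Derive one anchor scale per detection layer from that layer's stride."""
    return [ratio * s for s in strides]


def anchors_per_anchor_length(scale, stride):
    """Number of anchor centers along one anchor-sized length (a density proxy)."""
    return scale / stride


strides = [4, 8, 16, 32, 64, 128]  # assumed strides of the detection layers
scales = anchor_scales(strides)
densities = [anchors_per_anchor_length(sc, st) for sc, st in zip(scales, strides)]

for st, sc, d in zip(strides, scales, densities):
    print(f"stride {st:>3} -> anchor {sc:>3} px, {d:.0f} anchors per anchor length")
```

Because the ratio is fixed, the density proxy is identical on every layer, which is why faces of different sizes can match roughly the same number of anchors.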
2. Scale compensation anchor matching strategy
So that small objects can match enough anchors, the matching threshold is lowered appropriately.
Stage one: We follow current anchor matching method but decrease threshold from 0.5 to 0.35 in order to increase the average number of matched anchors.
Stage two: After stage one, some faces still do not match enough anchors, such as tiny and outer faces marked with the gray dotted curve in Fig. 4(a). We deal with each of these faces as follow:
Firstly picking out anchors whose jaccard overlap with this face are higher than 0.1, then sorting them to select top-N as matched anchors of this face. We set N as the average number from stage one.
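The two stages above can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: boxes are `(x1, y1, x2, y2)` tuples, "not enough anchors" is assumed to mean fewer than N, N is obtained by rounding the stage-one average, and an anchor may be matched to more than one face. The function names and the toy anchor grid are hypothetical.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(faces, anchors, stage1_thresh=0.35, stage2_thresh=0.1):
    """Return, for each face, the list of matched anchor indices."""
    overlaps = [[iou(f, a) for a in anchors] for f in faces]
    # Stage one: ordinary matching with the lowered threshold.
    matches = [[i for i, o in enumerate(row) if o > stage1_thresh] for row in overlaps]
    if not matches:
        return matches
    # N = average number of matches from stage one (rounding is an assumption).
    n = round(sum(len(m) for m in matches) / len(matches))
    # Stage two: compensate faces that matched fewer than N anchors (assumed criterion).
    for k, row in enumerate(overlaps):
        if len(matches[k]) < n:
            candidates = [i for i, o in enumerate(row) if o > stage2_thresh]
            candidates.sort(key=lambda i: row[i], reverse=True)
            matches[k] = candidates[:n]
    return matches


# Toy setup: 16x16 anchors tiled every 4 pixels, two 16x16 faces and one 8x8 face.
anchors = [(x, y, x + 16, y + 16) for y in range(0, 49, 4) for x in range(0, 49, 4)]
faces = [(16, 16, 32, 32), (32, 16, 48, 32), (24, 44, 32, 52)]
matches = match_anchors(faces, anchors)
print([len(m) for m in matches])
```

In this toy setup the 8x8 face cannot reach an overlap of 0.35 with any 16x16 anchor, so it receives no matches in stage one and is assigned its top-N anchors in stage two.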
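3. Max-out background label
This method is intended to balance the ratio of negative to positive samples. The details are as follows, although I did not fully understand them.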
We propose to apply a more sophisticated classification strategy on the lowest layer to handle the complicated background from small anchors. We apply the max-out background label for the conv3_3 detection layer. For each of the smallest anchors, we predict Nm (Nm is the maxout background label) scores for background label and then choose the highest as its final score, as illustrated in Fig. 4(b).
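One way to read the max-out step is shown below for a single anchor. It is a minimal sketch under stated assumptions: the raw scores are laid out as Nm background scores followed by one face score, Nm = 3 is an arbitrary example value, and the softmax applied afterwards is an assumption about how the two remaining scores are turned into probabilities. The function names and the numbers are hypothetical.

```python
import math


def maxout_background(scores, num_bg):
    """Collapse num_bg background scores into one by taking the highest.

    `scores` is assumed to be [bg_1, ..., bg_Nm, face] for a single anchor.
    """
    background = max(scores[:num_bg])
    face = scores[num_bg]
    return [background, face]


def softmax(logits):
    """Numerically stable softmax over a short list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


raw = [0.2, 1.5, -0.3, 1.0]  # hypothetical scores: three background, one face
bg_face = maxout_background(raw, num_bg=3)
probs = softmax(bg_face)
print(bg_face, probs)
```

With only the first background score (0.2), the face score (1.0) would win; taking the highest of the three background scores (1.5) flips this anchor to background, which is how the extra background predictions can suppress false positives from small anchors.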