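I. Main Contributions
Taking RetinaNet and FCOS as examples, the authors analyze why anchor-based and anchor-free detectors perform differently:
- 1. Number of anchors per location. RetinaNet tiles several anchors at each location, while FCOS has a single anchor point per location.
- 2. Definition of positive and negative samples. RetinaNet uses two IoU thresholds; FCOS uses spatial and scale constraints.
- 3. Starting state of regression. RetinaNet refines a preset anchor box; FCOS regresses from an anchor point.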
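The third difference can be made concrete with a minimal sketch of the two regression targets. The function names are hypothetical, and the encodings are the commonly used forms (center/size offsets for an anchor box, four side distances for an anchor point); real detectors additionally normalize these targets, which is omitted here.

```python
import math


def encode_from_anchor_box(anchor, gt):
    """RetinaNet-style target: offsets relative to a preset anchor box.

    Boxes are (x1, y1, x2, y2). Normalization by means/stds is omitted.
    """
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    return ((gx - ax) / aw, (gy - ay) / ah,
            math.log(gw / aw), math.log(gh / ah))


def encode_from_anchor_point(point, gt):
    """FCOS-style target: distances from a point to the four box sides."""
    x, y = point
    return (x - gt[0], y - gt[1], gt[2] - x, gt[3] - y)


gt = (10.0, 20.0, 50.0, 100.0)     # 40 x 80 box centered at (30, 60)
anchor = (14.0, 28.0, 46.0, 92.0)  # 32 x 64 anchor box, same center
point = (30.0, 60.0)               # anchor point at the same center

box_target = encode_from_anchor_box(anchor, gt)
point_target = encode_from_anchor_point(point, gt)
```

Both targets describe the same ground-truth box; only the reference they start from differs.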
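Main contributions of the ATSS paper:
- 1. It shows that the essential difference between anchor-based and anchor-free detectors is how positive and negative samples are defined.
- 2. It proposes assigning positive and negative samples adaptively during training, based on the statistical characteristics of each object.
- 3. It shows that tiling multiple anchors at one location to detect objects is an inefficient approach.
- 4. It reaches state-of-the-art performance on COCO without introducing any extra cost.
The paper thus raises a core question in object detection, label assignment: how should positive and negative samples be assigned?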
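II. Analyzing the Gap Between Anchor-Free and Anchor-Based Methods
To compare the two fairly, the authors use the same training setup and tricks for both, and set RetinaNet to one anchor per location. A gap of 0.8% still remains between them.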

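The authors then analyze where the remaining gap comes from:
- 1. How positive and negative samples are defined.
- 2. The starting state of regression, i.e. whether the box is regressed from an anchor box or from a center point.
Their experiments lead to the conclusion that the definition of positive and negative samples is the core cause.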
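As a rough illustration of the two sample definitions being compared, here is a minimal sketch. The function names are hypothetical; the IoU thresholds 0.5 and 0.4 and the per-level scale range are the commonly cited defaults for RetinaNet and FCOS, used here as assumptions rather than values taken from this article.

```python
def retinanet_label(iou, pos_thr=0.5, neg_thr=0.4):
    """Two-threshold IoU rule: positive, negative, or ignored in between."""
    if iou >= pos_thr:
        return "positive"
    if iou < neg_thr:
        return "negative"
    return "ignored"


def fcos_is_positive(point, gt, scale_range):
    """Spatial + scale rule for one anchor point on one pyramid level.

    point: (x, y); gt: (x1, y1, x2, y2); scale_range: (low, high) bounds on
    the largest side distance this pyramid level is responsible for.
    """
    x, y = point
    sides = (x - gt[0], y - gt[1], gt[2] - x, gt[3] - y)
    inside = min(sides) > 0                # spatial constraint
    low, high = scale_range
    return inside and low <= max(sides) <= high  # scale constraint


gt = (10.0, 20.0, 50.0, 100.0)
labels = [retinanet_label(v) for v in (0.6, 0.45, 0.3)]
on_fine_level = fcos_is_positive((30.0, 60.0), gt, (0, 64))
on_coarse_level = fcos_is_positive((30.0, 60.0), gt, (64, 128))
outside_box = fcos_is_positive((5.0, 60.0), gt, (0, 64))
```

The first rule depends only on box overlap, the second only on where the point lies and how large the object is, which is why the two detectors end up with different sets of positives for the same object.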

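III. Adaptive Training Sample Selection (ATSS)
During training, positive and negative samples are split automatically using the statistical characteristics of each object. The procedure:
1. For each ground-truth g, on every pyramid level select the k anchors whose centers are closest to the center of g by L2 distance. With L pyramid levels this gives k × L candidate positives.
2. Compute the IoU between each candidate and g, then the mean m_g and standard deviation v_g of these IoUs.
3. Combine the two statistics into a per-object threshold t_g = m_g + v_g.
4. Mark a candidate as positive if its IoU with g is at least t_g and its center lies inside the ground-truth box.
5. If an anchor box is assigned to several ground-truths, keep only the one with the highest IoU.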
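The five steps can be sketched in plain Python as follows. This is a simplified illustration, not the reference implementation: the names `iou`, `center` and `atss_assign` are hypothetical, anchors are passed as one list per pyramid level, and the sample standard deviation is an assumption about which estimator to use.

```python
import math
from statistics import mean, stdev


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def atss_assign(levels, gts, k):
    """Return {(level, anchor index): gt index} for the positive anchors."""
    best = {}  # anchor id -> (IoU, gt index)
    for g_idx, g in enumerate(gts):
        gc = center(g)
        # Step 1: on each level, take the k anchors closest to the gt center.
        candidates = []
        for lvl, anchors in enumerate(levels):
            order = sorted(range(len(anchors)),
                           key=lambda i: math.dist(center(anchors[i]), gc))
            candidates += [(lvl, i) for i in order[:k]]
        # Step 2: IoU of every candidate with this gt.
        ious = [iou(levels[lvl][i], g) for lvl, i in candidates]
        # Step 3: threshold = mean + standard deviation.
        thr = mean(ious) + (stdev(ious) if len(ious) > 1 else 0.0)
        for (lvl, i), v in zip(candidates, ious):
            cx, cy = center(levels[lvl][i])
            inside = g[0] < cx < g[2] and g[1] < cy < g[3]
            # Step 4: IoU above threshold and center inside the gt.
            if v >= thr and inside:
                # Step 5: on conflict keep the gt with the highest IoU.
                if (lvl, i) not in best or v > best[(lvl, i)][0]:
                    best[(lvl, i)] = (v, g_idx)
    return {anchor: g_idx for anchor, (_, g_idx) in best.items()}


levels = [
    [(0, 0, 10, 10), (2, 2, 12, 12), (40, 40, 50, 50)],     # small anchors
    [(-5, -5, 15, 15), (0, 0, 20, 20), (30, 30, 50, 50)],   # large anchors
]
gts = [(0, 0, 10, 10), (38, 38, 52, 52)]
assignment = atss_assign(levels, gts, k=2)
```

In this toy setup each ground-truth ends up with a single positive anchor on the small-anchor level, since that is the only candidate whose IoU clears its per-object threshold.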
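1. Why select candidate positives by the Euclidean distance between centers?
For both RetinaNet and FCOS, predictions get better the closer the anchor is to the center of the ground-truth.
2. Why use the mean and standard deviation as the IoU threshold?
They let the threshold for splitting positives and negatives adapt to each object. A high variance usually means that one FPN level produces notably higher IoUs, i.e. that level suits the object particularly well, so the final positives all come from that level. A low variance means that several FPN levels suit the object, so positives are selected from several levels.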
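A small numeric sketch of this behavior, using made-up candidate IoUs for three pyramid levels (the level names and values are purely illustrative, and the sample standard deviation is an assumption):

```python
from statistics import mean, stdev


def positives_per_level(ious_by_level):
    """Apply the mean + std threshold and count positives on each level."""
    all_ious = [v for ious in ious_by_level.values() for v in ious]
    thr = mean(all_ious) + stdev(all_ious)
    counts = {lvl: sum(v >= thr for v in ious)
              for lvl, ious in ious_by_level.items()}
    return thr, counts


# One level clearly dominates: high standard deviation, high threshold.
peaked = {"P3": [0.05, 0.08, 0.06],
          "P4": [0.65, 0.70, 0.75],
          "P5": [0.10, 0.12, 0.09]}

# Several levels fit about equally well: lower standard deviation.
spread = {"P3": [0.20, 0.22, 0.45],
          "P4": [0.21, 0.23, 0.46],
          "P5": [0.20, 0.21, 0.44]}

peaked_thr, peaked_counts = positives_per_level(peaked)
spread_thr, spread_counts = positives_per_level(spread)
```

In the first case the threshold lands near 0.6 and only P4 candidates survive; in the second it lands near 0.41 and each level contributes one positive.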
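3. Why require the center of the anchor box to lie inside the ground-truth?
Anchor boxes whose centers fall outside the ground-truth are usually poor candidates, because they would predict the object from features outside it.
4. Is this label assignment an effective way to split positives and negatives?
Statistically, even though the IoUs do not follow an exact normal distribution, roughly 16% of the candidates are still expected to be selected as positives, so every ground-truth receives a similar number of positives regardless of its scale, aspect ratio, and location. By contrast, under the RetinaNet and FCOS assignment strategies larger objects get more positives, which is not a fair treatment.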
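The 16% figure matches the upper tail of a normal distribution beyond one standard deviation above the mean. A quick check, where `k = 9` and five pyramid levels are assumed values chosen only to show the resulting order of magnitude:

```python
import math


def normal_upper_tail(z):
    """P(X > mean + z * std) for a normally distributed X."""
    return 0.5 * math.erfc(z / math.sqrt(2))


fraction = normal_upper_tail(1.0)  # share of candidates above mean + std

k, num_levels = 9, 5               # assumed values, for illustration only
expected_positives = fraction * k * num_levels
```

Under the normality assumption this gives about 7 positives out of 45 candidates per object; real IoU distributions deviate from this, as the text notes.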
5. How should the hyperparameter k be chosen?
The results are not sensitive to the choice of k.
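IV. Results
1. With ATSS, there is no significant gap between RetinaNet and FCOS.

2. The results are robust to anchor boxes of different scales and aspect ratios.

3. With the ATSS strategy, the number of anchors per location has no noticeable effect on the results.
4. Performance of ATSS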

V. Source Code
The code below follows the mmdetection implementation:
import torch

# BBOX_ASSIGNERS, BaseAssigner, AssignResult and build_iou_calculator are
# mmdetection internals; their import paths depend on the mmdetection version.


@BBOX_ASSIGNERS.register_module()
class ATSSAssigner(BaseAssigner):
    """Assign a corresponding gt bbox or background to each bbox.

    Each proposal will be assigned with `0` or a positive integer
    indicating the ground truth index.

    - 0: negative sample, no assigned gt
    - positive integer: positive sample, index (1-based) of assigned gt

    Args:
        topk (int): number of bbox selected in each level
    """

    def __init__(self,
                 topk,
                 iou_calculator=dict(type='BboxOverlaps2D'),
                 ignore_iof_thr=-1):
        self.topk = topk
        self.iou_calculator = build_iou_calculator(iou_calculator)
        self.ignore_iof_thr = ignore_iof_thr

    # https://github.com/sfzhang15/ATSS/blob/master/atss_core/modeling/rpn/atss/loss.py
    def assign(self,
               bboxes,
               num_level_bboxes,
               gt_bboxes,
               gt_bboxes_ignore=None,
               gt_labels=None):
        """Assign gt to bboxes.

        The assignment is done in following steps

        1. compute iou between all bbox (bbox of all pyramid levels) and gt
        2. compute center distance between all bbox and gt
        3. on each pyramid level, for each gt, select k bbox whose center
           are closest to the gt center, so we total select k*l bbox as
           candidates for each gt
        4. get corresponding iou for the these candidates, and compute the
           mean and std, set mean + std as the iou threshold
        5. select these candidates whose iou are greater than or equal to
           the threshold as positive
        6. limit the positive sample's center in gt

        Args:
            bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
            num_level_bboxes (List): num of bboxes in each level
            gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
            gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                labelled as `ignored`, e.g., crowd boxes in COCO.
            gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.
        """
        INF = 100000000
        bboxes = bboxes[:, :4]
        num_gt, num_bboxes = gt_bboxes.size(0), bboxes.size(0)

        # compute iou between all bbox and gt, shape (num_bboxes, num_gt)
        overlaps = self.iou_calculator(bboxes, gt_bboxes)

        # assign 0 by default
        assigned_gt_inds = overlaps.new_full((num_bboxes, ),
                                             0,
                                             dtype=torch.long)

        if num_gt == 0 or num_bboxes == 0:
            # No ground truth or boxes, return empty assignment
            max_overlaps = overlaps.new_zeros((num_bboxes, ))
            if num_gt == 0:
                # No truth, assign everything to background
                assigned_gt_inds[:] = 0
            if gt_labels is None:
                assigned_labels = None
            else:
                assigned_labels = overlaps.new_full((num_bboxes, ),
                                                    -1,
                                                    dtype=torch.long)
            return AssignResult(
                num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)

        # compute center distance between all bbox and gt
        gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
        gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
        gt_points = torch.stack((gt_cx, gt_cy), dim=1)

        bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0
        bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0
        bboxes_points = torch.stack((bboxes_cx, bboxes_cy), dim=1)

        # Euclidean distance, shape (num_bboxes, num_gt)
        distances = (bboxes_points[:, None, :] -
                     gt_points[None, :, :]).pow(2).sum(-1).sqrt()

        # bboxes that mostly fall inside an ignored region are excluded from
        # candidate selection and marked with -1
        if (self.ignore_iof_thr > 0 and gt_bboxes_ignore is not None
                and gt_bboxes_ignore.numel() > 0 and bboxes.numel() > 0):
            ignore_overlaps = self.iou_calculator(
                bboxes, gt_bboxes_ignore, mode='iof')
            ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
            ignore_idxs = ignore_max_overlaps > self.ignore_iof_thr
            distances[ignore_idxs, :] = INF
            assigned_gt_inds[ignore_idxs] = -1

        # Selecting candidates based on the center distance
        candidate_idxs = []
        start_idx = 0
        for level, bboxes_per_level in enumerate(num_level_bboxes):
            # on each pyramid level, for each gt,
            # select k bbox whose center are closest to the gt center
            end_idx = start_idx + bboxes_per_level
            distances_per_level = distances[start_idx:end_idx, :]
            selectable_k = min(self.topk, bboxes_per_level)
            _, topk_idxs_per_level = distances_per_level.topk(
                selectable_k, dim=0, largest=False)
            candidate_idxs.append(topk_idxs_per_level + start_idx)
            start_idx = end_idx
        # shape (k * num_levels, num_gt), values are bbox indices
        candidate_idxs = torch.cat(candidate_idxs, dim=0)

        # get corresponding iou for the these candidates, and compute the
        # mean and std, set mean + std as the iou threshold
        candidate_overlaps = overlaps[candidate_idxs, torch.arange(num_gt)]
        overlaps_mean_per_gt = candidate_overlaps.mean(0)
        overlaps_std_per_gt = candidate_overlaps.std(0)
        overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt

        is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :]

        # limit the positive sample's center in gt
        # offset each gt column so the indices address a flattened
        # (num_gt, num_bboxes) layout
        for gt_idx in range(num_gt):
            candidate_idxs[:, gt_idx] += gt_idx * num_bboxes
        ep_bboxes_cx = bboxes_cx.view(1, -1).expand(
            num_gt, num_bboxes).contiguous().view(-1)
        ep_bboxes_cy = bboxes_cy.view(1, -1).expand(
            num_gt, num_bboxes).contiguous().view(-1)
        candidate_idxs = candidate_idxs.view(-1)

        # calculate the left, top, right, bottom distance between positive
        # bbox center and gt side
        l_ = ep_bboxes_cx[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 0]
        t_ = ep_bboxes_cy[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 1]
        r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].view(-1, num_gt)
        b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].view(-1, num_gt)
        is_in_gts = torch.stack([l_, t_, r_, b_], dim=1).min(dim=1)[0] > 0.01
        is_pos = is_pos & is_in_gts

        # if an anchor box is assigned to multiple gts,
        # the one with the highest IoU will be selected.
        overlaps_inf = torch.full_like(overlaps,
                                       -INF).t().contiguous().view(-1)
        index = candidate_idxs.view(-1)[is_pos.view(-1)]
        overlaps_inf[index] = overlaps.t().contiguous().view(-1)[index]
        overlaps_inf = overlaps_inf.view(num_gt, -1).t()

        max_overlaps, argmax_overlaps = overlaps_inf.max(dim=1)
        assigned_gt_inds[
            max_overlaps != -INF] = argmax_overlaps[max_overlaps != -INF] + 1

        if gt_labels is not None:
            assigned_labels = assigned_gt_inds.new_full((num_bboxes, ), -1)
            pos_inds = torch.nonzero(
                assigned_gt_inds > 0, as_tuple=False).squeeze()
            if pos_inds.numel() > 0:
                assigned_labels[pos_inds] = gt_labels[
                    assigned_gt_inds[pos_inds] - 1]
        else:
            assigned_labels = None
        return AssignResult(
            num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)
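
The least obvious part of the listing is the final conflict resolution, where bbox indices are offset by `gt_idx * num_bboxes` and scattered into a flattened buffer. The plain-Python sketch below mirrors that indexing on nested lists; `resolve_conflicts` is a hypothetical helper written for illustration and is not part of mmdetection.

```python
NEG_INF = float("-inf")


def resolve_conflicts(overlaps, positives):
    """Pick, for every bbox, the positive gt with the highest IoU.

    overlaps[b][g]: IoU of bbox b with gt g.
    positives: (bbox index, gt index) pairs that passed the threshold and
        center-in-gt tests.
    Returns a list where 0 means background and g + 1 means gt g.
    """
    num_bboxes = len(overlaps)
    num_gt = len(overlaps[0])
    # Flat gt-major buffer: entry g * num_bboxes + b belongs to (gt g, bbox b).
    flat = [NEG_INF] * (num_gt * num_bboxes)
    for b, g in positives:
        flat[g * num_bboxes + b] = overlaps[b][g]
    assigned = [0] * num_bboxes
    for b in range(num_bboxes):
        row = [flat[g * num_bboxes + b] for g in range(num_gt)]
        best = max(row)
        if best != NEG_INF:          # bbox is positive for at least one gt
            assigned[b] = row.index(best) + 1
    return assigned


overlaps = [[0.6, 0.7],   # bbox 0 is positive for both gts
            [0.1, 0.0],   # bbox 1 is positive for none
            [0.0, 0.8]]   # bbox 2 is positive for gt 1 only
positives = [(0, 0), (0, 1), (2, 1)]
assigned = resolve_conflicts(overlaps, positives)
```

Entries that never become positive keep the sentinel value, so they stay at 0 (background), while bbox 0 is resolved in favor of the gt with the larger IoU.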




