Adaptive Training Sample Selection

I. Main Contributions

Using RetinaNet and FCOS as examples, the authors analyze the causes of the performance gap between anchor-based and anchor-free detectors:

  • 1. The number of anchors per location: RetinaNet tiles multiple anchors at each point, while FCOS places a single anchor point.
  • 2. The definition of positive and negative samples: RetinaNet uses dual IoU thresholds, while FCOS uses spatial and scale constraints.
  • 3. The regression starting state: RetinaNet regresses from prior anchor boxes, while FCOS regresses from anchor points.
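The contrast in point 2 can be sketched with toy assignment rules. This is a minimal illustration, not the detectors' actual code: 0.5/0.4 are RetinaNet's default IoU thresholds, and the FCOS rule is reduced to "center inside the box and max regression distance within the level's size range"; all function names are made up.

```python
import torch

def retinanet_assign(ious, pos_thr=0.5, neg_thr=0.4):
    """IoU double-threshold rule for one gt box: >= pos_thr is positive,
    < neg_thr is negative, and anything in between is ignored."""
    labels = torch.full_like(ious, -1)   # -1 = ignored
    labels[ious < neg_thr] = 0           # negative
    labels[ious >= pos_thr] = 1          # positive
    return labels

def fcos_assign(points, gt_box, size_range):
    """Simplified FCOS rule: a location is positive if it falls inside the
    gt box and its max regression distance fits this level's size range."""
    x, y = points[:, 0], points[:, 1]
    # left/top/right/bottom distances from the location to the gt sides
    dist = torch.stack([x - gt_box[0], y - gt_box[1],
                        gt_box[2] - x, gt_box[3] - y], dim=1)
    inside = dist.min(dim=1).values > 0
    max_dist = dist.max(dim=1).values
    in_range = (max_dist >= size_range[0]) & (max_dist <= size_range[1])
    return inside & in_range
```

The two rules select noticeably different sets of positives for the same object, which is exactly the factor the paper isolates.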

Main contributions of the ATSS paper:

  • 1. Showing that the essential difference between anchor-based and anchor-free detection methods lies in how positive and negative samples are defined.
  • 2. Proposing an adaptive positive/negative sample assignment during training, based on statistical characteristics of each object.
  • 3. Demonstrating that tiling multiple anchors per location to detect objects is inefficient.
  • 4. Achieving state-of-the-art performance on COCO without any additional overhead.

This raises a core question in object detection, namely label assignment: how should positive and negative samples be assigned?

II. Analyzing the Gap between Anchor-free and Anchor-based Methods

For a fair comparison of the two detectors' real differences, the authors use identical training settings and tricks, and set the number of anchors per location in RetinaNet to 1. Even then, a 0.8% AP gap remains.


The authors then analyze the remaining causes of the gap:

  • 1. The definition of positive and negative samples.
  • 2. The regression starting state, i.e., regressing from an anchor box versus from a center point.

Controlled experiments swapping these two factors lead to the conclusion that the definition of positive and negative samples is the core cause of the gap.

III. Adaptive Training Sample Selection

During training, positive and negative samples are divided automatically based on statistical characteristics of each object. The procedure is:

  • 1. For each ground-truth box g, select the k anchors whose centers are closest (by L2 distance) to the center of g on each pyramid level; with \mathcal L pyramid levels, this yields k \times \mathcal L candidate positive samples in total.

  • 2. Compute the IoU between each selected candidate and g, along with the mean m_g and standard deviation v_g of these IoUs.

  • 3. Combine these two statistics into the IoU threshold t_g = m_g + v_g.

  • 4. Mark a candidate as positive if its IoU with g is at least t_g and its center lies inside the ground-truth box.

  • 5. If an anchor box is assigned to multiple ground-truth boxes, keep only the assignment with the highest IoU.

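The five steps above can be sketched end-to-end for a single ground-truth box. This is a self-contained toy version, not the paper's code: the IoU helper and all names are illustrative (the full multi-gt mmdetection implementation appears in section V).

```python
import torch

def atss_select(anchors, gt_box, num_level_anchors, topk=9):
    """Minimal ATSS positive-sample selection for ONE ground-truth box.

    anchors: (N, 4) [x1, y1, x2, y2], concatenated over pyramid levels
    gt_box: (4,) tensor
    num_level_anchors: number of anchors on each level (sums to N)
    Returns a boolean mask marking the positive anchors.
    """
    def iou(boxes, gt):
        # intersection-over-union of each box with the single gt box
        lt = torch.max(boxes[:, :2], gt[:2])
        rb = torch.min(boxes[:, 2:], gt[2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, 0] * wh[:, 1]
        area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
        return inter / (area_a + area_g - inter)

    ious = iou(anchors, gt_box)
    centers = (anchors[:, :2] + anchors[:, 2:]) / 2
    gt_center = (gt_box[:2] + gt_box[2:]) / 2
    dists = (centers - gt_center).pow(2).sum(-1).sqrt()

    # step 1: pick the k anchors closest to the gt center on each level
    candidate_idxs, start = [], 0
    for n in num_level_anchors:
        k = min(topk, n)
        _, idxs = dists[start:start + n].topk(k, largest=False)
        candidate_idxs.append(idxs + start)
        start += n
    candidate_idxs = torch.cat(candidate_idxs)

    # steps 2-3: adaptive threshold t_g = mean + std of candidate IoUs
    cand_ious = ious[candidate_idxs]
    t_g = cand_ious.mean() + cand_ious.std()

    # step 4: keep candidates above t_g whose center lies inside the gt box
    inside = ((centers[candidate_idxs] > gt_box[:2]) &
              (centers[candidate_idxs] < gt_box[2:])).all(dim=1)
    keep = (cand_ious >= t_g) & inside

    pos = torch.zeros(anchors.shape[0], dtype=torch.bool)
    pos[candidate_idxs[keep]] = True
    return pos
```

With multiple ground-truth boxes, step 5 (resolving anchors claimed by several objects via max IoU) would be applied on top of these per-gt masks.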

  • 1. Why select candidate positives by the Euclidean distance between centers?
    For both RetinaNet and FCOS, anchors closer to the center of the ground-truth box tend to produce better predictions.

  • 2. Why use the mean plus the standard deviation as the IoU threshold?
    It adapts the positive/negative threshold to each object. A high standard deviation usually means one FPN level has markedly higher IoUs, i.e., that level is especially well suited to the object, so the raised threshold confines the positives to that level. A low standard deviation means several FPN levels suit the object, so the lower threshold lets positives come from multiple levels.


  • 3. Why must the center of an anchor box lie inside the ground-truth box?
    Anchor boxes whose centers fall outside the ground-truth box tend to be poor candidates: they would have to predict the object using features from outside it.

  • 4. Is this label-assignment strategy effective?
    Statistically, about 16% of the samples of a normal distribution lie above the mean plus one standard deviation. Although the candidate IoUs do not follow an exact normal distribution, roughly 0.2 \times k \times \mathcal L candidates are still selected as positives for each ground-truth box, regardless of its scale, aspect ratio, or location. By contrast, the assignment strategies of RetinaNet and FCOS give larger objects more positive samples, which is not a fair treatment.

  • 5. How should the hyperparameter k be chosen?
    The results are insensitive to the choice of k.

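Points 2 and 4 above can be illustrated numerically. The IoU values below are made up for the example (three FPN levels, three candidates each); only the mean + std mechanics follow the paper.

```python
import math
import torch

# Under a normal distribution, the fraction of samples above mean + std is
# 1 - Phi(1) ~= 0.1587 -- the "about 16%" used in the paper's argument.
frac_above = 0.5 * (1 - math.erf(1 / math.sqrt(2)))

# High variance across candidate IoUs: one FPN level clearly dominates,
# so mean + std rises and only that level's candidates survive.
high_var = torch.tensor([0.05, 0.06, 0.07,   # level 1
                         0.08, 0.09, 0.10,   # level 2
                         0.60, 0.65, 0.70])  # level 3: fits the object
t_high = high_var.mean() + high_var.std()

# Low variance: several levels fit the object, so the threshold stays low
# and positives can come from more than one level.
low_var = torch.tensor([0.35, 0.40, 0.45,
                        0.38, 0.42, 0.44,
                        0.36, 0.41, 0.43])
t_low = low_var.mean() + low_var.std()
```

In the high-variance case only the level-3 candidates (indices 6-8) clear `t_high`, matching the intuition in point 2.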

IV. Experimental Validation

1. With ATSS applied, RetinaNet and FCOS show no significant gap.


2. The method is robust to anchor boxes of different scales and aspect ratios.


3. With the ATSS strategy, the number of anchors per location has no significant effect on the results.

4. Overall performance of ATSS.

V. Source Code

The code below follows the mmdetection implementation:

import torch

from ..builder import BBOX_ASSIGNERS
from ..iou_calculators import build_iou_calculator
from .assign_result import AssignResult
from .base_assigner import BaseAssigner


@BBOX_ASSIGNERS.register_module()
class ATSSAssigner(BaseAssigner):
    """Assign a corresponding gt bbox or background to each bbox.

    Each proposal will be assigned with `0` or a positive integer
    indicating the ground truth index.

    - 0: negative sample, no assigned gt
    - positive integer: positive sample, index (1-based) of assigned gt

    Args:
        topk (int): number of bboxes selected on each level
    """

    def __init__(self,
                 topk,
                 iou_calculator=dict(type='BboxOverlaps2D'),
                 ignore_iof_thr=-1):
        self.topk = topk
        self.iou_calculator = build_iou_calculator(iou_calculator)
        self.ignore_iof_thr = ignore_iof_thr

    # https://github.com/sfzhang15/ATSS/blob/master/atss_core/modeling/rpn/atss/loss.py

    def assign(self,
               bboxes,
               num_level_bboxes,
               gt_bboxes,
               gt_bboxes_ignore=None,
               gt_labels=None):
        """Assign gt to bboxes.

        The assignment is done in the following steps:

        1. compute iou between all bbox (bbox of all pyramid levels) and gt
        2. compute center distance between all bbox and gt
        3. on each pyramid level, for each gt, select k bboxes whose centers
           are closest to the gt center, so we select k*l bboxes in total as
           candidates for each gt
        4. get the corresponding iou for these candidates, and compute the
           mean and std; set mean + std as the iou threshold
        5. select the candidates whose iou is greater than or equal to
           the threshold as positive
        6. limit the positive samples' centers to lie inside the gt


        Args:
            bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
            num_level_bboxes (List): num of bboxes in each level
            gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
            gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                labelled as `ignored`, e.g., crowd boxes in COCO.
            gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.
        """
        INF = 100000000
        bboxes = bboxes[:, :4]
        num_gt, num_bboxes = gt_bboxes.size(0), bboxes.size(0)

        # compute iou between all bbox and gt
        overlaps = self.iou_calculator(bboxes, gt_bboxes)

        # assign 0 by default
        assigned_gt_inds = overlaps.new_full((num_bboxes, ),
                                             0,
                                             dtype=torch.long)

        if num_gt == 0 or num_bboxes == 0:
            # No ground truth or boxes, return empty assignment
            max_overlaps = overlaps.new_zeros((num_bboxes, ))
            if num_gt == 0:
                # No truth, assign everything to background
                assigned_gt_inds[:] = 0
            if gt_labels is None:
                assigned_labels = None
            else:
                assigned_labels = overlaps.new_full((num_bboxes, ),
                                                    -1,
                                                    dtype=torch.long)
            return AssignResult(
                num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)

        # compute center distance between all bbox and gt
        gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
        gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
        gt_points = torch.stack((gt_cx, gt_cy), dim=1)

        bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0
        bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0
        bboxes_points = torch.stack((bboxes_cx, bboxes_cy), dim=1)

        distances = (bboxes_points[:, None, :] -
                     gt_points[None, :, :]).pow(2).sum(-1).sqrt()

        if (self.ignore_iof_thr > 0 and gt_bboxes_ignore is not None
                and gt_bboxes_ignore.numel() > 0 and bboxes.numel() > 0):
            ignore_overlaps = self.iou_calculator(
                bboxes, gt_bboxes_ignore, mode='iof')
            ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
            ignore_idxs = ignore_max_overlaps > self.ignore_iof_thr
            distances[ignore_idxs, :] = INF
            assigned_gt_inds[ignore_idxs] = -1

        # Selecting candidates based on the center distance
        candidate_idxs = []
        start_idx = 0
        for level, bboxes_per_level in enumerate(num_level_bboxes):
            # on each pyramid level, for each gt,
            # select k bbox whose center are closest to the gt center
            end_idx = start_idx + bboxes_per_level
            distances_per_level = distances[start_idx:end_idx, :]
            selectable_k = min(self.topk, bboxes_per_level)
            _, topk_idxs_per_level = distances_per_level.topk(
                selectable_k, dim=0, largest=False)
            candidate_idxs.append(topk_idxs_per_level + start_idx)
            start_idx = end_idx
        candidate_idxs = torch.cat(candidate_idxs, dim=0)

        # get the corresponding iou for these candidates, and compute the
        # mean and std; set mean + std as the iou threshold
        candidate_overlaps = overlaps[candidate_idxs, torch.arange(num_gt)]
        overlaps_mean_per_gt = candidate_overlaps.mean(0)
        overlaps_std_per_gt = candidate_overlaps.std(0)
        overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt

        is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :]

        # limit the positive sample's center in gt
        for gt_idx in range(num_gt):
            candidate_idxs[:, gt_idx] += gt_idx * num_bboxes
        ep_bboxes_cx = bboxes_cx.view(1, -1).expand(
            num_gt, num_bboxes).contiguous().view(-1)
        ep_bboxes_cy = bboxes_cy.view(1, -1).expand(
            num_gt, num_bboxes).contiguous().view(-1)
        candidate_idxs = candidate_idxs.view(-1)

        # calculate the left, top, right, bottom distance between positive
        # bbox center and gt side
        l_ = ep_bboxes_cx[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 0]
        t_ = ep_bboxes_cy[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 1]
        r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].view(-1, num_gt)
        b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].view(-1, num_gt)
        is_in_gts = torch.stack([l_, t_, r_, b_], dim=1).min(dim=1)[0] > 0.01
        is_pos = is_pos & is_in_gts

        # if an anchor box is assigned to multiple gts,
        # the one with the highest IoU will be selected.
        overlaps_inf = torch.full_like(overlaps,
                                       -INF).t().contiguous().view(-1)
        index = candidate_idxs.view(-1)[is_pos.view(-1)]
        overlaps_inf[index] = overlaps.t().contiguous().view(-1)[index]
        overlaps_inf = overlaps_inf.view(num_gt, -1).t()

        max_overlaps, argmax_overlaps = overlaps_inf.max(dim=1)
        assigned_gt_inds[
            max_overlaps != -INF] = argmax_overlaps[max_overlaps != -INF] + 1

        if gt_labels is not None:
            assigned_labels = assigned_gt_inds.new_full((num_bboxes, ), -1)
            pos_inds = torch.nonzero(
                assigned_gt_inds > 0, as_tuple=False).squeeze()
            if pos_inds.numel() > 0:
                assigned_labels[pos_inds] = gt_labels[
                    assigned_gt_inds[pos_inds] - 1]
        else:
            assigned_labels = None
        return AssignResult(
            num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)
© All rights reserved by the author. Originally published on Jianshu (簡書); the content represents the author's own views.
