Paper notes: Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning

Title: Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning

Venue: CVPR 2014

Project page (with MATLAB code): https://cvl.gist.ac.kr/project/cmot.html

This paper is mainly about building trajectories. It still produces good trajectories even under severe occlusion.


First, a quick digression! It is very useful for understanding this paper.

Multi-object tracking (MOT)

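Suppose we have a video made of N consecutive frames. From the first frame to the last there are multiple targets that keep entering, leaving and moving. Our goal is to tell each target apart from the others and track its trajectory across frames. The classic application is pedestrians in intersection surveillance footage.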

Input: detection responses.

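Given a video, we first run a state-of-the-art detector to find the people in each frame, including their position and size. These detections are noisy, of course. Otherwise there would be no need for all the MOT methods. Each detection result is called a response, and it comes with a confidence, for example 80% or 20%.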

Output: tracklets, i.e., the final trajectory of each target.

Challenges: occlusions, similar appearance, complex motion, false alarms.

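Occlusions: there are three kinds:
- occlusion by objects in the scene;
- occlusion by other targets;
- self-occlusion (e.g., deformation, so the target cannot be detected).

After an occlusion, a target that should have been detected is missed. One remedy is to use temporal information to infer that a target is occluded at a certain position in a certain frame.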

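Appearance: there are two problems. The first is separating targets from the background. The second is separating different targets from each other. This usually calls for a well-designed appearance model, e.g., one based on HOG or color histograms.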

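Motion: the simplest case is constant-velocity, straight-line motion. There it is easy to predict where the target will be in the next frame, but reality is often different. A target may turn sharply or suddenly turn around and walk back. The usual remedy is a more flexible, more complex motion model.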
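Below is a minimal sketch of the constant-velocity prediction mentioned above, the simplest motion model. The function name and the (x, y) tuple layout are assumptions for illustration.

```python
def predict_constant_velocity(position, velocity, frames_ahead=1):
    """Predict a future position assuming constant velocity (units per frame)."""
    # Each coordinate moves by velocity * number of elapsed frames.
    return tuple(p + v * frames_ahead for p, v in zip(position, velocity))


# A pedestrian at (10, 20) moving (+2, -1) pixels per frame, 3 frames later:
print(predict_constant_velocity((10, 20), (2, -1), frames_ahead=3))  # (16, 17)
```

A sharp turn or a sudden U-turn breaks this prediction immediately. That is why more flexible motion models are needed.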

False alarms: the detector returns a response, but there is no target there, i.e., a false detection. These have to be weeded out using the detection confidence and various refinement methods.

Approaches

There are currently two representative approaches:

1. Detection-based data association.

MOT can be viewed as a data association problem. Tracklets or detections in consecutive frames are linked to form longer tracklets.

The most classic framework is the hierarchical (multi-level) tracking framework proposed by Nevatia's group in the ECCV 2008 paper "Robust Object Tracking by Hierarchical Association of Detection Responses".

Low level: detection responses in consecutive frames are linked into short tracklets. A threshold removes the unsafe links, leaving reliable tracklets.

Mid level: for each pair of tracklets from the low level, a link probability (affinity score) is computed. The Hungarian algorithm then finds the globally optimal assignment, which produces longer tracklets.

High level: the tracklets from the mid level are refined. For example, an entry-exit map is estimated to infer where tracklets start and end, and tracklets that do not reach an entry/exit point are completed. Another example is finding moving groups and using them to complete the tracklets of the targets in each group.
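The low-level step can be sketched with a conservative dual-threshold rule. Two responses are linked only if their affinity is high and also clearly higher than every competing pair in the same row or column. This is a minimal sketch that assumes a precomputed affinity matrix between responses in frame t (rows) and frame t+1 (columns). The function name and the values of theta1 and theta2 are hypothetical.

```python
def dual_threshold_links(affinity, theta1=0.5, theta2=0.2):
    """Conservative low-level linking between frame t (rows) and t+1 (columns).

    A pair (i, j) is linked only if its affinity exceeds theta1 AND beats every
    competing pair in row i and column j by more than theta2. Ambiguous
    ("unsafe") pairs are left unlinked for the higher levels.
    """
    links = []
    for i, row in enumerate(affinity):
        for j, a in enumerate(row):
            if a <= theta1:
                continue
            rivals = [row[k] for k in range(len(row)) if k != j]
            rivals += [affinity[r][j] for r in range(len(affinity)) if r != i]
            if all(a - c > theta2 for c in rivals):
                links.append((i, j))
    return links


print(dual_threshold_links([[0.9, 0.1], [0.2, 0.6]]))  # [(0, 0), (1, 1)]
print(dual_threshold_links([[0.9, 0.8]]))              # [] (too ambiguous)
```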

These three levels form a basic, open framework. New methods can be plugged into each level, and many papers build on it.

2. Energy minimization.

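Many problems can be cast as energy minimization. Each solution in the solution space has a cost, or energy. The task is to write this cost function down and find a suitable method to obtain the optimal solution.

A typical example is "Continuous Energy Minimization for Multi-Target Tracking" by MOT expert Anton Milan (PAMI 2014):
- The known quantities are all the detection responses.
- The solution space is every possible combination of tracklets built from these responses.
- Each combination has a cost, and we look for the best one.

That paper clearly explains how the cost function is constructed and how it is minimized. Notably, it builds a continuous cost function, which is easier to optimize. It also uses jump moves to escape local minima and search for a better, ideally global, optimum.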
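Here is a toy sketch of continuous energy minimization. It is much simpler than Milan's energy:
- positions are 1-D;
- there is only a data term pulling the trajectory toward the detections and a constant-velocity (second-difference) smoothness term;
- the energy is minimized by plain gradient descent.

There are no jump moves, exclusion terms or track-count terms. The function names, lam and the step size are assumptions.

```python
def energy(x, d, lam):
    """E(x) = sum_t (x_t - d_t)^2 + lam * sum_t (x_{t-1} - 2 x_t + x_{t+1})^2."""
    data = sum((xi - di) ** 2 for xi, di in zip(x, d))
    dynamics = sum((x[t - 1] - 2 * x[t] + x[t + 1]) ** 2 for t in range(1, len(x) - 1))
    return data + lam * dynamics


def gradient(x, d, lam):
    g = [2 * (xi - di) for xi, di in zip(x, d)]  # data term
    for t in range(1, len(x) - 1):               # dynamics (smoothness) term
        r = x[t - 1] - 2 * x[t] + x[t + 1]
        g[t - 1] += 2 * lam * r
        g[t] -= 4 * lam * r
        g[t + 1] += 2 * lam * r
    return g


def minimize(d, lam=1.0, step=0.02, iters=2000):
    """Gradient descent started at the detections.

    step < 2 / (2 + 32 * lam) bounds the step by the largest Hessian eigenvalue,
    so the iteration is stable.
    """
    x = [float(di) for di in d]
    for _ in range(iters):
        x = [xi - step * gi for xi, gi in zip(x, gradient(x, d, lam))]
    return x


noisy = [0.0, 1.0, 5.0, 3.0, 4.0]  # a jittery 1-D track
smooth = minimize(noisy)
print(energy(noisy, noisy, 1.0), energy(smooth, noisy, 1.0))  # the energy drops
```

Because this energy is quadratic, gradient descent reaches its single minimum. Milan's real energy is non-convex, which is where jump moves come in.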

Ah, truly a master.


End of the digression. Back to the paper.

Abstract

Online multi-object tracking is quite difficult

because of frequent occlusion by clutter or other objects, similar appearances of different objects, and other factors.

Difficult, but not without solutions.

In this paper, we propose a robust online multi-object tracking method that can handle these difficulties effectively.

The workflow is as follows:

We first propose the tracklet confidence using the detectability and continuity of a tracklet, and formulate a multi-object tracking problem based on the tracklet confidence.

The multi-object tracking problem is then solved by associating tracklets in different ways according to their confidence values.

Based on this strategy, tracklets sequentially grow with online-provided detections, and fragmented tracklets are linked up with others without any iterative and expensive associations.

Here, for reliable association between tracklets and detections, we also propose a novel online learning method using an incremental linear discriminant analysis (ILDA) for discriminating the appearances of objects. (In other words, ILDA discriminates object appearances so that tracklets and detections can be associated reliably.)

By exploiting the proposed learning method, tracklet association can be successfully achieved even under severe occlusion.

1. Introduction

The goal of multi-object tracking is to estimate the states of multiple objects while conserving their identifications under appearance and motion variations with time. (In short: estimate the objects' states and keep their IDs despite appearance and motion changes.)

The challenges of multi-object tracking:

In a complex scene, this problem is especially challenging due to frequent occlusion by clutter or other objects, similar appearances of different objects, and so on. (The main challenges of MOT are frequent occlusion and similar appearance.)

Recently, tracking-by-detection methods have greatly improved performance, because detectors provide reliable detections even in crowded environments. Clearly, good detections help tracking.

The tracking-by-detection methods generally build long trajectories of objects by associating detections provided by detectors.

These methods fall into two categories: batch and online methods. (Wow, the survey "Multiple Object Tracking: A Literature Review" makes the same split!)

1. Batch methods

Batch methods usually utilize the detections of whole frames together to link fragmented trajectories (i.e. tracklets) due to occlusion.

However, the performance of the batch methods is still limited under long-term occlusion because of the difficulty in distinguishing different objects. It is also difficult to apply the batch methods to real-time applications, because they require huge computation due to the iterative associations for generating globally optimized tracks.

Batch pipeline: detections => tracklets => trajectories. Under frequent occlusion it does not work well, because different objects are hard to tell apart.

2. Online methods

Online methods can be applied to real-time applications because they sequentially build trajectories based on frame-by-frame association using online information up to the present frame.

However, online methods tend to produce fragmented trajectories and to drift under occlusion.

No big deal, though: the method proposed in this paper addresses these problems.

The proposed method is based on:

(1) tracklet confidence to handle track fragments due to occlusion or unreliable detections (fragments are caused by occlusion and unreliable detections, and the tracklet confidence is what handles them)

(2) online discriminative appearance learning to handle similar appearances of different objects in tracklet association (ILDA deals with similar-looking objects during tracklet association)

The strategy for solving the MOT problem is to associate tracklets in different ways according to their confidence values. More specifically:
reliable tracklets having high confidence are locally associated with online-provided detections,

whereas fragmented tracklets having low confidence are globally associated with other tracklets and detections.

We can see that the core steps of the proposed method are the local and global associations.

Moreover, in both steps, appearance modeling is crucial for associating tracklets and detections of the same object while distinguishing different objects.

In other words, appearance learning is key in both the local and the global association. It must link tracklets and detections of the same object while keeping different objects apart.

Term: ILDA stands for incremental linear discriminant analysis.

To this end, we also propose a novel online discriminative appearance learning taking into consideration two main issues in multi-object tracking:

(1) online learning to update appearance models according to ongoing tracking results => appearance model update

(2) online training sample collection for discriminating appearances of multiple tracked objects => good training samples

The proposed online learning method is designed in consideration of two issues together to learn discriminative appearance models using an incremental linear discriminant analysis (ILDA).

ILDA is especially nice because it handles both the model update and the training samples at once. Specifically, this allows us to distinguish each object and also incrementally update learned appearance models with online tracking results.

By exploiting the proposed appearance learning, tracklet association can be successfully performed even under occlusion.

Contributions of this paper:

(i) proposition of a tracklet confidence for evaluating a tracklet's reliability, and two-step association using the tracklet confidence for building optimal tracklets

(ii) proposition of an online learning method for discriminating different objects and adapting learned appearances with ongoing tracking results

(iii) proposition of a practical whole online tracking structure by effectively combining our methods

The overall structure is shown in the framework figure of the paper.

Here, since the tracklet confidence lies in [0, 1], we consider a tracklet as a reliable tracklet with high confidence when conf(T_i) > 0.5; otherwise it is considered a fragmented tracklet with low confidence.
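The split by confidence can be sketched in a few lines. Reliable tracklets then go to the local association, and fragmented ones to the global association. The dictionary layout and function name are assumptions.

```python
def split_by_confidence(confidences, threshold=0.5):
    """confidences: {track_id: conf in [0, 1]} -> (reliable ids, fragmented ids)."""
    reliable = [tid for tid, c in confidences.items() if c > threshold]
    fragmented = [tid for tid, c in confidences.items() if c <= threshold]
    return reliable, fragmented


reliable, fragmented = split_by_confidence({1: 0.92, 2: 0.31, 3: 0.5, 4: 0.77})
print(reliable, fragmented)  # [1, 4] [2, 3]
```

Note that a confidence of exactly 0.5 counts as low, because the rule is a strict conf(T_i) > 0.5.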

2. Related Works

Some previous works related to online multi-object tracking and online appearance learning, the focus of this paper, are introduced in this section.

This section also covers two aspects: tracklets and appearance models.

For tracklets: online tracking methods build trajectories from detections locally, frame by frame. There are many ways to associate detections into tracklets, but they are prone to drift. For occlusion, one paper worth comparing against is the CVPR 2012 paper "Part-based multiple-person tracking with partial occlusion handling". It uses a part-based model to correctly associate detections under partial occlusion.

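For appearance models, ideally the model would both distinguish objects with similar appearance and be updated according to the tracking results. In practice this is not the case. Some examples:
- Some methods use color and other feature histograms as the appearance, but they cannot handle appearance changes of the tracked objects => the model should be updated.
- Some methods do update the model with online learning methods such as ensemble learning and online boosting. However, they distinguish an object from the background rather than from other objects => ideally, we would learn an appearance model that discriminates between different objects.
- Learning such a model requires collecting samples: positive samples from the same tracklet and negative samples from other tracklets after low-level association. This may involve AdaBoost or multiple instance learning (MIL). However, because these models are learned in a batch manner, they are hard to update.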

3. Online Tracking with Tracklet Confidence

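Vocabulary:

velocity: rate of change of position (here, per frame)

posterior probability: the probability of a hypothesis given the observations

intuitively: informally, without a formal derivation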

This part is about tracklet confidence. Read it carefully if you want to understand the code.

The nature of the problem

The presence of object i in frame t is given by a binary function, set to 1 or 0. Only when object i is present does it have a state. The state is written with p, s and v: position, size and velocity. That settles the object part (object = presence indicator 1 + state).

The states of a present object can be linked into a trajectory T. Up to frame t, we define a tracklet of the object i as a set of states (i.e., the tracklet consists of the object's states up to that frame).
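Below is a sketch of these definitions as Python data structures. The class and field names are hypothetical. Missing frames are assumed to be the gaps between a tracklet's first and last frame.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ObjectState:
    p: Tuple[float, float]  # position (x, y)
    s: Tuple[float, float]  # size (width, height)
    v: Tuple[float, float]  # velocity (vx, vy), per frame


@dataclass
class Tracklet:
    # frame index -> state; only frames where the object is present (indicator = 1)
    states: Dict[int, ObjectState] = field(default_factory=dict)

    def add(self, frame: int, state: ObjectState) -> None:
        self.states[frame] = state

    def start(self) -> int:
        return min(self.states)

    def end(self) -> int:
        return max(self.states)

    def length(self) -> int:
        """Cardinality of the state set: frames in which the object is present."""
        return len(self.states)

    def missing_frames(self) -> int:
        """Frames between start and end with no state (e.g. occlusion)."""
        return self.end() - self.start() + 1 - self.length()


trk = Tracklet()
for f in (1, 2, 4):
    trk.add(f, ObjectState(p=(10.0 * f, 50.0), s=(20.0, 40.0), v=(10.0, 0.0)))
print(trk.length(), trk.missing_frames())  # 3 1
```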

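Note that directly solving Eq. (1) is not feasible in practice because the possible combinations of T_{1:t} and Z_{1:t} are innumerable. (Eq. (1) seeks the optimal trajectories T given the detections Z by maximizing the posterior probability. This cannot be done directly, because the number of combinations of T and Z is enormous.)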
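Written out, the MAP problem described above reads as follows. T_{1:t} denotes the tracklets and Z_{1:t} the detections from frame 1 to t. The notation is assumed to follow the paper's Eq. (1).

```latex
\hat{T}_{1:t} = \operatorname*{arg\,max}_{T_{1:t}} \; P\left(T_{1:t} \mid Z_{1:t}\right)
```

The maximization runs over every way of grouping the detections into tracklets. That search space grows combinatorially, which is why the paper reformulates the problem with tracklet confidence.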

So we reformulate the problem using the tracklet confidence, and then propose a practical solution.

3.1. Tracklet Confidence

Tracklet confidence can be intuitively interpreted as how well the constructed tracklet matches the real trajectory of the object.

A good tracklet should match the real trajectory as closely as possible, and the tracklet confidence measures how closely it does.

We consider a tracklet reliable, with high confidence, when it meets the following requirements:

Length: a short tracklet tends to be unreliable. A long tracklet is more likely to be a correct tracklet of an object.

Occlusion: a tracklet severely occluded by other tracklets is not appropriate as a reliable tracklet.

Affinity: a high affinity between a tracklet and an associated detection indicates that the tracklet is reliable.

cardinality: the number of elements in a set (e.g., the number of states in a tracklet, i.e., its length)
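Here is a hedged sketch of how the three criteria above could combine into a score in [0, 1]. It assumes the form conf = (mean affinity of the associated detections) × exp(−β · w / L), where:
- L is the number of frames with an associated detection (the cardinality above);
- w is the number of missed frames;
- β is a hypothetical tuning parameter.

Check Eq. (2) of the paper for the exact definition.

```python
import math


def tracklet_confidence(affinities, beta=1.0):
    """Sketch of a tracklet confidence in [0, 1].

    affinities: one entry per frame between the tracklet's first and last frame.
    Each entry is a float in [0, 1] when a detection was associated, or None
    when the object was missed (occlusion or unreliable detection).
    """
    matched = [a for a in affinities if a is not None]
    length = len(matched)              # L: frames with an associated detection
    if length == 0:
        return 0.0
    missed = len(affinities) - length  # w: frames without a detection
    mean_affinity = sum(matched) / length            # affinity criterion
    continuity = math.exp(-beta * missed / length)   # length and occlusion criteria
    return mean_affinity * continuity


print(tracklet_confidence([0.9, 0.8, None, 0.85]))
```

In this form, the length and occlusion criteria enter through the ratio w / L. A long tracklet with one missed frame is penalized less than a short tracklet with the same gap.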

The paper then defines the tracklet confidence formally in Eq. (2).




System parameters: All parameters have been found experimentally, and remained unchanged for all datasets. From an extensive evaluation, we find that most parameters do not affect the overall performance of our system much. In the affinity model in Eq. (10), all parameters (i.e. positions, sizes and velocities) are automatically determined by tracking results except for O_F and O_B, which were set to diag[30^2, 75^2]. The same threshold θ = 0.4 is used for the local and global association.
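Below is a sketch of a Gaussian motion affinity using the O_F/O_B values quoted above (diag[30^2, 75^2]). It assumes the diagonal applies to (x, y) and uses an unnormalized Gaussian so that the affinity lies in [0, 1]. Only a forward, position-only term is shown; Eq. (10) also involves sizes, and a backward term would be analogous. The function name is hypothetical.

```python
import math

THETA = 0.4  # association threshold from the text


def motion_affinity(predicted, observed, sigma=(30.0, 75.0)):
    """Unnormalized Gaussian affinity with diagonal covariance diag[sx^2, sy^2]."""
    dx = (observed[0] - predicted[0]) / sigma[0]
    dy = (observed[1] - predicted[1]) / sigma[1]
    return math.exp(-0.5 * (dx * dx + dy * dy))


print(motion_affinity((100, 200), (100, 200)))         # 1.0 (perfect prediction)
print(motion_affinity((0, 0), (30, 0)) > THETA)        # True: one sigma off in x
print(motion_affinity((0, 0), (90, 0)) > THETA)        # False: three sigmas off in x
```

The larger vertical variance (75 vs 30) makes the affinity more tolerant of vertical jitter, which is plausible for tall pedestrian boxes.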

3.2. Formulation with Tracklet Confidence

To effectively solve the online multi-object tracking problem, we reformulate the online multi-object problem Eq. (1) by using the tracklet confidence (Eq. (3) in the paper).

That is, Eq. (3) restates the problem in terms of tracklet confidence.

The solution proceeds in two main stages.

Stage 1: tracklets with high confidence are locally associated with online-provided detections.


Stage 2: tracklets with low confidence, which are more likely to be fragmented, are globally associated with other tracklets and detections.

More specifically,

the tracklets with high confidence are first considered to be locally associated with detections because more reliable detections originate from them rather than from tracklets with low confidence.

In this way, the local association between the tracklets and detections allows us to progressively grow locally optimal tracklets with online-provided detections.

To restate the outcome of the reformulation, the solution is:

High-confidence tracklets: local association (with online-provided detections).

Low-confidence tracklets: global association (with other tracklets and detections). Low-confidence tracklets are likely to be fragments.

To understand the code, study the next two parts (local and global association) carefully.

3.3. Local Association of Tracklets (very important)

Here is how the local association is done.

In the local association, each high-confidence tracklet grows with a sequence of detections. Detection responses are associated with tracklets by pairwise association:
- At frame t, with h high-confidence tracklets and n detections, an h × n score matrix is defined.
- The Hungarian algorithm then determines the tracklet-detection pairs so that the total affinity is maximized.
- When the association cost of a tracklet-detection pair is less than a pre-defined threshold, the detection Z is associated with that tracklet.
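Here is a minimal, dependency-free sketch of the local association step:
- The cost matrix is built from negative log affinities. This is an assumption here; it means the Hungarian algorithm maximizes the product of affinities.
- The matrix is solved with an O(n^3) Kuhn-Munkres (Hungarian) implementation. scipy.optimize.linear_sum_assignment does the same job.
- Pairs whose affinity falls below θ = 0.4 are dropped.

The function names are hypothetical, and the affinities are assumed to be precomputed in [0, 1].

```python
import math


def hungarian(cost):
    """Minimum-cost perfect matching on a square cost matrix.

    O(n^3) Kuhn-Munkres with row/column potentials. Returns match, where
    match[r] is the column assigned to row r.
    """
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (n + 1)    # column potentials (1-based)
    p = [0] * (n + 1)      # p[j]: row matched to column j (0 = free)
    way = [0] * (n + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:              # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match


def local_association(affinity, theta=0.4):
    """affinity[i][j]: affinity of high-confidence tracklet i and detection j."""
    h = len(affinity)
    n = len(affinity[0]) if h else 0
    size = max(h, n)
    if size == 0:
        return {}
    # Cost = -log(affinity); zero-cost padding makes the matrix square.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(h):
        for j in range(n):
            cost[i][j] = -math.log(max(affinity[i][j], 1e-12))
    match = hungarian(cost)
    pairs = {}
    for i in range(h):
        j = match[i]
        if j < n and affinity[i][j] >= theta:  # reject weak pairs
            pairs[i] = j
    return pairs


# Greedy matching would take (0, 0) first; the optimal assignment swaps the pairs.
print(local_association([[0.9, 0.85], [0.8, 0.1]]))  # {0: 1, 1: 0}
```

Unmatched detections and rejected tracklets are then left to the global association.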

For each tracklet that has been associated with a detection Z, the following steps are performed:

1) Update p and v with detection Z. s is updated too, by averaging the associated detections over the last few frames.

2) Update the confidence conf with detection Z and Eq. (2).
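Here is a sketch of step 1. It assumes consecutive frames, so the velocity is simply the position difference. The size is averaged over a hypothetical window of the last 5 associated detections. A Kalman filter is a common alternative for p and v, and step 2 would recompute conf with the new affinity. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrackletState:
    position: tuple                    # (x, y)
    velocity: tuple = (0.0, 0.0)       # (vx, vy) per frame
    size: tuple = (0.0, 0.0)           # (w, h), averaged over recent detections
    recent_sizes: list = field(default_factory=list)


def update_with_detection(trk, det_position, det_size, window=5):
    """Update p and v from the associated detection, and s by a sliding-window mean."""
    trk.velocity = tuple(d - p for d, p in zip(det_position, trk.position))
    trk.position = tuple(det_position)
    trk.recent_sizes.append(tuple(det_size))
    trk.recent_sizes = trk.recent_sizes[-window:]  # keep only the last `window` sizes
    k = len(trk.recent_sizes)
    trk.size = tuple(sum(dim) / k for dim in zip(*trk.recent_sizes))
    return trk


t = TrackletState(position=(0.0, 0.0))
update_with_detection(t, (2.0, 1.0), (10.0, 20.0))
update_with_detection(t, (5.0, 3.0), (14.0, 24.0))
print(t.position, t.velocity, t.size)  # (5.0, 3.0) (3.0, 2.0) (12.0, 22.0)
```

Averaging the size damps the frame-to-frame jitter of detector boxes, while position follows the latest detection directly.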

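Note that the local association could be skipped, going directly to the global association. However, that is computationally expensive, prone to ambiguity, and degrades performance.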

3.4. Global Association of Tracklets (very important)

The paper then describes how the parameters are computed for the different events.

The same threshold θ used in the local association is also employed to select a reliable association pair having a high affinity score.

Once the cost matrix is computed, the optimal association pairs, which minimize the global association cost in G, are determined using the Hungarian algorithm, and the tracklets and their confidence values are updated with the results.
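The exact blocks of the paper's cost matrix G are not given in this note. As a hedged illustration of the same idea, here is a generic augmented cost matrix with two blocks:
- a link block of −log affinities, where pairs below θ are forbidden;
- a diagonal "leave unlinked" block costing −log θ.

With these costs, a link is chosen only when its affinity reaches θ and no better claim exists on the same candidate. The events, costs and names are assumptions, not the paper's G. For brevity the assignment is solved by exhaustive search, which gives the same optimum as the Hungarian algorithm on tiny matrices.

```python
import itertools
import math

BIG = 1e9  # stands in for an infinite cost (pair not allowed)


def global_association(link_affinity, theta=0.4):
    """Decide which fragmented tracklets to link, or leave unlinked.

    link_affinity[i][j]: affinity between fragmented tracklet i and candidate j.
    Returns {i: j, or None when tracklet i stays unlinked}.
    """
    h = len(link_affinity)
    m = len(link_affinity[0]) if h else 0
    no_link = -math.log(theta)  # cost of leaving a tracklet unlinked
    cost = []
    for i in range(h):
        row = [-math.log(a) if a >= theta else BIG for a in link_affinity[i]]
        # one private "not linked" column per tracklet (diagonal block)
        row += [no_link if k == i else BIG for k in range(h)]
        cost.append(row)
    # Exhaustive search over injective row -> column maps (tiny h only).
    best, best_cols = float("inf"), None
    for cols in itertools.permutations(range(m + h), h):
        total = sum(cost[i][c] for i, c in enumerate(cols))
        if total < best:
            best, best_cols = total, cols
    return {i: (c if c < m else None) for i, c in enumerate(best_cols)}


# Two fragments compete for one candidate: the higher affinity wins.
print(global_association([[0.9], [0.8]]))  # {0: 0, 1: None}
```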

4. Discriminative Appearance Learning

In this part, we learn the discriminative appearance.

In the proposed learning method, online training samples are collected from tracked objects, and a discriminative projection space is updated with the collected samples using ILDA.

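By projecting the appearance models of the tracklets into the discriminative projection space, we make their appearances more discriminative.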

For the online learning that updates the discriminative appearance with new samples, the paper uses ILDA. The online boosting method could also be used.

The main reason for using ILDA is that appearances of multiple objects can be distinguished with a single updated LDA projection matrix, whereas specific classifiers of the objects are required in the boosting method.
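To see why a single projection matrix is enough for all objects, here is a minimal batch LDA sketch with NumPy:
- each tracked identity is one class;
- the projection U maximizes between-class scatter relative to within-class scatter;
- a new appearance vector is identified by the nearest projected class mean.

The nearest-mean matching, the ridge term and the function names are assumptions for illustration. They are not the paper's exact appearance affinity.

```python
import numpy as np


def lda_projection(X, y, out_dim):
    """Batch LDA. X: (num_samples, d) features, y: identity labels.

    Returns U (d, out_dim) whose columns solve Sb u = lambda Sw u for the
    largest lambda, i.e. maximize between-class over within-class scatter.
    """
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)   # within-class scatter
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)  # between-class scatter
    # Whiten Sw (a small ridge keeps it invertible), then diagonalize Sb.
    evals, evecs = np.linalg.eigh(Sw + 1e-6 * np.eye(d))
    W = evecs / np.sqrt(evals)           # W.T @ Sw @ W = I
    lam, Q = np.linalg.eigh(W.T @ Sb @ W)
    top = np.argsort(lam)[::-1][:out_dim]
    return W @ Q[:, top]


def nearest_identity(U, X, y, query):
    """Label whose projected class mean is closest to the projected query."""
    labels = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in labels]) @ U
    q = query @ U
    return labels[np.argmin(np.linalg.norm(means - q, axis=1))]


# Two identities that differ along x but vary a lot along y (e.g. pose changes).
X = np.array([[0, -5], [0, 5], [0.5, 0], [-0.5, 0],
              [3, -5], [3, 5], [3.5, 0], [2.5, 0]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
U = lda_projection(X, y, out_dim=1)
print(np.round(U.ravel(), 3))                           # keeps x, drops y
print(nearest_identity(U, X, y, np.array([2.8, 4.0])))  # 1
```

The y direction carries large variance but no identity information, so LDA discards it. This is also the dimensionality reduction that LDA provides.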

A further benefit of using ILDA lies in its ability to memorize the discriminative information for a long time. This makes it possible to accurately identify objects even under significant pose and appearance changes and long-term occlusion.

4.1. Training Sample Collection

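In each frame, we collect N image patches at different positions and scales around the refined position of each tracklet, in order to discriminate the different tracklets. But we only extract image patches from the tracklets with high confidence. A figure in the paper shows training samples from the tracklets with high confidence (red) and low confidence (blue).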


In short, the samples (image patches) are extracted only from high-confidence tracklets.

This is the sampling process for the different kinds of samples (positive and negative).
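The sampling step can be sketched as follows: jittered boxes are drawn around each high-confidence tracklet, and the tracklet ID is used as the label. Patches of one tracklet are then positives for that identity and negatives for every other identity. The jitter ranges, the box layout (centre x, centre y, width, height) and the function name are assumptions. N corresponds to n_per_tracklet, and the 0.5 threshold follows the confidence rule in the text.

```python
import random


def collect_training_samples(tracklets, n_per_tracklet=10, max_shift=0.1,
                             max_scale=0.1, conf_threshold=0.5, seed=0):
    """tracklets: {track_id: (x, y, w, h, confidence)}, with (x, y) the box centre.

    Returns a list of (label, (x, y, w, h)) patch boxes.
    """
    rng = random.Random(seed)
    samples = []
    for tid, (x, y, w, h, conf) in tracklets.items():
        if conf <= conf_threshold:  # skip low-confidence tracklets
            continue
        for _ in range(n_per_tracklet):
            dx = rng.uniform(-max_shift, max_shift) * w  # position jitter
            dy = rng.uniform(-max_shift, max_shift) * h
            s = 1.0 + rng.uniform(-max_scale, max_scale)  # scale jitter
            samples.append((tid, (x + dx, y + dy, w * s, h * s)))
    return samples


trks = {1: (50, 100, 20, 40, 0.8), 2: (80, 100, 20, 40, 0.3), 3: (120, 90, 22, 44, 0.9)}
samples = collect_training_samples(trks)
print(len(samples), sorted({label for label, _ in samples}))  # 20 [1, 3]
```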

Note that ILDA also reduces dimensionality.

4.2. Online-Learning Algorithm

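Some vocabulary:

scatter: spread of samples around a mean, summarized by a scatter matrix

eigenvector: a vector that a matrix only scales (by the eigenvalue) without changing its direction

incrementally: step by step, as new data arrive

sufficient: as in "sufficient spanning set", a reduced set of basis vectors that spans the subspace needed for the update

eigendecomposition: factorizing a matrix into its eigenvectors and eigenvalues

pseudo code: an informal, language-neutral description of an algorithm

identical: exactly the same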

There are also some formulas; see the original paper.

It is necessary to incrementally update the existing projection matrix with updated samples because not all training samples are available in online multi-object tracking.

Figure 4 shows the updated projection matrices using batch LDA and ILDA, proving the accuracy of ILDA.

In Figure 5, we can see that the ILDA method is much more effective than the boosting method in terms of computation cost and identification accuracy.

In the batch LDA, a projection matrix U is constructed.

We employ the ILDA method using sufficient spanning sets by maximizing class separability of the given training set.

See the paper for the mathematical description of both methods.
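The paper's ILDA uses sufficient spanning sets to avoid a full eigendecomposition at every update. As a simpler, hedged stand-in, the sketch below only keeps the LDA statistics incremental:
- per-class counts, means and scatter matrices are merged batch by batch with the standard pooled-scatter update;
- the projection is then recomputed with a full eigendecomposition, which is exactly the step that ILDA's sufficient spanning sets make cheaper.

The class and method names are hypothetical.

```python
import numpy as np


class IncrementalScatter:
    """Per-identity LDA statistics that can be updated batch by batch."""

    def __init__(self, dim):
        self.dim = dim
        self.stats = {}  # label -> (count, mean, scatter about the class mean)

    def update(self, X, y):
        for c in np.unique(y):
            Xb = X[y == c]
            nb, mb = len(Xb), Xb.mean(axis=0)
            S_new = (Xb - mb).T @ (Xb - mb)
            if c not in self.stats:  # first samples of a new identity
                self.stats[c] = (nb, mb, S_new)
                continue
            na, ma, Sa = self.stats[c]
            n = na + nb
            delta = (mb - ma)[:, None]
            mean = ma + delta[:, 0] * (nb / n)
            scatter = Sa + S_new + (delta @ delta.T) * (na * nb / n)  # pooled scatter
            self.stats[c] = (n, mean, scatter)

    def scatter_matrices(self):
        total = sum(n for n, _, _ in self.stats.values())
        mu = sum(n * m for n, m, _ in self.stats.values()) / total
        Sw = sum(S for _, _, S in self.stats.values())
        Sb = np.zeros((self.dim, self.dim))
        for n, m, _ in self.stats.values():
            diff = (m - mu)[:, None]
            Sb += n * (diff @ diff.T)
        return Sw, Sb

    def projection(self, out_dim, ridge=1e-6):
        """Full eigendecomposition; ILDA would update this more cheaply."""
        Sw, Sb = self.scatter_matrices()
        evals, evecs = np.linalg.eigh(Sw + ridge * np.eye(self.dim))
        W = evecs / np.sqrt(evals)
        lam, Q = np.linalg.eigh(W.T @ Sb @ W)
        return W @ Q[:, np.argsort(lam)[::-1][:out_dim]]


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = np.array([0, 1, 2] * 10)
online = IncrementalScatter(dim=3)
online.update(X[:12], y[:12])  # samples seen so far
online.update(X[12:], y[12:])  # new samples from later frames
batch = IncrementalScatter(dim=3)
batch.update(X, y)             # all samples at once
same = all(np.allclose(a, b) for a, b in zip(online.scatter_matrices(), batch.scatter_matrices()))
print(same)  # True
```

Because the merged scatter matrices equal the batch ones, the recomputed projection spans the same discriminant directions as batch LDA (up to sign).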

5. Experiments


6. Conclusion

The method is effective and robust.

We build optimal tracklets by sequentially linking tracklets and detections using the proposed local and global association according to their confidence.

Furthermore, the proposed online appearance learning allows us to discriminate multiple objects in both associations even in complex sequences.
