《Effective Detection of Multimedia Protocol Tunneling using Machine Learning》譯文(二)

正文之前

緊接上文: 《Effective Detection of Multimedia Protocol Tunneling using Machine Learning》譯文(一)

正文

4 Decision Tree-based Classification

In this section, we depart from the use of similarity-based classifiers for detecting the presence of covert traffic. As it is unpractical to explore all possible machine learning algorithms, we focus our experiments in a subset of algorithms based on decision trees. We have chosen these algorithms due to their ability of handling data in a nonlinear fashion, their ability to perform feature selection, and the ease of interpretation of the resulting models. Our results show that this approach is highly effective at detecting covert traffic in the systems under study.

在本節(jié)中,我們不再使用基于相似性的分類器來(lái)檢測(cè)隱蔽流量的存在。由于探索所有可能的機(jī)器學(xué)習(xí)算法是不切實(shí)際的,我們將我們的實(shí)驗(yàn)集中在基于決策樹(shù)的算法子集中。我們選擇這些算法是因?yàn)樗鼈兡軌蛞苑蔷€性方式處理數(shù)據(jù),能夠執(zhí)行特征選擇,并且易于解釋所得到的模型。我們的結(jié)果表明,這種方法在檢測(cè)所研究系統(tǒng)中的隱蔽流量方面非常有效。

4.1 Selected Classifiers

We present a description of the decision-tree based algorithms we have chosen for conducting our experiments:

我們提供了我們?yōu)檫M(jìn)行實(shí)驗(yàn)而選擇的基于決策樹(shù)的算法的描述:

Decision Trees [41] build a model in the form of a tree structure, where each tree node is either a decision or leaf node, representing a branch or a label, respectively. Decision nodes split the current branch by an attribute. A splitting attribute is commonly chosen according to its expected information gain, i.e. the expected reduction in entropy caused by choosing the attribute for a split. The importance of each particular attribute can be assessed by analyzing the tree structure, where nodes closer to the root have a higher importance than those down the tree. Despite its simple interpretation, decision trees can result in complex models unable to generalize well or can build unstable models due to the presence of large numbers of correlated features. A popular way to mitigate such disadvantages is to use decision tree ensembles.

Decision Trees [41]以樹(shù)結(jié)構(gòu)的形式構(gòu)建模型,其中每個(gè)樹(shù)節(jié)點(diǎn)分別是表示分支或標(biāo)簽的決策或葉節(jié)點(diǎn)。決策節(jié)點(diǎn)按屬性拆分當(dāng)前分支。通常根據(jù)其預(yù)期信息增益來(lái)選擇分裂屬性,即通過(guò)選擇分割的屬性而導(dǎo)致的熵的預(yù)期減少。可以通過(guò)分析樹(shù)結(jié)構(gòu)來(lái)評(píng)估每個(gè)特定屬性的重要性,其中更靠近根的節(jié)點(diǎn)比樹(shù)下的節(jié)點(diǎn)具有更高的重要性。盡管其解釋簡(jiǎn)單,但決策樹(shù)可能導(dǎo)致復(fù)雜模型無(wú)法很好地推廣,或者由于存在大量相關(guān)特征而可能構(gòu)建不穩(wěn)定模型。減輕這些缺點(diǎn)的一種流行方法是使用決策樹(shù)集合。

Random Forests [6] are an ensemble learning method, where a label is predicted by performing a majority vote over the output of multiple decisions trees. To prevent overfitting, Random Forests introduce variance in the model through bootstrap aggregation, i.e. each tree is trained using a random sample (with replacement) of the training set. Additionally, Random Forests select random attributes of the feature set when building each tree, a technique named feature bagging. One method for assessing the importance of an attribute is to average its information gain across all trees in the ensemble.

隨機(jī)森林[6]是一種集合學(xué)習(xí)方法,其中通過(guò)對(duì)多個(gè)決策樹(shù)的輸出執(zhí)行多數(shù)投票來(lái)預(yù)測(cè)標(biāo)簽。為了防止過(guò)度擬合,隨機(jī)森林通過(guò)自舉聚合在模型中引入方差,即使用訓(xùn)練集的隨機(jī)樣本(替換)訓(xùn)練每棵樹(shù)。此外,隨機(jī)森林在構(gòu)建每棵樹(shù)時(shí)選擇要素集的隨機(jī)屬性,這是一種名為特征裝袋的技術(shù)。評(píng)估屬性重要性的一種方法是在整體中的所有樹(shù)上平均其信息增益。

eXtreme Gradient Boosting (XGBoost) [9] is another technique for building a model based on an ensemble of decision trees; it relies on a technique known as gradient tree boosting. XGBoost starts by building a shallow decision tree (i.e., a weak learner). In each step, XGBoost creates a new tree which optimizes the predictions performed by trees in earlier stages. XGBoost benefits from a regularized model formalization to control overfitting. The importance of individual attributes can be computed in a similar fashion to that of Random Forests. We find the use of XGBoost to be promising among a large pool of classification algorithms. In fact, XGBoost has played a central role on multiple winning solutions for recent data mining competitions, spawning multiple domains, such as the KDD Cup 2016 [12, 44]

eXtreme Gradient Boosting(XGBoost)[9]是另一種基于決策樹(shù)集合構(gòu)建模型的技術(shù);它依賴于一種稱為漸變樹(shù)增強(qiáng)的技術(shù)。 XGBoost首先建立一個(gè)淺層決策樹(shù)(即弱學(xué)習(xí)者)。在每個(gè)步驟中,XGBoost都會(huì)創(chuàng)建一個(gè)新樹(shù),以優(yōu)化樹(shù)在早期階段執(zhí)行的預(yù)測(cè)。 XGBoost受益于正則化模型規(guī)范化以控制過(guò)度擬合問(wèn)題??梢砸耘cRandom Forests類似的方式計(jì)算個(gè)體屬性的重要性。我們發(fā)現(xiàn)使用XGBoost在大量分類算法中很有前途。實(shí)際上,XGBoost在最近的數(shù)據(jù)挖掘競(jìng)賽的多個(gè)獲勝解決方案中發(fā)揮了核心作用,產(chǎn)生了多個(gè)領(lǐng)域,例如2016年KDD杯[12,44]

The next sections detail our experiments for evaluating the unobservability of Facet and DeltaShaper with the decision tree-based classifiers enumerated above. In our experiments we have used two distinct sets of features: summary statistics and quantized packet lengths. We omit a discussion over CovertCast, as we have found that all of these techniques can identify its covert traffic with a negligible false positive rate.

接下來(lái)的部分詳細(xì)介紹了我們使用上面列舉的基于決策樹(shù)的分類器評(píng)估Facet和DeltaShaper的不可觀察性的實(shí)驗(yàn)。在我們的實(shí)驗(yàn)中,我們使用了兩組不同的特征:匯總統(tǒng)計(jì)和量化分組長(zhǎng)度。我們省略了對(duì)CovertCast的討論,因?yàn)槲覀儼l(fā)現(xiàn)所有這些技術(shù)都可以用可忽略的誤報(bào)率識(shí)別其隱蔽流量。

4.2 Feature Set 1: Summary Statistics

The collection of encrypted traffic provides an adversarywith two main sources of data for extracting features necessary for the detection of covert channels: a timeseriesof packet lengths, and a time series of packet interarrivaltimes. Our first set of features comprises a collection of summary statistics computed over the network traces of legitimate and covert traffic. This is a prevalent approach at generating features for the problem of encrypted traffic fingerprinting [22, 38, 49]. Such set of features has notbeen previously applied in the detection of covert channels generated by multimedia protocol tunneling. As for the choice of summary statistics, we compute multiple descriptive statistics for the ingress/ egress packet flows of a connection as a whole, as well as for ingress/ egress traffic individually. This feature set includes simple descriptive statistics over the packet lengthand interarrival time timeseries - such as maximum,minimum, mean, and percentiles - as well as higherorder statistics like the skew or kurtosis of these timeseries. We also consider burst behavior [l], where aburst is a sequence of consecutive packets transmitted along the same direction of a given connection. A total of 166 features are used for training our classifiers. Due to space constraints, we relegate a full listing of the summary statistics we have considered to the appendix. Next, we present our main findings after attempting to detect multimedia protocol tunneling covert channels using the decision-tree based classifiers we have described, while feeding them with our collection of summary statistics. We report the performance of each classifier over 10-fold cross-validation.

加密流量的集合為對(duì)手提供了兩個(gè)主要數(shù)據(jù)源,用于提取檢測(cè)隱蔽信道所需的特征:分組長(zhǎng)度的時(shí)間序列和分組間隔時(shí)間的時(shí)間序列。我們的第一組功能包括通過(guò)合法和隱蔽流量的網(wǎng)絡(luò)跟蹤計(jì)算的匯總統(tǒng)計(jì)信息的集合。這是為加密流量指紋識(shí)別問(wèn)題生成特征的一種流行方法[22,38,49]。這些特征集先前尚未應(yīng)用于多媒體協(xié)議隧道生成的隱蔽信道的檢測(cè)。至于摘要統(tǒng)計(jì)的選擇,我們計(jì)算整個(gè)連接的入口/出口數(shù)據(jù)包流的多個(gè)描述性統(tǒng)計(jì)數(shù)據(jù),以及單獨(dú)的入口/出口流量。此功能集包括對(duì)包長(zhǎng)度到達(dá)時(shí)間間隔的簡(jiǎn)單描述性統(tǒng)計(jì) - 例如最大值,最小值,平均值和百分位數(shù) - 以及更高階的統(tǒng)計(jì)數(shù)據(jù),如這些時(shí)間序列的偏斜或峰度。我們還考慮突發(fā)行為[1],其中aburst是沿給定連接的相同方向發(fā)送的連續(xù)分組序列。共有166個(gè)特征用于訓(xùn)練我們的分類器。由于空間限制,我們將我們考慮的摘要統(tǒng)計(jì)信息的完整列表放到附錄中。接下來(lái),我們?cè)趪L試使用以下方法檢測(cè)多媒體協(xié)議隧道隱蔽信道之后,給出了我們的主要發(fā)現(xiàn)。我們已經(jīng)描述了基于決策樹(shù)的分類器,同時(shí)用我們的匯總統(tǒng)計(jì)數(shù)據(jù)集來(lái)訓(xùn)練它們。我們報(bào)告每個(gè)分類器的性能超過(guò)十折交叉驗(yàn)證。

十折交叉驗(yàn)證,英文名叫做10-fold cross-validation,用來(lái)測(cè)試算法準(zhǔn)確性。是常用的測(cè)試方法。將數(shù)據(jù)集分成十份,輪流將其中9份作為訓(xùn)練數(shù)據(jù),1份作為測(cè)試數(shù)據(jù),進(jìn)行試驗(yàn)。每次試驗(yàn)都會(huì)得出相應(yīng)的正確率(或差錯(cuò)率)。10次的結(jié)果的正確率(或差錯(cuò)率)的平均值作為對(duì)算法精度的估計(jì),一般還需要進(jìn)行多次10折交叉驗(yàn)證(例如10次10折交叉驗(yàn)證),再求其均值,作為對(duì)算法準(zhǔn)確性的估計(jì)。

1. The use of Random Forest/ XGBoost, used in tandem with summary statistics, largely undermines the unobservability claims of state-of-the-art multimedia protocol tunneling systems. Figure 2 shows the ROC curve for our decision tree -based classifiers when detecting Facet and DeltaShaper traffic resorting to summary statistic features (ST). Random Forest - ST exhibits a minimum AUC=0.95 when classifying all configurations of Facet taffic, while XGBoost - ST exhibits a minimum AUC=0.97. When compared to XGBoost - ST, the x2 classifier attains a maximum AUC=0.85. For DeltaShaper traffic, XGBoost - ST attains an AUC which is 0.22 larger for both DeltaShaper configurations, when compared to that obtained by the x2 classifier.

1.隨機(jī)森林/ XGBoost的使用與摘要統(tǒng)計(jì)一起使用,在很大程度上破壞了最先進(jìn)的多媒體協(xié)議隧道系統(tǒng)的不可觀察性要求。圖2顯示了檢測(cè)Facet和DeltaShaper流量采用匯總統(tǒng)計(jì)特征(ST)時(shí)基于決策樹(shù)的分類器的ROC曲線。隨機(jī)森林 - ST 當(dāng)對(duì)所有Facet流量配置進(jìn)行分類時(shí),表現(xiàn)出最小AUC = 0.95,而XGBoost-ST表現(xiàn)出最小AUC = 0.97。與XGBoost-ST相比,x2分類器達(dá)到最大AUC = 0.85。對(duì)于DeltaShaper流量,XGBoost-ST獲得的AUC比兩個(gè)DeltaShaper配置大0.22,與x2分類器相比。

AUC(Area Under Curve)被定義為ROC曲線下與坐標(biāo)軸圍成的面積,顯然這個(gè)面積的數(shù)值不會(huì)大于1。又由于ROC曲線一般都處于y=x這條直線的上方,所以AUC的取值范圍在0.5和1之間。使用AUC值作為評(píng)價(jià)標(biāo)準(zhǔn)是因?yàn)楹芏鄷r(shí)候ROC曲線并不能清晰的說(shuō)明哪個(gè)分類器的效果更好,而作為一個(gè)數(shù)值,對(duì)應(yīng)AUC更大的分類器效果更好。

2. It is possible to flag a vast majority of covert channels with a very small number of false positives. An adversary that aims at flagging at least 90% of all Facet s=50% connections incurs in a 14.1% FPR when resorting to Random Forest - ST, and a FPR as short as 7.1% when resorting to XGBoost - ST. To flag at least 70% of the same kind of traffic, XGBoost - ST incurs in a FPR of only 1%. In comparison, Figure 1a shows that for correctly identifying just 70% of Facet s=50% traffic when resorting to the X2 classifier, an adversary would face an alarming 21 .5% FPR. The situation is similar for an adversary wishing to flag 90% of DeltaShaper (320x 240,8x 8,6, 1) raffic. For flagging 90% of this kind of traffic, Random Forest ST incurs in a 30.3% FPR and XGBoost ST incurs in a 12.1% FPR. To flag 70% of the same kind of traffic, XGBoost - ST incurs in a FPR of 4%. Flagging just 70% of this kind of traffic with the X- classifier would amount to a 32.2% FPR.

2.有可能用絕對(duì)少量的誤報(bào)來(lái)標(biāo)記絕大多數(shù)隱蔽通道。旨在標(biāo)記至少90% Facet s = 50%連接的對(duì)手在使用隨機(jī)森林 - ST時(shí)產(chǎn)生14.1%的FPR,并且當(dāng)使用XGBoost-ST時(shí),F(xiàn)PR減少至7.1%。而如果只是為了標(biāo)記至少70%的同樣流量,XGBoost-ST的FPR僅為1%。相比之下,圖1a顯示,在使用X2分類器時(shí),為了正確識(shí)別70%的Facet s = 50%的流量,對(duì)手將面臨驚人的21.5%FPR。對(duì)于希望標(biāo)記90% DeltaShaper(320x 240,8x 8,6,1)交易的對(duì)手而言,情況類似。為了標(biāo)記90%的此類流量,Random Forest ST產(chǎn)生30.3%的FPR,而XGBoost ST產(chǎn)生12.1%的FPR。為了標(biāo)記70%的同類流量,XGBoost-ST的FPR為4%。用X2分類器標(biāo)記這種流量的70%將達(dá)到32.2%的FPR。

4.3 Feature Set 2: Quantized PLs(特征集2:量化PL)

An alternative feature set is comprised of the quantized frequency distribution of packet lengths, where each K size bin acts as an individual feature. While this feature set is akin to that previously used in KL and EMD similarity-based classifiers, we process these features in a fundamentally different way. In particular, the similarity-based classifiers output a distance score based on the overall difference of the packet lengths frequency distribution, while failing to adjust this score according to the importance of relevant regions of the feature space. Informally, they risk to dilute the greater discriminating power of a given feature among that of possibly irrelevant features [35]. We aim at exploiting the different relevance of particular ranges of the feature space by feeding this feature set to decision tree-based classifiers.

In terms of feature sets, for Facet, we take as features the quantized frequency distribution of packet lengths for the flow carrying covert data. We use K=5 as we have experimentally verified that the classification performance of our decision tree-based algorithms benefit from a finegrained quantization. As for DeltaShaper, and due to the system's bidirectionality, we use the quantized frequency distribution of packet sizes flowing in both directions. Here, we also apply a quantization with K=5. Note that the evaluation performed with the similarity-based classifiers described in Section 3 also considers the same selection on the direction of traffic flows to analyze.

Next, we describe our findings after attempting to identify covert traffic with such feature sets. Figure 3 shows the ROC curve for our decision treebased classifiers when detecting Facet and DeltaShaper traffic resorting to quantized packet lengths as features (PL).

另一個(gè)特征集包括分組長(zhǎng)度的量化頻率分布,其中每個(gè)K大小的區(qū)間作為單獨(dú)的特征。雖然此功能集類似于之前在KL和EMD基于相似性的分類器中使用的功能集,但我們以完全不同的方式處理這些功能。具體地,基于相似度的分類器基于分組長(zhǎng)度頻率分布的總體差異輸出距離分?jǐn)?shù),而未能根據(jù)特征空間的相關(guān)區(qū)域的重要性來(lái)調(diào)整該分?jǐn)?shù)。非正式地,它們有可能削弱特定特征在可能不相關(guān)的特征之間的更大區(qū)別能力[35]。我們的目標(biāo)是通過(guò)將此特征集提供給基于決策樹(shù)的分類器來(lái)利用特征空間的特定范圍的不同相關(guān)性。

就特征集而言,對(duì)于Facet,我們將流量攜帶隱蔽數(shù)據(jù)的分組長(zhǎng)度的量化頻率分布作為特征。我們使用K = 5,因?yàn)槲覀円呀?jīng)通過(guò)實(shí)驗(yàn)驗(yàn)證了基于決策樹(shù)的算法的分類性能受益于細(xì)粒度量化。至于DeltaShaper,由于系統(tǒng)的雙向性,我們使用在兩個(gè)方向上流動(dòng)的數(shù)據(jù)包大小的量化頻率分布。這里,我們還應(yīng)用K = 5的量化。請(qǐng)注意,使用第3節(jié)中描述的基于相似性的分類器執(zhí)行的評(píng)估也考慮了要分析的業(yè)務(wù)流方向的相同選擇。

接下來(lái),我們?cè)趪L試使用此類功能集識(shí)別隱蔽流量后描述我們的發(fā)現(xiàn)。圖3顯示了當(dāng)檢測(cè)Facet和DeltaShaper流量采用量化數(shù)據(jù)包長(zhǎng)度作為特征(PL)時(shí),基于決策樹(shù)的分類器的ROC曲線。

1. Quantized packet lengths outperform the use of summary statistics. In general, the AUC obtained by our decision-tree based classifiers is comparable or superior to the AUC obtained by the same classifiers when making use of summary statistics. Both Random Forest - PL and XGBoost - PL obtain an AUC=0.99 when identifying Facet traffic. This represents a maximum improvement of 0.04 over Random Forest ST and 0.02 over XGBoost - ST. While Decision Tree - PL attains a maximum AUC=0.91, it is still short of the maximum AUC attained by Random Forest - ST. This trend is similar in the classification of DeltaShaper traffic, where the AUC obtained by Decision Tree - PL is also inferior to that of tree ensembles. The detection of (160x 120,4 x 4,6, 1) DeltaShaper traffic benefits the most from packet length features, where XGBoost - PL attains an AUC=0.85, 0.08 larger than that obtained by XGBoost - ST. Interestingly, the detection of (320 x 240,8 x 8,6, 1) DeltaShaper raffic is better performed by XGBoost - ST, albeit by a slight improvement of 0.01 over the AUC of XGBoost - PL.

1.量化的數(shù)據(jù)包長(zhǎng)度優(yōu)于使用匯總統(tǒng)計(jì)數(shù)據(jù)。通常,我們的基于決策樹(shù)的分類器獲得的AUC與使用匯總統(tǒng)計(jì)時(shí)由相同分類器獲得的AUC相當(dāng)或更優(yōu)。隨機(jī)森林 - PL和XGBoost - PL在識(shí)別Facet流量時(shí)獲得AUC = 0.99。這表示與Random Forest ST相比最大改善0.04,而在XGBoost-ST上最大改善0.02。當(dāng)決策樹(shù) - PL達(dá)到最大AUC = 0.91時(shí),它仍然低于隨機(jī)森林 - ST獲得的最大AUC。這種趨勢(shì)在DeltaShaper流量的分類中類似,其中Decision Tree-PL獲得的AUC也低于樹(shù)集合的AUC。(160x 120,4 x 4,6,1)DeltaShaper流量的檢測(cè)從包長(zhǎng)度特征中獲益最多,其中XGBoost-PL達(dá)到AUC = 0.85,比XGBoost-ST獲得的大0.08。有趣的是,XGBoost-ST可以更好地檢測(cè)(320 x 240,8 x 8,6,1)DeltaShaper流量,盡管比XGBoost-PL的AUC稍微提高了0.01。

4.4 Feature Importance

The above set of experiments allowed us to implicitly identify which features are more important to distinguish between two classes of raffic. Figure 4a shows the top 20 most important summary statistics for detecting Facet traffic s =50%, as reported by the XGBoost algorithm. Figure 4b summarizes the 20 most important quantized ranges of packet lengths. The features annotated with “Out” correspond to those generated by the packet flow directed towards the client (carrying the covert payload), while the features annotated with In" correspond to the packet flow directed towards the Facet server. Figure 4c depicts the top 20 most important summary statistics for detecting DeltaShaper (320 x 240,8 x 8,6, 1) raffie, as reported by XGBoost. Similarly, Figure 4d depicts the most important quantized ranges of packet lengths for detecting the same kind of traffic. Each feature is annotated with "Out or In", depending on the particular Skype peer originating covert traffic. We note that both peers generate covert traffic simultaneously due to DeltaShaper's bidirectionality. Below, we discuss the main findings of our analysis.

上述一組實(shí)驗(yàn)使我們能夠隱含地確定哪些特征對(duì)于區(qū)分兩類流量更為重要。圖4a顯示了檢測(cè)Facet流量s = 50%的前20個(gè)最重要的匯總統(tǒng)計(jì)信息,如XGBoost算法所報(bào)告的。圖4b總結(jié)了20個(gè)最重要的分組長(zhǎng)度量化范圍。用“Out”注釋的特征對(duì)應(yīng)于由指向客戶端的分組流(攜帶隱蔽有效載荷)生成的特征,而用“In”注釋的特征對(duì)應(yīng)于指向Facet服務(wù)器的分組流。圖4c描繪了前20個(gè)用于檢測(cè)DeltaShaper(320 x 240,8 x 8,6,1)raffie的最重要的摘要統(tǒng)計(jì)數(shù)據(jù),如XGBoost所報(bào)告的那樣。圖4d描述了用于檢測(cè)相同類型流量的最重要的數(shù)據(jù)包長(zhǎng)度量化范圍。用“Out”或“In”注釋,取決于特定的Skype同伴發(fā)起的隱蔽流量。我們注意到由于DeltaShaper的雙向性,兩個(gè)對(duì)等體同時(shí)生成隱蔽流量。下面,我們討論我們分析的主要發(fā)現(xiàn)。

1. Facet is more vulnerable to analysis based on packet lengths and burst behavior. Figure 4a shows that Facet detection is driven by features related to the packet lengths and the burst behavior of the connection, whereas packet timing does not contribute as much. An interesting observation is that the majority of packet bursts features considered important for classification are those included in the flow directed towards the Facet server, which carries no covert data. This fact suggests that Skype flows exhibit some degree of codependency and that both flows provide useful information for distinguishing between legitimate and covert transmissions. Features included in the top 10, and that directly concern the length of packets,index summary statistics from the flow carrying covert data. This suggests that the flow carrying covert data is the prime target for inspection when analyzing packet lengths. Additionally, packet lengths comprehended between the 10th and 40th percentiles, amounting to packets with a mean length comprehended between 138 and 213 bytes, have a superior discriminating power among other packet sizes(翻譯的不是很好。。。我回頭再看看). XGBoost ranks 123 of the 166 features with a non-zero importance score.

1. Facet更容易受到基于數(shù)據(jù)包長(zhǎng)度和突發(fā)行為的分析的影響。圖4a表示出了Facet檢測(cè)由與分組長(zhǎng)度和連接的突發(fā)行為相關(guān)的特征驅(qū)動(dòng),而分組定時(shí)沒(méi)有貢獻(xiàn)那么多。一個(gè)有趣的觀察結(jié)果是,對(duì)于分類而言被認(rèn)為重要的大多數(shù)分組突發(fā)特征是包含在面向Facet服務(wù)器的流中的那些,其不攜帶隱蔽數(shù)據(jù)。這一事實(shí)表明Skype流程表現(xiàn)出一定程度的相互依賴性,并且兩種流程都提供了有用的信息來(lái)區(qū)分合法和隱蔽傳輸。前十名中包含的功能,直接涉及數(shù)據(jù)包的長(zhǎng)度,來(lái)自攜帶隱蔽數(shù)據(jù)的流的索引摘要統(tǒng)計(jì)。這表明攜帶隱蔽數(shù)據(jù)的流量是分析數(shù)據(jù)包長(zhǎng)度時(shí)檢查的主要目標(biāo)。另外,在第10和第40百分位之間理解的分組長(zhǎng)度相當(dāng)于具有在138和213字節(jié)之間的平均長(zhǎng)度的分組,在其他分組大小之間具有優(yōu)越的區(qū)分能力。 XGBoost對(duì)166個(gè)特征中重要性得分非零的123個(gè)進(jìn)行排名。

2. DeltaShaper is more vulnerable to analysis based packet lengths. Figure 4c shows that l0 of the most important features for detecting DeltaShaper regard descriptive statistics of packet lengths. In particular, 7 out of the top l0 most important features for identifying DeltaShaper traffic are related to the length of transmitted packets. Contrary to Facet, these features include a mixture of traffic originating in different peers, which is expected according to the bidirectionality of the covert channel. We find the most influential packet lengths to be within the range of the 40th and 80th percentiles, amounting to packets with a mean length comprehended between 1026-1180 bytes. XGBoost ranks 132 of the 166 features with an importance score larger than zero.

2. DeltaShaper的分析對(duì)數(shù)據(jù)包的分組長(zhǎng)度更加敏感。圖4c顯示了用于檢測(cè)DeltaShaper的10個(gè)最重要的特征是關(guān)于分組長(zhǎng)度的描述性統(tǒng)計(jì)。特別是,用于識(shí)別DeltaShaper流量的前10個(gè)最重要特征中的7個(gè)與傳輸分組的長(zhǎng)度有關(guān)。與Facet相反,這些特征包括源自不同對(duì)等體的流量混合,這是根據(jù)隱蔽信道的雙向性預(yù)期的。我們發(fā)現(xiàn)最有影響的數(shù)據(jù)包長(zhǎng)度在40和80百分位的范圍內(nèi),相當(dāng)于平均長(zhǎng)度在1026-1180字節(jié)之間的數(shù)據(jù)包。 XGBoost在166個(gè)特征中對(duì)重要性得分為非零的132個(gè)特征進(jìn)行排名。

2. DeltaShaper is more vulnerable to analysis based packet lengths. Figure 4c shows that l0 of the most important features for detecting DeltaShaper regard descriptive statistics of packet lengths. In particular, 7 out of the top l0 most important features for identifying DeltaShaper traffic are related to the length of transmitted packets. Contrary to Facet, these features include a mixture of traffic originating in different peers, which is expected according to the bidirectionality of the covert channel. We find the most influential packet lengths to be within the range of the 40th and 80th percentiles, amounting to packets with a mean length comprehended between 1026-1180 bytes. XGBoost ranks 132 of the 166 features with an importance score larger than zero.

2. DeltaShaper更容易受到基于分析的數(shù)據(jù)包長(zhǎng)度的影響。圖4c顯示了用于檢測(cè)DeltaShaper的10個(gè)最重要的特征是關(guān)于分組長(zhǎng)度的描述性統(tǒng)計(jì)。特別是,用于識(shí)別DeltaShaper流量的前10個(gè)最重要特征中的7個(gè)與傳輸分組的長(zhǎng)度有關(guān)。與Facet相反,這些特征包括源自不同對(duì)等體的流量混合,這是根據(jù)隱蔽信道的雙向性預(yù)期的。我們發(fā)現(xiàn)最有影響的數(shù)據(jù)包長(zhǎng)度在40和80百分位的范圍內(nèi),相當(dāng)于平均長(zhǎng)度在1026-1180字節(jié)之間的數(shù)據(jù)包。 XGBoost在166個(gè)特征中排名132,重要性分?jǐn)?shù)大于零。

3. Facet covert channels can be spotted by looking for packets with a length comprehended between 115-195 bytes. Figure 4b not only shows that the most important bin corresponds to that by the packets which length is close to 150, but also that the top 10 features are dominated by packets which lengths are in the range of 115 to 195 bytes. This result concurs with our previous observation, where the most important percentiles of packet lengths focused packets with a mean length between 137 and 200 bytes. This observation is also true when detecting Facet s={12.5%,25%} traffic. This finding suggests that the major factor leading to the distinguishing of Facet traffic concerns the packets carrying audio, which are typically located in the range between 100 and 200 bytes [37]. Additionally, we can observe that some of the least important features included in the top 20 for identifying Facet s = 50% flows include packets with a length between 945-985 bytes. This result hints that larger areas dedicated to video payload translate into packet-level modifications in a higher range of the feature space. Additionally, XGBoost ranks only 175 out of 300 features with a non-zero importance score, suggesting that only approximately half of the quantized packet length bins contribute for the discrimination of Facet traffic.

3.通過(guò)查找長(zhǎng)度在115-195字節(jié)之間的數(shù)據(jù)包,可以發(fā)現(xiàn)Facet隱蔽信道。圖4b不僅表明最重要的bin對(duì)應(yīng)于長(zhǎng)度接近150的分組,而且前10個(gè)特征由長(zhǎng)度在115到195字節(jié)范圍內(nèi)的分組控制。這個(gè)結(jié)果與我們之前的觀察一致,其中最重要的數(shù)據(jù)包長(zhǎng)度百分位集中了平均長(zhǎng)度在137到200字節(jié)之間的數(shù)據(jù)包。當(dāng)檢測(cè)到Facet s = {12.5%,25%}流量時(shí),這種觀察也是如此。這一發(fā)現(xiàn)表明,導(dǎo)致Facet流量區(qū)分的主要因素涉及攜帶音頻的數(shù)據(jù)包,這些數(shù)據(jù)包通常位于100到200字節(jié)之間[37]。另外,我們可以觀察到,前20中用于識(shí)別Facet s = 50%流的一些最不重要的特征包括長(zhǎng)度在945-985字節(jié)之間的分組。該結(jié)果暗示專用于視頻有效載荷的較大區(qū)域轉(zhuǎn)換為更高范圍的特征空間中的分組級(jí)修改。此外,XGBoost在具有非零重要性分?jǐn)?shù)的300個(gè)特征中僅排名175,這表明僅有大約一半的量化分組長(zhǎng)度分組有助于區(qū)分Facet流量。

4. DeltaShaper covert channels can be spotted by looking for packets with a length between 85-100 and 1105-1205 bytes. Figure 4d shows that the two most important features for identifying DeltaShaper ?320 × 240,8×8,6,1? traffic correspond to the packets which size is close to 100 bytes (flowing in both directions). The top 20 features are dominated by packet length bins in the range from 85-100 and 1105-1205 bytes, suggesting that DeltaShaper data modulation markedly affects two distinct regions of the feature space. The region including larger packets roughly overlaps the mean length of the packets included in the most important percentiles of our analysis of summary statistics. Considering that DeltaShaper’s covert data embedding procedure specifically targets the video layer of Skype calls, this finding suggests that such modulation largely affects larger packets of the connection. When classifying DeltaShaper ?320×240,8×8,6,1? traffic, XGBoost ranks 253 out of 600 features with a non-zero importance score.

4.通過(guò)查找長(zhǎng)度在85-100和1105-1205字節(jié)之間的數(shù)據(jù)包,可以發(fā)現(xiàn)DeltaShaper隱蔽通道。圖4d顯示識(shí)別DeltaShaper?320×240,8×8,6,1?流量的兩個(gè)最重要的特征對(duì)應(yīng)于大小接近100字節(jié)(在兩個(gè)方向上流動(dòng))的分組。前20個(gè)特征由分組長(zhǎng)度箱控制,范圍從85-100和1105-1205字節(jié),這表明DeltaShaper數(shù)據(jù)調(diào)制顯著影響特征空間的兩個(gè)不同區(qū)域。包括較大數(shù)據(jù)包的區(qū)域與我們的摘要統(tǒng)計(jì)分析中最重要百分位中包括的數(shù)據(jù)包的平均長(zhǎng)度大致重疊??紤]到DeltaShaper的隱蔽數(shù)據(jù)嵌入程序?qū)iT針對(duì)Skype呼叫的視頻層,這一發(fā)現(xiàn)表明這種調(diào)制很大程度上影響了較大的連接數(shù)據(jù)包。在對(duì)DeltaShaper?320×240,8×8,6,1?流量進(jìn)行分類時(shí),XGBoost在600個(gè)功能中排名253,具有非零重要性分?jǐn)?shù)。

The most important features for detecting DeltaShaper ?160 × 120, 4 × 4, 6, 1? traffic largely overlap the two feature set regions already reported. However, we verify that the region including larger packet lengths was significantly expanded, including bins representing packets with a size within the range of 885-1200 bytes.

檢測(cè)DeltaShaper?160×120,4×4,6,1?流量的最重要特征在很大程度上重疊了已報(bào)告的兩個(gè)特征集區(qū)域。但是,我們驗(yàn)證包含較大數(shù)據(jù)包長(zhǎng)度的區(qū)域是否顯著擴(kuò)展,包括表示大小在885-1200字節(jié)范圍內(nèi)的數(shù)據(jù)包的bin。

4.5 Alternative Dataset Evaluation

We have constructed and handled our dataset by following the same methodology adopted by previous works under study. However, this methodology may raise a few concerns. In particular, the covert streams (positive class) have been produced using the available legitimate videos (negative class), which may introduce some form of correlation among classes. Furthermore, this methodology generates a 1:1 ratio of positive to negative classes, which may be unrealistic if covert streams are a minority among the traffic found in the wild. Thus, one may wonder how accurate is our classifier if: i) the positive class is no longer correlated with the negative class during testing; ii) the positive-to-negative sample ratio is low during testing. To validate the effectiveness of our approach, we performed two additional experiments.

我們按照以前研究中采用的相同方法構(gòu)建和處理了我們的數(shù)據(jù)集。但是,這種方法可能引起一些擔(dān)憂。特別是,隱藏流(正類)已經(jīng)使用可用的合法視頻(否定類)產(chǎn)生,這可能在類之間引入某種形式的相關(guān)性。此外,這種方法產(chǎn)生了1:1的正負(fù)比例,如果隱蔽流在公共環(huán)境中發(fā)現(xiàn)的流量中占少數(shù),這可能是不現(xiàn)實(shí)的。因此,如果出現(xiàn)以下情況,我們可能會(huì)想知道我們的分類器有多準(zhǔn)確:i)在測(cè)試過(guò)程中,正類不再與負(fù)類相關(guān); ii)測(cè)試期間正負(fù)樣本比率低。為了驗(yàn)證我們的方法的有效性,我們進(jìn)行了另外兩個(gè)實(shí)驗(yàn)。

First, we performed an experiment which removed the correlations between the positive and negative classes. We split our legitimate traffic dataset in half, using only one half as legitimate samples. Then, for creating our covert video dataset, we selected those covert videos which embed modulated data in the legitimate videos out of our reduced legitimate traffic dataset. We then used XGBoost to build a model through 10-fold crossvalidation. To prevent the fitting of results to a particular choice of the initial legitimate samples, we repeated the process 10 times while randomly choosing such samples.

首先,我們進(jìn)行了一項(xiàng)實(shí)驗(yàn),刪除了正負(fù)類之間的相關(guān)性。我們將合法的流量數(shù)據(jù)集分成兩半,只使用一半作為合法樣本。然后,為了創(chuàng)建我們的隱蔽視頻數(shù)據(jù)集,我們選擇了那些將調(diào)制數(shù)據(jù)嵌入合法視頻中的隱蔽視頻,這些視頻來(lái)自我們簡(jiǎn)化的合法流量數(shù)據(jù)集。然后我們使用XGBoost通過(guò)十折交叉驗(yàn)證來(lái)構(gòu)建模型。為了防止將結(jié)果擬合到初始合法樣本的特定選擇,我們重復(fù)該過(guò)程10次,同時(shí)隨機(jī)選擇這樣的樣本。

Second, we performed an experiment where we keep the positive-to-negative sample ratio low during testing. We split our data in training / testing sets in a 70 / 30 proportion, and where we kept the training set ratio as 1:1, and keep the positive to negative ratio of the testing set to 1:100. To prevent the fitting of results to a particular split of the data, we randomly choose each set 10 times.

其次,我們進(jìn)行了一項(xiàng)實(shí)驗(yàn),在測(cè)試過(guò)程中我們將正負(fù)采樣率保持在較低水平。我們將訓(xùn)練/測(cè)試集中的數(shù)據(jù)分成70/30比例,并且我們將訓(xùn)練集比率保持為1:1,并將測(cè)試集的正負(fù)比保持為1:100。為了防止將結(jié)果擬合到特定的數(shù)據(jù)分割,我們隨機(jī)選擇每組10次。

The results of our additional experiments suggest that possible correlations among training and testing data, as well as sample ratios, do not limit the accuracy of our approach. For our first experiment, XGBoost obtained an AUC=0.94 for DeltaShaper ?320 × 240, 8 × 8, 6, 1? traffic (only 0.01 less than the results reported in Section 4.3), and an AUC=0.99 for traffic pertaining to Facet s=50% configuration. As for the second experiment, XGBoost was able to correctly identify 90% of Facet s=50% traffic with an FPR of only 2%, while it was able to identify 90% of DeltaShaper ?320×240,8×8,6,1? traffic with an FPR of 18% (only 4% larger).

我們的其他實(shí)驗(yàn)結(jié)果表明,訓(xùn)練和測(cè)試數(shù)據(jù)之間可能的相關(guān)性以及樣本比率并不會(huì)限制我們方法的準(zhǔn)確性。對(duì)于我們的第一個(gè)實(shí)驗(yàn),XGBoost獲得了DeltaShaper?320×240,8×8,6,1?流量的AUC = 0.94(僅比第4.3節(jié)中報(bào)告的結(jié)果少0.01),并且與Facet相關(guān)的流量的AUC = 0.99 s = 50%配置。至于第二個(gè)實(shí)驗(yàn),XGBoost能夠正確識(shí)別90%的Facet s = 50%流量,F(xiàn)PR僅為2%,同時(shí)能夠識(shí)別出DeltaShaper的總數(shù)為<300×240,8×8,6, 1?流量,F(xiàn)PR為18%(僅增加4%)。

4.6 Practical Considerations

This section details several practical considerations which may be useful to an adversary considering the use of decision tree classifiers for the detection of covert channels. The following results reflect processing time in a VM configuration akin to that described in Section 2.4.

本節(jié)詳細(xì)介紹了考慮使用決策樹(shù)分類器檢測(cè)隱蔽通道的對(duì)手可能有用的幾個(gè)實(shí)際考慮因素。以下結(jié)果反映了VM配置中的處理時(shí)間,類似于第2.4節(jié)中描述的處理時(shí)間。

Feature extraction. The extraction of quantized packet length bins from a 60 second Facet network trace amounts to an average of 0.33s per sample. Generating summary statistics describing the same type of traffic flow amounts to an average of 0.44s per sample. This result indicates that an adversary can quickly generate feature vectors for conducting subsequent classification.

特征提取。從60秒的Facet網(wǎng)絡(luò)軌跡中提取量化的分組長(zhǎng)度,每個(gè)樣本平均0.33s。生成描述相同類型的流量的匯總統(tǒng)計(jì)數(shù)據(jù)平均為每個(gè)樣本0.44s。該結(jié)果表明攻擊者可以快速生成特征向量以進(jìn)行后續(xù)分類。

Memory and storage requirements. Table 2 depicts the memory and storage requirements for holding a single Facet or DeltaShaper sample. In our Python implementation, a NumPy [47] array storing the quantized packet lengths describing a Facet sample (300 attributes) occupies 2.4kB of memory per sample. In comparison, an array containing the bi-grams required by the χ2 classifier occupy a total of 45kB per sample. The numbers in Table 2 suggest that an adversary can efficiently store and process large datasets. As an example, storing 1 million Facet quantized packet lengths feature vectors in a raw ASCII text file would only occupy approximately 1GB of disk space. Storing summary statistics in raw ASCII text would occupy nearly twofold the space due to the characters required to represent floating-point precision.

內(nèi)存和存儲(chǔ)要求。表2描述了保存單個(gè)Facet或DeltaShaper示例的內(nèi)存和存儲(chǔ)要求。在我們的Python實(shí)現(xiàn)中,存儲(chǔ)描述Facet樣本(300個(gè)屬性)的量化包長(zhǎng)度的NumPy [47]數(shù)組占用每個(gè)樣本2.4kB的內(nèi)存。相比之下,包含χ2分類器所需的二元組的陣列每個(gè)樣本總共占據(jù)45kB。表2中的數(shù)字表明,攻擊者可以有效地存儲(chǔ)和處理大型數(shù)據(jù)集。例如,在原始ASCII文本文件中存儲(chǔ)100萬(wàn)個(gè)Facet量化包長(zhǎng)度特征向量只占用大約1GB的磁盤(pán)空間。由于表示浮點(diǎn)精度所需的字符,在原始ASCII文本中存儲(chǔ)摘要統(tǒng)計(jì)信息將占用幾乎兩倍的空間。

Model building and classification speed. Table 3 depicts the average training time of our classifiers, as well as the average time to output a prediction. Building a Decision Tree PL for identifying Facet traffic takes an average of 0.27s. For an ensemble composed of 100 trees, Random Forest PL and XGBoost – PL models are built in 1.45s and 0.41s, respectively. Moreover, the average classification time for an individual sample is 180μs for XGBoost – PL. XGBoost is not only more accurate but also trains faster and exhibits a faster classification speed than Random Forest. This relation is also present when classifying DeltaShaper traffic. These results stress the fact that an adversary would benefit from using XGBoost to detect multimedia protocol tunneling covert channels.

模型構(gòu)建和分類速度。表3描述了我們的分類器的平均訓(xùn)練時(shí)間,以及輸出預(yù)測(cè)的平均時(shí)間。構(gòu)建用于識(shí)別Facet流量的決策樹(shù)PL平均需要0.27秒。對(duì)于由100棵樹(shù)組成的集合,Random Forest PL和XGBoost-PL模型分別建于1.45s和0.41s。此外,對(duì)于XGBoost-PL,單個(gè)樣品的平均分類時(shí)間為180μs。 XGBoost不僅更準(zhǔn)確,而且訓(xùn)練更快,并且比Random Forest表現(xiàn)出更快的分類速度。在對(duì)DeltaShaper流量進(jìn)行分類時(shí)也會(huì)出現(xiàn)此關(guān)系。這些結(jié)果強(qiáng)調(diào)了這樣一個(gè)事實(shí),即攻擊者將受益于使用XGBoost來(lái)檢測(cè)多媒體協(xié)議隧道隱蔽信道。

Generalization ability of the classifiers. A classifier with good generalization ability is able to perform correct predictions for previously unseen data. Albeit the AUC obtained by our decision tree-based classifiers suggests that these can generalize well, we further assess their classification performance when training data is severely limited. We split our data in two 10 / 90 training and testing sets, and report the mean AUC obtained by the classifier after repeating this process 10 times while randomly choosing the samples making part of each set. In this setting, when classifying Facet s=50%, XGBoost PL attains an AUC=0.98, only 0.01 short of that obtained after 10x cross-validation. For DeltaShaper ?160×120,4×4,6,1? traffic, XGBoost PL attains an AUC 0.1 smaller than their 10x cross-validation counterpart. These results suggest that an adversary can build accurate decision tree-based classifiers for detecting covert traffic while resorting to a small sample of data.

分類器的泛化能力。具有良好泛化能力的分類器能夠?qū)ο惹拔匆?jiàn)過(guò)的數(shù)據(jù)執(zhí)行正確的預(yù)測(cè)。雖然我們基于決策樹(shù)的分類器獲得的AUC表明這些可以很好地推廣,但是當(dāng)訓(xùn)練數(shù)據(jù)嚴(yán)重受限時(shí),我們會(huì)進(jìn)一步評(píng)估其分類性能。我們將數(shù)據(jù)分成兩個(gè)10/90訓(xùn)練和測(cè)試集,隨機(jī)選擇構(gòu)成每組測(cè)試數(shù)據(jù)的樣本,重復(fù)測(cè)試10次后報(bào)告分類器獲得的平均AUC。在此設(shè)置中,當(dāng)分級(jí)Facet s = 50%時(shí),XGBoost PL達(dá)到AUC = 0.98,僅比十折交叉驗(yàn)證后獲得的結(jié)果減少了0.01。對(duì)于DeltaShaper?160×120,4×4,6,1?流量,XGBoost PL的AUC比其十折交叉驗(yàn)證減少了0.1。這些結(jié)果表明,攻擊者可以構(gòu)建準(zhǔn)確的基于決策樹(shù)的分類器,以便在采用少量數(shù)據(jù)樣本的同時(shí)檢測(cè)隱蔽流量。

Impact of network traces collection time. Table 4 depicts the AUC obtained by XGBoost – PL when detecting different types of covert traffic for varying time-spans of traffic flows collection. Results show that capturing traffic by 30s is enough for attaining the same classification performance achieved in our initial experiments, which admitted 60s traffic captures. The numbers in Table 4 also show that classification performance decreases monotonically for traffic collections fewer than 30s, suggesting that the inspection of at least 30s of video traffic provides the adversary with sufficient data for identifying covert traffic flows with low false positives.

網(wǎng)絡(luò)跟蹤收集時(shí)間的影響。表4描述了XGBoost-PL在針對(duì)流量收集的不同時(shí)間跨度下,檢測(cè)不同類型的隱蔽流量時(shí)獲得的AUC。結(jié)果表明,捕獲30s網(wǎng)絡(luò)流量足以獲得在我們最初的實(shí)驗(yàn)相同的分類性能,從60秒的流量捕獲的結(jié)果就能看出來(lái)。表4中的數(shù)字還表明,對(duì)于少于30秒的流量收集,分類性能單調(diào)下降,這表明至少30s視頻流量的檢查為對(duì)手提供了足夠的數(shù)據(jù)來(lái)識(shí)別具有低誤報(bào)率的隱蔽流量。

正文之后

好氣?。。∈裁磿r(shí)候簡(jiǎn)書(shū)還有長(zhǎng)度限制了?這是被攻擊了???

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

  • rljs by sennchi Timeline of History Part One The Cognitiv...
    sennchi閱讀 7,847評(píng)論 0 10
  • 銷售的定義是能夠找出商品所能提供的特殊利益或服務(wù),滿足客戶的特殊需求。 不難看出,銷售有幾個(gè)關(guān)鍵詞,一個(gè)是商品或是...
    358741519617閱讀 1,062評(píng)論 0 0
  • 高冷范兒的女神總是被我們羨慕 不如也把自己變成女神把 女生逆襲女神計(jì)劃 變白變瘦的每天計(jì)劃 一:去除痤瘡疤痕 把姜...
    jane愛(ài)閱讀 11,068評(píng)論 4 22
  • 每一天,很期待上班,因?yàn)椋梢钥吹侥莻€(gè)忙碌的身影,你看到她,就好像吃了興奮劑一樣,渾身每一個(gè)細(xì)胞都跟著興奮。
    白衣布衫閱讀 132評(píng)論 0 0
  • 春天充滿希望,趕緊做夢(mèng),然后努力實(shí)現(xiàn)這個(gè)夢(mèng),必競(jìng)秋天也不遠(yuǎn)了。
    A分享閱讀 289評(píng)論 0 1

友情鏈接更多精彩內(nèi)容