Facebook: BigGraph Documentation - Evaluation (PyTorch)

Contents

Graph embedding is a method for generating unsupervised node features from a graph; the resulting features can then be used in all kinds of machine-learning tasks. Modern graphs, especially in industrial applications, often contain billions of nodes and trillions of edges, which exceeds the capacity of existing embedding systems. Facebook has open-sourced an embedding system, PyTorch-BigGraph (PBG), which makes several modifications to traditional multi-relation embedding systems so that it can scale to graphs with billions of nodes and trillions of edges.

This series is a translation of the official PyTorch-BigGraph manual, intended to help readers get started quickly with graph embeddings and their use. There are fifteen articles in total; if you find any errors in the translation, please get in touch.

(1) Facebook's open-source graph neural network - PyTorch-BigGraph

(2) Facebook: BigGraph Documentation - Data Model (PyTorch)

(3) Facebook: BigGraph Documentation - From Entity Embeddings to Edge Scores (PyTorch)

(4) Facebook: BigGraph Documentation - I/O Format (PyTorch)

(5) Facebook: BigGraph Documentation - Batch Preparation

(6) Facebook: BigGraph Documentation - Distributed Mode (PyTorch)

(7) Facebook: BigGraph Documentation - Loss Calculation (PyTorch)

(8) Facebook: BigGraph Documentation - Evaluation (PyTorch)


Evaluation

During training, the average loss is reported for each edge bucket at each pass. Evaluation metrics can be computed on held-out data during or after training to measure the quality of trained embeddings.

Offline evaluation

The torchbiggraph_eval command will perform an offline evaluation of trained PBG embeddings on a validation dataset. This dataset should contain held-out data not included in the training dataset. It is invoked in the same way as the training command and takes the same arguments.

It is generally advisable to have two versions of the config file, one for training and one for evaluation, with the same parameters except for the edge paths, in order to evaluate a separate (and often smaller) set of edges. (It's also possible to use a single config file and have it produce different output based on environment variables or other context.) Training-specific config parameters (e.g., the learning rate, loss function, ...) will be ignored during evaluation.
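The shared-base-config pattern described above can be sketched as follows. The parameter names below mirror PBG's config schema, but this is an illustrative sketch, not a verified working config:

```python
# Sketch of a train/eval config pair that differs only in edge paths.
# Parameter names follow PBG's config schema, but the values and the
# helper function are illustrative assumptions, not a tested setup.

BASE_CONFIG = {
    "entity_path": "data/entities",
    "entities": {"all": {"num_partitions": 1}},
    "relations": [{"name": "rel", "lhs": "all", "rhs": "all", "operator": "none"}],
    "dimension": 200,
}

def make_config(edge_paths, **training_overrides):
    """Return a config sharing BASE_CONFIG, with its own edge paths."""
    config = dict(BASE_CONFIG)
    config["edge_paths"] = edge_paths
    # Training-only parameters (e.g. lr) are ignored by torchbiggraph_eval,
    # so they can safely be left out of the evaluation config.
    config.update(training_overrides)
    return config

train_config = make_config(["data/train_edges"], lr=0.01)
eval_config = make_config(["data/valid_edges"])
```

Keeping everything except the edge paths in one shared dict makes it hard for the two configs to drift apart.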

The metrics are first reported on each bucket, and a global average is computed at the end. (If multiple edge paths are in use, metrics are computed separately for each of them but still ultimately averaged.)

Many metrics are statistics based on the "ranks" of the edges of the validation set. The rank of a positive edge is determined by the rank of its score against the scores of a certain number of negative edges. A rank of 1 is the "best" outcome as it means that the positive edge had a higher score than all the negatives. Higher values are "worse" as they indicate that the positive didn't stand out.

It may happen that some of the negative samples used in the rank computation are in fact other positive samples, which are expected to have a high score and may thus cause adverse effects on the rank. This effect is especially visible on smaller graphs, in particular when all other entities are used to construct the negatives. To fix it, and to match what is typically done in the literature, a so-called "filtered" rank is used in the FB15k demo script (and there only), where positive samples are filtered out when computing the rank of an edge. It is hard to scale this technique to large graphs, and thus it is not enabled globally. However, filtering is less important on large graphs as it's less likely to see a training edge among the sampled negatives.
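The difference between a raw and a filtered rank can be shown with a toy sketch (this is an illustration of the idea, not PBG's or the demo script's actual code; `known_positives` is a hypothetical set of entities that form true edges):

```python
def rank_of_positive(pos_score, neg_scores):
    # Rank 1 is best: count how many negatives score at least as high
    # as the positive, then add one for the positive itself.
    return 1 + sum(1 for s in neg_scores if s >= pos_score)

def filtered_rank(pos_score, neg_entities, neg_scores, known_positives):
    # "Filtered" setting from the literature: drop negatives that are
    # actually positives elsewhere in the graph before ranking.
    kept = [s for e, s in zip(neg_entities, neg_scores) if e not in known_positives]
    return rank_of_positive(pos_score, kept)

# Toy example: entity 7 is really a positive, so it inflates the raw rank.
neg_entities = [3, 7, 9]
neg_scores = [0.2, 0.9, 0.1]
raw = rank_of_positive(0.8, neg_scores)
filt = filtered_rank(0.8, neg_entities, neg_scores, known_positives={7})
```

Here the raw rank is 2 (the hidden positive outscores the edge being evaluated), while the filtered rank is 1. Filtering requires a lookup of all true edges, which is why it is hard to scale to very large graphs.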

The metrics are:

Mean Rank: the average of the ranks of all positives (lower is better, best is 1).

Mean Reciprocal Rank (MRR): the average of the reciprocal of the ranks of all positives (higher is better, best is 1).

Hits@1: the fraction of positives that rank better than all their negatives, i.e., have a rank of 1 (higher is better, best is 1).

Hits@10: the fraction of positives that rank in the top 10 among their negatives (higher is better, best is 1).

Hits@50: the fraction of positives that rank in the top 50 among their negatives (higher is better, best is 1).

Area Under the Curve (AUC): an estimation of the probability that a randomly chosen positive scores higher than a randomly chosen negative (any negative, not only the negatives constructed by corrupting that positive).
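Given the list of ranks, the rank-based metrics above reduce to a few lines. The sketch below also shows AUC as an exact pairwise comparison over two score lists; that exhaustive form is an assumption for illustration (PBG estimates the probability rather than enumerating all pairs):

```python
def rank_metrics(ranks):
    # Rank-based statistics over all positive edges in the validation set.
    n = len(ranks)
    return {
        "mean_rank": sum(ranks) / n,                  # lower is better, best 1
        "mrr": sum(1.0 / r for r in ranks) / n,       # higher is better, best 1
        "hits@1": sum(r <= 1 for r in ranks) / n,
        "hits@10": sum(r <= 10 for r in ranks) / n,
        "hits@50": sum(r <= 50 for r in ranks) / n,
    }

def auc(pos_scores, neg_scores):
    # Probability that a random positive outscores a random negative,
    # counting ties as half a win (exhaustive version for illustration).
    wins = sum((p > q) + 0.5 * (p == q) for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

m = rank_metrics([1, 3, 12])
```

For ranks [1, 3, 12] this gives a mean rank of about 5.33, an MRR of about 0.47, Hits@1 of 1/3, and Hits@10 of 2/3.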


Evaluation during training

Offline evaluation is a slow process that is intended to be run after training is complete to evaluate the final model on a held-out set of edges constructed by the user. However, it's useful to be able to monitor overfitting as training progresses. PBG offers this functionality, by calculating the same metrics as the offline evaluation before and after each pass on a small set of training edges. These stats are printed to the logs.

The metrics are computed on a set of edges that is held out automatically from the training set. To be more explicit: using this feature means that training happens on fewer edges, as some are excluded and reserved for this evaluation. The holdout fraction is controlled by the eval_fraction config parameter (setting it to zero thus disables this feature). The evaluations before and after each training iteration happen on the same set of edges, thus are comparable. Moreover, the evaluations for the same edge chunk, edge path and bucket at different epochs also use the same set of edges.
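The property that the same edges are held out for a given chunk, path and bucket at every epoch suggests a deterministic split. A hypothetical sketch of such a split using a seeded shuffle (not PBG's actual code):

```python
import random

def split_holdout(edges, eval_fraction, seed):
    # Seeding the shuffle with a fixed per-bucket key means the same
    # edges land in the holdout set every epoch, so the before/after
    # metrics from different passes stay comparable.
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_eval = int(len(edges) * eval_fraction)
    return edges[n_eval:], edges[:n_eval]  # (train_edges, eval_edges)

# With eval_fraction=0.05, 5% of this bucket's edges are reserved.
train, held_out = split_holdout(range(100), eval_fraction=0.05, seed=42)
```

In a real system the seed would be derived from the edge path, chunk and bucket identifiers so that each bucket gets its own stable holdout set.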

Evaluation metrics are computed both before and after training each edge bucket because it provides insight into whether the partitioned training is working. If the partitioned training is converging, then the gap between the "before" and "after" statistics should go to zero over time. On the other hand, if the partitioned training is causing the model to overfit on each edge bucket (thus decreasing performance for other edge buckets) then there will be a persistent gap between the "before" and "after" statistics.

It's possible to use different batch sizes for same-batch and uniform negative sampling by tuning the eval_num_batch_negs and the eval_num_uniform_negs config parameters.
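These evaluation-time parameters sit alongside their training-time counterparts in the config. A sketch of how the four parameters relate (the values below are illustrative assumptions, not recommended settings):

```python
# Illustrative values only. num_batch_negs / num_uniform_negs control
# negative sampling during training; the eval_* variants control the
# (typically larger) pools used when computing evaluation metrics.
negative_sampling_params = {
    "num_batch_negs": 50,            # same-batch negatives during training
    "num_uniform_negs": 50,          # uniformly sampled negatives during training
    "eval_num_batch_negs": 1000,     # larger evaluation pools make the
    "eval_num_uniform_negs": 1000,   # resulting ranks more fine-grained
}
```

Using more negatives at evaluation time costs nothing during training and makes rank-based metrics less noisy, since each positive is compared against a larger sample.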
