KL Divergence
KL divergence, also called relative entropy, measures the distance between two distributions (discrete or continuous). Let P and Q be two probability distributions of a discrete random variable X. The KL divergence of P with respect to Q is:

D_KL(P || Q) = Σ_x P(x) · log( P(x) / Q(x) )
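As a quick sanity check, the definition translates directly into a few lines of Python (`kl_divergence` is a name chosen here for illustration, not a library function; it assumes all probabilities are strictly positive):

```python
import math

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), natural log,
    # assuming p[i] > 0 and q[i] > 0 for all i.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ≈ 0.5108
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions give 0.0
```

Note that D_KL(P || Q) is not symmetric: swapping p and q generally gives a different value.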
KLDivLoss
For a batch D(x, y) of N samples, x is the output of the network, already normalized and log-transformed (i.e. log-probabilities, e.g. produced by log_softmax); y is the ground-truth label (interpreted as probabilities by default) and has the same shape as x. The loss l_n for the n-th sample is computed as:

l_n = y_n · (log y_n − x_n)
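When x_n is the log of a predicted probability q_n, the term y_n · (log y_n − x_n) is exactly the classic KL summand y_n · log(y_n / q_n), which is why the input must already be log-probabilities. A one-line check (the values 0.6 and 0.25 are arbitrary):

```python
import math

y, q = 0.6, 0.25
x = math.log(q)                    # network output as a log-probability
lhs = y * (math.log(y) - x)        # KLDivLoss elementwise term
rhs = y * math.log(y / q)          # classic KL divergence summand
print(lhs, rhs)                    # the two are equal
```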
class KLDivLoss(_Loss):
    __constants__ = ['reduction']

    def __init__(self, size_average=None, reduce=None, reduction='mean'):
        super(KLDivLoss, self).__init__(size_average, reduce, reduction)

    def forward(self, input, target):
        return F.kl_div(input, target, reduction=self.reduction)
PyTorch implements this in the torch.nn.KLDivLoss class; you can also call the F.kl_div function directly. The size_average and reduce arguments in the code above are deprecated. reduction takes one of four values, mean, batchmean, sum, or none, each giving a different return value: mean averages the loss over all elements, batchmean divides the summed loss by the batch size (this matches the mathematical definition of KL divergence), sum returns the total, and none returns the elementwise losses. The default is mean.
Example:
import torch
import torch.nn as nn
import math
def validate_loss(output, target):
    # Manually compute l = y * (log y - x) for every element,
    # then average over all elements (matching reduction='mean').
    val = 0
    for li_x, li_y in zip(output, target):
        for x, y in zip(li_x, li_y):
            val += y * (math.log(y) - x)
    return val / output.nelement()
torch.manual_seed(20)
loss = nn.KLDivLoss()
input = torch.Tensor([[-2, -6, -8], [-7, -1, -2], [-1, -9, -2.3], [-1.9, -2.8, -5.4]])
target = torch.Tensor([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.4, 0.3, 0.3]])
output = loss(input, target)
print("default loss:", output)
output = validate_loss(input, target)
print("validate loss:", output)
loss = nn.KLDivLoss(reduction="batchmean")
output = loss(input, target)
print("batchmean loss:", output)
loss = nn.KLDivLoss(reduction="mean")
output = loss(input, target)
print("mean loss:", output)
loss = nn.KLDivLoss(reduction="none")
output = loss(input, target)
print("none loss:", output)
Output:
default loss: tensor(0.6209)
validate loss: tensor(0.6209)
batchmean loss: tensor(1.8626)
mean loss: tensor(0.6209)
none loss: tensor([[1.4215, 0.3697, 0.5697],
[0.4697, 0.4503, 0.0781],
[0.1534, 1.4781, 0.3288],
[0.3935, 0.4788, 1.2588]])
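The relationship between the reductions can be verified without torch by recomputing the elementwise losses y · (log y − x) directly, using the same input (log-probabilities) and target values as above:

```python
import math

# Same values as the torch example above.
inputs = [[-2, -6, -8], [-7, -1, -2], [-1, -9, -2.3], [-1.9, -2.8, -5.4]]
targets = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.4, 0.3, 0.3]]

# Elementwise losses l = y * (log y - x); these match the "none" output.
elems = [y * (math.log(y) - x)
         for xs, ys in zip(inputs, targets)
         for x, y in zip(xs, ys)]

total = sum(elems)
print("sum:", total)                      # ≈ 7.4504
print("mean:", total / len(elems))        # ≈ 0.6209 (divide by all 12 elements)
print("batchmean:", total / len(inputs))  # ≈ 1.8626 (divide by batch size 4)
```

This confirms that batchmean = mean × (elements per sample): 0.6209 × 3 ≈ 1.8626, which is why batchmean, not mean, corresponds to the mathematical definition of KL divergence per sample.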