EAST: Architecture Analysis + PyTorch Source Implementation


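1. Lessons from U-Net and Its Predecessors

Before introducing EAST, let's look at a few earlier networks to see where EAST came from and why it was needed.

This is only meant to motivate EAST, not to explain the other networks in detail. Interested readers can look up these three excellent networks on their own.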

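1.1 FCN

FCN was analyzed in detail in my earlier post "FCN: from principles to code". Interested readers can read it and run the code.

Figure 1-1

Motivation

Recognition methods (traditional machine learning, CNNs) and detection methods (SSD, YOLO, etc.) all work on coarse, region-level features. Their detections are expressed as rectangles because that is what their internal box regression produces. In addition, the feature map keeps shrinking through the convolutions. If you forcibly interpolate the result from 256×256 up to 512×512, you can imagine what happens: the boundaries come out very poor.

So how can we obtain a result like Figure 1-1? By classifying every single pixel?

Contributions

FCN's answer is to upsample with deconvolution (transposed convolution). This restores the feature map that the CNN has shrunk back to the input size.

The authors also proposed another good idea: fusing feature maps from different layers, so that the prediction combines several receptive-field sizes.

Note: in my view, deconvolution serves two purposes. First, it makes the loss easy to compute, because the output and the label have the same size and can be compared directly. Second, its parameters are learned, which is more flexible than fixed interpolation.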
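To make the upsampling step concrete, here is a minimal PyTorch sketch of a learnable 2× upsampling layer (nn.ConvTranspose2d). The channel count (21, one per class) and the kernel 4 / stride 2 / padding 1 setting are illustrative assumptions, not FCN's exact configuration. That particular setting happens to double the height and width exactly.

```python
import torch
import torch.nn as nn

# Learnable 2x upsampling: kernel 4, stride 2, padding 1 maps H -> 2H exactly,
# since (H - 1) * 2 - 2 * 1 + (4 - 1) + 1 = 2H.
upsample = nn.ConvTranspose2d(in_channels=21, out_channels=21,
                              kernel_size=4, stride=2, padding=1, bias=False)

coarse = torch.randn(1, 21, 16, 16)   # e.g. per-class scores at a coarse resolution
fine = upsample(coarse)               # same channels, twice the spatial size
print(fine.shape)                     # torch.Size([1, 21, 32, 32])

# The kernel weights are ordinary parameters trained by backprop,
# unlike fixed bilinear interpolation: 21 * 21 * 4 * 4 = 7056 of them.
print(sum(p.numel() for p in upsample.parameters()))
```

FCN itself initializes such layers to bilinear interpolation and lets training refine them. The sketch uses random initialization because only the shapes matter here.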

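1.2 U-Net

I had not looked at U-Net much before. So far I have only skimmed the paper and related material, and I do not know its internals well.

Figure 1-2

Motivation

FCN can already segment at the pixel level, so why do we need U-Net?

FCN's results are acceptable, but it handles edges especially poorly. Although several layers are fused, the fusion is rather unstructured, so the final output does not fully capture the available information.

Overall, FCN's segmentation quality and precision are not good enough.

Contributions

U-Net introduced a symmetric encoder-decoder architecture that lets the network learn its parameters better. (I don't fully understand why a symmetric network should learn better. If the output were upsampled once more, making the network asymmetric, wouldn't it work just as well? My impression is that the credit goes to the overall design rather than to the symmetry itself.)

It also fuses feature maps in a better way. It concatenates each upsampled decoder feature map with the encoder feature map of the same resolution (FCN sums predictions element-wise), which improves segmentation of edges and fine details.

The architecture is clean and the segmentation quality is good. It is still widely used in medical image segmentation.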
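Here is a small PyTorch sketch contrasting the two fusion styles. The tensor sizes are arbitrary, and bilinear interpolation stands in for the learned upsampling as a simplification. FCN-style fusion adds maps element-wise, so their shapes must match exactly. U-Net-style fusion concatenates along the channel axis and lets a following convolution learn how to mix coarse and fine features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

deep = torch.randn(1, 64, 16, 16)     # coarse, semantically strong features
shallow = torch.randn(1, 64, 32, 32)  # fine, detail-rich encoder features

up = F.interpolate(deep, scale_factor=2, mode='bilinear', align_corners=False)

# FCN-style fusion: element-wise sum, channels and sizes must match
fused_sum = up + shallow                       # (1, 64, 32, 32)

# U-Net-style fusion: stack along channels, then a conv learns the mixing
fused_cat = torch.cat((up, shallow), dim=1)    # (1, 128, 32, 32)
mix = nn.Conv2d(128, 64, kernel_size=3, padding=1)
out = mix(fused_cat)                           # (1, 64, 32, 32)

print(tuple(fused_sum.shape), tuple(fused_cat.shape), tuple(out.shape))
```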

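1.3 CTPN

I originally planned to use CTPN for text detection, so I read some material on it. It has two fatal drawbacks: it cannot detect rotated text, and the network is fairly complex.

Figure 1-3

Motivation

Text detection differs greatly from other detection tasks. For example, detecting text with SSD is difficult because the text boundaries are poorly localized. How can detection be designed specifically for text?

Contributions

CTPN has a number of creative ideas:

It splits the target into small pieces and detects them one by one. For text, each piece is a slice with height > width, which makes the boundaries more precise.

It links the pieces with a BiLSTM to exploit the correlation between neighboring pieces of text.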
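Here is a rough NumPy sketch of CTPN's slicing idea. It assumes a horizontal box in (x1, y1, x2, y2) form and cuts it into vertical strips aligned to a 16-pixel grid. CTPN's anchors are 16 pixels wide, matching the stride of its VGG16 conv5 features. The function name and the clipping at the box ends are my own illustration. The real method also regresses each strip's vertical position and refines the two ends (side refinement).

```python
import numpy as np

def split_into_strips(box, strip_width=16):
    """Cut a horizontal box (x1, y1, x2, y2) into vertical strips.

    Strips are aligned to multiples of strip_width (like anchors on a
    stride-16 feature map) and clipped to the box.
    Returns an (N, 4) float array of (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = box
    start = (int(x1) // strip_width) * strip_width
    strips = []
    for left in range(start, int(np.ceil(x2)), strip_width):
        right = left + strip_width
        strips.append((max(left, x1), y1, min(right, x2), y2))
    return np.array(strips, dtype=float)

# A 50x20 word box becomes four tall, narrow pieces
print(split_into_strips((10, 5, 60, 25)))
```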

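CTPN's ideas are creative, but it is too complex:

Preparing training samples is cumbersome.
Each small box is regressed, and the box sizes must be defined by hand.
The edges need special offset handling (side refinement).
An RNN is used to link the pieces.

It works well on horizontal text but fails on slanted text.

Why not add an angle to the regression?

The network is already complex. Adding an angle parameter to every small box would make it even more complex, although it is feasible.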

2. EAST Architecture Analysis

2.1 Overview

EAST stands for "An Efficient and Accurate Scene Text Detector".

Structure: feature extractor stem (PVANet) + feature-merging branch + output layer

Figure 2-1

Figure 2-2 below shows detection results: text at any orientation is detected.

Note: EAST is only a detection network. For recognition you still need a recognition network such as CRNN afterwards.

Figure 2-2

The network is described in detail in Section 2.2.

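2.2 Detailed Structure

Overall structure

As its name says, EAST is an efficient text detection method.

As discussed above, CTPN's labels are tedious to produce and its structure is complex: it splits text into small boxes, regresses them, and then links them with an RNN.

As Figure 2-3 shows, EAST needs only an FCN-like structure: compute the loss and train. At test time, pass the image through the network and run NMS to get the result. Pretty simple, isn't it?

Figure 2-3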
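Between the forward pass and NMS there is one more step. Each pixel whose score passes a threshold is turned back into a quadrilateral from its four distances and angle. Below is a minimal NumPy sketch under these assumptions:

- The geometry channels are ordered (top, bottom, left, right, angle).
- The distances are measured along the box's own rotated axes.
- The maps are at 1/4 of the input resolution (stride 4).
- The function names and the 0.9 threshold are hypothetical.

Sign conventions differ between implementations, so treat this as an illustration rather than this project's exact decoder.

```python
import numpy as np

def restore_rbox(px, py, top, bottom, left, right, theta):
    """Rebuild the 4 corners of a rotated box predicted at image point (px, py).

    Assumed convention: local offsets (u along the box width, v along its
    height) map to image offsets by dx = u*cos - v*sin, dy = u*sin + v*cos.
    Returns a (4, 2) array: top-left, top-right, bottom-right, bottom-left.
    """
    local = np.array([[-left, -top],
                      [right, -top],
                      [right, bottom],
                      [-left, bottom]], dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s],
                    [s,  c]])
    return local @ rot.T + np.array([px, py], dtype=float)

def decode(score_map, geo_map, score_thresh=0.9, stride=4):
    """Turn pixels above the threshold into (N, 9) rows: 8 coordinates + score."""
    rows = []
    for r, col in np.argwhere(score_map > score_thresh):  # row-major order
        top, bottom, left, right, theta = geo_map[:, r, col]
        quad = restore_rbox(col * stride, r * stride, top, bottom, left, right, theta)
        rows.append(np.append(quad.reshape(-1), score_map[r, col]))
    return np.array(rows).reshape(-1, 9)
```

The resulting (N, 9) array has the same layout that the LANMS code in Section 3.3 expects: 8 coordinates followed by a score.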

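Feature extraction layers

Any backbone can be used for feature extraction (VGG, ResNet, etc.); this implementation uses VGG. This part is simple and becomes clear from the source code (see Chapter 4, source code analysis).

Passing features to the merging layers

When defining the feature extractor, we keep the outputs the merging layers will need and return them from its forward function. The merging layers then simply consume them.

The extractor is defined below. The merging process itself is described in the next part.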

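import torch
import torch.nn as nn
# VGG, make_layers and cfg are defined elsewhere in the project (not shown here)

# Feature extractor: VGG16-BN backbone, optionally loaded with pretrained weights
class extractor(nn.Module):
    def __init__(self, pretrained):
        super(extractor, self).__init__()
        vgg16_bn = VGG(make_layers(cfg, batch_norm=True))
        if pretrained:
            vgg16_bn.load_state_dict(torch.load('./pths/vgg16_bn-6c64b313.pth'))
        self.features = vgg16_bn.features
    
    def forward(self, x):
        out = []
        for m in self.features:
            x = m(x)
            # keep the output of every max-pool layer for the merging branch
            if isinstance(m, nn.MaxPool2d):
                out.append(x)
        # drop the first (stride-2) output; keep strides 4, 8, 16, 32
        return out[1:]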
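VGG, make_layers and cfg are defined elsewhere in the project. Here is a self-contained stand-in (a simplified sketch, not the project's code) that builds a VGG16-BN-style feature stack from the same kind of cfg list. It shows what out[1:] returns: the outputs of the 2nd to 5th max-pool layers, at strides 4, 8, 16 and 32. The weights are random, since only the shapes matter here, and make_vgg_features and Extractor are hypothetical names.

```python
import torch
import torch.nn as nn

# VGG16 layout: numbers are conv output channels, 'M' is a 2x2 max-pool
CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

def make_vgg_features(cfg, in_channels=3):
    """Minimal VGG-style builder: 3x3 conv + BN + ReLU per number, max-pool per 'M'."""
    layers = []
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                       nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

class Extractor(nn.Module):
    def __init__(self, cfg=CFG):
        super().__init__()
        self.features = make_vgg_features(cfg)

    def forward(self, x):
        out = []
        for m in self.features:
            x = m(x)
            if isinstance(m, nn.MaxPool2d):  # keep every pooled output
                out.append(x)
        return out[1:]  # drop the stride-2 output; keep strides 4, 8, 16, 32

with torch.no_grad():
    feats = Extractor().eval()(torch.randn(1, 3, 256, 256))
for f in feats:
    print(tuple(f.shape))
# (1, 128, 64, 64)
# (1, 256, 32, 32)
# (1, 512, 16, 16)
# (1, 512, 8, 8)
```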

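Feature merging layers

These layers merge the outputs of the feature extractor. The definition is shown below, with comments in the code.

Here x holds the four outputs of the feature extractor, from the shallowest (x[0], stride 4) to the deepest (x[3], stride 32).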

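    def forward(self, x):
        # x = [stride-4, stride-8, stride-16, stride-32] feature maps from the extractor
        # stage 1: upsample the deepest map 2x and concatenate it with the stride-16 map,
        # then a 1x1 conv reduces the channels and a 3x3 conv fuses the features
        y = F.interpolate(x[3], scale_factor=2, mode='bilinear', align_corners=True)
        y = torch.cat((y, x[2]), 1)
        y = self.relu1(self.bn1(self.conv1(y)))
        y = self.relu2(self.bn2(self.conv2(y)))

        # stage 2: same pattern with the stride-8 map
        y = F.interpolate(y, scale_factor=2, mode='bilinear', align_corners=True)
        y = torch.cat((y, x[1]), 1)
        y = self.relu3(self.bn3(self.conv3(y)))
        y = self.relu4(self.bn4(self.conv4(y)))

        # stage 3: same pattern with the stride-4 map
        y = F.interpolate(y, scale_factor=2, mode='bilinear', align_corners=True)
        y = torch.cat((y, x[0]), 1)
        y = self.relu5(self.bn5(self.conv5(y)))
        y = self.relu6(self.bn6(self.conv6(y)))

        # final 3x3 conv; the output is at 1/4 of the input resolution
        y = self.relu7(self.bn7(self.conv7(y)))
        return y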
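The forward pass above relies on conv1 to conv7, bn1 to bn7 and relu1 to relu7, which are defined in __init__ (not shown for the full-size model). Below is a compact, self-contained sketch of the same three merge stages. It assumes the extractor's full VGG16 channel counts (128, 256, 512, 512) and the merge widths 128, 64 and 32 used in the paper; the class and helper names are hypothetical. It shows how the channel counts add up at each concatenation: 512 + 512 → 128, 128 + 256 → 64, 64 + 128 → 32.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, k):
    """k x k conv (padding keeps the size) + BatchNorm + ReLU."""
    return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class Merge(nn.Module):
    def __init__(self, chans=(128, 256, 512, 512)):
        super().__init__()
        c0, c1, c2, c3 = chans  # channels at strides 4, 8, 16, 32
        self.stage1 = nn.Sequential(conv_bn_relu(c3 + c2, 128, 1), conv_bn_relu(128, 128, 3))
        self.stage2 = nn.Sequential(conv_bn_relu(128 + c1, 64, 1), conv_bn_relu(64, 64, 3))
        self.stage3 = nn.Sequential(conv_bn_relu(64 + c0, 32, 1), conv_bn_relu(32, 32, 3))
        self.final = conv_bn_relu(32, 32, 3)

    def forward(self, x):
        y = x[3]
        for stage, skip in ((self.stage1, x[2]), (self.stage2, x[1]), (self.stage3, x[0])):
            y = F.interpolate(y, scale_factor=2, mode='bilinear', align_corners=True)
            y = stage(torch.cat((y, skip), dim=1))  # 1x1 conv reduces, 3x3 conv fuses
        return self.final(y)

feats = [torch.randn(1, 128, 64, 64), torch.randn(1, 256, 32, 32),
         torch.randn(1, 512, 16, 16), torch.randn(1, 512, 8, 8)]
with torch.no_grad():
    out = Merge().eval()(feats)
print(tuple(out.shape))  # (1, 32, 64, 64): 32 channels at 1/4 of a 256x256 input
```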

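Output layer

The output layer has three parts: score, distances and angle. RBOX is used as the example here; oddly, I could not find any example online that uses QUAD.

QUAD geometry exists to handle perspective distortion. In ordinary scenes this does not arise, and a rotated rectangle (RBOX) is sufficient.

So QUAD brings little benefit. RBOX already handles ordinary detection problems, while regressing four corner points is comparatively hard. It is harder still when text shapes vary a lot, which is also why SSD and YOLO struggle to detect text.

For special kinds of text (for example curved text), you basically need a different network.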

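    def forward(self, x):
        # score map: probability that each pixel lies inside a (shrunk) text region
        score = self.sigmoid1(self.conv1(x))
        # 4 distances to the box edges, scaled from (0, 1) to (0, self.scope) pixels
        loc   = self.sigmoid2(self.conv2(x)) * self.scope
        # rotation angle mapped to (-pi/2, pi/2)
        angle = (self.sigmoid3(self.conv3(x)) - 0.5) * math.pi
        # RBOX geometry: 4 distances + 1 angle = 5 channels
        geo   = torch.cat((loc, angle), 1)
        return score, geo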
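For completeness, here is a self-contained sketch of an RBOX output head consistent with the forward pass above. It makes these assumptions:

- The input has 32 channels, coming from the merging branch.
- scope = 512, the largest distance in pixels that the sigmoid output is stretched to.
- The class name is hypothetical.
- torch.sigmoid replaces the separate nn.Sigmoid modules.

A QUAD head would instead emit 8 coordinate offsets per pixel.

```python
import math
import torch
import torch.nn as nn

class Output(nn.Module):
    def __init__(self, in_channels=32, scope=512):
        super().__init__()
        self.score = nn.Conv2d(in_channels, 1, 1)   # text / non-text probability
        self.loc = nn.Conv2d(in_channels, 4, 1)     # distances to the 4 box edges
        self.angle = nn.Conv2d(in_channels, 1, 1)   # rotation angle
        self.scope = scope

    def forward(self, x):
        score = torch.sigmoid(self.score(x))
        loc = torch.sigmoid(self.loc(x)) * self.scope              # (0, scope) pixels
        angle = (torch.sigmoid(self.angle(x)) - 0.5) * math.pi      # (-pi/2, pi/2)
        geo = torch.cat((loc, angle), dim=1)                        # 5 geometry channels
        return score, geo

score, geo = Output()(torch.randn(2, 32, 64, 64))
print(tuple(score.shape), tuple(geo.shape))  # (2, 1, 64, 64) (2, 5, 64, 64)
```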

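3. EAST Details

3.1 Label Generation

Note: this is the key and most difficult part!

The paper shrinks each labeled text quadrilateral inward before using it. For each vertex i, let r_i be the length of the shorter of its two adjacent edges. Each edge is then shortened by moving its two endpoints inward along the edge by 0.3·r_i and 0.3·r_j, where j is the edge's other endpoint. The two longer edges are shrunk first, then the two shorter ones.

I think the purpose is to obtain more reliable information. Whether the manual annotation is tight or loose, the region left after shrinking by 0.3 contains only text.

But this also discards some edge information. And if that reasoning were the whole story, SSD or YOLO could design their labels the same way.

The authors must have tested it; it has pros and cons.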
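The shrinking rule can be sketched in NumPy as follows. This is a minimal sketch of the rule as stated in the paper: r_i comes from the shorter adjacent edge, and the longer pair of opposite edges is shrunk first. It assumes the quadrilateral is given as 8 numbers in the order x1, y1, ..., x4, y4. The function names are hypothetical.

```python
import numpy as np

def _dist(a, b):
    return float(np.hypot(*(a - b)))

def shrink_poly(vertices, coef=0.3):
    """Shrink a quadrilateral given as 8 numbers (x1, y1, ..., x4, y4)."""
    pts = np.asarray(vertices, dtype=float).reshape(4, 2).copy()
    # r[i]: length of the shorter edge touching vertex i
    r = [min(_dist(pts[i], pts[(i + 1) % 4]), _dist(pts[i], pts[(i - 1) % 4]))
         for i in range(4)]

    def move_edge(i, j):
        # move the two endpoints of edge (i, j) towards each other along the edge
        edge = pts[j] - pts[i]
        length = np.hypot(*edge)
        if length > 1:
            pts[i] += edge * (coef * r[i] / length)
            pts[j] -= edge * (coef * r[j] / length)

    # shrink the longer pair of opposite edges first, then the shorter pair
    if _dist(pts[0], pts[1]) + _dist(pts[2], pts[3]) > \
       _dist(pts[1], pts[2]) + _dist(pts[3], pts[0]):
        order = [(0, 1), (2, 3), (1, 2), (3, 0)]
    else:
        order = [(1, 2), (3, 0), (0, 1), (2, 3)]
    for i, j in order:
        move_edge(i, j)
    return pts.reshape(-1)

print(shrink_poly([0, 0, 100, 0, 100, 20, 0, 20]).tolist())
# approximately [6, 6, 94, 6, 94, 14, 6, 14]
```

For a 100 × 20 rectangle every r_i is 20, so each side moves in by 0.3 × 20 = 6 pixels.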

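Figure 3-1

The label consists of 5 geometry channels (4 distances + 1 angle) plus 1 score channel, i.e. 6 × N × M in total. N × M is the size of the output map, which is 1/4 of the input in this implementation.

In the figure, (b) is the score map, (d) shows the four distance maps and (e) the angle map.

The figure may be hard to read, so here is a hand-drawn illustration:

Figure 3-2

In case that is still unclear, here is a rough description in words:

First shrink the polygon by 0.3; the shrunk region is the positive area of the score map.
Compute the minimum-area bounding rectangle of the unshrunk polygon. The rectangle's rotation is the angle label, written only over the shrunk region. (I suspect computing the angle from the shrunk polygon would give the same result.)
For each pixel in the shrunk region, the distances to the four sides of this bounding rectangle form the four distance maps.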
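Here is a simplified NumPy sketch of how the score and geometry maps could be filled for one text region. To keep it short, it assumes the label is already a rotated rectangle (center cx, cy, width w, height h, angle theta), so the minimum bounding rectangle is the label itself. It also works at full resolution rather than 1/4. The channel order (top, bottom, left, right, angle), the rotation convention and the function name are assumptions.

```python
import numpy as np

def rbox_maps(height, width, cx, cy, w, h, theta, coef=0.3):
    """Score map (H, W) and geometry map (5, H, W) for one rotated rectangle.

    Geometry channels: distances to the top, bottom, left and right edges,
    then the angle; values are written only inside the shrunk rectangle.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    dx, dy = xs - cx, ys - cy
    c, s = np.cos(theta), np.sin(theta)
    u = dx * c + dy * s        # pixel position along the box width
    v = -dx * s + dy * c       # pixel position along the box height

    # For a rectangle every r_i equals min(w, h), so shrinking moves each side
    # inward by coef * min(w, h).
    m = coef * min(w, h)
    inside = (np.abs(u) <= w / 2 - m) & (np.abs(v) <= h / 2 - m)

    score = inside.astype(np.float32)
    geo = np.zeros((5, height, width), dtype=np.float32)
    geo[0][inside] = (v + h / 2)[inside]   # distance to the top edge
    geo[1][inside] = (h / 2 - v)[inside]   # distance to the bottom edge
    geo[2][inside] = (u + w / 2)[inside]   # distance to the left edge
    geo[3][inside] = (w / 2 - u)[inside]   # distance to the right edge
    geo[4][inside] = theta
    return score, geo

score, geo = rbox_maps(40, 120, cx=50, cy=20, w=100, h=20, theta=0.0)
print(geo[:, 20, 10])  # pixel (x=10, y=20): top, bottom, left, right, angle
```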

3.2 Loss Computation

The loss is fairly simple: the score, distance and angle maps are supervised directly against the labels.

    def forward(self, gt_score, pred_score, gt_geo, pred_geo, ignored_map):
        # no text in this batch: return a zero loss that still depends on the predictions
        if torch.sum(gt_score) < 1:
            return torch.sum(pred_score + pred_geo) * 0
        # score loss: Dice loss instead of cross-entropy, to cope with class imbalance;
        # predictions inside ignored regions are masked out
        classify_loss = get_dice_loss(gt_score, pred_score*(1-ignored_map))
        # geometry loss: per-pixel IoU loss and per-pixel angle loss
        iou_loss_map, angle_loss_map = get_geo_loss(gt_geo, pred_geo)
        # average over text pixels only: multiplying by gt_score zeroes out non-text pixels
        # (author's note: masking earlier might save computation; here the per-pixel maps
        # have already been computed for every pixel)
        angle_loss = torch.sum(angle_loss_map*gt_score) / torch.sum(gt_score)
        iou_loss = torch.sum(iou_loss_map*gt_score) / torch.sum(gt_score)
        geo_loss = self.weight_angle * angle_loss + iou_loss  # the author sets the weight here to 1
        print('classify loss is {:.8f}, angle loss is {:.8f}, iou loss is {:.8f}'.format(classify_loss, angle_loss, iou_loss))
        return geo_loss + classify_loss
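get_geo_loss is not shown above, so here is a sketch in the spirit of the paper. For each pixel, the IoU between the predicted and ground-truth boxes is computed from the four distances alone. Both boxes contain the same pixel, so the overlap in each direction is the smaller distance. The IoU loss is -log(IoU), and the angle loss is 1 - cos(θ_pred - θ_gt). The channel order (top, bottom, left, right, angle) and the +1 smoothing inside the log are assumptions.

```python
import torch

def get_geo_loss(gt_geo, pred_geo):
    """Per-pixel IoU loss and angle loss for RBOX geometry.

    Both tensors are (N, 5, H, W) with channels (top, bottom, left, right, angle).
    """
    d1_gt, d2_gt, d3_gt, d4_gt, angle_gt = torch.split(gt_geo, 1, dim=1)
    d1_p, d2_p, d3_p, d4_p, angle_p = torch.split(pred_geo, 1, dim=1)
    area_gt = (d1_gt + d2_gt) * (d3_gt + d4_gt)
    area_p = (d1_p + d2_p) * (d3_p + d4_p)
    # both boxes contain this pixel, so each overlap extent is the smaller distance
    h_inter = torch.min(d1_gt, d1_p) + torch.min(d2_gt, d2_p)
    w_inter = torch.min(d3_gt, d3_p) + torch.min(d4_gt, d4_p)
    area_inter = w_inter * h_inter
    area_union = area_gt + area_p - area_inter
    iou_loss_map = -torch.log((area_inter + 1.0) / (area_union + 1.0))  # +1 avoids log(0)
    angle_loss_map = 1 - torch.cos(angle_p - angle_gt)
    return iou_loss_map, angle_loss_map
```

Like the paper, this IoU ignores the angle difference between the two boxes; the angle is penalized separately by angle_loss_map.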

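Note: the score loss uses Dice loss because plain cross-entropy does not handle the class imbalance, since text pixels are far fewer than background pixels. The EAST paper itself uses a class-balanced cross-entropy. Dice loss, as used in V-Net, is another way to address the same problem.

Figure 3-3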
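get_dice_loss is likewise not shown. Below is a minimal sketch of a batch-level Dice loss, where eps is an assumed smoothing constant. It is followed by a toy comparison showing why Dice helps with imbalance. With 10 text pixels out of 10,000, a lazy prediction of 0.01 everywhere gets a tiny binary cross-entropy but a Dice loss close to its maximum of 1.

```python
import torch
import torch.nn.functional as F

def get_dice_loss(gt_score, pred_score, eps=1e-5):
    """Dice loss: 1 - 2 * sum(gt * pred) / (sum(gt) + sum(pred))."""
    inter = torch.sum(gt_score * pred_score)
    union = torch.sum(gt_score) + torch.sum(pred_score) + eps   # eps avoids division by zero
    return 1.0 - 2.0 * inter / union

# 10 text pixels in a 100x100 map, and a lazy prediction of 0.01 everywhere
gt = torch.zeros(1, 1, 100, 100)
gt[0, 0, :2, :5] = 1
lazy = torch.full_like(gt, 0.01)
print(F.binary_cross_entropy(lazy, gt).item())  # about 0.015: BCE barely complains
print(get_dice_loss(gt, lazy).item())           # about 0.998: close to the maximum loss
```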

3.3 NMS

EAST uses locality-aware NMS (LANMS), which was proposed specifically for EAST.

Let's first look at how LANMS works:

import numpy as np
from shapely.geometry import Polygon

def intersection(g, p):
    # IoU of two quadrilaterals; g and p each hold 8 coordinates (+ a score)
    g = Polygon(g[:8].reshape((4, 2)))
    p = Polygon(p[:8].reshape((4, 2)))

    # skip invalid polygons (e.g. self-intersecting ones)
    if not g.is_valid or not p.is_valid:
        return 0

    # intersection area and union area of the two polygons
    inter = g.intersection(p).area
    union = g.area + p.area - inter
    if union == 0:
        return 0
    else:
        return inter/union

def weighted_merge(g, p):
    # score-weighted average of the coordinates of g and p (g is modified in place)
    g[:8] = (g[8] * g[:8] + p[8] * p[:8])/(g[8] + p[8])
    
    # the merged box's score is the sum of the two scores
    g[8] = (g[8] + p[8])
    return g

def standard_nms(S, thres):
    # standard NMS: keep the highest-scoring box, drop boxes whose IoU with it exceeds thres, repeat
    order = np.argsort(S[:, 8])[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        ovr = np.array([intersection(S[i], S[t]) for t in order[1:]])
        inds = np.where(ovr <= thres)[0]
        order = order[inds+1]
        
    return S[keep]

def nms_locality(polys, thres=0.3):
    '''
    locality aware nms of EAST
    :param polys: a N*9 numpy array. first 8 coordinates, then prob
    :return: boxes after nms
    '''
    S = []    # merged boxes
    p = None   # box currently being merged
    for g in polys:
        if p is not None and intersection(g, p) > thres:    # IoU with the current box exceeds the threshold: merge
            p = weighted_merge(g, p)
        else:    # otherwise finish the current box and start a new one
            if p is not None:
                S.append(p)
            p = g
    if p is not None:
        S.append(p)
    if len(S) == 0:
        return np.array([])
    return standard_nms(np.array(S), thres)

if __name__ == '__main__':
    # 343,350,448,135,474,143,369,359
    print(Polygon(np.array([[343, 350], [448, 135],
                            [474, 143], [369, 359]])).area)

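Despite the amount of code, the idea is simple:

Iterate over the predicted boxes in order. Whenever two consecutive boxes have an IoU above the threshold (thres), merge them using a score-weighted average.
After merging, run ordinary NMS to remove the remaining redundant boxes.

Note: why merge adjacent boxes?

Every pixel predicts a box (see the loss section above if this is unclear). So if the predictions are accurate, the hundreds or thousands of boxes belonging to one text instance largely overlap. Boxes are typically fed in row order, so consecutive boxes usually come from neighboring pixels and can simply be merged.
Ideally you would merge once along rows and once along columns; the principle is the same.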
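Here is a worked example of one merge step. It uses the same 9-value layout as the code above (8 coordinates followed by a score). The two boxes and their scores are made-up numbers.

```python
import numpy as np

g = np.array([0, 0, 10, 0, 10, 10, 0, 10, 0.9])   # box 1, score 0.9
p = np.array([2, 0, 12, 0, 12, 10, 2, 10, 0.6])   # box 2, score 0.6

merged = g.copy()
merged[:8] = (g[8] * g[:8] + p[8] * p[:8]) / (g[8] + p[8])  # score-weighted average
merged[8] = g[8] + p[8]                                      # scores accumulate
print(np.round(merged, 3))
# x-coordinates: (0.9*0 + 0.6*2) / 1.5 = 0.8 and (0.9*10 + 0.6*12) / 1.5 = 10.8;
# the y-coordinates are unchanged and the score becomes 1.5
```

Because scores add up, a box merged from many pixels carries a large accumulated score into standard_nms, which sorts by that value.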

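4. PyTorch Source Code Analysis

I won't walk through the source code. The sections above already cover essentially every key and difficult point.

There are a few small bugs, explained below:

Training hangs on samples whose label file is empty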

import os
import numpy as np
from PIL import Image
from torch.utils import data
from torchvision import transforms
# extract_vertices, adjust_height, rotate_img, crop_img and get_score_geo are project helpers (not shown)

class custom_dataset(data.Dataset):
    def __init__(self, img_path, gt_path, scale=0.25, length=512):
        super(custom_dataset, self).__init__()
        self.img_files = [os.path.join(img_path, img_file) for img_file in sorted(os.listdir(img_path))]
        self.gt_files  = [os.path.join(gt_path, gt_file) for gt_file in sorted(os.listdir(gt_path))]
        self.scale = scale
        self.length = length

    def __len__(self):
        return len(self.img_files)

    def __getitem__(self, index):
        with open(self.gt_files[index], 'r') as f:
            lines = f.readlines()
        # temporary workaround for empty label files: resample random indices until a non-empty one is found
        # (sampling from len(self.gt_files) instead of a hard-coded sample count avoids out-of-range indices)
        while len(lines) < 1:
            index = np.random.randint(len(self.gt_files))
            with open(self.gt_files[index], 'r') as f:
                lines = f.readlines()
        vertices, labels = extract_vertices(lines)
        
        img = Image.open(self.img_files[index])
        img, vertices = adjust_height(img, vertices) 
        img, vertices = rotate_img(img, vertices)
        img, vertices = crop_img(img, vertices, labels, self.length, index)
        # color jitter, convert to a tensor, normalize each channel to [-1, 1]
        transform = transforms.Compose([transforms.ColorJitter(0.5, 0.5, 0.5, 0.25),
                                        transforms.ToTensor(),
                                        transforms.Normalize(mean=(0.5,0.5,0.5),std=(0.5,0.5,0.5))])
        
        score_map, geo_map, ignored_map = get_score_geo(img, vertices, labels, self.scale, self.length)
        return transform(img), score_map, geo_map, ignored_map
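An alternative to resampling inside __getitem__ is to drop empty label files once, when the dataset is built, so no index ever points at an empty file. A minimal sketch follows. The function name is hypothetical, and it assumes images and label files pair up by sorted file name exactly as in custom_dataset.

```python
import os

def non_empty_pairs(img_path, gt_path):
    """Return (image, label) path pairs whose label file has at least one non-blank line."""
    img_files = [os.path.join(img_path, f) for f in sorted(os.listdir(img_path))]
    gt_files = [os.path.join(gt_path, f) for f in sorted(os.listdir(gt_path))]
    pairs = []
    for img_file, gt_file in zip(img_files, gt_files):
        with open(gt_file, 'r') as f:
            if any(line.strip() for line in f):   # skip empty or whitespace-only files
                pairs.append((img_file, gt_file))
    return pairs
```

In __init__, self.img_files and self.gt_files could then be rebuilt from the returned pairs.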

At test time, PIL may load an image as RGBA (4 channels)

    img_path    = './013.jpg'
    model_path  = './pths/model_epoch_225.pth'
    res_img     = './res.bmp'
    img = Image.open(img_path)
    # keep only the first three channels (drops the alpha channel of RGBA images)
    img = np.array(img)[:,:,:3]
    img = Image.fromarray(img)
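A slightly more general alternative is Pillow's Image.convert('RGB'), sketched here with a hypothetical helper name. It drops the alpha channel of RGBA images and also handles grayscale ('L') and palette ('P') images. Those images load as 2-D arrays, so the [:,:,:3] slice above would raise an IndexError for them.

```python
from PIL import Image

def load_rgb(path):
    """Open an image and always return a 3-channel RGB PIL image."""
    with Image.open(path) as img:
        return img.convert('RGB')   # convert() returns a new, fully loaded image
```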

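Future work

The code seems to have some problems: training is very slow, and I suspect the data processing.

The original EAST regresses geometry at every pixel, which wastes a lot of time. I plan to modify it following AdvancedEAST, adding some optimizations of my own.

The network is too large and only suitable for servers or PCs. I have reduced it to 15 MB, which still feels a bit large.

A recognition stage still needs to be added, and there are plenty of difficulties ahead...

All the code here comes from GitHub; I am only passing it along!

Original author's download link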

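5. First Update

Update of 2019-06-30

I mentioned earlier that this project's code has several defects; they are addressed in detail here.

Training is very slow

This is caused by a problem in the original data processing code, specifically in how the random crop handles text boundaries.
The fix is given below; compare it with the original source to see the changes: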

def crop_img(img, vertices, labels, length, index):
    '''crop img patches to obtain batch and augment
    Input:
        img         : PIL Image
        vertices    : vertices of text regions <numpy.ndarray, (n,8)>
        labels      : 1->valid, 0->ignore, <numpy.ndarray, (n,)>
        length      : length of cropped image region
        index       : sample index, only used in the error message
    Output:
        region      : cropped image region
        new_vertices: new vertices in cropped region
    '''
    try:
        h, w = img.height, img.width
        # confirm the shortest side of image >= length
        if h >= w and w < length:
            img = img.resize((length, int(h * length / w)), Image.BILINEAR)
        elif h < w and h < length:
            img = img.resize((int(w * length / h), length), Image.BILINEAR)
        ratio_w = img.width / w
        ratio_h = img.height / h
        assert(ratio_w >= 1 and ratio_h >= 1)

        # range of valid crop origins: the crop must stay inside the image
        remain_w = [0, img.width - length]
        remain_h = [0, img.height - length]
        new_vertices = np.zeros(vertices.shape)
        if vertices.size > 0:
            new_vertices[:,[0,2,4,6]] = vertices[:,[0,2,4,6]] * ratio_w
            new_vertices[:,[1,3,5,7]] = vertices[:,[1,3,5,7]] * ratio_h
            # extent of all text vertices (skipped for empty labels, where np.min would fail)
            vertice_x = [np.min(new_vertices[:, [0, 2, 4, 6]]), np.max(new_vertices[:, [0, 2, 4, 6]])]
            vertice_y = [np.min(new_vertices[:, [1, 3, 5, 7]]), np.max(new_vertices[:, [1, 3, 5, 7]])]
            # narrow the range so that the crop also covers the text extent when possible
            if vertice_x[1] > length:
                remain_w[0] = vertice_x[1] - length
            if vertice_x[0] < remain_w[1]:
                remain_w[1] = vertice_x[0]
            if vertice_y[1] > length:
                remain_h[0] = vertice_y[1] - length
            if vertice_y[0] < remain_h[1]:
                remain_h[1] = vertice_y[0]

        # one random draw inside the range (no rejection loop); integer origin so that
        # the vertex shift below matches the crop exactly
        start_w = int(np.random.rand() * (remain_w[1] - remain_w[0]) + remain_w[0])
        start_h = int(np.random.rand() * (remain_h[1] - remain_h[0]) + remain_h[0])
        box = (start_w, start_h, start_w + length, start_h + length)
        region = img.crop(box)
        if new_vertices.size == 0:
            return region, new_vertices

        new_vertices[:,[0,2,4,6]] -= start_w
        new_vertices[:,[1,3,5,7]] -= start_h
    except IndexError:
        print("\n crop_img function index error!!!\n, image is %d" % (index))
        raise  # re-raise: otherwise region would be undefined at the return below
    return region, new_vertices
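The range logic above can be checked in one dimension with a tiny pure-Python sketch (the names are hypothetical). A crop [start, start + length] stays inside the image when 0 ≤ start ≤ img_size − length. It covers the whole text extent when start ≤ text_min and start + length ≥ text_max. The code intersects these two conditions. If the text is wider than the crop, lo > hi and no window can cover all of it. The random draw then lands between hi and lo, which keeps the crop inside the image but cuts through the text.

```python
import random

def crop_start_range(text_min, text_max, img_size, length):
    """(lo, hi) bounds for the crop origin, mirroring remain_w / remain_h above."""
    lo, hi = 0, img_size - length          # crop must stay inside the image
    if text_max > length:
        lo = text_max - length             # crop must reach the text's far end
    if text_min < hi:
        hi = text_min                      # crop must start before the text
    return lo, hi

def draw_start(lo, hi):
    """One random origin between the bounds (also works when lo > hi)."""
    return int(random.random() * (hi - lo)) + lo

# 1000-pixel-wide image, 512-pixel crop, text spanning x = 600..700
print(crop_start_range(600, 700, 1000, 512))   # (188, 488)
```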

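The loss converges at first but later oscillates (resembling overfitting), and the detected angles are poor

This is caused by an error in computing the angle used for the angle loss. Compare the code with the authors' original paper.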

def find_min_rect_angle(vertices):
    '''find the best angle to rotate poly and obtain min rectangle
    Input:
        vertices: vertices of text region <numpy.ndarray, (8,)>
    Output:
        the best angle <radian measure>
    '''
    angle_interval = 1
    angle_list = list(range(-90, 90, angle_interval))
    area_list = []
    for theta in angle_list: 
        rotated = rotate_vertices(vertices, theta / 180 * math.pi)
        x1, y1, x2, y2, x3, y3, x4, y4 = rotated
        temp_area = (max(x1, x2, x3, x4) - min(x1, x2, x3, x4)) * \
                    (max(y1, y2, y3, y4) - min(y1, y2, y3, y4))
        area_list.append(temp_area)
    
    sorted_area_index = sorted(list(range(len(area_list))), key=lambda k : area_list[k])
    min_error = float('inf')
    best_index = -1
    rank_num = 10
    # find the best angle with correct orientation
    for index in sorted_area_index[:rank_num]:
        rotated = rotate_vertices(vertices, angle_list[index] / 180 * math.pi)
        temp_error = cal_error(rotated)
        if temp_error < min_error:
            min_error = temp_error
            best_index = index

    if angle_list[best_index]>0:
        return (angle_list[best_index] - 90) / 180 * math.pi

    return (angle_list[best_index]+90) / 180 * math.pi
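find_min_rect_angle works in two stages. First it rotates the quadrilateral in 1° steps over [-90, 90) and measures the area of the axis-aligned bounding box at each step. Then it picks among the 10 smallest candidates using cal_error (not shown). The first stage on its own is sketched below in NumPy, with hypothetical names. The rotation here is about the centroid with the standard rotation matrix, which may differ in sign from the project's rotate_vertices.

```python
import math
import numpy as np

def rotate_points(pts, theta):
    """Rotate (N, 2) points by theta radians about their centroid."""
    c = pts.mean(axis=0)
    rot = np.array([[math.cos(theta), -math.sin(theta)],
                    [math.sin(theta),  math.cos(theta)]])
    return (pts - c) @ rot.T + c

def min_area_angle(vertices, step_deg=1):
    """Brute-force search for the rotation (degrees) with the smallest axis-aligned bounding box."""
    pts = np.asarray(vertices, dtype=float).reshape(4, 2)
    best_deg, best_area = None, float('inf')
    for deg in range(-90, 90, step_deg):
        r = rotate_points(pts, math.radians(deg))
        area = (r[:, 0].max() - r[:, 0].min()) * (r[:, 1].max() - r[:, 1].min())
        if area < best_area - 1e-6:   # keep the first angle among near-ties
            best_deg, best_area = deg, area
    return best_deg, best_area

# A 40 x 10 rectangle tilted by 30 degrees
rect = np.array([[-20, -5], [20, -5], [20, 5], [-20, 5]], dtype=float)
print(min_area_angle(rotate_points(rect, math.radians(30)).reshape(-1)))
```

For a rectangle tilted by 30°, both -30° and 60° make it axis-aligned, the second with width and height swapped. This is the kind of ambiguity the second stage has to resolve.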

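Shrinking the model from 50 MB to 15 MB, which works well when training on small datasets

This is simple: reduce the feature-map channel counts in the VGG backbone and in the U-Net-style merging branch.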

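# reduced VGG cfg: narrower channel widths in every stage (256 instead of 512 in the last two)
cfg = [32, 32, 'M', 64, 64, 'M', 128, 128, 128, 'M', 256, 256, 256, 'M', 256, 256, 256, 'M']
# merge feature maps of different scales (input widths match the reduced backbone)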
class merge(nn.Module):
    def __init__(self):
        super(merge, self).__init__()

        self.conv1 = nn.Conv2d(512, 128, 1)
        self.bn1 = nn.BatchNorm2d(128)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(128, 128, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(128)
        self.relu2 = nn.ReLU()

        self.conv3 = nn.Conv2d(256, 64, 1)
        self.bn3 = nn.BatchNorm2d(64)
        self.relu3 = nn.ReLU()
        self.conv4 = nn.Conv2d(64, 64, 3, padding=1)
        self.bn4 = nn.BatchNorm2d(64)
        self.relu4 = nn.ReLU()

        self.conv5 = nn.Conv2d(128, 32, 1)
        self.bn5 = nn.BatchNorm2d(32)
        self.relu5 = nn.ReLU()
        self.conv6 = nn.Conv2d(32, 32, 3, padding=1)
        self.bn6 = nn.BatchNorm2d(32)
        self.relu6 = nn.ReLU()

        self.conv7 = nn.Conv2d(32, 32, 3, padding=1)
        self.bn7 = nn.BatchNorm2d(32)
        self.relu7 = nn.ReLU()
        # initialize weights: Kaiming-normal for convs, constant for BatchNorm
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
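To see where the size reduction comes from, here is a quick parameter count for the 3×3 convolutions of the two backbone configurations. It is a sketch that counts conv weights and biases only. It ignores BatchNorm, the merging branch, the output head and checkpoint overhead, so it will not match the file sizes exactly. A conv layer has in × out × 3 × 3 weights, so halving both channel counts roughly quarters its parameters.

```python
def vgg_conv_params(cfg, in_channels=3):
    """Count weights + biases of the 3x3 convolutions in a VGG-style cfg list."""
    total = 0
    for v in cfg:
        if v == 'M':               # max-pool layers have no parameters
            continue
        total += in_channels * v * 3 * 3 + v   # kernel weights + bias
        in_channels = v
    return total

vgg16 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']
small = [32, 32, 'M', 64, 64, 'M', 128, 128, 128, 'M', 256, 256, 256, 'M', 256, 256, 256, 'M']
for name, cfg in (('vgg16', vgg16), ('small', small)):
    n = vgg_conv_params(cfg)
    print(name, n, 'params, about %.1f MB as float32' % (n * 4 / 1e6))
# vgg16 14714688 params, about 58.9 MB as float32
# small 3680160 params, about 14.7 MB as float32
```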

Small text is detected well, but large text is (partly) missed

Here I imitate AdvancedEAST: train on small images first, then transfer to larger images.

That is, first resize the images to 256×256 and train to get model_256.pth;
then resize the images to 384×384, initialize with model_256.pth and train to get model_384.pth;
and so on for 512 or larger images.
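Here is a small sketch of the staged schedule. The names are hypothetical, and the actual training call is left as a comment. It also checks that each size is a multiple of 32. The backbone halves the resolution five times, and the merging branch upsamples by 2 before each concatenation, so other sizes give mismatched shapes at torch.cat. For example, a 254 input gives 15 at stride 16 but only 7 × 2 = 14 from stride 32.

```python
def progressive_schedule(sizes, prefix='model'):
    """Yield (size, init_checkpoint, output_checkpoint) for each training stage.

    Each stage starts from the previous stage's weights (None for the first).
    """
    prev = None
    for size in sizes:
        if size % 32 != 0:
            raise ValueError('input size %d is not a multiple of 32' % size)
        out = '%s_%d.pth' % (prefix, size)
        yield size, prev, out
        prev = out

for size, init, out in progressive_schedule([256, 384, 512]):
    print(size, init, out)
    # hypothetical training step, e.g.:
    # if init: model.load_state_dict(torch.load(init))
    # train(model, input_size=size); torch.save(model.state_dict(), out)
# 256 None model_256.pth
# 384 model_256.pth model_384.pth
# 512 model_384.pth model_512.pth
```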

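Training and detection are slow (compared with other detection networks)

This follows from how EAST works: every pixel must be predicted and included in the loss. AdvancedEAST's network shows one way to address this.

Notes on the modified network

Training samples: 3000
Test samples: 100
Detection accuracy 85%, IoU accuracy 80%
Converges within 5 epochs (all figures from my own tests)
Two 1080 Ti GPUs, about 10 minutes of training

Here is my complete project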

6. References

https://arxiv.org/pdf/1704.03155.pdf
https://www.cnblogs.com/skyfsm/p/9776611.html
LANMS source code
https://blog.csdn.net/qq_14845119/article/details/78986449
http://campar.in.tum.de/pub/milletari2016Vnet/milletari2016Vnet.pdf
https://blog.csdn.net/liuxiaoheng1992/article/details/82870923
https://zhuanlan.zhihu.com/p/37504120
https://blog.csdn.net/attitude_yu/article/details/80724187
http://www.itdecent.cn/p/6e35829a38de
https://blog.csdn.net/wangdongwei0/article/details/84576044
https://blog.csdn.net/weixin_41783077/article/details/83789743
https://blog.csdn.net/qq_14845119/article/details/80787753
https://zhuanlan.zhihu.com/p/50126479
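Writing this post took a lot of time (about five days), so I have not organized the references further.
I can no longer find the sources of some material. If anything infringes your rights, I apologize; please let me know and I will remove it.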

