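Multi-GPU training works as follows: the model and data are first loaded onto the first GPU, then the model is replicated to the other GPUs and each batch is split among them. The batch size should preferably be an integer multiple of the number of GPUs. For example, with two GPUs and a batch size of 2, each GPU computes the result for a batch of 1; after model.forward, the results returned by the different GPUs are merged and passed back for the next step of the computation.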
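The flow described above can be sketched roughly as follows. This is a minimal illustration, not the original project's code: the tiny nn.Linear network and the tensor sizes are placeholders, and the sketch falls back to CPU when no GPU is present so it can be run anywhere.

```python
import torch
import torch.nn as nn

# A tiny stand-in network (placeholder); any nn.Module would do.
net = nn.Linear(8, 4)

# Load the model onto the first GPU (or CPU if no GPU is available).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = net.to(device)

# With several GPUs, DataParallel replicates the module and splits each
# batch along dim 0. With zero or one GPU it simply calls the module.
model = nn.DataParallel(net)

# Pick a batch size that is an integer multiple of the GPU count.
n_gpus = max(torch.cuda.device_count(), 1)
batch = torch.randn(2 * n_gpus, 8, device=device)

# The per-GPU outputs are gathered back onto the first device.
out = model(batch)
print(out.shape)
```

The gathered output keeps the same batch dimension as the input, which is why the rest of the training loop can stay unchanged.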

Below are the points to watch out for, summarized after running into the pitfalls myself.

1. cuda and to(device)

Move both the model and the data into GPU memory with the to(device) method. For model.to(device), the default device cuda:0 is fine. Intermediate tensors created during the network's computation, however, must live on whichever GPU is doing the computation at that moment, so set their device to that of the data they interact with: x.to(related_data.device).
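As a small illustration of the point about intermediate tensors, the sketch below creates helper tensors inside forward on the device of the incoming data instead of hard-coding cuda:0. The module name AddOffset and its contents are hypothetical and exist only for this example.

```python
import torch
import torch.nn as nn


class AddOffset(nn.Module):
    """Hypothetical module that builds intermediate tensors in forward."""

    def forward(self, x):
        # Create the tensor directly on the device x lives on. Under
        # DataParallel each replica sees its own x.device.
        offset = torch.zeros(x.shape, device=x.device)
        # Equivalent alternative: create first, then move next to x.
        mask = torch.ones(x.size(0), 1).to(x.device)
        return (x + offset) * mask


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(AddOffset().to(device))

x = torch.randn(4, 3, device=device)
y = model(x)
```

Hard-coding a device string inside forward would tend to work on a single GPU and then fail with a device-mismatch error once the batch is split.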

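2. model and model.module

Because the model has been wrapped with model = nn.DataParallel(model), variables and functions defined inside the original model must be accessed through model.module. The forward pass, however, must be invoked on model itself (the wrapper); otherwise the computation does not use multiple GPUs.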
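The difference between the wrapper and the wrapped module might look like this in practice. The Net class, its num_classes attribute and its describe method are made up for the example.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical model with a custom attribute and a custom method."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)
        self.num_classes = 4

    def forward(self, x):
        return self.fc(x)

    def describe(self):
        return "Net with {} outputs".format(self.num_classes)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(Net().to(device))

# Custom attributes and methods live on the wrapped module.
n = model.module.num_classes
text = model.module.describe()

# The forward pass goes through the wrapper so the batch can be split.
out = model(torch.randn(4, 8, device=device))

# The wrapper prefixes parameter names with "module."; saving the inner
# module's state dict keeps the original names.
wrapped_keys = list(model.state_dict().keys())
inner_keys = list(model.module.state_dict().keys())
```

Saving model.module.state_dict() rather than model.state_dict() is one way to keep checkpoints loadable by a single-GPU script.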

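3. Merging the multi-GPU results

Make sure that everything forward returns at the end is a tensor. Do not wrap the outputs in other data structures and do not return scalars; when there are several tensors, return each one separately.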
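A forward that follows this advice could look like the sketch below. The TwoHeads module is hypothetical; the point is that both outputs are tensors with a leading batch dimension, returned side by side, so they can be concatenated across GPUs.

```python
import torch
import torch.nn as nn


class TwoHeads(nn.Module):
    """Hypothetical model returning two tensors from forward."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.score = nn.Linear(8, 1)

    def forward(self, x):
        feats = self.features(x)           # shape (batch, 4)
        score = self.score(x).squeeze(1)   # shape (batch,), one value per sample
        # Return the tensors separately; both keep the batch dimension.
        return feats, score


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(TwoHeads().to(device))

feats, score = model(torch.randn(4, 8, device=device))
```

Keeping a per-sample value for score, instead of reducing it to a single number inside forward, leaves the reduction to the code outside the model.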

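4. nn.ParameterList() does not work across multiple GPUs

Multi-GPU replication does not support parameters held in list-type attributes: an nn.ParameterList() defined in the code is not copied to the other GPUs, so the parameters are empty there at computation time. Register each parameter with module.register_parameter() instead, so that it is copied along with the module. An example: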

        for k in range(len(self.filters) + 1):
            # Weights
            H_init = np.log(np.expm1(1 / scale / filters[k + 1]))
            H_k = nn.Parameter(torch.ones((n_channels, filters[k + 1], filters[k])))  # apply softmax for non-negativity
            torch.nn.init.constant_(H_k, H_init)
            self.register_parameter('H_{}'.format(k), H_k)

            # Scale factors
            a_k = nn.Parameter(torch.zeros((n_channels, filters[k + 1], 1)))
            self.register_parameter('a_{}'.format(k), a_k)

            # Biases
            b_k = nn.Parameter(torch.zeros((n_channels, filters[k + 1], 1)))
            torch.nn.init.uniform_(b_k, -0.5, 0.5)
            self.register_parameter('b_{}'.format(k), b_k)
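Since the fragment above is taken from the middle of a constructor, here is a smaller self-contained sketch of the same pattern, including how the registered parameters can be read back by name in forward. The Stack class, its layer sizes and the W_k naming are assumptions made for this example only.

```python
import torch
import torch.nn as nn


class Stack(nn.Module):
    """Hypothetical stack of weight matrices registered one by one."""

    def __init__(self, sizes=(3, 4, 2)):
        super().__init__()
        self.n_layers = len(sizes) - 1
        for k in range(self.n_layers):
            w = nn.Parameter(torch.zeros(sizes[k + 1], sizes[k]))
            nn.init.uniform_(w, -0.5, 0.5)
            # Registered under a unique name instead of being appended
            # to an nn.ParameterList.
            self.register_parameter("W_{}".format(k), w)

    def forward(self, x):
        for k in range(self.n_layers):
            # Look the parameter up by the name used at registration.
            w = getattr(self, "W_{}".format(k))
            x = x @ w.t()
        return x


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(Stack().to(device))

names = [name for name, _ in model.module.named_parameters()]
out = model(torch.randn(6, 3, device=device))
```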

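5. Miscellaneous

Keep the loss computation outside the model; otherwise you can run into errors because the tensors involved are not on the same GPU.
If the original network applied some operation across the batch before returning a tensor, move that operation outside the model as well, because each GPU returns a result computed only on its own part of the batch.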
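One way to arrange this is sketched below: the model returns per-sample predictions only, and both the loss and a batch-wise statistic are computed on the gathered output. The linear model, the MSE loss and the batch mean are placeholders chosen for the illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = nn.Linear(8, 1).to(device)
model = nn.DataParallel(net)

x = torch.randn(4, 8, device=device)
target = torch.randn(4, 1, device=device)

# The model only returns per-sample predictions, gathered on one device.
pred = model(x)

# Loss computed outside the model, once, over the full gathered batch.
loss = F.mse_loss(pred, target)

# A batch-wise operation (here a mean over the batch) also happens
# outside, so it sees every sample rather than one GPU's share.
batch_mean = pred.mean(dim=0)

loss.backward()
```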

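Switching to DistributedDataParallel

A few points to note:

The model and data are set up identically on both GPUs, as if two separate Python programs had been launched, one per GPU.
Launch with: python -m torch.distributed.launch --nproc_per_node=2 <script> <other command-line arguments>
Make sure the current model and data are both placed on CUDA; after wrapping the model, be sure to also call model.cuda(). You can set the device once globally, or call torch.cuda.set_device(local_rank).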

            self.model=nn.parallel.DistributedDataParallel(
                self.model,
                device_ids=[opt.local_rank],
                output_device=opt.local_rank,
                broadcast_buffers=False,
            )
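For context, the surrounding setup for the wrapping call above might look like the following sketch. The helper names parse_local_rank and wrap_for_ddp are hypothetical, and the nccl backend with env:// initialization is an assumption about the environment. The sketch relies on torch.distributed.launch passing a --local_rank argument to each process, and it moves the model to its GPU before wrapping, which is the order used in the PyTorch documentation.

```python
import argparse

import torch
import torch.distributed as dist
import torch.nn as nn


def parse_local_rank(argv=None):
    """Read the --local_rank argument that the launcher passes to each process."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    # Ignore the script's other command-line arguments in this sketch.
    args, _ = parser.parse_known_args(argv)
    return args.local_rank


def wrap_for_ddp(model, local_rank):
    """Bind this process to one GPU and wrap the model (assumes NCCL and env://)."""
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl", init_method="env://")
    model = model.cuda(local_rank)
    return nn.parallel.DistributedDataParallel(
        model,
        device_ids=[local_rank],
        output_device=local_rank,
        broadcast_buffers=False,
    )


# Intended use inside the training script started by the launcher:
#   local_rank = parse_local_rank()
#   model = wrap_for_ddp(MyModel(), local_rank)   # MyModel is a placeholder
```

Each process runs this code independently, so everything after the wrapping call can be written as if it were a single-GPU program.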

2020-10-29 PyTorch: taking a program from a single GPU to multiple GPUs
