Paper reproduction:
Paper details
Multiagent cooperation and competition with deep reinforcement learning

Pong game with two agents
- Base model: Pong game, two agents
- Algorithm: DQN
- Reward: scoring ρ ∈ [-1, 1], conceding -1
  Missing the ball gives a reward of -1; the scoring player receives ρ, which ranges from -1 to 1.
  When both players return the ball, no reward is given and the game continues.
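The rewarding scheme above can be sketched as a small helper (a minimal illustration; the function and argument names are mine, not from the paper's code):

```python
def point_rewards(rho, left_scored):
    """Per-point rewards for the two Pong agents under rewarding scheme rho.

    The player who misses the ball (concedes) always receives -1; the
    scoring player receives rho, which the paper varies from -1
    (fully cooperative) to +1 (fully competitive).
    Returns (left_reward, right_reward).
    """
    if left_scored:
        return rho, -1.0
    return -1.0, rho
```

With ρ = -1 both players are punished whenever the ball is lost, so neither wants the rally to end; with ρ = +1 the scorer gains at the loser's expense.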

- Training parameters
  - 50 epochs, 250,000 time steps each
  - Exploration rate: annealed from 1.0 to 0.05 over the first 1,000,000 time steps, then held fixed at 0.05
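The exploration schedule can be written as a small function (a sketch; the linear form is my assumption, following the standard DQN setup):

```python
def exploration_rate(t, start=1.0, end=0.05, anneal_steps=1_000_000):
    """Anneal epsilon linearly from `start` to `end` over the first
    `anneal_steps` time steps, then hold it fixed at `end`."""
    if t >= anneal_steps:
        return end
    return start + (end - start) * t / anneal_steps
```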

(Figure: parameters.png)
- Results analysis
  Convergence check: monitor the average maximal Q-value over 500 randomly selected game situations, set aside before training begins
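The convergence diagnostic above can be sketched like this (`q_fn` is a hypothetical stand-in for the trained DQN; all names here are illustrative, not from the paper's code):

```python
import numpy as np

def avg_max_q(q_fn, held_out_states):
    """Average of max_a Q(s, a) over a fixed set of game states that
    were sampled and set aside before training began."""
    return float(np.mean([np.max(q_fn(s)) for s in held_out_states]))

# Usage with a stand-in Q-function over 500 random states:
rng = np.random.default_rng(0)
states = [rng.standard_normal(4) for _ in range(500)]
dummy_q = lambda s: np.array([s.sum(), -s.sum(), 0.0])  # placeholder, not a real DQN
print(avg_max_q(dummy_q, states))
```

Tracking this average over training epochs shows whether the value estimates are still growing or have stabilised.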
  Behavioural feedback on training:
  - Average paddle-bounces per point: how many times the ball travels back and forth between the players before one side scores
  - Average wall-bounces per paddle-bounce: how many times the ball bounces off a wall before reaching the other player
  - Average serving time per point: how long the players take to restart the game after the ball is lost (under some rewarding schemes the players do not want to restart, so the serving time is long, e.g. ρ = -1)
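The three behavioural statistics can be aggregated from per-point event counts (a sketch; the record keys are hypothetical — the paper extracts these quantities from the game emulator):

```python
def behaviour_metrics(points):
    """Compute the three behavioural statistics from a list of per-point
    records with keys 'paddle_bounces', 'wall_bounces', and
    'serving_frames' (frames elapsed before the ball was served)."""
    n = len(points)
    total_paddle = sum(p["paddle_bounces"] for p in points)
    return {
        "paddle_bounces_per_point": total_paddle / n,
        "wall_bounces_per_paddle_bounce":
            sum(p["wall_bounces"] for p in points) / max(total_paddle, 1),
        "serving_time_per_point":
            sum(p["serving_frames"] for p in points) / n,
    }
```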
- Findings
- When scoring = -1, the two agents cooperate (neither wants the ball to drop):
  eventually both paddles rise to the top of the screen and pass the ball back and forth horizontally
  Cooperative mode video (YouTube)
  (Figure: 1.png)
- When scoring = 1, the two agents compete (each wants to score more points):
  Competitive mode video (YouTube)
  (Figure: 2.png)
- ρ varied over the range -1 to 1
  (Figure: 3.png)
- Multiplayer DQN vs. single-player DQN
  (the score indicates by how much agent a beats agent b)
  (Figure: 4.png)
This article is licensed under Creative Commons Attribution-NonCommercial-ShareAlike (BY-NC-SA) and the Jianshu agreement.
When reposting, please credit the author 空空格格; first published on Jianshu.com.



