Playing Flappy Bird with Reinforcement Learning

I've just started getting into reinforcement learning. To meet the requirements of my graduation project, I combined Q-learning with a video game: I spent a few days working through an excellent project by yenchenlin, adding plenty of comments and my own interpretation, so it should now be quite easy to read. I hope it is of some help to you.

GitHub repository: ReDeepLearningFlappyBird
https://github.com/ZhangRui111/ReDeepLearningFlappyBird

Using Deep Q-Network to Learn How To Play Flappy Bird

Based on yenchenlin/DeepLearningFlappyBird

Overview

This project implements the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird.

[Demo GIF: the trained agent playing Flappy Bird (bird_demo.gif)]

7-minute demo video: DQN for Flappy Bird

How to Run?

git clone https://github.com/ZhangRui111/ReDeepLearningFlappyBird.git
cd ReDeepLearningFlappyBird
python deep_q_network.py

By default, the program runs with pretrained weights. If you want to train the network from scratch, delete saved_networks/checkpoint. Note that training may take several days depending on your hardware.
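
The pretrained-weights behavior comes from TF1-style checkpoint handling. Below is a hedged sketch, reconstructed from the upstream project and not guaranteed verbatim, of why deleting saved_networks/checkpoint forces training from scratch:

import tensorflow as tf

# A sketch of the repo's TF1-style checkpoint handling (reconstructed,
# not verbatim). It assumes the Q-network graph has already been built.
def load_or_init(sess):
    saver = tf.train.Saver()
    sess.run(tf.global_variables_initializer())
    # get_checkpoint_state reads the saved_networks/checkpoint index file;
    # deleting that file makes the lookup fail, so training starts fresh.
    ckpt = tf.train.get_checkpoint_state("saved_networks")
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print("Successfully loaded:", ckpt.model_checkpoint_path)
    else:
        print("Could not find old network weights")
    return saver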

About deep_q_network.py

[Figures: two annotated screenshots walking through deep_q_network.py (case 1 and case 2)]

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights θ
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ε select a random action a_t,
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the emulator and observe r_t and s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a random minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                     for terminal s_(j+1)
            r_j + γ * max_(a') Q(s_(j+1), a'; θ)    for non-terminal s_(j+1)
        Perform a gradient step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
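
To make the inner loop concrete, here is a minimal Python sketch of ε-greedy action selection and one gradient step. It assumes a TF2/Keras model q_network mapping an 80x80x4 state stack to per-action Q values; the actual repo implements the same logic with TF1 graph code, and replay_memory, optimizer, and the transition tuple layout here are illustrative assumptions.

import random
import numpy as np
import tensorflow as tf

GAMMA = 0.99    # discount factor γ (assumed value)
EPSILON = 0.1   # exploration rate ε (assumed value)

def select_action(q_network, state, num_actions=2):
    # ε-greedy: with probability ε take a random action,
    # otherwise take the action with the highest Q value.
    if random.random() < EPSILON:
        return random.randrange(num_actions)
    q_values = q_network(state[np.newaxis])[0]
    return int(np.argmax(q_values))

def train_step(q_network, optimizer, replay_memory, batch_size=32):
    # Sample a random minibatch of (s_j, a_j, r_j, s_{j+1}, terminal) tuples.
    batch = random.sample(replay_memory, batch_size)
    states      = np.stack([b[0] for b in batch]).astype(np.float32)
    actions     = np.array([b[1] for b in batch], dtype=np.int32)
    rewards     = np.array([b[2] for b in batch], dtype=np.float32)
    next_states = np.stack([b[3] for b in batch]).astype(np.float32)
    terminal    = np.array([b[4] for b in batch], dtype=np.float32)

    # y_j = r_j for terminal s_{j+1}; else r_j + γ * max_{a'} Q(s_{j+1}, a').
    next_q = q_network(next_states).numpy().max(axis=1)
    targets = tf.constant(rewards + GAMMA * next_q * (1.0 - terminal))

    # Gradient step on (y_j - Q(s_j, a_j))^2 with respect to θ.
    with tf.GradientTape() as tape:
        q_all = q_network(states)
        q_taken = tf.reduce_sum(q_all * tf.one_hot(actions, q_all.shape[-1]), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))
    grads = tape.gradient(loss, q_network.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_network.trainable_variables))
    return float(loss)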

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a sketch follows the list):

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack last 4 frames to produce an 80x80x4 input array for network
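
A minimal sketch of these three steps, assuming OpenCV (cv2) and NumPy; the upstream code additionally thresholds the grayscale frame to a binary image, which is reproduced here:

import cv2
import numpy as np

def preprocess(frame, prev_stack=None):
    # Step 1: convert the raw RGB game screen to grayscale.
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    # Step 2: resize to 80x80 (the upstream code also binarizes the result).
    small = cv2.resize(gray, (80, 80))
    _, binary = cv2.threshold(small, 1, 255, cv2.THRESH_BINARY)
    if prev_stack is None:
        # First frame: repeat it 4 times to bootstrap the stack.
        return np.stack([binary] * 4, axis=2)
    # Step 3: drop the oldest frame and append the newest -> 80x80x4.
    return np.append(prev_stack[:, :, 1:], binary[:, :, np.newaxis], axis=2)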

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.

[Figure: network architecture diagram]

The final output layer has the same dimensionality as the number of valid actions which can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function given the input state for each valid action. At each time step, the network performs whichever action corresponds to the highest Q value under an ε-greedy policy.
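
The following Keras snippet is an equivalent reconstruction of the layer stack described above, not the repo's actual TF1 graph code; the padding choices and num_actions=2 (do nothing, flap) are assumptions:

import tensorflow as tf

def build_q_network(num_actions=2):
    # Input is 80x80x4: four stacked grayscale frames.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(80, 80, 4)),
        # 8x8x4x32 kernel, stride 4, then 2x2 max pooling.
        tf.keras.layers.Conv2D(32, 8, strides=4, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2, padding="same"),
        # 4x4x32x64 kernel, stride 2, then max pooling again.
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2, padding="same"),
        # 3x3x64x64 kernel, stride 1, then one more max pooling.
        tf.keras.layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2, padding="same"),
        tf.keras.layers.Flatten(),
        # Fully connected hidden layer of 256 ReLU units.
        tf.keras.layers.Dense(256, activation="relu"),
        # One linear output per valid action: Q(s, a) for each a.
        tf.keras.layers.Dense(num_actions),
    ])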

Disclaimer

This work is based heavily on the following repos:

  1. [sourabhv/FlapPyBird](https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames
  3. yenchenlin/DeepLearningFlappyBird


Original article. Please credit the source when republishing: http://www.itdecent.cn/p/755f9f2604d0
